Django queryset.count cache
Once I mentioned, that my django application makes several similiar queries like SELECT COUNT(*) ...
. As it turns out (for me it was surprise), queryset.count()
has not obvious cache. But let me start the story from the beginning (and sorry for my english :) ).
As it is known, django’s queryset
is lazy and have cache.
Lets say we have such model:
class Item(models.Model):
name = models.CharField(max_length=50)
When some django query is declared there is no database attempt (thats why it is called “lazy”):
items = Item.objects.all()
The attempt is made when we actually access elements from queryset, for example in cycle:
for item in items:
print item.name
During executing line for item in items:
sql query was performed:
SELECT "main_item"."id", "main_item"."name" FROM "main_item";
At next access to elements from queryset, database attempt will not occur, because all elements were touched and consequently put into cache. So, this code will hit the DB only once:
for item in items: # hit the database
print item.name
for item in items: # cache
print item.name
Nevertheless there are some cases, when database can be touched again. To not put very much information in a single post, refer to documentation: https://docs.djangoproject.com/en/dev/topics/db/queries/#caching-and-querysets.
Now directly about count
Keeping in mind, that queryset has cache, i thought, that .count() will also be cached. But it is not (not always precisely). If count() method was called before queryset is put in cache, then database will be touched every time count() is called:
items = Item.objects.all() # not hit DB
items.count() # hit DB
items.count() # hit DB
items.count() # hit DB
for item in items: # hit DB and put into cache
print item.name
Нowever, if source queryset is cached, then count will not touch the database:
items = Item.objects.all() # no DB hit
for item in items: # hit DB and put to cache
print item.name
items.count() # cache
items.count() # cache
items.count() # cache
All this concerns django templates too. The code, that made identical queries like SELECT COUNT(*) ...
has several checks like:
{% if items.count %}
and also some count output:
{{ items.count }}
Before these lines the queryset elements were not being accessed. As a result, each mentioned line create database query.
Again, if such cycle was before:
{% for item in items %}
{{item.name}}
{% endfor %}
then {{ items.count }}
doesn’t hit the DB.
Approaches to exclude redundant database calls.
-
If we know, that we’ll need to access all elements from query set, it is ok to use
len
.Python code:
len(items) # DB len(items) # cache for item in items: # cache # ...
and reversed, that is also acceptable:
for item in items: # DB # ... len(items) # cache len(items) # cache
Django template:
{{ items|length }} # DB {{ items|length }} # cache {% if items|length %} # cache {% for item in items %} # cache
and reversed:
{% for item in items %} # DB {{ items|length }} # cache {{ items|length }} # cache {% if items|length %} # cache
-
If it is needed to know only count of elements, or source queryset for count is different to queryset, that will be used to access elements, then we shall use count(). But it is better, if it will be called only once.
If count is needed more than one time, then instead of this:
{{ items.count }} {{ items.count }}
we can create additional variable in template context (in view, that renders this template):
# views.py context['items_count'] = items.count() # template {{ items_count }} {{ items_count }}
or we can use
{% with items.count as items_count %}
and not declare any additional variable in context:# template {% with items.count as items_count %} {{ items_count }} {{ items_count }} {% endwith %}
Of course, everywere in this post word “cache” means queryset internal cache, and not caching.