<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>The Usware Blog - Django Web Development &#187; aggreagtion</title>
	<atom:link href="http://uswaretech.com/blog/category/aggreagtion/feed/" rel="self" type="application/rss+xml" />
	<link>http://uswaretech.com/blog</link>
	<description>Building Amazing Webapps</description>
	<lastBuildDate>Tue, 08 Jun 2010 14:59:46 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0</generator>
		<item>
		<title>Django aggregation tutorial</title>
		<link>http://uswaretech.com/blog/2009/08/django-aggregation-tutorial/</link>
		<comments>http://uswaretech.com/blog/2009/08/django-aggregation-tutorial/#comments</comments>
		<pubDate>Tue, 18 Aug 2009 12:38:29 +0000</pubDate>
		<dc:creator>shabda</dc:creator>
				<category><![CDATA[aggreagtion]]></category>
		<category><![CDATA[django]]></category>
		<category><![CDATA[models]]></category>
		<category><![CDATA[tips]]></category>
		<category><![CDATA[tutorial]]></category>

		<guid isPermaLink="false">http://uswaretech.com/blog/?p=654</guid>
		<description><![CDATA[One of the new and most awaited features with Django 1.1 was aggregation. As usual, Django comes with a very comprehensive documentation for this. Here, I have tried to put this in how-to form. Jump to howtos or Get source on Github. Essentially, aggregations are nothing but a way to perform an operation on group [...]


Related posts:<ol><li><a href='http://uswaretech.com/blog/2010/01/django-models-tutorial/' rel='bookmark' title='Permanent Link: Doing things with Django models &#8211; aka &#8211; Django models tutorial'>Doing things with Django models &#8211; aka &#8211; Django models tutorial</a></li>
<li><a href='http://uswaretech.com/blog/2008/04/new-tutorial-building-a-search-engine-with-appengine-and-yahoo/' rel='bookmark' title='Permanent Link: New tutorial &#8211; Building a search engine with Appengine and Yahoo'>New tutorial &#8211; Building a search engine with Appengine and Yahoo</a></li>
<li><a href='http://uswaretech.com/blog/2008/10/dynamic-forms-with-django/' rel='bookmark' title='Permanent Link: Dynamic forms with Django'>Dynamic forms with Django</a></li>
</ol>]]></description>
			<content:encoded><![CDATA[<p></p><p>One of the new and most awaited features with Django 1.1 was aggregation. As usual,
Django comes with very <a href="http://docs.djangoproject.com/en/dev/topics/db/aggregation/">comprehensive documentation</a> for this. Here, I have tried to
put this in how-to form.</p>

<p><a href="#howtos">Jump to howtos</a> or <a href="http://github.com/uswaretech/Shiny-New-Django-1.1/tree/master">Get source on Github</a>.</p>

<p>Essentially, aggregations are nothing but a way to perform an operation on a group of rows. In databases,
they are represented by functions such as <code>sum</code> and <code>avg</code>.</p>

<p>To do these operations Django added two new methods to querysets.</p>

<ol>
<li><code>aggregate</code></li>
<li><code>annotate</code></li>
</ol>

<p>When you have a queryset, you can do two kinds of operations on it:</p>

<ol>
<li>Operate over the rowset to get a single value from it. (Such as sum of all salaries in the rowset)</li>
<li>Operate over the rowset to get a value for <em>each row in the rowset</em> via some related table.</li>
</ol>

<p>The thing to notice is that option 1 will create one row from the rowset, while option 2 will
not change the number of rows in the rowset. If you are into analogies, you can think of
option 1 as a <a href="http://docs.python.org/library/functions.html#reduce">reduce</a> and option 2 as a <a href="http://docs.python.org/library/functions.html#map">map</a>.</p>
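<p>To make the analogy concrete, here is a minimal sketch in plain Python, with no Django involved. The rows, field names and the <code>leaves</code> lookup are made up for illustration; the only point is that the reduce leaves you with one value while the map keeps every row.</p>

```python
from functools import reduce

# Hypothetical rows standing in for a queryset; the field names are made up.
employees = [
    {"name": "Asha", "pay": 100},
    {"name": "Ben", "pay": 250},
    {"name": "Chen", "pay": 150},
]
# Hypothetical related data: number of leave days taken per employee.
leaves = {"Asha": 2, "Ben": 0, "Chen": 5}

# Option 1 (like aggregate): fold the whole rowset into a single value.
total_pay = reduce(lambda acc, row: acc + row["pay"], employees, 0)

# Option 2 (like annotate): keep every row and attach a computed value to each.
annotated = list(map(lambda row: dict(row, leave_count=leaves[row["name"]]), employees))

print(total_pay)
print(annotated)
```

<p>The first result is a single number, the second still has three rows, each with an extra <code>leave_count</code> key.</p>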

<p>In SQL terms, aggregate is an operation (SUM, AVG, MIN, MAX) without a group by,
while annotate is an operation with a group by on rowset_table.id (unless explicitly overridden).</p>
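<p>If you want to see the difference without setting up a Django project, the sketch below runs the two shapes of query against an in-memory SQLite database using the standard library <code>sqlite3</code> module. The table names, columns and data are placeholders I made up, and the SQL is hand-written rather than generated by Django.</p>

```python
import sqlite3

# Placeholder schema and data; only the shape of the two queries matters.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE hrms_employee (id INTEGER PRIMARY KEY, emp_name TEXT, pay INTEGER);
    CREATE TABLE hrms_leave (id INTEGER PRIMARY KEY, employee_id INTEGER);
    INSERT INTO hrms_employee VALUES (1, 'Asha', 100), (2, 'Ben', 250), (3, 'Chen', 150);
    INSERT INTO hrms_leave VALUES (1, 1), (2, 1), (3, 3);
""")

# aggregate-style: no GROUP BY, so the whole table collapses into one row.
aggregate_rows = conn.execute("SELECT SUM(pay) FROM hrms_employee").fetchall()

# annotate-style: GROUP BY the row id, so every employee row survives and
# gains one computed column (here, a count of related leave rows).
annotate_rows = conn.execute("""
    SELECT hrms_employee.emp_name, COUNT(hrms_leave.id)
    FROM hrms_employee
    LEFT OUTER JOIN hrms_leave ON hrms_employee.id = hrms_leave.employee_id
    GROUP BY hrms_employee.id
    ORDER BY hrms_employee.id
""").fetchall()

print(aggregate_rows)
print(annotate_rows)
```

<p>The first query returns a single row, the second returns one row per employee.</p>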

<p><a name="howtos" ></a></p>

<p>Ok, enough talk, on to some actual work. Here is a fictional models.py representing
an HRMS application. We will use this to see how to use aggregation to solve
some common problems.</p>

<pre><code>from django.db import models

class Department(models.Model):
    dept_name = models.CharField(max_length = 100)
    established_on = models.DateField()

    def __unicode__(self):
        return self.dept_name

class Level(models.Model):
    level_name = models.CharField(max_length = 100)
    pay_min = models.PositiveIntegerField()
    pay_max = models.PositiveIntegerField()

    def __unicode__(self):
        return self.level_name

class Employee(models.Model):
    emp_name = models.CharField(max_length = 100)
    department = models.ForeignKey(Department)
    level = models.ForeignKey(Level)
    reports_to = models.ForeignKey('self', null=True, blank=True)

    pay = models.PositiveIntegerField()
    joined_on = models.DateField()

class Leave(models.Model):
    employee = models.ForeignKey(Employee)
    leave_day = models.DateField()


"""
#Populate DB, so we can do some meaningful queries.
#Create Dept, Levels manually.
#Get the names file from http://dl.getdropbox.com/u/271935/djaggregations/names.pickle
#Or the whole sqlite database from http://dl.getdropbox.com/u/271935/djaggregations/bata.db
import random
from datetime import timedelta, date
import pickle
names = pickle.load(open('/home/shabda/names.pickle', 'rb'))
for i in range(1000):
    emp = Employee()
    emp.emp_name = random.choice(names)  # the model field is emp_name, not name
    emp.department = random.choice(list(Department.objects.all()))
    emp.level = random.choice(list(Level.objects.all()))
    try:
        # Pick a manager from the same department, if anyone is there yet.
        emp.reports_to = random.choice(list(Employee.objects.filter(department=emp.department)))
    except IndexError:
        pass  # first employee in this department, so reports_to stays empty
    emp.pay = random.randint(emp.level.pay_min, emp.level.pay_max)
    emp.joined_on = emp.department.established_on + timedelta(days = random.randint(0, 200))
    emp.save()
"""

"""
employees = list(Employee.objects.all())
for i in range(100):
    employee = random.choice(employees)
    leave = Leave(employee = employee)
    leave.leave_day = date.today() - timedelta(days = random.randint(0, 365))
    leave.save()

"""
</code></pre>

<h4>Find the total number of employees.</h4>

<p>In sql you might want to do something like,</p>

<p><code>select count(id) from hrms_employee</code></p>

<p>Which becomes (with <code>Count</code> imported from <code>django.db.models</code>),</p>

<p><code>Employee.objects.all().aggregate(total=Count('id'))</code></p>

<p>In fact, doing a <code>connection.queries.pop()</code> shows the exact query.</p>

<p><code>SELECT COUNT("hrms_employee"."id") AS "total" FROM "hrms_employee"</code></p>

<p>But wait, we already have a builtin method for that, <code>Employee.objects.all().count()</code>, so let's try something else.</p>

<h4>Find the total pay of employees.</h4>

<p>The CEO wants to find out the total salary expenditure. This also converts
the queryset to a single value, so we want to <code>.aggregate</code> here.</p>

<p><code>Employee.objects.all().aggregate(total_payment=Sum('pay'))</code></p>

<p>Gives you the total amount you are paying to your employees.</p>

<h4>Find the total number of employees, per department.</h4>

<p>Here we want a value per row in the queryset, so we need to use annotate here. Also,
there would be one aggregated value per department, so we need to annotate the Department
queryset.</p>

<p><code>Department.objects.all().annotate(Count('employee'))</code></p>

<p>If you are only interested in the name of each department and its employee count, you can do,
<code>Department.objects.values('dept_name').annotate(Count('employee'))</code></p>

<p>The sql is</p>

<pre><code>SELECT "hrms_department"."dept_name", COUNT("hrms_employee"."id") AS "employee__count" FROM "hrms_department" LEFT OUTER JOIN "hrms_employee" ON ("hrms_department"."id" = "hrms_employee"."department_id") GROUP BY "hrms_department"."dept_name"
</code></pre>
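<p>One detail worth noticing in this SQL is the <code>LEFT OUTER JOIN</code> combined with <code>COUNT("hrms_employee"."id")</code>: a department with no employees still shows up, with a count of 0. Here is a small sketch that runs the same SQL against an in-memory SQLite database via the standard library. The schema is cut down to the columns the query touches, and the rows (including the empty 'Legal' department) are made up.</p>

```python
import sqlite3

# Cut-down, hypothetical version of the two tables the query touches.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE hrms_department (id INTEGER PRIMARY KEY, dept_name TEXT);
    CREATE TABLE hrms_employee (
        id INTEGER PRIMARY KEY, emp_name TEXT, department_id INTEGER
    );
    INSERT INTO hrms_department VALUES (1, 'Sales'), (2, 'Engineering'), (3, 'Legal');
    INSERT INTO hrms_employee VALUES (1, 'Asha', 1), (2, 'Ben', 1), (3, 'Chen', 2);
""")

# The SQL from the post, only re-wrapped across lines.
sql = """
    SELECT "hrms_department"."dept_name",
           COUNT("hrms_employee"."id") AS "employee__count"
    FROM "hrms_department"
    LEFT OUTER JOIN "hrms_employee"
        ON ("hrms_department"."id" = "hrms_employee"."department_id")
    GROUP BY "hrms_department"."dept_name"
"""

# Row order is not guaranteed without ORDER BY, so collect into a dict.
counts = dict(conn.execute(sql).fetchall())
print(counts)
```

<p>'Legal' has no matching employee rows, so the joined <code>id</code> is NULL and <code>COUNT</code> skips it, which gives 0 rather than dropping the department.</p>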

<h4>Find the total number of employees, for a specific department.</h4>

<p>Here you could use either of <code>.annotate</code> or <code>.aggregate</code>,</p>

<pre><code>Department.objects.filter(dept_name='Sales').values('dept_name').annotate(Count('employee'))
Department.objects.filter(dept_name='Sales').aggregate(Count('employee'))
</code></pre>

<p>If you look at the SQL, you will see that <code>.annotate</code> did a <code>group by</code>, while <code>.aggregate</code>
did not, but as there was only one row, the <code>group by</code> had no effect.</p>

<h4>Find the total number of employees, per department, per level</h4>

<p>This time, we cannot annotate either the Department model or the Level model, as we
need to <code>group by</code> both department and level. So we will annotate on Employee.</p>

<pre><code>Employee.objects.values('department__dept_name', 'level__level_name').annotate(Count('id'))
</code></pre>

<p>This leads to the sql,</p>

<pre><code>SELECT "hrms_department"."dept_name", "hrms_level"."level_name", COUNT("hrms_employee"."id") AS "id__count" FROM "hrms_employee" INNER JOIN "hrms_department" ON ("hrms_employee"."department_id" = "hrms_department"."id") INNER JOIN "hrms_level" ON ("hrms_employee"."level_id" = "hrms_level"."id") GROUP BY "hrms_department"."dept_name", "hrms_level"."level_name"
</code></pre>
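<p>To get a feel for what comes back, here is a rough plain-Python imitation of the grouping, using <code>collections.Counter</code>. This is not how Django does it, and the employee data is invented; it only mimics the shape of the result, one dict per (department, level) pair with the keys passed to <code>values</code> plus the default <code>id__count</code> alias seen in the SQL.</p>

```python
from collections import Counter

# Hypothetical (department name, level name) pairs, one per employee.
employees = [
    ("Sales", "Junior"),
    ("Sales", "Junior"),
    ("Sales", "Senior"),
    ("Engineering", "Senior"),
]

# Grouping by both columns at once: the pair is the group key.
groups = Counter(employees)

# Shape the groups like the dicts a values(...).annotate(Count('id')) call yields.
rows = [
    {"department__dept_name": dept, "level__level_name": level, "id__count": count}
    for (dept, level), count in sorted(groups.items())
]

for row in rows:
    print(row)
```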

<h4>Which combination of Department and Level employs the most people</h4>

<p>We can order on the annotated fields, so the last query is modified to,</p>

<pre><code>Employee.objects.values('department__dept_name', 'level__level_name').annotate(employee_count = Count('id')).order_by('-employee_count')[:1]
</code></pre>

<h4>Which employee name is the most common.</h4>

<p>We want to <code>group by emp_name</code>, so <code>emp_name</code> is added to values. After that we order on the annotated field
and take the first element to get the most common name.</p>

<p><code>Employee.objects.values('emp_name').annotate(name_count=Count('id')).order_by('-name_count')[:1]</code></p>
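<p>In SQL terms this is a <code>group by</code>, a descending <code>order by</code> on the count, and a <code>limit 1</code> for the slice. The sketch below is hand-written SQL run through the standard library <code>sqlite3</code> module, so the exact text Django generates may differ; the names in the table are made up.</p>

```python
import sqlite3

# Hypothetical, cut-down employee table with a few repeated names.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hrms_employee (id INTEGER PRIMARY KEY, emp_name TEXT)")
conn.executemany(
    "INSERT INTO hrms_employee (emp_name) VALUES (?)",
    [(name,) for name in ["Asha", "Ben", "Asha", "Chen", "Asha", "Ben"]],
)

# Group by name, sort by how often each name occurs, keep only the top row.
most_common = conn.execute("""
    SELECT emp_name, COUNT(id) AS name_count
    FROM hrms_employee
    GROUP BY emp_name
    ORDER BY name_count DESC
    LIMIT 1
""").fetchall()

print(most_common)
```

<p>Note that the Django version still gives you a queryset with one dict in it, so you index it with <code>[0]</code> to get at the name and its count.</p>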

<hr />

<p>This was an overview of how Django aggregation and annotation work. These remove a whole class of queries for which
you had to use custom SQL in the past.</p>

<hr />

<h3>Resources</h3>

<ol>
<li><a href="http://github.com/uswaretech/Shiny-New-Django-1.1/tree/master">Source on Github</a></li>
<li><a href="http://dl.getdropbox.com/u/271935/djaggregations/bata.db">sqlite file for this model to test</a></li>
<li><a href="http://docs.djangoproject.com/en/dev/topics/db/aggregation/">Aggregation on Django docs</a></li>
</ol>

<hr />

<p>Want to build a Django app? <a href="http://uswaretech.com/contact/">Talk to us</a></p>


]]></content:encoded>
			<wfw:commentRss>http://uswaretech.com/blog/2009/08/django-aggregation-tutorial/feed/</wfw:commentRss>
		<slash:comments>28</slash:comments>
		</item>
	</channel>
</rss>
