How to use Group By in SQL to group data

  • by

Grouping data using Excel spreadsheets or some other tools can be very tricky. If you are struggling with to group various sets of data, this tutorial will help you to learn how to do the work more easily using SQL GROUP BY statement.

The GROUP BY clause is used in the SELECT statement. It can be optionally used with aggregate functions, such as AVG, MIN, MAX, COUNT, etc. to produce summary reports from the database.

GROUP BY in SQL

Let’s take a look at the following case scenario.

Imagine some university worker, who has to manage students and their classes. Let’s say we have the following table of attendees to a subject at university:

idsubjectsemesterattendee
1LKB001Miriam
2LKB001John
3MTN022Adam
4LKB001Samanta
5LKB011Paul
6MTN012Diaz

Now you may want to know how many attendees, go to the class LKB00 in the first semester and how many in the second semester? Or how many attendees are in the first semester?

With the help of GROUP BY clause, we can easily get answers to these questions and many more. Let’s take a deeper look at the statement.

Group By Statement

With the GROUP BY statement in SQL, you can group data based on defined criteria. It allows you to group individual data by one or more table columns. To do the grouping properly, you also can apply aggregate functions to the column(s) within the SELECT statement.

Group By Syntax

Let’s look at the syntax for a basic GROUP BY clause:

SELECT column_name1, column_name2,...
FROM table_name
WHERE condition
GROUP BY column_name1, column_name2,...
HAVING condition
ORDER BY column_name1, column_name2 
  • “SELECT column_name1, column_name2…” is the standard SQL SELECT command query.
  • GROUP BY column_name1” is the clause that performs the grouping based on column_name1, column_name2
  •  “HAVING condition” is optional; it is used to restrict the rows affected by the GROUP BY clause. It is similar to the  WHERE clause.

The placement of GROUP BY SQL clause is important. Remeber to place the GROUP BY after the FROM (and WHERE) clauses and right after ORDER BY statement

The usage of the GROUP BY clause implies basic knowledge of querying data in SQL. If you are a beginner take a look at some online courses or at our blog.

Examples of using Group By in SQL

We will use dbHarbor: SQLite for creating an example SQLite database and executing SQL statements.

To create a sample database, open dbHarbor: SQLite and click on “Create a new database button”.

Create SQLite Databse
Create a database using dbHarbor: SQLite

Then choose a location to save the database.

Empty SQLite Database
Empty SQLite database

To create the table, we mentioned above, enter the following script into the query editor and press CMD+SHIFT+R or the “Run” button in the toolbar.

CREATE TABLE "subject" 
  ( 
     "subject"  TEXT NOT NULL, 
     "semester" INTEGER NOT NULL, 
     "attendee" TEXT NOT NULL 
  )

Alternatively, you can create the table using dbHarbor: SQLite table interface builder. You can learn more about it in our documentation.

Now run the following query to insert some simple data into the just created table.

insert into "subject" ( "subject", "semester", "attendee" ) values ("LKB00", 1, "Miriam");
insert into "subject" ( "subject", "semester", "attendee" ) values ("LKB00", 1, "John");
insert into "subject" ( "subject", "semester", "attendee" ) values ("MTN02", 2, "Adam");
insert into "subject" ( "subject", "semester", "attendee" ) values ("LKB00", 1, "Samanta");
insert into "subject" ( "subject", "semester", "attendee" ) values ("LKB01", 1, "Paul");
insert into "subject" ( "subject", "semester", "attendee" ) values ("MTN01", 2, "Diaz");

So now if open the “subject” table you should see the table populated with data.

SQLite Table View
SQLite table view in dbHarbor

This should be enough, to begin with, some sample queries.

Using group by on single column

The following statement returns rowid, subject and number of students attended to the subject

select rowid, subject, count(rowid) from "subject" group by "subject"

To get the number of attendees in each semester we can use a similar SQL statement, only grouping by semester

select count(rowid), semester from "subject" group by "semester"
USING GROUP BY on multiple columns

In previous examples, we used only one column in the GROUP BY statement. SQL allows you to group data by multiple columns too.

Group By X means put all those with the same value for X in the one group.

Group By X, Y means put all those with the same values for both X and Y in the one group.

If we were to group by two columns subject and semester like this:

select Subject, Semester, Count(*)
from subject
group by Subject, Semester

We will get the following result:

Group By multiple columns query result

This is because, when we group by two columns, it is saying “Group them so that all of those with the same Subject and Semester are in the same group, and then calculate all the aggregate functions (Count, Sum, Average, etc.) for each of those groups”. In this example, this is demonstrated by the fact that, when we count them, there are three people doing LKB00 in semester 1.

Be the part of dbHarbor

If you don’t want to miss the latest news about our development journey or join the beta waiting list for an exclusive, early access version, signup for our newsletter. Promise not to SPAM😎

Leave a Reply

Your email address will not be published. Required fields are marked *