Grouping data using Excel spreadsheets or some other tools can be very tricky. If you are struggling with to group various sets of data, this tutorial will help you to learn how to do the work more easily using SQL GROUP BY statement.
The GROUP BY clause is used in the SELECT statement. It can be optionally used with aggregate functions, such as AVG, MIN, MAX, COUNT, etc. to produce summary reports from the database.
GROUP BY in SQL
Let’s take a look at the following case scenario.
Imagine some university worker, who has to manage students and their classes. Let’s say we have the following table of attendees to a subject at university:
Now you may want to know how many attendees, go to the class LKB00 in the first semester and how many in the second semester? Or how many attendees are in the first semester?
With the help of GROUP BY clause, we can easily get answers to these questions and many more. Let’s take a deeper look at the statement.
Group By Statement
With the GROUP BY statement in SQL, you can group data based on defined criteria. It allows you to group individual data by one or more table columns. To do the grouping properly, you also can apply aggregate functions to the column(s) within the SELECT statement.
Group By Syntax
Let’s look at the syntax for a basic GROUP BY clause:
SELECT column_name1, column_name2,... FROM table_name WHERE condition GROUP BY column_name1, column_name2,... HAVING condition ORDER BY column_name1, column_name2
- “SELECT column_name1, column_name2…” is the standard SQL SELECT command query.
- “GROUP BY column_name1” is the clause that performs the grouping based on column_name1, column_name2
- “HAVING condition” is optional; it is used to restrict the rows affected by the GROUP BY clause. It is similar to the WHERE clause.
The usage of the GROUP BY clause implies basic knowledge of querying data in SQL. If you are a beginner take a look at some online courses or at our blog.
Examples of using Group By in SQL
We will use dbHarbor: SQLite for creating an example SQLite database and executing SQL statements.
To create a sample database, open dbHarbor: SQLite and click on “Create a new database button”.
Then choose a location to save the database.
To create the table, we mentioned above, enter the following script into the query editor and press CMD+SHIFT+R or the “Run” button in the toolbar.
CREATE TABLE "subject" ( "subject" TEXT NOT NULL, "semester" INTEGER NOT NULL, "attendee" TEXT NOT NULL )
Alternatively, you can create the table using dbHarbor: SQLite table interface builder. You can learn more about it in our documentation.
Now run the following query to insert some simple data into the just created table.
insert into "subject" ( "subject", "semester", "attendee" ) values ("LKB00", 1, "Miriam"); insert into "subject" ( "subject", "semester", "attendee" ) values ("LKB00", 1, "John"); insert into "subject" ( "subject", "semester", "attendee" ) values ("MTN02", 2, "Adam"); insert into "subject" ( "subject", "semester", "attendee" ) values ("LKB00", 1, "Samanta"); insert into "subject" ( "subject", "semester", "attendee" ) values ("LKB01", 1, "Paul"); insert into "subject" ( "subject", "semester", "attendee" ) values ("MTN01", 2, "Diaz");
So now if open the “subject” table you should see the table populated with data.
This should be enough, to begin with, some sample queries.
Using group by on single column
The following statement returns rowid, subject and number of students attended to the subject
select rowid, subject, count(rowid) from "subject" group by "subject"
To get the number of attendees in each semester we can use a similar SQL statement, only grouping by semester
select count(rowid), semester from "subject" group by "semester"
USING GROUP BY on multiple columns
In previous examples, we used only one column in the GROUP BY statement. SQL allows you to group data by multiple columns too.
Group By X means put all those with the same value for X in the one group.
Group By X, Y means put all those with the same values for both X and Y in the one group.
If we were to group by two columns subject and semester like this:
select Subject, Semester, Count(*) from subject group by Subject, Semester
We will get the following result:
This is because, when we group by two columns, it is saying “Group them so that all of those with the same Subject and Semester are in the same group, and then calculate all the aggregate functions (Count, Sum, Average, etc.) for each of those groups”. In this example, this is demonstrated by the fact that, when we count them, there are three people doing LKB00 in semester 1.
Be the part of dbHarbor
If you don’t want to miss the latest news about our development journey or join the beta waiting list for an exclusive, early access version, signup for our newsletter. Promise not to SPAM😎