Welcome to Part 2 of the lesson on database joins with SQL. This lesson builds on the previous lesson about joins, introducing the Outer Join concept. The database uses a "Battle School" as the example.
There is also a slideshow in this directory.
- What is an Outer Join for?
- Live demo
- Syntax review
- Code challenge
If one of your tables references another and allows NULL keys, you may need an Outer Join to get all the records you selected.
Let's talk again about the Principle of Normalization. In our last example, we say how a Join can bring our data back together after we split it into tables using Normalization. One reason for Normalization is so we don't have bunches of NULL columns in rows where that information is not relevant. However, when we join tables together, there is more than one way to do that.
If you look at a Venn Diagram of two tables A and B using the default Join, or Inner Join, you see that the default join only shows data where the two circles overlap.
What this means is that the table on the left may have lots of rows that match our WHERE clause, but the conditions of the Inner Join are preventing them from being returned. That's where the Outer Join comes in. The Outer Join allows us to include matching rows from the table on the left, without a corresponding row on the right. That's why an Outer join can also be referred to as a Left Join.
- Show all of A, and include B if it exists, matching on a condition, given with the ON statement.
e.g. "Show a list of all cadets and awards they have won, if any." - Inner Joins include just those rows that match. Even if some rows from table A meet the criteria, an Inner Join would not include them unless there is a corresponding row in table B.
e.g. an Inner Join would ONLY show cadets who have won awards - Outer Joins include all matching rows from the table on the Left; a match on the Right is not needed.
i.e. All cadets will be shown, their awards only appear if they exist. - If there are no NULL values on the key to the join, there will be no difference between Inner and Outer Join.
- The most common type of Outer Join is a Left Join. SQLite only implements a Left Outer Join.
- You may use
LEFT JOIN
andLEFT OUTER JOIN
interchangeably in SQLite. - The Left and Right tables in Inner Joins can be reordered without affecting the results, but for Outer Joins the ordering of the tables matters.
i.e. Placing awards on the Left would mean all awards were included, not all Cadets. - In a Left join, you want to use the ON or USING syntax. If you use WHERE, you may unintentionally omit rows.
i.e. If a row in theaward
table is NULL, itscadet_id
cannot be equal to anid
in thecadet
table.
This is a review of our earlier procedure to write an SQL SELECT statement, revised to help us write queries that use an Outer Join.
- Which tables to select FROM?
- Which to LEFT JOIN? Table with possible NULL rows on the Right.
- Instead of WHERE, use ON or USING.
- Specify the WHERE clause.
- Decide which fields to SELECT.
- Decide which fields to ORDER BY.
- Add a LIMIT.
In this section, we will re-visit our systematic approach to writing queries, adding a rule for Outer Joins.
- Open the
join2.db
from the folderjoin2-outer-inner
.
$ cd join2-outer-inner
$ sqlite3 join2.db
- List the available tables, and get a schema description for the
cadet
andaward
table.sqlite3> .tables
sqlite3> .schema cadet
sqlite3> .schema award
- Plan your queries.
Remember that we have a system for writing a query: choose the tables (including the joins), the WHERE clause, the fields to SELECT, the ORDER BY and a LIMIT.
When doing an Outer Join, we need to remember to use the ON condition when we specify a table to join.
Here are two example queries broken down in this manner:
- Show a list of cadets and awards they have won, if any.
- Decide which table to select FROM, and which table to LEFT JOIN.
cadet
andaward
- Make sure you place the table with potential NULL columns on the Right.
award
may not have rows for everycadet
, or may have multiple matches for onecadet
. - Remember to use the ON or USING keywords to specify the conditions for the LEFT JOIN.
FROM cadet, LEFT OUTER JOIN award ON award.cadet_id = cadet.cadet_id
- Specify the WHERE clause(s).
WHERE cadet.name IS NOT NULL
i.e. show all Cadets - Decide which field(s) to SELECT.
SELECT cadet.cadet_id, cadet.name, cadet.callsign, award.name
- Decide which field(s) to ORDER BY.
ORDER BY cadet.name
- Write out the whole query in
query1.sql
and add a LIMIT of 20:
SELECT cadet.cadet_id, cadet.name, cadet.callsign, award.name FROM cadet LEFT OUTER JOIN award ON award.cadet_id = cadet.cadet_id WHERE cadet.name IS NOT NULL ORDER BY cadet.name LIMIT 20;
- Open the join2.db and load
query1.sql
then review the results.$ sqlite3 join2.db sqlite> .read query1.sql
- Show a list of cadets and the names of cadets who froze them in battle.
- Decide which table to select FROM, and which table to LEFT JOIN.
cadet
,cadet_battle
, andcadet
We will need aliases to write this query - Make sure you place the table with potential NULL columns on the Right.
If there is a NULL infrozen_by_id
there will be no match incadet
when we join, we must make sure this table uses LEFT OUTER JOIN. - Remember to use the ON or USING keywords to specify the conditions for the LEFT JOIN.
FROM cadet AS c1 JOIN cadet_battle AS cb ON cb.cadet_id = c1.cadet_id LEFT OUTER JOIN cadet AS c2 ON c2.cadet_id = cb.frozen_by_id
- Specify the WHERE clause(s).
WHERE c1.name IS NOT NULL
i.e. show battles by Dragon Army and with cadet Andrew Wiggin - Decide which field(s) to SELECT.
SELECT c1.cadet_id, c1.name, c1.army_id, cb.battle_id, cb.hits_taken, c2.name, c2.army_id
- Decide which field(s) to ORDER BY.
ORDER BY c1.army_id
- Write out the whole query in
query2.sql
and add a LIMIT of 20:
```
SELECT c1.cadet_id, c1.name, c1.army_id, cb.battle_id, cb.hits_taken, c2.name, c2.army_id
FROM cadet AS c1 JOIN cadet_battle AS cb ON cb.cadet_id = c1.cadet_id LEFT JOIN cadet AS c2 ON c2.cadet_id = cb.frozen_by_id
WHERE c1.name IS NOT NULL
ORDER BY c1.army_id
LIMIT 20;
```
- Open the join2.db and load
query2.sql
then review the results.$ sqlite3 join2.db sqlite> .read query2.sql
SELECT * FROM {table_A} LEFT OUTER JOIN {table_B} ON {condition} WHERE ...;
- Join these tables based on the condition given after
ON
. - A Left Join must use the
ON
orUSING
syntax. - The “Left” table includes rows that may not exist in the “Right” table.
- Order of tables matters.
- LEFT JOIN and LEFT OUTER JOIN are synonymous in SQLite.
- The most common type of Outer Join is a Left Outer Join.
- Outer Join - all rows that match A, and and any that correspond in B.
- Inner Join - only rows that match A and B.
- If there are no NULL keys in A, the same records are returned from INNER and OUTER join.
- Decide which table to select FROM, and which table to LEFT JOIN.
- Make sure you place the table with potential NULL columns on the Right.
- Remember to use the ON or USING keywords to specify the conditions for the LEFT JOIN.
- Specify the WHERE clause(s).
- Decide which field(s) to SELECT.
- Decide which field(s) to ORDER BY.
- Add a LIMIT.
Edit query3.sql
and query4.sql
files, and write a query to find the following:
- Show a list of cadets and their awards, and show the name and date of the battle it was awarded for, if any.
- Show a list of cadets who fought in battles from September 4th to 11th, 2032 and the cadet who froze them, if any.
Open the join2.db and load query3.sql
then review the results.
$ sqlite3 join2.db
sqlite> .read query3.sql
Here is a visual diagram of the database tables and some of their fields, seen in join2.db
.
For a fully detailed version, see cadet.sql
in the create
folder inside this example.
- cadet
- cadet_id
- name
- army_id
- [...]
- cadet_battle
- cadet_battle_id
- cadet_id
- battle_id
- frozen_by_id
- [...]
- battle
- battle_id
- fought
- army1_id
- army2_id
- [...]
- army
- army_id
- name
- commander
- [...]
- award
- award_id
- cadet_id
- battle_id
- [...]
In our next tutorial, we will cover the concept of GROUP BY and functions like COUNT() MIN() SUM() and AVG()