Database Joins: A Quick Primer
The following is a simple introduction to various ways of joining database tables. The example code should work identically in MySQL and PostgreSQL unless otherwise noted.
Set Up
First let's lay foundation for the data we'll be joining. We'll use a simple company <> people pattern.
We have a table of three companies.
SELECT * FROM companies;
company_id | company
------------+---------
1 | iMarc
2 | Apple
10 | Startup
We also have a table of people. These people, if employed, can be associated with one of the companies.
SELECT * FROM people;
person_id | company_id | name
-----------+------------+----------------
1 | 1 | Dave Tufts
3 | 2 | Steve Jobs
4 | 2 | The Woz
20 | | Bob Unemployed
- Three of the people are employed at a company.
- Bob Unemployed isn't associated with any company.
- One of the companies listed above, Startup, doesn't have any employees.
I've also included the SQL I used to create and populate these tables below. Feel free to try these joins in your own database.
The Joins
Now we'll look at various ways of pulling back the data.
The Straight Join (or Plain Join, or Regular Join)
This only returns data that can be referenced in both tables.
SELECT *
FROM people, companies
WHERE people.company_id = companies.company_id;
-- ===================================================
person_id | company_id | name | company_id | company
-----------+------------+------------+------------+---------
1 | 1 | Dave Tufts | 1 | iMarc
3 | 2 | Steve Jobs | 2 | Apple
4 | 2 | The Woz | 2 | Apple
The Left Join
Use a left join if you want data that can be referenced in the first table of your select statement
regardless if there is any associated data in the second table. Reading left-to-right,
if we reference people first, a LEFT JOIN will return all people.
If the person is associated with a company, the company info will be returned. If they are
not associated with a company, NULL values will be returned.
SELECT *
FROM people
LEFT JOIN companies
USING (company_id);
-- ===================================================
company_id | person_id | name | company
------------+-----------+----------------+---------
1 | 1 | Dave Tufts | iMarc
2 | 3 | Steve Jobs | Apple
2 | 4 | The Woz | Apple
| 20 | Bob Unemployed |
Both PostgreSQL and MySQL treat a simple LEFT OUTER JOIN identically to a LEFT JOIN. In the example above, I could have added the word 'OUTER' and got back the same results
The Right Join
Use a right join if you want data that can be referenced in the second table of your select regardless if there is any associated data in the first table.
SELECT *
FROM people
RIGHT JOIN companies
USING (company_id);
-- ===================================================
company_id | person_id | name | company
------------+-------------+------------+---------
1 | 1 | Dave Tufts | iMarc
2 | 3 | Steve Jobs | Apple
2 | 4 | The Woz | Apple
0 | | | Startup
Like the left join, MySQL and PostgreSQL would return the identical result if I used "RIGHT OUTER JOIN" above.
Full Outer Join (PostgreSQL-only)
The full outer join returns a single row for all full data sets from both tables. As far as I know, this syntax only works in real databases. However, MySQL has many storage engines, maybe one of them supports the FULL OUTER JOIN.
SELECT *
FROM companies
FULL OUTER JOIN people
ON people.company_id=companies.company_id;
-- ===================================================
company_id | company | person_id | company_id | name
------------+---------+-----------+------------+----------------
1 | iMarc | 1 | 1 | Dave Tufts
2 | Apple | 3 | 2 | Steve Jobs
2 | Apple | 4 | 2 | The Woz
| | 20 | | Bob Unemployed
10 | Startup | | |
Another fun* thing you can do with the FULL OUTER JOIN is only return data that is incomplete — data that's in one table but not the other. Basically the opposite of a straight join.
SELECT *
FROM companies
FULL OUTER JOIN people
ON people.company_id=companies.company_id
WHERE companies.company_id IS NULL
OR people.company_id IS NULL;
-- ===================================================
company_id | company | person_id | company_id | name
------------+---------+-----------+------------+----------------
| | 20 | | Bob Unemployed
10 | Startup | | |
* By fun, I don't really mean fun. There's nothing fun about a full outer join.
The Dreaded Cartesian Join
I can't think of any reason why you'd would to this, but a cartesian join returns every possibility of data relation between the two tables. If you have tables with millions of rows, you should try this now!
SELECT * FROM people, companies;
-- ===================================================
person_id | company_id | name | company_id | company
-----------+------------+----------------+------------+---------
1 | 1 | Dave Tufts | 1 | iMarc
3 | 2 | Steve Jobs | 1 | iMarc
4 | 2 | The Woz | 1 | iMarc
20 | | Bob Unemployed | 1 | iMarc
1 | 1 | Dave Tufts | 2 | Apple
3 | 2 | Steve Jobs | 2 | Apple
4 | 2 | The Woz | 2 | Apple
20 | | Bob Unemployed | 2 | Apple
1 | 1 | Dave Tufts | 10 | Startup
3 | 2 | Steve Jobs | 10 | Startup
4 | 2 | The Woz | 10 | Startup
20 | | Bob Unemployed | 10 | Startup
That's all. I hope that made sense. If you want to try these SELECT and JOIN statements on your own database, here's the SQL I used to create and populate the tables.
CREATE TABLE companies (
company_id serial PRIMARY KEY,
company varchar(32) NOT NULL
);
INSERT INTO companies (company_id, company) VALUES (1, 'iMarc');
INSERT INTO companies (company_id, company) VALUES (2, 'Apple');
INSERT INTO companies (company_id, company) VALUES (10, 'Startup');
CREATE TABLE people (
person_id serial PRIMARY KEY,
company_id integer DEFAULT NULL,
name varchar(32) NOT NULL
);
INSERT INTO people (person_id, company_id, name)
VALUES (1, 1, 'Dave Tufts');
INSERT INTO people (person_id, company_id, name)
VALUES (3, 2, 'Steve Jobs');
INSERT INTO people (person_id, company_id, name)
VALUES (4, 2, 'The Woz');
INSERT INTO people (person_id, company_id, name)
VALUES (20, NULL, 'Bob Unemployed');
Comments
Of course you know a real employment database would have a separate table to identify the relations between person and company so that a given person could have multiple jobs across multiple companies ;-)
@errol: beware feature creep and "that's easy" assumptions... http://blog.bitquabit.com/2009/07/01/one-which-i-call-out-hacker-news/
:-)
Good stuff. If you'd like the Oracle equivalent let me know.
@Nick - Sure. I think if you load the same data in Oracle, you should get back the same results back. I think Oracle also supports the same FULL OUTER JOIN syntax as PostgreSQL. But yes, I'd love to see the results from Oracle, just to confirm they're the same. Feel free to email me [my-first-name] @ [our-domain] - thanks
Another good mention is the NATURAL JOIN, which automatically joins on column names shared between the two tables. This syntax is supported by most of the big names except for SQL Server.
The , operator in your first example is equivalent to what is known as a CROSS JOIN, which is also called a cartesian join, although that term is never used in the SQL language. Since you have a WHERE clause you are in essentially getting a JOIN or INNER JOIN where the ON clause is the same as your WHERE clause.
To further muddy the waters, PostgreSQL (and I believe SQLite, Oracle and SQL Server) properly considers JOIN an INNER JOIN and requires an ON clause. MySQL, liking to play fast and loose, will treat JOIN as a CROSS JOIN if no ON clause is included.
SQLite does not support RIGHT or FULL JOINs, however a RIGHT JOIN is the same as a LEFT JOIN with the tables switched.
One little interesting parts of USING conditions is that they remove one copy of each column listed from any SELECT * queries. If you are using SQL Server, unfortunately it does not support USING conditions.
Unfortunately there are tons of little inconsistencies between the big RDBMSs. I have documented a number of them at the Flourish SQL page.
@robert
You have a good point, though I am generally of the opinion that writing a flexible setup once and reusing it is better than writing a pared down each time its needed.
Still, my original comment was more about being a smart-ass then design philosophy ;-)
Good stuff, Dave. For those who are interested in doing a full outer join in a dbms not supporting it (sql server 2005 and up do actually support it), keep in mind that a full outer join is simply the union of a left join with a right join. In place of Dave's first example we could write:
SELECT * FROM people LEFT JOIN companies ON people.company_id = companies.company_id UNION SELECT * FROM people RIGHT JOIN companies ON people.company_id = companies.company_id
...and get the same results. Likewise, the second example could be written:
SELECT * FROM people LEFT JOIN companies ON people.company_id = companies.company_id UNION SELECT * FROM people RIGHT JOIN companies ON people.company_id = companies.company_id EXCEPT SELECT * FROM people, companies WHERE people.company_id = companies.company_id
...although of course neither of these is as concise as Dave's examples employing the full outer join keyword. Then again, every kind of join keyword is really just short-hand; the same results can always be had using some combination of subqueries and set operators (union, except ["minus" with some vendors], intersect).
Read something more recent.
Statements and opinions expressed in this blog and any comments made are the private opinions of the respective poster, and, as such, iMarc LLC is neither responsible nor liable for such content.
Visitors
Thanks for this, now I have a bookmarked page to send people to when they ask me these sorts of questions ;-)