6 minute read

Previously on SQL: aggregations, group by, where vs having

Today let’s talk about joining results of different searches.

Union

UNION merges multiple queries as one result. Here we are selecting exemplary, non-existing records and their aliases:

-- UNION merges multiple queries as one result
SELECT
    1 AS id, 'Sunrise Ltd.' AS name
UNION
SELECT
    2 AS id, 'Sunset Co.' AS name;

UNION acts like union operator known from set theory, algebra of sets and Boolean algebra.

Only distinct rows are included. There should be a difference in at least one field:

-- only distinct rows are included: prints only one record
SELECT
    1 AS id, 'Sunrise Ltd.' AS name
UNION
SELECT
    1 AS id, 'Sunrise Ltd.' AS name;

-- selects both records:
SELECT
    1 AS id, 'Sunrise Ltd.' AS name
UNION
SELECT
    1 AS id, 'Sunset Ltd.' AS name;

UNION ALL allows duplicated results:

-- UNION ALL allows duplicated rows
SELECT
    * FROM company
UNION ALL
SELECT
    * FROM company;

Now test it on real data - records matching first condition (sql WHERE hq_country = 'JPN') are not re-selected by second part of query (sql SELECT * FROM company):

SELECT * FROM company WHERE hq_country = 'JPN'
UNION
SELECT * FROM company

This works like simple sql SELECT * FROM company - it does not duplicate the results:

SELECT * FROM company
UNION
SELECT * FROM company

Finally, a clean and logical example of unioning two selects. It takes everything from first set and add everything from the second one:

SELECT * FROM company WHERE hq_country = 'JPN'
UNION
SELECT * FROM company WHERE hq_country = 'KOR'

Of course, it is possible to union results from different tables.

-- UNION from different tables is possible but the result set must have same number of columns
-- error:
SELECT
    * FROM company
UNION ALL
SELECT
    * FROM customer;

It does not work, returning [21000][1222] The used SELECT statements have a different number of columns.

Let’s correct it, adjusting requested number of columns:

-- works:
SELECT
    name AS company_or_customer_name, customer_id as id FROM company
UNION ALL
SELECT
    CONCAT(last_name, ' ', first_name), customer_id FROM customer;

The columns selected in both SELECT clauses should be of the same type in some flavours (Postgres, Oracle). No such requirement in MySQL & MariaDB.

Inner join

INNER JOIN connects records from two (or even more) tables.

To match a record from one table to relevant record from another table, it uses fields (columns) marked as keys: primary key and foreign key, so that primary key from a record in one table points to the foreign key of the relevant record in connected table.

Usually, id values are used as primary and foreign keys.

-- INNER JOIN returns records with matching values in both tables (here: customer_id)
SELECT * FROM
    customer
INNER JOIN
    field f ON customer.customer_id = f.customer_id
WHERE
    f.field_name = 'Engineer';

Step-by-step explanation of the script:

-- take all records from ``customer`` table
SELECT * FROM customer
-- connect to records from ``field`` table
INNER JOIN field f
-- but only when ``customer_id`` in ``customer`` table (for given record) matches ``customer_id`` in ``field`` table
ON customer.customer_id = f.customer_id
-- Additional condition: do it only if ``field_name`` in ``field`` table is ``Engineer`` (and discard all the rest).
WHERE f.field_name = 'Engineer';

Primary to foreign key connection:

customer.customer_id -- primary key in ``customer`` table
= 
f.customer_id -- foreign key in ``field`` table

Primary key - unique field or combination of fields, only one row with the same PK may exist in a table.

Foreign key - field or combination of fields, indicates Primary key of a row in another table. May be unique or not.

INNER JOIN works only for the records having not null primary key. It is logical. Without primary key, there is no way to connect a record with another table (foreign keys point to non-null primary keys).

-- INNER JOIN shows only the records from company that have customer id NOT NULL
-- use OUTER JOINS: LEFT / RIGTH JOIN etc. if you expect null fields to be included
SELECT * FROM
    company
INNER JOIN
    customer c ON company.customer_id = c.customer_id;

Using

Instead of explicitly connecting primary key to foreign key, we can indicate it via USING:

-- USING
SELECT * FROM
    company
        INNER JOIN
    customer USING(customer_id);

Inner join on more than two tables

Inner join can connect records from more than two tables, provided that they contain relevant ids (foreign keys). It is useful when multiple conditions using information from various tables are required.

-- INNER JOIN with two more tables, both containing customer_id
SELECT  * FROM
    customer
        INNER JOIN
    field f on customer.customer_id = f.customer_id
        INNER JOIN
    turnover t on customer.customer_id = t.customer_id
WHERE
    customer.registration_date > '2000-01-01'
   OR (
            customer.birth_date < '1980-01-01'
        AND
            t.turnover < 10000
    )
   OR (
            customer.birth_date < '1960-01-01'
        AND
            f.field_name NOT LIKE  '%Engineer%'
    );

Another example: joining using third, helper table:

-- JOIN through third table
SELECT *
FROM customer
         INNER JOIN
     branch_customers
     ON
         customer.customer_id = branch_customers.customer_id
         INNER JOIN
     branch
     ON
         branch_customers.branch_id = branch.branch_id

Use case - find customer number per branch:

-- how many CUSTOMERS per BRANCH ?
SELECT branch_name, COUNT(*) AS number_of_customers
FROM customer
         INNER JOIN
     branch_customers
     ON customer.customer_id = branch_customers.customer_id
         INNER JOIN
     branch
     ON branch_customers.branch_id = branch.branch_id
GROUP BY branch_name

Cross join

Cross join joins each row of the first table with each row of the second table. This join type is also known as Cartesian join.

-- CROSS JOIN joins all rows from one table with all rows of second table
-- on given condition
-- without condition it makes Cartesian product
SELECT * FROM
    customer
        CROSS JOIN
    company
WHERE customer.customer_id = company.customer_id

Left, right and outer join

LEFT JOIN shows everything from left table.

-- LEFT JOIN shows all rows from left table (company) - even the records that cannot be joined
-- with customer table records due to NULL customer_id in company table
SELECT * FROM
    company
LEFT JOIN
    customer c ON company.customer_id = c.customer_id

RIGHT JOIN takes every record from right table.

-- on the other hand, RIGHT JOIN shows all rows from right table (customer)
-- - even the records that cannot be joined with company table
-- due to missing customer_id in company table
SELECT * FROM
    company
RIGHT JOIN
    customer c ON company.customer_id = c.customer_id

OUTER JOIN lists all records from left and right, even if they have null as their id (so that they cannot be normally joined).

-- FULL OUTER JOIN lists all rows from both tables
-- no matter if NULL
-- FULL OUTER JOIN is not supported in MySql
-- workaround: https://www.xaprb.com/blog/2006/05/26/how-to-write-full-outer-join-in-mysql/
SELECT * FROM
    company
        LEFT OUTER JOIN
    customer c ON company.customer_id = c.customer_id
UNION
SELECT * FROM
    company
        RIGHT OUTER JOIN
    customer c ON company.customer_id = c.customer_id
ORDER BY company_id DESC

Some other workaround of FULL OUTER JOIN:

-- workaround of FULL OUTER JOIN without using LEFT / RIGHT JOIN
SELECT * FROM
    company
        INNER JOIN
    customer
    ON
            company.customer_id = customer.customer_id
UNION
SELECT *, NULL, NULL, NULL, NULL, NULL, NULL FROM
    company
WHERE
    NOT
            company.customer_id
            IN
            (
                SELECT DISTINCT
                    company.customer_id
                FROM
                    company
                        INNER JOIN
                    customer
                    ON
                            company.customer_id = customer.customer_id
            )
   OR
    company.customer_id IS NULL
UNION
SELECT
       NULL, NULL, NULL, NULL, NULL,
       customer.customer_id, customer.birth_date, customer.first_name,
       customer.last_name, customer.gender, customer.registration_date
FROM customer
ORDER BY company_id DESC
    LIMIT 10;

UNION workarounds for JOIN

-- UNION workaround instead of OUTER JOIN (without LEFT / RIGHT JOIN)
-- customer_id must be not null
SELECT name, company.customer_id FROM
    company
        INNER JOIN
    customer
        ON
    company.customer_id = customer.customer_id
UNION
SELECT name, company.customer_id FROM
    company
WHERE
    NOT
        company.customer_id
    IN
        (
        SELECT DISTINCT
            company.customer_id
        FROM
            company
                INNER JOIN
            customer
                ON
            company.customer_id = customer.customer_id
    )
-- above workaround with all columns from both tables included
-- and rows with null customer_id
SELECT * FROM -- returns columns from company and customer
    company
        INNER JOIN
    customer
    ON
            company.customer_id = customer.customer_id
UNION
SELECT *, NULL, NULL, NULL, NULL, NULL, NULL FROM
-- returns colums from company only (no join), hence null to replace missing columns from customer
    company
WHERE
    NOT
            company.customer_id
            IN
            (
                SELECT DISTINCT
                    company.customer_id
                FROM
                    company
                        INNER JOIN
                    customer
                    ON
                            company.customer_id = customer.customer_id
            )
    OR
        company.customer_id IS NULL;

TBC