9 minute read

Previously on SQL: CRUD, n+1, migrations

Summary: SQL basics for Java devs

Practice your SQL skills. Do not have a feeling that you need to start from the scratch over and over again! In particular, if you’re a beginner, or you do not work with SQL very often (it is not uncommon). SQL problems can be like a Nemesis: I saw senior architects hairs growing grey because of complex SQL issues affecting company performance. They were stammering trying to admit that there is a bug that no one is able to easily resolve.

You can excercise SQL kata at various programming website that I’ve already mentionned earlier.

You are also in position to use any of SQL playgrounds (a.k.a. fiddles), accessible in the web, to run and test some simple query, like db-fiddle It is more like shadow-fighting: you try something, and when it fails, you need to counter your imaginative opponent. For testing more advance queries and their performance, local database, Docker database or remote cloud database would be better along with any of SQL clients, like Workbench or IntelliJ.

Create a basic SQL schema - to have a sample table:

CREATE TABLE test (
    id INT
);
INSERT INTO test (id) VALUES (1);
INSERT INTO test (id) VALUES (2);

Let’s check what this fiddle offers:

SELECT @@version;
-- 5.7.38

and now sample queries to play with - note that this fiddle requires backticks, which weren’t needed in previous examples:

ALTER TABLE `test` add customer_id int;
SELECT * FROM INFORMATION_SCHEMA.TABLES WHERE `TABLE_NAME` = 'test';
SELECT `COLUMN_NAME` FROM `INFORMATION_SCHEMA`.`COLUMNS` WHERE `TABLE_NAME` = 'test';
SELECT 'anything';

Now, remember, that SELECT can evaluate Boole’s algebra expressions, as well as it can execute arithmetic calculations:

SELECT 1 < 0 AS boolean_value;
SELECT IF(2 + 2 = 4, 'TRUE', 'FALSE') AS two_plus_two_is_four;

The SQL playground I tested did not have problems with more advanced XOR gate example (see first part of this SQL series):

SET @false_xor := 'gate returns false';
SET @true_xor := 'gate returns true';
SELECT
    IF(0 XOR 0, @true_xor, @false_xor) AS '0 XOR 0',
    IF(0 XOR 1, @true_xor, @false_xor) AS '0 XOR 1',
    IF(1 XOR 0, @true_xor, @false_xor) AS '1 XOR 0',
    IF(1 XOR 1, @true_xor, @false_xor) AS '1 XOR 1'

Basic SELECT can be wrapped into null check clause: if value of the field is null, it will be filled with given substitute:

SELECT IFNULL(customer_id, 'it is null, though!') AS 'null checked customer_id' FROM test WHERE id = 1
---

**Query #1**

    SELECT IFNULL(customer_id, 'it is null, though!') AS 'null checked customer_id' FROM test WHERE id = 1;

| null checked customer_id |
| ------------------------ |
| it is null, though!      |

---

Let’s extend the table with some text column:

-- create new column:
ALTER TABLE `test` add country varchar(3);
-- and then add something there:
INSERT INTO test (id, customer_id, country) VALUES (7, null, 'FIN');
INSERT INTO test (id, customer_id, country) VALUES (8, null, 'NOR');

or update existing records with new values:

UPDATE test SET country = 'FIN' WHERE id = 1;
UPDATE test SET country = 'SWE' WHERE id = 2;

To check equality, use equal sign:

SELECT * FROM test WHERE country = 'FIN'

To make loose comparision, use the percent sign as a wildcard on a given side of look up expression:

SELECT * FROM test WHERE country LIKE '%N%';
SELECT * FROM test WHERE country LIKE '%N';
SELECT * FROM test WHERE country LIKE 'N%';

One wildcard replaces one or more characters. The underscore replaces one character:

SELECT * FROM test WHERE country LIKE '__N'

The last but not least, the coalesce keyword returns first non-null value of these listed in parentheses:

SELECT COALESCE(country, "Unknown") FROM test;

It is used to replace null value with a substitute:

FIN
Unknown
FIN
NOR

Aggregations

Let’s create new schema with two tables to test aggregations:

CREATE TABLE country
(
    id   INT,
    code VARCHAR(3)
);
INSERT INTO country (id, code)
VALUES (1, 'SWE'),
       (2, 'FIN'),
       (3, 'NOR'),
       (4, 'ISL'),
       (5, 'DNK');


CREATE TABLE player
(
    id    INT,
    name  VARCHAR(5),
    city  VARCHAR(10),
    games INT
);

INSERT INTO player (id, name, city, games)
VALUES (1, 'Swen', 'Kiruna', 10),
       (2, 'Antti', 'Kotka', 11),
       (3, 'Marit', 'Bergen', 13),
       (4, 'Katja', 'Keflavik', 4),
       (5, 'Karin', 'Odense', 22);

Here are all aggregation commands:

SELECT COUNT(*) AS count_all_records,
MAX(games), 
MIN(games), 
AVG(games)
FROM player

Counting occurences of each name (and grouping by the same name):

SELECT name, COUNT(*) AS occurences
FROM player
GROUP BY (name)

Counting occurences of names with ‘K’:

SELECT name, COUNT(name) AS occurences
FROM player
WHERE name LIKE '%K%'
GROUP BY (name)

Group by played games and count how many players achieved this number:

SELECT games, COUNT(games) AS players_with_this_qty_of_games
FROM player
GROUP BY (games)
HAVING count(games) < 13

Remember: WHERE is used before GROUP BY, HAVING after GROUP BY.

Both can be used in the same query!

Joining

UNION joins results of one query with results of another query from the same table, but omits duplicated rows (which matches both queries). You can UNION different tables, but number of colums in one table must equal numer of columns in the other table.

SELECT * FROM player WHERE games > 10
UNION
SELECT * FROM player WHERE name LIKE '%K%'

UNION ALL allows duplicates, so Karin would be listed twice.

Let’s change id column names to precisely indicate what id they are referring to.

CREATE TABLE country (
                         country_id INT,
                         code VARCHAR(3)
);
INSERT INTO country (country_id, code) VALUES (1, 'SWE'),
(2, 'FIN'), (3, 'NOR'), (4, 'ISL'), (5, 'DNK');


CREATE TABLE player (
                        player_id INT,
                        name VARCHAR(5),
                        city VARCHAR(10),
                        games INT
);
INSERT INTO player (player_id, name, city, games) VALUES (1, 'Swen', 'Kiruna', 10),
(2, 'Antti', 'Kotka', 11),(3, 'Marit', 'Bergen', 13),(4, 'Katja', 'Keflavik', 4),(5, 'Karin', 'Odense', 22);

Let’s update schemas by adding primary and foreign keys, required in joining operations.

ALTER TABLE player ADD country_id INT;
-- this does the trick: by accident, country_id should be the same as player id, so let's take advantage of that
UPDATE player SET country_id = player_id;

INNER JOIN joins table using primary and foreign keys. Inner join syntax looks like this:

SELECT * FROM
    customer
INNER JOIN
    field f ON customer.customer_id = f.customer_id

Apply it to our schema:

SELECT * FROM player INNER JOIN country ON player.country_id = country.id

Alternative syntax with USING. Foreign key column name: country_id in Player table matches country_id in Country table. It is required when using USING keyword, so that SQL knows how to connect tables via columns.

-- USING
SELECT * FROM
    company
        INNER JOIN
    customer USING(customer_id);

With Player and Country tables:

SELECT * FROM player INNER JOIN country USING (country_id)

INNER JOIN can be applied to more than two tables. You can also join using third, “helper” table.

CROSS JOIN makes Cartesian product if no WHERE is specified (each row x each row). With WHERE, it joins:

SELECT * FROM player CROSS JOIN country WHERE country.country_id = player_id

LEFT JOIN, RIGHT JOIN, OUTER JOIN

What is the difference between them?

ChatGPT offers concise summary:

LEFT JOIN

  • Also known as a LEFT OUTER JOIN.
  • Returns all the rows from the left table (the table mentioned before the LEFT JOIN clause) and the matching rows from the right table (the table mentioned after the LEFT JOIN clause).
  • If there are no matching rows in the right table, NULL values are returned for the columns of the right table.
  • This type of join ensures that all rows from the left table are included in the result, with the possibility of additional data from the right table if a match exists.
SELECT employees.name, departments.department_name
FROM employees
LEFT JOIN departments ON employees.department_id = departments.id;

RIGHT JOIN Also known as a RIGHT OUTER JOIN. Returns all the rows from the right table and the matching rows from the left table. If there are no matching rows in the left table, NULL values are returned for the columns of the left table. This join is less commonly used than the LEFT JOIN but has the same purpose, ensuring that all rows from the right table are included in the result.

SELECT employees.name, departments.department_name
FROM employees
RIGHT JOIN departments ON employees.department_id = departments.id;

FULL OUTER JOIN (OUTER JOIN): A FULL OUTER JOIN combines the result sets of both the left and right tables. It returns all the rows from both tables and matches rows where the join condition is met. If there are no matches in either table, NULL values are returned for the columns from the table without a match. The result includes all rows from both tables, ensuring that no data is excluded.

SELECT employees.name, departments.department_name
FROM employees
FULL OUTER JOIN departments ON employees.department_id = departments.id;

It’s important to note that not all database systems support RIGHT JOIN and FULL OUTER JOIN directly, and you may need to use alternative methods to achieve the same results in those cases, such as swapping the order of tables or using UNION clauses.

See previous article on JOINs: union vs join, left join, right join, inner vs outter join

Update: 15.11.2023 - other questions

Inner join vs outer join: what’s the difference?

An inner join returns only the rows from both tables that satisfy the specified join condition (can be joined by indicated field). Rows that do not have matching values in the joined columns are excluded from the result set.

An outer join returns all the rows from one table and the matching rows from the other table, being connected by indicated field. But if there is no match, the result will contain NULL values for columns from the table that does not have a matching row.

How SQL GROUP BY command works?

GROUP BY clause is used to group rows that have the same values in specified columns into summary rows, often for the purpose of applying aggregate functions to each group:

SELECT security_branch, COUNT(user_id) as user_count, MAX(last_login_datetime) as latest_login
FROM cybersecurity_users
GROUP BY security_branch;

with result:

+-------------------+------------+------------------------+
| security_branch   | user_count | latest_login  |
+-------------------+------------+------------------------+
| Threat Analysis   | 25         | 2023-11-15T08:30:00Z   |
| Incident Response | 18         | 2023-09-28T15:45:00Z   |
| Penetration Testing | 12       | 2023-07-05T12:10:00Z   |
| Security Operations | 30      | 2023-08-10T18:22:30Z   |
| Compliance        | 15         | 2023-09-02T09:55:45Z   |
+-------------------+------------+------------------------+

What is ORM?

ORM stands for Object-Relational Mapping. It is a programming paradigm that allows you to interact with a relational database using an object-oriented programming. ORM consists on mirroring logical entries (entities) from database tables to entites written in programing language on the application side.

Key features:

  • Mapping: ORM systems map database tables to classes, with each row in a table corresponds to an instance of a class, and each column corresponds to an attribute or property of that class.
  • Data abstraction: ORM abstracts away the details of database interactions, you deal with the objects / classes, not with the SQL queries.
  • CRUD: ORM systems provide methods and APIs for performing CRUD (Create, Read, Update, Delete) operations on database entities.
  • Relationships: ORM systems handle relationships between entities, such as one-to-one, one-to-many, and many-to-many relationships.
  • Portability: ORM systems often provide a level of database portability, allowing developers to switch between different database management systems (e.g., MySQL, PostgreSQL, Oracle) with minimal code changes. The ORM system abstracts the differences in SQL syntax and handles them internally.
  • Performance optimization: ORM systems may include features for optimizing database access, such as lazy loading (loading data on demand), caching, and query optimization.