<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.9.5">Jekyll</generator><link href="https://mzacki.github.io/feed.xml" rel="self" type="application/atom+xml" /><link href="https://mzacki.github.io/" rel="alternate" type="text/html" /><updated>2024-04-11T17:35:50+00:00</updated><id>https://mzacki.github.io/feed.xml</id><title type="html">Enter the Bee</title><subtitle>Engineer's logbook</subtitle><author><name>Mateusz Zacki</name><email>zacki[dot]mateusz[at]gmail[dot]com</email></author><entry><title type="html">Schema-on-read vs schema-on-write</title><link href="https://mzacki.github.io/schemas/" rel="alternate" type="text/html" title="Schema-on-read vs schema-on-write" /><published>2024-04-10T21:37:00+00:00</published><updated>2024-04-10T21:37:00+00:00</updated><id>https://mzacki.github.io/schemas</id><content type="html" xml:base="https://mzacki.github.io/schemas/"><![CDATA[<ul>
  <li>what is the <strong>schema versus no schema</strong> opposition?</li>
  <li>is there always some schema in the database?</li>
  <li>how does an implicit schema differ from an explicit schema?</li>
</ul>

<h3 id="schema-less--schema-on-read">Schema-less / schema-on-read</h3>

<p>The term <strong>schema-less</strong> (or schema-on-read) refers to a data processing approach where the structure or schema of the data is interpreted or applied
only when the data is read from storage, rather than when it is initially loaded into or written to the persistence layer.</p>

<p>In this approach, the data is stored in its raw or semi-structured form.</p>

<p>Now is a good time to ask the question:</p>

<blockquote>
  <p>Is there always a schema in the database?</p>
</blockquote>

<p>Even in schema-less databases, there are some expectations about the data structure. 
One may expect key-value pairs or document structures, which form a kind of <strong>implicit schema</strong>.
However, this schema is not enforced at the database level: the database system itself does not impose strict rules or requirements on the structure of the data being stored.
Each document can therefore have different fields and structures, which allows for a more dynamic and agile development process.
Documents, fields and entities can be created, updated, transformed and removed during the development of an application.</p>

<p>But there is a drawback: changing the structure of data may lead to inconsistency between the software layer and the persistence layer,
for example, if a field has been renamed or its type changed and the same database holds both new and old documents.
Depending on the implementation, some operations that rely on such a field, like sorting, may fail.</p>
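
<p>A minimal sketch of this situation in Java, with plain maps standing in for documents. The class <code class="language-plaintext highlighter-rouge">SchemaOnReadDemo</code> and the field names <code class="language-plaintext highlighter-rouge">name</code> and <code class="language-plaintext highlighter-rouge">fullName</code> are hypothetical, and no particular database driver is assumed. The point is only that the reading code, not the storage, has to cope with both generations of documents:</p>

<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import java.util.List;
import java.util.Map;

public class SchemaOnReadDemo {

    // The reader decides how to interpret a document: newer documents use
    // "fullName", older ones still use "name" (both field names are hypothetical).
    static String readName(Map&lt;String, Object&gt; document) {
        Object value = document.containsKey("fullName")
                ? document.get("fullName")
                : document.get("name");
        return value == null ? "(unknown)" : value.toString();
    }

    public static void main(String[] args) {
        // Two generations of documents living in the same collection.
        List&lt;Map&lt;String, Object&gt;&gt; documents = List.of(
                Map.of("name", "Ada"),
                Map.of("fullName", "Grace Hopper", "age", 85));

        for (Map&lt;String, Object&gt; document : documents) {
            System.out.println(readName(document));
        }
    }
}
</code></pre></div></div>

<p>Code that reads only one of the two fields would silently treat half of the documents as incomplete, which is the kind of inconsistency described above.</p>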

<p>The <strong>schema-on-read</strong> approach gives flexibility during development (no need for the traditional data migrations known from relational databases).
It also makes scaling easier, which can make maintenance more cost- and resource-efficient. This matters significantly in the context of
contemporary distributed systems, orchestration (Kubernetes) and cloud-native application development. Another quality is adaptability to changing business requirements.</p>

<p>The opposite of schema-less / schema-on-read is the <strong>schema-on-write</strong> concept.</p>

<h3 id="schema-enforced--schema-on-write">Schema-enforced / schema-on-write</h3>

<p>Before data is inserted into the database, it must conform to a predefined schema; otherwise the write is rejected with an error.
In traditional relational databases, <strong>you must define a schema upfront</strong>: specifying the tables, columns, and their data types before inserting any data.
Any attempt to insert data that doesn’t conform to this schema will result in an error.</p>
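
<p>The idea can be sketched in a few lines of Java. This is only an in-memory illustration under stated assumptions, not how a real RDBMS is implemented: the class <code class="language-plaintext highlighter-rouge">SchemaOnWriteDemo</code>, the two-column schema and the <code class="language-plaintext highlighter-rouge">insert</code> method are all hypothetical.</p>

<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class SchemaOnWriteDemo {

    // Hypothetical schema defined upfront: column name mapped to the required type.
    static final Map&lt;String, Class&lt;?&gt;&gt; SCHEMA = Map.of(
            "id", Integer.class,
            "name", String.class);

    static final List&lt;Map&lt;String, Object&gt;&gt; TABLE = new ArrayList&lt;&gt;();

    // The row is checked before it is stored; a mismatch is rejected with an error.
    static void insert(Map&lt;String, Object&gt; row) {
        if (!row.keySet().equals(SCHEMA.keySet())) {
            throw new IllegalArgumentException("Unexpected or missing columns: " + row.keySet());
        }
        for (Map.Entry&lt;String, Class&lt;?&gt;&gt; column : SCHEMA.entrySet()) {
            Object value = row.get(column.getKey());
            if (!column.getValue().isInstance(value)) {
                throw new IllegalArgumentException("Wrong type for column " + column.getKey());
            }
        }
        TABLE.add(row);
    }

    public static void main(String[] args) {
        insert(Map.of("id", 1, "name", "Ada")); // conforms to the schema
        try {
            insert(Map.of("id", "one", "name", "Ada")); // "id" is not an Integer
        } catch (IllegalArgumentException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
        System.out.println("Rows stored: " + TABLE.size());
    }
}
</code></pre></div></div>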

<p>Schema-less systems can be very fast on writes, because no structural check happens at that point. They make data easy to store and easy to query, often favouring availability over consistency and integrity.</p>

<p>On the other hand, the schema-on-write approach means that the <strong>database system must verify every query</strong>, to check whether it matches the required data structure, and so on.
The verification process is not trivial and it adds some time overhead.
Then, after the initial checks, the database system optimizes the query. 
Why? After all, it should not be too slow! In this way, a traditional relational DBMS (SQL-based) can make up for its speed disadvantage compared to NoSQL solutions.</p>

<p>Typically, the schema-on-write approach is associated with relational databases, e.g. RDBMSs using SQL dialects.</p>

<h3 id="schema-on-write-in-details-how-a-sql-statement-is-processed-in-database">Schema-on-write in detail: how is a SQL statement processed in a database?</h3>

<p>In the case of SQL, every statement (in general, every query sent to the database) is subjected to verification and optimization.
But before that happens, it first needs to be parsed by the SQL engine.</p>

<h5 id="parsing">Parsing</h5>

<p>The parser breaks the input statement down into a parse tree (syntax tree).</p>

<p>What is a syntax tree?</p>

<p>An abstract syntax tree (AST) is a data structure used in computer science to represent the structure of a program.
It is a tree representation of the abstract syntactic structure of text (often source code) written in a formal language.
Each node of the tree denotes a construct occurring in the text (variable, assignment, operator).
To put it simply, the syntax tree is a kind of a graph that depicts the structure of a given statement, its elements and relations between them.</p>
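
<p>As a rough illustration, a syntax tree for a simple statement can be modelled with a small recursive type. The sketch below is hand-built and hypothetical: the class <code class="language-plaintext highlighter-rouge">SyntaxTreeDemo</code>, the <code class="language-plaintext highlighter-rouge">Node</code> record and the node labels are assumptions, real SQL parsers produce much richer trees, and the code needs Java 16 or newer because it uses records.</p>

<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import java.util.List;

public class SyntaxTreeDemo {

    // One node of the tree: a label plus its child nodes.
    record Node(String label, List&lt;Node&gt; children) {
        static Node leaf(String label) {
            return new Node(label, List.of());
        }
    }

    // Hand-built tree for: SELECT name FROM users WHERE age &gt; 18
    static Node sampleTree() {
        Node columns = new Node("COLUMNS", List.of(Node.leaf("name")));
        Node from = new Node("FROM", List.of(Node.leaf("users")));
        Node condition = new Node("&gt;", List.of(Node.leaf("age"), Node.leaf("18")));
        Node where = new Node("WHERE", List.of(condition));
        return new Node("SELECT", List.of(columns, from, where));
    }

    // Prints one node per line, indented by its depth in the tree.
    static void print(Node node, String indent) {
        System.out.println(indent + node.label());
        for (Node child : node.children()) {
            print(child, indent + "  ");
        }
    }

    // Counts the nodes in the tree, including the root.
    static int count(Node node) {
        int total = 1;
        for (Node child : node.children()) {
            total += count(child);
        }
        return total;
    }

    public static void main(String[] args) {
        print(sampleTree(), "");
    }
}
</code></pre></div></div>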

<h5 id="validation">Validation</h5>

<p>A statement that has already been parsed is subjected to the validation process. Database tables, columns, and permissions are checked.
First, syntax analysis happens. It checks whether the statement matches the required SQL dialect (and whether it uses SQL at all, and not Tagalog, for instance!).
Then, semantic analysis takes place. It checks whether the statement matches the database structure, regarding the naming of tables and columns.
It also verifies the sender’s permissions to execute the given operation.</p>

<h5 id="optimization">Optimization</h5>

<p>How is a SQL statement optimized?</p>

<p>SQL query optimization happens in the database query processing pipeline. It involves the query optimizer.
The query optimizer’s job is to determine the most efficient query execution plan through various actions:</p>
<ul>
  <li>query rewriting &amp; transformation (e.g. reordering operations, changing subqueries into joins, etc.)</li>
  <li>cost-based optimization</li>
  <li>index selection</li>
  <li>join reordering</li>
  <li>predicate pushdown (moving filters and predicates closer to the data source)</li>
  <li>checking whether parallel execution is possible</li>
  <li>memory management</li>
  <li>checking whether the query plan can be cached</li>
</ul>

<h5 id="execution">Execution</h5>

<p>The final stage is the query execution.
The chosen execution plan is passed to the query executor, which carries out the actual retrieval, modification, or insertion of data.
Data are accessed and the result is generated and returned to the sender.</p>

<h5 id="what-algorithms-are-used-to-calculate-the-most-efficient-query-execution-plan">What algorithms are used to calculate the most efficient query execution plan?</h5>

<p>Specific algorithms used by the query optimizer can vary between different database systems.
Common techniques include:</p>
<ul>
  <li>cost estimation for each candidate plan (in terms of CPU, I/O and memory usage), using mathematical models (so-called <strong>cost models</strong>)</li>
  <li>selection of suitable search algorithms</li>
  <li>use of statistical information about the data</li>
  <li>join ordering heuristics</li>
  <li>and others.</li>
</ul>

<h5 id="are-execution-plans-cached-in-databases">Are execution plans cached in databases?</h5>

<p>Yes, query execution plans may be cached, as this gives better performance.
Many RDBMSs use a cache to store and reuse execution plans for frequently executed SQL queries.
This caching mechanism helps to avoid the cost of repeatedly optimizing and generating execution plans for the same queries.</p>

<h5 id="what-are-steps-for-query-execution-plan-caching">What are the steps of query execution plan caching?</h5>

<ol>
  <li>Query compilation: When a SQL query is first executed, the database system goes through the parsing, optimization, and execution steps to generate an execution plan tailored 
to the current state of the database.</li>
  <li>Query plan storage: After optimization, the generated execution plan is stored in the query plan cache associated with the specific query text or a hash of the query text.</li>
  <li>Cache lookup: during subsequent executions, when the same or a similar query is executed again, the database system checks the query plan cache first.</li>
  <li>Cache hit: if a matching execution plan is found in the cache (a cache hit), the system can directly reuse the stored plan instead of going through the entire optimization 
process again.</li>
  <li>Cache miss: if there is no matching execution plan in the cache (a cache miss), the database system reoptimizes the query and generates a new execution plan, which is then 
stored in the cache for future use.</li>
</ol>
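
<p>The steps above can be sketched as a small lookup table keyed by the query text. This is a simplified illustration, not the mechanism of any specific database: the class <code class="language-plaintext highlighter-rouge">PlanCacheDemo</code> and its methods are hypothetical, and the "plan" is just a string standing in for the optimizer's real output.</p>

<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import java.util.HashMap;
import java.util.Map;

public class PlanCacheDemo {

    // Query text mapped to its stored execution plan.
    private final Map&lt;String, String&gt; cache = new HashMap&lt;&gt;();
    private int optimizerRuns = 0;

    // Stand-in for parsing and optimization; a real optimizer is far more involved.
    private String optimize(String sql) {
        optimizerRuns++;
        return "plan for [" + sql + "]";
    }

    // Cache hit: reuse the stored plan. Cache miss: optimize, then store the plan.
    public String planFor(String sql) {
        String plan = cache.get(sql);
        if (plan == null) {
            plan = optimize(sql);
            cache.put(sql, plan);
        }
        return plan;
    }

    public int optimizerRuns() {
        return optimizerRuns;
    }

    public static void main(String[] args) {
        PlanCacheDemo database = new PlanCacheDemo();
        String sql = "SELECT name FROM users WHERE id = ?";
        database.planFor(sql); // cache miss: the optimizer runs
        database.planFor(sql); // cache hit: the stored plan is reused
        System.out.println("Optimizer runs: " + database.optimizerRuns()); // prints "Optimizer runs: 1"
    }
}
</code></pre></div></div>

<p>Keying the cache by the exact query text is one reason why parameterized queries tend to reuse plans better than queries with literal values concatenated into the text.</p>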

<h3 id="schema-less-vs-schema-enforced-in-practice-nosql-vs-sql">Schema-less vs schema-enforced in practice: NoSQL vs SQL</h3>

<p>As we have seen, schema-enforced solutions, associated with SQL databases, involve a lot of validations, verifications, checks and optimizations before, during and after statement 
execution (see SQL triggers, for instance). This may suggest that they are generally more secure. Certainly, the emphasis here is on data integrity and constraints.
Schema enforcement not only allows for writing more secure applications, especially when dealing with sensitive data, critical sections and crucial operations, but even requires
a serious approach to data consistency and security.</p>

<p>Here, there are multiple security layers:</p>

<p>RDBMS infrastructure (this aspect is present in NoSQL databases as well)</p>
<ul>
  <li>security of physical storage (bare metal, cloud)</li>
  <li>protection of the connection between application and database (e.g. REST API calls)</li>
  <li>Identity and Access Management (IAM): access to the database as admin, user or application</li>
</ul>

<p>RDBMS verification &amp; constraints layer</p>
<ul>
  <li>data consistency &amp; integrity requirements</li>
  <li>data constraints</li>
  <li>authorization check to execute operations / to access databases and tables</li>
  <li>query verification</li>
</ul>

<p>Software layer</p>
<ul>
  <li>code should be consistent with database constraints</li>
  <li>may add an additional authentication &amp; authorization layer, like user roles and authorities</li>
  <li>an ORM framework adds another layer of protection</li>
  <li>prepared statements protect against SQL injection</li>
  <li>custom validation for input, data and database access may be applied</li>
</ul>

<p>SQL databases have highly structured schemas and often use normalization. They put a strong emphasis on ACID, transactions, <strong>data integrity and consistency</strong>.
Their drawback can be lower speed and harder scaling. Here, NoSQL solutions offer much more:</p>

<ul>
  <li>they are based on flexible schemas (document-based, key-value pairs, column-based, graphs), not relational tables</li>
  <li>storage formats: JSON, BSON (binary JSON), key-value pairs, key-document pairs, graphs</li>
  <li>easier horizontal scaling (more nodes or servers)</li>
  <li>they <strong>often use denormalization</strong></li>
  <li>NoSQL often <strong>favors system availability and fault tolerance over consistency / integrity</strong></li>
  <li>query languages vary: JSON-based queries, SQL-like languages, and others</li>
</ul>

<p>There are various types of NoSQL databases:</p>
<ul>
  <li>Document stores (MongoDB). Data is stored as documents (JSON, BSON). Each document is a set of key-value pairs or key-document pairs.</li>
  <li>Key-value stores (Redis).</li>
  <li>Column stores (Apache Cassandra). Data is organized in columns instead of rows. Well-suited for analytics and time-series data.</li>
  <li>Graph databases.</li>
  <li>Object-oriented databases: store data as objects.</li>
  <li>Multi-model databases: different approaches in one database.</li>
  <li>Others: XML, NewSQL, time-series.</li>
</ul>

<p>Schema-less solutions are ideal for scenarios with large amounts of unstructured or semi-structured data, high read/write throughput, and the need for horizontal scalability.
They are common in web applications, content management, real-time big data processing, and distributed systems.</p>

<p>A particular feature for speed and scalability is <strong>sharding</strong>.
Sharding is a database architecture strategy where a large database is partitioned into smaller, more manageable pieces called shards.
Each shard is an independent database server with independent storage. It stores a subset of the overall data.
The goal of sharding is to distribute the data and the associated workload across multiple servers, improving performance, scalability, and resource utilization.</p>

<p>Sharding is used, for example, in Redis, Elasticsearch, MongoDB and Postgres / MySQL (with extensions for sharding).</p>
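
<p>One simple way to decide which shard holds a given record is to hash its key. The sketch below assumes hash-based routing with a fixed number of shards; the class <code class="language-plaintext highlighter-rouge">ShardRouterDemo</code> and the key format are hypothetical, and production systems often use range-based or consistent hashing schemes instead, so that adding a shard does not move most of the keys.</p>

<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code>public class ShardRouterDemo {

    // Maps a key to one of shardCount shards. Math.floorMod keeps the result
    // non-negative even when hashCode() is negative.
    static int shardFor(String key, int shardCount) {
        return Math.floorMod(key.hashCode(), shardCount);
    }

    public static void main(String[] args) {
        int shardCount = 4; // hypothetical number of shards
        for (String userId : new String[] {"user-1", "user-2", "user-3"}) {
            System.out.println(userId + " is stored on shard " + shardFor(userId, shardCount));
        }
    }
}
</code></pre></div></div>

<p>The same key always maps to the same shard, so reads and writes for one record go to one server while the overall workload is spread across all of them.</p>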

<h3 id="nosql---secure-or-not">NoSQL - secure or not?</h3>

<p>NoSQL systems face the same challenges as traditional databases. They have been subject to data breaches as well (see, for example, the 2016 Mexican voter registry leak, where the data was stored in MongoDB). <strong>NoSQL injection attacks are possible too</strong>. Some sources claim that authorization and encryption mechanisms are weaker in NoSQL, but this opinion is disputable and not clearly substantiated.</p>

<p>If a client communicates with a NoSQL database over an unencrypted (plain-text) connection, there is a risk of a man-in-the-middle attack. But the same is true for a relational database if the 
network traffic is not properly secured.</p>

<p>When comparing SQL with NoSQL solutions in terms of security, the only thing that seems to be in favour of the former is the strict schema, which requires a predefined data structure 
and applies internal validation of statements. In addition, it enforces stronger data integrity and consistency, often with explicit data constraints.</p>

<p>However, a strict schema alone does not guarantee security. On the other hand, it is possible that the lack of an enforced schema, given a proper security layer and data validation, could be 
compensated on the application side.</p>]]></content><author><name>Mateusz Zacki</name><email>zacki[dot]mateusz[at]gmail[dot]com</email></author><category term="security" /><category term="SQL" /><category term="NoSQL" /><category term="security, SQL, NoSQL, data protection" /><summary type="html"><![CDATA[Schemas - which of them are more secure?]]></summary></entry><entry><title type="html">Defensive programming in Java</title><link href="https://mzacki.github.io/defensive-programming/" rel="alternate" type="text/html" title="Defensive programming in Java" /><published>2023-11-11T16:53:00+00:00</published><updated>2023-11-11T16:53:00+00:00</updated><id>https://mzacki.github.io/defensive-programming</id><content type="html" xml:base="https://mzacki.github.io/defensive-programming/"><![CDATA[<p>In this article we will discuss the principles of defensive programming in Java.
Defensive programming involves writing code that anticipates and protects against potential errors and unexpected situations 
to ensure the reliability and robustness of software.</p>

<h3 id="ensure-proper-exception-handling">Ensure proper exception handling</h3>

<p>As a reminder: all exceptions (including errors) are subclasses of the <code class="language-plaintext highlighter-rouge">Throwable</code> class.</p>

<p><code class="language-plaintext highlighter-rouge">Throwable</code> objects - commonly called <strong>Exceptions</strong> - can be either <strong>exceptions</strong> or <strong>errors</strong>.</p>

<p>Because of that, <code class="language-plaintext highlighter-rouge">Throwable</code> has two subclasses - <code class="language-plaintext highlighter-rouge">Error</code> and <code class="language-plaintext highlighter-rouge">Exception</code>:</p>

<p><code class="language-plaintext highlighter-rouge">Throwable -&gt; Error</code></p>

<p><code class="language-plaintext highlighter-rouge">Throwable -&gt; Exception</code></p>

<p><strong>What is the difference between error and exception?</strong></p>

<blockquote>
  <p>Errors in Java represent serious, usually unrecoverable problems that occur at runtime.
They are rather related to a sudden change of state of the application and its environment.
They are typically caused by issues outside the control of the program, such as system failures, hardware problems, or severe environmental conditions.</p>
</blockquote>

<p>Errors are <strong>unchecked</strong>: they are not meant to be caught or handled by the application code.</p>

<p>Examples of errors: <code class="language-plaintext highlighter-rouge">OutOfMemoryError</code>, <code class="language-plaintext highlighter-rouge">StackOverflowError</code></p>

<blockquote>
  <p>Exceptions are related to abnormal conditions or unexpected situations during the execution of a program. 
Exceptions are often caused by faulty code or invalid inputs.</p>
</blockquote>

<p>Exceptions are either meant to be caught and handled by the application (when they are “checked”), or they should be avoided by correct program logic, including input validation (when they are “unchecked”).</p>

<p>Proper handling of checked exceptions requires try-catch blocks, error-handling logic (exception handlers, e.g. using the Spring framework) or recovery mechanisms (also possible with Spring).</p>

<p>Examples of unchecked exceptions: infamous <code class="language-plaintext highlighter-rouge">NullPointerException</code>, <code class="language-plaintext highlighter-rouge">ArrayIndexOutOfBoundsException</code>. <code class="language-plaintext highlighter-rouge">&lt;--- should not be caught &amp; handled, should be avoided by correct coding!</code></p>

<p>Examples of checked exceptions: <code class="language-plaintext highlighter-rouge">IOException</code>, <code class="language-plaintext highlighter-rouge">FileNotFoundException</code>, <code class="language-plaintext highlighter-rouge">SQLException</code>. <code class="language-plaintext highlighter-rouge">&lt;--- we should predict them and be prepared (catch &amp; handle)!</code></p>

<p>So exceptions can be <strong>checked</strong> or <strong>unchecked</strong>.
Unchecked exceptions are subclasses of <code class="language-plaintext highlighter-rouge">RuntimeException</code>.
All other exceptions are checked. They descend from the <code class="language-plaintext highlighter-rouge">Exception</code> class, directly or indirectly, without passing through <code class="language-plaintext highlighter-rouge">RuntimeException</code>.</p>

<p><code class="language-plaintext highlighter-rouge">Throwable -&gt; Exception -&gt; RuntimeException -&gt; (unchecked exceptions)</code></p>

<p><code class="language-plaintext highlighter-rouge">Throwable -&gt; Exception -&gt; (checked exceptions)</code></p>

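<p>To sum up, unchecked throwables are <code class="language-plaintext highlighter-rouge">Error</code>, <code class="language-plaintext highlighter-rouge">RuntimeException</code> and their descendants.</p>

<p>The <code class="language-plaintext highlighter-rouge">Exception</code> class and its descendants, other than <code class="language-plaintext highlighter-rouge">RuntimeException</code> and its subclasses, are checked, and they require proper handling. 
When such an exception is thrown, the control flow is transferred to the nearest matching exception handler.
In Java, checked exceptions are tracked by the compiler.</p>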

<p><strong>Exception rethrowing and chaining</strong></p>

<p><strong>Exception rethrowing</strong> is simply throwing a caught exception again from the <code class="language-plaintext highlighter-rouge">catch</code> clause.</p>

<p><strong>Exception chaining</strong> is wrapping a caught exception in a new one (of another class). 
It is also useful if a checked exception occurs in a method that is not allowed to throw a checked exception. 
You can catch the checked exception and chain it to an unchecked one.</p>

<p>Use <code class="language-plaintext highlighter-rouge">initCause()</code> or a constructor that accepts a cause to pass the original exception, and <code class="language-plaintext highlighter-rouge">getCause()</code> to retrieve it later.</p>
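
<p>A minimal sketch of chaining a checked exception to an unchecked one. The class <code class="language-plaintext highlighter-rouge">ChainingDemo</code> and the methods <code class="language-plaintext highlighter-rouge">readConfig</code> and <code class="language-plaintext highlighter-rouge">loadConfig</code> are hypothetical; <code class="language-plaintext highlighter-rouge">UncheckedIOException</code> is the standard JDK wrapper for <code class="language-plaintext highlighter-rouge">IOException</code>.</p>

<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import java.io.IOException;
import java.io.UncheckedIOException;

public class ChainingDemo {

    // Hypothetical method that fails with a checked exception.
    static void readConfig() throws IOException {
        throw new IOException("config file is missing");
    }

    // This method does not declare checked exceptions, so the IOException
    // is chained to an unchecked one through the constructor.
    static void loadConfig() {
        try {
            readConfig();
        } catch (IOException e) {
            throw new UncheckedIOException("Could not load configuration", e);
        }
    }

    public static void main(String[] args) {
        try {
            loadConfig();
        } catch (UncheckedIOException e) {
            // getCause() returns the original exception.
            System.out.println(e.getCause().getMessage()); // prints "config file is missing"
        }
    }
}
</code></pre></div></div>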

<h3 id="exception-handling-and-security">Exception handling and security</h3>

<p>Now it is important to ask what proper exception handling means for application security.</p>

<p>Handling exceptions may seem trivial, but often it is not, and it can lead to unreadable, overcomplicated spaghetti code with nested try-catch clauses.</p>

<p>Not all business scenarios are “happy path”.</p>

<p>When exception handling is simple and robust, failures are handled safely and are less likely to expose vulnerabilities or sensitive information (in error messages or stack traces, for example).
This increases software integrity. Control over the system is enhanced, with greater stability and easier monitoring.
It also makes it possible to prepare the right responses to exceptional situations.</p>

<h3 id="close-the-resources-try-with-resources-autocloseable-vs-closeable">Close the resources: try-with-resources, AutoCloseable vs Closeable</h3>

<p>Exception handling in resource management is another problem.</p>

<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// suppose there is some collection to iterate over:</span>
<span class="kt">var</span> <span class="n">input</span> <span class="o">=</span> <span class="k">new</span> <span class="nc">ArrayList</span><span class="o">&lt;</span><span class="nc">String</span><span class="o">&gt;();</span>
<span class="c1">// a resource comes into play...</span>
<span class="nc">PrintWriter</span> <span class="n">out</span> <span class="o">=</span> <span class="k">new</span> <span class="nc">PrintWriter</span><span class="o">(</span><span class="s">"output.txt"</span><span class="o">);</span>
<span class="k">for</span> <span class="o">(</span><span class="nc">String</span> <span class="n">row</span> <span class="o">:</span> <span class="n">input</span><span class="o">)</span> <span class="o">{</span>
    <span class="n">out</span><span class="o">.</span><span class="na">println</span><span class="o">(</span><span class="n">row</span><span class="o">.</span><span class="na">toLowerCase</span><span class="o">());</span>
<span class="o">}</span>
<span class="c1">// when an exception occurs above, this line is never reached:</span>
<span class="n">out</span><span class="o">.</span><span class="na">close</span><span class="o">();</span>
</code></pre></div></div>

<p>Try-with-resources guarantees that the resources are closed whether or not an exception occurs:</p>

<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">var</span> <span class="n">out</span> <span class="o">=</span> <span class="k">new</span> <span class="nc">PrintWriter</span><span class="o">(</span><span class="s">"output.txt"</span><span class="o">);</span>
<span class="k">try</span> <span class="o">(</span><span class="n">out</span><span class="o">)</span> <span class="o">{</span>
<span class="k">for</span> <span class="o">(</span><span class="nc">String</span> <span class="n">row</span> <span class="o">:</span> <span class="n">input</span><span class="o">)</span>
<span class="n">out</span><span class="o">.</span><span class="na">println</span><span class="o">(</span><span class="n">row</span><span class="o">.</span><span class="na">toLowerCase</span><span class="o">());</span>
<span class="o">}</span> <span class="c1">// out.close() is called implicitly!</span>
</code></pre></div></div>
<p>The <code class="language-plaintext highlighter-rouge">out.close()</code> method is called behind the scenes, because <code class="language-plaintext highlighter-rouge">PrintWriter</code> implements <code class="language-plaintext highlighter-rouge">AutoCloseable</code>.
The resources will be closed when try-with-resources exits, no matter whether an exception has been thrown.
There is also no need to worry about closing resources when the code executes normally. It happens automagically!</p>

<blockquote>
  <p>Closeable extends AutoCloseable, so it can be used with try-with-resources to close the resources automatically.
AutoCloseable: close() throws Exception.
Closeable: close() throws IOException. Older interface, specifically designed for I/O-related classes.</p>
</blockquote>

<p>When there are two resources declared and initialized in try() clause (which is a valid and acceptable case):</p>

<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">try</span> <span class="o">(</span><span class="nc">Scanner</span> <span class="n">in</span> <span class="o">=</span> <span class="k">new</span> <span class="nc">Scanner</span><span class="o">(</span><span class="nc">Paths</span><span class="o">.</span><span class="na">get</span><span class="o">(</span><span class="s">"/usr/share/dict/words"</span><span class="o">));</span>
<span class="nc">PrintWriter</span> <span class="n">out</span> <span class="o">=</span> <span class="k">new</span> <span class="nc">PrintWriter</span><span class="o">(</span><span class="s">"output.txt"</span><span class="o">))</span> <span class="o">{</span>
<span class="k">while</span> <span class="o">(</span><span class="n">in</span><span class="o">.</span><span class="na">hasNext</span><span class="o">())</span>
<span class="n">out</span><span class="o">.</span><span class="na">println</span><span class="o">(</span><span class="n">in</span><span class="o">.</span><span class="na">next</span><span class="o">().</span><span class="na">toLowerCase</span><span class="o">());</span>
<span class="o">}</span>
</code></pre></div></div>
<p>Here:</p>

<ul class="task-list">
  <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled="disabled" checked="checked" />Resources are closed in reverse order of their initialization: <code class="language-plaintext highlighter-rouge">out</code> is closed before <code class="language-plaintext highlighter-rouge">in</code>.</li>
  <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled="disabled" checked="checked" />If the <code class="language-plaintext highlighter-rouge">PrintWriter</code> constructor throws an exception, the try-with-resources mechanism closes <code class="language-plaintext highlighter-rouge">in</code> and propagates that exception.</li>
</ul>

<h3 id="suppression-mechanism">Suppression mechanism</h3>

<p>A very interesting aspect is discussed by Cay S. Horstmann in his Core Java:</p>

<blockquote>
  <p>Some close methods can throw exceptions. If that happens when the try block completed normally, the exception is thrown to the caller. 
However, if another exception had been thrown, causing the close methods of the resources to be called, and one of them throws an exception, 
that exception is likely to be of lesser importance than the original one. In this situation, the original exception gets rethrown, and the exceptions
from calling close are caught and attached as “suppressed” exceptions.</p>
</blockquote>

<p>After catching the first exception, it is possible to get to the suppressed exceptions using <code class="language-plaintext highlighter-rouge">getSuppressed()</code>:</p>

<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">try</span> <span class="o">{</span>
<span class="c1">// something here throws the more important exception</span>
<span class="o">}</span> <span class="k">catch</span> <span class="o">(</span><span class="nc">IOException</span> <span class="n">ex</span><span class="o">)</span> <span class="o">{</span>
<span class="nc">Throwable</span><span class="o">[]</span> <span class="n">secondaryExceptions</span> <span class="o">=</span> <span class="n">ex</span><span class="o">.</span><span class="na">getSuppressed</span><span class="o">();</span>
<span class="c1">// here we can inspect the suppressed exceptions</span>
<span class="o">}</span>
</code></pre></div></div>
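
<p>The mechanism can be observed with a small self-contained sketch. The class <code class="language-plaintext highlighter-rouge">SuppressedDemo</code> and its <code class="language-plaintext highlighter-rouge">FailingResource</code> are hypothetical: the resource exists only to make <code class="language-plaintext highlighter-rouge">close()</code> fail after the body of the try block has already thrown.</p>

<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code>public class SuppressedDemo {

    // Hypothetical resource whose close() always fails.
    static class FailingResource implements AutoCloseable {
        @Override
        public void close() {
            throw new IllegalStateException("close failed");
        }
    }

    static void work() {
        try (FailingResource resource = new FailingResource()) {
            // The primary exception; close() is called while it propagates.
            throw new IllegalArgumentException("primary failure");
        }
    }

    public static void main(String[] args) {
        try {
            work();
        } catch (IllegalArgumentException ex) {
            System.out.println(ex.getMessage()); // prints "primary failure"
            for (Throwable suppressed : ex.getSuppressed()) {
                System.out.println(suppressed.getMessage()); // prints "close failed"
            }
        }
    }
}
</code></pre></div></div>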
<p><strong>Do not throw from finally</strong>.</p>

<p>The suppression mechanism works only for try-with-resources. <strong>Do not throw exceptions in the <code class="language-plaintext highlighter-rouge">finally</code> clause</strong>. 
If the <code class="language-plaintext highlighter-rouge">try</code> block terminates with an exception, that exception is masked by an exception
thrown in the <code class="language-plaintext highlighter-rouge">finally</code> clause.</p>

<p><strong>Do not return from finally</strong>.</p>

<p>If the <code class="language-plaintext highlighter-rouge">try</code> block has a <code class="language-plaintext highlighter-rouge">return</code> statement and there is another <code class="language-plaintext highlighter-rouge">return</code> in <code class="language-plaintext highlighter-rouge">finally</code>, the latter (<code class="language-plaintext highlighter-rouge">return</code> from <code class="language-plaintext highlighter-rouge">finally</code>) overrides the former (from <code class="language-plaintext highlighter-rouge">try</code>).</p>
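
<p>Both pitfalls are easy to reproduce. The sketch below is deliberately bad code, written only to show the behaviour; the class <code class="language-plaintext highlighter-rouge">FinallyPitfallsDemo</code> and its method names are hypothetical.</p>

<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code>public class FinallyPitfallsDemo {

    @SuppressWarnings("finally")
    static int returnFromFinally() {
        try {
            return 1;
        } finally {
            return 2; // overrides the value returned from try
        }
    }

    @SuppressWarnings("finally")
    static void throwFromFinally() {
        try {
            throw new IllegalArgumentException("original");
        } finally {
            throw new IllegalStateException("from finally"); // masks "original"
        }
    }

    public static void main(String[] args) {
        System.out.println(returnFromFinally()); // prints 2

        try {
            throwFromFinally();
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // prints "from finally"
            // Nothing was attached as suppressed: the original exception is lost.
            System.out.println(e.getSuppressed().length); // prints 0
        }
    }
}
</code></pre></div></div>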

<h3 id="resource-management-and-security">Resource management and security</h3>

<p>While try-with-resources itself is not a security feature per se, 
it does play a role in promoting secure coding practices and <strong>preventing resource leaks</strong> that could lead to security vulnerabilities.</p>

<p>By automatically managing the closing of resources, try-with-resources reduces the likelihood of resource leaks. 
Leaked resources, such as open file handles or network connections, can pose security risks and impact the stability of an application.</p>

<p>Properly closing resources prevents <strong>resource exhaustion attacks</strong> where an attacker intentionally consumes available resources 
(e.g., file descriptors) to degrade system performance or cause <strong>denial-of-service</strong> conditions.</p>

<p>For security-sensitive resources like cryptographic streams or database connections, try-with-resources ensures that they are properly released, 
<strong>reducing the window of opportunity</strong> for potential security vulnerabilities related to resource mismanagement.</p>

<p>Last but not least, with try-with-resources, handling exceptions related to resource cleanup is simplified.</p>

<h3 id="assertions">Assertions</h3>

<p>Java supports assertions, which are boolean expressions that the programmer believes will be true at that point in the code. 
They are useful during development and testing to catch potential issues early.
They can be disabled in production code (actually, assertions are disabled by default).</p>

<p>Instead of checking the condition with <code class="language-plaintext highlighter-rouge">if</code>:</p>

<p><code class="language-plaintext highlighter-rouge">if (x &lt; 0) throw... </code></p>

<p>which stays in the program and costs time on every run, we can use assertions, which can be switched off.</p>

<p>Enable assertions:</p>

<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code>java <span class="nt">-ea</span> MainClass
</code></pre></div></div>

<p>and then put into code:</p>

<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">assert</span> <span class="n">x</span> <span class="o">&gt;</span> <span class="mi">0</span> <span class="o">:</span> <span class="s">"x should be greater than 0"</span><span class="o">;</span>
</code></pre></div></div>

<p>We can enable assertions for single classes or packages as well:</p>

<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code>java <span class="nt">-ea</span>:MyCustomClass <span class="nt">-ea</span>:com.mycompany.somepackage... MainClass
</code></pre></div></div>

<p>Assertions are handled by the class loader, which sets the assertion status of each class when it is loaded.
When they are disabled (by default!), the assertion conditions are not evaluated, so they do not slow execution (e.g. in a production environment).</p>

<p>We can customize disabling assertions as well, using the <code class="language-plaintext highlighter-rouge">-da</code> flag (a.k.a. switch):</p>

<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code>java <span class="nt">-ea</span>:... <span class="nt">-da</span>:MyCustomClass MainClass
</code></pre></div></div>

<p>During the development phase, assertions can serve as a form of security audit by checking that security-critical conditions or assumptions are met. 
However, assertions should not be relied upon as the primary means of enforcing security.</p>
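
<p>A minimal sketch of that division of labour, with made-up names (<code class="language-plaintext highlighter-rouge">AssertionDemo</code>, <code class="language-plaintext highlighter-rouge">squareRootFloor</code>): arguments coming from outside are validated with an explicit check that is always active, while the assertion only documents an internal invariant and runs solely under <code class="language-plaintext highlighter-rouge">-ea</code>.</p>

```java
final class AssertionDemo {

    // Input from callers: an explicit check that works with or without -ea.
    static int squareRootFloor(int x) {
        if (x < 0) {
            throw new IllegalArgumentException("x must not be negative");
        }
        int root = (int) Math.sqrt(x);
        // Internal invariant: evaluated only when assertions are enabled.
        assert (long) root * root <= x : "root is too large";
        return root;
    }

    // Detects the assertion status through the side effect of an assert statement.
    static boolean assertionsEnabled() {
        boolean enabled = false;
        assert enabled = true;
        return enabled;
    }

    public static void main(String[] args) {
        System.out.println("Assertions enabled: " + assertionsEnabled());
        System.out.println(squareRootFloor(17));
    }
}
```

<p>Running the class with and without <code class="language-plaintext highlighter-rouge">-ea</code> changes only the first printed line; the argument check behaves the same in both cases.</p>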

<h3 id="many-faces-of-defensive-programming">Many faces of defensive programming</h3>

<p>Other rules of defensive programming are related to:</p>

<ul class="task-list">
  <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled="disabled" checked="checked" />Null-checks</li>
  <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled="disabled" checked="checked" />Immutable classes and defensive copying</li>
  <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled="disabled" checked="checked" />Input validation (already discussed a propos SQL injection)</li>
  <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled="disabled" checked="checked" />Proper logging</li>
  <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled="disabled" checked="checked" />Design patterns &amp; the use of tested and known algorithms</li>
</ul>

<p>It is particularly important not to reinvent the wheel when it comes to existing algorithms. 
They are well-grounded in computer science theory, battle-tested by a wide community and measurable
in terms of efficiency and outcome. Writing your own implementation is not always a good idea, as it may lead to inefficient behaviour and unpredictable results.
And unless you are a cryptography maven, do not try to implement your own cryptography solutions in industry-grade code.</p>
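
<p>The first two items on the list above (null-checks, immutable classes with defensive copying) can be sketched as follows. The <code class="language-plaintext highlighter-rouge">Order</code> class and its fields are assumptions made up for this example, and <code class="language-plaintext highlighter-rouge">List.copyOf</code> requires Java 10 or newer.</p>

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hypothetical immutable value class used only for illustration.
final class Order {
    private final String customer;
    private final List<String> items;

    Order(String customer, List<String> items) {
        // Null-checks: fail fast with a clear message instead of failing later.
        this.customer = Objects.requireNonNull(customer, "customer");
        // Defensive copy: later changes to the caller's list do not affect this object.
        this.items = List.copyOf(Objects.requireNonNull(items, "items"));
    }

    String customer() {
        return customer;
    }

    // The stored copy is unmodifiable, so it can be returned directly.
    List<String> items() {
        return items;
    }

    public static void main(String[] args) {
        List<String> source = new ArrayList<>();
        source.add("book");
        Order order = new Order("Smith", source);
        source.add("added after construction");
        System.out.println(order.items());
    }
}
```

<p>The list printed at the end still contains only the element that was present when the object was created.</p>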

<p>TBC in next articles.</p>]]></content><author><name>Mateusz Zacki</name><email>zacki[dot]mateusz[at]gmail[dot]com</email></author><category term="security" /><category term="defensive_programming, security" /><summary type="html"><![CDATA[Defensive programming: exception handling, try-with-resources, AutoCloseable and Closeable, assertions.]]></summary></entry><entry><title type="html">SQL injection and how to mitigate</title><link href="https://mzacki.github.io/sql-security-2/" rel="alternate" type="text/html" title="SQL injection and how to mitigate" /><published>2023-10-01T20:23:00+00:00</published><updated>2023-10-01T20:23:00+00:00</updated><id>https://mzacki.github.io/sql-security-2</id><content type="html" xml:base="https://mzacki.github.io/sql-security-2/"><![CDATA[<p>Previously on SQL: <a href="/sql-security-1">Intro to SQL security</a></p>

<h3 id="sql-injection-attack">SQL injection attack</h3>

<p><strong>SQL injection</strong> - a vulnerability that occurs when an attacker is able to manipulate an SQL query or inject their own SQL command (a part of an SQL query) into it.</p>

<p><strong>SQL injection attack</strong> - happens when an attack vector is closely related to an SQL vulnerability and an attacker takes advantage of such a vulnerability by injecting malicious SQL code.</p>

<blockquote>
  <p>SQL injection is widespread because it is easily detected and exploited!</p>
</blockquote>

<p>Possible results of an SQL injection attack: unauthorized access to the database, unauthorized reading of data, such as user logins and passwords, 
data manipulation, taking control over the operating system.</p>

<h3 id="malicious-sql">Malicious SQL</h3>

<p>Suppose there is a web application that exposes the endpoint:</p>

<pre><code class="language-txt">https://somecompany.com/customers/search?last_name=something
</code></pre>
<p>This endpoint allows searching for customers of this company in the application database. 
Let’s imagine the app is connected to the <code class="language-plaintext highlighter-rouge">customers</code> database, similar to the one discussed in previous posts. 
A request sent to the endpoint executes some application code that performs a search in the database. 
Let’s say it is a method that triggers the following query:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">SELECT</span> <span class="o">*</span> <span class="k">FROM</span> <span class="n">customer</span> <span class="k">WHERE</span> <span class="n">last_name</span> <span class="k">LIKE</span> <span class="s1">'%something%'</span> <span class="k">AND</span> <span class="n">active</span><span class="o">=</span><span class="k">true</span>
</code></pre></div></div>

<p>where <code class="language-plaintext highlighter-rouge">something</code> stands for user input entered in the browser, e.g. in an HTML input field.</p>

<p>In Java, such a method could look like this:</p>

<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">java.sql.Connection</span><span class="o">;</span>
<span class="kn">import</span> <span class="nn">java.sql.DriverManager</span><span class="o">;</span>
<span class="kn">import</span> <span class="nn">java.sql.ResultSet</span><span class="o">;</span>
<span class="kn">import</span> <span class="nn">java.sql.SQLException</span><span class="o">;</span>
<span class="kn">import</span> <span class="nn">java.sql.Statement</span><span class="o">;</span>

<span class="kd">public</span> <span class="kd">class</span> <span class="nc">DatabaseQueryExecutor</span> <span class="o">{</span>

    <span class="kd">public</span> <span class="kd">static</span> <span class="kt">void</span> <span class="nf">executeSearchQuery</span><span class="o">(</span><span class="nc">String</span> <span class="n">lastName</span><span class="o">)</span> <span class="o">{</span>
        <span class="nc">Connection</span> <span class="n">connection</span> <span class="o">=</span> <span class="kc">null</span><span class="o">;</span>
        <span class="nc">Statement</span> <span class="n">statement</span> <span class="o">=</span> <span class="kc">null</span><span class="o">;</span>

        <span class="k">try</span> <span class="o">{</span>
            <span class="c1">// Set up the database connection (replace with your database URL, username, and password)</span>
            <span class="nc">String</span> <span class="n">dbUrl</span> <span class="o">=</span> <span class="s">"jdbc:mysql://your-database-url"</span><span class="o">;</span>
            <span class="nc">String</span> <span class="n">dbUser</span> <span class="o">=</span> <span class="s">"your-username"</span><span class="o">;</span>
            <span class="nc">String</span> <span class="n">dbPassword</span> <span class="o">=</span> <span class="s">"your-password"</span><span class="o">;</span>

            <span class="n">connection</span> <span class="o">=</span> <span class="nc">DriverManager</span><span class="o">.</span><span class="na">getConnection</span><span class="o">(</span><span class="n">dbUrl</span><span class="o">,</span> <span class="n">dbUser</span><span class="o">,</span> <span class="n">dbPassword</span><span class="o">);</span>
            <span class="n">statement</span> <span class="o">=</span> <span class="n">connection</span><span class="o">.</span><span class="na">createStatement</span><span class="o">();</span>

            <span class="c1">// Build the SQL query by concatenating user input directly into it (this is the vulnerability)</span>
            <span class="nc">String</span> <span class="n">sqlQuery</span> <span class="o">=</span> <span class="s">"SELECT * FROM customer WHERE last_name LIKE '%"</span> <span class="o">+</span> <span class="n">lastName</span> <span class="o">+</span> <span class="s">"%' AND active = true"</span><span class="o">;</span>

            <span class="c1">// Execute the SQL query</span>
            <span class="nc">ResultSet</span> <span class="n">resultSet</span> <span class="o">=</span> <span class="n">statement</span><span class="o">.</span><span class="na">executeQuery</span><span class="o">(</span><span class="n">sqlQuery</span><span class="o">);</span>

            <span class="c1">// Process the results (you can replace this with your specific logic)</span>
            <span class="k">while</span> <span class="o">(</span><span class="n">resultSet</span><span class="o">.</span><span class="na">next</span><span class="o">())</span> <span class="o">{</span>
                <span class="c1">// Retrieve and process data here</span>
            <span class="o">}</span>
        <span class="o">}</span> <span class="k">catch</span> <span class="o">(</span><span class="nc">SQLException</span> <span class="n">e</span><span class="o">)</span> <span class="o">{</span>
            <span class="n">e</span><span class="o">.</span><span class="na">printStackTrace</span><span class="o">();</span>
        <span class="o">}</span> <span class="k">finally</span> <span class="o">{</span>
            <span class="c1">// Close resources in the reverse order of their creation</span>
            <span class="k">try</span> <span class="o">{</span>
                <span class="k">if</span> <span class="o">(</span><span class="n">statement</span> <span class="o">!=</span> <span class="kc">null</span><span class="o">)</span> <span class="o">{</span>
                    <span class="n">statement</span><span class="o">.</span><span class="na">close</span><span class="o">();</span>
                <span class="o">}</span>
                <span class="k">if</span> <span class="o">(</span><span class="n">connection</span> <span class="o">!=</span> <span class="kc">null</span><span class="o">)</span> <span class="o">{</span>
                    <span class="n">connection</span><span class="o">.</span><span class="na">close</span><span class="o">();</span>
                <span class="o">}</span>
            <span class="o">}</span> <span class="k">catch</span> <span class="o">(</span><span class="nc">SQLException</span> <span class="n">e</span><span class="o">)</span> <span class="o">{</span>
                <span class="n">e</span><span class="o">.</span><span class="na">printStackTrace</span><span class="o">();</span>
            <span class="o">}</span>
        <span class="o">}</span>
    <span class="o">}</span>
<span class="o">}</span>

</code></pre></div></div>
<p>With a URL request for the word <code class="language-plaintext highlighter-rouge">malicious'</code>:</p>

<p><code class="language-plaintext highlighter-rouge">https://somecompany.com/customers/search?last_name=malicious'</code></p>

<p>the resulting SQL is:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">SELECT</span> <span class="o">*</span> <span class="k">FROM</span> <span class="n">customer</span> <span class="k">WHERE</span> <span class="n">last_name</span> <span class="k">LIKE</span> <span class="s1">'%malicious'</span><span class="o">%</span><span class="s1">' AND active=true
</span></code></pre></div></div>

<p>but it contains a syntax error caused by the unescaped character (apostrophe / single quote symbol).</p>

<p>The single quote symbol opens and closes a string; an extra character of this type closes the string prematurely and opens another one that is never closed.</p>

<p>So when you put a special character into an input box of a web application and the result is similar to this message: <code class="language-plaintext highlighter-rouge">Incorrect syntax near...</code>,
it might suggest that the input has been placed directly into some backend SQL query.
Apparently, the input character was executed as part of the SQL query, which ended with an SQL error message.
The conclusion: a bad actor (as bad as the actors from Mick Herron’s Slough House series) can try to tinker with this vulnerability, 
looking for an opportunity to execute an SQL injection attack.</p>

<p>There is a way to get rid of this error: comment out the rest of the query, which should not be executed, with two dashes:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">-- this is SQL comment</span>
</code></pre></div></div>

<blockquote>
  <p>Note that in some flavours (e.g. MySQL, MariaDB), the dashes must be followed by a whitespace character (such as a space, tab or newline). See documentation.</p>
</blockquote>

<p>With the comment trick, the URL will look like <code class="language-plaintext highlighter-rouge">https://somecompany.com/customers/search?last_name=malicious'-- </code>,
which triggers this query underneath:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">SELECT</span> <span class="o">*</span> <span class="k">FROM</span> <span class="n">customer</span> <span class="k">WHERE</span> <span class="n">last_name</span> <span class="k">LIKE</span> <span class="s1">'%malicious'</span><span class="c1">-- %' AND active=true</span>
</code></pre></div></div>

<p>(The syntax highlighter used by Kramdown in Jekyll treats this snippet as MySQL, so the rendered query shows that with a trailing space the rest of the line is recognized as a comment. It should look the same
with other highlighting plugins.)</p>

<p>By commenting out the end of the query, the second part of the <code class="language-plaintext highlighter-rouge">WHERE</code> condition has been disabled.</p>

<p>As a result, the query returns all customers with the given name (just put something instead of <code class="language-plaintext highlighter-rouge">malicious</code>), regardless of whether they are active.</p>

<p>Let’s add the infamous <code class="language-plaintext highlighter-rouge">OR 1=1--</code> to the URL:</p>

<p><code class="language-plaintext highlighter-rouge">https://somecompany.com/customers/search?last_name=malicious' OR 1=1-- </code></p>

<p>so that we trigger another SQL query through the above-mentioned Java method (which is, by the way, a clear example of bad practice):</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">SELECT</span> <span class="o">*</span> <span class="k">FROM</span> <span class="n">customer</span> <span class="k">WHERE</span> <span class="n">last_name</span> <span class="k">LIKE</span> <span class="s1">'%malicious'</span> <span class="k">OR</span> <span class="mi">1</span><span class="o">=</span><span class="mi">1</span><span class="c1">-- %' AND active=true</span>
</code></pre></div></div>

<p>As the <code class="language-plaintext highlighter-rouge">OR</code> condition is a tautology (always true), the database will return all customers irrespective of their name.</p>

<p>By adding a boolean SQL command, evaluated to true, and attaching it to the <code class="language-plaintext highlighter-rouge">WHERE</code> clause, one can make the query always true. Open sesame!</p>

<p>Given that one operand of <code class="language-plaintext highlighter-rouge">OR</code> is always <code class="language-plaintext highlighter-rouge">true</code>, the whole <code class="language-plaintext highlighter-rouge">WHERE</code> condition evaluates to true for every row, regardless of the <code class="language-plaintext highlighter-rouge">LIKE</code> match, and the rest of the condition has been commented out.</p>
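
<p>The effect of concatenation can be reproduced without any database. The sketch below only builds the query text in the same way as the vulnerable method; the class and method names (<code class="language-plaintext highlighter-rouge">ConcatenationDemo</code>, <code class="language-plaintext highlighter-rouge">buildQuery</code>) are made up for this illustration.</p>

```java
final class ConcatenationDemo {

    // Mirrors the vulnerable concatenation: user input becomes part of the SQL text.
    static String buildQuery(String lastName) {
        return "SELECT * FROM customer WHERE last_name LIKE '%" + lastName + "%' AND active=true";
    }

    public static void main(String[] args) {
        // Ordinary input: the query keeps its intended structure.
        System.out.println(buildQuery("something"));
        // Crafted input: the quote closes the string, OR 1=1 is added, the rest is commented out.
        System.out.println(buildQuery("malicious' OR 1=1-- "));
    }
}
```

<p>The second printed line is the query shown above: the input has changed the structure of the statement, not just the value being searched for.</p>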

<h3 id="methods-of-sql-injection-attack">Methods of SQL injection attack</h3>

<ul class="task-list">
  <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled="disabled" checked="checked" />Union-based</li>
</ul>

<p>Using the previously discussed mechanisms, in this scenario a <code class="language-plaintext highlighter-rouge">UNION</code> clause is attached to the initial query. By trial and error
(using brute force or the <code class="language-plaintext highlighter-rouge">ORDER BY</code> command), an attacker learns the number of columns
in a given table. They can also discover database metadata from <code class="language-plaintext highlighter-rouge">information_schema.tables</code> (in some database engines), such as the names of tables and their columns.
In the following steps, the perpetrator can see the content of the database, including user logins, passwords (hashed or not) and the like.
I will not paste examples of SQL queries or detailed instructions, as they are easily accessible on the internet and in books.</p>

<ul class="task-list">
  <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled="disabled" checked="checked" />Error-based</li>
</ul>

<p>Error messages can contain information useful for executing the attack. An example is forcing an error by casting a string to an integer or by concatenating a character to the result of the version function, so that the error message reveals the value.
The actual syntax depends on the SQL dialect.</p>

<ul class="task-list">
  <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled="disabled" checked="checked" />Blind (content-based)</li>
</ul>

<p>Using a boolean condition and a substring function, a blind attack resembles brute forcing: the attacker observes whether the response changes for a true or a false condition. Its goal is to reconstruct the table content one character at a time.</p>

<ul class="task-list">
  <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled="disabled" checked="checked" />Blind (time-based)</li>
</ul>

<p>This type of SQL injection relies on measuring the time between request and response: the attacker injects a conditional delay and infers the answer from how long the response takes.</p>

<ul class="task-list">
  <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled="disabled" checked="checked" />Stacked queries</li>
</ul>

<p>In Java, the <code class="language-plaintext highlighter-rouge">java.sql.Statement.executeQuery()</code> method is meant to execute a single query.
If executing multiple statements at once has been enabled in the database or the driver, a malicious actor can append another SQL statement to the initial one.
For example, by appending the <code class="language-plaintext highlighter-rouge">DROP DATABASE users--</code> command.</p>

<ul class="task-list">
  <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled="disabled" checked="checked" />Login bypassing, saving &amp; loading a file, execution of OS command, and the like</li>
</ul>

<p>In badly secured applications, where stored passwords are not hashed, it is possible to bypass the login gateway.
An attacker can also modify or update data, execute code remotely, load or save a file on disk, and finally, inject commands into the OS.</p>

<h3 id="how-to-detect-vulnerability">How to detect vulnerability?</h3>

<p>At the very basic level, SQL injection vulnerabilities can be noticed on the front-end side (UI) and in REST API behaviour (given that the REST API
fetches data from the database). Playing with web application search engines and endpoints through the UI and HTTP calls might be useful.
Use the examples described earlier or find more SQL injection test cases for this purpose.</p>

<p>At the same time, the code base should be checked for vulnerabilities. 
Linters, code-style checkers, security scanners may detect potential dangers.
As we could see earlier, the main problem lies in String concatenation.</p>

<h3 id="mitigation---query-parametrization-prepared-statements-stored-procedures">Mitigation - query parametrization, prepared statements, stored procedures</h3>

<p>From the programmer’s point of view, SQL injection can be mitigated with query parametrization, where input is not treated as part of the SQL text but as a separate value (here of type String),
which is bound to the placeholders of a prepared SQL query, so that it cannot change the structure of the query.
In this solution, there is no String concatenation of SQL and user input, so the risk is much lower.</p>

<p>In Java, this is what PreparedStatement is for. The refactored method looks like this:</p>

<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">java.sql.Connection</span><span class="o">;</span>
<span class="kn">import</span> <span class="nn">java.sql.DriverManager</span><span class="o">;</span>
<span class="kn">import</span> <span class="nn">java.sql.PreparedStatement</span><span class="o">;</span>
<span class="kn">import</span> <span class="nn">java.sql.ResultSet</span><span class="o">;</span>
<span class="kn">import</span> <span class="nn">java.sql.SQLException</span><span class="o">;</span>

<span class="kd">public</span> <span class="kd">class</span> <span class="nc">SafeDatabaseQueryExecutor</span> <span class="o">{</span>

    <span class="kd">public</span> <span class="kd">static</span> <span class="kt">void</span> <span class="nf">executeSearchQuery</span><span class="o">(</span><span class="nc">String</span> <span class="n">lastName</span><span class="o">)</span> <span class="o">{</span>
        <span class="nc">Connection</span> <span class="n">connection</span> <span class="o">=</span> <span class="kc">null</span><span class="o">;</span>
        <span class="nc">PreparedStatement</span> <span class="n">preparedStatement</span> <span class="o">=</span> <span class="kc">null</span><span class="o">;</span>

        <span class="k">try</span> <span class="o">{</span>
            <span class="c1">// Set up the database connection (replace with your database URL, username, and password)</span>
            <span class="nc">String</span> <span class="n">dbUrl</span> <span class="o">=</span> <span class="s">"jdbc:mysql://your-database-url"</span><span class="o">;</span>
            <span class="nc">String</span> <span class="n">dbUser</span> <span class="o">=</span> <span class="s">"your-username"</span><span class="o">;</span>
            <span class="nc">String</span> <span class="n">dbPassword</span> <span class="o">=</span> <span class="s">"your-password"</span><span class="o">;</span>

            <span class="n">connection</span> <span class="o">=</span> <span class="nc">DriverManager</span><span class="o">.</span><span class="na">getConnection</span><span class="o">(</span><span class="n">dbUrl</span><span class="o">,</span> <span class="n">dbUser</span><span class="o">,</span> <span class="n">dbPassword</span><span class="o">);</span>

            <span class="c1">// Construct the SQL query with a placeholder for the last name</span>
            <span class="nc">String</span> <span class="n">sqlQuery</span> <span class="o">=</span> <span class="s">"SELECT * FROM customer WHERE last_name LIKE ? AND active = true"</span><span class="o">;</span>
            <span class="n">preparedStatement</span> <span class="o">=</span> <span class="n">connection</span><span class="o">.</span><span class="na">prepareStatement</span><span class="o">(</span><span class="n">sqlQuery</span><span class="o">);</span>

            <span class="c1">// Set the parameter for the prepared statement</span>
            <span class="n">preparedStatement</span><span class="o">.</span><span class="na">setString</span><span class="o">(</span><span class="mi">1</span><span class="o">,</span> <span class="s">"%"</span> <span class="o">+</span> <span class="n">lastName</span> <span class="o">+</span> <span class="s">"%"</span><span class="o">);</span>

            <span class="c1">// Execute the SQL query</span>
            <span class="nc">ResultSet</span> <span class="n">resultSet</span> <span class="o">=</span> <span class="n">preparedStatement</span><span class="o">.</span><span class="na">executeQuery</span><span class="o">();</span>

            <span class="c1">// Process the results (replace with your specific logic)</span>
            <span class="k">while</span> <span class="o">(</span><span class="n">resultSet</span><span class="o">.</span><span class="na">next</span><span class="o">())</span> <span class="o">{</span>
                <span class="c1">// Retrieve and process data here</span>
            <span class="o">}</span>
        <span class="o">}</span> <span class="k">catch</span> <span class="o">(</span><span class="nc">SQLException</span> <span class="n">e</span><span class="o">)</span> <span class="o">{</span>
            <span class="n">e</span><span class="o">.</span><span class="na">printStackTrace</span><span class="o">();</span>
        <span class="o">}</span> <span class="k">finally</span> <span class="o">{</span>
            <span class="c1">// Close resources in the reverse order of their creation</span>
            <span class="k">try</span> <span class="o">{</span>
                <span class="k">if</span> <span class="o">(</span><span class="n">preparedStatement</span> <span class="o">!=</span> <span class="kc">null</span><span class="o">)</span> <span class="o">{</span>
                    <span class="n">preparedStatement</span><span class="o">.</span><span class="na">close</span><span class="o">();</span>
                <span class="o">}</span>
                <span class="k">if</span> <span class="o">(</span><span class="n">connection</span> <span class="o">!=</span> <span class="kc">null</span><span class="o">)</span> <span class="o">{</span>
                    <span class="n">connection</span><span class="o">.</span><span class="na">close</span><span class="o">();</span>
                <span class="o">}</span>
            <span class="o">}</span> <span class="k">catch</span> <span class="o">(</span><span class="nc">SQLException</span> <span class="n">e</span><span class="o">)</span> <span class="o">{</span>
                <span class="n">e</span><span class="o">.</span><span class="na">printStackTrace</span><span class="o">();</span>
            <span class="o">}</span>
        <span class="o">}</span>
    <span class="o">}</span>
<span class="o">}</span>

</code></pre></div></div>
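
<p>Two possible refinements, shown as a hedged sketch rather than a definitive implementation: try-with-resources replaces the manual <code class="language-plaintext highlighter-rouge">finally</code> block, and the wildcards <code class="language-plaintext highlighter-rouge">%</code> and <code class="language-plaintext highlighter-rouge">_</code> in user input are escaped, because a bound parameter is safe from injection but is still interpreted as a <code class="language-plaintext highlighter-rouge">LIKE</code> pattern. The class and method names, the choice of <code class="language-plaintext highlighter-rouge">!</code> as the escape character and the idea of passing in an already opened <code class="language-plaintext highlighter-rouge">Connection</code> are assumptions made for this example.</p>

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

final class CustomerSearch {

    // Escapes LIKE wildcards so that user input is matched literally; '!' is the escape character.
    static String escapeLike(String input) {
        return input.replace("!", "!!").replace("%", "!%").replace("_", "!_");
    }

    // The connection is supplied by the caller (e.g. from a connection pool).
    static List<String> findLastNames(Connection connection, String lastName) throws SQLException {
        String sql = "SELECT last_name FROM customer WHERE last_name LIKE ? ESCAPE '!' AND active = true";
        // Both resources are closed automatically, in reverse order of creation.
        try (PreparedStatement statement = connection.prepareStatement(sql)) {
            statement.setString(1, "%" + escapeLike(lastName) + "%");
            try (ResultSet resultSet = statement.executeQuery()) {
                List<String> result = new ArrayList<>();
                while (resultSet.next()) {
                    result.add(resultSet.getString("last_name"));
                }
                return result;
            }
        }
    }

    public static void main(String[] args) {
        // Only the escaping helper is exercised here; no database is needed for that.
        System.out.println(escapeLike("100%_sure!"));
    }
}
```

<p>Without the escaping step, a search for <code class="language-plaintext highlighter-rouge">%</code> would simply match every row, which is not an injection but may still return more data than intended.</p>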

<p>Stored procedures are predefined queries stored on the database server, allowing repeated execution across various applications. 
When implemented appropriately, stored procedures provide a level of defense against SQL injection attacks comparable to that of prepared statements. 
Controlling permissions for executing a stored procedure enables us to regulate access and, if needed, limit direct interaction with the underlying table, 
thus mitigating the potential impact of SQL injection.</p>

<p>Similar to prepared statements, stored procedures support parameterized queries, treating user input as data rather than executable SQL code. 
Moreover, parameters passed to the procedure are handled by the database as typed values and not as SQL text, which prevents malicious code from being injected, as long as the procedure does not build SQL dynamically from them.</p>

<blockquote>
  <p>Stored procedures, like SQL queries, can be vulnerable to injection. 
To prevent this, parameterize the stored procedure queries instead of concatenating parameters.</p>
</blockquote>

<p>Do not concatenate the query in stored procedures:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code>    <span class="k">SET</span> <span class="o">@</span><span class="k">Statement</span> <span class="o">=</span> <span class="n">CONCAT</span><span class="p">(</span><span class="s1">'SELECT * FROM customer WHERE name = '''</span><span class="p">,</span> <span class="n">customer_name</span><span class="p">,</span> <span class="s1">''''</span><span class="p">);</span>
</code></pre></div></div>

<p>I would rather go for a prepared statement with a parameter:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">PREPARE</span> <span class="k">statement</span> <span class="k">FROM</span> <span class="s1">'SELECT * FROM customer WHERE name = ?'</span><span class="p">;</span>
</code></pre></div></div>

<p>More on avoiding SQL injection in Java and other languages (e.g. Python): <a href="https://bobby-tables.com/java">Bobby Tables</a> and <a href="https://owasp.org/www-community/attacks/SQL_Injection">OWASP</a></p>

<h3 id="frontend-vs-backend-input-validation-both">Frontend vs backend input validation? Both!</h3>

<p>Sanitization involves the removal of undesirable characters (such as curly braces, quotes, slashes and backslashes) or unsafe code from user-supplied data. 
Validation, on the other hand, ensures that user-supplied data adheres to the expected format defined by the database. 
For instance, we can verify input length and reject excessively long inputs, as well as enforce specific formats for email addresses and dates. 
This approach makes it harder for attackers to submit specially crafted input values containing malicious SQL statements.</p>

<p>While sanitizing and validating input contributes to controlling the input in SQL queries, it’s important to note that it’s not foolproof. 
Attackers may employ techniques like double encoding to circumvent these safeguards.</p>
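
<p>A minimal sketch of backend-side validation in plain Java, assuming a length limit of 50 characters and a simple pattern for last names (both are illustrative choices, as are the class and method names):</p>

```java
import java.time.LocalDate;
import java.time.format.DateTimeParseException;
import java.util.regex.Pattern;

final class InputValidator {

    // Assumed limit for this example; a real limit should follow the column definition.
    private static final int MAX_LAST_NAME_LENGTH = 50;

    // Letters first, then letters, spaces, apostrophes and hyphens (e.g. O'Brien, Smith-Jones).
    private static final Pattern LAST_NAME = Pattern.compile("[\\p{L}][\\p{L} '\\-]*");

    static boolean isValidLastName(String value) {
        return value != null
                && value.length() <= MAX_LAST_NAME_LENGTH
                && LAST_NAME.matcher(value).matches();
    }

    // Accepts only ISO-8601 calendar dates such as 2023-10-01.
    static boolean isValidIsoDate(String value) {
        if (value == null) {
            return false;
        }
        try {
            LocalDate.parse(value);
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isValidLastName("O'Brien"));
        System.out.println(isValidLastName("malicious' OR 1=1-- "));
        System.out.println(isValidIsoDate("2023-13-01"));
    }
}
```

<p>Note that a legitimate name such as <code class="language-plaintext highlighter-rouge">O'Brien</code> contains a single quote and passes validation, which is one more reason why validation complements query parametrization instead of replacing it.</p>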

<p>Front-end validation is useful and user-friendly. The user receives immediate feedback if an input is invalid (along with the reason, at least in well-designed applications).
Thanks to that, the backend does not need to handle multiple incorrect requests coming from the UI.</p>

<p>But front-end validation can be bypassed by a bad actor (malicious user or attacker). 
Validation on the front-end side will not secure against SQL injections.
Angular, JavaScript, TypeScript and other front-end code visible as scripts in the browser’s developer tools can be manipulated and exploited.</p>

<p>Not to mention that in a client-server architecture one may execute a direct HTTP call (e.g. via Postman or <code class="language-plaintext highlighter-rouge">curl</code>) 
to the backend application, bypassing UI validation. It would deliver a payload that has not been validated to the backend API.
Such a payload could contain an SQL query to be executed on the backend side during the attack.</p>

<p>In fact, front-end validation should always be paired with backend validation, such as the Java &amp; Spring Validation API.</p>

<h3 id="whitelisting-vs-blacklisting-dont-rely-on-blacklisting">Whitelisting vs blacklisting: don’t rely on blacklisting</h3>

<p>A whitelist (allowlist) enables us to establish precise rules that exclusively permit specific characters or patterns in the input, ensuring the rejection of any malicious input.</p>

<p>Compared to a blacklist (denylist), a whitelist proves to be a superior strategy for thwarting SQL injection attacks. 
By explicitly defining the permissible input types, a whitelist leaves less room for maneuvering, unlike a blacklist, which can be circumvented by attackers through input variation. 
In essence, a whitelist provides greater control over the accepted input.</p>

<p>Require proper formatting for text, dates and numerical values. Use selection controls (drop-downs, calendars) where possible.</p>
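
<p>An allowlist is especially useful for parts of a query that cannot be bound as parameters, such as a column name in <code class="language-plaintext highlighter-rouge">ORDER BY</code>. The sketch below assumes a request field that selects the sort column; the accepted values and column names are made up for this example.</p>

```java
import java.util.Map;

final class SortColumnAllowlist {

    // Accepted request values mapped to column names (assumed names, for illustration only).
    private static final Map<String, String> SORTABLE = Map.of(
            "lastName", "last_name",
            "firstName", "first_name");

    // Returns an ORDER BY fragment built only from allowlisted column names.
    static String orderByClause(String requested) {
        if (requested == null || !SORTABLE.containsKey(requested)) {
            throw new IllegalArgumentException("Unsupported sort field");
        }
        return " ORDER BY " + SORTABLE.get(requested);
    }

    public static void main(String[] args) {
        System.out.println(orderByClause("lastName"));
    }
}
```

<p>User input never reaches the SQL text here: it is only used as a lookup key, and anything outside the allowlist is rejected.</p>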

<h3 id="restricted-access---principle-of-the-least-privilege-polp">Restricted access - Principle of the least privilege (PoLP)</h3>

<p>According to this well-known rule:</p>

<blockquote>
  <p>Every module (such as a process, a user, or a program, depending on the subject) 
must be able to access only the information and resources that are necessary for its legitimate purpose.
 Saltzer, Jerome H.; Schroeder, Michael D. (1975). “The protection of information in computer systems”</p>
</blockquote>

<p>Ensure that database accounts used by your application have the least necessary privileges to reduce the impact of a successful attack.</p>

<p>When establishing a database user for your application, it’s essential to carefully consider the privileges assigned to that user. 
For instance, does your application necessitate full access to read, write, and modify all databases? 
Should it have the authority to truncate or drop tables? By restricting your application’s access to the database, you can mitigate the potential impact of SQL injection attacks. Instead of relying on a single database user for your application, 
it is advisable to create multiple database users and associate them with specific application roles. 
Security vulnerabilities often propagate like a chain reaction, so it’s imperative to remain vigilant about each link in the chain to prevent significant harm.</p>
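
<p>The idea can be sketched with SQLite from Python. SQLite has no user accounts, so in this sketch a read-only connection stands in for a database user that was granted read access only; in a server database the same restriction is usually expressed with <code class="language-plaintext highlighter-rouge">GRANT</code> statements. The table and file names are made up for the example.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import os
import sqlite3
import tempfile

# Assumption: a read-only SQLite connection plays the role of a user with SELECT rights only.
db_path = os.path.join(tempfile.mkdtemp(), "shop.db")

admin = sqlite3.connect(db_path)
admin.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT)")
admin.execute("INSERT INTO products (name) VALUES ('keyboard')")
admin.commit()
admin.close()

# The reporting part of the application only ever needs to read.
reader = sqlite3.connect("file:" + db_path + "?mode=ro", uri=True)
rows = reader.execute("SELECT name FROM products").fetchall()
print(rows)

try:
    reader.execute("DROP TABLE products")
    dropped = True
except sqlite3.OperationalError as error:
    # Even a successfully injected DROP is refused by the database itself.
    dropped = False
    print("rejected:", error)
reader.close()
</code></pre></div></div>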

<h3 id="database-hardening">Database hardening</h3>

<ul class="task-list">
  <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled="disabled" checked="checked" />An app should not have admin privileges when connecting to the database. Even if someone injects malicious code, chances are the damage will be limited.</li>
  <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled="disabled" checked="checked" />Use separate database users for separate purposes, even within one application. Preferably, the databases should be separated as well, which should be doable in microservices. 
Thus, an SQL injection into one table most probably won’t extend to other tables, and breaking into one database will not hurt the others.</li>
  <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled="disabled" checked="checked" />Disable <strong>stacked queries</strong> so that another SQL query would not be attached to the initial one during the attack. It will be more tedious and time-consuming
to fetch or alter the data step by step than to delete a table or a database at once.</li>
  <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled="disabled" checked="checked" />Check the given database engine for known vulnerabilities. Disable dangerous options.</li>
  <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled="disabled" checked="checked" />The database process should not run with <code class="language-plaintext highlighter-rouge">root</code> permissions in the operating system.</li>
</ul>
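
<p>Some drivers refuse stacked queries out of the box. As a small illustration, here is how Python’s built-in <code class="language-plaintext highlighter-rouge">sqlite3</code> module behaves (the table and the payload are invented for the example; other drivers and engines need to be checked individually):</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users (name) VALUES ('alice')")

# A classic stacked-query payload: a second statement appended after a semicolon.
payload = "SELECT name FROM users; DROP TABLE users"

try:
    conn.execute(payload)
    stacked_allowed = True
except (sqlite3.Warning, sqlite3.ProgrammingError) as error:
    # execute() accepts exactly one statement, so the appended DROP never runs.
    # The exception type depends on the Python version, hence both are caught.
    stacked_allowed = False
    print("rejected:", error)

remaining = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
print(remaining)  # 1
</code></pre></div></div>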

<h3 id="orm-layer">ORM layer</h3>

<p>Object-relational mapping (ORM) layer can also be your line of defense.</p>

<p>But do not shoot yourself in the foot: the easy-to-use Hibernate framework, applied with only superficial knowledge, can be far less efficient than low-level JPA &amp; JDBC solutions, not to mention pure SQL written by an expert in a given database engine!</p>

<p>An ORM layer translates data between the database and objects bidirectionally, reducing explicit SQL queries and minimizing the risk of SQL injection. 
However, when custom queries are needed, Hibernate in Java introduces Hibernate Query Language (HQL), requiring careful use of the createQuery() function to mitigate injection risks. 
Despite the benefits, it’s crucial to acknowledge that ORM libraries must convert logic back to SQL statements, necessitating trust in proper parameter escaping. 
To ensure the absence of SQL injection vulnerabilities, regularly scan for known weaknesses and avoid outdated library versions.</p>]]></content><author><name>Mateusz Zacki</name><email>zacki[dot]mateusz[at]gmail[dot]com</email></author><category term="SQL, database, data, security" /><category term="SQL, database, data, persistency, security" /><summary type="html"><![CDATA[SQL injection, query parametrization, prepared statements, input validation.]]></summary></entry><entry><title type="html">SQL security basics</title><link href="https://mzacki.github.io/sql-security-1/" rel="alternate" type="text/html" title="SQL security basics" /><published>2023-09-11T20:23:00+00:00</published><updated>2023-09-11T20:23:00+00:00</updated><id>https://mzacki.github.io/sql-security-1</id><content type="html" xml:base="https://mzacki.github.io/sql-security-1/"><![CDATA[<p>Previously on SQL: <a href="/sql-cheatsheet-7">Advanced SQL for Java developers: coursor, function, index.</a></p>

<h3 id="normalized-database-vs-denormalized-database">Normalized database vs denormalized database</h3>

<p>A normalized database is optimized for minimal redundancy, not for the lowest possible read time.
Such a database contains many tables and relies on joins and rather complex queries.</p>

<p>In a normalized database, data are organized into multiple related tables. 
Each table is designed to store a specific type of data, and relationships between tables are established through foreign keys.</p>

<p>Storage efficiency is better, as data are stored in the most space-efficient manner.
Read efficiency is harder to achieve, because a query often has to retrieve data from multiple related tables. 
While this can be computationally expensive, it allows for flexibility in querying the data.</p>

<p>With proper normalization, data consistency is usually easier to maintain, as changes to data only need to be made in one place (the corresponding table).
Normalized databases are typically favored for systems where data integrity and consistency are critical, such as financial and transactional systems.</p>

<p>A denormalized database is optimized for read time, not for minimal redundancy.
Such a database keeps as many columns in one table as possible, so there is no need to split the data into more, smaller tables.
It does not look like a clean solution, but reads are faster. Data are stored in fewer tables, 
and the same data may be duplicated across rows and tables. This is done to reduce the need for JOIN operations.
Denormalization can be more storage-intensive because of that redundancy.
However, it can be a valid design choice if it serves specific performance needs.</p>
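
<p>The trade-off can be seen in a tiny experiment. The sketch below uses SQLite from Python with made-up tables: the same data are stored once in a normalized and once in a denormalized form.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Normalized: the customer name is stored once and referenced by key.
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        total INTEGER NOT NULL
    );
    INSERT INTO customers (id, name) VALUES (1, 'Alice');
    INSERT INTO orders (customer_id, total) VALUES (1, 10), (1, 25);

    -- Denormalized: the name is repeated in every order row.
    CREATE TABLE orders_flat (id INTEGER PRIMARY KEY, customer_name TEXT, total INTEGER);
    INSERT INTO orders_flat (customer_name, total) VALUES ('Alice', 10), ('Alice', 25);
""")

# Reading from the normalized schema needs a JOIN...
joined = conn.execute(
    "SELECT c.name, o.total FROM orders o JOIN customers c ON c.id = o.customer_id ORDER BY o.id"
).fetchall()
# ...while the denormalized table answers the same question on its own.
flat = conn.execute("SELECT customer_name, total FROM orders_flat ORDER BY id").fetchall()
print(joined == flat)  # True

# Renaming the customer touches one row in the normalized schema, two in the flat one.
one = conn.execute("UPDATE customers SET name = 'Alicia' WHERE id = 1").rowcount
many = conn.execute(
    "UPDATE orders_flat SET customer_name = 'Alicia' WHERE customer_name = 'Alice'"
).rowcount
print(one, many)  # 1 2
</code></pre></div></div>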

<h3 id="security-in-normalized-and-denormalized-database">Security in normalized and denormalized database</h3>

<p>Normalization aims to minimize data redundancy and ensure data integrity.
By breaking data into smaller, related tables, it reduces the risk of data anomalies, such as insertion, update, and deletion anomalies.
Normalized databases are typically favored for systems where data integrity and consistency are critical, such as financial and transactional systems.</p>

<p>In normalized databases, the primary concern is managing data relationships, and security measures should focus on access controls, 
auditing, and preventing unauthorized changes to the database schema, as the structure is more complex.</p>

<p>Denormalization can lead to some loss of data integrity, as redundancy increases the risk of anomalies. 
Here, data duplication can be a problem.</p>

<p>In denormalized databases, security measures need to consider data duplication, as the same data might exist in multiple places. 
Special attention must be paid to keeping all copies of the data secure and ensuring consistency in access controls.</p>

<h3 id="integrity">Integrity</h3>

<p>In SQL databases, data integrity refers to the accuracy and consistency of data stored in the database. 
There are several types of data integrity in SQL databases, each serving a specific purpose. 
These integrity constraints help ensure that data remains reliable and valid.</p>

<ul class="task-list">
  <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled="disabled" checked="checked" />Entity integrity: No duplicate rows exist in a table.</li>
</ul>

<p>Entity integrity ensures that each row (record) in a table is uniquely identifiable, typically through a primary key. 
This means that each row must have a unique value in its primary key column, preventing duplicate records in the table.</p>

<ul class="task-list">
  <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled="disabled" checked="checked" />Domain integrity: Restricting the type of values that one can insert in order to enforce correct values (in Java, for example, using enums may be helpful).</li>
</ul>

<p>Domain integrity enforces that data values in a column meet specific criteria, such as data type, format, and allowable values. 
Common examples include ensuring that a date column contains valid dates or that an integer column contains only whole numbers.</p>

<ul class="task-list">
  <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled="disabled" checked="checked" />Referential integrity: Records that are used by other records cannot be deleted (using constraints).</li>
</ul>

<p>Referential integrity establishes and enforces relationships between tables through foreign keys. 
It ensures that data in a foreign key column in one table corresponds to data in the primary key column of another table. 
This constraint prevents orphaned records and maintains the consistency of relationships.
Here, for example, you won’t be able to delete a record from one table which is related to another table via constraint.
Either you delete both of them, or none.</p>

<p>Cascading actions, such as ON DELETE CASCADE and ON UPDATE CASCADE, are often associated with referential integrity. 
They define what should happen to related records when a referenced record is deleted or updated. Cascading actions can help maintain data consistency.</p>

<ul class="task-list">
  <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled="disabled" checked="checked" />Custom integrity / custom constraints</li>
</ul>

<p>User-defined integrity allows users to define custom constraints or business rules to maintain data integrity. 
This can include rules specific to a particular application or domain, ensuring that data adheres to business logic:
validation rules, data calculations, and workflow-related checks.</p>

<p>Triggers are event-driven actions that can be executed automatically in response to changes in the database. 
Triggers can enforce custom data integrity rules and actions.</p>

<p>Integrity requirements can also be combined. 
Domain-key integrity, for example, combines elements of both domain and entity integrity by ensuring that the primary key values in a table are unique and also meet domain constraints, 
such as data type and format requirements.</p>
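
<p>Entity, domain and referential integrity described above can be tried out in a few lines. This is a sketch using SQLite from Python; the tables are hypothetical, and in SQLite foreign keys are enforced only after switching on the <code class="language-plaintext highlighter-rouge">foreign_keys</code> pragma.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import sqlite3

conn = sqlite3.connect(":memory:")
# Assumption: SQLite enforces foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE authors (
        id INTEGER PRIMARY KEY,                      -- entity integrity
        status TEXT NOT NULL
            CHECK (status IN ('ACTIVE', 'RETIRED'))  -- domain integrity
    );
    CREATE TABLE books (
        id INTEGER PRIMARY KEY,
        author_id INTEGER NOT NULL REFERENCES authors(id)  -- referential integrity
    );
    INSERT INTO authors (id, status) VALUES (1, 'ACTIVE');
    INSERT INTO books (id, author_id) VALUES (1, 1);
""")


def violates(sql):
    """Return True if the statement is rejected by an integrity constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


duplicate_key = violates("INSERT INTO authors (id, status) VALUES (1, 'ACTIVE')")
outside_domain = violates("INSERT INTO authors (id, status) VALUES (2, 'UNKNOWN')")
still_referenced = violates("DELETE FROM authors WHERE id = 1")
print(duplicate_key, outside_domain, still_referenced)  # True True True
</code></pre></div></div>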

<h3 id="integrity-and-security">Integrity and security</h3>

<p>Database integrity, in the context of security, refers to the fundamental aspect of ensuring the accuracy, consistency, and reliability of the data stored in a database as a means to enhance data security.</p>

<p>Ensuring data integrity means that data stored in the database is accurate and free from errors. Accuracy is crucial for making informed decisions and avoiding security incidents that could arise from erroneous data.</p>

<p>Consistency prevents data anomalies that might be exploited for security breaches.</p>

<p>Part of data integrity involves validating and sanitizing data input to the database. 
This practice minimizes the risk of SQL injection and other security vulnerabilities that could compromise the integrity and security of the database.</p>

<p>Data integrity helps prevent unauthorized changes or tampering with the data. This safeguards against both accidental and malicious alterations to the data.</p>

<p>Data corruption can lead to security risks, as corrupted data might have unpredictable consequences on the application.
Data integrity measures help minimize the risk of data corruption by ensuring that data is stored consistently and accurately.</p>

<p>Even within an organization, there is a risk of insider threats. Data integrity measures, such as access controls and audit trails, 
help detect and prevent unauthorized access, alterations, or exfiltration of data by employees or insiders.</p>

<p>Last but not least, maintaining data quality and ensuring data recovery and continuity are also important features of integrity that serve the purpose of security.
In the event of a security incident or data breach, maintaining data integrity ensures that backup and recovery processes can restore a reliable and consistent database state. 
Data integrity is essential for business continuity and disaster recovery planning.</p>

<p>In summary, database integrity plays a crucial role in data security by ensuring the accuracy, consistency, and reliability of the data stored in the database. When data is trustworthy, it reduces the likelihood of security incidents, minimizes vulnerabilities, and supports the overall security of the database and the applications that rely on it. 
Data integrity and security measures work hand in hand to protect sensitive information and maintain the integrity of the database.</p>

<h3 id="idempotent-vs-deterministic-function">Idempotent vs deterministic function</h3>

<p>Although the two terms are sometimes confused, they are different concepts.</p>

<p>An idempotent function is a function that has the same effect whether it is applied once or multiple times. 
No matter how many times it is executed, the resulting state is the same. Idempotency makes an operation safe to repeat: even if it is requested multiple times, the effect is as if it had been performed once.
An example of an idempotent operation is SQL <code class="language-plaintext highlighter-rouge">DELETE</code> with a fixed condition. Deleting a resource one time or multiple times has the same result: the resource has been deleted.</p>

<p>In mathematics: <code class="language-plaintext highlighter-rouge">f(f(x)) = f(x)</code></p>

<p>On the other hand, a deterministic function is a function in which the output is completely determined by the input. 
In other words, given the same input, a deterministic function will always produce the same output, making it predictable and consistent.</p>

<p>For each input x, there is only one corresponding output y, so it’s simply <code class="language-plaintext highlighter-rouge">y = f(x)</code>.</p>

<p>Deterministic functions are commonly used in various fields, including computer science, cryptography, and databases. 
They are valuable for ensuring data consistency and predictability, as they guarantee that the same input will always result in the same output.</p>

<p>A common use case of deterministic functions is <strong>unit testing</strong>: for the same input data, the result of a unit test is always the same. There should not be any other factors impacting the result.
If such a test starts to fail, it means the code has been broken.</p>

<p>In SQL, examples of deterministic functions are mathematical functions and date arithmetic on fixed arguments: <code class="language-plaintext highlighter-rouge">SELECT ABS(-9)</code>, <code class="language-plaintext highlighter-rouge">SELECT 2 - 2</code>, <code class="language-plaintext highlighter-rouge">SELECT DATEADD(day, 5, '2023-01-01')</code>.</p>

<p>Examples of non-deterministic SQL functions are getting a random number, generating a UUID, selecting the current user etc.</p>
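
<p>The difference between the two properties is easy to check in code. The helper functions below are invented for this illustration (Python, with SQLite for the <code class="language-plaintext highlighter-rouge">DELETE</code> part):</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import sqlite3
import uuid


def normalize(text):
    # Deterministic and idempotent: normalize(normalize(x)) == normalize(x).
    return text.strip().lower()


def increment(n):
    # Deterministic but not idempotent: applying it twice changes the result again.
    return n + 1


def new_id():
    # Non-deterministic: every call returns a different value.
    return str(uuid.uuid4())


print(normalize(normalize("  SQL ")) == normalize("  SQL "))  # True
print(increment(increment(1)) == increment(1))                # False
print(new_id() == new_id())                                   # False

# SQL DELETE with a fixed condition is idempotent: repeating it leaves the same state.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY)")
conn.execute("INSERT INTO items (id) VALUES (1), (2)")
for _ in range(3):
    conn.execute("DELETE FROM items WHERE id = 1")
left = conn.execute("SELECT id FROM items").fetchall()
print(left)  # [(2,)]
</code></pre></div></div>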

<h3 id="idempotent-and-deterministic-sql-functions---implications-for-security">Idempotent and deterministic SQL functions - implications for security</h3>

<p>Idempotent functions can help prevent data corruption and security incidents. 
For example, idempotent SQL DELETE or UPDATE operations ensure that critical data is not accidentally or maliciously deleted or modified multiple times.
Idempotent functions are often used within transactions to ensure that a sequence of operations can be safely retried without introducing data inconsistencies
(atomic operations).</p>

<p>One of their features is preventing unwanted side effects. Idempotent functions follow the <strong>Security Through Predictable Behavior</strong> principle. 
The predictability of idempotent functions can help prevent unwanted side effects or actions that could lead to security incidents. 
When a function’s behavior is consistent, it is easier to anticipate and control its impact on the database.
In disaster recovery and backup scenarios, idempotent operations can be valuable for restoring the database to a known state without introducing additional inconsistencies.</p>

<p>Deterministic functions are critical for data integrity and help in auditing and compliance efforts. Deterministic functions are commonly used in cryptographic operations.
They ensure consistent encryption and decryption, which is essential for data security.</p>]]></content><author><name>Mateusz Zacki</name><email>zacki[dot]mateusz[at]gmail[dot]com</email></author><category term="SQL, database, data, security" /><category term="SQL, database, data, persistency, security" /><summary type="html"><![CDATA[Security in normalized and denormalized database, integrity, idempotent vs deterministic functions.]]></summary></entry><entry><title type="html">Database monitoring, SQL optimization &amp;amp; transactions</title><link href="https://mzacki.github.io/sql-transactions/" rel="alternate" type="text/html" title="Database monitoring, SQL optimization &amp;amp; transactions" /><published>2023-09-01T08:00:00+00:00</published><updated>2023-09-01T08:00:00+00:00</updated><id>https://mzacki.github.io/sql-transactions</id><content type="html" xml:base="https://mzacki.github.io/sql-transactions/"><![CDATA[<h3 id="what-is-query-execution-plan">What is query execution plan?</h3>

<p>A query plan, also known as an execution plan or query execution plan, is a detailed, step-by-step blueprint that the database management system (DBMS) uses to execute a specific SQL query. 
The query plan is generated by the query optimizer, a component of the DBMS responsible for determining the most efficient way to execute a query based on the database schema, indexes, statistics, and other factors.</p>

<p>The query plan provides insights into how the DBMS will retrieve and process the data to satisfy the query, including details on which indexes will be used, the order of table access, and the algorithms employed for sorting and joining data. 
Understanding and analyzing the query plan can be crucial for optimizing the performance of SQL queries.</p>

<h3 id="how-to-find-a-query-execution-plan">How to find a query execution plan?</h3>

<p>Many relational database systems support the <code class="language-plaintext highlighter-rouge">EXPLAIN</code> statement, which provides information about how a query will be executed without actually executing it.
For example:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">EXPLAIN</span> <span class="k">SELECT</span> <span class="n">column1</span><span class="p">,</span> <span class="n">column2</span> <span class="k">FROM</span> <span class="n">my_table</span> <span class="k">WHERE</span> <span class="n">column1</span> <span class="o">=</span> <span class="s1">'empty'</span> <span class="k">AND</span> <span class="n">column2</span> <span class="o">=</span> <span class="s1">'non_empty'</span><span class="p">;</span>
</code></pre></div></div>

<p>Different database systems have specific commands to obtain query plans. 
For example:</p>
<ul class="task-list">
  <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled="disabled" checked="checked" />PostgreSQL: use EXPLAIN or EXPLAIN ANALYZE</li>
  <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled="disabled" checked="checked" />MySQL: use EXPLAIN or, since MySQL 8.0.18, EXPLAIN ANALYZE (the older EXPLAIN EXTENDED is no longer available in MySQL 8.0)</li>
  <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled="disabled" checked="checked" />SQL Server: use SET SHOWPLAN_XML ON</li>
  <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled="disabled" checked="checked" />MariaDB: EXPLAIN with many more options…</li>
</ul>

<p>Feel free to check the documentation for the given SQL flavour. Almost everything is there!</p>
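
<p>To see a plan change in practice, here is a small sketch that runs the query above against SQLite from Python. SQLite’s variant of the statement is <code class="language-plaintext highlighter-rouge">EXPLAIN QUERY PLAN</code>; the index name is made up, and the exact wording of the plan differs between SQLite versions.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (column1 TEXT, column2 TEXT)")

query = "SELECT column1, column2 FROM my_table WHERE column1 = 'empty' AND column2 = 'non_empty'"


def plan(sql):
    """Return the human-readable detail column of SQLite's query plan."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return [row[-1] for row in rows]


before = plan(query)
conn.execute("CREATE INDEX idx_column1 ON my_table (column1)")
after = plan(query)

print(before)  # a full table scan
print(after)   # a search that uses idx_column1
</code></pre></div></div>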

<p>Some database management tools provide graphical representations of query plans, for example, <a href="https://github.com/dbeaver/dbeaver/wiki/Query-Execution-Plan">DBeaver</a></p>

<p>IntelliJ IDEA supports <a href="https://www.jetbrains.com/help/idea/query-execution-plan.html">two types of execution plans</a>:</p>
<ul class="task-list">
  <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled="disabled" checked="checked" />Explain Plan: the result is shown in a mixed tree and table format on a dedicated Plan tab.</li>
  <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled="disabled" checked="checked" />Explain Plan (Raw): the result is shown in a table format.</li>
</ul>

<h3 id="how-to-interpret-query-execution-plan">How to interpret query execution plan?</h3>

<p>Table access: look for information on how tables are accessed, including whether full table scans or index scans are used. 
Consider whether indexes are being utilized effectively.</p>

<p>Joins: check how joins between tables are executed. Different join algorithms (nested loops, hash joins, merge joins) have different performance characteristics. 
The choice of the join algorithm depends on the size of the tables and the available indexes.</p>

<p>Filter predicates: examine the conditions used to filter rows. 
Ensure that indexes are used for selective conditions and that the query is leveraging the available statistics.</p>

<p>Sorting and group operations: check for any sorting or grouping operations. Determine if the query plan is using indexes or other methods to satisfy these operations.</p>

<p>Index usage: verify that indexes are being used efficiently. Check if the indexes cover the columns needed for the query and if they are selective.</p>

<p>Parallel execution: some query plans may involve parallel execution, where multiple processes are used to speed up the query. Understand if and how parallelism is being employed.</p>

<h3 id="use-flame-graph">Use flame graph!</h3>

<p>In IntelliJ IDEA, Flame Graph is a part of the Query Execution Plan feature.
A flame graph in the context of SQL typically refers to a visualization technique used for profiling and analyzing the performance of SQL queries or database operations. 
While flame graphs, in the broader sense, are often associated with the visualization of stack traces in programming, a flame graph in the SQL domain focuses on representing the execution flow and time distribution of SQL queries.
Sometimes flame graphs represent not only SQL queries, but also the REST API requests associated with them, along with the microservices that handle the complete execution flow (for example, in Datadog).</p>

<h3 id="get-familiar-with-database-profiler">Get familiar with database profiler</h3>

<p>A database profiler is a tool or feature provided by database management systems (DBMS) to capture and analyze information about the execution of SQL queries and operations against the database. 
Profilers are valuable for performance tuning, optimization, and troubleshooting, as they allow database administrators and developers to identify bottlenecks, inefficient queries, 
and areas for improvement in the database system.</p>

<p>Profilers provide access to <strong>query execution plans</strong>, but they offer many more features.</p>

<p>Profilers capture detailed <strong>statistics about the execution of SQL queries</strong>, including the time taken for execution, resource usage (CPU, memory, disk I/O), and the number of rows affected.</p>

<p>They can report on <strong>locking and blocking issues</strong>, helping to identify situations where transactions are contending for the same resources and causing delays.
Profilers provide information about the start and end of <strong>transactions</strong>, as well as the duration and resource consumption of transactions.
This can be essential for understanding the impact of transactions on overall system performance.
You will find details about active database <strong>sessions and connections</strong>, including the users accessing the database, the duration of their sessions, and the resources they are consuming.
Profilers may <strong>log errors and exceptions</strong> encountered during query execution.
Some of them offer real-time monitoring capabilities.</p>

<p>Finally, profilers may support the creation of triggers or events that automatically capture information when specific conditions are met. 
For example, a profiler might capture information whenever a query takes longer than a defined threshold.</p>

<h3 id="how-to-optimize-database">How to optimize database?</h3>

<p><strong>Use indexes wisely</strong></p>

<p>Ensure that tables are appropriately indexed based on the queries being executed. Analyze if the existing indexes are being utilized effectively.</p>

<p><strong>Update statistics</strong></p>

<p>Regularly update table statistics to provide the query optimizer with accurate information about the distribution of data in tables.</p>

<p><strong>Consider query rewriting</strong></p>

<p>In some cases, rewriting the query or restructuring the schema can lead to more efficient query plans.</p>

<p><strong>Avoid functions on indexed columns</strong></p>

<p>Avoid using functions on columns involved in WHERE clauses, as it may prevent the use of indexes.</p>
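
<p>The effect is visible in the query plan. A sketch with SQLite from Python (hypothetical table and index; other engines behave similarly unless an expression index exists):</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, age INTEGER)")
conn.execute("CREATE INDEX idx_users_email ON users (email)")


def plan(sql):
    """Return the detail text of the first step of SQLite's query plan."""
    return conn.execute("EXPLAIN QUERY PLAN " + sql).fetchone()[-1]


# The function call hides the column from the index, so every row is visited.
wrapped = plan("SELECT age FROM users WHERE lower(email) = 'a@example.com'")
# Comparing the bare column lets the optimizer seek directly in the index.
bare = plan("SELECT age FROM users WHERE email = 'a@example.com'")

print(wrapped)
print(bare)
</code></pre></div></div>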

<p><strong>Thoroughly check joins</strong></p>

<p>Ensure that join conditions are well-defined and that indexes are available for columns used in join conditions.</p>

<p><strong>Review transactions</strong></p>

<p>Think about transactions: are they used effectively? Is the locking strategy adequate for the purpose?</p>

<h3 id="what-is-transaction">What is transaction?</h3>

<p>In SQL, a transaction is a sequence of one or more SQL statements that are executed as a single, indivisible unit of work.
The properties of a transaction are often described by the ACID properties, which stand for Atomicity, Consistency, Isolation, and Durability.</p>

<p>In general, transaction phases are:</p>

<ul class="task-list">
  <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled="disabled" checked="checked" />acquiring lock</li>
  <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled="disabled" checked="checked" />read</li>
  <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled="disabled" checked="checked" />update</li>
  <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled="disabled" checked="checked" />validation</li>
  <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled="disabled" checked="checked" />commit</li>
  <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled="disabled" checked="checked" />rollback</li>
</ul>
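
<p>The update, validation, commit and rollback phases listed above can be sketched as a money transfer. This is only an illustration with SQLite from Python; the <code class="language-plaintext highlighter-rouge">accounts</code> table and the <code class="language-plaintext highlighter-rouge">transfer</code> helper are invented for it.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL)")
conn.execute("INSERT INTO accounts (id, balance) VALUES (1, 100), (2, 50)")
conn.commit()


def transfer(source, target, amount):
    """Move money between two accounts as one indivisible unit of work."""
    try:
        # "with conn" commits on success and rolls back if an exception escapes.
        with conn:
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, source))
            credited = conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, target)
            ).rowcount
            # Validation: the credit must have reached exactly one account.
            if credited != 1:
                raise LookupError("unknown target account")
        return True
    except LookupError:
        return False


print(transfer(1, 2, 30))   # True, both updates are committed together
print(transfer(1, 99, 30))  # False, the debit is rolled back as well
balances = conn.execute("SELECT id, balance FROM accounts ORDER BY id").fetchall()
print(balances)             # [(1, 70), (2, 80)]
</code></pre></div></div>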

<h3 id="when-to-use-transactions">When to use transactions?</h3>

<p>Transactions are a good solution in the following situations:</p>

<ul class="task-list">
  <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled="disabled" checked="checked" />to perform multiple database operations as a cohesive unit, such as updating multiple tables, inserting records, or deleting data</li>
  <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled="disabled" checked="checked" />enforce data integrity and consistency</li>
  <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled="disabled" checked="checked" />financial, banking, e-commerce</li>
  <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled="disabled" checked="checked" />concurrency</li>
  <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled="disabled" checked="checked" />where a large number of records need to be updated or processed</li>
  <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled="disabled" checked="checked" />offline / online synchronization</li>
  <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled="disabled" checked="checked" />when rollback is needed as an option</li>
  <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled="disabled" checked="checked" />for isolation</li>
  <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled="disabled" checked="checked" />complex operations (multiple steps)</li>
  <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled="disabled" checked="checked" />ensuring durability of data</li>
</ul>

<h3 id="when-not-to-use-transactions">When not to use transactions?</h3>

<p>In these cases, avoid transactions:</p>

<ul class="task-list">
  <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled="disabled" checked="checked" />simple read-only operations</li>
  <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled="disabled" checked="checked" />where high concurrency is a top priority and conflicts are unlikely</li>
  <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled="disabled" checked="checked" />individual, independent operations</li>
  <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled="disabled" checked="checked" />performance-critical scenarios</li>
  <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled="disabled" checked="checked" />where data is cached or denormalized</li>
  <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled="disabled" checked="checked" />bulk data loading or large-scale batch processing</li>
  <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled="disabled" checked="checked" />non-critical data, short-lived operations</li>
  <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled="disabled" checked="checked" />logging &amp; auditing</li>
</ul>

<h3 id="deadlock">Deadlock</h3>

<p>A deadlock in SQL occurs when two or more transactions are blocked,
each waiting for the other to release a lock on a resource, resulting in a circular waiting condition.
None of the transactions involved can proceed on its own, so deadlocks must be avoided at all costs.</p>

<h3 id="optimistic-lock">Optimistic lock</h3>

<p>Locking is a way of preventing lost updates. An optimistic lock checks whether the value to be updated has been changed since it was last read.
The optimistic locking approach allows multiple transactions to proceed with their operations without acquiring locks on the data.
Instead, it relies on a mechanism to detect conflicts and resolve them at the time of committing the changes.</p>

<p><strong>It does not acquire any lock</strong>.</p>

<p>First, in the read phase (1), the transaction reads the data and records some form of a version identifier associated with it (e.g., a timestamp, a version number, a hash value).</p>

<p>Then the second phase starts: update (2).</p>

<p>During the validation phase (3), it checks for any modifications made by another transaction in the meantime.
This is typically done by comparing the recorded version identifier with the current version of the data.</p>

<p>Commit / rollback phase (4): if no changes, do commit. If there are changes, perform rollback or conflict resolution.</p>
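
<p>The four phases can be sketched with a version column. This is a minimal illustration with SQLite from Python, not a production recipe; the <code class="language-plaintext highlighter-rouge">documents</code> table and both helpers are hypothetical.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (id INTEGER PRIMARY KEY, body TEXT, version INTEGER NOT NULL)")
conn.execute("INSERT INTO documents (id, body, version) VALUES (1, 'draft', 1)")
conn.commit()


def read(doc_id):
    # Phase 1: read the data together with its version identifier, without any lock.
    return conn.execute("SELECT body, version FROM documents WHERE id = ?", (doc_id,)).fetchone()


def save(doc_id, new_body, expected_version):
    # Phases 2 and 3: the UPDATE matches only if the version is still the one that was read.
    cursor = conn.execute(
        "UPDATE documents SET body = ?, version = version + 1 WHERE id = ? AND version = ?",
        (new_body, doc_id, expected_version),
    )
    # Phase 4: commit when exactly one row was updated, otherwise roll back.
    if cursor.rowcount == 1:
        conn.commit()
        return True
    conn.rollback()  # someone else changed the row: the caller must re-read and retry
    return False


_, seen_by_a = read(1)
_, seen_by_b = read(1)
print(save(1, "edit from A", seen_by_a))  # True
print(save(1, "edit from B", seen_by_b))  # False, B worked on a stale version
</code></pre></div></div>

<p>In the Java world, JPA offers the same mechanism through the <code class="language-plaintext highlighter-rouge">@Version</code> annotation on an entity field.</p>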

<h3 id="when-optimistic-locking-is-a-good-strategy">When optimistic locking is a good strategy?</h3>

<p><strong>With high concurrency requirements</strong>: in scenarios where high levels of concurrent access to the data are crucial, optimistic locking can be more suitable. It allows multiple transactions to read and modify data concurrently, reducing contention and increasing overall system performance.</p>

<p><strong>With low risk of conflicts:</strong> when the likelihood of conflicts between transactions is low, optimistic locking is an efficient choice. If the data is not frequently updated by multiple transactions simultaneously, the overhead of acquiring and releasing locks may be unnecessary.</p>

<p><strong>For short transactions:</strong> optimistic locking is well-suited for short-duration transactions where the time between reading and updating the data is minimal. Short transactions reduce the window during which conflicts might occur, making it less likely for two transactions to modify the same data concurrently.</p>

<p><strong>When optimizing read-heavy workloads:</strong> in situations where the workload is predominantly read-heavy, and write operations are infrequent, optimistic locking can be effective. Readers are not impeded by locks, and conflicts during write operations are addressed when they occur.</p>

<p><strong>To reduce lock contention:</strong> optimistic locking helps in reducing lock contention (<strong>competition for acquiring locks</strong>). By allowing multiple transactions to read data simultaneously and <strong>only checking for conflicts at the time of update</strong>, contention is minimized.</p>

<p>Optimistic locking is often <strong>more compatible with distributed systems</strong>. In scenarios where data is distributed across multiple nodes or databases, acquiring locks might be challenging or impractical. Optimistic locking allows for a more decentralized approach.</p>

<p>Optimistic locking is commonly used in scenarios where data may be edited <strong>offline</strong>, and changes need to be merged with the central database. Each offline editor can make changes independently, and conflicts are resolved when attempting to merge the changes.
It seems to be the way the deck synchronization in Anki works.</p>

<p>It is also a more scalable solution for systems with a large number of transactions and a desire to reduce the load on the database caused by acquiring and releasing locks.</p>

<h3 id="pessimistic-lock">Pessimistic lock</h3>

<p>It is another way of preventing lost updates. A pessimistic lock explicitly forces other threads to wait until an update is done.</p>

<p>In this strategy a lock is acquired and held until commit / rollback. In many cases, it involves <strong>exclusive locks</strong>. 
Another type of pessimistic lock is a <strong>shared lock</strong>, which might later be escalated to an exclusive lock.</p>

<p><strong>Pessimistic locking may lead to deadlocks.</strong> Pessimistic locking is often associated with higher isolation levels, with more consistency and less concurrency.</p>
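
<p>A minimal sketch of both lock types in MySQL (InnoDB), assuming a hypothetical <code class="language-plaintext highlighter-rouge">account</code> table. The table and values are made up for illustration.</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code>-- hypothetical table: account(id, balance)
START TRANSACTION;
-- exclusive lock: other transactions trying to lock or modify this row must wait
SELECT balance FROM account WHERE id = 1 FOR UPDATE;
UPDATE account SET balance = balance - 10 WHERE id = 1;
COMMIT; -- the lock is released here

START TRANSACTION;
-- shared lock: others may still read the row, but cannot modify it until we finish
-- (FOR SHARE is MySQL 8.0 syntax, older versions use LOCK IN SHARE MODE)
SELECT balance FROM account WHERE id = 1 FOR SHARE;
COMMIT;
</code></pre></div></div>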

<h3 id="when-pessimistic-locking-is-a-good-strategy">When is pessimistic locking a good strategy?</h3>

<p>Where certain sections of code or database operations are <strong>critical</strong> and must be executed without interference from other transactions, pessimistic locking can be beneficial. This ensures that only one transaction at a time can access or modify the protected resource.</p>

<p>When maintaining <strong>data integrity is a top priority</strong>, pessimistic locking can be appropriate. For example, if an application enforces business rules that require consistency in data relationships, acquiring locks during transactions helps prevent concurrent modifications that could violate those rules.</p>

<p>In situations where transactions involve <strong>resource-intensive operations or complex calculations</strong>, pessimistic locking can be used to avoid conflicts and ensure that a transaction completes without interference from other transactions.
Also, when transactions involve <strong>multiple steps or span different parts of the application</strong>, pessimistic locking can be used to ensure that the entire transaction is executed atomically without interference from other transactions.</p>

<p>Pessimistic locking is effective in <strong>preventing race conditions</strong>, where multiple transactions compete to read or modify the same data simultaneously. By acquiring locks, the system can control access and avoid conflicts.</p>

<p>In <strong>batch processing scenarios where large volumes of data are processed</strong>, pessimistic locking can help maintain order and prevent concurrent transactions from affecting each other. This is especially important when the order of processing is crucial.</p>

<p>In <strong>distributed systems with shared resources</strong>, pessimistic locking can be used to ensure that only one node at a time makes modifications to a shared resource.</p>

<h3 id="what-is-acid">What is ACID?</h3>

<p>Transactions should be designed and implemented according to ACID rules.</p>

<p>Atomicity (A) ensures that a transaction is treated as a single, indivisible unit of work. Either all the changes made within the transaction are committed to the database, or none of them are.
If any part of the transaction fails, the entire transaction is rolled back and the database returns to its previous state.</p>
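
<p>As a rough illustration of atomicity in MySQL, consider a transfer between two rows of a hypothetical <code class="language-plaintext highlighter-rouge">account</code> table (the table and values are assumptions):</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code>-- hypothetical table: account(id, balance)
START TRANSACTION;
UPDATE account SET balance = balance - 10 WHERE id = 1;
UPDATE account SET balance = balance + 10 WHERE id = 2;
-- both updates are made permanent together...
COMMIT;
-- ...or, if something goes wrong before COMMIT, neither of them is kept:
-- ROLLBACK;
</code></pre></div></div>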

<p>Consistency (C) ensures that a transaction brings the database from one valid state to another.
The database must satisfy certain integrity constraints before and after the transaction.
If a transaction violates any integrity constraints, the database is left unchanged.</p>

<p>Isolation (I) ensures that the execution of one transaction is isolated from the execution of other transactions, even if they are executed concurrently.</p>

<p>Durability (D) guarantees that once a transaction is committed, its effects are permanent and survive subsequent system failures.
The changes made by the transaction are stored in non-volatile storage (such as disk) and can be recovered even if the system crashes or restarts.</p>]]></content><author><name>Mateusz Zacki</name><email>zacki[dot]mateusz[at]gmail[dot]com</email></author><category term="SQL, database, data, security" /><category term="SQL, database, data, persistency, security" /><summary type="html"><![CDATA[Query execution plan, profiler, flame graph, transactions, locking, ACID.]]></summary></entry><entry><title type="html">SQL cheatsheet: part 7</title><link href="https://mzacki.github.io/sql-cheatsheet-7/" rel="alternate" type="text/html" title="SQL cheatsheet: part 7" /><published>2023-08-14T20:23:00+00:00</published><updated>2023-08-14T20:23:00+00:00</updated><id>https://mzacki.github.io/sql-cheatsheet-7</id><content type="html" xml:base="https://mzacki.github.io/sql-cheatsheet-7/"><![CDATA[<p>Previously on SQL: <a href="/sql-cheatsheet-6">Advanced SQL for Java developers: procedure, view</a></p>

<h3 id="what-is-sql-coursor">What is SQL cursor?</h3>

<p>It allows you to retrieve and manipulate rows from a result set one at a time. Mainly used for iteration. Rows under the cursor can be transformed (e.g. updated, deleted).
Other purposes: pagination, data validation.</p>

<h3 id="coursor-example-of-use">Cursor: example of use</h3>

<p>The provided code is a SQL script that creates a stored procedure in a database.
This stored procedure uses a cursor to iterate through records in a table and display the values of the columns in each row as it traverses them.</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">-- db cursor is a kind of a pointer similar to Java iterator</span>
<span class="c1">-- implemented as stored procedure</span>
<span class="c1">-- cursor iterates through records one by one</span>
<span class="k">DELIMITER</span> <span class="err">$$</span>
<span class="k">CREATE</span> <span class="k">PROCEDURE</span> <span class="n">pointer</span><span class="p">()</span>
<span class="k">BEGIN</span>
    <span class="c1">-- variables holding values of columns in row that are currently being traversed by cursor</span>
    <span class="k">DECLARE</span> <span class="n">cursor_company_id</span> <span class="nb">INT</span><span class="p">;</span>
    <span class="k">DECLARE</span> <span class="n">cursor_company_name</span> <span class="nb">VARCHAR</span><span class="p">(</span><span class="mi">255</span><span class="p">);</span>
    <span class="k">DECLARE</span> <span class="n">cursor_country_code</span> <span class="nb">VARCHAR</span><span class="p">(</span><span class="mi">4</span><span class="p">);</span>
    <span class="c1">-- boolean variable (false / true flag) showing if cursor iteration is finished</span>
    <span class="k">DECLARE</span> <span class="n">iteration_completed</span> <span class="nb">BIT</span> <span class="k">DEFAULT</span> <span class="mi">0</span><span class="p">;</span>
    <span class="c1">-- cursor declaration</span>
    <span class="k">DECLARE</span> <span class="n">company_cursor</span> <span class="k">CURSOR</span> <span class="k">FOR</span>
    <span class="k">SELECT</span> <span class="n">company_id</span><span class="p">,</span> <span class="n">name</span><span class="p">,</span> <span class="n">hq_country</span> <span class="k">FROM</span> <span class="n">company</span><span class="p">;</span>
    <span class="c1">-- handler of continue type launched when not found occurs</span>
    <span class="c1">-- not found means no more rows to iterate</span>
    <span class="c1">-- in case of not found flag is raised</span>
    <span class="k">DECLARE</span> <span class="k">CONTINUE</span> <span class="k">HANDLER</span> <span class="k">FOR</span> <span class="k">NOT</span> <span class="k">FOUND</span>
    <span class="k">SET</span> <span class="n">iteration_completed</span> <span class="o">=</span> <span class="mi">1</span><span class="p">;</span>
    <span class="c1">-- opening cursor and fetching first row, procedure starts</span>
    <span class="c1">-- first row is mapped to declared variables</span>
    <span class="k">OPEN</span> <span class="n">company_cursor</span><span class="p">;</span>
    <span class="k">FETCH</span> <span class="n">company_cursor</span> <span class="k">INTO</span> <span class="n">cursor_company_id</span><span class="p">,</span> <span class="n">cursor_company_name</span><span class="p">,</span> <span class="n">cursor_country_code</span><span class="p">;</span>
    <span class="c1">-- 'WHILE' loop (do as long as no empty rows)</span>
    <span class="n">WHILE</span> <span class="n">iteration_completed</span> <span class="o">=</span> <span class="mi">0</span> <span class="k">DO</span>
    <span class="k">SELECT</span> <span class="n">cursor_company_id</span><span class="p">,</span> <span class="n">cursor_company_name</span><span class="p">,</span> <span class="n">cursor_country_code</span><span class="p">;</span> <span class="c1">-- displaying declared variables containing values currently being traversed by cursor</span>
    <span class="k">FETCH</span> <span class="n">company_cursor</span> <span class="k">INTO</span> <span class="n">cursor_company_id</span><span class="p">,</span> <span class="n">cursor_company_name</span><span class="p">,</span> <span class="n">cursor_country_code</span><span class="p">;</span> <span class="c1">-- map another row to declared variables, repeat the flow</span>
    <span class="k">END</span> <span class="n">WHILE</span><span class="p">;</span>
    <span class="c1">-- procedure ends, cursor closed</span>
    <span class="k">CLOSE</span> <span class="n">company_cursor</span><span class="p">;</span>
<span class="k">END</span><span class="err">$$</span>
<span class="k">DELIMITER</span> <span class="p">;</span>

</code></pre></div></div>
<p>Let’s go through it step by step.</p>

<p>DELIMITER $$: This statement changes the delimiter used in the SQL script to $$. It allows you to define the stored procedure using multiple SQL statements within the procedure, because the semicolons inside the body no longer end the CREATE PROCEDURE statement.</p>

<p>CREATE PROCEDURE pointer(): This line begins the definition of the pointer stored procedure. The procedure has no parameters.</p>

<p>BEGIN: This keyword marks the beginning of the procedure’s executable code block.</p>

<p>DECLARE statements: In this section, several local variables are declared for storing values from the rows as the cursor iterates through the result set. These variables include cursor_company_id, cursor_company_name, cursor_country_code, and iteration_completed.</p>

<ul>
  <li>cursor_company_id: it will hold the company_id value from the current row.</li>
  <li>cursor_company_name: it will hold the name value from the current row.</li>
  <li>cursor_country_code: it will hold the hq_country value from the current row.</li>
  <li>iteration_completed: this is a boolean variable used to indicate whether the cursor iteration is finished. It’s initialized to 0 (false).</li>
</ul>

<p>DECLARE company_cursor CURSOR FOR …: Here, a cursor named company_cursor is declared. The cursor is associated with a SELECT statement that retrieves data from the company table, specifically the company_id, name, and hq_country columns.</p>

<p>DECLARE CONTINUE HANDLER FOR NOT FOUND …: This line declares a handler for the NOT FOUND condition. It means that when the cursor reaches the end of the result set (no more rows to iterate), the iteration_completed variable will be set to 1, indicating that the cursor iteration is completed.</p>

<p>OPEN company_cursor;: This statement opens the cursor, allowing it to start iterating through the rows of the result set.</p>

<p>FETCH company_cursor INTO …: This line fetches the first row from the cursor’s result set and maps the values of the columns (company_id, name, and hq_country) to the corresponding declared variables.</p>

<p>WHILE iteration_completed = 0 DO … END WHILE;: This section of the code creates a WHILE loop. The loop will continue executing as long as the iteration_completed variable is 0. Inside the loop, it displays the values of the declared variables containing the current row’s data and then fetches the next row.</p>

<p>CLOSE company_cursor;: After the loop finishes, this statement closes the cursor to release resources.</p>

<p>END: This keyword marks the end of the stored procedure definition.</p>

<p>In summary, this stored procedure (pointer) iterates through the records of the company table using a cursor. 
It displays the values of each row’s columns as it traverses them. 
The loop continues until there are no more rows to fetch, at which point the cursor is closed, and the procedure ends.</p>

<h3 id="what-is-sql-coursor-for">What is SQL cursor for?</h3>

<p>In SQL, a cursor is a database object that allows you to retrieve and manipulate rows from a result set one at a time. 
Cursors are commonly used within stored procedures or other database objects to navigate through the records in a result set, perform operations on each record, and manage the flow of data processing.</p>

<p>Common use cases:</p>

<p>Iterating through records - cursors are used to loop through the rows returned by a query one by one, allowing you to perform actions on each row.</p>

<p>Processing and transforming data - cursors are helpful when you need to apply complex calculations, transformations, or business logic to individual rows within the result set.</p>

<p>Data validation and error handling - cursors can be used to validate data, perform data integrity checks, and handle exceptions or errors on a per-row basis.</p>

<p>Cursor-based pagination - cursors can be used for paginating large result sets. You fetch a certain number of rows at a time, improving performance and reducing memory consumption.</p>

<h3 id="various-kinds-of--sql-coursor">Various kinds of SQL cursor</h3>

<p>There are multiple types of cursor, depending on SQL flavour:</p>

<p>Forward-only. This type of cursor can only navigate forward through the result set, making it suitable for read-only operations.</p>

<p>Scrollable cursors can move both forward and backward within the result set, allowing you to revisit previous rows.</p>

<p>A static cursor populates the result set at the time of cursor creation and the query result is cached for the lifetime of the cursor. 
A static cursor can move both forward and backward. It is slower and uses more memory than other cursor types. 
Hence, you should use it only if scrolling is required and other types of cursors are not suitable.
No UPDATE, INSERT, or DELETE operations are reflected in a static cursor (unless the cursor is closed and reopened). 
By default, static cursors are scrollable. SQL Server static cursors are always read-only.</p>

<p>A dynamic cursor allows you to see updates, deletions and insertions made in the data source while the cursor is open. 
Hence, a dynamic cursor is sensitive to any changes to the data source and supports update and delete operations. By default, dynamic cursors are scrollable.</p>

<p>MySQL cursor is read-only, non-scrollable and asensitive.</p>

<ul>
  <li>Read-only: you cannot update data in the underlying table through the cursor.</li>
  <li>Non-scrollable: you can only fetch rows in the order determined by the SELECT statement. You cannot fetch rows in reverse order, skip rows, or jump to a specific row in the result set.</li>
  <li>Asensitive: a cursor can be either asensitive or insensitive. An asensitive cursor points to the actual data, whereas an insensitive cursor uses a temporary copy of the data. An asensitive cursor performs faster than an insensitive cursor because it does not have to make a temporary copy of data. However, any change made to the data from other connections will affect the data that is being used by an asensitive cursor, so it is safer not to update data while an asensitive cursor is using it.</li>
</ul>

<h3 id="function">Function</h3>

<p>In SQL, a function is a database object that allows you to encapsulate a set of SQL statements or expressions into a reusable and named unit. 
Functions take zero or more input parameters, perform specific operations or calculations, and return a single value as their result. SQL functions can be used in queries, data manipulation, and various SQL statements to simplify and modularize database operations. 
There are different types of SQL functions: scalar functions (return a single value), table-valued (return result set), aggregations, date-time functions and the like.
This is another story related to SQL server internals.</p>

<p>How can we write an SQL function that computes a logarithm:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">-- CREATE FUNCTION</span>
<span class="k">DELIMITER</span> <span class="err">$$</span>
<span class="k">CREATE</span> <span class="k">FUNCTION</span> <span class="n">logarithm</span><span class="p">(</span>
    <span class="n">base</span> <span class="nb">INT</span><span class="p">,</span>
    <span class="n">n</span> <span class="nb">INT</span>
    <span class="p">)</span>
    <span class="k">RETURNS</span> <span class="nb">INT</span> <span class="k">DETERMINISTIC</span> <span class="c1">-- always returns the same result for the same input parameters</span>
<span class="k">BEGIN</span>
<span class="c1">-- DECLARE local variable holding the base that is actually used</span>
<span class="k">DECLARE</span> <span class="n">a</span> <span class="nb">INT</span> <span class="k">DEFAULT</span> <span class="mi">2</span><span class="p">;</span>
<span class="n">IF</span> <span class="n">base</span> <span class="o">&gt;</span> <span class="mi">1</span>
    <span class="k">THEN</span> <span class="k">SET</span> <span class="n">a</span> <span class="o">=</span> <span class="n">base</span><span class="p">;</span> <span class="c1">-- log base cannot be 0 or 1, use default 2 in such case</span>
<span class="k">END</span> <span class="n">IF</span><span class="p">;</span>
<span class="n">IF</span> <span class="n">n</span> <span class="o">&lt;=</span> <span class="mi">0</span>
    <span class="k">THEN</span> <span class="k">RETURN</span> <span class="k">NULL</span><span class="p">;</span> <span class="c1">-- n must be &gt; 0, return null as log cannot be counted</span>
<span class="k">END</span> <span class="n">IF</span><span class="p">;</span>
<span class="k">RETURN</span> <span class="n">LOG</span><span class="p">(</span><span class="n">a</span><span class="p">,</span> <span class="n">n</span><span class="p">);</span>
<span class="k">END</span><span class="err">$$</span>

<span class="c1">-- HOW TO CALL</span>
<span class="k">SELECT</span> <span class="n">logarithm</span><span class="p">(</span><span class="mi">2</span><span class="p">,</span> <span class="mi">128</span><span class="p">)</span><span class="err">$$</span>
</code></pre></div></div>

<h3 id="function-vs-stored-procedure">Function vs stored procedure</h3>

<p>Function:</p>
<ul class="task-list">
  <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled="disabled" checked="checked" />database object, a set of SQL statements or expressions wrapped into a reusable and named single unit</li>
  <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled="disabled" checked="checked" />returns value(s) (single value - scalar function, result set - table-valued functions).</li>
  <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled="disabled" checked="checked" />they can be used as expressions in queries, such as SELECT, WHERE…</li>
  <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled="disabled" checked="checked" />scalar functions can be used to modify data, but they are typically designed for computations and transformations</li>
  <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled="disabled" checked="checked" />designed to be used as read-only, they cannot contain control statements like COMMIT or ROLLBACK</li>
</ul>

<p>Stored procedures:</p>
<ul class="task-list">
  <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled="disabled" checked="checked" />may or may not return values - it is optional</li>
  <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled="disabled" checked="checked" />called explicitly, using the EXECUTE or EXEC statement in SQL Server or the CALL statement in MySQL; cannot be used directly in a SELECT statement or a WHERE clause</li>
  <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled="disabled" checked="checked" />can include data modification statements (INSERT, UPDATE, DELETE) and transaction control statements (COMMIT, ROLLBACK)</li>
  <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled="disabled" checked="checked" />suitable for operations that involve both reading and writing data</li>
  <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled="disabled" checked="checked" />can contain transaction control statements, allowing for explicit control over transactions</li>
</ul>
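
<p>The difference in how the two are invoked can be shown with the objects defined earlier in this post. This is a sketch for MySQL, assuming the <code class="language-plaintext highlighter-rouge">pointer</code> procedure and the <code class="language-plaintext highlighter-rouge">logarithm</code> function have been created and the delimiter has been set back to <code class="language-plaintext highlighter-rouge">;</code>.</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code>-- a stored procedure is invoked as a statement of its own
CALL pointer();

-- a function is used as an expression inside a query
SELECT logarithm(2, 128);
</code></pre></div></div>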

<h3 id="triggers">Triggers</h3>

<p>A trigger is a set of instructions or a program that is automatically executed (“triggered”) in response to specific events on a particular table or view.
These events can include data manipulation language (DML) events like <code class="language-plaintext highlighter-rouge">INSERT</code>, <code class="language-plaintext highlighter-rouge">UPDATE</code>, <code class="language-plaintext highlighter-rouge">DELETE</code>, or data definition language (DDL) events like <code class="language-plaintext highlighter-rouge">CREATE</code>, <code class="language-plaintext highlighter-rouge">ALTER</code>, or <code class="language-plaintext highlighter-rouge">DROP</code>.
Triggers are often used to enforce business rules, maintain referential integrity, and automate certain tasks.</p>

<h3 id="dml-triggers">DML triggers</h3>

<p>These triggers respond to data manipulation language (DML) events, such as <code class="language-plaintext highlighter-rouge">INSERT</code>, <code class="language-plaintext highlighter-rouge">UPDATE</code>, and <code class="language-plaintext highlighter-rouge">DELETE</code> operations on a table.
Common use cases include enforcing data integrity rules, auditing changes, and automating specific actions based on data modifications.
Example of an <code class="language-plaintext highlighter-rouge">AFTER INSERT</code> trigger:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">CREATE</span> <span class="k">TRIGGER</span> <span class="n">AfterInsertTrigger</span>
<span class="k">AFTER</span> <span class="k">INSERT</span>
<span class="k">ON</span> <span class="n">Employees</span>
<span class="k">FOR</span> <span class="k">EACH</span> <span class="k">ROW</span>
<span class="k">BEGIN</span>
    <span class="c1">-- Trigger logic, e.g., update a related table, log the change, etc.</span>
<span class="k">END</span><span class="p">;</span>
</code></pre></div></div>
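
<p>To make the skeleton above more concrete, here is a sketch of an audit trigger in MySQL. The <code class="language-plaintext highlighter-rouge">EmployeesAudit</code> table, the column names and the trigger name are assumptions made for illustration only.</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code>-- hypothetical tables: Employees(id, name), EmployeesAudit(employee_id, action, changed_at)
DELIMITER $$
CREATE TRIGGER LogEmployeeInsert
AFTER INSERT
ON Employees
FOR EACH ROW
BEGIN
    -- NEW refers to the row that has just been inserted
    INSERT INTO EmployeesAudit (employee_id, action, changed_at)
    VALUES (NEW.id, 'INSERT', NOW());
END$$
DELIMITER ;
</code></pre></div></div>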

<h3 id="ddl-triggers">DDL triggers</h3>

<p>These triggers respond to data definition language (DDL) events, such as CREATE, ALTER, and DROP operations on a database or table.
Common use cases include restricting certain schema modifications, logging schema changes, or implementing specific actions when database objects are altered.
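Not every database supports DDL triggers: MySQL, for instance, does not, while Oracle and SQL Server do.
Example of a BEFORE CREATE trigger (Oracle-style syntax):</p>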

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">CREATE</span> <span class="k">TRIGGER</span> <span class="n">BeforeCreateTrigger</span>
<span class="k">BEFORE</span> <span class="k">CREATE</span>
<span class="k">ON</span> <span class="k">DATABASE</span>
<span class="k">BEGIN</span>
    <span class="c1">-- Trigger logic, e.g., check if the user has permission to create a table</span>
<span class="k">END</span><span class="p">;</span>
</code></pre></div></div>

<h3 id="what-does-the-trigger-consist-of">What does the trigger consist of?</h3>

<p>Each trigger consists of three elements:</p>

<ul class="task-list">
  <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled="disabled" checked="checked" />trigger event (when it should happen?) - specifies the event that causes the trigger to be executed (e.g., AFTER INSERT, BEFORE UPDATE)</li>
  <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled="disabled" checked="checked" />trigger condition (why it should happen?) - <strong>optionally</strong> specifies a condition that must be true for the trigger to execute</li>
  <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled="disabled" checked="checked" />trigger action (what should happen?) - contains SQL statements or procedures that are executed when the trigger is fired</li>
</ul>

<h3 id="how-triggers-are-used">How are triggers used?</h3>

<ul class="task-list">
  <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled="disabled" checked="checked" />to enforce business rules,</li>
  <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled="disabled" checked="checked" />to enforce / maintain referential integrity rules</li>
  <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled="disabled" checked="checked" />auditing &amp; logging schema changes</li>
  <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled="disabled" checked="checked" />automating data modifications</li>
  <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled="disabled" checked="checked" />restricting certain schema modifications to database objects</li>
</ul>

<h3 id="what-is-the-risk-of-using-triggers">What is the risk of using triggers?</h3>

<p>They introduce additional complexity and can impact performance!
Overuse of triggers can make database behavior less transparent and harder to manage.
Therefore, triggers are often employed for tasks that are best handled within the database layer (meta-level, database management), such as enforcing integrity constraints or automating certain actions, rather than for general application logic.</p>

<h3 id="what-is-sql-index">What is SQL index?</h3>

<p>A SQL index consists of a data structure that stores a sorted or hashed subset of the columns from a database table,
along with pointers to the corresponding rows, to facilitate efficient and quick data retrieval operations.</p>

<h3 id="index-more-insight">Index: more insight</h3>

<p>It is a separate set of data, built from the indexed field (column) and a pointer to the full record containing that field.
SQL indexes work by providing (theoretically) a faster way to retrieve data from a database table. 
Indexing creates a data structure that maps specific column values to their corresponding rows.
An index record is smaller than the full record, so it takes up less disk space, and it is sorted, which allows binary search, so it is faster to search through.
As an index record contains only the indexed field and a pointer to the original record, it stands to reason that it will be smaller than the multi-field record that it points to.
So the index itself requires fewer disk blocks than the original table, which therefore requires fewer block accesses to iterate through.</p>

<p>According to nice explanation in <a href="https://dev.mysql.com/doc/refman/8.0/en/mysql-indexes.html">MySQL manual</a>:</p>

<blockquote>
  <p>Indexes are used to find rows with specific column values quickly. 
Without an index, MySQL must begin with the first row and then read through the entire table to find the relevant rows. 
The larger the table, the more this costs. If the table has an index for the columns in question, 
MySQL can quickly determine the position to seek to in the middle of the data file without having to look at all the data. 
This is much faster than reading every row sequentially.</p>
</blockquote>

<p>This is similar to hashing in data structures, like Hash Map. In fact, some SQL indexing methods are using hashing.
Most MySQL indexes use B-Trees; some use R-Trees or hashes.</p>

<p>OK, but what are B-Trees? MySQL manual offers helpful <a href="https://dev.mysql.com/doc/refman/8.0/en/glossary.html#glos_b_tree">glossary of terms</a>
with the B-Tree concept explained. A B-Tree is a data structure, but it is not the same as a binary tree. A B-Tree node can have multiple children, while a binary tree node has at most two.</p>

<p>The index allows the database to avoid a full table scan (row by row), resulting in significantly faster query execution (in theory).
Instead of going through all the rows in the table, the database directly accesses the row(s) that match the condition.</p>

<p>Why am I writing “in theory”?</p>

<p>Indexes are especially beneficial for SELECT, WHERE, JOIN, and ORDER BY clauses, as they help the database engine quickly pinpoint the desired data. 
<strong>However, it’s important to note that indexes come with some trade-offs.</strong>
They consume storage space and can slightly slow down write operations (INSERT, UPDATE, DELETE) because the index must be updated when the data changes.</p>
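
<p>A short sketch of how this looks in MySQL, reusing the <code class="language-plaintext highlighter-rouge">company</code> table from the cursor example. The index name and the filter value are made up, and whether the optimizer actually picks the index depends on the data.</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code>-- create an index on the column used for filtering
CREATE INDEX idx_company_hq_country ON company (hq_country);

-- EXPLAIN shows whether the query uses the index or falls back to a full table scan
EXPLAIN SELECT company_id, name FROM company WHERE hq_country = 'PL';

-- an index that does not pay off can be dropped, which also removes its write overhead
DROP INDEX idx_company_hq_country ON company;
</code></pre></div></div>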

<p><a href="https://www.postgresql.org/docs/current/indexes.html">Postgres manual is also a great source of knowledge on indexes</a>.</p>

<h3 id="index---detailed-explanation">Index - detailed explanation</h3>

<p>Data stored on disk-based storage devices is organized into blocks, which serve as the fundamental unit of disk access. 
Each block is accessed as a whole, representing the smallest disk access (atomic) operation. 
The structure of disk blocks resembles that of linked lists, with each block consisting of a data section and a pointer indicating the location of the next node or block. 
Importantly, these blocks do not necessarily need to be stored consecutively on the disk.</p>

<p>Search operation on unsorted data is called linear search.</p>

<p>What is linear search and why does it require <code class="language-plaintext highlighter-rouge">(n+1)/2</code> accesses on average when searching an unordered list with n elements?</p>

<h5 id="searching-mechanism-of-linear-search">Searching mechanism of linear search</h5>

<p>In a linear search, you start searching from the beginning of the list and examine each element one by one until you find the target element or determine that it doesn’t exist in the list. You stop as soon as you find a match.</p>

<h5 id="best-case-scenario">Best-case scenario</h5>

<p>The best-case scenario is when the target element is found in the first position of the list. In this case, only one access is required.</p>

<h5 id="worst-case-scenario">Worst-case scenario</h5>

<p>The worst-case scenario is when the target element is located at the end of the list or is not present at all. In both cases, you must access all n elements: either the match is the last one examined, or you run out of elements and conclude that the target is not there.</p>

<h5 id="average-case-scenario">Average-case scenario</h5>

<p>To calculate the average number of accesses, you need to consider all possible positions of the target element in the list. If the target is equally likely to be at any of the n positions, finding it at position i costs i accesses, so the expected cost is (1 + 2 + … + n) / n = <code class="language-plaintext highlighter-rouge">(n+1)/2</code>. In other words, on average you would expect to find it somewhere in the middle of the list.</p>

<blockquote>
  <p>The formula (n+1)/2 represents the arithmetic mean or average of all possible access scenarios in a linear search. 
It provides a reasonable estimate of the expected number of accesses needed to find an element when the position of the target element is not known in advance.</p>
</blockquote>

<p>This follows directly from the way linear search works and is based on the concept of “expected number of accesses.” It is an expected value, not a guarantee for any single search.</p>

<p>In practice, the actual number of accesses in a specific search may vary, but the <code class="language-plaintext highlighter-rouge">(n+1)/2</code> formula provides a useful average estimation for the linear search algorithm’s performance.</p>

<h5 id="what-in-case-of-non-unique-fields">What about non-unique fields?</h5>

<p><code class="language-plaintext highlighter-rouge">(n+1)/2</code> is appropriate only if we search for a unique value (one that cannot be duplicated), so once it is found, there is no need to search any further.
If the searched field is a non-key field (i.e. it doesn’t contain unique entries), we must find all records that match the condition, 
so the entire table must be searched. This requires <code class="language-plaintext highlighter-rouge">n</code> block accesses.</p>

<h5 id="what-if-data-are-sorted">What if data are sorted?</h5>

<p>When the data is sorted on the searched field, you can employ a Binary Search algorithm to locate specific values. 
Binary Search is highly efficient and requires about <code class="language-plaintext highlighter-rouge">log2(n)</code> block accesses to find a particular value. Here, <code class="language-plaintext highlighter-rouge">n</code> represents the number of elements or records in the sorted field.
(In contrast, a linear search in an unsorted field might require <code class="language-plaintext highlighter-rouge">n</code> block accesses in the worst case, which is significantly less efficient.)</p>
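
<p>To get a feeling for the difference, here is a rough back-of-the-envelope calculation that any MySQL playground can run. The figure of 1,000,000 blocks is only an assumption chosen for illustration:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code>-- hypothetical data set spanning 1,000,000 blocks
SELECT (1000000 + 1) / 2   AS linear_search_avg_accesses, -- average for a unique value, unsorted data
       CEIL(LOG2(1000000)) AS binary_search_accesses;     -- upper estimate for sorted data
</code></pre></div></div>

<p>The first column comes out at roughly half a million accesses, the second at about twenty.</p>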

<h5 id="what-about-duplication-problem">What about the duplication problem?</h5>

<p>In a sorted field, once a value higher than the target value is found during the search, you can be confident that the target value doesn’t exist in the remaining portion of the field. 
This is because, in a sorted field, all values are ordered, and any duplicate values would be adjacent to each other. 
Therefore, you don’t need to continue searching for duplicate values once a higher value is encountered.</p>

<p>The combined effect of using Binary Search and the elimination of duplicate searches in a sorted field results in a substantial performance increase compared to an unsorted field. 
It allows for quicker and more efficient retrieval of data, especially when searching for specific values or performing range-based queries.</p>

<p>In summary, the key advantages of using a sorted field include the efficiency of Binary Search and the ability to eliminate duplicate searches, both of which significantly enhance query performance and data retrieval speed.</p>

<h5 id="advantages-of-using-indexes">Advantages of using indexes</h5>

<p>And now, indexing comes into play, offering some benefits:</p>

<ul class="task-list">
  <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled="disabled" checked="checked" />avoiding a full table scan (row by row), using trees and hashing in searching</li>
  <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled="disabled" checked="checked" />an index entry is just the indexed field plus a pointer to the record, so it holds less data than the original record</li>
  <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled="disabled" checked="checked" />speed up SELECT, WHERE, JOIN, and ORDER BY</li>
</ul>

<p>As this great <a href="https://stackoverflow.com/questions/1108/how-does-database-indexing-work">StackOverflow article</a> explains:</p>

<blockquote>
  <p>Indexing is a way of sorting a number of records on multiple fields. 
Creating an index on a field in a table creates another data structure which holds the field value, and a pointer to the record it relates to. 
This index structure is then sorted, allowing Binary Searches to be performed on it.</p>
</blockquote>

<p>To sum up, indexing takes advantage of the fact that the index data is sorted, which allows the use of search algorithms that are more efficient than a simple linear search.</p>
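
<p>In MySQL, this could look more or less like the sketch below. The <code class="language-plaintext highlighter-rouge">customer</code> table, its <code class="language-plaintext highlighter-rouge">last_name</code> column and the index name are assumptions made for the sake of the example:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code>-- assumption: a customer table with a last_name column exists
CREATE INDEX idx_customer_last_name ON customer (last_name);

-- list the indexes defined on the table
SHOW INDEX FROM customer;

-- ask the optimizer how it plans to run the query;
-- the "key" column of the output names the index it chose, if any
EXPLAIN SELECT * FROM customer WHERE last_name = 'Smith';
</code></pre></div></div>

<p>Whether the optimizer actually picks the index depends on the table size and data distribution, so it is worth checking the <code class="language-plaintext highlighter-rouge">EXPLAIN</code> output instead of assuming it.</p>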

<h5 id="are-there-any-drawbacks-of-indexes">Are there any drawbacks of indexes?</h5>

<p>Unfortunately, nothing comes for free. Here, I would like to quote StackOverflow again:</p>

<blockquote>
  <p>The downside to indexing is that these indices require additional space on the disk since the indices are stored together in a table using the MyISAM engine, 
this file can quickly reach the size limits of the underlying file system if many fields within the same table are indexed.</p>
</blockquote>

<p>In short:</p>

<ul class="task-list">
  <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled="disabled" checked="checked" />index takes additional storage space (it needs additional data structure that stores a sorted or hashed subset of the columns)</li>
  <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled="disabled" checked="checked" />index can slightly slow down write operations (INSERT, UPDATE, DELETE) because the index must be updated when the data changes</li>
</ul>
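
<p>The storage cost can be checked directly in MySQL. A minimal sketch, assuming a table called <code class="language-plaintext highlighter-rouge">customer</code> in the currently selected schema (the reported sizes are approximate):</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code>-- compare the space taken by the rows with the space taken by the indexes
SELECT TABLE_NAME,
       DATA_LENGTH  AS data_bytes,
       INDEX_LENGTH AS index_bytes
FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_SCHEMA = DATABASE()
  AND TABLE_NAME = 'customer';
</code></pre></div></div>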

<h3 id="when-to-use-indexes">When to use indexes:</h3>

<ul class="task-list">
  <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled="disabled" checked="checked" />high-cardinality columns (high uniqueness of data in a particular column)</li>
  <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled="disabled" checked="checked" />frequent searches</li>
  <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled="disabled" checked="checked" />large tables</li>
  <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled="disabled" checked="checked" />JOIN, GROUP BY, ORDER BY</li>
  <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled="disabled" checked="checked" />unique constraints (PRIMARY KEY, UNIQUE)</li>
</ul>

<h3 id="when-not-to-use-indexes">When not to use indexes:</h3>

<ul class="task-list">
  <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled="disabled" checked="checked" />small tables</li>
  <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled="disabled" checked="checked" />sequential data, increasing or decreasing, like timestamps: the benefits of indexing might be limited, as new values are continuously added at one end of the index</li>
  <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled="disabled" checked="checked" />frequent write operations</li>
  <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled="disabled" checked="checked" />low-cardinality columns</li>
  <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled="disabled" checked="checked" />temporary tables</li>
</ul>
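
<p>The cardinality criterion from the two lists above can be estimated with a simple query. This is only a sketch, and the <code class="language-plaintext highlighter-rouge">company</code> table with its <code class="language-plaintext highlighter-rouge">hq_country</code> column is an assumed example:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code>-- ratio close to 1: almost every value is unique (high cardinality)
-- ratio close to 0: few distinct values repeated many times (low cardinality)
SELECT COUNT(DISTINCT hq_country)            AS distinct_values,
       COUNT(*)                              AS total_rows,
       COUNT(DISTINCT hq_country) / COUNT(*) AS uniqueness_ratio
FROM company;
</code></pre></div></div>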

<h3 id="what-is-this-cardinality-after-all">What is this cardinality, after all?</h3>

<p>Cardinality means degree of uniqueness of data values contained in a particular column. 
High-cardinality refers to columns with values that are very uncommon or unique -
a good use case to apply indexes: e.g. user_id (which is unique).
Data of normal-cardinality would be: address, name, etc. 
And finally, examples of low-cardinality data are booleans, flags, Y/N switch, etc. -
do not use indexes on such columns!</p>]]></content><author><name>Mateusz Zacki</name><email>zacki[dot]mateusz[at]gmail[dot]com</email></author><category term="SQL, database, data" /><category term="SQL, database, data, persistency" /><summary type="html"><![CDATA[Advanced SQL for Java devs: cursor, function, trigger, index.]]></summary></entry><entry><title type="html">SQL cheatsheet: part 6</title><link href="https://mzacki.github.io/sql-cheatsheet-6/" rel="alternate" type="text/html" title="SQL cheatsheet: part 6" /><published>2023-07-20T04:23:00+00:00</published><updated>2023-07-20T04:23:00+00:00</updated><id>https://mzacki.github.io/sql-cheatsheet-6</id><content type="html" xml:base="https://mzacki.github.io/sql-cheatsheet-6/"><![CDATA[<p>Previously on SQL: <a href="/sql-cheatsheet-5">Medium SQL for Java developers: recapitulation</a></p>

<h3 id="procedure">Procedure</h3>

<p>A stored procedure is a named routine, made of one or more SQL statements, that is saved in the database. In other words, it works like a function that executes a pre-programmed query.
A procedure can accept arguments. Here is an example of a stored procedure that selects all columns from the table <code class="language-plaintext highlighter-rouge">company</code> according
to the country code argument passed to this procedure.</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">-- STORED PROCEDURE - SQL query saved directly in db as a function</span>
<span class="c1">-- such function accepts argument being used to execute query within stored procedure</span>
<span class="c1">-- procedure can be then called multiple times with different args</span>

<span class="c1">-- HOW TO CREATE</span>
<span class="c1">-- the new delimiter must follow DELIMITER on the same line</span>
<span class="k">DELIMITER</span> <span class="err">$$</span>
<span class="k">CREATE</span> <span class="k">PROCEDURE</span> <span class="n">get_company_by_country_code</span><span class="p">(</span>
    <span class="n">country_code</span> <span class="nb">VARCHAR</span><span class="p">(</span><span class="mi">4</span><span class="p">)</span>
<span class="p">)</span>
<span class="k">BEGIN</span>
    <span class="n">IF</span> <span class="n">country_code</span> <span class="k">IS</span> <span class="k">NULL</span> <span class="k">THEN</span>
        <span class="k">SELECT</span> <span class="s1">'Function argument is null'</span><span class="p">;</span>
    <span class="k">ELSE</span>
        <span class="k">SELECT</span> <span class="o">*</span>
        <span class="k">FROM</span> <span class="n">company</span>
        <span class="k">WHERE</span> <span class="n">company</span><span class="p">.</span><span class="n">hq_country</span> <span class="o">=</span> <span class="n">country_code</span><span class="p">;</span>
    <span class="k">END</span> <span class="n">IF</span><span class="p">;</span>
<span class="k">END</span> <span class="err">$$</span>

<span class="c1">-- HOW TO CALL</span>
<span class="k">CALL</span> <span class="n">get_company_by_country_code</span><span class="p">(</span><span class="s1">'JPN'</span><span class="p">)</span> <span class="c1">-- returns result</span>
<span class="err">$$</span>
<span class="k">CALL</span> <span class="n">get_company_by_country_code</span><span class="p">(</span><span class="k">null</span><span class="p">)</span> <span class="c1">-- returns hard-coded answer</span>
<span class="err">$$</span>
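<span class="k">CALL</span> <span class="n">get_company_by_country_code</span><span class="p">()</span> <span class="c1">-- returns error as arg is expected</span>
<span class="err">$$</span>

<span class="c1">-- restore the default delimiter</span>
<span class="k">DELIMITER</span> <span class="p">;</span>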
</code></pre></div></div>

<p>I asked ChatGPT to explain the procedure. Here is the detailed explanation:</p>

<blockquote>
  <p>The code sets the delimiter to $$. This is used to specify the end of the stored procedure definition since it contains semicolons (;) within its body.</p>
</blockquote>

<blockquote>
  <p>The CREATE PROCEDURE statement is used to define the stored procedure get_company_by_country_code with the country_code parameter.</p>
</blockquote>

<blockquote>
  <p>The BEGIN keyword indicates the start of the stored procedure’s body.</p>
</blockquote>

<blockquote>
  <p>The code checks if the country_code parameter is NULL using the IF statement.</p>
</blockquote>

<blockquote>
  <p>If the country_code is NULL, the code executes the SELECT ‘Function argument is null’; statement. This statement will return a single row with the string value ‘Function argument is null’.</p>
</blockquote>

<blockquote>
  <p>If the country_code is not NULL, the code executes the SELECT * FROM company WHERE company.hq_country = country_code; statement. This statement selects all columns (*) from the company table where the hq_country column matches the provided country_code parameter.</p>
</blockquote>

<blockquote>
  <p>The END IF; statement denotes the end of the IF block.</p>
</blockquote>

<blockquote>
  <p>The END $$ statement denotes the end of the stored procedure definition, using the previously set delimiter.</p>
</blockquote>

<blockquote>
  <p>Overall, this stored procedure retrieves company data based on the provided country_code. If the country_code parameter is NULL, it returns the string ‘Function argument is null’. Otherwise, it selects all columns from the company table where the hq_country column matches the provided country_code.</p>
</blockquote>

<p>The explanation created by ChatGPT contains rather obvious, self-explanatory statements, but it can be helpful, nevertheless.</p>

<h3 id="why-we-should-use-stored-procedures">Why should we use stored procedures?</h3>

<p><a href="https://docs.oracle.com/javase/tutorial/jdbc/basics/storedprocedures.html">The Oracle JDBC tutorial explains this issue.</a></p>

<blockquote>
  <p>JDBC means Java Database Connectivity.</p>

  <p>It’s a Java API (a.k.a. “abstraction layer”) designed to standardize and simplify the process of connecting Java software to relational database management systems (RDBMS).</p>
</blockquote>

<p>In short:</p>

<p>Stored procedures are database scripts (groups of statements) that are saved in the database, and in many systems precompiled, and that can be executed from a database client, such as a Java application, using JDBC.
Stored procedures can offer better performance, reduced network traffic, and improved security. They can encapsulate complex SQL logic and business rules.
JDBC provides the <strong>CallableStatement</strong> interface for executing stored procedures.</p>

<blockquote>
  <p>The CallableStatement interface allows the use of SQL statements to call stored procedures.</p>
</blockquote>

<p>You can prepare and execute stored procedures using <strong>CallableStatement</strong>. 
You can use the <code class="language-plaintext highlighter-rouge">execute()</code>, <code class="language-plaintext highlighter-rouge">executeQuery()</code>, and <code class="language-plaintext highlighter-rouge">executeUpdate()</code> methods to invoke stored procedures.
Procedures can have input parameters, output parameters, or parameters that are both input and output.
The tutorial also explains error handling for stored procedures, including handling exceptions using <code class="language-plaintext highlighter-rouge">SQLException</code>.</p>
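
<p>On the database side, input and output parameters could be declared as in the MySQL sketch below. The procedure name and the <code class="language-plaintext highlighter-rouge">@total</code> session variable are made up for this example; the <code class="language-plaintext highlighter-rouge">company</code> table and its <code class="language-plaintext highlighter-rouge">hq_country</code> column are the ones used in the procedure above:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code>DELIMITER $$
CREATE PROCEDURE count_companies_by_country(
    IN  country_code  VARCHAR(4), -- value passed in by the caller
    OUT company_count INT         -- value handed back to the caller
)
BEGIN
    SELECT COUNT(*) INTO company_count
    FROM company
    WHERE company.hq_country = country_code;
END $$
DELIMITER ;

-- the OUT parameter is received in a session variable
CALL count_companies_by_country('JPN', @total);
SELECT @total AS companies_found;
</code></pre></div></div>

<p>From Java, such an <code class="language-plaintext highlighter-rouge">OUT</code> parameter would be registered on the <strong>CallableStatement</strong> before the call and read afterwards.</p>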

<p>What about the advantages of stored procedures in Java applications, including the security context?</p>

<h5 id="easier-maintenance">Easier maintenance</h5>

<p><strong>1. Encapsulation of logic and operations</strong></p>

<p>Stored procedures allow you to encapsulate business logic and database operations into a single unit, looking like a simple script. 
This helps enforce <strong>data integrity rules and security constraints.</strong> 
By centralizing the implementation of data operations within the stored procedure, 
<strong>you can ensure that security checks, access controls, and validation rules are consistently applied</strong> across different parts of the application.</p>

<p><strong>2. Reusability of code</strong></p>

<p>Stored procedures can be called and executed from various parts of an application or by multiple users. 
This promotes code reuse and reduces redundancy, as the same logic can be executed without rewriting it.
Multiple Java applications or components can invoke the same stored procedure, 
reducing duplication of code and promoting consistency across different parts of your application.</p>

<p><strong>3. Transaction management</strong></p>

<p>Stored procedures can be used to define complex transactions that involve multiple SQL statements.
This allows for consistent and reliable data modifications, with the ability to roll back changes if necessary.
Storing SQL logic in stored procedures allows for centralized management and versioning of database operations.
Modifications to the SQL code can be made in the stored procedures without requiring changes to the Java codebase.</p>

<p>This separation of concerns makes it easier to track and manage changes, <strong>ensuring that security updates and fixes can be applied more efficiently</strong>.</p>
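
<p>The transactional side of this can be sketched in MySQL as follows. The procedure name and parameters are hypothetical, and the column layout of <code class="language-plaintext highlighter-rouge">branch_customers</code> (including <code class="language-plaintext highlighter-rouge">'9999-01-01'</code> marking the current row) is an assumption based on the view example further down:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code>DELIMITER $$
CREATE PROCEDURE move_customer_to_branch(
    IN p_customer_id   INT,
    IN p_new_branch_id INT
)
BEGIN
    -- on any SQL error: undo both statements and pass the error to the caller
    DECLARE EXIT HANDLER FOR SQLEXCEPTION
    BEGIN
        ROLLBACK;
        RESIGNAL;
    END;

    START TRANSACTION;

    -- close the current branch assignment
    UPDATE branch_customers
    SET to_date = CURRENT_DATE
    WHERE customer_id = p_customer_id
      AND to_date = '9999-01-01';

    -- open the new one
    INSERT INTO branch_customers (customer_id, branch_id, from_date, to_date)
    VALUES (p_customer_id, p_new_branch_id, CURRENT_DATE, '9999-01-01');

    COMMIT;
END $$
DELIMITER ;
</code></pre></div></div>

<p>Either both statements take effect or neither does, and the Java code only needs a single <code class="language-plaintext highlighter-rouge">CALL</code>.</p>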

<p><strong>4. Database decoupling</strong></p>

<p>A stored procedure can be a helpful tool when thinking about independence from a specific database vendor (for example, with a future data migration in mind).
Using stored procedures can help abstract the underlying database implementation from your Java code.
By relying on stored procedures, you can write Java code that only calls procedures by name and can work with different database systems without major modifications.
This can simplify switching to a different database platform in the future, although the procedure bodies themselves are written in a vendor-specific dialect and have to be ported.</p>

<h5 id="better-performance">Better performance</h5>

<p>Stored procedures can enhance performance by reducing network traffic.
They are typically compiled and optimized by the database server during creation or the first execution.
This can lead to faster execution times compared to dynamically generating SQL statements in Java code. 
By offloading data processing to the database server, you can reduce network latency and utilize the database’s query optimization capabilities.
Instead of sending multiple SQL statements over the network, 
a single call to the stored procedure is made, reducing the overhead of multiple round trips.</p>

<p><strong>Fewer round trips per request could make a difference if someone tried to DDoS your application.</strong></p>

<h5 id="increased-security">Increased security</h5>

<p><strong>1. Security enhanced by limited access</strong></p>

<p>Stored procedures allow you to grant permissions to execute the procedure without granting direct access to the underlying database tables. 
This means that users or applications can <strong>interact with the database only through the stored procedure</strong>, 
and they don’t have direct control over the underlying data. It provides <strong>a layer of abstraction and restricts unauthorized access to sensitive data</strong>.</p>
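
<p>In MySQL, such a setup might look like the sketch below. The schema name <code class="language-plaintext highlighter-rouge">shop</code>, the account name and the password are placeholders, and the example assumes the procedure was created with the default <code class="language-plaintext highlighter-rouge">SQL SECURITY DEFINER</code> setting, so it runs with the privileges of its creator:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code>-- hypothetical application account
CREATE USER 'report_app'@'localhost' IDENTIFIED BY 'change_me';

-- the account may call the procedure...
GRANT EXECUTE ON PROCEDURE shop.get_company_by_country_code TO 'report_app'@'localhost';

-- ...but receives no SELECT privilege on shop.company,
-- so querying the table directly is rejected
</code></pre></div></div>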

<p><strong>2. Safe parametrized queries</strong></p>

<p>Stored procedures typically use parameterized queries, 
where user input is passed as parameters rather than concatenated directly into SQL statements. 
This helps <strong>prevent SQL injection attacks</strong>, a common security vulnerability where malicious input is injected into SQL statements. 
As long as the procedure does not assemble dynamic SQL from its parameters, user input is treated as data rather than as SQL code, which reduces the risk of SQL injection attacks.</p>

<p><strong>3. Auditing and logging for security control</strong></p>

<p>Stored procedures provide a natural point for implementing auditing and logging mechanisms. 
You can log the execution of stored procedures, capturing details such as <strong>who executed the procedure, 
when it was executed, and what parameters were used</strong>. This can help with compliance requirements, troubleshooting, 
and identifying potential security breaches or suspicious activities.</p>
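
<p>A minimal sketch of such logging in MySQL is shown below. The audit table and the <code class="language-plaintext highlighter-rouge">_audited</code> procedure are invented for this example and are not part of the schema used elsewhere in this post:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code>-- hypothetical audit table
CREATE TABLE procedure_audit_log (
    id          INT AUTO_INCREMENT PRIMARY KEY,
    executed_by VARCHAR(300), -- who
    executed_at DATETIME,     -- when
    proc_name   VARCHAR(64),
    arguments   VARCHAR(255)  -- with what parameters
);

DELIMITER $$
CREATE PROCEDURE get_company_by_country_code_audited(
    IN country_code VARCHAR(4)
)
BEGIN
    -- USER() returns the account of the connected client,
    -- not the account that defined the procedure
    INSERT INTO procedure_audit_log (executed_by, executed_at, proc_name, arguments)
    VALUES (USER(), NOW(), 'get_company_by_country_code_audited', country_code);

    SELECT *
    FROM company
    WHERE company.hq_country = country_code;
END $$
DELIMITER ;
</code></pre></div></div>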

<h3 id="view">View</h3>

<p>A SQL view can be considered a virtual table that consolidates data from one or more tables. 
Unlike physical tables, a view doesn’t store data itself and exists only logically in the database, where its definition is saved.</p>

<p>Each view in a database must have a unique name, just like a regular SQL table. 
It is defined by a stored SQL query that retrieves data from the underlying database tables. 
A view can incorporate tables from a single database or multiple databases.</p>

<p>Example of SQL view:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">-- VIEW acts like reusable saved SELECT</span>
<span class="c1">-- is stored directly in db</span>

<span class="k">CREATE</span> <span class="k">VIEW</span> <span class="n">company_left_join_customer</span> <span class="k">AS</span>
<span class="k">SELECT</span> <span class="n">company</span><span class="p">.</span><span class="o">*</span><span class="p">,</span>
       <span class="k">c</span><span class="p">.</span><span class="n">customer_id</span> <span class="k">AS</span> <span class="n">customer_number</span><span class="p">,</span> <span class="c1">-- column names in a view must be unique, even when they come from different tables</span>
       <span class="k">c</span><span class="p">.</span><span class="n">first_name</span><span class="p">,</span>
       <span class="k">c</span><span class="p">.</span><span class="n">last_name</span><span class="p">,</span>
       <span class="k">c</span><span class="p">.</span><span class="n">registration_date</span>
<span class="k">FROM</span>
    <span class="n">company</span>
        <span class="k">LEFT</span> <span class="k">JOIN</span> <span class="n">customer</span> <span class="k">c</span> <span class="k">ON</span> <span class="n">company</span><span class="p">.</span><span class="n">customer_id</span> <span class="o">=</span> <span class="k">c</span><span class="p">.</span><span class="n">customer_id</span><span class="p">;</span>

<span class="c1">-- example of use</span>
<span class="k">SELECT</span> <span class="o">*</span> <span class="k">FROM</span> <span class="n">company_left_join_customer</span><span class="p">;</span> <span class="c1">-- the view is queried by its name, like a table</span>

<span class="c1">-- customer, current branch, current turnover</span>
<span class="k">CREATE</span> <span class="k">VIEW</span> <span class="n">customer_branch_turnover_current</span> <span class="k">AS</span>
<span class="k">SELECT</span>
    <span class="n">customer</span><span class="p">.</span><span class="n">customer_id</span><span class="p">,</span>
    <span class="n">customer</span><span class="p">.</span><span class="n">first_name</span><span class="p">,</span>
    <span class="n">customer</span><span class="p">.</span><span class="n">last_name</span><span class="p">,</span>
    <span class="n">customer</span><span class="p">.</span><span class="n">registration_date</span><span class="p">,</span>
    <span class="n">b</span><span class="p">.</span><span class="n">branch_name</span><span class="p">,</span>
    <span class="n">bc</span><span class="p">.</span><span class="n">from_date</span> <span class="k">AS</span> <span class="n">branch_since</span><span class="p">,</span>
    <span class="n">t</span><span class="p">.</span><span class="n">turnover</span><span class="p">,</span>
    <span class="n">t</span><span class="p">.</span><span class="n">from_date</span> <span class="k">AS</span> <span class="n">monthly_turnover_since</span>
<span class="k">FROM</span>
    <span class="n">customer</span>
        <span class="k">INNER</span> <span class="k">JOIN</span>
    <span class="n">branch_customers</span> <span class="n">bc</span>
    <span class="k">ON</span> <span class="n">customer</span><span class="p">.</span><span class="n">customer_id</span> <span class="o">=</span> <span class="n">bc</span><span class="p">.</span><span class="n">customer_id</span>
        <span class="k">INNER</span> <span class="k">JOIN</span>
    <span class="n">branch</span> <span class="n">b</span>
    <span class="k">ON</span> <span class="n">bc</span><span class="p">.</span><span class="n">branch_id</span> <span class="o">=</span> <span class="n">b</span><span class="p">.</span><span class="n">branch_id</span>
        <span class="k">INNER</span> <span class="k">JOIN</span>
    <span class="n">turnover</span> <span class="n">t</span>
    <span class="k">ON</span> <span class="n">customer</span><span class="p">.</span><span class="n">customer_id</span> <span class="o">=</span> <span class="n">t</span><span class="p">.</span><span class="n">customer_id</span>
<span class="k">WHERE</span>
        <span class="n">bc</span><span class="p">.</span><span class="n">to_date</span> <span class="o">=</span> <span class="s1">'9999-01-01'</span>
  <span class="k">AND</span>
        <span class="n">t</span><span class="p">.</span><span class="n">to_date</span> <span class="o">=</span> <span class="s1">'9999-01-01'</span><span class="p">;</span>
</code></pre></div></div>

<h3 id="view-versus-procedure">View versus procedure</h3>

<p>What’s the difference between procedure and view?</p>

<p>A SQL view is a virtual table, similar to the result of a <code class="language-plaintext highlighter-rouge">SELECT</code> query, optionally with a <code class="language-plaintext highlighter-rouge">JOIN</code>.</p>

<p><strong>No logic</strong>: A view contains no procedural logic, that is, neither conditional statements nor loops.</p>

<p><strong>No parameters</strong>: Views don’t accept parameters.</p>

<p><strong>No storage</strong>: A view doesn’t store data itself but provides a way to present data from one or more underlying tables or other views.</p>

<p>Views are primarily used for <strong>data retrieval and presentation</strong>.</p>

<p><strong>Read-only:</strong> Views are mostly used as <strong>read-only</strong> objects (some databases, MySQL included, allow updates through simple views) and can be used to simplify complex queries, filter data, 
and provide a consistent interface for users or applications.</p>

<p>In terms of security, <strong>views can be used to restrict access</strong> to specific columns or rows in a table, 
but they don’t provide the same level of security and control as stored procedures.</p>
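
<p>Restricting access with a view could look like the sketch below. The view name, the schema name <code class="language-plaintext highlighter-rouge">shop</code> and the <code class="language-plaintext highlighter-rouge">report_app</code> account are assumptions, and the account is assumed to exist already:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code>-- expose only selected columns of the customer table
CREATE VIEW customer_names AS
SELECT customer_id, first_name, last_name
FROM customer;

-- the account can read the view, but gets no privilege on the customer table itself
GRANT SELECT ON shop.customer_names TO 'report_app'@'localhost';
</code></pre></div></div>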

<p>In terms of performance, views <strong>may improve query performance</strong> in some databases (for example, when they are materialized), but an ordinary view is simply expanded into its underlying query each time it is used.</p>

<p>However, <strong>complex views may introduce performance issues</strong>.</p>

<p>On the other hand, a SQL procedure is like a pre-programmed query, often with custom parameters. 
Stored procedures support conditional logic, error handling, and the ability to return multiple result sets.
Stored procedures can be used for <strong>data manipulation</strong>, such as CRUD operations.
They can significantly improve security and performance.</p>

<h3 id="summary-updated-15112023">Summary (updated 15.11.2023)</h3>

<h3 id="what-is-stored-procedure-what-to-use-it-for">What is a stored procedure? What is it used for?</h3>

<p>Stored procedures are precompiled database scripts (group of statements) that can be executed from a database client, such as a Java application, using JDBC.
Stored procedures offer better performance, reduced network traffic, and improved security.
They can encapsulate complex SQL logic and business rules.
JDBC provides the CallableStatement interface for executing stored procedures.</p>

<h3 id="what-is-database-view">What is a database view?</h3>

<p>A SQL view can be considered a virtual table that consolidates data from one or more tables.
Contrary to physical tables, a view doesn’t store data itself and exists only logically in the database, where its definition is saved.
Unlike procedures, view doesn’t have logic (it is only for presentation). No params, no storage and it’s read-only.</p>]]></content><author><name>Mateusz Zacki</name><email>zacki[dot]mateusz[at]gmail[dot]com</email></author><category term="SQL, database, data" /><category term="SQL, database, data, persistency" /><summary type="html"><![CDATA[Advanced SQL for Java developers: procedure, view.]]></summary></entry><entry><title type="html">SQL cheatsheet: part 5</title><link href="https://mzacki.github.io/sql-cheatsheet-5/" rel="alternate" type="text/html" title="SQL cheatsheet: part 5" /><published>2023-06-20T04:23:00+00:00</published><updated>2023-06-20T04:23:00+00:00</updated><id>https://mzacki.github.io/sql-cheatsheet-5</id><content type="html" xml:base="https://mzacki.github.io/sql-cheatsheet-5/"><![CDATA[<p>Previously on SQL: <a href="/sql-cheatsheet-4">CRUD, n+1, migrations</a></p>

<h3 id="summary-sql-basics-for-java-devs">Summary: SQL basics for Java devs</h3>

<p>Practice your SQL skills, so that you don’t feel you need to start from scratch over and over again! 
This is especially true if you’re a beginner, or if you do not work with SQL very often (which is not uncommon).
SQL problems can be like a nemesis: I saw senior architects’ hair turning grey because of complex SQL issues affecting company performance.
They were stammering while trying to admit that there was a bug that no one was able to resolve easily.</p>

<p>You can practice SQL katas on the various programming websites that I’ve already mentioned earlier.</p>

<p>You can also use any of the SQL playgrounds (a.k.a. fiddles) available on the web, like <a href="https://www.db-fiddle.com/" target="_blank">db-fiddle</a>, to run and test some simple queries.
It is more like shadow-boxing: you try something, and when it fails, you need to counter your imaginary opponent.
For testing more advanced queries and their performance, a local database, a Docker database or a remote cloud database would be better, along with any SQL
client, like Workbench or IntelliJ.</p>

<p>Create a basic SQL schema - to have a sample table:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">CREATE</span> <span class="k">TABLE</span> <span class="n">test</span> <span class="p">(</span>
    <span class="n">id</span> <span class="nb">INT</span>
<span class="p">);</span>
<span class="k">INSERT</span> <span class="k">INTO</span> <span class="n">test</span> <span class="p">(</span><span class="n">id</span><span class="p">)</span> <span class="k">VALUES</span> <span class="p">(</span><span class="mi">1</span><span class="p">);</span>
<span class="k">INSERT</span> <span class="k">INTO</span> <span class="n">test</span> <span class="p">(</span><span class="n">id</span><span class="p">)</span> <span class="k">VALUES</span> <span class="p">(</span><span class="mi">2</span><span class="p">);</span>
</code></pre></div></div>

<p>Let’s check what this fiddle offers:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">SELECT</span> <span class="o">@@</span><span class="k">version</span><span class="p">;</span>
<span class="c1">-- 5.7.38</span>
</code></pre></div></div>

<p>and now sample queries to play with - note that this fiddle requires backticks, which weren’t needed in previous examples:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">ALTER</span> <span class="k">TABLE</span> <span class="nv">`test`</span> <span class="k">add</span> <span class="n">customer_id</span> <span class="nb">int</span><span class="p">;</span>
<span class="k">SELECT</span> <span class="o">*</span> <span class="k">FROM</span> <span class="n">INFORMATION_SCHEMA</span><span class="p">.</span><span class="n">TABLES</span> <span class="k">WHERE</span> <span class="nv">`TABLE_NAME`</span> <span class="o">=</span> <span class="s1">'test'</span><span class="p">;</span>
<span class="k">SELECT</span> <span class="nv">`COLUMN_NAME`</span> <span class="k">FROM</span> <span class="nv">`INFORMATION_SCHEMA`</span><span class="p">.</span><span class="nv">`COLUMNS`</span> <span class="k">WHERE</span> <span class="nv">`TABLE_NAME`</span> <span class="o">=</span> <span class="s1">'test'</span><span class="p">;</span>
<span class="k">SELECT</span> <span class="s1">'anything'</span><span class="p">;</span>
</code></pre></div></div>

<p>Remember that <code class="language-plaintext highlighter-rouge">SELECT</code> can evaluate Boolean expressions as well as arithmetic calculations:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">SELECT</span> <span class="mi">1</span> <span class="o">&lt;</span> <span class="mi">0</span> <span class="k">AS</span> <span class="n">boolean_value</span><span class="p">;</span>
<span class="k">SELECT</span> <span class="n">IF</span><span class="p">(</span><span class="mi">2</span> <span class="o">+</span> <span class="mi">2</span> <span class="o">=</span> <span class="mi">4</span><span class="p">,</span> <span class="s1">'TRUE'</span><span class="p">,</span> <span class="s1">'FALSE'</span><span class="p">)</span> <span class="k">AS</span> <span class="n">two_plus_two_is_four</span><span class="p">;</span>
</code></pre></div></div>
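
<p>As a small illustration of how such expressions evaluate, here is a sketch assuming MySQL (the engine this fiddle reports). Comparisons come back as <code class="language-plaintext highlighter-rouge">1</code>, <code class="language-plaintext highlighter-rouge">0</code> or <code class="language-plaintext highlighter-rouge">NULL</code>; the column aliases are made up for this example:</p>

```sql
-- Comparisons evaluate to 1 (true), 0 (false) or NULL (unknown) in MySQL.
SELECT (1 < 0)        AS is_less,       -- 0
       (2 + 2 = 4)    AS is_equal,      -- 1
       (NULL = NULL)  AS null_equals,   -- NULL: a comparison with NULL is never true
       (NULL IS NULL) AS null_is_null;  -- 1
```

<p>The third column is why a null check needs <code class="language-plaintext highlighter-rouge">IS NULL</code> (or a dedicated function) rather than the equal sign.</p>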

<p>The SQL playground I tested had no problems with the more advanced XOR gate example (see the first part of this SQL series):</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">SET</span> <span class="o">@</span><span class="n">false_xor</span> <span class="p">:</span><span class="o">=</span> <span class="s1">'gate returns false'</span><span class="p">;</span>
<span class="k">SET</span> <span class="o">@</span><span class="n">true_xor</span> <span class="p">:</span><span class="o">=</span> <span class="s1">'gate returns true'</span><span class="p">;</span>
<span class="k">SELECT</span>
    <span class="n">IF</span><span class="p">(</span><span class="mi">0</span> <span class="n">XOR</span> <span class="mi">0</span><span class="p">,</span> <span class="o">@</span><span class="n">true_xor</span><span class="p">,</span> <span class="o">@</span><span class="n">false_xor</span><span class="p">)</span> <span class="k">AS</span> <span class="s1">'0 XOR 0'</span><span class="p">,</span>
    <span class="n">IF</span><span class="p">(</span><span class="mi">0</span> <span class="n">XOR</span> <span class="mi">1</span><span class="p">,</span> <span class="o">@</span><span class="n">true_xor</span><span class="p">,</span> <span class="o">@</span><span class="n">false_xor</span><span class="p">)</span> <span class="k">AS</span> <span class="s1">'0 XOR 1'</span><span class="p">,</span>
    <span class="n">IF</span><span class="p">(</span><span class="mi">1</span> <span class="n">XOR</span> <span class="mi">0</span><span class="p">,</span> <span class="o">@</span><span class="n">true_xor</span><span class="p">,</span> <span class="o">@</span><span class="n">false_xor</span><span class="p">)</span> <span class="k">AS</span> <span class="s1">'1 XOR 0'</span><span class="p">,</span>
    <span class="n">IF</span><span class="p">(</span><span class="mi">1</span> <span class="n">XOR</span> <span class="mi">1</span><span class="p">,</span> <span class="o">@</span><span class="n">true_xor</span><span class="p">,</span> <span class="o">@</span><span class="n">false_xor</span><span class="p">)</span> <span class="k">AS</span> <span class="s1">'1 XOR 1'</span>
</code></pre></div></div>

<p>A basic <code class="language-plaintext highlighter-rouge">SELECT</code> can be wrapped in a <strong>null check</strong>: <code class="language-plaintext highlighter-rouge">IFNULL</code> returns the value of the field unless it is null, in which case it returns the given substitute:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">SELECT</span> <span class="n">IFNULL</span><span class="p">(</span><span class="n">customer_id</span><span class="p">,</span> <span class="s1">'it is null, though!'</span><span class="p">)</span> <span class="k">AS</span> <span class="s1">'null checked customer_id'</span> <span class="k">FROM</span> <span class="n">test</span> <span class="k">WHERE</span> <span class="n">id</span> <span class="o">=</span> <span class="mi">1</span>
</code></pre></div></div>

<div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code>---

**Query #1**

    SELECT IFNULL(customer_id, 'it is null, though!') AS 'null checked customer_id' FROM test WHERE id = 1;

| null checked customer_id |
| ------------------------ |
| it is null, though!      |

---

</code></pre></div></div>

<p>Let’s extend the table with some text column:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">-- create new column:</span>
<span class="k">ALTER</span> <span class="k">TABLE</span> <span class="nv">`test`</span> <span class="k">add</span> <span class="n">country</span> <span class="nb">varchar</span><span class="p">(</span><span class="mi">3</span><span class="p">);</span>
<span class="c1">-- and then add something there:</span>
<span class="k">INSERT</span> <span class="k">INTO</span> <span class="n">test</span> <span class="p">(</span><span class="n">id</span><span class="p">,</span> <span class="n">customer_id</span><span class="p">,</span> <span class="n">country</span><span class="p">)</span> <span class="k">VALUES</span> <span class="p">(</span><span class="mi">7</span><span class="p">,</span> <span class="k">null</span><span class="p">,</span> <span class="s1">'FIN'</span><span class="p">);</span>
<span class="k">INSERT</span> <span class="k">INTO</span> <span class="n">test</span> <span class="p">(</span><span class="n">id</span><span class="p">,</span> <span class="n">customer_id</span><span class="p">,</span> <span class="n">country</span><span class="p">)</span> <span class="k">VALUES</span> <span class="p">(</span><span class="mi">8</span><span class="p">,</span> <span class="k">null</span><span class="p">,</span> <span class="s1">'NOR'</span><span class="p">);</span>
</code></pre></div></div>

<p>or update existing records with new values:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">UPDATE</span> <span class="n">test</span> <span class="k">SET</span> <span class="n">country</span> <span class="o">=</span> <span class="s1">'FIN'</span> <span class="k">WHERE</span> <span class="n">id</span> <span class="o">=</span> <span class="mi">1</span><span class="p">;</span>
<span class="k">UPDATE</span> <span class="n">test</span> <span class="k">SET</span> <span class="n">country</span> <span class="o">=</span> <span class="s1">'SWE'</span> <span class="k">WHERE</span> <span class="n">id</span> <span class="o">=</span> <span class="mi">2</span><span class="p">;</span>
</code></pre></div></div>

<p>To check equality, use <strong>equal sign</strong>:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">SELECT</span> <span class="o">*</span> <span class="k">FROM</span> <span class="n">test</span> <span class="k">WHERE</span> <span class="n">country</span> <span class="o">=</span> <span class="s1">'FIN'</span>
</code></pre></div></div>

<p>For a loose comparison, use <code class="language-plaintext highlighter-rouge">LIKE</code> with <strong>the percent sign</strong> as a wildcard on either side (or both sides) of the search pattern:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">SELECT</span> <span class="o">*</span> <span class="k">FROM</span> <span class="n">test</span> <span class="k">WHERE</span> <span class="n">country</span> <span class="k">LIKE</span> <span class="s1">'%N%'</span><span class="p">;</span>
<span class="k">SELECT</span> <span class="o">*</span> <span class="k">FROM</span> <span class="n">test</span> <span class="k">WHERE</span> <span class="n">country</span> <span class="k">LIKE</span> <span class="s1">'%N'</span><span class="p">;</span>
<span class="k">SELECT</span> <span class="o">*</span> <span class="k">FROM</span> <span class="n">test</span> <span class="k">WHERE</span> <span class="n">country</span> <span class="k">LIKE</span> <span class="s1">'N%'</span><span class="p">;</span>
</code></pre></div></div>
<p>The percent sign matches any sequence of zero or more characters.
The underscore matches exactly one character:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">SELECT</span> <span class="o">*</span> <span class="k">FROM</span> <span class="n">test</span> <span class="k">WHERE</span> <span class="n">country</span> <span class="k">LIKE</span> <span class="s1">'__N'</span>
</code></pre></div></div>
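
<p>A quick way to check how the two wildcards behave is to apply <code class="language-plaintext highlighter-rouge">LIKE</code> to string literals, with no table involved. This is a minimal sketch assuming MySQL, where <code class="language-plaintext highlighter-rouge">LIKE</code> returns 1 for a match and 0 otherwise; the aliases are illustrative:</p>

```sql
-- Each expression returns 1 when the pattern matches and 0 when it does not.
SELECT 'FIN' LIKE 'FIN%' AS percent_matches_nothing,  -- 1: % may match zero characters
       'FIN' LIKE 'F%'   AS percent_matches_many,     -- 1: % matches 'IN'
       'FIN' LIKE 'FI_'  AS underscore_matches_one,   -- 1: _ matches 'N'
       'FIN' LIKE 'FIN_' AS underscore_needs_a_char;  -- 0: nothing is left for _ to match
```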

<p>Last but not least, the <code class="language-plaintext highlighter-rouge">COALESCE</code> function returns the first non-null value among the arguments listed in parentheses:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">SELECT</span> <span class="n">COALESCE</span><span class="p">(</span><span class="n">country</span><span class="p">,</span> <span class="s1">'Unknown'</span><span class="p">)</span> <span class="k">FROM</span> <span class="n">test</span><span class="p">;</span>
</code></pre></div></div>
<p>It is typically used to replace a null value with a substitute:</p>

<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code>FIN
Unknown
FIN
NOR
</code></pre></div></div>
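
<p>Unlike <code class="language-plaintext highlighter-rouge">IFNULL</code>, which takes exactly two arguments, <code class="language-plaintext highlighter-rouge">COALESCE</code> accepts a whole chain of fallbacks. The sketch below assumes the <code class="language-plaintext highlighter-rouge">test</code> table from above; the fallback order and the aliases are only an illustration:</p>

```sql
-- COALESCE accepts any number of arguments and returns the first non-null one.
SELECT COALESCE(NULL, NULL, 'third') AS first_non_null;  -- 'third'

-- Hypothetical fallback chain: prefer country, then customer_id, then a fixed label.
SELECT id, COALESCE(country, customer_id, 'Unknown') AS best_available
FROM test;
```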

<h3 id="aggregations">Aggregations</h3>

<p>Let’s create new schema with two tables to test aggregations:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">CREATE</span> <span class="k">TABLE</span> <span class="n">country</span>
<span class="p">(</span>
    <span class="n">id</span>   <span class="nb">INT</span><span class="p">,</span>
    <span class="n">code</span> <span class="nb">VARCHAR</span><span class="p">(</span><span class="mi">3</span><span class="p">)</span>
<span class="p">);</span>
<span class="k">INSERT</span> <span class="k">INTO</span> <span class="n">country</span> <span class="p">(</span><span class="n">id</span><span class="p">,</span> <span class="n">code</span><span class="p">)</span>
<span class="k">VALUES</span> <span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="s1">'SWE'</span><span class="p">),</span>
       <span class="p">(</span><span class="mi">2</span><span class="p">,</span> <span class="s1">'FIN'</span><span class="p">),</span>
       <span class="p">(</span><span class="mi">3</span><span class="p">,</span> <span class="s1">'NOR'</span><span class="p">),</span>
       <span class="p">(</span><span class="mi">4</span><span class="p">,</span> <span class="s1">'ISL'</span><span class="p">),</span>
       <span class="p">(</span><span class="mi">5</span><span class="p">,</span> <span class="s1">'DNK'</span><span class="p">);</span>


<span class="k">CREATE</span> <span class="k">TABLE</span> <span class="n">player</span>
<span class="p">(</span>
    <span class="n">id</span>    <span class="nb">INT</span><span class="p">,</span>
    <span class="n">name</span>  <span class="nb">VARCHAR</span><span class="p">(</span><span class="mi">5</span><span class="p">),</span>
    <span class="n">city</span>  <span class="nb">VARCHAR</span><span class="p">(</span><span class="mi">10</span><span class="p">),</span>
    <span class="n">games</span> <span class="nb">INT</span>
<span class="p">);</span>

<span class="k">INSERT</span> <span class="k">INTO</span> <span class="n">player</span> <span class="p">(</span><span class="n">id</span><span class="p">,</span> <span class="n">name</span><span class="p">,</span> <span class="n">city</span><span class="p">,</span> <span class="n">games</span><span class="p">)</span>
<span class="k">VALUES</span> <span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="s1">'Swen'</span><span class="p">,</span> <span class="s1">'Kiruna'</span><span class="p">,</span> <span class="mi">10</span><span class="p">),</span>
       <span class="p">(</span><span class="mi">2</span><span class="p">,</span> <span class="s1">'Antti'</span><span class="p">,</span> <span class="s1">'Kotka'</span><span class="p">,</span> <span class="mi">11</span><span class="p">),</span>
       <span class="p">(</span><span class="mi">3</span><span class="p">,</span> <span class="s1">'Marit'</span><span class="p">,</span> <span class="s1">'Bergen'</span><span class="p">,</span> <span class="mi">13</span><span class="p">),</span>
       <span class="p">(</span><span class="mi">4</span><span class="p">,</span> <span class="s1">'Katja'</span><span class="p">,</span> <span class="s1">'Keflavik'</span><span class="p">,</span> <span class="mi">4</span><span class="p">),</span>
       <span class="p">(</span><span class="mi">5</span><span class="p">,</span> <span class="s1">'Karin'</span><span class="p">,</span> <span class="s1">'Odense'</span><span class="p">,</span> <span class="mi">22</span><span class="p">);</span>
</code></pre></div></div>

<p>Here are the basic aggregate functions (<code class="language-plaintext highlighter-rouge">SUM</code> works the same way):</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">SELECT</span> <span class="k">COUNT</span><span class="p">(</span><span class="o">*</span><span class="p">)</span> <span class="k">AS</span> <span class="n">count_all_records</span><span class="p">,</span>
<span class="k">MAX</span><span class="p">(</span><span class="n">games</span><span class="p">),</span> 
<span class="k">MIN</span><span class="p">(</span><span class="n">games</span><span class="p">),</span> 
<span class="k">AVG</span><span class="p">(</span><span class="n">games</span><span class="p">)</span>
<span class="k">FROM</span> <span class="n">player</span>
</code></pre></div></div>
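
<p>One detail worth checking in a fiddle is how aggregates treat <code class="language-plaintext highlighter-rouge">NULL</code>. The sketch below assumes MySQL and builds a throwaway two-row derived table (the names <code class="language-plaintext highlighter-rouge">t</code> and <code class="language-plaintext highlighter-rouge">x</code> are made up), so it does not depend on the sample schema:</p>

```sql
-- A two-row derived table with one NULL, built inline so no extra table is needed.
SELECT COUNT(*) AS all_rows,       -- 2: counts rows
       COUNT(x) AS non_null_rows,  -- 1: skips the NULL
       SUM(x)   AS total,          -- 5
       AVG(x)   AS average         -- 5, not 2.5: the NULL row is not part of the average
FROM (SELECT 5 AS x UNION ALL SELECT NULL) AS t;
```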

<p>Counting occurrences of each name (grouping rows by name):</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">SELECT</span> <span class="n">name</span><span class="p">,</span> <span class="k">COUNT</span><span class="p">(</span><span class="o">*</span><span class="p">)</span> <span class="k">AS</span> <span class="n">occurrences</span>
<span class="k">FROM</span> <span class="n">player</span>
<span class="k">GROUP</span> <span class="k">BY</span> <span class="p">(</span><span class="n">name</span><span class="p">)</span>
</code></pre></div></div>

<p>Counting occurrences of names containing ‘K’:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">SELECT</span> <span class="n">name</span><span class="p">,</span> <span class="k">COUNT</span><span class="p">(</span><span class="n">name</span><span class="p">)</span> <span class="k">AS</span> <span class="n">occurrences</span>
<span class="k">FROM</span> <span class="n">player</span>
<span class="k">WHERE</span> <span class="n">name</span> <span class="k">LIKE</span> <span class="s1">'%K%'</span>
<span class="k">GROUP</span> <span class="k">BY</span> <span class="p">(</span><span class="n">name</span><span class="p">)</span>
</code></pre></div></div>

<p>Group by the number of games played and count how many players reached that number, keeping only the groups with fewer than 13 players:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">SELECT</span> <span class="n">games</span><span class="p">,</span> <span class="k">COUNT</span><span class="p">(</span><span class="n">games</span><span class="p">)</span> <span class="k">AS</span> <span class="n">players_with_this_qty_of_games</span>
<span class="k">FROM</span> <span class="n">player</span>
<span class="k">GROUP</span> <span class="k">BY</span> <span class="p">(</span><span class="n">games</span><span class="p">)</span>
<span class="k">HAVING</span> <span class="k">count</span><span class="p">(</span><span class="n">games</span><span class="p">)</span> <span class="o">&lt;</span> <span class="mi">13</span>
</code></pre></div></div>

<p><strong>Remember</strong>: <code class="language-plaintext highlighter-rouge">WHERE</code> filters individual rows before <code class="language-plaintext highlighter-rouge">GROUP BY</code>; <code class="language-plaintext highlighter-rouge">HAVING</code> filters whole groups after <code class="language-plaintext highlighter-rouge">GROUP BY</code>.</p>

<p><strong>Both can be used in the same query!</strong></p>
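
<p>A minimal sketch of both filters in one query, assuming the <code class="language-plaintext highlighter-rouge">player</code> table defined above. Grouping by the first letter of the name is just an illustrative choice, and the aliases are made up:</p>

```sql
SELECT LEFT(name, 1) AS first_letter,
       COUNT(*)      AS players,
       SUM(games)    AS total_games
FROM player
WHERE games >= 10          -- row filter: drops Katja (4 games) before grouping
GROUP BY LEFT(name, 1)
HAVING SUM(games) > 12     -- group filter: drops 'S' (10) and 'A' (11)
ORDER BY first_letter;
-- K | 1 | 22
-- M | 1 | 13
```

<p>Note that the ‘K’ group totals 22, not 26: Katja was removed by <code class="language-plaintext highlighter-rouge">WHERE</code> before the groups were formed.</p>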

<h3 id="joining">Joining</h3>

<p><code class="language-plaintext highlighter-rouge">UNION</code> combines the results of one query with the results of another and omits duplicated rows (a row that matches both queries is listed once).
The queries can read from the same table or from different tables, but both must return the same number of columns.</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">SELECT</span> <span class="o">*</span> <span class="k">FROM</span> <span class="n">player</span> <span class="k">WHERE</span> <span class="n">games</span> <span class="o">&gt;</span> <span class="mi">10</span>
<span class="k">UNION</span>
<span class="k">SELECT</span> <span class="o">*</span> <span class="k">FROM</span> <span class="n">player</span> <span class="k">WHERE</span> <span class="n">name</span> <span class="k">LIKE</span> <span class="s1">'%K%'</span>
</code></pre></div></div>

<p><code class="language-plaintext highlighter-rouge">UNION ALL</code> allows duplicates, so Karin would be listed twice.</p>
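
<p>To see the difference side by side, here is a sketch that runs both variants against the <code class="language-plaintext highlighter-rouge">player</code> table above, selecting only the name to keep the output short:</p>

```sql
-- UNION: Karin matches both conditions but appears once (4 rows).
SELECT name FROM player WHERE games > 10
UNION
SELECT name FROM player WHERE name LIKE '%K%';

-- UNION ALL: duplicates are kept, so Karin appears twice (5 rows).
SELECT name FROM player WHERE games > 10
UNION ALL
SELECT name FROM player WHERE name LIKE '%K%';
```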

<p>Let’s rename the id columns so that they indicate precisely which id they refer to.</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">CREATE</span> <span class="k">TABLE</span> <span class="n">country</span> <span class="p">(</span>
                         <span class="n">country_id</span> <span class="nb">INT</span><span class="p">,</span>
                         <span class="n">code</span> <span class="nb">VARCHAR</span><span class="p">(</span><span class="mi">3</span><span class="p">)</span>
<span class="p">);</span>
<span class="k">INSERT</span> <span class="k">INTO</span> <span class="n">country</span> <span class="p">(</span><span class="n">country_id</span><span class="p">,</span> <span class="n">code</span><span class="p">)</span> <span class="k">VALUES</span> <span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="s1">'SWE'</span><span class="p">),</span>
<span class="p">(</span><span class="mi">2</span><span class="p">,</span> <span class="s1">'FIN'</span><span class="p">),</span> <span class="p">(</span><span class="mi">3</span><span class="p">,</span> <span class="s1">'NOR'</span><span class="p">),</span> <span class="p">(</span><span class="mi">4</span><span class="p">,</span> <span class="s1">'ISL'</span><span class="p">),</span> <span class="p">(</span><span class="mi">5</span><span class="p">,</span> <span class="s1">'DNK'</span><span class="p">);</span>


<span class="k">CREATE</span> <span class="k">TABLE</span> <span class="n">player</span> <span class="p">(</span>
                        <span class="n">player_id</span> <span class="nb">INT</span><span class="p">,</span>
                        <span class="n">name</span> <span class="nb">VARCHAR</span><span class="p">(</span><span class="mi">5</span><span class="p">),</span>
                        <span class="n">city</span> <span class="nb">VARCHAR</span><span class="p">(</span><span class="mi">10</span><span class="p">),</span>
                        <span class="n">games</span> <span class="nb">INT</span>
<span class="p">);</span>
<span class="k">INSERT</span> <span class="k">INTO</span> <span class="n">player</span> <span class="p">(</span><span class="n">player_id</span><span class="p">,</span> <span class="n">name</span><span class="p">,</span> <span class="n">city</span><span class="p">,</span> <span class="n">games</span><span class="p">)</span> <span class="k">VALUES</span> <span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="s1">'Swen'</span><span class="p">,</span> <span class="s1">'Kiruna'</span><span class="p">,</span> <span class="mi">10</span><span class="p">),</span>
<span class="p">(</span><span class="mi">2</span><span class="p">,</span> <span class="s1">'Antti'</span><span class="p">,</span> <span class="s1">'Kotka'</span><span class="p">,</span> <span class="mi">11</span><span class="p">),(</span><span class="mi">3</span><span class="p">,</span> <span class="s1">'Marit'</span><span class="p">,</span> <span class="s1">'Bergen'</span><span class="p">,</span> <span class="mi">13</span><span class="p">),(</span><span class="mi">4</span><span class="p">,</span> <span class="s1">'Katja'</span><span class="p">,</span> <span class="s1">'Keflavik'</span><span class="p">,</span> <span class="mi">4</span><span class="p">),(</span><span class="mi">5</span><span class="p">,</span> <span class="s1">'Karin'</span><span class="p">,</span> <span class="s1">'Odense'</span><span class="p">,</span> <span class="mi">22</span><span class="p">);</span>
</code></pre></div></div>
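<p>Let’s update the schema by adding a <code class="language-plaintext highlighter-rouge">country_id</code> column to <code class="language-plaintext highlighter-rouge">player</code>, which will act as a foreign key pointing to <code class="language-plaintext highlighter-rouge">country</code>. A join only needs matching column values; declared key constraints are good practice but not strictly required.</p>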

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">ALTER</span> <span class="k">TABLE</span> <span class="n">player</span> <span class="k">ADD</span> <span class="n">country_id</span> <span class="nb">INT</span><span class="p">;</span>
<span class="c1">-- shortcut: in this sample data each player's country_id happens to equal the player_id, so copy it over</span>
<span class="k">UPDATE</span> <span class="n">player</span> <span class="k">SET</span> <span class="n">country_id</span> <span class="o">=</span> <span class="n">player_id</span><span class="p">;</span>
</code></pre></div></div>
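
<p>The statements above only add and fill a plain column. If you want the database to enforce the relationship, the keys can be declared explicitly. This is a sketch assuming MySQL with the default InnoDB engine; the constraint name <code class="language-plaintext highlighter-rouge">fk_player_country</code> is made up for this example:</p>

```sql
-- Declare the primary keys (MySQL makes these columns NOT NULL implicitly).
ALTER TABLE country ADD PRIMARY KEY (country_id);
ALTER TABLE player  ADD PRIMARY KEY (player_id);

-- Declare the foreign key: every player.country_id must exist in country.
ALTER TABLE player
    ADD CONSTRAINT fk_player_country
    FOREIGN KEY (country_id) REFERENCES country (country_id);
```

<p>With the constraint in place, inserting a player whose <code class="language-plaintext highlighter-rouge">country_id</code> does not exist in <code class="language-plaintext highlighter-rouge">country</code> is rejected.</p>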

<p><code class="language-plaintext highlighter-rouge">INNER JOIN</code> combines rows from two tables wherever the join condition holds, typically by matching a foreign key with a primary key. The inner join syntax looks like this:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">SELECT</span> <span class="o">*</span> <span class="k">FROM</span>
    <span class="n">customer</span>
<span class="k">INNER</span> <span class="k">JOIN</span>
    <span class="n">field</span> <span class="n">f</span> <span class="k">ON</span> <span class="n">customer</span><span class="p">.</span><span class="n">customer_id</span> <span class="o">=</span> <span class="n">f</span><span class="p">.</span><span class="n">customer_id</span>
</code></pre></div></div>

<p>Apply it to our schema:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">SELECT</span> <span class="o">*</span> <span class="k">FROM</span> <span class="n">player</span> <span class="k">INNER</span> <span class="k">JOIN</span> <span class="n">country</span> <span class="k">ON</span> <span class="n">player</span><span class="p">.</span><span class="n">country_id</span> <span class="o">=</span> <span class="n">country</span><span class="p">.</span><span class="n">country_id</span>
</code></pre></div></div>

<p>There is an alternative syntax with <code class="language-plaintext highlighter-rouge">USING</code>. It requires the join column to have the same name in both tables, so that SQL knows which columns connect them.
Here the foreign key column country_id in the <code class="language-plaintext highlighter-rouge">Player</code> table matches country_id in the <code class="language-plaintext highlighter-rouge">Country</code> table.</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">-- USING</span>
<span class="k">SELECT</span> <span class="o">*</span> <span class="k">FROM</span>
    <span class="n">company</span>
        <span class="k">INNER</span> <span class="k">JOIN</span>
    <span class="n">customer</span> <span class="k">USING</span><span class="p">(</span><span class="n">customer_id</span><span class="p">);</span>
</code></pre></div></div>

<p>With <code class="language-plaintext highlighter-rouge">Player</code> and <code class="language-plaintext highlighter-rouge">Country</code> tables:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">SELECT</span> <span class="o">*</span> <span class="k">FROM</span> <span class="n">player</span> <span class="k">INNER</span> <span class="k">JOIN</span> <span class="n">country</span> <span class="k">USING</span> <span class="p">(</span><span class="n">country_id</span><span class="p">)</span>
</code></pre></div></div>

<p><code class="language-plaintext highlighter-rouge">INNER JOIN</code> can be applied to more than two tables. You can also join using third, “helper” table.</p>
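
<p>A sketch of a join through a “helper” table, for a many-to-many relationship. The <code class="language-plaintext highlighter-rouge">team</code> and <code class="language-plaintext highlighter-rouge">player_team</code> tables, their columns and the sample rows are hypothetical and not part of the schema above; only <code class="language-plaintext highlighter-rouge">player</code> comes from this article:</p>

```sql
-- Hypothetical tables: a team list and a helper table linking players to teams.
CREATE TABLE team (
    team_id   INT,
    team_name VARCHAR(20)
);
CREATE TABLE player_team (
    player_id INT,
    team_id   INT
);
INSERT INTO team (team_id, team_name) VALUES (1, 'North'), (2, 'South');
INSERT INTO player_team (player_id, team_id) VALUES (1, 1), (2, 1), (2, 2);

-- Two INNER JOINs in a row: player -> player_team -> team.
SELECT p.name, t.team_name
FROM player p
INNER JOIN player_team pt ON pt.player_id = p.player_id
INNER JOIN team t         ON t.team_id = pt.team_id;
```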

<p><code class="language-plaintext highlighter-rouge">CROSS JOIN</code> produces a Cartesian product (each row of one table paired with each row of the other) if no <code class="language-plaintext highlighter-rouge">WHERE</code> is specified. With a <code class="language-plaintext highlighter-rouge">WHERE</code> condition, it behaves like an inner join:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">SELECT</span> <span class="o">*</span> <span class="k">FROM</span> <span class="n">player</span> <span class="k">CROSS</span> <span class="k">JOIN</span> <span class="n">country</span> <span class="k">WHERE</span> <span class="n">country</span><span class="p">.</span><span class="n">country_id</span> <span class="o">=</span> <span class="n">player</span><span class="p">.</span><span class="n">country_id</span>
</code></pre></div></div>
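
<p>Counting rows makes the Cartesian product easy to see. This sketch assumes the five-row <code class="language-plaintext highlighter-rouge">player</code> and <code class="language-plaintext highlighter-rouge">country</code> tables above, with <code class="language-plaintext highlighter-rouge">player.country_id</code> already filled in:</p>

```sql
-- No condition: every player is paired with every country.
SELECT COUNT(*) AS pairs
FROM player CROSS JOIN country;                        -- 25 = 5 players x 5 countries

-- With a condition: only the pairs that belong together remain.
SELECT COUNT(*) AS matched
FROM player CROSS JOIN country
WHERE player.country_id = country.country_id;          -- 5
```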

<h3 id="left-join-right-join-outer-join">LEFT JOIN, RIGHT JOIN, OUTER JOIN</h3>

<p>What is the difference between them?</p>

<p><strong>ChatGPT offers a concise summary:</strong></p>

<p><code class="language-plaintext highlighter-rouge">LEFT JOIN</code></p>
<ul>
  <li>Also known as a <code class="language-plaintext highlighter-rouge">LEFT OUTER JOIN</code>.</li>
  <li>Returns all the rows from the left table (the table mentioned before the <code class="language-plaintext highlighter-rouge">LEFT JOIN</code> clause) and the matching rows from the right table (the table mentioned after the <code class="language-plaintext highlighter-rouge">LEFT JOIN</code> clause).</li>
  <li>If there are no matching rows in the right table, <code class="language-plaintext highlighter-rouge">NULL</code> values are returned for the columns of the right table.</li>
  <li>This type of join ensures that all rows from the left table are included in the result, with the possibility of additional data from the right table if a match exists.</li>
</ul>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">SELECT</span> <span class="n">employees</span><span class="p">.</span><span class="n">name</span><span class="p">,</span> <span class="n">departments</span><span class="p">.</span><span class="n">department_name</span>
<span class="k">FROM</span> <span class="n">employees</span>
<span class="k">LEFT</span> <span class="k">JOIN</span> <span class="n">departments</span> <span class="k">ON</span> <span class="n">employees</span><span class="p">.</span><span class="n">department_id</span> <span class="o">=</span> <span class="n">departments</span><span class="p">.</span><span class="n">id</span><span class="p">;</span>
</code></pre></div></div>
<p><code class="language-plaintext highlighter-rouge">RIGHT JOIN</code></p>
<ul>
  <li>Also known as a <code class="language-plaintext highlighter-rouge">RIGHT OUTER JOIN</code>.</li>
  <li>Returns all the rows from the right table and the matching rows from the left table.</li>
  <li>If there are no matching rows in the left table, <code class="language-plaintext highlighter-rouge">NULL</code> values are returned for the columns of the left table.</li>
  <li>This join is less commonly used than the <code class="language-plaintext highlighter-rouge">LEFT JOIN</code> but has the same purpose, ensuring that all rows from the right table are included in the result.</li>
</ul>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">SELECT</span> <span class="n">employees</span><span class="p">.</span><span class="n">name</span><span class="p">,</span> <span class="n">departments</span><span class="p">.</span><span class="n">department_name</span>
<span class="k">FROM</span> <span class="n">employees</span>
<span class="k">RIGHT</span> <span class="k">JOIN</span> <span class="n">departments</span> <span class="k">ON</span> <span class="n">employees</span><span class="p">.</span><span class="n">department_id</span> <span class="o">=</span> <span class="n">departments</span><span class="p">.</span><span class="n">id</span><span class="p">;</span>
</code></pre></div></div>

<p><code class="language-plaintext highlighter-rouge">FULL OUTER JOIN</code> (<code class="language-plaintext highlighter-rouge">OUTER JOIN</code>)</p>
<ul>
  <li>Combines the result sets of both the left and right tables.</li>
  <li>Returns all the rows from both tables, matching rows where the join condition is met.</li>
  <li>Where a row has no match in the other table, <code class="language-plaintext highlighter-rouge">NULL</code> values are returned for the columns of that other table.</li>
  <li>The result includes all rows from both tables, ensuring that no data is excluded.</li>
</ul>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">SELECT</span> <span class="n">employees</span><span class="p">.</span><span class="n">name</span><span class="p">,</span> <span class="n">departments</span><span class="p">.</span><span class="n">department_name</span>
<span class="k">FROM</span> <span class="n">employees</span>
<span class="k">FULL</span> <span class="k">OUTER</span> <span class="k">JOIN</span> <span class="n">departments</span> <span class="k">ON</span> <span class="n">employees</span><span class="p">.</span><span class="n">department_id</span> <span class="o">=</span> <span class="n">departments</span><span class="p">.</span><span class="n">id</span><span class="p">;</span>
</code></pre></div></div>
<p>It’s important to note that <strong>not all database systems support <code class="language-plaintext highlighter-rouge">RIGHT JOIN</code> and <code class="language-plaintext highlighter-rouge">FULL OUTER JOIN</code> directly</strong>, 
and you may need to use alternative methods to achieve the same results in those cases, such as swapping the order of tables or using UNION clauses.</p>
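
<p>MySQL is one of the systems without <code class="language-plaintext highlighter-rouge">FULL OUTER JOIN</code>. A common workaround, sketched here with the same illustrative <code class="language-plaintext highlighter-rouge">employees</code> and <code class="language-plaintext highlighter-rouge">departments</code> tables as in the summary above, is to combine a left join and a right join with <code class="language-plaintext highlighter-rouge">UNION</code>:</p>

```sql
-- All employees, with their department where one exists...
SELECT e.name, d.department_name
FROM employees e
LEFT JOIN departments d ON e.department_id = d.id
UNION
-- ...plus all departments, including those without employees.
SELECT e.name, d.department_name
FROM employees e
RIGHT JOIN departments d ON e.department_id = d.id;
```

<p>Keep in mind that <code class="language-plaintext highlighter-rouge">UNION</code> removes duplicates, so two genuinely identical result rows (for example two employees with the same name in the same department) would collapse into one. Selecting a unique column such as an id avoids that.</p>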

<p>See the previous article on <code class="language-plaintext highlighter-rouge">JOIN</code>s: <a href="/sql-cheatsheet-3">union vs join, left join, right join, inner vs outer join</a></p>

<h3 id="update-15112023---other-questions">Update: 15.11.2023 - other questions</h3>

<h3 id="inner-join-vs-outer-join-whats-the-difference">Inner join vs outer join: what’s the difference?</h3>

<p>An inner join returns only those rows from both tables that satisfy the join condition (that is, rows that can be matched on the indicated field).
Rows that do not have matching values in the joined columns are excluded from the result set.</p>

<p>An outer join returns all the rows from one table (<code class="language-plaintext highlighter-rouge">LEFT</code> or <code class="language-plaintext highlighter-rouge">RIGHT JOIN</code>) or from both tables (<code class="language-plaintext highlighter-rouge">FULL OUTER JOIN</code>), together with the matching rows from the other table.
If there is no match, the result contains <code class="language-plaintext highlighter-rouge">NULL</code> values for the columns of the table that has no matching row.</p>
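
<p>The difference is easiest to see on a tiny dataset. The sketch below is only an illustration: it assumes SQLite through Python's standard <code class="language-plaintext highlighter-rouge">sqlite3</code> module, and the two tables and their rows are invented for the example.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE departments (id INTEGER PRIMARY KEY, department_name TEXT);
    CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, department_id INTEGER);
    INSERT INTO departments VALUES (10, 'Engineering'), (20, 'Marketing');
    INSERT INTO employees VALUES (1, 'Alice', 10), (2, 'Bob', NULL);
""")

inner_rows = conn.execute("""
    SELECT e.name, d.department_name
    FROM employees e
    INNER JOIN departments d ON e.department_id = d.id
    ORDER BY e.id
""").fetchall()

left_rows = conn.execute("""
    SELECT e.name, d.department_name
    FROM employees e
    LEFT OUTER JOIN departments d ON e.department_id = d.id
    ORDER BY e.id
""").fetchall()

print(inner_rows)  # Bob is dropped: he has no matching department
print(left_rows)   # Bob is kept, his department_name is None (SQL NULL)
```

<p>Marketing appears in neither result, because neither query asks to preserve unmatched rows of <code class="language-plaintext highlighter-rouge">departments</code>.</p>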

<h3 id="how-sql-group-by-command-works">How does the SQL <code class="language-plaintext highlighter-rouge">GROUP BY</code> clause work?</h3>

<p>The <code class="language-plaintext highlighter-rouge">GROUP BY</code> clause groups rows that have the same values in the specified columns into summary rows,
usually so that aggregate functions can be applied to each group:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">SELECT</span> <span class="n">security_branch</span><span class="p">,</span> <span class="k">COUNT</span><span class="p">(</span><span class="n">user_id</span><span class="p">)</span> <span class="k">as</span> <span class="n">user_count</span><span class="p">,</span> <span class="k">MAX</span><span class="p">(</span><span class="n">last_login_datetime</span><span class="p">)</span> <span class="k">as</span> <span class="n">latest_login</span>
<span class="k">FROM</span> <span class="n">cybersecurity_users</span>
<span class="k">GROUP</span> <span class="k">BY</span> <span class="n">security_branch</span><span class="p">;</span>
</code></pre></div></div>
<p>with result:</p>

<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code>+---------------------+------------+----------------------+
| security_branch     | user_count | latest_login         |
+---------------------+------------+----------------------+
| Threat Analysis     | 25         | 2023-11-15T08:30:00Z |
| Incident Response   | 18         | 2023-09-28T15:45:00Z |
| Penetration Testing | 12         | 2023-07-05T12:10:00Z |
| Security Operations | 30         | 2023-08-10T18:22:30Z |
| Compliance          | 15         | 2023-09-02T09:55:45Z |
+---------------------+------------+----------------------+
</code></pre></div></div>
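
<p>To try the grouping on a small scale, here is a hedged sketch that runs a query of the same shape against three invented rows. It assumes SQLite via Python's standard <code class="language-plaintext highlighter-rouge">sqlite3</code> module and timestamps stored as ISO-8601 text, so <code class="language-plaintext highlighter-rouge">MAX</code> compares them as strings.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cybersecurity_users (
        user_id INTEGER PRIMARY KEY,
        security_branch TEXT,
        last_login_datetime TEXT
    );
    INSERT INTO cybersecurity_users VALUES
        (1, 'Compliance', '2023-09-01T10:00:00Z'),
        (2, 'Compliance', '2023-09-02T09:55:45Z'),
        (3, 'Threat Analysis', '2023-11-15T08:30:00Z');
""")

# One output row per distinct security_branch value.
summary = conn.execute("""
    SELECT security_branch,
           COUNT(user_id) AS user_count,
           MAX(last_login_datetime) AS latest_login
    FROM cybersecurity_users
    GROUP BY security_branch
    ORDER BY security_branch
""").fetchall()

for row in summary:
    print(row)
```

<p>The two Compliance rows collapse into a single summary row with a count of 2 and the later of the two logins.</p>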

<h3 id="what-is-orm">What is ORM?</h3>

<p>ORM stands for Object-Relational Mapping. 
It is a programming technique that lets you work with a relational database through object-oriented code.
ORM consists of mirroring logical records (entities) from database tables as entities written in a programming language on the application side.</p>
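
<p>The mapping idea can be sketched by hand in a few lines. This is not how Hibernate or any real ORM is implemented; it is a simplified illustration in Python with the standard <code class="language-plaintext highlighter-rouge">sqlite3</code> module, and the <code class="language-plaintext highlighter-rouge">Company</code> class, the <code class="language-plaintext highlighter-rouge">find_by_country</code> helper and the table layout are all assumptions made for the example.</p>

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Company:
    """Application-side entity mirroring one row of the company table."""
    company_id: int
    name: str
    hq_country: str


def find_by_country(conn, country):
    """Run the SQL and turn every returned row into a Company object."""
    rows = conn.execute(
        "SELECT company_id, name, hq_country FROM company "
        "WHERE hq_country = ? ORDER BY company_id",
        (country,),
    ).fetchall()
    return [Company(*row) for row in rows]


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE company (company_id INTEGER PRIMARY KEY, name TEXT, hq_country TEXT);
    INSERT INTO company VALUES (1, 'Sunrise Ltd.', 'JPN'), (2, 'Sunset Co.', 'KOR');
""")

japanese = find_by_country(conn, "JPN")
print(japanese)
```

<p>The calling code works with <code class="language-plaintext highlighter-rouge">Company</code> objects only; the SQL stays hidden inside the helper, which is the part a real ORM generates for you.</p>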

<p>Key features:</p>
<ul class="task-list">
  <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled="disabled" checked="checked" />Mapping: ORM systems map database tables to classes: each row in a table corresponds to an instance of a class, and each column corresponds to an attribute or property of that class.</li>
  <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled="disabled" checked="checked" />Data abstraction: ORM abstracts away the details of database interactions, so you deal with objects and classes rather than with SQL queries.</li>
  <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled="disabled" checked="checked" />CRUD: ORM systems provide methods and APIs for performing CRUD (Create, Read, Update, Delete) operations on database entities.</li>
  <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled="disabled" checked="checked" />Relationships: ORM systems handle relationships between entities, such as one-to-one, one-to-many, and many-to-many relationships.</li>
  <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled="disabled" checked="checked" />Portability: ORM systems often provide a level of database portability, allowing developers to switch between different database management systems (e.g., MySQL, PostgreSQL, Oracle) with minimal code changes. The ORM system abstracts the differences in SQL syntax and handles them internally.</li>
  <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled="disabled" checked="checked" />Performance optimization: ORM systems may include features for optimizing database access, such as lazy loading (loading data on demand), caching, and query optimization.</li>
</ul>]]></content><author><name>Mateusz Zacki</name><email>zacki[dot]mateusz[at]gmail[dot]com</email></author><category term="SQL, database, data" /><category term="SQL, database, data, persistency" /><summary type="html"><![CDATA[Medium SQL for Java developers: recapitulation.]]></summary></entry><entry><title type="html">SQL cheatsheet: part 4</title><link href="https://mzacki.github.io/sql-cheatsheet-4/" rel="alternate" type="text/html" title="SQL cheatsheet: part 4" /><published>2023-05-18T05:23:00+00:00</published><updated>2023-05-18T05:23:00+00:00</updated><id>https://mzacki.github.io/sql-cheatsheet-4</id><content type="html" xml:base="https://mzacki.github.io/sql-cheatsheet-4/"><![CDATA[<p>Previously on SQL: <a href="/sql-cheatsheet-3">union vs join, left join, right join, inner vs outer join</a></p>

<h4 id="crud">CRUD</h4>

<p><strong>CRUD</strong> is an acronym that stands for Create, Read, Update, and Delete. 
It is a set of four basic operations that are commonly used in the context of database management systems and web development to manage data.</p>

<p>Here’s a breakdown of each operation:</p>

<blockquote>
  <p>Create (C): This operation involves creating new records or entities in a database. It typically involves inserting data into a database table or creating a new object in an object-oriented programming context.</p>
</blockquote>

<blockquote>
  <p>Read (R): This operation involves retrieving or reading existing data from a database. It allows you to query and fetch specific records or information from a database. Reading data could involve retrieving a single record, a subset of records, or all the records in a table.</p>
</blockquote>

<blockquote>
  <p>Update (U): This operation involves modifying or updating existing data in a database. It allows you to make changes to specific records or fields within a record. Updating data could involve modifying values, adding new information, or altering existing data.</p>
</blockquote>

<blockquote>
  <p>Delete (D): This operation involves removing or deleting existing data from a database. It allows you to delete specific records or objects from a database. Deleting data could involve removing a single record, a subset of records, or all the records in a table.</p>
</blockquote>

<p>These four operations form the fundamental building blocks for performing data manipulation within a database or application. They provide the basic functionality to create, retrieve, update, and delete data, enabling developers to perform various operations on data stored in a system.</p>

<p>CRUD applications are often (mistakenly) perceived by programmers as super-simple, even trivial, and certainly not impressive. 
In fact, many corporate-grade Java applications are CRUD-type or, at least, contain code that executes CRUD requests.
Such software must still be designed around the business and play an important role for the client. Otherwise, nobody would pay for a simple CRUD app.</p>

<p>More experienced engineers will tell you that even CRUD operations are not so simple to execute properly and require some
thought to be planned correctly. CRUD requests are commonly performed on complex,
internally related entities, and they involve a lot of records. That is when performance and cost-efficiency come into play.
Hasty and superficial use of simple SQL queries, very often offered out of the box by ORM frameworks (like Hibernate / JPA),
may lead to performance issues, such as the “n+1” problem.</p>

<h4 id="n1-problem">n+1 problem</h4>

<p>The “n+1 problem” is a term commonly used in the context of database querying, particularly in Object-Relational Mapping (ORM) frameworks. It refers to an issue that arises when retrieving data from a database with relationships between entities.</p>

<p>In an n+1 problem scenario, let’s say you have two entities with a one-to-many relationship, such as “Blog” and “Comment,” where each blog can have multiple comments. When you want to fetch a list of blogs and their associated comments, the n+1 problem occurs if the ORM framework generates n+1 queries to the database.</p>

<p>Here’s how it typically unfolds:</p>

<ol>
  <li>The initial query retrieves a list of blogs from the database.</li>
  <li>For each blog in the result set, the ORM framework executes an additional query to fetch the associated comments for that specific blog.</li>
  <li>This leads to n+1 queries, where n represents the number of blogs fetched in the initial query.</li>
</ol>

<p>The problem with the n+1 approach is that it incurs additional overhead and can result in significant performance issues, especially when dealing with large datasets. Each additional query introduces network latency and database overhead, causing the overall retrieval process to be slower and less efficient.</p>

<p>To mitigate the n+1 problem, ORM frameworks often provide ways to eager load or prefetch related data, allowing you to fetch the necessary information in a single query or a reduced number of queries. By doing so, you can avoid the performance pitfalls associated with the n+1 problem.</p>
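
<p>The effect can be reproduced without any ORM. The sketch below imitates lazy loading and eager loading by hand and counts the queries each style sends. It assumes SQLite via Python's standard <code class="language-plaintext highlighter-rouge">sqlite3</code> module; the <code class="language-plaintext highlighter-rouge">run</code> helper, the <code class="language-plaintext highlighter-rouge">stats</code> counter and the sample rows are hypothetical and exist only for this illustration.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE blog (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE comment (id INTEGER PRIMARY KEY, blog_id INTEGER, body TEXT);
    INSERT INTO blog VALUES (1, 'First'), (2, 'Second'), (3, 'Third');
    INSERT INTO comment VALUES (1, 1, 'Nice'), (2, 1, 'Thanks'), (3, 2, 'Hmm');
""")

stats = {"queries": 0}


def run(sql, params=()):
    """Execute one query and count the round trip to the database."""
    stats["queries"] += 1
    return conn.execute(sql, params).fetchall()


# Lazy style: 1 query for the blogs, then 1 more query per blog.
lazy = {}
for blog_id, title in run("SELECT id, title FROM blog ORDER BY id"):
    comments = run("SELECT body FROM comment WHERE blog_id = ? ORDER BY id", (blog_id,))
    lazy[title] = [body for (body,) in comments]
lazy_queries = stats["queries"]

# Eager style: a single LEFT JOIN fetches blogs and comments together.
stats["queries"] = 0
eager = {}
for title, body in run("""
    SELECT b.title, c.body
    FROM blog b
    LEFT JOIN comment c ON c.blog_id = b.id
    ORDER BY b.id, c.id
"""):
    eager.setdefault(title, [])
    if body is not None:
        eager[title].append(body)
eager_queries = stats["queries"]

print(lazy_queries, eager_queries)  # prints: 4 1
```

<p>Both styles build the same result, but with three blogs the lazy version already needs four round trips. With thousands of blogs the gap grows linearly.</p>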

<p>But beware: even such a solution may not fully resolve the issue. Sometimes, <strong>plain SQL</strong> is a better (but harder) way to deal with data.</p>

<blockquote>
  <p>Use ORM frameworks with moderation.</p>
</blockquote>

<p>It is very important to know SQL basics and be aware of more complex topics, to be able to predict possible traps.</p>

<h4 id="a-little-more-on-create">A little more on CREATE</h4>

<p>As part of this course, let’s briefly recap the fundamental <code class="language-plaintext highlighter-rouge">SELECT</code> queries, basic SQL syntax and operators.</p>

<p>In the first lesson, we created database tables. Obviously, <code class="language-plaintext highlighter-rouge">CREATE</code> operations correspond to the <strong>Create</strong> part of the <strong>CRUD</strong> acronym. 
At first glance, database and table creation may seem a complex and difficult task, but in reality it is easier than other queries.</p>

<ol>
  <li>Creation is usually done step by step: one table after another, over time, whenever a table is needed. See <strong>Flyway</strong> or <strong>Liquibase</strong> <strong>migrations</strong>. There is no need to write the whole script at once.</li>
  <li>Creation is a one-time act. An application may create or recreate the database structure during deployment, but usually this is neither repeated at runtime nor triggered on demand (e.g. via REST endpoints).</li>
  <li>If creation fails, usually nothing is lost: the creation / migration script gets fixed and is run again.</li>
</ol>

<h4 id="flyway-and-liquibase-what-are-database-migrations">Flyway and Liquibase: what are database migrations</h4>

<p>Database migration refers to the process of modifying the structure or schema of a database in a controlled and organized manner. 
It involves making changes to the database schema, such as adding or modifying tables, columns, constraints, or indexes, while ensuring that existing data is properly migrated or transformed to accommodate the new structure.</p>

<p>Database migrations are typically performed to introduce changes in an application’s data model, accommodate new features, fix issues, or improve performance. 
The process is crucial when working with evolving software systems that require continuous updates to the database schema.</p>

<p>Flyway and Liquibase are both popular database migration tools that help developers manage and version control database schema changes. 
They provide a systematic approach to perform and track database migrations, ensuring smooth and controlled updates to the database structure.</p>

<blockquote>
  <p>Flyway is an open-source database migration tool. It allows developers to define database changes using SQL scripts or Java-based migrations and tracks the execution of these scripts. Flyway maintains a metadata table in the database to keep track of which migrations have been applied. When running an application, Flyway automatically checks the metadata table and applies any pending migrations, keeping the database schema up to date. 
Flyway supports a wide range of databases and integrates well with various build tools and frameworks.</p>
</blockquote>

<blockquote>
  <p>Liquibase is another popular open-source database migration tool. It follows a similar approach as Flyway but offers additional features and flexibility. Liquibase allows developers to define database changes using XML, YAML, JSON, or SQL formats. It tracks migrations using a changelog file that specifies the sequence of changes to be applied. Liquibase supports various databases and provides features like rollback support, preconditions, and more advanced change types. 
It also offers integration with different build tools and frameworks.</p>
</blockquote>

<p>Flyway has a simpler and more lightweight design, focusing on simplicity and ease of use. 
It encourages convention over configuration and follows a strictly ordered migration approach.</p>

<p>Liquibase provides more flexibility and customization options. It supports a wider range of change types, offers advanced features like rollbacks, and allows more fine-grained control over migrations.
Flyway uses SQL-based migrations by default, whereas Liquibase supports multiple file formats for defining changes (XML, YAML, JSON, or SQL).
Both tools provide integrations with various build tools, frameworks, and Continuous Integration/Continuous Deployment (CI/CD) pipelines.</p>
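
<p>The core idea shared by both tools (apply each versioned change once, in order, and remember what has been applied) can be sketched in a few lines. This is a toy illustration, not Flyway's or Liquibase's actual implementation: it assumes SQLite via Python's standard <code class="language-plaintext highlighter-rouge">sqlite3</code> module, and the <code class="language-plaintext highlighter-rouge">schema_history</code> table, the <code class="language-plaintext highlighter-rouge">MIGRATIONS</code> list and the <code class="language-plaintext highlighter-rouge">migrate</code> function are hypothetical names.</p>

```python
import sqlite3

# Versioned migrations; in a real project these would live in separate files.
MIGRATIONS = [
    (1, "CREATE TABLE company (company_id INTEGER PRIMARY KEY, name TEXT)"),
    (2, "ALTER TABLE company ADD COLUMN hq_country TEXT"),
]


def migrate(conn):
    """Apply pending migrations in version order; return the versions applied."""
    conn.execute("CREATE TABLE IF NOT EXISTS schema_history (version INTEGER PRIMARY KEY)")
    applied = {row[0] for row in conn.execute("SELECT version FROM schema_history")}
    newly_applied = []
    for version, statement in sorted(MIGRATIONS):
        if version in applied:
            continue  # already recorded, so never run it twice
        conn.execute(statement)
        conn.execute("INSERT INTO schema_history (version) VALUES (?)", (version,))
        conn.commit()
        newly_applied.append(version)
    return newly_applied


conn = sqlite3.connect(":memory:")
first_run = migrate(conn)
second_run = migrate(conn)
print(first_run, second_run)  # prints: [1, 2] []
```

<p>Running the function a second time changes nothing, which is what makes it safe to call on every application start. Real tools add checksums, locking and rollback handling on top of this.</p>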

<h4 id="more-on-select">More on SELECT</h4>

<p><code class="language-plaintext highlighter-rouge">SELECT</code> statements perform the <strong>Read</strong> part of CRUD, so they are only relatively safe to execute: data won’t be modified, but there are still pitfalls.</p>

<p>Enough theory. Let’s recall some practical skills:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">-- select all columns matching both (AND) given conditions (note how operators were used for text and date values):</span>
<span class="k">SELECT</span> <span class="o">*</span> <span class="k">FROM</span> <span class="n">company</span> <span class="k">WHERE</span> <span class="n">hq_country</span><span class="o">=</span><span class="s1">'JPN'</span> <span class="k">AND</span> <span class="nv">`established_date`</span> <span class="o">&lt;</span> <span class="s1">'1987-06-26'</span><span class="p">;</span>
</code></pre></div></div>
<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">-- select given columns matching at least one (OR) of two conditions</span>
<span class="k">SELECT</span> <span class="n">name</span><span class="p">,</span> <span class="n">country</span> <span class="k">FROM</span> <span class="n">company</span> <span class="k">WHERE</span> <span class="n">hq_country</span><span class="o">=</span><span class="s1">'JPN'</span> <span class="k">OR</span> <span class="n">hq_country</span><span class="o">=</span><span class="s1">'KOR'</span><span class="p">;</span>
</code></pre></div></div>
<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">-- more flexible way of searching (pattern matching), limit the results</span>
<span class="k">SELECT</span> <span class="o">*</span> <span class="k">FROM</span> <span class="n">company</span> <span class="k">WHERE</span> <span class="n">name</span> <span class="k">LIKE</span> <span class="s1">'S%'</span> <span class="k">LIMIT</span> <span class="mi">2</span><span class="p">;</span>
</code></pre></div></div>

<p><code class="language-plaintext highlighter-rouge">GROUP BY</code> and <code class="language-plaintext highlighter-rouge">COUNT</code> are commonly used for getting some numerical values:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">-- group by (counts rows grouped by country)</span>
<span class="c1">-- name may be replaced by any column</span>
<span class="k">SELECT</span> <span class="k">COUNT</span><span class="p">(</span><span class="n">name</span><span class="p">),</span> <span class="n">hq_country</span> <span class="k">FROM</span> <span class="n">company</span> <span class="k">GROUP</span> <span class="k">BY</span> <span class="n">hq_country</span><span class="p">;</span>
</code></pre></div></div>

<p>Sort (order) the results. Ascending is the default ordering, <strong>so the <code class="language-plaintext highlighter-rouge">ASC</code> keyword is redundant here</strong>:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">-- order result</span>
<span class="k">SELECT</span> <span class="n">birth_date</span><span class="p">,</span> <span class="n">first_name</span><span class="p">,</span> <span class="n">last_name</span> <span class="k">FROM</span> <span class="n">customer</span> <span class="k">WHERE</span> <span class="n">first_name</span> <span class="k">LIKE</span> <span class="s1">'Fran%'</span> <span class="k">ORDER</span> <span class="k">BY</span> <span class="n">first_name</span><span class="p">,</span> <span class="n">last_name</span> <span class="k">ASC</span><span class="p">;</span> <span class="c1">-- ASC is redundant</span>
<span class="k">SELECT</span> <span class="n">birth_date</span><span class="p">,</span> <span class="n">first_name</span><span class="p">,</span> <span class="n">last_name</span> <span class="k">FROM</span> <span class="n">customer</span> <span class="k">WHERE</span> <span class="n">first_name</span> <span class="k">LIKE</span> <span class="s1">'Fran%'</span> <span class="k">ORDER</span> <span class="k">BY</span> <span class="nv">`birth_date`</span> <span class="k">DESC</span><span class="p">;</span>
</code></pre></div></div>

<p>Descending is not the default, so do not forget the <code class="language-plaintext highlighter-rouge">DESC</code> keyword.</p>

<p>We said that reading data is only a <strong>relatively</strong> safe operation, because data is not modified. The other side of the coin is that selecting data does not come for free: sometimes it heavily impacts the database,
which is doing all the hard work for us, especially when we write a complex, unoptimized query.</p>

<blockquote>
  <p>Generally, SQL and databases are designed and optimized for data handling, even when dealing with large amounts of data.
Example: it might not be the best idea to map 100K records to ORM entities, then to Data Transfer Objects or other Java objects, in order to perform
some operations on them through Java streams, like sort or filter.</p>

  <p>On the other hand, a database might not necessarily be optimized for a given use case. Not to mention that sometimes it is cheaper to fetch a bigger chunk of data
in one query, and then to process it programmatically, just to avoid the n+1 problem.</p>

  <p>Quid pro quo.</p>
</blockquote>

<h4 id="update">Update</h4>

<p>Once the database has been created and data inserted, the data can be updated (this is the Update part of CRUD).
Modifying data is doubly burdensome. First, the data to be updated must be selected beforehand, according to some criteria.
Here, as we said before, there might be some performance issues, no matter whether we want to make a single update (one time, “by hand”)
or a regular one, as part of the normal flow of the application.</p>

<p>Secondly, we are changing the data. We can lose some information or break the data integrity.</p>

<blockquote>
  <p>SELECT some data first. If SELECT works correctly, then you can think of an UPDATE.</p>
</blockquote>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">-- update record</span>
<span class="k">UPDATE</span> <span class="n">company</span> <span class="k">SET</span> <span class="n">name</span> <span class="o">=</span> <span class="s1">'Seoul 88'</span> <span class="k">WHERE</span> <span class="n">name</span> <span class="o">=</span> <span class="s1">'SEOUL_88'</span><span class="p">;</span>
</code></pre></div></div>
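
<p>The "SELECT first" advice can be turned into a small routine: preview the rows, run the <code class="language-plaintext highlighter-rouge">UPDATE</code> with the very same condition, and commit only if the number of changed rows equals the number previewed. The sketch assumes SQLite via Python's standard <code class="language-plaintext highlighter-rouge">sqlite3</code> module and an invented two-row <code class="language-plaintext highlighter-rouge">company</code> table; the row-count check is one possible safeguard, not a universal rule.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE company (company_id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO company VALUES (1, 'SEOUL_88'), (2, 'Sunrise Ltd.');
""")

# The condition is a constant written by the developer, never user input.
condition = "name = ?"
params = ("SEOUL_88",)

# Step 1: preview the rows that the UPDATE is going to touch.
preview = conn.execute(
    f"SELECT company_id, name FROM company WHERE {condition}", params
).fetchall()

# Step 2: run the UPDATE with the very same condition.
cursor = conn.execute(f"UPDATE company SET name = 'Seoul 88' WHERE {condition}", params)

# Step 3: keep the change only if it hit exactly as many rows as previewed.
if cursor.rowcount == len(preview):
    conn.commit()
else:
    conn.rollback()

names = [name for (name,) in conn.execute("SELECT name FROM company ORDER BY company_id")]
print(preview, names)
```

<p>Because the condition string is shared by both statements, the preview cannot drift away from what the update really does.</p>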

<p><code class="language-plaintext highlighter-rouge">UPDATE</code> with <code class="language-plaintext highlighter-rouge">JOIN</code>:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">-- update record copying column from joined table</span>
<span class="k">UPDATE</span> <span class="n">company</span>
    <span class="k">INNER</span> <span class="k">JOIN</span>
    <span class="n">customer</span>
<span class="k">ON</span> <span class="n">company</span><span class="p">.</span><span class="n">customer_id</span> <span class="o">=</span> <span class="n">customer</span><span class="p">.</span><span class="n">customer_id</span>
    <span class="k">SET</span> <span class="n">company</span><span class="p">.</span><span class="n">name</span> <span class="o">=</span> <span class="n">CONCAT</span><span class="p">(</span>
        <span class="n">company</span><span class="p">.</span><span class="n">name</span><span class="p">,</span>
        <span class="s1">'_'</span><span class="p">,</span>
        <span class="n">customer</span><span class="p">.</span><span class="n">first_name</span><span class="p">,</span>
        <span class="s1">'_'</span><span class="p">,</span>
        <span class="n">customer</span><span class="p">.</span><span class="n">last_name</span><span class="p">);</span>
</code></pre></div></div>

<p><code class="language-plaintext highlighter-rouge">REVERSE</code> function:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">-- REVERSE name</span>
<span class="k">UPDATE</span> <span class="n">company</span>
    <span class="k">INNER</span> <span class="k">JOIN</span>
    <span class="n">customer</span>
<span class="k">ON</span> <span class="n">company</span><span class="p">.</span><span class="n">customer_id</span> <span class="o">=</span> <span class="n">customer</span><span class="p">.</span><span class="n">customer_id</span>
<span class="k">SET</span> <span class="n">company</span><span class="p">.</span><span class="n">name</span> <span class="o">=</span> <span class="n">REVERSE</span><span class="p">(</span><span class="n">company</span><span class="p">.</span><span class="n">name</span><span class="p">);</span>
</code></pre></div></div>

<p>Subtract from or add to a date:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">-- SUB / ADD DATE</span>
<span class="k">UPDATE</span> <span class="n">company</span>
<span class="k">SET</span> <span class="n">established_date</span> <span class="o">=</span> <span class="n">DATE_SUB</span><span class="p">(</span><span class="n">established_date</span><span class="p">,</span> <span class="n">INTERVAL</span> <span class="mi">1</span> <span class="nb">YEAR</span><span class="p">)</span>
<span class="k">WHERE</span>
        <span class="n">established_date</span> <span class="o">&gt;</span> <span class="s1">'2020-01-01'</span>
</code></pre></div></div>

<p>More painstaking tricks:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">-- insert a space before the last three chars:</span>
<span class="c1">-- (e.g. Entity Ltd instead of EntityLtd)</span>
<span class="c1">-- remove the last three chars,</span>
<span class="c1">-- then append ' Ltd' (assumes the name ends with 'Ltd')</span>
<span class="k">UPDATE</span> <span class="n">company</span>
<span class="k">SET</span> <span class="n">name</span> <span class="o">=</span><span class="n">CONCAT</span><span class="p">(</span><span class="k">LEFT</span><span class="p">(</span><span class="n">name</span><span class="p">,</span> <span class="k">LENGTH</span><span class="p">(</span><span class="n">name</span><span class="p">)</span> <span class="o">-</span> <span class="mi">3</span><span class="p">),</span> <span class="s1">' Ltd'</span><span class="p">)</span>
<span class="k">WHERE</span>
        <span class="n">established_date</span> <span class="o">&gt;</span> <span class="s1">'2020-01-01'</span>
</code></pre></div></div>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">-- subtract 2 years from date in case of even year, odd id and given country</span>
<span class="c1">-- subtract 1 year in case of even year, even id and given country</span>
<span class="k">UPDATE</span> <span class="n">company</span>
<span class="k">SET</span> <span class="n">established_date</span> <span class="o">=</span> <span class="p">(</span>
        <span class="k">CASE</span>
            <span class="k">WHEN</span>
                <span class="k">EXTRACT</span><span class="p">(</span><span class="nb">YEAR</span> <span class="k">from</span> <span class="n">established_date</span><span class="p">)</span> <span class="o">%</span> <span class="mi">2</span> <span class="o">=</span> <span class="mi">0</span>
        <span class="k">AND</span>
        <span class="n">company_id</span> <span class="o">%</span> <span class="mi">2</span> <span class="o">!=</span> <span class="mi">0</span>
                <span class="k">AND</span>
                        <span class="n">hq_country</span> <span class="o">=</span> <span class="s1">'USA'</span>
            <span class="k">THEN</span> <span class="n">DATE_SUB</span><span class="p">(</span><span class="n">established_date</span><span class="p">,</span> <span class="n">INTERVAL</span> <span class="mi">2</span> <span class="nb">YEAR</span><span class="p">)</span>
        <span class="k">WHEN</span>
                        <span class="k">EXTRACT</span><span class="p">(</span><span class="nb">YEAR</span> <span class="k">from</span> <span class="n">established_date</span><span class="p">)</span> <span class="o">%</span> <span class="mi">2</span> <span class="o">=</span> <span class="mi">0</span>
                <span class="k">AND</span>
                        <span class="n">company_id</span> <span class="o">%</span> <span class="mi">2</span> <span class="o">=</span> <span class="mi">0</span>
                <span class="k">AND</span>
                        <span class="n">hq_country</span> <span class="o">=</span> <span class="s1">'USA'</span>
            <span class="k">THEN</span> <span class="n">DATE_SUB</span><span class="p">(</span><span class="n">established_date</span><span class="p">,</span> <span class="n">INTERVAL</span> <span class="mi">1</span> <span class="nb">YEAR</span><span class="p">)</span>
        <span class="k">ELSE</span> <span class="n">established_date</span>
        <span class="k">END</span>
    <span class="p">);</span>
</code></pre></div></div>
<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">-- funny thing, UPDATE date (if null, use current) by reversing year</span>
<span class="k">UPDATE</span> <span class="n">company</span>
<span class="k">SET</span> <span class="n">established_date</span> <span class="o">=</span> <span class="n">CONCAT</span><span class="p">(</span>
    <span class="c1">-- CASE should be added for a null check...</span>
    <span class="n">REVERSE</span><span class="p">(</span><span class="k">EXTRACT</span><span class="p">(</span><span class="nb">YEAR</span> <span class="k">from</span> <span class="n">CURDATE</span><span class="p">())),</span>
                              <span class="s1">'-'</span><span class="p">,</span>
                              <span class="k">EXTRACT</span><span class="p">(</span><span class="k">MONTH</span> <span class="k">from</span> <span class="n">CURDATE</span><span class="p">()),</span>
                              <span class="s1">'-'</span><span class="p">,</span>
                              <span class="k">EXTRACT</span><span class="p">(</span><span class="k">DAY</span> <span class="k">from</span> <span class="n">CURDATE</span><span class="p">()))</span>
<span class="k">WHERE</span>
        <span class="n">name</span> <span class="o">=</span> <span class="s1">'Ale Lipa'</span><span class="p">;</span>
</code></pre></div></div>
<h4 id="delete">Delete</h4>

<p>Finally, the last item of <strong>CRUD</strong>: data deletion. It is risky because of potential unwanted data loss.
<code class="language-plaintext highlighter-rouge">DELETE</code> is rarely executed “in real application life”. 
Also, an external customer or user of corporate-grade software hardly ever has an easy, overt way to trigger direct data deletion.
More often, it is a multistep process for security reasons. And there should be backups… but as all security experts know, sometimes there are no backups.</p>

<p>Here, we can use the same trick as with the update: first write the query with <code class="language-plaintext highlighter-rouge">SELECT</code> instead of <code class="language-plaintext highlighter-rouge">DELETE</code>. If it selects exactly what we wanted,
we can swap the keyword (<code class="language-plaintext highlighter-rouge">DELETE</code> instead of <code class="language-plaintext highlighter-rouge">SELECT</code>).</p>

<p>Simple SQL syntax for delete looks like:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">-- DELETE row duplicates (copies)</span>
<span class="c1">-- "_" is a single-character wildcard in LIKE, so it is escaped here</span>
<span class="k">DELETE</span> <span class="k">FROM</span>
    <span class="n">company</span>
<span class="k">WHERE</span>
    <span class="n">company</span><span class="p">.</span><span class="n">name</span> <span class="k">LIKE</span> <span class="s1">'%\_COPY'</span><span class="p">;</span>
</code></pre></div></div>
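
<p>Previewing with <code class="language-plaintext highlighter-rouge">SELECT</code> pays off with <code class="language-plaintext highlighter-rouge">LIKE</code> patterns in particular, because <code class="language-plaintext highlighter-rouge">_</code> is a wildcard that matches any single character. The sketch below shows how a preview exposes an unintended match before anything is deleted. It assumes SQLite via Python's standard <code class="language-plaintext highlighter-rouge">sqlite3</code> module, three invented company names, and <code class="language-plaintext highlighter-rouge">!</code> as an arbitrarily chosen escape character.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE company (company_id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO company VALUES (1, 'ACME_COPY'), (2, 'XEROCOPY'), (3, 'ACME');
""")

# In LIKE patterns "_" matches any single character, so this also hits XEROCOPY.
loose = conn.execute(
    "SELECT name FROM company WHERE name LIKE '%_COPY' ORDER BY company_id"
).fetchall()

# Escaping the underscore (here with "!") makes it a literal character.
strict_condition = "name LIKE '%!_COPY' ESCAPE '!'"
strict = conn.execute(
    f"SELECT name FROM company WHERE {strict_condition} ORDER BY company_id"
).fetchall()

# Only after the preview looks right is SELECT swapped for DELETE.
conn.execute(f"DELETE FROM company WHERE {strict_condition}")
conn.commit()
remaining = [name for (name,) in conn.execute("SELECT name FROM company ORDER BY company_id")]
print(loose, strict, remaining)
```

<p>Without the preview, the loose pattern would have removed XEROCOPY as well. The escape syntax differs between databases, so it is worth checking the manual of the one in use.</p>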

<p><code class="language-plaintext highlighter-rouge">DELETE</code> with <code class="language-plaintext highlighter-rouge">JOIN</code>:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">-- JOIN and DELETE</span>
<span class="c1">-- joining three tables, delete records from two (branch remains intact)</span>
<span class="k">DELETE</span> <span class="n">customer</span><span class="p">,</span> <span class="n">bc</span> <span class="k">FROM</span>
<span class="n">customer</span>
<span class="k">INNER</span> <span class="k">JOIN</span>
    <span class="n">branch_customers</span> <span class="n">bc</span>
<span class="k">ON</span>
    <span class="n">customer</span><span class="p">.</span><span class="n">customer_id</span> <span class="o">=</span> <span class="n">bc</span><span class="p">.</span><span class="n">customer_id</span>
<span class="k">INNER</span> <span class="k">JOIN</span>
    <span class="n">branch</span> <span class="n">b</span>
<span class="k">ON</span>
    <span class="n">bc</span><span class="p">.</span><span class="n">branch_id</span> <span class="o">=</span> <span class="n">b</span><span class="p">.</span><span class="n">branch_id</span>
<span class="k">WHERE</span>
    <span class="n">customer</span><span class="p">.</span><span class="n">last_name</span> <span class="k">LIKE</span> <span class="s1">'%smith%'</span>
<span class="k">AND</span>
    <span class="n">bc</span><span class="p">.</span><span class="n">to_date</span> <span class="o">=</span> <span class="s1">'9999-01-01'</span>
<span class="k">AND</span>
    <span class="n">customer</span><span class="p">.</span><span class="n">gender</span> <span class="o">=</span> <span class="s1">'M'</span>
</code></pre></div></div>

<p>Table removal:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">-- remove table</span>
<span class="k">DROP</span> <span class="k">TABLE</span> <span class="n">company</span><span class="p">;</span>
</code></pre></div></div>

<p>But do not do this in production or any other important environment unless you are explicitly told to, and even then double-check it with someone first.</p>
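
<p>A more cautious variant, sketched here under the assumption of a MySQL-like flavour (<code class="language-plaintext highlighter-rouge">company_backup</code> is a hypothetical table name): copy the data first, then drop the table:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code>-- keep a copy of the rows (indexes and constraints are not copied)
CREATE TABLE company_backup AS SELECT * FROM company;
-- IF EXISTS avoids an error when the table is already gone
DROP TABLE IF EXISTS company;
</code></pre></div></div>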

<p>TBC</p>]]></content><author><name>Mateusz Zacki</name><email>zacki[dot]mateusz[at]gmail[dot]com</email></author><category term="SQL, database, data" /><category term="SQL, database, data, persistency" /><summary type="html"><![CDATA[Medium SQL for Java developers: CRUD operations, n+1 problem, Flyway & Liquibase migrations.]]></summary></entry><entry><title type="html">SQL cheatsheet: part 3</title><link href="https://mzacki.github.io/sql-cheatsheet-3/" rel="alternate" type="text/html" title="SQL cheatsheet: part 3" /><published>2023-04-26T01:23:00+00:00</published><updated>2023-04-26T01:23:00+00:00</updated><id>https://mzacki.github.io/sql-cheatsheet-3</id><content type="html" xml:base="https://mzacki.github.io/sql-cheatsheet-3/"><![CDATA[<p>Previously on SQL: <a href="/sql-cheatsheet-2">aggregations, group by, where vs having</a></p>

<p>Today let’s talk about combining the results of different queries.</p>

<h4 id="union">Union</h4>

<p><code class="language-plaintext highlighter-rouge">UNION</code> merges the results of multiple queries into one result set. Here we select made-up literal rows (they do not exist in any table) and give their columns aliases:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">-- UNION merges multiple queries as one result</span>
<span class="k">SELECT</span>
    <span class="mi">1</span> <span class="k">AS</span> <span class="n">id</span><span class="p">,</span> <span class="s1">'Sunrise Ltd.'</span> <span class="k">AS</span> <span class="n">name</span>
<span class="k">UNION</span>
<span class="k">SELECT</span>
    <span class="mi">2</span> <span class="k">AS</span> <span class="n">id</span><span class="p">,</span> <span class="s1">'Sunset Co.'</span> <span class="k">AS</span> <span class="n">name</span><span class="p">;</span>
</code></pre></div></div>
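
<p>A small sketch of two related details: the column names of the result come from the first <code class="language-plaintext highlighter-rouge">SELECT</code>, and a trailing <code class="language-plaintext highlighter-rouge">ORDER BY</code> sorts the whole merged result, not only the last query:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code>-- aliases are needed only in the first SELECT
SELECT 1 AS id, 'Sunrise Ltd.' AS name
UNION
SELECT 2, 'Sunset Co.'
-- sorts the merged result
ORDER BY id DESC;
</code></pre></div></div>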

<p><code class="language-plaintext highlighter-rouge">UNION</code> acts like the <a href="https://en.wikipedia.org/wiki/Union_(set_theory)">union operator known from set theory, algebra of sets and Boolean algebra</a>.</p>

<p>Only distinct rows are included. Two rows are both kept only if they differ in at least one field:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">-- only distinct rows are included: prints only one record</span>
<span class="k">SELECT</span>
    <span class="mi">1</span> <span class="k">AS</span> <span class="n">id</span><span class="p">,</span> <span class="s1">'Sunrise Ltd.'</span> <span class="k">AS</span> <span class="n">name</span>
<span class="k">UNION</span>
<span class="k">SELECT</span>
    <span class="mi">1</span> <span class="k">AS</span> <span class="n">id</span><span class="p">,</span> <span class="s1">'Sunrise Ltd.'</span> <span class="k">AS</span> <span class="n">name</span><span class="p">;</span>

<span class="c1">-- selects both records:</span>
<span class="k">SELECT</span>
    <span class="mi">1</span> <span class="k">AS</span> <span class="n">id</span><span class="p">,</span> <span class="s1">'Sunrise Ltd.'</span> <span class="k">AS</span> <span class="n">name</span>
<span class="k">UNION</span>
<span class="k">SELECT</span>
    <span class="mi">1</span> <span class="k">AS</span> <span class="n">id</span><span class="p">,</span> <span class="s1">'Sunset Ltd.'</span> <span class="k">AS</span> <span class="n">name</span><span class="p">;</span>
</code></pre></div></div>

<p><code class="language-plaintext highlighter-rouge">UNION ALL</code> allows duplicated results:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">-- UNION ALL allows duplicated rows</span>
<span class="k">SELECT</span>
    <span class="o">*</span> <span class="k">FROM</span> <span class="n">company</span>
<span class="k">UNION</span> <span class="k">ALL</span>
<span class="k">SELECT</span>
    <span class="o">*</span> <span class="k">FROM</span> <span class="n">company</span><span class="p">;</span>
</code></pre></div></div>

<p>Now test it on real data: records matching the first condition (<code class="language-plaintext highlighter-rouge">WHERE hq_country = 'JPN'</code>) are not selected again by the second part of the query (<code class="language-plaintext highlighter-rouge">SELECT * FROM company</code>):</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">SELECT</span> <span class="o">*</span> <span class="k">FROM</span> <span class="n">company</span> <span class="k">WHERE</span> <span class="n">hq_country</span> <span class="o">=</span> <span class="s1">'JPN'</span>
<span class="k">UNION</span>
<span class="k">SELECT</span> <span class="o">*</span> <span class="k">FROM</span> <span class="n">company</span>
</code></pre></div></div>

<p>This works like a simple <code class="language-plaintext highlighter-rouge">SELECT * FROM company</code>: the results are not duplicated:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">SELECT</span> <span class="o">*</span> <span class="k">FROM</span> <span class="n">company</span>
<span class="k">UNION</span>
<span class="k">SELECT</span> <span class="o">*</span> <span class="k">FROM</span> <span class="n">company</span>
</code></pre></div></div>

<p>Finally, a clean and logical example of a union of two selects. It takes everything from the first set and adds everything from the second one:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">SELECT</span> <span class="o">*</span> <span class="k">FROM</span> <span class="n">company</span> <span class="k">WHERE</span> <span class="n">hq_country</span> <span class="o">=</span> <span class="s1">'JPN'</span>
<span class="k">UNION</span>
<span class="k">SELECT</span> <span class="o">*</span> <span class="k">FROM</span> <span class="n">company</span> <span class="k">WHERE</span> <span class="n">hq_country</span> <span class="o">=</span> <span class="s1">'KOR'</span>
</code></pre></div></div>

<p>Of course, it is possible to union results from different tables, as long as both selects return the same number of columns. A naive attempt fails:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">-- UNION from different tables is possible but the result set must have same number of columns</span>
<span class="c1">-- error:</span>
<span class="k">SELECT</span>
    <span class="o">*</span> <span class="k">FROM</span> <span class="n">company</span>
<span class="k">UNION</span> <span class="k">ALL</span>
<span class="k">SELECT</span>
    <span class="o">*</span> <span class="k">FROM</span> <span class="n">customer</span><span class="p">;</span>
</code></pre></div></div>

<p>It does not work, returning <code class="language-plaintext highlighter-rouge">[21000][1222] The used SELECT statements have a different number of columns</code>.</p>

<p>Let’s correct it by selecting the same number of columns from each table:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">-- works:</span>
<span class="k">SELECT</span>
    <span class="n">name</span> <span class="k">AS</span> <span class="n">company_or_customer_name</span><span class="p">,</span> <span class="n">customer_id</span> <span class="k">as</span> <span class="n">id</span> <span class="k">FROM</span> <span class="n">company</span>
<span class="k">UNION</span> <span class="k">ALL</span>
<span class="k">SELECT</span>
    <span class="n">CONCAT</span><span class="p">(</span><span class="n">last_name</span><span class="p">,</span> <span class="s1">' '</span><span class="p">,</span> <span class="n">first_name</span><span class="p">),</span> <span class="n">customer_id</span> <span class="k">FROM</span> <span class="n">customer</span><span class="p">;</span>
</code></pre></div></div>

<blockquote>
  <p>In some flavours (Postgres, Oracle), the columns selected in both SELECT clauses must also be of compatible types.
MySQL &amp; MariaDB have no such requirement.</p>
</blockquote>
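
<p>Where the types differ, an explicit cast usually makes the query portable. This is only a sketch: the target type (<code class="language-plaintext highlighter-rouge">CHAR(10)</code>) and the alias <code class="language-plaintext highlighter-rouge">label</code> are assumptions and may need adjusting per flavour:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code>-- both branches now return a character column
SELECT CAST(1 AS CHAR(10)) AS label
UNION ALL
SELECT 'one';
</code></pre></div></div>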

<h4 id="inner-join">Inner join</h4>

<p><code class="language-plaintext highlighter-rouge">INNER JOIN</code> connects records from two (or even more) tables.</p>

<p>To match a record from one table to the relevant record in another table, it uses
fields (columns) marked as <strong>keys</strong>: <strong>primary key</strong> and <strong>foreign key</strong>. The foreign key of a record in one table points to the primary key of the relevant record in the connected table.</p>

<p>Usually, <code class="language-plaintext highlighter-rouge">id</code> values are used as primary and foreign keys.</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">-- INNER JOIN returns records with matching values in both tables (here: customer_id)</span>
<span class="k">SELECT</span> <span class="o">*</span> <span class="k">FROM</span>
    <span class="n">customer</span>
<span class="k">INNER</span> <span class="k">JOIN</span>
    <span class="n">field</span> <span class="n">f</span> <span class="k">ON</span> <span class="n">customer</span><span class="p">.</span><span class="n">customer_id</span> <span class="o">=</span> <span class="n">f</span><span class="p">.</span><span class="n">customer_id</span>
<span class="k">WHERE</span>
    <span class="n">f</span><span class="p">.</span><span class="n">field_name</span> <span class="o">=</span> <span class="s1">'Engineer'</span><span class="p">;</span>
</code></pre></div></div>

<p>Step-by-step explanation of the script:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">-- take all records from ``customer`` table</span>
<span class="k">SELECT</span> <span class="o">*</span> <span class="k">FROM</span> <span class="n">customer</span>
<span class="c1">-- connect to records from ``field`` table</span>
<span class="k">INNER</span> <span class="k">JOIN</span> <span class="n">field</span> <span class="n">f</span>
<span class="c1">-- but only when ``customer_id`` in ``customer`` table (for given record) matches ``customer_id`` in ``field`` table</span>
<span class="k">ON</span> <span class="n">customer</span><span class="p">.</span><span class="n">customer_id</span> <span class="o">=</span> <span class="n">f</span><span class="p">.</span><span class="n">customer_id</span>
<span class="c1">-- Additional condition: do it only if ``field_name`` in ``field`` table is ``Engineer`` (and discard all the rest).</span>
<span class="k">WHERE</span> <span class="n">f</span><span class="p">.</span><span class="n">field_name</span> <span class="o">=</span> <span class="s1">'Engineer'</span><span class="p">;</span>
</code></pre></div></div>

<p>Primary to foreign key connection:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">customer</span><span class="p">.</span><span class="n">customer_id</span> <span class="c1">-- primary key in ``customer`` table</span>
<span class="o">=</span> 
<span class="n">f</span><span class="p">.</span><span class="n">customer_id</span> <span class="c1">-- foreign key in ``field`` table</span>
</code></pre></div></div>

<blockquote>
  <p>Primary key - unique field or combination of fields, only one row with the same PK may exist in a table.</p>
</blockquote>

<blockquote>
  <p>Foreign key - field or combination of fields, indicates Primary key of a row in another table. May be unique or not.</p>
</blockquote>
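
<p>In table definitions this could look as follows. A simplified sketch: the column types are assumptions, and only some of the columns used in the queries above are listed:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code>CREATE TABLE customer (
    customer_id INT NOT NULL,
    first_name  VARCHAR(50),
    last_name   VARCHAR(50),
    -- primary key: unique and never NULL
    PRIMARY KEY (customer_id)
);

CREATE TABLE field (
    field_name  VARCHAR(50) NOT NULL,
    customer_id INT NOT NULL,
    -- foreign key: every field row must point to an existing customer
    FOREIGN KEY (customer_id) REFERENCES customer (customer_id)
);
</code></pre></div></div>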

<p><code class="language-plaintext highlighter-rouge">INNER JOIN</code> returns only the records whose join column (here, the foreign key) is not null and has a match in the other table.
It is logical: <code class="language-plaintext highlighter-rouge">NULL</code> never equals any value, so a record with a null foreign key cannot be connected to a row in another table.</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">-- INNER JOIN shows only the records from company that have customer id NOT NULL</span>
<span class="c1">-- use OUTER JOINS: LEFT / RIGHT JOIN etc. if you expect null fields to be included</span>
<span class="k">SELECT</span> <span class="o">*</span> <span class="k">FROM</span>
    <span class="n">company</span>
<span class="k">INNER</span> <span class="k">JOIN</span>
    <span class="n">customer</span> <span class="k">c</span> <span class="k">ON</span> <span class="n">company</span><span class="p">.</span><span class="n">customer_id</span> <span class="o">=</span> <span class="k">c</span><span class="p">.</span><span class="n">customer_id</span><span class="p">;</span>
</code></pre></div></div>
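
<p>To see which rows the inner join leaves out, one could list the companies without a customer reference. A sketch, assuming <code class="language-plaintext highlighter-rouge">company.customer_id</code> is nullable:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code>-- companies that the INNER JOIN above cannot match
SELECT * FROM company WHERE customer_id IS NULL;

-- the reason: comparing NULL with anything evaluates to NULL, not TRUE
SELECT NULL = NULL;
</code></pre></div></div>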

<h4 id="using">Using</h4>

<p>When the key column has the same name in both tables, we can name it once via <code class="language-plaintext highlighter-rouge">USING</code> instead of spelling out the primary key to foreign key comparison:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">-- USING</span>
<span class="k">SELECT</span> <span class="o">*</span> <span class="k">FROM</span>
    <span class="n">company</span>
        <span class="k">INNER</span> <span class="k">JOIN</span>
    <span class="n">customer</span> <span class="k">USING</span><span class="p">(</span><span class="n">customer_id</span><span class="p">);</span>
</code></pre></div></div>
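
<p>A side effect of <code class="language-plaintext highlighter-rouge">USING</code>, shown in this sketch: the shared column can be selected without a table prefix, whereas with <code class="language-plaintext highlighter-rouge">ON</code> an unqualified <code class="language-plaintext highlighter-rouge">customer_id</code> would be ambiguous:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code>-- customer_id needs no table prefix here
SELECT customer_id, name, last_name
FROM company
INNER JOIN customer USING (customer_id);
</code></pre></div></div>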

<h4 id="inner-join-on-more-than-two-tables">Inner join on more than two tables</h4>

<p>Inner join can connect records from more than two tables, provided that they contain relevant ids (foreign keys).
It is useful when multiple conditions using information from various tables are required.</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">-- INNER JOIN with two more tables, both containing customer_id</span>
<span class="k">SELECT</span>  <span class="o">*</span> <span class="k">FROM</span>
    <span class="n">customer</span>
        <span class="k">INNER</span> <span class="k">JOIN</span>
    <span class="n">field</span> <span class="n">f</span> <span class="k">on</span> <span class="n">customer</span><span class="p">.</span><span class="n">customer_id</span> <span class="o">=</span> <span class="n">f</span><span class="p">.</span><span class="n">customer_id</span>
        <span class="k">INNER</span> <span class="k">JOIN</span>
    <span class="n">turnover</span> <span class="n">t</span> <span class="k">on</span> <span class="n">customer</span><span class="p">.</span><span class="n">customer_id</span> <span class="o">=</span> <span class="n">t</span><span class="p">.</span><span class="n">customer_id</span>
<span class="k">WHERE</span>
    <span class="n">customer</span><span class="p">.</span><span class="n">registration_date</span> <span class="o">&gt;</span> <span class="s1">'2000-01-01'</span>
   <span class="k">OR</span> <span class="p">(</span>
            <span class="n">customer</span><span class="p">.</span><span class="n">birth_date</span> <span class="o">&lt;</span> <span class="s1">'1980-01-01'</span>
        <span class="k">AND</span>
            <span class="n">t</span><span class="p">.</span><span class="n">turnover</span> <span class="o">&lt;</span> <span class="mi">10000</span>
    <span class="p">)</span>
   <span class="k">OR</span> <span class="p">(</span>
            <span class="n">customer</span><span class="p">.</span><span class="n">birth_date</span> <span class="o">&lt;</span> <span class="s1">'1960-01-01'</span>
        <span class="k">AND</span>
            <span class="n">f</span><span class="p">.</span><span class="n">field_name</span> <span class="k">NOT</span> <span class="k">LIKE</span>  <span class="s1">'%Engineer%'</span>
    <span class="p">);</span>
</code></pre></div></div>
<p>Another example: joining through a third, helper table:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">-- JOIN through third table</span>
<span class="k">SELECT</span> <span class="o">*</span>
<span class="k">FROM</span> <span class="n">customer</span>
         <span class="k">INNER</span> <span class="k">JOIN</span>
     <span class="n">branch_customers</span>
     <span class="k">ON</span>
         <span class="n">customer</span><span class="p">.</span><span class="n">customer_id</span> <span class="o">=</span> <span class="n">branch_customers</span><span class="p">.</span><span class="n">customer_id</span>
         <span class="k">INNER</span> <span class="k">JOIN</span>
     <span class="n">branch</span>
     <span class="k">ON</span>
         <span class="n">branch_customers</span><span class="p">.</span><span class="n">branch_id</span> <span class="o">=</span> <span class="n">branch</span><span class="p">.</span><span class="n">branch_id</span>
</code></pre></div></div>

<p>Use case: find the number of customers per branch:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">-- how many CUSTOMERS per BRANCH ?</span>
<span class="k">SELECT</span> <span class="n">branch_name</span><span class="p">,</span> <span class="k">COUNT</span><span class="p">(</span><span class="o">*</span><span class="p">)</span> <span class="k">AS</span> <span class="n">number_of_customers</span>
<span class="k">FROM</span> <span class="n">customer</span>
         <span class="k">INNER</span> <span class="k">JOIN</span>
     <span class="n">branch_customers</span>
     <span class="k">ON</span> <span class="n">customer</span><span class="p">.</span><span class="n">customer_id</span> <span class="o">=</span> <span class="n">branch_customers</span><span class="p">.</span><span class="n">customer_id</span>
         <span class="k">INNER</span> <span class="k">JOIN</span>
     <span class="n">branch</span>
     <span class="k">ON</span> <span class="n">branch_customers</span><span class="p">.</span><span class="n">branch_id</span> <span class="o">=</span> <span class="n">branch</span><span class="p">.</span><span class="n">branch_id</span>
<span class="k">GROUP</span> <span class="k">BY</span> <span class="n">branch_name</span>
</code></pre></div></div>
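
<p>Note that the inner join omits branches without any customers. A possible variant, assuming such branches should be reported with a count of zero, starts from <code class="language-plaintext highlighter-rouge">branch</code> and uses <code class="language-plaintext highlighter-rouge">LEFT JOIN</code> (described further below):</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code>SELECT branch.branch_name, COUNT(bc.customer_id) AS number_of_customers
FROM branch
LEFT JOIN branch_customers bc ON branch.branch_id = bc.branch_id
-- COUNT(column) skips NULLs, so branches without customers get 0
GROUP BY branch.branch_name
ORDER BY number_of_customers DESC;
</code></pre></div></div>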

<h4 id="cross-join">Cross join</h4>

<p><code class="language-plaintext highlighter-rouge">CROSS JOIN</code> joins each row of the first table with each row of the second table. This join type is also known as a Cartesian join.</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">-- CROSS JOIN joins all rows from one table with all rows of second table</span>
<span class="c1">-- on given condition</span>
<span class="c1">-- without condition it makes Cartesian product</span>
<span class="k">SELECT</span> <span class="o">*</span> <span class="k">FROM</span>
    <span class="n">customer</span>
        <span class="k">CROSS</span> <span class="k">JOIN</span>
    <span class="n">company</span>
<span class="k">WHERE</span> <span class="n">customer</span><span class="p">.</span><span class="n">customer_id</span> <span class="o">=</span> <span class="n">company</span><span class="p">.</span><span class="n">customer_id</span>
</code></pre></div></div>
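
<p>Two sketches to make this concrete: without a condition, the row count is the product of both table sizes, and with the equality condition the query above returns the same rows as an <code class="language-plaintext highlighter-rouge">INNER JOIN</code>:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code>-- Cartesian product: (rows in customer) * (rows in company)
SELECT COUNT(*) FROM customer CROSS JOIN company;

-- equivalent to the CROSS JOIN ... WHERE query above
SELECT * FROM customer
INNER JOIN company ON customer.customer_id = company.customer_id;
</code></pre></div></div>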

<h4 id="left-right-and-outer-join">Left, right and outer join</h4>

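<p><code class="language-plaintext highlighter-rouge">LEFT JOIN</code> returns every row from the left table; where no matching row exists in the right table, the right-hand columns are filled with <code class="language-plaintext highlighter-rouge">NULL</code>.</p>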

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">-- LEFT JOIN shows all rows from left table (company) - even the records that cannot be joined</span>
<span class="c1">-- with customer table records due to NULL customer_id in company table</span>
<span class="k">SELECT</span> <span class="o">*</span> <span class="k">FROM</span>
    <span class="n">company</span>
<span class="k">LEFT</span> <span class="k">JOIN</span>
    <span class="n">customer</span> <span class="k">c</span> <span class="k">ON</span> <span class="n">company</span><span class="p">.</span><span class="n">customer_id</span> <span class="o">=</span> <span class="k">c</span><span class="p">.</span><span class="n">customer_id</span>
</code></pre></div></div>
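
<p>A common use of <code class="language-plaintext highlighter-rouge">LEFT JOIN</code> is finding rows without a match. A sketch using the same tables: the customer columns are <code class="language-plaintext highlighter-rouge">NULL</code> for unmatched companies, so filtering on them keeps only those:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code>-- companies that have no matching customer
SELECT company.*
FROM company
LEFT JOIN customer c ON company.customer_id = c.customer_id
WHERE c.customer_id IS NULL;
</code></pre></div></div>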

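<p><code class="language-plaintext highlighter-rouge">RIGHT JOIN</code> returns every row from the right table; where no matching row exists in the left table, the left-hand columns are filled with <code class="language-plaintext highlighter-rouge">NULL</code>.</p>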

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">-- on the other hand, RIGHT JOIN shows all rows from right table (customer)</span>
<span class="c1">-- - even the records that cannot be joined with company table</span>
<span class="c1">-- due to missing customer_id in company table</span>
<span class="k">SELECT</span> <span class="o">*</span> <span class="k">FROM</span>
    <span class="n">company</span>
<span class="k">RIGHT</span> <span class="k">JOIN</span>
    <span class="n">customer</span> <span class="k">c</span> <span class="k">ON</span> <span class="n">company</span><span class="p">.</span><span class="n">customer_id</span> <span class="o">=</span> <span class="k">c</span><span class="p">.</span><span class="n">customer_id</span>
</code></pre></div></div>

<p><code class="language-plaintext highlighter-rouge">FULL OUTER JOIN</code> lists all records from both the left and the right table, including those that have no match on the other side (for example, because their join column is null).</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">-- FULL OUTER JOIN lists all rows from both tables</span>
<span class="c1">-- no matter if NULL</span>
<span class="c1">-- FULL OUTER JOIN is not supported in MySQL</span>
<span class="c1">-- workaround: https://www.xaprb.com/blog/2006/05/26/how-to-write-full-outer-join-in-mysql/</span>
<span class="k">SELECT</span> <span class="o">*</span> <span class="k">FROM</span>
    <span class="n">company</span>
        <span class="k">LEFT</span> <span class="k">OUTER</span> <span class="k">JOIN</span>
    <span class="n">customer</span> <span class="k">c</span> <span class="k">ON</span> <span class="n">company</span><span class="p">.</span><span class="n">customer_id</span> <span class="o">=</span> <span class="k">c</span><span class="p">.</span><span class="n">customer_id</span>
<span class="k">UNION</span>
<span class="k">SELECT</span> <span class="o">*</span> <span class="k">FROM</span>
    <span class="n">company</span>
        <span class="k">RIGHT</span> <span class="k">OUTER</span> <span class="k">JOIN</span>
    <span class="n">customer</span> <span class="k">c</span> <span class="k">ON</span> <span class="n">company</span><span class="p">.</span><span class="n">customer_id</span> <span class="o">=</span> <span class="k">c</span><span class="p">.</span><span class="n">customer_id</span>
<span class="k">ORDER</span> <span class="k">BY</span> <span class="n">company_id</span> <span class="k">DESC</span>
</code></pre></div></div>
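
<p>One caveat of the query above: <code class="language-plaintext highlighter-rouge">UNION</code> also removes rows that are genuine duplicates. A variant that is often suggested, sketched here assuming <code class="language-plaintext highlighter-rouge">company_id</code> is the non-null primary key of <code class="language-plaintext highlighter-rouge">company</code>, uses <code class="language-plaintext highlighter-rouge">UNION ALL</code> and takes only the unmatched rows from the second half:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code>-- all companies, with customer columns where a match exists
SELECT * FROM company
LEFT JOIN customer c ON company.customer_id = c.customer_id
UNION ALL
-- plus customers that no company points to
SELECT * FROM company
RIGHT JOIN customer c ON company.customer_id = c.customer_id
WHERE company.company_id IS NULL;
</code></pre></div></div>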

<p>Another workaround for <code class="language-plaintext highlighter-rouge">FULL OUTER JOIN</code>:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">-- workaround of FULL OUTER JOIN without using LEFT / RIGHT JOIN</span>
<span class="k">SELECT</span> <span class="o">*</span> <span class="k">FROM</span>
    <span class="n">company</span>
        <span class="k">INNER</span> <span class="k">JOIN</span>
    <span class="n">customer</span>
    <span class="k">ON</span>
            <span class="n">company</span><span class="p">.</span><span class="n">customer_id</span> <span class="o">=</span> <span class="n">customer</span><span class="p">.</span><span class="n">customer_id</span>
<span class="k">UNION</span>
<span class="k">SELECT</span> <span class="o">*</span><span class="p">,</span> <span class="k">NULL</span><span class="p">,</span> <span class="k">NULL</span><span class="p">,</span> <span class="k">NULL</span><span class="p">,</span> <span class="k">NULL</span><span class="p">,</span> <span class="k">NULL</span><span class="p">,</span> <span class="k">NULL</span> <span class="k">FROM</span>
    <span class="n">company</span>
<span class="k">WHERE</span>
    <span class="k">NOT</span>
            <span class="n">company</span><span class="p">.</span><span class="n">customer_id</span>
            <span class="k">IN</span>
            <span class="p">(</span>
                <span class="k">SELECT</span> <span class="k">DISTINCT</span>
                    <span class="n">company</span><span class="p">.</span><span class="n">customer_id</span>
                <span class="k">FROM</span>
                    <span class="n">company</span>
                        <span class="k">INNER</span> <span class="k">JOIN</span>
                    <span class="n">customer</span>
                    <span class="k">ON</span>
                            <span class="n">company</span><span class="p">.</span><span class="n">customer_id</span> <span class="o">=</span> <span class="n">customer</span><span class="p">.</span><span class="n">customer_id</span>
            <span class="p">)</span>
   <span class="k">OR</span>
    <span class="n">company</span><span class="p">.</span><span class="n">customer_id</span> <span class="k">IS</span> <span class="k">NULL</span>
<span class="k">UNION</span>
<span class="k">SELECT</span>
       <span class="k">NULL</span><span class="p">,</span> <span class="k">NULL</span><span class="p">,</span> <span class="k">NULL</span><span class="p">,</span> <span class="k">NULL</span><span class="p">,</span> <span class="k">NULL</span><span class="p">,</span>
       <span class="n">customer</span><span class="p">.</span><span class="n">customer_id</span><span class="p">,</span> <span class="n">customer</span><span class="p">.</span><span class="n">birth_date</span><span class="p">,</span> <span class="n">customer</span><span class="p">.</span><span class="n">first_name</span><span class="p">,</span>
       <span class="n">customer</span><span class="p">.</span><span class="n">last_name</span><span class="p">,</span> <span class="n">customer</span><span class="p">.</span><span class="n">gender</span><span class="p">,</span> <span class="n">customer</span><span class="p">.</span><span class="n">registration_date</span>
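<span class="k">FROM</span> <span class="n">customer</span>
<span class="c1">-- keep only customers without a matching company, so matched rows are not repeated</span>
<span class="k">WHERE</span> <span class="n">customer</span><span class="p">.</span><span class="n">customer_id</span> <span class="k">NOT</span> <span class="k">IN</span> <span class="p">(</span>
    <span class="k">SELECT</span> <span class="n">company</span><span class="p">.</span><span class="n">customer_id</span> <span class="k">FROM</span> <span class="n">company</span> <span class="k">WHERE</span> <span class="n">company</span><span class="p">.</span><span class="n">customer_id</span> <span class="k">IS</span> <span class="k">NOT</span> <span class="k">NULL</span>
<span class="p">)</span>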
<span class="k">ORDER</span> <span class="k">BY</span> <span class="n">company_id</span> <span class="k">DESC</span>
    <span class="k">LIMIT</span> <span class="mi">10</span><span class="p">;</span>
</code></pre></div></div>

<h4 id="union-workarounds-for-join">UNION workarounds for JOIN</h4>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">-- UNION workaround instead of OUTER JOIN (without LEFT / RIGHT JOIN)</span>
<span class="c1">-- customer_id must be not null</span>
<span class="k">SELECT</span> <span class="n">name</span><span class="p">,</span> <span class="n">company</span><span class="p">.</span><span class="n">customer_id</span> <span class="k">FROM</span>
    <span class="n">company</span>
        <span class="k">INNER</span> <span class="k">JOIN</span>
    <span class="n">customer</span>
        <span class="k">ON</span>
    <span class="n">company</span><span class="p">.</span><span class="n">customer_id</span> <span class="o">=</span> <span class="n">customer</span><span class="p">.</span><span class="n">customer_id</span>
<span class="k">UNION</span>
<span class="k">SELECT</span> <span class="n">name</span><span class="p">,</span> <span class="n">company</span><span class="p">.</span><span class="n">customer_id</span> <span class="k">FROM</span>
    <span class="n">company</span>
<span class="k">WHERE</span>
    <span class="k">NOT</span>
        <span class="n">company</span><span class="p">.</span><span class="n">customer_id</span>
    <span class="k">IN</span>
        <span class="p">(</span>
        <span class="k">SELECT</span> <span class="k">DISTINCT</span>
            <span class="n">company</span><span class="p">.</span><span class="n">customer_id</span>
        <span class="k">FROM</span>
            <span class="n">company</span>
                <span class="k">INNER</span> <span class="k">JOIN</span>
            <span class="n">customer</span>
                <span class="k">ON</span>
            <span class="n">company</span><span class="p">.</span><span class="n">customer_id</span> <span class="o">=</span> <span class="n">customer</span><span class="p">.</span><span class="n">customer_id</span>
    <span class="p">)</span>
</code></pre></div></div>
<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">-- above workaround with all columns from both tables included</span>
<span class="c1">-- and rows with null customer_id</span>
<span class="k">SELECT</span> <span class="o">*</span> <span class="k">FROM</span> <span class="c1">-- returns columns from company and customer</span>
    <span class="n">company</span>
        <span class="k">INNER</span> <span class="k">JOIN</span>
    <span class="n">customer</span>
    <span class="k">ON</span>
            <span class="n">company</span><span class="p">.</span><span class="n">customer_id</span> <span class="o">=</span> <span class="n">customer</span><span class="p">.</span><span class="n">customer_id</span>
<span class="k">UNION</span>
<span class="k">SELECT</span> <span class="o">*</span><span class="p">,</span> <span class="k">NULL</span><span class="p">,</span> <span class="k">NULL</span><span class="p">,</span> <span class="k">NULL</span><span class="p">,</span> <span class="k">NULL</span><span class="p">,</span> <span class="k">NULL</span><span class="p">,</span> <span class="k">NULL</span> <span class="k">FROM</span>
<span class="c1">-- returns columns from company only (no join), hence null to replace missing columns from customer</span>
    <span class="n">company</span>
<span class="k">WHERE</span>
    <span class="k">NOT</span>
            <span class="n">company</span><span class="p">.</span><span class="n">customer_id</span>
            <span class="k">IN</span>
            <span class="p">(</span>
                <span class="k">SELECT</span> <span class="k">DISTINCT</span>
                    <span class="n">company</span><span class="p">.</span><span class="n">customer_id</span>
                <span class="k">FROM</span>
                    <span class="n">company</span>
                        <span class="k">INNER</span> <span class="k">JOIN</span>
                    <span class="n">customer</span>
                    <span class="k">ON</span>
                            <span class="n">company</span><span class="p">.</span><span class="n">customer_id</span> <span class="o">=</span> <span class="n">customer</span><span class="p">.</span><span class="n">customer_id</span>
            <span class="p">)</span>
    <span class="k">OR</span>
        <span class="n">company</span><span class="p">.</span><span class="n">customer_id</span> <span class="k">IS</span> <span class="k">NULL</span><span class="p">;</span>
</code></pre></div></div>

<p>TBC</p>]]></content><author><name>Mateusz Zacki</name><email>zacki[dot]mateusz[at]gmail[dot]com</email></author><category term="SQL, database, data" /><category term="SQL, database, data, persistency" /><summary type="html"><![CDATA[Basic SQL for Java developers: union vs join, left join, right join, inner vs outer join.]]></summary></entry></feed>