SQL techniques for subquery filtering beyond EXISTS

4 min read · Apr 24, 2023

Photo by Caspar Camille Rubin on Unsplash

SQL is a powerful language for working with relational databases. It provides many commands and operators that allow users to manipulate and retrieve data from databases. One of these commands is the EXISTS command, which is used to filter results based on the existence of a record in a subquery. In this article, we’ll explore the EXISTS command and provide examples of how it can be used with simple join and common table expressions.

The EXISTS command is a logical operator that returns true or false based on the existence of a record in a subquery. It is commonly used in conjunction with the WHERE clause to filter results based on the existence of a record in a subquery. Here’s the basic syntax of the EXISTS command:

SELECT column1, column2, ...
FROM table1
WHERE EXISTS (SELECT column_name FROM table_name WHERE condition);

The EXISTS command evaluates the subquery and returns true if it returns at least one row, and false if it returns no rows.

Let’s take a look at an example of how the EXISTS command can be used with a simple join. Suppose we want to find all suppliers in the database of the fictitious import-export company “Northwind Traders”, that have products priced under $20. We can write the following query:

SELECT CompanyName
FROM Suppliers
WHERE EXISTS
(
SELECT ProductName FROM Products
WHERE Products.SupplierID = Suppliers.supplierID AND UnitPrice < 20
) ORDER BY CompanyName;

This query retrieves the CompanyName of all suppliers that have products with a UnitPrice under $20 and order them by CompanyName. The EXISTS command is used to filter the results to only include suppliers that have products that meet the condition.

Another way to write this query is to use a common table expression (CTE) instead of the EXISTS command. Here’s the SQL code for this query using a CTE:

WITH products20 AS (
SELECT SupplierID, ProductName FROM Products
WHERE UnitPrice < 20
)
SELECT DISTINCT CompanyName
FROM Suppliers,products20
WHERE products20.SupplierID = Suppliers.supplierID
ORDER BY CompanyName;

This query creates a CTE called products20 that retrieves the SupplierID and ProductName of all products with a UnitPrice under $20. The CTE is then used in a join with the Suppliers table to retrieve the CompanyName of all suppliers that have products with a UnitPrice under $20.

Finally, we can write this query using a simple join instead of the EXISTS command. Here’s the SQL code for this query using a simple join:

SELECT DISTINCT CompanyName
FROM Suppliers,Products
WHERE Products.SupplierID = Suppliers.supplierID
AND UnitPrice < 20
ORDER BY CompanyName;

This query retrieves the CompanyName of all suppliers that have products with a UnitPrice under $20 using a simple join between the Suppliers and Products tables. The DISTINCT keyword is used to remove any duplicate records, and the results are ordered by CompanyName.

More Complex Use of SQL EXISTS

Suppose we want to find all customers who have placed orders for products in the ‘Dairy Products’ category. We can achieve this by using the EXISTS operator in combination with a subquery. Here’s the SQL code for this query:

SELECT DISTINCT Customers.CustomerID, Customers.ContactName, Customers.Country
FROM Customers
INNER JOIN Orders ON Customers.CustomerID = Orders.CustomerID
INNER JOIN OrderDetails ON Orders.OrderID = OrderDetails.OrderID
INNER JOIN Products ON OrderDetails.ProductID = Products.ProductID
WHERE EXISTS (
SELECT ProductID FROM Products WHERE CategoryID = 4 --'Dairy Products'
) AND Customers.Country = 'USA'
ORDER BY Customers.CustomerID;

Let’s break down this query to understand how it works.

The first part of the query selects the columns we want to retrieve from the Customers table: CustomerID, ContactName, and Country.

SELECT DISTINCT Customers.CustomerID, Customers.ContactName, Customers.Country
FROM Customers

The second part of the query uses the INNER JOIN clause to join the Orders, Order Details, and Products tables to the Customers table.

INNER JOIN Orders ON Customers.CustomerID = Orders.CustomerID
INNER JOIN OrderDetails ON Orders.OrderID = OrderDetails.OrderID
INNER JOIN Products ON OrderDetails.ProductID = Products.ProductID

The third part of the query uses the EXISTS operator with a subquery to filter the results based on the existence of products in the ‘Dairy Products’ category.

WHERE EXISTS (
SELECT ProductID FROM Products WHERE CategoryID = 4
)

The fourth part of the query further filters the results to only include customers from the USA.

AND Customers.Country = 'USA'

Finally, the query sorts the results by customer ID.

ORDER BY Customers.CustomerID;

When we run this query, we get a list of customers who have placed orders for products in the ‘Dairy Products’ category and are from the USA. The DISTINCT keyword is used to remove any duplicate records.

I strongly advocate for simplicity and breaking down complex queries into smaller, more manageable parts. The method I prefer to use for this purpose is the common table expression method, which helps keep my code easy to read and comprehend.

In conclusion, the EXISTS command is a valuable tool in the SQL and Data Analyst toolkit, and it can be used to filter results based on the existence of records in a subquery. By using this command in combination with other SQL clauses and functions, you can create complex queries that extract meaningful insights from databases. If you want to practice using the EXISTS command and other SQL techniques, try experimenting with the examples provided in this article.


Leave a Reply

Your email address will not be published. Required fields are marked *