Mastering the Essential SQL Functions for Efficient Data Analysis

Photo by Austin Distel on Unsplash

Structured Query Language (SQL) is one of the most popular programming languages used in data analytics. It is a powerful tool that allows analysts to extract, manipulate, and transform data from relational databases. However, SQL has a vast number of functions, and it can be overwhelming for new data analysts to learn all of them.

Iโ€™ll save you some time here. Learn these SQL functions and become a better data analysts. These are 20% of SQL functions that data analysts use 80% of the time. You knowโ€ฆ the Pareto principle. Each SQL function is using the Northwind database.

The SELECT Statement

The SELECT statement is the most commonly used SQL function, and it is used to retrieve data from one or more tables.

SELECT * 
FROM Customers;

This query selects all columns from the Customers table.

The WHERE Clause

The WHERE clause is used to filter data based on specific conditions. It is often used in conjunction with the SELECT statement.

SELECT * 
FROM Orders
WHERE CustomerID='ALFKI';

This query selects all columns from the Orders table where the CustomerID is equal to โ€˜ALFKIโ€™.

The GROUP BY Clause

The GROUP BY clause is used to group rows that have the same values in one or more columns.

SELECT 
CustomerID,
COUNT(OrderID)
FROM Orders
GROUP BY CustomerID;

This query groups the Orders table by CustomerID and counts the number of orders for each customer.

The ORDER BY Clause

The ORDER BY clause is used to sort the data in ascending or descending order.

SELECT * 
FROM Products
ORDER BY ProductName ASC;

This query sorts the Products table by ProductName in ascending order.

The JOIN Clause

The JOIN function is used to combine data from two or more tables based on a common column.

SELECT 
c.CustomerName,
o.OrderDate
FROM Customers c
INNER JOIN Orders o ON c.CustomerID = o.CustomerID;

This query combines data from the Customers and Orders tables based on the CustomerID column.

The COUNT Aggregate Function

The COUNT function is used to count the number of rows in a table that meet a specific condition.

SELECT COUNT(*) 
FROM Customers;

This query counts the number of rows in the Customers table.

The AVG Aggregate Function

The AVG function is used to calculate the average value of a column.

SELECT AVG(UnitPrice) 
FROM Products;

This query calculates the average UnitPrice of all products in the Products table.

The SUM Aggregate Function

The SUM function is used to calculate the sum of a column.

SELECT SUM(Quantity) 
FROM OrderDetails;

This query calculates the total quantity of all orders in the OrderDetails table.

The MAX Aggregation Function

The MAX function is used to find the maximum value in a column.

SELECT MAX(UnitPrice) 
FROM Products;

This query finds the highest UnitPrice value in the Products table.

The MIN Aggregate Function

The MIN function is used to find the minimum value in a column.

SELECT MIN(UnitPrice) 
FROM Products;

This query finds the lowest UnitPrice value in the Products table.

The LIKE Operator

The LIKE operator is used to search for patterns in a column.

SELECT * 
FROM Products
WHERE ProductName LIKE '%Chai%';

This query selects all columns from the Products table where the ProductName contains the word โ€˜Chaiโ€™.

The IN Operator

The IN operator is used to match a value to a list of values.

SELECT * 
FROM Customers
WHERE Country IN ('USA', 'Canada');

This query selects all columns from the Customers table where the Country is either โ€˜USAโ€™ or โ€˜Canadaโ€™.

The NOT Operator

The NOT operator is used to negate a condition.

SELECT * 
FROM Customers
WHERE NOT Country='USA';

-- This can also be written
SELECT *
FROM Customers
WHERE Country <> 'USA';

This query selects all columns from the Customers table where the Country is not โ€˜USAโ€™.

The UNION Clause

The UNION operator is used to combine the results of two or more SELECT statements.

SELECT City, Country FROM Customers
UNION
SELECT City, Country FROM Suppliers;

This query combines the results of two SELECT statements (Customers and Suppliers) and eliminates duplicate rows.

The HAVING Clause

The HAVING clause is used to filter the results of a GROUP BY clause based on a condition.

SELECT 
CustomerID,
COUNT(OrderID)
FROM Orders
GROUP BY CustomerID
HAVING COUNT(OrderID) > 10;

This query groups the Orders table by CustomerID and selects only the rows where the number of orders for each customer is greater than 10.

The DISTINCT Keyword

The DISTINCT keyword is used to eliminate duplicate rows in a SELECT statement.

SELECT DISTINCT City 
FROM Customers;

This query selects all distinct City values from the Customers table.

The CASE Statement

The CASE statement is used to create conditional expressions.

SELECT ProductName, 
CASE
WHEN UnitPrice < 10 THEN 'Low Price'
WHEN UnitPrice >= 10 AND UnitPrice < 20 THEN 'Medium Price'
ELSE 'High Price'
END AS PriceCategory
FROM Products;

This query creates a new column called PriceCategory that categorizes the UnitPrice values in the Products table.

The COALESCE Function

The COALESCE function is used to return the first non-null value in a list of values.

SELECT 
ProductName,
COALESCE(UnitsInStock, 0) AS UnitsInStock
FROM Products;

This query selects the ProductName and UnitsInStock columns from the Products table. If the UnitsInStock value is null, the COALESCE function will return 0 instead.

In conclusion, these are the 20% of SQL functions that data analysts use 80% of the time. By mastering these functions, data analysts can efficiently extract, manipulate, and transform data from relational databases, and produce valuable insights for their organizations.


Leave a Reply

Your email address will not be published. Required fields are marked *