Mastering the Essential SQL Functions for Efficient Data Analysis
Structured Query Language (SQL) is one of the most popular programming languages used in data analytics. It is a powerful tool that allows analysts to extract, manipulate, and transform data from relational databases. However, SQL has a vast number of functions, and it can be overwhelming for new data analysts to learn all of them.
Iโll save you some time here. Learn these SQL functions and become a better data analysts. These are 20% of SQL functions that data analysts use 80% of the time. You knowโฆ the Pareto principle. Each SQL function is using the Northwind database.
The SELECT Statement
The SELECT statement is the most commonly used SQL function, and it is used to retrieve data from one or more tables.
SELECT *
FROM Customers;
This query selects all columns from the Customers table.
The WHERE Clause
The WHERE clause is used to filter data based on specific conditions. It is often used in conjunction with the SELECT statement.
SELECT *
FROM Orders
WHERE CustomerID='ALFKI';
This query selects all columns from the Orders table where the CustomerID is equal to โALFKIโ.
The GROUP BY Clause
The GROUP BY clause is used to group rows that have the same values in one or more columns.
SELECT
CustomerID,
COUNT(OrderID)
FROM Orders
GROUP BY CustomerID;
This query groups the Orders table by CustomerID and counts the number of orders for each customer.
The ORDER BY Clause
The ORDER BY clause is used to sort the data in ascending or descending order.
SELECT *
FROM Products
ORDER BY ProductName ASC;
This query sorts the Products table by ProductName in ascending order.
The JOIN Clause
The JOIN function is used to combine data from two or more tables based on a common column.
SELECT
c.CustomerName,
o.OrderDate
FROM Customers c
INNER JOIN Orders o ON c.CustomerID = o.CustomerID;
This query combines data from the Customers and Orders tables based on the CustomerID column.
The COUNT Aggregate Function
The COUNT function is used to count the number of rows in a table that meet a specific condition.
SELECT COUNT(*)
FROM Customers;
This query counts the number of rows in the Customers table.
The AVG Aggregate Function
The AVG function is used to calculate the average value of a column.
SELECT AVG(UnitPrice)
FROM Products;
This query calculates the average UnitPrice of all products in the Products table.
The SUM Aggregate Function
The SUM function is used to calculate the sum of a column.
SELECT SUM(Quantity)
FROM OrderDetails;
This query calculates the total quantity of all orders in the OrderDetails table.
The MAX Aggregation Function
The MAX function is used to find the maximum value in a column.
SELECT MAX(UnitPrice)
FROM Products;
This query finds the highest UnitPrice value in the Products table.
The MIN Aggregate Function
The MIN function is used to find the minimum value in a column.
SELECT MIN(UnitPrice)
FROM Products;
This query finds the lowest UnitPrice value in the Products table.
The LIKE Operator
The LIKE operator is used to search for patterns in a column.
SELECT *
FROM Products
WHERE ProductName LIKE '%Chai%';
This query selects all columns from the Products table where the ProductName contains the word โChaiโ.
The IN Operator
The IN operator is used to match a value to a list of values.
SELECT *
FROM Customers
WHERE Country IN ('USA', 'Canada');
This query selects all columns from the Customers table where the Country is either โUSAโ or โCanadaโ.
The NOT Operator
The NOT operator is used to negate a condition.
SELECT *
FROM Customers
WHERE NOT Country='USA';
-- This can also be written
SELECT *
FROM Customers
WHERE Country <> 'USA';
This query selects all columns from the Customers table where the Country is not โUSAโ.
The UNION Clause
The UNION operator is used to combine the results of two or more SELECT statements.
SELECT City, Country FROM Customers
UNION
SELECT City, Country FROM Suppliers;
This query combines the results of two SELECT statements (Customers and Suppliers) and eliminates duplicate rows.
The HAVING Clause
The HAVING clause is used to filter the results of a GROUP BY clause based on a condition.
SELECT
CustomerID,
COUNT(OrderID)
FROM Orders
GROUP BY CustomerID
HAVING COUNT(OrderID) > 10;
This query groups the Orders table by CustomerID and selects only the rows where the number of orders for each customer is greater than 10.
The DISTINCT Keyword
The DISTINCT keyword is used to eliminate duplicate rows in a SELECT statement.
SELECT DISTINCT City
FROM Customers;
This query selects all distinct City values from the Customers table.
The CASE Statement
The CASE statement is used to create conditional expressions.
SELECT ProductName,
CASE
WHEN UnitPrice < 10 THEN 'Low Price'
WHEN UnitPrice >= 10 AND UnitPrice < 20 THEN 'Medium Price'
ELSE 'High Price'
END AS PriceCategory
FROM Products;
This query creates a new column called PriceCategory that categorizes the UnitPrice values in the Products table.
The COALESCE Function
The COALESCE function is used to return the first non-null value in a list of values.
SELECT
ProductName,
COALESCE(UnitsInStock, 0) AS UnitsInStock
FROM Products;
This query selects the ProductName and UnitsInStock columns from the Products table. If the UnitsInStock value is null, the COALESCE function will return 0 instead.
In conclusion, these are the 20% of SQL functions that data analysts use 80% of the time. By mastering these functions, data analysts can efficiently extract, manipulate, and transform data from relational databases, and produce valuable insights for their organizations.
Leave a Reply