5 min read โ Feb 26, 2023
In SQL, Window Functions are a powerful tool for performing complex analytical queries on data. A window function is a function that performs a calculation across a set of rows that are related to the current row in a result set. The set of related rows is called a โwindow,โ hence the name โWindow Functions.โ
In this article, we will explore the different types of Window Functions in SQL, starting from the simplest and most commonly used functions to more complex ones. We will use examples to illustrate how to use these functions in SQL queries.
ROW_NUMBER()
ROW_NUMBER() is one of the simplest Window Functions. It assigns a unique sequential number to each row in the result set. This function is commonly used to generate a unique identifier for each row, which can be used for further analysis or grouping.
Example:
Suppose we have a table called โordersโ with the following columns: โorder_id,โ โcustomer_id,โ and โorder_date.โ We can use the ROW_NUMBER() function to generate a unique identifier for each row, and use the PARTITION BY clause to group the result set by โcustomer_idโ:
SELECT order_id, customer_id, order_date,
ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY order_date) as row_num
FROM orders
This query will return the โorder_id,โ โcustomer_id,โ โorder_date,โ and a unique identifier for each row called โrow_num.โ
Example result table:
SUM(), AVG(), MIN(), and MAX()
The SUM() window function
The SUM() window function is used to calculate the running total of a specified column over a partition of rows in a result set, without grouping the rows. It can be useful for calculating cumulative sums, running totals, and rolling averages.
Suppose we have a table called โsalesโ with columns โregion,โ โproduct,โ โsales_date,โ and โsales_amount,โ and we want to calculate the total sales amount for each region, ordered by product and sales date.
SELECT region, product, sales_date, sales_amount,
SUM(sales_amount) OVER (PARTITION BY region ORDER BY product, sales_date) as total_sales
FROM sales
This query adds a SUM() window function to calculate the running total of โsales_amountโ for each unique value of the โregionโ column, ordered by โproductโ and โsales_date.โ
Another example, joining two tables, orders and [Order Details], calculating a new column โtotal_order_amountโ with the SUM or UnitPrice by Quantity for each Customer (PARTITION BY)
SELECT orders.OrderID,
CustomerID,
YEAR(OrderDate),
UnitPrice * Quantity as OrderAmount,
SUM(UnitPrice * Quantity) OVER (PARTITION BY CustomerID) as total_order_amount
FROM orders
JOIN [Order Details] od on orders.OrderID = od.OrderID;
Here is the reults table
The AVG() window function
The AVG window function is used to calculate the average value of a column over a partition of rows in a result set. It is commonly used in analytical queries to find trends and patterns in data.
Suppose we have a table called โemployeesโ with columns โdepartment,โ โemployee_name,โ โsalary,โ and we want to calculate the average salary for each department.
SELECT department, employee_name, salary,
AVG(salary) OVER (PARTITION BY department) as avg_salary
FROM employees
This query adds an AVG() window function to calculate the average salary for each unique value of the โdepartmentโ column.
The MIN() and MAX() window functions
The MIN() and MAX() window functions are used to find the minimum and maximum values of a column over a partition of rows in a result set. Here are some examples of using these functions with the Northwind database:
- Example of using MIN() window function:
Suppose we want to find the earliest order date for each customer in the Northwind database. We can use the MIN() window function to find the minimum order date for each unique value of the โCustomerIDโ column:
SELECT CustomerID, OrderDate,
MIN(OrderDate) OVER (PARTITION BY CustomerID) as EarliestOrderDate
FROM Orders
This query adds a MIN() window function to find the earliest โOrderDateโ for each unique value of the โCustomerIDโ column.
Example of result table
2. Example of using MAX() window function:
Suppose we want to find the most expensive product in each category in the Northwind database. We can use the MAX() window function to find the maximum unit price for each unique value of the โCategoryIDโ column:
SELECT ProductID, ProductName, UnitPrice, CategoryID,
MAX(UnitPrice) OVER (PARTITION BY CategoryID) as MaxUnitPrice
FROM Products
This query adds a MAX() window function to find the maximum โUnitPriceโ for each unique value of the โCategoryIDโ column.
LEAD() and LAG()
LEAD and LAG are window functions in SQL that allow accessing data from other rows within the same result set. They can be used to retrieve data from a previous or next row, respectively, based on a given order.
The LAG() function returns the value of a specified column from the row that precedes the current row, while the LEAD() function returns the value of a specified column from the row that follows the current row. Both functions take three arguments: the first is the column that will be retrieved, the second is the offset (how many rows to move forward or backward), and the third is the default value (to be used when the requested offset goes beyond the limits of the window).
Conclusion
Window functions are a powerful tool in SQL that allow for advanced data analysis and manipulation. The SUM(), MIN(), MAX(), and AVG() functions allow for calculations of aggregate values within a specified window or partition of data. The ROW_NUMBER() function assigns a unique integer value to each row within a result set. The PARTITION BY clause allows for further partitioning of data for calculations. These functions can be used to provide valuable insights into data sets and make complex queries simpler to construct.
Leave a Reply