5 min read โ€“ Feb 26, 2023

In SQL, Window Functions are a powerful tool for performing complex analytical queries on data. A window function is a function that performs a calculation across a set of rows that are related to the current row in a result set. The set of related rows is called a โ€œwindow,โ€ hence the name โ€œWindow Functions.โ€

In this article, we will explore the different types of Window Functions in SQL, starting from the simplest and most commonly used functions to more complex ones. We will use examples to illustrate how to use these functions in SQL queries.

ROW_NUMBER()

ROW_NUMBER() is one of the simplest Window Functions. It assigns a unique sequential number to each row in the result set. This function is commonly used to generate a unique identifier for each row, which can be used for further analysis or grouping.

Example:

Suppose we have a table called โ€œordersโ€ with the following columns: โ€œorder_id,โ€ โ€œcustomer_id,โ€ and โ€œorder_date.โ€ We can use the ROW_NUMBER() function to generate a unique identifier for each row, and use the PARTITION BY clause to group the result set by โ€œcustomer_idโ€:

SELECT order_id, customer_id, order_date, 
ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY order_date) as row_num
FROM orders

This query will return the โ€œorder_id,โ€ โ€œcustomer_id,โ€ โ€œorder_date,โ€ and a unique identifier for each row called โ€œrow_num.โ€

Example result table:

SUM(), AVG(), MIN(), and MAX()

The SUM() window function

The SUM() window function is used to calculate the running total of a specified column over a partition of rows in a result set, without grouping the rows. It can be useful for calculating cumulative sums, running totals, and rolling averages.

Suppose we have a table called โ€œsalesโ€ with columns โ€œregion,โ€ โ€œproduct,โ€ โ€œsales_date,โ€ and โ€œsales_amount,โ€ and we want to calculate the total sales amount for each region, ordered by product and sales date.

SELECT region, product, sales_date, sales_amount, 
SUM(sales_amount) OVER (PARTITION BY region ORDER BY product, sales_date) as total_sales
FROM sales

This query adds a SUM() window function to calculate the running total of โ€œsales_amountโ€ for each unique value of the โ€œregionโ€ column, ordered by โ€œproductโ€ and โ€œsales_date.โ€

Another example, joining two tables, orders and [Order Details], calculating a new column โ€œtotal_order_amountโ€ with the SUM or UnitPrice by Quantity for each Customer (PARTITION BY)

SELECT orders.OrderID, 
CustomerID,
YEAR(OrderDate),
UnitPrice * Quantity as OrderAmount,
SUM(UnitPrice * Quantity) OVER (PARTITION BY CustomerID) as total_order_amount
FROM orders
JOIN [Order Details] od on orders.OrderID = od.OrderID;

Here is the reults table

The AVG() window function

The AVG window function is used to calculate the average value of a column over a partition of rows in a result set. It is commonly used in analytical queries to find trends and patterns in data.

Suppose we have a table called โ€œemployeesโ€ with columns โ€œdepartment,โ€ โ€œemployee_name,โ€ โ€œsalary,โ€ and we want to calculate the average salary for each department.

SELECT department, employee_name, salary, 
AVG(salary) OVER (PARTITION BY department) as avg_salary
FROM employees

This query adds an AVG() window function to calculate the average salary for each unique value of the โ€œdepartmentโ€ column.

The MIN() and MAX() window functions

The MIN() and MAX() window functions are used to find the minimum and maximum values of a column over a partition of rows in a result set. Here are some examples of using these functions with the Northwind database:

  1. Example of using MIN() window function:

Suppose we want to find the earliest order date for each customer in the Northwind database. We can use the MIN() window function to find the minimum order date for each unique value of the โ€œCustomerIDโ€ column:

SELECT CustomerID, OrderDate, 
MIN(OrderDate) OVER (PARTITION BY CustomerID) as EarliestOrderDate
FROM Orders

This query adds a MIN() window function to find the earliest โ€œOrderDateโ€ for each unique value of the โ€œCustomerIDโ€ column.

Example of result table

2. Example of using MAX() window function:

Suppose we want to find the most expensive product in each category in the Northwind database. We can use the MAX() window function to find the maximum unit price for each unique value of the โ€œCategoryIDโ€ column:

SELECT ProductID, ProductName, UnitPrice, CategoryID,
MAX(UnitPrice) OVER (PARTITION BY CategoryID) as MaxUnitPrice
FROM Products

This query adds a MAX() window function to find the maximum โ€œUnitPriceโ€ for each unique value of the โ€œCategoryIDโ€ column.

LEAD() and LAG()

LEAD and LAG are window functions in SQL that allow accessing data from other rows within the same result set. They can be used to retrieve data from a previous or next row, respectively, based on a given order.

The LAG() function returns the value of a specified column from the row that precedes the current row, while the LEAD() function returns the value of a specified column from the row that follows the current row. Both functions take three arguments: the first is the column that will be retrieved, the second is the offset (how many rows to move forward or backward), and the third is the default value (to be used when the requested offset goes beyond the limits of the window).

Conclusion

Window functions are a powerful tool in SQL that allow for advanced data analysis and manipulation. The SUM(), MIN(), MAX(), and AVG() functions allow for calculations of aggregate values within a specified window or partition of data. The ROW_NUMBER() function assigns a unique integer value to each row within a result set. The PARTITION BY clause allows for further partitioning of data for calculations. These functions can be used to provide valuable insights into data sets and make complex queries simpler to construct.


Leave a Reply

Your email address will not be published. Required fields are marked *