Grouping data in T-SQL
1 2 3 4 5 6 7
SELECT SalesPerson, SalesYear, CurrentQuota, ModifiedDate FROM SaleGoal WHERE SalesYear = 2011
Window syntax in T-SQL
- Create the window with
OVER
caluse PARTITION BY
creates the frame- If you do not include
PARTITION BY
the frame is the entire table - To arrange the results, use
ORDER BY
- Allows aggregations to be created at the same time as the window
1 2
-- Create a Window Data Grouping OVER (PARTITION BY SalesYear ORDER BY SalesYear)
- Create the window with
Window Functions (SUM)
1 2 3 4 5 6 7 8
SELECT SalesPerson, SalesYear, CurrentQuota, SUM(CurrentQuota) OVER (PARTITION BY SalesYear) AS YearlyTotal, ModifiedDate AS ModDate FROM SaleGoal
Window Functions (COUNT)
1 2 3 4 5 6 7 8
SELECT SalesPerson, SalesYear, CurrentQuota, COUNT(CurrentQuota) OVER (PARTITION BY SalesYear) AS QuotaPerYear, ModifiedDate as ModDate FROM SaleGoal
Common Window Functions
FIRST_VALUE() and LAST_VALUE()
FIRST_VALUE()
returns the first value in the windowLAST_VALUE()
returns the last value in the windowNote that for
FIRST_VALUE
andLAST_VALUE
theORDER BY
command is required
1 2 3 4 5 6 7 8 9 10 11 12 13 14
-- Select the columns SELECT SalesPerson, SalesYear, CurrentQuota, -- First value from every window FIRST_VALUE(CurrentQuota) OVER (PARTITION BY SalesYear ORDER BY ModifiedDate) AS StartQuota, -- Last value from every window LAST_VALUE(CurrentQuota) OVER (PARTITION BY SalesYear ORDER BY ModifiedDate) AS EndQuota, ModifiedDate as ModDate FROM SaleGoal
Gettting the next value with LEAD()
- Provides the ability to query the value from the next row
- NextQuota Column is created by using
LEAD()
- Requires the use of
ORDER BY
to order the rows
1 2 3 4 5 6 7 8 9
SELECT SalesPerson, SalesYear, CurrentQuota, -- Create a window function to get the values from the next row LEAD(CurrentQuota) OVER (PARTITION BY SalesYear ORDER BY ModifiedDate) AS NextQuota, ModifiedDate AS ModDate FROM SaleGoal
Getting the previous value with LAG()
- Provides the ability to query the value from the previous row
- PreviousQuota Column is created by using
LAG()
- Requires the use of
ORDER BY
to order the rows
1 2 3 4 5 6 7 8 9
SELECT SalesPerson, SalesYear, CurrentQuota, -- Create a window function to get the values from the previous row LAG(CurrentQuota) OVER (PARTITION BY SalesYear ORDER BY ModifiedDate) AS PreviousQuota, ModifiedDate AS ModDate FROM SaleGoal
Increasing Window Complexity
Reviewing Aggregations
1 2 3 4 5 6 7 8
SELECT SalesPerson, SalesYear, CurrentQuota, SUM(CurrentQuota) OVER (PARTITION BY SalesYear) AS YearlyTotal, ModifiedDate AS ModDate FROM SaleGoal
Adding ORDER BY to an aggregation
1 2 3 4 5 6 7 8
SELECT SalesPerson, SalesYear, CurrentQuota, SUM(CurrentQuota) OVER (PARTITION BY SalesYear ORDER BY SalesPerson) AS YearlyTotal, ModifiedDate AS ModDate FROM SaleGoal
Creating a running total with ORDER BY
1 2 3 4 5 6 7 8
SELECT SalesPerson, SalesYear, CurrentQuota, SUM(CurrentQuota) OVER (PARTITION BY SalesYear ORDER BY ModifiedDate) AS RunningTotal, ModifiedDate AS ModDate FROM SaleGoal
Adding row numbers
ROW_NUMBER()
sequentially numbers the rows in the windowORDER BY
is required when usingROW_NUMBER()
1 2 3 4 5 6 7
SELECT SalesPerson, SalesYear, CurrentQuota, ROW_NUMBER() OVER (PARTITION BY SalesPerson ORDER BY SalesYear) AS QuotabySalesPerson FROM SaleGoal
Using Windows For Calculating Statistics
Calculating the Standard Deviation
- Calculate standard deviation either for the entire table or for each window
STDEV()
calculates the standard deviation
Calculating the standard deviation for the entire table
1 2 3 4 5 6 7 8
SELECT SalesPerson, SalesYear, CurrentQuota, STDEV(CurrentQuota) OVER() AS StandardDev, ModifiedDate as ModDate FROM SaleGoal
Calculating the standard deviation for each partition
1 2 3 4 5 6 7 8
SELECT SalesPerson, SalesYeaer, CurrentQuota, STDEV(CurrentQuota) OVER (PARTITION BY SalesYear ORDER BY SalesYear) AS StDev, ModifiedDate AS ModDate FROM SaleGoal
Calculating the Mode
- Mode is the value which appears the most often in your data
- To calculate mode:
- Create a CTE containing an ordered count of values using ROW_NUMBER
- Write a query using the CTE to pick the value with the highest row number
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
WITH QuotaCount AS ( SELECT SalesPerson, SalesYear, CurrentQuota, ROW_NUMBER() OVER (PARTITION BY CurrentQuota ORDER BY CurrentQuota) AS QuotaList FROM SaleGoal ) SELECT CurrentQuota, QuotaList AS Mode FROM QuotaCount WHERE QuotaList In (SELECT MAX(QuotaList) FROM QuotaCount)
Advanced Window Functinos