Key Aggregation Functions
- Counts
COUNT()
COUNT_BIG()
COUNT(DISTINCT)
- Other Aggregates
SUM()
MIN()
MAX()
- Counts
What counts with COUNT()
- Number of Rows
COUNT(*)
COUNT(1)
COUNT(1/0)
- Non-NULL Values
COUNT(d.YR)
COUNT(NULLIF(d.YR, 1990))
- Number of Rows
Distinct Counts
1 2 3 4
SELECT COUNT(DISTINCT c.CalendarYear) AS Years, COUNT(DISTINCT NULLIF(c.CalendarYear, 2010)) AS Y2 FROM dbo.Calendar c;
Filtering aggregates with CASE
1 2 3 4 5 6 7 8 9 10
SELECT MAX(CASE WHEN ir.IncidentTypeId = 1 THEN ir.IncidentDate ELSE NULL END) AS I1, MAX(CASE WHEN ir.IncidentTypeID = 2 THEN ir.IncidentDate ELSE NULL END) AS I2, FROM dbo.IncidentRollup ir;
Statistical Aggregate Functions
AVG()
: MeanSTDEV()
: Standard DeviationSTDEVP()
: Population Stadard DeviationVAR()
: VarianceVARP()
: Population Variance
What about median?
1 2 3 4 5
SELECT TOP(1) PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY l.SomeVal DESC) OVER () AS MedianIncidents FROM dbo.LargeTable l;
The cost of median
Median Mean Est. Cost 95.7% 4.3% Duration 68.5s 0.37s CPU 68.5s 8.1s Reads 72,560,946 39,468 Writes 87,982 0
Downsampling and Upsampling Data
Data in nature
1 2 3
SELECT SomeDate FROM dbo.SomeTable
Downsampling Data
1 2 3
SELECT CAST(SomeDate AS DATE) AS SomeDate FROM dbo.SomeTable
Further Downsampling
1 2 3
SELECT DATEADD(HOUR, DATEDIFF(HOUR,0,SomeDate), 0) AS SomeDate FROM dbo.SomeTable
DATEDIFF(HOUR,0,'2019-08-11 06:21:16') = 1,048,470
DATEADD(HOUR< 1048748,0) = 2019-08-11 06:00:00
What about upsampling?
Downsampling Upsampling Aggregate data Disaggregate data Can usually sum or coutn results Need an allocation rule Provides a higher-level picture of the data Provides artifical granularity Acceptable for most purposes Acceptable for data generation, calculated averages
Grouping by ROLLUP, CUBE, and GROUPING SETS
Hierarchial rollups with ROLLUP
1 2 3 4 5 6 7 8 9 10 11 12
SELECT t.Month, t.Day, SUM(t.Events) AS Events FROM TABLE GROUP BY t.Month, t.Day WITH ROLLUP ORDER BY t.Month, t.Day;
Cartesian aggregation with CUBE
1 2 3 4 5 6 7 8 9 10 11 12
SELECT t.IncidentType, t.Office, SUM(t.Events) AS Events FROM TABLE GROUP BY t.IncidentType, t.Office WITH ROLLUP ORDER BY t.IncidentType, t.Office;
Define grouping sets with GROUPING SETS
1 2 3 4 5 6 7 8 9 10 11 12 13
SELECT t.IncidenType, t.Office, SUM(t.Events) AS Events FROM TABLE GROUP BY GROUPING SETS ( (t.IncidentType, t.Office), () ) ORDER BY t.IncidentType, t.Office;