Exploring Data With Aggregation
- Reviewing summarized values for each column is a common first step in analyzing data
 - If the data exists in a database, fastest way to aggregate is to use SQL
 
- Common Summary Statistics
MIN()for the minimum value of a columnMAX()for the maximum value of a columnAVG()for the mean or average value of a column
Example:
1 2 3 4 5
SELECT AVG(InternetUse) AS MeanInternetUse, MIN(InternetUse) AS MINInternet, MAX(InternetUse) AS MAXInternet FROM EconomicIndicators
 - Filtering Summary Data with Where
This SQL query filters the aggregated values using a WHERE caluse
1 2 3 4 5 6
SELECT AVG(InternetUse) AS MeanInternetUse, MIN(InternetUse) AS MINInternet, MAX(InternetUse) AS MAXInternet FROM EconomicIndicators WHERE Country = 'Solomon Islands'
 - Subtotaling Aggregations into Groups with GROUP BY
1 2 3 4 5 6 7
SELECT Country, AVG(InternetUse) AS MeanInternetUse, MIN(InternetUse) AS MINInternet, MAX(InternetUse) AS MAXInternet FROM EconomicIndicators GROUP BY Country
 - HAVING is the WHERE for Aggregations
Cannot use
WHEREwithGROUP BYas it will give error This throws as error1 2 3
... GROUP BY WHERE MAX(InternetUse) > 100
Instead, use
HAVINGThis is how you filter with a GROUP BY1 2
GROUP BY HAVING MAX(InternetUse) > 100
Example:
1 2 3 4 5 6 7 8
SELECT Country, AVG(InternetUse) AS MeanInternetUse, MIN(GDP) AS SmallestGDP, MAX(InternetUse) AS MAXInternetUse FROM EconomicIndicators GROUP BY Country HAVING MAX(InternetUse) > 100
 
Finding and Resolving Missing Data
Detecting Missing Values
- When you have no data, the empty database field contains the word 
NULL - Because 
NULLis not a number, it is not possible to use=,<,>to find or compare missing values - To determine if a column contains a 
NULLvalue, useIS NULLandIS NOT NULL 
- When you have no data, the empty database field contains the word 
 Returning No NULL Values in T-SQL
1 2 3 4 5 6
SELECT Country, InternetUse, Year FROM EconomicIndicators WHERE InternetUse IS NOT NULL
The above SQL query returns records where the value is InternetUse field is not
NULLDetecting NULLs in T-SQL
1 2 3 4 5 6
SELECT Country, InternetUse, Year FROM EconomicIndicators WHERE InternetUse IS NULL
The above SQL query returns records where the value is InternetUse field is
NULLBlank Is Not Null
- A blank is not the same as a NULL value
 - May show up in columns containing text
 - An empty string 
''can be used to find blank values - The best way is to look for a column where the Length or LEN > 0
 
1 2 3 4 5 6
SELECT Country, GDP, Year FROM EconomicIndicators WHERE LEN(GDP) > 0
Substituting missing data with a specific value using ISNULL
1 2 3 4 5
SELECT GDP, Country, ISNULL(Country, 'Unknown') AS NewCountry FROM EconomicIndicators
The above SQL query returns a new column called NewCountry which has all the same values as the Country field but the only difference being every NULL value is replaced by
Unknownkeyword.Substituting missing data with a column using ISNULL
Substituting values from one column or another with ISNULL
1 2 3 4 5
SELECT TradeGDPPercent, ImportGoodPercent, ISNULL(TradeGDPPercent,ImportGoodPercent) AS NewPercent FROM EconomicIndicators
Substituting NULL values using COALESCE
COALESCEreturns the first non-missing value
Syntax: COALESCE(value_1, value_2, value_3,….,value_n)- If 
value_1isNULLandvalue_2is notNULL, returnvalue_2 - If 
value_1andvalue_2areNULLandvalue_3is notNULL, returnvalue_3 
Example:
1 2 3 4 5
SELECT TradeGDPPercent, ImportGoodPercent COALESCE(TradeGDPPercent, ImportGoodPercent, 'N/A') AS NewPercent FROM EconomicIndicators
- If 
 
Changing column values with CASE
Syntax:
CASE
WHEN Boolean_expression THEN result_expression [ …n ]
[ ELSE else_result_expression ]
ENDChanging column values with CASE IN T-SQL
1 2 3 4 5 6
SELECT Continent CASE WHEN Continent = 'Europe' or Continent = 'Asia' THEN 'Eurasia' ELSE 'Other' END AS NewContinent FROM EconomicIndicators
Using CASE statements to create value groups
We are binning the data here into discrete groups
1 2 3 4 5 6 7 8 9 10 11
SELECT Country, LifeExp, CASE WHEN LifeExp < 30 Then 1 WHEN LifeExp > 29 AND LifeExp < 40 THEN 2 WHEN LifeExp > 39 AND LifeExp < 50 THEN 3 WHEN LifeExp > 49 AND LifeExp < 60 THEN 4 ELSE 5 END AS LifeExpGroup FROM EconomicIndicators WHERE Year = 2007