Home Intermediate To SQL Server
Post
Cancel

Intermediate To SQL Server

  • Exploring Data With Aggregation

    • Reviewing summarized values for each column is a common first step in analyzing data
    • If the data exists in a database, fastest way to aggregate is to use SQL
  • Common Summary Statistics
    • MIN() for the minimum value of a column
    • MAX() for the maximum value of a column
    • AVG() for the mean or average value of a column

    Example:

    1
    2
    3
    4
    5
    
          SELECT
              AVG(InternetUse) AS MeanInternetUse,
              MIN(InternetUse) AS MINInternet,
              MAX(InternetUse) AS MAXInternet
          FROM EconomicIndicators
    
  • Filtering Summary Data with Where

    This SQL query filters the aggregated values using a WHERE caluse

    1
    2
    3
    4
    5
    6
    
          SELECT 
              AVG(InternetUse) AS MeanInternetUse,
              MIN(InternetUse) AS MINInternet,
              MAX(InternetUse) AS MAXInternet
          FROM EconomicIndicators
          WHERE Country = 'Solomon Islands'
    
  • Subtotaling Aggregations into Groups with GROUP BY
    1
    2
    3
    4
    5
    6
    7
    
          SELECT 
              Country,
              AVG(InternetUse) AS MeanInternetUse,
              MIN(InternetUse) AS MINInternet,
              MAX(InternetUse) AS MAXInternet
          FROM EconomicIndicators
          GROUP BY Country
    
  • HAVING is the WHERE for Aggregations

    Cannot use WHEREwith GROUP BY as it will give error This throws as error

    1
    2
    3
    
          ...
          GROUP BY
          WHERE MAX(InternetUse) > 100
    

    Instead, use HAVING This is how you filter with a GROUP BY

    1
    2
    
          GROUP BY
          HAVING MAX(InternetUse) > 100
    

    Example:

    1
    2
    3
    4
    5
    6
    7
    8
    
          SELECT 
              Country,
              AVG(InternetUse) AS MeanInternetUse,
              MIN(GDP) AS SmallestGDP,
              MAX(InternetUse) AS MAXInternetUse
          FROM EconomicIndicators
          GROUP BY Country
          HAVING MAX(InternetUse) > 100
    

  • Finding and Resolving Missing Data

  • Detecting Missing Values

    • When you have no data, the empty database field contains the word NULL
    • Because NULL is not a number, it is not possible to use =,<,> to find or compare missing values
    • To determine if a column contains a NULL value, use IS NULL and IS NOT NULL
  • Returning No NULL Values in T-SQL

    1
    2
    3
    4
    5
    6
    
          SELECT
              Country,
              InternetUse, 
              Year
          FROM EconomicIndicators
          WHERE InternetUse IS NOT NULL
    

    The above SQL query returns records where the value is InternetUse field is not NULL

  • Detecting NULLs in T-SQL

    1
    2
    3
    4
    5
    6
    
          SELECT
              Country,
              InternetUse, 
              Year
          FROM EconomicIndicators
          WHERE InternetUse IS NULL
    

    The above SQL query returns records where the value is InternetUse field is NULL

  • Blank Is Not Null

    • A blank is not the same as a NULL value
    • May show up in columns containing text
    • An empty string '' can be used to find blank values
    • The best way is to look for a column where the Length or LEN > 0
    1
    2
    3
    4
    5
    6
    
          SELECT 
              Country, 
              GDP, 
              Year
          FROM EconomicIndicators
          WHERE LEN(GDP) > 0
    
  • Substituting missing data with a specific value using ISNULL

    1
    2
    3
    4
    5
    
          SELECT 
              GDP, 
              Country,
              ISNULL(Country, 'Unknown') AS NewCountry
          FROM EconomicIndicators
    

    The above SQL query returns a new column called NewCountry which has all the same values as the Country field but the only difference being every NULL value is replaced by Unknown keyword.

  • Substituting missing data with a column using ISNULL

    Substituting values from one column or another with ISNULL

    1
    2
    3
    4
    5
    
          SELECT
              TradeGDPPercent,
              ImportGoodPercent,
              ISNULL(TradeGDPPercent,ImportGoodPercent) AS NewPercent
          FROM EconomicIndicators
    
  • Substituting NULL values using COALESCE

    COALESCE returns the first non-missing value
    Syntax: COALESCE(value_1, value_2, value_3,….,value_n)

    • If value_1 is NULL and value_2 is not NULL, return value_2
    • If value_1 and value_2 are NULL and value_3 is not NULL, return value_3

    Example:

    1
    2
    3
    4
    5
    
          SELECT
              TradeGDPPercent, 
              ImportGoodPercent
          COALESCE(TradeGDPPercent, ImportGoodPercent, 'N/A') AS NewPercent
          FROM EconomicIndicators
    

  • Changing column values with CASE

    Syntax:
    CASE
    WHEN Boolean_expression THEN result_expression [ …n ]
    [ ELSE else_result_expression ]
    END

  • Changing column values with CASE IN T-SQL

    1
    2
    3
    4
    5
    6
    
          SELECT
              Continent
          CASE WHEN Continent = 'Europe' or Continent = 'Asia' THEN 'Eurasia'
              ELSE 'Other'
              END AS NewContinent
          FROM EconomicIndicators
    
  • Using CASE statements to create value groups

    We are binning the data here into discrete groups

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    
          SELECT
              Country,
              LifeExp,
          CASE WHEN LifeExp < 30 Then 1
              WHEN LifeExp > 29 AND LifeExp < 40 THEN 2
              WHEN LifeExp > 39 AND LifeExp < 50 THEN 3
              WHEN LifeExp > 49 AND LifeExp < 60 THEN 4
              ELSE 5
              END AS LifeExpGroup
          FROM EconomicIndicators
          WHERE Year = 2007
    
This post is licensed under CC BY 4.0 by the author.