Home OLTP and OLAP
Post
Cancel

OLTP and OLAP

  • How should we organize and manage data?

    • Schemas: How should my data be logically organized?
    • Normalization: Should my data have minimal dependency and redundancy?
    • Views: What joins will be done most ofte?
    • Access Control: Should all users of the data have the same level of access
    • DBMS: How do I pick between all the SQL and noSQL options?
  • Approaches to processing data

    OLTP
    OLAP
    Online Transaction ProcessingOnline Analytical Processing
    - Find the price of a book- Calculate books with best profit margin
    - Update latest customer transaction- Find most loyal customers
    - Keep track of employee hours- Decide employee of the month
  • OLVAP vs OLTP

     OLTPOLAP
    Purposesupport daily transactionsreport and analyze data
    Dataup-to-date, operationalconsolidated, historical
    Sizesnapshot, gigabytersarchive, terrabytes
    Queriessimple transactions & frequent updatescomplex, aggregate queries & limited updates
    Usersthousandshundreds

Storing Data

  • Structuring Data

    1. Structured data
      • Follows a schema
      • Defined data types & relationships
      • E.g:- SQL, tables in a relational database
    2. Unstuctured Data
      • Schemaless
      • Makes up most of data in the world
      • E.g: - Photos, chat logs, MP3
    3. Semi-structued data
      • Does not follow larger schema
      • Self-describing structure
      • E.g: - NoSQL, XML, JSON
      1
      2
      3
      4
      5
      6
      7
      8
      
           # Example of a JSON file
           'user': {
               'profile_use_background_image': true,
               'statuses_count': 31,
               'profile_background_color': 'CODEED',
               'followers_count': 3066,
               ...
           }
      
  • Storing data beyond traditional databases

    • Traditional databases
      • For storing real-time relational structured data? OLTP
    • Data warehouses
      • For analyzing achieved structured data? OLAP
    • Data lakes
      • For storing data of all structures = flexibility and scalability
      • For analyzing big data
  • Data Warehouses

    • Optimized for analytics - OLAP
      • Organized for reading/aggregating data
      • Usually read-only
    • Contains data from multiple sources
    • Massively Parallel Processing (MPP)
    • Typically uses a denormalized schema and dimensional modeling

    Data Marts

    • Subset of data warehouses
    • Dedicated to a specific topic

    image

  • Data Lakes

    • Store all types of data at a lower cost
      • E.g:- raw, operational databases, IoT device logs, real-time, relational and non-relational
      • Retains all data and can take up petabytes
      • Schema-on-read as opposed to schema-on-write
      • Need to catalog data otherwise becomes a data swamp
      • Run big data analytics using services such as Apache Spark and Hadoop
        • Useful for deep learning and data discovery because activites requires so much data
  • ETL

    image

  • ELT

    image


Database Design

  • What is database design?

    • Determines how data is logically stored
      • How is data going to be read and updated?
    • Uses *database models**: high-level specifications for database structure
      • Most popular: relational model
      • Some other options: NoSQL models, object-oriented model, network model
    • Use schemas: blueprint of the database
      • Defines tables, fields, relationships, indexes, and views
      • When inserting data in relational databases, schemas must be respected
  • Data Modeling

    Process of creating a data model for the data to be stored.

    1. Conceptual data model: describes entities, relationships, and attributes
      • Tools: data structure diagrams, e.g:- entity-relational diagrams and UML diagrams
    2. Logical data modle: defines tables, columns, relationships
      • Tools: database models and schemas, e.g:- relational model and star schema
    3. Physical data model: describes physical storage
      • Tools: partitions, CPUs, indexes, backup systems and tablespaces image
  • Beyond the relational model-Dimensional Modeling

    Adaptation of the relational model for data warehouse design

    • Optimized for OLAP queries: aggregate data, not updating (OLTP)
    • Built using the star schema
    • Easy to interpret and extend schema
  • Element of dimensional modeling

    image Organize by:

    • What is being analyzed?
    • How often do entities change?

    Fact tables

    • Decided by business use-case
    • Holds records of a metric
    • Changes regularly
    • Connects to dimensions via foreign keys

    Dimension tables

    • Holds descriptions of attributes
    • DOes not change as often
This post is licensed under CC BY 4.0 by the author.