InfoBlueprint

Click here to see our Press Centre for Latest News and Events

To discuss your requirements
with one of our information quality
experts, or to hear more about our
services, please contact us for an
informal, no obligation, discussion.

C06 Data Warehouse design

3 Day Course
Course Overview:
This course will cover all aspects of Data Warehouse design, including:

  • Differences between the design of a transactional database and a warehouse  database.
  • Components of the warehouse architecture (e.g. source database, operational data store, staging database, warehouse database, data marts, ETL software).
  • Inmon versus Kimball approach to data warehouse design.
  • Star versus snowflake schema, pros and cons of each.
  • High-level overview of the ETL process and why it is necessary.
  • Areas of complexity within the ETL process:
  • Multiple source systems.
  • Different data structures / formats in each source system.
  • Missing data and generation of appropriate values.
  • Identifying which records to extract (i.e. identifying new and updated data).
  • Transformation of inconsistent formats / values.
  • Validation of ETL process (e.g. footer and header records; control totals).
  • Effects of updates in the source database on transformed (i.e. derived, aggregated) data in the target.
  • Types of tables in a data warehouse (i.e. Dimension versus Fact).
  • The hierarchies inherent in many dimension tables (e.g. the  Time dimension).
    • Use of surrogate keys in warehouse tables.
    • The various ways of dealing with slowly-changing dimensions (i.e. Overwrite the dimension record; Add a new dimension record; Create new fields in the dimension record) and when to use which).
    • The various ways of dealing with Rapidly Changing Dimensions, or Large Slowly Changing Dimensions.
    • Multi-use dimensions (also referred to as junk dimensions) in which a number of small, unrelated dimensions are combined into a single physical dimension. This can greatly reduce the size of the fact table by reducing the number of foreign keys in fact table records. Often the combined dimension will be pre-populated with the Cartesian product of all dimension values.
    • Measures in fact tables: Additive and Non-additive Measures; calculated measures.
    • Some performance improvement approaches (e.g. partitioning – such as where a separate physical table is created for each month of data) and pros and cons of each approach.
    • Warehouse metadata (e.g. data lineage, schema change history, data type usage, ETL statistics).

Most of the above will be introduced to the students through hands-on, practical learning. For example, the actual warehouse database design and ETL process will be covered in great detail, as students will be required to design, in groups, solutions to a business scenario that we have created. We will then (as a group) evaluate each alternative, and discuss the pros and cons of each, while at the same time introducing (or reinforcing) each of the concepts mentioned in the list above.

Duration:   
3 days (8.00 to 16.00)                   

Intended participants: 
Anyone involved in the design of a Data Warehouse.

Presenter:
Amberjack Systems Engineering.  Two trainers are used for the duration of the course, which ensures that all participants receive the necessary focused attention during practical exercises. Both trainers have extensive experience of Data Warehousing. 

Pre-requisites
None. However, it would be very advantageous to have attended the ‘Introduction to Data Modelling’ course.

Nr of participants:
Minimum 8    Maximum 12

Registration:
For Course Schedules, Locations, Fees and Registration information:  see www.infoblueprint.co.za > Services > Education and Coaching,  or contact Tammy on 021 551 2410.   
We can also present this course ‘in-house’ if required. Please contact Alan Snow on 079 907 7680 to discuss your requirements.