Enterprise Data Warehouse for an Online Casino

About Client

The client is an award-winning online gaming group based in Malta.

Background

  • A fundamental part of the ‘Data-first approach’ is the data acting as the ‘single source of truth’ for the entire organization. This ensures transparency and readability, improves the reliability of the data, and yields more valuable insights from it. However, plenty of challenges arise along the way.
  • While some of these challenges are technical, most are business-related, such as incorrect decisions made while defining the data architecture or mismanagement of the master data.

Challenge

The client’s data architecture had a few serious issues that affected its overall performance.

  • The data architecture was not scalable.
  • The data was managed inconsistently, which made data processing very time-consuming.
  • The overall operational cost of maintaining the complex data system was high.

The Objective

Build a scalable EDW model that:

  • Speeds up processing time and ensures no queries fail.
  • Implements a standard data framework across the organization to consolidate multi-brand data for easy extraction and analysis.

Our Solution

  • Meetings were held with all the stakeholders on the client’s side to agree on single definitions of the KPIs and variables to be used across the organization. This process resolved the data-consistency issues.
  • A logical data model was designed along with data governance policies, which included uniform KPI definitions across the organization, improved user access control, and data security measures.
  • The EDW model was built in Amazon Redshift, using the client’s earlier data lake (Hive, with S3 as storage) as the source.
  • AWS Glue was used for the ETL jobs, with Apache Airflow orchestrating them (a sample DAG is sketched after this list). Redshift Spectrum was used to read data directly from S3, bypassing Hive, which had been partly responsible for slowing down the queries (see the Spectrum sketch below).
  • Python Shell Glue jobs were deemed the optimal way to call Redshift stored procedures, both to load data from the source into the warehouse and to process data within it (a minimal job sketch closes this section). This reduced query run times and operational cost significantly compared to conventional Spark jobs.
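To illustrate the Spectrum piece, the sketch below registers an external schema over the existing S3 data so Redshift can query it directly without going through Hive. All identifiers (cluster host, database, schema, table, and IAM role names) are hypothetical, and the connection uses the open-source redshift_connector driver; this is a minimal sketch, not the client’s actual setup.

```python
# Hypothetical sketch: expose S3 data to Redshift Spectrum via an external schema.
# All identifiers (host, database, role, table names) are made up for illustration.
import redshift_connector

conn = redshift_connector.connect(
    host="edw-cluster.example.eu-west-1.redshift.amazonaws.com",
    database="edw",
    user="etl_user",
    password="...",  # in practice, fetch from AWS Secrets Manager
)
conn.autocommit = True
cur = conn.cursor()

# Register the Glue Data Catalog database that describes the S3 files,
# letting Spectrum read them directly and bypass Hive entirely.
cur.execute("""
    CREATE EXTERNAL SCHEMA IF NOT EXISTS lake
    FROM DATA CATALOG
    DATABASE 'casino_lake'
    IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-spectrum-role'
""")

# Spectrum tables can then be queried (and joined to local tables) like any other.
cur.execute("SELECT brand_id, COUNT(*) FROM lake.bets GROUP BY brand_id")
print(cur.fetchall())
```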
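The orchestration layer could look roughly like the Airflow DAG below, which chains two pre-created Glue jobs. The DAG id, job names, and schedule are assumptions made for illustration; the operator comes from the apache-airflow-providers-amazon package.

```python
# Hypothetical sketch of the Airflow orchestration (Airflow 2.x syntax);
# job names and schedule are illustrative, not the client's configuration.
from datetime import datetime

from airflow import DAG
from airflow.providers.amazon.aws.operators.glue import GlueJobOperator

with DAG(
    dag_id="edw_daily_load",
    start_date=datetime(2023, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    # Stage raw S3 data into the warehouse via an existing Glue job.
    stage = GlueJobOperator(
        task_id="stage_source_data",
        job_name="edw_stage_source_data",
        wait_for_completion=True,
    )

    # Run the warehouse-side transformations once staging succeeds.
    transform = GlueJobOperator(
        task_id="transform_edw",
        job_name="edw_transform",
        wait_for_completion=True,
    )

    stage >> transform
```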
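Finally, a Glue Python Shell job that calls a Redshift stored procedure can be as small as the sketch below. The procedure name, connection details, and job argument are invented for illustration; getResolvedOptions is Glue’s standard helper for reading job parameters.

```python
# Hypothetical sketch of a Glue Python Shell job: lightweight enough to run
# without a Spark cluster, it simply calls a stored procedure so the heavy
# lifting happens inside Redshift itself.
import sys

import redshift_connector
from awsglue.utils import getResolvedOptions

# Job parameter passed in by the orchestrator; the name is illustrative.
args = getResolvedOptions(sys.argv, ["load_date"])

conn = redshift_connector.connect(
    host="edw-cluster.example.eu-west-1.redshift.amazonaws.com",
    database="edw",
    user="etl_user",
    password="...",  # in practice, fetch from AWS Secrets Manager
)
conn.autocommit = True
cur = conn.cursor()

# The stored procedure encapsulates the load/transform SQL inside Redshift,
# so the Glue job itself needs no Spark resources at all.
cur.execute("CALL edw.load_daily_facts(%s)", (args["load_date"],))
conn.close()
```

Because a Python Shell job runs on a fraction of a DPU rather than a Spark cluster, pushing the SQL into stored procedures this way is what drove down the ETL cost described under Outcome.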

Outcome

  • A scalable Enterprise Data Warehouse was built.
  • A standard data framework was built to consolidate multi-brand data, giving enterprise-wide data consistency at high processing speed.

  • Any future brand acquisitions will need minimal integration effort.

  • The total annual cost incurred for ETL was reduced by 90.44%, owing to the use of Redshift Spectrum and Python Shell Glue jobs.
  • The Enterprise Data Warehouse was built in a record 4 months for a client who needed it urgently.
  • The complete historical data from the client’s largest table, 1.3 TB compressed, was loaded into the EDW within 48 hours.