A Real-Time Data Streaming Solution for a Digital Marketing Company Using Kafka and AWS

About Client:

The client is a digital marketing agency driving results for global iGaming leaders with end-to-end marketing strategies and seamless execution.

Background:

The client aimed to modernize their data integration approach to support real-time campaigns and decision-making for multiple brands. 

Their existing infrastructure relied on a traditional batch ETL framework, which they wanted to replace with a unified, real-time streaming architecture.

The key objective was to establish a secure, cost-efficient, and scalable pipeline that could deliver real-time data for accurate and timely reporting.

Challenge:

  • Compatibility with Existing Infrastructure: The client’s platform provider already operated a Kafka-based on-prem cluster for one of their gaming brands. Extending this setup to support real-time data for other brands introduced complexities.
  • Stringent Security Requirements: The client mandated robust security, including certificate-based authentication, which rendered certain managed services like AWS MSK Connect unsuitable.
  • Cost Considerations: While cloud solutions such as Confluent Cloud offered robust features, their pricing and scaling model did not align with the client’s budget and requirements.
  • Limited Flexibility in Managed Services: AWS-managed Kafka services lacked granular control over topics and the ability to pause, reset, or replay streams, capabilities critical for operational flexibility.
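The pause/reset/replay capability called out above maps directly onto stock Kafka tooling once you self-manage the cluster. As an illustration (broker address, consumer group, topic, and connector names below are placeholders, not the client's actual values), the operations look roughly like this:

```shell
# Pause a Kafka Connect connector via the Connect REST API
curl -X PUT http://connect-host:8083/connectors/s3-sink-brand-a/pause

# Rewind a consumer group to replay a topic from the beginning
# (the group must be inactive while offsets are reset)
kafka-consumer-groups.sh --bootstrap-server broker:9092 \
  --group reporting-consumer \
  --topic brand-a.events \
  --reset-offsets --to-earliest --execute

# Resume the connector once downstream systems are ready
curl -X PUT http://connect-host:8083/connectors/s3-sink-brand-a/resume
```

These commands target a live cluster, which is exactly the point: managed offerings that hide the Connect REST API and consumer-group tooling remove this lever.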

Solution:

After thorough research and validation via a proof of concept (POC), a bespoke architecture was developed to address the client’s unique requirements:

  • Tool Selection: Apache Kafka (Open Source) was chosen for its flexibility, cost-effectiveness, and ability to meet stringent security protocols.
  • Data Ingestion: Configured a Kafka Connect cluster on secure, auto-scaling EC2 instances to ingest data from Kafka topics.
  • Data Storage: Implemented a Kafka S3 Sink Connector to store data in Amazon S3, organized into a timestamp-based folder structure for easy retrieval and processing.
  • Data Processing: AWS Lambda functions triggered by S3 events processed the ingested data and loaded it into Aurora PostgreSQL staging tables for downstream analysis.
  • Monitoring and Metrics: Amazon CloudWatch was leveraged to track logs, create custom metrics, and set alarms for error detection and performance monitoring.
  • Alerting and Notifications: Integrated CloudWatch Alarms with SNS and Slack to provide real-time alerts and notifications.
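As a sketch of the ingestion and storage steps, a Kafka S3 Sink Connector configuration along the following lines writes topic data into a timestamp-partitioned S3 layout. The connector class and property names are those of Confluent's open-source S3 sink; the bucket, topic, and flush settings are illustrative, not the client's actual values:

```json
{
  "name": "s3-sink-brand-a",
  "config": {
    "connector.class": "io.confluent.connect.s3.S3SinkConnector",
    "topics": "brand-a.events",
    "s3.bucket.name": "streaming-landing-zone",
    "s3.region": "eu-west-1",
    "storage.class": "io.confluent.connect.s3.storage.S3Storage",
    "format.class": "io.confluent.connect.s3.format.json.JsonFormat",
    "partitioner.class": "io.confluent.connect.storage.partitioner.TimeBasedPartitioner",
    "path.format": "'year'=YYYY/'month'=MM/'day'=dd/'hour'=HH",
    "partition.duration.ms": "3600000",
    "timestamp.extractor": "Record",
    "locale": "en-GB",
    "timezone": "UTC",
    "flush.size": "1000",
    "rotate.interval.ms": "60000"
  }
}
```

The `TimeBasedPartitioner` with an hourly `path.format` is what produces the timestamp-based folder structure described above, and `flush.size` / `rotate.interval.ms` control how often objects land in S3, which in turn drives the S3-event triggers for the processing step.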
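The Lambda processing step can be sketched as follows. The event-parsing logic is standard for S3 event notifications; the actual object fetch (boto3) and the load into the Aurora PostgreSQL staging tables are deliberately left as comments so the sketch stays self-contained, and all names are hypothetical:

```python
from urllib.parse import unquote_plus


def extract_objects(event: dict) -> list[tuple[str, str]]:
    """Pull (bucket, key) pairs out of an S3 event notification.

    S3 URL-encodes object keys in event payloads, so keys are decoded here.
    """
    return [
        (rec["s3"]["bucket"]["name"], unquote_plus(rec["s3"]["object"]["key"]))
        for rec in event.get("Records", [])
    ]


def lambda_handler(event, context):
    """Skeleton handler: in the real pipeline this would read each object
    with boto3 and insert the rows into an Aurora PostgreSQL staging table
    (e.g. via psycopg2); those calls are omitted here."""
    objects = extract_objects(event)
    for bucket, key in objects:
        # body = boto3.client("s3").get_object(Bucket=bucket, Key=key)["Body"]
        # ...parse records and load them into the staging table...
        print(f"processing s3://{bucket}/{key}")
    return {"processed": len(objects)}
```

Because the S3 sink writes `year=.../month=...` prefixes, the decoded key carries the event-time partition, which the real handler can reuse when routing rows to staging tables.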

Outcome:

  • Successfully transitioned both existing and new brands’ data to a real-time streaming architecture.
  • Real-time streaming enabled accurate, timely reporting and supported AI/ML models and advanced analytics for deeper insights.
  • Delivered a cost-effective solution that meets all functional and security requirements, achieving a 75% cost reduction compared to both Confluent Cloud’s Enterprise version and AWS MSK Connect.
  • Gained operational flexibility with full control over consumption of Kafka topics.
  • Provided a flexible, secure architecture that can accommodate future use cases and scale with the client’s growing portfolio, allowing additional brands or business units to be onboarded without significant rework.
