Data Mesh: Netflix's Real-Time Data Platform Built with Open-Source Technologies #571

sh-aps opened this issue Jan 19, 2025

Netflix leverages real-time data processing to power a wide range of applications, from personalised recommendations to game analytics and live events. At the heart of this is Data Mesh, a general-purpose data movement and processing platform built with open-source technologies. This session will explore the architecture and key features of Data Mesh, emphasising its reliance on, and contributions to, the open-source ecosystem.

Originally developed for Change Data Capture (CDC), Data Mesh has evolved into a scalable platform for moving data between Netflix systems. It has a modular design, with a control plane (the Data Mesh Controller) and a data plane (the Data Mesh Pipeline). The data plane runs its processing as Flink jobs, with Kafka as the underlying transport layer. Data Mesh manages over 20,000 Flink jobs and 50,000 Kafka topics, processing over 60 million events per second at peak traffic.

This talk will demonstrate how Data Mesh directly uses, and benefits from, key open-source technologies such as Flink, Kafka and Avro. We will delve into the core components of Data Mesh and show how open-source tools are used to achieve scalable, reliable data processing. Key concepts include:

● Connectors
● Processors
● Streaming SQL
● Schema Management
● Data Lineage

This talk is particularly relevant to the open-source community because it highlights a practical, large-scale application of several prominent open-source technologies. Through its extensive use of Flink, Kafka and Avro, Data Mesh provides valuable feedback to these communities. We'll discuss how Data Mesh is evolving to meet new challenges and the roadmap to make it the recommended solution for data movement and processing at Netflix. Finally, we'll share the lessons learned from building and operating a platform at this scale, and where Data Mesh is headed next, including its role in powering AI and machine learning use cases.

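
The abstract stays at the architecture level, so the following is only a minimal, self-contained sketch to make the control-plane/data-plane split and some of the listed concepts (connectors, processors, schema management, data lineage) more concrete. It is not Netflix's code or API: `Pipeline`, `Controller`, `can_read` and the topic names are hypothetical, source and sink topic names stand in for connectors, Python lists stand in for Kafka topics, and plain functions stand in for processors that Data Mesh runs as Flink jobs. It also assumes, purely for illustration, that the control plane rejects a pipeline whose schemas are incompatible; that check is a simplified form of Avro's schema-resolution rule (a reader field missing from the writer must declare a default), ignoring types and aliases.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Callable, List, Optional

# Hypothetical processor type: takes one event and returns a transformed event,
# or None to drop it. In Data Mesh, processing runs as Flink jobs instead.
Processor = Callable[[dict], Optional[dict]]


def can_read(reader: dict, writer: dict) -> bool:
    """Simplified Avro-style schema resolution (field names only, no types).

    A reader schema can decode data written with a writer schema if every
    reader field is present in the writer or declares a default; extra
    writer fields are ignored.
    """
    writer_fields = {f["name"] for f in writer["fields"]}
    return all(f["name"] in writer_fields or "default" in f
               for f in reader["fields"])


@dataclass
class Pipeline:
    """Hypothetical data-plane unit: source topic -> processors -> sink topic."""
    source_topic: str
    sink_topic: str
    processors: List[Processor] = field(default_factory=list)

    def run(self, events: list) -> list:
        out = []
        for event in events:
            for proc in self.processors:
                event = proc(event)
                if event is None:  # a filter processor dropped this event
                    break
            else:  # no processor dropped the event
                out.append(event)
        return out


class Controller:
    """Hypothetical control plane: checks schemas and records lineage edges."""

    def __init__(self) -> None:
        self.edges = defaultdict(set)  # topic -> directly downstream topics
        self.pipelines = []

    def deploy(self, pipeline: Pipeline, writer: dict, reader: dict) -> None:
        if not can_read(reader, writer):
            raise ValueError(f"incompatible schema on {pipeline.source_topic}")
        self.edges[pipeline.source_topic].add(pipeline.sink_topic)
        self.pipelines.append(pipeline)

    def downstream(self, topic: str) -> set:
        """All topics transitively fed by `topic` (a simple lineage query)."""
        seen, stack = set(), [topic]
        while stack:
            for nxt in self.edges.get(stack.pop(), ()):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen


writer = {"fields": [{"name": "id"}, {"name": "title"}, {"name": "region"}]}
reader = {"fields": [{"name": "id"}, {"name": "title"},
                     {"name": "rating", "default": 0}]}

controller = Controller()
enrich = Pipeline("titles.cdc", "titles.enriched", [
    lambda e: e if e["region"] == "US" else None,   # filter processor
    lambda e: {**e, "title": e["title"].upper()},   # transform processor
])
controller.deploy(enrich, writer, reader)
controller.deploy(Pipeline("titles.enriched", "titles.search"), writer, reader)

print(enrich.run([{"id": 1, "title": "a", "region": "US"},
                  {"id": 2, "title": "b", "region": "BR"}]))
# [{'id': 1, 'title': 'A', 'region': 'US'}]
print(sorted(controller.downstream("titles.cdc")))
# ['titles.enriched', 'titles.search']
```

Keeping validation and lineage tracking in the controller, while each pipeline only moves and transforms events, mirrors the modular control-plane/data-plane split described in the abstract. Streaming SQL is not modelled in this sketch.
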
Developing Data Mesh, and sharing its architecture and best practices, reflects a commitment to engaging with and contributing to the open-source community, as well as an understanding of the challenges of real-time data processing. This presentation offers practical insights for anyone interested in real-time data processing, scalable architecture, and the application of open-source technologies in large-scale data platforms. It illustrates how open-source components can be combined to build complex, innovative systems.
