Kafka - Architecture What is Kafka? Kafka is an event-streaming platform that is designed to process high volumes of data in real-time. Developed by LinkedIn in 2011, it has quickly become the infrastructural backbone of companies like Netflix, Twitter, and Spotify. Why do we need Kafka? In today’s data-driven world, tracking information like user clicks, recommendations, and shopping carts can be invaluable for a company’s growth. With these analytics, companies can make the product improvements needed to boost user engagement and conversion rates. However, on sites with millions of daily users, collecting and analyzing this data is nontrivial. Kafka was des i gned to streamline this operation, acting as a robust tool that maintains efficient, real-time processing capabilities with incredible quantities of data. For instance, as of late 2019, LinkedIn’s Kafka deployments were managing more than 7 trillion messages per day. How does Kafka work? Kafka provides a structured architecture
Data Engineering - Tools & Intro So I just realized that I am here after a month or so. I was busy at work and traveling. I am starting a kind of new series, I say it Data Engineering Series in which I will be discussing different tools. Of course, I am not able to discuss the entire concept of Data Engineering neither I know it as I will be learning myself. What is Data Engineering? Data Engineering is all about developing, maintaining systems that are responsible for transferring data in large volumes and make it available for analysts and data scientists to use it for analyzing and data modeling. Data engineering is a superset of Data Science or the subset, not clear to me but the collaboration of data engineers and scientists fruits useful data-driven solutions. Data Engineering tools It consists of several tools. Some are dealing with data storage while others with analysis and ETL. Ofcourse, Apache Kafka is one of them. The others tools that I might be covering are Apache