
Posts

KAFKA - Architecture

What is Kafka? Kafka is an event-streaming platform designed to process high volumes of data in real time. Developed at LinkedIn and open-sourced in 2011, it has quickly become the infrastructural backbone of companies like Netflix, Twitter, and Spotify.

Why do we need Kafka? In today’s data-driven world, tracking information like user clicks, recommendations, and shopping carts can be invaluable for a company’s growth. With these analytics, companies can make the product improvements needed to boost user engagement and conversion rates. However, on sites with millions of daily users, collecting and analyzing this data is nontrivial. Kafka was designed to streamline this operation, acting as a robust tool that maintains efficient, real-time processing capabilities over incredible quantities of data. For instance, as of late 2019, LinkedIn’s Kafka deployments were handling more than 7 trillion messages per day.

How does Kafka work? Kafka provides a structured architecture…
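To make the producer side concrete, here is a minimal sketch using the official Kafka Java client. The broker address, the topic name (user-clicks), and the message contents are hypothetical placeholders, not details from the post.

```java
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class ClickProducer {
    public static void main(String[] args) {
        // Minimal producer configuration; the broker address is a placeholder.
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");

        // Publish one click event to the hypothetical "user-clicks" topic.
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("user-clicks", "user-42", "clicked:home-banner"));
        }
    }
}
```

A consumer subscribed to the same topic would then receive these events in real time, which is the pattern the post builds on.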
Recent posts

Data Engineering - Tools & Intro

So I just realized that I am back here after a month or so; I was busy with work and traveling. I am starting a new series, which I call the Data Engineering Series, in which I will discuss different tools. Of course, I cannot cover the entire field of Data Engineering, nor do I know it all yet, as I will be learning along the way.

What is Data Engineering? Data Engineering is all about developing and maintaining the systems responsible for transferring data in large volumes and making it available for analysts and data scientists to use for analysis and data modeling. Whether Data Engineering is a superset or a subset of Data Science is not clear to me, but the collaboration of data engineers and data scientists yields useful data-driven solutions.

Data Engineering tools. The field involves several tools: some deal with data storage, others with analysis and ETL. Of course, Apache Kafka is one of them. The other tools that I might be covering are Apache…

EVENT DRIVEN MICROSERVICES

Event Sourcing. In a Microservice Architecture, especially with a Database per Microservice, the microservices need to exchange data. For resilient, highly scalable, and fault-tolerant systems, they should communicate asynchronously by exchanging events. In such cases, you may want atomic operations, e.g., updating the database and sending the message in one step. If you have SQL databases and want distributed transactions over a high volume of data, you cannot use two-phase commit (2PC), as it does not scale. If you use NoSQL databases and want a distributed transaction, 2PC is not an option either, as many NoSQL databases do not support it. In such scenarios, use an event-based architecture with Event Sourcing.

In traditional databases, the business entity with its current “state” is stored directly. In Event Sourcing, every state-changing event (and any other significant event) is stored instead of the entity. It means the modifications of a business…
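To make this concrete, here is a minimal, hypothetical Java sketch of an event-sourced account: instead of persisting the current balance, we persist the events themselves and rebuild the state by replaying them. All class and event names are illustrative, not taken from the post.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical domain events; in Event Sourcing these are what gets persisted.
record MoneyDeposited(long amount) {}
record MoneyWithdrawn(long amount) {}

class Account {
    private final List<Object> events = new ArrayList<>(); // the event log
    private long balance; // derived state, never stored directly

    void deposit(long amount)  { apply(new MoneyDeposited(amount)); }
    void withdraw(long amount) { apply(new MoneyWithdrawn(amount)); }

    private void apply(Object event) {
        events.add(event); // persist the event, not the resulting state
        if (event instanceof MoneyDeposited d) balance += d.amount();
        if (event instanceof MoneyWithdrawn w) balance -= w.amount();
    }

    // The current state is reconstructed by replaying the stored events.
    static long replay(List<Object> events) {
        long balance = 0;
        for (Object e : events) {
            if (e instanceof MoneyDeposited d) balance += d.amount();
            if (e instanceof MoneyWithdrawn w) balance -= w.amount();
        }
        return balance;
    }
}
```

Because the event log is append-only, it doubles as an audit trail, and the same events can be published asynchronously to other microservices to keep them in sync.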

Usage: UseStringDeduplication : Pros and Cons

Let me start this article with some interesting statistics (based on research conducted by the JDK development team): about 25 percent of a Java application’s memory is filled with strings, 13.5 percent of application memory consists of duplicate strings, and the average string length is 45 characters. Yes, you read that right: on average, 13.5 percent of memory is wasted due to duplicate strings. To figure out how much memory your application is wasting because of duplicate strings, you may use tools like HeapHero, which can report how much memory is wasted on duplicate strings and other inefficient programming practices.

What Are Duplicate Strings? First, let’s understand what a duplicate string means. Look at the code snippet below:

String string1 = new String("Hello World");
String string2 = new String("Hello World");…
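Expanding that snippet into a runnable sketch (the class name is hypothetical): each new String(...) call creates a distinct object on the heap even though the contents are identical, which is exactly the waste the G1 collector’s -XX:+UseStringDeduplication flag targets.

```java
public class DuplicateStrings {
    public static void main(String[] args) {
        // Two distinct String objects with identical character content.
        String string1 = new String("Hello World");
        String string2 = new String("Hello World");

        System.out.println(string1 == string2);      // false: different objects on the heap
        System.out.println(string1.equals(string2)); // true: identical contents

        // Run with: java -XX:+UseG1GC -XX:+UseStringDeduplication DuplicateStrings
        // so G1 can let both strings share a single backing character array.
    }
}
```

Note that deduplication collapses only the duplicated character data behind the scenes; the String objects themselves remain distinct and program behavior is unchanged.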

Automatic Builds With GCP Cloud Build

If you are looking for an easy way to automatically build your application in the cloud, then Google Cloud Platform (GCP) Cloud Build may be for you. In this post, we will build a Spring Boot Maven project with Cloud Build, create a Docker image for it, and push the image to GCP Container Registry.

1. Introduction. Cloud Build is the build-server tooling of GCP, something similar to Jenkins. But Cloud Build is available out of the box in your GCP account, and that is a major advantage. The only thing you need is a build configuration file in your git repository containing the build steps. Each build step runs in its own Docker container. Several cloud builders that can be used as build steps are generally available; you can read more about Cloud Build on the overview and concepts pages of the GCP website. There are three categories of build steps: official cloud builders provided by GCP; community cloud builders; and you can create your…
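As a rough sketch of such a build configuration file, a cloudbuild.yaml for a Maven project could look like the following. The image name my-app is a placeholder; gcr.io/cloud-builders/mvn and gcr.io/cloud-builders/docker are official GCP cloud builders, and $PROJECT_ID is a built-in substitution.

```yaml
steps:
  # Build the Spring Boot jar with the official Maven cloud builder.
  - name: 'gcr.io/cloud-builders/mvn'
    args: ['package', '-DskipTests']
  # Build a Docker image for the application; "my-app" is a placeholder name.
  - name: 'gcr.io/cloud-builders/docker'
    args: ['build', '-t', 'gcr.io/$PROJECT_ID/my-app', '.']
# Images listed here are pushed to GCP Container Registry after the build.
images:
  - 'gcr.io/$PROJECT_ID/my-app'
```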