Skip to main content

Data Engineering - Tools & Intro

Data Engineering - Tools & Intro So I just realized that I am here after a month or so. I was busy at work and traveling. I am starting a kind of new series, I say it Data Engineering Series in which I will be discussing different tools. Of course, I am not able to discuss the entire concept of Data Engineering neither I know it as I will be learning myself. What is Data Engineering? Data Engineering is all about developing, maintaining systems that are responsible for transferring data in large volumes and make it available for analysts and data scientists to use it for analyzing and data modeling. Data engineering is a superset of Data Science or the subset, not clear to me but the collaboration of data engineers and scientists fruits useful data-driven solutions. Data Engineering tools It consists of several tools. Some are dealing with data storage while others with analysis and ETL. Ofcourse, Apache Kafka is one of them. The others tools that I might be covering are Apache Airflow, an ETL tool and Hadoop Ecosystem components like HDFS, Hive, Yarn, Pig etc. There is no such specific roadmap so tools can be covered in any order. Since I mostly work in Python, Java so will be trying my best to find some way to interact with Python or Java but it is not necessary as most of Hadoop related systems are in either Java or Scala. So, stay tuned and I will be back shortly with the new post.

Comments

Popular posts from this blog

gRPC with Java : Build Fast & Scalable Modern API & Microservices using Protocol Buffers

gRPC Java Master Class : Build Fast & Scalable Modern API for your Microservice using gRPC Protocol Buffers gRPC is a revolutionary and modern way to define and write APIs for your microservices. The days of REST, JSON and Swagger are over! Now writing an API is easy, simple, fast and efficient. gRPC is created by Google and Square, is an official CNCF project (like Docker and Kubernetes) and is now used by the biggest tech companies such as Netflix, CoreOS, CockRoachDB, and so on! gRPC is very popular and has over 15,000 stars on GitHub (2 times what Kafka has!). I am convinced that gRPC is the FUTURE for writing API for microservices so I want to give you a chance to learn about it TODAY. Amongst the advantage of gRPC: 1) All your APIs and messages are simply defined using Protocol Buffers 2) All your server and client code for any programming language gets generated automatically for free! Saves you hours of programming 3) Data is compact and serialised 4) API ...

What is Big Data ?

What is Big Data ? It is now time to answer an important question – What is Big Data? Big data, as defined by Wikipedia, is this: “Big data is a broad term for  data sets  so large or complex that traditional  data processing  applications are inadequate. Challenges include  analysis , capture,  data curation , search,  sharing ,  storage , transfer ,  visualization ,  querying  and  information privacy . The term often refers simply to the use of  predictive analytics  or certain other advanced methods to extract value from data, and seldom to a particular size of data set.” In simple terms, Big Data is data that has the 3 characteristics that we mentioned in the last section – • It is big – typically in terabytes or even petabytes • It is varied – it could be a traditional database, it could be video data, log data, text data or even voice data • It keeps increasing as new data keeps flowing in This kin...

GraphQL - A Short Intro

Why GraphQL is the future of APIs Since the beginning of the web, developing APIs has been a difficult task for developers. The way we develop our APIs must evolve with time so that we can always build good, intuitive and well-designed APIs. In the last few years, GraphQL has been growing in popularity among developers. A lot of companies have started adopting this technology to build their APIs. GraphQL is a query language developed by Facebook in 2012 and released publicly in 2015. It has been gaining a lot of traction. It has been adopted by a lot of big companies such as Spotify, Facebook, GitHub, NYTimes, Netflix, Walmart, and so on. In this series of tutorials, we’re going to examine GraphQL, understand what it is, and see what features make this query language so intuitive and easy to use. So, let’s get started by examining the problems with REST, and how GraphQL solves them. We will also find out why companies have been building their APIs with GraphQL, and ...