Hadoop Framework and Ecosystem


In the previous blog in this Hadoop tutorial series, we discussed Hadoop, its features, and its core components. The next step is to understand the Hadoop ecosystem, an essential topic to grasp before you start working with Hadoop.
The Hadoop ecosystem is neither a programming language nor a single service; it is a platform, or framework, for solving big data problems. You can think of it as a suite of services that cover ingesting, storing, analyzing, and maintaining data. Let us get a brief idea of how these services work, both individually and together.
Below are the components that together form the Hadoop ecosystem. I will be covering each of them in this blog:



  • HDFS -> Hadoop Distributed File System
  • YARN -> Yet Another Resource Negotiator
  • MapReduce -> Data Processing using Programming
  • Spark -> In-memory Data Processing
  • Pig, Hive -> Data Processing Services using Queries (Pig Latin scripts, SQL-like HiveQL)
  • HBase -> NoSQL Database
  • Mahout, Spark MLlib -> Machine Learning
  • Apache Drill -> SQL on Hadoop
  • ZooKeeper -> Managing Cluster
  • Oozie -> Job Scheduling
  • Flume, Sqoop -> Data Ingestion Services
  • Solr & Lucene -> Searching & Indexing
  • Ambari -> Provision, Monitor and Maintain Cluster
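
To make the "data processing using programming" idea behind MapReduce a little more concrete, here is a minimal sketch of a word count written in the map, shuffle, and reduce style. It is plain Python running in a single process, not Hadoop's actual Java API, and the function names (map_phase, shuffle_phase, reduce_phase, word_count) are made up for illustration. On a real cluster, the framework would run the map and reduce steps in parallel across nodes and read the input from HDFS.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group emitted values by key, the step that sits between map and reduce."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Collapse all values for one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run the three phases in order over an in-memory list of lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical sample input; a real job would read files from HDFS.
    sample = ["hadoop stores data", "spark processes data"]
    print(word_count(sample))
```

The point of splitting the work this way is that each map call only sees one line and each reduce call only sees one key, so the calls do not depend on each other and can be spread over many machines.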

I hope this blog was informative and added value for you. If you are interested in learning more, you can go through this blog, which tells you how Big Data is used in industries and how Hadoop is revolutionizing analytics.
