Skip to main content

Hadoop Framework and Ecosystem

Hadoop Framework and Ecosystem


In the previous blog on Hadoop Tutorial, we discussed about Hadoop, its features and core components. Now, the next step forward is to understand Hadoop Ecosystem. It is an essential topic to understand before you start working with Hadoop. 
Hadoop Ecosystem is neither a programming language nor a service, it is a platform or framework which solves big data problems. You can consider it as a suite which encompasses a number of services (ingesting, storing, analyzing and maintaining) inside it. Let us discuss and get a brief idea about how the services work individually and in collaboration.
Below are the Hadoop components, that together form a Hadoop ecosystem, I will be covering each of them in this blog:



  • HDFS -> Hadoop Distributed File System
  • YARN -> Yet Another Resource Negotiator
  • MapReduce -> Data processing using programming
  • Spark -> In-memory Data Processing
  • PIG, HIVE-> Data Processing Services using Query (SQL-like)
  • HBase -> NoSQL Database
  • Mahout, Spark MLlib -> Machine Learning
  • Apache Drill -> SQL on Hadoop
  • Zookeeper -> Managing Cluster
  • Oozie -> Job Scheduling
  • Flume, Sqoop -> Data Ingesting Services
  • Solr & Lucene -> Searching & Indexing 
  • Ambari -> Provision, Monitor and Maintain cluster

I hope this blog is informative and added value to you. If you are interested to learn more, you can go through this blog which tells you how Big Data is used in Industries and How Hadoop Is Revolutionizing Analytics.

Comments

Popular posts from this blog

Automatic Builds With GCP Cloud Build

Automatic Builds With GCP Cloud Build If you are looking for an easy way to automatically build your application in the cloud, then maybe Google Cloud Platform (GCP) Cloud Build is for you. In this post, we will build a Spring Boot Maven project with Cloud Build, create a Docker image for it, and push it to GCP Container Registry. 1. Introduction Cloud Build is the build server tooling of GCP, something similar as Jenkins. But, Cloud Build is available out-of-the-box in your GCP account and that is a major advantage. The only thing you will need is a build configuration file in your git repository containing the build steps. Each build step is running in its own Docker container. Several cloud builders which can be used as a build step are generally available. You can read more about Cloud Build on the  overview  and  concepts  website of GCP. There are three categories of build steps: Official  cloud builders provided by GCP; Community  cloud ...

EVENT DRIVEN MICROSERVICES

EVENT BASED MICROSERVICES - Event Sourcing In a Microservice Architecture, especially with Database per Microservice, the Microservices need to exchange data. For resilient, highly scalable, and fault-tolerant systems, they should communicate asynchronously by exchanging Events. In such a case, you may want to have Atomic operations, e.g., update the Database and send the message. If you have SQL databases and want to have distributed transactions for a high volume of data, you cannot use the two-phase locking (2PL) as it does not scale. If you use NoSQL Databases and want to have a distributed transaction, you cannot use 2PL as many NoSQL databases do not support two-phase locking. In such scenarios, use Event based Architecture with Event Sourcing. In traditional databases, the Business Entity with the current “state” is directly stored. In Event Sourcing, any state-changing event or other significant events are stored instead of the entities. It means the modifications of a Busines...

Introduction to Customer Segmentation in Python !!!

Introduction to Customer Segmentation in Python In this tutorial, you're going to learn how to implement customer segmentation using RFM(Recency, Frequency, Monetary) analysis from scratch in Python. In the Retail sector, the various chain of hypermarkets generating an exceptionally large amount of data. This data is generated on a daily basis across the stores. This extensive database of customers transactions needs to analyze for designing profitable strategies. All customers have different-different kind of needs. With the increase in customer base and transaction, it is not easy to understand the requirement of each customer. Identifying potential customers can improve the marketing campaign, which ultimately increases the sales. Segmentation can play a better role in grouping those customers into various segments. In this tutorial, you will cover the following topics: What is Customer Segmentation? Need of Customer Segmentation Types of Segmentation Customer...