Hadoop Framework and Ecosystem
In the previous blog of this Hadoop Tutorial, we discussed Hadoop, its features, and its core components. The next step is to understand the Hadoop Ecosystem, an essential topic to cover before you start working with Hadoop.
The Hadoop Ecosystem is neither a programming language nor a single service; it is a platform, or framework, for solving big data problems. You can think of it as a suite of services for ingesting, storing, analyzing, and maintaining data. Let us get a brief idea of how these services work individually and together.
Below are the components that together form the Hadoop Ecosystem. I will be covering each of them in this blog:
- HDFS -> Hadoop Distributed File System
- YARN -> Yet Another Resource Negotiator
- MapReduce -> Programmatic Data Processing
- Spark -> In-memory Data Processing
- Pig, Hive -> Data Processing Services using SQL-like Queries
- HBase -> NoSQL Database
- Mahout, Spark MLlib -> Machine Learning
- Apache Drill -> SQL on Hadoop
- ZooKeeper -> Managing the Cluster
- Oozie -> Job Scheduling
- Flume, Sqoop -> Data Ingestion Services
- Solr & Lucene -> Searching & Indexing
- Ambari -> Provisioning, Monitoring and Maintaining the Cluster
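To make the MapReduce entry in the list a little more concrete, here is a minimal sketch of the map, shuffle, and reduce steps as a word count. It is plain Python running in memory on one machine, not the Hadoop API; the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are made up for illustration, and a real Hadoop job would distribute these steps across the cluster.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Group values by key, roughly what happens between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values collected for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in sequence over an in-memory list of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical sample input, standing in for files stored in HDFS.
    sample = ["Hadoop stores data", "Hadoop processes data"]
    print(word_count(sample))
```

The point of the sketch is the shape of the computation: the map step looks at each record on its own, and the reduce step looks at each key on its own, which is what allows the work to be split across many machines.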
I hope this blog was informative and added value for you. If you are interested in learning more, you can go through our blogs on how Big Data is used in industries and how Hadoop is revolutionizing analytics.