Skip to main content

Hadoop Framework and Ecosystem

Hadoop Framework and Ecosystem


In the previous blog on Hadoop Tutorial, we discussed about Hadoop, its features and core components. Now, the next step forward is to understand Hadoop Ecosystem. It is an essential topic to understand before you start working with Hadoop. 
Hadoop Ecosystem is neither a programming language nor a service, it is a platform or framework which solves big data problems. You can consider it as a suite which encompasses a number of services (ingesting, storing, analyzing and maintaining) inside it. Let us discuss and get a brief idea about how the services work individually and in collaboration.
Below are the Hadoop components, that together form a Hadoop ecosystem, I will be covering each of them in this blog:



  • HDFS -> Hadoop Distributed File System
  • YARN -> Yet Another Resource Negotiator
  • MapReduce -> Data processing using programming
  • Spark -> In-memory Data Processing
  • PIG, HIVE-> Data Processing Services using Query (SQL-like)
  • HBase -> NoSQL Database
  • Mahout, Spark MLlib -> Machine Learning
  • Apache Drill -> SQL on Hadoop
  • Zookeeper -> Managing Cluster
  • Oozie -> Job Scheduling
  • Flume, Sqoop -> Data Ingesting Services
  • Solr & Lucene -> Searching & Indexing 
  • Ambari -> Provision, Monitor and Maintain cluster

I hope this blog is informative and added value to you. If you are interested to learn more, you can go through this blog which tells you how Big Data is used in Industries and How Hadoop Is Revolutionizing Analytics.

Comments

Popular posts from this blog

Let's Understand Ten Machine Learning Algorithms

Ten Machine Learning Algorithms to Learn Machine Learning Practitioners have different personalities. While some of them are “I am an expert in X and X can train on any type of data”, where X = some algorithm, some others are “Right tool for the right job people”. A lot of them also subscribe to “Jack of all trades. Master of one” strategy, where they have one area of deep expertise and know slightly about different fields of Machine Learning. That said, no one can deny the fact that as practicing Data Scientists, we will have to know basics of some common machine learning algorithms, which would help us engage with a new-domain problem we come across. This is a whirlwind tour of common machine learning algorithms and quick resources about them which can help you get started on them. 1. Principal Component Analysis(PCA)/SVD PCA is an unsupervised method to understand global properties of a dataset consisting of vectors. Covariance Matrix of data points is analyzed here to un

GraphQL - A Short Intro

Why GraphQL is the future of APIs Since the beginning of the web, developing APIs has been a difficult task for developers. The way we develop our APIs must evolve with time so that we can always build good, intuitive and well-designed APIs. In the last few years, GraphQL has been growing in popularity among developers. A lot of companies have started adopting this technology to build their APIs. GraphQL is a query language developed by Facebook in 2012 and released publicly in 2015. It has been gaining a lot of traction. It has been adopted by a lot of big companies such as Spotify, Facebook, GitHub, NYTimes, Netflix, Walmart, and so on. In this series of tutorials, we’re going to examine GraphQL, understand what it is, and see what features make this query language so intuitive and easy to use. So, let’s get started by examining the problems with REST, and how GraphQL solves them. We will also find out why companies have been building their APIs with GraphQL, and why

gRPC with Java : Build Fast & Scalable Modern API & Microservices using Protocol Buffers

gRPC Java Master Class : Build Fast & Scalable Modern API for your Microservice using gRPC Protocol Buffers gRPC is a revolutionary and modern way to define and write APIs for your microservices. The days of REST, JSON and Swagger are over! Now writing an API is easy, simple, fast and efficient. gRPC is created by Google and Square, is an official CNCF project (like Docker and Kubernetes) and is now used by the biggest tech companies such as Netflix, CoreOS, CockRoachDB, and so on! gRPC is very popular and has over 15,000 stars on GitHub (2 times what Kafka has!). I am convinced that gRPC is the FUTURE for writing API for microservices so I want to give you a chance to learn about it TODAY. Amongst the advantage of gRPC: 1) All your APIs and messages are simply defined using Protocol Buffers 2) All your server and client code for any programming language gets generated automatically for free! Saves you hours of programming 3) Data is compact and serialised 4) API