
Hadoop As Big Data

Introduction:
In this blog, I will discuss Big Data, its characteristics, different sources of Big Data, and some key components of the Hadoop framework.
In this two-part blog series, I will cover the basics of the Hadoop ecosystem.
Let us start with Big Data and its importance to the Hadoop framework. Ethics, privacy, and security measures are very important and need to be taken care of when dealing with the challenges of Big Data.

Big Data: When the data itself becomes part of the problem.
Data is crucial for all organizations, and it has to be stored for future use. We can define Big Data as data that is beyond the storage capacity and the processing power of an organization.
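To make this definition concrete, here is a minimal Python sketch. The Capacity and Workload classes, the is_big_data function, and the example figures are hypothetical and exist only for illustration. The check also rests on an assumption: exceeding either the storage limit or the processing limit is treated as enough to call the data "big".

```python
from dataclasses import dataclass

MB = 1024 ** 2  # bytes in a mebibyte
TB = 1024 ** 4  # bytes in a tebibyte


@dataclass
class Capacity:
    """What an organization can handle (hypothetical model)."""
    storage_bytes: int            # total disk space available
    processing_bytes_per_s: int   # sustained processing throughput


@dataclass
class Workload:
    """The data an organization has to handle (hypothetical model)."""
    total_bytes: int              # data that must be stored
    arrival_bytes_per_s: int      # rate at which new data arrives


def is_big_data(workload: Workload, capacity: Capacity) -> bool:
    """Assumption: exceeding either limit is enough to count as Big Data."""
    exceeds_storage = workload.total_bytes > capacity.storage_bytes
    exceeds_processing = workload.arrival_bytes_per_s > capacity.processing_bytes_per_s
    return exceeds_storage or exceeds_processing


# Hypothetical organization: 50 TB of disk, 200 MB/s of processing throughput
capacity = Capacity(storage_bytes=50 * TB, processing_bytes_per_s=200 * MB)
print(is_big_data(Workload(total_bytes=120 * TB, arrival_bytes_per_s=50 * MB), capacity))  # True
print(is_big_data(Workload(total_bytes=10 * TB, arrival_bytes_per_s=50 * MB), capacity))   # False
```

The threshold here belongs to the organization, not to the data. The same 120 TB would not count as Big Data for a company that had 500 TB of disk.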

What are the sources of this huge data?
There are many sources of data that generate massive amounts of it, such as social networks, CCTV cameras, sensors, online shopping portals, hospitality data, GPS, and the automobile industry.
Big Data can be characterized by:
  • Volume of Data
  • Velocity of Data
  • Variety of Data
Volume of Data: Data is growing rapidly, from gigabytes (GB) to terabytes (TB), petabytes (PB) and beyond, and storing it requires huge amounts of disk space.
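The jump from GB to TB to PB is easy to see with a little arithmetic. The sketch below formats byte counts in 1024-based units, a common convention in storage tools; strictly speaking these are KiB, MiB and so on. The human_size helper and the 3 TB/day ingest rate are hypothetical and chosen only to show how quickly volume grows.

```python
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB"]


def human_size(num_bytes: int) -> str:
    """Format a byte count using 1024-based units (assumption: binary units)."""
    size = float(num_bytes)
    for unit in UNITS:
        # Stop at the first unit where the value drops below 1024, or at the largest unit
        if size < 1024 or unit == UNITS[-1]:
            return f"{size:.1f} {unit}"
        size /= 1024
    raise AssertionError("unreachable")  # the loop always returns on the last unit


TB = 1024 ** 4
daily_ingest = 3 * TB  # hypothetical: 3 TB of new data per day
print(human_size(daily_ingest))        # 3.0 TB
print(human_size(daily_ingest * 365))  # 1.1 PB after one year
```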
Velocity of Data: Velocity is the speed at which data is generated and must be processed. Huge amounts of data are stored in data centres to meet organizational needs, and getting this data to local workstations in time requires high-speed data processing.
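A quick back-of-the-envelope calculation shows why speed matters. The sketch below estimates how long it takes to move or scan a dataset at a given throughput. The transfer_hours helper, the 100 MB/s channel, and the 50-channel comparison are hypothetical figures. The parallel case also assumes perfect linear scaling, which real systems rarely achieve.

```python
MB = 1024 ** 2
TB = 1024 ** 4


def transfer_hours(num_bytes: int, throughput_bytes_per_s: float) -> float:
    """Hours needed to move or scan num_bytes at a sustained throughput."""
    if throughput_bytes_per_s <= 0:
        raise ValueError("throughput must be positive")
    return num_bytes / throughput_bytes_per_s / 3600


# Hypothetical figures: 1 TB read through one 100 MB/s channel,
# versus the same data split evenly across 50 such channels in parallel
single = transfer_hours(1 * TB, 100 * MB)
parallel = transfer_hours(1 * TB, 50 * 100 * MB)  # assumes perfect linear scaling
print(f"single channel: {single:.2f} h")           # single channel: 2.91 h
print(f"50 channels:    {parallel * 60:.1f} min")  # 50 channels:    3.5 min
```

Spreading the work across many machines, instead of pushing everything through one, is the general idea that distributed frameworks build on.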
Variety of Data: Data can be broadly classified into three types: structured, semi-structured, and unstructured.
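Each of the three types needs different handling. The snippet below uses small made-up samples; the order, user, and post data are all hypothetical. It relies only on Python's standard library:

- csv for structured rows with a fixed schema
- json for semi-structured records whose fields vary from one record to the next
- plain string processing for unstructured text

```python
import csv
import io
import json

# Structured: fixed schema, every row has the same columns (e.g. a CSV export)
structured_text = "order_id,customer,amount\n1001,alice,25.50\n1002,bob,12.00\n"
rows = list(csv.DictReader(io.StringIO(structured_text)))
total_amount = sum(float(row["amount"]) for row in rows)

# Semi-structured: self-describing keys, fields may differ between records (e.g. JSON)
semi_structured_text = '[{"user": "alice", "tags": ["gps"]}, {"user": "bob", "device": "cctv-7"}]'
records = json.loads(semi_structured_text)
all_keys = sorted({key for record in records for key in record})

# Unstructured: no predefined schema (e.g. a social media post), so any structure
# has to be extracted by custom processing; here just a naive word split
unstructured_text = "Loved the new store layout! Checkout was quick."
words = unstructured_text.lower().split()

print(total_amount)  # 37.5
print(all_keys)      # ['device', 'tags', 'user']
print(len(words))    # 8
```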
Big Data = (Volume + Velocity + Variety) of Data