
Posts

Showing posts from 2017

Artificial Intelligence vs. Machine Learning vs. Deep Learning

The world as we know it is moving towards machines, big time. But machines cannot do much on their own; they still require a great deal of human interaction. To change that, we need to give machines some kind of intelligence. This is where artificial intelligence comes in: the concept of machines being smart enough to carry out numerous tasks without any human intervention. The terms artificial intelligence and machine learning often cause confusion, and many of us don't know exactly what the difference between them is, so we end up using the terms interchangeably. Machine learning is, essentially, the set of techniques by which machines learn from data, and it is one way to achieve artificial intelligence. Deep learning is the latest development in the artificial intelligence field; it is one of the ways to implement machine learning to achieve AI. Most of us have seen AI-based movies with machines having their own intelligence, like the Terminator series or I, Robot. But in real life, the co…
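To make the distinction concrete, here is a minimal sketch (not from the post) contrasting a hand-coded rule with a model that learns the same pattern from examples; the spam-filter task and the use of scikit-learn are illustrative assumptions:

```python
# A hand-coded "AI" rule vs. a model that learns the rule from data.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Rule-based approach: the intelligence is hard-coded by a human.
def rule_based_spam_check(message):
    return "win money" in message.lower()

# Machine-learning approach: the model infers the pattern from examples.
messages = ["win money now", "meeting at noon", "win money fast", "lunch today?"]
labels = [1, 0, 1, 0]  # 1 = spam, 0 = not spam

vectorizer = CountVectorizer()
features = vectorizer.fit_transform(messages)

model = MultinomialNB()
model.fit(features, labels)

print(rule_based_spam_check("Win money today!"))                  # True
print(model.predict(vectorizer.transform(["win money today"])))   # [1]
```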

Apache Spark vs. Hadoop MapReduce

The term big data has already created a lot of hype in the business world. Hadoop and Spark are both big data frameworks; they provide some of the most popular tools used to carry out common big data tasks. In this article, we will cover the differences between Spark and Hadoop MapReduce.
Introduction
Spark: An open-source big data framework that provides a faster, more general-purpose data processing engine. Spark is designed for fast computation and covers a wide range of workloads, for example batch, interactive, iterative, and streaming.
Hadoop MapReduce: Also an open-source framework for writing applications. It processes structured and unstructured data stored in HDFS. Hadoop MapReduce is designed to process large volumes of data on a cluster of commodity hardware, and it processes data in batch mode.
Data Processing
Spark: Apache Spark is a good fit for both batch processing and stream proce…
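To give a feel for Spark's programming model, here is a minimal PySpark batch job (a sketch, not from the article; the input path is a hypothetical placeholder):

```python
# Minimal PySpark batch job: word count over a text file.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("WordCount").getOrCreate()

# Path is hypothetical; point it at any text file in HDFS or locally.
lines = spark.sparkContext.textFile("hdfs:///data/input.txt")

counts = (lines.flatMap(lambda line: line.split())   # split lines into words
               .map(lambda word: (word, 1))          # emit (word, 1) pairs
               .reduceByKey(lambda a, b: a + b))     # sum counts per word

for word, count in counts.take(10):
    print(word, count)

spark.stop()
```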

Recommendation Engines - Know-How

Recommendation engines perform a variety of tasks, but the most important one is to find the products that are most relevant to the user. Content-based filtering, collaborative filtering, and association rules are common approaches for doing so. So let's first understand the basics of recommendation engines, and later on we'll build our own recommendation engine! High-quality, personalized recommendations are the holy grail for every online store. Unlike offline stores, online stores have no salespeople; users, on the other hand, have limited time and patience and are often not sure what they are looking for, while online stores carry a huge number of products. Recommendations help users navigate the maze of an online store, find what they are looking for, and find things they might like but didn't know of. In short, recommendations help online stores solve the problem of discovery. But how? Let's explain. Online stores have data: 1) what users bought, 2) what users browsed, 3) what users clicked, 4) wh…
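As a taste of the collaborative-filtering approach mentioned above, here is a minimal user-based sketch over a toy ratings matrix (the data and the scoring scheme are illustrative assumptions, not from the post):

```python
# Minimal user-based collaborative filtering over a toy ratings matrix.
import numpy as np

# Rows = users, columns = products; 0 means "not rated/bought".
ratings = np.array([
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [1, 0, 5, 4],
    [0, 1, 4, 5],
], dtype=float)

def cosine_sim(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

target = 0  # recommend for the first user
sims = np.array([cosine_sim(ratings[target], ratings[u]) for u in range(len(ratings))])
sims[target] = 0  # ignore self-similarity

# Predicted score per product: similarity-weighted average of other users' ratings.
predicted = sims @ ratings / sims.sum()

# Recommend the unrated product with the highest predicted score.
unrated = np.where(ratings[target] == 0)[0]
best = unrated[np.argmax(predicted[unrated])]
print(f"Recommend product {best} (predicted score {predicted[best]:.2f})")
```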

What is Apache Ambari?

What is Apache Ambari? It provides a highly interactive dashboard that allows administrators to visualize the progress and status of every application running on the Hadoop cluster. Its flexible and scalable user interface allows a range of tools, such as Pig, MapReduce, and Hive, to be installed on the cluster, and manages their performance in a user-friendly fashion. Some of the key features of this technology are:
- Instantaneous insight into the health of the Hadoop cluster using pre-configured operational metrics
- User-friendly configuration providing an easy step-by-step guide for installation
- Dependency and performance monitoring by visualizing and analyzing jobs and tasks
- Authentication, authorization, and auditing through Kerberos-based Hadoop clusters
- Flexible and adaptive technology that fits perfectly in the enterprise environment
How is Ambari different from ZooKeeper? This description may confuse you, as ZooKeeper performs…
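Alongside the dashboard, Ambari also exposes a REST API; here is a minimal sketch of checking service health with it (the host, cluster name, and credentials are hypothetical placeholders):

```python
# Query the Ambari REST API for service health (host/credentials hypothetical).
import requests

AMBARI = "http://ambari-host:8080/api/v1"
AUTH = ("admin", "admin")  # default credentials; change these in production

# List the services in the cluster and print each one's state (e.g. STARTED).
resp = requests.get(f"{AMBARI}/clusters/mycluster/services", auth=AUTH)
resp.raise_for_status()

for item in resp.json()["items"]:
    name = item["ServiceInfo"]["service_name"]
    detail = requests.get(item["href"], auth=AUTH).json()
    print(name, detail["ServiceInfo"]["state"])
```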

Basics Of Hadoop Ecosystem - 1

Basics of Hadoop Ecosystem – 1. Developed back in 2005 by Doug Cutting and Mike Cafarella, Hadoop is an open-source framework, written in Java, for storing and processing humongous amounts of data in a distributed computing environment. Built on commodity hardware, Hadoop works on the basic assumption that hardware failures are common; these failures are handled by the Hadoop framework. Click here to read Part 2... What is the Hadoop Ecosystem? The Hadoop Ecosystem refers to the various components of the Apache Hadoop software library: a set of tools and accessories that address particular needs in pr…
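To make "processing humongous amounts of data in a distributed environment" concrete, here is a sketch of the classic word-count job written for Hadoop Streaming, where the mapper and reducer are plain Python scripts (an illustration, not from the post; in practice the two functions live in separate files):

```python
# Word count as a Hadoop Streaming job. In practice mapper() and reducer()
# would be two separate scripts reading stdin and writing stdout, run with
# something like (paths hypothetical):
#   hadoop jar hadoop-streaming.jar -files mapper.py,reducer.py \
#     -mapper mapper.py -reducer reducer.py -input /data/in -output /data/out
import sys

def mapper():
    # Emit a (word, 1) pair for every word on stdin.
    for line in sys.stdin:
        for word in line.split():
            print(f"{word}\t1")

def reducer():
    # Hadoop sorts mapper output by key, so equal words arrive contiguously.
    current, total = None, 0
    for line in sys.stdin:
        word, count = line.rsplit("\t", 1)
        if word != current:
            if current is not None:
                print(f"{current}\t{total}")
            current, total = word, 0
        total += int(count)
    if current is not None:
        print(f"{current}\t{total}")
```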

Hadoop As Big Data

Introduction: In this blog, I will discuss Big Data, its characteristics, different sources of Big Data, and some key components of the Hadoop Framework. In this two-part blog series, I will cover the basics of the Hadoop Ecosystem. Let us start with Big Data and its importance to the Hadoop Framework. Ethics, privacy, and security measures are very important and need to be taken care of while dealing with the challenges of Big Data. Big Data: when the data itself becomes part of the problem. Data is crucial for all organizations and has to be stored for future use. The term Big Data refers to data that is beyond the storage capacity and the processing power of an organization. What are the sources of this huge data? There are many, such as social networks, CCTV cameras, sensors, online shopping portals, hospitality data, GPS, the automobile industry, etc., all of which generate data massively. Big Data can be characterized by: the Volume of the data, the Velocity of…

Basics Of Hadoop Ecosystem - 2

Basics of Hadoop Ecosystem – 2. Apache Ambari is a web-based tool for provisioning, managing, and monitoring Apache Hadoop clusters. It has a very simple yet highly interactive UI for installing various tools and performing management, configuration, and monitoring tasks. Introduction: In part 2 of the blog series I will cover the other core components of a Hadoop framework, including querying, external integration, data exchange, coordination, and the managing and monitoring of Hadoop clusters. Please refer to Basics of Hadoop Ecosystem Part 1... QUERYING: Pig seems quite useful; however, I am more of a SQL person. For those of us who still like SQL, there is SQL for Hadoop. HIVE: Hive is a distributed data warehouse built atop HDFS to manage and organize large amounts of data. It provides a query language based on SQL semantics (HiveQL), which the runtime engine translates into MapReduce jobs for querying the data. Schematized data store…
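To illustrate the HiveQL-to-MapReduce flow, here is a minimal sketch of running a query from Python via PyHive (the host, table, and columns are hypothetical; the point is that Hive compiles the SQL into MapReduce jobs over data in HDFS):

```python
# Run a HiveQL query from Python via PyHive (host and table are hypothetical).
from pyhive import hive

conn = hive.connect(host="hive-server-host", port=10000, username="hadoop")
cursor = conn.cursor()

# An aggregate like this typically compiles to a map phase (scan/filter)
# and a reduce phase (grouped aggregation) over files stored in HDFS.
cursor.execute("""
    SELECT page, COUNT(*) AS hits
    FROM web_logs
    GROUP BY page
    ORDER BY hits DESC
    LIMIT 10
""")

for page, hits in cursor.fetchall():
    print(page, hits)
```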