Data Engineering - Tools & Intro
I just realized it has been a month or so since my last post. I was busy with work and traveling.
I am starting a new series of sorts, which I am calling the Data Engineering Series, in which I will discuss different tools. Of course, I cannot cover the entire field of Data Engineering, nor do I know all of it, as I will be learning along the way myself.
What is Data Engineering?
Data Engineering is about developing and maintaining the systems that move data in large volumes and make it available to analysts and data scientists for analysis and data modeling. Whether Data Engineering is a superset or a subset of Data Science is not clear to me, but the collaboration of data engineers and data scientists produces useful data-driven solutions.
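To make the idea of moving data and making it available a bit more concrete, here is a minimal sketch of an extract, transform, load (ETL) step in plain Python. It is only an illustration under my own assumptions: the sample CSV, the `payments` table, and the function names are all made up for this example, and a real pipeline would read from files, APIs, or a message queue and write to a proper data store rather than an in-memory SQLite database.

```python
import csv
import io
import sqlite3

# Hypothetical raw input; stands in for a file, API response, or queue message.
RAW_CSV = """user_id,amount
1,10.50
2,
1,4.25
3,7.00
"""


def extract(text):
    """Read raw CSV text into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Drop rows with a missing amount and convert fields to proper types."""
    return [(int(r["user_id"]), float(r["amount"])) for r in rows if r["amount"]]


def load(records, conn):
    """Write cleaned records into a table that analysts can query."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS payments (user_id INTEGER, amount REAL)"
    )
    conn.executemany("INSERT INTO payments VALUES (?, ?)", records)
    conn.commit()


def run_pipeline(text=RAW_CSV):
    """Run extract, transform, and load against an in-memory database."""
    conn = sqlite3.connect(":memory:")
    load(transform(extract(text)), conn)
    return conn


conn = run_pipeline()
# The kind of query an analyst might run once the data is available.
totals = conn.execute(
    "SELECT user_id, SUM(amount) FROM payments GROUP BY user_id ORDER BY user_id"
).fetchall()
print(totals)  # [(1, 14.75), (3, 7.0)]
```

The row with the empty amount is dropped in the transform step, so only three records reach the table. The tools discussed in this series do the same kind of job at a much larger scale.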
Data Engineering tools
Data Engineering involves several tools. Some deal with data storage, while others deal with analysis and ETL. Apache Kafka is, of course, one of them. Other tools I might cover are Apache Airflow, a workflow scheduler commonly used for ETL pipelines, and Hadoop ecosystem components like HDFS, Hive, YARN, and Pig. There is no specific roadmap, so the tools may be covered in any order. Since I mostly work in Python and Java, I will try my best to find ways to interact with these tools from Python or Java, though that may not always be possible, as most Hadoop-related systems are written in either Java or Scala.
Stay tuned, and I will be back shortly with a new post.