Skip to main content

Let's Understand Ten Machine Learning Algorithms

Ten Machine Learning Algorithms to Learn

machine learning algorithms
Machine Learning Practitioners have different personalities. While some of them are “I am an expert in X and X can train on any type of data”, where X = some algorithm, some others are “Right tool for the right job people”. A lot of them also subscribe to “Jack of all trades. Master of one” strategy, where they have one area of deep expertise and know slightly about different fields of Machine Learning. That said, no one can deny the fact that as practicing Data Scientists, we will have to know basics of some common machine learning algorithms, which would help us engage with a new-domain problem we come across. This is a whirlwind tour of common machine learning algorithms and quick resources about them which can help you get started on them.

1. Principal Component Analysis(PCA)/SVD

PCA is an unsupervised method to understand global properties of a dataset consisting of vectors. Covariance Matrix of data points is analyzed here to understand what dimensions(mostly)/ data points (sometimes) are more important (ie have high variance amongst themselves, but low covariance with others). One way to think of top PCs of a matrix is to think of its eigenvectors with highest eigenvalues. SVD is essentially a way to calculate ordered components too, but you don’t need to get the covariance matrix of points to get it.
machine learning algorithms
This Algorithm helps one fight curse of dimensionality by getting datapoints with reduced dimensions.

Libraries:

Introductory Tutorial:

2a. Least Squares and Polynomial Fitting

Remember your Numerical Analysis code in college, where you used to fit lines and curves to points to get an equation. You can use them to fit curves in Machine Learning for very small datasets with low dimensions. (For large data or datasets with many dimensions, you might just end up terribly overfitting, so don’t bother). OLS has a closed form solution, so you don’t need to use complex optimization techniques.
machine learning algorithms
As is obvious, use this algorithm to fit simple curves / regression

Libraries:

Introductory Tutorial:

2b. Constrained Linear Regression

Least Squares can get confused with outliers, spurious fields and noise in data. We thus need constraints to decrease the variance of the line we fit on a dataset. The right method to do it is to fit a linear regression model which will ensure that the weights do not misbehave. Models can have L1 norm (LASSO) or L2 (Ridge Regression) or both (elastic regression). Mean Squared Loss is optimized.
machine learning algorithms
Use these algorithms to fit regression lines with constraints, avoiding overfitting and masking noise dimensions from model.

Libraries:

Introductory Tutorial(s):

3. K means Clustering

Everyone’s favorite unsupervised clustering algorithm. Given a set of data points in form of vectors, we can make clusters of points based on distances between them. It’s an Expectation Maximization algorithm that iteratively moves the centers of clusters and then clubs points with each cluster centers. The input the algorithm has taken is the number of clusters which are to be generated and the number of iterations in which it will try to converge clusters.
machine learning algorithms
As is obvious from the name, you can use this algorithm to create K clusters in dataset

Library:  

Introductory Tutorial(s):

4. Logistic Regression

Logistic Regression is constrained Linear Regression with a nonlinearity (sigmoid function is used mostly or you can use tanh too) application after weights are applied, hence restricting the outputs close to +/- classes (which is 1 and 0 in case of sigmoid). Cross-Entropy Loss functions are optimized using Gradient Descent. A note to beginners: Logistic Regression is used for classification, not regression. You can also think of Logistic regression as a one layered Neural Network. Logistic Regression is trained using optimization methods like Gradient Descent or L-BFGS. NLP people will often use it with the name of Maximum Entropy Classifier.
This is what a Sigmoid looks like:
machine learning algorithms
Use LR to train simple, but very robust classifiers.

Library:

Introductory Tutorial(s):

5. SVM (Support Vector Machines)

SVMs are linear models like Linear/ Logistic Regression, the difference is that they have different margin-based loss function (The derivation of Support Vectors is one of the most beautiful mathematical results I have seen along with eigenvalue calculation). You can optimize the loss function using optimization methods like L-BFGS or even SGD.
machine learning algorithms
Another innovation in SVMs is the usage of kernels on data to feature engineer. If you have good domain insight, you can replace the good-old RBF kernel with smarter ones and profit.
One unique thing that SVMs can do is learn one class classifiers.
SVMs can used to Train a classifier (even regressors)

Library:

Introductory Tutorial(s):

Note: SGD based training of both Logistic Regression and SVMs are found in SKLearn’s http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.SGDClassifier.html, which I often use as it lets me check both LR and SVM with a common interface. You can also train it on >RAM sized datasets using mini batches.

6. Feedforward Neural Networks

These are basically multilayered Logistic Regression classifiers. Many layers of weights separated by non-linearities (sigmoid, tanh, relu + softmax and the cool new selu). Another popular name for them is Multi-Layered Perceptrons. FFNNs can be used for classification and unsupervised feature learning as autoencoders.
machine learning algorithms
Multi-Layered perceptron

machine learning algorithms
FFNN as an autoencoder
FFNNs can be used to train a classifier or extract features as autoencoders

Libraries:

Introductory Tutorial(s):

7. Convolutional Neural Networks (Convnets)

Almost any state of the art Vision based Machine Learning result in the world today has been achieved using Convolutional Neural Networks. They can be used for Image classification, Object Detection or even segmentation of images. Invented by Yann Lecun in late 80s-early 90s, Convnets feature convolutional layers which act as hierarchical feature extractors. You can use them in text too (and even graphs).

Use convnets for state of the art image and text classification, object detection, image segmentation.

Libraries:

Introductory Tutorial(s):

8. Recurrent Neural Networks (RNNs):

RNNs model sequences by applying the same set of weights recursively on the aggregator state at a time t and input at a time t (Given a sequence has inputs at times 0..t..T, and have a hidden state at each time t which is output from t-1 step of RNN). Pure RNNs are rarely used now but its counterparts like LSTMs and GRUs are state of the art in most sequence modeling tasks.
machine learning algorithms
RNN (If here is a densely connected unit and a nonlinearity, nowadays f is generally LSTMs or GRUs ). LSTM unit which is used instead of a plain dense layer in a pure RNN.
machine learning algorithms
Use RNNs for any sequence modelling task specially text classification, machine translation, language modelling

Library:

https://github.com/tensorflow/models(Many cool NLP research papers from Google are here)

Introductory Tutorial(s):

9. Conditional Random Fields (CRFs)

CRFs are probably the most frequently used models from the family of Probabilitic Graphical Models (PGMs). They are used for sequence modeling like RNNs and can be used in combination with RNNs too. Before Neural Machine Translation systems came in CRFs were the state of the art and in many sequence tagging tasks with small datasets, they will still learn better than RNNs which require a larger amount of data to generalize. They can also be used in other structured prediction tasks like Image Segmentation etc. CRF models each element of the sequence (say a sentence) such that neighbors affect a label of a component in a sequence instead of all labels being independent of each other.
Use CRFs to tag sequences (in Text, Image, Time Series, DNA etc.)

Library:

Introductory Tutorial(s):

7 part lecture series by Hugo Larochelle on Youtube: https://www.youtube.com/watch?v=GF3iSJkgPbA

10. Decision Trees

Let’s say I am given an Excel sheet with data about various fruits and I have to tell which look like Apples. What I will do is ask a question “Which fruits are red and round ?” and divide all fruits which answer yes and no to the question. Now, All Red and Round fruits might not be apples and all apples won’t be red and round. So I will ask a question “Which fruits have red or yellow color hints on them? ” on red and round fruits and will ask “Which fruits are green and round ?” on not red and round fruits. Based on these questions I can tell with considerable accuracy which are apples. This cascade of questions is what a decision tree is. However, this is a decision tree based on my intuition. Intuition cannot work on high dimensional and complex data. We have to come up with the cascade of questions automatically by looking at tagged data. That is what Machine Learning based decision trees do. Earlier versions like CART trees were once used for simple data, but with bigger and larger dataset, the bias-variance tradeoff needs to solved with better algorithms. The two common decision trees algorithms used nowadays are Random Forests (which build different classifiers on a random subset of attributes and combine them for output) and Boosting Trees (which train a cascade of trees one on top of others, correcting the mistakes of ones below them).
Decision Trees can be used to classify datapoints (and even regression)

Libraries

Introductory Tutorial:

TD Algorithms (Good To Have)

If you are still wondering how can any of the above methods solve tasks like defeating Go world champion like DeepMind did, they cannot. All the 10 type of algorithms we talked about before this was Pattern Recognition, not strategy learners. To learn strategy to solve a multi-step problem like winning a game of chess or playing Atari console, we need to let an agent-free in the world and learn from the rewards/penalties it faces. This type of Machine Learning is called Reinforcement Learning. A lot (not all) of recent successes in the field is a result of combining perception abilities of a Convnet or LSTM to a set of algorithms called Temporal Difference Learning. These include Q-Learning, SARSA and some other variants. These algorithms are a smart play on Bellman’s equations to get a loss function that can be trained with rewards an agent gets from the environment.
These algorithms are used to automatically play games mostly :D, also other applications in language generation and object detection.

Libraries:

Introductory Tutorial(s):

Watch David Silver course: https://www.youtube.com/watch?v=2pWv7GOvuf0
These are the 10 machine learning algorithms which you can learn to become a data scientist.

Comments

Popular posts from this blog

gRPC with Java : Build Fast & Scalable Modern API & Microservices using Protocol Buffers

gRPC Java Master Class : Build Fast & Scalable Modern API for your Microservice using gRPC Protocol Buffers gRPC is a revolutionary and modern way to define and write APIs for your microservices. The days of REST, JSON and Swagger are over! Now writing an API is easy, simple, fast and efficient. gRPC is created by Google and Square, is an official CNCF project (like Docker and Kubernetes) and is now used by the biggest tech companies such as Netflix, CoreOS, CockRoachDB, and so on! gRPC is very popular and has over 15,000 stars on GitHub (2 times what Kafka has!). I am convinced that gRPC is the FUTURE for writing API for microservices so I want to give you a chance to learn about it TODAY. Amongst the advantage of gRPC: 1) All your APIs and messages are simply defined using Protocol Buffers 2) All your server and client code for any programming language gets generated automatically for free! Saves you hours of programming 3) Data is compact and serialised 4) API ...

What is Big Data ?

What is Big Data ? It is now time to answer an important question – What is Big Data? Big data, as defined by Wikipedia, is this: “Big data is a broad term for  data sets  so large or complex that traditional  data processing  applications are inadequate. Challenges include  analysis , capture,  data curation , search,  sharing ,  storage , transfer ,  visualization ,  querying  and  information privacy . The term often refers simply to the use of  predictive analytics  or certain other advanced methods to extract value from data, and seldom to a particular size of data set.” In simple terms, Big Data is data that has the 3 characteristics that we mentioned in the last section – • It is big – typically in terabytes or even petabytes • It is varied – it could be a traditional database, it could be video data, log data, text data or even voice data • It keeps increasing as new data keeps flowing in This kin...