Cheat Sheets for AI, Machine Learning, Neural Networks, Big Data & Deep Learning


I have been collecting AI cheat sheets for the last few months and have shared them with friends and colleagues from time to time. Recently, many people have asked about these sheets, so I decided to organize and share the entire collection. In this article, I have added descriptions and excerpts to give context and make things more interesting.

Below is the complete list I have compiled on this topic, with the Big-O cheat sheets at the end of the article.

Machine Learning Overview

Machine Learning Cheat Sheet

Machine Learning: Scikit-learn algorithm

This cheat sheet helps with the most challenging part of the job: finding the right estimator. The flowchart points you to the documentation and a rough guide for each estimator, which helps you learn more about related problems and their solutions.


Scikit-learn (formerly known as scikits.learn) is a free software machine learning library for the Python programming language. It features various classification, regression, and clustering algorithms, including support vector machines, random forests, gradient boosting, k-means, and DBSCAN. It is designed to interoperate with the Python numerical and scientific libraries NumPy and SciPy.

Scikit-Learn Cheat Sheet


The machine learning cheat sheet below is from Microsoft Azure. It helps you choose the appropriate machine learning algorithm for your predictive analytics solution. The cheat sheet first asks about the nature of your data and then suggests the best algorithm for the job.

Neural Network

A cheat sheet for neural network graphs

Neural Networks Cheat Sheet

Python for Data Science

Python Data Science Cheat Sheet

Big data cheat sheet


NumPy targets CPython, the reference implementation of Python, which is a non-optimizing bytecode interpreter. Mathematical algorithms written for this version of Python often run much slower than compiled equivalents. NumPy partly solves this problem by providing multidimensional arrays, along with functions and operators that operate efficiently on them. Using it requires rewriting some code, in most cases inner loops, in terms of NumPy operations.
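The following minimal sketch illustrates what rewriting an inner loop means. The function names and the sum-of-squares task are illustrative choices, not from the original. It computes the same result twice: once with a pure-Python loop that CPython interprets step by step, and once with a single NumPy operation that runs in compiled code.

```python
import numpy as np


def sum_of_squares_loop(values):
    """Pure-Python inner loop: CPython executes it one bytecode step at a time."""
    total = 0.0
    for v in values:
        total += v * v
    return total


def sum_of_squares_numpy(values):
    """The same computation as one NumPy array operation, which runs in compiled code."""
    arr = np.asarray(values, dtype=np.float64)
    return float(np.dot(arr, arr))


data = list(range(10_000))
loop_result = sum_of_squares_loop(data)
numpy_result = sum_of_squares_numpy(data)
print(loop_result, numpy_result)
```

If you time both versions on a larger input with the standard `timeit` module, the NumPy version usually runs much faster. The exact ratio depends on the machine and the input size.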

Numpy Cheat Sheet


Google announced the second generation of the TPU, as well as the availability of TPUs in Google Compute Engine, in May 2017. The second-generation TPUs deliver up to 180 teraflops of performance, and clusters of 64 TPUs provide up to 11.5 petaflops.
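As a quick consistency check, assume the cluster figure is simply 64 chips times the per-chip peak (1 petaflop = 1,000 teraflops). The two numbers then agree:

```latex
64 \times 180\ \text{TFLOPS} = 11{,}520\ \text{TFLOPS} \approx 11.5\ \text{PFLOPS}
```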

TensorFlow Cheat Sheet



The name “pandas” is derived from the term “panel data”, an econometrics term for multidimensional structured data sets.

Pandas Cheat Sheet


After Google’s TensorFlow team decided to support Keras in TensorFlow’s core library in 2017, Keras creator François Chollet explained that Keras was conceived to be an interface rather than an end-to-end machine learning framework. Keras therefore presents a higher-level, more intuitive set of abstractions that make it easy to configure neural networks regardless of the back-end scientific computing library.

Keras Cheat Sheet

Data Wrangling

The term “data wrangler” has started to gain popularity in pop culture. In the 2017 movie Kong: Skull Island, Marc Evan Jackson’s character is introduced as “Steve Woodward, our data wrangler.”

Data Wrangling Cheat Sheet

Data wrangling with dplyr and tidyr

Data wrangling with dplyr and tidyr cheat sheet


Matplotlib is a plotting library for the Python programming language and its numerical mathematics extension NumPy. It provides an object-oriented API for embedding plots into applications using general-purpose GUI toolkits like Tkinter, wxPython, Qt, or GTK+. There is also a procedural “pylab” interface based on a state machine (like OpenGL), designed to closely resemble that of MATLAB, though its use is discouraged. SciPy makes use of Matplotlib.

Pyplot is a Matplotlib module that provides a MATLAB-like interface. Matplotlib is designed to be as usable as MATLAB, with the advantages of using Python and being free.

Matplotlib Cheat Sheet


Data Visualization

Data Visualization Cheat Sheet


SciPy builds on the NumPy array object and is part of the NumPy stack, which includes tools such as Matplotlib, pandas, and SymPy, as well as an expanding set of scientific computing libraries.

The NumPy stack has users similar to those of other applications such as MATLAB, GNU Octave, and Scilab. The NumPy stack is also sometimes referred to as the SciPy stack.


PySpark Cheat Sheet


Big-O Algorithm Cheat Sheet

Big-O Algorithm Complexity Chart

Big-O Algorithm Data Structure Operations

Big-O Array Sorting Algorithms

Blog Curator


Christina Friede

Business Development


10 Machine Learning Algorithms You Should Know in 2018


In 2017, the term “big data” became very popular, and it is likely to remain so in the coming years. This article lists the most commonly used ML algorithms that you should be aware of.

  1. Random forest

First, randomly select samples from the original data to form different subsets.

The original data is a matrix S with rows 1 to N. A, B, and C are the features, and the last column (also labeled C) holds the category.

Create, say, M random subsets from S.

Training a decision tree on each subset gives M decision trees. When new data is fed into these trees, we get M results, which are combined, typically by majority vote, into the final prediction.

Each decision tree classifies data into groups using certain attributes. At each node, a test based on the existing data acts as a branch that splits the data into two groups, and the process repeats on each group. When new data comes in, it follows these branches and is assigned to the corresponding leaf.
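The procedure above can be sketched in plain Python. This is a deliberately simplified sketch, not a full random forest. Each "tree" is a single binary split (a decision stump), and the split criterion is Gini impurity, since the original does not name one. Each tree also sees a random half of the features. The data rows and the class names `x`/`y` are hypothetical.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0 means the group is pure)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels, features):
    """Return (score, feature, threshold, left_class, right_class) for the best binary test."""
    best = None
    for f in features:
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue  # a split must produce two non-empty groups
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, f, t,
                        Counter(left).most_common(1)[0][0],
                        Counter(right).most_common(1)[0][0])
    return best


def fit_forest(rows, labels, m=15, seed=0):
    """Train m stumps, each on a bootstrap sample of rows and a random subset of features."""
    rng = random.Random(seed)
    n, n_features = len(rows), len(rows[0])
    forest = []
    for _ in range(m):
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        features = rng.sample(range(n_features), k=max(1, n_features // 2))
        stump = best_split([rows[i] for i in idx], [labels[i] for i in idx], features)
        if stump is not None:
            forest.append(stump[1:])  # keep (feature, threshold, left_class, right_class)
    return forest


def predict(forest, row):
    """Every stump votes; the majority class is the forest's prediction."""
    votes = [left if row[f] <= t else right for f, t, left, right in forest]
    return Counter(votes).most_common(1)[0][0]


# Hypothetical data: columns are features A, B, C; labels are the categories.
rows = [[1, 2, 1], [2, 1, 2], [1, 1, 3], [2, 3, 1],
        [8, 9, 7], [9, 8, 9], [7, 9, 8], [9, 7, 9]]
labels = ["x", "x", "x", "x", "y", "y", "y", "y"]
forest = fit_forest(rows, labels)
print(predict(forest, [1, 1, 1]), predict(forest, [9, 9, 9]))
```

A real random forest grows each tree to full depth rather than stopping after one split. The bootstrap sampling, random feature choice, and majority vote work the same way.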

  2. K-Nearest Neighbour

When a new data point comes in, we determine its category by looking at which category has the most points among its nearest neighbours. For example, to distinguish between a cat and a dog, we base our judgment on two features: claws and sound.

In the figure, the known categories are represented by circles and rectangles. The question is: which category does the star belong to?

When K=3, the star is connected by lines to its three nearest points. Most of these are circles, so the star belongs to the cat category.
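A minimal k-nearest-neighbour classifier for the cat/dog example might look like the sketch below. The numeric feature values for claws and sound, and the use of Euclidean distance, are assumptions made for illustration.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Return the majority label among the k training points closest to `query`."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


# Hypothetical (claws, sound) scores on a 0-10 scale.
train = [
    ((8, 7), "cat"), ((9, 8), "cat"), ((7, 8), "cat"),
    ((3, 2), "dog"), ((2, 3), "dog"), ((4, 2), "dog"),
]
star = (7, 6)  # the new, unlabeled point
print(knn_predict(train, star, k=3))  # cat
```

Here all three nearest neighbours of the star are cats. Choosing an odd K avoids ties in two-class problems.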


  3. Markov

A Markov chain is made up of states and transitions. For example, we can build a Markov chain from the sentence “the quick brown fox jumps over the lazy dog”.

The first step is to treat every word as a state and then calculate the probability of each state transition. These probabilities can be calculated from just a single sentence. With a large body of text, you get a larger state transition matrix.
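Counting word-to-word transitions in the example sentence gives the transition probabilities directly, as in the minimal sketch below. The word "the" is followed once by "quick" and once by "lazy", so each of those transitions gets probability 0.5. Every other observed transition gets 1.0. The last word, "dog", has no observed successor, so it has no row in this sketch.

```python
from collections import Counter, defaultdict


def transition_matrix(text):
    """Treat each word as a state and estimate P(next word | current word) from counts."""
    words = text.split()
    counts = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        counts[current][nxt] += 1
    matrix = {}
    for state, following in counts.items():
        total = sum(following.values())
        matrix[state] = {nxt: c / total for nxt, c in following.items()}
    return matrix


matrix = transition_matrix("the quick brown fox jumps over the lazy dog")
print(matrix["the"])    # {'quick': 0.5, 'lazy': 0.5}
print(matrix["quick"])  # {'brown': 1.0}
```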

  4. Neural Network

A neural network is made up of neurons and the connections between them. The first layer is called the input layer, and the last is called the output layer. The hidden and output layers each have their own classifiers.

In a neural network, an input can be assigned to one of two or more classes. When an input enters the network, it is activated, and the result is passed on to the next layer. The scores in the output layer are the scores for each class.

In the example above, the input passes through different nodes to generate different scores.
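A forward pass through a small network takes only a few lines of Python. In this sketch, the layer sizes and weights are hypothetical. The ReLU activation on the hidden layer and the softmax that turns output scores into class probabilities are also illustrative assumptions, not details taken from the example above.

```python
import math


def dense(inputs, weights, biases, activation):
    """One fully connected layer: activation(W x + b), computed neuron by neuron."""
    return [activation(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def relu(z):
    return max(0.0, z)


def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical weights: 3 inputs -> 4 hidden neurons -> 2 output classes.
W_hidden = [[0.2, -0.5, 0.1], [0.7, 0.3, -0.2], [-0.4, 0.6, 0.5], [0.1, 0.1, 0.1]]
b_hidden = [0.0, 0.1, -0.1, 0.0]
W_out = [[0.5, -0.3, 0.8, 0.2], [-0.6, 0.9, -0.1, 0.4]]
b_out = [0.0, 0.0]

x = [1.0, 0.5, -1.0]
hidden = dense(x, W_hidden, b_hidden, relu)        # hidden-layer activations
scores = dense(hidden, W_out, b_out, lambda z: z)  # one raw score per class
probs = softmax(scores)
print([round(s, 3) for s in scores], [round(p, 3) for p in probs])
```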

  5. Logistic Regression

If the prediction target is a probability, which must lie between 0 and 1, a simple linear model cannot be used. A linear model's output is unbounded: when the inputs fall outside a certain range, the output can lie outside the required interval.

Therefore, we need a different model.

How do we get such a model? Its output must be greater than or equal to 0 and less than or equal to 1.

Transforming the formula gives us the logistic regression model
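The formula itself appears only as an image in the original. One standard way to write it, assuming a linear predictor with coefficients β0, ..., βk, is to model the log-odds linearly and then solve for the probability p:

```latex
\ln\frac{p}{1-p} = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k
\quad\Longleftrightarrow\quad
p = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k)}}
```

The log-odds on the left can take any real value, just like the unbounded linear predictor. The resulting p, however, always stays strictly between 0 and 1.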

We obtain the corresponding coefficients by fitting the model to the original data, which gives the final logistic model.
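Fitting the coefficients can be sketched with batch gradient descent on the log-loss. This is one common approach, not necessarily the one used in the original. The single-feature toy data (hours studied versus pass/fail), the learning rate, and the number of epochs are all hypothetical.

```python
import math


def sigmoid(z):
    """Map any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, lr=0.1, epochs=10000):
    """Fit p = sigmoid(b0 + b1 * x) by batch gradient descent on the log-loss."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(b0 + b1 * x) - y  # derivative of the log-loss w.r.t. the linear score
            g0 += err
            g1 += err * x
        b0 -= lr * g0 / n
        b1 -= lr * g1 / n
    return b0, b1


# Hypothetical data: hours studied -> passed (1) or failed (0).
xs = [0.5, 1.0, 1.5, 2.0, 3.0, 3.5, 4.0, 5.0]
ys = [0, 0, 0, 1, 0, 1, 1, 1]
b0, b1 = fit_logistic(xs, ys)
for x in (1.0, 4.5):
    print(x, round(sigmoid(b0 + b1 * x), 3))
```

The classes overlap (2.0 passes, 3.0 fails), so the coefficients converge to finite values instead of growing without bound.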

  6. Support Vector Machine

To separate the two classes with a hyperplane, it is best to use the hyperplane that leaves the maximum margin to both classes. Since margin Z2 is larger than Z1, the green hyperplane is the better choice.

Points in the class above the line satisfy w·x + b ≥ 1, while points in the class below it satisfy w·x + b ≤ -1.

The distance from a point to the hyperplane is given by the equation in the graph.

The result is the expression for the total margin. The goal is to maximize the margin, which we do by minimizing the denominator.
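The equations in the graph are not reproduced in the text. In the standard formulation, assume a hyperplane w^T x + b = 0 with the ±1 boundaries described above. The distance from a point to the hyperplane and the resulting margin are:

```latex
d(\mathbf{x}_i) = \frac{\lvert \mathbf{w}^\top \mathbf{x}_i + b \rvert}{\lVert \mathbf{w} \rVert},
\qquad
\text{margin} = \frac{1}{\lVert \mathbf{w} \rVert} + \frac{1}{\lVert \mathbf{w} \rVert} = \frac{2}{\lVert \mathbf{w} \rVert}

\max_{\mathbf{w},\, b} \frac{2}{\lVert \mathbf{w} \rVert}
\;\Longleftrightarrow\;
\min_{\mathbf{w},\, b} \tfrac{1}{2} \lVert \mathbf{w} \rVert^2
\quad \text{subject to} \quad y_i \left( \mathbf{w}^\top \mathbf{x}_i + b \right) \ge 1
```

Because the norm of w is the denominator, maximizing the margin is equivalent to minimizing that norm. This is usually written as the squared form on the right.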



Using three points to determine the optimal hyperplane, define the weight vector as (2,3) - (1,1) = (1,2).

Since the weight vector is parallel to (1,2), we get w = (a, 2a) and substitute it into the equation.
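The substitution step can be checked with exact fractions. This sketch assumes that (1, 1) is a support vector of the -1 class and (2, 3) a support vector of the +1 class. The third point mentioned in the text is not given, so it is left out. With w = (a, 2a), the two conditions w·x + b = -1 and w·x + b = +1 determine a and b.

```python
import math
from fractions import Fraction


def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]


# Assumed support vectors: (1, 1) for class -1 and (2, 3) for class +1.
neg, pos = (1, 1), (2, 3)

# The weight vector is parallel to pos - neg = (1, 2), so w = (a, 2a).
direction = (pos[0] - neg[0], pos[1] - neg[1])

# Support vectors lie on w.x + b = -1 and w.x + b = +1:
#   a * dot(direction, neg) + b = -1
#   a * dot(direction, pos) + b = +1
# Subtracting the first equation from the second gives a.
a = Fraction(2, dot(direction, pos) - dot(direction, neg))  # 2 / (8 - 3) = 2/5
b = -1 - a * dot(direction, neg)                            # -1 - 6/5 = -11/5
w = (a * direction[0], a * direction[1])                    # (2/5, 4/5)

margin = 2 / math.sqrt(dot(w, w))  # 2 / ||w||

print(f"w = ({w[0]}, {w[1]}), b = {b}")  # w = (2/5, 4/5), b = -11/5
print(round(margin, 4))                  # 2.2361
```

The resulting hyperplane is 0.4·x1 + 0.8·x2 - 2.2 = 0, or equivalently x1 + 2·x2 = 5.5. The margin 2/‖w‖ equals √5, which is the distance between the two assumed support vectors, as expected.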

  7. AdaBoost

AdaBoost is a boosting method. Boosting combines classifiers whose individual results are unsatisfactory into a single classifier that performs better.

In the illustration above, trees 1 and 2 do not perform well on their own. However, when we feed the same data to both and add up their results, the combined result is more convincing.

Take handwriting recognition as an example: the model extracts many features, such as the starting direction and the distance from the starting point to the end point.

During training, the machine learns a weight for each feature. A feature that contributes little to classification receives a small weight.

The alpha angle, by contrast, is highly discriminative, so it receives a large weight. The final result takes all of these weighted features into account.
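The reweighting idea can be sketched as discrete AdaBoost with one-dimensional threshold stumps as the weak classifiers. The stumps, the alpha formula 0.5·ln((1 - err)/err) from the standard algorithm, and the toy data are our choices for illustration. The dataset is hypothetical, chosen so that no single threshold classifies it correctly but three weighted stumps together do.

```python
import math


def stump_predict(x, threshold, polarity):
    """Weak classifier: predict `polarity` at or below the threshold, -polarity above it."""
    return polarity if x <= threshold else -polarity


def adaboost(xs, ys, rounds=3):
    """Discrete AdaBoost with 1-D threshold stumps; labels must be +1 or -1."""
    n = len(xs)
    weights = [1.0 / n] * n
    ensemble = []  # list of (alpha, threshold, polarity)
    for _ in range(rounds):
        # Pick the stump with the lowest weighted error on the current weights.
        best = None
        for t in xs:
            for polarity in (1, -1):
                err = sum(w for x, y, w in zip(xs, ys, weights)
                          if stump_predict(x, t, polarity) != y)
                if best is None or err < best[0]:
                    best = (err, t, polarity)
        err, t, polarity = best
        err = min(max(err, 1e-10), 1 - 1e-10)  # guard against log(0) and division by zero
        alpha = 0.5 * math.log((1 - err) / err)  # more accurate stumps get a larger say
        ensemble.append((alpha, t, polarity))
        # Raise the weight of misclassified points, lower the rest, then renormalise.
        weights = [w * math.exp(-alpha * y * stump_predict(x, t, polarity))
                   for x, y, w in zip(xs, ys, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def predict(ensemble, x):
    """Sign of the alpha-weighted sum of the stumps' votes."""
    score = sum(alpha * stump_predict(x, t, p) for alpha, t, p in ensemble)
    return 1 if score >= 0 else -1


xs = list(range(10))
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]
ensemble = adaboost(xs, ys, rounds=3)
print([predict(ensemble, x) for x in xs] == ys)  # True
```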

  8. K-means

Suppose we want to divide the data into three classes. In the figure, the largest cluster is the pink one and the smallest is the yellow one. Choose points 3, 2, and 1 as the initial centers, then calculate the distance from each remaining point to these centers and assign each point to the class of its nearest center.

Once you classify, calculate the means of every class and make it the new center.

Stop when there are no more changes.
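These three steps (pick initial centers, assign each point to the nearest center, recompute the means until nothing changes) correspond to Lloyd's algorithm. A minimal sketch follows. The nine 2-D points and the choice of three of them as initial centers are hypothetical.

```python
import math


def kmeans(points, centers, max_iter=100):
    """Lloyd's k-means: assign points to the nearest center, recompute means, repeat."""
    centers = [tuple(c) for c in centers]
    assignment = None
    for _ in range(max_iter):
        new_assignment = [min(range(len(centers)), key=lambda k: math.dist(p, centers[k]))
                          for p in points]
        if new_assignment == assignment:  # no more changes: stop
            break
        assignment = new_assignment
        for k in range(len(centers)):
            members = [p for p, a in zip(points, assignment) if a == k]
            if members:  # keep the old center if a class ends up empty
                centers[k] = tuple(sum(coord) / len(members) for coord in zip(*members))
    return centers, assignment


points = [(1, 1), (1.5, 2), (2, 1.5), (8, 8), (8.5, 9), (9, 8), (1, 9), (2, 8.5), (1.5, 9.5)]
centers, assignment = kmeans(points, [points[0], points[3], points[6]])
print(assignment)  # [0, 0, 0, 1, 1, 1, 2, 2, 2]
```

The result depends on the initial centers. A common practice is to run k-means several times from different starting points and keep the best clustering.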

  9. Naive Bayes

Below is an NLP example:

Determine whether the attitude of a text is positive or negative.

To solve the problem, choose a few words.

Words and their count

Given a sentence, you may be asked which category it belongs to. Bayes' rule makes this easy to solve.

Within each class, the question becomes “what is the probability of this sentence occurring?”
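Putting the pieces together, a tiny multinomial naive Bayes sentiment classifier can be sketched as follows. By Bayes' rule, P(class | sentence) is proportional to P(class) × P(sentence | class). The "naive" assumption treats P(sentence | class) as the product of the individual word probabilities. The add-one (Laplace) smoothing, the log-space scoring, and the six training sentences are assumptions for illustration.

```python
import math
from collections import Counter


def train_nb(docs):
    """docs: list of (text, label). Returns class counts, per-class word counts, vocabulary."""
    priors = Counter(label for _, label in docs)
    word_counts = {label: Counter() for label in priors}
    for text, label in docs:
        word_counts[label].update(text.lower().split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return priors, word_counts, vocab


def classify_nb(model, text):
    """Pick the class maximising log P(class) + sum of log P(word | class)."""
    priors, word_counts, vocab = model
    n_docs = sum(priors.values())
    best_label, best_score = None, -math.inf
    for label, count in priors.items():
        total = sum(word_counts[label].values())
        score = math.log(count / n_docs)
        for w in text.lower().split():
            # Add-one smoothing so unseen words do not zero out the whole product.
            score += math.log((word_counts[label][w] + 1) / (total + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


docs = [
    ("I love this movie", "positive"),
    ("what a great film", "positive"),
    ("great acting and a lovely story", "positive"),
    ("I hate this movie", "negative"),
    ("what a terrible film", "negative"),
    ("boring story and awful acting", "negative"),
]
model = train_nb(docs)
print(classify_nb(model, "a great story"))        # positive
print(classify_nb(model, "terrible and boring"))  # negative
```

Summing logarithms instead of multiplying probabilities avoids numerical underflow on longer texts. The highest-scoring class is the same either way.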
