Machine Learning Glossary
Getting started with machine learning is often overwhelming at first. There are a ton of different terms describing functions, algorithms, mathematical objects, procedures and so on.
The following glossary compiles some of the most important terms in machine learning. You will find algorithms, mathematical objects, programming tools and more. All the terms stem from my university courses on machine learning. Grasping the terms and their meanings first helped me see the bigger picture.
Often, code examples are provided, mostly in Python, using libraries like NumPy and scikit-learn.
Accuracy
A measure of how well a classification model performs. Put simply, it is the fraction of predictions the model got right.
Code example:
from sklearn.metrics import accuracy_score
y_pred = [0, 2, 1, 3]
y_true = [0, 1, 2, 3]
accuracy_score(y_true, y_pred)  # 0.5, as two of the four predictions are correct
Activation function
A function that determines the degree of activeness of a single neuron in a neural network. While biological neurons are either inactive or active, artificial neurons output a continuous value. The range of that value depends on the activation function, for example 0 to 1 for the Sigmoid.
A commonly used activation function is the ReLU.
A variable name often used for the individual coefficients in a mathematical function.
Cholesky
Classification
A task for a machine learning model in which data is assigned to given categories. The most famous example might be the iris dataset, which contains measurements of different species of a flower. Through classification, the model can predict to which species a sample from the test data belongs.
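As a minimal sketch of such a classification task, the following trains a classifier on the iris dataset with scikit-learn. The choice of a k-nearest-neighbor classifier, the 80/20 split and the random seed are assumptions made for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Load the flower measurements (X) and the species labels (y)
X, y = load_iris(return_X_y=True)

# Hold out 20% of the data for testing (split and seed are arbitrary choices)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Fit the classifier on the training data
model = KNeighborsClassifier(n_neighbors=5)
model.fit(X_train, y_train)

# Predict the species of the unseen test samples and measure the accuracy
y_pred = model.predict(X_test)
accuracy = model.score(X_test, y_test)
```

Each entry of y_pred is 0, 1 or 2, one number per iris species.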
Convex
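A function for which every local minimum is also a global minimum: the straight line between any two points on its graph never runs below the graph. Some regression models require a convex cost function, as their optimization methods are unable to deal with multiple local minima.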
Cosine
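A distance function based on the angle between two vectors. It is defined as one minus the cosine similarity.
from scipy.spatial import distance
distance.cosine([1, 2], [1, 2])  # 0 (up to floating-point rounding)
Returns 0 as the vectors are identical.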
Also check out the Cityblock distance and the Euclidean distance.
Cityblock
Another name for the Manhattan / Taxicab metric.
from scipy.spatial import distance
distance.cityblock([1, 2], [1, 5])  # 3
Deep learning
A subset of neural network applications in which 3 or more layers are used.
Derivative
The derivative of a function is its rate of change. Therefore, we can use it to find the slope of a function at a certain point. The points where the derivative is zero are candidates for the minimum and maximum of the function.
Code example using SymPy:
from sympy import diff, Symbol
x = Symbol("x")
func = 2 * x + 2
diff(func, x)  # 2
Outputs 2, as it is the derivative of 2x + 2.
Determinant
import numpy as np
A = np.array([[1, 2], [2, 3]])
np.linalg.det(A)  # -1 (up to floating-point rounding)
Distance
Mostly the distance between two vectors / points in a space of two, three or more dimensions. Often used for solving classification problems, as the nearest neighbours of a data point need to be found.
Elastic net
Regularized regression model, which combines both L1 and L2 penalties.
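A minimal sketch of an elastic net with scikit-learn follows. The synthetic data and the values chosen for alpha (overall penalty strength) and l1_ratio (mix between the L1 and L2 penalty) are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import ElasticNet

# Synthetic data (assumed): y depends on the first two of five features only
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=100)

# alpha sets the overall penalty strength, l1_ratio the share of the L1 penalty
model = ElasticNet(alpha=0.1, l1_ratio=0.5)
model.fit(X, y)

# One coefficient per feature; the penalties shrink them towards zero
coefficients = model.coef_
```

The coefficients of the three irrelevant features should end up at or close to zero, while the first two stay clearly away from it.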
Euclidean distance
A distance measure for vectors of n dimensions. Often used for the k-nearest neighbour algorithm.
Code example:
import numpy as np
from scipy.spatial import distance
x = np.array([1, 3, 4])
y = np.array([2, 4, 1])
distance.euclidean(x, y)  # 3.3166
Gaussian
Gated graph sequence neural network
Gradient
Gradient descent
Grid search
Ground truth
The data that is assumed to be correct, mostly used for testing our model by comparing its predictions against it.
Hessian
Hyperparameter
k-means
k-nearest-neighbor
An algorithm for regularization.
Learning
Mostly the process of minimizing the cost function of the model. This leads to the predictions being more precise - therefore, the model is learning.
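To make this concrete, here is a minimal sketch of learning as cost minimization: a straight line is fitted by plain gradient descent with NumPy. The toy data, the learning rate and the number of iterations are assumptions chosen for illustration.

```python
import numpy as np

# Toy data (assumed): y = 2x + 1 without noise
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0

def cost(w, b):
    # Mean squared error of the line w * x + b
    return np.mean((w * x + b - y) ** 2)

w, b = 0.0, 0.0        # initial guess for slope and intercept
learning_rate = 0.05   # step size, chosen by hand
initial_cost = cost(w, b)

for _ in range(2000):
    error = w * x + b - y
    # Gradients of the cost with respect to w and b
    grad_w = 2.0 * np.mean(error * x)
    grad_b = 2.0 * np.mean(error)
    # Move the parameters a small step against the gradient
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

final_cost = cost(w, b)
```

After the loop, w and b are close to 2 and 1, and final_cost is far below initial_cost.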
Linear discriminant analysis
A model for regression.
Linear Regression
A function used to convert a real number to a probability score (between 0 and 1).
Alternatives to this function are:
- Softmax
- Sigmoid
- ReLU
Logistic regression
Not to be confused with linear regression, as logistic regression is not an approach to solving regression problems. Logistic regression is an algorithm for classification.
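The following sketch shows logistic regression used as a classifier with scikit-learn. Using the iris dataset and raising max_iter are assumptions made for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

# max_iter is raised so the solver has enough iterations on the unscaled data
model = LogisticRegression(max_iter=1000)
model.fit(X, y)

# The model outputs a class label for each sample ...
labels = model.predict(X[:3])
# ... and one probability per class, which sum to 1 for each sample
probabilities = model.predict_proba(X[:3])
```

The output is a category and not a continuous value, which is what separates it from linear regression.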
Loss function
The function that describes how far the predictions are from the actual data. The larger the difference, the less precise our model is. Therefore, the loss function is to be minimized.
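One common loss function is the mean squared error. The sketch below computes it with NumPy; the example values are made up for illustration.

```python
import numpy as np

y_true = np.array([3.0, -0.5, 2.0, 7.0])  # actual values
y_pred = np.array([2.5, 0.0, 2.0, 8.0])   # predictions of the model

# Mean squared error: the average of the squared differences
mse = np.mean((y_true - y_pred) ** 2)  # 0.375
```

You can check the result by hand: (0.25 + 0.25 + 0 + 1) / 4 = 0.375.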
Manhattan distance
Also known as Taxicab or Cityblock distance. This metric can be thought of as the route a taxi driver has to take: as the driver can’t drive through buildings, they have to take the shortest path along the streets around them.
Code example with SciPy:
import numpy as np
from scipy.spatial import distance
x = np.array([1, 3, 4])
y = np.array([2, 4, 1])
distance.cityblock(x, y)  # 5
You can calculate the result for yourself like this: |1 - 2| + |3 - 4| + |4 - 1| = 1 + 1 + 3 = 5
Matrix
Maximum
Minimum
Famous dataset for testing classification models.
Neural network
Neuron
Neurons are the building blocks of neural networks. Most neural networks consist of thousands of neurons, sometimes even more. One can imagine a neuron like a biological neuron - yet, the machine learning implementation is a function underneath. The neuron has a state of activeness, which can lead to triggering other neurons. A single neuron or multiple ones can solve different sub tasks in an application.
In image recognition, there can be neurons focusing on the background, on the shape, on the color and so on.
Normalization
The process of scaling data before training a model. The test data is scaled in the same way. The example standardizes each feature to zero mean and unit variance.
Code example with Numpy:
import numpy as np
# Compute mean and standard deviation on the training data only and reuse them for the test data
mean = np.mean(X_train, axis=0)
std = np.std(X_train, axis=0)
X_train = (np.array(X_train) - mean) / std
X_test = (np.array(X_test) - mean) / std
Numpy
A Python library for scientific programming, known for high performance. For more, check out the cheatsheet on Numpy.
Ordinal
One hot encoding
Overfitting
A model that adapts too much to its training data is said to be overfitting. A great example is image classification. Imagine you provide your model with many pictures of dogs. Yet, the overwhelming number of images are of a single dog breed. Therefore, the model adapts to this specific breed and its characteristics and is now likely unable to identify other breeds as dogs in general. The model is overfitting. Of course, providing a strongly biased set of images must be avoided - but even slight biases can lead to overfitting.
Python
The most popular programming language for machine learning.
Regularization
ReLU
ReLU stands for rectified linear unit. It is used as an activation function in neural networks. The definition is quite easy: it simply returns the maximum of (0, x), so x for any x > 0 and 0 otherwise. It therefore maps real numbers onto the non-negative numbers.
It serves a similar purpose as Softmax, Logit and Sigmoid. Unlike Softmax and Sigmoid, however, its output is not limited to the range 0 to 1.
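The definition max(0, x) translates directly into NumPy. A minimal sketch, with input values chosen for illustration:

```python
import numpy as np

def relu(x):
    # Element-wise maximum of 0 and x
    return np.maximum(0, x)

values = np.array([-2.0, -0.5, 0.0, 1.5, 3.0])
activated = relu(values)  # [0., 0., 0., 1.5, 3.]
```

Negative inputs are set to 0, while positive inputs pass through unchanged.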
Residual sum of squares
The sum of the squared differences between the actual values and the values predicted by a regression model. It measures the variation in the data that the model does not explain.
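As a small sketch, the residual sum of squares of a straight-line fit can be computed with NumPy like this. The data points and the use of np.polyfit for the fit are assumptions for illustration.

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.1, 2.9, 5.2, 6.8])

# Fit a straight line y = slope * x + intercept by least squares
slope, intercept = np.polyfit(x, y, deg=1)
y_hat = slope * x + intercept

# Residual sum of squares: squared differences between data and fitted line, summed up
rss = np.sum((y - y_hat) ** 2)
```

The closer the points lie to the fitted line, the smaller the result.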
Root mean squared error
RMSE for short: the square root of the mean of the squared differences between the predicted values and the actual values. The actual values mostly come from the test data, while the model was trained on the training data. The lower the RMSE, the more precise our model is.
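A minimal NumPy sketch of the calculation, with made-up values for the actual and the predicted data:

```python
import numpy as np

y_true = np.array([3.0, -0.5, 2.0, 7.0])  # actual values, e.g. from the test data
y_pred = np.array([2.5, 0.0, 2.0, 8.0])   # predictions of the model

# Square the differences, take the mean, then take the square root
rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))  # roughly 0.612
```

Because of the square root, the RMSE has the same unit as the predicted values.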
Ridge regression
Shrinkage
Sigmoid
A function that renders the famous S-shaped curve, also known as the sigmoid curve. This function maps a real number onto the range 0 to 1, and is therefore often used as a wrapper to obtain a probability score.
$$S(x) = \frac{1}{1 + e^{-x}}$$
Or:
$$S(x) = \frac{e^{x}}{e^{x} + 1}$$
Alternatively, functions like Softmax, Logit and ReLU can be used. ReLU in particular is used more often than Sigmoid today.
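A minimal sketch of the Sigmoid in NumPy, using the standard logistic form 1 / (1 + e^(-x)). The input scores are made up for illustration.

```python
import numpy as np

def sigmoid(x):
    # Maps any real number onto the range 0 to 1
    return 1.0 / (1.0 + np.exp(-x))

scores = np.array([-4.0, 0.0, 4.0])
probabilities = sigmoid(scores)  # roughly [0.018, 0.5, 0.982]
```

Strongly negative scores end up near 0, strongly positive ones near 1, and a score of 0 maps to exactly 0.5.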
Splitting
Splitting means dividing the whole data set into two sets: one for training the ML model, one for testing the model.
Code example: Splitting data 80/20 with Scikit learn:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=69)
Keep in mind the data is not necessarily normalized yet.
Supervised learning
A subset of machine learning tasks in which the algorithm is provided with labeled data - the correct output is known for every training example.
Most classification and regression tasks are supervised learning.
Support vector machine
SVMs for short: models for classifying data that separate the classes by the boundary with the largest margin. The boundary is defined by the support vectors, the data points closest to it. SVMs work for linearly and, by using kernels, non-linearly separable data.
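A short sketch of an SVM classifier with scikit-learn follows. Using the iris dataset and a linear kernel are assumptions for illustration; for data that is not linearly separable, kernel="rbf" is a common choice.

```python
from sklearn.datasets import load_iris
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Fit a support vector classifier with a linear kernel
model = SVC(kernel="linear")
model.fit(X, y)

# The support vectors found during training, one row per vector
support_vectors = model.support_vectors_
predictions = model.predict(X[:5])
```

Only a part of the training points ends up as support vectors; the remaining points do not influence the decision boundary.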
Stepsize backtracking
An algorithmic approach to control the stepsizes for the gradient descent algorithm.
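One widespread variant is backtracking with the Armijo condition: start with a large step and shrink it until the function value decreases sufficiently. The sketch below assumes this variant; the objective function, the shrink factor and the constant c are hypothetical choices for illustration.

```python
import numpy as np

def f(x):
    # Example objective (assumed): a convex quadratic with different curvature per axis
    return 0.5 * (x[0] ** 2 + 10.0 * x[1] ** 2)

def grad_f(x):
    return np.array([x[0], 10.0 * x[1]])

def backtracking_step(x, step=1.0, shrink=0.5, c=1e-4):
    # Shrink the step until the Armijo sufficient-decrease condition holds
    g = grad_f(x)
    while f(x - step * g) > f(x) - c * step * np.dot(g, g):
        step *= shrink
    return step

x = np.array([3.0, -4.0])
for _ in range(200):
    step = backtracking_step(x)
    x = x - step * grad_f(x)  # gradient descent update with the accepted step
```

With a fixed step of 1.0 the updates would overshoot along the second axis; backtracking picks a smaller step automatically, and x approaches the minimum at (0, 0).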
Stochastic gradient descent
Taxicab distance
Another word for the Manhattan distance.
Underfitting
Unsupervised learning
A subset of machine learning tasks in which the algorithm is not provided with labels. Instead of predicting a known output, it looks for structure in the data on its own, for example clusters. Human labeling of the data isn’t necessary.
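Clustering is a typical example of such a task. The following sketch groups unlabeled points with k-means from scikit-learn; the toy points and the choice of two clusters are assumptions for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

# Two well-separated groups of points, without any labels (toy data)
points = np.array([
    [1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
    [8.0, 8.0], [8.2, 7.9], [7.9, 8.1],
])

# Ask for two clusters; n_init and random_state are fixed for reproducibility
model = KMeans(n_clusters=2, n_init=10, random_state=0)
cluster_ids = model.fit_predict(points)
```

The algorithm assigns the first three points to one cluster and the last three to the other, without ever being told which points belong together.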
Vector
Weight
One or more weights can be applied to the parameters in different ML models to regulate their impact.