Machine learning terminology from A to Z

A
Active learning
Active learning in machine learning refers to settings in which a learning algorithm can interactively query a user (also called a teacher or oracle) to label new data points with the desired outputs.
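To decide which point to send to the oracle, a common query strategy is uncertainty sampling: pick the unlabeled example the current model is least sure about. The Python sketch below assumes a binary classifier that outputs a probability for the positive class; the function `pick_query` and the example ids and probabilities are hypothetical.

```python
def pick_query(probabilities: dict[str, float]) -> str:
    """Return the id of the unlabeled example whose predicted positive-class
    probability is closest to 0.5, i.e. where the model is least certain."""
    # Uncertainty is highest when the prediction is closest to a coin flip.
    return min(probabilities, key=lambda ex_id: abs(probabilities[ex_id] - 0.5))


# Hypothetical model outputs for three unlabeled examples.
unlabeled = {"img_1": 0.97, "img_2": 0.48, "img_3": 0.10}
print(f"Ask the oracle to label {pick_query(unlabeled)}")  # Ask the oracle to label img_2
```

After the oracle answers, the newly labeled example is added to the training set, the model is retrained, and the loop repeats.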
Annotation
In the field of AI development, data annotation (also called labeling or tagging) is the process of adding one or more meaningful labels to data.
Artificial Neural Network (ANN)
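An artificial neural network (ANN) is a computing system whose architecture is inspired by the biological neural networks of animal brains. It consists of interconnected units (artificial neurons), usually organized in layers, and each connection has a weight that is adjusted during training.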
Automated machine learning (AutoML)
AutoML is the process of automating complex or time-consuming steps of the machine learning workflow, such as data preprocessing, feature engineering, model selection, and hyperparameter tuning.
B
Backpropagation
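In machine learning, backpropagation is an algorithm for computing the gradient of a loss function with respect to each weight of a neural network, by applying the chain rule layer by layer from the output back to the input. Combined with an optimizer such as gradient descent, it is the standard way to train feedforward neural networks in supervised learning.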
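The chain-rule bookkeeping can be seen on a tiny two-weight network, y = w2 * tanh(w1 * x), trained with squared-error loss. This is a hand-written sketch for illustration only; the network, the single training example, and the learning rate are assumptions, and real frameworks compute these gradients automatically.

```python
import math


def forward_backward(w1: float, w2: float, x: float, target: float):
    """Forward and backward pass for y = w2 * tanh(w1 * x) with
    squared-error loss 0.5 * (y - target) ** 2."""
    # Forward pass: keep the intermediate value h, the backward pass needs it.
    h = math.tanh(w1 * x)
    y = w2 * h
    loss = 0.5 * (y - target) ** 2

    # Backward pass: chain rule, from the loss back towards the input.
    dloss_dy = y - target
    dloss_dw2 = dloss_dy * h
    dloss_dh = dloss_dy * w2
    dloss_dw1 = dloss_dh * (1 - h ** 2) * x  # d/du tanh(u) = 1 - tanh(u)**2
    return loss, dloss_dw1, dloss_dw2


# Plain gradient descent on one training example (illustrative values).
w1, w2, learning_rate = 0.5, -0.3, 0.1
for _ in range(100):
    loss, grad_w1, grad_w2 = forward_backward(w1, w2, x=1.0, target=0.8)
    w1 -= learning_rate * grad_w1
    w2 -= learning_rate * grad_w2
print(f"loss after training: {loss:.4f}")
```

Comparing these gradients with finite differences (nudging a weight slightly and measuring the change in loss) is a common way to check hand-written backpropagation code.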
Bidirectional Encoder Representations from Transformers (BERT)
BERT is a family of masked language models based on the transformer architecture, introduced by researchers at Google in 2018.
Bounding box
In data labeling, specifically in object detection tasks, bounding boxes are rectangles drawn around objects to describe their spatial location in an image.
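Bounding boxes are often stored as corner coordinates (x_min, y_min, x_max, y_max), although some tools use (x, y, width, height) instead. A common operation on them is intersection over union (IoU), which measures how well a predicted box overlaps a labeled one. The minimal sketch below assumes corner format; the `Box` type and `iou` function are illustrative, not a specific library's API.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Axis-aligned bounding box in corner format (pixel coordinates assumed)."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    @property
    def area(self) -> float:
        # Clamp at zero so an "inverted" box (no overlap) has no area.
        return max(0.0, self.x_max - self.x_min) * max(0.0, self.y_max - self.y_min)


def iou(a: Box, b: Box) -> float:
    """Intersection over union: overlap area divided by combined area."""
    # The intersection is itself a box; it is empty if the boxes do not overlap.
    inter = Box(max(a.x_min, b.x_min), max(a.y_min, b.y_min),
                min(a.x_max, b.x_max), min(a.y_max, b.y_max)).area
    union = a.area + b.area - inter
    return inter / union if union > 0 else 0.0


labeled = Box(10, 10, 50, 50)
predicted = Box(30, 30, 70, 70)
print(round(iou(labeled, predicted), 3))  # 0.143
```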
C
Classification
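Classification is a type of supervised learning task in which a model learns to assign each input to one of a set of predefined categories (classes), for example deciding whether an email is spam or not spam.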
Computational linguistics
Computational linguistics is an interdisciplinary field concerned with the scientific study of natural language and its modeling through computing systems.
Computer vision
Computer vision is an interdisciplinary field that deals with enabling computers to gain high-level understanding from digital images or videos.
Convolutional neural networks
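Convolutional neural networks (CNNs) are a common type of artificial neural network used for analyzing visual images. Their convolutional layers slide small learnable filters across the input, which lets them detect local patterns such as edges and textures at any position in the image.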
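The core operation of a convolutional layer can be sketched in plain Python. This minimal version assumes a single-channel image, a single filter, stride 1, and no padding; like most deep learning frameworks, it actually computes cross-correlation (the kernel is not flipped). The kernel values are hand-picked here for illustration, whereas a CNN learns them during training.

```python
def conv2d(image: list[list[float]], kernel: list[list[float]]) -> list[list[float]]:
    """'Valid' 2D convolution as used in CNN layers: stride 1, no padding,
    kernel not flipped (cross-correlation)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Weighted sum of the image patch currently under the kernel.
            row.append(sum(kernel[a][b] * image[i + a][j + b]
                           for a in range(kh) for b in range(kw)))
        output.append(row)
    return output


# A 4x4 image with a vertical edge between dark (0) and bright (1) columns.
image = [[0, 0, 1, 1]] * 4
# Hand-written vertical-edge filter; in a CNN these weights would be learned.
kernel = [[-1, 1], [-1, 1]]
print(conv2d(image, kernel))  # [[0, 2, 0], [0, 2, 0], [0, 2, 0]]
```

The strong response in the middle column of the output marks where the edge is; a real CNN layer applies many such filters and adds a bias and a nonlinearity.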
D
Data annotation
Data annotation (also called data labeling) is the process of adding meaningful tags to data so that the data can later be used to train ML models.
Data labeling
Data labeling is another name for data annotation: the process of adding meaningful tags to data so that the data can later be used to train ML models.
Data-centric AI
Data-centric AI is an approach to building AI systems that focuses on systematically improving the quality of the data (for example, by fixing inconsistent labels) rather than mainly tweaking the model's architecture or hyperparameters.
Deep learning
Deep learning is a family of machine learning methods based on artificial neural networks with representation learning, typically networks with many layers (hence "deep").
E
Extraction
Extraction (here, keyword extraction) refers to the task of automatically identifying the terms that best describe the topic of a given document.
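One simple, classic way to score candidate keywords is TF-IDF: a word ranks high if it appears often in the document but rarely in the rest of the collection. The sketch below is an illustration under stated assumptions (regex tokenization, a tiny stopword list, a three-document toy corpus, a smoothed IDF variant), not the method of any particular tool.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list; real lists are much longer.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "is", "in", "on", "for", "from"}


def tokenize(text: str) -> list[str]:
    """Lowercase, keep alphabetic tokens, drop stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def top_keywords(doc: str, corpus: list[str], k: int = 3) -> list[str]:
    """Rank the words of `doc` by TF-IDF computed against `corpus`."""
    counts = Counter(tokenize(doc))
    total = sum(counts.values())
    doc_sets = [set(tokenize(d)) for d in corpus]

    def tfidf(word: str) -> float:
        tf = counts[word] / total  # how prominent the word is in this document
        df = sum(word in s for s in doc_sets)  # how many documents contain it
        idf = math.log((1 + len(corpus)) / (1 + df)) + 1  # smoothed IDF
        return tf * idf

    # sorted() is stable, so equally scored words keep their first-seen order.
    return sorted(counts, key=tfidf, reverse=True)[:k]


corpus = [
    "neural networks learn features from images",
    "decision trees split data on features",
    "boosting combines many decision trees and the trees vote",
]
print(top_keywords(corpus[2], corpus, k=2))  # ['trees', 'boosting']
```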
F
Facial recognition
Facial recognition is a technology capable of matching a human face from a digital image or a video frame against a database of faces.
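Many facial recognition systems first convert each face into a numeric vector (an embedding) with a trained neural network and then compare vectors. The sketch below shows only the matching step, assuming the embeddings already exist; the three-dimensional vectors, the names, and the distance threshold of 0.5 are hypothetical.

```python
import math

# Hypothetical database of face embeddings (real systems use vectors with
# hundreds of dimensions produced by a trained neural network).
FACE_DB = {
    "alice": (0.1, 0.9, 0.3),
    "bob": (0.8, 0.2, 0.5),
}


def identify(embedding, threshold=0.5):
    """Return the closest enrolled identity, or None if nobody is close enough."""
    name, dist = min(((n, math.dist(embedding, e)) for n, e in FACE_DB.items()),
                     key=lambda pair: pair[1])
    # The threshold rejects faces that are not in the database at all.
    return name if dist <= threshold else None


print(identify((0.15, 0.85, 0.35)))  # alice
print(identify((0.5, 0.5, 0.9)))     # None
```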
G
Generative adversarial network
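A generative adversarial network (GAN) is a type of ML model built from two neural networks that are trained against each other: a generator, which produces synthetic samples, and a discriminator, which tries to distinguish those samples from real data.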
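The adversarial training can be written as a two-player minimax game over a value function $V(D, G)$, where $D(x)$ is the discriminator's estimated probability that $x$ is real, $G(z)$ is the generator's output for random noise $z$, $p_{\text{data}}$ is the data distribution, and $p_z$ is the noise distribution. This is the objective of the original GAN formulation:

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_z}\big[\log\big(1 - D(G(z))\big)\big]
```

The discriminator is trained to maximize $V$ (scoring real samples high and generated ones low), while the generator is trained to minimize it, pushing $D(G(z))$ toward 1. In practice the generator is often trained to maximize $\log D(G(z))$ instead, which gives stronger gradients early in training.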
H
Hyperparameters
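Hyperparameters are parameters used to set up and control the learning process of an ML model. Unlike the model's weights, they are not learned from the data but chosen before training; examples include the learning rate, the number of training epochs, and the number of neighbors k in k-NN.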
I
Image recognition
Image recognition is a process that helps computer systems to identify and classify objects and patterns in images.
Image segmentation
In machine learning, image segmentation means dividing a digital image into different segments or regions, each made of a group of pixels.
Imbalanced data
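Imbalanced data refers to a training dataset in which the classes are represented by significantly different numbers of examples, for instance 95% of the examples in one class and 5% in another. Models trained on such data tend to favor the majority class.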
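A common mitigation is to weight the loss so that mistakes on rare classes cost more. The sketch below uses the inverse-frequency heuristic n_samples / (n_classes * class_count), the same formula scikit-learn uses for `class_weight="balanced"`; the function name and the labels are made up for illustration.

```python
from collections import Counter


def balanced_class_weights(labels: list[str]) -> dict[str, float]:
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


labels = ["negative"] * 95 + ["positive"] * 5  # 95% / 5% imbalance
weights = balanced_class_weights(labels)
print(weights)  # positive gets weight 10.0, negative about 0.53
```

With these weights, each class contributes the same total weight (here 50) to the loss, regardless of how many examples it has.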
K
K-nearest neighbor
The k-nearest neighbors algorithm (k-NN) is a non-parametric supervised learning method used for classification and regression.
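A minimal k-NN classifier fits in a few lines, which illustrates why the method is called non-parametric: the "model" is just the stored training data. The sketch assumes Euclidean distance and majority voting; the function name, points, and labels are invented for illustration.

```python
import math
from collections import Counter


def knn_predict(train: list[tuple[tuple[float, ...], str]],
                query: tuple[float, ...], k: int = 3) -> str:
    """Classify `query` by majority vote among its k nearest training points
    (Euclidean distance)."""
    # No training phase: all the work happens at prediction time.
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


train = [((1.0, 1.0), "red"), ((1.5, 2.0), "red"), ((2.0, 1.0), "red"),
         ((6.0, 6.0), "blue"), ((7.0, 7.5), "blue"), ((6.5, 5.0), "blue")]
print(knn_predict(train, (2.0, 2.0)))  # red
print(knn_predict(train, (6.0, 7.0)))  # blue
```

For regression, the majority vote would be replaced by the average of the neighbors' target values.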
L
Large language model (LLM)
A large language model (LLM) is a deep learning model which consists of a neural network with a huge number of parameters (billions of weights or more) and is trained on large quantities of unlabelled text via self-supervised learning.
M
Machine learning (ML)
Machine learning (ML) is a subset of the artificial intelligence (AI) field. It focuses on developing methods for computers to leverage data to improve their performance on certain tasks.
N
Natural language processing
Natural language processing (NLP) is a subfield at the intersection of linguistics, computer science, and artificial intelligence that focuses on enabling computers to process, analyze, and generate human language.
O
Overfitting in machine learning
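Overfitting in machine learning refers to the situation when a model learns the training data too closely, including its noise and random fluctuations, so that it performs well on the training data but poorly on new, unseen data. It typically happens when the model is too complex relative to the amount of training data available.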
P
Part-of-speech tagging
Part-of-speech tagging (POS tagging or POST) is the process of labeling every word in a text corpus with a particular part of speech, based on both its definition and its context.
R
Regression algorithms
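In machine learning, regression algorithms are supervised learning methods that predict a continuous numerical value (for example, a price or a temperature) from input features. They are widely used for prediction and forecasting.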
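The simplest regression algorithm, ordinary least squares with a single input feature, has a closed-form solution: the slope is the covariance of x and y divided by the variance of x. The sketch below assumes a straight-line relationship and uses made-up house-size and price data; the function name is illustrative.

```python
def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # slope = covariance(x, y) / variance(x); the 1/n factors cancel out.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_y - slope * mean_x  # the line passes through the means
    return slope, intercept


# Hypothetical data: house size (square meters) vs. price (thousands).
sizes = [50, 70, 90, 110]
prices = [150, 200, 260, 300]
slope, intercept = fit_line(sizes, prices)
print(f"predicted price for 100 square meters: {slope * 100 + intercept:.1f}")  # 278.5
```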
S
Sentiment analysis
Sentiment analysis is a natural language processing technique used to determine the emotional tone behind a piece of text.
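The simplest approach is lexicon-based: each word carries a polarity score and the scores are summed. The sketch below assumes a tiny hypothetical lexicon and a crude negation rule (a negation word flips the next sentiment word); modern systems usually rely on trained models instead, so treat this only as an illustration of the idea.

```python
import re

# Tiny hypothetical polarity lexicon; real lexicons contain thousands of entries.
LEXICON = {"good": 1, "great": 2, "love": 2, "bad": -1, "terrible": -2, "boring": -1}
NEGATIONS = {"not", "never", "no"}


def sentiment(text: str) -> str:
    """Label text as positive, negative, or neutral from summed word scores."""
    score = 0
    negate = False
    for word in re.findall(r"[a-z']+", text.lower()):
        if word in NEGATIONS:
            negate = True  # flip the polarity of the next sentiment word
            continue
        if word in LEXICON:
            score += -LEXICON[word] if negate else LEXICON[word]
            negate = False
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(sentiment("I love this phone, the camera is great"))   # positive
print(sentiment("The plot was not good and rather boring"))  # negative
```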
T
Training set
A training set (training dataset) is the set of examples used to fit an ML model, that is, to learn its parameters, such as the weights of a neural network.
U
Underfitting in machine learning
Underfitting in machine learning occurs when a model fails to capture the underlying structure of the data, for example because it is too simple, so it performs poorly on the training data as well as on new, unseen data.
V
Validation set in machine learning
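In machine learning, a validation dataset is a portion of the dataset held out from training and used to evaluate a model during the training process, for example to tune hyperparameters or to decide when to stop training. It is distinct from the test set, which is reserved for the final evaluation.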
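Holding out a validation set is often done with a random but reproducible split. The sketch below assumes an 80/20 split and a fixed random seed; the function name is hypothetical, and in practice a separate test set is usually held out as well.

```python
import random


def train_validation_split(examples: list, val_fraction: float = 0.2, seed: int = 0):
    """Shuffle `examples` and hold out `val_fraction` of them for validation."""
    shuffled = examples[:]                 # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)  # fixed seed makes the split reproducible
    n_val = int(len(shuffled) * val_fraction)
    return shuffled[n_val:], shuffled[:n_val]  # (training set, validation set)


data = list(range(10))  # stand-in for 10 labeled examples
train_set, val_set = train_validation_split(data)
print(len(train_set), len(val_set))  # 8 2
```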
X
XGBoost
XGBoost is an optimized, distributed gradient boosting library that implements machine learning algorithms within the Gradient Boosting framework.
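XGBoost's internals are heavily optimized, but the gradient boosting idea it builds on can be sketched in plain Python: start from a constant prediction, then repeatedly fit a weak learner to the negative gradient of the loss and add a scaled-down copy of it to the ensemble. This sketch uses one-feature decision stumps and squared-error loss; it is plain gradient boosting, not XGBoost's regularized algorithm or API, and all names are illustrative.

```python
from typing import Callable


def fit_stump(xs: list[float], residuals: list[float]) -> Callable[[float], float]:
    """Fit a one-split decision stump to the residuals by least squares."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # candidate split points
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def gradient_boost(xs: list[float], ys: list[float], n_rounds: int = 50,
                   learning_rate: float = 0.1) -> Callable[[float], float]:
    """Gradient boosting with stumps; needs at least two distinct x values."""
    base = sum(ys) / len(ys)  # initial model: a constant prediction
    stumps: list[Callable[[float], float]] = []

    def predict(x: float) -> float:
        # Each stump's contribution is shrunk by the learning rate.
        return base + learning_rate * sum(stump(x) for stump in stumps)

    for _ in range(n_rounds):
        # For the loss 0.5 * (y - F(x))**2 the negative gradient is y - F(x).
        residuals = [y - predict(x) for x, y in zip(xs, ys)]
        stumps.append(fit_stump(xs, residuals))
    return predict


xs = [1, 2, 3, 4, 5, 6]
ys = [1.0, 1.2, 0.9, 3.1, 3.0, 2.8]
model = gradient_boost(xs, ys)
print([round(model(x), 2) for x in xs])
```

The learning rate (shrinkage) and the number of rounds are hyperparameters; smaller learning rates usually need more rounds.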
Y
YOLO (object detection algorithm)
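YOLO (You Only Look Once) is a popular family of object detection algorithms known for their speed and accuracy. Instead of examining many candidate regions separately, it predicts bounding boxes and class probabilities for the whole image in a single pass of a neural network.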
Z
Zero-shot learning
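In machine learning, zero-shot learning (ZSL) is a problem setup in which, at test time, a model must classify samples from classes that were not observed during its training. This is typically possible because the model is given auxiliary information about those classes, such as textual descriptions or attributes.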
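One classic way to recognize unseen classes is attribute-based classification: a model trained on seen classes predicts attributes of a sample, and the prediction is the class, seen or unseen, whose attribute description matches best. The sketch below shows only that matching step with cosine similarity; the attributes, class descriptions, and predicted scores are all hypothetical.

```python
import math

# Hypothetical class descriptions as attribute vectors:
# (has_stripes, has_hooves, is_black_and_white).
CLASS_ATTRIBUTES = {
    "horse": (0.0, 1.0, 0.0),
    "tiger": (1.0, 0.0, 0.0),
    "zebra": (1.0, 1.0, 1.0),  # unseen class, known only from its description
}


def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def zero_shot_predict(predicted_attributes):
    """Return the class whose attribute description best matches the
    attribute scores predicted for a sample."""
    return max(CLASS_ATTRIBUTES,
               key=lambda c: cosine(predicted_attributes, CLASS_ATTRIBUTES[c]))


# Attribute scores an (assumed) attribute predictor output for a new image.
print(zero_shot_predict((0.9, 0.8, 0.9)))  # zebra
```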