Natalie Kudan
Machine learning label vs feature, and other common terms
Machine learning is a field of artificial intelligence that allows us to train computers without programming all of the business logic explicitly. AI already plays a huge role in the development of computer science and the IT industry today.
Businesses are paying more and more attention to intelligent tools as a way to develop their operations. Machine learning may be implemented in any software tool, as well as in research, manufacturing processes, and actual products. To understand how it all works, it is first necessary to explore some essential terminology, the types of machine learning, and its algorithms.
Before we begin explaining what this or that ML term stands for, let's think about why we need to know it at all. The answer is simple: machine learning, and artificial intelligence in general, is practically part of our everyday life now.
ML algorithms solve complex tasks such as natural language processing (think automatic translators), sentiment analysis, and content moderation; they perform optical character recognition to detect text in images, and computer vision helps them detect various objects in pictures and videos. Many modern smartphones, computers, programs, and even games employ machine learning.
Machine learning means that the computer learns to recognize patterns from examples rather than being programmed with a fixed set of rules. Such patterns are contained in the data. Machine learning involves creating algorithms, or sets of rules, that learn from known patterns in the data and then make predictions.
In general, the goal of machine learning is to predict a result from some input data. For the system to learn how to make accurate predictions, experts must first show the computer the correct answers so that it can learn from them. That said, not all types of ML require a high-quality training dataset, as we will discuss in more detail further on.
The more diverse and accurate the training data, the easier it is for the machine to find patterns and the more accurate the outcome. So, the fundamental principle of ML is that machines receive data and learn from it. The aim is for the system to learn to perform various tasks, making predictions or decisions based on previous data with minimal or no instruction from humans. However, before a computer can learn to make decisions like a human, in most cases a human has to teach it.
To incorporate the results of ML into our day-to-day activities, teams of data scientists and engineers create machine learning models. An ML model is essentially a piece of software (basically, a file containing all the layers and weights) that is trained to identify correlations or patterns in a dataset. Normally, ML engineers train a model on datasets using a machine learning algorithm, and the model analyzes and remembers that training data.
The process of building a machine learning model
To launch the machine learning process, you first have to obtain a certain amount of labeled data, called a dataset, that the algorithm will learn to work with. For example, it can be a set of photos of dogs and cats that already carry labels identifying these animals in each image.
After the training phase, the model will be able to recognize the dogs and cats in new images without the labels. The training process can continue even after the first predictions have been generated; ideally, the more labeled data the computer analyzes, the more accurately it will recognize animals in new images.
The basic representation of the entire machine learning model preparation involves roughly seven steps:
Data collection
Obtaining the required data is the start of the whole process. A designated team of ML experts identifies the data that will be utilized for training. Knowing what they want to predict, they can determine exactly what data might be most useful or valuable for their project.
Algorithm selection and data labeling process
Algorithm selection is among the first choices one makes in ML. A mathematical algorithm is at the core of any model, determining how the model will find patterns in the collected data. The algorithm is the mathematical engine behind the ML model, and the same model can use various algorithms. An algorithm without a model is simply a set of mathematical equations.
Simultaneously with the choice of algorithm, the collected data undergoes a series of transformations to shape the training set. Data is often edited, refined, labeled, and enhanced manually to achieve acceptable quality for future models.
Once the data has been prepared, developers proceed to feature engineering. Features are the data values that the model will employ both in training and later in real-world use. Specialists review the available data and generate a list of features that have strong predictive power.
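As a rough illustration, here is what simple feature engineering could look like in Python with pandas. The measurements and derived features below are entirely hypothetical, in the spirit of the tomato example used later in this article:

```python
import pandas as pd

# Hypothetical raw measurements collected for tomatoes on a conveyor belt
raw = pd.DataFrame({
    "weight_g": [92, 110, 85],
    "red_channel": [200, 90, 210],      # mean red value reported by the camera
    "green_channel": [40, 120, 50],
    "days_in_storage": [2, 9, 3],
})

# Derive features that plausibly carry more predictive power than the raw values
features = pd.DataFrame({
    "redness_ratio": raw["red_channel"] / (raw["green_channel"] + 1),
    "is_old_stock": (raw["days_in_storage"] > 7).astype(int),
    "weight_g": raw["weight_g"],
})
print(features)
```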
Model training
Training is a fundamental component of the overall procedure. Experts feed training sets to the ML model so that it learns to make predictions on new data.
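In code, the training step itself is often a single call. Below is a minimal sketch with scikit-learn, assuming two invented features per example and string labels:

```python
from sklearn.linear_model import LogisticRegression

# Toy training set: each row is a feature vector, y holds the matching labels
X_train = [[0.9, 1], [4.2, 0], [1.1, 1], [5.0, 0]]   # e.g. redness_ratio, is_old_stock
y_train = ["rotten", "fresh", "rotten", "fresh"]

model = LogisticRegression()
model.fit(X_train, y_train)        # the model learns patterns from labeled examples
print(model.predict([[4.5, 0]]))   # prediction on unseen data, expected: ['fresh']
```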
Testing
It is essential to note the difference between terms such as training set and test set. The first is utilized to train the model; the second is employed to test it. Trained models are evaluated on the test data to make sure that their predictions are sufficiently accurate.
After comparing these test results, the model may then be adjusted, modified, or trained again on some other data. Training and evaluation continue until the model achieves an appropriate rate of correct predictions.
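A minimal sketch of this train-and-evaluate loop, using scikit-learn and its bundled Iris dataset purely as an example:

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Hold out a quarter of the labeled data as the test set
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

model = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)

# Compare the model's predictions against the held-out labels
print("accuracy:", accuracy_score(y_test, model.predict(X_test)))
```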
Data labeling work is often involved at this step as well, because testing means comparing the model's predictions against pre-labeled, reliable data. Without constant quality control, a trained model might "drift" and start producing unreliable results over time. A data scientist usually keeps an eye on the model's results and runs data labeling tasks to prepare control datasets.
Practical application
The final phase is the actual practical use of the ML model. Essentially, the end user employs the model to generate predictions from real data, which is similar to the training dataset in content but does not contain labels.
Let's say a development team wants to build an application to identify rotten tomatoes on a conveyor belt. They can train a model on a set of images of rotten and fresh tomatoes, each marked with a "rotten" or "fresh" label, and then implement that model in an application that recognizes the two kinds. In other words, after training, the computer system itself should be able to assign a label to each tomato it analyzes.
Labeling data in machine learning
What is a label in machine learning? A label is a description that informs an ML model what a particular piece of data represents so that the model may learn from the example.
Data labeling, also sometimes called data annotation, is the process of adding labels to raw data to show the ML model the desired responses that it should be able to forecast. The data labeling procedure is a critical step performed by managed data labeling teams. A label represents the ground truth that your output data is compared to.
Once experts have created a successful ML model, the computer should be able to figure out the labels on its own. Thus, we can say that labels are an output you get from your model after training it.
Features in machine learning
A feature in machine learning is an individual measurable characteristic or property of an object being observed. Features are the most common form of input in machine learning, and the choice of meaningful, distinguishable, and independent features is a fundamental component of building efficient ML algorithms.
To put it simply, machine learning features describe the characteristics of your training data. In training datasets represented as tables, the features are found in columns. Each column stands for a specific characteristic or property.
For example, in image labeling, features usually include patterns, colors, and shapes present in images: something like fur or feathers, or a lower-level version such as pixel values.
Features also allow you to distinguish one object from another in a picture. For example, if there is fur present in a photo, then it is probably a dog, and if there are feathers, then it is more likely a bird. Of course, for the model to accurately identify the object in the photo, it will need many more features.
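Here is a toy sketch of this tabular view using pandas; the column names and values are made up for illustration:

```python
import pandas as pd

# Each column is a feature; the "label" column holds the desired output
dataset = pd.DataFrame({
    "has_fur": [1, 0, 1],
    "has_feathers": [0, 1, 0],
    "weight_kg": [12.0, 0.4, 25.0],
    "label": ["dog", "bird", "dog"],
})

X = dataset.drop(columns=["label"])   # features: the model's input
y = dataset["label"]                  # labels: what the model should predict
```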
Frequently, the datasets that specialists have to work with comprise a large number of features, sometimes several hundred or even thousands. When building an ML model, it is not always obvious which of the features are really relevant and which are unnecessary. To put it another way, the initial set of features may be too large to process. Thus, a preliminary step in many machine learning applications consists of feature selection or the extraction of a new, smaller feature set.
Feature selection refers to the procedure of choosing a subset of the original features so that the feature set is reduced and optimized according to a certain requirement. Feature extraction (sometimes referred to as feature construction) is a procedure that generates a set of entirely new features.
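Both operations are available out of the box in libraries such as scikit-learn. This sketch uses the bundled breast cancer dataset, which has 30 original features, purely as an example:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif

X, y = load_breast_cancer(return_X_y=True)   # 30 original features

# Feature selection: keep a subset of the original features
X_selected = SelectKBest(score_func=f_classif, k=10).fit_transform(X, y)
print(X_selected.shape)    # (569, 10)

# Feature extraction: construct a smaller set of brand-new features
X_extracted = PCA(n_components=10).fit_transform(X)
print(X_extracted.shape)   # (569, 10)
```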
ML label vs feature
At first glance, labels and features in machine learning may seem to describe very similar, if not identical, concepts, but this is far from the truth. As discussed above, a label represents an output value, while a feature is an input value that describes the characteristics of the objects in a dataset.
For example, suppose you have a completed ML model that has been pre-trained on datasets of dog breeds, in which the dogs' characteristics (features) and corresponding breeds (labels) were specified. Now the algorithm built into this model should be able to determine the breed of a dog (label) from the traits (features) you provide to it.
When ML models are used in real life, experts present input data with features to the computer. For example, in a factory, a conveyor belt moves a lot of tomatoes, and the ML model captures the features of each tomato using computer vision tools. If the model recognizes features such as a brown or black color, the label "rotten" is assigned to the fruit; if the color is red, yellow, or orange, the tomato is considered fresh. It is worth mentioning that this is a simplified, approximate description of the process.
The features have to be supplied to the computer system as input so that the ML algorithms can generate a label as an output. This is the difference between ML labels and features. The larger and better the labeled training dataset, the better the model will predict labels from features.
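Returning to the dog breed example, here is a minimal sketch of that input/output relationship; the trait values are invented for illustration:

```python
from sklearn.neighbors import KNeighborsClassifier

# Features (inputs) describe each dog; labels (outputs) name the breed
X = [[56, 30], [21, 9], [54, 28], [20, 8]]   # hypothetical height_cm, weight_kg
y = ["labrador", "dachshund", "labrador", "dachshund"]

model = KNeighborsClassifier(n_neighbors=1).fit(X, y)

# At prediction time only the features are supplied; the label is the output
print(model.predict([[55, 29]]))   # expected: ['labrador']
```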
Targets in ML and their differences from labels and features
A couple of terms that are often used interchangeably in ML are target and label. A target is the dataset variable to be predicted by an ML model; it is the variable that describes the outcome of the process. Broadly speaking, the terms label and target may be used interchangeably, although label is more common within classification algorithms than within regression ones. The target is the final output an ML model is trying to predict, and it can be categorical or continuous.
The label represents the true value of the target. As mentioned earlier, labels are assigned to the training dataset, but once the ML model is ready, it is fed unlabeled data. The label has a known, correct value, while finding the target variable is the main task of the ML model. During training, experts try to tune the model so that its predicted targets come as close as possible to the labels. However, predictions of the target variable will most likely never be one hundred percent correct; they only approach that goal as the model is trained and its errors are corrected.
As for the difference between features and targets in ML, this should be obvious by now. As we have already established, the target is often treated as a synonym for the label, since there is essentially little difference between the two terms: the label has a precisely known value, while the target is the variable the model is trying to predict, and under ideal conditions they would be the same thing. The features describe the label, so they should also describe the target. If the output of the ML model indicates that the target and features do not match, then there must be an error somewhere in the model that needs to be fixed before it can give correct predictions.
Types and algorithms of machine learning
There are quite a number of ML methods, but among them we may distinguish four key types:
classic learning
ensemble learning
reinforcement learning
neural networks and deep learning
Classic learning
The earliest and most basic ML methods fall into two categories: unsupervised and supervised learning.
Unsupervised learning
This type of ML is not used as frequently as supervised learning in real situations. In unsupervised learning, algorithms do not require a labeled training dataset: data labeling is not necessary, the machine does not need a human to guide it, and it tries to find patterns in the provided data on its own. For example, experts merely show a machine a large number of pictures of objects and tell it to figure out the similarities between them. This type of learning includes the following kinds of tasks:
Clustering. Clustering involves arranging objects into relatively homogeneous groups, separating data based on characteristics that seem similar to the machine. It is applied to text analysis, customer segmentation for marketing purposes, and even fraud prevention (see the clustering sketch after this list).
Association. This type is used to find sets of characteristics, and their values, that are frequently encountered together in the feature descriptions of objects. This approach is also called rule mining: a machine analyzes a dataset and finds features that frequently occur together.
For example, in e-commerce, by analyzing shopping carts, the computer finds items that are often bought together. This way it draws a conclusion that one product may be recommended as a match for another, and suggests a favorable choice of goods.
Dimensionality reduction. This type of ML algorithm gathers specific attributes into a higher-level set of abstractions, combining multiple features into more general or abstract classes. If you have objects with similar features, the algorithm combines them into one category.
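As a small concrete example of unsupervised learning, here is the clustering sketch mentioned above; scikit-learn's KMeans discovers two groups in made-up, unlabeled points:

```python
import numpy as np
from sklearn.cluster import KMeans

# Unlabeled data: no human-provided answers, only feature values
points = np.array([[1.0, 1.1], [1.2, 0.9], [8.0, 8.2], [7.9, 8.1]])

# The algorithm groups similar objects entirely on its own
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(points)
print(kmeans.labels_)   # e.g. [0 0 1 1], two discovered groups
```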
Supervised learning
Here the computer has a kind of teacher telling it how to do things right. This teacher helps the machine understand that a certain picture shows a house, while another one shows a car. In other words, all the data has already been pre-labeled by the “teacher”, who shows the machine where the house and the car are located in each photo. The computer learns from these specific examples. This method consists of training a machine learning model using labeled data, and thus requires proper data labeling. It is used much more often than the unsupervised one: a machine will learn faster and more accurately with a human instructor than it would with unlabeled data.
Supervised learning assumes that the anticipated answer to the given problem is unknown for new data, although it is already identified in the training dataset. In other words, data labeling is done to provide the right responses, and the challenge for the algorithm is to find them in the new data. Supervised learning is divided into two types:
Classification. Such an algorithm predicts object categories or separates data according to a certain feature. It replies to a yes/no question, or chooses one of two options: for example, it decides whether tomatoes are good or bad, or whether cows or goats are shown in a photo. It is also possible to perform multi-class classification, where the task is to categorize the data into more than two classes. This is how computers filter email spam, divide articles by topic and language in search engines, or match users with music similar to their preferences.
Regression is essentially like classification, except that it works with numeric values. The machine finds the dependence of some numerical data on other data and is then able to make predictions: the software can predict the trend of a company's stock from demand for goods, the price of a car depending on its mileage (as sketched below), and so on.
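A minimal regression sketch along the lines of the car price example; the mileage and price figures are invented:

```python
from sklearn.linear_model import LinearRegression

# Regression: predict a number (price) from a numeric feature (mileage)
mileage_km = [[10_000], [50_000], [90_000], [130_000]]
price_usd = [28_000, 22_000, 16_000, 10_000]

reg = LinearRegression().fit(mileage_km, price_usd)
print(reg.predict([[70_000]]))   # a continuous value, here 19000.0
```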
Ensemble learning
In ensemble learning, multiple models are trained to resolve a single problem and merged to obtain the best results. In ensembles, the concept of a weak learner is introduced, meaning a conventional model like linear regression or a decision tree. A set of weak learners serves as the structural units of a more complex model, and a combination of weak learners assembled to improve performance is called a strong learner.
Depending on the method, the algorithms are trained simultaneously or one after another and may correct each other's mistakes. The underlying assumption is that the results of multiple models will be more accurate than those of just one model. For instance, ensembles are employed to identify people's faces and objects in a smartphone camera. The most popular ensemble methods are:
Boosting. This method involves training algorithms consecutively: several similar models are trained in sequence, each fixing the errors of the previous one. After processing one set of data, the machine is given the next set, in which the examples the previous model got wrong carry more weight, and the algorithm tries again to find a solution.
Bagging. The basic idea of bagging is to train several identical models on different data samples. Homogeneous models are trained on different subsets of the data and then combined, and the final prediction is obtained by averaging the predictions of the individual models. A sketch of both bagging and boosting follows this list.
Stacking. This method may use algorithms of various types, not just from a certain family. Various weak learners are trained in parallel, and a meta-model receives their outputs as input and produces the final prediction. The goal is to obtain more accurate future predictions by combining the weak learners with a meta-learner, but in practice the accuracy is often still low and the approach is rarely used.
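For illustration, scikit-learn ships ready-made implementations of bagging and boosting; this sketch trains both on the bundled Iris dataset:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import BaggingClassifier, GradientBoostingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Bagging: identical weak learners trained on different samples of the data
bagging = BaggingClassifier(DecisionTreeClassifier(), n_estimators=20, random_state=0)

# Boosting: weak learners trained sequentially, each correcting earlier errors
boosting = GradientBoostingClassifier(n_estimators=20, random_state=0)

for model in (bagging, boosting):
    print(type(model).__name__, model.fit(X, y).score(X, y))
```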
Reinforcement learning
Reinforcement learning is a method of machine learning where the system learns by interacting with an environment: in effect, ML done by trial and error. There is an agent that interacts with the environment by taking actions, and the environment provides the agent with a reward for those actions, encouraging it to keep performing well. The reward for a successfully completed task is both the points gained while solving it and the opportunity to take on a new task.
The more efficiently the objective is accomplished, the more points are awarded. Initially, situations are designed in a virtual environment, and after that the artificial intelligence continues to explore and learn in the real world. This approach is employed to train artificial intelligence in the gaming industry, in robot vacuum cleaners, and in self-driving vehicles.
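To ground the trial-and-error idea, here is a self-contained toy sketch of Q-learning, one classic reinforcement learning algorithm; the environment (a five-cell corridor with a reward at the end) and all constants are made up:

```python
import random

# States 0..4 form a corridor; the agent earns a reward for reaching cell 4.
# Actions: 0 = step left, 1 = step right.
n_states, n_actions = 5, 2
q_table = [[0.0] * n_actions for _ in range(n_states)]
alpha, gamma, epsilon = 0.5, 0.9, 0.2   # learning rate, discount, exploration

for episode in range(200):
    state = 0
    while state != n_states - 1:
        # Explore occasionally; otherwise exploit the best known action
        if random.random() < epsilon:
            action = random.randrange(n_actions)
        else:
            action = q_table[state].index(max(q_table[state]))
        next_state = max(0, min(n_states - 1, state + (1 if action else -1)))
        reward = 1.0 if next_state == n_states - 1 else 0.0
        # Update the value estimate from the observed reward (trial and error)
        q_table[state][action] += alpha * (
            reward + gamma * max(q_table[next_state]) - q_table[state][action]
        )
        state = next_state

print([row.index(max(row)) for row in q_table])   # best action per state, mostly 1
```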
Neural networks
Neural networks are mathematical models, together with their software implementations, inspired by the structure of the human nervous system. The central distinctive characteristic of such a network is its capacity to learn. These networks are composed of several layers:
The input layer receives the data set.
Hidden layers perform calculations based on the input parameters.
The output layer produces the result of the computations.
Each layer has several nodes (neurons), which are interconnected with nodes in other layers through links that each carry their own "weight" affecting the strength of the transmitted signal. This design enables simultaneous processing of data and allows for constant comparative analysis of the processing results at each stage.
In recent years, the higher performance of computers has allowed these networks to take on more and more complex and interesting tasks. The processing capacity of the system is essential, since each neuron constantly performs computationally demanding calculations; a complicated task usually requires a huge number of neurons and a great deal of mathematical computation, which implies the need for a very powerful machine.
To put it simply, the connections between neurons work like this: one neuron sends the result of its calculations to another neuron, which receives and processes this information and then passes the result of its own calculations on to the next neuron. In this way, information spreads throughout the network, and the learning process occurs.
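A minimal forward-pass sketch in plain NumPy illustrates this flow of signals; the weights here are random placeholders rather than trained values:

```python
import numpy as np

def relu(x):
    return np.maximum(0, x)   # a common activation applied inside each neuron

x = np.array([0.5, -1.2, 3.0])     # input layer: one data sample with 3 values

W1 = np.random.randn(4, 3) * 0.1   # hidden layer: 4 neurons, each weighing 3 inputs
b1 = np.zeros(4)
hidden = relu(W1 @ x + b1)         # neurons process the signal and pass it on

W2 = np.random.randn(1, 4) * 0.1   # output layer: a single neuron
b2 = np.zeros(1)
print(W2 @ hidden + b2)            # the signal has propagated through the network
```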
When specialists train a neural network, they introduce it to data that should be used to predict something, along with a set of correct responses for that data. As mentioned for other algorithms, these correct responses form the training dataset. Of course, all of the above is a huge simplification: actual neural networks are more intricate than that, and a lot of information is required to train an AI properly.
Deep learning
As recently as 10 years ago, the evolution of neural networks was held back by the lack of processing power in computers. Once this obstacle was gone, deep learning took off, bringing data processing to a completely new level.
Deep learning describes a type of ML that employs multi-layered networks that acquire knowledge by learning on massive datasets. Deep learning artificial intelligence finds the algorithm for solving the initial task on its own, learning from its mistakes and giving a more accurate result after each training session.
The networks are split into layers: neuron structures with a shared objective. Deep learning employs a larger number of hidden layers, and such models are called deep learning networks. Each set of calculations is considered a layer, so complex deep learning networks have numerous layers, hence the name "deep" networks. The more complicated a neural network is, that is, the more layers and neurons it possesses and the more computational operations it performs, the better the results it tends to produce.
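As a rough sketch, stacking several hidden layers is a one-line configuration in a library like scikit-learn; the layer sizes below are arbitrary:

```python
from sklearn.datasets import load_digits
from sklearn.neural_network import MLPClassifier

X, y = load_digits(return_X_y=True)

# Three stacked hidden layers make this a (small) "deep" network
deep_net = MLPClassifier(hidden_layer_sizes=(64, 32, 16), max_iter=500, random_state=0)
deep_net.fit(X, y)
print(deep_net.score(X, y))   # accuracy on the training data, for demonstration only
```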
Conclusion
Machine learning is an intriguing field of knowledge that requires thorough and time-consuming exploration. The current trend suggests that ML algorithms will be applied even more extensively in the near future, and knowledge of the topic will simply be a necessity. This field of AI is already firmly embedded in our daily lives and greatly simplifies them, both day to day and in the workplace. Spend the effort and time to study this topic, and all the incredible possibilities of machine learning will open their doors to you.
About Toloka
Toloka is a European company based in Amsterdam, the Netherlands, that provides data for Generative AI development. Toloka empowers businesses to build high quality, safe, and responsible AI. We are the trusted data partner for all stages of AI development, from training to evaluation. Toloka has over a decade of experience supporting clients with its unique methodology and optimal combination of machine learning technology and human expertise, offering the highest quality and scalability in the market.
Article written by:
Natalie Kudan
Updated:
Mar 1, 2023