
Supervised learning
Supervised learning is the approach most data scientists use. In supervised learning, you have some explanatory features, called input variables (X), and the labels associated with the training samples, called output variables (Y). The objective of any supervised learning algorithm is to learn the mapping function f from the input variables (X) to the output variables (Y):

$$Y = f(X)$$

The supervised learning algorithm therefore tries to approximate this mapping from the input variables (X) to the output variables (Y), so that it can later be used to predict the Y values of unseen samples.
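As a minimal sketch of learning an approximate mapping from X to Y, the following Python code assumes the simplest possible model: a straight line f(x) = a * x + b fitted by ordinary least squares. The training data and the function name `fit_linear` are hypothetical and chosen only for illustration. Practical supervised learning algorithms use much richer families of functions, but the pattern is the same: fit f on labeled pairs, then apply f to an unseen X.

```python
from typing import Callable, List


def fit_linear(xs: List[float], ys: List[float]) -> Callable[[float], float]:
    """Learn f(x) = a * x + b from labeled pairs by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(X, Y) / variance(X); intercept passes through the means
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    a = cov / var
    b = mean_y - a * mean_x
    return lambda x: a * x + b


# Hypothetical training samples: inputs X with known labels Y (here Y = 2X + 1)
X_train = [1.0, 2.0, 3.0, 4.0]
Y_train = [3.0, 5.0, 7.0, 9.0]

f = fit_linear(X_train, Y_train)
print(f(10.0))  # prediction for an unseen sample → 21.0
```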
Figure 1.13 shows a typical workflow for a supervised data science system.

This kind of learning is called supervised learning because each training sample comes with its associated label (output). The learning process can be thought of as being guided by a supervisor: the algorithm makes decisions on the training samples and is corrected by the supervisor, based on the correct labels of the data. The learning process stops when the algorithm achieves an acceptable level of accuracy.
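The correct-and-stop loop described above can be sketched with a perceptron, used here purely as an illustrative stand-in for a supervised learning algorithm. The toy dataset, the `target_acc` threshold, and the helper names are assumptions. Each misclassified sample triggers a weight update toward its true label (the "supervisor"), and training stops once accuracy on the labeled training data reaches the threshold, or after `max_epochs` passes.

```python
from typing import List, Tuple

# A labeled sample: (feature vector, label in {-1, +1})
Sample = Tuple[List[float], int]


def predict(w: List[float], b: float, x: List[float]) -> int:
    """Return +1 if the weighted sum is positive, otherwise -1."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score > 0 else -1


def accuracy(w: List[float], b: float, data: List[Sample]) -> float:
    """Fraction of samples whose prediction matches the true label."""
    return sum(predict(w, b, x) == y for x, y in data) / len(data)


def train_perceptron(data: List[Sample], target_acc: float = 1.0,
                     lr: float = 1.0, max_epochs: int = 100) -> Tuple[List[float], float]:
    w = [0.0] * len(data[0][0])
    b = 0.0
    for _ in range(max_epochs):
        # Stop as soon as the training accuracy is acceptable
        if accuracy(w, b, data) >= target_acc:
            break
        for x, y in data:
            if predict(w, b, x) != y:
                # The true label y "corrects" the algorithm: nudge the
                # weights so the score moves toward the correct sign
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b


# Hypothetical toy data: label +1 only when both features are 1 (logical AND)
data: List[Sample] = [([0.0, 0.0], -1), ([0.0, 1.0], -1),
                      ([1.0, 0.0], -1), ([1.0, 1.0], 1)]
w, b = train_perceptron(data)
print(accuracy(w, b, data))  # → 1.0
```

Because this toy data is linearly separable, the perceptron is guaranteed to stop making mistakes after a finite number of updates; on non-separable data, the `max_epochs` cap is what ends training.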
Supervised learning tasks come in two forms, classification and regression:
- Classification: A classification task is one where the label, or output variable, is a category, such as tuna or opah (two fish species), or spam and non-spam.
- Regression: A regression task is one where the output variable is a real value, such as a house price or a person's height.
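To contrast the two task types, the sketch below uses k-nearest neighbours for both. This is an assumed, illustrative choice of algorithm, and the email features and house data are made up. Classification takes a majority vote over the neighbours' category labels, while regression averages their real-valued outputs. The only difference between the two is the type of output being predicted.

```python
from collections import Counter
from typing import List, Sequence, Tuple, TypeVar

Label = TypeVar("Label")


def nearest(train: List[Tuple[Sequence[float], Label]], x: Sequence[float],
            k: int) -> List[Tuple[Sequence[float], Label]]:
    """Return the k training samples closest to x (squared Euclidean distance)."""
    return sorted(train, key=lambda s: sum((p - q) ** 2 for p, q in zip(s[0], x)))[:k]


def knn_classify(train: List[Tuple[Sequence[float], str]], x: Sequence[float],
                 k: int = 3) -> str:
    """Classification: the output is a category (majority vote of neighbours)."""
    labels = [label for _, label in nearest(train, x, k)]
    return Counter(labels).most_common(1)[0][0]


def knn_regress(train: List[Tuple[Sequence[float], float]], x: Sequence[float],
                k: int = 3) -> float:
    """Regression: the output is a real value (mean of neighbours' values)."""
    values = [value for _, value in nearest(train, x, k)]
    return sum(values) / len(values)


# Hypothetical emails: (number of links, number of "!" characters) -> category
emails = [((8.0, 5.0), "spam"), ((7.0, 6.0), "spam"), ((9.0, 4.0), "spam"),
          ((0.0, 1.0), "non-spam"), ((1.0, 0.0), "non-spam"), ((0.0, 0.0), "non-spam")]
print(knn_classify(emails, (8.0, 4.0)))  # → spam

# Hypothetical houses: (floor area in m^2,) -> price in thousands
houses = [((50.0,), 150.0), ((60.0,), 180.0), ((70.0,), 210.0), ((120.0,), 360.0)]
print(knn_regress(houses, (65.0,), k=2))  # → 195.0
```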