8. Loss Functions

Supervised Learning and Loss Functions

The goal in a supervised learning task is to produce a model that we can use to predict the value of a label (or response) \(Y\) given values for a set of features (or predictors), \(X^{(1)}, X^{(2)}, \ldots, X^{(m)}\). This model is typically represented by a predict() function. We denote the predicted label by \(\hat{Y}\).
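For concreteness, here is a minimal sketch of what such a predict() function might look like for a linear model. This is only an illustration, assuming NumPy; the parameter names weights and bias are hypothetical, not taken from any particular library.

```python
import numpy as np

def predict(X, weights, bias):
    """Return predicted labels Y_hat for a feature matrix X.

    X is an (n, m) array: n observations, each with m features.
    weights (length m) and bias are hypothetical parameters that
    a learning algorithm would choose for us.
    """
    return X @ weights + bias

# Example: two observations with m = 3 features each.
X = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])
weights = np.array([0.5, -1.0, 2.0])
bias = 0.1
Y_hat = predict(X, weights, bias)  # array([4.6, 9.1])
```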

When we select a learning algorithm, we are effectively selecting a class of models to consider for our particular task. This set of allowed models is called our hypothesis space. The learning algorithm searches the hypothesis space for the model that performs best on the training data. To determine which model in the hypothesis space is “best”, we must give the learning algorithm a method of scoring the different models it considers. Such a scoring method is called an objective function. An objective function takes a model and a dataset as inputs and produces a numerical score as its output. If an objective function is defined so that lower scores are better, we call it a loss function. The goal of a learning algorithm is then to find the model that minimizes the loss on the training data.
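The sketch below makes this idea concrete. It uses mean squared error as one possible loss (an illustrative choice on our part, not the only option) and scores two candidate models from a tiny hypothesis space of constant predictors; the model with the lower score wins.

```python
import numpy as np

def mse_loss(model, X, Y):
    """Score a candidate model on a dataset: lower is better.

    `model` is any callable mapping features X to predictions Y_hat.
    """
    Y_hat = model(X)
    return np.mean((Y - Y_hat) ** 2)

X = np.array([[1.0], [2.0], [3.0]])
Y = np.array([2.0, 4.0, 6.0])

# A (tiny) hypothesis space: two constant models.
candidates = [lambda X: np.full(len(X), 3.0),
              lambda X: np.full(len(X), 4.0)]

losses = [mse_loss(m, X, Y) for m in candidates]  # [3.67, 2.67]
best = candidates[int(np.argmin(losses))]          # the constant-4.0 model
```

A real learning algorithm searches a far larger hypothesis space than two constants, but the scoring step works the same way: evaluate the loss on the training data and prefer the model with the smaller value.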

In the next two lessons, we will discuss the linear regression algorithm for performing regression tasks and the logistic regression algorithm for performing classification tasks. As we will see, linear regression works by minimizing the sum of squared errors loss on its training set, while logistic regression minimizes the negative log-likelihood loss.
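As a preview, here is a hedged sketch of how each of those two losses could be computed for a batch of predictions. The function names are illustrative, not taken from the upcoming lessons.

```python
import numpy as np

def sse_loss(Y, Y_hat):
    """Sum of squared errors, the loss minimized by linear regression."""
    return np.sum((Y - Y_hat) ** 2)

def nll_loss(Y, P_hat, eps=1e-12):
    """Negative log-likelihood for binary classification, the loss
    minimized by logistic regression. Y holds 0/1 labels and P_hat
    holds predicted probabilities that Y = 1."""
    P_hat = np.clip(P_hat, eps, 1 - eps)  # avoid log(0)
    return -np.sum(Y * np.log(P_hat) + (1 - Y) * np.log(1 - P_hat))
```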