Linear Methods: First revision

This is the first revision of the page. The external styles and javascript are gone (except MathJax), making it much faster to load (about 6 seconds).

Return to Speed up MathJax slow loading time

Introduction

Since our predictor $G(x)$ takes values in a discrete set $\mathcal{G}$, we can always divide the input space into a collection of regions labeled according to the classification. We saw in Chapter 2 that the boundaries of these regions can be rough or smooth, depending on the prediction function. For an important class of procedures, these decision boundaries are linear; this is what we mean by linear methods for classification.

To build the connection between features $X$ and discrete outcomes, we can rely on the logit transformation as follows:

\begin{align}
\text{Pr}(G=1|X=x) &= \frac{\exp(\beta_0+\beta^Tx)}{1+\exp(\beta_0+\beta^Tx)}, \\
\text{Pr}(G=2|X=x) &= \frac{1}{1+\exp(\beta_0+\beta^Tx)},
\end{align}

where the monotone transformation is the logit transformation

$$ \log\frac{p}{1-p}, $$

and in fact we see that

\begin{equation} \log\frac{\text{Pr}(G=1|X=x)}{\text{Pr}(G=2|X=x)} = \beta_0 + \beta^Tx. \end{equation}

The decision boundary is the set of points for which the log-odds are zero, and this is a hyperplane defined by

$$ \left\lbrace x: \beta_0+\beta^Tx = 0 \right\rbrace. $$
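To see these formulas in action, here is a minimal NumPy sketch; the coefficient values $\beta_0$ and $\beta$ are made up for illustration, not fitted to any data. It computes both class probabilities, verifies that the log-odds reduce to the linear predictor, and classifies by which side of the hyperplane $x$ falls on.

```python
import numpy as np

# Hypothetical coefficients, chosen only for illustration.
beta0 = -1.0                        # intercept beta_0
beta = np.array([2.0, -0.5])        # coefficient vector beta

def pr_class1(x):
    """Pr(G=1 | X=x) under the two-class logit model."""
    eta = beta0 + beta @ x          # linear predictor beta_0 + beta^T x
    return np.exp(eta) / (1.0 + np.exp(eta))

x = np.array([0.8, 0.4])
p1 = pr_class1(x)
p2 = 1.0 - p1                       # Pr(G=2 | X=x)

# The log-odds recover the linear predictor exactly:
print(np.log(p1 / p2))              # equals beta0 + beta @ x
print(beta0 + beta @ x)

# Decision rule: classify to class 1 where the log-odds are positive,
# i.e. on one side of the hyperplane {x : beta_0 + beta^T x = 0}.
label = 1 if beta0 + beta @ x > 0 else 2
```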

We will discuss two very popular but different methods that result in linear log-odds or logits: Linear discriminant analysis and linear logistic regression.
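As a quick illustration of the two methods, the sketch below uses scikit-learn (one possible implementation, assumed here for convenience) to fit both on synthetic data. Each produces linear log-odds, hence a hyperplane decision boundary, but the fitted coefficients differ because the two methods estimate them differently.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression

# Synthetic two-class data, for illustration only.
X, y = make_blobs(n_samples=200, centers=2, n_features=2, random_state=0)

lda = LinearDiscriminantAnalysis().fit(X, y)
logreg = LogisticRegression().fit(X, y)

# Both decision boundaries are hyperplanes {x : b0 + b^T x = 0},
# but the estimated (b0, b) generally differ between the methods.
print("LDA:    ", lda.intercept_, lda.coef_)
print("LogReg: ", logreg.intercept_, logreg.coef_)
```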

One can also classify discrete outcomes without using log-likelihood functions; instead, we explicitly model the boundaries between the classes as linear.

We will look at two methods that explicitly look for "separating hyperplanes".

  1. The well-known perceptron model of Rosenblatt (1958), with an algorithm that finds a separating hyperplane in the training data, if one exists.
  2. The optimal separating hyperplane of Vapnik (1996), which finds an optimally separating hyperplane if one exists, and otherwise finds a hyperplane that minimizes some measure of overlap in the training data.

When a separating hyperplane can be found, we say the classes are linearly separable; when they cannot be separated by a hyperplane, we need more flexible methods, such as neural networks, to classify them. A minimal sketch of the first approach follows.
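Here is a small NumPy implementation of Rosenblatt's perceptron update; the toy dataset and learning rate are illustrative assumptions. On linearly separable data the loop terminates with zero training errors, while on non-separable data it may cycle forever, which is one motivation for Vapnik's optimal separating hyperplane (the basis of the linear support vector machine).

```python
import numpy as np

def perceptron(X, y, max_epochs=100, lr=1.0):
    """Rosenblatt's perceptron: find a separating hyperplane, if one exists.

    X : (n, p) feature matrix; y : labels in {-1, +1}.
    Returns (beta0, beta) with sign(beta0 + X @ beta) == y on separable
    data; may fail to converge otherwise.
    """
    n, p = X.shape
    beta0, beta = 0.0, np.zeros(p)
    for _ in range(max_epochs):
        errors = 0
        for xi, yi in zip(X, y):
            if yi * (beta0 + xi @ beta) <= 0:   # misclassified point
                beta0 += lr * yi                # update nudges the
                beta += lr * yi * xi            # hyperplane toward xi
                errors += 1
        if errors == 0:                         # all points separated
            break
    return beta0, beta

# Toy separable data, for illustration only.
X = np.array([[2.0, 1.0], [1.5, 2.0], [-1.0, -1.5], [-2.0, -0.5]])
y = np.array([1, 1, -1, -1])
beta0, beta = perceptron(X, y)
print(np.sign(beta0 + X @ beta))   # matches y when the data are separable
```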

[Figure 4.1]


Last update: November 6, 2022