DM825 - Introduction to Machine Learning, Sheet 4, Spring 2011

Exercise 1 Generalized Linear Models and Neural Networks.

1. Show that the multinomial distribution is a member of the exponential family, and determine the canonical response function (called softmax or normalized exponential) and its inverse, the link function. Take into account that the parameters $\theta_k$ of the multinomial distribution are not independent, because $\sum_{k=1}^{K} \theta_k = 1$.
2. Fisher's `iris` data set includes the measurements in centimeters of the variables petal length and petal width for 50 flowers from each of 3 species of iris: “Iris setosa”, “versicolor” and “virginica”. [sheet4_1b.R]

Use a multiple logistic (i.e., multinomial) model to predict the species on the test data, fitting the parameters within the generalized linear model framework. Because a multinomial response is multivariate, we cannot use the `glm` function in R. An alternative is the function `multinom` from the package `nnet`.

3. Neural networks provide a flexible non-linear extension of multinomial regression. In R the function `nnet` from the package `nnet` fits single-hidden-layer neural networks, possibly with skip-layer connections (i.e., links from the input nodes directly to the output nodes). Check the examples given for this function. Compare its results with those of the GLM from the previous point and comment.
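The comparison asked for in points 2 and 3 could be set up roughly as follows. This is only a sketch under stated assumptions: the random 100/50 train/test split, the seed, and the `nnet` settings `size = 2`, `decay = 5e-4` and `maxit = 200` are illustrative choices, not taken from the sheet or from sheet4_1b.R.

```r
library(nnet)  # provides multinom() and nnet()

set.seed(1)  # arbitrary seed, only to make this sketch reproducible

# Assumption: use only the petal measurements, as in the exercise text
dat <- iris[, c("Petal.Length", "Petal.Width", "Species")]

# Assumption: random 100/50 train/test split (the sheet's own split may differ)
train_idx <- sample(nrow(dat), 100)
train <- dat[train_idx, ]
test  <- dat[-train_idx, ]

# Multinomial logistic regression
fit_glm  <- multinom(Species ~ Petal.Length + Petal.Width,
                     data = train, trace = FALSE)
pred_glm <- as.character(predict(fit_glm, newdata = test, type = "class"))

# Single-hidden-layer network; size, decay and maxit are illustrative
fit_nn  <- nnet(Species ~ Petal.Length + Petal.Width, data = train,
                size = 2, decay = 5e-4, maxit = 200, trace = FALSE)
pred_nn <- predict(fit_nn, newdata = test, type = "class")

# Confusion matrices and test accuracy of both models
truth <- as.character(test$Species)
print(table(truth = truth, multinom = pred_glm))
print(table(truth = truth, nnet = pred_nn))
acc_glm <- mean(pred_glm == truth)
acc_nn  <- mean(pred_nn == truth)
cat("test accuracy, multinom:", acc_glm, " nnet:", acc_nn, "\n")
```

Because `nnet` starts from random weights, its result can change with the seed; repeating the fit with a few seeds gives a fairer basis for the comment the exercise asks for.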

Exercise 2 Perceptron. This exercise asks you to implement the perceptron algorithm and plot its result. As data set we use a simplified binary classification case derived from the `iris` data. [sheet4_2.R]
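A minimal sketch of one possible implementation is given below. It makes assumptions that are not in the sheet: the binary task is taken to be setosa (+1) against versicolor (-1) on the two petal measurements, and `perceptron` with its parameters `eta` and `max_epochs` is a hypothetical helper; sheet4_2.R may define the data and the interface differently.

```r
# Assumption: binary task setosa (+1) vs. versicolor (-1) on petal measurements
dat <- iris[iris$Species != "virginica", ]
X <- cbind(1, as.matrix(dat[, c("Petal.Length", "Petal.Width")]))  # column of 1s = bias
y <- ifelse(dat$Species == "setosa", 1, -1)

# Hypothetical helper: classic perceptron updates, stopping after an error-free pass
perceptron <- function(X, y, eta = 1, max_epochs = 1000) {
  w <- rep(0, ncol(X))
  for (epoch in seq_len(max_epochs)) {
    mistakes <- 0
    for (i in seq_len(nrow(X))) {
      if (y[i] * sum(w * X[i, ]) <= 0) {  # misclassified or on the boundary
        w <- w + eta * y[i] * X[i, ]
        mistakes <- mistakes + 1
      }
    }
    if (mistakes == 0) break  # all points correctly classified
  }
  w
}

w <- perceptron(X, y)
pred <- ifelse(X %*% w > 0, 1, -1)
train_error <- mean(pred != y)
cat("weights:", w, " training error:", train_error, "\n")

# Plot the data and the decision boundary w[1] + w[2]*x1 + w[3]*x2 = 0
plot(dat$Petal.Length, dat$Petal.Width, pch = 19,
     col = ifelse(y == 1, "blue", "red"),
     xlab = "Petal length (cm)", ylab = "Petal width (cm)")
abline(a = -w[1] / w[3], b = -w[2] / w[3])
```

The loop terminates early here because these two classes are linearly separable in the petal measurements; on non-separable data the perceptron would simply run until `max_epochs`.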

Exercise 3 Neural Networks. In the derivation of the backward propagation procedure we used the fact that the partial derivative of the error with respect to the activations of the output units is given by

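$$\frac{\partial\,\mathrm{Err}}{\partial a_k} = f_k(x, \theta) - y_k \qquad (1)$$

where $a_k$ is the activation of output unit $k$, $f_k(x, \theta)$ the corresponding network output, and $y_k$ the target.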

For a single output:

1. Show that (1) holds true for the squared-error function Err.
2. Using a probabilistic interpretation of the network output, show that (1) holds true for any of the following conditional distributions and output activation functions: $y \mid x, \theta \sim \mathcal{N}(f(x, \theta), \beta^{-1})$ and identity (for regression); $y \mid x, \theta \sim \mathrm{Bern}(f(x, \theta))$ and logistic sigmoid (for binary classification); $y_k \mid x, \theta \sim \mathrm{Bern}(f_k(x, \theta))$ and softmax function (for multinomial classification).
3. Show that (1) is a general result of assuming a conditional distribution for the target variable from the exponential family, along with a corresponding choice for the activation function (or for the canonical link function).
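Before attempting the derivations, the claim can be checked numerically. The sketch below is a sanity check, not a proof: it compares a central finite-difference gradient of the error with respect to the output activations against "output minus target" for the three pairings listed above. The function names, the step size `h`, and the test values are illustrative assumptions; the Gaussian case is taken with unit precision, so the error is half the squared difference.

```r
# Identity output with squared error (Gaussian case, unit precision assumed)
err_sq <- function(a, y) 0.5 * (a - y)^2

# Logistic sigmoid output with Bernoulli negative log-likelihood
sigmoid  <- function(a) 1 / (1 + exp(-a))
err_bern <- function(a, y) -(y * log(sigmoid(a)) + (1 - y) * log(1 - sigmoid(a)))

# Softmax output with multinomial negative log-likelihood (y is one-hot)
softmax   <- function(a) { e <- exp(a - max(a)); e / sum(e) }
err_multi <- function(a, y) -sum(y * log(softmax(a)))

# Central finite-difference gradient of f at a
num_grad <- function(f, a, h = 1e-6) {
  sapply(seq_along(a), function(k) {
    e <- replace(numeric(length(a)), k, h)  # perturb only component k
    (f(a + e) - f(a - e)) / (2 * h)
  })
}

a1 <- 0.7; t_reg <- 2.5; t_bin <- 1         # arbitrary activation and targets
g_sq   <- num_grad(function(a) err_sq(a, t_reg), a1)
g_bern <- num_grad(function(a) err_bern(a, t_bin), a1)
print(c(numeric = g_sq,   analytic = a1 - t_reg))
print(c(numeric = g_bern, analytic = sigmoid(a1) - t_bin))

a3 <- c(0.2, -1.0, 0.5); t3 <- c(0, 0, 1)   # arbitrary activations, one-hot target
g_multi <- num_grad(function(a) err_multi(a, t3), a3)
print(rbind(numeric = g_multi, analytic = softmax(a3) - t3))
```

In each case the numeric and analytic rows should agree up to finite-difference error, which is what (1) predicts.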