DM825 - Introduction to Machine Learning
Sheet 4, Spring 2011 [pdf format]

Exercise 1 Generalized Linear Models and Neural Networks.

(a) Show that the multinomial distribution is a member of the exponential family, and determine its canonical response function, called the softmax or normalized exponential, and the inverse of that function, the link function. Take into account that the parameters $\theta_k$ of the multinomial distribution are not independent, because $\sum_{k=1}^{K} \theta_k = 1$.
(b) Fisher's iris data set gives the measurements in centimeters of the variables petal length and petal width for 50 flowers from each of 3 species of iris. The species are "Iris setosa", "versicolor" and "virginica". [sheet4_1b.R]
(c) Use a multiple logistic (i.e., multinomial) model to predict the species on the test data, fitting the parameters within the generalized linear model framework. Because the multinomial response is multivariate, the glm function in R cannot be used. An alternative is the function multinom from the package nnet.
(d) Neural networks provide a flexible non-linear extension of multinomial regression. In R, the function nnet from the package nnet fits single-hidden-layer neural networks, possibly with skip-layer connections (i.e., links from the input nodes directly to the output nodes). Check the examples in the documentation of this function. Compare its results with those of the GLM from the previous part and comment.
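
For part (a), the response and link functions can be checked numerically once they have been derived. The following is a minimal Python sketch (the sheet itself works in R), assuming the last class K is taken as the reference, so that the natural parameters are eta_k = log(theta_k / theta_K) and eta_K = 0. The function names `softmax` and `log_ratio_link` are illustrative and do not come from the course material.

```python
import math

def softmax(eta):
    """Map natural parameters eta_1..eta_K to probabilities theta_1..theta_K."""
    m = max(eta)  # subtract the maximum for numerical stability
    exps = [math.exp(e - m) for e in eta]
    total = sum(exps)
    return [e / total for e in exps]

def log_ratio_link(theta):
    """Inverse of softmax under the assumed convention eta_K = 0:
    eta_k = log(theta_k / theta_K)."""
    ref = theta[-1]
    return [math.log(t / ref) for t in theta]

theta = [0.2, 0.5, 0.3]          # arbitrary probabilities summing to 1
eta = log_ratio_link(theta)      # last entry is 0 by construction
recovered = softmax(eta)         # should reproduce theta up to rounding
```

Fixing one natural parameter at zero is one way to account for the sum-to-one constraint: only K - 1 of the parameters are free, and the softmax of the resulting vector returns the original probabilities.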

Exercise 2 Perceptron. This exercise asks you to implement the perceptron algorithm and plot its result. As data set we use a simplified binary classification problem derived from the iris data. [sheet4_2.R]
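
As a starting point, the perceptron update rule can be sketched as follows. This is a minimal Python sketch rather than the R solution expected by the sheet. It assumes labels coded as +1 and -1, a fixed learning rate, and a small made-up data set that is linearly separable; the points are not the iris measurements, and the names `perceptron` and `predict` are illustrative. Plotting is left out.

```python
def perceptron(points, labels, epochs=1000, rate=1.0):
    """Train a perceptron; labels must be +1 or -1.
    Returns (weights, bias)."""
    w = [0.0] * len(points[0])
    b = 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, y in zip(points, labels):
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * score <= 0:          # misclassified (or on the boundary)
                w = [wi + rate * y * xi for wi, xi in zip(w, x)]
                b += rate * y
                mistakes += 1
        if mistakes == 0:               # a full pass without errors: stop
            break
    return w, b

def predict(w, b, x):
    """Classify x by the sign of the linear score."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1

# Made-up, linearly separable toy points (not the iris measurements).
points = [(1.0, 0.5), (1.5, 0.3), (1.2, 0.2), (4.5, 1.5), (5.0, 1.8), (4.0, 1.3)]
labels = [-1, -1, -1, 1, 1, 1]
w, b = perceptron(points, labels)
predictions = [predict(w, b, x) for x in points]
```

On separable data the loop stops after finitely many updates. On data that is not separable it runs until `epochs` is exhausted, so the cap matters in practice.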

Exercise 3 Neural Networks. In the derivation of the backward propagation procedure we used the fact that the partial derivative of the error with respect to the activation $a_k$ of an output unit is given by

$$\frac{\partial\,\mathrm{Err}}{\partial a_k} = \hat{y}_k - y_k \qquad (1)$$

For a single output:

(a) Show that this holds for the squared-error function Err.
(b) Using a probabilistic interpretation of the network output, show that (1) holds for each of the following pairs of conditional distribution and output activation function: $y \mid \vec{x}, \vec{\theta} \sim \mathcal{N}(f(\vec{x}, \vec{\theta}), \beta^{-1})$ with the identity (regression); $y \mid \vec{x}, \vec{\theta} \sim \mathrm{Bern}(f(\vec{x}, \vec{\theta}))$ with the logistic sigmoid (binary classification); $y_k \mid \vec{x}, \vec{\theta} \sim \mathrm{Bern}(f_k(\vec{x}, \vec{\theta}))$ with the softmax function (multinomial classification).
(c) Show that (1) is a general result of assuming a conditional distribution for the target variable from the exponential family, together with the corresponding choice of output activation function (equivalently, of the canonical link function).
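
Before attempting the proofs, identity (1) can be checked numerically for a single output. The sketch below is a minimal Python illustration, not part of the sheet. It assumes the squared error is scaled as Err = (1/2)(ŷ − y)² and that the classification error is the negative log-likelihood of the Bernoulli model. The helper names and the test values of a and y are arbitrary.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def squared_error(a, y):
    """Err = 1/2 (y_hat - y)^2 with identity output, y_hat = a (assumed scaling)."""
    return 0.5 * (a - y) ** 2

def bernoulli_nll(a, y):
    """Negative log-likelihood of y in {0, 1} with y_hat = sigmoid(a)."""
    y_hat = sigmoid(a)
    return -(y * math.log(y_hat) + (1 - y) * math.log(1 - y_hat))

def numeric_derivative(err, a, y, h=1e-6):
    """Central finite-difference approximation of d Err / d a."""
    return (err(a + h, y) - err(a - h, y)) / (2 * h)

a, y_reg, y_cls = 0.7, 1.5, 1   # arbitrary activation and targets
# Difference between the numerical derivative and y_hat - y in each case.
gap_identity = numeric_derivative(squared_error, a, y_reg) - (a - y_reg)
gap_sigmoid = numeric_derivative(bernoulli_nll, a, y_cls) - (sigmoid(a) - y_cls)
```

Both gaps should be close to zero, up to finite-difference error. A check of this kind supports the identity at one point only and does not replace the derivation asked for in the exercise.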
