DM825 - Introduction to Machine Learning
Sheet 3, Spring 2011

Exercise 1 Bayesian prediction.

  (a) Let θ ∼ Dir(α). Consider multinomial random variables (X1, X2, …, XN), where Xn ∼ Mult(θ) for each n, and where the Xn are assumed conditionally independent given θ. Now consider a random variable Xnew ∼ Mult(θ) that is assumed conditionally independent of (X1, X2, …, XN) given θ. Compute the predictive distribution
      p(xnew | x1, x2, …, xN, α)
      by integrating over θ.
  (b) Redo the problem in part (a), replacing the multinomial distribution with an arbitrary exponential family distribution and the Dirichlet distribution with the corresponding exponential family conjugate distribution. Show that, in general, the predictive probability p(xnew | x1, x2, …, xN) is a ratio of normalizers.
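
For reference, the densities involved in Exercise 1 can be written out as below. The sheet does not fix a notation, so everything here is an assumption: K categories, each xn one-hot encoded, and for the exponential family part a natural parameter η, sufficient statistic T(x), log-normalizer A(η), and conjugate hyperparameters (τ, ν) with normalizer Z(τ, ν). This only sets up the integral; it is not a worked solution.

```latex
% Assumed notation: K categories, each x_n one-hot encoded
% (x_{n,k} \in \{0,1\}, \sum_k x_{n,k} = 1).
\begin{align*}
p(\theta \mid \alpha) &= \frac{\Gamma\!\left(\sum_{k=1}^{K} \alpha_k\right)}{\prod_{k=1}^{K} \Gamma(\alpha_k)} \prod_{k=1}^{K} \theta_k^{\alpha_k - 1}, \\
p(x_n \mid \theta) &= \prod_{k=1}^{K} \theta_k^{x_{n,k}}, \\
p(x_{\mathrm{new}} \mid x_1, \dots, x_N, \alpha) &= \int p(x_{\mathrm{new}} \mid \theta)\, p(\theta \mid x_1, \dots, x_N, \alpha)\, d\theta .
\end{align*}
% Exponential family version, assumed parameterization: natural parameter \eta,
% sufficient statistic T(x), log-normalizer A(\eta), conjugate prior with
% hyperparameters (\tau, \nu) and normalizer Z(\tau, \nu).
\begin{align*}
p(x \mid \eta) &= h(x) \exp\!\left(\eta^{\top} T(x) - A(\eta)\right), \\
p(\eta \mid \tau, \nu) &= \frac{1}{Z(\tau, \nu)} \exp\!\left(\tau^{\top} \eta - \nu A(\eta)\right).
\end{align*}
```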

Exercise 2 Classification. The course website contains a data set classification.dat of (xn, yn) pairs, where the xn are 2-dimensional vectors and the yn are binary labels.

  (a) Plot the data, using O’s and X’s for the two classes. The plots in the following parts should be drawn on top of this plot.
  (b) Write a program to fit a logistic regression model using stochastic gradient ascent. Plot the line where the logistic function is equal to 0.5. Compare this outcome with the result obtained using the glm function in R (see the example in predict.glm).
  (c) Fit a linear regression to the problem, treating the class labels as real values 0 and 1. (You can solve the linear regression in any way you like, including solving the normal equations, using the LMS algorithm, or calling the built-in lm routine in R.) Plot the line where the linear regression function is equal to 0.5.
  (d) The data set classification.test is a separate data set generated from the same source. Test your fits from parts (b) and (c) on these data and compare the results.
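
As a starting point for the logistic regression part of Exercise 2, here is a minimal Python sketch of stochastic gradient ascent on the logistic log-likelihood. It is one possible shape for the program, not a reference solution. The function names (`fit_logistic_sga`, `boundary_x2`), the step size, and the number of epochs are illustrative choices. The format of classification.dat is not described in the sheet, so the sketch runs on synthetic two-cluster data instead.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic_sga(X, y, step=0.1, epochs=50, seed=0):
    """Stochastic gradient ascent on the logistic log-likelihood.

    X: (N, 2) inputs, y: (N,) labels in {0, 1}.
    Returns w = (w0, w1, w2), where w0 is the intercept.
    step and epochs are illustrative defaults, not tuned values.
    """
    rng = np.random.default_rng(seed)
    Xb = np.column_stack([np.ones(len(X)), X])  # prepend a bias column
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        for n in rng.permutation(len(Xb)):      # visit examples in random order
            # Per-example gradient of the log-likelihood: (y_n - sigmoid(w . x_n)) x_n
            w += step * (y[n] - sigmoid(Xb[n] @ w)) * Xb[n]
    return w

def boundary_x2(w, x1):
    """x2 on the line where sigmoid(w . x) = 0.5, i.e. w0 + w1*x1 + w2*x2 = 0."""
    return -(w[0] + w[1] * x1) / w[2]

# Synthetic stand-in for classification.dat (hypothetical data: two Gaussian blobs).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal([-2, -2], 1.0, size=(100, 2)),
               rng.normal([2, 2], 1.0, size=(100, 2))])
y = np.concatenate([np.zeros(100), np.ones(100)])

w = fit_logistic_sga(X, y)

# Classify by thresholding the fitted probability at 0.5.
pred = (sigmoid(np.column_stack([np.ones(len(X)), X]) @ w) >= 0.5).astype(float)
accuracy = (pred == y).mean()
```

To draw the decision line over the scatter plot, evaluate `boundary_x2(w, x1)` at the two ends of the x1 range and join the points. The same thresholding step can be reused to score a fit on held-out data such as classification.test.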