# DM825 - Introduction to Machine Learning Sheet 9, Spring 2011

Exercise 1 Sequential data.

Every day that he leaves work, Albert the Absent-minded Professor toggles his light switch according to the following protocol: (i) if the light is on, he switches it off with probability 0.80; and (ii) if the light is off, he switches it on with probability 0.30. At no other time (other than the end of each day) is the light switch touched.

1. Suppose that on Monday night, Albert’s office is equally likely to be light or dark. What is the probability that his office will be lit all the other four nights of the week (Tuesday through Friday)?
2. Suppose that you observe that his office is lit on both Monday and Friday nights. Compute the expected number of nights, from that Monday through Friday, that his office is lit.

Now suppose that Albert has been working for five years (i.e., assume that the Markov chain is in steady state).

3. Is his light more likely to be on or off at the end of a given workday?
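The chain can be written down in R to check the answers numerically (a minimal sketch; the state coding 1 = on, 2 = off and the variable names are conventions of this snippet only):

```r
# Albert's light as a two-state Markov chain; rows = tonight's state,
# columns = tomorrow's state, with 1 = on, 2 = off
P <- matrix(c(0.2, 0.8,    # on:  stays on with 0.2, switched off with 0.8
              0.3, 0.7),   # off: switched on with 0.3, stays off with 0.7
            nrow = 2, byrow = TRUE)

# Part 1: P(lit Tue through Fri), starting from a uniform state on Monday
p.lit <- 0.5 * 0.2^4 + 0.5 * 0.3 * 0.2^3

# Part 2: condition on lit Monday and Friday and enumerate Tue-Thu
grid <- as.matrix(expand.grid(Tue = 1:2, Wed = 1:2, Thu = 1:2))
w <- apply(grid, 1, function(s) {
  path <- c(1, s, 1)                     # Mon .. Fri, both ends on
  prod(P[cbind(path[-5], path[-1])])     # probability of this path
})
E.lit <- 2 + sum(w * rowSums(grid == 1)) / sum(w)

# Steady state: left eigenvector of P for eigenvalue 1
pstat <- Re(eigen(t(P))$vectors[, 1])
pstat <- pstat / sum(pstat)    # pstat[1] = P(on), pstat[2] = P(off)
```

The last question of the exercise can then be read off by comparing `pstat[1]` and `pstat[2]`.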

Exercise 2 Bayesian Networks – Practical.

There are many software packages for dealing with probabilistic graphical models. They differ in many characteristics, among others the tasks they support and the formal language they use. The following is an incomplete list:

• Hugin: a commercial system made in Denmark
• OpenBugs: Bayesian inference Using Gibbs Sampling
• Mallet (MAchine Learning for LanguagE Toolkit): a Java-based package for statistical natural language processing, including document classification.
• Bayes Net package in Weka: implements various Bayesian network classifier learning algorithms.
• In R, the CRAN Task View “gRaphical Models” lists all packages for graphical models. Among them is `deal`.

Install `deal` in R. Load the data:

```r
library(deal)   # load the package after installing it
data(ksl)
?ksl
head(ksl)
str(ksl)
summary(ksl)
# define the nodes and the type of variable corresponding to each
# of them in the Bayesian network
(nw <- network(ksl))   # network() does not use the data
## attach prior independent probability distributions
localprob(nw, "FEV")    <- c(179, 80)
localprob(nw, "Kol")    <- c(691, 80)
localprob(nw, "Hyp")    <- c(0.55, 80)
localprob(nw, "logBMI") <- c(3.23, 80)
localprob(nw, "Smok")   <- c(0.5, 0.5)
localprob(nw, "Alc")    <- c(0.5, 0.5)
localprob(nw, "Work")   <- c(0.5, 0.5)
localprob(nw, "Sex")    <- c(0.5, 0.5)
localprob(nw, "Year")   <- c(0.5, 0.5)
plot(nw)   # plots the network; no arcs are defined yet
# learn the probabilities from the data
fit.prior <- jointprior(nw)
fit.learn <- learn(nw, ksl, fit.prior)
fit.nw <- getnetwork(fit.learn)
print(fit.nw, condposterior = TRUE)
plot(fit.nw)
# learn the conditional independence structure
fit.arcs <- getnetwork(autosearch(fit.nw, ksl, fit.prior))
print(fit.arcs, condposterior = TRUE)
# alternatively, draw the arcs yourself
draw.nw <- getnetwork(drawnetwork(nw, ksl, fit.prior))
plot(draw.nw)
# prior sampling (ancestral sampling) from the network
rnetwork(makesimprob(fit.arcs), n = 10)
```

Exercise 3 Hidden Markov Models – Practical.

There are several packages in R that implement hidden Markov models. Among them, `HMM` is a simple implementation for a discrete state space and a discrete observation (emission) space. The package `HiddenMarkov` is a more general implementation that allows one to declare the emission distribution for both continuous and discrete variables.

Install and explore the possibilities of the two packages.

```r
library(HMM)
?dishonestCasino
dishonestCasino()

library(HiddenMarkov)
?HiddenMarkov
?dthmm
demo("norm", package = "HiddenMarkov")
```
• The forward algorithm computes the probability of a sequence of observations under the model. It tells how well a sequence of observations matches the current HMM.
• The Viterbi algorithm works out the most probable sequence of hidden states (latent variables) given the observations.
• The Baum-Welch algorithm estimates (learns) the transition and emission probabilities from the data, given one or more observation sequences. It is an instance of the EM algorithm that uses the forward-backward (sum-product) recursions studied for polytrees.
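As a complement to the packages, the forward recursion in the first bullet can be hand-rolled in a few lines of R (a sketch; the conventions `A[i, j] = P(z_t = j | z_{t-1} = i)` and `B[j, k] = P(x_t = k | z_t = j)` are assumptions of this snippet):

```r
# Forward algorithm: alpha[t, j] = P(x_1, ..., x_t, z_t = j)
# init: initial state distribution
# A[i, j] = P(z_t = j | z_{t-1} = i); B[j, k] = P(x_t = k | z_t = j)
# obs: vector of observation indices (columns of B)
forward.prob <- function(init, A, B, obs) {
  n <- length(obs)
  alpha <- matrix(0, n, length(init))
  alpha[1, ] <- init * B[, obs[1]]
  for (t in seq_len(n)[-1])
    alpha[t, ] <- (alpha[t - 1, ] %*% A) * B[, obs[t]]
  sum(alpha[n, ])   # P(x_1, ..., x_n) = sum_j alpha[n, j]
}
```

For long sequences the alphas underflow; practical implementations rescale each step or work in log space, as the packages do.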

Consider the following example from [B3]. Your teacher wants to know whether or not you are actually preparing for the exam. There are four things you can do in the evening, {pub, TV, party, study}, and he wants to work out whether or not you are studying. However, he cannot simply ask you; he can only make observations about your behavior and appearance. Specifically, he can observe whether you look {Tired, Hungover, Scared, Fine}. He cannot know why you look the way you do, but he has a probabilistic model to infer the probability that you look the way you do given what you did the previous night. These probabilities are collected in an emission probability table.

The other information he uses is the probability that you are in a given state tonight, given the state you were in the previous night. These probabilities are collected in a transition probability table.

On the basis of his experience as a student he makes the following guesses on the probability tables: for the transition probabilities

| Tonight \ Previous night | TV | Pub | Party | Study |
|---|---|---|---|---|
| TV | 0.4 | 0.6 | 0.7 | 0.3 |
| Pub | 0.3 | 0.05 | 0.05 | 0.4 |
| Party | 0.1 | 0.1 | 0.05 | 0.25 |
| Study | 0.2 | 0.25 | 0.2 | 0.05 |

and for the observation probabilities:

| | TV | Pub | Party | Study |
|---|---|---|---|---|
| Tired | 0.2 | 0.4 | 0.3 | 0.3 |
| Hungover | 0.1 | 0.2 | 0.4 | 0.05 |
| Scared | 0.2 | 0.1 | 0.2 | 0.3 |
| Fine | 0.5 | 0.3 | 0.1 | 0.35 |

After one week your teacher has collected the following observations (Tired, Tired, Hungover, Hungover, Scared, Hungover, Fine) and there are three things that he might want to do with these data:

• see how well the sequence of observations he has made matches his current probability tables, i.e., what is the probability of that joint outcome of observations given his model.
• work out the most probable sequence of states that you have been in based on his observations.
• given several sets of observations (for example, by watching several students) generate a good model from the data.

Write these questions in the formal way of probability calculus. Describe the algorithmic issues for answering them. Carry out the computations in R using one of the two packages above.
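As a sketch of the formalization (with \(x_{1:T}\) the observations, \(z_{1:T}\) the states, and \(\theta = (\pi, A, B)\) the model parameters), the three tasks are:

```latex
\begin{align*}
\text{evaluation:} \quad & p(x_{1:T} \mid \theta) = \sum_{z_{1:T}} p(x_{1:T}, z_{1:T} \mid \theta)
  && \text{(forward algorithm)} \\
\text{decoding:}   \quad & z_{1:T}^{*} = \operatorname*{arg\,max}_{z_{1:T}} \, p(z_{1:T} \mid x_{1:T}, \theta)
  && \text{(Viterbi algorithm)} \\
\text{learning:}   \quad & \theta^{*} = \operatorname*{arg\,max}_{\theta} \, \prod_{n=1}^{N} p(x_{1:T}^{(n)} \mid \theta)
  && \text{(Baum--Welch / EM)}
\end{align*}
```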

Below are the data ready to be imported into R. They include the probabilities of the initial and final states, the tables above, and the observations.

```r
aFirst <- c(0.25, 0.25, 0.25, 0.25)   # initial state probabilities
aLast  <- c(0.25, 0.25, 0.25, 0.25)   # final state probabilities
# transition probabilities (columns indexed by the previous state)
a <- matrix(c(.4, .3, .1, .2,
              .6, .05, .1, .25,
              .7, .05, .05, .2,
              .3, .4, .25, .05), ncol = 4, byrow = FALSE)
# observation (emission) probabilities (columns indexed by the state)
b <- matrix(c(.2, .1, .2, .5,
              .4, .2, .1, .3,
              .3, .4, .2, .1,
              .3, .05, .3, .35), ncol = 4, byrow = FALSE)
# observed sequence, coded as integers 0-3
obs <- c(0, 0, 3, 1, 1, 2, 1, 3)
```
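With these data in place, the three computations can be sketched with the `HMM` package (`initHMM`, `forward`, `viterbi`, and `baumWelch` are functions of that package; the transposes below reflect the assumption that `a` and `b` store states in columns, which should be checked against `?initHMM`):

```r
library(HMM)   # assumes the CRAN package "HMM" is installed

states  <- c("TV", "Pub", "Party", "Study")
symbols <- c("Tired", "Hungover", "Scared", "Fine")
aFirst <- c(0.25, 0.25, 0.25, 0.25)
a <- matrix(c(.4,.3,.1,.2, .6,.05,.1,.25, .7,.05,.05,.2, .3,.4,.25,.05), ncol = 4)
b <- matrix(c(.2,.1,.2,.5, .4,.2,.1,.3, .3,.4,.2,.1, .3,.05,.3,.35), ncol = 4)
obs <- c(0, 0, 3, 1, 1, 2, 1, 3)

# initHMM expects rows indexed by the "from" state / the emitting state,
# hence the transposes of a and b
hmm <- initHMM(states, symbols, startProbs = aFirst,
               transProbs = t(a), emissionProbs = t(b))
x <- symbols[obs + 1]            # decode the 0-based integer coding

# 1) likelihood of the observation sequence (forward algorithm, log scale)
f <- forward(hmm, x)
loglik <- log(sum(exp(f[, length(x)])))

# 2) most probable state sequence (Viterbi)
path <- viterbi(hmm, x)

# 3) re-estimate the parameters from the data (Baum-Welch)
fit <- baumWelch(hmm, x)$hmm
```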