Export performed by marco at 1:14pm, 01 May 2013.
Exporting All questions in chronological order (most recent first).
Field | Value |
ID | 530770 |
Created | 2013-03-18 11:28:26 |
Question | Consider the graphical model in the figure. Each node represents a Gaussian variable. The mean of each variable is assumed to depend linearly on its parent variables, so the conditional distributions can be written as: Which of the statements about the covariance matrix of the Gaussian that describes the joint distribution of the variables is not true? |
A | |
*B* | |
C | |
D | |
Explanation | The mean and covariance matrix can be found by using (8.16) from Bishop recursively: The rest of the components follow by symmetry. |
Augmented explanation 1 | Note that each computation of will be converted to if
This corresponds to the cryptic explanation in the book "...and so the covariance can similarly be evaluated recursively starting from the lowest numbered node." (by: larsgmathiasen [lamat10]) |
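As a complement, a minimal Python sketch of the recursion in Bishop's (8.16) (together with the corresponding mean recursion (8.15)), run on a hypothetical three-node chain; the actual graph, weights w, biases b and conditional variances v from the question are not reproduced here, so the numbers are illustrative only.

    import numpy as np

    # Hypothetical linear-Gaussian chain x1 -> x2 -> x3 (stand-in for the
    # question's figure): p(x_i | pa_i) = N(x_i | sum_j w_ij x_j + b_i, v_i).
    parents = {0: [], 1: [0], 2: [1]}
    w = {(1, 0): 0.5, (2, 1): 2.0}     # linear weights w_ij
    b = [1.0, 0.0, -1.0]               # biases b_i
    v = [1.0, 2.0, 0.5]                # conditional variances v_i
    n = 3

    # (8.15): E[x_i] = sum_{j in pa_i} w_ij E[x_j] + b_i, in topological order.
    mu = np.zeros(n)
    for i in range(n):
        mu[i] = sum(w[(i, j)] * mu[j] for j in parents[i]) + b[i]

    # (8.16): cov[x_i, x_j] = sum_{k in pa_j} w_jk cov[x_i, x_k] + I_ij v_j,
    # filled in starting from the lowest-numbered node.
    cov = np.zeros((n, n))
    for j in range(n):
        for i in range(j + 1):
            c = sum(w[(j, k)] * cov[i, k] for k in parents[j])
            if i == j:
                c += v[j]
            cov[i, j] = cov[j, i] = c

    print(mu)    # joint mean
    print(cov)   # joint covariance matrix

Each covariance entry only depends on entries for lower-numbered nodes, which is the recursive evaluation "starting from the lowest numbered node" that the quoted passage from the book alludes to.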
Tags | lecture_11 |
Author | ggbn (glnie07) |
Avg Rating | 3.3300 |
Avg Difficulty | 2.0000 |
Total ratings | 3 |
Field | Value |
ID | 530853 |
Created | 2013-03-18 08:57:06 |
Question | Consider the following training data, with input vectors of four features along with an output that can assume either the classification A or B.
x1      x2      x3      x4      y
---------------------------------
3.99    -1.65   6.52    2.78    B

You are given a new input vector:
Your task is to calculate the probabilities of
by using Gaussian discriminant analysis. |
A | |
*B* | |
C | |
D | |
E | |
Explanation | First, the mean vectors,
The probabilities P(Y=A) and P(Y=B) are calculated as the frequencies of these classes in the training data:
I now need to calculate the posterior probability as: Since I use Gaussian discriminant analysis and the covariance matrix is given as the identity matrix, I have that
Given that the covariance matrix is the 4x4 identity matrix, I can use the following two facts: to write the formula as:
Where k is the number of features (in this case k = 4). The same calculation is done for
I thus get the posterior as:
Inserting my values I get the results: |
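A minimal Python sketch of this calculation; since the full training table and the derived means are not reproduced here, the class means, priors and query point below are hypothetical stand-ins, and only the structure (Gaussian discriminant analysis with a shared 4x4 identity covariance) is taken from the question.

    import numpy as np

    # Hypothetical stand-ins for the class means and priors estimated from the
    # (not reproduced) training data, and for the new input vector.
    mu_A = np.array([1.0, 0.5, 3.0, 2.0])
    mu_B = np.array([4.0, -1.5, 6.0, 2.5])
    p_A, p_B = 0.5, 0.5                       # class frequencies P(Y=A), P(Y=B)
    x = np.array([2.0, -1.0, 5.0, 2.5])       # new input vector (hypothetical)

    # With a shared identity covariance, N(x | mu, I) is proportional to
    # exp(-||x - mu||^2 / 2); the (2*pi)^(-k/2) factor is the same for both
    # classes and cancels in the posterior.
    def unnormalised(mu, prior):
        return np.exp(-0.5 * np.sum((x - mu) ** 2)) * prior

    uA, uB = unnormalised(mu_A, p_A), unnormalised(mu_B, p_B)
    print("P(Y=A | x) =", uA / (uA + uB))
    print("P(Y=B | x) =", uB / (uA + uB))

Because the covariance is the identity for both classes, the Gaussian normalising constant cancels in the posterior, which is the simplification the explanation refers to.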
Tags | lecture_7 |
Author | tvh10 (tomha10) |
Avg Rating | 4.5000 |
Avg Difficulty | 1.0000 |
Total ratings | 4 |
Field | Value |
ID | 534876 |
Created | 2013-03-18 07:33:06 |
Question | Suppose you want to predict whether the movie you are currently watching is a starwars movie, using a multinomial event model. To this end you have classified a number of previously watched movies as a starwars movie or not, based on the number of starwars-related props used in the movie. You choose to represent each movie as a vector, where each prop used is discretized into one of three buckets (few, some or many), depending on how many times the prop occurs in the movie: You assume that the probability for a prop to be discretized to a bucket, k, given that the movie is a starwars movie, is the same for all props.
Your previously watched movies are used as training data, and are represented below as two matrices where each row represents a movie: Given that the movie you are currently watching has the input vector: what classification will be predicted for the movie? |
A | The movie will be predicted to not be a starwars movie |
*B* | The movie will be predicted to be a starwars movie |
Explanation | We represent the discretized buckets few, some and many as 0, 1 and 2 respectively. Let Given that m=8 and n=5 we can calculate all parameters:
To predict our input vector, x, we maximize over y: We use logarithms to prevent underflow: Thus we predict the movie to be a starwars movie.
|
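A minimal Python sketch of the multinomial event model used above; the training matrices, the new movie's bucket vector, and the counts m and n are not reproduced here, so the arrays below are hypothetical stand-ins, and only the modelling assumptions (one shared bucket distribution per class, prediction by maximising the log posterior) follow the explanation.

    import numpy as np

    # Hypothetical training matrices: rows are movies, columns are props,
    # entries are the buckets 0 (few), 1 (some), 2 (many).
    starwars = np.array([[2, 1, 2, 0, 1],
                         [1, 2, 2, 1, 0],
                         [2, 2, 1, 1, 1]])
    other    = np.array([[0, 0, 1, 0, 0],
                         [1, 0, 0, 0, 2]])
    x = np.array([2, 1, 1, 0, 2])              # movie currently being watched

    def bucket_probs(data, n_buckets=3):
        # Shared multinomial over buckets: the bucket probability is the same
        # for every prop, as assumed in the question.
        counts = np.bincount(data.ravel(), minlength=n_buckets)
        return counts / counts.sum()

    phi_sw, phi_ot = bucket_probs(starwars), bucket_probs(other)
    prior_sw = len(starwars) / (len(starwars) + len(other))
    prior_ot = 1.0 - prior_sw

    # Work in log space to prevent underflow, as in the explanation.
    log_sw = np.log(prior_sw) + np.log(phi_sw[x]).sum()
    log_ot = np.log(prior_ot) + np.log(phi_ot[x]).sum()
    print("starwars" if log_sw > log_ot else "not starwars")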
Tags | lecture_7 |
Author | nnoej10 (nnoej10) |
Avg Rating | 5.0000 |
Avg Difficulty | 1.0000 |
Total ratings | 1 |
Field | Value |
ID | 530518 |
Created | 2013-03-18 04:53:36 |
Question | Consider an electron emitter which emits electrons with some interarrival time
Suppose the relationship is
We want to learn the constant
Suppose, based on previous work, you choose the parameters to be
You now observe 5 interarrival times at different currents:
Derive the posterior. What is the expected value of |
*A* | 41.8798 |
B | 43.0205 |
C | 42.0000 |
D | 43.0384 |
E | 41.0273 |
Explanation | The posterior is given by:
Since the observations are independent and identically exponentially distributed we have:
Thus we have:
From this we see that the posterior is a new gamma distribution given by:
The expected value of a gamma distributed variable is simply
|
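A minimal numeric sketch of this conjugate update with hypothetical prior parameters and observations; the actual prior, the observed times and the (elided) dependence of the rate on the current are not reproduced here, so the code simply uses an exponential likelihood with a single rate lambda and a Gamma(shape a, rate b) prior.

    import numpy as np

    a0, b0 = 2.0, 0.05                            # hypothetical Gamma prior (shape, rate)
    t = np.array([0.03, 0.05, 0.02, 0.04, 0.06])  # hypothetical observed interarrival times

    # Exponential likelihood p(t_i | lam) = lam * exp(-lam * t_i), so the
    # posterior is Gamma(a0 + n, b0 + sum(t_i)).
    a_post = a0 + len(t)
    b_post = b0 + t.sum()
    print("E[lam | t] =", a_post / b_post)        # mean of a Gamma(a, b) is a / b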
Tags | lecture_2 |
Author | troelsmn (trnie09) |
Avg Rating | 4.6000 |
Avg Difficulty | 1.8000 |
Total ratings | 5 |
Field | Value |
ID | 523866 |
Created | 2013-03-13 03:35:34 |
Question | Suppose we have the Bernoulli distribution and Your task is now to derive the maximum likelihood estimate of |
A | |
*B* | |
C | |
D | |
E | |
Explanation | We have the likelihood function Taking the natural logarithm of the likelihood function yields Now, taking the derivative To find the maximum of it, we set it equal to zero |
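The derivation referred to above, written out for the usual setup of N i.i.d. observations x_1, ..., x_N in {0, 1} with Bernoulli parameter mu (the notation of the original question is not reproduced here):

\[
L(\mu) = \prod_{n=1}^{N} \mu^{x_n} (1-\mu)^{1-x_n},
\qquad
\ln L(\mu) = \sum_{n=1}^{N} \bigl[ x_n \ln\mu + (1-x_n)\ln(1-\mu) \bigr],
\]
\[
\frac{d}{d\mu}\ln L(\mu) = \frac{\sum_n x_n}{\mu} - \frac{N - \sum_n x_n}{1-\mu} = 0
\quad\Longrightarrow\quad
\mu_{\mathrm{ML}} = \frac{1}{N}\sum_{n=1}^{N} x_n .
\]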
Author | acarbalacar (daand09) |
Avg Rating | 3.0000 |
Avg Difficulty | 0.2000 |
Total ratings | 5 |
Comment 1 | This is at the limit of what would be defined as a too-easy question... Do not expect something this easy at the exam. (by: marco [marco])
Field | Value |
ID | 522394 |
Created | 2013-03-12 00:48:49 |
Question | Let
be a Poisson distribution with parameter
In this exercise we know that either
for the subjective prior probabilities to the two possible values.
(note that it might be unrealistic that |
A | |
B | |
C | |
*D* | |
E | |
Explanation |
So the posterior probability of |
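A minimal Python sketch of the update over the two candidate parameter values; the two rates, the subjective prior probabilities and the observed counts are not reproduced here, so the numbers below are hypothetical stand-ins, and only the structure (Bayes' rule over two hypotheses with Poisson likelihoods) follows the question.

    from math import exp, factorial

    lam1, lam2 = 2.0, 5.0          # hypothetical candidate rates
    prior1, prior2 = 0.7, 0.3      # hypothetical subjective priors
    observations = [4, 6, 3]       # hypothetical observed counts

    def poisson_pmf(k, lam):
        return lam ** k * exp(-lam) / factorial(k)

    def likelihood(lam):
        like = 1.0
        for k in observations:
            like *= poisson_pmf(k, lam)
        return like

    # Bayes' rule over the two possible parameter values.
    num1 = likelihood(lam1) * prior1
    num2 = likelihood(lam2) * prior2
    print("P(lambda = lam1 | data) =", num1 / (num1 + num2))
    print("P(lambda = lam2 | data) =", num2 / (num1 + num2))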
Tags | lecture_2, lecture_12 |
Author | valdemar (chha309) |
Avg Rating | 4.2000 |
Avg Difficulty | 1.2000 |
Total ratings | 5 |
Field | Value |
ID | 523937 |
Created | 2013-03-11 16:44:43 |
Question | In the figure below are 4 different graphical models. Each node in these is a binary variable, and the maximal width of the networks is given by an even number M (so each row with 3 "dots" contains M nodes).
Select the answer below which does not correspond to the number of parameters needed to describe one of the models above. |
A | |
B | |
C | |
D | |
*E* | |
Explanation | For each node, the number of needed parameters is given by the number of parameters needed to describe the node itself times the number of different states its parent nodes can take.
|
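As a concrete instance of this rule (the four models in the figure are not reproduced here), for binary nodes the count is

\[
\#\text{parameters} \;=\; \sum_{i} 2^{\,|\mathrm{pa}(i)|},
\]

so M fully independent binary nodes need M parameters, while a fully connected network over M binary nodes needs \(1 + 2 + 4 + \dots + 2^{M-1} = 2^{M} - 1\).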
Tags | lecture_11 |
Author | ggbn (glnie07) |
Avg Rating | 3.4000 |
Avg Difficulty | 0.4000 |
Total ratings | 5 |
Field | Value |
ID | 520454 |
Created | 2013-03-09 02:06:07 |
Question | You are given the following training set, X, containing six inputs of a single feature, along with the corresponding observed outputs, Y:
X = (10, 3, 1, 8, 4, 9)^T    Y = (9, 4, 2, 6, 3, 5)^T
Your task is to predict the outcome,
Answer which one of the following statements is false. |
*A* | |
B | |
C | |
D | |
Explanation | Calculation of the weights yields the following vector, W:
W = (0.135.., 0.324.., 0.043.., 0.606.., 0.606.., 0.324..)^T
I rewrite the expression Firstly,
I can thus rewrite the sum as:
With concrete values this becomes:
To minimize with respect to
<=> <=>
The prediction for
|
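A minimal Python sketch of locally weighted linear regression on this training set. The query point and kernel are not reproduced here; the quoted weights are consistent with a Gaussian kernel, query point x_q = 6 and bandwidth tau = 2, so those values are assumed below, together with the standard intercept-plus-slope design (the exact model in the original derivation may differ).

    import numpy as np

    X = np.array([10, 3, 1, 8, 4, 9], dtype=float)
    Y = np.array([9, 4, 2, 6, 3, 5], dtype=float)

    # Assumed Gaussian kernel, query point and bandwidth (see note above).
    x_q, tau = 6.0, 2.0
    w = np.exp(-(X - x_q) ** 2 / (2 * tau ** 2))
    print(np.round(w, 3))     # approx. [0.135, 0.325, 0.044, 0.607, 0.607, 0.325]

    # Weighted least squares: minimise sum_i w_i * (y_i - theta^T a_i)^2
    # with a_i = (1, x_i).
    A = np.column_stack([np.ones_like(X), X])
    W_diag = np.diag(w)
    theta = np.linalg.solve(A.T @ W_diag @ A, A.T @ W_diag @ Y)
    print("prediction at x_q:", theta @ np.array([1.0, x_q]))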
Tags | lecture_2 |
Author | tvh10 (tomha10) |
Avg Rating | 4.2000 |
Avg Difficulty | 1.0000 |
Total ratings | 5 |
Field | Value |
ID | 518688 |
Created | 2013-03-08 11:01:13 |
Question | Suppose you have been studying a dripping water tap. It turns out that the time intervals between drops are independently and identically distributed according to the distribution:
From the study you found out that the parameter
where the parameters are
You now observe the next 5 interval times:
Derive the posterior distribution. What is the expected |
A | 1.870726 |
B | 1.582982 |
C | 2.419451 |
*D* | 1.818182 |
E | 2.499716 |
Explanation | The posterior must again be a gamma distribution and satisfies the proportionality:
Since the observations are independent and identically exponentially distributed we have:
Thus we have:
From this we see that the posterior is a new gamma distribution given by:
The expected value of a gamma distributed variable is simply
|
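Written out symbolically (with shape a and rate b for the Gamma prior and observed intervals t_1, ..., t_5; the concrete values from the question are not reproduced here):

\[
p(\lambda \mid t_1,\dots,t_5) \;\propto\; \lambda^{a-1} e^{-b\lambda} \prod_{i=1}^{5} \lambda\, e^{-\lambda t_i}
\;=\; \lambda^{(a+5)-1}\, e^{-\left(b+\sum_i t_i\right)\lambda},
\]

which is a Gamma distribution with shape \(a+5\) and rate \(b+\sum_i t_i\), so the posterior mean is \((a+5)/(b+\sum_i t_i)\).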
Tags | lecture_2 |
Author | troelsmn (trnie09) |
Avg Rating | 4.5000 |
Avg Difficulty | 1.7500 |
Total ratings | 4 |
Field | Value |
ID | 517698 |
Created | 2013-03-08 04:46:57 |
Question | Given the Poisson distribution:
which of the following statements is the conclusion of the proof showing that the distribution is a member of the exponential family of distributions? |
*A* | |
B | |
C |
Explanation | = = = = |
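One standard way to write out the chain of equalities showing that the Poisson belongs to the exponential family (the specific form used in answer A is not reproduced here):

\[
p(x \mid \lambda) = \frac{\lambda^{x} e^{-\lambda}}{x!}
= \frac{1}{x!}\exp\!\bigl(x \ln \lambda - \lambda\bigr)
= \underbrace{\tfrac{1}{x!}}_{h(x)}\;\underbrace{\exp\!\left(-e^{\eta}\right)}_{g(\eta)}\;\exp\!\bigl(\eta\, x\bigr),
\qquad \eta = \ln\lambda,\;\; T(x) = x,
\]

which matches the exponential-family form \(p(x\mid\eta)=h(x)\,g(\eta)\exp\bigl(\eta\, T(x)\bigr)\).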
Tags | lecture_4 |
Author | nnoej10 (nnoej10) |
Avg Rating | 3.5000 |
Avg Difficulty | 0.7500 |
Total ratings | 4 |
Field | Value |
ID | 515538 |
Created | 2013-03-07 02:57:44 |
Question | What is the difference between Generative classifiers and Discriminative classifier models? Generally let x be input parameters and y be the class. |
A | Bayes’ theorem represents an example of discriminative modeling, where in the generative approach we maximize the likelihood function for the conditional distribution p(y|x). |
*B* | Generative classifiers learn a model of the joint probability p(x,y), and make their predictions by using Bayes' rule to calculate p(y|x), and then choosing the most likely class y. Discriminative classifiers model the posterior p(y|x) directly, or learn a direct map from inputs x to the class labels. |
C | Only discriminative training is used in supervised learning, since the generative model can only learn p(x|y) and thus cannot tell us anything useful about the posterior. |
D | The generative and discriminative classifier models learn from data and predict in the same way, and differ only in that, in the discriminative model, the input is considered to be y (the classifier) and x (the parameters) the output, contrary to the generative model. |
Explanation | See the notes from Andrew Ng, Part IV, or book B1 page 204, or the slides from lecture 7.
Tags | lecture_7 |
Author | valdemar (chha309) |
Avg Rating | 3.5000 |
Avg Difficulty | 0.5000 |
Total ratings | 4 |
Field | Value |
ID | 515518 |
Created | 2013-03-07 02:20:40 |
Question | Consider the following categorical data and assume that a logistic regression is appropriate

Obs  Var1   Var2  Var3   Var4   Var5    Var6   Var7  Var8  Group
1    25.53  300   0.001  0      -0.164  0      1.65  0.36  0
2    12.98  387   0.786  0.34   0.600   0.002  1.15  0.60  0
3    29.27  182   -0.08  -0.2   -0.386  0.175  0.45  0.04  0
4    23.67  367   0.001  0      0       0      0.56  0.97  0
.....
27   21.54  312   0.651  0.834  -0.084  0      1.04  0.83  1
28   17.45  242   1.337  0.060  0.724   0.38   0.89  0.86  1
29   19.45  140   0.453  0      -0.194  0      1.23  1.66  1
30   24.94  303   0.541  0.484  0.534   0      0.86  0.87  1
The data are fictitious and do not represent anything real. As can be seen, we have 30 observations and 8 variables.
You are to consider how many of these variables should be included in the model and why. (You should not consider which exact variables you would include, but just the number of variables.) |
A | All 8 variables should be used, because more variables will always give a more accurate model, and when we have as few data points as we do in this case, the model construction will be fast no matter what, so there is no need to consider simplified models. |
*B* | When having only 30 data points one should use 3-6 variables, since having fewer than 1/10 of the data as variables will make it hard to get a good fit, and more than 1/5 of the data in variables gives rise to the risk of overfitting. For verification of which variables to include, one should check the p-values testing whether the estimate of each parameter is different from zero. |
C | As few variables as possible should be chosen. It should be 1 or 2, which are still verified by the p-values as in answer B. When having only 1 or 2 variables, it will also be possible to visualize the grouping structure produced by the model by simply making a 2- or 3-dimensional plot, so one should always consider not using more than 2 variables. |
Explanation | Having too many variables will often result in overfitting, which is illustrated in the slides of lecture 2. Essentially, the problem is that there is an infinite number of ways the model can choose the parameters so that they still cover the points in the data. If this occurs, the model will only describe the training data and will likely not be usable for other data, since the parameters do not describe the behavior of the data in general. This corresponds to the warning message in R saying: Warning message:
Tags | lecture_2 |
Author | acarbalacar (daand09) |
Avg Rating | 4.0000 |
Avg Difficulty | 0.5000 |
Total ratings | 4 |
Field | Value |
ID | 513131 |
Created | 2013-03-05 10:34:53 |
Question | Consider the following neural network.
The weights, including bias, are defined as follows.
Furthermore, the activation function at the hidden layer, as well as the activation function at the output layer, is given by the logistic sigmoid function;
Your task is to use forward propagation to calculate the estimate,
that the neural network produces on the following input.
|
*A* | 0.003309 |
B | 0.213426 |
C | 0.013655 |
D | 0.748314 |
E | 0.508049 |
Explanation | The formula for forward propagation is given as follows.
where M = D = 2 and f is the logistic sigmoid function in our case.
We evaluate the sum with our weights;
Introducing the input yields the following;
f is the logistic sigmoid function;
|
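A minimal Python sketch of the forward pass for a 2-2-1 network with logistic sigmoid activations at both layers; the actual weights (including biases) and the input vector are not reproduced here, so the numbers below are hypothetical stand-ins, and only the structure M = D = 2 follows the question.

    import numpy as np

    def sigmoid(a):
        return 1.0 / (1.0 + np.exp(-a))

    # Hypothetical weights: rows of W1 are the hidden units, columns are
    # (bias, x1, x2); w2 is (bias, z1, z2) for the single output unit.
    W1 = np.array([[0.5, -1.0, 0.2],
                   [1.5,  0.3, -0.7]])
    w2 = np.array([-2.0, 1.0, -3.0])
    x = np.array([1.0, -0.5])

    # Forward propagation: hidden activations, then the output.
    z = sigmoid(W1 @ np.concatenate(([1.0], x)))
    y = sigmoid(w2 @ np.concatenate(([1.0], z)))
    print(y)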
Tags | lecture_5 |
Author | larsgmathiasen (lamat10) |
Avg Rating | 3.5000 |
Avg Difficulty | 1.0000 |
Total ratings | 6 |
Field | Value |
ID | 511287 |
Created | 2013-03-03 12:54:57 |
Question | Suppose you have implemented your favorite learning algorithm, namely logistic regression, to solve a given learning problem. Unfortunately, you are getting an intolerable test error with the parameters learned from your training data. Analysis of the situation showed that it is a problem of either high bias or high variance. State, for each of the following approaches, whether it will solve high bias, high variance, or both.
a) Acquire more training data b) Reduce the set of features c) Increase the set of features |
A | a) Solves both high bias and high variance b) Solves high variance c) Solves high bias |
B | a) Solves high bias b) Solves high bias c) Solves high variance |
C | a) Solves high variance b) Solves high variance c) Solves both high bias and high variance |
D | a) Solves high bias b) Solves high variance c) Solves high bias |
*E* | a) Solves high variance b) Solves high variance c) Solves high bias |
Explanation | If the bias is high, then we are consistently learning the same wrong thing, regardless of the amount of training data we have. E.g. trying to fit a linear function to quadratic data (underfitting). This can often be seen if high training error is observed together with high test error. Thus, the only remedy is increasing the set of features; more training examples will not help and reducing the set of features will definitely not help.
If the variance is high, then we are overfitting the data, i.e. fitting to a training set that is too small and does not reflect the true pattern of the data. E.g. fitting a 9th-order polynomial to 10 training points. This can be seen if the training error is low but the test error is high. The conclusion is that we have too many features, so we should reduce the set of features and not increase it. Furthermore, we could solve the high variance by acquiring more training examples to get rid of the overfitting. |
Tags | lecture_10 |
Author | larsgmathiasen (lamat10) |
Avg Rating | 3.4000 |
Avg Difficulty | 0.6000 |
Total ratings | 5 |