ECE421: Introduction to Machine Learning




In this assignment, you will implement learning and inference procedures for some of the probabilistic models described in class, apply your solutions to some simulated datasets, and analyze the results.

General Note:

• Full points are given for complete solutions, including justifying the choices or assumptions you made to solve the question. Both complete source code and program outputs should be included in the final submission.

• Homework assignments are to be solved in the assigned groups of two. You are encouraged

to discuss the assignment with other students, but you must solve it within your own group.

Make sure to be closely involved in all aspects of the assignment.

• There are 3 starter files attached, helper.py, starter_kmeans.py and starter_gmm.py, which will help you with your implementation.

1 K-means [9 pt.]

K-means clustering is one of the most widely used data analysis algorithms. It is used to summarize data by discovering a set of data prototypes that represent clusters of data. The data prototypes are usually referred to as cluster centers. Usually, K-means clustering proceeds by alternating between assigning data points to clusters and then updating the cluster centers. In this assignment, we will investigate a different learning algorithm that directly minimizes the K-means clustering loss L(μ) = Σ_{n=1}^N min_k ||x_n − μ_k||².
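As a concrete illustration of directly minimizing the K-means loss L(μ) = Σ_n min_k ||x_n − μ_k||², here is a minimal NumPy sketch using plain gradient descent on the cluster centers. The starter files for this assignment use their own framework; the function names below (kmeans_loss, kmeans_grad_descent) and the learning-rate/step settings are illustrative assumptions, not part of the starter code.

```python
import numpy as np

def kmeans_loss(X, mu):
    # L(mu) = sum_n min_k ||x_n - mu_k||^2
    # Pairwise squared distances between points and centers: shape (N, K)
    d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
    return d2.min(axis=1).sum()

def kmeans_grad_descent(X, K, lr=0.01, steps=200, seed=0):
    # Initialize centers at K randomly chosen data points (a common choice)
    rng = np.random.default_rng(seed)
    mu = X[rng.choice(len(X), K, replace=False)].copy()
    for _ in range(steps):
        d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
        assign = d2.argmin(axis=1)  # the hard "min" assignment in the loss
        # Gradient of L w.r.t. mu_k is 2 * sum over assigned points of (mu_k - x_n)
        for k in range(K):
            pts = X[assign == k]
            if len(pts):
                mu[k] -= lr * 2.0 * (len(pts) * mu[k] - pts.sum(axis=0))
    return mu
```

Note that the min over clusters makes the loss piecewise smooth: between reassignments the gradient step pulls each center toward the mean of its currently assigned points.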


2 Mixtures of Gaussians [16 pt.]

Mixtures of Gaussians (MoG) can be interpreted as a probabilistic version of K-means clustering. For each data vector, MoG uses a latent variable z to represent the cluster assignment and uses a joint probability model of the cluster assignment variable and the data vector: P(x, z) = P(z) P(x | z). For N i.i.d. training cases, we have P(X, Z) = ∏_{n=1}^N P(x_n, z_n). The

Expectation-Maximization (EM) algorithm is the most commonly used technique to learn a MoG.

Like the standard K-means clustering algorithm, the EM algorithm alternates between updating

the cluster assignment variables and the cluster parameters. What makes it different is that instead of making hard assignments of data vectors to cluster centers (the "min" operation above), the EM algorithm computes probabilities for the different cluster centers, P(z | x). These are computed from P(z = k | x) = P(x, z = k) / Σ_{j=1}^K P(x, z = j).
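The posterior P(z = k | x) above can be sketched in NumPy as follows, for a spherical-covariance MoG and computed in log space for numerical stability (a standard trick, since the joint probabilities can underflow). The function names and the spherical-covariance parameterization here are illustrative assumptions, not taken from the starter files.

```python
import numpy as np

def log_gauss(X, mu, sigma2):
    # log N(x_n | mu_k, sigma2_k * I) for every point/cluster pair -> (N, K)
    D = X.shape[1]
    d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)  # (N, K)
    return -0.5 * (D * np.log(2 * np.pi * sigma2)[None, :] + d2 / sigma2[None, :])

def posterior(X, log_pi, mu, sigma2):
    # log P(x, z = k) = log pi_k + log N(x | mu_k, sigma2_k * I)
    log_joint = log_pi[None, :] + log_gauss(X, mu, sigma2)  # (N, K)
    # P(z = k | x) = P(x, z = k) / sum_j P(x, z = j), via log-sum-exp
    log_norm = np.logaddexp.reduce(log_joint, axis=1, keepdims=True)
    return np.exp(log_joint - log_norm)
```

Each row of the returned matrix sums to 1: it is a proper distribution over the K clusters for one data vector, and it replaces the hard argmin assignment of K-means with soft responsibilities.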