# Introduction to machine learning

*Maxime Sangnier*

Fall, 2023

## Practical session 4: Gaussian mixture models and k-means

# Table of contents
1. [Gaussian mixture models](#part1)
1. [k-means](#part2)


In [1]:
from mllab import *
from sklearn import datasets as data


Packages:
	numpy as np
	matplotlib.pyplot as plt
	seaborn as sns

Functions:
	plotXY
	plot_frontiere
	map_regions
	covariance
	plot_cov
	sample_gmm
	scatter
	plot_level_set
	gaussian_sample



# Gaussian mixture models <a id="part1"></a>
>Draw a sample of size 200 from a Gaussian mixture model with parameters
$$
    \begin{cases}
        \pi_1 &= 0.33\\
        \mu_1 &= (0, 0),
    \end{cases}
$$
$$
    \begin{cases}
        \pi_2 &= 0.33\\
        \mu_2 &= (5, 0),
    \end{cases}
$$
$$
    \begin{cases}
        \pi_3 &= 0.34\\
        \mu_3 &= (2, -5),
    \end{cases}
$$
and with the same identity covariance matrix for all three components.
Plot the "contours" of the three clusters and their centers.

In [None]:
# Answer
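
As a hint, one possible way to draw such a sample is sketched below with plain NumPy rather than the `mllab` helpers (whose signatures are not shown here). The seed and the variable names are assumptions made for illustration only. The idea is to first draw the latent component of each point, then add Gaussian noise around the corresponding center.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility

# Mixture parameters from the exercise statement
weights = np.array([0.33, 0.33, 0.34])
means = np.array([[0.0, 0.0], [5.0, 0.0], [2.0, -5.0]])
cov = np.eye(2)  # identity covariance shared by the three components

n = 200
# Step 1: draw the latent component of each point according to the priors
labels = rng.choice(len(weights), size=n, p=weights)
# Step 2: draw each point from the Gaussian of its own component
X = means[labels] + rng.multivariate_normal(np.zeros(2), cov, size=n)

print(X.shape, np.bincount(labels, minlength=3))
```

With an identity covariance, the level sets of each component are circles centered at the corresponding mean.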

>Complete the following implementation of soft k-means.

In [None]:
# Answer
class SoftKMeans(object):
    def __init__(self, n_components=1, n_iter=100):
        self.n_components = n_components
        self.n_iter = n_iter
        self.weights_ = None
        self.means_ = None
        self.covariances_ = None
        self.log_likelihood_ = None
        self.em_log_likelihood_ = None
    
    def fit(self, X):
        # Initialization
        n_components = self.n_components
        # List of initial weights, means and covariances
        # (initial means can be taken at random among the training points)
        # To do

        # End to do
        
        # Multivariate Gaussian pdf
        def pdf(X, mean, cov):
            invcov = np.linalg.inv(cov + 1e-6*np.eye(cov.shape[0]))
            # Row-wise quadratic form (x-mean)^T invcov (x-mean), without building an n x n matrix
            r = np.exp(-0.5*np.sum(((X-mean) @ invcov) * (X-mean), axis=1))
            r *= np.sqrt(np.linalg.det(invcov/(2*np.pi)))
            return r
            
        # Loop
        log_likelihood = []  # Marginal log-likelihood at each iteration
        em_log_likelihood = []  # Average joint log-likelihood at each iteration
        # Compute the matrix of joint density values (size #components x #points)
        # and update weights, means and covariances
        for it in range(self.n_iter):
            # Parameter update
            # To do

            # End to do
            
            # Log-likelihoods computation
            # To do

            # End to do
        self.weights_ = np.array(weights)
        self.means_ = np.array(means)
        self.covariances_ = np.array(covariances)
        self.log_likelihood_ = log_likelihood
        self.em_log_likelihood_ = em_log_likelihood
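
As a reminder, the parameter update of the loop can be based on the usual EM updates for a Gaussian mixture, recalled below. The notation ($\gamma_{ik}$ for the posterior probability that point $x_i$ belongs to component $k$, $n_k$ for the effective size of component $k$, $n$ for the sample size, $K$ for the number of components) is an assumption of this reminder and does not come from the code above.

$$
    \gamma_{ik} = \frac{\pi_k \, \mathcal N(x_i; \mu_k, \Sigma_k)}{\sum_{j=1}^{K} \pi_j \, \mathcal N(x_i; \mu_j, \Sigma_j)},
    \qquad
    n_k = \sum_{i=1}^{n} \gamma_{ik},
$$
$$
    \pi_k = \frac{n_k}{n},
    \qquad
    \mu_k = \frac{1}{n_k} \sum_{i=1}^{n} \gamma_{ik} \, x_i,
    \qquad
    \Sigma_k = \frac{1}{n_k} \sum_{i=1}^{n} \gamma_{ik} \, (x_i - \mu_k)(x_i - \mu_k)^\top.
$$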


>Fit a soft k-means with 3 components and 20 iterations on the data.
Print the prior probabilities.
Plot the training dataset along with the estimated means and covariance matrices.

>Are the results consistent with the way the data has been generated?

In [None]:
# Answer

>Plot the two log-likelihoods versus the number of iterations.
Is the marginal log-likelihood non-decreasing?
Is it bounded from below by the average joint log-likelihood?

In [None]:
# Answer
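
To interpret the plot, the following inequality may help. It assumes that the "average joint log-likelihood" is the expectation of the joint log-density under weights $\gamma_{ik} \ge 0$ with $\sum_k \gamma_{ik} = 1$ (for instance the posterior probabilities of the components); this reading is an assumption, not something stated in the code. For each point $x_i$, Jensen's inequality and the non-negativity of the entropy of $(\gamma_{ik})_k$ give

$$
    \log \sum_{k} \pi_k \, \mathcal N(x_i; \mu_k, \Sigma_k)
    \ge \sum_{k} \gamma_{ik} \log \frac{\pi_k \, \mathcal N(x_i; \mu_k, \Sigma_k)}{\gamma_{ik}}
    = \sum_{k} \gamma_{ik} \log \big( \pi_k \, \mathcal N(x_i; \mu_k, \Sigma_k) \big) - \sum_{k} \gamma_{ik} \log \gamma_{ik}
    \ge \sum_{k} \gamma_{ik} \log \big( \pi_k \, \mathcal N(x_i; \mu_k, \Sigma_k) \big).
$$

Summing over $i$ compares the marginal log-likelihood (left-hand side) with the average joint log-likelihood (right-hand side).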

>Using the [Gaussian mixture](http://scikit-learn.org/stable/modules/generated/sklearn.mixture.GaussianMixture.html#sklearn.mixture.GaussianMixture) class of scikit-learn, estimate the parameters of a 3-component Gaussian mixture.
Print the prior probabilities and the maximal value of the log-likelihood.
Plot the training dataset along with the estimated means and covariance matrices.

>Are the results consistent with your own implementation?

In [None]:
# Answer
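
As a hint, the sketch below shows where scikit-learn stores the fitted parameters. The data are a stand-in generated with NumPy (an assumption, so that the block runs on its own); in the session, `X` would be the sample drawn earlier. Plotting is left out since it relies on the `mllab` helpers.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Stand-in data (assumption): three unit-covariance blobs, as in the first exercise
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [5.0, 0.0], [2.0, -5.0]])
X = centers[rng.choice(3, size=200, p=[0.33, 0.33, 0.34])] + rng.standard_normal((200, 2))

gmm = GaussianMixture(n_components=3, random_state=0).fit(X)

print("Prior probabilities:", gmm.weights_)
# score() returns the average log-likelihood per sample, so it is multiplied
# by the sample size to get the log-likelihood of the whole training set
print("Log-likelihood:", gmm.score(X) * X.shape[0])
print("Means:\n", gmm.means_)
print("Covariances shape:", gmm.covariances_.shape)  # one 2x2 matrix per component
```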

>Repeat the estimation several (let us say 9) times.
Are the results stable?

In [None]:
# Answer

>What happens if the initial parameters are set at random (look for the relevant parameter of [Gaussian mixture](http://scikit-learn.org/stable/modules/generated/sklearn.mixture.GaussianMixture.html#sklearn.mixture.GaussianMixture))?

In [None]:
# Answer
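
A possible starting point is sketched below, assuming that the parameter in question is `init_params` (with value `"random"`) and that runs are distinguished by their `random_state`. The data are again a NumPy stand-in, not the session's sample. The final lower bound on the log-likelihood, stored in `lower_bound_`, gives a quick way to compare runs.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Stand-in data (assumption): three unit-covariance blobs
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [5.0, 0.0], [2.0, -5.0]])
X = centers[rng.choice(3, size=200, p=[0.33, 0.33, 0.34])] + rng.standard_normal((200, 2))

bounds = []
for seed in range(9):
    # Random initialization instead of the default k-means based one
    gmm = GaussianMixture(n_components=3, init_params="random", random_state=seed).fit(X)
    bounds.append(gmm.lower_bound_)
    print(f"seed={seed}  converged={gmm.converged_}  lower bound={gmm.lower_bound_:.3f}")
```

Runs that end with clearly different lower bounds have reached different local optima.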

>Complete the following script in order to:
1. sample from a Gaussian mixture;
1. fit a  [Gaussian mixture](http://scikit-learn.org/stable/modules/generated/sklearn.mixture.GaussianMixture.html#sklearn.mixture.GaussianMixture) model;
1. plot the training set, the means and the variance "contours".

>Analyze the results (there should be "unexpected" results).

In [None]:
# Answer
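from sklearn.mixture import GaussianMixture

gmm = GaussianMixture(n_components=2)

# Repeat the experiment 6 times; each figure shows the four mixtures below
for trial in range(6):
    plt.figure(figsize=(10, 3))
    for it, (weights, means, covariances) in enumerate([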
        ([0.5, 0.5], [[0, 0], [5, 0]], [(1, 1, 0), (1, 1, 0)]),
        ([0.05, 0.95], [[0, 0], [5, 0]], [(1, 1, 0), (1, 1, 0)]),
        ([0.5, 0.5], [[0, 0], [0, 0]], [(10, 1, 0), (1, 10, 0)]),
        ([0.5, 0.5], [[0, 0], [5, -5]], [(10, 1, 0), (1, 10, 0)])]):
        X = sample_gm(weights, means, [covariance(*c) for c in covariances], size=100)
        # To do

        # End to do

# k-means <a id="part2"></a>


>Given the following data, fit a [Gaussian mixture](http://scikit-learn.org/stable/modules/generated/sklearn.mixture.GaussianMixture.html#sklearn.mixture.GaussianMixture).
Display the cluster centers along with the partitioning (use the function `map_regions`).

In [10]:
(weights, means, covariances) = ([0.3, 0.2, 0.5], [[-5, -1], [5, 0], [2, -5]],
                                 [(1, 5, np.pi/3), (1, 5, np.pi/3), (5, 1, np.pi/3)])
X = sample_gm(weights, means, [covariance(*c) for c in covariances], size=200)

In [None]:
# Answer

>Do the same with [k-means](http://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html).
What is the difference?

In [None]:
# Answer

>Given the following dataset, run [k-means](http://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html) clustering several times with a random initialization (original version of k-means).
What do you observe?

In [13]:
(weights, means, covariances) = ([0.05, 0.2, 0.75], [[-5, -1], [5, 0], [2, -5]],
                                 [(1, 5, np.pi/3), (1, 5, np.pi/3), (5, 1, np.pi/3)])
X = sample_gm(weights, means, [covariance(*c) for c in covariances], size=100)

In [None]:
# Answer
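
As a hint, the original version of k-means can be obtained with `init="random"` and a single initialization (`n_init=1`). The sketch below uses stand-in data with isotropic clusters generated with NumPy (an assumption, so that the block runs without `mllab`) and compares runs through the within-cluster sum of squares `inertia_`.

```python
import numpy as np
from sklearn.cluster import KMeans

# Stand-in data (assumption): unbalanced isotropic blobs around the same centers
rng = np.random.default_rng(0)
centers = np.array([[-5.0, -1.0], [5.0, 0.0], [2.0, -5.0]])
labels = rng.choice(3, size=100, p=[0.05, 0.2, 0.75])
X = centers[labels] + rng.standard_normal((100, 2))

inertias = []
for seed in range(9):
    # One random initialization per run, no k-means++ seeding
    km = KMeans(n_clusters=3, init="random", n_init=1, random_state=seed).fit(X)
    inertias.append(km.inertia_)
    print(f"seed={seed}  inertia={km.inertia_:.1f}")
```

If the inertia changes from one seed to another, the runs have converged to different partitions.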

>Here, we compare [Gaussian mixture](http://scikit-learn.org/stable/modules/generated/sklearn.mixture.GaussianMixture.html#sklearn.mixture.GaussianMixture) and [k-means](http://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html) on non-convex clusters.
For this purpose:
1. generate [moons](http://scikit-learn.org/stable/modules/generated/sklearn.datasets.make_moons.html) (then [circles](http://scikit-learn.org/stable/modules/generated/sklearn.datasets.make_circles.html#sklearn.datasets.make_circles)) with noise set to $0.1$;
1. plot the two classes with `plotXY`;
1. display the two-cluster partitioning (`map_regions`) obtained with [Gaussian mixture](http://scikit-learn.org/stable/modules/generated/sklearn.mixture.GaussianMixture.html#sklearn.mixture.GaussianMixture) and [k-means](http://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html).

>What do you observe?

In [None]:
# Answer
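
As a complement to the plots, a numerical comparison can be sketched as follows. It scores each clustering against the true classes with the adjusted Rand index, which is my choice of criterion and not part of the exercise. The sample size, the seeds and the `factor` of the circles are also assumptions.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_circles, make_moons
from sklearn.metrics import adjusted_rand_score
from sklearn.mixture import GaussianMixture

datasets = {
    "moons": make_moons(n_samples=200, noise=0.1, random_state=0),
    # factor=0.5 (ratio between inner and outer radii) is an arbitrary choice
    "circles": make_circles(n_samples=200, noise=0.1, factor=0.5, random_state=0),
}

scores = {}
for name, (X, y) in datasets.items():
    gmm_labels = GaussianMixture(n_components=2, random_state=0).fit_predict(X)
    km_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
    # Adjusted Rand index: 1 for a perfect match with the classes, about 0 for a random one
    scores[name] = (adjusted_rand_score(y, gmm_labels), adjusted_rand_score(y, km_labels))
    print(f"{name}: GMM ARI={scores[name][0]:.2f}, k-means ARI={scores[name][1]:.2f}")
```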