--- jupytext: text_representation: extension: .md format_name: myst format_version: 0.13 jupytext_version: 1.16.2 kernelspec: display_name: python3 language: python name: python3 --- # Using constraints for optimization The `projnormal` package makes use of constraints to fit the parameters of the distributions. This tutorial explains how to make use of constraints in the `projnormal` package. ## Why use constraints? Constraints are useful for at least two reasons: to make sure that the parameters are valid and identifiable, and to aid optimization. First, the parameters of the projected normal distribution have to satisfy certain constraints to be valid. In particular, the parameter $\Sigma_x$ must be positive definite, and a constraint during learning ensures that this requirement is met. But the parameters of the projected normal distribution are also underdetermined. This is because the same distribution on the sphere is obtained by projecting the random variable $\mathbf{x}$, and by projecting any scaled version $\lambda \cdot \mathbf{x}$ for any $\lambda > 0$ (with the corresponding scaling of the parameters $\boldsymbol{\mu}_x$ and $\Sigma_x$). Thus, an additional constraint is needed to make the parameters identifiable (i.e. unique). Different identifiability constraints have been used in the literature, but the default in `projnormal` is the constrain $\|\boldsymbol{\mu}_x\| = 1$. The second reason is that adding constraints to the parameters can aid optimization. For example, given scarce data, a diagonal constrain can be added to the $\Sigma_x$, to avoid overfitting or having a rank-deficient covariance matrix. We may also use known structure of the problem, for example constraining $\Sigma_x$ or $\boldsymbol{\mu}_x$ to have a specific structure that reduces the number of free parameters. ## Setting constraints in projnormal `projnormal` implements constraints using a useful built-in feature of PyTorch called parametrizations. In short, once a learnable parameter (e.g. a vector or a matrix) is defined in a PyTorch model, we can add a parametrization to it, which will ensure that the parameter stays within the desired constraints. For further details on parametrizations, see below, and refer to the PyTorch [parametrizations tutorial](https://docs.pytorch.org/tutorials/intermediate/parametrizations.html). `projnormal` provides some built-in constraints that can be added to the available classes. The next section shows how to use these built-in constraints. ### Example: Adding a diagonal constraint to the covariance matrix Let's first initialize a `ProjNormal` model, which comes with the default constraints of the package. ```{code-cell} ipython3 import projnormal import torch N_DIM = 3 # Initialize projected normal class pn_fit = projnormal.classes.ProjNormal( n_dim=N_DIM ) print(pn_fit) ``` The model has two learnable parameters, `mean_x` and `covariance_x`, and each comes with a parametrization. The parameter `mean_x` has the `Sphere()` constraint, and the parameter `covariance_x` has the `PSD()` (positive definite) constraint. Constraints are implemented as Python classes that inherit from `torch.nn.Module` (this is what `Sphere()` and `PSD()` are). `projnormal` has some constraints available in the module `projnormal.classes.constraints`. For example, the class `Diagonal()` constraints a matrix to be diagonal with positive diagonal elements. To use this parametrization for `covariance_x`, we first need to remove the existing parametrization, and then add the new `Diagonal()` constraint. For this, the `parametrize` module of PyTorch is used. ```{code-cell} ipython3 import torch.nn.utils.parametrize as parametrize # Remove existing parametrization for covariance_x parametrize.remove_parametrizations( pn_fit, "covariance_x" ) # Add new parametrization parametrize.register_parametrization( pn_fit, "covariance_x", projnormal.classes.constraints.Diagonal() ) print(pn_fit) ``` We see that the `covariance_x` parameter now has a `Diagonal()` constraint. To verify that the constraint works, let's generate some samples from a projected normal distribution and fit the model to them. The resulting parameter `covariance_x` should be diagonal. ```{code-cell} ipython3 # True parameters mean_x = projnormal.param_sampling.make_mean(N_DIM) cov_x = projnormal.param_sampling.make_spdm(N_DIM) # Generate samples samples = projnormal.formulas.projected_normal.sample( mean_x=mean_x, covariance_x=cov_x, n_samples=1000 ) # Fit the model to the samples pn_fit.max_likelihood(samples, show_progress=False) # Check the covariance matrix print("True covariance: \n", cov_x.numpy()) print("Fitted covariance: \n", pn_fit.covariance_x.detach().numpy()) ``` As expected, the fitted covariance matrix is diagonal. :::{admonition} Parametrizations can be applied to any parameter Although in this tutorial we illustrate parametrizations with the `covariance_x` parameter in the `ProjNormal` class, the same approach can be used to define constraints for any of the parameters in any of the model classes provided by `projnormal`. ::: ### Example: Defining a custom constraint In coarse terms, parametrizations work by taking an unconstrained parameter $\eta$ and transforming it into a constrained model parameter $\theta$, via a function $f(\eta) = \theta$. Let's illustrate this with the example of the diagonal constraint. In this case, the constrained parameter $\theta$ is an $n$-by-$n$ diagonal matrix with positive diagonal elements. One possible choice of $\eta$ is an unconstrained vector of length $n$. The function $f(\eta)$ can be defined with the following two steps: 1. Map each unconstrained element of $\eta$ to a positive value using the exponential function 2. Construct a diagonal matrix with the positive values from step 1 as the diagonal elements To use such parametrization, we need to implement it as a class that inherits from `torch.nn.Module`, where the `forward` method implements the function $f(\eta)$. Optionally, we can also implement the `right_inverse` method, which maps from the constrained parameter $\theta$ back to the unconstrained parameter $\eta$. This is useful for assigning values to parameter $\theta$ in the model. Also, if `right_inverse` is not used, the parametrization assumes that the unconstrained parameter $\eta$ has the same shape as the constrained parameter $\theta$, which is not the case for the example discussed. Taking the above into account, we can implement the `MyDiagonal()` constraint as follows: ```{code-cell} ipython3 import torch.nn as nn class MyDiagonal(nn.Module): """Diagonal matrix with positive diagonal entries.""" def forward(self, eta): diagonal_pos = torch.exp(eta) theta = torch.diag(diagonal_pos) return theta def right_inverse(self, theta): diagonal_pos = torch.diagonal(theta) eta = torch.log(diagonal_pos) return eta # Initialize projected normal class pn_fit2 = projnormal.classes.ProjNormal( n_dim=N_DIM ) # Remove existing parametrization for covariance_x parametrize.remove_parametrizations( pn_fit2, "covariance_x" ) # Add our new parametrization parametrize.register_parametrization( pn_fit2, "covariance_x", MyDiagonal() ) print(pn_fit2) ``` Now the model `pn_fit2` has the `MyDiagonal()` constraint on the `covariance_x` parameter. Importantly, after registering a parametrization, our model can be used as usual. Although now `covariance_x` is parametrized in terms of an unconstrained parameter `eta`, we still only see `covariance_x` when accessing the model normally. For more information, see the [PyTorch parametrizations tutorial](https://docs.pytorch.org/tutorials/intermediate/parametrizations.html). :::{admonition} The `geotorch` package provides useful constraints For the symmetric positive definite constraint, `projnormal` uses the package `geotorch`, which implements a variety of constraints as PyTorch parametrizations. See the [geotorch documentation](https://geotorch.readthedocs.io/en/latest/) for more information. ::: ### Example: Defining advanced constraints To aid the user in defining advanced constraints, we provide a more elaborate example below to illustrate the flexibility allowed by PyTorch parametrizations. Specifically, we will define a class to constrain a matrix $\Sigma$ to be of the form $\Sigma = \lambda \cdot \mathbf{I} + \sum_i^k \beta_i \mathbf{v}_i\mathbf{v}_i^T$, where $\mathbf{I}$ is the identity matrix, $\lambda > 0$, $\mathbf{v}_i$ are a set of basis vectors, and $\beta_i$ are positive scalars that weight the rank-1 matrices $\mathbf{v}_i\mathbf{v}_i^T$. In this parametrization, the learnable parameters are $\lambda$ and the $\beta_i$, while the $\mathbf{v}_i$ are fixed. To implement this constraint, we will make both $\mathbf{I}$ and the $\mathbf{v}_i$ (non-learnable) attributes of the parametrization class. We will add an `n_basis` argument to the class, which specifies the number of basis vectors $\mathbf{v}_i$. We will also make the learnable parameter $\mathrm{log}(\lambda)$ an attribute of the class. The code below implements this constraint: ```{code-cell} ipython3 class MyParametrization(torch.nn.Module): def __init__(self, n_dim, n_basis=2): super().__init__() self.n_dim = n_dim self.n_basis = n_basis # Make basis functions v_i and store as attribute x = torch.linspace(0, 1, n_dim) basis = torch.stack( [torch.sin((i + 1) * torch.pi * x) for i in range(n_basis)], dim=0 ) self.register_buffer("basis", basis) # Make identity matrix and store as buffer self.register_buffer("Id", torch.eye(n_dim)) # Make learnable parameter log_lambda self.log_lambda = nn.Parameter(torch.tensor(0.0)) def forward(self, betas): # Compute lambda * I diag = torch.exp(self.log_lambda) * self.Id # Compute the sum of the rank-1 matrices basis_mat = torch.einsum('ki,k,kj->ij', self.basis, betas, self.basis) # Return the full matrix theta = diag + basis_mat return theta def right_inverse(self, theta): # Extract the basis coefs from the matrix betas = torch.einsum('ki,ij,kj->k', self.basis, theta, self.basis) - \ torch.exp(self.log_lambda) return betas # Initialize projected normal class pn_fit3 = projnormal.classes.ProjNormal( n_dim=6, ) # Remove existing parametrization for covariance_x parametrize.remove_parametrizations( pn_fit3, "covariance_x" ) # Add our new parametrization parametrize.register_parametrization( pn_fit3, "covariance_x", MyParametrization(n_dim=6, n_basis=2) ) print(pn_fit3) ``` Now the `pn_fit3` model has the custom parametrization described, and the parameters `log_lambda` and `betas` can be fit to data as usual.