Using constraints for optimization
The projnormal package makes use of constraints to fit the
parameters of the distributions. This tutorial explains how to make
use of constraints in the projnormal package.
Why use constraints?
Constraints are useful for at least two reasons: to make sure that the parameters are valid and identifiable, and to aid optimization.
First, the parameters of the projected normal distribution
have to satisfy certain constraints to be valid. In particular,
the parameter \(\Sigma_x\) must be positive definite,
and a constraint during learning ensures that this
requirement is met.
But the parameters of the projected normal distribution
are also underdetermined.
This is because the same distribution on the
sphere is obtained by projecting the random variable
\(\mathbf{x}\), and by projecting any scaled version
\(\lambda \cdot \mathbf{x}\) for any \(\lambda > 0\)
(with the corresponding scaling of the parameters
\(\boldsymbol{\mu}_x\) and \(\Sigma_x\)).
Thus, an additional constraint is needed to make the
parameters identifiable (i.e. unique). Different identifiability
constraints have been used in the literature, but the
default in projnormal is the constrain \(\|\boldsymbol{\mu}_x\| = 1\).
The second reason is that adding constraints to the parameters can aid optimization. For example, given scarce data, a diagonal constrain can be added to the \(\Sigma_x\), to avoid overfitting or having a rank-deficient covariance matrix. We may also use known structure of the problem, for example constraining \(\Sigma_x\) or \(\boldsymbol{\mu}_x\) to have a specific structure that reduces the number of free parameters.
Setting constraints in projnormal
projnormal implements constraints using a useful built-in
feature of PyTorch called parametrizations.
In short, once a learnable parameter (e.g. a vector or a matrix) is defined in a PyTorch model, we can add a parametrization to it, which will ensure that the parameter stays within the desired constraints. For further details on parametrizations, see below, and refer to the PyTorch parametrizations tutorial.
projnormal provides some built-in constraints that can be
added to the available classes. The next section shows
how to use these built-in constraints.
Example: Adding a diagonal constraint to the covariance matrix
Let’s first initialize a ProjNormal model, which comes with
the default constraints of the package.
import projnormal
import torch
N_DIM = 3
# Initialize projected normal class
pn_fit = projnormal.classes.ProjNormal(
n_dim=N_DIM
)
print(pn_fit)
ParametrizedProjNormal(
(parametrizations): ModuleDict(
(mean_x): ParametrizationList(
(0): Sphere()
)
(covariance_x): ParametrizationList(
(0): PSD(
(0): Stiefel(n=3, k=3, triv=linalg_matrix_exp)
(1): Rn(n=3)
)
)
)
)
The model has two learnable parameters,
mean_x and covariance_x, and each comes with
a parametrization. The parameter mean_x has the
Sphere() constraint, and the parameter covariance_x
has the PSD() (positive definite) constraint.
Constraints are implemented as Python classes
that inherit from torch.nn.Module (this is
what Sphere() and PSD() are).
projnormal has some constraints available in the
module projnormal.classes.constraints.
For example, the class Diagonal() constraints a matrix
to be diagonal with positive diagonal elements.
To use this parametrization for covariance_x, we
first need to remove the existing parametrization, and then
add the new Diagonal() constraint. For this, the
parametrize module of PyTorch is used.
import torch.nn.utils.parametrize as parametrize
# Remove existing parametrization for covariance_x
parametrize.remove_parametrizations(
pn_fit, "covariance_x"
)
# Add new parametrization
parametrize.register_parametrization(
pn_fit, "covariance_x", projnormal.classes.constraints.Diagonal()
)
print(pn_fit)
ParametrizedProjNormal(
(parametrizations): ModuleDict(
(mean_x): ParametrizationList(
(0): Sphere()
)
(covariance_x): ParametrizationList(
(0): Diagonal()
)
)
)
We see that the covariance_x parameter now has a Diagonal() constraint.
To verify that the constraint works, let’s generate some samples
from a projected normal distribution and fit the model to them.
The resulting parameter covariance_x should be diagonal.
# True parameters
mean_x = projnormal.param_sampling.make_mean(N_DIM)
cov_x = projnormal.param_sampling.make_spdm(N_DIM)
# Generate samples
samples = projnormal.formulas.projected_normal.sample(
mean_x=mean_x,
covariance_x=cov_x,
n_samples=1000
)
# Fit the model to the samples
pn_fit.max_likelihood(samples, show_progress=False)
# Check the covariance matrix
print("True covariance: \n", cov_x.numpy())
print("Fitted covariance: \n", pn_fit.covariance_x.detach().numpy())
True covariance:
[[1.0210476 0.04168019 0.724831 ]
[0.0416802 0.49508908 0.05529149]
[0.724831 0.05529149 1.4838649 ]]
Fitted covariance:
[[1.0247469 0. 0. ]
[0. 0.4576905 0. ]
[0. 0. 1.3491126]]
As expected, the fitted covariance matrix is diagonal.
Parametrizations can be applied to any parameter
Although in this tutorial we illustrate parametrizations
with the covariance_x parameter in the ProjNormal class,
the same approach can be used to define constraints for
any of the parameters in any of the model classes
provided by projnormal.
Example: Defining a custom constraint
In coarse terms, parametrizations work by taking an unconstrained parameter \(\eta\) and transforming it into a constrained model parameter \(\theta\), via a function \(f(\eta) = \theta\). Let’s illustrate this with the example of the diagonal constraint.
In this case, the constrained parameter \(\theta\) is an \(n\)-by-\(n\) diagonal matrix with positive diagonal elements. One possible choice of \(\eta\) is an unconstrained vector of length \(n\). The function \(f(\eta)\) can be defined with the following two steps:
Map each unconstrained element of \(\eta\) to a positive value using the exponential function
Construct a diagonal matrix with the positive values from step 1 as the diagonal elements
To use such parametrization, we need to implement it
as a class that inherits from torch.nn.Module, where
the forward method implements the function \(f(\eta)\).
Optionally, we can also implement the right_inverse method,
which maps from the constrained parameter \(\theta\) back to the
unconstrained parameter \(\eta\).
This is useful for assigning values to parameter \(\theta\) in the
model. Also, if right_inverse is not used, the
parametrization assumes that the unconstrained parameter \(\eta\)
has the same shape as the constrained parameter \(\theta\),
which is not the case for the example discussed.
Taking the above into account, we can implement the
MyDiagonal() constraint as follows:
import torch.nn as nn
class MyDiagonal(nn.Module):
"""Diagonal matrix with positive diagonal entries."""
def forward(self, eta):
diagonal_pos = torch.exp(eta)
theta = torch.diag(diagonal_pos)
return theta
def right_inverse(self, theta):
diagonal_pos = torch.diagonal(theta)
eta = torch.log(diagonal_pos)
return eta
# Initialize projected normal class
pn_fit2 = projnormal.classes.ProjNormal(
n_dim=N_DIM
)
# Remove existing parametrization for covariance_x
parametrize.remove_parametrizations(
pn_fit2, "covariance_x"
)
# Add our new parametrization
parametrize.register_parametrization(
pn_fit2, "covariance_x", MyDiagonal()
)
print(pn_fit2)
ParametrizedProjNormal(
(parametrizations): ModuleDict(
(mean_x): ParametrizationList(
(0): Sphere()
)
(covariance_x): ParametrizationList(
(0): MyDiagonal()
)
)
)
Now the model pn_fit2 has the MyDiagonal() constraint
on the covariance_x parameter.
Importantly, after registering a parametrization,
our model can be used as usual. Although now
covariance_x is parametrized in terms of
an unconstrained parameter eta, we still only
see covariance_x when accessing the model
normally. For more information, see the
PyTorch parametrizations tutorial.
The geotorch package provides useful constraints
For the symmetric positive definite constraint, projnormal
uses the package geotorch, which implements a variety
of constraints as PyTorch parametrizations.
See the geotorch documentation
for more information.
Example: Defining advanced constraints
To aid the user in defining advanced constraints, we provide a more elaborate example below to illustrate the flexibility allowed by PyTorch parametrizations.
Specifically, we will define a class to constrain a matrix \(\Sigma\) to be of the form \(\Sigma = \lambda \cdot \mathbf{I} + \sum_i^k \beta_i \mathbf{v}_i\mathbf{v}_i^T\), where \(\mathbf{I}\) is the identity matrix, \(\lambda > 0\), \(\mathbf{v}_i\) are a set of basis vectors, and \(\beta_i\) are positive scalars that weight the rank-1 matrices \(\mathbf{v}_i\mathbf{v}_i^T\). In this parametrization, the learnable parameters are \(\lambda\) and the \(\beta_i\), while the \(\mathbf{v}_i\) are fixed.
To implement this constraint, we will make both
\(\mathbf{I}\) and the \(\mathbf{v}_i\) (non-learnable)
attributes of the parametrization class.
We will add an n_basis argument to the class,
which specifies the number of basis vectors \(\mathbf{v}_i\).
We will also make the learnable parameter
\(\mathrm{log}(\lambda)\) an attribute of the class.
The code below implements this constraint:
class MyParametrization(torch.nn.Module):
def __init__(self, n_dim, n_basis=2):
super().__init__()
self.n_dim = n_dim
self.n_basis = n_basis
# Make basis functions v_i and store as attribute
x = torch.linspace(0, 1, n_dim)
basis = torch.stack(
[torch.sin((i + 1) * torch.pi * x) for i in range(n_basis)], dim=0
)
self.register_buffer("basis", basis)
# Make identity matrix and store as buffer
self.register_buffer("Id", torch.eye(n_dim))
# Make learnable parameter log_lambda
self.log_lambda = nn.Parameter(torch.tensor(0.0))
def forward(self, betas):
# Compute lambda * I
diag = torch.exp(self.log_lambda) * self.Id
# Compute the sum of the rank-1 matrices
basis_mat = torch.einsum('ki,k,kj->ij', self.basis, betas, self.basis)
# Return the full matrix
theta = diag + basis_mat
return theta
def right_inverse(self, theta):
# Extract the basis coefs from the matrix
betas = torch.einsum('ki,ij,kj->k', self.basis, theta, self.basis) - \
torch.exp(self.log_lambda)
return betas
# Initialize projected normal class
pn_fit3 = projnormal.classes.ProjNormal(
n_dim=6,
)
# Remove existing parametrization for covariance_x
parametrize.remove_parametrizations(
pn_fit3, "covariance_x"
)
# Add our new parametrization
parametrize.register_parametrization(
pn_fit3, "covariance_x", MyParametrization(n_dim=6, n_basis=2)
)
print(pn_fit3)
ParametrizedProjNormal(
(parametrizations): ModuleDict(
(mean_x): ParametrizationList(
(0): Sphere()
)
(covariance_x): ParametrizationList(
(0): MyParametrization()
)
)
)
Now the pn_fit3 model has the custom parametrization described,
and the parameters log_lambda and betas
can be fit to data as usual.