Gaussian coupling is Gaussian?

If \(X,Y\) are Gaussian random variables. Is \((X,Y)\) Gaussian?

If you google it, there are many counter-examples proving why this is NOT true in general. However, I find much less discussion on whether this could be true under some condition. So I decide to share my thoughts and also a proof from a optimal transport point of view.

Counter example

We start with a well-known counter-example:

Let \(X\) be a standard Gaussian, \(W\) be coin-flip i.e. a Bernoulli random variable taking values 1 and −1 with equal probability. If we define \(Y = W X\), then \(Y\) is still standard Gaussian by symmetric argument. However \(X+Y\) is not even Gaussian because \(X+Y = 0\) holds with probability \(1/2\).

Another observation is that \(\mathbb{E}[XY] = 0\). So this is also an example when uncorrelated but dependent. Anyhow we have a counter example.

Another simple example

Let \(X,Y\) standard Gaussian with correlation \(1\) i.e. \(\mathbb{E}[XY] = 1\). Then \(\mathbb{E}[(X-Y)^2] = 0\), so \(X=Y\). Therefore \((X,Y)\) is Gaussian!

It is not difficult to generalize this to multi-dimension. Let \(X,Y\) be standard Gaussian on \(\mathbb{R}^d\) and assume \((X,Y)\) has covariance matrix: \(\begin{bmatrix}I&P\\ P^\top&I \end{bmatrix}.\) If \(P\) is orthogonal ¹, i.e. \(PP^\top = I\), then similarly we get

\[\mathbb{E}[(X-PY)^2] = \mathrm{tr}\Big(\begin{bmatrix}I & -P\end{bmatrix}\begin{bmatrix}I&P\\ P^\top&I \end{bmatrix}\begin{bmatrix}I \\ -P^\top\end{bmatrix}\Big) = 0.\]

So \(Y = P^\top X\) and \((X,Y)\) is Gaussian.

Source of randomness

The source of randomness is the key. The covariance structure is able to restrict the source of randomness. If the correlation is so high, there is no room for external randomness to enter. Then the only source of randomness is the marginal, which in our case a Gaussian randomness. As a result, the whole distribution is a linear transformation of the marginal, hence Gaussian.

Our simple example assume marginal to be standard Gaussian, never the less, the proof generalize to general non-degenerate Gaussian marginals, by transforming marginals back to standard Gaussian.

Non-degenerate marginals

We consider \(X \sim \mathcal{N}(a,A)\) and \(Y \sim \mathcal{N}(a,B)\) where \(A,B\) are positive semidefinite matrix. We represent the covariance matrix of \((X,Y)\) by \(\begin{bmatrix}A&C\\ C^\top&B \end{bmatrix}.\) Then by using the Cholesky decomposition \(A = LL^\top\) and \(B= MM^\top\), we transform \((X,Y)\) to \((L^{-1}X, M^{-1}Y)\) and simplify the problem to our simple example before:

\[\mathrm{Cov}\Big(\begin{bmatrix}L^{-1}X\\ M^{-1}Y\end{bmatrix}\Big) = \begin{bmatrix}I&L^{-1}C(M^{-1})^{\top}\\ M^{-1}C^\top (L^{-1})^{\top}&I \end{bmatrix}.\]

Therefore, if \(C = LPM^{\top}\) where \(P\) is orthogonal, \((X,Y)\) is Gaussian.

Theorem 1

Let \(X \sim \mathcal{N}(a,A)\) and \(Y \sim \mathcal{N}(a,B)\) non-degenerate Gaussians. If \((X,Y)\) has covariance matrix of form: \(\begin{bmatrix}A&C\\ C^\top&B \end{bmatrix}.\) where \(C = LPM^{\top}\) and \(P\) is orthogonal, then \((X,Y)\) is Gaussian

Optimal transport point of view

You might think how this related to optimal transport. If you have a hammer everything looks like a nail. Optimal transport is a huge topic with deep connections to many fields of mathematics. Here we only need the basic form:

\[\min_{X\sim \mu, Y\sim \nu}\mathbb{E}[|X-Y|^2].\]

Intuitively, it is finding the best way to couple \(X,Y\) given \(X\sim \mu\) and \(Y\sim \nu\). One of the most celebrated result in optimal transport is the Brenier theorem.

Brenier’s theorem

Suppose \(\mu,\nu\) have second moments and \(\mu\) is absolutely continuous with respect to the Lebesgue measure. Then there exists a unique optimal coupling.

Actually Brenier’s theorem is much more than this, but this is already enough for us now. In our setting, \(\mu\) and \(\nu\) are non-degenerate Gaussian, so the setting of Brenier’s theorem is fulfilled. By the existence part of Brenier’s theorem, there exists \((X,Y)\) which obtains the minimum. Again we denote the covariance matrix of \((X,Y)\) by \(\begin{bmatrix}A&C\\ C^\top&B \end{bmatrix}.\)

Notice that the cost functional \(\mathbb{E}[|X-Y|^2]\) only depends on the mean and covariance of \((X,Y)\). So any \((X,Y)\) with the right mean and covariance would obtain the minimum. Of course, a Gaussian coupling with mean \(\begin{bmatrix}a \\ b \end{bmatrix}\) and covariance \(\begin{bmatrix}A&C\\ C^\top&B \end{bmatrix}\) is an optimal coupling.

By the uniqueness part of Brenier’s theorem, there is only one optimal coupling. So there is a unique distribution of \((X,Y)\) with with mean \(\begin{bmatrix}a \\ b \end{bmatrix}\) and covariance \(\begin{bmatrix}A&C\\ C^\top&B \end{bmatrix}\) subjected to \(X\sim\mu\) and \(Y\sim\nu\), which is the Gaussian one above.

From the discussion, we see if the covariance matrix of the coupling is a optimal coupling, then the coupling is Gaussian. In general, one may consider optimal transport problem with more general quadratic cost:

\[\min_{X\sim \mu, Y\sim \nu}\mathbb{E}[|X-\Lambda Y|^2].\]

If \(\Lambda\) is invertible, Brenier’s theorm still applies by a change of variable. Therefore, similarly one can prove Theorem 1 from an optimal transport approach, ².

Further discussion

In this blog, we are in the setting where marginal are non-degenerate. It would also be interesting to see how this generalizes when marginals are degenerate.

References

\(P\) is a contraction matrix if and only if \(\begin{bmatrix}I&P\\ P^\top&I \end{bmatrix}\) is positive semidefinite. ↩
One can also prove through the closed form Wasserstein distance between two Gaussians ↩