PCA is typically used to describe the distribution of given data. Given a training data matrix $X$ of size $M \times N$, whose $N$ columns are observations of $M$ variables, the covariance matrix can be approximated as

$$ C=\frac{1}{N-1}(X-\mu)(X-\mu)^{\top} $$where $ \mu $ is the column-wise mean of $X$, subtracted from every observation. We perform an eigen-decomposition of the matrix $C$ and keep the eigenvectors with the largest eigenvalues. For each eigenpair we have

$$ C v = \lambda v $$where the eigenvectors $v$ are the so-called `principal components`,
and the corresponding eigenvalues $ \lambda $ are the `latent variables`.
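As a quick sanity check, the relation above can be verified directly on a small covariance matrix (the numbers here are arbitrary toy values, not from the text):

```
% Toy 2-D covariance matrix (arbitrary example values)
C = [2 1; 1 2];
[v,lambda] = eig(C);      % columns of v: eigenvectors; diag(lambda): eigenvalues
% C*v should equal v*lambda up to floating-point error
norm(C*v - v*lambda)      % close to 0
```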

The following are two equivalent MATLAB implementations of the algorithm.

```
function [signals,PC,V,mu] = pca(X)
% PCA - perform principal component analysis on M-by-N matrix X,
% whose N columns are observations of M variables.
%   signals - data projected onto the principal components
%   PC      - principal components, one per column
%   V       - corresponding eigenvalues (latent variables)
%   mu      - column-wise mean of X
[M,N] = size(X);
mu = mean(X,2);
X_c = X - repmat(mu,[1 N]);    % subtract the mean from each observation
covar = 1/(N-1)*(X_c*X_c');    % sample covariance matrix
[PC,V] = eig(covar);           % eigen-decomposition
V = diag(V);                   % extract eigenvalues from the diagonal
[~,idx] = sort(V,'descend');   % order by decreasing eigenvalue
V = V(idx);
PC = PC(:,idx);
signals = PC'*X_c;             % project the centered data
```
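A hypothetical call on random data might look like the following; the returned `PC` matrix should be orthonormal, and `V` sorted in descending order:

```
X = randn(3,100);               % 3 variables, 100 observations
[signals,PC,V,mu] = pca(X);
disp(PC'*PC);                   % close to the identity: eigenvectors are orthonormal
disp(V');                       % eigenvalues in descending order
```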

Alternatively, the same result can be obtained with the SVD:

```
function [signals,PC,V,mu] = pca(X)
% PCA - perform principal component analysis on M-by-N matrix X
% via the singular value decomposition.
[M,N] = size(X);
mu = mean(X,2);
X_c = X - repmat(mu,[1 N]);    % subtract the mean from each observation
Y = X_c'/sqrt(N-1);            % Y'*Y equals the covariance matrix of X
[~,S,PC] = svd(Y);             % right singular vectors of Y are the PCs
S = diag(S);                   % singular values, already in descending order
V = S.*S;                      % eigenvalues of the covariance matrix
signals = PC'*X_c;             % project the centered data
```
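Since eigenvectors are only determined up to sign, the two implementations should agree up to a possible sign flip per component. A sketch of such a check, assuming the two functions above are saved under the hypothetical names `pca_eig.m` and `pca_svd.m`:

```
X = randn(4,200);
[s1,PC1,V1] = pca_eig(X);      % eig-based version
[s2,PC2,V2] = pca_svd(X);      % SVD-based version
disp(norm(V1 - V2));           % eigenvalues match: close to 0
disp(abs(PC1'*PC2));           % close to the identity, up to sign flips
```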

With the principal components, we can reduce the dimensionality of the original data by projecting the mean-subtracted data onto the leading eigenvectors.
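For example, keeping only the top `k` components maps each M-dimensional observation to `k` dimensions (`k` is a free parameter here, not fixed by the text):

```
X = randn(5,50);                    % example data: 5 variables, 50 observations
k = 2;                              % number of components to keep
[signals,PC,V,mu] = pca(X);
X_c = X - repmat(mu,[1 size(X,2)]);
reduced = PC(:,1:k)'*X_c;           % k-by-N low-dimensional representation
approx  = PC(:,1:k)*reduced + repmat(mu,[1 size(X,2)]);  % reconstruction in the original space
```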
