Principal Components Analysis in Excel VBA

March 9, 2016 Rave Admin

by Nick V. Flor • March 9, 2016 • @ProfessorF

I love Excel as an Exploratory Data Analysis tool. I’ve written macros to scrape data off of Twitter (like the millions of tweets for every #GOPDebate), and then to analyze that data in various ways—both quantitatively and qualitatively, both with & without hypotheses!

I have to say, it’s both astonishing and mind-boggling what you find when you let the data speak for itself. In later posts, I’ll discuss some of my findings.

But for now I’m on a mission to get citizens to level-up their “data analysis” skills—it’s the only way to prevent being fooled by politicians and charlatans.

Anyway, one particularly useful statistical procedure for analyzing large amounts of data is Principal Components Analysis (“PCA”). It is both a way of reducing the complexity (dimensionality) of your data, and finding structure in your data.

Of course Excel doesn’t implement PCA, and the math on the internet is cryptic. So it’s not clear how to implement the necessary functions. Here’s one way I do PCA, using Gram-Schmidt QR decomposition.

Note: fmMult is my own matrix multiplication function, you can probably get away with Excel’s mmult. I also reindex the matrices from 0..N-1. This makes it easier for me to convert the code to C# or JavaScript.

'
' QR Decomposition Function using Gram-Schmidt
' Adapted from pseudo-code in http://www.math.umn.edu/~olver/ln_/qr.pdf
'
Function GramSchmidt(ByRef A, ByRef R)
Dim mRows, nCols
Dim i, j, k

getNumRowsCols A, mRows, nCols
R = createMatrix(mRows, nCols)
For j = 0 To (nCols - 1)
    R(j, j) = 0
    For i = 0 To (mRows - 1)
        R(j, j) = R(j, j) + A(i, j) ^ 2
    Next
    R(j, j) = Sqr(R(j, j))
    For i = 0 To (mRows - 1)
        A(i, j) = A(i, j) / R(j, j)
    Next
    For k = (j + 1) To (nCols - 1)
        R(j, k) = 0
        For i = 0 To (mRows - 1)
            R(j, k) = R(j, k) + A(i, j) * A(i, k)
        Next
        For i = 0 To (mRows - 1)
            A(i, k) = A(i, k) - A(i, j) * R(j, k)
        Next
    Next
Next
End Function

Function pfGramSchmidt(AR As Range, Optional loops As Integer = -1)
Dim A, Q, R, i, j, E

If loops < 0 Then loops = 0 ' this is really one iteration

A = reindexRange(AR)
getNumRowsCols A, mRows, nCols
R = createIdentity(mRows, nCols)
E = createIdentity(mRows, nCols)

For i = 0 To loops    
    A = fmMult(R, A) ' A=R*Q
    GramSchmidt A, R ' A is now Q, Diagonals are Eigenvalues
    E = fmMult(E, A) ' E: Eigenvectors
Next
'
' Return a Range with Q, R, and Eigenvalues 
'
ReDim res(mRows - 1, (3 * nCols) - 1)
For i = 0 To mRows - 1
    For j = 0 To nCols - 1
        res(i, j) = A(i, j)
        res(i, nCols + j) = R(i, j)
        res(i, 2 * nCols + j) = E(i, j)
    Next
Next
pfGramSchmidt = res
End Function

If you have any questions about how to use this, or to implement it yourself, I’m always happy to help.

Good luck.

Analytics, Code