Principal Components Analysis in Excel VBA

by Nick V. Flor • March 9, 2016 • @ProfessorF

I love Excel as an Exploratory Data Analysis tool. I’ve written macros to scrape data off of Twitter (like the millions of tweets for every #GOPDebate), and then to analyze that data in various ways—both quantitatively and qualitatively, both with & without hypotheses!

I have to say, it’s both astonishing and mind-boggling what you find when you let the data speak for itself.  In later posts, I’ll discuss some of my findings.

But for now I’m on a mission to get citizens to level-up their “data analysis” skills—it’s the only way to prevent being fooled by politicians and charlatans.

Anyway, one particularly useful statistical procedure for analyzing large amounts of data is Principal Components Analysis (“PCA”).  It is both a way of reducing the complexity (dimensionality) of your data, and finding structure in your data.

Of course Excel doesn’t implement PCA, and the math on the internet is cryptic. So it’s not clear how to implement the necessary functions. Here’s one way I do PCA, using  Gram-Schmidt QR decomposition.

Note: fmMult is my own matrix multiplication function, you can probably get away with Excel’s mmult. I also reindex the matrices from 0..N-1. This makes it easier for me to convert the code to C# or JavaScript.

 

'
' QR Decomposition Function using Gram-Schmidt
' Adapted from pseudo-code in http://www.math.umn.edu/~olver/ln_/qr.pdf
'
Function GramSchmidt(ByRef A, ByRef R)
Dim mRows, nCols
Dim i, j, k

getNumRowsCols A, mRows, nCols
R = createMatrix(mRows, nCols)
For j = 0 To (nCols - 1)
    R(j, j) = 0
    For i = 0 To (mRows - 1)
        R(j, j) = R(j, j) + A(i, j) ^ 2
    Next
    R(j, j) = Sqr(R(j, j))
    For i = 0 To (mRows - 1)
        A(i, j) = A(i, j) / R(j, j)
    Next
    For k = (j + 1) To (nCols - 1)
        R(j, k) = 0
        For i = 0 To (mRows - 1)
            R(j, k) = R(j, k) + A(i, j) * A(i, k)
        Next
        For i = 0 To (mRows - 1)
            A(i, k) = A(i, k) - A(i, j) * R(j, k)
        Next
    Next
Next
End Function

Function pfGramSchmidt(AR As Range, Optional loops As Integer = -1)
Dim A, Q, R, i, j, E

If loops < 0 Then loops = 0 ' this is really one iteration

A = reindexRange(AR)
getNumRowsCols A, mRows, nCols
R = createIdentity(mRows, nCols)
E = createIdentity(mRows, nCols)

For i = 0 To loops    
    A = fmMult(R, A) ' A=R*Q
    GramSchmidt A, R ' A is now Q, Diagonals are Eigenvalues
    E = fmMult(E, A) ' E: Eigenvectors
Next
'
' Return a Range with Q, R, and Eigenvalues 
'
ReDim res(mRows - 1, (3 * nCols) - 1)
For i = 0 To mRows - 1
    For j = 0 To nCols - 1
        res(i, j) = A(i, j)
        res(i, nCols + j) = R(i, j)
        res(i, 2 * nCols + j) = E(i, j)
    Next
Next
pfGramSchmidt = res
End Function

 

If you have any questions about how to use this, or to implement it yourself, I’m always happy to help.

Good luck.

cropped-ProfessorFLogoFreedomSquare.png

Cronbach’s Alpha in Excel VBA

by Nick V. Flor • March 8, 2016 • @ProfessorF

Here’s the bulk of my implementation of Cronbach’s Alpha in Excel VBA. If you’re having problems with the formula on the internet, hopefully this piece of code explains how to translate the math into code. This is the formula:

CronbachsAlpha

Note: I use 0-based matrices (0..N-1), vs Excel matrices which go from 1..N. This makes it easier for me to convert the code to JavaScript or CE.  Also, there’s a pre-processing function I use (not shown) that converts an Excel Selection range into a 0-based matrix, which is the Y parameter and corresponds to your columns of data that are part of your index X. Finally I use many utility functions like createMatrix, which should be self-explanatory.

Function fCron(Y)
Dim mRows, nCols, i, j
Dim X, V, sumVY, VX

    getNumRowsCols Y, mRows, nCols
    
    ' Calculate X vector=SUM Y-Rows
    X = createMatrix(mRows, 1)
    For i = 0 To (mRows - 1)
        For j = 0 To (nCols - 1)
            X(i, 0) = X(i, 0) + Y(i, j)
        Next
    Next
    VX = fVar(X, 0) ' Variance of X
    
    ' Calculate Variances of Ys
    VY = createMatrix(1, nCols)
    sumVY = 0
    For j = 0 To (nCols - 1)
        VY(0, j) = fVar(Y, j) ' for debugging
        sumVY = sumVY + VY(0, j) ' Sum of Y Variances
    Next
    
    k = nCols
    fCron = (k / (k - 1)) * (1 - (sumVY / VX))
End Function

If you need help implementing any of these functions or understanding the code, I’m happy to answer your questions.

In my opinion, Excel cannot be beat for Exploratory Data Analysis of social media postings. It’s just missing a couple of key functions like factor extractions, factor rotation, reliability indicators, clustering, sentiment analysis, and social networking graphs.

Good luck!

cropped-ProfessorFLogoFreedomSquare.png

Varimax Rotation in Excel VBA

by Nick V. Flor • March 7, 2016 • @ProfessorF

Here’s my implementation of the Varimax Rotation algorithm — written in Excel VBA — based on IBM’s SPSS algorithm, which in turn is based on Harman (1976). The main difference is that I use a counter to control iteration. Note: I use custom functions, but the algorithm follows IBM’s math, so the functions should be self-explanatory.

Function fVarimax(F, loops)
    Dim H, Hinv, i, j, k, Lambda, LL, mRows, nCols, Omega
    Dim u, v, A, B, C, D, X, Y, P, rot
    Hinv = calcHinv(F)
    H = calcH(F)
    Lambda = fmMult(Hinv, F)

    getNumRowsCols Lambda, mRows, nCols
    For Z = 1 To loops 
        For i = 0 To (nCols - 2)
            For j = (i + 1) To (nCols - 1)
                u = calcU(Lambda, i, j)
                v = calcV(Lambda, i, j)
                A = 0
                B = 0
                C = 0
                D = 0
                For k = 0 To (mRows - 1)
                    A = A + u(k, 0)
                    B = B + v(k, 0)
                    C = C + (u(k, 0) ^ 2 - v(k, 0) ^ 2)
                    D = D + (2 * u(k, 0) * v(k, 0))
               Next
               X = D - (2 * A * B) / mRows
               Y = C - (A ^ 2 - B ^ 2) / mRows
               P = 0.25 * Atn(X / Y)
               rot = createRot(P)
               LL = rotateFactors(Lambda, i, j, rot)
               Lambda = replaceFactors(Lambda, LL, i, j)
            Next
        Next
    Next
    Omega = fmMult(H, Lambda)
    fVarimax = Omega
End Function
Harman, H. H. (1976). Modern factor analysis. University of Chicago Press.
IBM Corp (2013). IBM SPSS Statistics 23 Documentation. Armonk, NY: IBM Corp.  Retrieved March 6, 2016.
cropped-ProfessorFLogoFreedomSquare.png