Psychologists developed Multiple Factor Analysis to decompose multivariate data into a small number of interpretable factors without any a priori knowledge about those factors [Thurstone, 1935]. In this form of factor analysis, the Varimax factor rotation redraws the axes through the multidimensional factors to make them sparse and thus make them more interpretable [Kaiser, 1958].
Charles Spearman and many others objected to factor rotations because the factors seem to be rotationally invariant [Thurstone, 1947; Anderson and Rubin, 1956]. These objections are still reported in all contemporary multivariate statistics textbooks. However, this vintage form of factor analysis has survived and is widely popular because, empirically, the factor rotation often makes the factors easier to interpret. We argue that the rotation makes the factors easier to interpret because, in fact, the Varimax factor rotation performs statistical inference.
We show that Principal Components Analysis (PCA) with the Varimax axes provides a unified spectral estimation strategy for a broad class of semi-parametric factor models, including the Stochastic Blockmodel and a natural variation of Latent Dirichlet Allocation (i.e., “topic modeling”). In addition, we show that Thurstone’s widely employed sparsity diagnostics implicitly assess a key leptokurtic condition that makes the axes statistically identifiable in these models. Taken together, these results show that the know-how of Vintage Factor Analysis performs statistical inference, reversing nearly a century of statistical thinking on the topic. We illustrate the use of these techniques on two large bibliometric examples (a citation network and a text corpus). With a sparse eigensolver, PCA with Varimax is both fast and stable. Combined with Thurstone’s straightforward diagnostics, this vintage approach is suitable for a wide array of modern applications.
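The two-step recipe named above (PCA, then a Varimax rotation of the leading components) can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `varimax`, `varimax_criterion`, and `pca_varimax` are hypothetical, the data are synthetic, the rotation uses the standard SVD-based iteration for Kaiser's criterion, and a dense SVD stands in for the sparse eigensolver mentioned in the text.

```python
import numpy as np

def varimax_criterion(L):
    # Kaiser's criterion: sum over columns of the variance of squared loadings.
    return np.var(L**2, axis=0).sum()

def varimax(loadings, max_iter=200, tol=1e-10):
    """Return (rotated, R) with rotated = loadings @ R and R orthogonal."""
    n, k = loadings.shape
    R = np.eye(k)
    prev = 0.0
    for _ in range(max_iter):
        L = loadings @ R
        # Proportional to the gradient of the Varimax criterion in R.
        grad = loadings.T @ (L**3 - L * (L**2).mean(axis=0))
        u, s, vt = np.linalg.svd(grad)
        R = u @ vt  # closest orthogonal matrix to the gradient direction
        if prev > 0 and s.sum() < prev * (1 + tol):
            break
        prev = s.sum()
    return loadings @ R, R

def pca_varimax(A, k):
    """Sketch: centre columns, take the top-k left singular vectors, rotate."""
    n = A.shape[0]
    A_centered = A - A.mean(axis=0)
    U, _, _ = np.linalg.svd(A_centered, full_matrices=False)
    rotated, R = varimax(U[:, :k])
    return np.sqrt(n) * rotated, R

# Synthetic example (assumed data): sparse, skewed latent factors Z
# observed through a random mixing matrix B plus a little noise.
rng = np.random.default_rng(0)
n, d, k = 500, 20, 3
Z = rng.exponential(size=(n, k)) * (rng.random((n, k)) < 0.3)
B = rng.normal(size=(k, d))
A = Z @ B + 0.1 * rng.normal(size=(n, d))

Z_hat, R = pca_varimax(A, k)
```

In this sketch, `Z_hat` holds the rotated factor estimates and `R` the orthogonal rotation. Comparing `varimax_criterion` before and after rotation gives a rough numerical check that the rotated axes are sparser, in the spirit of the diagnostics discussed above.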
Followed by the full Discussion meeting
Karl Rohe and Muzhe Zeng, University of Wisconsin-Madison, USA