Identifiable factor analysis for mixed continuous and binary variables based on the Gaussian-Grassmann distribution
Authors
Takashi Arai
Abstract
We develop a factor analysis for mixed continuous and binary observed variables. To this end, we utilize a recently developed multivariate probability distribution for mixed-type random variables, the Gaussian-Grassmann distribution. In the proposed factor analysis, marginalization over latent variables can be performed analytically, yielding an analytical expression for the distribution of the observed variables. This analytical tractability allows model parameters to be estimated using standard gradient-based optimization techniques. We also address improper solutions associated with maximum likelihood factor analysis. We propose to avoid improper solutions by constraining the row vectors of the factor loading matrix to have the same norm for all features. We then prove that the proposed factor analysis is identifiable under this norm constraint. We demonstrate the validity of the norm constraint and numerically verify the model's identifiability using both real and synthetic datasets. We also compare the proposed model with the quantification method and find that the proposed model reproduces correlations better than the quantification method.
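
As an illustration of the norm constraint on the factor loading matrix, the following is a minimal sketch that rescales every row of a loading matrix to a common Euclidean norm. The function name `rescale_rows_to_common_norm`, the common norm value `c`, and the idea of enforcing the constraint by rescaling are assumptions made for illustration; the abstract does not specify how the constraint is imposed during optimization.

```python
import numpy as np


def rescale_rows_to_common_norm(W, c=1.0, eps=1e-12):
    """Rescale each row of W so that its Euclidean norm equals c.

    W   : (n_features, n_factors) factor loading matrix (hypothetical layout)
    c   : common row norm (hypothetical value, not taken from the paper)
    eps : lower bound on the row norm to avoid division by zero
    """
    W = np.asarray(W, dtype=float)
    norms = np.linalg.norm(W, axis=1, keepdims=True)
    return c * W / np.maximum(norms, eps)


# Toy loading matrix: 5 observed features, 2 latent factors.
rng = np.random.default_rng(0)
W = rng.normal(size=(5, 2))

W_c = rescale_rows_to_common_norm(W, c=1.0)
row_norms = np.linalg.norm(W_c, axis=1)  # one norm per feature
```

After rescaling, every feature's loading vector has the same length, so features differ only in the direction of their loadings in the latent space.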