Publication

Classification of geochemical data based on multivariate statistical analyses: Complementary roles of cluster, principal component, and independent component analyses

Geochemistry, Geophysics, Geosystems 18, 3, 994-1012 (2017)

Authors

Iwamori, H., Yoshida, K., Nakamura, H., Kuwatani, T., Hamada, M., Haraguchi, S. and Ueki, K.

Abstract

Identifying the data structure, including trends and groups/clusters, in geochemical problems is essential for discussing the origin of sources and processes from the observed variability of data. The increasing number and high dimensionality of recent geochemical data require efficient and accurate multivariate statistical analysis methods. In this paper, we show the relationship and complementary roles of k-means cluster analysis (KCA), principal component analysis (PCA), and independent component analysis (ICA) in capturing the true data structure. When the data are preprocessed by primary standardization (i.e., shifted to zero mean and normalized by the standard deviation), KCA and PCA provide essentially the same results, although the former returns the solution in a discretized space. When the data are preprocessed by whitening (i.e., normalized by eigenvalues along the principal components), KCA and ICA may identify a set of independent trends and groups, irrespective of the amplitude (power) of variance. As an example, basalt isotopic compositions have been analyzed with KCA on the whitened data, demonstrating clear discrimination of rock type, tectonic occurrence, and mantle end-member. Therefore, the combination of these methods, particularly KCA on whitened data, is useful for capturing and discussing the data structure of various geochemical systems, for which an Excel program is provided.
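
The preprocessing and clustering steps named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' Excel program; it assumes NumPy is available, uses synthetic random data rather than real isotopic compositions, and the function names (`standardize`, `whiten`, `kmeans`) are hypothetical. Whitening is implemented here in the common way, by dividing each principal-component score by the square root of its eigenvalue so that every component has unit variance, and the k-means routine is a plain Lloyd iteration.

```python
import numpy as np


def standardize(X):
    """Primary standardization: zero mean and unit standard deviation per variable."""
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)


def whiten(X):
    """Rotate standardized data onto its principal components, then scale
    each component to unit variance (assumed form of whitening)."""
    Z = standardize(X)
    cov = np.cov(Z, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)      # symmetric eigendecomposition
    order = np.argsort(eigvals)[::-1]           # largest variance first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = Z @ eigvecs                        # principal-component scores
    return scores / np.sqrt(eigvals)


def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd-style k-means; returns (labels, centers)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Distance from every sample to every center, then nearest center.
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        # Recompute centers; keep the old center if a cluster is empty.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers


# Synthetic, correlated 3-variable data (illustration only, not geochemical data).
rng = np.random.default_rng(42)
mixing = np.array([[2.0, 0.5, 0.1],
                   [0.0, 1.0, 0.3],
                   [0.0, 0.0, 0.2]])
X = rng.normal(size=(200, 3)) @ mixing

W = whiten(X)
labels, centers = kmeans(W, k=2)

# The covariance of whitened data should be close to the identity matrix.
print(np.round(np.cov(W, rowvar=False), 6))
print(np.bincount(labels, minlength=2))
```

Because every whitened component has unit variance, the Euclidean distances used by k-means weight all components equally, which is one way to read the abstract's remark that groups may be identified irrespective of the amplitude of variance.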

DOI: 10.1002/2016GC006663