TY - JOUR
T1 - Geochemical characterisation of rock hydration processes using t-SNE
AU - Horrocks, Tom
AU - Holden, Eun Jung
AU - Wedge, Daniel
AU - Wijns, Chris
AU - Fiorentini, Marco
PY - 2019/3/1
Y1 - 2019/3/1
AB - Dimensionality reduction provides a simple, two-dimensional representation of multi-element geochemical assays, which facilitates visualisation of complex data and enhances their interpretation. A recently proposed dimensionality reduction algorithm, t-distributed stochastic neighbour embedding (t-SNE), generates effective two-dimensional representations of a wide range of datasets based on pairwise statistical distances between input samples. However, direct application to multi-element geochemical assays has been shown to produce representations that can fail to separate specimens by a desired geological property, such as state of hydration. Since t-SNE is a statistical distance-based method, these sub-optimal representations may be due to the presence of dimensions (i.e., elements) irrelevant to the desired property, an issue often termed the ‘curse of dimensionality’. To address this shortcoming, t-SNE was applied to (i) 31 elements in a geochemical assay database covering 16 000 drill core intervals intersecting the Kevitsa mafic-ultramafic intrusion (Lapland, Finland); and (ii) a subset of 11 elements capable of discriminating between unaltered and altered host rock specimens, as determined by a Random Forest classifier within a recursive feature elimination framework. The representation derived from the 11-element subset more effectively separates altered and unaltered specimens, and we demonstrate that it is more favourable than those produced by alternative well-known methods (namely, a self-organising map and principal components analysis) applied to the same dataset. We also demonstrate that the proposed t-SNE representation is applicable to re-logging of the specimens’ alteration state as logged by geologists, and in particular provides visual insight into the labels suggested by a black-box statistical re-logging algorithm.
KW - Dimensionality reduction
KW - Feature selection
KW - Geochemistry
KW - Hydration
KW - Random forest
KW - t-SNE
UR - http://www.scopus.com/inward/record.url?scp=85059620145&partnerID=8YFLogxK
U2 - 10.1016/j.cageo.2018.12.005
DO - 10.1016/j.cageo.2018.12.005
M3 - Article
AN - SCOPUS:85059620145
VL - 124
SP - 46
EP - 57
JO - Computers & Geosciences
JF - Computers & Geosciences
SN - 0098-3004
ER -