Towards geological knowledge discovery using vector-based semantic similarity

Majigsuren Enkhsaikhan, Wei Liu, Eun Jung Holden, Paul Duuring

Research output: Chapter in Book/Conference paper (Conference paper)

4 Citations (Scopus)

Abstract

Large organisations and corporations routinely produce reports of many kinds, with no end in sight. Apart from archiving them and occasionally retrieving a few, very little can be done to take advantage of these massive resources for knowledge discovery. This under-utilised unstructured data, written in natural language, is often referred to as part of the “dark data”. Recent success in learning distributed representations of words in vector spaces, and in particular the similarity and analogy queries that the learned word vectors enable, is driving a paradigm shift from “document retrieval” to “knowledge retrieval”. In this paper, we investigate how representation learning of words affects entity query results on a large domain corpus of geological survey reports. Extensive similarity tests and analogy queries were performed. The results demonstrate the necessity of training domain-specific word embeddings: pre-trained embeddings are good at capturing morphological relations but are inadequate for domain-specific semantic relations. Carrying out entity extraction prior to word embedding training further improves the quality of analogy query results. The framework developed in this paper can also be readily applied to other domain-specific corpora.
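
For readers unfamiliar with the two query types named in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes cosine similarity for the similarity query and the common vector-offset formulation for the analogy query; the abstract does not state which formulation the paper uses. The three-dimensional vectors, the vocabulary, and the function names (`cosine`, `most_similar`, `analogy`) are all invented for illustration, whereas the paper trains embeddings on a corpus of geological survey reports.

```python
import math

# Toy 3-dimensional "embeddings", invented for illustration only.
# Real word vectors are learned from a corpus and have far more dimensions.
VECTORS = {
    "iron":         [1.0, 0.0, 0.0],
    "copper":       [0.0, 1.0, 0.0],
    "hematite":     [1.0, 0.0, 1.0],
    "chalcopyrite": [0.0, 1.0, 1.0],
    "quartz":       [0.2, 0.2, 1.0],
}


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def most_similar(word, topn=3):
    """Similarity query: rank all other words by cosine similarity to `word`."""
    query = VECTORS[word]
    scored = [(other, cosine(query, vec))
              for other, vec in VECTORS.items() if other != word]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:topn]


def analogy(a, b, c, topn=1):
    """Analogy query "a is to b as c is to ?" via the offset b - a + c."""
    target = [vb - va + vc
              for va, vb, vc in zip(VECTORS[a], VECTORS[b], VECTORS[c])]
    # The three query words are excluded from the candidate answers.
    scored = [(word, cosine(target, vec))
              for word, vec in VECTORS.items() if word not in (a, b, c)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:topn]


if __name__ == "__main__":
    print(most_similar("hematite"))
    print(analogy("iron", "hematite", "copper"))
```

With these hand-built vectors the analogy query works only because the offsets were constructed to line up; with learned embeddings, the quality of such answers depends on the training corpus, which is the question the paper investigates.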

Original language: English
Title of host publication: Advanced Data Mining and Applications - 14th International Conference, ADMA 2018, Proceedings
Editors: Guojun Gan, Xue Li, Shuliang Wang, Bohan Li
Publisher: Springer-Verlag Wien
Pages: 224-237
Number of pages: 14
ISBN (Print): 9783030050894
Publication status: Published - 1 Jan 2018
Event: 14th International Conference on Advanced Data Mining and Applications, ADMA 2018 - Nanjing, China
Duration: 16 Nov 2018 to 18 Nov 2018
http://adma2018.nuaa.edu.cn/

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 11323 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 14th International Conference on Advanced Data Mining and Applications, ADMA 2018
Country: China
City: Nanjing
Period: 16/11/18 to 18/11/18
