Learning lightweight ontologies from text across different domains using the web as background knowledge

Wilson Wong

    Research output: Thesis (Doctoral Thesis)

    Abstract

    [Truncated abstract] The ability to provide abstractions of documents in the form of important concepts and their relations is a key asset, not only for bootstrapping the Semantic Web, but also for relieving us from the pressure of information overload. At present, the only viable solution for arriving at these abstractions is manual curation. In this research, ontology learning techniques are developed to automatically discover terms, concepts and relations from text documents. Ontology learning techniques rely on extensive background knowledge, ranging from unstructured data such as text corpora, to structured data such as a semantic lexicon. Manually-curated background knowledge is a scarce resource for many domains and languages, and the effort and cost required to keep the resource abreast of time is often high. More importantly, the size and coverage of manually-curated background knowledge is often inadequate to meet the requirements of most ontology learning techniques. This thesis investigates the use of the Web as the sole source of dynamic background knowledge across all phases of ontology learning for constructing term clouds (i.e. visual depictions of terms) and lightweight ontologies from documents. To appreciate the significance of term clouds and lightweight ontologies, a system for ontology-assisted document skimming and scanning is developed. This thesis presents a novel ontology learning approach that is devoid of any manually-curated resources, and is applicable across a wide range of domains (the current focus is medicine, technology and economics). More specifically, this research proposes and develops a set of novel techniques that take advantage of Web data to address the following problems: (1) the absence of integrated techniques for cleaning noisy data; (2) the inability of current term extraction techniques to systematically explicate, diversify and consolidate their evidence; (3) the inability of current corpus construction
    Original language: English
    Qualification: Doctor of Philosophy
    Publication status: Unpublished - 2009
