Constructing specialised corpora through analysing domain representativeness of websites

    Research output: Contribution to journal, Article, peer-review

    5 Citations (Scopus)

    Abstract

    The role of the Web in text corpus construction is becoming increasingly significant. However, the contribution of the Web is largely confined to building a general virtual corpus or low-quality specialised corpora. In this paper, we introduce a new technique called SPARTAN for constructing specialised corpora from the Web by systematically analysing website contents. Our evaluations show that the corpora constructed using our technique are independent of the search engines employed. In particular, SPARTAN-derived corpora outperform all corpora based on existing techniques for the task of term recognition.
    Original language: English
    Pages (from-to): 209-241
    Journal: Language Resources and Evaluation
    Volume: 45
    Issue number: 2
    Publication status: Published - 2011
