ELCA evaluation for keyword search on probabilistic XML data

R. Zhou, C. Liu, Jianxin Li, J.X. Yu

    Research output: Contribution to journal (Article)

    15 Citations (Scopus)

    Abstract

    As probabilistic data management becomes a major research focus and keyword search becomes an increasingly popular way to query, it is natural to ask how to support keyword queries on probabilistic XML data. For keyword queries on deterministic XML documents, ELCA (Exclusive Lowest Common Ancestor) semantics allows more relevant fragments, rooted at the ELCAs, to appear as results, and it is more popular than other keyword query result semantics (such as SLCA). In this paper, we investigate how to evaluate ELCA results for keyword queries on probabilistic XML documents. After defining probabilistic ELCA semantics in terms of possible world semantics, we propose an approach to compute ELCA probabilities without generating possible worlds. We then develop an efficient stack-based algorithm that finds all probabilistic ELCA results and their ELCA probabilities for a given keyword query on a probabilistic XML document. Finally, we experimentally evaluate the proposed ELCA algorithm and compare it with its SLCA counterpart in terms of result probability, time and space efficiency, and scalability. © 2012 Springer Science+Business Media, LLC.
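
To make the possible-world definition in the abstract concrete, the sketch below computes ELCA probabilities by brute-force enumeration on a tiny document. It is not the paper's method: the paper computes these probabilities without generating possible worlds, whereas this is only the naive baseline that the definition implies. The document `DOC`, its node names, its probabilities, and the simplified model (every non-root node exists independently with a fixed probability given that its parent exists) are all assumptions made for illustration.

```python
from itertools import product

# Hypothetical toy p-document (not from the paper). Each entry maps
# node -> (parent, P(node exists | parent exists), keywords stored at the node).
DOC = {
    "r":  (None, 1.0, set()),
    "a":  ("r", 0.8, set()),
    "a1": ("a", 1.0, {"xml"}),
    "a2": ("a", 0.5, {"search"}),
    "b":  ("r", 1.0, {"search"}),
    "c":  ("r", 0.5, {"xml"}),
}

# Child lists derived from the parent pointers.
CHILDREN = {n: [] for n in DOC}
for _node, (_parent, _p, _kws) in DOC.items():
    if _parent is not None:
        CHILDREN[_parent].append(_node)


def subtree_keywords(node, world):
    """All keywords found in the subtree of `node` within one possible world."""
    kws = set(DOC[node][2])
    for child in CHILDREN[node]:
        if child in world:
            kws |= subtree_keywords(child, world)
    return kws


def elcas_in_world(world, query):
    """ELCAs of `query` in a deterministic world (a set of existing nodes).

    A node is an ELCA if it still covers every keyword after discarding
    the subtrees of children that already contain all keywords themselves.
    """
    result = set()
    for v in world:
        found = set(DOC[v][2])
        for child in CHILDREN[v]:
            if child in world:
                kws = subtree_keywords(child, world)
                if not query <= kws:  # child does not cover the query: keep it
                    found |= kws
        if query <= found:
            result.add(v)
    return result


def elca_probabilities(query):
    """Sum the probability of every possible world in which a node is an ELCA."""
    query = set(query)
    non_root = [n for n, (parent, _, _) in DOC.items() if parent is not None]
    probs = {}
    for coins in product([True, False], repeat=len(non_root)):
        flip = dict(zip(non_root, coins))
        weight = 1.0
        for n in non_root:
            p = DOC[n][1]
            weight *= p if flip[n] else 1.0 - p
        if weight == 0.0:
            continue

        def present(n):
            # A node exists only if its own coin and all ancestors' coins came up True.
            parent = DOC[n][0]
            return parent is None or (flip[n] and present(parent))

        world = {n for n in DOC if present(n)}
        for v in elcas_in_world(world, query):
            probs[v] = probs.get(v, 0.0) + weight
    return probs


if __name__ == "__main__":
    print(elca_probabilities({"xml", "search"}))
```

On this toy document, node `a` is an ELCA with probability about 0.4 (both `a` and `a2` must exist), and the root `r` with probability about 0.7, because `r` can still collect `xml` from `c` and `search` from `b` even in worlds where `a` already covers the query. The enumeration is exponential in the number of uncertain nodes, which is the cost that an approach avoiding possible-world generation is meant to sidestep.
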
    Original language: English
    Pages (from-to): 171-193
    Journal: World Wide Web
    Volume: 16
    Issue number: 2
    DOIs: 10.1007/s11280-012-0166-4
    Publication status: Published - 2013

    Fingerprint

    XML
    Semantics
    Information management
    Scalability
    Industry

    Cite this

    Zhou, R.; Liu, C.; Li, Jianxin; Yu, J.X. / ELCA evaluation for keyword search on probabilistic XML data. In: World Wide Web. 2013; Vol. 16, No. 2. pp. 171-193.
    @article{a052766accd34076b2835a48184ee97e,
    title = "ELCA evaluation for keyword search on probabilistic XML data",
    abstract = "As probabilistic data management is becoming one of the main research focuses and keyword search is turning into a more popular query means, it is natural to think how to support keyword queries on probabilistic XML data. With regards to keyword query on deterministic XML documents, ELCA (Exclusive Lowest Common Ancestor) semantics allows more relevant fragments rooted at the ELCAs to appear as results and is more popular compared with other keyword query result semantics (such as SLCAs). In this paper, we investigate how to evaluate ELCA results for keyword queries on probabilistic XML documents. After defining probabilistic ELCA semantics in terms of possible world semantics, we propose an approach to compute ELCA probabilities without generating possible worlds. Then we develop an efficient stack-based algorithm that can find all probabilistic ELCA results and their ELCA probabilities for a given keyword query on a probabilistic XML document. Finally, we experimentally evaluate the proposed ELCA algorithm and compare it with its SLCA counterpart in aspects of result probability, time and space efficiency, and scalability. {\textcopyright} 2012 Springer Science+Business Media, LLC.",
    author = "R. Zhou and C. Liu and Jianxin Li and J.X. Yu",
    year = "2013",
    doi = "10.1007/s11280-012-0166-4",
    language = "English",
    volume = "16",
    pages = "171--193",
    journal = "World Wide Web",
    issn = "1386-145X",
    publisher = "Springer",
    number = "2",
    }

    ELCA evaluation for keyword search on probabilistic XML data. / Zhou, R.; Liu, C.; Li, Jianxin; Yu, J.X.

    In: World Wide Web, Vol. 16, No. 2, 2013, p. 171-193.

    TY  - JOUR
    T1  - ELCA evaluation for keyword search on probabilistic XML data
    AU  - Zhou, R.
    AU  - Liu, C.
    AU  - Li, Jianxin
    AU  - Yu, J.X.
    PY  - 2013
    Y1  - 2013
    AB  - As probabilistic data management is becoming one of the main research focuses and keyword search is turning into a more popular query means, it is natural to think how to support keyword queries on probabilistic XML data. With regards to keyword query on deterministic XML documents, ELCA (Exclusive Lowest Common Ancestor) semantics allows more relevant fragments rooted at the ELCAs to appear as results and is more popular compared with other keyword query result semantics (such as SLCAs). In this paper, we investigate how to evaluate ELCA results for keyword queries on probabilistic XML documents. After defining probabilistic ELCA semantics in terms of possible world semantics, we propose an approach to compute ELCA probabilities without generating possible worlds. Then we develop an efficient stack-based algorithm that can find all probabilistic ELCA results and their ELCA probabilities for a given keyword query on a probabilistic XML document. Finally, we experimentally evaluate the proposed ELCA algorithm and compare it with its SLCA counterpart in aspects of result probability, time and space efficiency, and scalability. © 2012 Springer Science+Business Media, LLC.
    U2  - 10.1007/s11280-012-0166-4
    DO  - 10.1007/s11280-012-0166-4
    M3  - Article
    VL  - 16
    SP  - 171
    EP  - 193
    JO  - World Wide Web
    JF  - World Wide Web
    SN  - 1386-145X
    IS  - 2
    ER  -