A High-Performance Index for Real-Time Matrix Retrieval

Zeyi Wen, Mingyu Liang, Bingsheng He, Zexin Xia

Research output: Contribution to journal › Article › peer-review

Abstract

With embedding techniques, many real-world objects can be represented as matrices. For example, a document can be represented by a matrix in which each row represents a word. Meanwhile, many applications continuously generate new data in matrix form and require real-time query answering over that data. These continuously generated matrices must be well managed for efficient retrieval. In this paper, we propose an index for real-time matrix retrieval. Besides fast query response, the index also supports real-time insertion by exploiting the LSM-tree. Because the index is built for matrices, it consumes much more memory and requires much more search time than a traditional index for information retrieval. To tackle these challenges, we power our proposed index with precise and fuzzy inverted lists, and propose a series of novel techniques to reduce the memory consumption and improve the search efficiency of the index. The proposed techniques include vector signature, vector residual sorting, hashing-based lookup, and dictionary initialization to guarantee the index quality. Comprehensive experimental results show that our proposed index supports real-time search on matrices and is more efficient than the state-of-the-art method.
Original language: English
Pages (from-to): 3044-3056
Number of pages: 13
Journal: IEEE Transactions on Knowledge and Data Engineering
Volume: 34
Issue number: 7
Publication status: Published - 1 Jul 2022
