Abstract
Converting a sentence to a meaningful vector representation is useful in many NLP tasks; however, very few methods allow that representation to be restored to a human-readable sentence. Being able to generate sentences from vector representations demonstrates the level of information maintained by the embedding representation, in this case a simple sum of word embeddings. We introduce such a method for moving from this vector representation back to the original sentences. This is done using a two-stage process: first, a greedy algorithm is used to convert the vector to a bag of words; second, a simple probabilistic language model is used to order the words to recover the sentence. To the best of our knowledge, this is the first work to demonstrate quantitatively the ability to reproduce text from a large corpus based directly on its sentence embeddings.
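
The two-stage process described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not specify the greedy procedure or the language model, so this sketch assumes a plain "add the word that most reduces the distance to the target" rule and a brute-force search over orderings scored by a bigram table. The function names (`greedy_bag_of_words`, `order_words`), the three-dimensional embeddings in `EMB`, and the log-probabilities in `BIGRAM` are all hypothetical and chosen only to keep the example small.

```python
import itertools
import math

def greedy_bag_of_words(target, embeddings, max_words=20):
    """Stage 1 (assumed rule): repeatedly add the word whose embedding brings
    the running sum closest to `target`; stop when no word improves it."""
    current = [0.0] * len(target)
    bag = []
    best_dist = math.dist(current, target)
    for _ in range(max_words):
        best_word = None
        for word, vec in embeddings.items():
            candidate = [c + v for c, v in zip(current, vec)]
            d = math.dist(candidate, target)
            if d < best_dist - 1e-12:  # accept strict improvements only
                best_dist, best_word = d, word
        if best_word is None:
            break
        bag.append(best_word)
        current = [c + v for c, v in zip(current, embeddings[best_word])]
    return bag

def order_words(bag, bigram_logprob, unseen=-10.0):
    """Stage 2 (assumed model): return the permutation of `bag` with the
    highest bigram log-probability. Brute force, so small bags only."""
    def score(seq):
        tokens = ["<s>", *seq, "</s>"]
        return sum(bigram_logprob.get(pair, unseen)
                   for pair in zip(tokens, tokens[1:]))
    return list(max(itertools.permutations(bag), key=score))

# Hypothetical toy data, for illustration only.
EMB = {
    "dog": [1.0, 0.0, 0.0],
    "bites": [0.0, 1.0, 0.0],
    "man": [0.0, 0.0, 1.0],
    "cat": [0.5, 0.5, -1.0],
}
BIGRAM = {
    ("<s>", "dog"): -1.0,
    ("dog", "bites"): -1.0,
    ("bites", "man"): -1.0,
    ("man", "</s>"): -1.0,
}

if __name__ == "__main__":
    # The sentence embedding is the sum of its word embeddings.
    sentence_vec = [sum(dims) for dims in
                    zip(EMB["dog"], EMB["bites"], EMB["man"])]
    bag = greedy_bag_of_words(sentence_vec, EMB)
    print(order_words(bag, BIGRAM))
```

The brute-force ordering step grows factorially with sentence length, so a realistic system would need a more efficient search; the sketch only shows how the two stages fit together.
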
Original language | English |
---|---|
Title of host publication | Proceedings of the 16th IEEE International Conference on Data Mining Workshops |
Editors | Carlotta Domeniconi, Francesco Gullo, Francesco Bonchi, Josep Domingo-Ferrer, Ricardo Baeza-Yates, Zhi-Hua Zhou, Xindong Wu |
Publisher | IEEE, Institute of Electrical and Electronics Engineers |
Pages | 770-777 |
Number of pages | 8 |
ISBN (Electronic) | 9781509054725 |
Publication status | Published - 2 Jul 2016 |
Event | 16th IEEE International Conference on Data Mining Workshops, ICDMW 2016, Barcelona, Spain (12 Dec 2016 → 15 Dec 2016) |
Conference
Conference | 16th IEEE International Conference on Data Mining Workshops, ICDMW 2016 |
---|---|
Country/Territory | Spain |
City | Barcelona |
Period | 12/12/16 → 15/12/16 |