Modelling Sentence Generation from Sum of Word Embedding Vectors as a Mixed Integer Programming Problem

Research output: Chapter in Book/Conference paper › Conference paper

2 Citations (Scopus)

Abstract

Converting a sentence to a meaningful vector representation has uses in many NLP tasks; however, very few methods allow that representation to be restored to a human-readable sentence. Being able to generate sentences from the vector representations demonstrates how much information is retained by the embedding representation, in this case a simple sum of word embeddings. We introduce such a method for moving from this vector representation back to the original sentence. It is a two-stage process: first, a greedy algorithm converts the vector to a bag of words; second, a simple probabilistic language model orders the words to recover the sentence. To the best of our knowledge, this is the first work to demonstrate quantitatively the ability to reproduce text from a large corpus based directly on its sentence embeddings.
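The two-stage pipeline described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption for illustration, not the authors' implementation: the 4-dimensional vectors in `EMBEDDINGS`, the bigram table `BIGRAM_LOGPROB`, the `UNSEEN_LOGPROB` floor, and all function names are hypothetical. Stage 1 assumes Euclidean distance and a simple add-one-word-at-a-time greedy rule; stage 2 swaps in brute-force search over permutations under a bigram model, which is only tractable for a handful of words. The paper's title indicates that a mixed integer programming formulation is used instead; that formulation is not reproduced here.

```python
import itertools
import math

# Hypothetical toy embeddings (hand-picked 4-d vectors for illustration only);
# a real system would use pretrained word vectors.
EMBEDDINGS = {
    "the": (1.0, 0.0, 0.0, 0.0),
    "cat": (0.0, 1.0, 0.0, 0.0),
    "sat": (0.0, 0.0, 1.0, 0.0),
    "dog": (0.0, 0.8, 0.0, 0.6),
}

# Hypothetical bigram log-probabilities; "<s>" and "</s>" mark sentence boundaries.
BIGRAM_LOGPROB = {
    ("<s>", "the"): math.log(0.6),
    ("the", "cat"): math.log(0.5),
    ("cat", "sat"): math.log(0.4),
    ("sat", "</s>"): math.log(0.3),
}
UNSEEN_LOGPROB = math.log(1e-4)  # assumed floor for bigrams missing from the table


def vec_add(u, v):
    return tuple(a + b for a, b in zip(u, v))


def distance(u, v):
    """Euclidean distance (an assumed choice of metric)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def sentence_vector(words, embeddings):
    """Sentence representation as a plain sum of word embeddings."""
    total = tuple(0.0 for _ in next(iter(embeddings.values())))
    for w in words:
        total = vec_add(total, embeddings[w])
    return total


def greedy_bag_of_words(target, embeddings, max_words=50):
    """Stage 1 (sketch): repeatedly add the word whose embedding brings the
    running sum closest to the target; stop when no word reduces the distance."""
    bag = []
    current = tuple(0.0 for _ in target)
    best = distance(current, target)
    for _ in range(max_words):
        word, d = min(
            ((w, distance(vec_add(current, e), target)) for w, e in embeddings.items()),
            key=lambda pair: pair[1],
        )
        if d >= best:
            break  # no single word moves the sum closer to the target
        bag.append(word)
        current = vec_add(current, embeddings[word])
        best = d
    return bag


def sequence_logprob(words):
    """Score one ordering under the toy bigram model, including boundary tokens."""
    tokens = ["<s>", *words, "</s>"]
    return sum(
        BIGRAM_LOGPROB.get(pair, UNSEEN_LOGPROB) for pair in zip(tokens, tokens[1:])
    )


def order_words(bag):
    """Stage 2 (brute force, short bags only): return the most probable ordering."""
    if not bag:
        return ""
    best = max(itertools.permutations(bag), key=sequence_logprob)
    return " ".join(best)


if __name__ == "__main__":
    target = sentence_vector(["the", "cat", "sat"], EMBEDDINGS)
    bag = greedy_bag_of_words(target, EMBEDDINGS)
    print(sorted(bag))       # ['cat', 'sat', 'the']
    print(order_words(bag))  # the cat sat
```

Because vector addition is commutative, the bag recovered in stage 1 carries no word-order information, which is why a separate ordering stage is needed. A greedy rule can also return a different bag whose embeddings happen to sum to the same target (for example, if one word's vector equalled the sum of two others), so exact recovery is not guaranteed.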

Original language: English
Title of host publication: Proceedings of the 16th IEEE International Conference on Data Mining Workshops
Editors: Ricardo Baeza-Yates, Zhi-Hua Zhou
Publisher: IEEE, Institute of Electrical and Electronics Engineers
Pages: 770-777
Number of pages: 8
ISBN (Electronic): 9781509054725
Publication status: Published - 30 Jan 2017
Event: 16th IEEE International Conference on Data Mining Workshops, ICDMW 2016 - Barcelona, Spain
Duration: 12 Dec 2016 to 15 Dec 2016

Conference

Conference: 16th IEEE International Conference on Data Mining Workshops, ICDMW 2016
Country: Spain
City: Barcelona
Period: 12/12/16 to 15/12/16
