Modelling Sentence Generation from Sum of Word Embedding Vectors as a Mixed Integer Programming Problem

Research output: Chapter in Book/Conference paper › Conference paper

2 Citations (Scopus)

Abstract

Converting a sentence to a meaningful vector representation has uses in many NLP tasks; however, very few methods allow that representation to be restored to a human-readable sentence. Being able to generate sentences from the vector representations demonstrates the level of information maintained by the embedding representation, in this case a simple sum of word embeddings. We introduce such a method for moving from this vector representation back to the original sentences. This is done using a two-stage process: first, a greedy algorithm is utilised to convert the vector to a bag of words; second, a simple probabilistic language model is used to order the words to get back the sentence. To the best of our knowledge, this is the first work to demonstrate quantitatively the ability to reproduce text from a large corpus based directly on its sentence embeddings.
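The two-stage process described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the one-hot toy embeddings, the bigram log-probabilities, and the function names `greedy_bag_of_words` and `order_words` are all assumptions made for illustration. Stage 2 here uses a brute-force search over permutations, which is only practical for very short bags; the paper's title indicates that the ordering step is instead modelled as a mixed integer programming problem.

```python
import itertools

import numpy as np


def greedy_bag_of_words(target, vocab, embeddings, max_words=10, tol=1e-9):
    """Stage 1 (sketch): greedily add the word that moves the running sum
    of embeddings closest to `target`, until no word helps any more."""
    bag = []
    # residual = target minus the sum of the embeddings chosen so far
    residual = np.asarray(target, dtype=float).copy()
    for _ in range(max_words):
        # Remaining distance to the target if each candidate were added next.
        dists = np.linalg.norm(residual - embeddings, axis=1)
        best = int(np.argmin(dists))
        # Stop when even the best candidate does not reduce the distance.
        if dists[best] >= np.linalg.norm(residual) - tol:
            break
        bag.append(vocab[best])
        residual = residual - embeddings[best]
    return bag


def order_words(bag, bigram_logprob, unseen=-10.0):
    """Stage 2 (sketch): return the ordering of `bag` with the highest
    bigram log-probability. Brute force, so only for tiny bags."""
    def score(seq):
        tokens = ["<s>", *seq, "</s>"]
        return sum(bigram_logprob.get(pair, unseen)
                   for pair in zip(tokens, tokens[1:]))
    return list(max(itertools.permutations(bag), key=score))


# Hypothetical toy data: one-hot "embeddings" and a hand-written bigram model.
vocab = ["the", "cat", "sat", "dog"]
embeddings = np.eye(4)
target = embeddings[0] + embeddings[1] + embeddings[2]  # sum for "the cat sat"
bigram_logprob = {
    ("<s>", "the"): -0.1,
    ("the", "cat"): -0.2,
    ("cat", "sat"): -0.3,
    ("sat", "</s>"): -0.1,
}

bag = greedy_bag_of_words(target, vocab, embeddings)
# Sort first so the ordering step cannot rely on the greedy selection order.
sentence = order_words(sorted(bag), bigram_logprob)
print(bag, sentence)
```

With real, dense word embeddings the greedy step is not guaranteed to recover the exact bag, since different word combinations can sum to nearby vectors; the one-hot vectors above avoid that ambiguity purely to keep the example small.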

Original language: English
Title of host publication: Proceedings of the 16th IEEE International Conference on Data Mining Workshops
Editors: Ricardo Baeza-Yates, Zhi-Hua Zhou
Publisher: IEEE, Institute of Electrical and Electronics Engineers
Pages: 770-777
Number of pages: 8
ISBN (Electronic): 9781509054725
DOI: https://doi.org/10.1109/ICDMW.2016.0113
Publication status: Published - 30 Jan 2017
Event: 16th IEEE International Conference on Data Mining Workshops, ICDMW 2016 - Barcelona, Spain
Duration: 12 Dec 2016 to 15 Dec 2016

Conference

Conference: 16th IEEE International Conference on Data Mining Workshops, ICDMW 2016
Country: Spain
City: Barcelona
Period: 12/12/16 to 15/12/16

Cite this

White, L., Togneri, R., Liu, W., & Bennamoun, M. (2017). Modelling Sentence Generation from Sum of Word Embedding Vectors as a Mixed Integer Programming Problem. In R. Baeza-Yates & Z.-H. Zhou (Eds.), Proceedings of the 16th IEEE International Conference on Data Mining Workshops (pp. 770-777). [7836744] IEEE, Institute of Electrical and Electronics Engineers. https://doi.org/10.1109/ICDMW.2016.0113
@inproceedings{87702eb941eb4b71b5a406e68ebcecad,
title = "Modelling Sentence Generation from Sum of Word Embedding Vectors as a Mixed Integer Programming Problem",
keywords = "Knowledge representation, Sentence embeddings, Sentence generation, Word embeddings",
author = "Lyndon White and Roberto Togneri and Wei Liu and Mohammed Bennamoun",
year = "2017",
month = "1",
day = "30",
doi = "10.1109/ICDMW.2016.0113",
language = "English",
pages = "770--777",
editor = "Ricardo Baeza-Yates and Zhi-Hua Zhou",
booktitle = "Proceedings of the 16th IEEE International Conference on Data Mining Workshops",
publisher = "IEEE, Institute of Electrical and Electronics Engineers",
address = "United States",

}


TY - GEN
T1 - Modelling Sentence Generation from Sum of Word Embedding Vectors as a Mixed Integer Programming Problem
AU - White, Lyndon
AU - Togneri, Roberto
AU - Liu, Wei
AU - Bennamoun, Mohammed
PY - 2017/1/30
Y1 - 2017/1/30
AB - Converting a sentence to a meaningful vector representation has uses in many NLP tasks; however, very few methods allow that representation to be restored to a human-readable sentence. Being able to generate sentences from the vector representations demonstrates the level of information maintained by the embedding representation, in this case a simple sum of word embeddings. We introduce such a method for moving from this vector representation back to the original sentences. This is done using a two-stage process: first, a greedy algorithm is utilised to convert the vector to a bag of words; second, a simple probabilistic language model is used to order the words to get back the sentence. To the best of our knowledge, this is the first work to demonstrate quantitatively the ability to reproduce text from a large corpus based directly on its sentence embeddings.
KW - Knowledge representation
KW - Sentence embeddings
KW - Sentence generation
KW - Word embeddings
UR - http://www.scopus.com/inward/record.url?scp=85015205420&partnerID=8YFLogxK
UR - http://ieeexplore.ieee.org/xpl/mostRecentIssue.jsp?punumber=7836069
U2 - 10.1109/ICDMW.2016.0113
DO - 10.1109/ICDMW.2016.0113
M3 - Conference paper
SP - 770
EP - 777
BT - Proceedings of the 16th IEEE International Conference on Data Mining Workshops
A2 - Baeza-Yates, Ricardo
A2 - Zhou, Zhi-Hua
PB - IEEE, Institute of Electrical and Electronics Engineers
ER -
