Learning-based Composite Metrics for Improved Caption Evaluation

Research output: Chapter in Book/Conference paper › Conference paper › peer-review

15 Citations (Scopus)


The evaluation of image caption quality is a challenging task that requires assessing two main aspects of a caption: adequacy and fluency. These quality aspects can be judged using a combination of several linguistic features. However, most current image captioning metrics focus on only one linguistic facet, such as the lexical or the semantic, and fail to reach a satisfactory level of correlation with human judgements at the sentence level. We propose a learning-based framework that incorporates the scores of a set of lexical and semantic metrics as features, to capture the adequacy and fluency of captions at different linguistic levels. Our experimental results demonstrate that composite metrics draw upon the strengths of standalone measures to yield improved correlation and accuracy.
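As a rough illustration of the idea in the abstract, the sketch below treats the scores of individual metrics as a feature vector and learns how to combine them into one composite score. Everything specific here is an assumption made for illustration: the two feature columns (a hypothetical lexical score and a hypothetical semantic score), the toy values, the binary "good caption" labels, and the choice of logistic regression as the learner. The record above does not state which metrics, learner, or training data the paper uses.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_composite(features, labels, lr=0.5, epochs=2000):
    """Fit logistic-regression weights over metric scores by batch gradient descent."""
    n = len(features[0])
    m = len(features)
    w = [0.0] * n
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n
        grad_b = 0.0
        for x, y in zip(features, labels):
            p = sigmoid(b + sum(wi * xi for wi, xi in zip(w, x)))
            err = p - y  # gradient of the log loss w.r.t. the logit
            for i in range(n):
                grad_w[i] += err * x[i]
            grad_b += err
        w = [wi - lr * g / m for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / m
    return w, b

def composite_score(w, b, x):
    """Composite metric: learned combination of the standalone metric scores."""
    return sigmoid(b + sum(wi * xi for wi, xi in zip(w, x)))

# Hypothetical toy data: each row is [lexical_score, semantic_score] for one caption.
train_x = [[0.9, 0.8], [0.7, 0.9], [0.8, 0.6],
           [0.2, 0.3], [0.1, 0.4], [0.3, 0.1]]
train_y = [1, 1, 1, 0, 0, 0]  # 1 = judged good, 0 = judged poor (assumed labels)

w, b = train_composite(train_x, train_y)
good = composite_score(w, b, [0.85, 0.75])
poor = composite_score(w, b, [0.15, 0.20])
```

In this toy setting the learned score ranks the caption with high standalone scores above the one with low scores. A real evaluation would instead measure correlation with human judgements on held-out captions.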
Original language: English
Title of host publication: Proceedings of ACL 2018, Student Research Workshop
Place of Publication: Australia
Publisher: Association for Computational Linguistics
Number of pages: 7
ISBN (Electronic): 9781948087360
Publication status: Published - 2018
Event: 56th Annual Meeting of Association for Computational Linguistics - Melbourne, Australia
Duration: 15 Jul 2018 → 20 Jul 2018


Conference: 56th Annual Meeting of Association for Computational Linguistics
Abbreviated title: ACL2018

