NNEval: Neural Network based Evaluation Metric for Image Captioning

Naeha Sharif, Lyndon Rhys White, Mohammed Bennamoun, Syed Shah

Research output: Chapter in Book/Conference paper › Conference paper

Abstract

The automatic evaluation of image descriptions is an intricate task, and it is highly important for the development and fine-grained analysis of captioning systems. Existing metrics for automatically evaluating image captioning systems fail to achieve a satisfactory level of correlation with human judgements at the sentence level. Moreover, these metrics, unlike humans, tend to focus on specific aspects of quality, such as n-gram overlap or semantic meaning. In this paper, we present the first learning-based metric to evaluate image captions. Our proposed framework incorporates both lexical and semantic information into a single learned metric, resulting in an evaluator that takes various linguistic features into account when assessing caption quality. The experiments we performed to assess the proposed metric show improvements over the state of the art in terms of correlation with human judgements, and demonstrate its superior robustness to distractions.
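
For readers who want a concrete picture of the approach, the sketch below illustrates the idea described in the abstract: a small feed-forward network that maps per-caption scores from existing lexical and semantic metrics to a single learned quality score. This is a minimal sketch under assumptions; the class name, feature count, and layer sizes are illustrative, not the authors' exact configuration.

```python
# Hypothetical sketch of a learned caption-quality metric that combines
# lexical and semantic signals, as the abstract describes. Feature set,
# dimensions, and architecture details are illustrative assumptions.
import torch
import torch.nn as nn


class LearnedCaptionMetric(nn.Module):
    """Feed-forward scorer over per-caption features, e.g. scores from
    n-gram-overlap metrics (lexical) and embedding- or scene-graph-based
    metrics (semantic)."""

    def __init__(self, num_features: int = 8, hidden: int = 72):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(num_features, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
            nn.Sigmoid(),  # quality score in [0, 1]
        )

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        # features: (batch, num_features), one row per candidate caption
        return self.net(features).squeeze(-1)


# Usage: stack one feature vector per candidate caption.
metric = LearnedCaptionMetric()
scores = metric(torch.rand(4, 8))  # 4 captions, 8 metric features each
print(scores)
```

One plausible training objective, consistent with the abstract's framing, is binary classification of high-quality versus low-quality captions, with the sigmoid output reused directly as the metric score at evaluation time.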
Original language: English
Title of host publication: ECCV
Publisher: Springer
Pages: 39-55
ISBN (Print): 9783030012366
DOIs: 10.1007/978-3-030-01237-3_3
Publication status: Published - 2018

Cite this

Sharif, Naeha; White, Lyndon Rhys; Bennamoun, Mohammed; Shah, Syed. / NNEval: Neural Network based Evaluation Metric for Image Captioning. ECCV. Springer, 2018. pp. 39-55
@inproceedings{ef25fd0c59af4c5a8bf554ef3fbc6908,
title = "NNEval: Neural Network based Evaluation Metric for Image Captioning",
abstract = "The automatic evaluation of image descriptions is an intricate task, and it is highly important for the development and fine-grained analysis of captioning systems. Existing metrics for automatically evaluating image captioning systems fail to achieve a satisfactory level of correlation with human judgements at the sentence level. Moreover, these metrics, unlike humans, tend to focus on specific aspects of quality, such as n-gram overlap or semantic meaning. In this paper, we present the first learning-based metric to evaluate image captions. Our proposed framework incorporates both lexical and semantic information into a single learned metric, resulting in an evaluator that takes various linguistic features into account when assessing caption quality. The experiments we performed to assess the proposed metric show improvements over the state of the art in terms of correlation with human judgements, and demonstrate its superior robustness to distractions.",
author = "Naeha Sharif and White, {Lyndon Rhys} and Mohammed Bennamoun and Syed Shah",
year = "2018",
doi = "10.1007/978-3-030-01237-3_3",
language = "English",
isbn = "9783030012366",
pages = "39--55",
booktitle = "ECCV",
publisher = "Springer",
address = "Netherlands",

}

NNEval: Neural Network based Evaluation Metric for Image Captioning. / Sharif, Naeha; White, Lyndon Rhys; Bennamoun, Mohammed; Shah, Syed.

ECCV. Springer, 2018. pp. 39-55.

Research output: Chapter in Book/Conference paper › Conference paper

TY - GEN

T1 - NNEval: Neural Network based Evaluation Metric for Image Captioning

AU - Sharif, Naeha

AU - White, Lyndon Rhys

AU - Bennamoun, Mohammed

AU - Shah, Syed

PY - 2018

Y1 - 2018

N2 - The automatic evaluation of image descriptions is an intricate task, and it is highly important for the development and fine-grained analysis of captioning systems. Existing metrics for automatically evaluating image captioning systems fail to achieve a satisfactory level of correlation with human judgements at the sentence level. Moreover, these metrics, unlike humans, tend to focus on specific aspects of quality, such as n-gram overlap or semantic meaning. In this paper, we present the first learning-based metric to evaluate image captions. Our proposed framework incorporates both lexical and semantic information into a single learned metric, resulting in an evaluator that takes various linguistic features into account when assessing caption quality. The experiments we performed to assess the proposed metric show improvements over the state of the art in terms of correlation with human judgements, and demonstrate its superior robustness to distractions.

AB - The automatic evaluation of image descriptions is an intricate task, and it is highly important for the development and fine-grained analysis of captioning systems. Existing metrics for automatically evaluating image captioning systems fail to achieve a satisfactory level of correlation with human judgements at the sentence level. Moreover, these metrics, unlike humans, tend to focus on specific aspects of quality, such as n-gram overlap or semantic meaning. In this paper, we present the first learning-based metric to evaluate image captions. Our proposed framework incorporates both lexical and semantic information into a single learned metric, resulting in an evaluator that takes various linguistic features into account when assessing caption quality. The experiments we performed to assess the proposed metric show improvements over the state of the art in terms of correlation with human judgements, and demonstrate its superior robustness to distractions.

U2 - 10.1007/978-3-030-01237-3_3

DO - 10.1007/978-3-030-01237-3_3

M3 - Conference paper

SN - 9783030012366

SP - 39

EP - 55

BT - ECCV

PB - Springer

ER -