This thesis focuses on developing and evaluating captioning models that communicate their interpretation of the visual world in natural language. The key research gaps it addresses include the under-exploitation of the language space, the inadequate handling of rare words, and the dearth of systematic research into captioning-specific evaluation metrics. To address these gaps, the thesis proposes linguistically-aware features to improve visual interpretation, sub-word language modelling to tackle out-of-vocabulary words, and a novel framework for soft-candidate-based image captioning. It also presents state-of-the-art deterministic and learning-based evaluation metrics that capture the quality of captions at various linguistic levels.
|Qualification|Doctor of Philosophy|
|Award date|7 Apr 2021|
|Publication status|Unpublished - 2020|