TY - GEN
T1 - Improving Image-Based Localization with Deep Learning
T2 - 9th Pacific-Rim Symposium on Image and Video Technology, PSIVT 2019
AU - Ward, Isaac Ronald
AU - Jalwana, M. A. Asim K.
AU - Bennamoun, Mohammed
PY - 2020
Y1 - 2020
N2 - This work investigates the impact of the loss function on the performance of neural networks in the context of a monocular, RGB-only image localization task. A common technique used when regressing a camera’s pose from an image is to formulate the loss as a linear combination of positional and rotational mean squared error (using tuned hyperparameters as coefficients). In this work, we observe that changes to rotation and position mutually affect the captured image and that, in order to improve performance, a pose regression network’s loss function should include a term which combines the error of both of these coupled quantities. Based on task-specific observations and experimental tuning, we present such a loss term and create a new model by appending it to the loss function of the pre-existing pose regression network ‘PoseNet’. We achieve improvements in the localization accuracy of the network for indoor scenes, with reductions of up to 26.7% and 24.0% in the median positional and rotational error, respectively, when compared to the default PoseNet.
AB - This work investigates the impact of the loss function on the performance of neural networks in the context of a monocular, RGB-only image localization task. A common technique used when regressing a camera’s pose from an image is to formulate the loss as a linear combination of positional and rotational mean squared error (using tuned hyperparameters as coefficients). In this work, we observe that changes to rotation and position mutually affect the captured image and that, in order to improve performance, a pose regression network’s loss function should include a term which combines the error of both of these coupled quantities. Based on task-specific observations and experimental tuning, we present such a loss term and create a new model by appending it to the loss function of the pre-existing pose regression network ‘PoseNet’. We achieve improvements in the localization accuracy of the network for indoor scenes, with reductions of up to 26.7% and 24.0% in the median positional and rotational error, respectively, when compared to the default PoseNet.
UR - http://www.scopus.com/inward/record.url?scp=85080948473&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-39770-8_9
DO - 10.1007/978-3-030-39770-8_9
M3 - Conference paper
AN - SCOPUS:85080948473
SN - 9783030397692
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 111
EP - 124
BT - Image and Video Technology - PSIVT 2019 International Workshops, Revised Selected Papers
A2 - Dabrowski, Joel Janek
A2 - Rahman, Ashfaqur
A2 - Paul, Manoranjan
PB - Springer
Y2 - 18 November 2019 through 22 November 2019
ER -