RGB-D Object Recognition and Grasp Detection Using Hierarchical Cascaded Forests

    Research output: Contribution to journal › Article

    31 Citations (Scopus)
    530 Downloads (Pure)

    Abstract

    This paper presents an efficient framework to perform recognition and grasp detection of objects from RGB-D images of real scenes. The framework uses a novel architecture of hierarchical cascaded forests, in which object-class and grasp-pose probabilities are computed at different levels of an image hierarchy (e.g., patch and object levels) and fused to infer the class and the grasp of unseen objects. We introduce a novel training objective function that minimizes the uncertainties of the class labels and the grasp ground truths at the leaves of the forests, thereby enabling the framework to perform both tasks jointly. Our objective function is learned from features extracted from RGB-D point clouds of the objects. To this end, we propose a novel method to encode an RGB-D point cloud into a representation that facilitates the use of large convolutional neural networks to extract discriminative features from RGB-D images. We evaluate our framework on challenging object datasets and demonstrate that it outperforms the state-of-the-art methods in terms of object-recognition and grasp-detection accuracies. We also present experiments using live video streams from a Kinect sensor mounted on our in-house robotic platform.
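    Although this is a bibliographic record rather than the paper itself, the abstract names two concrete mechanisms that a short sketch can make tangible: (1) a training objective that jointly minimizes the uncertainty of the class labels and the grasp ground truths at the forest leaves, and (2) the fusion of class posteriors computed at different levels of the image hierarchy (patch and object). The sketch below is a minimal illustration under assumed details, not the authors' implementation: the entropy-plus-variance form of the objective, the mixing weights alpha and w, the weighted-average fusion rule, and the names split_uncertainty and fuse_levels are all hypothetical.

    import numpy as np

    def split_uncertainty(class_labels, grasp_poses, alpha=0.5):
        """Joint uncertainty at a candidate split node: class-label entropy
        combined with grasp-pose variance (alpha is an assumed weight)."""
        _, counts = np.unique(class_labels, return_counts=True)
        p = counts / counts.sum()
        class_entropy = -np.sum(p * np.log(p + 1e-12))
        grasp_variance = np.mean(np.var(grasp_poses, axis=0))
        return alpha * class_entropy + (1.0 - alpha) * grasp_variance

    def fuse_levels(patch_probs, object_probs, w=0.5):
        """Fuse class posteriors from two hierarchy levels; a weighted
        average is assumed here, and the paper's fusion rule may differ."""
        fused = w * np.asarray(patch_probs) + (1.0 - w) * np.asarray(object_probs)
        return fused / fused.sum()

    # Toy usage: 3 object classes, 6 samples at a node; a grasp pose is
    # represented here as a 7-vector (position + quaternion), an assumption.
    labels = np.array([0, 0, 1, 2, 2, 2])
    poses = np.random.rand(6, 7)
    print(split_uncertainty(labels, poses))

    p_patch = np.array([0.2, 0.5, 0.3])    # patch-level class posterior
    p_object = np.array([0.1, 0.7, 0.2])   # object-level class posterior
    print(fuse_levels(p_patch, p_object))  # fused posterior over classes

    Under these assumptions, a greedy forest trainer would pick, at each node, the split whose children minimize the summed split_uncertainty, which is how a single tree can serve both recognition and grasp detection.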
    Original language: English
    Pages (from-to): 547-564
    Journal: IEEE Transactions on Robotics
    Volume: 33
    Issue number: 3
    DOI: 10.1109/TRO.2016.2638453
    Publication status: Published - Jun 2017


    Cite this

    @article{ecce08066dd644d9833a09307f6c8987,
        title = "RGB-D Object Recognition and Grasp Detection Using Hierarchical Cascaded Forests",
        author = "Umar Asif and Mohammed Bennamoun and Ferdous Sohel",
        year = "2017",
        month = "6",
        doi = "10.1109/TRO.2016.2638453",
        language = "English",
        volume = "33",
        pages = "547--564",
        journal = "IEEE Transactions on Robotics",
        issn = "1552-3098",
        publisher = "IEEE, Institute of Electrical and Electronics Engineers",
        number = "3",
    }
