This thesis proposes novel deep learning solutions for real-world applications, including action recognition and action prediction from videos, and person re-identification. The first part leverages Convolutional Neural Networks (CNNs) to learn deep spatial-temporal information from skeleton sequences for action recognition. The second part presents three new methods based on CNNs and Recurrent Neural Networks (RNNs) for action prediction. The last part presents a new CNN-based feature extraction method for improved person re-identification. Extensive experiments show that the proposed methods outperform state-of-the-art methods for action recognition, action prediction and person re-identification.