Video-based face, expression, and scene recognition are fundamental problems in human-machine interaction, especially when only short video clips are available. In this paper, we present a new derivative sparse representation approach for face and texture recognition from short videos. The method first builds local linear subspaces of dynamic texture segments by computing spatiotemporal directional derivatives in a cylindrical neighborhood within dynamic textures. Unlike traditional methods, we propose a nonbinary texture coding technique that extracts high-order derivatives over continuous circular and cylindrical regions to avoid aliasing effects. These local linear subspaces of texture segments are then mapped onto a Grassmann manifold via sparse representation. A new joint sparse representation algorithm is developed to establish correspondences between subspace points on the manifold, and these correspondences are used to measure the similarity between two dynamic textures. Extensive experiments on the Honda/UCSD, CMU Motion of Body, YouTube, and DynTex datasets show that the proposed method consistently outperforms state-of-the-art methods in dynamic texture recognition and achieves the highest accuracy reported to date on the challenging YouTube face dataset. These results demonstrate the effectiveness of the proposed method for video-based face recognition in human-machine system applications.
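To make the notion of comparing "subspace points on a Grassmann manifold" concrete, the sketch below computes the standard projection distance between two linear subspaces of equal dimension, given spanning vectors for each. This is a generic illustration under stated assumptions, not the joint sparse representation matching proposed here: the function names `orthonormalize` and `projection_distance` are hypothetical, and the projection metric is swapped in as a simple, well-known Grassmann distance. For orthonormal bases $U_1, U_2$ of dimension $k$, it uses $d^2 = k - \|U_1^\top U_2\|_F^2 = \sum_i \sin^2\theta_i$, where $\theta_i$ are the principal angles between the subspaces.

```python
import math
from typing import List

Vector = List[float]


def orthonormalize(vectors: List[Vector]) -> List[Vector]:
    """Modified Gram-Schmidt: return an orthonormal basis for span(vectors).

    Near-zero residuals (linearly dependent inputs) are dropped.
    """
    basis: List[Vector] = []
    for v in vectors:
        w = list(v)
        for b in basis:
            # Remove the component of w along the already-accepted basis vector b.
            proj = sum(wi * bi for wi, bi in zip(w, b))
            w = [wi - proj * bi for wi, bi in zip(w, b)]
        norm = math.sqrt(sum(wi * wi for wi in w))
        if norm > 1e-12:
            basis.append([wi / norm for wi in w])
    return basis


def projection_distance(a: List[Vector], b: List[Vector]) -> float:
    """Projection distance between span(a) and span(b) on a Grassmann manifold.

    Assumption: both subspaces have the same dimension k. With orthonormal
    bases U1, U2, d^2 = k - ||U1^T U2||_F^2, i.e. the sum of squared sines
    of the principal angles. A distance of 0 means the subspaces coincide.
    """
    u1 = orthonormalize(a)
    u2 = orthonormalize(b)
    if len(u1) != len(u2):
        raise ValueError("subspaces must have the same dimension")
    k = len(u1)
    # Squared Frobenius norm of U1^T U2: sum of squared pairwise inner products.
    overlap = sum(
        sum(x * y for x, y in zip(p, q)) ** 2 for p in u1 for q in u2
    )
    # Clamp tiny negative values caused by floating-point round-off.
    return math.sqrt(max(k - overlap, 0.0))


if __name__ == "__main__":
    xy_plane = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
    xy_other_basis = [[1.0, 1.0, 0.0], [1.0, -1.0, 0.0]]  # same plane, different basis
    xz_plane = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
    print(projection_distance(xy_plane, xy_other_basis))
    print(projection_distance(xy_plane, xz_plane))
```

Because the distance depends only on the spanned subspaces and not on the particular basis vectors, two different bases of the same plane give a distance of (numerically) zero, while the xy- and xz-planes, which share one direction and are orthogonal in the other, give a distance of 1. A similarity score could then be derived from this distance, for example through a decreasing kernel, but that choice is likewise an assumption rather than part of the method described above.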