© 2016 Elsevier B.V. Most existing image set classification methods use either appearance variations or temporal information to represent semantic knowledge (relationships) and subject appearance. Such methods usually rely on predetermined surface structures on which the image sets are assumed to lie, and/or are highly influenced by the temporal correlations between images within the image sets. In contrast, this paper introduces a novel model based on the restricted Boltzmann machine (RBM) that combines both appearance variations and temporal information within image sets to provide an automated and robust representation of semantic knowledge and subject appearance, even for small image sets with weak temporal correlations. The proposed model has two sets of hidden units, each encoding a different feature type. The first hidden set represents the dominant appearances (facial features) extracted from appearance variations, while the second represents the temporal information between different appearances. An extension of the standard Contrastive Divergence algorithm is proposed to train the model so that it learns the two feature types simultaneously while keeping them isolated from each other. The proposed model was evaluated on the task of face recognition using two datasets, UCSD/Honda and YouTube Celebrities. On both datasets, it outperformed state-of-the-art methods.
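To make the two-hidden-set structure concrete, the following is a minimal sketch of one step of standard Contrastive Divergence (CD-1) for a binary RBM whose hidden units are partitioned into two groups. It is an illustration only: the abstract does not specify the proposed extension, so the mechanism that isolates appearance features from temporal features is not reproduced here, and both groups receive the same plain CD-1 update. All names (`make_group`, `cd1_step`, the layer sizes, the learning rate) are hypothetical.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def make_group(n_visible, n_hidden, rng):
    # One hidden set: weights W[i][j] (visible i, hidden j) and hidden biases c.
    W = [[rng.gauss(0.0, 0.01) for _ in range(n_hidden)] for _ in range(n_visible)]
    return W, [0.0] * n_hidden


def hidden_probs(v, W, c):
    # P(h_j = 1 | v) for the units of one hidden set.
    return [sigmoid(c[j] + sum(v[i] * W[i][j] for i in range(len(v))))
            for j in range(len(c))]


def visible_probs(groups, samples, b):
    # P(v_i = 1 | h): every hidden set contributes to the visible input.
    probs = []
    for i in range(len(b)):
        total = b[i]
        for (W, _), h in zip(groups, samples):
            total += sum(W[i][j] * h[j] for j in range(len(h)))
        probs.append(sigmoid(total))
    return probs


def cd1_step(v0, groups, b, lr, rng):
    # Plain CD-1 (assumption: NOT the paper's extended algorithm).
    # Positive phase: hidden probabilities and binary samples given the data.
    p0 = [hidden_probs(v0, W, c) for W, c in groups]
    h0 = [[1.0 if rng.random() < p else 0.0 for p in ps] for ps in p0]
    # Negative phase: reconstruct the visibles, then recompute hidden probabilities.
    v1 = visible_probs(groups, h0, b)
    p1 = [hidden_probs(v1, W, c) for W, c in groups]
    # Update each hidden set from the difference of data and reconstruction statistics.
    for (W, c), q0, q1 in zip(groups, p0, p1):
        for j in range(len(c)):
            for i in range(len(v0)):
                W[i][j] += lr * (v0[i] * q0[j] - v1[i] * q1[j])
            c[j] += lr * (q0[j] - q1[j])
    for i in range(len(b)):
        b[i] += lr * (v0[i] - v1[i])
    return v1


rng = random.Random(0)
# Hypothetical sizes: 6 visible units, 3 "appearance" and 2 "temporal" hidden units.
groups = [make_group(6, 3, rng), make_group(6, 2, rng)]
visible_bias = [0.0] * 6
frame = [1.0, 0.0, 1.0, 0.0, 1.0, 0.0]
for _ in range(200):
    reconstruction = cd1_step(frame, groups, visible_bias, 0.1, rng)
```

In this sketch the two hidden sets differ only in name; how the second set is driven by temporal relations between frames, and how the two are kept separate during learning, are exactly the parts the abstract attributes to the proposed extension.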