Cover song identification (CSI) systems typically represent songs as chromagrams, which are compared pairwise using different similarity measures. Chromagram comparisons are usually computationally demanding, making most CSI systems unsuitable for real-world scenarios where millions of songs have to be processed. Evaluation mechanisms such as those proposed by the Music Information Retrieval Evaluation eXchange (MIREX) treat MIR systems as black boxes, measuring their accuracy as a whole and thus disregarding the effectiveness of each individual component of the system. Moreover, computation times are usually not taken into account in the evaluation process, making it difficult to assess the usefulness of these systems when facing large data sets. In this manuscript, we present a systematic evaluation of the components of two state-of-the-art CSI systems that performed well according to MIREX. Our experiments indicate that the performance of those two systems depends strongly on the combination of the song descriptor and the comparison method used to quantify the similarity between songs. Nevertheless, the high computational cost involved in comparing song descriptors renders both approaches infeasible for large data sets, motivating the development of alternative methods.
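
To illustrate why pairwise comparison scales poorly, the following is a minimal sketch and not the method of either evaluated system. It assumes each chromagram is a list of 12-bin frames, collapses it into a single averaged profile, and scores two songs by the best dot product over the 12 circular shifts, a deliberately simplified stand-in for a key-invariant similarity measure. All function names are hypothetical.

```python
from itertools import combinations

N_BINS = 12  # one bin per pitch class (assumed chromagram layout)


def global_profile(chromagram):
    """Average a chromagram (a list of 12-bin frames) into one 12-bin profile."""
    n_frames = len(chromagram)
    return [sum(frame[b] for frame in chromagram) / n_frames for b in range(N_BINS)]


def best_shift_similarity(p, q):
    """Largest dot product of p with any circular shift of q.

    Trying all 12 shifts makes the score insensitive to transposition.
    """
    return max(
        sum(p[b] * q[(b + shift) % N_BINS] for b in range(N_BINS))
        for shift in range(N_BINS)
    )


def pairwise_similarities(chromagrams):
    """Score every unordered pair of songs; the number of pairs grows quadratically."""
    profiles = [global_profile(c) for c in chromagrams]
    return {
        (i, j): best_shift_similarity(profiles[i], profiles[j])
        for i, j in combinations(range(len(profiles)), 2)
    }


def num_comparisons(n_songs):
    """Number of unordered song pairs in a collection of n_songs."""
    return n_songs * (n_songs - 1) // 2
```

Under these assumptions, a collection of one million songs would require `num_comparisons(1_000_000)` pairwise comparisons, that is, roughly 5 × 10^11, even before accounting for the cost of each individual comparison, which is far higher for the frame-level measures used in practice than for the averaged profiles in this sketch.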