Grid computing refers to the combination of computers or clusters of computers across networks, such as the internet, to form a distributed supercomputer. This infrastructure allows scientists to run complex and time-consuming computations in parallel and on demand. Phylogenetic inference for large data sets of DNA or protein sequences is known to be computationally intensive and could greatly benefit from this parallel supercomputing approach. Bayesian algorithms allow the estimation of important parameters, such as the mode and timing of species divergence, but at the price of running long, repeated series of Monte Carlo simulations. As part of the BioinfoGrid project, we ported parallel MrBayes to the EGEE (Enabling Grids for E-sciencE) grid infrastructure. As a case study, we investigate both a challenging data set of arthropod phylogeny and the most appropriate model of amino acid replacement for that data set. Our aim is to resolve the position of basal hexapod lineages with respect to Insecta and Crustacea. In this effort, a new matrix of protein change was derived from the data set itself, and its performance was compared with that of other currently used models.
|Title of host publication||Bioinformatics Research and Development: 2nd International Conference on Bioinformatics Research and Development|
|Editors||M. Elloumi, J. Küng, M. Linial, R. Murphy, K. Schneider, C. Toma|
|Place of Publication||USA|
|Publication status||Published - 2008|