Randomised population-based algorithms, such as evolutionary, genetic and swarm-based algorithms, and their hybrids with traditional search techniques, have proven successful and robust on many difficult real-valued optimisation problems. This success, along with the readily applicable nature of these techniques, has led to an explosion in the number of algorithms and variants proposed. In order for the field to advance it is necessary to carry out effective comparative evaluations of these algorithms, and thereby better identify and understand those properties that lead to better performance.This paper discusses the difficulties of providing benchmarking of evolutionary and allied algorithms that is both meaningful and logistically viable. To be meaningful the benchmarking test must give a fair comparison that is free, as far as possible, from biases that favour one style of algorithm over another. To be logistically viable it must overcome the need for pairwise comparison between all the proposed algorithms. To address the first problem, we begin by attempting to identify the biases that are inherent in commonly used benchmarking functions. We then describe a suite of test problems, generated recursively as self-similar or fractal landscapes, designed to overcome these biases. For the second, we describe a server that uses web services to allow researchers to 'plug in' their algorithms, running on their local machines, to a central benchmarking repository.