Over the past five decades, several approaches for estimating probabilities of extreme still water levels have been developed. Currently, different methods are applied not only on transnational, but also on national scales, resulting in heterogeneous levels of protection. Applying different statistical methods can yield significantly different estimates of return water levels, but even the same technique can produce large discrepancies, because several steps in the model setup involve subjective parameter choices. In this paper, we compare probabilities of extreme still water levels estimated using the main direct methods (i.e. the block maxima method and the peaks over threshold method), considering a wide range of strategies for creating extreme value data sets and a range of different model setups. We primarily use tide gauge records from the German Bight but also consider data from sites around the UK and Australia for comparison. The focus is on testing the influence of three main factors that can affect the estimates of extreme value statistics: (1) detrending the original data sets; (2) building samples of extreme values from the original data sets; and (3) the record lengths of the original data sets. We find that using different detrending techniques biases the results from extreme value statistics. Hence, we recommend using a 1-year moving average of high waters (or hourly records if these are available) to correct the original data sets for seasonal and long-term sea level changes. Our results highlight that the peaks over threshold method yields more reliable and more stable estimates of probabilities of extreme still water levels than the block maxima method, where stable means that short records lead to the same results as long records. In analysing a variety of threshold selection methods, we find that using the 99.7th percentile water level leads to the most stable return water level estimates along the German Bight. This also holds for the international stations considered. Finally, to provide guidance for coastal engineers and operators, we recommend the peaks over threshold method and define an objective approach for setting up the model. If this is applied routinely around a country, it will help overcome the problem of heterogeneous levels of protection resulting from different methods and varying model setups.
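To make the recommended workflow concrete, the following is a minimal sketch of a peaks over threshold analysis: detrending with a 1-year moving average, a threshold at the 99.7th percentile, and return levels from a generalized Pareto distribution. It is an illustration under stated assumptions, not the model setup defined in the paper. In particular, the function names, the declustering gap of three high waters, the value of 706 high waters per year, and the method-of-moments parameter estimator (swapped in here to keep the example dependency-free) are all assumptions.

```python
import math
import random
import statistics


def detrend(levels, window):
    """Subtract a centred moving average (window in samples).

    The window shrinks near the ends of the record (an assumption).
    """
    n = len(levels)
    prefix = [0.0]
    for x in levels:
        prefix.append(prefix[-1] + x)
    half = window // 2
    out = []
    for i in range(n):
        a, b = max(0, i - half), min(n, i + half + 1)
        out.append(levels[i] - (prefix[b] - prefix[a]) / (b - a))
    return out


def percentile(values, q):
    """Percentile with linear interpolation, q in [0, 100]."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def decluster(levels, threshold, min_gap):
    """Keep the maximum of each cluster of exceedances.

    Exceedances at most min_gap samples apart count as one event.
    """
    peaks, last_idx = [], None
    for i, x in enumerate(levels):
        if x <= threshold:
            continue
        if last_idx is not None and i - last_idx <= min_gap:
            peaks[-1] = max(peaks[-1], x)
        else:
            peaks.append(x)
        last_idx = i
    return peaks


def fit_gpd_moments(excesses):
    """Method-of-moments GPD fit (assumed estimator); returns (shape, scale)."""
    m = statistics.mean(excesses)
    v = statistics.variance(excesses)
    ratio = m * m / v
    return 0.5 * (1.0 - ratio), 0.5 * m * (ratio + 1.0)


def pot_return_levels(levels, samples_per_year, return_periods,
                      q=99.7, min_gap=3):
    """Return the threshold and return levels of the detrended series."""
    residual = detrend(levels, samples_per_year)  # 1-year moving average
    u = percentile(residual, q)
    peaks = decluster(residual, u, min_gap)
    shape, scale = fit_gpd_moments([p - u for p in peaks])
    rate = len(peaks) / (len(levels) / samples_per_year)  # events per year
    result = {}
    for t in return_periods:
        if abs(shape) < 1e-9:  # exponential limit of the GPD
            result[t] = u + scale * math.log(rate * t)
        else:
            result[t] = u + scale / shape * ((rate * t) ** shape - 1.0)
    return {"threshold": u, "levels": result}


if __name__ == "__main__":
    # Synthetic high waters: 30 years, linear trend plus Gaussian noise.
    random.seed(0)
    per_year = 706  # roughly two high waters per lunar day (assumption)
    data = [0.002 * i / per_year + random.gauss(0.0, 0.5)
            for i in range(per_year * 30)]
    print(pot_return_levels(data, per_year, [10, 50, 100]))
```

The return levels are relative to the detrended series, so a reference mean sea level would have to be added back to obtain absolute heights. For real tide gauge records, a maximum likelihood fit and a physically motivated declustering window would normally replace the simplifications used here.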