Objective: This study aimed to develop (1) a new ultrasound definition for aggregates and (2) a semi-quantitative ultrasound scoring system (0–3) for tophus, double contour and aggregates. In addition, the intra- and inter-reader reliabilities of both the re-defined aggregates and the semi-quantitative scoring system were assessed using static image exercises. Methods: Thirty-seven rheumatologists were invited to participate. A Delphi process was used to re-define aggregates and to select a semi-quantitative scoring system, with >75% agreement required to reach consensus. Subsequently, a web-based exercise on static ultrasound images was conducted to assess the reliability of both the re-defined aggregates and the semi-quantitative scoring system. Results: Twenty rheumatologists contributed to all rounds of the Delphi process and the image exercises. A consensus re-definition of aggregates was reached after three Delphi rounds, but it required an overarching principle for scoring aggregates in patients. A consensus-based semi-quantitative ultrasound scoring system for gout lesions was developed after two Delphi rounds. The re-definition of aggregates showed good intra- and inter-reader reliability (κ-values 0.71 and 0.61, respectively). The reliabilities of the scoring system were good for all lesions, with slightly higher intra-reader (κ-values 0.74–0.80) than inter-reader reliabilities (κ-values 0.61–0.67). Conclusion: A re-definition of aggregates was obtained with good reliability when assessed on static images. The first consensus-based semi-quantitative ultrasound scoring system for gout-specific lesions was developed, with good inter- and intra-reader reliability for all lesions when tested on static images. The next step is to assess the reliabilities when scoring lesions in patients.
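
For readers unfamiliar with the two quantities reported above, the following is a minimal illustrative sketch, not the study's analysis code. It assumes unweighted Cohen's kappa for a single pair of readers; the abstract does not state which kappa variant (unweighted, weighted, or a multi-reader form) was used. The function names `cohen_kappa` and `reaches_consensus` and the example scores are hypothetical.

```python
from collections import Counter


def cohen_kappa(reader_a, reader_b):
    """Unweighted Cohen's kappa for two readers scoring the same images.

    Assumption: the study may have used a different kappa variant;
    this only illustrates chance-corrected agreement in general.
    """
    if len(reader_a) != len(reader_b) or not reader_a:
        raise ValueError("score lists must be non-empty and of equal length")
    n = len(reader_a)
    # Proportion of images on which both readers gave the same score.
    observed = sum(a == b for a, b in zip(reader_a, reader_b)) / n
    # Agreement expected by chance, from each reader's score frequencies.
    counts_a = Counter(reader_a)
    counts_b = Counter(reader_b)
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when both readers use one identical score")
    return (observed - expected) / (1 - expected)


def reaches_consensus(votes_for, votes_total, threshold=0.75):
    """Delphi rule from the abstract: agreement must exceed 75%."""
    return votes_for / votes_total > threshold


if __name__ == "__main__":
    # Hypothetical 0-3 scores from two readers on ten static images.
    scores_a = [0, 1, 2, 3, 3, 2, 1, 0, 2, 1]
    scores_b = [0, 1, 2, 3, 2, 2, 1, 0, 3, 1]
    print(f"kappa = {cohen_kappa(scores_a, scores_b):.2f}")
    print("consensus:", reaches_consensus(16, 20))
```

Because the rule is a strict ">75%", exactly 15 of 20 participants agreeing would not count as consensus under this sketch, whereas 16 of 20 would.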