We introduce a new photometric estimator of the H I mass fraction in local galaxies, which is a linear combination of four parameters: stellar mass, stellar surface mass density, NUV-r colour and g-i colour gradient. It is calibrated using samples of nearby galaxies (0.025 < z < 0.05) with H I line detections from the GALEX Arecibo SDSS Survey (GASS) and Arecibo Legacy Fast ALFA (ALFALFA) surveys, and it is demonstrated to provide unbiased estimates even for H I-rich galaxies. We apply this estimator to a sample of ∼24 000 galaxies from the Sloan Digital Sky Survey (SDSS)/Data Release 7 (DR7) in the same redshift range. We then bin these galaxies by stellar mass and H I mass fraction and compute projected two-point cross-correlation functions with respect to a reference galaxy sample. Results are compared with predictions from current semi-analytic models of galaxy formation. The agreement is good for galaxies with stellar masses larger than 10^10 M⊙, but not for lower mass systems. We then extend the analysis by studying the bias in the clustering of H I-poor or H I-rich galaxies with respect to galaxies with normal H I content on scales between 100 kpc and ∼5 Mpc. For the H I-deficient population, the strongest bias effects arise when the H I deficiency is defined in comparison to galaxies of the same stellar mass and size. This is not reproduced by the semi-analytic models, where the quenching of star formation in satellites occurs by 'starvation' and does not depend on their internal structure. H I-rich galaxies with masses greater than 10^10 M⊙ are found to be antibiased compared to galaxies with 'normal' H I content. Interestingly, no such effect is found for lower mass galaxies. © 2012 The Authors. Monthly Notices of the Royal Astronomical Society © 2012 RAS.
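
As an illustration of the form of the estimator described above, the following is a minimal sketch of a linear combination of the four photometric parameters. It assumes that the estimated quantity is log10(M_HI/M_*) and that stellar mass and stellar surface mass density enter logarithmically; neither assumption is stated in the abstract. The class name, field names and method names are hypothetical, and no coefficient values are given here: the calibrated values must be taken from the fit to the GASS and ALFALFA samples.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinearHIEstimator:
    """Hypothetical container for a linear estimator of log10(M_HI / M_*).

    All coefficients are supplied by the caller. None of the values
    are taken from the paper; this only illustrates the functional form.
    """
    c_mass: float      # multiplies log10 of stellar mass (assumed log form)
    c_density: float   # multiplies log10 of stellar surface mass density
    c_colour: float    # multiplies the NUV-r colour
    c_gradient: float  # multiplies the g-i colour gradient
    intercept: float   # constant term of the fit

    def log_hi_fraction(self, log_mstar, log_mu_star, nuv_r, delta_gi):
        """Return the estimated log10(M_HI / M_*) for one galaxy."""
        return (self.c_mass * log_mstar
                + self.c_density * log_mu_star
                + self.c_colour * nuv_r
                + self.c_gradient * delta_gi
                + self.intercept)

    def log_hi_mass(self, log_mstar, log_mu_star, nuv_r, delta_gi):
        """Return the estimated log10(M_HI), i.e. log M_* plus the log fraction."""
        return log_mstar + self.log_hi_fraction(
            log_mstar, log_mu_star, nuv_r, delta_gi)
```

Keeping the coefficients as explicit inputs makes it straightforward to swap in a different calibration, for example one fitted separately for H I-rich galaxies, without changing the code that bins galaxies by estimated H I mass fraction.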