This dataset contains a collection of spectral centroid images derived from audio recordings of human actions such as walking, running, jumping, and dancing. The spectral centroid of an audio frame is the magnitude-weighted mean of its frequency components; computed over successive short-time Fourier transform (STFT) frames, it traces how the spectral center of mass of the signal evolves over time. Each image in the dataset visualizes the spectral centroid trajectory for one segment of an audio signal.

The dataset is intended for tasks such as human action recognition, classification, segmentation, and detection, and can be used to train and evaluate machine learning models that analyze human actions from audio. Each spectral centroid image is annotated with a label indicating the action it represents. The dataset is aimed at researchers and practitioners in signal processing, computer vision, and machine learning who are developing algorithms for audio-based human action analysis.
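As a rough illustration of the underlying feature, the sketch below computes per-frame spectral centroids with NumPy. The frame length, hop size, and window are illustrative assumptions; the dataset description does not state the STFT parameters actually used to generate the images.

```python
import numpy as np

def spectral_centroid(signal, sr, frame_len=1024, hop=512):
    """Per-frame spectral centroid: the magnitude-weighted mean frequency.

    frame_len and hop are assumed values, not the dataset's actual settings.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sr)  # Hz per FFT bin
    centroids = np.empty(n_frames)
    for i in range(n_frames):
        frame = signal[i * hop : i * hop + frame_len] * window
        mag = np.abs(np.fft.rfft(frame))
        # Weighted mean of bin frequencies; epsilon guards silent frames.
        centroids[i] = (freqs * mag).sum() / (mag.sum() + 1e-10)
    return centroids

# Sanity check: a pure 440 Hz tone should have a centroid near 440 Hz.
sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440.0 * t)
c = spectral_centroid(tone, sr)
```

Plotting `c` against frame index (or rendering many such trajectories as images) would give a representation comparable in spirit to the images in this dataset.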