Image Recognition
For the audio signals recorded in our experiments, we used a time window of appropriate width to capture continuous sound segments. The process of framing the audio signals into frames is illustrated in Fig. 2. Notably, the overlap ratio of the time windows varied across different groups of audio signals to prevent potential data imbalance [45] in our dataset.

Figure 2. The process of framing audio signals into several sound segments with 50% (half) overlap.

Before being fed into the CNN model, these images were resized to 256 × 256 pixels to simplify computation.
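To make the framing step concrete, the following is a minimal sketch of splitting a one-dimensional signal into fixed-length, overlapping frames. The frame length, the overlap values, and the function names `frame_signal` and `num_frames` are hypothetical and chosen for illustration only; the window width and overlap ratios actually used in our experiments are not reproduced here. An overlap of 0.5 corresponds to the half overlap shown in Figure 2.

```python
from typing import List, Sequence


def _hop_size(frame_len: int, overlap: float) -> int:
    """Number of samples the window advances between consecutive frames."""
    if frame_len <= 0:
        raise ValueError("frame_len must be positive")
    if not 0.0 <= overlap < 1.0:
        raise ValueError("overlap must be in [0, 1)")
    return max(1, int(round(frame_len * (1.0 - overlap))))


def frame_signal(signal: Sequence[float], frame_len: int, overlap: float) -> List[List[float]]:
    """Split a 1-D signal into frames of frame_len samples.

    `overlap` is the fraction of each frame shared with the next one
    (hypothetical parameterization). Trailing samples that do not fill
    a whole frame are dropped.
    """
    hop = _hop_size(frame_len, overlap)
    frames = []
    start = 0
    while start + frame_len <= len(signal):
        frames.append(list(signal[start:start + frame_len]))
        start += hop
    return frames


def num_frames(n_samples: int, frame_len: int, overlap: float) -> int:
    """Closed-form frame count: 1 + floor((N - L) / hop) for N >= L, else 0."""
    hop = _hop_size(frame_len, overlap)
    if n_samples < frame_len:
        return 0
    return 1 + (n_samples - frame_len) // hop


if __name__ == "__main__":
    signal = list(range(10))
    # Half overlap (as in Figure 2): hop of 2 samples for 4-sample frames.
    print(frame_signal(signal, frame_len=4, overlap=0.5))
    # More overlap yields more segments from the same recording.
    print(num_frames(len(signal), 4, 0.0), num_frames(len(signal), 4, 0.5), num_frames(len(signal), 4, 0.75))
```

With a 10-sample toy signal and 4-sample frames, no overlap yields 2 frames, half overlap yields 4, and 75% overlap yields 7. This illustrates one plausible reading of why the overlap ratio was varied across groups: a larger overlap extracts more segments from the same recording, so groups with fewer recordings can be given a higher overlap to narrow the gap in sample counts.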