Audio Signal Processing

There are two general approaches to audio processing with deep learning:

  1. Turn the audio file into an image, typically a log-scaled mel spectrogram.
  2. Process the raw waveform directly in streaming form, i.e. the sequence of audio samples, usually decoded from the file's binary (e.g. PCM) data.
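For the second approach, a minimal sketch using only Python's standard library, assuming a 16-bit mono PCM WAV file. The `wave` module reads raw frames in fixed-size blocks, and `struct` decodes each block's bytes into floats in [-1.0, 1.0). The function name `stream_wav_chunks` and the default chunk size are hypothetical, illustrative choices.

```python
import struct
import wave
from typing import Iterator, List


def stream_wav_chunks(source, frames_per_chunk: int = 1024) -> Iterator[List[float]]:
    """Yield successive chunks of a WAV file as floats in [-1.0, 1.0).

    Hypothetical helper. Assumes 16-bit signed little-endian PCM, mono.
    `source` may be a file path or a binary file-like object.
    """
    with wave.open(source, "rb") as wav:
        if wav.getsampwidth() != 2 or wav.getnchannels() != 1:
            raise ValueError("this sketch only handles 16-bit mono PCM")
        while True:
            raw = wav.readframes(frames_per_chunk)  # raw binary bytes
            if not raw:
                break
            n = len(raw) // 2  # two bytes per 16-bit sample
            ints = struct.unpack(f"<{n}h", raw)
            # Scale signed 16-bit integers to floats in [-1.0, 1.0)
            yield [x / 32768.0 for x in ints]
```

Each chunk can be passed to a model as soon as it is read, e.g. `for chunk in stream_wav_chunks("speech.wav"): ...` (file name hypothetical), so the whole recording never has to be held in memory.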

Convert Audio File into Image

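Typically we convert the audio to a log-scaled mel spectrogram: the signal is split into short overlapping frames, the power spectrum of each frame is computed with a Fourier transform, the frequency axis is mapped onto the mel scale with a bank of triangular filters, and the resulting energies are converted to a logarithmic (decibel) scale.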
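A minimal NumPy-only sketch of that pipeline. Libraries such as librosa provide it ready-made (`librosa.feature.melspectrogram` followed by `librosa.power_to_db`); the version below spells out each step instead. The frame size, hop length, number of mel bands, and the HTK mel formula `2595 * log10(1 + f / 700)` are assumptions, and the helper names are hypothetical.

```python
import numpy as np


def hz_to_mel(f):
    # HTK-style mel scale (assumption; librosa defaults to the Slaney variant)
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(sr, n_fft, n_mels):
    """Triangular filters mapping n_fft // 2 + 1 FFT bins onto n_mels mel bands."""
    n_bins = n_fft // 2 + 1
    fft_freqs = np.linspace(0.0, sr / 2.0, n_bins)
    # n_mels + 2 points equally spaced on the mel scale, converted back to Hz
    hz_points = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    fb = np.zeros((n_mels, n_bins))
    for i in range(n_mels):
        left, center, right = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb


def log_mel_spectrogram(signal, sr, n_fft=1024, hop=256, n_mels=64, eps=1e-10):
    """Return an (n_mels, n_frames) array of log-scaled (dB) mel energies."""
    signal = np.asarray(signal, dtype=float)
    if len(signal) < n_fft:  # pad so short signals still give one frame
        signal = np.pad(signal, (0, n_fft - len(signal)))
    n_frames = 1 + (len(signal) - n_fft) // hop
    window = np.hanning(n_fft)
    # Overlapping, windowed frames: shape (n_frames, n_fft)
    frames = np.stack([signal[t * hop:t * hop + n_fft] * window for t in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2    # (n_frames, n_fft // 2 + 1)
    mel = power @ mel_filterbank(sr, n_fft, n_mels).T   # (n_frames, n_mels)
    return 10.0 * np.log10(mel.T + eps)                 # (n_mels, n_frames), in dB


if __name__ == "__main__":
    sr = 16000
    t = np.arange(sr) / sr
    tone = np.sin(2 * np.pi * 440.0 * t)  # one second of a 440 Hz sine
    print(log_mel_spectrogram(tone, sr).shape)  # (64, 59)
```

The output is a 2D array with mel bands on one axis and time frames on the other, which is what gets rendered (or fed to a model) as an image.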

This creates an image that looks like this:

Figure: Log-mel spectrogram of audio data.

Treating the spectrogram as an image lets us apply any neural network architecture designed for images, most commonly a Convolutional Neural Network (CNN). That is the purpose of the transformation.
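To show what "treating the spectrogram as an image" means for a CNN, here is a minimal NumPy sketch of one convolutional layer applied to a single-channel (mel band × time frame) array: slide small kernels over the grid, apply a ReLU, and global-max-pool each feature map into one number. The helper names are hypothetical, the kernel weights are random placeholders (a real CNN learns them), and in practice a framework such as PyTorch or TensorFlow would replace these hand-written loops.

```python
import numpy as np


def conv2d_valid(image, kernel):
    """Single-channel 2D cross-correlation with no padding.

    This is the operation CNN "convolution" layers compute.
    """
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


def tiny_cnn_features(spec, kernels):
    """One conv layer + ReLU + global max pool over a 2D spectrogram 'image'."""
    feats = []
    for k in kernels:
        activation = np.maximum(conv2d_valid(spec, k), 0.0)  # ReLU
        feats.append(activation.max())                       # global max pool
    return np.array(feats)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    spec = rng.normal(size=(64, 59))     # stand-in for a real log-mel spectrogram
    kernels = rng.normal(size=(8, 3, 3))  # 8 random 3x3 kernels (placeholders)
    print(tiny_cnn_features(spec, kernels).shape)  # (8,)
```

Because each kernel spans a few neighbouring mel bands and frames, it responds to local time-frequency patterns (onsets, harmonics, formant shapes) the same way an image kernel responds to local edges and textures.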

Use Streaming Form of Data