Preprocessing audio files with Python
Import the basic libraries
[Python]
import matplotlib as mpl
import matplotlib.pyplot as plt
import numpy as np
import sklearn
import pandas as pd
import os
import sys
import time
import tensorflow as tf
from tensorflow import keras
import pathlib
import seaborn as sns
from IPython import display
from scipy.signal import wiener
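Read the audio signal from an audio file
The current directory contains an audio file, example.wav, which serves as the example here. Because most scenarios require processing data in batches, we use tf.data.Dataset for the preprocessing. AUTOTUNE in the code only serves to speed things up and can be ignored.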
[Python]
filenames = ['./example.wav']
filename_dataset = tf.data.Dataset.from_tensor_slices(filenames)
for filename in filename_dataset.take(1):
    print(filename.numpy())

def get_audio_from_filename(filename):
    # Read the raw bytes and decode them as a 16-bit PCM WAV file.
    audio_binary = tf.io.read_file(filename)
    audio, _ = tf.audio.decode_wav(audio_binary)
    # decode_wav returns [samples, channels]; dropping the last axis
    # assumes a mono file.
    return tf.squeeze(audio, axis=-1)

AUTOTUNE = tf.data.AUTOTUNE
waveform_dataset = filename_dataset.map(get_audio_from_filename, num_parallel_calls=AUTOTUNE)
for waveform in waveform_dataset.take(1):
    print(waveform.shape)
Output:
b'./example.wav'
(11146,)
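To make the decoding step concrete without TensorFlow, here is a minimal sketch that uses only the standard-library wave module and NumPy. It writes a short synthetic mono tone, reads it back, and scales the 16-bit samples to floats in [-1, 1), the range that tf.audio.decode_wav is documented to produce. The file name tone.wav, the 440 Hz tone, the helper names and the division by 32768 are assumptions made for illustration and are not part of the original post.

```python
import os
import tempfile
import wave

import numpy as np


def write_mono_wav(path, samples, sample_rate=16000):
    """Write float samples in [-1, 1] as a 16-bit mono PCM WAV file."""
    pcm = (np.clip(samples, -1.0, 1.0) * 32767).astype('<i2')
    with wave.open(path, 'wb') as wf:
        wf.setnchannels(1)        # mono
        wf.setsampwidth(2)        # 2 bytes = 16 bits per sample
        wf.setframerate(sample_rate)
        wf.writeframes(pcm.tobytes())


def read_mono_wav(path):
    """Return (float32 waveform scaled to [-1, 1), sample_rate)."""
    with wave.open(path, 'rb') as wf:
        if wf.getnchannels() != 1 or wf.getsampwidth() != 2:
            raise ValueError('expected 16-bit mono PCM')
        sample_rate = wf.getframerate()
        raw = wf.readframes(wf.getnframes())
    pcm = np.frombuffer(raw, dtype='<i2')
    # Assumed scaling: divide by 2**15 so that samples land in [-1, 1).
    return pcm.astype(np.float32) / 32768.0, sample_rate


sample_rate = 16000
t = np.arange(sample_rate // 2) / sample_rate      # half a second
tone = 0.5 * np.sin(2 * np.pi * 440.0 * t)         # hypothetical test tone
wav_path = os.path.join(tempfile.mkdtemp(), 'tone.wav')
write_mono_wav(wav_path, tone, sample_rate)

waveform_np, rate = read_mono_wav(wav_path)
print(waveform_np.shape, rate)
```

The one-dimensional result plays the same role as the squeezed tensor returned by get_audio_from_filename.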
Play the audio from the loaded waveform
example.wav has a sampling rate of 16 kHz.
[Python]
def playAudio(waveform, sample_rate=16000):
    # Render an inline audio player (requires Jupyter/IPython).
    display.display(display.Audio(waveform, rate=sample_rate))

playAudio(waveform)
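The number of samples and the sampling rate together determine the clip's duration. As a quick check with the figures given above (11146 samples at 16 kHz), the clip is roughly 0.7 seconds long. The helper name duration_seconds is hypothetical.

```python
def duration_seconds(num_samples, sample_rate=16000):
    """Duration of a clip in seconds."""
    return num_samples / sample_rate


clip_duration = duration_seconds(11146)
print(f'{clip_duration:.3f} s')
```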
Extract features with the STFT to obtain a spectrogram
[Python]
def get_spectrogram(waveform, frame_length, frame_step):
    waveform = tf.cast(waveform, tf.float32)
    # Short-time Fourier transform: one row per frame, one column per frequency bin.
    spectrogram = tf.signal.stft(
        waveform, frame_length=frame_length, frame_step=frame_step)
    # Keep only the magnitude and discard the phase.
    spectrogram = tf.abs(spectrogram)
    return spectrogram

spectrogram = get_spectrogram(waveform, 256, 84)
print(spectrogram.shape)
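To see where the spectrogram's shape comes from, the following NumPy sketch imitates the framing, windowing and FFT steps. It assumes the defaults of tf.signal.stft as I understand them: no padding at the end of the signal, a periodic Hann window, and an FFT length equal to the smallest power of two that is at least frame_length. The function name stft_magnitude and the random test signal are hypothetical, and the sketch is not a drop-in replacement for the TensorFlow op.

```python
import numpy as np


def stft_magnitude(signal, frame_length, frame_step):
    """Magnitude STFT with shape [num_frames, fft_length // 2 + 1]."""
    signal = np.asarray(signal, dtype=np.float32)
    if len(signal) < frame_length:
        raise ValueError('signal is shorter than one frame')
    # Assumed default: smallest power of two >= frame_length.
    fft_length = 1 << (frame_length - 1).bit_length()
    # Without end padding, only frames that fit entirely are kept.
    num_frames = 1 + (len(signal) - frame_length) // frame_step
    # Periodic Hann window.
    n = np.arange(frame_length)
    window = 0.5 - 0.5 * np.cos(2 * np.pi * n / frame_length)
    frames = np.stack([
        signal[i * frame_step: i * frame_step + frame_length] * window
        for i in range(num_frames)
    ])
    # Real FFT of each windowed frame, zero-padded to fft_length.
    return np.abs(np.fft.rfft(frames, n=fft_length, axis=-1))


rng = np.random.default_rng(0)
dummy = rng.standard_normal(11146)   # same length as example.wav
spec = stft_magnitude(dummy, 256, 84)
print(spec.shape)  # (130, 129)
```

Under these assumptions, 11146 samples give 1 + (11146 - 256) // 84 = 130 frames, and an FFT of length 256 gives 256 // 2 + 1 = 129 frequency bins.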
Plot the waveform and the spectrogram
[Python]
def plot_waveform_and_spectrogram(waveform, spectrogram):
    fig, axes = plt.subplots(2, figsize=(12, 8))
    timescale = np.arange(waveform.shape[0])
    axes[0].plot(timescale, waveform.numpy())
    axes[0].set_title('Waveform')
    # Transpose so that time runs along x; the epsilon avoids log(0).
    log_spec = np.log(spectrogram.numpy().T + np.finfo(float).eps)
    height, width = log_spec.shape
    # One x coordinate (in samples) per STFT frame, so X always matches the
    # number of columns regardless of the frame settings.
    X = np.linspace(0, waveform.shape[0], num=width, dtype=int)
    Y = range(height)
    axes[1].pcolormesh(X, Y, log_spec)
    axes[1].set_title('Spectrogram')
    plt.show()

plot_waveform_and_spectrogram(waveform, spectrogram)
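The spectrogram plot is indexed by sample position and frequency-bin number. Converting these to seconds and hertz is simple arithmetic, sketched below. The helper names are hypothetical, and the values 130 frames and 129 bins assume the frame settings (256, 84) used above together with a power-of-two FFT length of 256.

```python
import numpy as np


def frame_times(num_frames, frame_step, sample_rate):
    """Start time in seconds of each STFT frame."""
    return np.arange(num_frames) * frame_step / sample_rate


def bin_frequencies(num_bins, sample_rate):
    """Frequency in Hz of each bin of a one-sided spectrum."""
    fft_length = 2 * (num_bins - 1)
    return np.arange(num_bins) * sample_rate / fft_length


times = frame_times(130, 84, 16000)
freqs = bin_frequencies(129, 16000)
print(times[-1], freqs[1], freqs[-1])
```

The last bin sits at half the sampling rate (8000 Hz for 16 kHz audio), and neighbouring bins are 62.5 Hz apart.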
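A small experiment: processing noisy speech with a Wiener filter
The current directory contains clean.wav and noisy.wav, the clean and noisy versions of the same utterance. The code below is written in a way that is by no means suitable for general use; it serves only as a demonstration in this post. The last few lines obtain the waveform and the spectrogram of each file.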
[Python]
filenames_2 = ['./clean.wav', './noisy.wav']
filename_dataset_2 = tf.data.Dataset.from_tensor_slices(filenames_2)
for filename in filename_dataset_2.take(2):
    print(filename.numpy())

def get_audio_from_filename(filename):
    audio_binary = tf.io.read_file(filename)
    audio, _ = tf.audio.decode_wav(audio_binary)
    return tf.squeeze(audio, axis=-1)

AUTOTUNE = tf.data.AUTOTUNE
dataset = filename_dataset_2.map(get_audio_from_filename, num_parallel_calls=AUTOTUNE)
# Collect both waveforms; the dataset yields them in the order of filenames_2.
waves = []
for waveform in dataset.take(2):
    print(waveform.shape)
    waves.append(waveform)
clean_waveform = waves[0]
noisy_waveform = waves[1]
clean_spectrogram = get_spectrogram(clean_waveform, 512, 320)
noisy_spectrogram = get_spectrogram(noisy_waveform, 512, 320)
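The post imports scipy.signal.wiener, but the filtering step itself is not shown. Purely as an illustration, the sketch below applies it to a synthetic noisy sine instead of clean.wav and noisy.wav. The tone, the noise level, the random seed and the window size mysize=5 are all assumptions chosen for the demo and are not the author's settings.

```python
import numpy as np
from scipy.signal import wiener

sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
clean = np.sin(2 * np.pi * 440.0 * t)          # hypothetical "clean" signal

rng = np.random.default_rng(0)
noisy = clean + 0.3 * rng.standard_normal(clean.shape)

# Local-statistics Wiener filter over a sliding window of 5 samples.
# With noise=None, SciPy estimates the noise power from the input itself.
denoised = wiener(noisy, mysize=5)

# Mean squared error against the clean reference, before and after filtering.
mse_noisy = np.mean((noisy - clean) ** 2)
mse_denoised = np.mean((denoised - clean) ** 2)
print(f'MSE before: {mse_noisy:.4f}, after: {mse_denoised:.4f}')
```

With the real recordings, noisy_waveform.numpy() would convert the TensorFlow tensor to a NumPy array before it is passed to wiener, and the result could be fed back into get_spectrogram for comparison with clean_spectrogram.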
All of the code can be run directly in a Python 3 environment; running it in Jupyter Notebook is recommended for viewing the results.