Introduction: The Eternal Trade-off Between Bandwidth and Quality
In modern communication systems, speech transmission faces a core tension: the trade-off between bandwidth constraints and speech quality. On one hand, limited spectrum and network bandwidth force us to compress the signal; on the other, users' demand for high-fidelity speech requires us to preserve or even improve quality. This tension is especially acute in VoIP, mobile communications, voice storage, and broadcast systems.
The spectrum of speech typically spans 50 Hz to 8 kHz (and beyond), yet the traditional telephone network (PSTN) transmits only the 300-3400 Hz band, which makes speech sound "muffled" and less intelligible. Spectral band extension exists precisely to resolve this tension: it attempts to reconstruct or enhance the missing high-frequency content within a limited bandwidth, improving the naturalness and intelligibility of speech.
This article analyzes the principles behind spectral band extension, the mainstream methods, implementation details, and practical challenges, with concrete code examples showing how to apply these techniques in real projects.
1. Spectral Characteristics of Speech and the Bandwidth Problem
1.1 Spectral Composition of Speech
Speech signals have the following spectral characteristics:
- Fundamental frequency (F0): roughly 80-180 Hz for male voices, 160-400 Hz for female voices
- Formants: the frequency regions that determine timbre, typically distributed between 500 and 4000 Hz
- High-frequency content: above 2 kHz lies much of the consonant information (e.g. /s/, /f/, /th/) and breathiness, which is critical for clarity
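To make these bands concrete, the following sketch measures how the energy of a recording splits across them using a Welch PSD estimate (the filename speech_wb.wav is just a placeholder for any 16 kHz mono file):
import numpy as np
from scipy import signal
from scipy.io import wavfile

# Illustrative only: 'speech_wb.wav' stands in for any wideband mono recording
fs, x = wavfile.read('speech_wb.wav')
x = x.astype(np.float64) / 32768.0

# Welch power spectral density estimate
f, psd = signal.welch(x, fs, nperseg=1024)

# Integrate the PSD over the bands discussed above
for name, lo, hi in [('F0 region', 50, 400),
                     ('formants', 500, 4000),
                     ('high band', 4000, fs / 2)]:
    band = (f >= lo) & (f < hi)
    energy = np.trapz(psd[band], f[band])
    print(f'{name:>10s}: {10 * np.log10(energy + 1e-12):.1f} dB')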
1.2 Effects of Band Limiting
When speech passes through a band-limited system such as the telephone network:
- High-frequency loss: content above 3400 Hz is removed entirely
- Reduced clarity: lost consonant information lowers intelligibility by roughly 15-20%
- Reduced naturalness: the voice sounds dull and loses its "airiness"
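These effects are easy to reproduce. The sketch below approximates a PSTN channel with a simple Butterworth band-pass (an illustration, not a standards-compliant IRS channel filter):
from scipy import signal

def simulate_pstn(x, fs, lo=300.0, hi=3400.0):
    """Band-limit a wideband signal to the classic telephone band.

    A 4th-order Butterworth band-pass applied forward-backward
    (zero phase); a simple approximation of a PSTN channel.
    """
    sos = signal.butter(4, [lo, hi], btype='bandpass', fs=fs, output='sos')
    return signal.sosfiltfilt(sos, x)

# narrowband = simulate_pstn(wideband, fs)  # sounds noticeably "muffled"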
1.3 Limitations of Traditional Remedies
A simple low-pass filter or equalizer cannot truly recover the lost information, because:
- The missing spectral region contains no original data to work with
- Directly boosting high-frequency gain amplifies noise and adds distortion (demonstrated in the sketch below)
- An intelligent spectral reconstruction algorithm is needed instead
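The second point can be demonstrated directly: a naive boost above the cutoff of a band-limited signal only raises the noise floor, since no speech remains there to amplify. A minimal sketch:
import numpy as np

def naive_treble_boost(x, fs, cutoff=3400.0, gain_db=12.0):
    """Naive EQ: boost everything above `cutoff` by `gain_db`.

    On a band-limited input the region above `cutoff` contains only
    noise, so this raises the noise floor instead of restoring speech.
    """
    spec = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), 1.0 / fs)
    spec[freqs > cutoff] *= 10 ** (gain_db / 20.0)
    return np.fft.irfft(spec, n=len(x))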
2. Principles of Spectral Band Extension
2.1 What Is Spectral Band Extension?
Spectral band extension (SBE) is a signal processing technique that analyzes the low-frequency content of a signal and intelligently infers and reconstructs the missing high-frequency content. Its core premise: the low- and high-frequency parts of a speech signal are statistically correlated.
2.2 A Taxonomy of Methods
Spectral band extension techniques fall into two broad families:
2.2.1 Signal-processing-based methods
- Linear predictive coding (LPC) extension
- Bandwidth extension (BWE) via spectral band replication
- Harmonic enhancement
2.2.2 Deep-learning-based methods
- Generative adversarial networks (GANs)
- Autoencoders
- Diffusion models
2.3 Core Mathematical Principle
The mathematical basis of spectral band extension can be written as

H(f) = S_extended(f) / S_low(f),  for f > f_c

where:
- S_low(f) is the known low-band spectrum
- S_extended(f) is the extended spectrum to be generated
- H(f) is the extension transfer function
- f_c is the cutoff frequency (e.g. 3400 Hz)
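At the receiver, S_extended(f) and therefore H(f) are unknown; with parallel narrowband/wideband recordings, however, H(f) can be estimated offline for training or analysis. A minimal numpy sketch, assuming x_nb and x_wb are time-aligned:
import numpy as np

def estimate_extension_response(x_nb, x_wb, fs, f_c=3400.0, n_fft=1024):
    """Estimate |H(f)| = |S_extended(f)| / |S_low(f)| for f > f_c
    from a time-aligned narrowband/wideband pair (offline analysis)."""
    S_low = np.abs(np.fft.rfft(x_nb, n_fft))
    S_ext = np.abs(np.fft.rfft(x_wb, n_fft))
    f = np.fft.rfftfreq(n_fft, 1.0 / fs)
    H = np.ones_like(S_low)
    mask = f > f_c
    H[mask] = S_ext[mask] / (S_low[mask] + 1e-12)
    return f, H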
3. Classical Signal-Processing Methods
3.1 Linear Predictive Coding (LPC) Extension
LPC assumes a speech signal can be generated by a linear prediction model:

s(n) = Σ_i a_i · s(n-i) + e(n)

Implementation steps:
- Run LPC analysis on the low-band signal to obtain the prediction coefficients
- Use the same coefficients to model the high-band spectral envelope
- Reconstruct the high band through the resulting synthesis filter
Example Python implementation:
import numpy as np
from scipy import signal
from scipy.signal import lfilter
from scipy.io import wavfile

def lpc_spectrum_extension(audio, fs, cutoff=3400, target_band=8000):
    """
    LPC-based spectral band extension.

    Parameters
    ----------
    audio : np.ndarray
        Input audio signal (float, roughly in [-1, 1]).
    fs : int
        Sample rate.
    cutoff : int
        Cutoff frequency in Hz.
    target_band : int
        Target bandwidth in Hz.

    Returns
    -------
    extended_audio : np.ndarray
        The bandwidth-extended audio.
    """
    # 1. Low-pass filter to isolate the low band
    nyquist = fs // 2
    cutoff_norm = cutoff / nyquist
    b, a = signal.butter(8, cutoff_norm, btype='low')
    low_freq = signal.filtfilt(b, a, audio)

    # 2. Frame-by-frame LPC analysis (simplified; production code would
    #    use Levinson-Durbin on the autocorrelation sequence)
    frame_size = int(fs * 0.02)  # 20 ms frames
    hop_size = frame_size // 2
    extended_audio = np.zeros_like(audio, dtype=np.float64)

    for i in range(0, len(low_freq) - frame_size, hop_size):
        frame = low_freq[i:i + frame_size]

        # Autocorrelation of the frame
        autocorr = np.correlate(frame, frame, mode='full')
        autocorr = autocorr[len(autocorr) // 2:]

        # Solve the normal equations with a pseudo-inverse as a
        # stand-in for Levinson-Durbin
        order = 12
        R = np.zeros((order, order))
        for m in range(order):
            for n in range(order):
                R[m, n] = autocorr[abs(m - n)]
        r_vec = autocorr[1:order + 1]
        try:
            lpc_coeffs = np.linalg.pinv(R) @ r_vec
        except np.linalg.LinAlgError:
            lpc_coeffs = np.zeros(order)

        # 3. High-band synthesis: white-noise excitation shaped by the
        #    LPC all-pole filter (lfilter returns just the output array
        #    when no initial filter state is passed)
        high_freq_excitation = np.random.randn(frame_size) * 0.1
        high_frame = lfilter([1], np.concatenate([[1], -lpc_coeffs]),
                             high_freq_excitation)

        # 4. Spectral shaping: match the high band to the low-band energy
        low_fft = np.fft.rfft(frame)
        high_fft = np.fft.rfft(high_frame)
        freq_bins = np.fft.rfftfreq(frame_size, 1 / fs)
        high_mask = freq_bins > cutoff

        if np.sum(high_mask) > 0:
            # Smoothed low-band spectral envelope
            low_envelope = np.abs(low_fft)
            low_envelope = np.convolve(low_envelope, np.ones(5) / 5,
                                       mode='same')
            # Target high-band energy, scaled from the low-band energy
            target_high_energy = np.mean(
                low_envelope[freq_bins < cutoff / 2]) * 0.3
            # Rescale the synthetic high band
            high_fft_magnitude = np.abs(high_fft)
            if np.sum(high_fft_magnitude[high_mask]) > 0:
                scale = target_high_energy / np.mean(
                    high_fft_magnitude[high_mask])
                high_fft[high_mask] *= scale

        # 5. Combine: keep the original low band, splice in the new high band
        combined_fft = low_fft.copy()
        combined_fft[high_mask] = high_fft[high_mask]
        combined_frame = np.fft.irfft(combined_fft, n=frame_size)

        # Overlap-add with a Hann window to avoid frame-boundary clicks
        extended_audio[i:i + frame_size] += combined_frame * np.hanning(frame_size)

    # Clip to the valid range to avoid overflow on write-out
    extended_audio = np.clip(extended_audio, -1, 1)
    return extended_audio

# Usage (input is assumed to be float in [-1, 1]; int16 WAV data should
# be divided by 32768 first):
# fs, audio = wavfile.read('narrowband.wav')
# audio = audio.astype(np.float32) / 32768.0
# extended = lpc_spectrum_extension(audio, fs)
# wavfile.write('wideband.wav', fs, extended.astype(np.float32))
Notes on the code:
- LPC analysis extracts a prediction model from the low-band signal
- High-frequency content is generated by filtering white-noise excitation through the LPC filter
- Spectral envelope matching keeps the energy consistent across bands
- Overlap-add prevents discontinuities at frame boundaries
3.2 Spectral Band Replication (SBR)
SBR is the core technique used in HE-AAC coding. Its principle:
- Spectral classification: divide the spectrum into "tonal" and "noise-like" regions
- Parameter extraction: derive envelope, frequency, and temporal parameters from the low band
- High-band generation: create the high band by copying, shifting, and modulating low-band content
Pseudocode sketch:
def sbr_extension(audio, fs):
    # 1. Analyze the low-band spectrum
    spec = np.fft.rfft(audio)
    envelope = np.abs(spec)

    # 2. Extract a smoothed envelope parameter set
    envelope_smooth = np.convolve(envelope, np.ones(10) / 10, mode='same')

    # 3. High-band generation strategy:
    #    - tonal regions: replicate low-band harmonics upward
    #    - noise-like regions: inject shaped random noise
    # 4. In a real codec these parameters are quantized and transmitted
    #    alongside the core bitstream
    reconstructed_high_freq = ...  # see the runnable sketch below
    return reconstructed_high_freq
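The pseudocode leaves the replication step abstract. The following runnable sketch shows the core SBR operation on a single frame: translate magnitudes from below the cutoff into the empty band above it, then rescale them (the gain of 0.3 is illustrative, not taken from any codec):
import numpy as np

def replicate_band(frame, fs, cutoff=3400.0, gain=0.3):
    """Minimal SBR-style patching for one frame (magnitude-domain sketch):
    copy the band just below `cutoff` into the band above it, scaled by
    `gain`, and resynthesize with the original phase."""
    spec = np.fft.rfft(frame)
    freqs = np.fft.rfftfreq(len(frame), 1.0 / fs)
    src = np.where((freqs > cutoff / 2) & (freqs <= cutoff))[0]
    dst = np.where((freqs > cutoff) & (freqs <= 2 * cutoff))[0]
    n = min(len(src), len(dst))
    # Translate magnitudes upward; keep the original phase in place
    mag = np.abs(spec)
    phase = np.angle(spec)
    mag[dst[:n]] = gain * mag[src[:n]]
    return np.fft.irfft(mag * np.exp(1j * phase), n=len(frame))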
4. Modern Deep-Learning Methods
4.1 Generative Adversarial Networks (GANs)
GANs perform well for spectral extension because they can learn the distribution of real high-band content.
Network structure:
- Generator: takes the low-band spectrum as input, outputs the extended spectrum
- Discriminator: judges whether a spectrum is real or generated
Example PyTorch implementation:
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpectralExtensionGenerator(nn.Module):
    """U-Net-style generator for spectral extension.

    Note: the input length must be a multiple of 8 so the skip
    connections line up (each encoder stage halves the length)."""
    def __init__(self, input_channels=1, output_channels=1):
        super().__init__()
        # Encoder (kernel 4 / stride 2 / padding 1 halves the length exactly)
        self.enc1 = nn.Conv1d(input_channels, 64, kernel_size=15, padding=7)
        self.enc2 = nn.Conv1d(64, 128, kernel_size=4, padding=1, stride=2)
        self.enc3 = nn.Conv1d(128, 256, kernel_size=4, padding=1, stride=2)
        self.enc4 = nn.Conv1d(256, 512, kernel_size=4, padding=1, stride=2)
        # Decoder (transposed convs double the length exactly; input
        # channel counts include the concatenated skip connections)
        self.dec4 = nn.ConvTranspose1d(512, 256, kernel_size=4, padding=1, stride=2)
        self.dec3 = nn.ConvTranspose1d(256 + 256, 128, kernel_size=4, padding=1, stride=2)
        self.dec2 = nn.ConvTranspose1d(128 + 128, 64, kernel_size=4, padding=1, stride=2)
        self.dec1 = nn.Conv1d(64 + 64, output_channels, kernel_size=15, padding=7)
        # Activations
        self.relu = nn.ReLU()
        self.tanh = nn.Tanh()

    def forward(self, x):
        # Encoder path
        e1 = self.relu(self.enc1(x))
        e2 = self.relu(self.enc2(e1))
        e3 = self.relu(self.enc3(e2))
        e4 = self.relu(self.enc4(e3))
        # Decoder path with skip connections
        d4 = self.relu(self.dec4(e4))
        d3 = self.relu(self.dec3(torch.cat([d4, e3], dim=1)))
        d2 = self.relu(self.dec2(torch.cat([d3, e2], dim=1)))
        d1 = self.tanh(self.dec1(torch.cat([d2, e1], dim=1)))
        return d1

class SpectralDiscriminator(nn.Module):
    """Discriminator; its input is the low band concatenated with the
    (real or generated) high band along the channel axis."""
    def __init__(self, input_channels=2):
        super().__init__()
        self.model = nn.Sequential(
            nn.Conv1d(input_channels, 64, kernel_size=15, padding=7),
            nn.LeakyReLU(0.2),
            nn.Conv1d(64, 128, kernel_size=8, padding=4, stride=2),
            nn.LeakyReLU(0.2),
            nn.Conv1d(128, 256, kernel_size=8, padding=4, stride=2),
            nn.LeakyReLU(0.2),
            nn.Conv1d(256, 512, kernel_size=8, padding=4, stride=2),
            nn.LeakyReLU(0.2),
            nn.AdaptiveAvgPool1d(1),
            nn.Conv1d(512, 1, kernel_size=1),
            nn.Sigmoid()
        )

    def forward(self, x):
        return self.model(x)
# Example training loop
def train_gan_extension(train_loader, epochs=100):
    generator = SpectralExtensionGenerator()
    discriminator = SpectralDiscriminator()
    gen_opt = torch.optim.Adam(generator.parameters(), lr=1e-4)
    disc_opt = torch.optim.Adam(discriminator.parameters(), lr=1e-4)

    for epoch in range(epochs):
        for low_freq, high_freq in train_loader:
            # Real samples
            real_input = torch.cat([low_freq, high_freq], dim=1)
            real_pred = discriminator(real_input)
            # Generated samples (detached so generator gradients do not
            # flow through the discriminator update)
            fake_high = generator(low_freq)
            fake_input = torch.cat([low_freq, fake_high.detach()], dim=1)
            fake_pred = discriminator(fake_input)
            # Discriminator loss
            real_loss = F.binary_cross_entropy(real_pred, torch.ones_like(real_pred))
            fake_loss = F.binary_cross_entropy(fake_pred, torch.zeros_like(fake_pred))
            disc_loss = real_loss + fake_loss
            disc_opt.zero_grad()
            disc_loss.backward()
            disc_opt.step()
            # Generator loss: adversarial term ...
            fake_pred = discriminator(torch.cat([low_freq, fake_high], dim=1))
            adv_loss = F.binary_cross_entropy(fake_pred, torch.ones_like(fake_pred))
            # ... plus an L1 term to preserve spectral structure
            l1_loss = F.l1_loss(fake_high, high_freq) * 10
            gen_loss = adv_loss + l1_loss
            gen_opt.zero_grad()
            gen_loss.backward()
            gen_opt.step()
Preparing the training data:
import os
import numpy as np
import torch
from scipy import signal
from scipy.signal import resample
from scipy.io import wavfile

def prepare_training_data(wav_dir, target_fs=16000, low_cutoff=3400):
    """Build paired (low-band, high-band) training examples."""
    dataset = []
    for wav_file in os.listdir(wav_dir):
        fs, audio = wavfile.read(os.path.join(wav_dir, wav_file))
        # Resample to the target rate if needed
        if fs != target_fs:
            audio = resample(audio, len(audio) * target_fs // fs)
            fs = target_fs
        # Normalize 16-bit PCM to [-1, 1]
        audio = audio.astype(np.float32) / 32768.0
        # Split into low band and high band (residual)
        nyquist = fs // 2
        cutoff_norm = low_cutoff / nyquist
        b, a = signal.butter(8, cutoff_norm, btype='low')
        low_freq = signal.filtfilt(b, a, audio).astype(np.float32)
        high_freq = audio - low_freq
        # Frame the signals
        frame_size = 512
        hop_size = 128
        for i in range(0, len(audio) - frame_size, hop_size):
            low_frame = low_freq[i:i + frame_size].copy()
            high_frame = high_freq[i:i + frame_size].copy()
            # STFT of each frame
            low_spec = torch.stft(torch.from_numpy(low_frame), n_fft=512,
                                  hop_length=hop_size, return_complex=True)
            high_spec = torch.stft(torch.from_numpy(high_frame), n_fft=512,
                                   hop_length=hop_size, return_complex=True)
            # Keep magnitude spectra only (real/imag pairs also work)
            low_mag = torch.abs(low_spec).unsqueeze(0)
            high_mag = torch.abs(high_spec).unsqueeze(0)
            dataset.append((low_mag, high_mag))
    return dataset
4.2 Diffusion Models
Diffusion models generate high-quality spectra through iterative denoising:
class DiffusionSpectralExtension:
    """Diffusion-based spectral extension (DDPM-style)."""
    def __init__(self, timesteps=1000, beta_start=1e-4, beta_end=0.02):
        self.timesteps = timesteps
        self.betas = torch.linspace(beta_start, beta_end, timesteps)
        self.alphas = 1 - self.betas
        self.alpha_bars = torch.cumprod(self.alphas, dim=0)

    def q_sample(self, x_start, t, noise=None):
        """Forward diffusion: noise x_start to step t in closed form."""
        if noise is None:
            noise = torch.randn_like(x_start)
        alpha_bar_t = self.alpha_bars[t].view(-1, 1, 1)
        sqrt_alpha_bar_t = torch.sqrt(alpha_bar_t)
        sqrt_one_minus_alpha_bar_t = torch.sqrt(1 - alpha_bar_t)
        return sqrt_alpha_bar_t * x_start + sqrt_one_minus_alpha_bar_t * noise

    def p_losses(self, model, x_start, t, low_freq_input):
        """Training loss: the model predicts the injected noise,
        conditioned on the low-band input."""
        noise = torch.randn_like(x_start)
        x_noisy = self.q_sample(x_start, t, noise)
        predicted_noise = model(torch.cat([x_noisy, low_freq_input], dim=1), t)
        return F.mse_loss(predicted_noise, noise)

    def p_sample(self, model, x, t, low_freq_input):
        """One reverse (denoising) step: the standard DDPM
        posterior-mean update plus scaled noise."""
        betas_t = self.betas[t].view(-1, 1, 1)
        alphas_t = self.alphas[t].view(-1, 1, 1)
        alpha_bar_t = self.alpha_bars[t].view(-1, 1, 1)
        # No noise is added at the final step
        if t[0].item() > 0:
            z = torch.randn_like(x)
        else:
            z = torch.zeros_like(x)
        noise_pred = model(torch.cat([x, low_freq_input], dim=1), t)
        # Posterior mean, then add noise scaled by sqrt(beta_t)
        mean = (x - betas_t * noise_pred / torch.sqrt(1 - alpha_bar_t)) / torch.sqrt(alphas_t)
        return mean + torch.sqrt(betas_t) * z

    def generate(self, model, low_freq_spec, num_steps=50):
        """Generate an extended spectrum conditioned on the low band.
        (For brevity this walks the first `num_steps` betas; a real
        sampler would stride over the full schedule.)"""
        device = low_freq_spec.device
        x = torch.randn_like(low_freq_spec)
        for i in reversed(range(num_steps)):
            t = torch.full((low_freq_spec.shape[0],), i, device=device, dtype=torch.long)
            x = self.p_sample(model, x, t, low_freq_spec)
        return x
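A quick smoke test of the sampler, using a stand-in noise predictor (a real system would use a conditional U-Net; DummyEpsModel exists only to exercise the shapes):
import torch
import torch.nn as nn

class DummyEpsModel(nn.Module):
    """Stand-in noise predictor for smoke-testing the sampler."""
    def __init__(self, channels=2):
        super().__init__()
        self.net = nn.Conv1d(channels, 1, kernel_size=3, padding=1)

    def forward(self, x, t):  # t is accepted but ignored here
        return self.net(x)

diffusion = DiffusionSpectralExtension(timesteps=1000)
low_spec = torch.randn(4, 1, 256)   # fake conditioning spectra
out = diffusion.generate(DummyEpsModel(), low_spec, num_steps=50)
print(out.shape)  # torch.Size([4, 1, 256])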
5. Practical Challenges and Solutions
5.1 Common Problems
5.1.1 Artifacts
Problem: the generated high band sounds unnatural, with a "metallic" ring or buzzing.
Solutions:
- Hybrid approach: align the generated high band to the harmonics of the original low band (see harmonic_alignment below)
- Post-filtering: smooth the result using a psychoacoustic model
- Multi-objective loss: optimize the spectrum and the waveform jointly (see the loss sketch after the next example)
def harmonic_alignment(low_freq, high_freq, fs):
    """Harmonic-alignment post-processing.

    Note: estimate_f0 is assumed to be provided elsewhere (e.g. an
    autocorrelation or YIN-based pitch tracker)."""
    # Estimate the fundamental frequency from the low band
    f0 = estimate_f0(low_freq, fs)
    # Build a harmonic grid
    harmonics = np.arange(1, 10) * f0
    # Nudge high-band energy toward the harmonic positions
    high_spec = np.fft.rfft(high_freq)
    freq_bins = np.fft.rfftfreq(len(high_freq), 1 / fs)
    for h in harmonics:
        if h < fs / 2:
            # Nearest FFT bin to this harmonic
            idx = np.argmin(np.abs(freq_bins - h))
            if idx < len(high_spec):
                high_spec[idx] *= 1.1  # gentle harmonic emphasis
    return np.fft.irfft(high_spec, n=len(high_freq))
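For the multi-objective loss mentioned above, a common recipe combines a time-domain L1 term with multi-resolution STFT magnitude terms. A sketch (the weights and FFT sizes are illustrative, not tuned):
import torch
import torch.nn.functional as F

def multi_objective_loss(pred_wave, target_wave, fft_sizes=(512, 1024, 2048)):
    """Combined waveform + multi-resolution spectral loss; inputs are
    (batch, samples) waveform tensors."""
    loss = F.l1_loss(pred_wave, target_wave)          # time-domain term
    for n_fft in fft_sizes:
        window = torch.hann_window(n_fft, device=pred_wave.device)
        P = torch.stft(pred_wave, n_fft, hop_length=n_fft // 4,
                       window=window, return_complex=True).abs()
        T = torch.stft(target_wave, n_fft, hop_length=n_fft // 4,
                       window=window, return_complex=True).abs()
        loss = loss + F.l1_loss(P, T)                 # spectral term
    return loss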
5.1.2 Noise Amplification
Problem: background noise gets amplified along with the extension.
Solutions:
- Noise gating: reduce the noise floor before extending
- SNR-aware extension: scale the extension strength by the estimated SNR, as in the sketch below
def noise_aware_extension(audio, fs, target_snr=20):
    """SNR-aware gating of the extension strength (simplified: in a
    full pipeline the factor would scale the synthetic high band)."""
    # Crude noise-floor estimate: assume the first 1000 samples are noise
    noise_level = np.percentile(np.abs(audio[:1000]), 10)
    # Overall signal energy
    signal_energy = np.mean(audio ** 2)
    # Current SNR estimate
    current_snr = 10 * np.log10(signal_energy / (noise_level ** 2 + 1e-10))
    # Back off the extension when the input is noisy
    if current_snr < target_snr:
        extension_factor = 0.5
    else:
        extension_factor = 1.0
    return audio * extension_factor
5.1.3 Temporal Discontinuities
Problem: frame-based processing causes clicks and pops.
Solutions:
- Overlap-add synthesis
- Careful window design
- Phase-consistency processing
def overlap_add_extension(frames, hop_size, window='hann'):
    """Overlap-add resynthesis with window-sum normalization."""
    window_func = np.hanning if window == 'hann' else np.hamming
    frame_size = len(frames[0])
    win = window_func(frame_size)
    output = np.zeros(len(frames) * hop_size + frame_size)
    norm = np.zeros_like(output)
    for i, frame in enumerate(frames):
        start = i * hop_size
        end = start + frame_size
        output[start:end] += frame * win
        norm[start:end] += win
    # Guard against division by zero at the edges
    norm[norm < 1e-10] = 1.0
    return output / norm
5.2 Evaluation Metrics
5.2.1 Objective metrics
- SNR (signal-to-noise ratio)
- PESQ (Perceptual Evaluation of Speech Quality)
- STOI (Short-Time Objective Intelligibility)
def evaluate_extension(original_wideband, extended_audio, fs):
    """Score an extension against a wideband reference."""
    from pesq import pesq    # pip install pesq (fs must be 8 or 16 kHz)
    from pystoi import stoi  # pip install pystoi
    # Align lengths
    min_len = min(len(original_wideband), len(extended_audio))
    orig = original_wideband[:min_len]
    ext = extended_audio[:min_len]
    # PESQ (wideband mode)
    try:
        pesq_score = pesq(fs, orig, ext, 'wb')
    except Exception:
        pesq_score = 0
    # STOI
    stoi_score = stoi(orig, ext, fs)
    # SNR with the reference as the signal
    noise = orig - ext
    snr = 10 * np.log10(np.mean(orig ** 2) / (np.mean(noise ** 2) + 1e-10))
    return {
        'PESQ': pesq_score,
        'STOI': stoi_score,
        'SNR': snr
    }
5.2.2 Subjective testing
- MOS (Mean Opinion Score), summarized as in the sketch below
- AB preference tests
- Intelligibility tests
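MOS results are usually reported with a confidence interval; a small helper (normal approximation, adequate for typical panel sizes):
import numpy as np

def summarize_mos(ratings):
    """Aggregate 1-5 listener ratings into MOS with a 95% CI."""
    r = np.asarray(ratings, dtype=float)
    mos = r.mean()
    ci = 1.96 * r.std(ddof=1) / np.sqrt(len(r))
    return mos, ci

# Example: 12 listeners rating one condition
mos, ci = summarize_mos([4, 4, 3, 5, 4, 4, 3, 4, 5, 4, 3, 4])
print(f'MOS = {mos:.2f} ± {ci:.2f}')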
6. Application Case Studies
6.1 Real-Time Spectral Extension in VoIP
VoIP demands low-latency, real-time processing:
class RealTimeSpectralExtension:
    """Streaming spectral-extension processor (per-frame FFT variant)."""
    def __init__(self, fs=16000, frame_size=320, hop_size=160):
        self.fs = fs
        self.frame_size = frame_size
        self.hop_size = hop_size
        # Analysis/synthesis state
        self.input_buffer = np.zeros(frame_size * 2, dtype=np.float32)
        self.output_buffer = np.zeros(frame_size * 2, dtype=np.float32)
        # Load a pretrained model
        self.model = self.load_model()

    def load_model(self):
        """Load a pretrained lightweight model. In production this would
        be an ONNX or TFLite export; here the generator is instantiated
        with untrained weights as a placeholder."""
        model = SpectralExtensionGenerator()
        # model.load_state_dict(torch.load('sbe_gen.pt'))  # real weights
        model.eval()
        return model

    def process_frame(self, input_frame):
        """Consume one hop of input, return one hop of extended output."""
        # 1. Slide the new samples into the analysis buffer
        self.input_buffer = np.roll(self.input_buffer, -self.hop_size)
        self.input_buffer[-self.hop_size:] = input_frame
        frame = self.input_buffer[-self.frame_size:].astype(np.float32)

        # 2. The model predicts a wideband frame from the narrowband one
        #    (frame_size=320 is a multiple of 8, as the U-Net requires)
        with torch.no_grad():
            x = torch.from_numpy(frame).view(1, 1, -1)
            wide = self.model(x).view(-1).numpy()

        # 3. Spectral mixing: keep the trusted input below the cutoff,
        #    take the model output above it
        spec_in = np.fft.rfft(frame)
        spec_wide = np.fft.rfft(wide)
        freqs = np.fft.rfftfreq(self.frame_size, 1 / self.fs)
        spec_out = np.where(freqs < 3400.0, spec_in, spec_wide)
        frame_out = np.fft.irfft(spec_out, n=self.frame_size)

        # 4. Windowed overlap-add into the output buffer
        win = np.hanning(self.frame_size)
        self.output_buffer = np.roll(self.output_buffer, -self.hop_size)
        self.output_buffer[-self.hop_size:] = 0.0
        self.output_buffer[-self.frame_size:] += (frame_out * win).astype(np.float32)
        return self.output_buffer[:self.hop_size].copy()

    def process_stream(self, audio_stream):
        """Run the processor over a whole stream, hop by hop."""
        output = []
        for i in range(0, len(audio_stream), self.hop_size):
            frame = audio_stream[i:i + self.hop_size]
            if len(frame) < self.hop_size:
                frame = np.pad(frame, (0, self.hop_size - len(frame)))
            out_frame = self.process_frame(frame)
            output.append(out_frame)
        return np.concatenate(output)
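A smoke test of the streaming path (with the untrained placeholder weights the audio itself is meaningless; this only verifies shapes and buffering):
import numpy as np

# Synthetic input; in production the frames come from the capture callback
processor = RealTimeSpectralExtension(fs=16000, frame_size=320, hop_size=160)
stream = np.random.randn(16000).astype(np.float32) * 0.1  # 1 s of noise
out = processor.process_stream(stream)
print(out.shape)  # (16000,)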
6.2 Voice Storage Systems
For stored voice, heavier offline processing is affordable:
def batch_process_storage(audio_file, output_file, method='gan'):
    """Batch extension for stored audio.

    Note: process_with_gan / process_with_diffusion are assumed wrappers
    around the models from sections 4.1 and 4.2."""
    fs, audio = wavfile.read(audio_file)
    if method == 'gan':
        extended = process_with_gan(audio, fs)
    elif method == 'lpc':
        extended = lpc_spectrum_extension(audio, fs)
    elif method == 'diffusion':
        extended = process_with_diffusion(audio, fs)
    else:
        raise ValueError(f'unknown method: {method}')
    # Quality gate before overwriting anything
    quality = evaluate_extension(audio, extended, fs)
    if quality['PESQ'] > 2.5:  # acceptance threshold
        wavfile.write(output_file, fs, extended)
        print(f'Extension succeeded, quality: {quality}')
    else:
        print('Quality below threshold; keeping the original file')
7. Performance Optimization and Engineering Practice
7.1 Reducing Computational Cost
7.1.1 Model quantization
def quantize_model(model):
    """Dynamic int8 quantization to speed up inference.

    Note: PyTorch's dynamic quantization covers Linear/RNN layers and
    needs no calibration data; convolutional layers require static
    quantization with a calibration pass instead."""
    model.eval()
    quantized_model = torch.quantization.quantize_dynamic(
        model,
        {nn.Linear},
        dtype=torch.qint8
    )
    return quantized_model
7.1.2 Frequency-domain optimization
def optimized_frequency_processing(audio, fs, threshold=1e-3):
    """FFT-based processing: a real FFT halves the work relative to a
    complex FFT on real-valued input."""
    spec = np.fft.rfft(audio)
    # Frequency-domain magnitude masking
    mask = np.abs(spec) > threshold
    processed = spec * mask
    # Back to the time domain
    return np.fft.irfft(processed, n=len(audio))
7.2 Latency Optimization
For real-time systems, latency is critical:
- Frame size: typically 10-20 ms
- Overlap: 50%
- Buffer management: use a ring buffer (see LowLatencyBuffer below); a quick latency budget can be computed as in the sketch that follows
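A quick budget check for the algorithmic part of the latency (device I/O buffering comes on top), as referenced in the list above:
def algorithmic_latency_ms(frame_size, hop_size, fs, extra_buffers=1):
    """Back-of-envelope latency: one full analysis frame plus any
    extra buffered hops (device I/O not included)."""
    return 1000.0 * (frame_size + extra_buffers * hop_size) / fs

print(algorithmic_latency_ms(320, 160, 16000))  # 30.0 ms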
A minimal ring buffer:
class LowLatencyBuffer:
    """Low-latency ring buffer."""
    def __init__(self, size):
        self.buffer = np.zeros(size)
        self.write_ptr = 0
        self.read_ptr = 0
        self.size = size

    def write(self, data):
        """Append samples; raise if the buffer would overflow."""
        available = self.size - (self.write_ptr - self.read_ptr)
        if len(data) > available:
            raise BufferError('Buffer overflow')
        for sample in data:
            self.buffer[self.write_ptr % self.size] = sample
            self.write_ptr += 1

    def read(self, n):
        """Pop n samples; raise if fewer are available."""
        available = self.write_ptr - self.read_ptr
        if n > available:
            raise BufferError('Buffer underflow')
        result = np.zeros(n)
        for i in range(n):
            result[i] = self.buffer[self.read_ptr % self.size]
            self.read_ptr += 1
        return result

    def get_read_available(self):
        return self.write_ptr - self.read_ptr
8. Future Directions
8.1 Neural audio codecs
- EnCodec, SoundStream: end-to-end learned coding
- Joint optimization: compression and extension in a single model
8.2 Adaptive extension
- Content-aware: adapt the extension strategy to the speech content
- Network-aware: adapt dynamically to the available bandwidth
8.3 Multimodal fusion
- Text assistance: use transcripts to guide the extension
- Visual assistance: use lip movements to aid spectral reconstruction
9. Conclusion
Spectral band extension bridges the gap between bandwidth constraints and speech quality by intelligently reconstructing the missing high band. The technology keeps evolving, from classical LPC to modern deep learning:
- Classical methods: computationally cheap, well suited to embedded devices
- Deep learning: higher quality, but more compute-hungry
- Hybrid methods: combine the strengths of both and are the practical way forward
In practice, choose the method that fits the scenario (real-time requirements, compute budget, quality targets), and rely on careful engineering to tame artifacts, noise, and latency. As AI advances, spectral band extension will play an even larger role in next-generation communication systems.
