傅里叶变换生成频谱图

请注意，本文编写于 454 天前，最后修改于 435 天前，其中某些信息可能已经过时。

问题

机器学习课上的一道题，题目是：仅使用矩阵乘法来生成频谱图矩阵。输入声音数据是一个列向量 $x$。请详细描述您将如何构建一个矩阵 $A$，使得 $A∙x$ 将产生复数频谱系数的 $vec(∙)$。$\text{DFT_size}=64，\text{Hop_size}=32$，并使用Hann window减少泄漏。

As promised in class, matrix multiplies can perform all kinds of linear operations.
Here you will have to implement a spectrogram using only a matrix multiplication.
Your input sound is a column vector x. Describe exactly how you would construct
a matrix A, such that the product A∙x will produce the vec(∙) of the complex
spectrogram coefficients. Your transform should have a DFT size of 64, a hop size
of 32, and will use a Hann window. Make an image plot of the real part of the matrix
A and allow me to marvel at its beauty.

解答思路

要解决这个问题，我们首先假设目标向量（即频谱图）是$S$，输入的声音数据列向量大小为$N$，给定$\text{DFT_size}=64, \text{Hop_size}=32$，帧频的数量就能被表示为：

$$ \text{Frame#} = \left\lceil \frac{N - \text{DFT_size}}{\text{Hop_size}} \right\rceil + 1 = \left\lceil \frac{N-64}{32} \right\rceil + 1 $$

每个帧频包含 $\text{DFT_size}$ 个频格。这些频格随后构成我们的频谱图向量 $S$。因此，$S$ 的长度可以表示为：

$$ |S| = \left\lceil \frac{N-64}{32} \right\rceil \times 64 = 2N - 64 $$

根据矩阵乘法的性质，$|S|$ 等于$A$ 中行的数量。所以矩阵 $A$ 的大小应该为 $(2N-64) \times N$

现在，让我们确定这个矩阵中每个entry的值。为了计算一个段 $ [a_1, a_2, \ldots, a_{64}]$ 的复数频谱系数，首先应用Hann Window来减少泄漏：

$$ w(n) = 0.5 \times \left(1 - \cos\left(\frac{2\pi n}{N-1}\right)\right), \quad N = 64 $$

应用Hann Window后的结果数据变为：

$$ \text{data_with_hann_window} = \left[ w(1)a_1, w(2)a_2, \ldots, w(64)a_{64} \right] $$

随后的步骤是应用 DFT：

$$ X(k) = \sum_{n=0}^{N-1} x[n] e^{-j\left(\frac{2\pi kn}{N}\right)} $$

给定 $x = \text{data_with_hann_window} $ 以及$N = 64$, 转换过后的数据能表示为：

$$ \text{data_with_hann_window_DFT} = \left[ X(0), X(1), \ldots, X(63) \right] $$

为了以矩阵形式表示这个过程，并考虑到Hop size，让我们定义三个函数：

$$ X(n, k) = e^{-j(2\pi nk/64)} \\ H(k) = 0.5 \times \left(1 - \cos\left(\frac{2\pi k}{63}\right)\right) \\ q(i) = 32 \left( \left\lfloor \frac{i}{64} \right\rfloor \right) $$

对于矩阵 $A$，其第 $i$ 行和第 $j$ 列的元素表示为 $A_{i, j}$，该元素必须满足以下条件：

从对$A$的可视化中，我们不难看出矩阵$A$是中心对称的，所以生成算法其实还有优化的空间，但由于只是作业就没管那么多了。

Code

import tqdm
import matplotlib.pyplot as plt
import numpy as np
from pydub import AudioSegment

sample_size = 20000
DFT_size = 128
Hop_size = 64

DRAW_A = False
DRAW_SPECTROGRAM = True
SAMPLE_FILENAME = 'sample.m4a'
assert (DFT_size / 2 == Hop_size)


# DFT function
def dft(n, k):
    return np.exp(-1j * (2 * np.pi * n * k / DFT_size))


# Hann window function
def hann(k):
    return 0.5 * (1 - np.cos(2 * np.pi * k / (DFT_size - 1)))


# down sampling, or computer will bomb
sample_raw = AudioSegment.from_file(SAMPLE_FILENAME).set_frame_rate(8000)
sample = np.array(sample_raw.get_array_of_samples())
N = min(sample_size, sample.shape[0])
sample = sample[:min(sample_size, N)]
print(f"sound has shape: {sample.shape[0]}, we use {N} datas as sample")

A = np.zeros((2 * N - DFT_size, N))

for i, row in enumerate(tqdm.tqdm(A)):
    q = Hop_size * (i // DFT_size)
    for j in range(row.shape[0]):
        if (j >= q) and (j <= (q + DFT_size)):
            # A_ij != 0
            row[j] = dft(n=j - q, k=i % DFT_size) * hann(j - q)

        # otherwise A_ij = 0

# visualize A
if DRAW_A:
    plt.imshow(A.real)
    plt.colorbar()
    plt.title(f"matrix A (sample size: {sample_size})")
    plt.axis('off')
    plt.show()

# draw spectrogram
if DRAW_SPECTROGRAM:
    spec_coefficient = A @ sample
    # keep partial of spec coefficient which can be | 64, add up real and imaginary part, and normalize log10
    spec_coefficient = np.abs(spec_coefficient[:spec_coefficient.shape[0] // DFT_size * DFT_size])
    spec_coefficient = spec_coefficient.reshape((DFT_size, -1))

    plt.imshow(np.log10(spec_coefficient), origin='lower', aspect='auto')
    plt.colorbar(label='Log Magnitude')
    plt.title(f"Spectrogram (DFT size: {DFT_size})")
    plt.xlabel('time frame')
    plt.ylabel('frequency bin')
    plt.show()

完整版

附件

附件名称：CS545 HW1.zip

文件大小：1097.4 KB

下载次数: 296

最后修改: 2023-11-27 01:06

点击下载

CC BY-ND

This license enables reusers to copy and distribute the material in any medium or format in unadapted form only, and only so long as attribution is given to the creator. The license allows for commercial use.

信号处理

傅里叶变换生成频谱图

问题

解答思路

Code

完整版

CC BY-ND

添加新评论

已有 2 条评论

傅里叶变换生成频谱图

问题

解答思路

Code

完整版

CC BY-ND

Typecho Void主题美化

音特美+Drop ERX 耳机开箱

添加新评论

已有 2 条评论