Why can we have a Fast Fourier Transform ?
The Fourier transform was introduced by Fourier as a tool to solve heat equations, but is now used in its discrete version throughout the computing world more than trillions of times each second. It is probably more, I was assuming an average computer does billions of Discrete Cosine Transforms per day for Web browsing (JPEG images and Youtube) and music listening, but how huge is the contribution of millions of people watching MPEG-2 compressed television programs on satellite or terrestrial digital broadcasting?
It is thus important to know that the Fourier transform (which is always the Fast Fourier Transform) is really fast (especially when it is dozens, hundreds of times faster than the natural algorithm). I am not yet sure about the fast versions of variants of the discrete transform (cosine transforms and its friends), but I guess they can be derived from the classical case.
What is it ?
The classical Fourier transform takes a T-periodic function f and computes for each integer n the integral
where . It separates f into components having period , and has the property that the energy is the sum .
The discrete Fourier transform takes a discrete function given by and computes the discrete analogue
which can also be represented by a matrix
where is the primitive n-th root of unity. It has the property , so the inverse of the Fourier transform is given by the complex conjugate matrix (up to a scalar). Moreover, it satisfies a unitarity condition, which means the Hermitian norm of vectors is preserved (up to multiplication by n).
The matrix is a particular case of a Vandermonde matrix, and can also be interpreted as the linear map taking a polynomial and computing the vector . Actually, what we really want is taking a set of values, and finding a polynomial having these values at roots of unity (a polynomial being written as a sum of monomials which are the elementary periodic functions), so this is a case of Lagrange interpolation, but is is actually given by the same matrix.
Another way of saying the same thing is that the Fourier transform diagonalises product of polynomials (which corresponds to the convolution product of signals): the convolution product of two vectors is not easy to compute rapidly, but is obtained by multiplying separately coordinates of and . As a reminder :
Keyword: tensor product
Humans would traditionally compute a Discrete Fourier Transform using the above formulas: each coefficient requires computing n products by roots of unity, giving the whole thing a complexity of n² operations. A little agebra will tell us how to give it a complexity of nk operations, when .
The key observation is that the matrix , which has size , can actually be written as a tensor product . If you don’t know tensor products, a quick way of understanding them is to say that if matrices operate on vectors whose index variable is (having entries), the matrix operates on vectors indexed by the tuple of variables (these vectors have entries), and its coefficients are
For those of you who thrive on algebraic geometry and schemes, this translates into the following property: first remark that by means of binary representation of integers. From this deduce that the scheme of n-th roots of unity can be seen as the image, by the map , of the product of 2-point schemes containing only 1 and . The corresponding map of rings
mapping is then an isomorphism. It can be also be written more suggestively as :
although the isomorphism is less clear.
Suppose now that our vector is given in terms of a polynomial
where is written in binary form. Evaluation at roots of unity is given by a matrix , with coefficients which are evaluation at of . Remember that (the ‘s decompose in a sum of powers of two): and this is a tensor product of matrices
The decomposition property can be stated as the fact that functions on a product are the tensor product of rings of functions, and evaluation maps are tensor products of evaluation maps.
Why does it go faster this way?
Tensor products of matrices have the property where 1 is the identity matrix, which generalises straightforwardly to longer tensor products. Thus to apply to a vector we only have to understand how to apply : we need to apply this 2×2 matrix to each pair where a and b differ by the q-th bit. This needs 3 or 4 operations for each of the n/2 such pairs, so doing this for the k matrices only needs around nk or 2nk elementary operations on complex numbers, which is better and simpler the usual algorithms for matrix multiplication. This tensor product point of view is equivalent to more classical divide-and-conquer approach to fast algorithms.
Update: the FFT is actually not really a tensor product, because we first have to convert a polynomial in the form . This can again be solved by divide-and-conquer techniques, by considering the following identities :
The base change matrix can thus be written as some sort of twisted tensor product, and needs only O(nk) operations. The traditional FFT algorithm for size actually does the two steps I described simultaneously.