Panned binaural demonstration
by Peter Fischer
In this demonstration a monophonic sound source is produced and panned 360° around the listener. The resulting stereo track is binaural and is designed for headphone playback, over which the listener should hear the sound travel in a full circle around the head.
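This effect is achieved in three parts: the mono source is panned into a B Format file, the B Format file is decoded to four virtual speaker feeds, and the speaker feeds are convolved with the Head Related Impulse Responses (HRIRs) for the speaker positions.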
In summary, each speaker feed is convolved with the left ear HRIR corresponding to the azimuth of that speaker, and the results are summed to give the binaural signal for the left ear. The procedure is repeated with the right ear HRIRs to produce the binaural signal for the right ear. Four virtual speakers therefore require eight separate convolutions.
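The summary above can be sketched in a few lines of Python. This is only an illustration of the convolve-and-sum idea, not the way Convolvercmd is implemented: the function names are made up for this example, and the two-tap "HRIRs" in the usage note are placeholders rather than measured data.

```python
def convolve(signal, impulse_response):
    """Direct-form linear convolution of two lists of samples."""
    out = [0.0] * max(len(signal) + len(impulse_response) - 1, 0)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


def add_into(total, values):
    """Sum `values` into the running list `total`, growing it if needed."""
    if len(values) > len(total):
        total.extend([0.0] * (len(values) - len(total)))
    for n, v in enumerate(values):
        total[n] += v


def binaural_mix(speaker_feeds, hrirs):
    """Convolve every speaker feed with its left and right HRIR and sum per ear.

    speaker_feeds: dict mapping a speaker name to its list of samples
    hrirs: dict mapping the same names to a (left_hrir, right_hrir) pair
    Returns (left, right, number_of_convolutions).
    """
    left, right = [], []
    convolutions = 0
    for name, feed in speaker_feeds.items():
        left_hrir, right_hrir = hrirs[name]
        add_into(left, convolve(feed, left_hrir))
        add_into(right, convolve(feed, right_hrir))
        convolutions += 2
    return left, right, convolutions
```

With four feeds named, say, FL, FR, BL and BR, and a placeholder pair such as ([0.9, 0.1], [0.4, 0.2]) standing in for each speaker's HRIRs, binaural_mix reports eight convolutions, matching the count given above.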
The inspiration for this particular demonstration came about from reading both of the papers listed as references and applying those concepts using freely available command line tools and datasets.
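This is a demonstration of taking a mono sound source and panning that sound around the listener in a 360° circle. There are four parts to this example: generating the mono sound source, panning it into a B Format file, decoding the B Format file to square speaker feeds, and converting the square speaker feeds to binaural.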
Generation of the mono sound source
Using either Adobe Audition or the freeware Audacity (the stable version, not the beta version, is recommended), generate a mono sound file to be panned. The following walkthrough is for Audacity.
Alternatively, you may use a pre-existing mono sound file or generate one using the command line tool sox.
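If none of those tools is to hand, a mono noise file can also be produced with a short Python script using only the standard library. This is a sketch rather than part of the original walkthrough: the function name, the default duration, the 44100 Hz sample rate and the peak level are all assumptions chosen for illustration.

```python
import random
import struct
import wave


def write_mono_noise(path, seconds=5.0, sample_rate=44100, peak=0.5, seed=0):
    """Write 16-bit mono white noise to a WAV file and return the frame count."""
    rng = random.Random(seed)           # seeded so the file is reproducible
    n_frames = int(seconds * sample_rate)
    scale = int(peak * 32767)           # keep well below full scale
    frames = bytearray()
    for _ in range(n_frames):
        sample = int(rng.uniform(-1.0, 1.0) * scale)
        frames += struct.pack("<h", sample)   # little-endian signed 16-bit
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)             # mono
        wav.setsampwidth(2)             # 2 bytes = 16 bits per sample
        wav.setframerate(sample_rate)
        wav.writeframes(bytes(frames))
    return n_frames
```

Calling write_mono_noise("noise.wav") would write five seconds of noise to the current directory; the file name here is only an example.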
Use the command line tool abfpan from MCTools to pan the mono noise file.
Remember to add the MCTools directory to your path or run the command line from the MCTools directory.
The resulting noisepan.wav is a B Format file.
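To make the panning step less of a black box, here is a minimal Python sketch of what encoding a mono signal into horizontal first order B Format involves. It is not a description of how abfpan works internally. It assumes the traditional B Format convention in which W carries the signal scaled by 1/√2, and it assumes the azimuth increases anticlockwise from straight ahead; the function name and the revolutions parameter are invented for this example.

```python
import math


def pan_to_b_format(mono, revolutions=1.0):
    """Encode mono samples to first order B Format while sweeping the azimuth.

    The azimuth moves linearly from 0 to 2*pi*revolutions over the signal.
    Returns four equal-length lists: (w, x, y, z).
    """
    n = len(mono)
    w, x, y, z = [], [], [], []
    for i, s in enumerate(mono):
        azimuth = 2.0 * math.pi * revolutions * i / n
        w.append(s / math.sqrt(2.0))    # omnidirectional component
        x.append(s * math.cos(azimuth)) # front-back component
        y.append(s * math.sin(azimuth)) # left-right component
        z.append(0.0)                   # no height: the source stays horizontal
    return w, x, y, z
```

Because the source stays in the horizontal plane, the Z channel is silent throughout.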
B Format to speaker decode
Use another useful tool, this time abfdcode, to decode the B Format file to a square speaker arrangement:
noisesq.wav is a four-channel file with the speaker feeds Front Left, Front Right, Back Left and Back Right.
Square speaker feeds to Binaural
For this step use Convolvercmd to convolve the square speaker feeds with the HRIRs representing the position of those speakers.
If the config file is called binaural.txt, run the following from the command line:
Play noisebin.wav in your favourite player. Enjoy.
What is all this Ambisonics/B Format stuff?
Ambisonics is a method of surround sound encoding developed by Michael Gerzon, Peter Fellgett and John Hayes in the 1970s as an alternative to Quadraphonic sound systems. The basic principle of ambisonics is a mathematical decomposition of a 3D sound field, specifically a spherical harmonic decomposition.
To use an analogy, any complex single waveform can be decomposed into an infinite series of sine and cosine terms. This is Fourier theory, which gives us the Fourier Series and the massively useful Fourier Transform. In practice the Fourier Transform is used in its discrete form and computed efficiently by the Fast Fourier Transform (FFT), which is the basis on which convolution programs such as Convolver perform their calculations. In essence, a time domain signal is transformed to the frequency domain by FFT. A time domain Impulse Response (IR) is also transformed to the frequency domain by FFT. These two frequency domain signals are multiplied together, which is equivalent to convolution in the time domain, and the result is transformed back to the time domain by iFFT (inverse Fast Fourier Transform).
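The FFT route to convolution described above can be sketched in plain Python. This is a teaching example, not how Convolver itself is written: the small recursive radix-2 FFT and the function names are assumptions made for illustration, and both inputs are assumed to be non-empty.

```python
import cmath


def fft(values, inverse=False):
    """Recursive radix-2 FFT. len(values) must be a power of two.

    The inverse transform is left unscaled; the caller divides by the length.
    """
    n = len(values)
    if n == 1:
        return list(values)
    even = fft(values[0::2], inverse)
    odd = fft(values[1::2], inverse)
    sign = 1.0 if inverse else -1.0
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def fft_convolve(signal, impulse_response):
    """Convolve two non-empty real sequences by multiplying their spectra."""
    out_len = len(signal) + len(impulse_response) - 1
    size = 1
    while size < out_len:               # zero-pad to a power of two
        size *= 2
    a = fft(list(signal) + [0.0] * (size - len(signal)))
    b = fft(list(impulse_response) + [0.0] * (size - len(impulse_response)))
    product = [p * q for p, q in zip(a, b)]      # multiply in frequency domain
    result = fft(product, inverse=True)          # back to the time domain
    return [v.real / size for v in result[:out_len]]
```

The zero padding matters: without it the multiplication would give a circular convolution, in which the tail of the result wraps around onto its start.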
Sound pressure waves can be described in a similar way as an infinite series of spherical waves: the Bessel-Fourier series. The Bessel functions describe the radial part of each term. The angular part is described by functions referred to as the spherical harmonics, which at first order correspond to the Cartesian directions X, Y and Z. Like the Fourier Series, the Bessel-Fourier series starts at 0th order and extends to infinite order.
The traditional ambisonic B Format consists of the spherical harmonic terms up to and including order 1. Its channels are W (0th order), X (1st order, vector direction X), Y (1st order, vector direction Y) and Z (1st order, vector direction Z).
A physical analogy: a single omnidirectional microphone records the sound pressure of a sound field (0th order component, B Format W). A ribbon microphone aligned to record signals from the front and rear captures the 1st order pressure gradient in the X direction (B Format X). A second ribbon microphone, at 90 degrees to the first in the horizontal plane, records signals from the left and right: the 1st order pressure gradient in the Y direction (B Format Y). Finally, a third ribbon microphone at 90 degrees to the first two, aligned to record signals from above and below, captures the 1st order pressure gradient in the Z direction (B Format Z).
First order ambisonics is a highly truncated representation of a 3D sound field, although the order can theoretically be extended to whatever level of detail is required. B Format is therefore a highly compact representation of a 3D sound field. It is easily manipulated mathematically (it can be rotated, for example) and is useful for generating synthetic sound fields, as in the panned binaural demonstration. A B Format signal is decoded to a regular polyhedral arrangement of speakers by multiplying each B Format channel by a gain specific to that channel and speaker, then summing the results to give each speaker feed.
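As an illustration of the multiply-and-sum decode just described, the Python sketch below decodes horizontal B Format to a square of four speakers. The gains used here are one simple choice (each speaker behaves like a virtual cardioid microphone pointing along its own azimuth) and are not necessarily the coefficients abfdcode applies. The speaker azimuths of ±45° and ±135°, the overall 0.25 scaling, and the assumption that W carries the signal scaled by 1/√2 are all choices made for this example.

```python
import math

# Assumed speaker azimuths in degrees, anticlockwise from straight ahead.
SQUARE_AZIMUTHS_DEG = {"FL": 45.0, "FR": -45.0, "BL": 135.0, "BR": -135.0}


def decode_to_square(w, x, y):
    """Decode horizontal B Format (W, X, Y) to four speaker feeds.

    Z is ignored because a square layout has no height information.
    Returns a dict mapping each speaker name to its list of samples.
    """
    feeds = {name: [] for name in SQUARE_AZIMUTHS_DEG}
    for w_s, x_s, y_s in zip(w, x, y):
        for name, az_deg in SQUARE_AZIMUTHS_DEG.items():
            az = math.radians(az_deg)
            # Per-channel gains: sqrt(2) for W, cos/sin of the azimuth for X/Y.
            sample = 0.25 * (math.sqrt(2.0) * w_s
                             + math.cos(az) * x_s
                             + math.sin(az) * y_s)
            feeds[name].append(sample)
    return feeds
```

With these gains a source encoded exactly at Front Left appears most strongly in the FL feed, equally and more quietly in FR and BL, and not at all in BR.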
Two good if somewhat mathematical papers are: