Convolver


Panned binaural demonstration

 

by Peter Fischer

In this demonstration a monophonic sound source is produced and panned 360° around the listener. The resulting stereo track is binaural and is designed to be played back over headphones, where the listener should hear the sound move in a circle around their head.

This effect is achieved in three parts:

  • Taking the monophonic sound source and panning it around 360 degrees in a horizontal circle. This information is stored as an ambisonic B Format file. This is a compact representation of the 2D sound field generated by the panning of the monophonic signal.
  • This B Format file is then decoded to a square speaker rig, and the result is saved to a four channel file.
  • Lastly the square speaker rig file is convolved with the HRIRs (Head Related Impulse Responses) of the MIT KEMAR dummy head dataset to produce a binaural stereo file. This is a virtual speaker method of generating binaural signals. Each 'virtual' speaker is positioned equidistantly around the listener, and each speaker feed is convolved with the HRIR corresponding to that speaker's azimuth, once for each ear. The convolution products for each ear are then summed, giving the binaural signal for that ear. Each KEMAR HRIR is a stereo file whose left channel is the impulse response of the left ear and whose right channel is the impulse response of the right ear. For example, the Front Left virtual speaker has an azimuth of 315°, so its feed is convolved with the KEMAR HRIR for 315° azimuth, i.e. the left and right ears' impulse responses to a signal arriving from 315°. The procedure is repeated for the virtual speaker feeds at azimuths of 045°, 135° and 225°.

In summary, each speaker feed is convolved with the left ear HRIR for that speaker's azimuth, and the results are summed to give the binaural signal for the left ear. The procedure is repeated with the right ear HRIRs to produce the binaural signal for the right ear. Four virtual speakers therefore require eight separate convolutions (four speakers × two ears).
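The virtual speaker method described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Convolver's implementation: it uses plain time-domain convolution, the function names `convolve` and `render_binaural` are hypothetical, and the 2-tap "HRIRs" are toy data (real KEMAR HRIRs are far longer).

```python
def convolve(x, h):
    """Direct (time-domain) linear convolution of two sequences."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def render_binaural(feeds, hrirs):
    """Virtual speaker binaural rendering.

    feeds: dict speaker name -> samples (all feeds the same length).
    hrirs: dict speaker name -> (left-ear HRIR, right-ear HRIR) for that
           speaker's azimuth (all HRIRs the same length).
    Each ear's signal is the sum over speakers of feed convolved with the
    HRIR for that ear: two convolutions per speaker.
    """
    left, right = None, None
    for name, feed in feeds.items():
        h_left, h_right = hrirs[name]
        yl = convolve(feed, h_left)   # speaker -> left ear
        yr = convolve(feed, h_right)  # speaker -> right ear
        if left is None:
            left, right = yl, yr
        else:
            left = [a + b for a, b in zip(left, yl)]
            right = [a + b for a, b in zip(right, yr)]
    return left, right


# Toy data: hypothetical 2-tap HRIRs, one pair per virtual speaker.
feeds = {"FL": [1.0, 0.0], "FR": [0.0, 1.0], "BL": [0.0, 0.0], "BR": [0.0, 0.0]}
hrirs = {
    "FL": ([1.0, 0.5], [0.2, 0.1]),  # left ear hears FL strongly
    "FR": ([0.2, 0.1], [1.0, 0.5]),  # right ear hears FR strongly
    "BL": ([0.5, 0.0], [0.1, 0.0]),
    "BR": ([0.1, 0.0], [0.5, 0.0]),
}
left, right = render_binaural(feeds, hrirs)
```

With four speakers the loop performs exactly the eight convolutions counted above; a real renderer would use FFT-based convolution for long HRIRs.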

The inspiration for this particular demonstration came about from reading both of the papers listed as references and applying those concepts using freely available command line tools and datasets. 

The walkthrough

This is a demonstration of taking a mono sound source and panning that sound around the listener in a 360° circle. There are four parts to this example:

  1. Firstly, generate a mono sound source.
  2. Secondly, use a command line tool to pan the mono sound around in 2D space, saving the result as a B Format file.
  3. Thirdly, convert the B Format file to something that can be used for binaural encoding; in this case, a square speaker arrangement.
  4. Finally, convert the square speaker arrangement to a stereo binaural file.

Tools required

  1. The mctools package, which contains Ambisonic command line tools. The two needed for this demonstration are:
    • Abfpan - a command line Ambisonic panner; and
    • Abfdcode - an Ambisonic B Format decoder that produces square speaker feeds.
  2. The MIT KEMAR dummy head HRIRs (Head Related Impulse Responses). Two stereo HRIRs at zero elevation are required, at azimuths 045° and 135°. They can be from either the compact or the diffuse data set. Alternatively you can use other HRIRs at elevation 0°: azimuths 045°, 135°, 225° and 315° if the head is not symmetrical, or azimuths 045° and 135° if the head is symmetrical.
  3. Convolvercmd to do the Binaural convolution.
  4. The freeware audio editor Audacity, or Adobe Audition, to generate a test signal.

Method

Generation of the mono sound source

Using either Adobe Audition or the freeware Audacity (the stable version of Audacity, not the beta version, is recommended) generate a mono sound file to be panned. The following walkthrough is for Audacity.

  1. From the Generate menu select White Noise. A Generate Noise dialog box pops up.
  2. Enter 30 in the Length (seconds) field and click on the Generate Noise button.
  3. To save as a WAV file, click on the File menu and select Export As WAV. Enter a name for the noise file and click the Save button.
  4. To change the bit depth of the exported file, select Preferences from the Edit menu before exporting. The options for exported file types are under the File Formats tab.

Alternatively, you may use a pre-existing mono sound file or generate one with the command line tool sox.
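A test file can also be written with a short script. The following is a minimal sketch using only Python's standard library and is not part of the original walkthrough; the function name `write_white_noise` and its defaults (30 s, 44.1 kHz, 16-bit mono, half full scale) are assumptions chosen to match the Audacity steps above and the 44100 Hz rate used later in the configuration file.

```python
import random
import sys
import wave
from array import array


def write_white_noise(path, seconds=30.0, rate=44100, amplitude=0.5, seed=None):
    """Write a mono 16-bit PCM WAV file of uniformly distributed white noise.

    amplitude is a fraction of full scale (0..1), leaving some headroom.
    """
    rng = random.Random(seed)
    n = int(seconds * rate)
    peak = int(32767 * amplitude)
    # Independent, uniformly distributed 16-bit samples in [-peak, peak].
    data = array("h", (rng.randint(-peak, peak) for _ in range(n)))
    if sys.byteorder == "big":
        data.byteswap()  # WAV PCM data is little-endian
    with wave.open(path, "wb") as w:
        w.setnchannels(1)   # mono
        w.setsampwidth(2)   # 16 bits per sample
        w.setframerate(rate)
        w.writeframes(data.tobytes())


if __name__ == "__main__":
    write_white_noise("noise.wav")  # 30 s at 44.1 kHz, as in the Audacity steps
```

The resulting noise.wav can be fed straight into the panning step that follows.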

Mono panning

Use the command line tool abfpan from mctools to pan the mono noise file.

abfpan noise.wav noisepan.wav 0.0 1.0

Remember to add the MCTools directory to your path or run the command line from the MCTools directory.

The resulting noisepan.wav is a B Format file.
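What abfpan computes can be approximated with the standard first-order B Format encoding equations. The sketch below is an illustration under assumptions, not abfpan's source: it assumes the arguments 0.0 and 1.0 are the start and end positions as fractions of a full circle, uses the traditional W = S/√2 weighting, measures azimuth anticlockwise from the front (the usual ambisonic convention; abfpan's direction of rotation may differ), and leaves Z at zero because the pan is horizontal. The function name `pan_orbit` is hypothetical.

```python
import math


def pan_orbit(mono, start=0.0, end=1.0):
    """Encode a mono signal into first-order B Format (W, X, Y, Z) while the
    source moves linearly from `start` to `end`, given as fractions of a
    full circle (assumed: 0 and 1 = centre front).

    Azimuth is measured anticlockwise from the front (positive Y = left).
    """
    n = len(mono)
    w, x, y, z = [], [], [], []
    for i, s in enumerate(mono):
        frac = start + (end - start) * (i / max(n - 1, 1))
        az = 2.0 * math.pi * frac
        w.append(s / math.sqrt(2.0))   # 0th order: pressure, -3 dB weighting
        x.append(s * math.cos(az))     # 1st order: front/back
        y.append(s * math.sin(az))     # 1st order: left/right
        z.append(0.0)                  # horizontal pan: no height component
    return w, x, y, z


# A constant signal swept once around the listener in five steps.
w, x, y, z = pan_orbit([1.0] * 5)
```

Because X and Y follow a cosine and a sine of the azimuth, four channels are enough to describe the source at every point of its orbit.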

B Format to speaker decode

Use another useful tool, this time abfdcode, to decode the B Format file to a square speaker arrangement:

abfdcode noisepan.wav noisesq.wav

noisesq.wav is a four channel file with speaker feeds Front Left, Front Right, Back Left and Back Right.
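Decoding gives each speaker a weighted sum of the B Format channels. The sketch below uses a basic first-order decode for a regular horizontal array, feed_i = (√2·W + 2·X·cos φ_i + 2·Y·sin φ_i) / N, where φ_i is the speaker azimuth and N the number of speakers. This is one textbook choice of gains and an assumption here; abfdcode's actual coefficients may differ. Azimuths follow the anticlockwise ambisonic convention (FL = 45°), which differs from the clockwise KEMAR convention under which Front Left is at 315°. The names `SQUARE` and `decode_square` are hypothetical.

```python
import math

# Speaker azimuths in degrees, anticlockwise from front (ambisonic convention).
# Channel order matches noisesq.wav: FL, FR, BL, BR.
SQUARE = {"FL": 45.0, "FR": -45.0, "BL": 135.0, "BR": -135.0}


def decode_square(w, x, y, speakers=SQUARE):
    """Basic first-order horizontal decode of one B Format sample frame.

    Returns a dict speaker name -> feed value. Z is ignored (no height).
    """
    n = len(speakers)
    feeds = {}
    for name, deg in speakers.items():
        phi = math.radians(deg)
        feeds[name] = (math.sqrt(2.0) * w + 2.0 * x * math.cos(phi)
                       + 2.0 * y * math.sin(phi)) / n
    return feeds


# Unit source encoded at 45 degrees (front left): W = 1/sqrt(2), X = cos, Y = sin.
az = math.radians(45.0)
feeds = decode_square(1 / math.sqrt(2.0), math.cos(az), math.sin(az))
```

For this source the front left speaker gets 0.75, its neighbours 0.25 each and the opposite (back right) speaker a small anti-phase feed of -0.25, a known property of this basic decode; the four feeds sum to the original signal level.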

Square speaker feeds to Binaural

For this step use Convolvercmd to convolve the square speaker feeds with the HRIRs representing the position of those speakers.

Convolvercmd is given a text configuration file describing the filters. You can download a text version of the file, but you will have to alter the file paths to reflect the location of the HRIRs on disk.

44100 4 2 0
0 0 0 0
0 0
D:\binaural\diffuse\elev0\H0e045a.wav
0
1.0
0.0
D:\binaural\diffuse\elev0\H0e045a.wav
1
1.0
1.0
D:\binaural\diffuse\elev0\H0e135a.wav
0
3.0
0.0
D:\binaural\diffuse\elev0\H0e135a.wav
1
3.0
1.0
D:\binaural\diffuse\elev0\H0e135a.wav
1
2.0
0.0
D:\binaural\diffuse\elev0\H0e135a.wav
0
2.0
1.0
D:\binaural\diffuse\elev0\H0e045a.wav
1
0.0
0.0
D:\binaural\diffuse\elev0\H0e045a.wav
0
0.0
1.0
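Notes on the configuration file above:

  • 44100 4 2 0: square speaker mapping of FL FR BL BR (input channels 0 to 3) to 2 channel binaural (output channel 0 = left ear, 1 = right ear).
  • 0 0 0 0: no input delay.
  • 0 0: no output delay.
  • Each following block of four lines is one filter: the HRIR file, the HRIR channel used as the filter (0 = left, 1 = right), the input (speaker) channel and the output (ear) channel. In order, the blocks are:
    • FR speaker to L ear, using the left channel of H0e045a.wav as the filter.
    • FR speaker to R ear, using the right channel of H0e045a.wav as the filter.
    • BR speaker to L ear, using the left channel of H0e135a.wav.
    • BR speaker to R ear, using the right channel of H0e135a.wav.
    • BL speaker to L ear, using the right channel of H0e135a.wav.
    • BL speaker to R ear, using the left channel of H0e135a.wav.
    • FL speaker to L ear, using the right channel of H0e045a.wav.
    • FL speaker to R ear, using the left channel of H0e045a.wav.
  • NB: since the head is symmetrical, only the right side (045° and 135°) is given. The left-side speakers (FL at 315°, BL at 225°) reuse those HRIRs with the left and right HRIR channels reversed; for example, the left ear's response to a source at 315° is the right ear's response to a source at 045°.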
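Because the routing follows a simple rule, the eight filter blocks can also be generated rather than typed. The Python sketch below, with hypothetical names `HRIR_DIR`, `SPEAKERS` and `build_config`, reproduces the configuration file above: right-side speakers use the HRIR channels directly, and left-side speakers reuse the 045° and 135° HRIRs with the channels swapped. The three header lines are copied verbatim from the example rather than interpreted.

```python
# Folder holding the KEMAR HRIRs: change this to match your disk.
HRIR_DIR = "D:\\binaural\\diffuse\\elev0\\"

# (speaker, input channel in noisesq.wav, HRIR file, mirrored?)
# Input channel order is FL=0, FR=1, BL=2, BR=3. Left-side speakers are
# mirrored: BL (225 deg) reuses the 135 deg HRIR, FL (315 deg) the 045 deg one.
SPEAKERS = [
    ("FR", 1, "H0e045a.wav", False),
    ("BR", 3, "H0e135a.wav", False),
    ("BL", 2, "H0e135a.wav", True),
    ("FL", 0, "H0e045a.wav", True),
]
EARS = [("L", 0), ("R", 1)]  # output channel 0 = left ear, 1 = right ear


def build_config(hrir_dir=HRIR_DIR):
    """Return the text of the binaural.txt configuration shown above."""
    # Header lines copied verbatim from the example file.
    lines = ["44100 4 2 0", "0 0 0 0", "0 0"]
    for _name, in_ch, hrir, mirrored in SPEAKERS:
        for _ear, out_ch in EARS:
            # A mirrored speaker uses the opposite ear's HRIR channel.
            filt_ch = 1 - out_ch if mirrored else out_ch
            # Four lines per filter: HRIR file, HRIR channel, input channel,
            # output channel (channel numbers in the n.0 form of the example).
            lines += [hrir_dir + hrir, str(filt_ch), f"{in_ch}.0", f"{out_ch}.0"]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_config(), end="")
```

Redirecting the printed output to binaural.txt after editing `HRIR_DIR` gives a file equivalent to the one above.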

If the config file is called binaural.txt, run the following from the command line:

convolvercmd 4 1 -9 binaural.txt noisesq.wav noisebin.wav

where

  • 4 is the number of partitions to be used. Use perftest to ascertain the best number for your machine, or just use 0.
  • 1 is how hard to tune the convolution algorithm to your machine. 1 or, if you are patient, 4 are good values to try.
  • -9 is the attenuation (dB) applied to the result, to avoid overflow of the output. Convolvercmd reports an estimated gain and a slightly more conservative optimum gain, which you can use to choose this value.
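To put the -9 dB figure in context, the sketch below converts decibels to a linear factor (reducing the level by 9 dB multiplies it by about 0.355) and computes a simple worst-case bound on the output peak: for full-scale inputs, an output channel can reach at most the sum of the absolute values of all filter taps feeding it. This bound is an illustrative assumption and not necessarily how convolvercmd computes its estimated or optimum gain; the function names are hypothetical.

```python
import math


def db_to_gain(db):
    """Convert a level in dB to a linear amplitude factor."""
    return 10.0 ** (db / 20.0)


def worst_case_headroom_db(filters):
    """Gain (dB) that guarantees no overflow for full-scale inputs.

    filters: impulse responses (lists of taps) summed into one output
    channel. By the triangle inequality the output peak is at most the sum
    of |tap| over all of them, so scaling by 1/peak is always safe.
    """
    peak = sum(abs(t) for h in filters for t in h)
    return -20.0 * math.log10(peak) if peak > 0 else 0.0


print(round(db_to_gain(-9.0), 3))
# Four filters, each with sum |h| = 0.5, feeding one ear: peak bound 2.0.
print(round(worst_case_headroom_db([[0.25, -0.25]] * 4), 2))
```

In this binaural setup each ear is fed by four filters (one per virtual speaker), so the bound is taken over those four HRIR channels; in practice it is very pessimistic, which is why a measured or estimated gain is usually preferred.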

That's it

Play noisebin.wav in your favourite player. Enjoy.

References

Acknowledgments

  • Richard Dobson for his free CDP Multichannel Toolkit (mctools).
  • Bill Gardner and Keith Martin for making freely available HRIRs (MIT KEMAR).
  • Freeware Convolver software.

Further reading

What is all this Ambisonics/B Format stuff?

Ambisonics is a method of surround sound encoding developed by Michael Gerzon, Peter Fellgett and John Hayes in the 1970s as an alternative to quadraphonic sound systems. The basic principle of ambisonics is a mathematical decomposition of a 3D sound field, specifically a spherical harmonic decomposition.

To use an analogy, any periodic waveform, however complex, can be decomposed into an infinite series of sine and cosine terms. This is Fourier theory, which gives us the Fourier series and the massively useful Fourier transform. The Fourier transform is used in its discrete, fast form as the Fast Fourier Transform (FFT), which is the basis on which convolution programs such as Convolver perform their calculations. In essence, a time domain signal is transformed to the frequency domain by FFT, and a time domain Impulse Response (IR) is also transformed to the frequency domain by FFT. These two frequency domain signals are multiplied (multiplication in the frequency domain is equivalent to convolution in the time domain), and the result is transformed back to the time domain by iFFT (inverse Fast Fourier Transform).
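The FFT route described above can be checked against direct convolution in a few lines. This is a minimal, unoptimised sketch (a recursive radix-2 FFT from Python's standard library, with hypothetical function names), not Convolver's own partitioned algorithm. It also shows a detail the paragraph glosses over: multiplying FFTs gives circular convolution, so both signals must be zero-padded to at least len(x) + len(h) - 1 samples to obtain the ordinary (linear) convolution.

```python
import cmath


def fft(a, invert=False):
    """Recursive radix-2 FFT; len(a) must be a power of two.
    The inverse (invert=True) is unscaled: the caller divides by n."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def fft_convolve(x, h):
    """Linear convolution of real sequences via FFT multiplication."""
    m = len(x) + len(h) - 1          # length of the linear convolution
    n = 1
    while n < m:                     # next power of two >= m
        n *= 2
    X = fft(list(x) + [0.0] * (n - len(x)))   # zero-pad to avoid wrap-around
    H = fft(list(h) + [0.0] * (n - len(h)))
    Y = [a * b for a, b in zip(X, H)]         # multiply in frequency domain
    y = fft(Y, invert=True)
    return [v.real / n for v in y[:m]]        # scale inverse, drop padding


def direct_convolve(x, h):
    """Reference time-domain convolution."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


x = [1.0, 2.0, 3.0, 4.0, 5.0]
h = [0.5, -1.0, 0.25]
y = fft_convolve(x, h)
```

Both functions agree to rounding error; the FFT version wins for long impulse responses because its cost grows as n log n rather than len(x) × len(h).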

Sound pressure waves can be described as an infinite series of spherical waves, the Bessel-Fourier series. Spherical Bessel functions describe the radial part of each term, while the angular part is described by functions referred to as the spherical harmonics; at first order these are proportional to the Cartesian direction components X, Y and Z. Like the Fourier series, the Bessel-Fourier series starts at 0th order and extends to infinite order.

The traditional ambisonic B Format consists of the spherical harmonic terms up to and including order 1. Its channels are W (0th order), X (1st order, direction X), Y (1st order, direction Y) and Z (1st order, direction Z).
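As a worked form of this, for a single source signal S arriving from azimuth θ (measured anticlockwise from the front) and elevation φ, the traditional first-order B Format encoding, with the customary 1/√2 weighting on W, is:

```latex
\begin{aligned}
W &= \frac{S}{\sqrt{2}} \\
X &= S\cos\theta\cos\phi \\
Y &= S\sin\theta\cos\phi \\
Z &= S\sin\phi
\end{aligned}
```

For a horizontal pan such as the one in this demonstration φ = 0, so Z = 0 and X and Y trace out a cosine and a sine of θ as the source moves around the circle.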

A physical analogy: imagine four microphones placed at the same point. A single omnidirectional microphone records the sound pressure of the sound field (the 0th order component, B Format W). A ribbon microphone aligned to record signals from the front and rear captures the 1st order pressure gradient in the X direction (B Format X). A second ribbon microphone at 90 degrees to the first, in the horizontal plane, records signals from the left and right (1st order pressure gradient in the Y direction, B Format Y). Finally, a third ribbon microphone at 90 degrees to the first two, aligned to record signals from above and below, captures the 1st order pressure gradient in the Z direction (B Format Z).

First order ambisonics is a highly truncated representation of a 3D sound field, although in theory the order can be extended to whatever level of detail is required. B Format is therefore a very compact representation of a 3D sound field that is easily manipulated mathematically (e.g. rotated) and is useful for generating synthetic sound fields, as in the panned binaural demonstration. Decoding a B Format signal to a regular polyhedral arrangement of speakers (or, for a horizontal-only field, a regular polygon such as the square used here) is achieved by multiplying each B Format channel by a gain specific to that channel and speaker, and summing the results to give each speaker feed.
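As an example of how easily B Format can be manipulated, rotating the whole sound field by an angle α about the vertical axis only mixes X and Y; W and Z are unchanged. A minimal sketch, with the hypothetical function name `rotate_yaw`, using the anticlockwise ambisonic azimuth convention (positive α turns the field towards the listener's left):

```python
import math


def rotate_yaw(w, x, y, z, alpha_deg):
    """Rotate a first-order B Format frame by alpha degrees about the
    vertical axis (anticlockwise, i.e. towards the listener's left)."""
    a = math.radians(alpha_deg)
    xr = x * math.cos(a) - y * math.sin(a)
    yr = x * math.sin(a) + y * math.cos(a)
    return w, xr, yr, z


# A source straight ahead (X = 1, Y = 0) rotated by 90 degrees ends up
# hard left (X = 0, Y = 1).
w, x, y, z = rotate_yaw(1 / math.sqrt(2.0), 1.0, 0.0, 0.0, 90.0)
```

Applying this per sample with a slowly changing α is one way to make a static source orbit the listener after encoding, instead of panning it at encode time.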

Further references

Two good if somewhat mathematical papers are:

 


Copyright © 2006-8 Convolver
Last modified: 16-Jan-2008 20:24 -0000