Friday, November 16, 2007

Fundamentals of embedded audio, part 1

Audio functionality plays a critical role in embedded media processing. While audio generally requires less processing power than video, it should be considered equally important.

In this article, the first of a three-part series, we will explore how data is presented to an embedded processor from a variety of audio converters (DACs and ADCs). Following this, we will explore some common peripheral standards used for connecting to audio converters.

Converting between Analog and Digital Audio Signals

All A/D and D/A conversions should obey the Shannon-Nyquist sampling theorem. In short, this theorem dictates that an analog signal must be sampled at a rate (the Nyquist sampling rate) equal to or exceeding twice its bandwidth (the Nyquist frequency) in order for it to be reconstructed in the eventual D/A conversion. Sampling below the Nyquist sampling rate will introduce aliases, which are low-frequency "ghost" images of frequencies that fall above the Nyquist frequency. For example, if we take an audio signal that is band-limited to 0-20 kHz and sample it at 2 × 20 kHz = 40 kHz, then the Nyquist theorem assures us that the original signal can be reconstructed without any signal loss. Sampling this 0-20 kHz band-limited signal at anything less than 40 kHz will introduce distortions due to aliasing. Figure 1 shows the aliasing effect on a 20 kHz sine wave. When sampled at 40 kHz, a 20 kHz signal is represented correctly (Figure 1a). However, the same 20 kHz sine wave sampled at 30 kHz actually looks like a lower-frequency alias of the original sine wave, in this case a 10 kHz sine wave (Figure 1b).

Figure 1. (a) Sampling a 20 kHz signal at 40 kHz captures the original signal correctly. (b) Sampling the same 20 kHz signal at 30 kHz captures an aliased (low-frequency ghost) signal.
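You can check the folding behavior in Figure 1 numerically. The sketch below is our own illustration, not from the book. It assumes an ideal sampler and finds the apparent frequency of a pure tone by folding its frequency into the 0 to fs/2 band. It then confirms that a 20 kHz cosine sampled at 30 kHz produces the same samples as a 10 kHz cosine. The function names are hypothetical.

```python
import math

# Illustrative sketch (hypothetical helper names), assuming ideal sampling of a pure tone.
def alias_frequency(f_hz: float, fs_hz: float) -> float:
    """Apparent frequency of a tone at f_hz after ideal sampling at fs_hz.

    Sampling cannot distinguish f from f + k*fs, so the tone folds into 0..fs/2.
    """
    f_mod = f_hz % fs_hz               # remove whole multiples of the sampling rate
    return min(f_mod, fs_hz - f_mod)   # reflect the upper half of the band back down

def sampled_cosine(f_hz: float, fs_hz: float, n_samples: int) -> list[float]:
    """Ideal samples cos(2*pi*f*n/fs) for n = 0..n_samples-1."""
    return [math.cos(2 * math.pi * f_hz * n / fs_hz) for n in range(n_samples)]

print(alias_frequency(20_000, 40_000))  # 20000: exactly at fs/2, no folding
print(alias_frequency(20_000, 30_000))  # 10000: the "ghost" in Figure 1b

# At fs = 30 kHz, the 20 kHz samples match a 10 kHz cosine sample for sample.
a = sampled_cosine(20_000, 30_000, 12)
b = sampled_cosine(10_000, 30_000, 12)
print(all(math.isclose(x, y, abs_tol=1e-9) for x, y in zip(a, b)))  # True
```

For a sine rather than a cosine, the 10 kHz alias is phase-inverted, but its frequency is the same.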

No practical system will sample at exactly twice the Nyquist frequency, however. This is because restricting a signal to a specific band requires an analog low-pass filter. Since analog filters are never ideal, high-frequency components above the Nyquist frequency can still pass through, causing aliasing. Therefore, it is common to sample above the Nyquist sampling rate in order to minimize this aliasing. For example, the sampling rate for CD audio is 44.1 kHz, not 40 kHz, and many high-quality systems sample at 48 kHz in order to capture the 0-20 kHz range of hearing even more faithfully.
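One way to see why 44.1 kHz and 48 kHz are preferred over exactly 40 kHz is to measure the room the anti-aliasing filter gets between the top of the audio band and half the sampling rate. The short sketch below assumes a 20 kHz audio band. It treats fs/2 as the point where the analog filter must already be attenuating strongly. The helper name is ours.

```python
# Illustrative sketch (hypothetical helper), assuming a 20 kHz audio band.
def transition_band_hz(fs_hz: float, audio_bw_hz: float = 20_000) -> float:
    """Width of the region between the audio band edge and fs/2.

    The analog anti-aliasing filter has to roll off inside this region, so a
    wider region allows a gentler, easier-to-build filter.
    """
    return fs_hz / 2 - audio_bw_hz

for fs in (40_000, 44_100, 48_000):
    print(fs, transition_band_hz(fs))
# 40000 0.0
# 44100 2050.0
# 48000 4000.0
```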

For speech signals, the energy content below 4 kHz is enough for an intelligible reproduction. For this reason, telephony applications usually use only 8 kHz sampling (= 2 × 4 kHz). Table 1 summarizes some sampling rates used by familiar systems.

Table 1. Commonly used sampling rates.

PCM Output
The most common digital representation for audio is called PCM (pulse-code modulation). In this representation, an analog amplitude is encoded with a digital level for each sampling period. The resulting digital wave is a vector of snapshots taken to approximate the input analog wave. Since all A/D converters have finite resolution, they introduce quantization noise that is inherent in digital audio systems. Figure 2 shows a PCM representation of an analog sine wave (Figure 2a) converted using an ideal A/D converter. In this case, quantization manifests itself as the "staircase effect" (Figure 2b). You can see that lower resolution leads to a worse representation of the original wave (Figure 2c).

For a numerical example, let's assume that a 24-bit A/D converter is used to sample an analog signal whose range is -2.828 V to 2.828 V (5.656 Vpp). The 24 bits allow for 2^24 (16,777,216) quantization levels. Therefore, the effective voltage resolution is 5.656 V / 16,777,216 = 337.1 nV. In the second part of this series, we'll see how codec resolution affects the dynamic range of audio systems.
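The arithmetic above can be reproduced directly. This is a minimal sketch. It assumes an ideal converter whose full-scale range is divided into 2^N equal steps. The helper name is ours, and the 16-bit line is added only for comparison.

```python
# Illustrative sketch (hypothetical helper), assuming an ideal N-bit converter.
def lsb_volts(full_scale_vpp: float, bits: int) -> float:
    """Voltage step (1 LSB) of an ideal converter spanning full_scale_vpp."""
    return full_scale_vpp / (2 ** bits)

levels = 2 ** 24
step = lsb_volts(5.656, 24)
print(levels)                                   # 16777216
print(f"{step * 1e9:.1f} nV")                   # 337.1 nV
print(f"{lsb_volts(5.656, 16) * 1e6:.1f} uV")   # 86.3 uV for a 16-bit converter
```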

Figure 2. (a) An analog signal (b) Digitized PCM signal (c) Digitized PCM signal using fewer bits of precision.
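You can reproduce the staircase effect in Figure 2, and see it worsen with fewer bits, using an idealized quantizer. This sketch assumes a uniform quantizer with a level at zero over a ±1.0 full-scale range. It measures the worst-case error on one cycle of a sampled sine wave. The function names and parameters are illustrative, not taken from the book.

```python
import math

# Illustrative sketch: an ideal uniform quantizer (our assumption), not a model of a real ADC.
def quantize(x: float, bits: int, full_scale: float = 1.0) -> float:
    """Round x in [-full_scale, +full_scale) to the nearest of 2**bits evenly spaced levels."""
    step = 2 * full_scale / 2 ** bits               # width of one quantization step
    code = round(x / step)                          # nearest integer code
    lo, hi = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    code = max(lo, min(hi, code))                   # clip to the signed code range
    return code * step

def max_error(bits: int, amplitude: float = 0.7, n: int = 1000) -> float:
    """Worst-case |x - quantize(x)| over n points of one cycle of a sine wave."""
    xs = [amplitude * math.sin(2 * math.pi * k / n) for k in range(n)]
    return max(abs(x - quantize(x, bits)) for x in xs)

for b in (3, 8, 16):
    # Coarser quantizers give taller "stairs" and a larger worst-case error.
    print(b, max_error(b))
```

For an unclipped input, the error never exceeds half a step, which is why each extra bit halves the worst-case error.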

PWM Output
Another popular type of modulation is pulse-width modulation (PWM). In PWM, it is the duty cycle, not voltage level, that codes a signal's amplitude. An advantage of PWM is that PWM signals can drive an output circuit directly without any need for a DAC. This is especially useful when a low-cost solution is required. PWM signals can be generated with general-purpose I/O pins, or they can be driven directly by specialized PWM timers, available on many processors.
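To drive a PWM timer from PCM data, you map each sample onto a duty cycle. That is, you convert it to the number of timer ticks per carrier period during which the output is high. This is a minimal sketch. It assumes signed 16-bit samples and a timer whose period is `period_ticks` counts. Both are hypothetical parameters, not tied to any particular processor's timer API.

```python
# Illustrative sketch (hypothetical helper and parameters), assuming signed 16-bit PCM.
def pcm_to_compare(sample: int, period_ticks: int) -> int:
    """Map a signed 16-bit PCM sample to a PWM compare (high-time) value.

    -32768 maps to 0 ticks high (0% duty), +32767 maps to period_ticks (100% duty),
    and 0 lands at roughly 50% duty, the PWM equivalent of a zero-volt output.
    """
    if not -32768 <= sample <= 32767:
        raise ValueError("sample out of 16-bit range")
    offset = sample + 32768               # shift to unsigned 0..65535
    return (offset * period_ticks) // 65535

print(pcm_to_compare(-32768, 1000), pcm_to_compare(0, 1000), pcm_to_compare(32767, 1000))  # 0 500 1000
```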

To achieve decent quality, the PWM carrier frequency should be at least 12 times the bandwidth of the signal, and the resolution of the timer (i.e., the granularity of the duty cycle) should be at least 16 bits. Because of the high carrier frequency requirement, PWM audio circuits were traditionally used only for low-bandwidth audio, such as audio sent to a subwoofer. However, with today's high-speed processors, it's possible to carry higher-bandwidth audio.
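The two requirements above interact. If a counter that ticks once per resolution step sets the duty cycle, the counter clock must equal the carrier frequency times 2^bits. This back-of-the-envelope sketch assumes such a simple counter-based PWM timer. It uses two example bandwidths: 20 kHz for the full audio band and 200 Hz as an assumed subwoofer band.

```python
# Illustrative sketch (hypothetical helper), assuming a naive counter-based PWM timer.
def required_timer_clock_hz(bandwidth_hz: float, bits: int = 16, carrier_ratio: int = 12) -> float:
    """Counter clock needed when each carrier period is split into 2**bits ticks.

    The carrier is carrier_ratio * bandwidth, per the rule of thumb in the text.
    """
    carrier_hz = carrier_ratio * bandwidth_hz
    return carrier_hz * 2 ** bits

print(required_timer_clock_hz(200))      # 157286400   (about 157 MHz)
print(required_timer_clock_hz(20_000))   # 15728640000 (about 15.7 GHz)
```

The gigahertz figure for full-band audio shows why straightforward PWM audio was historically limited to low-bandwidth signals.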

Before the PWM stream is output, it must be low-pass-filtered to remove the high-frequency carrier. This is usually done in the amplifier circuit that drives the speaker. A class of amplifiers, called Class D, has been used successfully in such a configuration. When amplification is not required, a low-pass filter is sufficient as the output stage. In some low-cost applications, where sound quality is not as important, the PWM streams can connect directly to a speaker. In such a system, the mechanical inertia of the speaker's cone acts as a low-pass filter to remove the carrier frequency.
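You can simulate the role of the output low-pass filter, or of the speaker cone's inertia, by applying a one-pole digital low-pass filter to a PWM waveform. This is our own simplified model, not a Class D amplifier design. The filter output settles to the duty cycle, with a small residual ripple at the carrier frequency. The value of `alpha` and the tick counts are illustrative assumptions.

```python
# Illustrative sketch (hypothetical helpers): a one-pole low-pass filter recovering a PWM level.
def pwm_wave(duty: float, period_ticks: int, n_periods: int) -> list[int]:
    """0/1 PWM waveform: high for duty*period_ticks ticks in every period."""
    high = round(duty * period_ticks)
    one_period = [1] * high + [0] * (period_ticks - high)
    return one_period * n_periods

def one_pole_lowpass(x: list[int], alpha: float) -> list[float]:
    """y[n] = y[n-1] + alpha * (x[n] - y[n-1]); a smaller alpha gives a lower cutoff."""
    y, out = 0.0, []
    for sample in x:
        y += alpha * (sample - y)
        out.append(y)
    return out

period = 100
y = one_pole_lowpass(pwm_wave(0.3, period, 300), alpha=0.001)
last = y[-period:]                 # one carrier period, long after the filter has settled
print(sum(last) / period)          # close to the 0.3 duty cycle
print(max(last) - min(last))       # residual carrier ripple
```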

Brief Background on Audio Converters

Audio ADCs
There are many ways to perform A/D conversion. The first commercially successful ADCs used a successive approximation scheme, in which a comparator compares the input analog voltage against a series of discrete voltage levels and finds the closest match.
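In practice, a successive-approximation converter usually runs this search as a binary search. Starting from the most significant bit, it tentatively sets each bit and compares the resulting internal DAC voltage against the input. It keeps the bit only if the input is still at least that high. This behavioral sketch assumes an ideal internal DAC and comparator. The function and its parameters are hypothetical.

```python
# Illustrative behavioral sketch (hypothetical helper), assuming an ideal DAC and comparator.
def sar_convert(vin: float, vref: float, bits: int) -> int:
    """Successive-approximation conversion of 0 <= vin < vref into a 'bits'-bit code."""
    code = 0
    for bit in reversed(range(bits)):          # MSB first, one comparison per bit
        trial = code | (1 << bit)              # tentatively set this bit
        v_dac = vref * trial / (1 << bits)     # ideal internal DAC output for the trial code
        if vin >= v_dac:                       # comparator decision
            code = trial                       # keep the bit
    return code

print(sar_convert(1.0, 2.5, 8))   # 102
```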

Most audio ADCs today, however, are sigma-delta converters. Instead of using successive approximation to achieve high resolution, sigma-delta converters use a 1-bit ADC (in effect, a comparator) inside a feedback loop. In this scheme, each output bit indicates whether the accumulated difference between the input and the fed-back output is positive or negative. As a result, the density of 1s in the bitstream tracks the input level.
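You can simulate a first-order sigma-delta modulator in a few lines. This behavioral sketch assumes an input normalized to -1..+1 and an ideal 1-bit quantizer. It shows the key property: the average of the ±1 output bits tracks the input level. It is an idealized model with a hypothetical function name, not the architecture of any particular converter.

```python
# Illustrative behavioral sketch (hypothetical helper): an ideal first-order sigma-delta modulator.
def sigma_delta_1st_order(samples: list[float]) -> list[int]:
    """First-order sigma-delta modulation of inputs in -1..+1 into +1/-1 bits."""
    integrator = 0.0
    feedback = 0                      # previous output bit, as +1 or -1
    bits = []
    for x in samples:
        integrator += x - feedback    # accumulate (sigma) the difference (delta)
        feedback = 1 if integrator >= 0 else -1   # 1-bit quantizer (comparator)
        bits.append(feedback)
    return bits

bits = sigma_delta_1st_order([0.5] * 64)
print(bits[:8])                 # [1, 1, -1, 1, 1, 1, -1, 1]
print(sum(bits) / len(bits))    # 0.5, matching the constant input
```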

To compensate for the reduced number of quantization steps, the signal is oversampled at a frequency much higher than the Nyquist sampling rate. To accommodate more traditional PCM stream processing, digital filtering blocks inside these converters convert this oversampled 1-bit stream into a slower, higher-resolution stream. For example, a 16-bit 44.1 kHz sigma-delta ADC might oversample at 64x, yielding a 1-bit stream at a rate of 2.8224 MHz. A digital decimation filter converts this oversampled stream to a 16-bit one at 44.1 kHz.
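You can sketch the rate arithmetic in this example, and the basic idea of decimation, as follows. Real decimation filters are typically multistage designs with much better stopband rejection. The plain block average used here is a deliberately crude stand-in. It only shows how 64 one-bit samples collapse into one 16-bit sample. The names are illustrative.

```python
# Illustrative sketch: a block average is a crude stand-in for a real decimation filter.
FS_OUT = 44_100
OSR = 64
print(FS_OUT * OSR)     # 2822400 bits per second per channel, i.e. 2.8224 MHz

def decimate_by_average(bits: list[int], factor: int, out_bits: int = 16) -> list[int]:
    """Average each block of 'factor' +1/-1 bits and scale to a signed out_bits integer."""
    scale = 2 ** (out_bits - 1)
    out = []
    for start in range(0, len(bits) - factor + 1, factor):
        block = bits[start:start + factor]
        value = round(scale * sum(block) / factor)
        out.append(max(-scale, min(scale - 1, value)))   # clip to the signed range
    return out

stream = [1, 1, 1, -1] * 32                  # 128 bits whose density of +1s encodes +0.5
print(decimate_by_average(stream, OSR))      # [16384, 16384]
```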

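Because they oversample analog signals, sigma-delta ADCs relax the performance requirements of the analog low-pass filters that band-limit input signals. They also spread quantization noise over a much wider spectrum than traditional converters do, so that only a small fraction of it falls within the audio band. The modulator's feedback loop further shapes this noise toward high frequencies, where the digital decimation filter removes it.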
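You can quantify the benefit of spreading the noise for the simplest case. Assume quantization noise is white and spread evenly from 0 to fs/2. Oversampling by a factor OSR then leaves only 1/OSR of that noise in the original band. That is an improvement of 10·log10(OSR) dB, or about half a bit per doubling of the sampling rate. This sketch covers plain oversampling only. It does not model the noise shaping used in real sigma-delta modulators, which pushes still more noise out of band. The helper names are ours.

```python
import math

# Illustrative sketch (hypothetical helpers): oversampling gain only, no noise shaping.
def oversampling_gain_db(osr: float) -> float:
    """In-band SNR improvement from oversampling alone, assuming white quantization noise."""
    return 10 * math.log10(osr)

def gain_in_bits(osr: float) -> float:
    """The same improvement expressed in bits, at about 6.02 dB per bit."""
    return oversampling_gain_db(osr) / (20 * math.log10(2))

print(round(oversampling_gain_db(64), 2))   # 18.06
print(round(gain_in_bits(64), 2))           # 3.0
```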

Audio DACs
Traditional approaches to D/A conversion include weighted resistor, R-2R ladder, and zero-cross distortion methods. Just as in the A/D case, sigma-delta designs dominate the D/A conversion space. For example, a sigma-delta converter might take a 16-bit 44.1 kHz signal, convert it into a 1-bit 2.8224 MHz stream using an interpolating filter, and then feed the 1-bit signal to a DAC that converts the oversampled stream to an analog signal.
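Whatever the circuit, weighted resistors or an R-2R ladder, an ideal N-bit binary DAC produces an output proportional to the input code. Each bit contributes a voltage weighted by its binary position. This sketch of that ideal transfer function assumes a unipolar output from 0 to Vref and ignores resistor mismatch. The function name is hypothetical.

```python
# Illustrative sketch (hypothetical helper), assuming an ideal unipolar binary-weighted DAC.
def ideal_dac_volts(code: int, bits: int, vref: float) -> float:
    """Ideal binary-weighted DAC output: the sum of bit_k * vref / 2**(bits - k)."""
    if not 0 <= code < 2 ** bits:
        raise ValueError("code out of range")
    volts = 0.0
    for k in range(bits):                       # k = 0 is the LSB
        if code & (1 << k):
            volts += vref / 2 ** (bits - k)     # MSB contributes vref/2, the next bit vref/4, ...
    return volts

print(ideal_dac_volts(0b1000, 4, 5.0))   # 2.5
print(ideal_dac_volts(0b1111, 4, 5.0))   # 4.6875
```

The loop form shows the per-bit weights. It is equivalent to vref * code / 2**bits.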

Many audio systems in use today pair a sigma-delta ADC with a sigma-delta DAC. Therefore, the conversion between PCM signals and oversampled 1-bit signals is done twice. For this reason, Sony and Philips introduced an alternative to PCM, called Direct-Stream Digital (DSD), in their Super Audio CD (SACD) format. This format stores data as the 1-bit high-frequency (2.8224 MHz) sigma-delta stream, bypassing the PCM conversion. The disadvantage is that DSD streams are less intuitive than PCM and require a separate set of digital audio algorithms.
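For a sense of scale, you can compare the raw per-channel bit rates of a DSD stream and of CD-style PCM directly. This is simple arithmetic on 1 bit at 2.8224 MHz versus 16 bits at 44.1 kHz. It ignores any container or error-correction overhead.

```python
# Illustrative arithmetic only: raw per-channel bit rates, no format overhead.
DSD_RATE_HZ = 2_822_400      # 64 x 44.1 kHz, 1 bit per sample
CD_RATE_HZ = 44_100          # 16 bits per sample

dsd_bps = 1 * DSD_RATE_HZ
pcm_bps = 16 * CD_RATE_HZ
print(dsd_bps, pcm_bps, dsd_bps / pcm_bps)   # 2822400 705600 4.0
```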

Connecting to Audio Converters: An ADC example
OK, enough background information. Now let's look at an actual ADC. A good choice for a low-cost audio ADC is the Analog Devices AD1871, a sigma-delta converter featuring 24-bit resolution and a 96 kHz sampling frequency. A functional block diagram of the AD1871 is shown in Figure 3a. Stereo audio is input via the left (VINLx) and right (VINRx) input channels, and digitized audio data is streamed out serially through the data port, usually to a corresponding serial port on a signal processor.
Figure 3. (a) Functional block diagram of the AD1871 audio ADC
(b) Glueless connection of an ADSP-BF533 media processor to the AD1871.

As the block diagram in Figure 3b implies, the interface between the AD1871 ADC and the Blackfin processor is glueless (the analog part of the circuit is simplified, since only the digital signals are important in this discussion). The Blackfin connects to the ADC via two serial ports (SPORTs) and an SPI (Serial Peripheral Interface) port that allows the AD1871 to be configured via software commands. Parameters configurable through the SPI include the sampling rate, word width, and channel gain and muting. The AD1871's oversampling clock is supplied by an external crystal.

The SPORT is the data link to the AD1871 and is configured in I²S mode. I²S is a standard protocol developed by Philips for transmission of digital audio signals.

This standard allows audio equipment manufacturers to create components that are compatible with each other. To be exact, I²S is simply a 3-wire serial interface used to transmit stereo data. As shown in Figure 4a, it specifies a bit clock (middle), a data line (bottom), and a left/right synchronization line (top) that selects whether a left or right channel frame is currently being transmitted. In essence, I²S is a time-division-multiplexed (TDM) serial stream with two active channels. TDM is a method of transferring multiple channels (for example, stereo audio) over one physical link.
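You can model the relationship between the three I²S lines by listing, for each bit-clock cycle, the level of the word-select line and of the data line. This sketch follows the Philips I²S convention as commonly described. Word select is low for the left channel and high for the right. Two's-complement data is sent MSB first, and the MSB appears one bit clock after word select changes. The function name and the 16-bit word width are assumptions for illustration. The sketch does not model any particular SPORT configuration.

```python
# Illustrative sketch (hypothetical helper): per-bit-clock levels of WS and SD in Philips I2S.
def i2s_serialize(frames: list[tuple[int, int]], width: int = 16) -> tuple[list[int], list[int]]:
    """Return per-bit-clock (ws, sd) levels for a list of (left, right) samples."""
    mask = (1 << width) - 1

    def msb_first(sample: int) -> list[int]:
        word = sample & mask            # two's-complement bit pattern of the sample
        return [(word >> (width - 1 - i)) & 1 for i in range(width)]

    ws, data = [], []
    for left, right in frames:
        ws += [0] * width + [1] * width            # WS low = left, WS high = right
        data += msb_first(left) + msb_first(right)

    # Data lags WS by one bit clock, so each word's LSB is sent after WS has
    # already switched. One extra cycle at the end carries the final LSB.
    sd = [0] + data
    ws = ws + [0]
    return ws, sd

ws, sd = i2s_serialize([(-1, 1)])
print(ws[:3], sd[:3])      # [0, 0, 0] [0, 1, 1]
```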

During setup, the AD1871 can divide down the 12.288 MHz clock it receives from the external crystal and use this reduced clock to drive the SPORT clock (RSCLK) and frame synchronization (RFS) lines. This configuration ensures that sampling and data transmission stay in sync.
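A little arithmetic confirms the divide-down relationship. This sketch assumes 64 bit-clock cycles per stereo frame, that is, 32 per channel. It checks 48 kHz and 96 kHz sample rates. These are assumptions for illustration, not AD1871 register settings. At both rates, the 12.288 MHz master clock divides evenly into the frame-sync and bit-clock rates. The helper name is hypothetical.

```python
# Illustrative arithmetic (hypothetical helper); frame size and rates are assumptions.
MCLK_HZ = 12_288_000

def clock_plan(fs_hz: int, bits_per_frame: int = 64) -> dict[str, int]:
    """Integer dividers from the master clock to the frame-sync and bit clocks."""
    sclk_hz = fs_hz * bits_per_frame
    if MCLK_HZ % fs_hz or MCLK_HZ % sclk_hz:
        raise ValueError("master clock is not an integer multiple")
    return {"mclk_per_frame": MCLK_HZ // fs_hz,
            "sclk_hz": sclk_hz,
            "mclk_per_sclk": MCLK_HZ // sclk_hz}

print(clock_plan(48_000))   # {'mclk_per_frame': 256, 'sclk_hz': 3072000, 'mclk_per_sclk': 4}
print(clock_plan(96_000))   # {'mclk_per_frame': 128, 'sclk_hz': 6144000, 'mclk_per_sclk': 2}
```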

The SPI interface, shown in Figure 4b, was designed by Motorola for connecting host processors to a variety of digital components. The interface between an SPI master and an SPI slave essentially consists of a clock line (SCK), two data lines (MOSI and MISO), and a slave select (SPISEL) line. One of the data lines is driven by the master (MOSI), and the other is driven by the slave (MISO). In the example shown in Figure 3b, the processor's SPI port interfaces gluelessly to the SPI block of the AD1871.
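The master drives MOSI and the slave drives MISO on the same clock. An SPI transfer is therefore really an exchange between two shift registers: after eight clocks, the master holds the slave's byte and vice versa. This behavioral sketch models one MSB-first transfer. It assumes slave select is held active throughout and ignores clock polarity and phase details. The function is hypothetical, not a driver API.

```python
# Illustrative behavioral sketch (hypothetical helper): one full-duplex SPI transfer.
def spi_exchange(master_out: int, slave_out: int, bits: int = 8) -> tuple[int, int]:
    """Simulate one MSB-first SPI transfer; returns (master_received, slave_received)."""
    master_reg, slave_reg = master_out, slave_out
    top = bits - 1
    mask = (1 << bits) - 1
    for _ in range(bits):                                  # one iteration per SCK cycle
        mosi = (master_reg >> top) & 1                     # master shifts out its MSB
        miso = (slave_reg >> top) & 1                      # slave shifts out its MSB
        master_reg = ((master_reg << 1) | miso) & mask     # master samples MISO
        slave_reg = ((slave_reg << 1) | mosi) & mask       # slave samples MOSI
    return master_reg, slave_reg

print([hex(v) for v in spi_exchange(0xA5, 0x3C)])   # ['0x3c', '0xa5']
```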

Figure 4. (a) The data signals transmitted by the AD1871 using the I²S protocol
(b) The SPI 3-wire interface used to control the AD1871.

Audio codecs with a separate SPI control port allow a host processor to change the ADC settings on the fly. Besides muting and gain control, one of the really useful settings on ADCs like the AD1871 is the ability to place it in power-down mode. For battery-powered applications, this is often an essential function.

DACs and Codecs
Connecting an audio DAC to a host processor is essentially identical to the ADC connection we just discussed. In a system that uses both an ADC and a DAC, the same serial port can connect to both, if it supports bidirectional transfers.

But if you're tackling full-duplex audio, then you're better off using a single-chip audio codec that handles both the analog-to-digital and digital-to-analog conversions. A good example of such a codec is the Analog Devices AD1836, which features three stereo DACs and two stereo ADCs, and is able to communicate through a number of serial protocols, including I²S.

In this article, we have covered some basics of connecting audio converters to embedded processors. In part 2 of this series, we will describe the formats in which audio data is stored and processed. In particular, we will review the compromises associated with selecting data sizes. This is important because it dictates the data types used and may also rule out some processor choices if the desired quality level is too high for what a particular device can achieve. Furthermore, data size selection helps with making tradeoffs between increased dynamic range and additional processing power.

This series is adapted from the book "Embedded Media Processing" (Newnes 2005) by David Katz and Rick Gentile. See the book's web site for more information.
