CCD and CMOS image sensor processing pipeline

Camera sources today are overwhelmingly based on either Charge-Coupled Device (CCD) or CMOS technology. Both of these technologies convert light into electrical signals, but they differ in how this conversion occurs.

In CCD devices, an array of millions of light-sensitive picture elements, or pixels, spans the surface of the sensor. After exposure to light, the accumulated charge over the entire CCD pixel array is read out at one end of the device and then digitized via an Analog Front End (AFE) chip or CCD processor. On the other hand, CMOS sensors directly digitize the exposure level at each pixel site.

In general, CCDs have the highest quality and lowest noise, but they are not power-efficient. CMOS sensors are easy to manufacture and have low power dissipation, but at reduced quality. Part of the reason is that the transistors at each pixel site tend to occlude light from reaching part of the pixel. However, CMOS has started giving CCD a run for its money in the quality arena, and increasing numbers of mid-tier camera sensors are now CMOS-based.

Regardless of their underlying technology, all pixels in the sensor array are sensitive to grayscale intensity -- from total darkness (black) to total brightness (white). The extent to which they're sensitive is known as their "bit depth." Therefore, 8-bit pixels can distinguish between 2^8, or 256, shades of gray, whereas 12-bit pixel values differentiate between 4096 shades. Layered over the entire pixel array is a color filter that segments each pixel into several color-sensitive "subpixels." This arrangement allows a measure of different color intensities at each pixel site. Thus, the color at each pixel location can be viewed as the sum of its red, green and blue channel light content, superimposed in an additive manner. The higher the bit depth, the more colors that can be generated in the RGB space. For example, 24-bit color (8 bits each of R,G,B) results in 2^24, or 16.7 million, discrete colors.
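
The bit-depth arithmetic above can be checked with a few lines of Python. This is only an illustrative sketch; the function names are made up for this example.

```python
def gray_levels(bits_per_pixel):
    """Number of distinct intensity levels a pixel of this bit depth can report."""
    return 2 ** bits_per_pixel


def rgb_colors(bits_per_channel):
    """Number of distinct colors when R, G and B each use the same bit depth."""
    return gray_levels(bits_per_channel) ** 3


print(gray_levels(8))   # 8-bit grayscale
print(gray_levels(12))  # 12-bit grayscale
print(rgb_colors(8))    # 24-bit color, 8 bits per channel
```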

In order to properly represent a color image, a sensor needs 3 color samples -- most commonly, Red, Green and Blue -- for every pixel location. However, putting 3 separate sensors in every camera is not a financially tenable solution (although lately such technology is becoming more practical). What's more, as sensor resolutions increase into the 5-10 Megapixel range, it becomes apparent that some form of image compression is necessary to prevent the need to output 3 bytes (or worse yet, 3 12-bit words for higher-resolution sensors) for each pixel location.

Not to worry, because camera manufacturers have developed clever ways of reducing the number of color samples necessary. The most common approach is to use a Color Filter Array (CFA), which measures only a single color at any given pixel location. Then, the results can be interpolated by the image processor to appear as if 3 colors were measured at every location.
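
To get a feel for the savings, the sketch below compares the raw size of one frame when 3 color samples are output per pixel against a CFA sensor that outputs a single sample per pixel. The 5-Megapixel figure comes from the text; the helper name and the assumption of tightly packed samples (no padding) are illustrative only.

```python
def frame_bytes(pixels, samples_per_pixel, bits_per_sample):
    """Raw size of one frame in bytes, assuming samples are packed with no padding."""
    total_bits = pixels * samples_per_pixel * bits_per_sample
    return (total_bits + 7) // 8  # round up to whole bytes


five_megapixels = 5_000_000

full_rgb = frame_bytes(five_megapixels, 3, 8)  # 3 bytes per pixel location
cfa_raw = frame_bytes(five_megapixels, 1, 8)   # one color sample per pixel location

print(full_rgb, cfa_raw, full_rgb / cfa_raw)
```

Under these assumptions the CFA output is one third the size of a full 3-sample readout, before any compression is applied.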

The most popular CFA in use today is the Bayer pattern, shown in Figure 1. This scheme, invented by Kodak, takes advantage of the fact that the human eye discerns differences in green-channel intensities more than red or blue changes. Therefore, in the Bayer color filter array, the Green subfilter occurs twice as often as either the Blue or Red subfilter. This results in an output format sometimes known as '4:2:2 RGB', where 4 Green values are sent for every 2 Red and Blue values.
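
As a rough illustration, the following sketch builds a small Bayer mosaic and counts how often each subfilter occurs. It assumes an RGGB tile ordering (red in the top-left corner); actual sensors may start the pattern on a different color, and the function names are hypothetical.

```python
def bayer_color(row, col):
    """Filter color at (row, col), assuming a repeating 2x2 RGGB tile."""
    if row % 2 == 0:
        return "R" if col % 2 == 0 else "G"
    return "G" if col % 2 == 0 else "B"


def mosaic_counts(height, width):
    """Count how many R, G and B sites a height x width mosaic contains."""
    counts = {"R": 0, "G": 0, "B": 0}
    for row in range(height):
        for col in range(width):
            counts[bayer_color(row, col)] += 1
    return counts


# Print a 4x4 corner of the mosaic, then the per-color totals.
for row in range(4):
    print(" ".join(bayer_color(row, col) for col in range(4)))
print(mosaic_counts(4, 4))
```

For any mosaic with even dimensions, the green count comes out at twice the red or blue count, which is the 2:1 ratio described above.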

Figure 1: Bayer pattern image sensor arrangement

Connecting to Image Sensors
CMOS sensors ordinarily output a parallel digital stream of pixel components in either YCbCr or RGB format, along with horizontal and vertical synchronization and a pixel clock. Sometimes, they allow for an external clock and sync signals to control the transfer of image frames out from the sensor.

CCDs, on the other hand, usually hook up to an "Analog Front End" (AFE) chip, such as the AD9948, that processes the analog output signal, digitizes it, and generates appropriate timing to scan the CCD array. A processor supplies synchronization signals to the AFE, which needs this control to manage the CCD array. The digitized parallel output stream from the AFE might be in 10-bit, or even 12-bit, resolution per pixel component.

Recently, LVDS (low-voltage differential signaling) has become an important alternative to the parallel data bus approach. LVDS is a low-cost, low pin-count, high-speed serial interconnect that has better noise immunity and lower power consumption than the standard parallel approach. This is important as sensor resolutions and color depths increase, and as portable multimedia applications become more widespread.

Image Pipe
Of course, the picture-taking process doesn't end at the sensor; on the contrary, its journey is just beginning. Let's take a look at what a raw image has to go through before becoming a pretty picture on a display. In digital cameras, this sequence of processing stages is known as the "image processing pipeline," or just "image pipe." Refer to Figure 2 for one possible dataflow. These algorithms are typically performed on a media processor such as those in Analog Devices' Blackfin family.

Figure 2: Example Software Image Pipe Flow

Mechanical Feedback Control
Before the shutter button is even released, the focus and exposure systems work with the mechanical camera components to control lens position based on scene characteristics.

Auto-exposure algorithms measure brightness over discrete scene regions to compensate for overexposed or underexposed areas by manipulating shutter speed and/or aperture size. The net goals here are to maintain relative contrast between different regions in the image and to achieve a target average luminance.
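
One very simple way to picture this is sketched below: the frame is split into a grid of regions, each region's mean brightness is measured, and an exposure scale factor is derived that would move the average toward a target. The grid size, the target value of 118, the equal weighting of regions, and the purely multiplicative correction are all assumptions made for illustration; production algorithms are considerably more elaborate.

```python
def region_means(luma, rows, cols):
    """Mean brightness of each cell in a rows x cols grid laid over a 2-D luma image.

    Assumes the image has at least `rows` rows and `cols` columns.
    """
    height, width = len(luma), len(luma[0])
    means = []
    for i in range(rows):
        r0, r1 = i * height // rows, (i + 1) * height // rows
        row_means = []
        for j in range(cols):
            c0, c1 = j * width // cols, (j + 1) * width // cols
            cell = [luma[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            row_means.append(sum(cell) / len(cell))
        means.append(row_means)
    return means


def exposure_scale(luma, target=118.0, rows=2, cols=2):
    """Factor by which exposure would need to change to reach the target average."""
    flat = [m for row in region_means(luma, rows, cols) for m in row]
    average = sum(flat) / len(flat)
    return target / max(average, 1.0)  # guard against an all-black frame


dark_frame = [[59] * 4 for _ in range(4)]
print(exposure_scale(dark_frame))  # a value above 1 means "expose longer"
```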

Auto-focus algorithms divide into two categories. Active methods use infrared or ultrasonic emitters/receivers to estimate the distance between the camera and the object being photographed. Passive methods, on the other hand, make focusing decisions based on the received image in the camera.
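
A common way to implement a passive method is to score each candidate lens position by image contrast and pick the sharpest one. The sketch below assumes that approach; the sum-of-squared-differences focus measure and the dictionary of frames keyed by lens position are illustrative choices, not something the article specifies.

```python
def sharpness(luma):
    """Focus measure: sum of squared differences between neighboring pixels."""
    height, width = len(luma), len(luma[0])
    total = 0
    for r in range(height):
        for c in range(width):
            if c + 1 < width:
                total += (luma[r][c + 1] - luma[r][c]) ** 2
            if r + 1 < height:
                total += (luma[r + 1][c] - luma[r][c]) ** 2
    return total


def best_focus(frames_by_position):
    """Return the lens position whose captured frame has the highest sharpness."""
    return max(frames_by_position, key=lambda pos: sharpness(frames_by_position[pos]))


# Hypothetical frames captured at two lens positions.
blurry = [[100, 110], [110, 120]]
crisp = [[0, 255], [255, 0]]
print(best_focus({10: blurry, 20: crisp}))
```

An in-focus image has stronger local contrast than a defocused one, so the lens position that maximizes this score is taken as the focus point.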

In both of these subsystems, the media processor manipulates the various lens and shutter motors via PWM output signals. For auto-exposure control, it also adjusts the Automatic Gain Control (AGC) circuit of the sensor.

As we discussed earlier, a sensor's output needs to be gamma-corrected to account for eventual display, as well as to compensate for nonlinearities in the sensor's capture response.
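
On an embedded processor, gamma correction is often done with a lookup table so that no power function has to be evaluated per pixel. The sketch below builds such a table for 8-bit data. The gamma value of 2.2 is an assumption chosen for illustration; the right value depends on the sensor and the target display.

```python
def gamma_lut(gamma=2.2, max_value=255):
    """Lookup table mapping linear pixel values to gamma-corrected values."""
    return [
        round(max_value * (value / max_value) ** (1.0 / gamma))
        for value in range(max_value + 1)
    ]


lut = gamma_lut()
linear_pixels = [0, 64, 128, 255]
corrected = [lut[value] for value in linear_pixels]
print(corrected)
```

Black and full-scale white map to themselves, while mid-tones are lifted, which is the expected shape of the correction curve.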

Since sensors usually have a few inactive or defective pixels, a common preprocessing technique is to eliminate these via median filtering, relying on the fact that sharp changes from pixel to pixel are abnormal, since the optical process blurs the image somewhat.
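
A minimal sketch of this idea is a 3x3 median filter over a single-channel image plane. It assumes the plane holds samples of one color only (on raw Bayer data, same-color neighbors are two pixels apart), and it leaves the border pixels untouched for simplicity.

```python
def median_filter_3x3(plane):
    """Replace each interior pixel with the median of its 3x3 neighborhood."""
    height, width = len(plane), len(plane[0])
    out = [row[:] for row in plane]  # copy so the border stays unchanged
    for r in range(1, height - 1):
        for c in range(1, width - 1):
            window = sorted(
                plane[r + dr][c + dc] for dr in (-1, 0, 1) for dc in (-1, 0, 1)
            )
            out[r][c] = window[4]  # middle of the 9 sorted values
    return out


# A stuck-high ("hot") pixel in an otherwise flat region.
plane = [
    [50, 50, 50],
    [50, 255, 50],
    [50, 50, 50],
]
print(median_filter_3x3(plane))
```

Because a single defective pixel can never be the median of its neighborhood, it is replaced by a value typical of its surroundings.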

Lens correction (shading / distortion correction)
This set of algorithms accounts for the physical properties of lenses that warp the output image compared to the actual scene the user is viewing. Different lenses can cause different distortions; for instance, wide-angle lenses create a "barreling" or "bulging" effect, while telephoto lenses create a "pincushion" or "pinching" effect.

Lens shading distortion reduces image brightness toward the edges of the frame, away from the optical center. Chromatic aberration causes color fringes around an image. The media processor needs to mathematically transform the image in order to correct for these distortions.
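
As one hypothetical example of such a transform, lens shading can be countered by multiplying each pixel by a gain that grows with its distance from the image center. The quadratic falloff model and the `strength` parameter below are assumptions for illustration; real systems typically use calibration data measured for the specific lens.

```python
def shading_gain(row, col, height, width, strength=0.5):
    """Gain that rises from 1.0 at the image center to 1.0 + strength at the corners."""
    center_y, center_x = (height - 1) / 2, (width - 1) / 2
    r2 = (row - center_y) ** 2 + (col - center_x) ** 2
    r2_max = center_y ** 2 + center_x ** 2
    if r2_max == 0:
        return 1.0  # single-pixel image: nothing to correct
    return 1.0 + strength * r2 / r2_max


def correct_shading(plane, strength=0.5, max_value=255):
    """Apply the radial gain to every pixel, clamping to the valid range."""
    height, width = len(plane), len(plane[0])
    return [
        [
            min(max_value, round(plane[r][c] * shading_gain(r, c, height, width, strength)))
            for c in range(width)
        ]
        for r in range(height)
    ]


flat = [[100] * 3 for _ in range(3)]
print(correct_shading(flat))
```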

Image stability compensation, or hand-shaking correction, is another area of preprocessing. Here, the processor adjusts for the translational motion of the received image, often with the help of external transducers that relate the real-time motion profile of the sensor.

White balance is another important stage of preprocessing. When we look at a scene, regardless of lighting conditions, our eyes tend to normalize everything to the same set of natural colors. For instance, an apple looks deep red to us whether we're indoors under fluorescent lighting, or outside in sunny weather. However, an image sensor's "perception" of color depends largely on lighting conditions, so it needs to map its acquired image to appear "lighting-agnostic" in its final output. This mapping can be done either manually or automatically.

In manual systems, you point your camera at an object you determine to be "white," and the camera will then shift the "color temperature" of all images it takes to accommodate this mapping. Automatic White Balance (AWB), on the other hand, uses inputs from the image sensor and an extra white balance sensor to determine what should be regarded as "true white" in an image. It tweaks the relative gains between the R, G and B channels of the image. Naturally, AWB requires more image processing than manual methods, and it's another target of proprietary vendor algorithms.
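
Vendor AWB algorithms are proprietary, but the basic idea of adjusting relative channel gains can be sketched with the well-known "gray world" heuristic, which assumes the average color of a scene should be neutral gray. That heuristic, and the choice of green as the reference channel, are assumptions for this example only.

```python
def gray_world_gains(pixels):
    """Per-channel gains that equalize the R, G and B means, with green as reference.

    `pixels` is a list of (r, g, b) tuples.
    """
    count = len(pixels)
    mean_r = sum(p[0] for p in pixels) / count
    mean_g = sum(p[1] for p in pixels) / count
    mean_b = sum(p[2] for p in pixels) / count
    # Guard against division by zero when a channel is entirely black.
    return (mean_g / max(mean_r, 1e-6), 1.0, mean_g / max(mean_b, 1e-6))


def apply_gains(pixels, gains, max_value=255):
    """Scale each channel by its gain and clamp to the valid range."""
    return [
        tuple(min(max_value, round(channel * gain)) for channel, gain in zip(p, gains))
        for p in pixels
    ]


# A scene with a strong reddish cast.
scene = [(200, 100, 50), (100, 50, 25)]
gains = gray_world_gains(scene)
print(gains, apply_gains(scene, gains))
```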

De-mosaic / Pixel interpolation / Noise reduction / Edge enhancement
De-mosaicing is perhaps the most crucial and numerically intensive operation in the image pipeline. Each camera manufacturer typically has its own "secret recipe," but in general, the approaches fall into a few main algorithm categories.

Nonadaptive algorithms like bilinear interpolation or bicubic interpolation are among the simplest to implement, and they work well in smooth areas of an image. However, edges and texture-rich regions present a challenge to these straightforward implementations. Adaptive algorithms, those that change behavior based on localized image traits, can provide better results.
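
To make the nonadaptive case concrete, the sketch below reconstructs only the green channel by bilinear interpolation: at every red or blue site, the missing green value is the average of the adjacent green samples. It assumes an RGGB layout and an image of at least 2x2 pixels; red and blue would be filled in along the same lines.

```python
def is_green(row, col):
    """True at green sites of an assumed RGGB mosaic: (0,1), (1,0), and so on."""
    return (row + col) % 2 == 1


def bilinear_green(raw):
    """Full-resolution green plane from a raw Bayer mosaic (2-D list of samples)."""
    height, width = len(raw), len(raw[0])
    green = [[0.0] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            if is_green(r, c):
                green[r][c] = float(raw[r][c])  # measured directly
            else:
                # Up, down, left and right neighbors are all green sites.
                neighbors = [
                    raw[rr][cc]
                    for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                    if 0 <= rr < height and 0 <= cc < width
                ]
                green[r][c] = sum(neighbors) / len(neighbors)
    return green


raw = [
    [10, 20, 10],
    [40, 10, 60],
    [10, 80, 10],
]
print(bilinear_green(raw))
```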

One example of an adaptive approach is edge-directed reconstruction. Here, the algorithm analyzes the region surrounding a pixel and determines in which direction to perform interpolation. If it finds an edge nearby, it interpolates along the edge, rather than across it. Another adaptive scheme assumes a constant hue for an entire object, and this prevents abrupt changes in color gradients within individual objects. Many other de-mosaicing approaches exist, some involving frequency-domain analysis, Bayesian probabilistic estimation, and even neural networks.
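
A bare-bones version of edge-directed reconstruction, for the green value at one interior red or blue site, might look like the sketch below. It compares the horizontal and vertical differences of the neighboring green samples and averages along the direction that changes least. The simple absolute-difference gradient is an assumption for illustration; practical implementations use richer edge detectors.

```python
def edge_directed_green(raw, row, col):
    """Green estimate at an interior non-green site, interpolating along the edge."""
    left, right = raw[row][col - 1], raw[row][col + 1]
    up, down = raw[row - 1][col], raw[row + 1][col]
    grad_h = abs(left - right)  # large when a vertical edge passes through
    grad_v = abs(up - down)     # large when a horizontal edge passes through
    if grad_h < grad_v:
        return (left + right) / 2  # values are stable horizontally
    if grad_v < grad_h:
        return (up + down) / 2     # values are stable vertically
    return (left + right + up + down) / 4  # no preferred direction


# A vertical edge: dark on the left, bright in the right-hand column.
raw = [
    [10, 10, 200],
    [10, 99, 200],
    [10, 10, 200],
]
print(edge_directed_green(raw, 1, 1))
```

In this example a plain four-neighbor average would blend the bright column into the result, whereas the edge-directed estimate stays on the dark side of the edge.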

Color Transformation
In this stage, the interpolated RGB image is transformed to the targeted output color space (if not already in the right space). For compression or display to a television, this will usually involve an RGB-to-YCbCr matrix transformation, often with another gamma correction stage to accommodate the target display. The YCbCr outputs may also be chroma subsampled at this stage to the standard 4:2:2 format for color bandwidth reduction with little visual impact.
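
The sketch below shows one possible form of this stage. The article does not say which conversion matrix is used, so the full-range BT.601 coefficients used by JPEG/JFIF are assumed here; the 4:2:2 step simply averages the chroma of each horizontal pixel pair. Function names are illustrative.

```python
def rgb_to_ycbcr(r, g, b):
    """Convert 8-bit RGB to 8-bit YCbCr (full-range BT.601, as used in JFIF)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return tuple(max(0, min(255, round(v))) for v in (y, cb, cr))


def subsample_422(row):
    """Reduce a row of (y, cb, cr) pixels to (y0, y1, cb, cr) per horizontal pair.

    Assumes the row has an even number of pixels.
    """
    pairs = []
    for i in range(0, len(row) - 1, 2):
        (y0, cb0, cr0), (y1, cb1, cr1) = row[i], row[i + 1]
        pairs.append((y0, y1, (cb0 + cb1) // 2, (cr0 + cr1) // 2))
    return pairs


rgb_row = [(255, 255, 255), (0, 0, 0), (255, 0, 0), (255, 0, 0)]
ycbcr_row = [rgb_to_ycbcr(*pixel) for pixel in rgb_row]
print(ycbcr_row)
print(subsample_422(ycbcr_row))
```

After subsampling, every pair of pixels carries 4 values instead of 6, which is where the bandwidth reduction comes from.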

In the postprocessing phase, the image is perfected via a variety of filtering operations before being sent to the display and/or storage media. For instance, edge enhancement, pixel thresholding for noise reduction, and color-artifact removal are all common at this stage.

Display / Compress / Store
Once the image itself is ready for viewing, the image pipe branches off in two different directions. In the first, the postprocessed image is output to the target display, usually an integrated LCD screen (but sometimes an NTSC/PAL television monitor, in certain camera modes). In the second, the image is sent to the media processor's compression algorithm, where industry-standard compression techniques (JPEG, for instance) are applied before the picture is stored locally in some storage medium (usually a non-volatile Flash memory card).

About the authors
David Katz is a Senior DSP Applications Engineer at Analog Devices, Inc., where he is involved in specifying and supporting Blackfin media processors. He has published dozens of embedded processor articles both domestically and internationally. Previously, he worked at Motorola, Inc., as a senior design engineer in cable modem and automation groups. David holds both a B.S. and M. Eng. in Electrical Engineering from Cornell University. He can be reached at

Rick Gentile joined ADI in 2000 as a Senior DSP Applications Engineer, and he currently leads the Blackfin DSP Applications Group. Prior to joining ADI, Rick was a Member of the Technical Staff at MIT Lincoln Laboratory, where he designed several signal processors used in a wide range of radar sensors. He received a B.S. in 1987 from the University of Massachusetts at Amherst and an M.S. in 1994 from Northeastern University, both in Electrical and Computer Engineering. He can be reached at


