Wednesday, February 28, 2007

PC-Vision Based System for Robust Face Tracking in "Smart Airbag"

2005 ASEAN Virtual Instrumentation Applications Contest Submission

Author(s):
Y.K. Liaw, School of Engineering, Monash University Malaysia
Alex See, School of Engineering, Monash University Malaysia

Industry: Automotive, University/Education

Product(s):
LabVIEW 7.1
Vision Development Module 7.1
LabVIEW NI-USB Webcam Driver
Vision Assistant 7.1

The Challenge:
Most airbag systems assume a single standard occupant size and crash type, so their deployment cannot be considered “smart”. Because of the explosive force of deployment, an airbag that deploys incorrectly is extremely dangerous and has caused occupant fatalities. The risk is greatest when the occupant is a small child or an infant. Analysing the occupant’s position and posture is therefore key to designing a “smart airbag” system, which should be able to recognise a small person, or a person who is not in a safe position for deployment. One of the main difficulties for the decision logic used in airbag deployment is its critical assumption about the occupant’s size and position in the car at the time of a crash. To address this problem, steps have been taken to make deployment safer by adding adaptive deployment decision capabilities.

The Solution:
One way of making these deployment decisions is to use sensors in the passenger seat to estimate the passenger’s size and deploy the airbag with a force appropriate to protect that passenger. If something is shoved under the seat, or if the passenger is out of position, the sensor can misread the occupant and potentially trigger a deadly deployment. An alternative is a vision-based system that senses the occupant’s spatial position. Such a system uses a tiny stereo camera mounted in the overhead console of the vehicle’s compartment, positioned to face inwards and track the passengers’ activity. The stereo video allows the data to be triangulated to pinpoint the position of the occupant. If the occupant ends up in a zone where airbag deployment would prove deadly, the system must decide to prevent that deployment. A prototype LabVIEW vision-based system has been developed for robust face tracking and 3D spatial position estimation.
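The triangulation step can be illustrated with a minimal sketch. It assumes an ideal, rectified stereo pair (two pinhole cameras with parallel optical axes), which the text above does not specify; the function name, parameter names, and numbers are hypothetical. Under that assumption, the depth of a point is the focal length times the camera baseline, divided by the horizontal disparity of the point between the two images.

```python
def stereo_depth(focal_px, baseline_m, x_left_px, x_right_px):
    """Depth in metres of a point seen by an ideal rectified stereo pair.

    focal_px    focal length in pixels (assumed equal for both cameras)
    baseline_m  distance between the two camera centres, in metres
    x_left_px   horizontal image coordinate of the point in the left image
    x_right_px  horizontal image coordinate of the same point in the right image
    """
    disparity = x_left_px - x_right_px
    if disparity <= 0:
        raise ValueError("disparity must be positive for a point in front of the cameras")
    return focal_px * baseline_m / disparity


# Hypothetical numbers: a face centroid found at x = 420 px (left) and x = 340 px (right).
depth = stereo_depth(focal_px=800, baseline_m=0.06, x_left_px=420, x_right_px=340)
print(round(depth, 3))
```

The face centroid reported by the tracker in each camera image would supply the two image coordinates; a larger disparity means the occupant is closer to the cameras.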

Abstract
As a first step towards the solution described above, a computer vision color tracking algorithm was developed in LabVIEW and applied to tracking human faces. The algorithm must be fast and efficient enough to track in real time without consuming a major share of computational resources. The face tracker is based on the mean shift algorithm, modified into the Continuously Adaptive Mean Shift (CAMSHIFT) algorithm and applied with a selected kernel. The flow of operation of the algorithm is presented below, followed by results obtained under different test conditions.

Introduction
This paper describes a LabVIEW program for tracking a human face with a PC. The aim was to segment, track, and estimate the 3D spatial position of a person in front of a webcam. The tracker must be robust enough to follow a given face in the presence of noise, occlusion by another face, partial occlusion of the face, and hand movement. In addition, the tracker should not consume too much memory or other computer resources while tracking, so that the algorithm can also run on slow computers with inexpensive consumer cameras.

To develop such an algorithm, attention was focused on robust statistics and probability distributions. Mean shift is one of the most efficient algorithms that operate on probability distributions. It is a robust non-parametric technique that climbs density gradients to find the mode (peak) of a probability distribution. [1]

Mean shift was first applied to the problem of mode seeking by Cheng. [2] Kernel-based object tracking, including adaptive scale and a background-weighted histogram extension, was described by Comaniciu et al. [3] Camshift, which is primarily intended for head and face tracking in a perceptual user interface, was introduced by Bradski. [4] Mean shift has also been used to track migrating cells by coupling two mean shift processes together. [5]

Robust statistics are used because they tend to ignore outliers in the data, such as points far away from the region of interest (ROI). The developed algorithm therefore compensates for noise and “distractors” in the vision data. Robust statistics are combined with the mean shift algorithm to find the mode, which represents the centroid of the face being tracked. Details of the tracking operation are discussed below.

Software Architecture
The tracking algorithm was programmed entirely in LabVIEW. It consists of two main parts:
(1) creation of the probability distribution image, and
(2) adaptive tracking.


Figure 1: Camshift is a newly developed method, modified from the mean shift algorithm, which climbs density gradients to find the mode of a probability distribution. Here the mode of a color distribution within the image plane is considered. The modification is necessary to deal with the dynamically changing color probability distributions derived from video frame sequences.

Figure 1 depicts the detailed flow of the Camshift operation implemented in the program. The user positions his or her head at the center of an onscreen box so that a flesh sample can be extracted. Each acquired image is first converted from RGB color planes to HSL color planes.

HSL stands for HUE, SATURATION, and LUMINANCE, a color space obtained by transforming the standard RGB color space. HSL separates HUE from SATURATION and from brightness. The problem of luminance variation is therefore greatly reduced, since the LUMINANCE plane has been separated out.
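The effect of separating HUE from LUMINANCE can be shown with a small sketch using Python's standard colorsys module in place of the LabVIEW color plane extraction. The image representation (rows of 8-bit RGB tuples), the 0-255 hue scaling, and the sample colors are assumptions made for illustration only.

```python
import colorsys


def hue_plane(rgb_image):
    """Return the HUE plane, scaled to 0-255, of an image given as rows of
    (r, g, b) tuples with 8-bit channel values."""
    plane = []
    for row in rgb_image:
        hue_row = []
        for r, g, b in row:
            # colorsys returns (hue, lightness, saturation), each in [0, 1].
            h, _lightness, _saturation = colorsys.rgb_to_hls(r / 255.0, g / 255.0, b / 255.0)
            hue_row.append(int(h * 255))
        plane.append(hue_row)
    return plane


# Hypothetical skin tone under bright light and the same tone at half the brightness.
bright_skin = (224, 172, 140)
dark_skin = (112, 86, 70)
print(hue_plane([[bright_skin, dark_skin]]))
```

Both pixels map to the same hue value even though their brightness differs by a factor of two, which is why the tracker works on the HUE plane alone.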

The program takes the HUE plane and builds a 1D histogram from it. The histogram is quantized into bins, which reduces computational and space complexity and allows similar color values to be clustered together. When sampling is complete, the histogram is saved for later use as a model, or lookup table, that transforms each acquired image into a probability distribution image (Figure 2). Because the sampled region may include unwanted background pixels, the 2D probability distribution image is influenced by how frequently those pixels occur in the histogram during back-projection. To assign higher weight to pixels nearer the center of the region, a weighted histogram may be used to compute the target histogram. Camshift then performs tracking on this flesh probability image.
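The histogram model and its use as a lookup table can be sketched as follows. This is a minimal illustration, not the LabVIEW implementation: the number of bins (16), the scaling of the fullest bin to 255, and the sample hue values are all assumptions.

```python
def hue_histogram(hue_samples, bins=16, max_value=256):
    """Quantize hue samples (0 to max_value - 1) into `bins` bins and scale the
    counts so that the fullest bin maps to 255."""
    counts = [0] * bins
    for hue in hue_samples:
        counts[hue * bins // max_value] += 1
    peak = max(counts) or 1  # avoid dividing by zero when there are no samples
    return [count * 255 // peak for count in counts]


def back_project(hue_image, histogram, max_value=256):
    """Replace every hue value by the value of its histogram bin, giving a
    grayscale probability image."""
    bins = len(histogram)
    return [[histogram[hue * bins // max_value] for hue in row] for row in hue_image]


# Hypothetical hue values sampled from the onscreen box.
skin_samples = [14, 15, 16, 16, 17, 18, 30]
model = hue_histogram(skin_samples)

# A tiny 2x2 hue image: two skin-like pixels and two background pixels.
frame_hue = [[16, 120], [200, 31]]
print(back_project(frame_hue, model))
```

Pixels whose hue falls in a well-populated bin come out bright, while hues that never occurred in the sample come out black.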

The probability image is simply a grayscale image in which each gray value gives the probability that the pixel represents skin.

The mean shift algorithm is run within the ROI to find the centroid position, move the center of the ROI to that point, and search for the centroid again until it converges. This may take one or more iterations. The zeroth moment is computed as one of the parameters used to calculate the size of the new ROI.
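The mean shift iteration described above can be sketched as follows, assuming the probability image is a list of rows of gray values and the ROI is an axis-aligned window given by its top-left corner and size. The function names and the iteration limit are hypothetical.

```python
import math


def window_moments(prob, x0, y0, width, height):
    """Zeroth and first moments of the probability image inside the window,
    clipped to the image borders."""
    m00 = m10 = m01 = 0
    for y in range(max(y0, 0), min(y0 + height, len(prob))):
        for x in range(max(x0, 0), min(x0 + width, len(prob[0]))):
            p = prob[y][x]
            m00 += p
            m10 += x * p
            m01 += y * p
    return m00, m10, m01


def mean_shift(prob, x0, y0, width, height, max_iter=20):
    """Move a fixed-size window until it is centred on the centroid of the
    probability mass it contains. Returns the top-left corner."""
    for _ in range(max_iter):
        m00, m10, m01 = window_moments(prob, x0, y0, width, height)
        if m00 == 0:
            break  # no skin-like pixels inside the window
        xc, yc = m10 / m00, m01 / m00
        new_x0 = math.floor(xc - (width - 1) / 2 + 0.5)
        new_y0 = math.floor(yc - (height - 1) / 2 + 0.5)
        if (new_x0, new_y0) == (x0, y0):
            break  # converged
        x0, y0 = new_x0, new_y0
    return x0, y0


# Synthetic 10x10 probability image with a 3x3 bright blob at x, y = 5..7.
prob = [[255 if 5 <= x <= 7 and 5 <= y <= 7 else 0 for x in range(10)] for y in range(10)]
print(mean_shift(prob, 2, 2, 5, 5))
```

Starting from a window that overlaps only one corner of the blob, the window settles with its center on the blob's centroid after two iterations.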

Equation (1):

$$M_{00} = \sum_{x} \sum_{y} I(x, y), \qquad s = 2 \sqrt{\frac{M_{00}}{256}}, \qquad h = 1.2\, s$$

where $M_{00}$ is the zeroth moment, $I(x, y)$ is the probability pixel value at position $(x, y)$, $x$ and $y$ range over the ROI search window, $s$ is the new window width, and $h$ is the new window length. The window sizing rule is written here in the form given by Bradski [4]. These parameters are reported after mean shift, and the new size of the ROI search window is set and overlaid on the image to indicate the detected face region. The iteration is repeated so that the ROI tracks the moving face. The whole process is known as CAMSHIFT because it continuously adapts its window size to deal with the dynamically changing color distribution while the mean shift algorithm runs iteratively within the ROI search window. The search window therefore keeps the ROI covering the whole face region, whether the face is small (smaller ROI, far from the webcam) or large (bigger ROI, nearer to the webcam).
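The window adaptation can be sketched as below. The sketch assumes the sizing rule from Bradski [4], a width of s = 2 * sqrt(M00 / 256) and a length of 1.2 * s for 8-bit probability values. It also simplifies the procedure by updating the centroid only once per step instead of running mean shift to convergence before each resize. All function names and the synthetic test image are hypothetical.

```python
import math


def moments(prob, x0, y0, width, height):
    """Zeroth and first moments inside the window, clipped to the image."""
    m00 = m10 = m01 = 0
    for y in range(max(y0, 0), min(y0 + height, len(prob))):
        for x in range(max(x0, 0), min(x0 + width, len(prob[0]))):
            p = prob[y][x]
            m00 += p
            m10 += x * p
            m01 += y * p
    return m00, m10, m01


def camshift_step(prob, window):
    """One simplified CAMSHIFT step: find the centroid in the current window,
    resize the window from the zeroth moment, and re-centre it."""
    x0, y0, width, height = window
    m00, m10, m01 = moments(prob, x0, y0, width, height)
    if m00 == 0:
        return window  # nothing to track; keep the previous window
    xc, yc = m10 / m00, m01 / m00
    s = 2.0 * math.sqrt(m00 / 256.0)       # sizing rule assumed from [4]
    new_w = max(1, round(s))
    new_h = max(1, round(1.2 * s))         # faces are taller than they are wide
    new_x0 = math.floor(xc - (new_w - 1) / 2 + 0.5)
    new_y0 = math.floor(yc - (new_h - 1) / 2 + 0.5)
    return new_x0, new_y0, new_w, new_h


# Synthetic 40x40 probability image with an 8x8 "face" at x, y = 20..27.
prob = [[255 if 20 <= x <= 27 and 20 <= y <= 27 else 0 for x in range(40)] for y in range(40)]
window = (16, 16, 6, 6)  # small initial box overlapping one corner of the face
for _ in range(8):
    window = camshift_step(prob, window)
print(window)
```

In this example the window grows from the small initial box until it encloses the whole synthetic face and then stops changing. Under the assumed rule the converged window is about twice the side of the blob, which leaves room for the region to grow as the face approaches the camera.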

Implementation and Result Analysis


Figure 2: The probability distribution image is created from the sampled flesh-tone pixels of the user. This process is known as histogram back-projection, a primitive operation that associates the pixel values in the image with the value of the corresponding histogram bin. Since the background does not contain any flesh HUE values, all pixels in that region are turned black, leaving only the skin regions (face and hand).

An onscreen box of 30×30 pixels is initially overlaid on the image plane. The user is required to place his or her face in the box and wait for a countdown to finish. Once the time has elapsed, the flesh sample is taken and Camshift starts tracking. The ROI search window continuously adapts its size, based on Equation 1, by calculating the zeroth moment, area, and 3D position until it covers the whole face region. The center of the ROI window is located at the centroid found by the mean shift algorithm.

The ROI search window keeps tracking the face by climbing the density gradient of the probability distribution in any direction. Unlike the mean shift algorithm, which is designed for static distributions (the distribution is not updated unless the target changes significantly in shape, size, or color), Camshift is designed for dynamically changing distributions. These occur when an object tracked in a video sequence moves, so that the size and location of the probability distribution change over time. The Camshift algorithm therefore adjusts the search window size in the course of its operation.

The Camshift tracker avoids losing track when another face appears in the image plane. This reflects the power of robust statistics, which ignore outliers in the vision data. Another case arises when the face is near an unwanted flesh-colored object. Because the webcam automatically adjusts its exposure as objects move, a flesh-toned background object may partially appear as a noisy distribution. Weighting plays a vital role here: it assigns little or no weight to pixels far from the center of the ROI. Hence the ROI search window does not drift towards the other object or grow to cover unnecessary objects.
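The weighting idea can be sketched with a simple kernel. The text does not name the kernel that was selected, so the Epanechnikov-style profile used here (the profile used for kernel-based tracking in [3]) is an assumption, as are the function names and the 16-bin quantization.

```python
def kernel_weight(x, y, cx, cy, half_w, half_h):
    """Epanechnikov-style weight: 1 at the ROI centre (cx, cy), falling to 0 at
    and beyond the ellipse with half-axes half_w and half_h."""
    r2 = ((x - cx) / half_w) ** 2 + ((y - cy) / half_h) ** 2
    return max(0.0, 1.0 - r2)


def weighted_hue_histogram(hue_image, cx, cy, half_w, half_h, bins=16, max_value=256):
    """Hue histogram in which each pixel contributes its kernel weight rather
    than a full count, so pixels near the ROI border matter less."""
    hist = [0.0] * bins
    for y, row in enumerate(hue_image):
        for x, hue in enumerate(row):
            hist[hue * bins // max_value] += kernel_weight(x, y, cx, cy, half_w, half_h)
    return hist


# A pixel at the centre, halfway out, and on the window edge.
print(kernel_weight(5, 5, 5, 5, 2, 2))
print(kernel_weight(6, 5, 5, 5, 2, 2))
print(kernel_weight(7, 5, 5, 5, 2, 2))
```

A flesh-toned object at the edge of the ROI therefore contributes far less to the model than the face at its center.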

The tracker was also tested on a face occluded by a passing hand. Camshift tends to be robust against transient occlusion because the search window first absorbs the occlusion and then stays with the dominant distribution mode once the occlusion passes.

Camshift is also able to keep tracking the face even when the face appears only partially on the screen.

Conclusion
The developed algorithm serves as a prototype for part of the program used in a safe airbag deployment system. Camshift is a simple, computationally efficient face and colored-object tracker. By robustly tracking the occupant, it has been seen to cope with the conditions tested here that may occur while driving: a second face in view, flesh-toned background objects, transient hand occlusion, and a partially visible face. The algorithm can still be improved by implementing an adaptive color model in real time. Since the current program relies on a fixed model (histogram), it may still be affected by significant changes in luminance. To alleviate this problem, a new set of pixels can be sampled from the tracked region at each frame and used to update the weighted histogram. The algorithm was coded in LabVIEW, and the prototyping was efficient and rapid.

Not all samples can safely be used for adaptation. An obvious problem with adapting a color model during tracking is the lack of ground truth. Any color-based tracker can lose the object it is tracking because of, for example, occlusion or varying lighting. If such errors go undetected, the color model will adapt to image regions that do not correspond to the object. The observed log-likelihood can be used to detect erroneous frames, and color data from these frames are then not used to adapt the object’s color model. This is known as selective adaptation, which can be investigated further. The developed prototype is a low-cost system with potential for use in the automotive environment for the safe deployment of airbags in a vehicle.
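Selective adaptation could look roughly like the sketch below, which is proposed future work and not part of the prototype. It assumes the color model is a normalized hue histogram, uses the mean log-likelihood of the newly sampled pixels as the error test, and blends the model towards the new sample with a fixed rate. The threshold, rate, bin count, and function names are hypothetical.

```python
import math


def bin_index(hue, bins, max_value=256):
    """Histogram bin of a hue value in the range 0 to max_value - 1."""
    return hue * bins // max_value


def mean_log_likelihood(hue_samples, model, floor=1e-6):
    """Average log-probability of the samples under the normalized histogram
    model; empty bins are floored to avoid log(0)."""
    bins = len(model)
    total = sum(math.log(max(model[bin_index(h, bins)], floor)) for h in hue_samples)
    return total / len(hue_samples)


def selectively_adapt(model, hue_samples, threshold, rate=0.1):
    """Blend the model towards the new samples only when the frame looks
    plausible under the current model; otherwise return the model unchanged."""
    if not hue_samples or mean_log_likelihood(hue_samples, model) < threshold:
        return model  # erroneous frame: do not adapt
    bins = len(model)
    observed = [0.0] * bins
    for h in hue_samples:
        observed[bin_index(h, bins)] += 1.0 / len(hue_samples)
    return [(1.0 - rate) * m + rate * o for m, o in zip(model, observed)]


model = [0.2, 0.8] + [0.0] * 14          # hypothetical 16-bin skin model
good_frame = [20, 20, 20, 5]             # hues that the model explains well
bad_frame = [200, 200, 200]              # hues the model has never seen
print(selectively_adapt(model, good_frame, threshold=-2.0)[:2])
print(selectively_adapt(model, bad_frame, threshold=-2.0) is model)
```

The good frame nudges the model slightly towards the new sample, while the frame with unseen hues is rejected and leaves the model untouched.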

References
[1] K. Fukunaga, D. Hostetler (1975), “The Estimation of the Gradient of a Density Function, with Applications in Pattern Recognition”, IEEE Transactions on Information Theory, January 1975, vol. 21, no. 1.
[2] Y. Cheng (1995), “Mean Shift, Mode Seeking, and Clustering”, IEEE Transactions on Pattern Analysis and Machine Intelligence, August 1995, vol. 17, no. 8, pp. 790-799.
[3] D. Comaniciu, V. Ramesh, P. Meer (2003), “Kernel-Based Object Tracking”, IEEE Transactions on Pattern Analysis and Machine Intelligence, May 2003, vol. 25, no. 5, pp. 564-577.
[4] G.R. Bradski (1998), “Real Time Face and Object Tracking as a Component of a Perceptual User Interface”, Proceedings of the Fourth IEEE Workshop on Applications of Computer Vision (WACV’98), 19-21 October 1998, pp. 214-219.
[5] O. Debeir, P. Van Ham, R. Kiss, C. Decaestecker (2005), “Tracking of Migrating Cells Under Phase-Contrast Video Microscopy with Combined Mean-shift Processes”, IEEE Transactions on Medical Imaging, June 2005, vol. 24, no. 6.


For more information, contact:
Alex See, Lecturer
Y.K. Liaw, Student
Monash University Malaysia
School of Engineering
No. 2 Jalan Kolej, Bandar Sunway
46150, Selangor, MALAYSIA
Tel: +60 3 5636 0600
Fax: +60 3 5632 9314
Email: alex.see@eng.monash.edu.my
