
Pedestrian sensing

To detect pedestrians, the car uses four technologies:


1) Stereo Vision

Stereo vision is a process of triangulation that determines range from two images taken from different positions. The two images are captured simultaneously by a pair of cameras with a known baseline (i.e., the separation distance between the cameras). The objective of a stereo vision system is to find correspondences between the images captured by the two cameras, which is done by some form of image correlation and peak finding. The correlation function used here is a local method that produces a dense disparity map of the image, which can then be converted to a range map. The correlation function implemented is the sum of absolute differences (SAD). The equation for SAD at each image pixel, computed over a 7 × 7 pixel local region, is given in figure 1.


Figure 1:  SAD(x, y, s) = \sum_{j=-3}^{3} \sum_{i=-3}^{3} \left| A(x + i, y + j) - B(x + i - s, y + j) \right|

Where:

  • A = left image of the stereo pair
  • B = right image of the stereo pair
  • x, y = image pixel locations
  • s = horizontal shift searched to find an image correlation

Other functions could be used for local image correlation, including the sum of squared differences and normalized correlation. After the correlation is computed over all shifts (32 horizontal shifts in this case), the minimum value is detected and interpolated for an accurate disparity estimate. In this implementation, the SAD correlation is applied to multiple resolutions of the image pair, extending the search range by a factor of two for each coarser resolution. Disparity estimates obtained at coarser resolutions are generally less prone to the false matches that can occur in regions of low texture, but they are commensurately less accurate.

The disparity maps computed this way are often noisy because the range data depend on accurately correlating each point in one image with the corresponding point in the other. To increase the reliability of the range data, the images can be prefiltered (with boxcar or Gaussian filters), and the SAD summing window can be enlarged from 7 × 7 to 13 × 7 pixels. A consistency check then masks inconsistent or ambiguous disparity data, such as areas that are occluded in one of the two cameras. Disparity maps computed at multiple resolutions are combined before range (depth) is computed. Given the horizontal coordinates of corresponding pixels, x_l and x_r, in the left and right images, the range z can be expressed as follows:

z = \frac{b f}{d} = \frac{b f}{x_l - x_r}

Where:

  • b = the stereo camera baseline
  • f = the focal length of the camera in pixels
  • d = the image disparity value
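To make the pipeline concrete, the following Python/NumPy sketch computes a single-resolution SAD disparity map and converts it to range. It is a minimal illustration with hypothetical helper names; the prefiltering, multi-resolution search, subpixel interpolation, and consistency checking described above are omitted.

```python
import numpy as np

def box_sum(img, half_win):
    """Sum over a (2*half_win+1)^2 window centered at every pixel (edge-padded)."""
    h, w = img.shape
    padded = np.pad(img, half_win, mode="edge")
    out = np.zeros((h, w))
    for dy in range(2 * half_win + 1):
        for dx in range(2 * half_win + 1):
            out += padded[dy:dy + h, dx:dx + w]
    return out

def sad_disparity(left, right, max_shift=32, half_win=3):
    """Dense disparity by SAD: for each pixel, try horizontal shifts
    s = 0..max_shift-1, comparing A(x, y) against B(x - s, y), and keep
    the shift whose 7x7 window sum is smallest."""
    h, w = left.shape
    left = left.astype(float)
    right = right.astype(float)
    best_cost = np.full((h, w), np.inf)
    disparity = np.zeros((h, w), dtype=int)
    for s in range(max_shift):
        diff = np.full((h, w), np.inf)      # shifted-out columns cost infinity
        if s < w:
            diff[:, s:] = np.abs(left[:, s:] - right[:, :w - s])
        cost = box_sum(diff, half_win)
        better = cost < best_cost
        disparity[better] = s
        best_cost[better] = cost[better]
    return disparity

def range_from_disparity(disparity, baseline, focal_px):
    """z = b * f / d; zero disparity maps to infinite range."""
    return np.where(disparity > 0,
                    baseline * focal_px / np.maximum(disparity, 1), np.inf)
```

For example, with a 0.12 m baseline and a 700-pixel focal length, a disparity of 21 pixels corresponds to a range of 0.12 × 700 / 21 = 4 m.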


2) Feature Extraction Using HOG

HOG is a method of encoding and matching image patches that is robust to changes in image orientation and scale. The descriptor is a histogram of the gradient directions of the image pixels within a rectangular sampling window. The gradient of each pixel is computed by convolving the image with the Sobel masks (or differential kernels) in the X and Y directions; the arctangent of the ratio of the two convolution responses gives the underlying image feature direction. The gradient direction of each pixel is then binned into nine directions covering 180 degrees. The HOG is computed by gathering the directions of the pixels inside a sampling window, weighting each response by its edge strength. This results in a 9 × 1 vector that is then normalized to be bounded in the [0, 1] range. Figure 3 shows an illustration of how a HOG is computed; the sampling window location is shown in blue in the photo of the pedestrian.

[Figure 3: Illustration of HOG computation; the sampling window location is shown in blue on the pedestrian image.]
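The per-window histogram step can be sketched as follows; `gradient_histogram` is a hypothetical helper, central differences stand in for the Sobel masks, and L2 normalization is used as one common way of bounding the vector in [0, 1].

```python
import numpy as np

def gradient_histogram(window, n_bins=9):
    """9-bin histogram of unsigned gradient orientations (0-180 degrees),
    each pixel's vote weighted by its edge strength (gradient magnitude)."""
    gy, gx = np.gradient(window.astype(float))       # d/dy, d/dx responses
    magnitude = np.hypot(gx, gy)
    # Unsigned orientation: arctangent of the response ratio, folded to 180 deg.
    angle = np.degrees(np.arctan2(gy, gx)) % 180.0
    bins = np.minimum((angle * n_bins / 180.0).astype(int), n_bins - 1)
    hist = np.bincount(bins.ravel(), weights=magnitude.ravel(),
                       minlength=n_bins)
    return hist / (np.linalg.norm(hist) + 1e-9)      # bound in [0, 1]
```

A window containing a vertical edge votes almost entirely into the 0-degree bin, while a horizontal edge votes into the 90-degree bin.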

Within small regions, HOG encodes dominant shapes, which are computed by a voting scheme applied to the region’s edge segments (see figure 4). Specifically, an image patch or region is first subdivided into multiple image cell regions. Each cell region is further divided into 2 × 2-pixel or 3 × 3-pixel local grids. The HOG feature is computed for each local grid region. For pedestrian recognition, candidate image patches are typically resized to a nominal size of 64 × 128 pixels, and HOG is computed. To handle image noise and exploit pedestrian shape, the algorithm applies a four-tap Gaussian filter to smooth the image and enhances it using histogram stretching.

[Figure 4: Voting scheme over edge segments within the region's cells.]
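The cell subdivision can be sketched as follows, assuming non-overlapping 8 × 8 pixel cells over the 64 × 128 candidate patch; block normalization and the Gaussian smoothing / histogram stretching described above are omitted, and `hog_descriptor` is a hypothetical name.

```python
import numpy as np

def hog_descriptor(patch, cell=8, n_bins=9):
    """Concatenated per-cell orientation histograms for a candidate patch.

    The patch (nominally 64 x 128 pixels) is split into non-overlapping
    cell x cell regions; each cell votes its pixels' unsigned gradient
    orientations into a 9-bin histogram, weighted by edge strength.
    """
    h, w = patch.shape
    gy, gx = np.gradient(patch.astype(float))        # d/dy, d/dx
    magnitude = np.hypot(gx, gy)
    angle = np.degrees(np.arctan2(gy, gx)) % 180.0   # unsigned orientation
    bins = np.minimum((angle * n_bins / 180.0).astype(int), n_bins - 1)
    features = []
    for top in range(0, h - cell + 1, cell):
        for left in range(0, w - cell + 1, cell):
            b = bins[top:top + cell, left:left + cell].ravel()
            m = magnitude[top:top + cell, left:left + cell].ravel()
            hist = np.bincount(b, weights=m, minlength=n_bins)
            hist = hist / (np.linalg.norm(hist) + 1e-9)   # bound in [0, 1]
            features.append(hist)
    return np.concatenate(features)
```

A 64 × 128 patch with 8 × 8 cells yields 8 × 16 = 128 cells and a 1152-element feature vector.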

To compute HOG efficiently enough for a real-time system, an integral image is pre-computed so that the HOG for any window can be retrieved by look-up operations consisting of simple arithmetic summations. The integral image is a stack of per-orientation encodings in which the cumulative histogram of orientations up to each pixel is computed by a fast scanning method. The integral histogram is computed as follows:

H(x, y) = H(x - 1, y) + H(x, y - 1) - H(x - 1, y - 1) + Q(x, y)

where Q(x, y) is the orientation histogram of the single pixel at (x, y), and H is taken to be zero outside the image.
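Because the recurrence is a 2-D running sum, the integral histogram can be built with two cumulative sums. The sketch below assumes each pixel's term Q is a one-hot vector over its orientation bin; `integral_histogram` is a hypothetical helper name.

```python
import numpy as np

def integral_histogram(orientation_bins, n_bins=9):
    """H[y, x, k] = number of pixels with orientation bin k in the
    rectangle (0, 0)..(y, x) inclusive.

    Two cumulative sums realize the recurrence
    H(x,y) = H(x-1,y) + H(x,y-1) - H(x-1,y-1) + Q(x,y) in one fast scan.
    """
    h, w = orientation_bins.shape
    Q = np.zeros((h, w, n_bins))              # per-pixel one-hot histograms
    Q[np.arange(h)[:, None], np.arange(w)[None, :], orientation_bins] = 1.0
    return Q.cumsum(axis=0).cumsum(axis=1)
```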

Given a candidate pedestrian region of interest (ROI), the corresponding HOG for each ROI is computed by sampling the integral histogram as follows:

HOG(ROI) = H(x_2, y_2) - H(x_1 - 1, y_2) - H(x_2, y_1 - 1) + H(x_1 - 1, y_1 - 1)

where (x_1, y_1) and (x_2, y_2) are the top-left and bottom-right corners of the ROI, inclusive.
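The four-corner look-up can be sketched as follows, assuming an inclusive integral histogram `H[y, x, :]`; on the image border the subtracted terms are simply dropped. The function name is illustrative.

```python
import numpy as np

def roi_histogram(H, top, left, bottom, right):
    """HOG of an ROI from an inclusive integral histogram H[y, x, :].

    Four look-ups replace a full re-scan of the ROI:
        hist = H(b, r) - H(t-1, r) - H(b, l-1) + H(t-1, l-1)
    with the subtracted terms dropped on the image border.
    """
    hist = H[bottom, right].copy()
    if top > 0:
        hist = hist - H[top - 1, right]
    if left > 0:
        hist = hist - H[bottom, left - 1]
    if top > 0 and left > 0:
        hist = hist + H[top - 1, left - 1]
    return hist
```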

3) AdaBoost Classifier

AdaBoost uses a training set (x_1, y_1), ..., (x_m, y_m), where each x_i belongs to a domain X and each y_i is a label in some label set Y. For simplicity, assume the labels are -1 or +1. AdaBoost calls a given weak learning algorithm repeatedly, maintaining a distribution (a set of weights) over the training set. Initially, all weights are set equally; on each round, the weights of incorrectly classified examples are increased so that the weak learner is forced to focus on the harder examples in the training set. The goodness of a weak hypothesis is measured by its error, which is taken with respect to the distribution on which the weak learner was trained. The pseudo-code for the algorithm follows:

Given: (x_1, y_1), ..., (x_m, y_m) with x_i in X, y_i in {-1, +1}.
Initialize D_1(i) = 1/m.
For t = 1, ..., T:
  • Train a weak learner using distribution D_t.
  • Get a weak hypothesis h_t: X -> {-1, +1} with error e_t = Pr_{i ~ D_t}[h_t(x_i) != y_i].
  • Choose a_t = (1/2) ln((1 - e_t) / e_t).
  • Update D_{t+1}(i) = D_t(i) exp(-a_t y_i h_t(x_i)) / Z_t, where Z_t normalizes D_{t+1} to a distribution.
Output the final hypothesis: H(x) = sign(sum_{t=1}^{T} a_t h_t(x)).
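The schedule above can be turned into a runnable sketch using one-level decision stumps as the weak learner (a common choice, though any weak learner fits); `adaboost_stumps` and its exhaustive stump search are illustrative, not an optimized implementation.

```python
import numpy as np

def adaboost_stumps(X, y, n_rounds=10):
    """AdaBoost with threshold ("stump") weak learners, labels in {-1, +1}.

    Start from uniform weights, pick the stump with the lowest weighted
    error, set alpha = 0.5 * ln((1 - err) / err), then up-weight the
    examples the stump got wrong.
    """
    m, n = X.shape
    D = np.full(m, 1.0 / m)                     # distribution over examples
    ensemble = []
    for _ in range(n_rounds):
        best = None
        for j in range(n):                      # exhaustive stump search
            for thr in np.unique(X[:, j]):
                for sign in (1, -1):
                    pred = sign * np.where(X[:, j] <= thr, 1, -1)
                    err = D[pred != y].sum()
                    if best is None or err < best[0]:
                        best = (err, pred, (j, thr, sign))
        err, pred, stump = best
        err = min(max(err, 1e-10), 1.0 - 1e-10)  # guard the logarithm
        alpha = 0.5 * np.log((1.0 - err) / err)
        D = D * np.exp(-alpha * y * pred)        # mistakes grow, hits shrink
        D = D / D.sum()
        ensemble.append((alpha, stump))
    return ensemble

def adaboost_predict(ensemble, X):
    """Sign of the alpha-weighted vote of all stumps."""
    score = np.zeros(len(X))
    for alpha, (j, thr, sign) in ensemble:
        score += alpha * sign * np.where(X[:, j] <= thr, 1, -1)
    return np.where(score >= 0, 1, -1)
```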

4) MRF (Markov Random Field)

Another important technique is to label scene structure into buildings, trees, other tall vertical structures (e.g., poles), and objects of interest (e.g., pedestrians and vehicles). This approach poses labeling as a Bayesian labeling problem, in which the solution is defined as the maximum a posteriori (MAP) estimate of the true labeling. The posterior is derived from a prior model and a likelihood model, which in turn depend on how the prior constraints are expressed. Markov random field (MRF) theory encodes such contextual constraints into the prior probability. MRF modeling can be performed in a systematic way as follows:

1. Pose the problem as one of labeling with a specific label configuration.
2. Further pose it as a Bayesian labeling problem in which the optimal solution is the MAP label configuration.
3. Characterize the prior distribution of label configurations.
4. Determine the likelihood density of the data based on an assumed observation model.
5. Use Bayes' rule to derive the posterior distribution of label configurations.
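The MAP labeling step can be illustrated with a minimal sketch that uses a Potts smoothness prior and iterated conditional modes (ICM), a simple greedy optimizer that stands in here for full MAP inference; the unary costs, `beta`, and function names are assumptions for illustration.

```python
import numpy as np

def icm_labeling(unary, beta=1.0, n_iters=5):
    """MAP-style scene labeling with a Potts prior, optimized by ICM.

    unary[y, x, k] is the negative log-likelihood of label k at pixel
    (y, x) (the observation model); the Potts prior charges beta for
    every 4-neighbor pair whose labels disagree (the contextual
    constraint). ICM greedily re-labels each pixel to minimize its
    local energy.
    """
    h, w, n_labels = unary.shape
    labels = unary.argmin(axis=2)            # likelihood-only start
    for _ in range(n_iters):
        for yy in range(h):
            for xx in range(w):
                cost = unary[yy, xx].copy()
                for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    ny, nx = yy + dy, xx + dx
                    if 0 <= ny < h and 0 <= nx < w:
                        # Potts: pay beta where labels disagree.
                        cost += beta * (np.arange(n_labels) != labels[ny, nx])
                labels[yy, xx] = cost.argmin()
    return labels
```

With per-class costs for (say) building, tree, pole, pedestrian, and vehicle as the unary term, the prior smooths isolated mislabeled pixels toward their neighborhood.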


Copyright © 2020 SMV Inc. All rights reserved.
