Crosswalk finding
The car uses figure-ground segmentation to identify crosswalks.

Figure-Ground Segmentation
The figure-ground segmentation problem is formulated using a graphical model that assigns a label of figure or ground to each element in an image. The elements are a sparse set of geometric features created by grouping together simpler features in a greedy, bottom-up fashion. The features are designed to occur commonly on the foreground structure of interest and more rarely in the background. The true positive features tend to cluster into regular structures (roughly parallel stripes in this case), unlike the false positives, which are distributed more randomly. Our approach exploits this characteristic clustering of true positive features, drawing on ideas from work on object-specific figure-ground segmentation, which uses normalized cuts to perform grouping. Affinity functions measure the compatibility of pairs of elements as potential foreground candidates, and these affinities are used to construct a graphical model that represents a figure-ground process. Each node in the graph has two possible states, figure or ground. The graphical model defines a probability distribution on all possible combinations of figure-ground labels at each node. Belief propagation (BP) is used to estimate the marginal probabilities of these labels at each node; any node with a sufficiently high marginal probability of belonging to the figure is assigned to the figure.
The Form of the Graphical Model
The graphical model for a general figure-ground segmentation process is defined as follows. Each of the N features extracted from the image is associated with a graph node (vertex) xi, where i ranges from 1 through N. Each node xi can be in two possible states, 0 or 1, representing ground and figure, respectively. The probability of any labeling of all the nodes is given by the following expression:

P(x1, ..., xN) = (1/Z) ∏_{i=1}^{N} ψi(xi) ∏_{<ij>} ψij(xi, xj)

This is the expression for a pairwise MRF (Markov random field, a type of graphical model), where ψi(xi) is the unary potential function, ψij(xi, xj) is the binary potential function and Z is the normalization factor. <ij> denotes the set of all pairs of features i and j that are directly connected in the graph. ψi(xi) is a unary factor reflecting the likelihood of feature i belonging to the figure or ground, independent of the context of other nearby features. ψij(xi, xj) is the compatibility function between features i and j, which reflects how the relationship between two features influences the probability of assigning them to figure/ground. The unary and binary functions may be chosen by trial and error or by maximum likelihood learning. Because the preliminary results shown here are intended only to demonstrate the feasibility of our approach, simple trial and error is used in the current application. The general form of our unary and binary functions is as follows. First, ψi(xi) enforces a bias in favor of each node being assigned to the ground: ψi(xi = 0) = 1 and ψi(xi = 1) < 1. The magnitude of the figure value for any feature depends on one or more unary cues, or factors. For a node to be set to the foreground, the binary functions must reward compatible pairs of nodes sufficiently to offset the unary bias. Ground-ground and ground-figure interactions are set to be neutral: ψij(xi = 0, xj = 0) = ψij(xi = 0, xj = 1) = ψij(xi = 1, xj = 0) = 1. Figure-figure interactions ψij(xi = 1, xj = 1) are set less than 1 for relatively incompatible nodes and greater than 1 for compatible nodes. The value of ψij(xi = 1, xj = 1) is determined by several binary cues, or compatibility factors.
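The model above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used on the car: the four-node chain, the figure value 0.3, the figure-figure values 20 and 0.5, and the 0.5 decision threshold are all made-up assumptions, and the function names are hypothetical. The sketch builds potentials of the stated form (ground fixed at 1, only the figure-figure entry non-neutral), runs sum-product belief propagation, and checks the result against brute-force marginals computed directly from the joint distribution. The chain is a tree, so BP should be exact here; on graphs with loops it is only an approximation.

```python
import itertools

def unary(fig_value):
    """psi_i as [ground, figure]; ground is fixed at 1, figure is below 1."""
    return [1.0, fig_value]

def pairwise(fig_fig):
    """psi_ij as a 2x2 table indexed [x_i][x_j]; neutral unless both are figure."""
    return [[1.0, 1.0], [1.0, fig_fig]]

def exact_marginals(unaries, edges):
    """Brute-force P(x_i = 1): sum the unnormalized joint over all 2^N labelings."""
    n = len(unaries)
    z = 0.0
    p_fig = [0.0] * n
    for x in itertools.product((0, 1), repeat=n):
        w = 1.0
        for i in range(n):
            w *= unaries[i][x[i]]
        for (i, j), table in edges.items():
            w *= table[x[i]][x[j]]
        z += w  # accumulates the normalization factor Z
        for i in range(n):
            if x[i] == 1:
                p_fig[i] += w
    return [p / z for p in p_fig]

def belief_propagation(unaries, edges, n_iters=20):
    """Sum-product BP with parallel updates; returns estimates of P(x_i = 1)."""
    n = len(unaries)
    nbrs = {i: [] for i in range(n)}
    msgs = {}  # msgs[(i, j)][x_j] is the message from node i to node j
    for (i, j) in edges:
        nbrs[i].append(j)
        nbrs[j].append(i)
        msgs[(i, j)] = [0.5, 0.5]
        msgs[(j, i)] = [0.5, 0.5]

    def psi(i, j, xi, xj):
        # Look up the table in whichever orientation it was stored.
        return edges[(i, j)][xi][xj] if (i, j) in edges else edges[(j, i)][xj][xi]

    def local(i, skip=None):
        # Unary potential times all incoming messages except the one from `skip`.
        h = list(unaries[i])
        for k in nbrs[i]:
            if k != skip:
                h = [h[s] * msgs[(k, i)][s] for s in (0, 1)]
        return h

    for _ in range(n_iters):
        new = {}
        for (i, j) in msgs:
            h = local(i, skip=j)
            m = [sum(psi(i, j, xi, xj) * h[xi] for xi in (0, 1)) for xj in (0, 1)]
            total = m[0] + m[1]
            new[(i, j)] = [m[0] / total, m[1] / total]
        msgs = new

    beliefs = [local(i) for i in range(n)]
    return [b[1] / (b[0] + b[1]) for b in beliefs]

# Illustrative graph: nodes 0-1-2 are mutually compatible, node 3 is not.
unaries = [unary(0.3) for _ in range(4)]
edges = {(0, 1): pairwise(20.0), (1, 2): pairwise(20.0), (2, 3): pairwise(0.5)}

bp = belief_propagation(unaries, edges)
exact = exact_marginals(unaries, edges)
labels = [int(p > 0.5) for p in bp]  # 0.5 is an assumed threshold
print(labels)
```

In this toy graph the three compatible nodes overcome the unary bias toward ground and are labeled figure, while the fourth node, whose only neighbor relation is incompatible, stays ground.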
Figure-Ground Process for Finding Lines
A standard approach to detecting crosswalk stripes is to use the Hough transform to find the straight-line edges of the stripes, and then to group them into an entire zebra pattern. While this method is sound for analyzing high-quality photographs of sufficiently well-formed crosswalks, it is inadequate under many real-world conditions because the Hough transform fails to isolate the lines correctly. To illustrate the limitations of the Hough transform, consider Figure 1. A straight line is specified in Hough space as a pair (d,θ): this defines a line made up of all points (u,v) such that n(θ)·(u,v) = d, where n(θ) = (cos θ,sin θ) is the unit normal vector to the line. In an image containing one straight line, each point of the line will cast votes in Hough space, and collectively the votes will concentrate on the true value of (d,θ). The lines in Figure 1 are not perfectly straight, however, and so the peak in Hough space corresponding to each line will be smeared. If only one such line were present in the image, this smearing could be tolerated simply by quantizing the Hough space bins coarsely enough. However, the presence of a second nearby line makes it difficult for the Hough transform to resolve the two lines separately, since no choice of Hough bin quantization can group all the votes from one line without also including votes from the other.
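The voting behaviour described above can be illustrated with a small sketch. It is not taken from the original system: the bin size of 1 pixel, the 1-degree step in θ, the two synthetic point sets, and the function name `hough_votes` are all assumptions chosen for illustration. Each point (u, v) votes for every (θ, d) with d = n(θ)·(u, v); a perfectly straight line concentrates its votes in one bin, while a slightly curved one smears them.

```python
import math
from collections import Counter

def hough_votes(points, d_bin=1.0, theta_step_deg=1):
    """Accumulate votes in quantized (theta, d) space for a list of (u, v) points."""
    acc = Counter()
    for u, v in points:
        for theta_deg in range(0, 180, theta_step_deg):
            t = math.radians(theta_deg)
            d = u * math.cos(t) + v * math.sin(t)  # d = n(theta) . (u, v)
            acc[(theta_deg, round(d / d_bin))] += 1
    return acc

# A perfectly straight horizontal line at v = 50 (theta = 90 degrees, d = 50).
straight = [(u, 50.0) for u in range(100)]
# The same line with exaggerated curvature (a shallow parabola).
curved = [(u, 50.0 + 0.002 * (u - 50) ** 2) for u in range(100)]

peak_bin, peak_votes = hough_votes(straight).most_common(1)[0]
_, curved_votes = hough_votes(curved).most_common(1)[0]
print(peak_bin, peak_votes)
print(curved_votes)
```

With these inputs, all 100 points of the straight line fall into the single bin (θ = 90°, d = 50), whereas the strongest bin for the curved line collects only a fraction of its points. Coarser bins would recover the curved line's votes, but would also start absorbing votes from any nearby second line.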

Fig. 1. Two slightly curved lines (black), representing edges of crosswalk stripes (with exaggerated curvature). The straight red dashed line is tangent to both lines, which means that the Hough transform cannot resolve the two black lines separately.
Crosswalks and Stripelets
A bottom-up procedure was devised for grouping edges into composite features that are characteristic of zebra crosswalks, and which are uncommon elsewhere in street scenes. The image is converted to grayscale, downsampled (by a factor of 4 with our current camera) to the size 409×307 and blurred slightly. Since the crosswalk stripes are roughly horizontal under typical viewing conditions, our edge detector finds roughly horizontal edges by finding local minima/maxima of a simple y derivative of the image intensity, ∂I/∂y. A greedy procedure groups these individual edges into roughly straight line segments. A candidate stripe fragment feature, or “stripelet,” is defined as the composition of any two line segments (referred to as “upper” and “lower”) with all of the following properties:
(1) The upper and lower segments have polarities consistent with a crosswalk stripe, i.e. ∂I/∂y is negative on the upper segment and positive on the lower segment, since the crosswalk stripe is painted a much brighter color than the pavement.
(2) The two segments are roughly parallel in the image.
(3) The segments have sufficient “overlap,” i.e. the x-coordinate range of one segment has significant overlap with the x-coordinate range of the other.
(4) The vertical width w of the segment pair (i.e. the y-coordinate of the upper segment minus the y-coordinate of the lower, minimized across x belonging to both segments) must be within the range 2 to 70 pixels (i.e. spanning the typical range of stripe widths observed in our 409×307 images).
Many stripelets are detected in a typical crosswalk scene.
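The four stripelet properties can be expressed as a single predicate over a pair of segments. The following is a hedged sketch only. The `Segment` class, the function name `is_stripelet`, the 5-degree parallelism tolerance and the 50% overlap fraction are hypothetical, since the text gives no numeric thresholds for "roughly parallel" or "sufficient overlap"; only the 2 to 70 pixel width range comes from the description above. The sketch also assumes a coordinate convention in which y increases upward, so that "upper minus lower" is positive.

```python
import math
from dataclasses import dataclass

@dataclass
class Segment:
    x0: float      # x at the left end
    y0: float      # y at the left end
    x1: float      # x at the right end (assumed x1 > x0)
    y1: float      # y at the right end
    polarity: int  # sign of dI/dy along the segment: -1 or +1

    def y_at(self, x):
        """Linearly interpolate y along the segment at horizontal position x."""
        t = (x - self.x0) / (self.x1 - self.x0)
        return self.y0 + t * (self.y1 - self.y0)

    def angle(self):
        """Orientation in radians; lies in (-pi/2, pi/2) because x1 > x0."""
        return math.atan2(self.y1 - self.y0, self.x1 - self.x0)

def is_stripelet(upper, lower,
                 max_angle_diff=math.radians(5),  # assumed tolerance
                 min_overlap=0.5,                 # assumed overlap fraction
                 min_width=2.0, max_width=70.0):  # range stated in the text
    # (1) Polarity: dI/dy negative on the upper edge, positive on the lower.
    if upper.polarity != -1 or lower.polarity != +1:
        return False
    # (2) Roughly parallel.
    if abs(upper.angle() - lower.angle()) > max_angle_diff:
        return False
    # (3) Sufficient overlap of the x-coordinate ranges.
    lo = max(upper.x0, lower.x0)
    hi = min(upper.x1, lower.x1)
    shorter = min(upper.x1 - upper.x0, lower.x1 - lower.x0)
    if hi - lo < min_overlap * shorter:
        return False
    # (4) Vertical width, minimized over the shared x range. The gap between
    # two straight segments is linear in x, so its minimum is at an endpoint.
    w = min(upper.y_at(lo) - lower.y_at(lo), upper.y_at(hi) - lower.y_at(hi))
    return min_width <= w <= max_width

upper = Segment(10, 100, 110, 102, -1)
good_lower = Segment(20, 80, 120, 81, +1)
far_lower = Segment(20, 10, 120, 11, +1)       # about 90 pixels below: too wide
wrong_polarity = Segment(20, 80, 120, 81, -1)  # same geometry, wrong edge sign

print(is_stripelet(upper, good_lower))
print(is_stripelet(upper, far_lower))
print(is_stripelet(upper, wrong_polarity))
```

Only the first pair passes all four tests; the second fails the width check and the third fails the polarity check. In a full pipeline this predicate would be evaluated over candidate pairs of line segments produced by the greedy edge-grouping step.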
