
Site-Based Video System Design and Development (2012)

Chapter 9 - Vehicle Localization and Trajectory Estimation



In the previous chapter, it was noted that localization in the camera image is not the same as localization in the 3-D world, and that it is easy to confuse what is seen (by a human observer) with what is known or estimated quantitatively (by the machine vision tracking system). In this chapter the research team proceeds with the quantitative analysis. The first step is to convert camera coordinates into world coordinates, which are common to all four camera systems. Cluster tracks are transformed using a pair of assumed heights, at 0.5 m and 1.5 m. Of course, vehicles can display features well in excess of this, up to approximately 5 m for heavy commercial vehicles. Figure 9.1 shows a sample cluster track, with two corresponding projections into world coordinates based on these assumed heights. Here the vehicle is traveling toward the camera. The uncertainty in feature height amounts to more than half a lane width in location, so clearly some refinement is needed to improve precision.

The projected cluster track also suffers random variations attributable to an effect noted in Chapter 8: as component features are gained or lost from the cluster, the mean position is displaced, and this can happen between consecutive frames. In the right plot, this is corrected using a modification of a method proposed by Kim (2008): velocities are integrated, and linear regression is applied to remove drift from the original position estimates. The result is no less sensitive to feature height, because that is a basic geometrical effect, but the reduction in high-frequency variations represents a worthwhile improvement.

Figure 9.1. A single-cluster track resolved in world coordinates at an assumed height of 0.5 m (blue) and 1.5 m (red), with vehicle traveling to the north, viewed from NE camera.

Multicamera Cluster Tracks

Any individual vehicle may have many clusters attached, including clusters from different camera positions. The aim of the next step is to pick out a single cluster as a seed for vehicle identification and, using conditions of consistency with rigid-body vehicle motion, to build up the available information without absorbing features that belong to other vehicles. The basic strategy is to set a trigger on the outgoing leg of the intersection, where there is minimal chance of a queue forming and where vehicles are mostly well separated in the camera frame. Informally, a best-case trigger location was set at 30 m from the center of the intersection, where longitudinal separation of vehicles is likely and where the camera height is still sufficient to avoid most occlusions. In addition, on exit, vehicles are squeezed into a single lane at this distance, which provides initial help with localization. Because cluster height affects lateral position (owing to the lateral offset in camera position relative to the exit lane), the condition that the cluster track should normally lie within the lane boundaries offers a simple way to select an approximate feature height. Thus, for features moving away from the intersection (which have a greater lateral offset than the tracks shown in Figure 9.1), the preference for staying within the single lane provides a simple means to select between the two assumed heights.

Other triggers on the same exit leg that match the first triggered cluster track along their whole common length are then sought. Of those found, priority is given to the one generated first in time, since it has the best visibility within the intersection area; the selected cluster track is then used as the reference cluster for the vehicle in question. In addition, it is straightforward to add clusters from all camera locations as long as they match the reference cluster based on rigid-body conditions. Note that cluster height above ground is assumed to remain constant, so the rigid motion is well approximated by the condition that distances in the (x, y) plane remain constant, at least within some tolerance. This is an important step because it often connects a cluster track entering the intersection with the reference one that exits. From this, the time range over which the maximum number of cameras recognize at least one of this group of clusters is determined; the central time within this range, rounded to the nearest frame time, is called the reference time tref for the detected vehicle. If only one camera view can be found, the set of cluster tracks is considered incomplete and is rejected.
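The conversion from camera to world coordinates at an assumed feature height is a standard ray-to-plane calculation. The sketch below shows one way to compute it, assuming a calibrated 3x4 pinhole projection matrix P; the function and variable names are illustrative, not the report's actual implementation. For points constrained to a plane of constant height h, the projection collapses to an invertible 3x3 homography, which is why a single camera suffices once a height is assumed.

```python
import numpy as np

def pixel_to_ground(P, uv, h):
    """Map pixel (u, v) to world (x, y) on the horizontal plane z = h.

    P: assumed 3x4 pinhole projection matrix (world -> pixels).
    h: assumed feature height in meters (e.g., 0.5 or 1.5).
    """
    # For world points (x, y, h, 1), P collapses to a 3x3 homography
    # built from its columns: pixel ~ H @ (x, y, 1).
    H = np.column_stack((P[:, 0], P[:, 1], h * P[:, 2] + P[:, 3]))
    w = np.linalg.solve(H, np.array([uv[0], uv[1], 1.0]))
    return w[:2] / w[2]

# The same cluster track resolved at the two standard heights, as in
# Figure 9.1 (cluster_pixels is a hypothetical list of (u, v) points):
# track_05 = [pixel_to_ground(P, uv, 0.5) for uv in cluster_pixels]
# track_15 = [pixel_to_ground(P, uv, 1.5) for uv in cluster_pixels]
```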

Figure 9.2 shows all triggers obtained from a 30-min test run (Run 00122). Each trigger was referred to the two reference heights (0.5 m and 1.5 m), and the condition was applied that at least one of the trigger points should be within the lane boundaries. If both are within the boundaries, the one nearest the lane center is selected. Of 6,599 cluster tracks, 1,427 triggers were found in this way. Given an approximate estimate of 500 to 1,000 vehicle movements per hour (see Chapter 7), it seems reasonable to expect that most vehicles, if not all, would have provided an exit trigger. It is worth noting that approximately 5 min of start-up delay was necessary: approximately 3 min elapsed before the background image had converged in all cameras, and an additional 2 min was allowed so that triggers of exiting vehicles would also show a corresponding entry track.

Figure 9.2. The 30-m triggers set on exiting cluster tracks (Run 00122).

For illustration, an example trigger was selected and back-propagated into the intersection; compatible triggers were included, giving three matching cluster tracks, all outgoing on the east leg of the intersection. Searching for compatible cluster tracks across all cameras yielded a total of 18 clusters. Despite the lack of precise localization, given the uncertainty in cluster height, informal video review of many cases suggests a high probability that all are from the same vehicle. The results are seen in Figure 9.3, which shows the four camera views, complete with blobs and identified clusters (shown here only in the relevant image). Note that not all 18 clusters are in view at this instant, which is the reference time mentioned above. It is clearly seen that in this case the clusters are all unique to the one vehicle, which is turning left from the north leg and exiting toward the east.

Figure 9.3. Simultaneous images of a turning vehicle identified with 18 cluster tracks. From the top, the cameras are located at the NE, SE, SW, and NW corners, respectively. Scales are in pixel coordinates.

Figure 9.4 shows the scatter in the clusters when resolved at a nominal 0.5-m height above ground. The exceptions are the original three triggering clusters, which have some initial height refinement based on their position in the trigger zone.

Figure 9.4. Multicamera cluster tracks using a nominal 0.5-m assumed height above ground. Left plot shows detail. Yellow dot is reference track at reference time. Tracks are seen from cameras at the NE (blue), SE (red), SW (green), and NW (black) corners. Scales are in meters.

It might be thought that the yellow marker (placed on the reference cluster track) makes an acceptable vehicle center and that this would be adequate for localization. Such an average over multiple views has some merit, but it does not make best use of the information available. This particular vehicle, by executing a left turn, is seen by all four cameras; as was noted in Chapter 7, however, right-turning vehicles normally are covered by only two cameras, so any simple position average is likely to induce bias in such cases. Given the need for precision when determining conflict metrics, it is worth seeking an improved method. Note that there is no way to associate individual clusters between different cameras (typically there is no commonality), so stereographic analysis is not feasible.
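The rigid-body matching condition used for grouping is simple to implement: two cluster tracks can belong to the same vehicle only if their separation in the (x, y) plane stays essentially constant over the frames they share. A minimal sketch follows; the dictionary track format, the 0.5-m tolerance, and the function name are assumptions for illustration.

```python
import numpy as np

def rigid_compatible(track_a, track_b, tol=0.5):
    """Rigid-body test: the planar distance between two cluster
    tracks (dicts mapping frame -> (x, y) in meters) should remain
    constant, within tol meters, over their common frames."""
    common = sorted(track_a.keys() & track_b.keys())
    if len(common) < 2:
        return False  # no usable overlap
    d = [np.hypot(track_a[f][0] - track_b[f][0],
                  track_a[f][1] - track_b[f][1]) for f in common]
    return (max(d) - min(d)) < tol
```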

The polygonal blobs are used to provide new information for localizing the vehicle when it is maximally visible to all cameras. Figure 9.5 shows a projection of blobs (all those existing at a common time) onto the road surface, whereas Figure 9.6 shows a close-up view. These images are determined at the same time and using the same data as for the turning vehicle in Figure 9.3. The key point here is that with a zero-height ground plane, the projection from the 2-D camera to the 3-D world is precisely known. In fact, although a ground plane is mentioned, the actual mapped surface heights are used: the projection of the blobs onto the road surface takes full account of any height variations in the surface geometry. It can be seen that some of the blobs are projected far from the intersection center; for example, in the upper plot of Figure 9.3, a blob is cut off by the image frame, and its 3-D counterpart may extend well beyond the projected line. In this case, the projected polygon extends well beyond the limits found from the visible points in the camera.

Figure 9.5. Multicamera blob projections at time tref. The black star shows the location of the centroid of associated clusters. Colors denote camera source: NE, blue; SE, red; SW, green; and NW, black.
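Projecting a blob polygon onto the road surface is the same homography operation applied vertex by vertex. A minimal sketch, assuming a locally flat z = 0 patch (the actual system uses the mapped surface heights) and the same illustrative projection matrix P as above:

```python
import numpy as np

def blob_to_ground(P, blob_px):
    """Project a camera blob polygon onto the road surface.

    P: assumed 3x4 pinhole projection matrix.
    blob_px: list of (u, v) pixel vertices of the blob polygon.
    Returns the polygon's world (x, y) vertices on the z = 0 plane;
    with known surface geometry this mapping is exact.
    """
    H = np.column_stack((P[:, 0], P[:, 1], P[:, 3]))  # z = 0 plane
    verts = []
    for u, v in blob_px:
        w = np.linalg.solve(H, np.array([u, v, 1.0]))
        verts.append(w[:2] / w[2])
    return np.array(verts)
```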

To localize further, the centroid of the clusters is used to select the nearest projected blob from each camera, and the projected polygons are then intersected to give a localized bounding polygon (BP) for the vehicle. This is shown in magenta in Figure 9.6.

To complete the basic vehicle localization at the reference time, a rectangle is fitted. This is not a unique process, because many rectangles can be fitted to a polygon; given the uncertainty over the exact limits of the BP, the rectangle is allowed to protrude slightly beyond it. Rectangle fitting is simplified by first estimating the direction of motion, which is easily done by tracking the cluster set between adjacent frames: essentially, the cluster velocities are averaged to provide an orientation for the rectangle. Multiple lines are intersected with the BP in the directions parallel and perpendicular to the direction of motion, and the median lengths and widths obtained are used to estimate the size and position of the fitted rectangle. This process is normally found to be robust, especially when the vehicle is visible to at least three cameras.

Figure 9.6. Detail and vehicle localization by intersecting blobs. Blob color codes are the same as in Figure 9.5, with the addition of a magenta bounding polygon and fitted rectangle (black). Distances are in meters.

Cluster height estimation is now considered again: with the vehicle boundary determined (at time tref), the localization of the clusters can be improved. This is carried out in local vehicle coordinates (Figure 9.7) based on an origin G at the center of the rectangle and GXV, GYV axes aligned with the vehicle rectangle; the GXV axis points to the left of the direction of motion as shown. In Figure 9.7, O, X, and Y are the intersection coordinates, with O at the nominal center of the intersection, OX pointing east, and OY pointing north. The vehicle axes move with the vehicle and are especially useful for projecting clusters into the local vehicle geometry, and thus for estimating the unknown cluster heights.

Figure 9.7. Vehicle local coordinates (ISO sign convention).
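At its core, the bounding-polygon step is a sequence of pairwise polygon intersections, one projected blob per camera. A minimal sketch using the shapely library (an assumption; the report does not name its geometry tooling):

```python
import functools
from shapely.geometry import Polygon

def bounding_polygon(blob_polys):
    """Intersect road-surface blob projections (the nearest blob from
    each camera) to obtain the vehicle's bounding polygon (BP).

    blob_polys: one list of (x, y) world vertices per camera.
    """
    polys = [Polygon(p) for p in blob_polys]
    return functools.reduce(lambda a, b: a.intersection(b), polys)
```

In practice the result should be checked for emptiness, since a badly segmented blob can make the intersection degenerate.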

Cluster localization is shown in Figure 9.8, in which the blue dashed rectangle is the vehicle boundary (at ground height) and each red line represents the projection of a single cluster between the upper standard height (h = 1.5 m, marked with a red star) and the lower standard height (h = 0.5 m). Clearly, the upper height indicates the point nearer to the relevant camera. Numeric values indicate the source camera, the directions of which are rotated because of the transformation to vehicle coordinates.

Figure 9.8. Cluster localization using a vehicle rectangle (vehicle coordinate system: XV is horizontal, YV is vertical; units are in meters). Numeric values adjacent to cluster lines indicate the camera location: 0 = NE, 1 = SE, 2 = SW, and 3 = NW.

If the assumed height of any cluster between these reference heights is varied, the cluster takes a different position on its corresponding (red) cluster line. Of course, the true cluster height may lie outside this range, in which case the position is extrapolated beyond the nominal endpoints of the cluster line. Intersecting each cluster line with the vehicle rectangle provides an estimate of cluster location and height. Although two intersection points normally are found, it is assumed that the vehicle boundary nearest the camera is the most probable location, and this one is used. If no intersecting point is obtained, the point of nearest approach is used, unless it is further from the vehicle than a certain tolerance (1 m is assumed), in which case the cluster is rejected. The resulting cluster points are shown as blue squares in Figure 9.8. For comparison, the blue circles are the nominal heights used previously, mostly at the lower (0.5-m) location. If a negative height is found, the cluster is rejected unless the height is within a small tolerance of zero, in which case it is set to zero.
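The cluster-line intersection just described can be sketched directly: each cluster's world positions at the two standard heights define a line whose parameter is the height itself, so intersecting that line with the vehicle rectangle yields a height estimate. The following sketch uses shapely; the function name, the rect and cam_xy arguments, and the 0.1-m negative-height tolerance are illustrative assumptions (the 1-m nearest-approach tolerance is from the text).

```python
import numpy as np
from shapely.geometry import LineString, Point, Polygon
from shapely.ops import nearest_points

def cluster_height(p05, p15, rect, cam_xy, tol=1.0):
    """Estimate a cluster's height by intersecting its cluster line
    (world (x, y) at h = 0.5 m and h = 1.5 m) with the fitted vehicle
    rectangle, preferring the boundary nearest the camera."""
    p05, p15 = np.asarray(p05, float), np.asarray(p15, float)
    d = p15 - p05
    # Extend the segment so heights outside [0.5, 1.5] m are reachable.
    line = LineString([p05 - 5.0 * d, p15 + 5.0 * d])
    boundary = Polygon(rect).exterior
    hit = line.intersection(boundary)
    if hit.is_empty:
        # No intersection: use the point of nearest approach, unless
        # it is further than tol (1 m) from the vehicle boundary.
        pt, on_rect = nearest_points(line, boundary)
        if pt.distance(on_rect) > tol:
            return None  # cluster rejected
    else:
        pts = list(hit.geoms) if hasattr(hit, "geoms") else [hit]
        pt = min(pts, key=lambda p: p.distance(Point(cam_xy)))
    # Position along the cluster line maps linearly to height.
    t = float(np.dot([pt.x - p05[0], pt.y - p05[1]], d) / np.dot(d, d))
    h = 0.5 + t * (1.5 - 0.5)
    if h < -0.1:
        return None          # clearly below ground: reject
    return max(h, 0.0)       # clamp small negatives to zero
```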

The resulting set of fitted clusters is shown in intersection coordinates in Figure 9.9, and a 3-D projection of the bounding box on one of the camera images is given in Figure 9.10. The height of the box was estimated as twice the median height of the fitted clusters, which is more robust than choosing the maximum cluster height. It should be emphasized that all steps are fully automated and that the example was randomly selected.

Figure 9.9. Fitted rectangle and associated cluster points, including close-up (right plot). Colors according to camera corner location: NE, red; SE, green; SW, blue; and NW, black.

Figure 9.10. Fitted rectangular bounding box on camera image.

The "nearest edge to camera" algorithm is not always accurate, because clusters attached to the roof or to other interior surfaces such as the windshield or hood may lie further from the camera. This means that for tracking purposes, clusters nearer the ground are preferred. If greater precision is required, cluster locations can be refined further using the coincidence of cluster lines from multiple frames, at the expense of additional computation and complexity. At this point, however, it is assumed that viable localization has been achieved with fully automated methods.

For any single-cluster track, there is now a simple way to estimate the motion of the vehicle center: follow the cluster in 3-D coordinates assuming a fixed height above ground, estimate the velocity vector as part of the cluster tracking, and then apply the known offsets (i.e., the vehicle-based coordinates of the fitted cluster). Averaging the results could give a refined vehicle trajectory, but because cluster tracks may appear or disappear, a more systematic approach is preferred; this is considered in the next chapter. For now, the research team includes the results of tracking a single outgoing cluster in this way, joining it with a single incoming cluster, where length and height preferences have been used in the cluster selection. The path shown has been truncated at 50 m because single-cluster tracking is not expected to remain stable or sufficiently accurate beyond that distance. A similar limit will be imposed when estimating velocity and acceleration (see Chapter 10).

Figure 9.11 shows the resulting path of the example vehicle considered above. Although only a single cluster has been used for the path estimation, and the lateral positioning is clearly not as precise as it might be, this basic trajectory may be used for searching purposes. Using polynomial curve fitting, tangent directions also may be determined reliably, even when the precise vehicle location is uncertain. The tangent vector is stored so it is available when additional refinement requires the vehicle orientation. As a by-product, the curve fitting gives estimates of speed and distance (see Figure 9.12), where distance is measured along the curved path with a nominal zero point at the reference time tref. The speed estimation is robust to the fact that the vehicle actually stopped for a short period at the stop bar (between t = 180 and 188 s, as seen in video review), although this is captured as a very low drift speed of approximately 0.2 m/s.
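The curve-fitting by-products mentioned here (tangent direction, speed, along-path distance) follow directly from fitting smooth polynomials to the world-plane track. A minimal sketch; the cubic degree and function names are assumptions, and the report zeroes the distance at the reference time tref rather than at the first sample as done below.

```python
import numpy as np

def fit_track(t, x, y, deg=3):
    """Fit polynomials to a track's world coordinates over time and
    derive speed, tangent direction, and along-path distance.

    t, x, y: 1-D arrays of sample times (s) and positions (m).
    """
    px, py = np.polyfit(t, x, deg), np.polyfit(t, y, deg)
    vx = np.polyval(np.polyder(px), t)
    vy = np.polyval(np.polyder(py), t)
    speed = np.hypot(vx, vy)                 # m/s
    heading = np.arctan2(vy, vx)             # tangent direction, rad
    # Distance along the curved path (trapezoidal integration).
    dist = np.concatenate(([0.0], np.cumsum(
        0.5 * (speed[1:] + speed[:-1]) * np.diff(t))))
    return speed, heading, dist
```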

Figure 9.11. Basic vehicle fitted path (red: using outgoing cluster; blue: using incoming cluster). The right plot shows a detail of the track near the stop bar; the corresponding vehicle image is for the front car at the stop bar.

Figure 9.12. Speed and distance estimates from basic track fitting.

Figure 9.13 summarizes in a block diagram the overall steps used in the foregoing basic vehicle trajectory estimation. The double arrows represent SQL-based data extraction from the feature database. White rectangles are tables constructed to store relevant data elements, where t, x, y, and z represent time and position coordinates, cid represents a cluster identification, and in CLUS 2 the dual reference heights h1 and h2 are indicated by the suffixes [. . .]1,2.

Figure 9.13. Block diagram summary of basic vehicle trajectory estimation. Stages: image clusters; projecting and smoothing (h1, h2, hmap); blobs; CLUS 2 [t, x, y, z]1,2; detecting vehicles by triggering; triggers [t, x, y, cid]; compatibility testing; cluster set [cid1, cid2, . . .]; projecting and intersecting; fitting vehicle rectangles; vehicle set [cid1, cid2, . . .], [t, GX, GY, speed], [rectangle]; fitting reference trajectories.
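The double-arrow extraction steps in Figure 9.13 are plain SQL queries against the feature database. A minimal sketch of one such step, using sqlite3 and a hypothetical clus2 table whose columns merely mirror the labels in the diagram (the report's actual schema and database engine are not specified):

```python
import sqlite3

def load_cluster(conn, cid):
    """Fetch one cluster track at both reference heights (suffixes
    1 and 2) from a hypothetical CLUS 2 table, ordered by time."""
    return conn.execute(
        "SELECT t, x1, y1, z1, x2, y2, z2 FROM clus2 "
        "WHERE cid = ? ORDER BY t",
        (cid,),
    ).fetchall()
```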

