
Site-Based Video System Design and Development (2012)

Chapter 8 - Image Processing and Feature Extraction

Suggested citation: "Chapter 8 - Image Processing and Feature Extraction." National Academies of Sciences, Engineering, and Medicine. 2012. Site-Based Video System Design and Development. Washington, DC: The National Academies Press. doi: 10.17226/22836.


Image processing generates a rich set of features that are later interrogated to identify vehicles and their kinematics. The image processing for feature extraction takes place at the level of an individual camera system, and there is no intention to fuse information from raw images. For this process, the input is a sequence of camera images, and the output is a set of geometric features in camera coordinates. Two types of features were extracted, one based on corner features and one based on background subtraction. The processing steps are indicated in Figure 8.1. The extracted features are numerical arrays of camera coordinate points and velocities, as well as associated times and object identifiers. Compressed video is also extracted for review but is not used in any further numerical analysis; all image analysis takes place with the uncompressed video images, but once the features are extracted the raw images can be discarded.

Once extracted, feature data are stored locally and then sent over the Ethernet link to the main intersection host computer. Again, once they are stored centrally, there is no need to retain the features on the local computer. In this way, the local camera system can run in extended data collection sessions without storage limits. The image processing algorithms operate at a low level and are fully automated. The key requirement for these algorithms is to extract sufficient information to isolate and track individual vehicles while avoiding the storage of excessive unwanted features. In the process, high-volume image information is removed, and the result is a distilled version of what was visible in the image streams. The core image processing algorithms, developed by the UC Berkeley team, employ a number of important attributes that facilitate postprocessing into vehicle trajectories.

Background Subtraction

The background subtraction algorithm detects objects based on the intensity change of each pixel. Most existing vehicle detection and tracking systems are based on this algorithm, but here it plays more of a supporting role. The algorithm requires relatively little computation time and shows robust detection in good illumination conditions. In Figure 8.2, the right image shows how the background (black) is separated from foreground objects (blue); the moving vehicle and the bicycle are sufficiently different in pixel intensity from the stored background image that the foreground region is identified. This works reliably only if the background image is identified correctly, despite lighting changes and passing objects. (Note that Figure 8.2 was obtained from an intersection in California; it was not part of the Michigan data collection.)

There is a well-known problem with background subtraction, in that parked or otherwise stationary vehicles in the early images become part of the background image, and it may take several minutes before this aberration can be removed. In Figure 8.2, the parked white vehicle is part of the background, so when it moves there will be a false foreground object for some time. In the current analysis, this type of error is reduced by the interaction between background subtraction and feature tracking. In general, background subtraction can suffer from problems caused by occlusions, shadows, glare, and other sudden illumination changes, and such aberrations make background subtraction unsuitable for direct use in tracking applications.

Figure 8.1. Image processing steps at an individual camera station. (Diagram blocks: frame buffer fed from the camera; background subtraction; corner feature extraction; JPEG compression to compressed video; encoding into cluster tracks; encoding into convex polygons; background.)

Figure 8.2. An example video image (left), a corner feature detection and tracking result (center), and a background subtraction result (right).
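As a concrete illustration of the intensity-change idea described above, the sketch below maintains a running-average background and flags pixels whose difference from it exceeds a threshold. This is not the project's code; the library (OpenCV/NumPy), learning rate, and threshold are assumptions made for the example.

```python
import cv2
import numpy as np

# Illustrative values only; the report does not specify them.
LEARNING_RATE = 0.01    # how quickly the background adapts to slow changes
DIFF_THRESHOLD = 25     # per-pixel intensity change treated as foreground

def update_and_subtract(frame_gray, background):
    """Return (foreground_mask, updated_background) for one grayscale frame."""
    frame = frame_gray.astype(np.float32)
    if background is None:
        # First frame: no foreground yet; adopt the frame as the background.
        return np.zeros(frame_gray.shape, np.uint8), frame
    # Pixels that differ strongly from the stored background are foreground.
    diff = cv2.absdiff(frame, background)
    foreground = (diff > DIFF_THRESHOLD).astype(np.uint8) * 255
    # Update the background only where nothing was detected; in the project,
    # regions with tracked clusters were also excluded so that stopped
    # vehicles are not absorbed into the background.
    cv2.accumulateWeighted(frame, background, LEARNING_RATE,
                           cv2.bitwise_not(foreground))
    return foreground, background
```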
Clusters and Cluster Tracking

Corner features may be detected all over any complex image. To limit the search, only foreground regions are analyzed, speeding up image processing. This is the first part of the interaction between background subtraction and feature tracking. As illustrated in Figure 8.3, nearby corner features have similar velocities and are grouped into clusters. As the object moves across the image, individual corner features may be lost or new ones detected, but for extended periods of time the cluster may be tracked. The mean position of the cluster may jump slightly as the number of component features changes, but the associated mean velocity tends to remain stable.

Figure 8.3. The clustering algorithm uses corner feature grouping and is sensitive to distances in the image. Left: feature tracks and clusters on two vehicles (parking lot test at UMTRI). Right: dispersion and mean velocities are recorded as part of the cluster properties.
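The report does not name the specific corner detector or grouping rule, so the following sketch is only indicative: it detects Shi-Tomasi corners, tracks them with pyramidal Lucas-Kanade optical flow, and greedily groups features whose image positions and velocities are close, summarizing each cluster by its mean position and mean velocity. The OpenCV calls and tolerances are assumptions for the example, not the project's implementation.

```python
import cv2
import numpy as np

def track_corner_features(prev_gray, curr_gray, max_corners=200):
    """Detect corners in the previous frame and track them into the current one."""
    corners = cv2.goodFeaturesToTrack(prev_gray, maxCorners=max_corners,
                                      qualityLevel=0.01, minDistance=5)
    if corners is None:
        return np.empty((0, 2)), np.empty((0, 2))
    moved, status, _err = cv2.calcOpticalFlowPyrLK(prev_gray, curr_gray, corners, None)
    ok = status.ravel() == 1
    p0 = corners.reshape(-1, 2)[ok]
    p1 = moved.reshape(-1, 2)[ok]
    return p1, p1 - p0          # positions and per-frame velocities, in pixels

def group_into_clusters(points, velocities, dist_tol=40.0, vel_tol=3.0):
    """Greedy grouping: a feature joins a cluster if position and velocity are close."""
    clusters = []               # each cluster is a list of indices into points
    for i in range(len(points)):
        for cluster in clusters:
            j = cluster[0]      # compare against the cluster's first member
            if (np.linalg.norm(points[i] - points[j]) < dist_tol and
                    np.linalg.norm(velocities[i] - velocities[j]) < vel_tol):
                cluster.append(i)
                break
        else:
            clusters.append([i])
    # Summarize each cluster by mean position and mean velocity, the quantities
    # that appear as (X, Y, XVelocity, YVelocity) in Table 8.1.
    return [(points[c].mean(axis=0), velocities[c].mean(axis=0)) for c in clusters]
```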

The second part of the interaction is that regions with tracked clusters are not used for updating the background image, so even if a vehicle stops for a long time, provided the clusters continue to be recognized, there is no danger that the background will be corrupted by its presence. The key point is that clusters are detected only in foreground regions, and as they are tracked across the image they are excluded from the background; the result is stable and tends to reduce errors in both cluster detection and updating of the background image.

Occasional errors do occur. For example, when a shadow or a glare line moves across another line (such as a fence rail) in an image, a resulting moving corner feature may be detected. It normally disappears soon after, but it can happen that the corner becomes confused with a stationary corner feature (e.g., the intersection between a fence rail and a fence post) and then remains in the image for some time. The use of clusters, rather than individual corner features, tends to exclude such aberrations, but certainly some erroneous clusters can find their way into the recorded data set. This provides a challenge for postprocessing; however, the cluster sets are largely free of corrupt data, which makes the whole approach feasible.

The corner feature grouping and tracking algorithm is implemented in a dynamic way, operating frame by frame, thus making it suitable for real-time processing. The grouping into clusters is shown in Figure 8.3. The algorithm includes a cluster track generation step to connect fragmented corner trajectories into a continuous cluster track. Note that the paths seen in Figures 8.2 and 8.3 are determined only in the camera frame; the human observer is deceived into seeing these as tracks in the world, but in reality the feature height must be known before that inference can be made. The cluster tracks are not located on a vehicle bounding box; again, the human observer interprets what is not known to the algorithms. At this stage in the process, there is no vehicle trajectory, only data that can be used to construct one.

In summary, a combination of corner feature detection, dynamic grouping and tracking, background estimation, and the interaction between these processes provides a robust foundation for extracting essential information from the video stream.
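The cluster track generation step described above connects fragmented cluster observations into continuous tracks, and (as noted with Table 8.1 below) clusters are allowed to persist by extrapolation during short occlusions. The report does not give the association rule, so the sketch below shows one plausible scheme: predict each track forward with its image velocity, match it to the nearest new cluster within a gating distance, and let unmatched tracks coast for a few frames. The gate size and coast limit are assumed values.

```python
import numpy as np

GATE_PX = 30.0          # assumed matching gate, in camera pixels
MAX_COAST_FRAMES = 10   # assumed persistence during a short occlusion

class ClusterTrack:
    def __init__(self, track_id, pos, vel):
        self.id = track_id
        self.pos = np.asarray(pos, dtype=float)   # (x, y) in camera pixels
        self.vel = np.asarray(vel, dtype=float)   # (vx, vy) per frame
        self.frames_missed = 0

def update_tracks(tracks, detections, next_id):
    """Associate new cluster detections [(pos, vel), ...] with existing tracks."""
    unmatched = list(detections)
    for track in tracks:
        predicted = track.pos + track.vel         # one-frame constant-velocity prediction
        if unmatched:
            dists = [np.linalg.norm(predicted - np.asarray(pos, dtype=float))
                     for pos, _vel in unmatched]
            best = int(np.argmin(dists))
            if dists[best] < GATE_PX:
                pos, vel = unmatched.pop(best)
                track.pos = np.asarray(pos, dtype=float)
                track.vel = np.asarray(vel, dtype=float)
                track.frames_missed = 0
                continue
        # No detection matched: extrapolate the track and count the miss.
        track.pos = predicted
        track.frames_missed += 1
    # Drop tracks that have coasted too long; start new tracks for leftover detections.
    tracks = [t for t in tracks if t.frames_missed <= MAX_COAST_FRAMES]
    for pos, vel in unmatched:
        tracks.append(ClusterTrack(next_id, pos, vel))
        next_id += 1
    return tracks, next_id
```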

Feature Extraction and Data Representation

Cluster tracks are immediately available for export to the site host computer; Table 8.1 shows the format of the data tables. Site refers to the camera used (the NE corner in this case), RunId is the data collection run, ProcessId identifies the parameters used during the feature extraction, FrameTime is GPS time in milliseconds, and ClusterId is the identifier for the tracked cluster. The principal data elements are then the (X, Y) coordinates of the cluster center, whereas (XVelocity, YVelocity) are the velocities determined by averaging the component feature motions; these last four columns are in units of camera pixels. Additional fields exist, in particular the size and orientation of the bounding ellipse (as seen in Figure 8.3) and flags for whether the cluster was actually present and whether the cluster was deemed stable at that time. Note that, during a short occlusion, clusters are allowed to persist by extrapolation based on their velocity in the image.

Unlike the cluster tracks, foreground regions ("blobs") are image based; certain pixels are filled, others are empty. To turn this pixelated information into something more compact, a polygon was fitted to each blob. In general, the blobs are convex in shape (any two points inside can be joined by a straight line that stays within the blob), and therefore a convex hull was determined; this is a minimum-sized bounding polygon that is also convex. Thus, the blob is represented by a series of coordinate values, together with an index to identify the blob and another index to identify the particular vertex. Blobs can then be recreated in the image by joining consecutive vertices for a fixed BlobId (see Table 8.2).
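A rough sketch of how a binary foreground mask can be reduced to convex-hull polygons of this kind is shown below. It uses OpenCV, which is an assumption here (the report does not name a library), and the minimum-area filter is an illustrative value.

```python
import cv2

def blobs_to_convex_polygons(foreground_mask, min_area=100):
    """Reduce a binary foreground mask to one convex polygon per blob.

    Returns rows of (blob_id, vertex_index, x, y), mirroring the layout of
    Table 8.2. min_area is an assumed filter to suppress small noise blobs.
    """
    # OpenCV 4 returns (contours, hierarchy).
    contours, _hierarchy = cv2.findContours(foreground_mask, cv2.RETR_EXTERNAL,
                                            cv2.CHAIN_APPROX_SIMPLE)
    rows = []
    for blob_id, contour in enumerate(contours):
        if cv2.contourArea(contour) < min_area:
            continue
        hull = cv2.convexHull(contour)            # (K, 1, 2) array of hull vertices
        for vertex_index, (x, y) in enumerate(hull.reshape(-1, 2)):
            rows.append((blob_id, vertex_index, int(x), int(y)))
    return rows
```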
Figure 8.4 shows three camera views taken at the same time, with cluster positions overlaid and blobs shown alongside. All clusters except one are attached to a vehicle, the erroneous case being attributable to glare. Several blobs are also erroneous, and in general the variable quality of blobs makes them suitable for use only with extreme caution. In the next chapter we will see that, despite this, retaining blobs for data fusion offers a major advantage for vehicle localization and fusion between cameras. Figure 8.5 displays an image of pedestrian-generated cluster tracks.

Figure 8.4. Clusters and blobs, including effects of glare and erroneous features.

Figure 8.5. Image detail showing pedestrian-generated cluster tracks.

A significant part of the software development was to take the core image processing routines and encapsulate them in real-time code for implementation in the CDAPS environment. The real-time code can be run in one of four modes (summarized in the sketch after this list):

• Full streaming: Image data are taken from the camera frame buffer and processed sequentially. This is the full real-time version of the system and is sensitive to image size and the number of clusters that need to be processed.
• Data streaming: Full-resolution video images stored on a server are used as input to the processor; the processor controls the speed of the image stream, but of course this requires an image library. Apart from image input control, the process is identical to the full streaming mode.
• Image capture: In this mode, the system operates as a video image recorder; uncompressed images are stored on a hard disk, to be uploaded later to a server and used to feed the data streaming process.
• Dual mode: In this mode, the camera spools image data to the local hard drive; once sufficient data have been obtained, or the hard drive is nearly full, the system switches to data streaming mode. In this way, the system swaps between waking and sleeping cycles and is simply a combination of the previous two modes of operation.
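The four modes differ only in where frames come from and whether features are extracted immediately, which can be captured as a simple configuration type. The sketch below is purely illustrative; the CDAPS/DAS software is not described at this level of detail in the report.

```python
from enum import Enum, auto

class ProcessingMode(Enum):
    """Operating modes described in the text (illustrative representation only)."""
    FULL_STREAMING = auto()   # live frames from the camera buffer, processed sequentially
    DATA_STREAMING = auto()   # stored full-resolution images fed to the same processor
    IMAGE_CAPTURE = auto()    # record uncompressed images to disk for later processing
    DUAL_MODE = auto()        # capture until the disk is nearly full, then data-stream
```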

The real-time system software was based on the UMTRI DAS (data acquisition system) platform, developed over several years during previous projects involving real-time data acquisition and control. For this pilot project, with the need for multiple tests using a fixed set of images, it was most convenient to run in the combined data streaming/image capture modes. Thus, uncompressed images were stored on a server at UMTRI, and the various processes ran from there. All hardware and software is compatible with the computer infrastructure at the intersection. Benchmarking showed that the system could run in full streaming mode if the number of pixels were effectively reduced by decimating by a factor of 2 in the horizontal direction; reducing the number of pixels speeds up processing. At the time of operation it was not possible to run fully in this mode because images eventually are lost when the rate of image processing falls below the capture rate of 20 Hz.

Table 8.1. Sample Cluster Data Extracted from Image Processing

Site  RunId  ProcessId  FrameTime (ms)  ClusterId  X         Y         XVelocity  YVelocity
0     122    7          242,064,800    374916     157.2     14.2      -12.00012  -3.999996
0     122    7          242,064,800    376791     162.6079  20.00808  -26.37817  -6.5411
0     122    7          242,064,800    380833     412.8333  162.6667  90         60
0     122    7          242,064,800    381780     144.3333  6.333333  -13.33344  0
0     122    7          242,064,800    383261     358       152.2727  87.27295   47.27264
0     122    7          242,064,800    388346     256.5     97        20         11.66672
0     122    7          242,064,800    388407     303       96.4      27.99988   6.666718
0     122    7          242,064,800    394415     386.6     161       84.00024   56.00006
0     122    7          242,064,800    396188     435.6667  179       117.7777   57.77771
0     122    7          242,064,800    396386     394       72        80         13.33328

Table 8.2. Sample of Blob Feature Data Values

Site  RunId  ProcessId  FrameTime (ms)  BlobId  Vertex  X    Y
0     122    7          242,441,850    13      0       228  103
0     122    7          242,441,850    13      1       228  107
0     122    7          242,441,850    13      2       235  114
0     122    7          242,441,850    13      3       237  115
0     122    7          242,441,850    13      4       249  118
0     122    7          242,441,850    13      5       255  118
0     122    7          242,441,850    13      6       279  116
0     122    7          242,441,850    13      7       283  114
0     122    7          242,441,850    13      8       285  112
0     122    7          242,441,850    13      9       286  110
0     122    7          242,441,850    13      10      287  105
0     122    7          242,441,850    13      11      287  99
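To make the two record layouts concrete, the sketch below defines illustrative Python record types mirroring Tables 8.1 and 8.2 and shows how a blob polygon is recreated by joining consecutive vertices for a fixed BlobId. Field names follow the table headers; the types and the helper function are assumptions, not the project's schema.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class ClusterRow:            # one row of Table 8.1
    site: int                # camera identifier (0 corresponds to the NE corner in the sample)
    run_id: int
    process_id: int
    frame_time_ms: int       # GPS time in milliseconds
    cluster_id: int
    x: float                 # cluster center, camera pixels
    y: float
    x_velocity: float        # mean component-feature velocity, pixel units
    y_velocity: float

@dataclass
class BlobVertexRow:         # one row of Table 8.2
    site: int
    run_id: int
    process_id: int
    frame_time_ms: int
    blob_id: int
    vertex: int              # vertex index around the convex hull
    x: int
    y: int

def blob_polygons(rows):
    """Group blob-vertex rows into polygons: BlobId -> ordered list of (x, y)."""
    polygons = defaultdict(list)
    for row in sorted(rows, key=lambda r: (r.blob_id, r.vertex)):
        polygons[row.blob_id].append((row.x, row.y))
    return dict(polygons)
```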

As mentioned above, with accurate transformations defined to map between world and image coordinates, it is possible to take features recorded on the camera and transform them into a world view (a minimal sketch of such a mapping follows Figure 8.6 below). Figure 8.6 shows an example: on the left, five cluster tracks obtained from the SW camera location are mapped onto a world view. In this case there was no prior information about the height of the clusters on the vehicle, and an assumed height of 0.5 m was used. At the time of the image, only three of the clusters were in existence, and these are marked as blue crosses. The blob for this vehicle is particularly well behaved, giving hope that the vehicle location can be precisely identified, assuming it can be combined accurately with other camera views. The challenge then is to fuse the feature data sets in a reliable and automatic way that accomplishes the following three actions:

• Attaches features (blobs and clusters) to vehicles in a unique way;
• Locates the vehicle in 3-D space (i.e., allows a 3-D bounding box to be co-located with the vehicle); and
• Attaches features to the bounding box and thus estimates their heights, with sufficient feature numbers to track the vehicle all the way through the intersection.

In the case of Figure 8.6, clearly only the exit leg is populated with features; to do more than this requires data fusion between the four cameras.

Figure 8.6. World view and camera view of cluster tracks on a single vehicle (through vehicle traveling north).
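The report does not give its calibration or projection equations; as an indicative sketch of image-to-world mapping with an assumed feature height, the code below back-projects a pixel through a hypothetical 3x4 camera projection matrix P and intersects the viewing ray with a horizontal plane at that height (e.g., the 0.5 m assumed for Figure 8.6). Both P and the function are illustrative, not the project's calibration.

```python
import numpy as np

def image_to_world(P, u, v, height_m=0.5):
    """Back-project pixel (u, v) onto the horizontal plane z = height_m.

    P is an assumed 3x4 camera projection matrix (world -> image, homogeneous
    coordinates). Solves X*p1 + Y*p2 + height_m*p3 + p4 = s*[u, v, 1] for the
    world position (X, Y) and the projective scale s.
    """
    P = np.asarray(P, dtype=float)
    p1, p2, p3, p4 = P[:, 0], P[:, 1], P[:, 2], P[:, 3]
    A = np.column_stack([p1, p2, -np.array([u, v, 1.0])])
    b = -(height_m * p3 + p4)
    X, Y, _scale = np.linalg.solve(A, b)
    return X, Y, height_m
```

With height_m set to zero this reduces to the usual ground-plane mapping; a nonzero assumed height simply shifts the plane with which the viewing ray is intersected.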

Effects of Environmental Conditions

The Site Observer was tested during late winter and early spring, including test periods of light snow and low sun angles. The presence of glare can be seen in Figures 8.4 and 8.5, including corruption to blobs. Glare was not found to generate cluster tracks, although occasionally, as mentioned, the intersection of a moving shadow with a background object such as a fence can give the illusion of a moving feature. However, clusters are rarely created from such stray effects, and when they are, the cluster-triggering process (see Chapter 9) means that such stray phenomena cannot result in the detection of a false vehicle trajectory.

Shadows typically add to uncertainty in the location of vehicle boundaries, and the effects of shadows are included in the analysis of the following chapters. Again, the use of multiple cameras and 3-D projections tends to reduce the effects of shadows, but certainly their effects are not completely removed.

There was found to be no systematic influence arising from adverse weather conditions such as light rain or snow; these conditions generate random patterns of corruption at the pixel level, and provided the resulting noise levels are not overwhelming, no effect is seen. On the other hand, in dense rain, snow, dust, or fog, it would not be possible to track vehicles. Because of the limited range of weather conditions under which data were captured, no particular benchmarking was possible for such effects. However, it is clear that when a human observer finds it difficult to see features on vehicles in captured images, the automated system likewise is challenged. It is expected that with deteriorating weather conditions, the number of clusters detected will decrease until vehicle trajectories become incomplete, an error condition that is easily detected. Under extreme conditions, such as a blizzard or dense fog, a human observer may need to recognize that the system is blind.

High winds can cause cameras to shake, although the location of the installed cameras meant that the effects were limited to twisting of the signal mast due to wind load on the mast arm. Rotations of as much as 1° were seen, and these tended to be at sufficiently low frequency not to disturb the feature tracking. Other parts of the trajectory estimation (Kalman filtering; see Chapter 10) ensure that these perturbations have minimal effect on trajectories. On the other hand, loose or highly flexible camera mounts causing large or high-frequency vibrations of the camera could not be tolerated. Improved mounts or camera-level shake removal hardware and software would be needed in such cases.

At night, with the current camera hardware, there is no expectation that satisfactory tracking can be achieved; no specific tests were performed. With street lighting, CMOS cameras, and customized control of iris and shutter, it is possible that features other than headlights and tail lights can be found and adequately tracked; much depends on the conditions at any particular site. If the camera can deliver a sufficiently rich set of corner features to obtain at least two or three clusters per vehicle, and there is adequate contrast between foreground and background, tracking may take place using the algorithms developed in this project.

The installed intersection hardware was located in cabinets that were heated or cooled as appropriate. It is not expected that any normal variations in air temperature or solar heating will cause the system to fail during operation.

TRB’s second Strategic Highway Research Program (SHRP 2) Report S2-S09-RW-1: Site-Based Video System Design and Development documents the development of a Site Observer, a prototype system capable of capturing vehicle movements through intersections by using a site-based video imaging system.

The Site Observer system provides a means of viewing crashes and near crashes, as well as a basis for developing objective measures of intersection conflicts. In addition, the system can be used to collect before-and-after data when design or operational changes are made at intersections. Furthermore, it yields detailed and searchable data that can help determine exposure measures.

This report is available in electronic format only.
