
First-Person Computational Vision - Kristen Grauman
Pages 17-24



From page 17...
... While both cases represent similar situations -- and indeed the same physical environment -- the latter highlights the striking difference in capturing the visual experience from the point of view of the camera wearer. This distinction has intriguing implications for computer vision research -- the realm of artificial intelligence and machine learning that aims to automate visual intelligence so that computers can "understand" the semantics and geometry embedded in images and video.
From page 18...
... These questions lead to applications in personal video summarization, sharing first-person experiences, and in situ attention analysis. Throughout these two research threads, our work is driven by the notion that the camera wearer is an active participant in the visual observations received.
From page 19...
... However, today's best computer vision algorithms, particularly those tackling recognition tasks, are deprived of this link, learning solely from batches of images downloaded from the Web and labeled by human annotators. We argue that such "disembodied" image collections, though clearly valuable when collected at scale, deprive feature learning methods of the informative physical context of the original visual experience (Figure 1).
From page 20...
... In this way, ego-motion serves as side information to regularize the features learned, which we show facilitates category learning when labeled examples are scarce. We demonstrate the impact for recognition, including a scenario where features learned from "ego-video" on an autonomous car substantially improve large-scale scene recognition.
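The page above does not spell out the training objective, but the idea of ego-motion as side information can be illustrated with a minimal two-term loss. The Python/PyTorch sketch below is an assumption-laden illustration rather than the chapter's actual method: the small encoder, the three-parameter motion representation, the regression form of the auxiliary term, and the weight lam are all placeholders. It pairs a classification loss on the scarce labeled images with an auxiliary loss that asks features of consecutive ego-video frames to reveal the camera wearer's known motion.

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Small convolutional feature extractor (a stand-in, not the chapter's network)."""
    def __init__(self, feat_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feat_dim),
        )

    def forward(self, x):
        return self.net(x)

encoder = Encoder()
classifier = nn.Linear(128, 10)           # few labeled categories
egomotion_head = nn.Linear(2 * 128, 3)    # predicts e.g. (dx, dy, dyaw) for a frame pair

def joint_loss(labeled_imgs, labels, frame_t, frame_t1, egomotion, lam=0.5):
    # Supervised term on the scarce labeled examples.
    cls_loss = nn.functional.cross_entropy(classifier(encoder(labeled_imgs)), labels)
    # Side-information term: paired features of consecutive ego-video frames must
    # expose the known ego-motion, which regularizes the encoder without labels.
    pair = torch.cat([encoder(frame_t), encoder(frame_t1)], dim=1)
    ego_loss = nn.functional.mse_loss(egomotion_head(pair), egomotion)
    return cls_loss + lam * ego_loss
```
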
From page 21...
... In particular, the three functions of control, per-view recognition, and evidence fusion are simultaneously addressed in a single learning objective. Results so far show that this significantly improves the capacity to recognize a scene by instructing the egocentric camera where to point next, and to recognize an object manipulated by a robot arm by determining how to turn the object in its grasp to get the sequence of most informative views (Figure 3).
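As a rough illustration of how control, per-view recognition, and evidence fusion can share one learning objective, the sketch below couples a sampled "where to point next" policy with a recurrent fusion state and a classifier, training all three against a single classification-plus-REINFORCE loss. Every component here (the GRU fuser, the five candidate motions, the get_view environment hook, the reward definition) is an assumed stand-in, not the system described in the chapter.

```python
import torch
import torch.nn as nn

view_encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 128), nn.ReLU())
fuser = nn.GRUCell(128, 256)     # evidence fusion: accumulates what the views reveal
classifier = nn.Linear(256, 10)  # recognition from the fused state
policy = nn.Linear(256, 5)       # control: scores 5 candidate "where to look next" motions

def episode_loss(get_view, label, steps=4):
    """get_view(action) -> 3x32x32 view for the chosen motion (assumed environment API)."""
    h = torch.zeros(1, 256)
    action = torch.tensor(0)                 # start from a default viewpoint
    log_probs = []
    for _ in range(steps):
        h = fuser(view_encoder(get_view(action).unsqueeze(0)), h)
        probs = torch.softmax(policy(h), dim=1)
        action = torch.multinomial(probs, 1).squeeze()   # sample the next camera motion
        log_probs.append(torch.log(probs[0, action]))
    cls_loss = nn.functional.cross_entropy(classifier(h), label)
    # REINFORCE-style term: reward motion sequences that end in a correct prediction,
    # so control, recognition, and fusion are all trained by one objective.
    reward = (classifier(h).argmax(dim=1) == label).float().detach()
    return cls_loss + (-reward * torch.stack(log_probs).sum()).mean()
```
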
From page 22...
... We are developing methods to generate visual synopses from egocentric video. Leveraging cues about ego attention and interactions to infer a storyline, the proposed methods automatically detect the highlights in long source videos.
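The excerpt only names the cues, but the selection step can be sketched concretely. The function below scores fixed-length segments with hypothetical attention_score and interaction_score callables and greedily keeps the top-scoring, temporally spread segments; it illustrates how highlights might be assembled into a synopsis, not the chapter's actual highlight-detection models.

```python
def summarize(segments, attention_score, interaction_score, budget=5, min_gap=2):
    """Return a short synopsis: the top-scoring segments, kept temporally spread out."""
    ranked = sorted(
        ((attention_score(s) + interaction_score(s), i) for i, s in enumerate(segments)),
        reverse=True,
    )
    chosen = []
    for _, idx in ranked:
        if len(chosen) == budget:
            break
        if all(abs(idx - j) >= min_gap for j in chosen):   # skip near-duplicate highlights
            chosen.append(idx)
    return [segments[i] for i in sorted(chosen)]
```
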
From page 23...
... • identifying which egocentric video frames, passively captured with the wearable camera, look as if they could be intentionally taken photographs (i.e., as if the camera wearer were instead actively controlling a camera) (Xiong and Grauman 2015).
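One way to make the "web photo prior" idea concrete, under assumptions that go beyond the excerpt, is to fit a density model over features of intentionally composed web photos and rank egocentric frames by their likelihood under it. The sketch below does this with scikit-learn's KernelDensity; the kernel-density choice, the bandwidth, and the feature representation are placeholders rather than details from Xiong and Grauman (2015).

```python
import numpy as np
from sklearn.neighbors import KernelDensity

def snap_point_indices(web_photo_feats, ego_frame_feats, top_k=10):
    """Rank egocentric frames by how 'intentional' they look under a web-photo prior.

    Both arguments are (n, d) feature arrays; the features themselves are stand-ins,
    not the descriptors used in the cited work.
    """
    prior = KernelDensity(bandwidth=1.0).fit(web_photo_feats)   # density of composed photos
    scores = prior.score_samples(ego_frame_feats)               # log-likelihood per frame
    return np.argsort(scores)[::-1][:top_k]                     # most photo-like frames first
```
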
From page 24...
... Xiong, B., and K. Grauman. 2015. Intentional photos from an unintentional photographer: Detecting snap points in egocentric video with a web photo prior.

