4 Temporal Data and Real-Time Algorithms
Pages 58-65

From page 58...
... It illuminates the challenges of dynamic data, and it also touches on the hardware infrastructure required for storing and processing temporal data. An example of the changes wrought by time upon massive sets of human-generated data is the "click-through rate" estimation problem in online advertising systems.
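
As an illustration of how time changes such an estimate, a minimal sketch follows of a click-through-rate estimator that exponentially decays old observations so that recent traffic dominates. The class name, half-life, and smoothing prior are assumptions made for the sketch, not anything the chapter specifies.

    # A minimal time-decayed CTR estimator (illustrative; all names assumed).
    import math
    import time

    class DecayedCTREstimator:
        def __init__(self, half_life_seconds=3600.0):
            # Per-second decay rate chosen so counts halve every half-life.
            self.decay = math.log(2) / half_life_seconds
            self.clicks = 0.0
            self.impressions = 0.0
            self.last_update = time.time()

        def _decay_to(self, now):
            factor = math.exp(-self.decay * (now - self.last_update))
            self.clicks *= factor
            self.impressions *= factor
            self.last_update = now

        def observe(self, clicked, now=None):
            self._decay_to(now if now is not None else time.time())
            self.impressions += 1.0
            if clicked:
                self.clicks += 1.0

        def estimate(self, prior_ctr=0.01, prior_weight=10.0):
            # Smooth with a weak prior so sparse ads avoid estimates of 0 or 1.
            return ((self.clicks + prior_ctr * prior_weight)
                    / (self.impressions + prior_weight))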
From page 59...
... The data from the distributed sources must generally be collected into one or more data analysis centers using a real-time, reliable data feeds management system. Such systems use logging to ensure that all data get delivered, triggers to ensure timely data delivery and ingestion, and intelligent scheduling for efficient processing.
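
Only the logging half of such a feed system is easy to show compactly. Below is a minimal sketch, assuming an append-only write-ahead log on local disk: records are made durable before receipt is acknowledged, and a replay routine re-reads anything after the last checkpoint on restart, giving at-least-once delivery. The file name, record format, and checkpointing scheme are all assumptions.

    # Illustrative at-least-once ingestion via a write-ahead log (names assumed).
    import json
    import os

    LOG_PATH = "feed.log"  # assumed log location

    def log_record(record):
        # Append durably BEFORE acknowledging receipt to the upstream source.
        with open(LOG_PATH, "a") as f:
            f.write(json.dumps(record) + "\n")
            f.flush()
            os.fsync(f.fileno())

    def replay_unprocessed(process, checkpoint):
        # On restart, reprocess every record after the last checkpoint, so a
        # record delivered but not yet processed is never silently dropped.
        with open(LOG_PATH) as f:
            for offset, line in enumerate(f):
                if offset >= checkpoint:
                    process(json.loads(line))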
From page 60...
... Paxos is actually a family of protocols for determining consensus in a network of unreliable processors; consensus is the process of agreeing on a result among a group of computing units, which is difficult when the units or their communication medium experience temporary failures. However, the Paxos family of algorithms was designed for maintaining consistency in small- to medium-scale distributed data warehousing systems, and scaling Paxos-based and other consistency-preserving storage ...

1. For an initial paper that suggests a formal treatment of stream consistency, see Golab and Johnson (2011).
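
To make the two-phase shape of the protocol concrete, here is a minimal single-decree Paxos sketch simulated in one process. It illustrates only prepare/promise followed by accept with majority quorums; networking, message loss, retries with higher ballots, and leader election, all essential in practice, are omitted, and every name is invented for the sketch.

    # Minimal single-decree Paxos, simulated in-process (illustrative only).
    class Acceptor:
        def __init__(self):
            self.promised = -1     # highest ballot number promised so far
            self.accepted = None   # (ballot, value) accepted so far, if any

        def prepare(self, ballot):
            # Phase 1b: promise to ignore lower ballots; report any prior accept.
            if ballot > self.promised:
                self.promised = ballot
                return True, self.accepted
            return False, None

        def accept(self, ballot, value):
            # Phase 2b: accept unless a higher ballot was promised meanwhile.
            if ballot >= self.promised:
                self.promised = ballot
                self.accepted = (ballot, value)
                return True
            return False

    def propose(acceptors, ballot, value):
        # Phase 1a: collect promises from a majority of acceptors.
        replies = [a.prepare(ballot) for a in acceptors]
        granted = [prior for ok, prior in replies if ok]
        if len(granted) <= len(acceptors) // 2:
            return None  # no quorum; a real proposer retries with a higher ballot
        # If any acceptor already accepted a value, adopt the highest-ballot one;
        # this is what preserves consistency across competing proposers.
        prior = max((p for p in granted if p is not None), default=None)
        chosen = prior[1] if prior is not None else value
        # Phase 2a: ask all acceptors to accept the chosen value.
        acks = sum(a.accept(ballot, chosen) for a in acceptors)
        return chosen if acks > len(acceptors) // 2 else None

    acceptors = [Acceptor() for _ in range(5)]
    print(propose(acceptors, ballot=1, value="commit-tx-42"))  # -> commit-tx-42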
From page 61...
... Although the Paxos algorithm was invented more than 20 years ago and is well understood and analyzed, a 1-year effort by one of the world's experts in distributed processing was still needed to implement it on Google's cluster system at a speed that would sustain the required transaction rate and survive a burst of failures.2

DATA PROCESSING, REPRESENTATION, AND INFERENCE

The next stage in time-aware data analysis includes building an abstract representation of the data and then using it for inference. Methods for abstract data representation include coding and sketching.
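
As one concrete instance of sketching, the following is a minimal Count-Min sketch: a small fixed-size table that summarizes a stream and answers approximate frequency queries, never undercounting. The width, depth, and hashing scheme below are arbitrary choices for the illustration, not anything the text prescribes.

    # Illustrative Count-Min sketch for approximate stream frequencies.
    import hashlib

    class CountMinSketch:
        def __init__(self, width=1024, depth=4):
            self.width, self.depth = width, depth
            self.table = [[0] * width for _ in range(depth)]

        def _buckets(self, item):
            # One independent-ish hash per row, derived by salting blake2b.
            for row in range(self.depth):
                h = hashlib.blake2b(item.encode(), salt=bytes([row] * 8)).digest()
                yield row, int.from_bytes(h[:8], "big") % self.width

        def add(self, item, count=1):
            for row, col in self._buckets(item):
                self.table[row][col] += count

        def estimate(self, item):
            # Collisions only inflate counts, so the minimum over rows is the
            # tightest estimate, and it never undercounts.
            return min(self.table[row][col] for row, col in self._buckets(item))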
From page 62...
... Going past the representation phase, which can be the sole stage of a real-time system, the core of many temporal data-stream systems is a learning and inference engine. There has been an immense amount of work on online algorithms that are naturally suited to time-aware systems.5 Most online algorithms impose constant, or at least sublinear, memory requirements, similar to data-stream algorithms.
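
A minimal example of such an online algorithm under the constant-memory constraint just mentioned: logistic regression trained by stochastic gradient descent, where each example updates a fixed-size weight vector and is then discarded. The dimension and learning rate are assumptions of the sketch.

    # Illustrative constant-memory online learner (names and defaults assumed).
    import math

    class OnlineLogistic:
        def __init__(self, dim, lr=0.1):
            self.w = [0.0] * dim  # the entire state: O(dim), not O(stream length)
            self.lr = lr

        def predict(self, x):
            z = sum(wi * xi for wi, xi in zip(self.w, x))
            return 1.0 / (1.0 + math.exp(-z))

        def update(self, x, y):
            # One gradient step per example; the example is then discarded,
            # which is what keeps memory constant as the stream grows.
            err = self.predict(x) - y
            for i, xi in enumerate(x):
                self.w[i] -= self.lr * err * xi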
From page 63...
... This fusion poses significant challenges because state-of-the-art learning algorithms are not designed to cope with partial summaries and snapshots of temporal data.

SYSTEM AND HARDWARE FOR TEMPORAL DATA SETS

The discussion thus far has focused on software, analysis, and algorithmic issues and challenges that are common to massive temporal data.
From page 64...
... The current algorithms for updating network metrics permit efficient calculation only for certain network structures (a minimal incremental-update sketch appears after this list).
• Streaming and sketching algorithms that leverage new architectures, such as flash memory and terascale storage devices.
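
As a small illustration of incremental updating, the sketch below maintains a global triangle count for an undirected simple graph as edges arrive, touching only the two endpoints' neighbor sets instead of recomputing the metric from scratch. The class and method names are invented for the example.

    # Illustrative incremental network-metric update: streaming triangle count.
    from collections import defaultdict

    class TriangleCounter:
        def __init__(self):
            self.adj = defaultdict(set)
            self.triangles = 0

        def add_edge(self, u, v):
            if u == v or v in self.adj[u]:
                return  # ignore self-loops and duplicate edges
            # Every common neighbor of u and v closes exactly one new triangle.
            self.triangles += len(self.adj[u] & self.adj[v])
            self.adj[u].add(v)
            self.adj[v].add(u)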

