Currently Skimming:

3. Geospatial Databases and Data Mining
Pages 47-72

The Chapter Skim interface presents what we've algorithmically identified as the most significant single chunk of text within every page in the chapter.


From page 47...
... Despite the importance and proliferation of geospatial data, most research in data mining has focused on transactional or documentary data.
[1] From a white paper, "Data Mining Techniques for Geospatial Applications," prepared for the committee's workshop by Dimitrios Gunopulos.
From page 48...
... TECHNOLOGIES AND TRENDS
This section outlines key developments in database management systems and data mining technologies as they relate to geospatial data.
Database Management Systems
The ubiquity and longevity of the relational database architecture are due largely to its solid theoretical foundation, the declarative nature of the query processing language, and its ability to truly separate the structure of the data from the software applications that manipulate them.
From page 49...
... Geospatial Data Mining Tasks
The goal of data mining[5] is to reveal some type of interesting structure in the target data. This might be a pattern that designates some type of regularity or deviation from randomness, such as the daily or yearly temperature cycle at a given location.
From page 50...
... Geospatial data mining is a subfield of data mining concerned with the discovery of patterns in geospatial databases. Applying traditional data mining techniques to geospatial data can result in patterns that are biased or that do not fit the data well.[7] Chawla et al.
From page 51...
... , resulting in increased running times or poor-quality clusters.[10] For this reason, recent research has centered on the development of clustering methods for large, highly dimensioned data sets, particularly techniques that execute in linear time as a function of input size or that require only one or two passes through the data. Recently developed spatial clustering methods that seem particularly appropriate for geospatial data include partitioning, hierarchical, density-based, grid-based, and cluster-based analysis.
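As a rough illustration of the grid-based family mentioned on this page, the following Python sketch bins 2-D points into cells in a single pass over the data and then merges adjacent dense cells into clusters. It is a minimal sketch, not a method taken from the report: the function name `grid_clusters` and the parameters `cell_size` and `min_points` are assumptions introduced here for illustration.

```python
from collections import defaultdict, deque


def grid_clusters(points, cell_size, min_points):
    """Group 2-D points into clusters of adjacent dense grid cells.

    Hypothetical helper: cell_size and min_points are illustrative
    parameters, not values prescribed by the source text.
    """
    # Single pass over the data: assign every point to its grid cell.
    cells = defaultdict(list)
    for x, y in points:
        cells[(int(x // cell_size), int(y // cell_size))].append((x, y))

    # A cell is "dense" if it holds at least min_points points.
    dense = {cell for cell, pts in cells.items() if len(pts) >= min_points}

    clusters = []
    seen = set()
    for start in sorted(dense):
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        members = []
        # Breadth-first merge of dense cells that touch (8-neighborhood).
        while queue:
            cx, cy = queue.popleft()
            members.extend(cells[(cx, cy)])
            for dx in (-1, 0, 1):
                for dy in (-1, 0, 1):
                    neighbor = (cx + dx, cy + dy)
                    if neighbor in dense and neighbor not in seen:
                        seen.add(neighbor)
                        queue.append(neighbor)
        clusters.append(members)
    return clusters


sample_points = [
    (0.1, 0.1), (0.2, 0.3), (0.4, 0.2),   # dense cell (0, 0)
    (1.1, 0.2), (1.3, 0.4), (1.2, 0.1),   # dense cell (1, 0), adjacent
    (9.1, 9.1), (9.2, 9.3), (9.3, 9.2),   # dense cell (9, 9), isolated
    (5.0, 5.0),                           # sparse cell, treated as noise
]
sample_clusters = grid_clusters(sample_points, cell_size=1.0, min_points=3)
```

The work after the binning pass depends on the number of occupied cells rather than the number of points, which is one reason grid-based methods are attractive for very large data sets. Results are sensitive to the choice of cell size.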
From page 52...
... Research to establish firm methodologies for when and how to perform data mining will be needed before this new technology can become mainstream for geospatial applications. The development of geospatial-specific data mining tasks and techniques will be increasingly important to help people analyze and interpret the vast amount of geospatial data being captured.
From page 53...
... The final key problem is integrating geospatial data from heterogeneous sources into one coherent data set.
Moving and Evolving Objects
Objects in the real world move and evolve over time.
From page 55...
... A second approach is based on the constraint paradigm. DEDALE, one example of a constraint database system for geospatial data proposed by the Chorochronos Participants
[15] From a white paper, "The Opportunities and Challenges of Location Information Management," prepared for the committee's workshop by Ouri Wolfson.
From page 56...
... Query languages also will need to be extended to provide high-level access to the new geospatial data types. It is important to develop consistent algebraic representations for moving and evolving objects and to use them for querying geospatial databases.
From page 58...
... Such research would inform and expand the notions of geospatial ontologies and increase their usefulness.
Geospatial Data Integration
The purpose of data integration is to combine data from heterogeneous, multidisciplinary sources into one coherent data set.[19] The sources of the data typically employ different resolutions, measurement techniques, coordinate systems, spatial or temporal scales, and semantics.
From page 59...
... A key issue for spatial data integration is developing a formal method that bridges disparate ontologies by using, for example, spatial association properties to relate categories from different ontologies to make such knowledge explicit in forms that would be useful to other disciplines. Long-term research is required to create new data models and languages specifically designed to support heterogeneous spatiotemporal data sets (see Box 3.3 for a sample application)
From page 61...
... Handling different kinds of imprecision and uncertainty is an important research topic that must be addressed for geospatial databases. Most important, for data integration in particular, different data sets may be described with different types of inaccuracy and imprecision, which seriously impedes information integration.
From page 62...
... are often more important to the users of geospatial applications. Ultimately, problems like those encoun-
[24] The committee thanks Lars Arge of Duke University for his white paper, from which this section was adapted.
From page 63...
... Memory-Aware Algorithms
Although the availability of massive geospatial data sets and of small but computationally powerful devices increases the potential of geospatial applications, it also exposes scalability problems with existing algorithms. One source of such problems is that most algorithm research has been done under models of computation in which each memory access costs one unit of time regardless of where the access takes place.
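To make the gap between the unit-cost model and block-based memory concrete, the Python sketch below counts block transfers for two scans of the same row-major array. It is a toy model under stated assumptions, not an algorithm from the report: fast memory is assumed to hold exactly one block, and the names `block_transfers`, `row_major_scan`, and `column_major_scan` are hypothetical.

```python
def block_transfers(addresses, block_size):
    """Count block loads, assuming fast memory holds exactly one block."""
    transfers = 0
    current_block = None
    for addr in addresses:
        block = addr // block_size
        if block != current_block:
            # The requested element is not in fast memory: load its block.
            transfers += 1
            current_block = block
    return transfers


def row_major_scan(n):
    """Addresses visited when an n x n row-major array is read row by row."""
    return [r * n + c for r in range(n) for c in range(n)]


def column_major_scan(n):
    """Addresses visited when the same array is read column by column."""
    return [r * n + c for c in range(n) for r in range(n)]


n, block_size = 8, 4
row_cost = block_transfers(row_major_scan(n), block_size)
column_cost = block_transfers(column_major_scan(n), block_size)
```

Both scans perform the same 64 element accesses, so a unit-cost model rates them equally. In this toy model the row-order scan needs 16 block loads while the column-order scan needs 64, one per access.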
From page 64...
... Further research in the area of I/O-efficient and cache-oblivious algorithms can significantly improve the usability of geospatial data by allowing complicated problems on massive data sets to be solved efficiently.
Kinetic Data Structures
With the rapid advances in positioning technologies (such as the Global Positioning System and wireless communication)
From page 65...
... Should the interarrival times of fires be fitted to a Poisson model or something else? Because the assumptions required for the classical stochastic representations (such as Gaussian distributions and Poisson processes)
From page 66...
... See, for example, Stoyan and Stoyan (1994). For further discussion on the use of fractal models for geospatial data, Hastings and Sugihara (1993)
From page 67...
... In general, software agents should be able to automatically locate spatiotemporal data sets; process models and data mining algorithms; identify appropriate fits; perform conversions when necessary; apply the models and algorithms; and report the resulting patterns (e.g., correlations, regularities, and outliers)
From page 68...
... All the methods look for linear correlations across attributes, however, and will not work for nonlinear correlations.[30] Research is needed on scalable, robust, nonlinear methods for reducing dimensionality.
Mining Data When Objects Move or Evolve
Just as moving and evolving objects pose problems for geospatial data models, they also pose problems for geospatial data mining.
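The limitation of linear methods noted on this page can be seen in a small numerical example. The Python sketch below computes the Pearson correlation coefficient, a standard measure of linear association, for a linear and a quadratic relationship. The data and the helper `pearson` are illustrative assumptions, not material from the report.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Illustrative data: x values placed symmetrically around zero.
xs = [k / 10 for k in range(-50, 51)]
linear = [2 * x + 1 for x in xs]   # y depends linearly on x
quadratic = [x * x for x in xs]    # y is fully determined by x, nonlinearly

r_linear = pearson(xs, linear)
r_quadratic = pearson(xs, quadratic)
```

Here `r_linear` is essentially 1, while `r_quadratic` is essentially 0 even though the quadratic attribute is completely determined by `x`. A reduction method that relies only on linear correlation would treat the two attributes as unrelated.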
From page 69...
... A more complicated example is a vehicle management application, which integrates data sets containing information on weather, special events, and traffic conditions. How do typical data mining algorithms work in this type of scenario?
From page 71...
... 1998. "Kinetic Data Structures: A State of the Art Report," in P.K.
From page 72...
... 2001. Geographic Data Mining and Knowledge Discovery.


This material may be derived from roughly machine-read images, and so is provided only to facilitate research.