Session IV: Networked Worlds
Data Mining in Social Networks
Pages 287-302

The Chapter Skim interface presents what we've algorithmically identified as the most significant single chunk of text within every page in the chapter.


From page 287...
... SESSION IV Networked Worlds
From page 289...
... Introduction. Recent research projects in two closely related areas of computer science (machine learning and data mining) have developed methods for constructing statistical models of network data. Examples of such data include social networks, networks of web pages, complex relational databases, and data on interrelated people, places, things, and events extracted from text documents.
From page 290...
... (Muggleton 1992; Dzeroski and Lavrac 2001) and social network analysis (Wasserman and Faust 1994). For example, we have employed relational probability trees (RPTs)
From page 291...
... While predicting link existence and classifying subgraphs are extremely interesting problems, the techniques learning [running head: DYNAMIC SOCIAL NETWORK MODELING AND ANALYSIS 291]
From page 292...
... Querying and Learning. To address learning tasks of this kind, our research group is constructing PROXIMITY, a system for machine learning and data mining in relational data. The system is designed as a framework within which a variety of analysis tools can be used in combination.
From page 293...
... The two components work in concert. The query language is used to extract subgraphs from a large network of data; the RPT algorithm is used to learn a model that estimates a conditional probability distribution for the value of an attribute of a class of objects or links represented in all those subgraphs.
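The division of labor described above can be illustrated with a minimal Python sketch. It is not PROXIMITY's query language or the RPT algorithm: the toy objects, links, attribute names, and the single count-based feature are all hypothetical, and a plain frequency estimate stands in for the learned model.

```python
from collections import defaultdict

# Hypothetical toy data; none of these names or values come from PROXIMITY.
OBJECTS = {
    "m1": {"type": "movie", "receipts": True},
    "m2": {"type": "movie", "receipts": False},
    "m3": {"type": "movie", "receipts": False},
    "m4": {"type": "movie", "receipts": True},
    "m5": {"type": "movie", "receipts": False},
    "a1": {"type": "actor", "has_award": True},
    "a2": {"type": "actor", "has_award": False},
    "a3": {"type": "actor", "has_award": False},
}
LINKS = [("a1", "m1"), ("a2", "m1"), ("a2", "m2"), ("a3", "m3"),
         ("a1", "m4"), ("a3", "m4"), ("a1", "m5")]


def extract_subgraphs(objects, links, core_type):
    """Stand-in for a query: one subgraph (core object plus neighbors) per core object."""
    subgraphs = {oid: [] for oid, attrs in objects.items()
                 if attrs["type"] == core_type}
    for a, b in links:
        if a in subgraphs:
            subgraphs[a].append(b)
        if b in subgraphs:
            subgraphs[b].append(a)
    return subgraphs


def count_neighbors(neighbors, objects, attr, value):
    """Aggregate a subgraph into one number: neighbors whose attribute equals value."""
    return sum(1 for n in neighbors if objects[n].get(attr) == value)


def conditional_probabilities(subgraphs, objects, label, feature):
    """Estimate P(label = True | feature outcome) by counting subgraphs."""
    totals, positives = defaultdict(int), defaultdict(int)
    for core, neighbors in subgraphs.items():
        outcome = feature(neighbors)
        totals[outcome] += 1
        positives[outcome] += int(objects[core][label])
    return {outcome: positives[outcome] / totals[outcome] for outcome in totals}


SUBGRAPHS = extract_subgraphs(OBJECTS, LINKS, "movie")
PROBS = conditional_probabilities(
    SUBGRAPHS, OBJECTS, "receipts",
    lambda neighbors: count_neighbors(neighbors, OBJECTS, "has_award", True) >= 1,
)
```

Here PROBS maps each outcome of the feature test (at least one award-winning actor, or none) to the estimated probability that receipts is True among the movie subgraphs with that outcome.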
From page 294...
... Leaf nodes in Figure 5 show the number of movie subgraphs of each class that reach the leaf, as well as their respective probabilities. The leftmost pair of numbers indicates the number and probability of movies with opening weekend box office receipts exceeding $2 million (receipts = True)
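As a small illustration of how such leaf annotations could be computed, the Python sketch below takes the class labels of the subgraphs reaching one leaf and returns a (count, probability) pair per class. The function name and the example counts are hypothetical and are not taken from Figure 5; the sketch assumes the leaf is non-empty.

```python
from collections import Counter


def leaf_summary(labels):
    """Return {class: (count, probability)} for the labels reaching a leaf.

    Assumes at least one label reaches the leaf.
    """
    counts = Counter(labels)
    total = len(labels)
    return {cls: (counts[cls], counts[cls] / total) for cls in (True, False)}


# Hypothetical leaf: 30 subgraphs with receipts = True, 10 with receipts = False.
LEAF = leaf_summary([True] * 30 + [False] * 10)
```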
From page 295...
... To enable truly effective data mining, analysts must be able to change the schema easily, and thus reconceptualize the domain (Jensen & Neville 2002b; Neville & Jensen 2002). Design Choices: Data, Tasks, and Models. Techniques for relational learning can be better understood by examining them in the context of a set of design choices and statistical issues.
From page 296...
... Task. Level of relational dependence: The most commonly used modeling techniques from machine learning, data mining, and statistics analyze independent attribute vectors, thus assuming that relational dependencies are unimportant, or at least beyond the scope of analysis. Specialized techniques for spatial and temporal data have been developed that assume a highly regular type of relational dependence.
From page 297...
... Search over model structures: The RPT learning algorithm searches over a wide range of possible structures for the tree and for the attributes included in the tree. In contrast, some approaches to relational learning, including first-order Bayesian networks, PROXIMITY's own relational Bayesian classifier, and other techniques in social network analysis, only learn the parameters for a model with fixed structure and attributes.
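The contrast between searching over structure and fitting parameters for a fixed structure can be sketched in Python as follows. This is a one-split illustration under stated assumptions, not the RPT learning algorithm: the feature names and rows are hypothetical, and training accuracy is used as the selection score purely for simplicity.

```python
# Hypothetical rows: two candidate binary features and a binary class label.
ROWS = [
    {"has_award_actor": True,  "big_studio": True,  "label": True},
    {"has_award_actor": True,  "big_studio": False, "label": True},
    {"has_award_actor": False, "big_studio": True,  "label": False},
    {"has_award_actor": False, "big_studio": False, "label": False},
    {"has_award_actor": True,  "big_studio": True,  "label": False},
    {"has_award_actor": False, "big_studio": True,  "label": False},
]


def fit_parameters(rows, feature, label="label"):
    """Fixed structure: only estimate P(label = True | feature value)."""
    params = {}
    for branch in (True, False):
        labels = [r[label] for r in rows if r[feature] == branch]
        if labels:
            params[branch] = sum(labels) / len(labels)
    return params


def accuracy_of_split(rows, feature, label="label"):
    """Training accuracy when each branch predicts its majority class."""
    correct = 0
    for branch in (True, False):
        labels = [r[label] for r in rows if r[feature] == branch]
        if labels:
            positives = sum(labels)
            correct += max(positives, len(labels) - positives)
    return correct / len(rows)


def search_structure(rows, candidates, label="label"):
    """Structure search: pick the best-scoring feature, then fit its parameters."""
    best = max(candidates, key=lambda f: accuracy_of_split(rows, f, label))
    return best, fit_parameters(rows, best, label)


# Searching chooses the feature; the fixed-structure learner is told which one to use.
BEST, PARAMS = search_structure(ROWS, ["has_award_actor", "big_studio"])
FIXED = fit_parameters(ROWS, "big_studio")
```

A learner with fixed structure corresponds to calling fit_parameters alone; a structure-searching learner additionally decides which attributes appear in the model.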
From page 298...
... To date, PROXIMITY does not employ any explicit form of background knowledge in its learning algorithms. Statistical Issues. Our recent work on relational learning has concentrated on the unique challenges of learning probabilistic models in relational data.
From page 299...
... We have found similar degree disparity in other data sets. For example, the number of owners differs systematically among publicly traded companies in different industries and the number of hyperlinks differs systematically among different classes of web pages at university web sites.
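One simple way to check a data set for degree disparity is to compare the average number of links per object across classes. The Python sketch below does this for a hypothetical ownership network; the company identifiers, industry labels, and links are invented for illustration and do not come from the data sets discussed here.

```python
from collections import Counter, defaultdict
from statistics import mean

# Hypothetical (owner, company) links and company class labels.
OWNERSHIP = [("o1", "c1"), ("o2", "c1"), ("o3", "c1"),
             ("o1", "c2"), ("o2", "c2"), ("o4", "c2"), ("o5", "c2"),
             ("o1", "c3"), ("o3", "c4")]
INDUSTRY = {"c1": "industry_A", "c2": "industry_A",
            "c3": "industry_B", "c4": "industry_B"}


def mean_degree_by_class(links, class_of):
    """Average number of incoming links per object, grouped by class label."""
    degree = Counter(target for _, target in links)
    by_class = defaultdict(list)
    for obj, cls in class_of.items():
        by_class[cls].append(degree[obj])  # Counter yields 0 for unlinked objects
    return {cls: mean(degrees) for cls, degrees in by_class.items()}


MEAN_DEGREE = mean_degree_by_class(OWNERSHIP, INDUSTRY)
```

A large gap between the per-class means suggests that degree is correlated with the class label, which is the condition described above.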
From page 300...
... Both of these effects show the problems associated with violating the assumption of independence among data instances that underlies so many of the techniques common to machine learning, data mining, and statistical modeling. These results imply that new approaches are necessary to extend current techniques for data mining to relational data.
From page 301...
... The effect of degree disparity on feature selection in relational learning.
From page 302...
... Social Network Analysis: Methods and Applications. Cambridge: Cambridge University Press.

