
Panel Discussion
Pages 187-202

Each excerpt below is the passage identified as most significant on the corresponding page of the chapter.


From page 187...
... Or as Peter Huber put it last night, How does your data grow? The third question: A lot of computer-intensive techniques like Monte Carlo Markov chain [MCMC]
From page 188...
... But sufficiency is dependent on the model in the usual sort of theory. Or you can ask the question, Can we define a notion or an analog to sufficiency that does not depend on a specific model as a means of compressing data without loss of information?
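As a concrete illustration of the model-dependent compression at issue, a minimal Python sketch (assuming NumPy): for an i.i.d. Gaussian sample the triple (n, sum, sum of squares) is sufficient, so the maximum-likelihood estimates can be recovered from three numbers with no loss of information, but only under that model.

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.normal(loc=3.0, scale=2.0, size=100_000)

    # Sufficient statistics under the Gaussian model: count, sum, sum of squares.
    n, s1, s2 = x.size, x.sum(), np.square(x).sum()

    # The maximum-likelihood estimates are recoverable from the compressed triple alone.
    mu_hat = s1 / n
    sigma2_hat = s2 / n - mu_hat ** 2

    # They agree with the estimates computed from the full 100,000 observations.
    assert np.isclose(mu_hat, x.mean()) and np.isclose(sigma2_hat, x.var())
    print(mu_hat, sigma2_hat)
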
From page 189...
... The idea of the specific phenomenon is the key, because one of the major complaints about statistical theory as formulated and taught in the textbooks is that it is theory about procedures. It is divorced from the specific phenomenon.
From page 190...
... Some of the ideas we are working on are to put layered networks into a more statistical framework, or to see them as generalizations of things like factor analysis, layers of factor analysis of various kinds. We also rediscovered mixture models, but in a much broader framework than they are commonly used in statistics.
From page 191...
... Then as you are going through data sets and making models, you are effectively doing a generalized kind of clustering, and you can think of these as mixture models, if you like that framework. I would like to have a lot of parallel processors, each one containing the developing model fitting the data sets, and by the time I got through with my massive data set, I would have obtained a much smaller set of parameterized models that I would then want to go back through again and validate, and so on.
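A minimal sketch of that workflow, with Python's multiprocessing and scikit-learn standing in for the parallel processors and the developing models (the toy data, the 16-way chunking, and the three-component mixtures are illustrative assumptions):

    from multiprocessing import Pool

    import numpy as np
    from sklearn.mixture import GaussianMixture

    def fit_chunk(chunk):
        """Fit a small Gaussian mixture to one partition and keep only its parameters."""
        gm = GaussianMixture(n_components=3, random_state=0).fit(chunk)
        return gm.weights_, gm.means_, gm.covariances_

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        data = rng.normal(size=(1_000_000, 2))      # stand-in for the massive data set
        chunks = np.array_split(data, 16)           # one chunk per (simulated) processor

        with Pool(processes=4) as pool:
            models = pool.map(fit_chunk, chunks)    # each worker returns a fitted summary

        print(len(models), "parameterized mixture summaries instead of", len(data), "rows")
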
From page 192...
... The fourth category is existence arguments. They are very popular in medicine, the area I work in, where basically a case series allows you to say, yes, it is possible to do coronary artery bypass grafts on people with Class IV congestive heart failure without killing three-quarters of them.
From page 193...
... One thing that you do with hierarchical modeling, whether it is by partitioning or viewed in the most general sense, is bring a massive data set problem down to a situation where in some local sense, you have maybe a hundred data points per parameter, just as you have many leaves of a tree [model] at which there is one parameter with a hundred data points sitting underneath them.
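A rough illustration of that partitioning idea, assuming scikit-learn's regression tree as a stand-in for the hierarchical model: a shallow tree splits the data so that each leaf carries one parameter (its mean) estimated from roughly a hundred observations.

    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    rng = np.random.default_rng(1)
    X = rng.uniform(-3, 3, size=(100_000, 2))
    y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + rng.normal(scale=0.3, size=len(X))

    # A shallow tree partitions the data so that each leaf carries a single
    # parameter (its mean response) estimated from roughly a hundred points.
    tree = DecisionTreeRegressor(min_samples_leaf=100, max_leaf_nodes=1000,
                                 random_state=0).fit(X, y)

    leaf_of = tree.apply(X)                          # leaf index for every observation
    counts = np.bincount(leaf_of)
    print("leaves:", tree.get_n_leaves())
    print("median observations per leaf:", int(np.median(counts[counts > 0])))
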
From page 194...
... I might have standard errors for them, but I really do not know exactly what they mean; still, they fit the data pretty well. Then we can have a database model, where we have domain variables and predictor variables.
From page 195...
... There was a comment before that Fisher statistics were very data-based, and that was echoed in Peter Huber's comment. I do not see that at all, and so I am challenging the notion of oscillation a little bit.
From page 196...
... After you fit these models with maximum likelihood, you try to classify with them, and you get pretty poor results. But nonetheless, it is still within a clean statistical framework, and they do a lot of model validation and merging and parameter tests and so on.
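A minimal sketch of the procedure being described — fit a mixture per class by maximum likelihood, then classify by whichever class-conditional likelihood is larger — on hypothetical toy data with scikit-learn (the two-component mixtures and the equal class priors are assumptions):

    import numpy as np
    from sklearn.mixture import GaussianMixture

    rng = np.random.default_rng(2)
    # Two hypothetical classes, each itself a two-cluster Gaussian mixture.
    class0 = np.vstack([rng.normal(0, 1, (500, 2)), rng.normal(4, 1, (500, 2))])
    class1 = np.vstack([rng.normal(2, 1, (500, 2)), rng.normal(6, 1, (500, 2))])

    # Fit one mixture per class by maximum likelihood (EM).
    gm0 = GaussianMixture(n_components=2, random_state=0).fit(class0)
    gm1 = GaussianMixture(n_components=2, random_state=0).fit(class1)

    # Classify a new point by the larger class-conditional log-likelihood
    # (equal class priors assumed).
    x_new = np.array([[3.0, 3.0]])
    label = int(gm1.score_samples(x_new)[0] > gm0.score_samples(x_new)[0])
    print("predicted class:", label)
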
From page 197...
... Carr: We are talking about these complex hierarchical models. Yes, maybe that is the only thing that is going to work on some problems, but I would like to think of models as running the range from a scalpel to a club.
From page 198...
... In some sense, Monte Carlo Markov chains and hierarchical models go together quite beautifully. Madigan: Can they coexist with massive data sets?
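One way to see why the two go together: in a hierarchical model each full conditional is often simple, so a Gibbs-style MCMC sampler just cycles through them. A toy sketch for a two-level normal-means model, with the variances treated as known purely to keep the illustration short (an assumption, not part of the discussion):

    import numpy as np

    rng = np.random.default_rng(3)

    # Two-level model: theta_j ~ N(mu, tau^2), y_ij ~ N(theta_j, 1), variances known.
    J, n = 20, 50
    y = rng.normal(0.0, 2.0, size=J)[:, None] + rng.normal(size=(J, n))
    ybar = y.mean(axis=1)

    tau2 = 4.0
    mu, theta, draws = 0.0, np.zeros(J), []
    for _ in range(2000):
        # theta_j | rest: precision-weighted average of its group mean and mu.
        prec = n + 1.0 / tau2
        theta = rng.normal((n * ybar + mu / tau2) / prec, np.sqrt(1.0 / prec))
        # mu | rest (flat prior): normal around the average of the theta_j.
        mu = rng.normal(theta.mean(), np.sqrt(tau2 / J))
        draws.append(mu)

    print("posterior mean of mu:", np.mean(draws[500:]))
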
From page 199...
... Hodges: This is partly in response to Luke Tierney. If you hearken back to a little paper in The American Statistician by Efron and Gong in 1983, and to some work that David Freedman did in the census undercount controversy, you can use bootstrapping to simulate the outcome of model selection applied to a particular data set.
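The Efron and Gong idea, in rough outline, is to rerun the entire selection procedure on bootstrap resamples and see how stable the chosen model is. A minimal sketch, with cross-validated lasso standing in for whatever selection rule is actually in use (the data and the rule are illustrative assumptions, not theirs):

    import numpy as np
    from sklearn.linear_model import LassoCV

    rng = np.random.default_rng(4)
    n, p = 200, 8
    X = rng.normal(size=(n, p))
    beta = np.array([2.0, -1.5, 0, 0, 0, 0, 0, 0])   # only the first two predictors matter
    y = X @ beta + rng.normal(size=n)

    B = 200
    selected = np.zeros(p)
    for _ in range(B):
        idx = rng.integers(0, n, size=n)             # bootstrap resample of the rows
        fit = LassoCV(cv=5).fit(X[idx], y[idx])      # rerun the whole selection procedure
        selected += np.abs(fit.coef_) > 1e-8         # record which predictors survived

    print("selection frequency per predictor:", selected / B)
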
From page 200...
... Lewis: One simple answer to that is that the people in physics write their own software to handle their gigantic data sets. The question is, Are we going to require that everybody go out and write their own software if they have a large data set, or are we going to produce software and analytic tools that let people do that without becoming computer scientists and programmers, which would seem like a real waste of time?
From page 201...
... Tierney: Sufficiency in some ways has always struck me as a little bit of a peculiar concept, looking for a free lunch, being able to compress with no loss. Most non-statisticians who use the term data reduction do not expect a free lunch.
From page 202...
... That turns a lot of the trade-offs that we use for small data sets on their head, like computing cost versus statistical efficiency and summarization. For example, if you want to compute the most efficient estimate for a Gaussian random field for a huge data set, it would take you centuries.
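The arithmetic behind that remark: the exact Gaussian likelihood needs an n-by-n covariance matrix built and factored at every evaluation, which is O(n^2) memory and O(n^3) time. A sketch of the exact computation at a size where it is still feasible (the exponential covariance and the parameter values are assumptions for illustration):

    import numpy as np
    from scipy.spatial.distance import cdist

    def exact_loglik(coords, z, range_=1.0, sill=1.0, nugget=1e-6):
        """Exact Gaussian log-likelihood with an exponential covariance.

        Building and factoring the n-by-n covariance is O(n^2) memory, O(n^3) time.
        """
        n = len(z)
        cov = sill * np.exp(-cdist(coords, coords) / range_) + nugget * np.eye(n)
        chol = np.linalg.cholesky(cov)
        alpha = np.linalg.solve(chol, z)
        return -0.5 * (alpha @ alpha) - np.log(np.diag(chol)).sum() - 0.5 * n * np.log(2 * np.pi)

    rng = np.random.default_rng(5)
    coords = rng.uniform(0, 10, size=(2_000, 2))     # feasible at n = 2,000 ...
    z = rng.normal(size=2_000)
    print(exact_loglik(coords, z))
    # ... but at n = 10^7 the covariance matrix alone is ~800 terabytes, and a
    # maximum-likelihood search would need many such factorizations.
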

