

Massive Data Sets: Guidelines and Practical Experience from Health Care
Pages 51-68



From page 51...
... and Healthcare Design Systems.
From page 52...
... More ambitious data collection is underway in selected locations, through the systematic abstraction of supplementary clinical measures of patient health from medical records, through recording of additional patient characteristics, or through recording of more detailed financial information. In principle, the entire medical record is available.
From page 53...
... Legislation in New York, New Jersey, and other states has turned to mandating minimum levels of hospital care, notably for mothers delivering their babies. The analysis of health care data, indeed, the massive analysis of massive health care data sets, has a central role in setting health care policy and in steering healthcare reform, through its influence on the actual delivery of healthcare, one hospital and one health maintenance organization at a time.
From page 54...
... The next section gives a few details of COREPLUS and SAFs, two systems for outcomes analysis and resource modeling, including risk adjustment, developed in part by the author at Healthcare Design Systems. The very large number of different questions in health care, and the specificity of those questions to individual providers, payers, regulators, and patients, are compelling reasons to do massive analysis of the data.
From page 55...
... SAFs, for Severity Adjustment Factor computation, is a system for modeling resource consumption, including cost and length of stay. Both systems have been developed through a collaborative effort among clinical experts, healthcare information processing staff, statisticians, management, and marketing.
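The excerpt does not reproduce the SAFs formula, but one common shape for severity adjustment, offered purely as an illustration (the risk strata, expected values, and cases below are invented, and this is not claimed to be the SAFs method), is an observed-to-expected ratio for a resource measure such as length of stay:

    #!/usr/bin/perl
    # Illustration only: a generic observed-to-expected (O/E)
    # adjustment for length of stay. This is NOT the SAFs formula;
    # the strata and numbers are invented for the example.
    use strict;
    use warnings;

    # Hypothetical expected length of stay (days) per risk stratum.
    my %expected_los = (low => 2.1, medium => 3.8, high => 7.5);

    # Hypothetical (risk stratum, observed LOS) pairs for one provider.
    my @cases = ([low => 2], [medium => 5], [high => 6], [high => 9]);

    my ($obs, $exp) = (0, 0);
    for my $case (@cases) {
        my ($stratum, $los) = @$case;
        $obs += $los;
        $exp += $expected_los{$stratum};
    }
    printf "severity-adjusted O/E ratio for LOS: %.2f\n", $obs / $exp;

A ratio near 1 indicates resource use in line with the severity mix; values above 1 indicate longer stays than the case mix alone would predict.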
From page 56...
... [Figure legend from page 56:] 1 - Current year hospital data/rate; 2 - Prior years hospital data/rates; 3 - Rate legend; 4 - Hospital-specific peer group name (hospital list in Appendix); 5 - Mean rate of the 10 lowest hospitals having significant data for the study topic (includes hospitals with 0 rates).
From page 57...
... determination of a set of candidate predictor variables based on clinical expertise supported by the data analysis, (5) predictive model building by a combination of hierarchical and variable selection methods, (6)
From page 58...
... [Figure residue from page 58: a chart of high-risk deliveries listing delivery-related diagnoses with their ICD-9-CM codes, e.g., breech presentation-delivered (652.21), placenta previa-delivered, fetal distress-delivered, uterine inertia-delivered, severe preeclampsia-delivered, polyhydramnios-delivered.] 14 - A separate model exists for high-risk cases; defining variables and variable list are in Appendix C.
From page 59...
... 3. Presentation of Data Analysis. Massive analysis of massive health care data finds consumers at all levels, from the federal government to state governments, from payers to health systems, from hospitals to clinics, to physicians, and to patients. The needs may differ in detail, but the overall strategy is clear: to provide multi-faceted insight into the delivery of health care.
From page 60...
... There is no absolute dividing line between reports generated directly from the massive parent data, and reports generated from software at distributed locations. What is clear, however, is that the analysis of massive data is a specialized undertaking, and that exceptional computational resources, in terms of hardware, software, and personnel, as well as clinical and statistical expertise, must accumulate at those central locations.
From page 61...
... center, perhaps complementing the summary statistics already in the provider database with fully up-to-date data, or summary data for a different category of patients. Third, in generating a comparative report for a provider with some additional fields beyond the standard set, the TP center might access comparable data using its privileged access to other providers' databases.
From page 62...
... strategy is natural given the pervasive subject matter knowledge: patients are naturally divided by major diagnostic category (MDC)
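As a sketch of what that divide-and-conquer step might look like in practice (the input file name and field layout below are hypothetical, not the chapter's), each discharge record can be routed to a per-MDC stratum file before any modeling begins:

    #!/usr/bin/perl
    # Sketch: split a discharge file into one stratum file per major
    # diagnostic category (MDC), so each MDC can be modeled separately.
    # The file name and field layout are hypothetical.
    use strict;
    use warnings;

    my %out;  # one output filehandle per MDC
    open(my $in, '<', 'discharges.txt') or die "cannot open input: $!";
    while (my $line = <$in>) {
        my $mdc = (split /\|/, $line)[2];   # assume MDC is the third field
        unless ($out{$mdc}) {
            open($out{$mdc}, '>', "mdc_$mdc.txt")
                or die "cannot open output for MDC $mdc: $!";
        }
        print { $out{$mdc} } $line;
    }
    close $in;
    close $_ for values %out;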
From page 63...
... Although these systems provide a consistent interface to statistical functions, a programming language, graphics, an interface to operating system tools, and even a programmable graphical user interface, each has limitations. S has the more powerful and flexible environment, but SAS programming expertise is easier to find, and SAS jobs are more likely to plug away until complete; inevitably, massive data analyses are left to run "in batch" overnight and over the weekend, and it is better to pay a performance penalty than to risk non-completion.
From page 64...
... The basic idea is to create an ASCII 'state file' containing current state information for each outcome measure, or task, that can be read and written by different job schedulers. When a job scheduler is active on a task, the file is locked.
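A minimal sketch of that mechanism follows. It is not the chapter's Figure 3 script; the state names ('ready', 'running', 'done') and the one-word file format are assumptions made for the example:

    #!/usr/bin/perl
    # Sketch of per-task ASCII state files with advisory locking.
    use strict;
    use warnings;
    use Fcntl qw(:flock);

    # Atomically claim a task: hold one exclusive lock across the
    # read-check-write so two schedulers cannot both claim it.
    sub claim_task {
        my ($file) = @_;
        open(my $fh, '+<', $file) or die "cannot open $file: $!";
        flock($fh, LOCK_EX)       or die "cannot lock $file: $!";
        chomp(my $state = <$fh> // '');
        if ($state ne 'ready') { close($fh); return 0; }
        seek($fh, 0, 0);
        truncate($fh, 0);
        print $fh "running\n";
        close($fh);   # closing the handle releases the lock
        return 1;
    }

    # Mark a task finished once its analysis completes.
    sub finish_task {
        my ($file) = @_;
        open(my $fh, '>', $file) or die "cannot open $file: $!";
        flock($fh, LOCK_EX)      or die "cannot lock $file: $!";
        print $fh "done\n";
        close($fh);
    }

    my $task = 'mortality_model.state';   # hypothetical task name
    if (claim_task($task)) {
        # ... run the analysis for this outcome measure ...
        finish_task($task);
    }

Holding a single exclusive lock across the read-check-write is what keeps two schedulers from claiming the same task.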
From page 65...
... ; print "$o state $s"; &write($f, $s); Figure 3: A Perl script for job scheduling in a distributed computational environment.
From page 66...
... A Perl script can run with little change on many different types of platforms; communication using ASCII files containing state information separately for each task is highly robust.
From page 67...
... Or, one task may depend on the completion of several other tasks; this can be implemented with an initial state and an associated command that checks the states of the respective tasks in their files, possibly coupled with command-line arguments that incorporate sublists of such tasks, as in the sketch below.
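A minimal sketch of such a dependency check, assuming the same one-word state files and a 'done' terminal state as above, might be:

    #!/usr/bin/perl
    # Sketch: a command run for a waiting task that succeeds only when
    # every prerequisite task's state file reads 'done'. The state names
    # and state-file-per-task layout are assumptions.
    use strict;
    use warnings;

    sub read_state {
        my ($file) = @_;
        open(my $fh, '<', $file) or die "cannot open $file: $!";
        chomp(my $state = <$fh> // '');
        close($fh);
        return $state;
    }

    # Prerequisite state files arrive as command-line arguments, e.g.:
    #   check_deps.pl los_model.state cost_model.state
    my @pending = grep { read_state($_) ne 'done' } @ARGV;
    exit(@pending ? 1 : 0);   # exit 0 lets the scheduler advance the task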

