Suggested Citation:"3 The Longitudinal Study." National Research Council. 1992. Assessing Evaluation Studies: The Case of Bilingual Education Strategies. Washington, DC: The National Academies Press. doi: 10.17226/2014.

3
The Longitudinal Study

OVERVIEW

In 1983 the Department of Education began a major multiyear study that came to be known as the “National Longitudinal Study of the Evaluation of the Effectiveness of Services for Language-Minority Limited-English-Proficient Students” (hereafter referred to as the Longitudinal Study). The study was commissioned in response to concerns about the lack of a solid research base on the effectiveness of different approaches to the education of language-minority (LM) students, and it answered a direct call from Congress, in the 1978 Amendments to the Elementary and Secondary Education Act, for a longitudinal study to measure the effectiveness of different approaches to educating students from minority language backgrounds.

Although the ultimate goal of the Longitudinal Study was to provide evaluations to inform policy choices, the Department of Education determined that an evaluation study required a firmer information base about the range of existing services. The study consisted of two phases. The first phase was descriptive of the range of services provided to language-minority limited-English-proficient (LM-LEP) students in the United States and was used to estimate the number of children in kindergarten through sixth grade (grades K-6) receiving special language-related services. The second phase was a 3-year longitudinal study to evaluate the effectiveness of different types of educational services provided to LM-LEP students. The Longitudinal Study itself consisted of two components, a baseline survey and a series of follow-up studies in the subsequent 2 years.

The study began late in 1982, and data collection for the descriptive phase occurred during the fall of 1983. The prime contractor was Development Associates, Inc. A subcontractor, Research Triangle Institute (RTI), assisted with survey design and sampling. The research design for the longitudinal phase was developed during the spring of 1983. Baseline data collection for the longitudinal phase occurred in the fall of 1984. Additional data were collected in the springs of 1985, 1986, and 1987. The original contract did not cover data analysis, and a separate contract for data analysis was issued in 1988 to RTI, the subcontractor on the original contract.

As a point of comparison for the panel's analysis of the Longitudinal Study, we first present the summary prepared by the U.S. Department of Education (1991).

The National Longitudinal Evaluation of the Effectiveness of Services for Language Minority, Limited English-Proficient (LEP) Students

A joint initiative by OBEMLA and the Office of Planning, Budget and Evaluation from 1982 to December 1989, this study examined the effectiveness of instructional services in relation to particular individual, home and school/district characteristics. The Department is planning to contract with the National Academy of Sciences to undertake a review of the quality and appropriateness of the methodologies employed both for data collection and analysis of the very rich database. Findings from the Descriptive Phase (1984–1987) include:

  • The need for LEP services is not evenly distributed geographically across states and districts. Almost 70 percent of all LEP students resided in California, 20 percent in Texas, and 11 percent in New York.

  • LEP students were found to be more disadvantaged economically than other students. Ninety-one percent of LEP students were eligible for free or reduced-price lunches compared to 47 percent of all students in the same schools.

  • LEP students were found to be at-risk academically, performing below grade level in native-language skills as well as in English and other subjects, as early as first grade. However, mathematics skills are reported to be generally superior to language skills in either language.

  • Most instruction of LEPs is provided in English, or a combination of English and the native language.

  • There were significant problems with district and school procedures for entry and exit:

  • Almost half of the schools countered [sic] district policy and reported using one criterion for program entry. The entry criteria that were used were of the less rigorous variety, such as staff judgment or oral language tests versus the required use of English reading/writing tests.

  • Schools with relatively small enrollments of LEP students (under 50) mainstreamed an average of 61 percent of LEP students, compared with 14 to 20 percent of LEP students mainstreamed in schools with relatively large LEP enrollments.

  • Eighty-two percent of districts placed no time limit on continued participation in the bilingual program.

  • Instructional staff persons who speak and understand languages other than Spanish are rare. While 78 percent of LEP students were Spanish-speaking, 64 percent of schools with LEP students had more than one foreign language represented; the mean was 3.5 languages per school.

The results from the Longitudinal Study were disappointing to those interested in policy, and their interpretation remained controversial. First, the data did suggest correlations between policy-relevant variables and educational outcomes of policy interest; however, attribution of causation from the reported analyses is extremely problematic. Because the study was based on a sample survey, it does not provide the basis for inferring that differences in outcomes are due to differences in services provided, nor does it provide a warrant for inferences about the impact of proposed changes in policy. Second, despite the effort expended to develop a longitudinal database, only single-year analyses were performed.

The failure of a study of such magnitude to produce results even approximating those anticipated is understandably a cause for concern. The need remains for an information base to guide policy on bilingual education. This chapter addresses four issues that arise from the study and its disappointing outcomes:

  1. What information was obtained as a result of the descriptive and longitudinal phases of the Longitudinal Study?

  2. What were the reasons for the failure of the study to achieve its major objectives, and to what extent could the problems have been prevented?

  3. Might useful information be obtained from further analyses of existing data?

  4. How should the outcome of this study affect the design and implementation of future studies of this nature?

Following the time line presented below, the remainder of this chapter is divided into four sections. The first two cover the descriptive and longitudinal phases, respectively. An overview of the study design, analysis methods, and results is provided for each phase, followed by the panel's critique. The third section discusses the prospects for further analyses of study data, while the fourth discusses the implications of the Longitudinal Study for the conduct of future observational studies by the Department of Education.

1968

Bilingual Education Act passed.

1974

ESEA Title VII expanded. Lau v. Nichols decision: school districts must give special services to LM-LEP students.

September 1982

RFP issued for the Longitudinal Study.

Late 1982

Longitudinal Study begins.

Spring 1983

Pilot testing of forms for descriptive phase.

Fall 1983

Data collection for descriptive phase.

Spring 1984

Overview of Research Design Plans for the Longitudinal Study Phase of the National Evaluation of Services for Language-Minority Limited-English-Proficient Students, report by Development Associates, Inc. Describes plans for analyzing descriptive phase data.

Fall 1984

Initial data collection for longitudinal phase.

December 1984

The Descriptive Phase Report of the National Longitudinal Evaluation of the Effectiveness of Services for Language-Minority Limited-English-Proficient Students, report by Development Associates, Inc. and Research Triangle Institute. Reports final results of descriptive phase.

Spring 1985

Second data collection in year one.

Spring 1986

Year two data collection.

June 1986

Development Associates, Inc., Year 1 Report of the Longitudinal Phase.

Spring 1987

Year three data collection.

May 1988

Request for proposal issued for data analysis for the longitudinal phase.

February 1989

Descriptive Report: Analysis and Reporting of Data from the National Longitudinal Evaluation of the Effectiveness of Services for Language-Minority Limited-English-Proficient Students, report by Research Triangle Institute. Considers which original study objectives could be addressed by study data.

April 1989

Analysis Plan: Analysis and Reporting of Data from the National Longitudinal Evaluation of the Effectiveness of Services for Language-Minority Limited-English-Proficient Students, report by Research Triangle Institute. Describes plans for analyzing longitudinal phase data.

1991

Effectiveness of Services for Language-Minority Limited-English-Proficient Students, report by Research Triangle Institute. Reports final results of longitudinal phase.

THE DESCRIPTIVE PHASE

Objectives

The descriptive phase of the Longitudinal Study had nine objectives:

  1. To identify and describe services provided to LM-LEP students in Grades K-6;

  2. To determine the sources of funding for the services provided;

  3. To estimate the number of LM-LEP students provided special language-related services in Grades K-6;

  4. To describe the characteristics of students provided instructional services for LM-LEPs;

  5. To identify and describe home and community characteristics associated with each major language group;

  6. To determine the entry/exit criteria used by schools and school districts serving LM-LEP students;

  7. To determine the relationship between services offered for LM-LEP students and services offered to students in adjoining mainstream classrooms;

  8. To identify clusters of instructional services provided to LM-LEP students in Grades K-6; and

  9. To obtain information useful in designing a longitudinal evaluation of the differential effectiveness of the identified clusters of services provided to LM-LEP students.

The first eight objectives are concerned with characterizing the population of LM-LEP students in elementary grades in U.S. public schools and describing the range and nature of special services provided to them. The ninth objective was to provide information to inform the design of the subsequent longitudinal phase of the study.

Study Design and Data Collection

The descriptive study was designed as a four-stage stratified probability sample. First-stage units were states; second-stage units were school districts, counties, or clusters of neighboring districts or counties; third-stage units were schools; and fourth-stage units were teachers and students.

The target population of students consisted of elementary-age LM-LEP students receiving special language-related services from any source of funding. The study used local definitions of the term “language-minority limited-English-proficient” whenever available. Thus, the criteria for classifying students as LM-LEP varied from site to site, and the term “LM-LEP student” used in reporting study results refers to a student classified locally as LM-LEP, not to any defined level of English proficiency. This variation in classification criteria affects interpretation of results. Appendix A includes information on the identification of LEP students, by state.

Ten states (those with at least 2 percent of the national estimated LM-LEP population) were included in the sample with certainty, and an additional 10 states were selected as a stratified random sample of the remaining states, with selection probability proportional to estimated size of the elementary-grade LM-LEP population in the state. The state of Pennsylvania was subsequently dropped because of the refusal of the Philadelphia school district to participate. School districts were stratified according to the estimated LM-LEP enrollment in their respective states, then sampled within strata with probability proportional to the estimated LM-LEP enrollment. Schools were selected with a probability proportional to the estimated LM-LEP enrollment.
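
The probability-proportional-to-size (PPS) selection described above can be sketched as follows. This is an illustrative with-replacement simplification, not the study's actual procedure (multistage surveys typically use systematic PPS sampling without replacement), and the unit names and enrollment figures are invented.

```python
import random

def pps_sample(units, sizes, n, seed=0):
    """Draw n units with probability proportional to size (with replacement).

    Each draw lands on a unit with probability size / total_size, so larger
    units (e.g., districts with more LM-LEP students) are more likely chosen.
    """
    rng = random.Random(seed)
    total = sum(sizes)
    draws = []
    for _ in range(n):
        r = rng.uniform(0, total)
        cum = 0.0
        for unit, size in zip(units, sizes):
            cum += size
            if r <= cum:
                draws.append(unit)
                break
    return draws

# Hypothetical districts with estimated LM-LEP enrollments
districts = ["A", "B", "C", "D"]
enrollments = [5000, 1200, 300, 100]
print(pps_sample(districts, enrollments, n=2))
```

Note that under this scheme a unit's expected number of selections, not its inclusion indicator, is proportional to size; the weighting implications are analogous.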

Teachers and students were sampled only from schools with at least 12 LM-LEP enrollments in grades 1 or 3. All academic content teachers who taught LM-LEP students in selected schools in grades 1 through 5 were selected for inclusion in the sample. A stratified random subsample of schools was selected for the student sample. Up to five first graders and five third graders were randomly selected from each school. Of the five students in each grade, two were from the predominant language-minority group at the school and three were from other language-minority groups if such students existed; otherwise students from the predominant language-minority group were substituted.

Site visits were made to districts with large LM-LEP enrollments, and mail or telephone interviews were used in the remaining districts. In visited districts, site visits were made to schools with moderate to large LM-LEP enrollments, and mail or telephone interviews were administered in the remaining schools. Teachers completed a self-administered questionnaire. Student-level data consisted of a questionnaire completed by teachers who taught the student and a questionnaire filled out by field staff from student records. A planning questionnaire provided data from school personnel for planning the longitudinal phase.

Analysis Methods

The Descriptive Phase Report (Development Associates, 1984a) did not document the nature and full extent of missing data. The overall response rate on each of the major study instruments was at least 81 percent. For LM-LEP students, the combined school and student response rate within schools was 87.2 percent. The student sample was drawn from a sample of 187 of the 335 schools from which teacher-level data were obtained. (These teacher-level data were obtained from 98 percent of the schools selected.) Of the 187 schools, 176 permitted a sample of students to be drawn (94.1 percent). Within these 176 schools, a student background questionnaire was completed by 1,665 LM-LEP students of 1,779 students selected (92.6 percent). Teacher data were obtained for 95.8 percent of those 1,665 students. However, no information is given on the extent of item nonresponse, that is, missing information for individual questions. Missing data were handled by excluding cases of item nonresponse from tabulations of single items. The report notes that this approach assumes that respondents and nonrespondents do not differ in ways affecting the outcome of interest. This approach also reduces the amount of data available for analysis.

The results are presented primarily in the form of tabulations and descriptive statistics (means, percentages, and distributions). The analysis methods used were standard and appropriate. Most analyses used sampling weights. This means that, in computing average values, the observations were weighted by the inverse of their probability of selection into the sample. When observations are sampled at unequal rates from subpopulations with different characteristics, use of sampling weights allows the sample results to be generalized to the target population. For the Longitudinal Study, the target population for some analyses consists of all LM-LEP students in grades 1–5 in the United States, excluding Pennsylvania. Other analyses are restricted to grade 1 and grade 3 students attending schools with at least 12 LM-LEP students at either grade 1 or grade 3.
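
The inverse-probability weighting just described can be illustrated with a small sketch (the numbers are invented for illustration, not data from the study):

```python
def weighted_mean(values, selection_probs):
    """Weight each observation by the inverse of its selection probability,
    so subpopulations sampled at lower rates count proportionally more."""
    weights = [1.0 / p for p in selection_probs]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)

# Two strata sampled at unequal rates: the first at 1-in-2, the second at 1-in-10.
scores = [60, 62, 80, 84]
probs = [0.5, 0.5, 0.1, 0.1]

print(weighted_mean(scores, probs))  # 78.5, versus an unweighted mean of 71.5
```

Because the second stratum was undersampled, its observations receive weight 10 rather than 2, pulling the population estimate toward that stratum's higher scores.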

Chapter 9 of the Descriptive Phase Report (Development Associates, 1984a) categorizes service patterns into service clusters, or “sets of instructional services provided to one or more LM-LEP students at a particular school or schools, … based on their most salient features,” thereby defining clusters of similar types of service. There is no description of the methodology used for the categorization. It appears that the researchers tried a number of different categorizations and finally settled on a typology that “provided the most workable array.” The report does not explain why the typology used was more “workable” than others that were considered.

In general, the statistical methods used to analyze the survey results were straightforward. The results were presented in tabular form. Some form of graphical presentation of the results would have been quite helpful as an aid to understanding. Of particular statistical interest would have been graphical displays of the multidimensional space of observations on the variables used to define the service clusters—see Chambers, Cleveland, Kleiner, and Tukey (1983) for an introductory description of multidimensional graphical methods.

Summary of Results

The Descriptive Phase Report tabulates a great many results relevant to the study objectives listed above. This section provides a brief summary of them.

There is a great deal of variation in the operational definition of a LM-LEP student from district to district, and from school to school in some districts. Of the school districts, 61 percent had an official definition for a LM-LEP student, and 75 percent reported setting official entry criteria for eligibility for special LM-LEP services. Some districts defined subcategories of LM-LEP students. Three main criteria for entry into LM-LEP services were used: (1) tested oral English proficiency; (2) judgment of student need by school or district personnel; and (3) tested proficiency in English reading or writing. There was also variation in the instruments and procedures used to measure entry criteria within these broad categories.

Because of the variation in the definition of limited-English proficiency, estimates of the number of LM-LEP students based on the Longitudinal Study are not directly comparable with estimates based on any study that uses a standard definition. Moreover, the definition of a LM-LEP student can vary from year to year within a single district as a result of administrative policy, legal requirements, or economic pressures. It is possible that in some districts the requirement to serve all students in need of special services led to a definition of LM-LEP students as those students for whom services were provided. These factors argue for extreme caution in extrapolating estimates of the numbers of LM-LEP students to years much later than 1983 because changes in how LM-LEP students were defined would invalidate the results. Based on the data from this study, there were estimated to be 882,000 students locally defined as LM-LEP in grades K-6 of public schools in the United States in the 1983–1984 school year.

Spanish is by far the most prominent native language among LM-LEP students, accounting for an estimated 76 percent of LM-LEP students in all schools and 78 percent in schools with LM-LEP enrollments of more than 12 LM-LEP students. No other language accounted for more than 3 percent of the students in schools with enrollments of more than 12 LM-LEP students. Southeast Asian languages were predominant in 14 percent of schools; 36 percent of schools had students from only one language group other than English; 3 percent of schools had 12 or more language groups. The average across all schools was 3.5 languages.

Third-grade LM-LEP students were a few months older than the national norms for their grade level. First graders were near the national norms. Both first-grade and third-grade students were rated by teachers as being below grade-level proficiency in mathematics, English language arts, and native language arts, but third-grade students were rated as being closer to grade-level proficiency. More third graders than first graders were rated equal or higher on English-language skills than native-language skills. Of grade K-6 LM-LEP students, 91 percent received free or reduced-price lunches (a measure of socioeconomic status), in comparison with 47 percent of all students in the same schools.

Most student characteristics in the report appear as aggregates across all language groups. The Spanish-speaking group is a large subpopulation, and these students tended to receive different services. It would therefore be of interest to see tabulations of student characteristic variables classified by native language. One reported result is that 64 percent of Spanish-speaking students were born in the United States, in comparison with no more than 28 percent from any other language group. It would be interesting to determine whether similar differences exist in other measured variables, such as free-lunch participation and subject-area proficiency.

The district survey results indicated that an estimated 97 percent of districts with LM-LEP students in grades K-6 offered special services to these students, although 12 percent of teachers judged that students needing services were not receiving them. The nine states with the highest LM-LEP populations provided services to a higher percentage of their LM-LEP students than states with lower LM-LEP populations. In all districts a goal of services was to bring LM-LEP students to a level of English proficiency needed to function in an all-English classroom. Most districts also stated the goal of providing other skills necessary to function in a public school classroom. Very few districts (15 percent) stated the goal of maintaining or improving native language proficiency.

Services were generally provided in regular elementary schools, either in mainstream classrooms or in specially designated classrooms. Students were often in classrooms containing both LM-LEP and English-language-background students. Instruction for LM-LEP students was usually slightly below grade level. Most Spanish-speaking students received instruction in English delivered in the native language, native language as an academic subject, and ethnic heritage; most other LM-LEP students did not.

The contractors defined five types of service cluster in terms of the following variables:

  1. use of native language

  2. special instruction in English

  3. rate of transition

  4. native language arts instruction

  5. narrative program description by school personnel

The five types of clusters were called:

(A) native language primacy

(B) continued instruction in native language and English

(C) change in language of instruction, subdivided into

(C1) slow transition and

(C2) fast transition

(D) all English with special instruction in English, subdivided into

(D1) with native language-proficient personnel and

(D2) without native language-proficient personnel

(E) all English without special instruction in English, subdivided into

(E1) with native-language-proficient personnel and

(E2) without native-language-proficient personnel

Table 3-1 shows the estimated percentages of schools offering, and first-grade LM-LEP students receiving, each type. Clusters emphasizing use of the native language appeared predominantly at schools with Spanish-speaking LM-LEP students; schools with no Spanish-speaking LM-LEP students were very likely to offer cluster D.

There was great variation in the characteristics of teachers providing services to LM-LEP students. Sixty-three percent of districts required special certification for teachers of LM-LEP students; fewer than 35 percent of teachers had such certification. In a number of districts requiring certification, teachers were teaching with provisional certification or waivers. Approximately 60 percent of teachers had received some special training in teaching limited-English-proficient students. About half of the teachers could speak a language other than English; this other language was overwhelmingly Spanish. Overall, field researchers found a positive attitude in most schools toward serving the needs of LM-LEP students.

Panel Critique of the Descriptive Phase Study

The descriptive phase study was based on a national probability sample of students and teachers. The sample was obtained through a four-stage sampling process of selecting states, school districts within states, schools, and students (with all eligible teachers taken from each selected school).


Table 3-1 Schools Offering, and First-Grade LM-LEP Students Receiving, Each Type of Service Cluster

Type of Service Cluster    % Schools    % Students
A                              3             7
B                             11            26
C                             26            40
  C1                          20
  C2                           6
D                             51            25
  D1                          13
  D2                          38
E                              6             1
  E1                           2
  E2                           4

The procedures used to draw the sample were generally standard and appropriate given the stated objectives. The sampling of states was a nonstandard feature of the design, with the 10 states with the largest proportion of the national total of elementary LM-LEP students included with certainty (these states contain 83.5 percent of the elementary-school LM-LEP population, 92 percent of the Spanish LM-LEP population, and 64 percent of the non-Spanish LM-LEP population). Of these 10 states, Pennsylvania, with 1.9 percent of the LM-LEP population, effectively did not participate. A stratified sample of 10 states was drawn from the remaining 41 states. In aggregate, the sample accounts for 91 percent of the LM-LEP target population.

This method of selecting a first-stage sample of 20 states is unusual. The objectives of the study might have been better served by one of two alternative strategies. If it was important to limit the number of states to 20, then restricting the study to the 20 states containing the largest proportion of LM-LEP students might well have been more desirable. Such a sample would have contained 92.7 percent of the LM-LEP population and would have constituted a worthwhile study population in its own right. Indeed, making statistical inferences from a sample of 10 states to a population of 41 is hazardous, given the widely differing characteristics of states and their school districts.

Another sampling design would have treated school districts as the first-stage sampling unit and would have used the nation as a target population. This design would have led to less extreme variation in sampling weights. Most of the prominent national surveys of schools and students use districts or schools as the first stage of selection (see, e.g., the discussion of the National Education Longitudinal Study in Spencer and Foran, 1991).


The selection of districts within the selected states gave a total of 222 districts. A number of districts refused to participate, including two large ones—Philadelphia, Pa., and Buffalo, N.Y. Since Philadelphia was one of only two selections from the state of Pennsylvania, the whole state (a state selected with certainty) was dropped from the sample. This decision is illustrative of one of the disadvantages of using states as the first stage of selection. In some other cases, districts that refused to participate were replaced in the sample by others. Of the 23 districts that refused, 19 were replaced in the sample, giving a final sample of 218 districts. The planned number of districts and the level of district participation appear quite adequate for a study of this nature.

Within districts, the school selection procedure gave an initial selection of 536 schools with LM-LEP students. Fourteen of these schools refused to participate, and two were replaced in the sample. This is a high level of participation at the school level. Given that the sampling to the school level was carried out by sampling professionals using a variety of standard and recommended sampling procedures, with relatively few district and school refusals, the study sample gave a sound basis for characterizing U.S. schools with respect to their LM-LEP population and services.

The school sample was reduced prior to the selection of teachers and students. Initially, 342 schools were identified as having sufficient LM-LEP students. All eligible teachers from these schools were selected—a total of 5,213, of whom 4,995 responded. A subsample of 202 of the 342 schools was used for selecting students. Of these, 187 schools actually provided student data, with five LM-LEP students selected for each of grades 1 and 3. Field data collectors were responsible for drawing these samples, but they were not uniform in the application of sampling rules. As a result, the expected student sample yield of 1,980 was not achieved. A total of 1,909 students were actually sampled, but it is impossible to determine from the documentation provided which of these were original selections and which were replacements resulting from parental refusals to participate.

The sample sizes, sampling procedures, and response rates for teachers and schools were at least adequate to provide a sound basis for inference from the data collected, although it would have been desirable to have response rates calculated with and without replacements for refusals. Two points are worthy of special note. First, estimates based on data from teachers and students do not relate to the whole LM-LEP population, but only to that part of the population from schools that would have been deemed “viable” for the conduct of the Longitudinal Study. Such schools had 12 or more LM-LEP students in grades 1 or 3 and contained 82 percent of the LM-LEP population. Second, the procedure for sampling students in the field was not well controlled. The reported problems appear to have been minor; thus, it was still possible to weight the data appropriately. However, in the longitudinal phase, the problems were so severe that the data could not be weighted.

The procedures for weighting the data were appropriate and are well documented in Appendix E of Development Associates (1984a). With the caveat
concerning the population restriction for student-level data, the study sample appears to be adequate for making inferences, assuming that there were no major problems with data quality. One exception is that school-level data for grade 6 cannot be reliably projected to national totals and proportions, because the sample represents only those sixth grades contained in sampled schools that have any of the grades 1 through 5. (That is, there is no assurance that the population of sixth graders in schools that have none of grades 1–5 is the same as the population of sixth graders in schools that include at least one of grades 1–5. This may seem a minor point, but it speaks directly to the issue of comparability of populations.) The report notes that there is a distinct underrepresentation at grade 6, observed by comparing school-level data to district-level data, which were not subject to such bias.
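The weighting at issue here is standard design-based estimation: each sampled unit carries a weight equal to the inverse of its selection probability, which is why uncontrolled student sampling (which obscures those probabilities) later made weighting impossible. A minimal sketch of the idea, with invented stratum sizes and counts rather than the study's figures:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical two-stratum frame (sizes and sampling fractions are
# invented for illustration; they are not the study's figures).
N = {"A": 400, "B": 1600}   # schools in the frame, by stratum
n = {"A": 200, "B": 100}    # schools sampled, by stratum

# Simulated per-school LM-LEP counts for every school in the frame.
frame = {s: rng.poisson(30, N[s]) for s in N}
true_total = sum(int(frame[s].sum()) for s in N)

# Horvitz-Thompson estimate: weight each sampled school by N/n,
# the inverse of its selection probability within its stratum.
est_total = 0.0
for s in N:
    sample = rng.choice(frame[s], size=n[s], replace=False)
    est_total += (N[s] / n[s]) * sample.sum()

print(round(est_total), true_total)
```

Because the sampling fractions differ across strata, an unweighted sample total would be badly biased toward the oversampled stratum; the inverse-probability weights remove that bias.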

The Longitudinal Study Descriptive Phase Report is deficient in that there is no description of the method of calculating sampling errors. Given the nature of the sample design, this issue is far from straightforward. Section 2.5 of Development Associates (1984a) is entitled “Weighting Factors and Standard Errors,” but the text of this section (and Appendix E, to which it refers) does not discuss the topic of standard errors at all. The panel assumes that this was just an oversight in reporting and that appropriate sampling error estimation procedures were used.

Although standard error estimates are reported for a number of key tables, their reporting could usefully be far more extensive. Most of the tables in the report would be enhanced by the inclusion of standard errors accompanying the estimates presented, and the accompanying text would often be enhanced by a reference to the significance of, or a confidence interval for, reported differences.
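As a sketch of the kind of reporting the panel has in mind, a stratified estimate can be accompanied by an approximate design-based standard error and confidence interval. All figures below are invented for illustration:

```python
import math

# Hypothetical stratum summaries (invented figures): sample size n,
# design weight w, and sample proportion p of schools with a given service.
strata = [
    {"n": 200, "w": 2.0,  "p": 0.40},
    {"n": 100, "w": 16.0, "p": 0.25},
]

# Weighted point estimate of the population proportion.
N_hat = sum(s["n"] * s["w"] for s in strata)
p_hat = sum(s["n"] * s["w"] * s["p"] for s in strata) / N_hat

# Approximate design-based variance: within-stratum binomial variance,
# scaled by the squared stratum share (finite-population correction omitted).
var = sum(
    ((s["n"] * s["w"]) / N_hat) ** 2 * s["p"] * (1 - s["p"]) / s["n"]
    for s in strata
)
se = math.sqrt(var)
low, high = p_hat - 1.96 * se, p_hat + 1.96 * se
print(f"estimate {p_hat:.3f}, SE {se:.3f}, 95% CI ({low:.3f}, {high:.3f})")
```

Reporting the interval alongside the point estimate lets readers judge whether an apparent difference between two table entries exceeds what sampling variability alone would produce.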

In general, the report does a satisfactory job of describing the decisions that were made to transform the reported data into operational variables and of explaining why the decisions were made. These decisions appear to have been very logically based and well researched. The result is that useful interpretations can be made from the complex mass of data that were collected.

One concern is the apparent discrepancy between district-level and school-level data regarding the number of LM-LEP students enrolled. This is discussed in the report, but is perhaps dismissed too readily. As the report states, for grade 6 enrollment the discrepancy is explainable by, and consistent with, the sampling frame of schools, which was such that national estimates for grade 6 cannot be obtained from the school-level data. The discrepancy observed at other grades for Category B states (those in the second level of sampling) is not similarly explainable. The implied claim in the report is that the discrepancy is attributable to sampling error. The computation behind this claim is based on the assumption that the school and district samples were independent, but this is not correct: the sampled schools were drawn from within the sampled districts. The resulting sets of estimates are no doubt highly correlated, and it seems very unlikely that the observed discrepancy is attributable to sampling error alone. Further investigation of the relationship between the school- and district-level
data appears warranted in an effort to better understand the discrepancy in these fundamental sets of estimates.

One type of estimate included in Chapter 3 of Development Associates (1984a) appears to rest exclusively on assumptions, with little support evident from either the study or elsewhere. Projecting the number of LM-LEP students to include those in private schools, which were not covered by the study, is poorly founded: the proportion and characteristics of LM-LEP students in private schools could easily differ considerably from those in public schools for many reasons, and the study itself does not appear to provide any basis for extending its results to private schools.

From the report itself it is difficult to evaluate the quality of the data collected, although the breadth of data collected is described in Appendix C of Development Associates (1984a). One aspect of data quality that is particularly noteworthy, and is pointed out in the report, is the definition of a LM-LEP student. This is a fundamental concept for the evaluation of bilingual education programs.

Throughout the Longitudinal Study the definition of a LM-LEP student was local, and the results of all aspects of the Longitudinal Study must be interpreted with this in mind. The problem may be more serious than one would at first suspect. The study obtained information at the school level about the size of the population of students who were considered non-English dominant, that is, students whose predominant language was not English. As the authors point out, one might reasonably expect that LM-LEP students would constitute a proper subset of non-English-dominant students, yet in 18 percent of schools there were more LM-LEP students than non-English-dominant students, while in 37 percent of schools the numbers in these two groups were identical. Thus, the reader must question what is encompassed in different schools by the term LM-LEP. This raises issues about the target population. The presence of LM-LEP students gave schools access to funds to which they might not otherwise have been entitled and may have led to overcounting of LM-LEP students.

Questionnaires were used to collect data on district services, school LM-LEP characteristics, school services, teacher characteristics and LM-LEP instructional practices, student proficiencies in language and mathematics, student perceptions of LM-LEP instruction, and student background data. Although missing data problems were encountered in the descriptive phase, they do not appear to have been severe, particularly in comparison with those experienced during the longitudinal phase of the survey. The report does not document in detail the sources and problems of missing data; Section 2.5 of Development Associates (1984a) discusses the issue briefly but gives no quantitative or qualitative summary of the quality of the data collected.

In summary, the descriptive phase of the study presents a large set of data that can be reliably projected to the national population of locally defined LM-LEP students, or a large majority of it, without substantial problems of bias or imprecision with regard to the representativeness of the sample. The quality of the data itself is less clear.

THE LONGITUDINAL PHASE

Objectives

As stated by Development Associates (1986) the longitudinal phase of the study had two broad objectives:

  • to determine the degree to which services provided are effective collectively in enabling LM-LEP students in grade levels 1 through 5 to function successfully in all-English-medium classrooms; and

  • to determine which clusters of services are most effective under specific conditions.

Although the study included services provided to LM-LEP students regardless of source of funding, a major purpose was to provide information to guide policy decisions regarding the allocation of funding under Title VII of the Elementary and Secondary Education Act (ESEA). More specifically, the study initially proposed to address the following five questions that were of interest to Congress, the Department of Education, and the educational establishment (Development Associates, 1986):

  1. What are the effects of the special services provided for LM-LEP students in Grades 1–5 in terms of the LM-LEP student's ability to function effectively in an all-English-medium classroom?

  2. How do the various combinations of special services (“service clusters”) provided for LM-LEP students in Grades 1–5 compare in terms of the effectiveness with which LM-LEP students subsequently can function in an all-English-medium classroom?

  3. What are the characteristics of English-proficient recipients of special services for LM-LEP students, and how does the receipt of these services affect the academic performance of these students, overall and when analyzed in terms of language background?

  4. What are the characteristics of LM-LEP students whose parents refuse to permit them to participate in special LM-LEP services, and how does the non-receipt of these services affect their academic performance?

  5. What have been the consequences of ESEA Title VII policy and funding on provision of effective services for LM-LEPs?

These major study questions were broken down into more than 60 specific research questions to be addressed by the study. These objectives were revised after data collection. The revised set of objectives is presented below, in the section entitled “Longitudinal Study Research Questions,” together with the breakdown of specific research questions. Objectives D and E were dropped entirely. Objective C was changed to focus on the effects on LM-LEP students of having non-LM-LEP students in the classroom. Objectives A and B were modified to address effects on learning of English rather than effects on the ability to function in an all-English classroom. These modifications presumably reflect the realization that the data collected did not support the more ambitious initial objectives. Still, depending on how one counts overlapping questions, about 25 research questions remained to be addressed. At the data analysis phase, these objectives were further revised and restricted.

It is clear both from the planning documents and the original and revised study objectives that the Department of Education was concerned with drawing causal inferences from the study. That is, the longitudinal portion of the study was intended not only to document correlations between predictor and criterion variables, but to attribute causal explanations to those correlations, as required to predict the effects of proposed policy alternatives. Although the study did document correlations between outcomes of policy interest and policy relevant variables, causal attribution is extremely problematic. The study has documented the existence of certain relationships among the measured variables, but the data do not support inferences about how direct manipulation of the policy-relevant variables would affect any outcomes of concern.

Many of the effects of concern to the study developed over a long period of time. In particular, students in programs emphasizing a slow transition from native-language use to English-language use are expected to show lower English-language proficiency in earlier years; however, advocates maintain that their English language proficiency in later years will be higher than if they had been in programs emphasizing an early transition. The Longitudinal Study was intended to provide multiyear data that could be used to evaluate such claims. Unfortunately, the reported data analyses included only single-year analyses. Thus, the major objectives of the study—to provide causal inferences to guide policy decisions and to evaluate the impact of different programs over multiple years—were not achieved. There were a number of methodological difficulties with the study design and data collection that contributed to the disappointing results of the study.

Study Design and Data Collection

The panel found much less documentation of the research design for the longitudinal phase than for the descriptive phase. The design document made available to the panel, “Overview of Research Design Plans,” is far less complete than the corresponding descriptive phase materials. There was no comprehensive final report for the initial contract, which covered only design and data collection. As noted above, a separate contract for analyzing the data from the longitudinal phase was awarded to RTI, the subcontractor for the data collection phase. The final report for the data analysis was written to describe analysis of the existing data (Burkheimer et al., 1989); there was no attempt to cover study design.

The sample of schools and school districts was a subsample of the original descriptive phase sample. First, the 36 school districts that received site visits in the descriptive study, had at least 200 LM-LEP students in either the first or the third grade, and agreed to participate in the longitudinal phase were identified. Second, 25 of these 36 districts were chosen; the method by which the sample was reduced to 25 is not clear. An effort was made to include schools: (1) that provided all five service clusters A through E (see Table 3-1); (2) with predominant language groups other than Spanish; and (3) in all major geographic regions of the United States.

Students in the sample consisted of six distinct groups. Two cohorts of students were followed: students entering first grade in the fall of 1985 (the first-grade cohort) and students entering third grade in the fall of 1985 (the third-grade cohort). Three different types of students were sampled in each cohort. The LEP group consisted of virtually all students in the respective cohort who were classified as limited-English proficient by local criteria. The English-proficient group consisted of those students not classified as LEP but who were receiving special services because they were placed in classes with LEP students. The comparison group consisted of children considered English proficient who had never been classified as LEP or received LEP services. Some sites had no comparison students, and the sampling of comparison students seems to have been very poorly controlled. Data from the study came from a total of 33 forms completed by students, parents, teachers, and school and district personnel over a period of 3 years. The documents made available to the panel provide no plans for data management or control of missing data, nor do they include the interviewer protocols.

Analysis Methods

Initial Plans

The data analyses actually performed bore little resemblance to the original plans. The research design plan called for two levels of analysis. First, “a single, national study” would be “based on the use of carefully chosen uniform measures to address a set of common research questions investigated across all participating schools.” Second, “sets of linked mini-studies” would “address some questions of national interest which only apply in certain settings and to address questions of particular local interest to the participating school districts” (Development Associates, 1984a, page 5). The national studies were to include correlational analyses, including multiple and partial correlations. The researchers also planned to apply path analysis to derive causal associations between treatment variables and outcomes.

The initial contract for the Longitudinal Study did not cover data analysis. The funding for the subsequent data analysis contract was significantly scaled down from what had been originally planned. No linked ministudies were attempted. Hierarchical regression models were estimated, but as the authors noted, path-analytic interpretation of the parameters was unwarranted. Under the data analysis contract, a “Descriptive Report” produced some descriptive analyses of the longitudinal data and examined the feasibility of addressing the research objectives. A data analysis plan, which included a revised set of analysis questions (substantially reduced from the research questions), was then submitted. The analysis plan still proposed both estimation of structural parameters in path analysis models and multiyear analyses. Both of these objectives were dropped in the final data analyses.

Linked Ministudies A major part of the original research plan for the Longitudinal Study was a series of linked ministudies. The research plan noted that many questions of major policy import could not be answered in a large national study, either because they concerned only some types of schools or programs or because they concerned variables that could not be directly combined across sites (such as subjectively determined questionnaire responses whose interpretation might differ from site to site). The original research plan proposed that a set of linked ministudies be performed, with the results to be combined by meta-analysis (a set of statistical methods for combining the results of several studies). No detailed plans were presented for the linked ministudies, and it was not made clear which research questions were to be addressed by the single national study and which by the linked ministudies. Only the single national study was performed; no mention of the linked ministudies was made in the data analysis plan or final report developed by RTI for the data analysis contract.

The panel concurs with the authors of the research plan overview that a set of linked ministudies is an appropriate methodology for assessing the effectiveness of different strategies for bilingual education. The research design and sampling plan of the longitudinal study were more appropriate, however, for a single national descriptive study than for a set of linked small-scale quasi-experiments. This point is discussed in more detail below.

Path Analysis Path analysis is a statistical technique for estimating regression models in which there are several sets of equations relating predictor variables to predicted variables. In a path analysis model, a variable that is predicted in one equation may be a predictor variable in another. Variables in a path analysis model are organized into a directed graph referred to as a path diagram. Figure 3-1 shows a high-level path diagram taken from the research design plan of the Longitudinal Study. Path analysis models are also sometimes called “causal models” because the direction of the arrows in the path diagram is often taken to denote causality.

As an example, consider a simplified version of Figure 3-1 with two equations:

  1. Instructional exposure (treatment) is modeled as a function of a set of background variables (covariates) plus a random error;

  2. Achievement (outcome) is modeled as a function of the instructional exposure and background variables, plus a random error.
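To make the two-equation structure concrete, the following sketch simulates data that satisfy the stated assumptions (errors independent of the background variable and of each other) and recovers the path coefficients by ordinary least squares. The variable names and coefficient values are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000

# Background covariate (e.g., a prior-ability measure; hypothetical).
x = rng.normal(size=n)

# Equation 1: instructional exposure = f(background) + error.
t = 0.8 * x + rng.normal(size=n)

# Equation 2: achievement = f(exposure, background) + error,
# with this error independent of x and of equation 1's error.
y = 0.5 * t + 0.3 * x + rng.normal(size=n)

# Under these assumptions, OLS on each equation is consistent.
a_hat = np.linalg.lstsq(np.column_stack([np.ones(n), x]), t, rcond=None)[0][1]
b_hat, c_hat = np.linalg.lstsq(
    np.column_stack([np.ones(n), t, x]), y, rcond=None
)[0][1:]

print(f"a={a_hat:.2f} b={b_hat:.2f} c={c_hat:.2f}")  # near 0.8, 0.5, 0.3
```

With the independence assumptions satisfied, the estimates converge to the true path coefficients as the sample grows; the discussion that follows concerns what happens when they are not satisfied.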

Statistical methods for estimating the parameters of path analysis models depend on certain assumptions about the relationships between the variables. (If these assumptions are not satisfied, the resulting estimates are inconsistent, meaning that even with very large sample sizes, the parameter estimates are not necessarily close to the actual parameter values.) These assumptions are described in Appendix B to the Overview of the Research Design Plans (Development Associates, 1984b).

In the example, the most important assumption is that the random prediction errors in the two equations are uncorrelated with the background variables. The authors note (Development Associates, 1984b, Appendix B; italics in original):

The rationale for this assumption is based upon the fact that the important variables are already included in the model and that the [prediction errors] represent a composite of small random influences from a large number of sources (including measurement error) that can reasonably be expected to be independent of the major sources of influence.

FIGURE 3-1 Sample Path Diagrams from Research Plan

If this assumption is met, then the first of the two equations can be estimated consistently with ordinary least squares regression. (Ordinary least squares regression is the most common regression estimation method and was used by RTI in the Longitudinal Study data analysis.) The second equation in the example uses the dependent variable from the first equation as one of the predictor variables. For this reason, a second assumption is required if ordinary least squares regression is to be used to estimate the parameters of this equation: namely that the random errors in the first and second equations are uncorrelated with each other. The research plan notes (Development Associates, 1984b, Appendix B) that:

[this assumption]…is on less firm grounds than the assumption of independence between the [background variables] and the error terms. It may be unreasonable since some important variables may have been omitted from the model that may influence both [treatment] and [outcome].

(We note that this quote directly contradicts the previous quote from the same page of the Appendix, although the first is deemed plausible and the second implausible.)

The authors go on to describe statistical methods for estimating parameters of the equations when the second assumption (errors in equations 1 and 2 are uncorrelated) does not hold. They also describe approaches to estimating parameters when the variables are measured with error. Measurement error is one way in which the first assumption (errors are uncorrelated with predictor variables) can be violated. But the assumption remains that the errors in the regression equation relating the true values of the predictor and predicted variables (of which the observed values are “noisy” measurements) are uncorrelated with the true values of the predictor variables. This assumption would be violated if important variables were left out of the regression or if the regression model had the wrong functional form, invalidating any of the estimation approaches discussed in the research design plan.
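The failure mode described here, an omitted variable that influences both treatment and outcome, can be demonstrated directly. Everything in this sketch is invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50_000

# An unmeasured influence on both the instruction received and achievement
# (say, parental involvement; the name and coefficients are hypothetical).
u = rng.normal(size=n)
t = 0.8 * u + rng.normal(size=n)            # treatment depends on u
y = 0.5 * t + 0.7 * u + rng.normal(size=n)  # true effect of t is 0.5

# Omitting u pushes its influence into the error term, which is then
# correlated with t: OLS is inconsistent no matter how large n grows.
b_biased = np.linalg.lstsq(
    np.column_stack([np.ones(n), t]), y, rcond=None
)[0][1]

# Including u restores the uncorrelated-error assumption.
b_ok = np.linalg.lstsq(
    np.column_stack([np.ones(n), t, u]), y, rcond=None
)[0][1]

print(f"omitting u: {b_biased:.2f}   including u: {b_ok:.2f}")
```

In this configuration the omitted-variable estimate converges to about 0.84 rather than the true 0.5, and no sample size corrects it; only including the omitted variable (or a valid substitute for it) does.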

When estimating any type of regression model, it is important to have sufficient independent variation among the important predictor variables. When predictor variables are highly correlated, parameter estimates are unstable. That is, large differences in parameter estimates make very little difference in how well the model fits the data. For example, if background variables are highly correlated with treatments, a model that includes only treatments and no background variables may be nearly indistinguishable in fit from a model that includes only background variables and no treatments. In other words, it is not possible to tell whether differences between treatment and control groups are due to the treatments or to the background variables. No statistical method, no matter how sophisticated, can make up for lack of adequate independent variation among treatments and background variables.
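That instability is easy to exhibit. In the sketch below (an exaggerated, invented scenario), the "treatment" is almost determined by a background variable; the individual coefficients swing from sample to sample, while their sum, which the data do pin down, stays put:

```python
import numpy as np

rng = np.random.default_rng(3)

def fit(n=200):
    x = rng.normal(size=n)            # background variable
    t = x + 0.1 * rng.normal(size=n)  # treatment, nearly collinear with x
    y = t + rng.normal(size=n)        # outcome actually driven by t alone
    beta = np.linalg.lstsq(
        np.column_stack([np.ones(n), t, x]), y, rcond=None
    )[0]
    return beta[1], beta[2]

# The individual coefficients are unstable; their sum is not.
for _ in range(5):
    bt, bx = fit()
    print(f"t: {bt:+.2f}   x: {bx:+.2f}   sum: {bt + bx:+.2f}")
```

Each fitted model predicts y about equally well, so the data cannot say whether the effect belongs to the treatment or to the background variable, which is precisely the point made above: no estimation method can manufacture independent variation that the design did not provide.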

The common practice of dropping variables from the model when retaining them does not significantly improve fit invalidates the assumptions under which the parameters of a path analysis model can be consistently estimated, unless one can argue that a parameter estimate not significantly different from zero really means the variable has little or no effect. When there is insufficient independent variation among variables to estimate model parameters reliably, no such argument can be made. Therefore, dropping insignificant variables may invalidate the assumptions under which parameters can be consistently estimated.

Methodology Used

The primary data analysis method used in the Longitudinal Study was multiple regression analysis. The regression models are consistent with Figure 3-1. The final report states (Development Associates, 1984b, page 2–1) that “the sets of regression equations could be linked to define path analysis models,” but the authors were careful to note that causal attribution is problematic. Nowhere else in the report is path analysis mentioned. All analyses in the final report were unweighted, and they were limited to the LEP group and to students whose native language was Spanish. Because of the extent of missing data, RTI attempted no multiyear analyses.

Estimates for four sets of linear regression models were included in the report:

  • Receipt of services was modeled as a function of student and home background measures, ability/achievement measures, and educational history measures (Chapter 4).

  • English language arts achievement was modeled as a function of instructional exposure measures, student and home background measures, ability/achievement measures, educational history measures, school/classroom measures, and measures of background characteristics and teaching style of the main language arts teacher (Chapter 5).

  • Mathematics achievement was modeled as a function of instructional exposure measures, student and home background measures, ability/achievement measures, educational history measures, school/classroom measures, and measures of background characteristics and teaching style of the main mathematics teacher (Chapter 6).

  • Exit of students from LEP services was modeled as a function of instructional exposure measures, student and home background measures, ability/achievement measures, educational history measures, school/classroom measures, and measures of background characteristics and teaching style of the main language arts teacher (Chapter 7).

These estimates were developed separately for each cohort and each year, but no achievement models were developed for the first year of the first-grade cohort because of the lack of pretest data.

The basic modeling approach can be summarized as follows. Modeling began with a set of “core” variables, which included the variables of major policy interest with respect to the variable being predicted. The remaining variables were divided into classes (such as school and district indicator variables, instructional exposure variables, educational history variables, ability/achievement variables, and various interactions among classes). The following four steps were used in the modeling process:

  1. Estimate a regression equation based on the “core” variables.

  2. Add the next set of variables. Test whether the entire set of variables can be removed from the model without a significant decrease in explained variance.

  3. If all new variables can be removed, do so. Otherwise, use stepwise backward elimination to remove variables from the added set.

  4. Repeat steps 2 and 3 until no new sets of variables remain to be added.
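The four steps amount to blockwise hierarchical regression with backward elimination inside each block. A minimal sketch of the procedure, using a fixed F critical value of 4.0 for illustration rather than a proper p-value, and invented data:

```python
import numpy as np

rng = np.random.default_rng(4)

def rss(cols, y):
    """Residual sum of squares from an OLS fit on the given columns."""
    X = np.column_stack(cols)
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    r = y - X @ beta
    return float(r @ r)

def f_stat(small, big, y):
    """F statistic for the extra columns in `big` beyond `small`."""
    df1 = len(big) - len(small)
    df2 = len(y) - len(big)
    return ((rss(small, y) - rss(big, y)) / df1) / (rss(big, y) / df2)

def hierarchical_fit(core, blocks, y, f_crit=4.0):
    kept = list(core)                      # step 1: core model
    for block in blocks:                   # step 2: add the next set
        if f_stat(kept, kept + block, y) < f_crit:
            continue                       # step 3: drop the whole block
        current = list(block)              # ...else backward elimination
        removed = True
        while removed and len(current) > 1:
            removed = False
            for j in range(len(current)):
                reduced = current[:j] + current[j + 1:]
                if f_stat(kept + reduced, kept + current, y) < f_crit:
                    current, removed = reduced, True
                    break
        kept += current                    # step 4: repeat with next set
    return kept

# Invented data: the outcome depends on the core and one added variable.
n = 1_000
core = [np.ones(n), rng.normal(size=n)]
relevant = [rng.normal(size=n)]
irrelevant = [rng.normal(size=n), rng.normal(size=n)]
y = 2.0 * core[1] + relevant[0] + rng.normal(size=n)

kept = hierarchical_fit(core, [relevant, irrelevant], y)
print(len(kept))  # the core plus whichever added variables survive
```

Note how this procedure interacts with the critique above: when collinearity makes a coefficient unestimable, its F statistic is small and the variable is silently dropped, even though "not significant" does not mean "no effect" in that situation.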

Summary of Results

The final report for the longitudinal phase (Burkheimer et al., 1989) includes a summary of major study findings. In order for readers to fully appreciate the panel's concerns and critique, we present first the full report summary (emphases in original):

Despite the considerable definitional and analytic constraints and the inconsistency of model solutions, a number of reasonably consistent and meaningful general conclusions relevant to the education of LEP children can be drawn from the findings. These conclusions are presented below.

  • Assignment of LEP students to specific instructional service packages reflects principally local policy determinations. To a much smaller extent (and principally in the lower grades) assignment may be based on designed (and apparently criterion-driven) placement of students in the specific services for which they are ready.

Although the predictability of the instruction received by LEP children was highly variable (some types of instruction were highly predictable and others were not), the patterns of provided instruction seem to be set primarily at the school/district level and secondarily at the teacher/classroom level. Dummy variables, reflecting nothing more than membership in a cluster of schools/districts, generally contributed most of the explained variance in predicting the educational exposure variables. When combined with other school and classroom variables, which also served as clustering variables, these contextual factors accounted for (over-model median) over 93 percent of the explained variance in the instructional exposure variables.

The findings also tended to indicate that instructional services were determined in part by the size of the LEP population being served within the district, school, or classroom. Patterns established at the level of these units may have been designated to meet the needs of the average child in the unit and, thus, may have subordinated other potential relationships between instructional exposure and home or individual child characteristics. Patterns of instruction received, when averaged over students within a study year, demonstrated systematic changes over the nominal grade placement levels associated with the study year. Indicators of relative English proficiency for the individual LEP child played only a relatively small role in his/her receipt of particular instruction (for the variables examined here), principally in the earlier grades.

Deliberate assignment of LEP children to types of instruction consistent with explicit and implicit indicators of their greater proficiency in English was reflected in a weak trend (and probably occurred in many instances). However, an alternate explanation (which recognizes the implicit clustering of students within existing programs based on district policy) is considered equally plausible, particularly in light of the fact that oral English proficiency as measured in year 1 of the study was still predictive of services received in study year 3.

While not modeled directly, it seems quite reasonable (in light of the strong prediction obtained from school/district membership) that one of the best predictors of instructional exposure in any year is the type of instruction provided in the previous year (i.e., the nature of instructional exposure is principally a function of a consistent district policy, a supposition well supported in these analysis). In this case, if (as would certainly be expected) instruction in English language arts and in English generally facilitates English achievement (as measured in the short term), then the positive effect of prior year's instruction on the pretest for any given year, plus the continuity between the prior year's instruction and the current year's instruction, should have resulted in the types of relationships observed.

In actuality, both of these hypothesized effects were probably in operation. Programs that in previous years used simplified English and oral English instruction to develop a transition to English as the language of instruction (recall that none of the models were restricted to students who had received no prior LEP services) should have realized some success on the achievement measures. That success would, in turn, have allowed greater use of standard English, less need for additional instruction in oral English, and a foundation on which to base increased instruction in English language arts and more instruction provided in English.

  • Too heavy a concentration on one specific aspect of the LEP child's education generally can detract from achievement in other areas.

The effects of the instructional exposure variables in the models of mathematics and English language arts achievement were quite pronounced but rarely simple. In a number of cases, interactions among the instructional exposure variables indicated the trade-offs that are common to most education programs; namely, within a framework that constrains total instructional hours and is further constrained by legislated requirements for some courses, increased instruction in one particular subject area is typically accomplished at the expense of reduction in another subject area.

  • The yearly achievement of LEP students in mathematics and English language arts is not facilitated by a single approach; rather, different approaches seem more appropriate depending on the characteristics of the student. LEP students who are assigned to services for which they are ready generally show increased achievement (in both mathematics, as measured on an English test, and English language arts); however, if these services are provided before the child is ready, such services may be counterproductive.

Yearly achievement of LEP children was generally well predicted, and the nature of the instruction received by these children contributed substantially to explaining achievement. However, the large number of interactions between individual measures of prior proficiency and various measures of instruction indicated that the relationships were typically not simple ones. Findings indicated relatively consistently that the greatest benefit from a particular combination of instructional strategies was realized by those who already possessed the requisite skills. These same combinations of instructional services frequently resulted in lower net achievement when provided to LEP children who had not attained the proficiencies needed to benefit from the instruction; those children typically benefited from instructional services more attuned to the different skills that they possessed. Specific examples of these general conclusions are provided below.

  • When ready (i.e., with sufficiently high oral proficiency or prior achievement in English) and when provided more instruction in English language arts and/or indirect exposure to English through instruction provided in English in other courses, LEP children show greater yearly achievement in English language arts.

  • When LEP children are weak in English and/or strong in their native language, English language arts instruction in the native language facilitates yearly English language arts achievement; to a lesser extent, mathematics instruction in the native language facilitates mathematics achievement under the same conditions.

  • In earlier grades yearly mathematics achievement gains, as measured on a test in English, can be realized when mathematics instruction is provided principally in English or when the instruction is provided primarily in the native language. Achievement, in the former case, is facilitated by greater prior achievement on an English language math test; in the latter case, achievement is facilitated to the extent that the child receives more instruction in English language arts.

  • In later grades, regardless of the language in which the student learns mathematics, yearly achievement in mathematics, on an English language test, will not be realized until the child gains some mastery of English (through English language arts instruction or exposure to instruction in English in other courses, particularly mathematics).

Language of instruction in mathematics seems to play a considerably smaller role in the upper grades; however, it does continue to be reflected through the dummy variables in those less-powerful models. Consequently, if diverse instructional procedures (related to prior proficiencies, or lack thereof) prepare the children equally well, there should be no specific benefit associated with any specific approach. However, mathematics achievement in this study was measured through a test in the English language, and while at lower grade levels much of the mathematics work may be language-free, such is not the case at upper grade levels, where more direct effects are associated with higher English language proficiency. Thus, success on a test that is administered in English (be it a state-level competency test, a college placement test, or a job placement test) depends on acquisition of English language skills, regardless of the language in which the student has learned mathematics.

  • Like assignment to specific services, exit from LEP services reflects both local policy determinations (some of which are apparently independent of criteria measured by the analytic variables) and specific criterion-driven exit rules related to reaching certain levels of proficiency/achievement in English.

The best predictor of exit was the school dummy variable set, which reflected little more than enrollment in specific clusters of schools and/or districts. This certainly reflects the fact that, under some conditions, retention in the LEP program was likely to be determined less by individual differences in English proficiency (particularly at the upper grade levels and during the later study years) than by district policy (which was probably determined, in part, by the size and nature of the LEP student population and the overall language-minority makeup of the general population within the district and/or school attendance area). Also, the contributions from other variables were generally attributable to how those variables clustered together at the school or district level in ways not adequately captured by the dummy variables.

Thus, the prime determinants of exiting at upper grade levels appear to be school or district policies relating to the entire remaining group of students being served (or at least the greater portion of it). At the lower grade levels, even with the mixed policies of criterion and non-criterion retention in services, relationships existed between exit and individual factors, exposure to specific instruction, and enhanced achievement in English language arts (probably as a result of that exposure). Nonetheless, these effects were also modified by school characteristics (clustered within districts), such that the relationships existed (or were most pronounced) only in certain types of schools. It seems quite likely that those schools were the ones whose programs used specific criteria for retention in services, rather than programs that retained all (or nearly all) LEP students in those grades.

Exit rate patterns themselves were indicative of the nature of exit from LEP services over time (or grade). The conditional exit rates (i.e., the rate of exit for those who had not exited previously) in both cohorts were greatest in the first two years of the study. Analyzed within the different school districts, the data showed markedly different lower-bound cumulative exit rates and over-year patterns. Some districts had exited virtually all of their LEP students prior to nominal Grade 5, while others had exited essentially none. Some districts showed a steady increase in exit rate over the years, and others showed early exit of a subgroup of LEP students with little subsequent increment. This suggests that some programs were retaining children in LEP services on the basis of some criterion that rose over subsequent years (e.g., achievement in English language arts), while others were maintaining services to all or a subgroup of LEP students irrespective of external criteria.
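
The relation between conditional and cumulative exit rates described above can be sketched as follows. The rates in the example are hypothetical, chosen only to mimic the two district patterns the text describes, not taken from the study data:

```python
def cumulative_exit_rates(conditional_rates):
    """Convert per-year conditional exit rates (probability of exit,
    given no prior exit) into cumulative exit rates."""
    cumulative, still_enrolled = [], 1.0
    for rate in conditional_rates:
        still_enrolled *= 1.0 - rate
        cumulative.append(1.0 - still_enrolled)
    return cumulative

# Hypothetical district patterns (illustrative only, not study data):
early_subgroup = cumulative_exit_rates([0.40, 0.05, 0.05])  # early exit, little increment
steady_rise    = cumulative_exit_rates([0.15, 0.20, 0.25])  # criterion rising over years
```

Under the first pattern the cumulative rate jumps in year 1 and then barely moves; under the second it climbs steadily, matching the two behaviors observed across districts.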

  • Children receiving patterns of services that also enhance English language arts achievement, and/or patterns more similar to those that would be expected for English-proficient children, are more likely to be exited from LEP services.

One relatively consistent finding for exit was exhibited in one way or another in all of the models; namely, during a given year, children who were receiving services more akin to those that would be provided to English-proficient students were more likely to be exited, while those who were not were less likely to be exited. Also, in a number of instances, interactive patterns that predicted exit were quite similar to the patterns that had been found to predict achievement gains in English language arts. Greater prior achievement in English language arts and/or initial oral proficiency in English (typically as modified by certain combinations of instructional services) were also found to increase the probability of exit for children in all but the least powerful of the models (i.e., in the later grades).

To the extent that retention in the LEP program was based on individual student needs, the higher achievers would be exited from the program. This did happen to a small extent (note also that achievement variables did show up in some main effects and in a relatively small number of interactions); however, the effects from these variables were not nearly as strong or numerous as would be expected if individual achievement were being consistently considered in exit decisions. This indicates that decisions regarding both the type of instruction (resulting in differences in English proficiency/achievement) and the policy for exiting were probably made more at a school or district level for all (or major groups of) students than for individual students based on their level of proficiency in English. This may reflect either a deliberate placement of students in “pre-exit” conditions or the use of those specific patterns in programs that had managed to increase achievement in prior years and provided these particular combinations of services in the year of exit.

The relationships of exit to instruction similar to that provided to English-proficient students (and conversely non-exit to services more directed to LEP students) also seems to reflect a sound educational practice. Children who are in small classes and for whom there is considerable use of simple English in instruction (both of which were associated with reduced probability of exit) are probably in such an instructional environment because they need special attention; as such, the students are not likely candidates for exit from service during the year. Similarly, children receiving instruction in oral English (associated with reduction in probability of exit) probably need such instruction because of a lack of English proficiency, and consequently would be kept in the LEP program for one or more additional years.

It is conceivable that students who had the greatest need for continuation of LEP services were grouped into individual classrooms (where a large number of options for educational services were viable), while those less in need (and nearing the point where they could be exited) were placed in classrooms with larger proportions of English-proficient students. It is considered more likely, however, that districts or schools with high proportions of LEP students (which would certainly lead to large proportions of LEP students in all classrooms) had a larger number of viable options for provision of services to their LEP populations, including longer-term programmatic efforts, with policies that would probably favor retention in the LEP program at the higher grade/age levels.

These reported conclusions are based on dubious statistical methodology (see “Critical Review of Longitudinal Phase Study,” below). Although the authors note the problems with drawing defensible causal inferences from these data, these difficulties are not reflected in the way the conclusions are worded. For example, the report states (Burkheimer et al., 1989; italics added):

When LEP children are weak in English and/or strong in their native language, English language arts instruction in the native language facilitates yearly English language arts achievement; to a lesser extent, mathematics instruction in the native language facilitates mathematics achievement under the same conditions.

What can defensibly be said is that a positive partial correlation was observed between achievement and the respective instructional variable when the effects of other variables in the model were taken into account. Such a correlation may be suggestive of a positive causal connection, but the panel notes that due to many problems with the data and analyses, none of the report's conclusions can be regarded as verified by these data.

We note, however, that some of the report's conclusions are consistent with other findings in the literature, including conclusions from the Immersion Study. The panel has noted that when several studies carried out under different conditions yield convergent results, confidence in the conclusions increases, even if the individual studies are flawed. The panel nevertheless urges caution in combining (flawed) studies, particularly if the flaws are much the same from study to study.

Critical Review of Longitudinal Phase Study

Study Design

A major objective of the descriptive phase of the Longitudinal Study was to provide information relevant to the design of the longitudinal phase. The longitudinal phase reused the sample taken for the descriptive phase, benefited from the descriptive phase experience with similar types of data collection instruments and methods, and made use of relationships established with school personnel. It appears, however, that the results of the descriptive phase had little influence on the design of the longitudinal phase.

This is unfortunate because the Descriptive Phase Report reveals a number of warning signs that should have signaled potential problems during the longitudinal phase. The study timeline (see “The Descriptive Phase,” above) suggests why these warning signs may not have been heeded. Data collection for the descriptive phase occurred in the fall of 1983. The research design plan for the longitudinal phase was submitted in the spring of 1983, but the Descriptive Phase Report was not submitted until December of 1984, after the baseline data for the longitudinal phase had been collected. To meet this schedule, an initial sampling design and drafts of data collection instruments would have had to be prepared well in advance of spring 1983, when the plans were submitted. It appears that the broad framework of the longitudinal phase design was settled before any results from the descriptive study were available. The tight time schedule thus prevented the results of the descriptive phase from having a major impact on the sample design and data collection, even though those results suggested that major changes were needed. More specific criticisms of the study design are considered below.

Study Objectives A successful observational study must have clearly defined objectives, and these objectives must be ranked to reflect some form of priority. As became clear during implementation, the original objectives of the study were far too ambitious, and not all objectives could be realized. Resource restrictions always require tradeoffs among objectives, and dropping infeasible or low-priority objectives early in a study enables resources to be concentrated on ensuring that high-priority objectives will be met.

To illustrate this point, consider the descriptive phase results that revealed that a large majority of LM-LEP students were Spanish speaking, that these students differed in many important ways from other LM-LEP students, and that there was a strong relationship between native language and service characteristics. Within available resources under the existing sampling plan, there was little hope that the sample would contain sufficient numbers of students with sufficient variation in treatments to make reliable inferences relating native language to all other variables of interest. Aggregating across language groups was problematic because of known differences between the populations. A strong case could have been made a priori for limiting the study to Spanish-speaking LM-LEP students and dropping the objectives relating to language groups other than Spanish. Doing this at the outset (rather than having it forced after the fact due to the inevitable data problems) might have made resources available for better control of missing data. This in turn might have enabled a longitudinal analysis, albeit limited to Spanish-speaking students.

Definition of the Unit of Inference For most analyses, the researchers attempted to use students as the unit of inference. The rationale for this was that different students in the same school or classroom might receive different treatments, and the researchers wanted to be able to analyze the effect of these differences. There was a strong correlation, however, between services provided to different students in the same classroom, school, or district. In many cases, the instructional variables were highly confounded with school and district indicator variables because certain types of instructional practices were clustered within certain schools or districts. This confounding severely limited the power of the analysis and the ability to distinguish between effects of treatments and effects of highly correlated covariates. Yet this clustering could have been anticipated from the information available from the descriptive phase.

When there is a high degree of clustering, a large sample of students does not guarantee high precision in the results. Of more importance to precision is the number of schools or districts, which was not large (118 schools in 25 districts).
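
The effect of clustering on precision is conventionally summarized by the design effect, 1 + (m − 1) × ICC, where m is the average cluster size and ICC is the intraclass correlation. A rough sketch, with all numbers hypothetical:

```python
def effective_sample_size(n_students, avg_cluster_size, icc):
    """Approximate effective sample size under cluster sampling:
    n divided by the design effect 1 + (m - 1) * ICC."""
    design_effect = 1 + (avg_cluster_size - 1) * icc
    return n_students / design_effect

# Illustrative only: 10,000 students clustered in schools of ~85, with a
# modest within-school correlation of 0.2, carry roughly the statistical
# information of about 560 independent observations.
effective_sample_size(10000, 85, 0.2)  # ≈ 562
```

The point is that precision is governed far more by the number of clusters (here, 118 schools in 25 districts) than by the raw student count.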

Definition of the Subject Population The contractors decided at the outset of the study to use local definitions to classify students as LEP. Therefore, the definition of who was eligible for inclusion in the analysis varied from school to school. As noted above, in some cases LEP classification may have had more to do with availability of funds or school/district policy on length of stay in LEP programs than with English-language proficiency. Thus, important characteristics of the subject pool (including unmeasured characteristics) appear to be confounded with school and district variables. In addition, the definition of LEP, and thus the definition of the target population, may have differed from year to year. Because treatments tended to cluster within schools and districts, it is likely that important characteristics of the subject pool are confounded with assignment to treatments. This problem could have been anticipated from the descriptive phase results, which indicated that both LEP classification and service features depended on school and district policy and varied widely among districts.

Treatment Definition The Department of Education was interested in the impact of a number of instructional variables on English acquisition. Development Associates (1984b) identified 10 different categories of “treatment variables” planned to be used in causal modeling. A number of these treatment variables were subcategorized into three or four distinct variables. The descriptive phase had identified five distinct service clusters, and the original research questions (see below) concerned the effects of these clusters on achievement.

Unlike the situation in an experiment, however, there was substantial variation in how treatments were applied within the service clusters. The service clusters were not used in the final data analyses; models were built using the treatment variables directly. Of course, the existence of clusters in the population restricted the range of independent variation of treatments, thus restricting the inferences that could be made about distinct effects. (For example, use of the native language in mathematics instruction tended to correlate with use of the native language in language arts instruction because both tended to occur more in programs emphasizing a gradual transition to English; therefore, inferences about separate influences of these variables are problematic.)

Because there was no well-defined protocol for treatments, it may be impossible to separate genuine treatment effects from the effects of covariates that affect how treatments are actually applied. For example, when use of the native language is emphasized, teachers tend to have higher native-language proficiency and lower English proficiency than when English is emphasized; teachers with credentials in LM-LEP education tend to cluster in programs emphasizing slower transition to English. These effects must be considered part of the treatments as actually administered. It is important to stress that conclusions apply only to treatments as administered in a study, and not necessarily to proposed treatments in which the values of important covariates differ from those observed in the study. That is, conclusions about the effects of programs emphasizing slower transition to English would not apply if such programs were taught by teachers with low Spanish proficiency.

There does not appear to have been any a priori attempt to analyze the range of variation in treatments and assess whether there was sufficient variation to address the study objectives.

Covariates and Outcomes The distinction between covariates and outcomes is not appropriately addressed in the Longitudinal Study. Outcomes (for example, test scores) at one time period are used as covariates for analyses of the data for the next time period. There were no true pretests for the study. All test scores were obtained from students who had been receiving one or another treatment prior to the beginning of the study.

Selection of Units into the Sample The sample of schools for the longitudinal phase consisted of those schools from the descriptive phase sample with a sufficient number of LM-LEP students. The two phases had different purposes, and different sample selection criteria were appropriate. In the descriptive phase, the concern was with coverage—ensuring that the sample represented the wide range of treatments in the population. For drawing causal inferences it is essential to control for the effects of variables other than the putative causal factors. The major concern should be comparability of units in the different treatment categories. There is no reason to expect a sample selected for coverage to show the required comparability.

The descriptive phase concern for coverage carried over, inappropriately, to the longitudinal phase. The selection criteria attempted to ensure coverage of a wide range of treatments and language groups, with little concern for whether the sample would contain enough observations to make reliable inferences. For example, the research design plan, in describing how the 25 school districts were chosen from the qualifying 36 districts, states that “an effort was made to assure inclusion of clusters A, D, and E.” Three schools in the final sample had service cluster E. There was no discussion of whether this was sufficient for reliable statistical inference, nor was there any analysis of the comparability of these three schools to schools offering other treatments. The plan stated that some schools offered multiple treatments, but it did not state whether the three E-cluster schools also offered other treatments.

To determine whether one of two treatment types is more effective, the sample would have needed to contain districts or schools that are similar in important respects (e.g., policy toward classifying LM-LEP students, student characteristics, and teacher characteristics) with the exception of treatment type. Differences in outcomes might then be attributed to differences in treatment type and not to differences in those other variables. The descriptive phase sample contained information on many variables pertaining to characteristics of students, teachers, and treatments. It also provided information on LM-LEP classification policy. It would thus have been possible to analyze the descriptive phase sample to determine which causal inferences were and were not possible for that sample. Those objectives that could not be met from the existing sample could then have been made a priority. If judged cost-beneficial, the sample could have been augmented by selecting additional schools covering the range of variation needed to make inferences on high-priority objectives.

These concerns were recognized in the planning documents (Development Associates, 1984b). The impossibility of assigning subjects randomly to treatments was noted, and the issue of controlling for biases was discussed. Standard methods of adjusting for potential biases include matching or subclassification on the basis of covariates and/or model-based adjustment. These methods were discussed in a general way in the planning documents; however, specific procedures were not defined. There appears to have been no planning to control for potential hidden biases or effects due to unmeasured covariates. The report recognized that data were extremely poor or unavailable for some potentially important covariates, but there appear to have been no plans for anything other than qualitative assessment of the effects of those gaps. No consideration appears to have been given to augmenting the sample to ensure the necessary variability, even though data available from the descriptive phase could have identified problems of insufficient variation in treatments and noncomparability of units.

Determination of Sample Size Required for Reliable Inferences Having a large overall sample is not sufficient for drawing conclusions when the goal is to infer relationships among variables. There are two aspects that need to be carefully considered in any study. One is the level of significance of a result, or, roughly speaking, the likelihood that a seemingly significant outcome has occurred by chance alone. The second aspect is the power of the study, or the likelihood that the study will be able to detect an effect of a given size. Power analyses need to be performed for specific analyses of major importance, and these power analyses must take into account not just the raw numbers of observations, but the number of observations expected to exhibit specific patterns of covariates. For example, if service clusters are highly correlated with variables thought to influence outcomes of interest (for example, a particular service cluster occurs only in districts dominated by students of low socioeconomic status), then estimating effects due to treatment alone will be virtually impossible. In experiments, treatments are generally randomized within cells defined by covariates in order to ensure sufficient independent variation of treatments and covariates. This approach is not possible in an observational study. When initial data, such as that from the descriptive study, are available, it is often possible to determine ahead of time whether sufficient variation exists within the population to make the desired inferences. If not, resources can be concentrated on collecting the data required for analyses that are more likely to be fruitful, or the sample can be augmented with additional schools chosen specifically to obtain combinations of treatments and covariates not represented in the initial sample.
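
As a sketch of the kind of a priori power analysis the panel has in mind, the following normal-approximation calculation for a two-group comparison shows how power collapses when few effectively independent units are available. The effect sizes and group counts are hypothetical:

```python
from statistics import NormalDist

def approx_power(effect_size, n_per_group, alpha=0.05):
    """Approximate power of a two-sample z-test to detect a
    standardized effect size (mean difference / SD), equal groups."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    se = (2 / n_per_group) ** 0.5  # SE of the mean difference, in SD units
    return NormalDist().cdf(effect_size / se - z_crit)

# With clustering, n_per_group should be the effective number of
# independent units, not the raw student count. If a treatment appears
# in only a handful of schools (e.g., three), power to detect even a
# moderate effect is negligible regardless of how many students were tested.
p_few_units  = approx_power(0.3, 3)
p_many_units = approx_power(0.3, 100)
```

Running such a calculation against the descriptive phase data, before fielding the longitudinal phase, would have flagged which research questions the sample could not answer.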

Data Collection

Failure to Control for Missing Data Missing data can undermine attempts to derive any inferences from statistical data, but missing data are particularly problematic when causal inferences are contemplated. Missing data rates that are quite acceptable for estimating the mean and variance of a single variable can undermine the determination of relationships between several variables. For relational analysis, patterns of missing data are more important than the raw rate of missing data. Longitudinal studies further compound these problems. Without careful planning for its control, missing data are likely to undermine a longitudinal study involving relational analysis.

Concerns about missing data are especially acute for the Longitudinal Study. This could have been anticipated. Data from the study came from a total of 33 forms completed by students, parents, teachers, and school and district personnel over a period of 3 years. Many of these forms were quite complicated. The research design plan called for more than 50 variables to be involved in causal modeling.

A simple calculation reveals the pitfalls awaiting unwary investigators. If 50 variables each have a 98 percent completion rate, and completion rates are independent between questions, then only 36 percent of the cases would be expected to have complete data for a single year. If completion rates are also independent across years, the completion rate over 3 years is about 5 percent. Of course, completion rates in the study were not independent between questions; however, missing data rates were much higher than 2 percent. In fact, even after preliminary imputation and the dropping of some variables from the analysis, fewer than 3 percent of the cases in either cohort had complete data required for analyses over all 3 years (Burkheimer et al., 1989). It was for this reason that the plan for multiyear analyses was abandoned.
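
The arithmetic behind this calculation is direct:

```python
# Expected complete-case rates, assuming each of 50 variables is
# completed independently with probability 0.98.
per_item_rate = 0.98
n_variables = 50

one_year = per_item_rate ** n_variables   # ≈ 0.364: barely a third of cases complete
three_years = one_year ** 3               # ≈ 0.048: about 5 percent over 3 years
```

The independence assumptions are of course a simplification, but as the text notes, actual missing-data rates in the study were well above 2 percent, so the realized complete-case rate was even worse than this sketch suggests.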

It is essential to plan a strategy for minimizing and controlling for the effects of missing data. It does not appear that this was done in the Longitudinal Study. The term “missing data” was mentioned only once in the Overview of Research Design Plans (Development Associates, 1984b, Appendix B): “Among the potential data anomalies … the most obvious is that of missing data, which can have serious implications for subsequent modeling.” This observation was not followed up with any plans for coping with missing data. Unit nonresponse rates were reported for the descriptive phase and in the year 1 report of the longitudinal phase. No item nonresponse rates were reported, much less analyses of patterns of missing items. The planning documents contained no strategy for control or monitoring of missing data. Missing data problems can be mitigated with proper planning and proper monitoring. Problems can be identified early so that resources can be redirected away from collecting data that will turn out to be useless (see below).

High Levels of Attrition Sample sizes were dramatically reduced from year to year, both because of students' exit from LEP programs and because of the mobility of the LM-LEP population. Plans called for follow-up of students who moved away, but student mobility turned out to be even greater than had been expected (study students transferred to over 500 nonstudy schools). The magnitude of the data collection effort precluded follow-up, except for students who stayed within study districts. This problem could not have been anticipated from the single-year descriptive sample.

Sample shrinkage due to exit implies that the nature of the population of LM-LEP students changed from year to year. Moreover, differential exit policies mean that the nature of the population change differed from program to program. This difficulty was anticipated and was a major reason given for attempting to follow exiting students.

Measurement Error The descriptive phase report contains little information about the quality of the data. Inconsistencies among data items suggest that there was considerable measurement error for many variables. To the extent that the study instruments and data collection methods were similar, measurement error problems should have been similar. The descriptive phase thus should have provided information about the necessity of applying methods for controlling or adjusting for measurement error. In view of the important effect of measurement error on the ability to estimate parameters of interest in path analysis models, the lack of attention devoted to planning for and controlling measurement error is a serious omission.

Unmeasured Covariates Constructs of great importance were either never measured or measured very poorly. For example, pretest data were not collected for the first-grade cohort or for some students in the third-grade cohort. Measures of prior exposure to LEP services were very weak, limiting the inferences that could be drawn from the sample.

Changes in Forms The final report for the data analysis contract states that changes were made in forms from year to year in order to “improve” them. This is contrary to sound statistical practice. Consistency in the meaning of data items from year to year is essential to a reliable longitudinal analysis.

Data Analysis Methods

The statistical analysis in the Longitudinal Study focused exclusively on regression and correlational analysis. The modeling proceeded by adding batches of variables to a regression model and then successively eliminating variables that did not significantly improve model fit. The modeling was not based on a priori theoretical considerations, except that candidate variables were chosen because it was thought they might affect outcomes of interest.
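The elimination procedure described above resembles backward stepwise selection, which can be sketched as follows; the data, variable names, and F-threshold here are purely illustrative, not taken from the study:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
# Illustrative data: y depends on x1 and x2; x3 is pure noise.
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 1.5 * x2 + rng.normal(size=n)

def rss(X, y):
    """Residual sum of squares from an OLS fit."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return resid @ resid

names = ["x1", "x2", "x3"]
cols = [x1, x2, x3]
keep = list(range(3))

# Backward elimination: drop any variable whose partial F-statistic
# falls below a threshold (F < 10, a deliberately strict cutoff here).
changed = True
while changed and keep:
    changed = False
    X_full = np.column_stack([np.ones(n)] + [cols[i] for i in keep])
    rss_full = rss(X_full, y)
    df = n - X_full.shape[1]
    for i in list(keep):
        reduced = [j for j in keep if j != i]
        X_red = np.column_stack([np.ones(n)] + [cols[j] for j in reduced])
        f_stat = (rss(X_red, y) - rss_full) / (rss_full / df)
        if f_stat < 10.0:
            keep.remove(i)
            changed = True
            break

selected = [names[i] for i in keep]
print(selected)   # the noise variable x3 should be eliminated
```

As the passage notes, a procedure like this is data driven rather than theory driven: which variables survive depends on the sample, not on an a priori model.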

The data exhibited a high degree of multicollinearity: that is, many of the explanatory variables were themselves highly correlated with each other. As a result of this multicollinearity, many different models might have fit nearly as well as the ones reported. This was noted in the report. The report also noted that certain results, especially counterintuitive ones or ones that exhibited inconsistency from year to year, should be interpreted with caution. In particular, it is not generally accurate to say that a measured quantity (variable) that does not appear in a model has, as the report states, “no effect.” The variable may well be related to the outcome being predicted by the model, but provide no incremental prediction effect because it is correlated with a variable that does appear in the model. Conversely, a variable that does appear in the model may be no better a predictor than some correlated variable that does not appear in the model.
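The masking phenomenon is easy to demonstrate with synthetic data: when two predictors are strongly correlated, a model containing either one alone fits nearly as well as a model containing the other, even if only one of them truly drives the outcome. This example uses invented data, not study variables.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
x1 = rng.normal(size=n)
# x2 is correlated with x1 at about 0.98 by construction.
x2 = 0.98 * x1 + np.sqrt(1 - 0.98 ** 2) * rng.normal(size=n)
y = 2.0 * x1 + rng.normal(size=n)   # y truly depends only on x1

def r_squared(x, y):
    """R^2 from a simple OLS regression of y on x plus an intercept."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1.0 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))

r1, r2 = r_squared(x1, y), r_squared(x2, y)
print(round(r1, 3), round(r2, 3))   # the two single-variable fits are close
```

Neither fit alone reveals which variable is causally relevant, which is exactly why "no effect" claims from such models are suspect.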

It should also be noted that any reported significance levels need to be viewed with suspicion when, as here, the modeling proceeded by trying a number of models and selecting the best; see, for example, Miller (1981). The report also notes the high degree of multicollinearity among the predictor variables. This multicollinearity further obscures the conclusions that can be drawn, because the predictive power of one variable may be completely masked by other variables. Thus, a nonsignificant coefficient may imply that the corresponding variable has limited impact on predictions or that other correlated variables have masked its effect. The data analysis reported in the study is better viewed as an exploratory analysis than as a theoretically driven quasi-experiment (see Chapter 2). Although the initial intent was a quasi-experiment, the study was not planned or executed in a way that enabled more than exploratory analyses. Exploratory analyses can serve the important purpose of suggesting hypotheses for later verification; firmly grounded causal inferences, however, require carefully thought-out, planned, and executed experiments or quasi-experiments.

FURTHER ANALYSES

Despite the above criticisms, there may still be value to be gleaned from the data, albeit not to answer the intended questions. The panel found that the data provide a picture of the state of bilingual education in the early to mid-1980s and might be useful in planning further studies.

Descriptive Phase Data

The descriptive phase of the Longitudinal Study presents a large set of data that can be reliably projected to the national population of (locally defined) LM-LEP students, or a large majority of it, without substantial problems of bias or imprecision. As such, it is a useful resource for researchers attempting to understand the range of services being provided to LM-LEP students and a valuable knowledge base for use in planning studies of the effectiveness of programs.

One of the inherent difficulties with the Longitudinal Study (and the Immersion Study) is the difficulty of correctly classifying LM-LEP service programs, and indeed of even finding good examples of any given type. One possibly useful application of the descriptive phase data would be a more detailed characterization of the types of services offered and the characteristics of the students to whom they were offered. This work might help concentrate future evaluation efforts on the types of programs currently widely offered while ensuring that the full range of such programs is covered. Information might also be extracted about the extent to which specific types of services are targeted to student needs and capabilities, rather than being a function of the resources available to the school. Such analyses could view service delivery as a multidimensional phenomenon, considering the interrelationships exhibited among the five variables used to define service clusters, plus possibly others. Such analyses have the potential to shed substantial light on what is actually “out there” and, just as importantly, why. This could well serve future debates and investigations about the effectiveness of different programs.

Yet one caveat must be noted. The data from the descriptive study were collected in 1983. No doubt the population of LM-LEP students has changed substantially since then. The nature and range of services offered have probably also changed. Hence, results from any further analyses of these data would necessarily be dated.

Longitudinal Phase Data

Descriptive Analyses of Year 1 Data The baseline data for the longitudinal analyses were collected during 1984–1985. Data were collected on districts, schools, teachers, classrooms, and students from a subset of the schools and districts surveyed during the descriptive phase, namely those with enough LM-LEP students to make the Longitudinal Study data collection effort cost effective. Some schools declined to participate, and for some of these substitutions were made, but no weighting adjustments were made to reflect this nonresponse. All LEP students and English-proficient students receiving LEP services in grades 1 and 3 were included, plus a sample of students who had never been LEP or received services (comparison students). Survey weights were not developed for the data; the sampling procedures for selecting the comparison students were so poorly controlled that weighting these students within schools would not have been possible even if it had been deemed desirable.

The Descriptive Report of the Longitudinal Study (Development Associates, 1984a) provides details of the variables collected in the longitudinal phase. The process of arriving at derived variables is described, as are the limitations of the data, particularly the extent of missing data problems.

Given the large volume of baseline data collected, the question arises as to whether there are useful descriptive analyses of these data that have not been undertaken thus far, but that might be done and might shed light on the delivery of services to LM-LEP students. Some issues related to this question are addressed in this section.

In general, there were considerable problems with missing and unreliable data. Many of these are discussed in some detail in the descriptive report. These problems appear to have been so severe that, if any finding derived from them contradicted the “conventional wisdom” about bilingual education, the finding could easily be dismissed as an artifact of the poor data. There may, however, be components of the data that are worthy of additional analyses.

Any generalization of descriptive analyses from sample to target population would be on much firmer ground if the analyses incorporated survey weights. The panel concurs with the investigators, however, in concluding that it is unlikely that useful survey weights could be developed for the longitudinal sample.

In the descriptive phase, 342 schools were identified as having sufficient LM-LEP students for the study, yet only 86 schools participated in the longitudinal phase. This implies that a more restrictive population was used than is made clear in the description on page 2 (Development Associates, 1984a). The report does not make clear how the reduction to 86 schools was accomplished. Furthermore, the participation of only 86 schools, drawn from a smaller number of districts, means that the level of sampling error is likely to be moderately high, despite the inclusion of several thousand students in the sample.

The variables that appear to be most complete, and also numerous, are the school-level variables derived from teacher data and the teacher-level variables. The three derived classroom-level variables (classroom size, the percentage of students who are LEP, and the percentage of students who speak only English) also appear to be relatively free from missing data and measurement error problems. One important use for these data might be to cross-classify these school-, teacher-, and classroom-level variables, perhaps weighted by the numbers of LEP students in the classroom, in order to characterize the types of learning environments to which LM-LEP students are exposed (or were in 1984–1985). For example, a simple cross-classification of “Teacher support for using child's native language in teaching him/her” (the variable coded as SCLM1Y1 in the report) with “Principal support of school's LM-LEP program” (coded as SCLM4Y1) might provide information about the distribution of teacher support for native language usage and its association with a principal's support for LM-LEP programs. This in turn might shed some light on the relationship of school policies to teacher practices. Cross-classification of some of the teacher characteristic variables with the percentage of students in a class who are LEP (coded as CLSPLEP1) would provide information not only about the distribution of different teaching practices for LM-LEP students across the variation in the percentage of students who are LM-LEP (in this “high LEP” population), but also about whether certain types of teacher services and practices are predominantly available in classrooms with mostly LEP students or in those with few LEP students.
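A cross-classification of the kind suggested could be sketched with standard tools. The variable names below follow the report's coding, but the data are entirely synthetic stand-ins:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
n = 300
# Synthetic stand-ins: SCLM1Y1, SCLM4Y1, and the classroom LEP count
# mimic the report's coded variables but carry invented values.
df = pd.DataFrame({
    "SCLM1Y1": rng.choice(["low", "medium", "high"], size=n),  # teacher support for native-language use
    "SCLM4Y1": rng.choice(["low", "high"], size=n),            # principal support for the LM-LEP program
    "n_lep": rng.integers(1, 30, size=n),                      # LEP students in the classroom
})

# Cross-classify, weighting each classroom by its number of LEP students,
# to describe learning environments as LEP students experience them.
tab = pd.crosstab(df["SCLM1Y1"], df["SCLM4Y1"],
                  values=df["n_lep"], aggfunc="sum")
share = tab / tab.to_numpy().sum()
print(share.round(3))
```

Weighting by LEP counts shifts the table from "share of classrooms" to "share of LEP students," which is the more policy-relevant denominator here.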

In summary, it is unclear from the Descriptive Report how restricted the population for inference is from the longitudinal phase. This must be addressed before a conclusion can be reached as to whether further descriptive analyses are warranted. There do appear to be a substantial number of reliable variables that could be analyzed, and in particular cross-classified, with a view to better characterizing the range of learning environments to which LM-LEP students are exposed and their relative prevalence in practice. Such analyses might guide future efforts at evaluating programs, at least by identifying what “programs” actually exist from the viewpoint of a LM-LEP elementary school student.

Other Analyses It is possible that some defensible causal inferences could be drawn from the Longitudinal Study; however, it would take considerable effort just to determine the feasibility of such analyses.

The value of purely exploratory analyses might be enhanced after additional exploration of the robustness of the findings and the degree to which alternate models fit the data. Variables not appearing in a model but highly correlated with model variables might be substituted to evaluate whether an alternate model based on these variables would fit as well. For example, school-level, classroom-level, and teacher-level variables might be substituted for the school and district indicator variables.

Cross-validation might improve confidence in the robustness of the results. The models would be extremely suspect if coefficients changed drastically when the model was fit on a subset of the data, especially if the resulting models were poor predictors of the hold-out samples. No such analyses were performed as part of the original study, but they might be desirable given the degree of multicollinearity and other problems.
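A minimal version of such a split-half check might look like the following. The data here are synthetic and well behaved, so the coefficients barely move; in highly multicollinear data the shift could be dramatic:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 400
x = rng.normal(size=(n, 2))
y = 1.0 + 0.8 * x[:, 0] - 0.5 * x[:, 1] + rng.normal(size=n)

def fit(X, y):
    """OLS coefficients (intercept first)."""
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta

# Fit on one random half, then check stability against the full-sample
# fit and predictive accuracy on the hold-out half.
idx = rng.permutation(n)
train, hold = idx[: n // 2], idx[n // 2:]
beta_train = fit(x[train], y[train])
beta_full = fit(x, y)

pred = np.column_stack([np.ones(len(hold)), x[hold]]) @ beta_train
holdout_rmse = np.sqrt(np.mean((y[hold] - pred) ** 2))
coef_shift = np.max(np.abs(beta_train - beta_full))
print(round(holdout_rmse, 2), round(coef_shift, 2))
```

Large coefficient shifts between subsamples, or poor hold-out prediction, would be exactly the warning signs described above.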

Finally, no attempt was made to analyze data from students who exited LEP programs, English-proficient students, or students whose native language was not Spanish. It is likely that sample sizes are too small for the latter two categories, but exploratory analyses for students who exited, of the type already performed for LEP students, may be of interest.

PROSPECTS FOR THE FUTURE

General Remarks

Contrary to the hopes of those who commissioned the Longitudinal Study, further research is needed to address the question of which interventions are most effective in improving educational outcomes for LM-LEP children. Despite all its problems, the study does provide a valuable information base for designing future studies to address these objectives. The descriptive phase study provides information concerning the natural range of variation in services in the population and about correlations between service types and various background characteristics. The longitudinal phase adds to this information base: variables were measured in the longitudinal phase that were not in the descriptive phase, measurements were made over time, and information was collected about the factors related to exit from LM-LEP services. Although they must be regarded as exploratory in nature, the longitudinal phase analyses revealed associations between variables. Hypotheses about the causal mechanisms underlying these associations could then be tested by well-designed and carefully implemented quasi-experimental studies.

Just as important, the study provides important information about the difficulties awaiting future researchers. The measurement difficulties, missing data problems, and attrition problems encountered by the Longitudinal Study will have to be faced by future researchers. Awareness of the magnitude of the problems permits planning to mitigate their impacts.

Planning for Observational Studies Directed to Causal Inference

At least seven factors must be attended to in planning an observational study if it is to be used for policy-relevant questions of causation.

First, a successful observational study must have clearly defined study objectives, and these objectives must be made a priority. Resource restrictions always require tradeoffs among objectives. Dropping infeasible or low-priority objectives early in a study enables resources to be concentrated on ensuring that high-priority objectives will be met. Too many objectives without clear priorities often lead to failure to achieve any objectives.

Second, a clear, operational definition of treatments is essential; that means a full characterization of treatments as implemented.

Third, after the treatments have been designed, a sampling plan must be developed that ensures that comparable subjects are assigned to each of the treatments. Collecting data on covariates with the hope of adjusting or matching after the fact invites problems unless there is some assurance that the treatment groups will be reasonably comparable with respect to the distribution of covariates. A safer approach is to select observational units that are explicitly matched on key covariates. This approach requires data on which to base selection of matched units. The knowledge base gained from the Longitudinal Study, if used properly, can provide an extremely valuable resource for sample selection for future studies.
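The matched-selection idea can be sketched as a greedy nearest-neighbor match on a single covariate. Everything below (a single covariate, the group sizes, the score scale) is an invented illustration, not study data:

```python
import numpy as np

rng = np.random.default_rng(4)

# Illustrative covariate (e.g., a pretest score) for a treated group
# and a larger pool of candidate comparison units.
treated = rng.normal(loc=55, scale=10, size=40)
pool = rng.normal(loc=50, scale=10, size=200)

# Greedy nearest-neighbor matching without replacement on the covariate.
available = list(range(len(pool)))
matches = []
for t in treated:
    j = min(available, key=lambda k: abs(pool[k] - t))
    matches.append(j)
    available.remove(j)

matched = pool[matches]
gap_after = abs(treated.mean() - matched.mean())   # small after matching
gap_before = abs(treated.mean() - pool.mean())     # larger before matching
print(round(gap_after, 2), round(gap_before, 2))
```

Selecting explicitly matched units up front, as the passage recommends, avoids relying on post hoc adjustment to repair covariate imbalance.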

Fourth, there must be control of missing data. Missing data can undermine attempts to derive causal inferences from statistical data. It is essential to plan a strategy for minimizing and controlling for the effects of missing data. This issue is discussed in detail in the next section. Here, we note the critical importance of identifying key data items for which it is essential to have complete data, monitoring the completeness of these items as the survey progresses, and devoting resources to follow-up on these items if the extent of missing data becomes too high.

Fifth, there must be control of measurement error. If a model is defined at the outset with clearly specified hypotheses, it is possible to identify a priori the variables for which measurement error could seriously affect results. Special attempts might be made to measure these variables accurately or estimate a statistical model for the measurement error distribution. Information about key variables can be obtained from multiple sources (for example, school records and interviews). Multiple questions can be asked about the same variable and responses checked for consistency. A small subsample might be selected for intensive efforts to determine these variables accurately; a measurement error model might then be estimated by comparing the accurately assessed values for these cases with their initial error-prone responses.
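The validation-subsample idea can be sketched as a simple attenuation correction; the error scale, sample sizes, and true slope below are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 1000
x = rng.normal(size=n)                    # true value
w = x + rng.normal(scale=0.7, size=n)     # error-prone survey response
y = 2.0 * x + rng.normal(size=n)

# The naive slope from the error-prone measure is attenuated toward zero.
naive = np.cov(w, y)[0, 1] / np.var(w, ddof=1)

# Suppose a small validation subsample was measured accurately.
sub = rng.choice(n, size=100, replace=False)
err_var = np.var(w[sub] - x[sub], ddof=1)           # estimated error variance
reliability = 1 - err_var / np.var(w, ddof=1)
corrected = naive / reliability                     # disattenuated slope

print(round(naive, 2), round(corrected, 2))   # corrected is closer to 2.0
```

This is the measurement error model the passage describes: the intensively measured subsample calibrates the error-prone responses for everyone else.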

Sixth, planning must include determination of the sample size required for reliable inferences. Having a large sample is not sufficient when the goal of a study is to infer relationships among variables. Power analyses need to be performed for specific analyses of major importance, and these power analyses must take into account not just the raw numbers of observations, but the observations expected to exhibit specific patterns of covariates. If, for example, service clusters are highly correlated with variables thought to influence outcomes of interest (for example, a particular service cluster occurs only in districts dominated by students of low socioeconomic status), then estimating effects due to treatment alone will be impossible. In experiments, treatments are generally randomized within cells defined by covariates in order to assure sufficient independent variation of treatments and covariates. This is not possible in an observational study; however, especially when initial data such as those from the Longitudinal Study are available, it may be possible to determine ahead of time whether sufficient variation exists within the population to make the desired inferences. If not, resources can be concentrated on collecting data required for analyses that are more likely to be fruitful. In determining a sample size it is again crucial to consider the issue of the appropriate unit of inference. Sample size calculations must also address missing data and, in a longitudinal study, expected attrition of sample units. Results of such analyses can be used to guide resource tradeoffs.
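The sample-size arithmetic described above, inflating a standard power calculation for expected attrition and item nonresponse, can be sketched as follows; all rates and the effect size are illustrative assumptions:

```python
import math

# Approximate n per group for a two-sample comparison
# (two-sided alpha = 0.05, power = 0.80), then inflated for
# expected attrition and item nonresponse. All rates are illustrative.
z_alpha = 1.96   # two-sided 5% critical value
z_beta = 0.84    # 80% power
effect = 0.3     # standardized effect size

n_complete = 2 * ((z_alpha + z_beta) / effect) ** 2

retention = 0.70 ** 2    # e.g., 70% retained per year over 2 follow-up years
complete_rate = 0.80     # fraction with complete key data items

n_recruited = math.ceil(n_complete / (retention * complete_rate))
print(round(n_complete), n_recruited)
```

The recruited sample must be several times the analyzable sample; ignoring this multiplier is one way a nominally large study ends up, as here, with too few complete cases.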

Finally, there must be careful monitoring of survey execution. As data are collected and coded, counts should be maintained of missing data, and preliminary correlational analyses can be run to determine whether the sample is showing sufficient variation among key variables. Adjustments can be made to correct for problems as they are identified. In some cases, it may turn out that some objectives cannot be met; resources can then be redirected to ensure the satisfaction of other important objectives.

Strategies for Control of Missing Data

Missing data problems can be greatly reduced (although seldom eliminated entirely) by proper planning and execution of a data collection effort.

A National Research Council report (Madow, Nisselson, and Olkin, 1983) states: “Almost every survey should be planned assuming nonresponse will occur, and at least informed guesses about nonresponse rates and biases based on previous experience and speculation should be made.” Among the steps that can be taken to reduce nonresponse or its impact on results are the following (see Madow, Nisselson, and Olkin (1983) for a more complete set of recommendations):

  • Based on study goals, identify a set of key variables for which it is essential to have relatively complete data.

  • Collect data as completely and accurately as possible, using follow-ups and callbacks as necessary. A final follow-up using supervisors and superior interviewers may be necessary. Consider using a shorter and/or simpler questionnaire for the final follow-up. Pay special attention to key data items.

  • Monitor data completeness as the study progresses; make adjustments in the data collection strategy as necessary to respond to the actual pattern of missing data.

  • Plan strategies for collecting data items anticipated to be especially difficult to collect. Adjust strategies as data collection progresses.

  • Monitor completion rates by important classifications of covariates. (This recommendation is crucial if relational analysis is the goal.)

  • Avoid unnecessarily complex and time-consuming questionnaires. Design questions to be nonthreatening and easy to understand.

  • Consider including questions that are useful for modeling and adjusting for nonresponse. Consider including simpler or less-threatening versions of questions for which high nonresponse rates are anticipated (notice that this recommendation conflicts with the prior one; obviously a tradeoff is required).

  • Make sure that the sample size is sufficiently large to compensate for missing data.

  • To the extent possible, describe and/or model the missing data process and the degree to which respondents differ from nonrespondents.

  • Attempt to select cells for imputation so that respondents and nonrespondents are as similar as possible within cells.

  • If possible, use multiple imputation to improve estimates and variance estimates. Explore the effects of alternate assumptions regarding the response process.

  • Discuss and/or model biases that are the result of missing data.
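The imputation-cell recommendation in the list above can be sketched as single mean imputation within cells (multiple imputation would instead draw repeatedly from within-cell distributions); the data here are synthetic:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(6)
# Synthetic records: fill missing test scores with the mean of
# respondents in the same (grade, service-type) imputation cell.
df = pd.DataFrame({
    "grade": rng.choice([1, 3], size=120),
    "service": rng.choice(["ESL", "bilingual"], size=120),
    "score": rng.normal(loc=50, scale=10, size=120),
})
df.loc[rng.choice(120, size=20, replace=False), "score"] = np.nan

cell_means = df.groupby(["grade", "service"])["score"].transform("mean")
df["score_imputed"] = df["score"].fillna(cell_means)

print(df["score"].isna().sum(), df["score_imputed"].isna().sum())
```

Choosing cells within which respondents and nonrespondents are similar, as the list advises, is what makes the within-cell mean a defensible fill-in value.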

LONGITUDINAL STUDY RESEARCH QUESTIONS

Reprinted here is the revised set of research objectives for the Longitudinal Study. This material is quoted verbatim from Exhibit 2 of the Longitudinal Report (Development Associates, 1984b):

1. What are the effects of the special services provided for LM/LEP students in grades 1-6?

   A. Effects of specific student characteristics

      1. Comprehension of native language (oral & written)

         a. What is the relationship between LM/LEP student's oral proficiency in the native language and the learning of English?

         b. How is a LM/LEP student's oral proficiency in the native language related to the learning of English when the student's native language is:

            i. A language linguistically related to English?

            ii. A language linguistically distant from English?

         c. What is the relationship between a LM/LEP student's literacy in the native language and the acquisition of literacy in English when the native language is a language linguistically related to English (e.g., Spanish, Portuguese)? When it is not related?

         d. What is the relationship between a student's general ability and native language proficiency?

      2. Classroom behavior

         What is the relationship between LM/LEP students' classroom behaviors and success in school?

      3. Parents' interest and involvement

         What is the relationship between LM/LEP parents' interest and involvement in their child's education and the child's success in school?

   B. Effect of extent or duration of services

      1. What are the effects of length or duration of receipt of special services on the subsequent achievement of LM/LEP students?

      2. How do LM/LEP students who are receiving or have received special services for LM/LEP students compare with LM/LEP students who have never received such services?

   C. Effects of site (classroom, school, or district) and staff characteristics

      1. What is the effect of linguistic isolation, e.g., being in a school where few or no students speak the same language, on the time required to learn English?

      2. To what extent does the degree of teacher and staff familiarity with the native culture of the LM/LEP students affect student achievement?

      3. To what extent does the educational background of teacher and staff affect the achievement of LM/LEP students?

   D. Effects of conditions for exiting

      1. To what extent is oral proficiency in English correlated with proficiency in handling the written language (e.g., reading comprehension) and with academic achievement?

      2. When students are exited from special services after a fixed time, without regard to level of performance on some criterion variable, what is the effect on the student's subsequent achievement?

2. How do the various combinations of special services, or "service clusters," provided for LM/LEP students in grades 1-6 compare in terms of the effectiveness with which recipients subsequently can function in all English medium classrooms? (A service cluster is a set of instructional services provided to a particular student at a particular point in time).

   A. Effects by specific student characteristics

      1. Socioeconomic status

         a. Which clusters work best with LM/LEP students whose socioeconomic status is low?

         b. Which clusters work best with LM/LEP students whose socioeconomic status is middle or high?

      2. General academic ability

         a. Which clusters work best for LM/LEP children whose ability level is high?

         b. Which clusters work best for LM/LEP children whose ability level is low?

      3. English proficiency

         a. Which clusters are most effective for children who speak little or no English?

         b. Which clusters are most effective for children who speak some English, but nonetheless cannot benefit from a mainstream classroom?

      4. Grade level

         Which clusters are most effective for LM/LEPs by grades?

      5. Category of native language

         Which clusters are most effective when the LM/LEP students' native language is historically related to English? When it is not?

   B. Effects of specific service characteristics

      1. Using students' native language in teaching academic subjects

         a. What are the effects of using the student's native language in teaching academic subjects?

         b. How are the effects of the student's level of oral proficiency in the native language related to teaching academic subjects in the student's native language?

      2. Teaching reading of native language before teaching reading of English

         a. At the end of grade 6, how will students for whom the introduction of English reading was delayed while they were taught to read in their native language compare with students for whom reading in English was not postponed?

         b. To what extent does the effect of postponing instruction in reading English while the child learns to read in his native language first depend on:

            i. The degree of lexical and orthographic similarity between the native language and English?

            ii. Initial oral proficiency in the native language?

            iii. General academic aptitude?

      3. Using “controlled” English vocabulary in instruction

         To what extent does the use of “special English” for instruction affect LM/LEP student achievement in:

            a. The content area?

            b. English?

      4. Styles of using two languages in instruction

         In the transition from the use of native language in instruction to 100% English, which is more effective, a slow shift or a rapid shift?

      5. Subjects taught

         What are the effects of teaching the child's native language as a subject of instruction rather than merely using it as the medium of instruction?

   C. Effect of school, classroom or teacher characteristics

      1. Linguistic composition of the student population

         a. Which clusters are the most effective when the child is in a school where there are few or no other children in his/her age group who speak the same language?

         b. Which clusters are most effective when the child is in a classroom where there are many other children in his/her age group who speak the same language?

      2. Untrained teachers

         What are the effects of untrained teachers by service clusters?

3. What are the effects of having English proficient students in LM/LEP classrooms on the achievement and English language acquisition of LM/LEP students?

   A. What are the effects when the English proficient students come from a language minority background?

   B. What are the effects when the English proficient students come from a native English speaking background?

REFERENCES

There are several good introductory books on sample surveys. Cochran (1977), Yates (1981), and Kish (1965) all provide thorough introductions to the design and analysis of sample surveys. Kasprzyk et al. (1987) discuss issues in longitudinal surveys, with particular attention to nonresponse adjustments and modeling considerations. Skinner, Holt, and Smith (1989) discuss methods for analyzing complex surveys, paying special attention to issues of bias (in the statistical sense) and to modeling structured populations. Groves (1989) provides a detailed discussion of survey errors and survey costs, including an extensive treatment of sample coverage and coverage error together with concerns related to nonresponse. Little and Rubin (1987) is a basic reference for statistical analyses with missing data, in both experiments and sample surveys. Finally, Duncan (1975) is an early and invaluable reference for the latent and structural equation models used in path analysis.
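The nonresponse adjustments treated in several of these references are often implemented as a weighting-class adjustment: respondents in each class absorb the base weight of that class's nonrespondents, so that weighted class totals are preserved. The sketch below is a minimal illustration of that idea with hypothetical data; it is not drawn from the Longitudinal Study itself, and the class labels and weights are invented for the example.

```python
# Weighting-class nonresponse adjustment (illustrative sketch).
# Each respondent's base weight is multiplied by the ratio of the
# class's total base weight to the class's respondent base weight;
# nonrespondents receive weight zero, so class totals are preserved.

def adjust_weights(units):
    """units: list of dicts with keys 'cls', 'weight', 'responded'."""
    total, resp = {}, {}
    for u in units:
        total[u["cls"]] = total.get(u["cls"], 0.0) + u["weight"]
        if u["responded"]:
            resp[u["cls"]] = resp.get(u["cls"], 0.0) + u["weight"]
    out = []
    for u in units:
        if u["responded"]:
            out.append(u["weight"] * total[u["cls"]] / resp[u["cls"]])
        else:
            out.append(0.0)
    return out

# Hypothetical sample: one urban nonrespondent whose weight is
# redistributed to the urban respondent; rural weights are unchanged.
sample = [
    {"cls": "urban", "weight": 10.0, "responded": True},
    {"cls": "urban", "weight": 10.0, "responded": False},
    {"cls": "rural", "weight": 20.0, "responded": True},
    {"cls": "rural", "weight": 20.0, "responded": True},
]
print(adjust_weights(sample))  # [20.0, 0.0, 20.0, 20.0]
```

The defining property of the adjustment, visible in the example, is that the sum of adjusted weights within each class equals the sum of the original base weights, so estimated population totals are not deflated by nonresponse.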

Burkheimer, G.J., Jr., Conger, A.J., Dunteman, G.H., Elliott, B.G., and Mowbray, K.A. (1989) Effectiveness of services for language-minority limited-English-proficient students (2 vols.). Technical report, Research Triangle Institute, Research Triangle Park, N.C.


Chambers, J.M., Cleveland, W.S., Kleiner, B., and Tukey, P.A. (1983) Graphical Methods for Data Analysis. Belmont, Calif.: Wadsworth International Group.

Cochran, W. G. (1977) Sampling Techniques (third ed.). New York: John Wiley.


Development Associates (1984a) The descriptive phase report of the national longitudinal study of the effectiveness of services for LMLEP students. Technical report, Development Associates Inc., Arlington, Va.

Development Associates (1984b) Overview of the research design plans for the national longitudinal study of the effectiveness of services for LMLEP students, with appendices. Technical report, Development Associates Inc., Arlington, Va.

Development Associates (1986) Year 1 report of the longitudinal phase. Technical report, Development Associates Inc., Arlington, Va.

Duncan, O.D. (1975) Introduction to Structural Equation Models. New York: Academic Press.


Groves, R.M. (1989) Survey Errors and Survey Costs. New York: John Wiley.


Kasprzyk, D., Duncan, G., Kalton, G., and Singh, M. P. (1987) Panel Surveys. New York: John Wiley.

Kish, L. (1965) Survey Sampling. New York: John Wiley.


Little, R. J. A., and Rubin, D. B. (1987) Statistical Analysis with Missing Data. New York: John Wiley.


Madow, W. G., Nisselson, J., and Olkin, I., eds. (1983) Incomplete Data in Sample Surveys, Volume 1: Report and Case Studies. Panel on Incomplete Data, Committee on National Statistics, Commission on Behavioral and Social Sciences and Education, National Research Council. New York: Academic Press.

Miller, R. G. (1981) Simultaneous Statistical Inference (second ed.). New York: Springer Verlag.


Skinner, C. J., Holt, D., and Smith, T. M. F. (1989) Analysis of Complex Surveys. New York: John Wiley.

Spencer, B. D., and Foran, W. (1991) Sampling probabilities for aggregations, with applications to NELS:88 and other educational longitudinal surveys. Journal of Educational Statistics, 16(1), 21–34.


U.S. Department of Education (1991) The Condition of Bilingual Education in the Nation: A Report to the Congress and the President. Office of the Secretary. Washington, D.C.: U.S. Department of Education.


Yates, F. (1981) Sampling Methods for Censuses and Surveys (fourth ed.). New York: Macmillan.
