5 Measurement Quality
Pages 159-208



From page 159...
... However, five percent of establishments and 19 percent of firms had some form of problematic data, indicating that problematic cells were widely dispersed. About 16 percent of SRO cells had conflicting responses in Component 1 versus Component 2 data for the same year (one measure was zero while the other was not)
From page 160...
... The chapter concludes with recommendations for further quality analyses and for strengthening future data-collection efforts, to reduce measurement errors and improve fitness for use. To briefly summarize the key findings, measurement quality was assessed both before and after filtering the data (i.e., removing a small ...
Footnote 1: In 2018 and 2019, EEO-1 data collections occurred in two components.
From page 161...
... data quality issues noted and tracked by
From page 162...
... The overarching question of this chapter is whether important measurement quality issues exist for current Component 2 data that EEOC could address before the next round of data collection, through improvements to survey design or administration.
From page 163...
... Nevertheless, it is still possible to develop reasonable criteria for identifying and resolving data quality issues using indirect methods. The methods employed in this section to assess Component 2 measurement quality include the following:
• An analysis of extreme values, inconsistent reports such as zero employees with non-zero hours worked or vice versa, and other data anomalies identified using only Component 2 data for a given year.
From page 164...
... First, the subsequent analysis will address errors in both employee counts and hours worked, both at the SRO and SROP levels. Second, to evaluate the biasing effects of measurement error, some measure of "truth" is needed which, unfortunately, is unavailable.
From page 165...
... INTERNAL INCONSISTENCY AND EXTREME VALUES
Internal Inconsistencies
This section considers the inconsistencies in the Component 2 data within the survey year.
From page 166...
... COMPENSATION DATA COLLECTED THROUGH THE EEO-1 FORM

TABLE 5-1  Percent of SROP Cells with Missing Data on Hours Worked or Number of Employees

Column key: (a) Missing Hours Worked; Have Employee Counts. (b) Missing Employees; Have Hours Worked.

Firm, Establishment, and
Employee Characteristic       2017 (a)  2017 (b)  2018 (a)  2018 (b)
Administration Mode
  Online-Entry                     0.9       0.0       0.8       0.0
  Data-Upload                      6.3       3.7       5.9       3.3
Size Distribution
  Fewer than 100                   2.1       0.9       2.0       0.8
  100–249                          2.8       1.4       2.6       1.2
  250–499                          2.5       1.0       2.2       0.9
  500–999                          2.2       0.7       2.1       0.7
  1,000 or More                    2.5       0.6       2.4       0.5
Job Category
  Executive                        5.9       3.0       5.5       2.6
  First/Midlevel                   2.3       1.1       2.2       1.0
  Professionals                    2.3       1.0       2.1       0.9
  Technicians                      2.2       1.1       2.2       1.0
  Sales Workers                    1.7       0.7       1.6       0.6
  Administrative Support           2.4       1.1       2.3       1.0
  Craft Workers                    2.6       1.2       2.5       1.1
  Operatives                       2.4       0.9       2.2       0.8
  Laborers and Helpers             3.1       1.3       2.8       1.1
  Service Workers                  2.2       0.7       2.0       0.6
Overall                            2.4       1.0       2.2       0.9

SOURCE: Panel generated Component 2 employer, establishment, and employee files for 2017 and 2018.
NOTE: Excludes firms reporting more than 1.4 million employees, establishments covered in Type 6 reports, and SROP cells with missing values.
From page 167...
... These findings suggest that missing data constitute an important data quality issue for current Component 2 data from uploaded forms, one that would be advantageous to address when planning future data collections.

Extreme Values
Implausible or extreme data values are another indicator of measurement error.
From page 168...
...
• Red Flag: Average # hours worked per employee for the SROP is greater than 5,840/year
• Orange Flag: SROPs not previously flagged whose # hours worked per employee exceeds three standard deviations from the mean, where the standard deviation is computed over all establishments conditional on SROPs
• Yellow Flag: Average # hours worked per employee for the SROP is greater than 0 while # of employees for the SROP is 0
• Green Flag: All other SROPs

Establishment-Level Flags
• Red Flag: Average # hours worked per employee for the establishment is greater than 5,840/year
• Orange Flag: Establishments not previously red-flagged in the step above, with at least one SROP with a red flag or >10 percent SROPs with orange flags
• Green Flag: All other establishments

Firm-Level Flags
Remove firms whose total number of employees exceeds the number of employees of the largest U.S. employer (1.4 million)
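The cell-level and establishment-level flag rules listed on this page can be sketched in a few lines of Python. This is only an illustrative sketch, not the panel's actual code: the function names and the dictionary layout are hypothetical, the yellow flag is read as "hours reported but zero employees," the orange cutoff is taken as the mean plus three standard deviations, and the mean and standard deviation are assumed to be computed over the non-red cells passed in (the text does not settle these details).

```python
from statistics import mean, pstdev

MAX_HOURS_PER_EMPLOYEE = 5840  # red-flag threshold quoted in the text (hours per year)


def flag_srop_cells(cells):
    """Return one flag per cell; each cell is a dict with 'employees' and 'hours'.

    Assumption: `cells` holds the same SROP type across establishments,
    so the mean and SD below are "conditional on SROPs".
    """
    flags = []
    for cell in cells:
        employees, hours = cell["employees"], cell["hours"]
        if employees == 0:
            # Assumed reading of the yellow rule: hours > 0 with zero employees.
            flags.append("yellow" if hours > 0 else "green")
        elif hours / employees > MAX_HOURS_PER_EMPLOYEE:
            flags.append("red")
        else:
            flags.append("green")  # provisional; may be upgraded to orange

    # Assumption: mean and SD use only non-red cells with employees > 0.
    ratios = [c["hours"] / c["employees"]
              for c, f in zip(cells, flags)
              if c["employees"] > 0 and f != "red"]
    if len(ratios) >= 2:
        cutoff = mean(ratios) + 3 * pstdev(ratios)
        for i, cell in enumerate(cells):
            if (flags[i] == "green" and cell["employees"] > 0
                    and cell["hours"] / cell["employees"] > cutoff):
                flags[i] = "orange"
    return flags


def flag_establishment(total_employees, total_hours, srop_flags):
    """Apply the establishment-level rules to totals plus the cell flags."""
    if total_employees > 0 and total_hours / total_employees > MAX_HOURS_PER_EMPLOYEE:
        return "red"
    orange_share = srop_flags.count("orange") / len(srop_flags) if srop_flags else 0.0
    if "red" in srop_flags or orange_share > 0.10:
        return "orange"
    return "green"
```

The 5,840 threshold corresponds to 16 hours a day for 365 days, so a red cell implies a physically implausible average workload rather than merely a high one.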
From page 169...
... This is especially true for larger establishments in which employee counts are expected to be more stable over time. Thus, inconsistent zeros or large discrepancies in reported numbers of employees between the two components can serve as a quality indicator (i.e., an indicator of the risk of measurement error for either or both data-collection components)
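An "inconsistent zero" check of the kind described here can be sketched as follows. This is a minimal illustration with hypothetical function names; it assumes each linked SRO cell is represented as a pair of employee counts, one from Component 1 and one from Component 2, and it treats a cell as inconsistent when exactly one of the two counts is zero.

```python
def is_inconsistent_zero(count_c1, count_c2):
    """True when exactly one of the two reported counts is zero."""
    return (count_c1 == 0) != (count_c2 == 0)


def share_inconsistent(pairs):
    """Fraction of (Component 1, Component 2) count pairs with an inconsistent zero."""
    if not pairs:
        return 0.0
    return sum(is_inconsistent_zero(a, b) for a, b in pairs) / len(pairs)


# Hypothetical linked cells: (Component 1 count, Component 2 count)
example_pairs = [(0, 5), (3, 3), (4, 0), (0, 0)]
print(share_inconsistent(example_pairs))  # 0.5
```

Cells that are zero in both components are counted as consistent in this sketch; whether such cells belong in the denominator at all is a design choice the excerpt does not specify.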
From page 170...
... # indicates rounds to zero.
a Firm size for first set of three columns; establishment size for the remainder.
From page 171...
... collecting data on the number of employees and 10 collecting data on the number of hours worked (EEOC, 2020e)
a Size strata based upon establishment sizes from Component 1 data.
SOURCE: Panel generated from Component 2 employer, establishment, ...
From page 172...
... a Size strata based upon establishment sizes from Component 1 data.
... as well as the absolute value of RD_i, denoted by ARD_i.
From page 173...
... FIGURE 5-1 Intercomponent Average ARD for Number of Employees in an SRO Cell by Year and Firm Characteristic (Excludes Inconsistent Zero Cells)
NOTE: Excludes firms reporting more than 1.4 million employees, establishments covered in Type 6 reports, and SROP cells with missing values.
From page 174...
... NOTE: Excludes firms reporting more than 1.4 million employees, establishments covered in Type 6 reports, and SROP cells with missing values.
... the squared difference between Component 1 and 2 observations.
From page 175...
... can be regarded as parallel measures ... the proportion of total variance that is error variance ...
From page 176...
... thumb for I_i also apply to both I_1 and I_2, which are also confined to the unit interval. Table 5-5 shows the I_1 and I_2 values for ...
NOTE: Excludes firms reporting more than 1.4 million employees, establishments covered in Type 6 reports, and SROP cells with missing values.
From page 177...
... Comparing the results in Table 5-6 with those in Table 5-3, the proportion of SRO cells that are non-zero in both years of the Component 2 data collection is about 12 percentage points lower than the proportion of SRO cells that are non-zero across Components 1 and 2 data. ... the increase is expected given the one-year gap between reporting periods ...
From page 178...
... NOTE: Excludes firms reporting more than 1.4 million employees, establishments covered in Type 6 reports, and SROP cells with missing values. This table omits cells that were zero in ...
From page 179...
... (%)
Administration Mode
  Online-Entry         –1.79     –2.20
  Data-Upload          –0.49     –3.19
Establishment Size
  Fewer than 100       –1.12     –3.32
  100–249               0.32     –1.64
  250–499               1.57     –0.83
  500–999               1.85     –1.24
  1,000 or More         9.18     –4.44
Overall                –0.75     –3.08
SOURCE: Panel generated from Component 2 employer, establishment, and employee files for 2017 and 2018.
From page 180...
... SOURCE: Panel generated from Component 2 employer, establishment, and employee files for 2017 and 2018.
NOTE: Excludes firms reporting more than 1.4 million employees, establishments covered in Type 6 reports, and SROP cells with missing values.
From page 181...
... highly varying SRO sizes, which may better reflect overall data quality than I_1, which does not account for an establishment's contribution to total variance. Other ... possible and could be explored in a more ... Note that the rules of thumb for I_i also apply to both I_1 and I_2, which are also confined to the unit interval.
From page 182...
... Thus, I_2 is weighted toward larger ... highly varying SRO sizes, which may better reflect overall data quality than I_1 ... account for an establishment's contribution to total variance. Other ... possible and could be explored in a more extended ...
Table fragment (column headings not recoverable): 500–999 ... 0.03 ...; 1,000 or More ... 0.05 ... 0.55; Overall ... 0.13 ... 0.58
From page 183...
... NOTE: Excludes firms reporting more than 1.4 million employees, establishments covered in Type 6 reports, establishments with 0 employees, and SROP cells with missing values.
Figure 5-4 plots the number of employees for linked establishments reporting both the Component 1 and 2 data collections (i.e., y_1i(E)
From page 184...
... distribution of number of establishments and employees across administration mode and ... strata. Third, the percentage of establishments deleted by the filtering ... the quality indicators established in the previous sections of this chapter ...
From page 185...
... NOTE: Excludes firms reporting more than 1.4 million employees, establishments covered in Type 6 reports, and SROP cells with missing values. Filtered data also exclude Component 2 data that show either: (1)
From page 186...
... a rule of thumb for interpreting the index of inconsistency for ...: 0 ≤ I_i ≤ 0.20 is good, 0.21 ≤ I_i ≤ 0.50 is moderate, and I_i ≥ 0.51 is poor.
NOTE: ... covered in Type 6 reports, establishments with 0 employees, and SROP cells with missing values. Filtered data also exclude Component 2 data ...
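The rule of thumb for the index of inconsistency quoted on this page maps directly onto a small lookup. The sketch below uses a hypothetical function name; it assumes the index is first rounded to two decimal places, since the quoted bands (0.20 and 0.21, 0.50 and 0.51) leave values in between unassigned.

```python
def rate_inconsistency(index):
    """Classify an index of inconsistency using the quoted rule of thumb."""
    if not 0.0 <= index <= 1.0:
        raise ValueError("index of inconsistency is expected in the unit interval")
    rounded = round(index, 2)  # assumption: bands are defined at two decimals
    if rounded <= 0.20:
        return "good"
    if rounded <= 0.50:
        return "moderate"
    return "poor"


for value in (0.13, 0.35, 0.58):
    print(value, rate_inconsistency(value))
```

Under these bands, an index of 0.13 rates as good and an index of 0.58 as poor.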
From page 187...
... 2018 RD (%)
Administration Mode
  Online-Entry         –1.89     –2.64     –0.81     –1.55
  Data-Upload          –0.30     –1.30     –0.09     –1.08
Size Distribution
  Fewer than 100        0.68     –0.34      0.68 ...
... Component 1 versus Component 2 data)
From page 188...
... NOTE: Excludes firms reporting more than 1.4 million employees, establishments covered in Type 6 reports, and SROP cells with missing values. Filtered data also exclude Component 2 data that show ...
Table 5-5 shows the I_1 and I_2 values for unweighted comparisons of 2017 and 2018 data.
From page 189...
... weighted toward larger establishments ... highly varying SRO sizes, which may better reflect overall data quality than I_1, which does not account for an establishment's contribution to total variance. Other ... possible and could be explored in a more extended analysis of ... thumb for I_i also apply to both ...
From page 190...
... NOTE: Excludes firms reporting more than 1.4 million employees, establishments covered in Type 6 reports, establishments with no employees, and SROP cells with missing values. Filtered data also exclude Component 2 data that show either: (1)
From page 191...
... Thus, I_2 is weighted toward larger establishments ... highly varying SRO sizes, which may better reflect overall data quality than I_1, which ... account for an establishment's contribution to total variance. Other ... possible and could be explored in a more extended analysis of ...
Table fragment (column headings not recoverable): ... 0.03 ... 0.05 ... 0.05 ... 0.55 ... 0.02 ... 0.07; 1,000 or More ... 9.18 ... 0.93 ...; Overall ... –0.75 ... 0.13 ... 0.58 ... –0.73 ... 0.13
From page 192...
... a more extended analysis ... Note that the rules of thumb for I_i also apply to both I_1 and I_2, which are also confined to the unit interval. Table 5-5 shows the values ...
SOURCE: Panel generated Component 2 employer, establishment, and employee files for 2017 and 2018.
NOTE: Excludes firms reporting more than 1.4 million employees, establishments covered ...
From page 193...
... NOTE: Excludes firms reporting more than 1.4 million employees, establishments covered in Type 6 reports, establishments with 0 employees, and SROP cells with missing values. Filtered data also exclude Component 2 data that show either: (1)
From page 194...
... also found extremely large error variances, particularly within the smallest and largest establishments. Plotting the establishment sizes from Component 1 and 2 data against one another revealed that less than one percent of all establishments in Component 2 data reported highly implausible numbers.
From page 195...
... While reliability was not directly evaluated for pay data, results for both numbers of employees and hours worked have important implications for assigning hourly pay rates to employees. Erroneous employee counts and hours worked for an SROP cell result in erroneous estimates of hourly pay calculated from these data.
From page 196...
... Filtering on employee counts and on hours worked would be beneficial, but some issues would be best addressed by modifying the basic data-collection methodology.

RECOMMENDATION 5-2: Before future collection of Component 2 data, EEOC should conduct a field test to investigate issues of burden, data availability, and instrument design.
From page 197...
... CHAPTER APPENDIXES

APPENDIX 5-1 Percent of Data Present for Hours Worked and Employment in SROP Cells

Column key: (a) >0 Employees; >0 Hours Worked. (b) >0 Employees; 0 or Missing Hours Worked. (c) 0 or Missing Employees; >0 Hours Worked.

Firm, Establishment, and
Employee Characteristic       2017 (a)  2017 (b)  2017 (c)  2018 (a)  2018 (b)  2018 (c)
Administration Mode
  Online-Entry                    74.0      26.5         #      73.8      26.5         #
  Data-Upload                     26.0      73.5     100.0      26.2      73.5     100.0
Establishment Size
  Less than 100                   45.8      41.0      40.4      45.8      40.6      41.2
  100–249                         27.1      31.8      36.5      27.0      32.2      36.0
  250–499                         13.3      13.9      13.6      13.4      13.4      13.4
  500–999                          6.9       6.2       5.0       7.0       6.5       5.1
  1,000 or More                    6.9       7.1       4.1       6.9       7.3       3.8
  Missing or Invalid               0.0       0.0       0.4       0.0       0.0       0.4
Job Category
  Executive                        2.6       6.9       8.0       2.6       6.7       7.9
  First/Midlevel                  15.9      15.2      17.6      15.7      15.2      17.4
(continued)
From page 198...
... APPENDIX 5-1  Continued

Column key: (a) >0 Employees; >0 Hours Worked. (b) >0 Employees; 0 or Missing Hours Worked. (c) 0 or Missing Employees; >0 Hours Worked.

Firm, Establishment, and
Employee Characteristic       2017 (a)  2017 (b)  2017 (c)  2018 (a)  2018 (b)  2018 (c)
  Professionals                   17.5      16.9      16.5      17.7      16.9      17.1
  Technicians                      6.8       6.3       7.0       6.7       6.7       7.2
  Sales Workers                   13.1       9.3       8.2      12.9       9.1       8.2
  Administrative Support          14.7      15.0      15.2      14.6      15.3      15.3
  Craft Workers                    4.9       5.3       5.8       4.9       5.4       5.7
  Operatives                       7.7       7.6       6.9       7.7       7.5       6.6
  Laborers and Helpers             5.4       7.0       6.7       5.4       6.8       6.7
  Service Workers                 11.5      10.5       8.1      11.7      10.4       7.9
Establishment Quality Flag
  Red                              0.3       0.6       2.2       0.2       0.5       2.2
  Orange                           5.1      77.1      56.0       4.7      75.8      51.9
  Green                           94.5      22.4      41.8      95.1      23.7      45.9
Overall                          100.0     100.0     100.0     100.0     100.0     100.0

SOURCE: Panel generated Component 2 employer, establishment, and employee files for 2017 and 2018.
NOTE: # Indicates rounds to zero.
From page 199...
... APPENDIX 5-2 Percentage of Firms, Establishments, and Cells with Each Flag Status (2018)

Column key: F = Firms; E = Establishments Within Firms; C = Cells Within Establishments.

Firm, Establishment, and
Employee Characteristic         F-Red F-Orange  F-Green    E-Red E-Orange  E-Green    C-Red C-Orange C-Yellow  C-Green
Administration Mode
  Online-Entry                   23.2     26.6     40.6     68.6     34.4     80.9     58.0     71.9     26.5     73.1
  Data-Upload                    76.8     73.4     59.4     31.4     65.6     19.1     42.0     28.1     73.5     26.9
Establishment Size
  Less than 100                   0.0        #      0.1     77.2     75.0     84.9     39.8     23.8     40.6     45.8
  100–249                        50.0     36.0     55.3     15.6     15.8     10.2     28.5     20.1     32.2     27.1
  250–499                        23.9     22.3     22.7      4.9      4.7      3.1     15.1     17.3     13.4     13.4
  500–999                        14.6     15.1     11.0      1.2      2.0      1.1      7.0     10.9      6.5      6.9
  1,000 or More                  11.4     26.6     10.9      1.2      2.5      0.7      9.6     27.9      7.3      6.9
  Missing or Invalid              0.0      0.0      0.0      0.0      0.0        #      0.0      0.0      0.0        #
Job Category
  Executive                         –        –        –        –        –        –      2.7      2.6      6.7      2.7
  First/Midlevel                    –        –        –        –        –        –     13.4      5.7     15.2     15.8
  Professionals                     –        –        –        –        –        –     17.8     16.6     16.9     17.7
  Technicians                       –        –        –        –        –        –      6.3     22.3      6.7      6.7
  Sales Workers                     –        –        –        –        –        –      7.6     11.9      9.1     12.9
  Administrative Support            –        –        –        –        –        –      9.9     13.4     15.3     14.6
(continued)
From page 200...
... APPENDIX 5-2  Continued

Column key: F = Firms; E = Establishments Within Firms; C = Cells Within Establishments.

Firm, Establishment, and
Employee Characteristic         F-Red F-Orange  F-Green    E-Red E-Orange  E-Green    C-Red C-Orange C-Yellow  C-Green
  Craft Workers                     –        –        –        –        –        –      6.4     11.4      5.4      4.9
  Operatives                        –        –        –        –        –        –     13.3      1.5      7.5      7.7
  Laborers and Helpers              –        –        –        –        –        –      6.3      6.2      6.8      5.4
  Service Workers                   –        –        –        –        –        –     16.3      8.4     10.4     11.7
Establishment Quality
  Red                               –        –        –    100.0      0.0      0.0     57.2      4.3      0.5      0.1
  Orange                            –        –        –      0.0    100.0      0.0     42.8     46.6     75.8      5.1
  Green                             –        –        –      0.0      0.0    100.0      0.0     49.0     23.7     94.8
Overall                         100.0    100.0    100.0    100.0    100.0    100.0    100.0    100.0    100.0    100.0

SOURCE: Panel generated Component 2 employer, establishment, and employee files for 2017 and 2018.
NOTE: # Indicates rounds to zero.
From page 201...
... the match rates on how successfully the two datasets can be matched based on the ID variables and additional information (e.g., addresses, NAICS code, and zip code)
From page 202...
... Component 2 data files for 2017 and 2018
Numerator: the number of establishments that can be matched across the two years
Denominator 1: the number of establishments that appeared in the 2017 Component 2 data
Denominator 2: the number of establishments that appeared in the 2018 Component 2 data
a The Type 6 reports and establishments in the firms that failed the Walmart rule (i.e., with a size larger than Walmart) are excluded in the calculation of the match rate.
From page 203...
... Establishments with unique matches across the two years then can be merged.

Match Rate
Match rate among establishments, after excluding outliers (based on the Walmart rule)
From page 204...
... 5) Merge based on unique matches of HDQ_NBR+ZIPCODE+NAICS: establishments that failed to be matched in the steps above were matched based on the zip code from the standardized addresses and the NAICS code (all six digits)
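A merge on unique composite-key matches, as in the step above, can be sketched without any external library. This is an illustration only: the function name and the record layout (dictionaries keyed by the field names HDQ_NBR, ZIPCODE, and NAICS taken from the step heading) are assumptions, and the panel's actual matching code is not shown in the source.

```python
from collections import Counter


def unique_key_matches(left, right, key_fields=("HDQ_NBR", "ZIPCODE", "NAICS")):
    """Pair records whose composite key occurs exactly once on each side."""
    def key(record):
        return tuple(record[field] for field in key_fields)

    left_counts = Counter(key(r) for r in left)
    right_counts = Counter(key(r) for r in right)
    right_by_key = {key(r): r for r in right}

    pairs = []
    for record in left:
        k = key(record)
        # Keep only keys that are unambiguous in both files.
        if left_counts[k] == 1 and right_counts.get(k, 0) == 1:
            pairs.append((record, right_by_key[k]))
    return pairs


# Hypothetical records for illustration
file_a = [
    {"HDQ_NBR": "H1", "ZIPCODE": "20001", "NAICS": "445110", "id": "a1"},
    {"HDQ_NBR": "H2", "ZIPCODE": "30301", "NAICS": "722511", "id": "a2"},
    {"HDQ_NBR": "H2", "ZIPCODE": "30301", "NAICS": "722511", "id": "a3"},
]
file_b = [
    {"HDQ_NBR": "H1", "ZIPCODE": "20001", "NAICS": "445110", "id": "b1"},
    {"HDQ_NBR": "H2", "ZIPCODE": "30301", "NAICS": "722511", "id": "b2"},
]
print([(a["id"], b["id"]) for a, b in unique_key_matches(file_a, file_b)])
```

Requiring uniqueness on both sides avoids linking an establishment to the wrong counterpart when several establishments of one firm share a zip code and industry code.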
From page 205...
... The overall match rate is 66.5 percent, which equals 597,361 divided by 897,770.
Match rate in 2018 among establishments, after excluding large outliers (based on the Walmart rule)
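The 66.5 percent overall match rate quoted above is a simple ratio of matched establishments to the denominator count. A short check, using a hypothetical helper name:

```python
def match_rate(matched, total):
    """Match rate in percent: matched establishments over the denominator count."""
    if total <= 0:
        raise ValueError("denominator must be positive")
    return 100.0 * matched / total


overall = match_rate(597_361, 897_770)
print(f"{overall:.1f} percent")  # 66.5 percent
```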
From page 206...
... Match Rate
Match rate among establishments, after excluding large outliers (based on the Walmart rule) and the Type 6 reports
Among the 597,361 establishments that can be matched across the two components in 2017, 527,379 (88.3%)
From page 207...
... APPENDIX 5-4 Inconsistent Zeros for Number of Employees Comparing Component 2 Data for 2017 and 2018 at the SRO Level

All columns refer to SRO cells with >0 Employees in 2017 only, in 2018 only, or in both years.

Firm and Establishment        Number:                               Percent:
Characteristic                   2017 Only   2018 Only  Both Years   2017 Only   2018 Only  Both Years
Administration Mode
  Online-Entry                     243,256     277,659   1,482,817        12.1        13.9        74.0
  Data-Upload                      849,513     896,738   4,498,270        13.6        14.4        72.0
Establishment Size
  Fewer than 100                   729,700     789,222   3,290,336        15.2        16.4        68.4
  100–249                          231,687     245,237   1,543,519        11.5        12.1        76.4
  250–499                           85,843      89,616     658,356        10.3        10.7        79.0
  500–999                           29,070      31,948     276,911         8.6         9.5        81.9
  1,000 or More                     16,469      18,302     211,965         6.7         7.4        85.9
Establishment Quality
  Red                                3,401       4,052      18,288        13.2        15.7        71.0
  Orange                            71,705      77,498     361,755        14.0        15.2        70.8
  Green                          1,017,663   1,092,847   5,601,044        13.2        14.2        72.6
Overall                          1,092,769   1,174,397   5,981,087        13.2        14.2        72.5

SOURCE: Panel generated from Component 2 employer and establishment files for 2017 and 2018.

