**Suggested Citation:** "Chapter 4 - Guidelines for Statistical Sampling." National Academies of Sciences, Engineering, and Medicine. 2017.

*Performance-Related Specifications for Pavement Preservation Treatments*. Washington, DC: The National Academies Press. doi: 10.17226/24945.

Below is the uncorrected machine-read text of this chapter, intended to provide our own search engines and external engines with highly rich, chapter-representative searchable text of each book. Because it is UNCORRECTED material, please consider the following text as a useful but insufficient proxy for the authoritative book pages.

A PRS must be supplemented by a sampling method that balances efficiency with contractor and agency risks. As noted in Chapter 2, accepting poor construction quality is sometimes considered. The decision to accept or reject a lot's construction quality is based on an inference drawn from the statistical parameters of the candidate AQC measured in the field. Usually a few samples are tested in the field and the PWL value is calculated. A contractor may be paid an incentive if the measured quality (PWL value) is better than the acceptable quality level (AQL), or penalized if the measured quality is below the AQL. The PWL value of a lot depends on the sampling method and the sample size. Although the risks of accepting poor construction quality or rejecting good quality cannot be eliminated, they can be reduced by varying the sampling method and the sample size. This chapter discusses ways to determine the effect of sampling methods and sample sizes on the PWL estimate of a lot and on the risks to the agency and the contractor.

4.1 Types of Acceptance Sampling Plans

Inspection of construction materials or finished pavement layers is one aspect of quality assurance. Inspection performed for acceptance or rejection of a finished product (based on specifications) is called acceptance sampling. In highway construction, acceptance plans are based on lot-by-lot acceptance sampling. However, acceptance sampling is not a substitute for construction or production monitoring and control, or for the use of statistical methods to reduce variability. The effective use of these techniques early in the construction process can greatly reduce, and in some cases eliminate, the need for extensive sampling inspections (Montgomery 2013).

For a typical application of acceptance sampling, the highway agency receives from a contractor a notice of completion of, for example, an HMA surface layer for a lot.
The agency needs to make sure that the construction quality falls within a specified range of quality characteristics (e.g., air voids in the finished asphalt layer between 4% and 7%). Therefore, samples are taken from the sublots to measure the as-constructed AQCs (e.g., percent air voids). A decision regarding lot disposition is made based on the information obtained from these samples. Generally, the decision is either to accept or to reject the lot. Sometimes this decision is referred to as lot sentencing. Although the contractor is paid in full for accepted lots, rejected lots should be fixed, or the contractor may be subjected to some other lot disposition action, such as reduced payment.

The purpose of acceptance sampling is to sentence lots, not to estimate their quality. Acceptance sampling plans do not provide any direct form of quality control; they are used simply to accept or reject lots. Although all lots may be of the same quality, sampling will accept some lots and reject others. Process controls during construction (not the acceptance sampling plans) are used to control and systematically improve quality. Thus, acceptance sampling is used as an audit tool to ensure that the output of a process conforms to specifications.

For acceptance sampling, typically three approaches can be used for lot sentencing: (1) accept with no inspection; (2) 100% inspection; and (3) acceptance sampling (Scott et al. 2014a; Montgomery 2013; Fugro Consultants, Inc., and Arizona State University 2011). The no-inspection alternative is useful where either the contractor's construction process is so good that defective units are almost never encountered or where there is no economic justification to look for defective units. In general, 100% inspection can be used in situations where the component is extremely critical and passing any defectives would result in an unacceptable consequence, or where the contractor's process capability is inadequate to meet specifications. Acceptance sampling is most likely to be beneficial in the following situations (Montgomery 2013):

- Testing is destructive (i.e., more resources are required to acquire and test samples).
- The cost of 100% inspection is extremely high or not feasible.
- The inspection error rate is so high that 100% inspection might lead to the acceptance of a higher percentage of defective units than would occur with the use of a sampling plan.
- The contractor has an excellent quality history, and some reduction in inspection from 100% is desired.
- There are potentially serious product liability risks, and a program for continuously monitoring the product is necessary.

When compared with other approaches, acceptance sampling has the following advantages (Montgomery 2013):

- It is usually less expensive because there is less inspection.
- There is less handling of the product, hence reduced damage.
- It is applicable to destructive testing.
- It involves fewer personnel in inspection activities.
- It reduces the amount of inspection error.
- It provides a stronger motivation to the contractor for quality improvements because it rejects an entire lot as opposed to requiring a corrective action of defective sublots.

However, acceptance sampling also has the following disadvantages (Montgomery 2013):

- It presents the risks of accepting "bad" lots (agency's risk) or rejecting "good" lots (contractor's risk).
- It usually generates limited information about the finished product or the process used for making it.
- It requires planning and documentation of the acceptance sampling procedure.

Acceptance sampling plans can be classified by different schemes, often by attributes and variables. Attributes are quality characteristics expressed on a pass or fail basis; variables are quality characteristics measured on a numerical scale (National Academies of Sciences, Engineering, and Medicine 2012, TRB Committee on Management of Quality Assurance 2002).

Attribute Sampling Plan

Attribute sampling plans are typically used in manufacturing industries, where a lot is inspected by taking samples and determining whether each sample passes or fails. The sentencing is based on an AQC of interest. Such an acceptance plan can be used for pavement construction by measuring the binder content in an asphalt mixture. A cut-off value of binder content can be set to sentence a sample. For example, if a sample has a binder content of less than 4.5%, it fails; otherwise it passes. The number of sample failures is restricted if the lot is to be accepted (e.g., no more than 5 of 50 samples can fail).

Suppose that a lot of size N has been submitted for inspection. A single-sampling plan is defined by the sample size n and the acceptance number c. Thus, if the lot size is N = 10,000, then the sampling plan n = 50, c = 5 means that from a lot of size 10,000, a random sample of n = 50 units is inspected and the number of nonconforming or defective items d is observed. If d is less than or equal to c = 5, the lot will be accepted. If d is greater than 5, the lot will be rejected. Since the quality characteristic inspected is an attribute, each unit in the sample is judged to be either conforming or nonconforming. One or several attributes can be inspected in the same sample; generally, a unit that is nonconforming to specifications on one or more attributes is said to be a defective unit. This procedure is called a single-sampling plan because the lot is sentenced based on the information contained in one sample of size n (Montgomery 2013).

An essential performance measure of an acceptance sampling plan is the operating characteristic (OC) curve. This curve plots the probability of accepting the lot versus the lot fraction defective. Thus, the OC curve displays the discriminatory power of the sampling plan (i.e., the probability that a lot with a certain fraction defective will be either accepted or rejected). The OC curve of the sampling plan n = 50, c = 5 is shown in Figure 4-1. The probability of observing exactly d defectives can be estimated by using the binomial distribution as follows, where p is the lot fraction defective:

$$P(d\ \text{defectives}) = \frac{n!}{d!\,(n-d)!}\; p^{d}\,(1-p)^{n-d} \qquad \text{(Eq. 4-1)}$$

The probability of acceptance can be calculated as the probability that d is less than or equal to c as follows:

$$P(A) = P(d \le c) = \sum_{d=0}^{c} \frac{n!}{d!\,(n-d)!}\; p^{d}\,(1-p)^{n-d} \qquad \text{(Eq. 4-2)}$$

The OC curve can be developed by using Equation 4-2 for various values of p. In attribute sampling plans, the OC curve shows the discriminatory power of the plan. For example, in the sampling plan n = 50, c = 5, if the lots are 10% defective, the probability of acceptance is approximately 0.62 (i.e., of 100 lots from a process that manufactures 10% defective product, about 62 will be accepted and 38 will be rejected).

[Figure 4-1. OC curves for attribute sampling plan. Vertical axis: P(A), 0% to 100%; horizontal axis: percent defective, 0.00 to 0.50; curves for n = 10, n = 25, and n = 50.]

A sampling plan that discriminates perfectly between good and bad lots would have an OC curve that looks like the solid line in Figure 4-1. The OC curve runs horizontally at a probability of acceptance P(A) = 1.00 until a level of lot quality that is considered "bad" is reached, at which point the curve drops vertically to a probability of acceptance P(A) = 0.00, and then the curve runs horizontally again for all lot fraction defectives greater than the undesirable level (10% in this example). Figure 4-1 also shows the effect of sample size n on the OC curve: the curve approaches the idealized shape as the sample size increases for a constant c. Thus, the precision with which a sampling plan differentiates between good and bad lots, or pass and fail lots, increases as the size of the sample is increased. The greater the slope of the OC curve, the greater the discriminatory power (Montgomery 2013).

Typically, the quality engineer will focus on certain points on the OC curve. The seller or contractor is usually interested in knowing what level of process quality would yield a high probability of acceptance. For example, a contractor might be interested in the 0.95 probability of acceptance point. This would indicate the level of process fallout that could be experienced and still have a 95% chance that the lots would be accepted. Conversely, a buyer or agency might be interested in the level of lot or process quality that will yield a low probability of acceptance (i.e., the end of the OC curve). Therefore, an agency will often establish a sampling plan with reference to an acceptable quality level (AQL).
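The acceptance probability in Equation 4-2 is a binomial cumulative sum, so the OC curve for a single-sampling plan can be checked numerically. The following is a minimal sketch, assuming only the binomial model stated in the text; the function names `prob_accept` and `oc_curve` are illustrative and do not come from the source.

```python
from math import comb


def prob_accept(n: int, c: int, p: float) -> float:
    """P(A) from Eq. 4-2: probability that d <= c when d ~ Binomial(n, p)."""
    return sum(comb(n, d) * p**d * (1 - p) ** (n - d) for d in range(c + 1))


def oc_curve(n: int, c: int, fractions):
    """Return (p, P(A)) pairs for a list of lot fractions defective."""
    return [(p, prob_accept(n, c, p)) for p in fractions]


if __name__ == "__main__":
    # Plan from the text: n = 50, c = 5.
    for p, pa in oc_curve(50, 5, [0.02, 0.05, 0.10, 0.15, 0.20]):
        print(f"p = {p:.2f}  P(A) = {pa:.3f}")
```

For the plan n = 50, c = 5 at p = 0.10, this sum evaluates to roughly 0.62, in line with the value quoted in the text. Repeating the calculation for n = 10 and n = 25 with the same c shows how the curve steepens as n grows.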
The AQL represents the poorest level of quality for the seller's process that the buyer would consider acceptable as a process average. Therefore, an agency will often design the sampling procedure so that the OC curve gives a high probability of acceptance at the AQL. The AQL is simply a standard against which to judge the lots. An agency will also be interested in the other end of the OC curve to protect against acceptance of a poor-quality lot. In such a situation, the agency may establish a rejectable quality level (RQL). The RQL, also called the lot tolerance percent defective, is the poorest level of quality that the buyer or agency is willing to accept in an individual lot. It is not a characteristic of the sampling plan, but a level of lot quality specified by the agency. It is possible to design acceptance sampling plans that give specified probabilities of acceptance at the RQL point. The two unknown quantities required to specify an attribute acceptance sampling plan (the sample size n and the acceptance number c) can be determined from the simultaneous Equations 4-3 and 4-4.

$$1 - \alpha = \sum_{d=0}^{c} \frac{n!}{d!\,(n-d)!}\; (AQL)^{d}\,(1-AQL)^{n-d} \qquad \text{(Eq. 4-3)}$$

$$\beta = \sum_{d=0}^{c} \frac{n!}{d!\,(n-d)!}\; (RQL)^{d}\,(1-RQL)^{n-d} \qquad \text{(Eq. 4-4)}$$

where
- $\alpha$ = seller's or contractor's risk
- $\beta$ = buyer's or agency's risk
- d = number of defective sublots
- n = sample size

Variable Sampling Plan

Generally, statistical quality control in the highway industry uses a variable sampling plan because the AQCs used are mostly measured on a numerical scale. With variable sampling plans, the same OC curve can be obtained with a smaller sample size than would be required by an attributes sampling plan. Thus, a variables acceptance sampling plan that provides the same protection as an attributes acceptance sampling plan would require less sampling (Montgomery 2013, National Academies of Sciences, Engineering, and Medicine 2012). This sampling plan will reduce the costs of inspection when destructive testing is employed. In addition, measurement data usually provide more information about the construction process or the lot than do attributes data. Generally, numerical measurements of quality characteristics are more useful than simple classification of the item as defective or non-defective. Also, when AQLs are very small, a very large sample size is required for attributes sampling plans, and it would be advantageous to switch to variables measurement.

The most important advantage of a variable acceptance plan is its use in calculating a quality measure like PWL and facilitating the implementation of pay factors. Therefore, the adoption of a variable sampling plan requires the use of an AQC that is measured on a continuous numerical scale. For variable sampling plans, the distribution of the AQC must be known; most acceptance plans assume a normal distribution of the AQC. If the distribution is not normal, but a normal distribution is assumed, substantial deviations from the advertised risks of accepting or rejecting lots of a given quality may be experienced. Also, variable sampling needs a separate sampling plan for each AQC (i.e., if an item is inspected for three AQCs, three separate variable inspection sampling plans would be required).

Variable sampling plans are used to calculate a quality measure (QM) that provides a measure of population parameters related to quality and thus requires the mean and standard deviation estimates, especially for AQCs with both upper and lower specification limits. Using only the average as a QM is less effective than also measuring the variability (Burati et al. 2004, Burati et al. 2003) because the average tends to balance low and high test values to meet specifications. Such practices may result in a multimodal population with multiple modes or peaks and increased variability. In addition, using only the average does not lend itself to quantifying risks because it reduces population variability. The literature (Burati et al. 2004, Burati et al. 2003) indicates that the PWL or percent defective (PD) is the most often used QM. This QM is used to develop the proposed guidelines.

The PWL value of a lot is calculated using the quality index (Q value) of the specification limits. The Q-statistics are calculated using Equations 4-5 and 4-6.

$$Q_L = \frac{\bar{x} - LSL}{s} \qquad \text{(Eq. 4-5)}$$

$$Q_U = \frac{USL - \bar{x}}{s} \qquad \text{(Eq. 4-6)}$$

where
- $Q_L$ = quality index for the lower specification limit
- $Q_U$ = quality index for the upper specification limit
- LSL = lower specification limit
- USL = upper specification limit
- $\bar{x}$ = the sample mean for the lot
- s = sample standard deviation for the lot

The lower and upper Q values are used to find the corresponding PWL values from tables (Burati et al. 2003) or from a beta distribution (Willenbrock and Kopac 1978). For two-sided specification limits, the total PWL is estimated by using Equation 4-7:

$$PWL_T = PWL_U + PWL_L - 100 \qquad \text{(Eq. 4-7)}$$

Figure 4-2 shows the PWL for double- and single-sided (upper or lower specification) limits (calculations are provided in Chapter 6).
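As a rough illustration of Equations 4-5 through 4-7, the sketch below computes the Q values from a set of test results and converts each to a PWL estimate. The conversion uses the symmetric beta distribution with both shape parameters equal to n/2 - 1, which is the form commonly used to generate PWL tables; whether it reproduces the cited tables digit for digit is an assumption, and the test values and function names are hypothetical.

```python
from math import asin, cos, gamma, sin, sqrt
from statistics import mean, stdev


def _sym_beta_cdf(x: float, a: float, steps: int = 400) -> float:
    """CDF of Beta(a, a) at x, integrated with Simpson's rule.

    The substitution x = sin^2(theta) removes the endpoint singularity
    that appears when a < 1 (i.e., when n = 3). `steps` must be even.
    """
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    theta_max = asin(sqrt(x))
    h = theta_max / steps

    def f(t: float) -> float:
        return 2.0 * sin(t) ** (2 * a - 1) * cos(t) ** (2 * a - 1)

    total = f(0.0) + f(theta_max)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(i * h)
    beta_fn = gamma(a) ** 2 / gamma(2 * a)  # B(a, a)
    return (h / 3.0) * total / beta_fn


def pwl_one_sided(q: float, n: int) -> float:
    """PWL estimate for one specification limit from a Q value and sample size n."""
    if n < 3:
        raise ValueError("this estimator needs at least three test results")
    a = n / 2 - 1
    x = 0.5 - q * sqrt(n) / (2 * (n - 1))
    x = min(1.0, max(0.0, x))  # Q beyond its attainable range maps to 0 or 100 PWL
    return 100.0 * (1.0 - _sym_beta_cdf(x, a))


def pwl_two_sided(values, lsl: float, usl: float) -> float:
    """Total PWL from Eq. 4-7, using Q_L and Q_U from Eqs. 4-5 and 4-6."""
    n = len(values)
    x_bar, s = mean(values), stdev(values)
    q_l = (x_bar - lsl) / s
    q_u = (usl - x_bar) / s
    return pwl_one_sided(q_u, n) + pwl_one_sided(q_l, n) - 100.0


if __name__ == "__main__":
    air_voids = [4.1, 5.3, 6.0, 4.8, 5.6]  # hypothetical test results, %
    print(pwl_two_sided(air_voids, lsl=2.0, usl=8.0))
```

A Q value of zero means the sample mean sits exactly on the limit, so the estimate is 50 PWL for any n. Larger Q values push the estimate toward 100.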

[Figure 4-2. Illustration of PWL and PD for different specification limits: (a) PWL based on double-sided specification limits; (b) PWL based on lower specification limits; (c) PWL based on upper specification limits.]

PWL estimates depend on sample sizes and sampling methods. Methods for estimating sample sizes and types of sampling methods are discussed next.

4.2 Estimating Sample Sizes

Sample sizes can be determined based on (1) standard error of the mean, (2) estimation of the lot mean, or (3) hypothesis tests on the lot mean.

Standard Error of the Mean

This method estimates the sample size as the n at which the standard error of the mean of the sample has stabilized (i.e., any further increase in the sample size would have a negligible effect on the standard error of the mean). Figure 4-3 (Cho et al. 2011) illustrates the standard error of the mean versus sample sizes and shows that the standard error of the mean stabilizes around a sample size of 20. This method requires an estimate of the population standard deviation, which can be obtained from historical data.

[Figure 4-3. Standard error of the mean versus sample sizes (Cho et al. 2011). Vertical axis: standard error of the mean (%), 0.00% to 0.18%; horizontal axis: sample size, 0 to 25.]

Estimation of the Lot Mean

When n samples are measured from a lot with true mean AQC $\mu$, the sample mean ($\bar{y}$) and standard deviation (s) of the samples can be calculated. Based on the central limit theorem (Freund et al. 2010), $\bar{y}$ will be normally distributed with mean $\mu$ and standard error $\sigma/\sqrt{n}$. Based on the area under a normal curve, the interval $\mu \pm 1.96\,\sigma/\sqrt{n}$ contains 95% of the $\bar{y}$ values in repeated sampling. In other words, the interval $\bar{y} \pm 1.96\,\sigma/\sqrt{n}$ contains the population mean $\mu$ 95% of the time. So $\bar{y} \pm 1.96\,\sigma/\sqrt{n}$ is an interval estimate of $\mu$ with a level of confidence of 95%. There are many confidence intervals for $\mu$, depending on the level of significance. A confidence level of $(1 - \alpha)$ can be used to calculate the confidence interval by using Equation 4-8.

$$CI = \bar{y} \pm Z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} = \bar{y} \pm E \qquad \text{(Eq. 4-8)}$$

where
- CI = confidence interval
- $\bar{y}$ = sample mean
- $Z_{\alpha/2}$ = value of z having a tail area $\alpha/2$ to its right
- $\sigma$ = population standard deviation
- n = sample size
- E = $Z_{\alpha/2}\,\sigma/\sqrt{n}$

The sample size can be estimated by using Equation 4-9. Equation 4-10 can be used instead if the population standard deviation is not known and a sample is used to estimate it. The tolerable error can be assumed based on engineering judgment. For example, if the standard deviation of air void content is 2% for a lot, then estimating the true mean of the lot with a tolerable error of 1% air void content at a 95% confidence level gives $(1.96 \times 2 / 1)^2 \approx 15.4$, so a sample size of 16 is needed.

$$n = \frac{\left(Z_{\alpha/2}\right)^{2} \sigma^{2}}{E^{2}} \qquad \text{(Eq. 4-9)}$$

where
- $Z_{\alpha/2}$ = value of Z having a tail area $\alpha/2$ to its right
- $\sigma$ = population standard deviation
- E = tolerable error
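The air-void example for Equation 4-9 can be reproduced in a few lines. This is a minimal sketch using the standard normal quantile from Python's standard library; the function name `sample_size_for_mean` is illustrative, and rounding up to the next whole sample is assumed.

```python
from math import ceil
from statistics import NormalDist


def sample_size_for_mean(sigma: float, tolerable_error: float,
                         confidence: float = 0.95) -> int:
    """Sample size from Eq. 4-9: n = (Z_{alpha/2} * sigma / E)^2, rounded up."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for 95% confidence
    return ceil((z * sigma / tolerable_error) ** 2)


if __name__ == "__main__":
    # Example from the text: sigma = 2% air voids, E = 1% air voids.
    print(sample_size_for_mean(sigma=2.0, tolerable_error=1.0))
```

Because n scales with the inverse square of E, halving the tolerable error roughly quadruples the required sample size.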

$$n = \frac{\left(t_{\alpha/2}\right)^{2} s^{2}}{E^{2}} \qquad \text{(Eq. 4-10)}$$

where
- $t_{\alpha/2}$ = value of t having a tail area $\alpha/2$ to its right
- s = sample standard deviation

Hypothesis Test on Lot Mean

Another method of estimating the sample size is the hypothesis test on the difference between the sample mean and the desired mean of a lot. In this procedure, the sample mean is compared to the desired lot mean to determine if it is within the specified limits. The sample mean AQC is represented by $\bar{y}$ and the desired lot mean AQC is designated by $\mu_0$. Equation 4-11 can be used to test if the estimated sample mean $\bar{y}$ is within the desired confidence interval of $\mu_0$.

$$z = \frac{\bar{y} - \mu_0}{\sigma/\sqrt{n}} \qquad \text{(Eq. 4-11)}$$

where
- z = z value
- $\bar{y}$ = sample mean
- $\mu_0$ = desired mean
- $\sigma$ = population standard deviation
- n = sample size

The calculated z value from Equation 4-11 is compared with the critical z value for a specified confidence level (e.g., 95%). For example, for a confidence level of 0.95, the critical z value is 1.96. If the magnitude of the calculated z value is more than 1.96, then the lot mean estimated from the sample mean is significantly different from the desired lot mean. As with any decision process, two kinds of error can be made: a Type I error, by falsely rejecting the lot (contractor's risk, $\alpha$), or a Type II error, by falsely accepting the lot (agency's risk, $\beta$). The required sample size can be estimated for assumed values of both risks and the tolerable difference between the estimated lot mean and desired lot mean using Equation 4-12 (Freund et al. 2010).

$$n = \frac{\sigma^{2}\left(Z_{\alpha} + Z_{\beta}\right)^{2}}{\Delta^{2}} \qquad \text{(Eq. 4-12)}$$

where
- n = sample size
- $\sigma$ = population standard deviation
- $Z_{\alpha}$ = Z value for the required seller's risk
- $Z_{\beta}$ = Z value for the required buyer's risk
- $\Delta$ = tolerable difference between the estimated lot mean and desired lot mean

As the tolerable mean difference between the estimated and desired lot mean decreases, the number of samples required to detect the mean difference with the same confidence $(1 - \alpha)$ and power $(1 - \beta)$ increases, as seen in Figure 4-4.
For example, to detect a mean difference of 1.5% air voids, a sample size of 15 will have a power of 85% as compared to 35% for a sample size of 5. This means that for a sample size of 15, the agency's risk of not detecting this mean difference is 15% for a given contractor's risk of 5% (at 95% confidence level).
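The relationship among sample size, risks, and detectable difference in Equation 4-12 can be explored with a short sketch. Several inputs here are assumptions, not values from the source: the standard deviation of 2% air voids is borrowed from the earlier lot-mean example, the contractor's risk is treated as two-sided (critical z of 1.96), and a z-test with known standard deviation is used. Because the standard deviation behind Figure 4-4 is not stated, the computed powers need not match the 85% and 35% quoted above.

```python
from math import ceil, sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def sample_size_two_risks(sigma: float, delta: float, alpha: float = 0.05,
                          beta: float = 0.15, two_sided: bool = True) -> int:
    """Sample size from Eq. 4-12: n = sigma^2 (Z_alpha + Z_beta)^2 / delta^2."""
    z_alpha = _STD_NORMAL.inv_cdf(1 - (alpha / 2 if two_sided else alpha))
    z_beta = _STD_NORMAL.inv_cdf(1 - beta)
    return ceil(sigma**2 * (z_alpha + z_beta) ** 2 / delta**2)


def power(sigma: float, delta: float, n: int, alpha: float = 0.05) -> float:
    """Power (1 - beta) of a two-sided z-test to detect a mean shift of delta."""
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = delta * sqrt(n) / sigma
    return _STD_NORMAL.cdf(shift - z_crit) + _STD_NORMAL.cdf(-shift - z_crit)


if __name__ == "__main__":
    sigma = 2.0  # assumed standard deviation, % air voids
    print(sample_size_two_risks(sigma, delta=1.5, alpha=0.05, beta=0.15))
    for n in (5, 10, 15):
        print(n, round(power(sigma, delta=1.5, n=n), 3))
```

Under these assumptions the power rises steeply between n = 5 and n = 15 for a 1.5% difference, which is the same qualitative pattern the text describes.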

These three approaches for determining sample size all call for a relatively large sample to represent a lot. However, in practice, a sample size of at least five is typically used; a higher sample size is desirable, but may not be practical. None of these approaches adequately addresses the optimum sample size (i.e., the most cost-effective n). These statistical methods for sample size estimation are meant for (1) accept/reject acceptance plans, rather than pay adjustment acceptance plans; (2) single acceptance plans (one AQC), rather than acceptance systems (two or more AQCs); and (3) the average as the measure of quality, rather than the PWL measure or any other measure of quality (Cho et al. 2011). In addition, highway construction and materials acceptance plans use a sample size that is often established on the basis of practical considerations such as personnel and time constraints. Commonly used sample sizes range between three and seven units. If the sample size is too small, the probability of making erroneous acceptance or pay adjustment decisions would be too high for agencies. If the sample size is too large, the cost of sampling and testing would be unnecessarily high, especially where destructive testing is used. The current literature shows that when historical quality levels are satisfactory, agencies may consider reducing the sample size as much as practically possible (in most cases, a sample size of three per lot for each AQC is optimal) (Gharaibeh et al. 2010).

4.3 Types of Sampling Methods

The units selected for inspection from a lot should be chosen at random and should be representative of all the items in the lot. Unless random samples are used, bias will be introduced and the effectiveness of the inspection process is destroyed. Sometimes the inspector may stratify the lot by dividing the lot into sublots and then taking random samples from each sublot.
Although this stratification is usually a subjective activity, it ensures that units are selected from all locations in the lot. Because sample size is determined based on the mean AQC by using statistical methods, there is a need to determine the effect of sample size on PWL for developing guidelines for PRS. In addition, the sampling method effects on PWL have not been quantified in the past. Therefore, the effects of three different sampling methods and four sample sizes on PWL were evaluated in this study. The three sampling methods were (1) random sampling with replacement, (2) stratified sampling, and (3) random sampling without replacement. For each sampling method, sample sizes of 3, 5, 10, and 20 were evaluated. A 2-mile-long lot was assumed for this evaluation. The lot was divided into 20 sublots, each 0.1-mile long.

[Figure 4-4. Power vs. tolerable mean difference for different sample sizes. Vertical axis: power, 0% to 100%; horizontal axis: tolerable mean difference, 0 to 3.5; curves for n = 5, n = 10, and n = 15.]
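The three sampling methods named above can be sketched as ways of choosing sublot indices from the 20 sublots of the assumed lot. This is one reading of the methods, not the study's implementation: "with replacement" is taken to mean that a sublot may be chosen more than once, "without replacement" that every sample comes from a different sublot, and the stratified grouping into n contiguous groups of nearly equal size is an assumption. All function names are illustrative.

```python
import random


def with_replacement(n_sublots: int, n: int, rng: random.Random) -> list:
    """Pick n sublots at random; the same sublot may be picked more than once."""
    return [rng.randrange(n_sublots) for _ in range(n)]


def without_replacement(n_sublots: int, n: int, rng: random.Random) -> list:
    """Pick n distinct sublots at random."""
    return rng.sample(range(n_sublots), n)


def stratified(n_sublots: int, n: int, rng: random.Random) -> list:
    """Split the sublots into n contiguous groups and pick one sublot per group."""
    picks = []
    for g in range(n):
        start = g * n_sublots // n
        end = (g + 1) * n_sublots // n  # exclusive upper bound of group g
        picks.append(rng.randrange(start, end))
    return picks


if __name__ == "__main__":
    rng = random.Random(2017)  # fixed seed so the run is repeatable
    for n in (3, 5, 10, 20):
        print(n, with_replacement(20, n, rng),
              stratified(20, n, rng),
              without_replacement(20, n, rng))
```

With n = 20 the stratified and without-replacement schemes both visit every sublot exactly once, while sampling with replacement can still leave parts of the lot unsampled.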

Random Sampling with Replacement

In this sampling method, samples of different sizes are randomly selected from any location on the lot (i.e., from one sublot or different sublots). Figure 4-5 shows examples of random sampling with replacement for sample sizes of 3, 5, 10, and 20. In this sampling method, the samples obtained may not represent the entire lot. This may not be a problem if the construction variability is low (i.e., there is uniform construction quality across the entire lot).

[Figure 4-5. Random sampling with replacement for various sample sizes: (a) sample size of 3; (b) sample size of 5; (c) sample size of 10; (d) sample size of 20.]

Stratified Sampling

In this sampling method, the sublots are grouped together spatially such that the number of samples required is equal to the number of grouped sublots (one sample must come from each group of sublots). Figure 4-6 shows examples of stratified sampling for sample sizes of 3, 5, 10, and 20. Stratified sampling reduces bias in the selection of samples because the samples collected are highly representative of the lot. Stratified sampling also prevents overrepresentation of a part of the lot and provides greater precision than random sampling, especially when construction variability is high. Therefore, it may be possible to use a smaller sample size with this method.

[Figure 4-6. Stratified sampling for various sample sizes: (a) sample size of 3; (b) sample size of 5; (c) sample size of 10; (d) sample size of 20.]

Random Sampling without Replacement

This method is very similar to random sampling with replacement, except that, for a given sample size, each sample has to be collected from a different sublot (i.e., without replacement). It is possible that this method could be more precise than random sampling with replacement because the samples selected are spread more evenly along the lot. Figure 4-7 shows examples of random sampling without replacement for sample sizes of 3, 5, 10, and 20.

4.4 Effects of Sampling Methods and Sizes on PWL

The effects of the three sampling methods and sample sizes of 3, 5, 10, and 20 on the estimation of PWL of a lot were evaluated. As an example, air void content was chosen as an AQC. Four after-construction qualities (i.e., four different lots) were simulated. For each sampling method, samples of sizes 3, 5, 10, and 20 were used to evaluate the PWL estimations. The estimated PWL values were compared with the true PWL of the lot. The true PWL of a lot was determined based on the simulated values of the air void contents and the specification limits (i.e., lower and upper limits of 2% and 8%, respectively). Figure 4-8 presents the four simulated after-construction qualities.

For this evaluation, an AQL of 90 PWL and an RQL of 60 PWL were used. For each construction quality, samples of sizes 3, 5, 10, and 20 were selected from the lot, using the different sampling methods. Each time the samples were randomly picked, the PWL value of the lot was estimated.
This procedure was simulated 5,000 times. The percentage of PWL values below RQL, between AQL and RQL, and above AQL for each sampling method and size was calculated (see Table 4-1). The results are shown in Figures 4-9 through 4-11. The effects of the sampling methods and sample sizes on the PWL values relative to the true construction quality are discussed below.

Figure 4-7. Random sampling without replacement for various sample sizes: (a) sample size of 3, (b) sample size of 5, (c) sample size of 10, (d) sample size of 20.

Figure 4-8. Simulated construction qualities: (a) Construction Quality 1, (b) Construction Quality 2, (c) Construction Quality 3, (d) Construction Quality 4. The panels show the simulated air void contents (in percent) along the lot, with groups of sublots labeled Good, Fair, or Bad.

Construction Quality below RQL

The true PWL value of construction quality 3 is 55% (see Table 4-1). The established RQL value is 60%. Out of 5,000 simulations, stratified sampling (Method 2) has a higher percentage of PWL values less than RQL, thus better representing the true quality. As the sample size increases, the percentage of PWL values less than RQL increases for each of the methods. However, the rate of increase is higher for Method 2 (see Figure 4-9c). The percentages of PWL values between AQL and RQL and above AQL are lower for Method 2, which therefore represents the true quality better than the other methods. For a sample size of three, there is not much difference between the methods, but at a sample size of five or more, Method 2 represents the true quality better than the other two methods. This means that, for a sample size of three for a lot with poor construction quality, none of the methods will be appropriate.

Figure 4-9. Percentage of PWL values below RQL for various construction qualities: (a) Construction Quality 1, (b) Construction Quality 2, (c) Construction Quality 3, (d) Construction Quality 4 (x-axis: sample size; y-axis: % PWL values < RQL; series: Method 1, Method 2, Method 3).
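The simulation procedure of Section 4.4 (repeated sampling, PWL estimation, and classification against RQL and AQL) can be sketched as follows. This is only an illustration under stated assumptions: the sublot air void values are hypothetical and are not the study's data, each sublot is represented by a single value, and the PWL estimate uses a simple normal approximation from the sample mean and standard deviation, which may differ from the estimator used in the study. All function names are hypothetical.

```python
import random
from statistics import NormalDist, mean, stdev

LSL, USL = 2.0, 8.0    # air void specification limits (%), from the text
RQL, AQL = 60.0, 90.0  # quality levels in PWL, from the text

# Hypothetical air void contents (%) for 20 sublots; not the study's data.
LOT = [4.1, 5.3, 6.2, 3.8, 7.4, 8.6, 9.1, 5.0, 4.4, 2.9,
       1.7, 3.3, 6.8, 7.9, 5.5, 4.9, 6.0, 8.3, 2.4, 5.8]


def estimate_pwl(values):
    """Normal-approximation PWL estimate (an assumption of this sketch)."""
    m, s = mean(values), stdev(values)
    if s == 0:  # all draws identical, e.g., the same sublot drawn repeatedly
        return 100.0 if LSL <= m <= USL else 0.0
    dist = NormalDist(m, s)
    return 100.0 * (dist.cdf(USL) - dist.cdf(LSL))


def draw(method, n, rng):
    """Return n air void values using sampling Method 1, 2, or 3."""
    size = len(LOT)
    if method == 1:    # random sampling with replacement
        idx = [rng.randrange(size) for _ in range(n)]
    elif method == 2:  # stratified: one draw per contiguous group of sublots
        bounds = [i * size // n for i in range(n + 1)]
        idx = [rng.randrange(bounds[i], bounds[i + 1]) for i in range(n)]
    else:              # random sampling without replacement
        idx = rng.sample(range(size), n)
    return [LOT[i] for i in idx]


def simulate(method, n, runs=5000, seed=0):
    """Percent of estimated PWL values below RQL, between, and above AQL."""
    rng = random.Random(seed)
    counts = {"below_RQL": 0, "between": 0, "above_AQL": 0}
    for _ in range(runs):
        pwl = estimate_pwl(draw(method, n, rng))
        if pwl < RQL:
            counts["below_RQL"] += 1
        elif pwl > AQL:
            counts["above_AQL"] += 1
        else:
            counts["between"] += 1
    return {k: 100.0 * v / runs for k, v in counts.items()}


if __name__ == "__main__":
    for method in (1, 2, 3):
        for n in (3, 5, 10, 20):
            print(method, n, simulate(method, n, runs=1000))
```

Because every sublot carries one value in this sketch, Methods 2 and 3 return the same estimate in every run at a sample size of 20, so all simulated PWL values fall in a single category.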

Construction Quality between RQL and AQL

The true PWL values of construction qualities 1 and 2 are 75% and 65%, respectively (see Table 4-1). Both construction qualities are between the RQL of 60% and the AQL of 90%. The simulation results show that stratified sampling (Method 2) has a higher percentage of PWL values in the category between RQL and AQL. As the sample size increases, the percentage of PWL values in this category increases for each method, but at a higher rate for Method 2 than for the other methods. Also, stratified sampling has higher precision than the other methods, especially for a sample size of five or more (see Figures 4-10a and 4-10b). For Method 2 (stratified sampling), the percentage of PWL values below RQL decreases more rapidly than for the other methods as the sample size increases. The percentage of PWL values less than RQL is higher for construction quality 2 than for quality 1 because the true PWL value is closer to the RQL value for construction quality 2. For example, for construction quality 2, there is a 65% chance of rejecting the lot when the sample size is three for Method 2. When the sample size increases to five, there is a 28% chance of rejecting the lot, even though the actual lot quality is 65% PWL. For a sample size of 10, there is no chance of rejecting the lot. Therefore, if the true quality is closer to RQL, a larger sample size is needed to represent the actual quality.

Figure 4-10. Percentage of PWL values between RQL and AQL for various construction qualities: (a) Construction Quality 1, (b) Construction Quality 2, (c) Construction Quality 3, (d) Construction Quality 4 (x-axis: sample size; y-axis: % PWL values between RQL and AQL; series: Method 1, Method 2, Method 3).

Construction Quality above AQL

The true PWL value of construction quality 4 is 97% (see Table 4-1), which is above the AQL of 90%. For all methods, the percentage of PWL values greater than AQL increases with increasing sample size. However, stratified sampling (Method 2) represents the true quality better (as seen in Figures 4-10d and 4-11d). At a sample size of three, there is not much difference among the methods in the percentage of PWL values in any of the three categories. With stratified sampling and a sample size of three, the estimated quality of the lot is above AQL only about 66% of the time, even though the true quality is 97%. This percentage increases to 93% when the sample size is increased to five. The stratified sampling method captures the true quality better than the other methods, especially for sample sizes of five or more.

Figure 4-11. Percentage of PWL values above AQL for various construction qualities: (a) Construction Quality 1, (b) Construction Quality 2, (c) Construction Quality 3, (d) Construction Quality 4 (x-axis: sample size; y-axis: % PWL values > AQL; series: Method 1, Method 2, Method 3).

Table 4-1. Percentage of PWL values for various construction qualities.

Construction Quality 1 (true PWL = 75%)

| Quality measure | Sample size | Method 1 | Method 2 | Method 3 |
|---|---|---|---|---|
| < RQL | 3 | 25% | 27% | 27% |
| < RQL | 5 | 11% | 0% | 10% |
| < RQL | 10 | 1% | 0% | 0% |
| < RQL | 20 | 0% | 0% | 0% |
| RQL - AQL | 3 | 56% | 56% | 58% |
| RQL - AQL | 5 | 75% | 100% | 81% |
| RQL - AQL | 10 | 92% | 100% | 99% |
| RQL - AQL | 20 | 98% | 100% | 100% |
| > AQL | 3 | 19% | 17% | 15% |
| > AQL | 5 | 14% | 0% | 9% |
| > AQL | 10 | 7% | 0% | 1% |
| > AQL | 20 | 2% | 0% | 0% |

Construction Quality 2 (true PWL = 65%)

| Quality measure | Sample size | Method 1 | Method 2 | Method 3 |
|---|---|---|---|---|
| < RQL | 3 | 61% | 65% | 61% |
| < RQL | 5 | 43% | 28% | 41% |
| < RQL | 10 | 18% | 0% | 8% |
| < RQL | 20 | 3% | 0% | 0% |
| RQL - AQL | 3 | 38% | 35% | 38% |
| RQL - AQL | 5 | 57% | 72% | 59% |
| RQL - AQL | 10 | 82% | 100% | 92% |
| RQL - AQL | 20 | 97% | 100% | 100% |
| > AQL | 3 | 2% | 0% | 1% |
| > AQL | 5 | 0% | 0% | 0% |
| > AQL | 10 | 0% | 0% | 0% |
| > AQL | 20 | 0% | 0% | 0% |

Construction Quality 3 (true PWL = 55%)

| Quality measure | Sample size | Method 1 | Method 2 | Method 3 |
|---|---|---|---|---|
| < RQL | 3 | 61% | 65% | 62% |
| < RQL | 5 | 62% | 74% | 63% |
| < RQL | 10 | 65% | 79% | 70% |
| < RQL | 20 | 72% | 100% | 100% |
| RQL - AQL | 3 | 33% | 33% | 34% |
| RQL - AQL | 5 | 35% | 26% | 36% |
| RQL - AQL | 10 | 35% | 21% | 30% |
| RQL - AQL | 20 | 28% | 0% | 0% |
| > AQL | 3 | 6% | 2% | 4% |
| > AQL | 5 | 2% | 0% | 1% |
| > AQL | 10 | 0% | 0% | 0% |
| > AQL | 20 | 0% | 0% | 0% |

Construction Quality 4 (true PWL = 97%)

| Quality measure | Sample size | Method 1 | Method 2 | Method 3 |
|---|---|---|---|---|
| < RQL | 3 | 0% | 0% | 0% |
| < RQL | 5 | 0% | 0% | 0% |
| < RQL | 10 | 0% | 0% | 0% |
| < RQL | 20 | 0% | 0% | 0% |
| RQL - AQL | 3 | 32% | 34% | 30% |
| RQL - AQL | 5 | 20% | 7% | 15% |
| RQL - AQL | 10 | 8% | 0% | 1% |
| RQL - AQL | 20 | 2% | 0% | 0% |
| > AQL | 3 | 68% | 66% | 70% |
| > AQL | 5 | 80% | 93% | 85% |
| > AQL | 10 | 92% | 100% | 99% |
| > AQL | 20 | 98% | 100% | 100% |

4.5 Quantifying Risks

The PWL value obtained from a given sampling method and size can be used to determine whether the lot quality is below RQL, between RQL and AQL, or above AQL. However, an incorrect decision can sometimes be made regarding the lot quality. For example, if three samples from a lot are tested and the PWL value obtained is below the RQL, the lot is rejected. However, it is possible that the three samples tested are from a sublot of bad quality while the remaining sublots are of good quality, and therefore an incorrect decision would have been made.

In any sampling process, an error can be made by falsely rejecting good-quality material or falsely accepting bad-quality material. The error of falsely rejecting good-quality material is a risk to the contractor and is called the contractor's risk (α); the error of falsely accepting bad-quality material is a risk to the agency and is called the agency's risk (β). The complement of the agency's risk (1 − β) is called the power of the sampling method.

The agency's risk can be calculated for each sampling method. Samples of different sizes were selected 5,000 times for each construction quality within a method. The PWL value can be calculated each time samples are selected, and the mean PWL value and its standard error can also be calculated. If the contractor's risk is set at 0.05, the agency would reject a good-quality lot 5 out of 100 times (an acceptable level of risk). The null hypothesis would be that the average PWL of the lot is above RQL or above AQL. The agency's risk would be accepting the null hypothesis even though the lot quality is less than RQL or AQL. These risks are calculated using Equations 4-13 and 4-14.

$$\beta(PWL_{True}) = P\left[ z < z_{\alpha} - \frac{PWL_{RQL} - PWL_{True}}{SE} \right] \qquad \text{(Eq. 4-13)}$$

$$\beta(PWL_{True}) = P\left[ z < z_{\alpha} - \frac{PWL_{AQL} - PWL_{True}}{SE} \right] \qquad \text{(Eq. 4-14)}$$

where

$\beta$ = agency's risk
$PWL_{True}$ = measured PWL value from samples
$PWL_{RQL}$ = PWL value at RQL (60%)
$PWL_{AQL}$ = PWL value at AQL (90%)
$z_{\alpha}$ = z value at $\alpha$ = 0.05
$SE$ = standard error of the average PWL value

The results of the risk calculations are shown in Table 4-2 and Figures 4-12 and 4-13. The power of a sampling method increases as the sample size increases because of the decrease in the standard error of the mean PWL value. The results also show that Method 2 (stratified sampling) has the highest power among the three methods. Method 2 requires collecting samples along the entire lot, and hence the variability in PWL values is low. Also, as the average PWL value gets closer to the RQL value, the power decreases and the chance of wrongly accepting a lot increases.

4.6 Effects of Lot and Sublot Size on PWL

The AQC chosen for a preservation treatment dictates the lot size, sublot size, sample size, and data collection method (destructive versus non-destructive) for estimating the PWL. Some AQCs, such as percent air voids, require expensive destructive testing for data collection. Generally, the length of a pavement section constructed on the same day can be considered as a lot. The lot can be subdivided into sublots of equal length (e.g., 0.1 mile) for obtaining samples to estimate the lot's PWL. For example, testing can start with five samples using stratified sampling by dividing a lot into five sublots (one sample from each sublot). However, if the variability within the samples is high or the PWL estimates are close to RQL, collection of additional samples should be considered (with consideration of the costs of falsely accepting bad-quality material and the costs of the additional testing). For AQCs that involve continuous and non-destructive data collection, such as IRI, the PWL value can be estimated based on all the data collected for the entire project length.
In this case, the estimated PWL values and the corresponding pay adjustments might differ for different lot and sublot sizes.

Table 4-2. Power of various sampling methods and sizes at RQL and AQL for different construction qualities.

At RQL

| Construction quality | Sample size | Method 1 mean | Method 1 SE | Method 1 β | Method 1 power | Method 2 mean | Method 2 SE | Method 2 β | Method 2 power | Method 3 mean | Method 3 SE | Method 3 β | Method 3 power |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 3 | 71% | 18% | 0.85 | 15% | 71% | 16% | 0.83 | 17% | 71% | 17% | 0.84 | 16% |
| 1 | 5 | 73% | 12% | 0.71 | 29% | 72% | 4% | 0.09 | 91% | 73% | 11% | 0.68 | 32% |
| 1 | 10 | 75% | 8% | 0.41 | 59% | 73% | 2% | 0.00 | 100% | 74% | 6% | 0.25 | 75% |
| 2 | 3 | 56% | 17% | 0.92 | 8% | 58% | 8% | 0.92 | 8% | 57% | 14% | 0.92 | 8% |
| 2 | 5 | 61% | 9% | 0.94 | 6% | 62% | 3% | 0.84 | 16% | 62% | 7% | 0.91 | 9% |
| 2 | 10 | 64% | 5% | 0.80 | 20% | 64% | 1% | 0.01 | 99% | 64% | 3% | 0.62 | 38% |
| 3 | 3 | 50% | 27% | 0.90 | 10% | 48% | 25% | 0.88 | 12% | 51% | 25% | 0.90 | 10% |
| 3 | 5 | 53% | 20% | 0.90 | 10% | 55% | 8% | 0.85 | 15% | 53% | 18% | 0.90 | 10% |
| 3 | 10 | 54% | 13% | 0.88 | 12% | 56% | 6% | 0.84 | 16% | 55% | 9% | 0.86 | 14% |
| 4 | 3 | 94% | 8% | 0.00 | 100% | 94% | 7% | 0.00 | 100% | 93% | 7% | 0.00 | 100% |
| 4 | 5 | 95% | 5% | 0.00 | 100% | 96% | 4% | 0.00 | 100% | 95% | 5% | 0.00 | 100% |
| 4 | 10 | 96% | 4% | 0.00 | 100% | 96% | 2% | 0.00 | 100% | 96% | 2% | 0.00 | 100% |

At AQL

| Construction quality | Sample size | Method 1 mean | Method 1 SE | Method 1 β | Method 1 power | Method 2 mean | Method 2 SE | Method 2 β | Method 2 power | Method 3 mean | Method 3 SE | Method 3 β | Method 3 power |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 3 | 71% | 18% | 0.72 | 28% | 71% | 16% | 0.68 | 32% | 71% | 17% | 0.70 | 30% |
| 1 | 5 | 73% | 12% | 0.59 | 41% | 72% | 4% | 0.00 | 100% | 73% | 11% | 0.54 | 46% |
| 1 | 10 | 75% | 8% | 0.41 | 59% | 73% | 2% | 0.00 | 100% | 74% | 6% | 0.15 | 85% |
| 2 | 3 | 56% | 17% | 0.36 | 64% | 58% | 8% | 0.01 | 99% | 57% | 14% | 0.24 | 76% |
| 2 | 5 | 61% | 9% | 0.06 | 94% | 62% | 3% | 0.00 | 100% | 62% | 7% | 0.01 | 99% |
| 2 | 10 | 64% | 5% | 0.00 | 100% | 64% | 1% | 0.00 | 100% | 64% | 3% | 0.00 | 100% |
| 3 | 3 | 50% | 27% | 0.56 | 44% | 48% | 25% | 0.49 | 51% | 51% | 25% | 0.53 | 47% |
| 3 | 5 | 53% | 20% | 0.42 | 58% | 55% | 8% | 0.00 | 100% | 53% | 18% | 0.34 | 66% |
| 3 | 10 | 54% | 13% | 0.13 | 87% | 56% | 6% | 0.00 | 100% | 55% | 9% | 0.01 | 99% |
| 4 | 3 | 94% | 8% | 0.87 | 13% | 94% | 7% | 0.86 | 14% | 93% | 7% | 0.89 | 11% |
| 4 | 5 | 95% | 5% | 0.74 | 26% | 96% | 4% | 0.56 | 44% | 95% | 5% | 0.74 | 26% |
| 4 | 10 | 96% | 4% | 0.56 | 44% | 96% | 2% | 0.09 | 91% | 96% | 2% | 0.09 | 91% |

Figure 4-12. Power versus sample size at RQL for different construction qualities: (a) Construction Quality 1, (b) Construction Quality 2, (c) Construction Quality 3, (d) Construction Quality 4 (series: Method 1, Method 2, Method 3).

The effects of lot and sublot sizes on PWL estimation were evaluated using IRI as the AQC. The SPS-1 (flexible) and SPS-2 (rigid) test sections in Montana, Nevada, California, and North Dakota were used as individual projects, each 1.2 miles long. Two lot sizes were evaluated: 0.1 and 1.2 miles. For each lot size, sublots of 25, 50, 100, and 250 feet were used, and their corresponding IRI values were calculated from the longitudinal profile by using ProVAL software. Figure 4-14 shows the distributions of IRI values for the 25-foot-long sublots. For this evaluation, an AQL of 90 PWL and an RQL of 60 PWL were assumed. Also, an upper specification limit of 90 inch/mile was used to calculate the PWL values, and Equation 4-15 was used to estimate the pay factors (PF) (construction quality below RQL receives no pay):

$$PF = 55 + (0.5 \times PWL) \qquad \text{(Eq. 4-15)}$$

The PWL values and the corresponding pay factors were also estimated for different sublot sizes within each lot size; results are shown in Tables 4-3 and 4-4 for lot sizes of 1.2 and 0.1 miles, respectively. These results show that, as the sublot size increases, if the mean IRI value of a lot is below the upper specification limit, the variability in measured IRI values decreases and the PWL value increases (as in the cases of Montana, Nevada, and California). However, as the sublot size increases, if the mean IRI value of a lot is above the upper specification limit, the variability in measured IRI values decreases and the PWL value decreases (as for North Dakota).

If the PWL value is estimated for the lot size of 1.2 miles (i.e., the project length), the contractor may or may not receive pay for the entire project, depending on a single estimate of the PWL value. If the overall quality of the lot is good (e.g., Montana and Nevada), the effect of lot and sublot size is minimal and the contractor gets more than 100% pay, as shown in Figures 4-15a and 4-15b. However, if the construction quality is fair (e.g., California), the estimated PWL value may fall below or above RQL (depending on the sublot size), and the contractor receives no pay if the sublot size is less than 100 feet (see Table 4-3), but payment would be justified if the chosen lot size is 0.1 mile instead of 1.2 miles and some of the lots are of very good quality (see Table 4-4). The average pay factor for the project is 58% when the lot size is 0.1 mile compared to 85% when the lot size is 1.2 miles (values for 100-ft sublots in Tables 4-4 and 4-3, respectively).
The reason for these large differences is probably the masking of construction variability when PWL is estimated from the statistical parameters (i.e., mean and standard deviation) of a larger number of sublots (i.e., 60 sublots for a 1.2-mile lot versus 5 sublots for a 0.1-mile lot). Similarly, for poor construction quality (North Dakota), the PWL value based on the lot size of 1.2 miles will support no pay but will justify 8% pay if the lot size is 0.1 mile.

Figure 4-13. Power versus sample size at AQL for different construction qualities: (a) Construction Quality 1, (b) Construction Quality 2, (c) Construction Quality 3, (d) Construction Quality 4 (series: Method 1, Method 2, Method 3).

Figure 4-14. Distribution of IRI values for 25-foot sublots: (a) Montana, (b) Nevada, (c) California, (d) North Dakota (x-axis: IRI, in./mile; y-axis: frequency).

Table 4-3. PWL values and pay factors for a lot size of 1.2 miles.

| Sublot size (ft) | Montana¹ PWL | Montana¹ PF | Nevada¹ PWL | Nevada¹ PF | California¹ PWL | California¹ PF | North Dakota² PWL | North Dakota² PF |
|---|---|---|---|---|---|---|---|---|
| 25 | 94% | 102 | 100% | 105 | 58% | 0 | 37% | 0 |
| 50 | 97% | 103 | 100% | 105 | 59% | 0 | 35% | 0 |
| 100 | 99% | 104 | 100% | 105 | 61% | 85 | 32% | 0 |
| 250 | 100% | 105 | 100% | 105 | 64% | 87 | 25% | 0 |
| 500 | 100% | 105 | 100% | 105 | 71% | 90 | 24% | 0 |

Note: ¹ mean IRI < 90 inch/mile; ² mean IRI > 90 inch/mile. A pay factor of 0 means no pay (shaded cells in the original table).

Table 4-4. PWL values and pay factors for a lot size of 0.1 mile. Rows 1 through 12 are the 0.1-mile lots in the order listed in the original table; the last row is the project average.

Montana

| Lot | 25 ft PWL | 25 ft PF | 50 ft PWL | 50 ft PF | 100 ft PWL | 100 ft PF | 250 ft PWL | 250 ft PF |
|---|---|---|---|---|---|---|---|---|
| 1 | 100% | 105 | 100% | 105 | 100% | 105 | 100% | 105 |
| 2 | 99% | 104 | 100% | 105 | 100% | 105 | 100% | 105 |
| 3 | 100% | 105 | 100% | 105 | 100% | 105 | 100% | 105 |
| 4 | 98% | 104 | 99% | 105 | 100% | 105 | 100% | 105 |
| 5 | 100% | 105 | 100% | 105 | 100% | 105 | 100% | 105 |
| 6 | 100% | 105 | 100% | 105 | 100% | 105 | 100% | 105 |
| 7 | 86% | 98 | 90% | 100 | 100% | 105 | 100% | 105 |
| 8 | 73% | 92 | 84% | 97 | 95% | 102 | 100% | 105 |
| 9 | 66% | 88 | 68% | 89 | 75% | 92 | 79% | 95 |
| 10 | 99% | 104 | 100% | 105 | 100% | 105 | 100% | 105 |
| 11 | 92% | 101 | 100% | 105 | 100% | 105 | 100% | 105 |
| 12 | 90% | 100 | 97% | 103 | 99% | 105 | 100% | 105 |
| Average | 86% | 101 | 95% | 102 | 97% | 104 | 98% | 104 |

Nevada

| Lot | 25 ft PWL | 25 ft PF | 50 ft PWL | 50 ft PF | 100 ft PWL | 100 ft PF | 250 ft PWL | 250 ft PF |
|---|---|---|---|---|---|---|---|---|
| 1 | 100% | 105 | 100% | 105 | 100% | 105 | 100% | 105 |
| 2 | 99% | 104 | 100% | 105 | 100% | 105 | 100% | 105 |
| 3 | 100% | 105 | 100% | 105 | 100% | 105 | 100% | 105 |
| 4 | 98% | 104 | 99% | 105 | 100% | 105 | 100% | 105 |
| 5 | 100% | 105 | 100% | 105 | 100% | 105 | 100% | 105 |
| 6 | 100% | 105 | 100% | 105 | 100% | 105 | 100% | 105 |
| 7 | 86% | 98 | 90% | 100 | 100% | 105 | 100% | 105 |
| 8 | 73% | 92 | 84% | 97 | 95% | 102 | 100% | 105 |
| 9 | 66% | 88 | 68% | 89 | 75% | 92 | 79% | 95 |
| 10 | 99% | 104 | 100% | 105 | 100% | 105 | 100% | 105 |
| 11 | 92% | 101 | 100% | 105 | 100% | 105 | 100% | 105 |
| 12 | 90% | 100 | 97% | 103 | 99% | 105 | 100% | 105 |
| Average | 100% | 105 | 100% | 105 | 100% | 105 | 100% | 105 |

California

| Lot | 25 ft PWL | 25 ft PF | 50 ft PWL | 50 ft PF | 100 ft PWL | 100 ft PF | 250 ft PWL | 250 ft PF |
|---|---|---|---|---|---|---|---|---|
| 1 | 85% | 98 | 90% | 100 | 94% | 102 | 100% | 105 |
| 2 | 99% | 105 | 100% | 105 | 100% | 105 | 100% | 105 |
| 3 | 32% | 0 | 25% | 0 | 21% | 0 | 2% | 0 |
| 4 | 42% | 0 | 42% | 0 | 41% | 0 | 6% | 0 |
| 5 | 92% | 101 | 97% | 104 | 100% | 105 | 100% | 105 |
| 6 | 54% | 0 | 55% | 0 | 60% | 0 | 82% | 96 |
| 7 | 56% | 0 | 57% | 0 | 61% | 85 | 77% | 94 |
| 8 | 54% | 0 | 56% | 0 | 57% | 0 | 69% | 89 |
| 9 | 68% | 89 | 72% | 91 | 85% | 97 | 96% | 103 |
| 10 | 100% | 105 | 100% | 105 | 100% | 105 | 100% | 105 |
| 11 | 26% | 0 | 25% | 0 | 26% | 0 | 4% | 0 |
| 12 | 64% | 87 | 72% | 91 | 93% | 102 | 94% | 102 |
| Average | 64% | 49 | 66% | 50 | 70% | 58 | 69% | 75 |

North Dakota

| Lot | 25 ft PWL | 25 ft PF | 50 ft PWL | 50 ft PF | 100 ft PWL | 100 ft PF | 250 ft PWL | 250 ft PF |
|---|---|---|---|---|---|---|---|---|
| 1 | 43% | 0 | 39% | 0 | 34% | 0 | 23% | 0 |
| 2 | 68% | 89 | 70% | 90 | 72% | 91 | 92% | 101 |
| 3 | 28% | 0 | 26% | 0 | 16% | 0 | 9% | 0 |
| 4 | 30% | 0 | 29% | 0 | 23% | 0 | 19% | 0 |
| 5 | 42% | 0 | 39% | 0 | 36% | 0 | 2% | 0 |
| 6 | 29% | 0 | 29% | 0 | 23% | 0 | 2% | 0 |
| 7 | 51% | 0 | 43% | 0 | 40% | 0 | 40% | 0 |
| 8 | 35% | 0 | 32% | 0 | 29% | 0 | 21% | 0 |
| 9 | 49% | 0 | 49% | 0 | 49% | 0 | 46% | 0 |
| 10 | 21% | 0 | 14% | 0 | 6% | 0 | 0% | 0 |
| 11 | 38% | 0 | 35% | 0 | 28% | 0 | 31% | 0 |
| 12 | 1% | 0 | 0% | 0 | 0% | 0 | 0% | 0 |
| Average | 36% | 7 | 34% | 7 | 30% | 8 | 24% | 8 |

Note: A pay factor of 0 means no pay (shaded cells in the original table).

4.7 Summary

Two types of acceptance sampling plans are used to assure the quality of a lot. Attribute sampling plans can be adopted to sentence a lot (i.e., pass/fail or accept/reject) by using the mean AQC as a threshold. Variable sampling plans are adopted when the AQC has a continuous distribution. A quality measure such as PWL can be used for assessing construction quality and for determining pay adjustments based on the produced quality. In addition, the sample size required by a variable sampling plan to estimate the true construction quality is significantly less than for the equivalent attribute sampling plan. Acceptance sampling is not a substitute for adequate construction process control or the use of other statistical methods to drive variability reduction.

The statistical approaches used for determining sample size for a lot generally estimate a large sample size to represent a lot; none of these approaches adequately addresses the optimum sample size. In practice, a sample size of at least five is generally used, although a larger sample size is desirable. The statistical approaches for sample size estimation are intended to (1) provide accept/reject acceptance plans and not to serve as pay adjustment acceptance plans, (2) provide single acceptance plans (one AQC) and not acceptance systems (two or more AQCs), and (3) use the average as the measure of quality, not the PWL or another measure of quality. Typically, highway construction and materials acceptance plans use a sample size that is established on the basis of practical considerations such as personnel and time constraints and that commonly ranges between three and seven units. If the sample size is too small, the probability of making erroneous acceptance or pay adjustment decisions will be high. If the sample size is too large, the cost of sampling and testing will be unnecessarily high, especially where destructive testing is required.

Figure 4-15. Pay factor comparisons of different lot sizes: (a) Montana, (b) Nevada, (c) California, (d) North Dakota.

Three sampling methods (random sampling with replacement, stratified sampling, and random sampling without replacement) were evaluated for sample sizes of 3, 5, 10, and 20. The samples used in random sampling with replacement may not represent the entire lot, but this may not be a problem if the construction variability is low (i.e., there is uniform construction quality across the entire lot). Stratified sampling reduces the bias in selecting the samples and provides samples representative of the lot, even when there is high construction variability. Therefore, it may be possible to use a smaller sample size with this method. Also, random sampling without replacement could be more precise than random sampling with replacement because the samples selected from the lot are spread more evenly along the lot. For a sample size of three, there is not much difference between the sampling methods, but at a sample size of five or more, the stratified sampling method represents the true construction quality better than the other two methods. If the true quality of a lot is closer to RQL, a larger sample size (i.e., five or more) is needed to represent the actual quality.

The power of a sampling method increases as the sample size increases. The stratified sampling method has a higher power than the other sampling methods. Stratified sampling requires collecting samples along the entire lot, and hence the variability in PWL values is lower. Also, the closer the average PWL value is to the RQL value, the lower the power. This means that there is a high chance of wrongly accepting a lot if its PWL value obtained from the sampling is near the RQL. Therefore, agencies can start testing with five samples using stratified sampling by dividing a lot into five sublots (one sample from each sublot) if destructive testing is needed. If the variability within the samples is high or the PWL estimates are close to RQL, additional samples could be collected. The decision to test additional samples can be made by evaluating the costs of falsely accepting bad-quality material versus the costs of additional testing.
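The relationship between sample size, standard error, and power summarized above can be illustrated with the risk calculation of Equations 4-13 and 4-14. The sketch below is hedged on two assumptions that the text does not state explicitly: a one-sided z value of about 1.645 for α = 0.05, and the use of the magnitude of the difference between the mean PWL and the quality limit. These choices were made because they reproduce several β values reported in Table 4-2; the function names are hypothetical.

```python
from statistics import NormalDist

Z = NormalDist()                 # standard normal distribution
ALPHA = 0.05                     # contractor's risk, as set in the text
Z_ALPHA = Z.inv_cdf(1 - ALPHA)   # one-sided z value (assumption), about 1.645


def agency_risk(pwl_mean, se, pwl_limit):
    """beta = P[z < z_alpha - |PWL_limit - PWL_mean| / SE].

    The absolute value is an assumption of this sketch, chosen because it
    matches the beta values reported in Table 4-2.
    """
    return Z.cdf(Z_ALPHA - abs(pwl_limit - pwl_mean) / se)


def power(pwl_mean, se, pwl_limit):
    """Power of the sampling method, 1 - beta."""
    return 1.0 - agency_risk(pwl_mean, se, pwl_limit)


if __name__ == "__main__":
    # Construction quality 1, Method 1, sample size 3 (mean 71%, SE 18%).
    print(round(agency_risk(71, 18, 60), 2))  # at RQL
    print(round(agency_risk(71, 18, 90), 2))  # at AQL
    # Construction quality 1, Method 2, sample size 5 (mean 72%, SE 4%).
    print(round(power(72, 4, 60), 2))         # at RQL
```

For the first case the sketch gives β of about 0.85 at RQL and 0.72 at AQL, in line with Table 4-2. Reducing the standard error from 18% to 4% (stratified sampling with five samples) lowers β at RQL to about 0.09, which is the increase in power described in the text.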
For AQCs such as IRI, the data collection is continuous and non-destructive. It is more appropriate to use a smaller lot size (e.g., 0.1 mile) in estimating PWL to capture the construction variability and determine appropriate pay to the contractor. A minimum sublot length of 100 ft can be justified based on the profile-based IRI measurements.
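The effect of lot size on pay can be illustrated with Equation 4-15 and a small example. The sketch below is not the study's calculation: the IRI values are hypothetical, the PWL estimate uses a normal approximation against the 90 inch/mile upper specification limit (an assumption), and the project pay for small lots is taken as the simple average of the lot pay factors (also an assumption). Function names are hypothetical.

```python
from statistics import NormalDist, mean, stdev

RQL = 60.0   # rejectable quality level (PWL), from the text
USL = 90.0   # upper specification limit for IRI (inch/mile), from the text


def pay_factor(pwl):
    """Eq. 4-15: PF = 55 + 0.5 * PWL; no pay when quality is below RQL."""
    if pwl < RQL:
        return 0.0
    return 55.0 + 0.5 * pwl


def pwl_upper(iri_values):
    """Normal-approximation PWL for a one-sided upper limit (assumption)."""
    return 100.0 * NormalDist(mean(iri_values), stdev(iri_values)).cdf(USL)


def project_pay(lots):
    """Average pay factor over equal-length lots (assumed simple mean)."""
    return mean(pay_factor(pwl_upper(lot)) for lot in lots)


# Hypothetical sublot IRI values (inch/mile) for two lots of one project.
LOT_A = [60.0, 65.0, 70.0, 62.0, 68.0]     # smooth lot
LOT_B = [95.0, 100.0, 105.0, 98.0, 102.0]  # rough lot

if __name__ == "__main__":
    print("pay by lot:", project_pay([LOT_A, LOT_B]))
    print("pay for whole project:", pay_factor(pwl_upper(LOT_A + LOT_B)))
```

In this hypothetical case, evaluating the two lots separately pays for the smooth lot only, while pooling all sublots into one lot yields a single PWL above RQL and pay for the whole project, which is the masking of construction variability discussed in Section 4.6.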