Notes: Inference with Sigma known: confidence intervals

Statistical inference is drawing conclusions about a population based on a random sample from that population. It turns probabilistic statements about random variables into confidence statements about parameters.

Very simple problem situation:

1. We are interested in one population.
2. We are interested in a quantitative, and usually continuous, random variable.
3. We know the population variance, $\sigma^2$, and thus we also know the population standard deviation, $\sigma$.
4. We want an estimate of the population mean $\mu$ in both of two ways, if we can get it.
   4.1. A point estimate.
   4.2. An interval estimate, to quantify uncertainty.
5. We want to predict the value of the random variable $X$ in two ways, if we can.
   5.1. With a point prediction.
   5.2. With an interval, to quantify uncertainty.
6. The underlying population distribution is either
   6.1. Known to be normal
   6.2. Known to be some other distribution: Gamma, Weibull, Poisson, whatever, ….
   6.3. Unknown
7. The sample size is either
   7.1. Large enough to apply the CLT
   7.2. Not large enough to apply the CLT

Example: You want to estimate what the mean SAT Math score would be for the more than 385,000 high school seniors in California. You know better than to trust data from the students who choose to take the SAT. Only about 45% of California students take the SAT. These self‐selected students are planning to attend college and are not representative of all California seniors. At considerable effort and expense, you give the test to a simple random sample (SRS) of 500 California high school seniors. The mean score for your sample is $\bar{X}_{500} = 461$. Let’s make the unrealistic assumption that we know the population variance, and it is $\sigma^2 = 100^2$.

1. Can we give a point estimate and/or an interval estimate for the population mean $\mu$?
2. Can we give a point estimate and/or an interval estimate for the score of a randomly selected person?

Definitions: Estimator vs. estimate, predictor vs. prediction

Estimate: A specific number (4, 12, .001) that we believe is a good guess of the value of a parameter, like $\mu$, $\sigma^2$, or $\sigma$. Because it is a specific number, it is not a random variable.

Prediction: A specific number that we believe is a good guess of the value of a random variable, like $X$. Because it is a specific number, it is not a random variable.

Estimator: The formula or method used to obtain an estimate. An estimator is a random variable, as we will see.

Predictor: The formula or method used to obtain a prediction. A predictor is also a random variable.

In this problem situation, we want to estimate the population mean, $\mu$. We use the symbol $\hat{\mu}$ to represent both the estimator and the estimate.

The estimator is the random variable: $\hat{\mu} = \bar{X}$, the sample mean. Until we observe it, it is a random variable, and when we discuss and evaluate our methods, we treat it as a random variable.

The estimate is the realized value of the sample mean: $\hat{\mu} = \bar{x}$.

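The distinction between estimator and estimate can be made concrete with a small simulation. The sketch below is only an illustration: it assumes a normal population with a made-up mean of 460 (in practice the population mean is unknown), the known standard deviation 100 and sample size 500 from the SAT example, and helper names (`sample_mean`, `draw_sample`) invented here. The function is the estimator; each number it returns for a particular sample is an estimate.

```python
import random

def sample_mean(sample):
    """The estimator: a rule that maps any sample to a single number."""
    return sum(sample) / len(sample)

def draw_sample(rng, n, mu, sigma):
    """Draw n independent values from a normal population (an assumption)."""
    return [rng.gauss(mu, sigma) for _ in range(n)]

# Hypothetical population mean; sigma and n follow the SAT example.
MU, SIGMA, N = 460.0, 100.0, 500

rng = random.Random(0)  # fixed seed so the run is repeatable

# Apply the same estimator to five different random samples.
# Each result is one estimate: a fixed number once the sample is observed.
estimates = [sample_mean(draw_sample(rng, N, MU, SIGMA)) for _ in range(5)]

for i, est in enumerate(estimates, start=1):
    print(f"sample {i}: estimate of the mean = {est:.1f}")
```

The five printed values differ from one another because the estimator is a random variable before the sample is drawn. Any single printed value is just a number.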
Why do we use the sample mean to estimate the population mean? 


Would it have occurred to you to ask this question at all? It just seems obvious, doesn’t it? Statisticians ask this question, and we have lots of good reasons for using the sample mean.

Reasons for using the sample mean:

1. It is the maximum likelihood estimator, if we believe the population is normally distributed.
2. It is the method of moments estimator, no matter what distribution the population has.
3. It is an unbiased estimator for the mean, no matter what distribution the population has.
4. Of all unbiased estimators for the mean, it is the one with minimum variance, if the population is normally distributed.

We will not discuss the theory of point estimation much, because it really is not something you’ll use if you don’t get at least an M.S. in statistics. But you need to know that the sample mean is an unbiased estimator for the population mean, and the one with the minimum variance.

Draw 4 bulls‐eyes here to understand what that means (hopefully).

And that completes our point estimation theory. Now let’s discuss interval estimates.
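The unbiasedness and minimum-variance claims above can be checked informally by simulation, which is the numerical version of the bulls‐eye picture. The sketch below assumes a normal population with made-up parameters (mean 460, standard deviation 100) and a small sample size of 25. It compares the sample mean with one competing unbiased estimator, the sample median. It illustrates the claims under these assumptions and does not prove them.

```python
import random
import statistics

MU, SIGMA = 460.0, 100.0  # hypothetical population parameters
N, REPS = 25, 2000        # sample size and number of repeated samples

rng = random.Random(1)    # fixed seed so the run is repeatable

means, medians = [], []
for _ in range(REPS):
    sample = [rng.gauss(MU, SIGMA) for _ in range(N)]
    means.append(statistics.fmean(sample))
    medians.append(statistics.median(sample))

# Unbiased: the estimates are centered on the true mean (on the bulls-eye).
avg_of_means = statistics.fmean(means)
avg_of_medians = statistics.fmean(medians)

# Minimum variance: the sample means scatter less than the sample medians.
var_of_means = statistics.pvariance(means)
var_of_medians = statistics.pvariance(medians)

print(f"average of sample means:    {avg_of_means:.1f}")
print(f"average of sample medians:  {avg_of_medians:.1f}")
print(f"variance of sample means:   {var_of_means:.1f}")
print(f"variance of sample medians: {var_of_medians:.1f}")
print(f"theoretical sigma^2 / n:    {SIGMA**2 / N:.1f}")
```

Both averages should land close to 460, since both estimators are unbiased when the population is normal. The variance of the sample means should be close to $\sigma^2 / n = 400$ and noticeably smaller than the variance of the sample medians.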