Now that you have learned SAS data steps and basic procedures, you can start using simulation to better understand some basic statistics. In this assignment, you will simulate the sample mean of a variable. Follow the steps below carefully, and submit a .docx file with your SAS code, key output from each step, your observations, and answers to the questions.
Below is a suggested diagram of the overall process. Feel free to use a different approach if it suits you better.
Process diagram:
Part A (50 pts):
- Research SAS random number generation functions, such as RAND(). Understand how to use them to generate random values from the standard normal distribution N(0,1).
- Use two DO loops to create multiple samples. The outer loop runs 500 times to generate a total of 500 samples, and the inner loop runs 100 times to create the 100 obs in each sample. Create a variable named X to hold the values generated from N(0,1). X will contain the values for all 500 samples. The outer loop index can be used to tell the samples apart. This is one way to sample from a distribution.
- Use a PROC MEANS step to calculate the sample mean of X for each sample. You will have 500 sample means.
- Then use another PROC MEANS step to find the mean of these sample means. Is it close to the true mean of X, which is 0 in this case?
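The Part A steps above can be sketched roughly as follows. This is only one possible layout, not the required solution. The dataset names (sim100, means100), the variable names (sample, i, xbar), and the seed value are all assumptions made for illustration.

```sas
/* Sketch only: sim100, means100, sample, i, xbar and the seed are assumed names/values */
data sim100;
    call streaminit(2024);              /* assumed seed, so the run is reproducible */
    do sample = 1 to 500;               /* outer loop: one pass per sample */
        do i = 1 to 100;                /* inner loop: the obs within one sample */
            x = rand('NORMAL', 0, 1);   /* one draw from N(0,1) */
            output;                     /* write one obs per draw */
        end;
    end;
run;

/* One mean per sample; the data are already in order of sample, so BY works without a sort */
proc means data=sim100 noprint;
    by sample;
    var x;
    output out=means100 mean=xbar;
run;

/* Mean (and standard deviation) of the 500 sample means */
proc means data=means100 mean std;
    var xbar;
run;
```

Changing the upper limit of the inner loop is enough to reuse the same layout for a different sample size.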
Part B (30 pts):
- Now repeat the above steps, but this time generate 500 obs for each sample (still 500 samples). Does the mean of the sample means get closer to 0? Comment on what you observe.
Part C (20 pts):
- Calculate the theoretical 95% Confidence Interval (CI) for the sample mean for samples of size 100 from N(0,1). Similarly, calculate the theoretical 95% CI for the sample mean for samples of size 500. See hint at the end of this file.
- Sort the sample means you got in Q3 (the 500 means for samples of size 100), and use a data step to locate the two obs that serve as the simulated 95% CI bounds for samples of size 100. These bounds correspond to the 2.5% and 97.5% quantiles of your sample means. Compare them to the theoretical 95% CI you calculated in Q6; they should be close. DO NOT use PROC MEANS or PROC TTEST to estimate the CI in this step. For the purpose of simulation, identify the bound values directly in the data instead of estimating them.
- Repeat your steps in Q7 on the results of Q5 to find the simulated 95% CI bounds for samples of size 500. Compare them to the theoretical 95% CI, and comment on your observations.
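Locating the bounds directly in the sorted data might look like the sketch below. To keep the block runnable on its own, the first data step is a stand-in that rebuilds 500 sample means in a hypothetical dataset means100 with a variable xbar; in the assignment these means come from your own PROC MEANS output. The positions 13 and 488 are one assumed convention for the 2.5% and 97.5% points among 500 sorted values (500 × 0.025 = 12.5), and other conventions are defensible.

```sas
/* Stand-in data: 500 means of samples of size 100 (names and seed are assumptions) */
data means100;
    call streaminit(2024);
    do sample = 1 to 500;
        total = 0;
        do i = 1 to 100;
            total = total + rand('NORMAL', 0, 1);
        end;
        xbar = total / 100;             /* mean of this sample */
        output;
    end;
    keep sample xbar;
run;

/* Order the sample means from smallest to largest */
proc sort data=means100 out=sorted100;
    by xbar;
run;

/* Keep only the obs at the assumed 2.5% and 97.5% positions */
data ci100;
    set sorted100;
    if _n_ in (13, 488);
run;

proc print data=ci100;
    var xbar;
run;
```

The two printed values of xbar are the simulated lower and upper bounds, read straight from the data rather than estimated by a procedure.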
Hint: The 95% CI for the population mean can be expressed as
$$\bar{x} \pm 1.96 \, \frac{\sigma}{\sqrt{n}}$$
where $\bar{x}$ is the mean of the sample means (the result of Q4), $\sigma$ (sigma) is the standard deviation of the original distribution (1 for N(0,1) in this case), and $n$ is the number of obs in each sample.
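As a quick check on the arithmetic, the half-width of the interval, 1.96 times sigma over the square root of n, can be worked out for both sample sizes. This assumes sigma = 1 and the usual normal critical value of 1.96; the interval is then centered on your own mean of sample means.

```latex
\[
  n = 100: \quad 1.96 \cdot \frac{1}{\sqrt{100}} = 0.196
\]
\[
  n = 500: \quad 1.96 \cdot \frac{1}{\sqrt{500}} \approx 0.0877
\]
```

The half-width shrinks by a factor of $\sqrt{5} \approx 2.24$ when the sample size goes from 100 to 500.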
If you are not familiar with confidence intervals, I encourage you to explore online resources to gain a better understanding. This will greatly improve your grasp of the assignment and of the concept of simulation behind it.