Download the dataset statistician10.sas7bdat and place it in a folder you can access via SAS studio. Note that the extension name is SAS7BDAT, meaning this is already a SAS dataset.
- (30pts) Data step and PROC PRINT.
- (5pts) Write a libname statement to point to the folder where you saved the dataset. Use proper way to check the data. What are the data types of variables birthday and death?
- (10pts) Use a data step to do the following:
- Use PUT or INPUT function to convert birthday and death to numeric variables.
- Use RENAME and DROP statements to remove the initial birthday and death variables and use the same names for the converted numeric variables.
- (5pts) Use another data step to assign the following:
- Formats to show the birthday to look like 02-17-2001 while the death to look like 02/17/2001.
- Add labels for the variables text, birthday, and death to show what the variable was representing. For example, for text, you can say something like ‘Biography’.
- (5pts) Among these statisticians, who were born in the first half of the year? Use a DATA step to figure this out. You may need to use the month() function to get the month information from a date variable.
- (5pts) Use PROC PRINT to display the dataset from 1.4. In this proc step, Add a label to the variable text to display as ‘Story of the Statistician’. Use proper option to display the label. Does it change the label of this variable in the printed results? How about in the dataset?
- (40pts) Duplicate observations. Use a data step to read the practice data SASHELP.CARS, create a copy of this dataset and name it Cars. Use Cars for the following questions.
- (10pts) Use PROC SORT to check if there are any duplicate observations with same Make and Model. Save the observations with same Make and Model values into a separate dataset called “CheckDup”. E.g. if there are two observations with Make and Model being Acura MDX, save both observations in CheckDup.
Hint: use NOUNIKEY
- (5pts) What are the Make and Models in your data CheckDup? How many observations are there for each Make-Model combination?
- (10pts) Use another PROC SORT to remove duplicate observations by Make and Model. Keep only one observation for each Make-Model combination. Do not overwrite the initial dataset. Save the remaining data into a second dataset and name it “Sorted1”. Save the removed duplicates into a third dataset and name it “Dups1”.
- (5pts) Compare the dataset Dups1 with CheckDup, can you tell which observation was removed when there are duplicates? Use MSRP as a reference to answer if the higher MSRP observation was removed or the lower MSRP observation.
- (5pts) Use a third PROC SORT to sort the data by Make and Model first, then by MSRP on a descending order.
- (5pts) Re-do 2.3 and 2.4 using the sorted data in 2.5. This time name your remaining data and removed duplicates “Sorted2” and “Dups2”. Which observation was removed now?
- (30pts) PROC MEANS. Use the same data Cars for this question (use the original data before removing duplicates)
- (10pts) Use PROC MEANS to calculate the average MSRP amount by Origin and Type. Use CLASS statement to specify the subgroups. Add an OUTPUT statement to save the results in a dataset and name it “avgPrice”.
- (5pts) Check the dataset avgPrice, what is the average MSRP for the whole dataset? And what is the average MSRP for USA Sedans?
- (5pts) In the data avgPrice, can you find the average MSRP for all Sedans regardless of the Origins? How about the average MSRP for all USA cars regardless of the types? You can use additional proc means to verify your thoughts but it’s not required.
- (10pts) Make a copy of your PROC MEANS from 3.1. Add the following statement to this procedure and keep everything else the same. Run the code and check your output dataset. What’s the difference between this data and the earlier output from 3.1?
types () ORIGIN*type;