*** M10 workshop ***;
*Q1 will be based on these two datasets;
data models;
input Model $ 1-12 Type $ @23 Price DOLLAR9.2 @33 Frame $;
format price DOLLAR9.2;
datalines;
Black Bora Track $796.00 Aluminum
Delta Breeze Road $399.00 CroMoly
Jet Stream Track $1,130.00 CroMoly
Mistral Road $1,995.00 Carbon Comp
Nor’easter Mountain $899.00 Aluminum
Santa Ana Mountain $459.00 Aluminum
Scirocco Mountain $2,256.00 Titanium
Trade Wind Road $759.00 Aluminum
;
run;
data orders;
input customerid OrderDate DATE7. Model $13-24 Quantity;
format orderDate date7.;
cards;
287 15OCT03 Delta Breeze 15
287 15OCT03 Santa Ana 15
274 16OCT03 Jet Stream 1
174 17OCT03 Santa Ana 20
174 17OCT03 Nor’easter 5
174 17OCT03 Scirocco 1
347 18OCT03 Mistral 1
287 21OCT03 Delta Breeze 30
287 21OCT03 Santa Ana 25
;
run;
* Q1.1(10pts);
* Use PROC SQL step(s) to find out total dollar amount spent on all the orders for each
customer;
* Q1.2(15pts);
* Use data-driven programs to find out the customer who ordered the most in $ amount.
Feel free to use PROC SQL or DATA step with other PROCs;
* Q1.3(10pts);
* print the order information of the customer you found in Q1.2 above including only the
ordered date, bike model and correspoding quantities. Add a title to display the customerID;
********************************************************************************************;
*Q2 we are going to use the MON1001 from SASHELP library for the following questions.
This data includes hundreds of variables. Some of them may not have a good data quality.
The goal of this practice is to find a smaller set of variables without missing values.
It will help to reduce data dimentions significantly and keep the best part of the data
for any follow-up analysis.
Considering the amount of the variables, it can be an intensive task to check every
variable manually and hard code the list of variable names. So we are going to program
the selection logic and leveraging macro variables to achieve it.
The sub-questions below is going to lead you achieve the goal step by step. You can think
about how you would do it before following the provided steps.
*Q2.1 (10pts);
* Use a PROC MEANS to calculate the number of missing values for ALL variables in the data.
Save the result in a dataset. The variable names in the result dataset should be the same
as those in the initial data;
*Q2.2 (10pts);
* Your output data of Q2.1 should include only one obs and many variables. However, it’s
always easier to filter rows rather than columns. Use proper step to transpose the data;
*Q2.3 (15pts);
* Examine the tranposed data, find the list of variable names without any missing values.
Use proper method to assign the list of these varaible names into one macro variable
called VARLIST. The value should look something like one of the two options below with
the real variable names
(1) varname1 varname2 varname3 …
(2) varname1,varname2,varname3,…
Choose whichever pattern you like and use a %PUT statement to check the value of VARLIST;
*Q2.4 (15pts);
* Use the macro variable created in Q2.3 to create a subset of the MON1001 data. Use
proper method based on your chosen pattern above;
*Q2.5 (15pts);
* Use proper macro functions to calculate the number of variable names in your VARLIST.
Assign the result to another macro variable called N. What’s the value of N?
Check the number of variables in the data created in Q2.4, is it the same as N? if not,
why?;