M10 Workshop STAT674

*** M10 workshop ***;

*Q1 will be based on these two datasets;

data models;
input Model $ 1-12 Type $ @23 Price DOLLAR9.2 @33 Frame $;
format  price DOLLAR9.2;
datalines;
Black Bora   Track    $796.00   Aluminum
Delta Breeze Road     $399.00   CroMoly
Jet Stream   Track    $1,130.00 CroMoly
Mistral      Road     $1,995.00 Carbon Comp
Nor’easter   Mountain $899.00   Aluminum
Santa Ana    Mountain $459.00   Aluminum
Scirocco     Mountain $2,256.00 Titanium
Trade Wind   Road     $759.00   Aluminum
;
run;

data orders;
input customerid OrderDate DATE7. Model $13-24 Quantity;
format orderDate date7.;
cards;
287 15OCT03 Delta Breeze 15
287 15OCT03 Santa Ana    15
274 16OCT03 Jet Stream   1
174 17OCT03 Santa Ana    20
174 17OCT03 Nor’easter   5
174 17OCT03 Scirocco     1
347 18OCT03 Mistral      1
287 21OCT03 Delta Breeze 30
287 21OCT03 Santa Ana    25
;
run;

* Q1.1(10pts);
*    Use PROC SQL step(s) to find out total dollar amount spent on all the orders for each
     customer;


* Q1.2(15pts);
*    Use data-driven programs to find out the customer who ordered the most in $ amount.
     Feel free to use PROC SQL or DATA step with other PROCs;


* Q1.3(10pts);
*    print the order information of the customer you found in Q1.2 above including only the
     ordered date, bike model and correspoding quantities. Add a title to display the customerID;



********************************************************************************************;
*Q2   we are going to use the MON1001 from SASHELP library for the following questions.
     This data includes hundreds of variables. Some of them may not have a good data quality.
     The goal of this practice is to find a smaller set of variables without missing values.
     It will help to reduce data dimentions significantly and keep the best part of the data
     for any follow-up analysis.
     Considering the amount of the variables, it can be an intensive task to check every
     variable manually and hard code the list of variable names. So we are going to program
     the selection logic and leveraging macro variables to achieve it.
     The sub-questions below is going to lead you achieve the goal step by step. You can think
     about how you would do it before following the provided steps.
    
*Q2.1 (10pts);
*    Use a PROC MEANS to calculate the number of missing values for ALL variables in the data.
     Save the result in a dataset. The variable names in the result dataset should be the same
     as those in the initial data;


*Q2.2 (10pts);
*    Your output data of Q2.1 should include only one obs and many variables. However, it’s
     always easier to filter rows rather than columns. Use proper step to transpose the data;
    

*Q2.3 (15pts);
*    Examine the tranposed data, find the list of variable names without any missing values.
     Use proper method to assign the list of these varaible names into one macro variable
     called VARLIST. The value should look something like one of the two options below with
     the real variable names
           (1) varname1 varname2 varname3 …
           (2) varname1,varname2,varname3,…
     Choose whichever pattern you like and use a %PUT statement to check the value of VARLIST;


*Q2.4 (15pts);
*    Use the macro variable created in Q2.3 to create a subset of the MON1001 data. Use
     proper method based on your chosen pattern above;


*Q2.5 (15pts);
*    Use proper macro functions to calculate the number of variable names in your VARLIST.
     Assign the result to another macro variable called N. What’s the value of N?
     Check the number of variables in the data created in Q2.4, is it the same as N? if not,
     why?;
    

Click here to order similar paper @Udessaywriters.com.100% Original.Written from scratch by professional writers.

You May Also Like

About the Author: admin