****************************************
M3 Assignment: creating new variables
Write the code requested by each question.
If needed, put your answers in /* */ as comments.
Submit your code in PDF or Word format. Code with colors is preferred.
Make sure your submission file includes the complete program.
****************************************;
**Q1.1(5 pts): Create a temporary dataset by reading the data from sashelp.Countseries;
**Q1.2(5 pts): In your dataset, create a new variable DATE2 that has the same value as DATE but is formatted like 01-01-2004.
Note that the month, day, and year must be separated by ‘-’, not ‘/’, with the month before the day.
Look up the proper format name on the SAS Documentation site;
**Q1.3(5 pts): Create a variable DATE3 that has the same value but is displayed like Jan1940, showing only the month and year;
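/* Illustrative sketch for Q1.2-Q1.3, not the required answer. It shows that a
   format changes only how a date is displayed, never the stored value. The
   dataset name demo_dates and the sample date are assumptions made for this
   example; MMDDYYD10. and MONYY7. are one possible choice of formats. */
data demo_dates;
    date  = '01JAN2004'd;   /* hypothetical sample value */
    date2 = date;           /* same stored value as date */
    date3 = date;           /* same stored value as date */
    /* MMDDYYD10. uses dashes (01-01-2004); MONYY7. shows month and year (JAN2004) */
    format date date9. date2 mmddyyd10. date3 monyy7.;
run;

proc print data=demo_dates;
run;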
**Q1.4(10 pts): Using conditional logic, write code to create a categorical variable that groups time periods as follows:
Before 2010 (but not including 2010),
2010,
2011 and later.
**;
*Q2: Let’s work with the m2_d1.txt flat file again.
**Q2.1(5 pts): Read the m2_d1.txt file to create a dataset. Name your variables.
For the numeric variable, name it ‘CarsSold’;
**Q2.2(8 pts): Using conditional logic, write code to create a categorical variable that indicates the level of CarsSold: Low, Medium, or High.
Follow the logic below:
CarsSold level
<10: low
[10, 20): medium
>=20: high
;
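/* Illustrative sketch of half-open interval logic with IF / ELSE IF, on made-up
   data. The names demo_levels, units, and level are hypothetical. Two points
   worth noting: a LENGTH statement prevents the character variable from being
   truncated to the length of the first value assigned, and a missing numeric
   value compares as smaller than any number in SAS, so it is handled first. */
data demo_levels;
    input units @@;
    length level $6;
    if missing(units) then level = ' ';          /* keep missing values out of 'Low' */
    else if units < 10 then level = 'Low';       /* units < 10         */
    else if units < 20 then level = 'Medium';    /* 10 <= units < 20   */
    else level = 'High';                         /* units >= 20        */
    datalines;
5 10 19.5 20 .
;
run;

proc print data=demo_levels;
run;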
**Q2.3(8 pts): Suppose the data are sales records for different types of automobiles at dealerships around Newark, DE.
Create a new variable to indicate the state of each dealership.
As you know, Wilmington (WIL) and Camden are in Delaware, while Philadelphia (Phila) is in PA;
**Q2.4(8 pts): Use PROC FORMAT to create a format “sales.” that displays CarsSold using the same levels as in Q2.2.
You need to adjust the range endpoints so that the correct boundaries are excluded.
<10: low
[10, 20): medium
>=20: high;
**Q2.5(6 pts): After the format is created, use a FORMAT statement in a DATA step to apply the new format to ‘CarsSold’.
Name the output dataset of this DATA step ‘carsFMT’.
Compare the variable CarsSold in the initial dataset, CarsSold in carsFMT, and the variable created in Q2.2,
and comment on the comparison;
* Q3: Keep working with the dataset carsFMT;
* Q3.1(5 pts): We want to subset the data and keep only the observations where CarsSold falls in the ‘Low’ range.
Before writing any code, how many observations do you expect to keep?;
* Q3.2(5 pts): Run the DATA step below. Check the log and the dataset test1. How many observations does it contain?;
data test1;
    set carsFMT;
    if CarsSold = "low";
run;
**Q3.3(5 pts): Does the dataset test1 match your expectation? Why or why not?;
* Q4: Import the episode_data.csv dataset and answer the following questions, which are about functions for character
variables. Use the SAS documentation to find out what each function does. Feel free to search online for the appropriate
function to use. You can write the code for Q4.1-Q4.4 in a single DATA step;
**Q4.1(5 pts) – All these questions are based on the variable ‘episode_name’ from the source data.
Use the proper statement or option to keep only ‘episode_name’ and the variables you create;
**Q4.2(5 pts) – Create a variable that counts the number of words in episode_name. Hint: use the countw() function;
**Q4.3(5 pts) – Create a variable that shows the last word of episode_name;
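/* Illustrative sketch of word counting and last-word extraction on a made-up
   string. The names demo_words, title, n_words, and last_word are hypothetical.
   COUNTW counts words using its default delimiters, and SCAN with a negative
   index counts words from the right, so -1 returns the last word. */
data demo_words;
    length title last_word $40;
    title     = 'The quick brown fox';   /* hypothetical sample value */
    n_words   = countw(title);           /* 4 */
    last_word = scan(title, -1);         /* fox */
    put n_words= last_word=;             /* writes both values to the log */
run;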
**Q4.4(10 pts) – Look for the string ‘now’ in episode_name, regardless of case. It does not have to be a whole word.
Create a variable that indicates whether the string is found in each observation. The variable should show only Yes or No.
Hint: you should find two observations containing the string;
**Q4.5(Optional)(Bonus 5 pts, full points up to 100 pts)
There are multiple ways to handle case in Q4.4. Use a different method to create the same variable.
Use the following PROC PRINT to check your result, replacing FindNOW_flag with the name of your created variable;
proc print data=episode;
    where FindNOW_flag = 'Yes';
run;
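
/* Illustrative sketch of two case-insensitive substring searches on made-up
   titles. The names demo_find, title, flag_a, and flag_b are hypothetical.
   Method A passes the 'i' modifier to FIND; method B lowercases the string
   before calling INDEX. Both return a position greater than 0 on a match. */
data demo_find;
    infile datalines truncover;
    length title $40 flag_a flag_b $3;
    input title $40.;
    flag_a = ifc(find(title, 'now', 'i') > 0, 'Yes', 'No');       /* method A */
    flag_b = ifc(index(lowcase(title), 'now') > 0, 'Yes', 'No');  /* method B */
    datalines;
Snow Day
NOW or Never
Tomorrow
;
run;

/* The first two titles contain the string and the third does not */
proc print data=demo_find;
run;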