For the final presentation, you will work in a team of 1-3 members. Presentations take place on 5/8, and each team will have 10 minutes to present its project. You can start working on the project at any time, but some of the project requirements (those related to macros) may require material from the classes scheduled on 4/24 and 5/1.
If you need help or advice, please sign up for a Zoom appointment with me via Canvas. Additionally, office hours on Friday afternoon, 4:30-5:30, will remain open for discussion.
To sign up for an appointment, go to the course home page in Canvas, click View Course Calendar at the top right, click Find Appointment on the right, select this course, then submit. You should then see the available time slots titled “Project Consulting”. Please have one of your group members reserve the appointment; the time slot will be reserved for your group only.
Timeline:
- Team up and choose a topic by 4/27
- Prepare data (if needed) and discuss analysis plan by 5/1
- Submit first draft by 5/4
- Submit final presentation file by 5/7
- Presentation on 5/8
Project Requirements
- Choose one of the topics from the list below, or something else your team is interested in.
- Your PPT should be no more than 10 pages and must include all of the following parts:
- Background of your project: why you chose this topic, how it is interesting/useful/important, etc.
No more than 2 pages on this part;
- Goal of your project, along with your thought process for scoping the analysis and/or how you plan to achieve the goal
No more than 2 pages on this part;
- Core SAS procedures/functions used
This is more critical for the self-learning topics. You can think of the project as a sharing opportunity to give the class an overview of how they can use the techniques by themselves.
- Macros used
Your project must incorporate SAS macros in at least one component. Macros can be used to test results with varying variables, or to generate similar outputs under different specifications. Ideally, you would use macros to turn your project into a versatile tool that can be applied to other datasets or analyses, or that can easily reproduce multiple versions of the same output. If your project topic is less suited to this approach, as with simulations, use macros to run the analysis with different arguments and compare the outcomes.
- Summary/Conclusions/Findings.
The summary can be your analysis results, comparisons, or findings from your project. If you simply want to build a tool, write brief instructions explaining how to use the tool, the requirements for the input data, and cautions (if any).
- Every team member must participate in presenting the project.
- Please make sure your project has sufficient substance. For example, a project that only calculates the mean of a sample dataset may not earn a good grade.
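To illustrate the macro requirement above, here is a minimal sketch of a macro that repeats the same analysis under different specifications. The macro name `summarize` and its parameters `data`, `var`, and `by` are hypothetical, and the SASHELP sample tables are used only for illustration; your own project will likely need something more substantial.

```sas
/* Hypothetical macro: summarize one numeric variable by one class variable. */
/* The macro name and parameter names are illustrative assumptions.          */
%macro summarize(data=, var=, by=);
    title "Summary of &var by &by in &data";
    proc means data=&data n mean std min max maxdec=2;
        class &by;
        var &var;
    run;
    title;  /* clear the title so it does not carry over to later output */
%mend summarize;

/* Same analysis, different datasets and variables */
%summarize(data=sashelp.class, var=height,   by=sex);
%summarize(data=sashelp.cars,  var=mpg_city, by=origin);
```

Because the dataset and variable names are parameters, the same macro can be pointed at another table without editing the procedure code.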
Potential Topics
Below are some ideas to consider. Some of the topics are related, such as data visualization and graphing. You can combine them and focus on the area you are most interested in.
Typically, the simulation topics require more effort in understanding the theory. The analysis topics focus more on analytical thinking. The self-learning projects focus more on SAS techniques and sharing. The macro projects focus more on macro usage and potential application scenarios.
Simulation:
- Central limit theorem (CLT).
Potential macro parameters include the distribution of X and the sample size.
- Simulate the value of π.
A potential macro parameter is the sample size. The following reference describes one method for simulating π: https://www.geeksforgeeks.org/estimating-value-pi-using-monte-carlo/
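As a starting point for the π topic, the sketch below shows one common Monte Carlo variant: draw random points in the unit square and count the share that land inside the quarter circle of radius 1, which approaches π/4. The macro name `simulate_pi`, its parameters `n` and `seed`, and the output dataset name `pi_estimate` are hypothetical, and the method may differ in detail from the one in the reference above.

```sas
/* Hypothetical macro: estimate pi from n random points in the unit square. */
%macro simulate_pi(n=10000, seed=2024);
    data pi_estimate;
        call streaminit(&seed);          /* fixed seed for reproducible draws */
        inside = 0;
        do i = 1 to &n;
            x = rand('uniform');
            y = rand('uniform');
            /* count points that fall inside the quarter circle of radius 1 */
            if x**2 + y**2 <= 1 then inside = inside + 1;
        end;
        n = &n;
        pi_hat = 4 * inside / n;         /* the area ratio approaches pi/4 */
        abs_error = abs(pi_hat - constant('pi'));
        keep n pi_hat abs_error;         /* one summary row per run */
    run;

    title "Monte Carlo estimate of pi with n = &n";
    proc print data=pi_estimate noobs;
    run;
    title;
%mend simulate_pi;

/* Compare the estimate at two sample sizes */
%simulate_pi(n=1000);
%simulate_pi(n=100000);
```

Calling the macro with several values of `n` is one way to compare how the estimate behaves as the sample size grows.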
Analysis:
- Panel data analysis.
Panel data contains observations on different cross sections over time, e.g., banking data covering different products, where each product has performance/account records over a period of time. You can analyze trends or differences across the dimensions.
- Data visualization.
Data can be complicated. A well-organized table or graph can present the data effectively. The project should explore different visualization techniques, aiming to highlight patterns, trends, or insights in the most effective way.
Self-learning on other SAS procedures/functionalities:
- Color-coded table display/report styling
SAS offers a variety of styling customization options. Beyond producing an attractive result, you can use them for functional purposes, such as displaying a color-coded summary table that dynamically adjusts its styling based on the data values. For example, summarize the missing rate of all variables in a dataset, and fill the background in red when a variable’s missing rate is more than 20%.
- PROC TABULATE
PROC TABULATE is a very flexible procedure in SAS. It supports various table layouts, displaying calculated summary statistics directly, customizing row/column headers and formats, and applying formatting and styling enhancements to improve the appearance of your reports.
- SAS Graphing
SAS provides several procedures to generate graphs for data visualization. PROC SGPLOT is the most commonly used one and is highly customizable. It can produce different types of graphs like scatter plots, histograms, box plots, and line charts.
- Simple text mining/ text pattern recognition
SAS supports Perl regular expressions through its PRX functions, which are powerful for text pattern recognition. For example, you can recognize a string as a date when it contains a two-digit number, a slash, another two-digit number, another slash, then a four-digit number.
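The date pattern described in the text mining topic can be sketched with PRXMATCH as follows. The dataset name `date_check`, the variable names, and the sample strings are made up for illustration, and the pattern only checks the digit layout, not whether the date is valid.

```sas
/* Flag strings containing the pattern NN/NN/NNNN.                     */
/* Dataset name, variable names, and sample text are hypothetical.     */
data date_check;
    length text $40;
    input text $char40.;
    /* two digits, a slash, two digits, a slash, then four digits */
    is_date = prxmatch('/\b\d{2}\/\d{2}\/\d{4}\b/', text) > 0;
    datalines;
Invoice issued 04/24/2024
No date here
Due 5/1/24
;
run;

proc print data=date_check noobs;
run;
```

Only the first sample string should be flagged, since the third uses one-digit and two-digit parts rather than the NN/NN/NNNN layout. A fuller project might extend the pattern to other date layouts or validate the month and day ranges.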
Macro tools:
- Linear Regression model scoring tool.
Building a model is often not the final step in analytics; the crucial phase is scoring, which means applying the model to new data to make predictions. While SAS procedures can score models, they require the model specification to be saved in a format specific to the procedure used, which can be limiting, for example when a model built in R needs to be scored in SAS. You can use macros to develop a versatile scoring tool that can score any linear regression model. It is acceptable to require a few preparation steps to specify the model before applying the tool.
- Report Automation.
In the business world, many weekly/monthly reports are regularly updated in formats like Excel or PDF without significant changes to their layout. Creating these reports manually can be labor-intensive and error-prone. You can use SAS to build a report template that takes parameters, such as the reporting date, that need to be updated with each iteration. It can be built as a macro tool that automatically updates the input data, refreshes the report, and exports it to an Excel or PDF file, automating the report generation process. Note that no single report template fits every purpose. You can pick one reporting application as an example and focus on the automation part.
Potential data sources
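For the linear regression scoring tool, one possible shape is a macro that takes the model as plain lists of variable names and coefficients, so the model does not need to come from a SAS procedure. This is only a sketch under assumptions: the macro name `score_linear`, its parameters, the output variable `predicted`, and the coefficient values in the example call are all hypothetical and are not estimates from any fitted model.

```sas
/* Hypothetical scoring macro: predicted = intercept + sum(coef_i * var_i). */
/* vars and coefs are space-separated lists given in the same order.        */
%macro score_linear(data=, out=, intercept=0, vars=, coefs=);
    %local i n;
    %let n = %sysfunc(countw(&vars, %str( )));
    data &out;
        set &data;
        predicted = &intercept
        %do i = 1 %to &n;
            + (%scan(&coefs, &i, %str( ))) * %scan(&vars, &i, %str( ))
        %end;
        ;  /* ends the assignment statement generated by the loop */
    run;
%mend score_linear;

/* Example call with made-up coefficients, for illustration only */
%score_linear(data=sashelp.class, out=scored,
              intercept=-140, vars=height age, coefs=3.5 1.2);

proc print data=scored(obs=5) noobs;
    var name height age predicted;
run;
```

A more complete tool would likely check that the two lists have the same length and that every listed variable exists in the input data.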
Kaggle: https://www.kaggle.com/datasets
GitHub: https://github.com/search?q=dataset&type=repositories
DrivenData: https://www.drivendata.org/competitions/