MSCS3020 - Massive Data Sets - Final Exam

Date: 12/09/2023

Submit your answers to the following questions. Each question is worth 50 points, for a total of 200 points.

1. For each data set given below, give specific examples of classification, clustering, association rule mining, and anomaly detection tasks that can be performed on the data. For each task, state how the data matrix should be constructed (i.e., specify the rows and columns of the matrix).

(a) Ambulatory Medical Care data, which contains the demographic and medical visit information for each patient (e.g., gender, age, duration of visit, physician’s diagnosis, symptoms, medication, etc.).
(b) Stock market data, which includes the prices and volumes of various stocks on different trading days.
(c) Database of Major League Baseball (MLB).

2. Classify the following attributes as binary, discrete, or continuous. Also classify them as qualitative (nominal or ordinal) or quantitative (interval or ratio). Some cases may have more than one interpretation, so briefly indicate your reasoning if you think there may be some ambiguity.

• Number of courses registered by a student in a given semester.
• Speed of a car (in miles per hour).
• Decibel as a measure of sound intensity.
• Hurricane intensity according to the Saffir-Simpson Hurricane Scale.
• Social security number.

3. Draw the full decision tree that perfectly classifies the data sets given below. There could be more than one answer to each question (you only need to draw one). You do not have to consider the impurity measure used by the decision tree algorithm. Ignore pre-pruning and post-pruning. Assume there is no noise and there are no missing attribute values.

Consider a data set with three Boolean attributes, A, B, and C, and a binary class label y whose value is True if the number of attributes with True values is even and False otherwise. For example, if A=True, B=True, C=False, then y=True (because there are two attributes with True values).

4. Consider the training set given below for determining whether a loan application should be approved or rejected. Draw the full decision tree obtained using entropy as the impurity measure. Show your steps clearly (i.e., the computation of information gain for every candidate attribute must be shown). Compute the training error of the decision tree.

Long-Term Debt   Unemployed   Credit Rating   Down Payment < 20%   Class
No               No           Good            Yes                  Approve
No               No           Bad             No                   Approve
No               No           Bad             Yes                  Approve
No               No           Bad             No                   Approve
Yes              No           Good            No                   Approve
No               Yes          Good            Yes                  Reject
Yes              No           Bad             No                   Reject
Yes              No           Bad             Yes                  Reject
Yes              No           Bad             Yes                  Reject
Yes              Yes          Bad             No                   Reject
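
The information-gain computation requested in question 4 can be checked numerically. The following is a minimal sketch, not an official solution: it assumes the ten training records are encoded as Python tuples in the column order of the table, and the names `COLUMNS`, `ROWS`, `entropy`, and `information_gain` are illustrative choices that do not come from the exam. It evaluates only the first (root) split; the same function would need to be reapplied to each resulting subset to grow the full tree.

```python
from collections import Counter
from math import log2

# Attribute names in table order (the class label is the last tuple element).
COLUMNS = ["Long-Term Debt", "Unemployed", "Credit Rating", "Down Payment < 20%"]

# The ten training records from the question, transcribed row by row.
ROWS = [
    ("No",  "No",  "Good", "Yes", "Approve"),
    ("No",  "No",  "Bad",  "No",  "Approve"),
    ("No",  "No",  "Bad",  "Yes", "Approve"),
    ("No",  "No",  "Bad",  "No",  "Approve"),
    ("Yes", "No",  "Good", "No",  "Approve"),
    ("No",  "Yes", "Good", "Yes", "Reject"),
    ("Yes", "No",  "Bad",  "No",  "Reject"),
    ("Yes", "No",  "Bad",  "Yes", "Reject"),
    ("Yes", "No",  "Bad",  "Yes", "Reject"),
    ("Yes", "Yes", "Bad",  "No",  "Reject"),
]


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, col):
    """Parent entropy minus the size-weighted entropy of the children
    obtained by splitting `rows` on the attribute at index `col`."""
    parent = entropy([r[-1] for r in rows])
    groups = {}
    for r in rows:
        groups.setdefault(r[col], []).append(r[-1])
    children = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return parent - children


# Gain of every candidate attribute at the root, printed from best to worst.
gains = {name: information_gain(ROWS, i) for i, name in enumerate(COLUMNS)}
for name, gain in sorted(gains.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {gain:.4f}")
```

To continue past the root, filter `ROWS` by the chosen attribute's value and call `information_gain` again on each subset, skipping the attribute already used.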
