Data Analytics Worksheet


Hi,

I am looking for help with my assignment in RStudio using k-Nearest Neighbor classification. I’ve attached the assignment details. There is no link for the data; we are using the “Default” dataset located in R’s ISLR package. I really only need help with questions 2 through 8. I can do the short-answer parts (question 1 and Part 2) on my own. Thank you!


MIS-655 k-Nearest Neighbor Classification

Directions: Use the information below to complete this assignment.

Part 1

For this assignment, you will use the “Default” dataset located in R’s ISLR package.

You are the analyst in the credit department at a large bank who has been tasked with building a model to predict whether a cardholder will default on their credit card. To do so, you have some basic information about cardholders: whether or not they are a student, their credit card balance, and their income. Use kNN to determine how effective these variables are in predicting credit card default by completing the following steps.

Question 1: What are the model assumptions for the k-Nearest Neighbor model? What are the limitations of the kNN model? For what types of business problems would kNN be an appropriate model to use? Use a specific example to support your rationale.

Question 2: Load the “ISLR” and “class” libraries into your R environment. Load the “Default” data into a data frame object called “Default.” Check the dimensions of the data set to ensure it is loaded correctly. (You should get a data set with 10,000 observations and 4 variables.)
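A minimal sketch of the loading step, assuming the ISLR and class packages are already installed (the worksheet does not cover installation). The stopifnot() check is an added convenience and not something the assignment asks for.

```r
# Assumes install.packages(c("ISLR", "class")) has already been run.
library(ISLR)   # provides the Default data set
library(class)  # provides knn()

# Copy the packaged data set into a data frame named Default.
Default <- ISLR::Default

# dim() reports rows first, then columns; the worksheet expects 10000 and 4.
dim(Default)
stopifnot(nrow(Default) == 10000, ncol(Default) == 4)
```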

Question 3: Change the variable “student” into a numeric variable for use in the kNN model. Check to see that the transformation worked as expected by using the ‘table’ function to show counts of students/nonstudents. How many in the data set are students?
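The conversion can be sketched on a small made-up factor. The vector below is a hypothetical stand-in for the student column, and the names student_codes and student_flag are illustrative only. Two common recodings are shown, since the worksheet does not say which numeric coding it expects.

```r
# Toy stand-in for the "student" column: a factor with levels "No" and "Yes".
student <- factor(c("No", "Yes", "No", "No", "Yes"))

# as.numeric() on a factor returns the internal level codes (1 = "No", 2 = "Yes").
student_codes <- as.numeric(student)

# An explicit comparison gives a 0/1 indicator instead (1 = student).
student_flag <- ifelse(student == "Yes", 1, 0)

# table() counts each value; cross-tabulating against the original factor
# confirms that every "Yes" became 1 and every "No" became 0.
table(student_flag)
table(original = student, recoded = student_flag)
```

Either coding leaves students and nonstudents one unit apart, so the distances kNN computes are the same under both.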

Question 4: The knn() function requires 4 arguments: 1) train, or the predictors/features for the training set; 2) test, or the predictors/features for the testing set; 3) cl, or the true class labels for the training set (so it can “learn” how to associate the features with the classes); and 4) k, or the number of neighbors to consider in making a classification. Therefore, you need to partition your data into training and testing sets and extract the variable “default” into its own vector for use in the model. Run the following lines of code to complete this step. Include the code as part of your answer and be sure to comment on each line using ## to explain what the code is doing.

set.seed(42)
default_idx <- sample(nrow(Default), .7*(nrow(Default)))
default_idx
default_trn <- Default[default_idx, ]
default_tst <- Default[-default_idx, ]
x_default_trn <- default_trn[, -1]
y_default_trn <- default_trn$default
View(x_default_trn)
View(y_default_trn)
x_default_tst <- default_tst[, -1]
y_default_tst <- default_tst$default
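To show how the four knn() arguments line up with a train/test split, here is a small sketch on synthetic data. The data frame toy, its columns, and the seed are all made up for illustration; only the split pattern follows the worksheet code.

```r
library(class)  # knn() lives in the class package

set.seed(1)  # arbitrary seed so the toy data are reproducible

# Hypothetical toy data: a two-level class label and two numeric features.
toy <- data.frame(
  label = factor(rep(c("No", "Yes"), each = 20)),
  x1    = c(rnorm(20, mean = 0), rnorm(20, mean = 3)),
  x2    = c(rnorm(20, mean = 0), rnorm(20, mean = 3))
)

# 70/30 split, following the same pattern as the worksheet code.
idx <- sample(nrow(toy), round(0.7 * nrow(toy)))
trn <- toy[idx, ]
tst <- toy[-idx, ]

# train and test hold features only (label column dropped);
# cl holds the true labels for the training rows.
pred <- knn(train = trn[, -1], test = tst[, -1], cl = trn$label, k = 3)

# knn() returns a factor with one predicted label per test row.
length(pred) == nrow(tst)
```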

Question 5: Create a new object called ‘kmod1’ that stores the results of a kNN model with a k of 3. How many defaults does the model predict in the testing set? What is the overall predictive accuracy of the model? Do you consider the model to be accurate for predicting credit card default? Explain your answer using the model results.
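The two quantities this question asks about can be computed from any pair of predicted and true label vectors. The sketch below uses five hand-written labels (hypothetical, not real model output) so the arithmetic is easy to check by eye.

```r
# Hypothetical predicted and true labels for five test rows.
pred   <- factor(c("No", "No", "Yes", "No", "Yes"), levels = c("No", "Yes"))
actual <- factor(c("No", "Yes", "Yes", "No", "No"), levels = c("No", "Yes"))

# Number of rows the model labels as defaults.
n_pred_default <- sum(pred == "Yes")

# Overall accuracy: share of rows where prediction and truth agree.
accuracy <- mean(pred == actual)

n_pred_default  # 2
accuracy        # 0.6 (rows 1, 3 and 4 agree)
```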

Question 6: Following the code provided in the book (see Table 7.3), write the code to find the predictive accuracy of each kNN model with k values from 1 to 14. What is the ideal k value based on predictive accuracy? What is the accuracy rate at the ideal value of k? Be sure to include the R console output as part of your submission.
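One way to sweep k from 1 to 14 is sketched below. This is not the book's Table 7.3 code, which is not reproduced in the worksheet. It is a generic loop on synthetic data, and make_toy(), the seed, and the column names are hypothetical.

```r
library(class)

set.seed(1)  # arbitrary seed for the toy data

# Hypothetical toy data, standing in for the worksheet's training and test sets.
make_toy <- function(n) {
  data.frame(
    label = factor(rep(c("No", "Yes"), each = n / 2)),
    x1    = c(rnorm(n / 2, mean = 0), rnorm(n / 2, mean = 2)),
    x2    = c(rnorm(n / 2, mean = 0), rnorm(n / 2, mean = 2))
  )
}
trn <- make_toy(60)
tst <- make_toy(30)

k_values <- 1:14

# Fit one model per k and record the test-set accuracy of each.
accuracy <- sapply(k_values, function(k) {
  pred <- knn(train = trn[, -1], test = tst[, -1], cl = trn$label, k = k)
  mean(pred == tst$label)
})

# One row per k, then the k with the highest accuracy (the first one on ties).
data.frame(k = k_values, accuracy = accuracy)
best_k <- k_values[which.max(accuracy)]
best_k
```

knn() breaks voting ties at random, so results for even k can vary slightly from run to run unless a seed is set first.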

Question 7: Repeat the analysis from question 6, but this time standardize the inputs. What does it mean to “standardize” the variables? How might the results of a kNN model be affected when the inputs are not standardized, and how does standardization avoid this issue? Does the ideal value of k change when the inputs are standardized? Does the predictive accuracy of the model change? If so, how?
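Standardizing means rescaling each input to mean 0 and standard deviation 1. A minimal sketch with base R's scale() follows, using made-up balance and income values. Reusing the training-set means and standard deviations for the test set is an assumption here, since the worksheet does not say how the test set should be scaled.

```r
# Hypothetical training and test features on very different scales.
trn <- data.frame(balance = c(500, 1000, 1500), income = c(20000, 40000, 60000))
tst <- data.frame(balance = c(1000, 2000),      income = c(30000, 50000))

# Standardize the training set: subtract each column's mean, divide by its sd.
trn_scaled <- scale(trn)

# Reuse the training means and sds on the test set so both share one scale.
tst_scaled <- scale(tst,
                    center = attr(trn_scaled, "scaled:center"),
                    scale  = attr(trn_scaled, "scaled:scale"))

colMeans(trn_scaled)      # 0 for each column
apply(trn_scaled, 2, sd)  # 1 for each column
tst_scaled
```

Before scaling, a gap of 20000 in income swamps a gap of 500 in balance in any Euclidean distance. After scaling, both gaps equal one standard deviation and contribute equally.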

Question 8: Run the final kNN model that maximizes predictive accuracy based on your results from questions 6 and 7. Produce the confusion matrix for this model. Does your model significantly improve predictive accuracy over a naïve model that assumes nobody will default? How can you tell?
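The comparison against a naive model can be sketched with table() and a one-line baseline. The ten labels below are hypothetical and chosen only to make the counts easy to verify. They are not output from the Default data.

```r
# Hypothetical true and predicted labels for ten test rows.
actual <- factor(c("No", "No", "No", "No", "No", "No", "No", "No", "Yes", "Yes"),
                 levels = c("No", "Yes"))
pred   <- factor(c("No", "No", "No", "No", "No", "No", "No", "Yes", "Yes", "No"),
                 levels = c("No", "Yes"))

# Confusion matrix: rows are predictions, columns are true labels.
conf_mat <- table(predicted = pred, actual = actual)
conf_mat

# Model accuracy: correct predictions sit on the diagonal.
model_acc <- sum(diag(conf_mat)) / sum(conf_mat)

# Naive baseline: predict "No" for everyone, so accuracy is the share of true "No".
naive_acc <- mean(actual == "No")

model_acc  # 0.8
naive_acc  # 0.8
```

In this made-up case the model and the naive baseline tie at 0.8, even though only the model catches a default. When one class is rare, overall accuracy alone can hide that difference, which is why the confusion matrix is worth reading alongside it.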

Part 2 (Analysis of results and recommendations): Present your findings and recommendations in the form of a 250-word (minimum) executive summary that includes relevant data, charts, and tables in Microsoft Word. Be sure to include your R code and R output as a .txt file with your submission.
