Valid Professional-Machine-Learning-Engineer Dumps shared by ExamDiscuss.com for Helping Passing Professional-Machine-Learning-Engineer Exam! ExamDiscuss.com now offer the newest Professional-Machine-Learning-Engineer exam dumps, the ExamDiscuss.com Professional-Machine-Learning-Engineer exam questions have been updated and answers have been corrected get the newest ExamDiscuss.com Professional-Machine-Learning-Engineer dumps with Test Engine here:
You work for a bank and are building a random forest model for fraud detection. You have a dataset that includes transactions, of which 1% are identified as fraudulent. Which data transformation strategy would likely improve the performance of your classifier?
Correct Answer: C
Oversampling is a technique for dealing with imbalanced datasets, where the majority class dominates the minority class. It balances the distribution of classes by increasing the number of samples in the minority class. Oversampling can improve the performance of a classifier by reducing the bias towards the majority class and increasing the sensitivity to the minority class. In this case, the dataset includes transactions, of which 1% are identified as fraudulent. This means that the fraudulent transactions are the minority class and the non-fraudulent transactions are the majority class. A random forest model trained on this dataset might have a low recall for the fraudulent transactions, meaning that it might miss many of them and fail to detect fraud. This could have a high cost for the bank and its customers. One way to overcome this problem is to oversample the fraudulent transactions 10 times, meaning that each fraudulent transaction is duplicated 10 times in the training dataset. This would increase the proportion of fraudulent transactions from 1% to about 10%, making the dataset more balanced. This would also make the random forest model more aware of the patterns and features that distinguish fraudulent transactions from non-fraudulent ones, and thus improve its accuracy and recall for the minority class. For more information about oversampling and other techniques for imbalanced data, see the following references: * Random Oversampling and Undersampling for Imbalanced Classification * Exploring Oversampling Techniques for Imbalanced Datasets