DSA-C03 Exam Dumps

Question 8/143

A data scientist is tasked with predicting house prices using Snowflake. They have a dataset stored in a Snowflake table called 'HOUSE_PRICES' with columns such as 'SQUARE_FOOTAGE', 'NUM_BEDROOMS', 'LOCATION_ID', and 'PRICE'. They choose a Random Forest Regressor model. Which of the following steps is MOST important to prevent overfitting and ensure good generalization performance on unseen data, and how can this be effectively implemented within a Snowflake-centric workflow?

A. Increase the number of estimators (trees) in the Random Forest to the maximum possible value to capture all potential patterns, without using cross-validation.

B. Tune the hyperparameters of the Random Forest model (e.g., 'max_depth', 'n_estimators') using cross-validation. You can achieve this by splitting the 'HOUSE_PRICES' table into training and validation sets using Snowflake's 'QUALIFY' clause or temporary tables, then training and evaluating the model within a loop or stored procedure.

C. Train the Random Forest model on the entire 'HOUSE_PRICES' table without splitting into training and validation sets, as this will provide the model with the most data.

D. Randomly select a small subset of the features (e.g., only use 'SQUARE_FOOTAGE' and 'NUM_BEDROOMS') to simplify the model and prevent overfitting.

E. Eliminate outliers without understanding the data properly to reduce noise.
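
The loop described in option B (split the rows into training and validation sets, then train and evaluate once per hyperparameter setting) can be sketched roughly as follows. This is only an illustration using the Python standard library: the helper names (assign_folds, k_fold_splits, grid_search_cv) are hypothetical, the data is synthetic, and a simple binned-mean regressor stands in for the Random Forest so the sketch runs without extra packages. Its 'n_bins' parameter plays the role that 'max_depth' would play for a real model. In a Snowflake workflow the fold index would more likely live in a column of a temporary table.

```python
import random
from statistics import mean


def assign_folds(n_rows, k, seed=0):
    """Return a shuffled, balanced fold index (0..k-1) for each row."""
    folds = [i % k for i in range(n_rows)]
    random.Random(seed).shuffle(folds)
    return folds


def k_fold_splits(rows, k, seed=0):
    """Yield one (train, validation) pair of row lists per fold."""
    folds = assign_folds(len(rows), k, seed)
    for held_out in range(k):
        train = [r for r, f in zip(rows, folds) if f != held_out]
        valid = [r for r, f in zip(rows, folds) if f == held_out]
        yield train, valid


def grid_search_cv(rows, grid, fit, score, k=5):
    """Return (best_params, mean_cv_error); lower error is better."""
    results = []
    for params in grid:
        errors = [score(fit(train, **params), valid)
                  for train, valid in k_fold_splits(rows, k)]
        results.append((mean(errors), params))
    best_error, best_params = min(results, key=lambda r: r[0])
    return best_params, best_error


def fit_binned_mean(train, n_bins):
    """Stand-in model (NOT a Random Forest): mean price per footage bin."""
    lo = min(x for x, _ in train)
    hi = max(x for x, _ in train)
    width = (hi - lo) / n_bins or 1.0
    overall = mean(y for _, y in train)
    by_bin = {}
    for x, y in train:
        b = min(int((x - lo) / width), n_bins - 1)
        by_bin.setdefault(b, []).append(y)
    bin_means = {b: mean(ys) for b, ys in by_bin.items()}

    def predict(x):
        b = min(max(int((x - lo) / width), 0), n_bins - 1)
        # Bins with no training rows fall back to the overall mean.
        return bin_means.get(b, overall)

    return predict


def mean_absolute_error(model, valid):
    """Average absolute difference between predicted and actual price."""
    return mean(abs(model(x) - y) for x, y in valid)


# Synthetic (square_footage, price) rows; the numbers are made up.
rng = random.Random(42)
rows = [(sqft, 150.0 * sqft + rng.gauss(0, 20000))
        for sqft in (rng.uniform(500, 4000) for _ in range(200))]

grid = [{"n_bins": n} for n in (1, 5, 20, 100)]
best_params, best_error = grid_search_cv(
    rows, grid, fit_binned_mean, mean_absolute_error, k=5)
print(best_params, round(best_error))
```

Each candidate setting is scored only on rows the model did not see during fitting, which is what lets the comparison detect a setting that is too simple or too flexible. With a real Random Forest, the fit and score callables would be replaced by the model's own training and evaluation calls.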
