DSA-C03 Exam Dumps | You have trained a classification model in Snowflake using Snowpark ML to predict customer churn. After


Question 109/143

You have trained a classification model in Snowflake using Snowpark ML to predict customer churn. After deploying the model, you observe that it performs well on the training data but poorly on new, unseen data. You suspect overfitting. Which of the following strategies can be applied within Snowflake to detect and mitigate overfitting during model validation, considering that the model is already deployed and receiving inference requests through a Snowflake UDF?

A. Calculate the Area Under the Precision-Recall Curve (AUPRC) using Snowflake SQL on both the training and validation datasets. A significant difference indicates overfitting. Then, retrain the model in Snowpark ML with added L1 or L2 regularization, adjusting the regularization strength based on validation set performance, and redeploy the UDF.

B. Monitor the UDF execution time in Snowflake. A sudden increase in execution time indicates overfitting. Use the 'EXPLAIN' command on the UDF's underlying SQL query to identify performance bottlenecks and rewrite the query for optimization.

C. Implement k-fold cross-validation within the Snowpark ML training pipeline using Snowflake's distributed compute. Track the mean and standard deviation of the performance metrics (e.g., accuracy, F1-score) across folds. A high variance suggests overfitting. Use this information to tune hyperparameters or select a simpler model architecture before deployment.

D. Create shadow UDFs that score data using alternative models. Compare the performance metrics (such as accuracy, precision, recall) between the production UDF and shadow UDFs using Snowflake's query capabilities. If shadow models consistently outperform the production model on certain data segments, retrain the production model incorporating those data segments with higher weights.

E. Since the model is already deployed, the only option is to collect inference requests and compare the distribution of predicted values in each batch with the distribution of predicted values on the training set. A large difference indicates overfitting; the model must be retrained outside of the validation process.
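
The two validation checks named in options A and C (the train-versus-validation AUPRC gap, and the spread of a metric across k folds) can be illustrated with a minimal pure-Python sketch. This is not Snowpark ML or Snowflake SQL code: the function names (`average_precision`, `kfold_indices`, `summarize_folds`) and all labels, scores, and fold metrics below are hypothetical, and AUPRC is approximated here by step-wise average precision. In Snowflake the same quantities would be computed over tables rather than Python lists.

```python
from statistics import mean, stdev


def average_precision(y_true, y_score):
    """Step-wise area under the precision-recall curve (average precision).

    Hypothetical helper for illustration; ties in y_score are broken by
    input order, which a production metric would handle more carefully.
    """
    total_pos = sum(y_true)
    if total_pos == 0:
        raise ValueError("need at least one positive label")
    # Rank examples from highest to lowest predicted score.
    ranked = sorted(zip(y_score, y_true), key=lambda p: p[0], reverse=True)
    hits = 0
    ap = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            ap += hits / rank  # precision at the rank of each positive
    return ap / total_pos


def kfold_indices(n_rows, k):
    """Yield (train_idx, val_idx) for k contiguous folds.

    Assumes the rows were shuffled beforehand.
    """
    sizes = [n_rows // k + (1 if i < n_rows % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_rows))
        yield train, val
        start += size


def summarize_folds(scores):
    """Return (mean, sample standard deviation) of per-fold metric values."""
    return mean(scores), stdev(scores)


# Hypothetical labels (1 = churned) and model scores, for illustration only.
train_labels = [1, 1, 0, 1, 0, 0]
train_scores = [0.95, 0.90, 0.20, 0.85, 0.10, 0.05]
val_labels = [1, 0, 1, 0, 0, 1]
val_scores = [0.90, 0.80, 0.40, 0.70, 0.30, 0.20]

train_ap = average_precision(train_labels, train_scores)
val_ap = average_precision(val_labels, val_scores)
ap_gap = train_ap - val_ap  # a large positive gap is a sign of overfitting

# Hypothetical per-fold F1 scores from a 5-fold run.
fold_f1 = [0.80, 0.82, 0.78, 0.81, 0.79]
f1_mean, f1_std = summarize_folds(fold_f1)

print(f"train AP={train_ap:.3f} val AP={val_ap:.3f} gap={ap_gap:.3f}")
print(f"fold F1 mean={f1_mean:.3f} std={f1_std:.4f}")
```

What counts as a "significant" gap or a "high" standard deviation depends on the dataset and metric, so any threshold applied to `ap_gap` or `f1_std` would have to be chosen per project.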
