Valid Databricks-Machine-Learning-Associate Dumps shared by ExamDiscuss.com for Helping Passing Databricks-Machine-Learning-Associate Exam! ExamDiscuss.com now offer the newest Databricks-Machine-Learning-Associate exam dumps, the ExamDiscuss.com Databricks-Machine-Learning-Associate exam questions have been updated and answers have been corrected get the newest ExamDiscuss.com Databricks-Machine-Learning-Associate dumps with Test Engine here:
A machine learning engineer is trying to scale a machine learning pipeline by distributing its feature engineering process. Which of the following feature engineering tasks will be the least efficient to distribute?
Correct Answer: D
Among the options listed, calculating the true median for imputing missing feature values is the least efficient to distribute. This is because the true median requires knowledge of the entire data distribution, which can be computationally expensive in a distributed environment. Unlike mean or mode, finding the median requires sorting the data or maintaining a full distribution, which is more intensive and often requires shuffling the data across partitions. Reference Challenges in parallel processing and distributed computing for data aggregation like median calculation: https://www.apache.org