Valid Databricks-Certified-Professional-Data-Engineer Dumps shared by ExamDiscuss.com for Helping Passing Databricks-Certified-Professional-Data-Engineer Exam! ExamDiscuss.com now offer the newest Databricks-Certified-Professional-Data-Engineer exam dumps, the ExamDiscuss.com Databricks-Certified-Professional-Data-Engineer exam questions have been updated and answers have been corrected get the newest ExamDiscuss.com Databricks-Certified-Professional-Data-Engineer dumps with Test Engine here:
A Spark job is taking longer than expected. Using the Spark UI, a data engineer notes that the Min, Median, and Max Durations for tasks in a particular stage show the minimum and median time to complete a task as roughly the same, but the max duration for a task to be roughly 100 times as long as the minimum. Which situation is causing increased duration of the overall job?
Correct Answer: D
Explanation This is the correct answer because skew is a common situation that causes increased duration of the overall job. Skew occurs when some partitions have more data than others, resulting in uneven distribution of work among tasks and executors. Skew can be caused by various factors, such as skewed data distribution, improper partitioning strategy, or join operations with skewed keys. Skew can lead to performance issues such as long-running tasks, wasted resources, or even task failures due to memory or disk spills. Verified References: [Databricks Certified Data Engineer Professional], under "Performance Tuning" section; Databricks Documentation, under "Skew" section.