Databricks-Certified-Professional-Data-Engineer Exam Dumps


Question 25/82

A data engineering team is migrating off its legacy Hadoop platform. As part of the process, they are evaluating storage formats for performance comparison. The legacy platform uses ORC and RCFile formats. After converting a subset of data to Delta Lake, they noticed significantly better query performance. Upon investigation, they discovered that queries reading from Delta tables leveraged a Shuffle Hash Join, whereas queries on legacy formats used Sort Merge Joins. The queries reading Delta Lake data also scanned less data.
Which reason could be attributed to the difference in query performance?

A. Delta Lake enables data skipping and file pruning using a vectorized Parquet reader.

B. The queries against the Delta Lake tables were able to leverage the dynamic file pruning optimization.

C. Shuffle Hash Joins are always more efficient than Sort Merge Joins.

D. The queries against the ORC tables leveraged the dynamic data skipping optimization but not the dynamic file pruning optimization.
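The file-pruning idea named in the options can be sketched in plain Python. This is a simplified illustration, not Spark's or Delta Lake's actual implementation: it assumes each data file carries min/max statistics for the join column and that the set of join keys from the smaller side of the join only becomes known at run time. The names `FileStats`, `prune_files`, and the file paths are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileStats:
    """Per-file min/max statistics for the join column (hypothetical layout)."""
    path: str
    min_key: int
    max_key: int


def prune_files(files, join_keys):
    """Keep only files whose [min_key, max_key] range could contain a join key."""
    keys = set(join_keys)
    return [
        f for f in files
        if any(f.min_key <= k <= f.max_key for k in keys)
    ]


# Hypothetical files of the large table, with statistics on the join column.
fact_files = [
    FileStats("part-000.parquet", 1, 100),
    FileStats("part-001.parquet", 101, 200),
    FileStats("part-002.parquet", 201, 300),
    FileStats("part-003.parquet", 301, 400),
]

# Join keys that survive the filter on the small side; assumed known only at run time.
dimension_keys = [150, 320]

files_to_scan = prune_files(fact_files, dimension_keys)
print([f.path for f in files_to_scan])
```

In this sketch only two of the four files need to be read, which is the kind of effect that would show up as "scanned less data" in a query profile. A format without usable per-file statistics would give the pruning step nothing to work with, so every file would be scanned.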


