A data ingestion task requires a 1 TB JSON dataset to be written out to Parquet with a target part-file size of 512 MB. Because Parquet is being used instead of Delta Lake, built-in file-sizing features such as Auto Optimize and Auto Compaction cannot be used. Which strategy will yield the best performance without shuffling data?
Correct Answer: A
The key to efficiently converting a large JSON dataset to Parquet files of a specific size without shuffling data is to control the size of the input partitions directly. Setting spark.sql.files.maxPartitionBytes to 512 MB configures Spark to read the source files in splits of up to 512 MB, so each task holds roughly 512 MB of data. Narrow transformations (which do not move data across partitions) preserve these partition boundaries, so writing the result to Parquet produces one part-file per partition, each approximately the size specified by spark.sql.files.maxPartitionBytes, in this case 512 MB. The other options either involve unnecessary shuffles or repartitions (B, C, D) or use a setting that does not apply to this requirement (E).
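A minimal PySpark sketch of this strategy follows. The input and output paths and the column names (event_type, ts) are illustrative assumptions, not part of the question; only the spark.sql.files.maxPartitionBytes setting and the narrow-transformation-then-write pattern come from the explanation above.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("json-to-parquet").getOrCreate()

# Read the source JSON in splits of up to 512 MB so each task holds ~512 MB of data.
spark.conf.set("spark.sql.files.maxPartitionBytes", str(512 * 1024 * 1024))

# Ingest the JSON dataset; the number of partitions follows maxPartitionBytes.
df = spark.read.json("/mnt/raw/events_json/")  # hypothetical source path

# Apply only narrow transformations so partition boundaries are preserved (no shuffle).
cleaned = (
    df.filter("event_type IS NOT NULL")          # hypothetical column
      .withColumnRenamed("ts", "event_ts")       # hypothetical column
)

# Write one Parquet part-file per partition, each roughly the configured split size.
cleaned.write.mode("overwrite").parquet("/mnt/curated/events_parquet/")  # hypothetical target path
```

Because no wide transformation is applied, Spark never repartitions the data, so the write stage simply emits one file per input split at the configured size.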