Databricks-Certified-Professional-Data-Engineer Exam Dumps

Question 61/82

Given the following PySpark code snippet in a Databricks notebook:
filtered_df = spark.read.format("delta").load("/mnt/data/large_table") \
    .filter("event_date > '2024-01-01'")
filtered_df.count()
The data engineer notices from the Query Profiler that the scan operator for filtered_df is reading almost all files, despite the filter being applied.
What is the probable reason for poor data skipping?

A. The Delta table lacks optimization that enables dynamic file pruning.

B. The filter is executed only after the full data scan, preventing data skipping.

C. The event_date column is outside the table's partitioning and Z-ordering scheme.

D. The filter condition involves a data type excluded from data skipping support.
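
To make the idea of data skipping concrete, here is a minimal pure-Python sketch, not Databricks code. It assumes a simplified model in which each file carries only a minimum and maximum event_date, similar in spirit to the per-file column statistics Delta Lake records, and a file can be skipped when its maximum does not exceed the cutoff. The data, the file size, and the helper names make_files and files_to_read are hypothetical and exist only for illustration.

```python
import random
from datetime import date, timedelta


def make_files(dates, rows_per_file):
    """Split rows into fixed-size files and keep each file's (min, max) event_date."""
    files = []
    for i in range(0, len(dates), rows_per_file):
        chunk = dates[i:i + rows_per_file]
        files.append((min(chunk), max(chunk)))
    return files


def files_to_read(files, cutoff):
    """Count files that may hold rows with event_date > cutoff.

    A file is skippable only when its max event_date is <= cutoff.
    """
    return sum(1 for _lo, hi in files if hi > cutoff)


random.seed(0)
start = date(2023, 1, 1)
# Hypothetical table: 10,000 rows with dates spread over roughly two years.
dates = [start + timedelta(days=random.randrange(730)) for _ in range(10_000)]
cutoff = date(2024, 1, 1)

# Rows written in arrival order: every file spans nearly the full date range.
unclustered = make_files(dates, 100)
# Rows co-located by event_date: each file covers a narrow date range.
clustered = make_files(sorted(dates), 100)

unclustered_reads = files_to_read(unclustered, cutoff)
clustered_reads = files_to_read(clustered, cutoff)
print("files:", len(unclustered))
print("read without clustering:", unclustered_reads)
print("read with clustering:", clustered_reads)
```

In this toy model the filter itself is identical in both cases. What changes is how tightly each file's min/max range brackets event_date, which is what determines how many files the scan can skip.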
