DEA-C02 Exam Dumps


Question 90/149

You are developing a Snowpark Python stored procedure that performs complex data transformations on a large dataset stored in a Snowflake table named 'RAW_SALES'. The procedure needs to efficiently handle data skew and leverage Snowflake's distributed processing capabilities. You have the following code snippet:

Which of the following strategies would be MOST effective to optimize the performance of this Snowpark stored procedure, specifically addressing potential data skew in the 'product_id' column, assuming 'product_id' is known to cause uneven data distribution across Snowflake's micro-partitions?

A. Implement a custom partitioning strategy using before the transformation logic to redistribute data evenly across the cluster.

B. Utilize Snowflake's automatic clustering on the 'TRANSFORMED_SALES' table by specifying 'CLUSTER BY' when creating or altering the table to ensure future data is efficiently accessed.

C. Use the 'pandas' API within the Snowpark stored procedure to perform the transformation, as 'pandas' automatically optimizes for data skew.

D. Increase the warehouse size significantly to compensate for the data skew and improve overall processing speed without modifying the partitioning strategy.

E. Combine salting with repartitioning by adding a random number to the 'product_id' before repartitioning, then removing the salt after the transformation to break up the skew. Then, enable automatic clustering on the 'TRANSFORMED_SALES' table.
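
For readers unfamiliar with the term "salting" used in option E, the general idea can be sketched in plain Python. This is only an illustration under stated assumptions: it does not use the Snowpark API, it is not the code snippet the question refers to, and the names `NUM_BUCKETS`, `NUM_SALTS`, `bucket_of`, `add_salt`, and `remove_salt` are hypothetical. A toy hash partitioner stands in for whatever distribution mechanism a real engine uses.

```python
import hashlib
import random
from collections import Counter

NUM_BUCKETS = 8   # hypothetical number of partitions/workers
NUM_SALTS = 16    # hypothetical number of salt values per key


def bucket_of(key: str) -> int:
    """Toy hash partitioner: map a key to one of NUM_BUCKETS buckets."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_BUCKETS


def add_salt(product_id: str, rng: random.Random) -> str:
    """Append a random salt so one hot key becomes up to NUM_SALTS distinct keys."""
    return f"{product_id}#{rng.randrange(NUM_SALTS)}"


def remove_salt(salted_key: str) -> str:
    """Strip the salt again once the per-row work is done."""
    return salted_key.rsplit("#", 1)[0]


def bucket_counts(keys):
    """Count how many rows land in each bucket."""
    return Counter(bucket_of(k) for k in keys)


# Skewed input: one hot product_id ("P1") plus 1000 ordinary ones.
rng = random.Random(0)
product_ids = ["P1"] * 9000 + [f"P{i}" for i in range(2, 1002)]

# Without salting, every "P1" row hashes to the same bucket.
plain_counts = bucket_counts(product_ids)

# With salting, the "P1" rows are spread over several salted keys.
salted = [add_salt(p, rng) for p in product_ids]
salted_counts = bucket_counts(salted)

# After the transformation, the original key is recovered unchanged.
restored = [remove_salt(k) for k in salted]

print("largest bucket without salt:", max(plain_counts.values()))
print("largest bucket with salt:   ", max(salted_counts.values()))
```

Note that salting only helps row-wise work directly; an aggregation per 'product_id' would need a second step that combines the partial results of the salted keys after the salt is removed.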
