Valid Professional-Data-Engineer Dumps shared by ExamDiscuss.com for Helping Passing Professional-Data-Engineer Exam! ExamDiscuss.com now offer the newest Professional-Data-Engineer exam dumps, the ExamDiscuss.com Professional-Data-Engineer exam questions have been updated and answers have been corrected get the newest ExamDiscuss.com Professional-Data-Engineer dumps with Test Engine here:
You are running a Dataflow streaming pipeline, with Streaming Engine and Horizontal Autoscaling enabled. You have set the maximum number of workers to 1000. The input of your pipeline is Pub/Sub messages with notifications from Cloud Storage One of the pipeline transforms reads CSV files and emits an element for every CSV line. The Job performance is low. the pipeline is using only 10 workers, and you notice that the autoscaler is not spinning up additional workers. What should you do to improve performance?
Correct Answer: D
Fusion is an optimization technique that Dataflow applies to merge multiple transforms into a single stage. This reduces the overhead of shuffling data between stages, but it can also limit the parallelism and scalability of the pipeline. By introducing a Reshuffle step, you can force Dataflow to split the pipeline into multiple stages, which can increase the number of workers that can process the data in parallel. Reshuffle also adds randomness to the data distribution, which can help balance the workload across workers and avoid hot keys or skewed data. References: * 1: Streaming pipelines * 2: Batch vs Streaming Performance in Google Cloud Dataflow * 3: Deploy Dataflow pipelines * 4: How Distributed Shuffle improves scalability and performance in Cloud Dataflow pipelines * 5: Managing costs for Dataflow batch and streaming data processing