You want to schedule a number of sequential load and transformation jobs. Data files will be added to a Cloud Storage bucket by an upstream process, and there is no fixed schedule for when the new data arrives. Next, a Dataproc job is triggered to perform some transformations and write the data to BigQuery. You then need to run additional transformation jobs in BigQuery. The transformation jobs are different for every table, and these jobs might take hours to complete. You need to determine the most efficient and maintainable workflow to process hundreds of tables and provide the freshest data to your end users. What should you do?
Correct Answer: B
This option is the most efficient and maintainable workflow for this use case because it processes each table independently and triggers the DAGs only when new data arrives in the Cloud Storage bucket. By using the Dataproc and BigQuery operators, you can orchestrate the load and transformation jobs for each table and leverage the scalability and performance of these services [1][2]. Creating a separate DAG for each table lets you customize the transformation logic and parameters per table and avoids the complexity and overhead of a single shared DAG [3]. A Cloud Storage object trigger launches a Cloud Function that starts the DAG for the corresponding table, so the data is processed as soon as it arrives, which reduces the idle time and cost of running the DAGs on a fixed schedule [4].

Option A is not efficient: it runs the DAG hourly regardless of when data arrives and uses a single shared DAG for all tables, which is harder to maintain and debug. Option C is also not efficient: it runs the DAGs hourly and does not use the Cloud Storage object trigger. Option D is not maintainable: it uses a single shared DAG for all tables and does not use the Cloud Storage operator, which would simplify data ingestion from the bucket.

Reference:
1: Dataproc Operator | Cloud Composer | Google Cloud
2: BigQuery Operator | Cloud Composer | Google Cloud
3: Choose Workflows or Cloud Composer for service orchestration | Workflows | Google Cloud
4: Cloud Storage Object Trigger | Cloud Functions Documentation | Google Cloud
5: Triggering DAGs | Cloud Composer | Google Cloud
6: Cloud Storage Operator | Cloud Composer | Google Cloud
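To make the recommended pattern concrete, here is a minimal sketch of one per-table Cloud Composer DAG, assuming an Airflow 2 environment with the Google provider installed. The project, region, cluster, bucket, table name, and stored procedure are placeholders, not details from the original question.

```python
# Hypothetical per-table DAG: Dataproc load/transform, then a table-specific BigQuery job.
from datetime import datetime

from airflow import DAG
from airflow.providers.google.cloud.operators.bigquery import BigQueryInsertJobOperator
from airflow.providers.google.cloud.operators.dataproc import DataprocSubmitJobOperator

PROJECT_ID = "my-project"   # placeholder
REGION = "us-central1"      # placeholder
TABLE = "orders"            # one DAG per table

with DAG(
    dag_id=f"load_and_transform_{TABLE}",
    start_date=datetime(2024, 1, 1),
    schedule_interval=None,  # no fixed schedule; triggered externally when data arrives
    catchup=False,
) as dag:
    # Dataproc job that reads the newly arrived files and writes to BigQuery.
    dataproc_transform = DataprocSubmitJobOperator(
        task_id="dataproc_transform",
        project_id=PROJECT_ID,
        region=REGION,
        job={
            "placement": {"cluster_name": "etl-cluster"},  # placeholder cluster
            "pyspark_job": {"main_python_file_uri": f"gs://my-bucket/jobs/{TABLE}.py"},
        },
    )

    # Table-specific SQL transformation in BigQuery (placeholder stored procedure).
    bigquery_transform = BigQueryInsertJobOperator(
        task_id="bigquery_transform",
        configuration={
            "query": {
                "query": f"CALL my_dataset.transform_{TABLE}()",
                "useLegacySql": False,
            }
        },
    )

    dataproc_transform >> bigquery_transform
```

The Cloud Function side could look like the sketch below. It assumes a Cloud Composer 2 environment whose Airflow REST API is reachable at a placeholder web server URL, a 1st-gen Cloud Function with a Cloud Storage finalize trigger, and a bucket layout with one prefix per table; all of these are assumptions for illustration.

```python
# Hypothetical Cloud Storage-triggered Cloud Function that starts the matching per-table DAG
# through the Airflow REST API of a Cloud Composer 2 environment.
import google.auth
from google.auth.transport.requests import AuthorizedSession

WEB_SERVER_URL = "https://example-environment.composer.googleusercontent.com"  # placeholder
AUTH_SCOPE = "https://www.googleapis.com/auth/cloud-platform"
CREDENTIALS, _ = google.auth.default(scopes=[AUTH_SCOPE])


def trigger_dag(event, context):
    """Runs when a new object is finalized in the Cloud Storage bucket."""
    object_name = event["name"]            # e.g. "orders/2024-05-01.csv"
    table = object_name.split("/")[0]      # assumed prefix-per-table layout
    dag_id = f"load_and_transform_{table}"

    session = AuthorizedSession(CREDENTIALS)
    response = session.post(
        f"{WEB_SERVER_URL}/api/v1/dags/{dag_id}/dagRuns",
        json={"conf": {"bucket": event["bucket"], "object_name": object_name}},
    )
    response.raise_for_status()
```

Because each table has its own DAG and the function derives the DAG ID from the object path, each new file starts only the workflow for its own table, which is what makes this option both event-driven and easy to maintain at the scale of hundreds of tables.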