Professional-Cloud-DevOps-Engineer exam question, shared by ExamDiscuss.com.
You are the Operations Lead for an ongoing incident with one of your services. The service usually runs at around 70% capacity. You notice that one node is returning 5xx errors for all requests. There has also been a noticeable increase in support cases from customers. You need to remove the offending node from the load balancer pool so that you can isolate and investigate the node. You want to follow Google-recommended practices to manage the incident and reduce the impact on users. What should you do?
Correct Answer: A
The correct answer is A: Communicate your intent to the incident team. Perform a load analysis to determine if the remaining nodes can handle the increase in traffic offloaded from the removed node, and scale appropriately. When any new nodes report healthy, drain traffic from the unhealthy node, and remove the unhealthy node from service.

This answer follows the Google-recommended practices for incident management, as described in Chapter 9 - Incident Response, Google SRE Book [1]. According to this source, some of the best practices are:

- Maintain a clear line of command.
- Designate clearly defined roles.
- Keep a working record of debugging and mitigation as you go.
- Declare incidents early and often.
- Communicate your intent before taking any action that might affect the service or the incident response. This helps to avoid confusion, duplication of work, or unintended consequences.
- Perform a load analysis before removing a node from the load balancer pool, as this might affect the capacity and performance of the service. Scale the pool as necessary to handle the expected load.
- Drain traffic from the unhealthy node before removing it from service, as this helps to avoid dropping requests or causing errors for users.

Answer A follows these best practices by communicating the intent to the incident team, performing a load analysis and scaling the pool, and draining traffic from the unhealthy node before removing it.

Answer B does not follow the best practice of performing a load analysis before adding or removing nodes, as this might cause overloading or underutilization of resources.

Answer C does not follow the best practice of communicating the intent before taking any action, as this might cause confusion or conflict with other responders.

Answer D does not follow the best practice of draining traffic from the unhealthy node before removing it, as this might cause errors for users.

Reference:
[1] Chapter 9 - Incident Response, Google SRE Book
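The load analysis step in answer A can be illustrated with a little arithmetic. The sketch below is not from the source: the node counts, the 80% utilization ceiling, and the function names are all hypothetical, and it assumes traffic is spread evenly across nodes, so that removing one node pushes its share onto the rest. Only the 70% starting utilization comes from the question.

```python
import math


def utilization_after_removal(total_nodes: int, utilization: float, removed: int = 1) -> float:
    """Average utilization of the remaining nodes once `removed` nodes leave the pool.

    Assumes (hypothetically) that load is spread evenly across all nodes.
    """
    remaining = total_nodes - removed
    if remaining <= 0:
        raise ValueError("at least one node must remain in the pool")
    return utilization * total_nodes / remaining


def extra_nodes_needed(total_nodes: int, utilization: float, target: float, removed: int = 1) -> int:
    """Nodes to add before draining so the pool stays at or below `target` utilization."""
    if not 0 < target <= 1:
        raise ValueError("target must be in (0, 1]")
    load = utilization * total_nodes  # total load, in units of one node's capacity
    # Round before ceil to avoid floating-point noise inflating the result.
    required = math.ceil(round(load / target, 9))
    remaining = total_nodes - removed
    return max(0, required - remaining)


if __name__ == "__main__":
    # Hypothetical pool sizes; 0.70 is the utilization stated in the question,
    # 0.80 is an assumed safety ceiling.
    for nodes in (4, 10):
        after = utilization_after_removal(nodes, 0.70)
        extra = extra_nodes_needed(nodes, 0.70, target=0.80)
        print(f"{nodes} nodes: {after:.0%} after removal, add {extra} node(s) first")
```

Under these assumptions, a 4-node pool at 70% would land at roughly 93% on three nodes, so one node should be added and report healthy before the drain, while a 10-node pool would sit at roughly 78% and need no extra capacity. This is why the analysis comes before the removal rather than after it.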