Your AI infrastructure team is deploying a large NLP model on a Kubernetes cluster using NVIDIA GPUs. The model's inference must be low latency because of real-time user interaction. However, the team notices occasional latency spikes. What would be the most effective strategy to mitigate these latency spikes?
A. Partition the GPUs with Multi-Instance GPU (MIG) to isolate the inference workload
B. Deploy the model with NVIDIA Triton Inference Server and enable dynamic batching
C. Increase the number of model replicas in the cluster
D. Apply quantization to speed up model inference
Correct Answer: B
Latency spikes in real-time NLP inference often result from variable request arrival rates. NVIDIA Triton Inference Server with dynamic batching groups incoming requests into batches on the fly, smoothing GPU utilization and reducing spikes on NVIDIA GPUs in a Kubernetes cluster (e.g., on DGX nodes), which keeps latency low and predictable for interactive use. MIG (Option A) isolates workloads but does not address request bursts. More replicas (Option C) scale throughput, not latency consistency. Quantization (Option D) speeds up each inference but does not smooth a variable arrival rate, so spikes can persist. Triton's dynamic batching is NVIDIA's recommended mechanism for this scenario.
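As a rough illustration, here is a minimal sketch of how dynamic batching could be enabled in a Triton model configuration (config.pbtxt). The model name, backend, batch sizes, and queue delay below are illustrative assumptions, not values from the question:

```
# Hypothetical Triton model configuration (config.pbtxt).
# Names and numbers are illustrative assumptions only.
name: "nlp_model"              # assumed model name
platform: "onnxruntime_onnx"   # assumed backend
max_batch_size: 32             # upper bound on a dynamically formed batch

dynamic_batching {
  preferred_batch_size: [ 4, 8 ]        # batch sizes Triton tries to form
  max_queue_delay_microseconds: 100     # cap on how long a request may wait
}
```

The max_queue_delay_microseconds setting bounds the worst-case queuing wait, which is the knob that trades a small amount of average latency for a much tighter tail latency under bursty traffic.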