NCA-GENM Exam Dumps


Question 12/192

You are working on a multimodal model for video captioning, where the model needs to generate captions describing the actions and events happening in a video. You notice that the model tends to focus only on the most salient objects in the scene and ignores subtle but important actions. Which of the following techniques can help the model attend to these subtle actions and generate more comprehensive captions?

A. Increasing the learning rate during training.

B. Using a larger batch size.

C. Implementing a hierarchical attention mechanism that first attends to relevant time steps and then to relevant regions within those time steps.

D. Adding more layers to the LSTM or GRU used for sequence modeling.

E. Decreasing the regularization strength.
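Option C describes a two-level attention scheme: the model first weights time steps (frames), then weights regions inside each frame. The following is a minimal, dependency-free sketch of that idea, not a reference implementation. Several things here are assumptions for illustration only: the function and variable names are hypothetical, scoring is plain scaled dot-product attention with no learned projections, each frame is summarized by the mean of its region features for the temporal stage, and `query` stands in for whatever state a caption decoder would supply at one decoding step.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_sum(weights: List[float], vectors: List[Vector]) -> Vector:
    dim = len(vectors[0])
    return [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)]


def hierarchical_attention(video: List[List[Vector]], query: Vector):
    """Hypothetical helper. video[t][r] is the feature vector of region r in frame t.

    Returns (context, temporal_weights, spatial_weights).
    """
    scale = 1.0 / math.sqrt(len(query))

    # Stage 1: temporal attention. Each frame is summarized by the mean of its
    # region features (an assumption; a real model may use a learned frame encoder).
    frame_summaries = [weighted_sum([1.0 / len(frame)] * len(frame), frame) for frame in video]
    temporal = softmax([scale * dot(query, f) for f in frame_summaries])

    # Stage 2: spatial attention, computed separately over the regions of each frame.
    spatial = [softmax([scale * dot(query, region) for region in frame]) for frame in video]
    frame_contexts = [weighted_sum(w, frame) for w, frame in zip(spatial, video)]

    # Combine: each frame's region-level context is weighted by its temporal weight.
    context = weighted_sum(temporal, frame_contexts)
    return context, temporal, spatial


# Toy input: 3 frames, 2 regions per frame, 2-dimensional features.
# Dimension 1 stands for a salient object present in every frame; dimension 0
# stands for a brief action that only appears in region 0 of frame 1.
video = [
    [[0.0, 1.0], [0.0, 1.0]],  # frame 0: salient object only
    [[2.0, 0.0], [0.0, 1.0]],  # frame 1: brief action next to the object
    [[0.0, 1.0], [0.0, 1.0]],  # frame 2: salient object only
]
query = [1.0, 0.0]  # a decoder state that is "looking for" the action

context, temporal, spatial = hierarchical_attention(video, query)
print("temporal weights:", [round(w, 3) for w in temporal])
print("spatial weights in frame 1:", [round(w, 3) for w in spatial[1]])
```

In this toy run the temporal stage puts the most weight on frame 1, and the spatial stage inside frame 1 favors region 0. Because region attention is computed per frame and then scaled by that frame's temporal weight, the brief action contributes to the context vector instead of being averaged away by the salient object that dominates every frame.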
