You are tasked with building a multimodal AI system that generates descriptions from video footage. You have experimented with several architectures, including one that combines CNNs for visual feature extraction with LSTMs for sequence generation. However, the model struggles to capture long-range dependencies in the video. Which of the following architectural modifications or training techniques is MOST likely to address this issue?
Correct Answer: C
Transformers capture long-range dependencies through self-attention, which lets every position in a sequence attend directly to every other position. Replacing the LSTMs with Transformers therefore allows the model to attend to relevant parts of the video sequence regardless of their temporal distance. CNNs extract visual features, but they do not inherently model long-range temporal dependencies. Recurrent networks pass information step by step and are prone to vanishing gradients; LSTMs mitigate this but still find long-range dependencies difficult to learn. Reducing the frame rate or the batch size does not directly address the problem of capturing long-range dependencies within the video sequence.
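To make the "regardless of temporal distance" point concrete, here is a minimal sketch of scaled dot-product self-attention over per-frame feature vectors, written in plain Python. It is an illustration under stated assumptions, not a full Transformer: the query, key, and value projections are taken to be the identity, there is a single head and no positional encoding, and the frame features are made-up values rather than real CNN outputs.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(frames):
    """Scaled dot-product self-attention.

    Assumption: Q, K and V are the frame features themselves
    (identity projections); a real Transformer learns these.
    """
    d = len(frames[0])
    scale = math.sqrt(d)
    weights, outputs = [], []
    for q in frames:
        # Every query is compared with every frame in a single step,
        # so the number of frames in between plays no role.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in frames]
        w = softmax(scores)
        weights.append(w)
        outputs.append(
            [sum(w[j] * frames[j][i] for j in range(len(frames))) for i in range(d)]
        )
    return outputs, weights


# Hypothetical features: 50 frames, where the first and last frames
# show the same content and the 48 frames in between show something else.
T = 50
frames = [[0.0, 1.0, 0.0, 0.0] for _ in range(T)]
frames[0] = [3.0, 0.0, 0.0, 0.0]
frames[-1] = [3.0, 0.0, 0.0, 0.0]

outputs, weights = self_attention(frames)

# Attention from frame 0 to the distant last frame vs. its immediate neighbour.
print("frame 0 -> frame 49:", weights[0][-1])
print("frame 0 -> frame 1: ", weights[0][1])
```

In this sketch the first frame assigns the same weight to the last frame as to itself, and far more than to its immediate neighbour, because the weight depends only on feature similarity. A recurrent model would have to carry that information through all 48 intermediate steps.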