You have a large dataset of images and text descriptions. You want to train a model that can perform both image captioning (generating text from images) and text-to-image generation (generating images from text). What architectural approach is best suited for this multimodal bi-directional task?
Correct Answer: C
Separate encoders for images and text allow specialized feature extraction for each modality: a vision encoder for images and a language encoder for text. A shared attention mechanism enables cross-modal interaction, letting the model attend to the relevant parts of both the image and text representations regardless of which direction the task runs. Separate decoders then produce outputs in each modality: token logits for captioning, pixels or image tokens for generation. The alternatives fall short: training two separate models is less efficient and does not leverage shared cross-modal knowledge; a single shared encoder can struggle to capture modality-specific features; a single monolithic transformer handling both directions is computationally expensive and harder to train; and a GAN is suited to image generation alone, not to a bidirectional image-text task.
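As a concrete illustration (not part of the original question), the sketch below shows one way the architecture in option C could look in PyTorch: an `ImageEncoder` and `TextEncoder` with modality-specific weights, a single `SharedCrossAttention` block reused in both task directions, and separate output heads for caption logits and a toy pixel reconstruction. All class names, dimensions, and the learned-latent-query trick for text-to-image generation are illustrative assumptions, not a reference implementation.

```python
import torch
import torch.nn as nn


class ImageEncoder(nn.Module):
    """Projects image patches to embeddings (ViT-style); image-specific weights."""
    def __init__(self, d_model=256, patch=16, img_size=64):
        super().__init__()
        self.proj = nn.Conv2d(3, d_model, kernel_size=patch, stride=patch)
        self.pos = nn.Parameter(torch.zeros(1, (img_size // patch) ** 2, d_model))

    def forward(self, images):                               # (B, 3, H, W)
        return self.proj(images).flatten(2).transpose(1, 2) + self.pos


class TextEncoder(nn.Module):
    """Contextualizes token ids with a small transformer; text-specific weights."""
    def __init__(self, vocab=10000, d_model=256, layers=2):
        super().__init__()
        self.emb = nn.Embedding(vocab, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=layers)

    def forward(self, tokens):                               # (B, T)
        return self.encoder(self.emb(tokens))                # (B, T, d)


class SharedCrossAttention(nn.Module):
    """A single attention block reused for both task directions."""
    def __init__(self, d_model=256, nhead=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, nhead, batch_first=True)

    def forward(self, queries, context):
        out, _ = self.attn(queries, context, context)        # queries attend to context
        return out


class BiModalModel(nn.Module):
    """Separate encoders and decoders around one shared cross-attention block."""
    def __init__(self, vocab=10000, d_model=256, img_size=64, num_latents=16):
        super().__init__()
        self.img_enc = ImageEncoder(d_model=d_model, img_size=img_size)
        self.txt_enc = TextEncoder(vocab=vocab, d_model=d_model)
        self.cross = SharedCrossAttention(d_model=d_model)
        self.txt_head = nn.Linear(d_model, vocab)                     # text decoder: caption logits
        self.img_head = nn.Linear(d_model, 3 * img_size * img_size)   # toy image decoder
        self.latents = nn.Parameter(torch.randn(1, num_latents, d_model))
        self.img_size = img_size

    def caption(self, images, tokens):
        """Image -> text: text features query the image features."""
        fused = self.cross(self.txt_enc(tokens), self.img_enc(images))
        return self.txt_head(fused)                                   # (B, T, vocab)

    def generate_image(self, tokens):
        """Text -> image: learned latent queries attend to the text features."""
        queries = self.latents.expand(tokens.size(0), -1, -1)
        fused = self.cross(queries, self.txt_enc(tokens))
        return self.img_head(fused.mean(dim=1)).view(-1, 3, self.img_size, self.img_size)


if __name__ == "__main__":
    model = BiModalModel()
    images = torch.randn(2, 3, 64, 64)
    tokens = torch.randint(0, 10000, (2, 12))
    print(model.caption(images, tokens).shape)        # torch.Size([2, 12, 10000])
    print(model.generate_image(tokens).shape)         # torch.Size([2, 3, 64, 64])
```

Sharing the cross-attention weights is what lets gradients from both tasks shape the same image-text alignment, which is the efficiency argument made above against training two separate models.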