You have a dataset of customer reviews for a Generative AI service. The dataset contains text reviews, numerical ratings (1-5 stars), and categorical data about each customer's subscription plan (Basic, Premium, Enterprise). You want to build a model that predicts the numerical rating from the text review and the subscription plan. Which data analysis and modeling approach would be MOST suitable?
Correct Answer: C
A pre-trained language model such as BERT or RoBERTa produces contextual embeddings that capture the semantic meaning of the text reviews. Concatenating those embeddings with an encoding of the subscription plan lets the model learn the combined effect of both inputs. A regression layer is used on top because the target values are numeric ratings (1-5 stars). Sentiment scores and topic-model outputs can also serve as features, but BERT/RoBERTa embeddings retain more of the review's context. The other options do not capture this contextual information as well.
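The concatenate-then-regress idea can be sketched as follows. This is only an illustration under stated assumptions, not a reference implementation: `fake_text_embedding` is a hypothetical stand-in that returns a deterministic pseudo-random vector where a real pipeline would use the BERT/RoBERTa embedding of the review, the sample reviews are made up, and the regression head is fitted by ordinary least squares with NumPy rather than trained by gradient descent together with the language model.

```python
import numpy as np

PLANS = ["Basic", "Premium", "Enterprise"]
EMBED_DIM = 8  # assumption for the sketch; BERT-base embeddings have 768 dimensions


def one_hot_plan(plan):
    """Encode the categorical subscription plan as a one-hot vector."""
    vec = np.zeros(len(PLANS))
    vec[PLANS.index(plan)] = 1.0
    return vec


def fake_text_embedding(text, dim=EMBED_DIM):
    """Hypothetical stand-in for a BERT/RoBERTa sentence embedding.

    Returns a deterministic pseudo-random vector seeded from the text,
    so it carries no real semantic information.
    """
    seed = sum(ord(ch) for ch in text)
    return np.random.default_rng(seed).standard_normal(dim)


def build_features(text, plan):
    """Concatenate the text embedding with the one-hot plan encoding."""
    return np.concatenate([fake_text_embedding(text), one_hot_plan(plan)])


# Made-up training examples: (review text, plan, star rating).
reviews = [
    ("Great answers and fast responses", "Premium", 5),
    ("Often wrong and slow", "Basic", 2),
    ("Solid for our team workflows", "Enterprise", 4),
    ("Keeps timing out", "Basic", 1),
]

X = np.stack([build_features(text, plan) for text, plan, _ in reviews])
y = np.array([rating for _, _, rating in reviews], dtype=float)

# Linear regression head fitted by least squares. The one-hot block sums
# to 1 for every row, so it also plays the role of an intercept.
weights, *_ = np.linalg.lstsq(X, y, rcond=None)


def predict_rating(text, plan):
    """Predict a rating and clip it to the valid 1-5 star range."""
    raw = build_features(text, plan) @ weights
    return float(np.clip(raw, 1.0, 5.0))


print(X.shape)  # (number of reviews, EMBED_DIM + number of plans)
print(predict_rating("Helpful but pricey", "Premium"))
```

In a realistic setup the stand-in embedding would be replaced by the pooled output of the pre-trained model, and the regression layer would typically be trained on a held-out split with a loss such as mean squared error.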