
Explanation:

Scenario Recap
You are building a remote training monitoring solution.
* Requirement: Use video and audio feeds to detect if a learner is present, paying attention, and talking.
* Services available: Face, Speech, Text Analytics.
* From a learner's video feed, verify whether the learner is present.
  * The Face API can detect and identify faces in frames taken from a video feed.
  * A detected (and optionally identified) face indicates that the learner is present, which fulfills the requirement.
* From a learner's facial expression in the video feed, verify whether the learner is paying attention.
  * The Face API also returns facial attributes for each detected face; in the API version this question targets, these include emotion scores (happiness, anger, neutral, etc.).
  * These scores can be mapped to "paying attention" versus "distracted."
* From a learner's audio feed, detect whether the learner is talking.
  * The Speech service processes spoken audio and can determine whether speech is present.
  * Text Analytics works on text, not raw audio, so it is not appropriate here.
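The three checks above can be sketched as plain decision logic. This is a minimal illustration, not the services' actual API: it assumes the Face and Speech results have already been fetched and reduced to simple Python values. The function names, the input shapes (a list of face dictionaries with an "emotion" entry, and a recognized-text string), the set of "attentive" emotions, and the 0.6 threshold are all hypothetical.

```python
from typing import Dict, List

# Assumption: which expressions count as "attentive" and the 0.6 cut-off are
# illustrative choices, not values defined by the Face API or the exam.
ATTENTIVE_EMOTIONS = ("neutral", "happiness")


def is_present(detected_faces: List[dict]) -> bool:
    """The learner is present if at least one face was detected in the frame."""
    return len(detected_faces) > 0


def is_paying_attention(emotion_scores: Dict[str, float], threshold: float = 0.6) -> bool:
    """Assumed heuristic: sum the scores of the 'attentive' emotions and compare to a threshold."""
    attentive = sum(emotion_scores.get(name, 0.0) for name in ATTENTIVE_EMOTIONS)
    return attentive >= threshold


def is_talking(recognized_text: str) -> bool:
    """The learner is talking if speech recognition returned non-blank text."""
    return bool(recognized_text.strip())


def monitor_learner(detected_faces: List[dict], recognized_text: str) -> Dict[str, bool]:
    """Combine the three checks into one status record for a frame and audio chunk."""
    present = is_present(detected_faces)
    # Attention is only evaluated when a face was found; the first face is used.
    emotions = detected_faces[0].get("emotion", {}) if present else {}
    return {
        "present": present,
        "attentive": present and is_paying_attention(emotions),
        "talking": is_talking(recognized_text),
    }


if __name__ == "__main__":
    faces = [{"emotion": {"neutral": 0.7, "happiness": 0.1, "anger": 0.2}}]
    print(monitor_learner(faces, "Could you repeat that?"))
```

Note that the first two checks consume Face output and the third consumes Speech output; nothing in the sketch needs Text Analytics, which matches the reasoning above.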
Analysis
* From a learner's video feed, verify whether the learner is present: Face
* From a learner's facial expression in the video feed, verify whether the learner is paying attention: Face
* From a learner's audio feed, detect whether the learner is talking: Speech
Final Answer (Answer Area Selections)
* Face API - Face detection & identification
* Face API - Emotion recognition
* Azure Speech service
Microsoft References