Correct Answer: A
Data can be mislabeled for several reasons, and mislabeled data can significantly degrade the performance of machine learning (ML) models, especially in supervised learning, where the model learns directly from the labels. According to the ISTQB Certified Tester AI Testing (CT-AI) syllabus, mislabeling can be caused by the following factors:
* Random errors by annotators: Mistakes made due to accidental misclassification.
* Systemic errors: Errors introduced by incorrect labeling instructions or poor training of annotators.
* Deliberate errors: Errors introduced intentionally by malicious data annotators.
* Translation errors: Errors that occur when correctly labeled data in one language is incorrectly translated into another language.
* Subjectivity in labeling: Some labeling tasks require subjective judgment, leading to inconsistencies between different annotators.
* Lack of domain knowledge: Annotators without sufficient expertise in the domain may label data incorrectly because they misunderstand the context.
* Complex classification tasks: The more complex the task, the higher the probability of labeling mistakes.
Among the answer choices provided, "Lack of domain knowledge" (Option A) is the best answer because expertise is essential for labeling data accurately in complex domains such as medicine, law, or engineering.
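As an illustration only (the syllabus does not prescribe this technique), one common way to notice several of the causes listed above, such as random errors or subjective judgment, is to have two annotators label the same items and measure how often they agree beyond chance. The sketch below computes Cohen's kappa in plain Python. The function name `cohens_kappa`, the sample labels, and the variable names are hypothetical and chosen for this example.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Return Cohen's kappa for two annotators' labels on the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("Both annotators must label the same non-empty set of items.")

    n = len(labels_a)

    # Observed agreement: fraction of items where both annotators chose the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Expected chance agreement, based on how often each annotator used each label.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)

    # If both annotators used a single identical label everywhere, kappa is undefined;
    # treating it as perfect agreement is an assumption made for this sketch.
    if expected == 1.0:
        return 1.0

    return (observed - expected) / (1.0 - expected)


# Hypothetical labels from two annotators for the same six images.
annotator_1 = ["cat", "cat", "dog", "dog", "cat", "dog"]
annotator_2 = ["cat", "dog", "dog", "dog", "cat", "cat"]

kappa = cohens_kappa(annotator_1, annotator_2)

# Indices where the annotators disagree; these are candidates for expert review.
disputed = [
    i for i, (a, b) in enumerate(zip(annotator_1, annotator_2)) if a != b
]
```

A low kappa does not say which cause is responsible. It only signals that the labels deserve a closer look, for example by a reviewer with domain expertise.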
Certified Tester AI Testing Study Guide References:
* ISTQB CT-AI Syllabus v1.0, Section 4.5.2 (Mislabeled Data in Datasets)
* ISTQB CT-AI Syllabus v1.0, Section 4.3 (Dataset Quality Issues)