A data analyst calculated the average score per student without making any changes to the following table:
| Student | Subject | Score |
|---------|---------|-------|
| 123     | Math    | 100   |
| 123     | Biology | 80    |
| 234     | Math    | 96    |
| 123     | Biology | 80    |
| 345     | Biology | 88    |
| 234     | Math    | 96    |
Which of the following exploration techniques should the analyst have considered before calculating the average?
A. Duplication
B. Redundancy
C. Binning
D. Grouping
Correct Answer: A
This question pertains to the Data Governance domain, focusing on data quality issues that affect analysis.
The table contains duplicate rows, which would skew the average score calculation if not addressed.
* Student 123: Math (100), Biology (80), Biology (80) - Duplicate Biology score.
* Student 234: Math (96), Math (96) - Duplicate Math score.
* Student 345: Biology (88) - No duplicates.
* Duplication (Option A): The table has duplicate rows (e.g., Student 123's Biology score of 80 appears twice), which would distort the average if not removed. Here the extra Biology row lowers Student 123's average from 90 to about 86.7. The analyst should have checked for duplicates before calculating the average.
* Redundancy (Option B): Redundancy refers to unnecessary fields (e.g., storing the same data in multiple columns), not duplicate rows.
* Binning (Option C): Binning groups data into categories, not relevant for addressing duplicates in averaging.
* Grouping (Option D): Grouping (e.g., GROUP BY in SQL) might be part of the solution, but the issue to identify is duplication.
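The effect of the duplicate rows can be checked with a short Python sketch. It is only an illustration, not part of the exam material: the names `rows`, `average_per_student`, and `deduplicated` are hypothetical, and it assumes that exact repeats of a (student, subject, score) row are data-entry duplicates rather than legitimate repeat results.

```python
from collections import defaultdict

# Rows copied from the table in the question: (student, subject, score)
rows = [
    (123, "Math", 100),
    (123, "Biology", 80),
    (234, "Math", 96),
    (123, "Biology", 80),
    (345, "Biology", 88),
    (234, "Math", 96),
]


def average_per_student(records):
    """Return {student: mean score} for the given records."""
    scores = defaultdict(list)
    for student, _subject, score in records:
        scores[student].append(score)
    return {student: sum(vals) / len(vals) for student, vals in scores.items()}


# Assumption: an exact repeat of a row is a duplicate to drop.
# dict.fromkeys keeps the first occurrence of each row and preserves order.
deduplicated = list(dict.fromkeys(rows))

raw_avg = average_per_student(rows)            # computed on the table as given
clean_avg = average_per_student(deduplicated)  # computed after removing duplicates

for student in sorted(clean_avg):
    print(student, round(raw_avg[student], 2), round(clean_avg[student], 2))
```

With the duplicates left in, Student 123's average is about 86.67; after removing them it is 90. Students 234 and 345 are unaffected (96 and 88), because a duplicated score equal to the student's only score does not change the mean.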
The DA0-002 Data Governance domain includes "data quality control concepts," and checking for duplication is critical to ensure accurate calculations like averages.
Reference: CompTIA Data+ DA0-002 Draft Exam Objectives, Domain 5.0 Data Governance.