Which of the following is a dataset issue that can be resolved using pre-processing?
Correct Answer: D
Pre-processing is an essential step in data preparation. It ensures that data is clean, correctly formatted, and structured for effective machine learning (ML) model training. One common issue that can be resolved during pre-processing is numbers stored as strings.
Explanation of Answer Choices:
* Option A: Insufficient data
* Incorrect. Pre-processing cannot resolve insufficient data. If data is lacking, techniques like data augmentation or external data collection are needed.
* Option B: Invalid data
* Incorrect. While pre-processing can identify and handle some forms of invalid data (e.g., missing values, duplicate entries), it does not resolve all invalid data issues. Some cases may require domain expertise to determine validity.
* Option C: Wanted outliers
* Incorrect. Pre-processing usually focuses on handling unwanted outliers. Wanted outliers may need to be preserved, which is a data selection decision rather than a pre-processing task.
* Option D: Numbers stored as strings
* Correct. One of the key functions of data pre-processing is data transformation, which includes converting incorrectly typed data, such as numbers stored as strings, into their correct numerical format.
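The conversion described for Option D can be sketched in plain Python. This is a minimal illustration, assuming records are held as dictionaries and that commas in numeric strings are thousands separators; the helper names `to_number` and `convert_column` are hypothetical. In practice a library routine such as pandas `pd.to_numeric` would typically do this job.

```python
def to_number(value: str):
    """Convert a numeric string to int or float; raise ValueError if it is not numeric."""
    # Assumption: commas are thousands separators (not decimal marks).
    text = value.strip().replace(",", "")
    try:
        return int(text)
    except ValueError:
        return float(text)  # still raises ValueError for non-numeric text


def convert_column(rows, column):
    """Return new rows with `column` converted from strings to numbers."""
    converted = []
    for row in rows:
        new_row = dict(row)  # copy so the raw data is left untouched
        new_row[column] = to_number(row[column])
        converted.append(new_row)
    return converted


# Why it matters: strings sort lexicographically, not numerically.
print(sorted(["9", "10"]))  # ['10', '9']

records = [
    {"id": "a1", "age": "34", "income": "52,000.50"},
    {"id": "a2", "age": " 29 ", "income": "48000"},
]
clean = convert_column(convert_column(records, "age"), "income")
print(clean)
```

Keeping the original rows unchanged makes it easy to compare raw and transformed values when checking that the transformation itself did not introduce errors.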
ISTQB CT-AI Syllabus References:
* Data Pre-Processing Steps: "Transformation: The format of the given data is changed (e.g., breaking an address held as a string into its constituent parts, dropping a field holding a random identifier, converting categorical data into numerical data, changing image formats)."
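Two of the transformations named in the syllabus quote, dropping a random identifier field and converting categorical data into numerical data, can be sketched as follows. This assumes dictionary records and uses one-hot encoding, which is one common way to convert categories to numbers; the quote does not prescribe a specific method. The helpers `drop_field` and `one_hot` and the sample fields are hypothetical.

```python
def drop_field(rows, field):
    """Drop a field (e.g. a random identifier) that carries no predictive signal."""
    return [{k: v for k, v in row.items() if k != field} for row in rows]


def one_hot(rows, column):
    """Replace a categorical column with one 0/1 indicator column per category."""
    categories = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        new_row = {k: v for k, v in row.items() if k != column}
        for cat in categories:
            new_row[f"{column}={cat}"] = 1 if row[column] == cat else 0
        encoded.append(new_row)
    return encoded


samples = [
    {"uuid": "f3a9", "colour": "red", "size": 3},
    {"uuid": "07bc", "colour": "blue", "size": 5},
    {"uuid": "9d21", "colour": "red", "size": 4},
]
prepared = one_hot(drop_field(samples, "uuid"), "colour")
for row in prepared:
    print(row)
```

One-hot encoding avoids implying an order between categories, which a simple integer mapping (red=0, blue=1) would suggest to many models.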