You are tasked with using the PARSE_DOCUMENT function in Snowflake to extract key information (name, address, phone number) from a large collection of scanned invoices stored as PDF files in an AWS S3 bucket. The invoices vary in format and quality. Which of the following approaches would be MOST effective for structuring the extracted data for analysis?
Correct Answer: C
Option C is the most robust and flexible approach. Because the invoices vary in format and quality, a pre-defined JSON schema (option B) is unlikely to work reliably. Loading the raw JSON into a VARIANT column (option A) requires extensive post-processing. Option D may be effective, but it adds the complexity and cost of a third-party OCR service. The MAX_FILE_SIZE parameter only caps file size in bytes (in Snowflake it is a copy option applied when unloading data with COPY INTO <location>), so it has no effect on how the extracted data is structured. Option E is neither scalable nor efficient.
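To make the post-processing concern concrete, here is a minimal Python sketch of turning loosely formatted extracted text into fixed fields. It is an illustration under stated assumptions, not the approach named by any of the options: it assumes the extraction result is a JSON object whose text sits under a `content` key, and the function name `extract_fields`, the label variants, and the regular expressions are all hypothetical.

```python
import json
import re

# Hypothetical label variants across invoice layouts (assumed for illustration).
FIELD_PATTERNS = {
    "name": re.compile(
        r"^(?:name|customer|bill to)\s*[:\-]\s*(.+)$", re.I | re.M
    ),
    "address": re.compile(
        r"^(?:address|addr)\s*[:\-]\s*(.+)$", re.I | re.M
    ),
    "phone": re.compile(
        r"^(?:phone|tel|telephone)\.?\s*[:\-]\s*([+\d][\d ().\-]{6,})$",
        re.I | re.M,
    ),
}


def extract_fields(raw_json: str) -> dict:
    """Map extracted text to name/address/phone; missing fields become None."""
    doc = json.loads(raw_json)
    # Assumption: the extracted text is stored under the "content" key.
    text = doc.get("content", "")
    result = {}
    for field, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        result[field] = match.group(1).strip() if match else None
    return result


if __name__ == "__main__":
    sample = json.dumps(
        {"content": "Invoice 17\nBill To: Jane Doe\nAddress: 1 Main St\nTel: +1 (555) 010-2000\n"}
    )
    print(extract_fields(sample))
```

Fields that cannot be found are returned as None rather than guessed, so low-quality scans surface as gaps that can be reviewed instead of silently producing wrong values.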