Associate-Developer-Apache-Spark Exam Dumps | The code block shown below should return the number of columns in the CSV file stored at location filePath.

Question 29/63

The code block shown below should return the number of columns in the CSV file stored at location filePath.
Only lines of the CSV file that do not start with a # character should be read. Choose the answer that correctly fills the blanks in the code block to accomplish this.
Code block:
__1__(__2__.__3__.csv(filePath, __4__).__5__)

A. 1. size
2. spark
3. read()
4. escape='#'
5. columns

B. 1. DataFrame
2. spark
3. read()
4. escape='#'
5. shape[0]

C. 1. len
2. pyspark
3. DataFrameReader
4. comment='#'
5. columns

D. 1. size
2. pyspark
3. DataFrameReader
4. comment='#'
5. columns

E. 1. len
2. spark
3. read
4. comment='#'
5. columns

Correct Answer: E

Explanation
Correct code block:
len(spark.read.csv(filePath, comment='#').columns)
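To make the effect of comment='#' concrete without a Spark installation, here is a minimal plain-Python sketch of what the expression computes: skip every line that begins with the comment character, then count the fields of the first remaining row. The function name count_columns_skipping_comments and its parameters are hypothetical, and this is only a rough approximation of the behavior, not Spark's actual CSV parser.

```python
import csv


def count_columns_skipping_comments(file_path, comment="#", sep=","):
    """Return the number of fields in the first non-comment line of a CSV file.

    Hypothetical helper for illustration only; it approximates the idea
    behind Spark's comment option and is not how Spark parses CSV.
    """
    with open(file_path, newline="") as handle:
        # Keep only lines whose first character is not the comment marker.
        data_lines = (line for line in handle if not line.startswith(comment))
        # Parse just the first remaining line; None if the file has no data.
        first_row = next(csv.reader(data_lines, delimiter=sep), None)
    return 0 if first_row is None else len(first_row)
```

With a file whose first line is a # comment followed by rows of three comma-separated values, the helper returns 3, which is the count the Spark expression is meant to produce.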
This is a challenging question because it sits at an unusual spot: the boundary between DataFrame and DataFrameReader. A question of this difficulty is unlikely to appear in the exam. However, solving it helps you get more comfortable with the DataFrameReader, a subject you will likely have to deal with in the exam.
Before dealing with the inner parentheses, it is easier to figure out the outer ones, gaps 1 and 5. Given the code block, whatever fills gap 1 is applied to the object produced by gap 5 and must return the number of columns in the CSV that was read. One answer option puts DataFrame in gap 1 and shape[0] in gap 5. DataFrame is a class, not a function that returns a count, and a Spark DataFrame has no shape attribute (shape belongs to pandas, where shape[0] is the number of rows, not columns). We can therefore discard this answer option.
Other answer options put size in gap 1. size() is not a built-in Python function, so it would have to come from somewhere else. pyspark.sql.functions does include a size() function, but it only returns the length of an array or map stored within a column (documentation linked below). It cannot count the entries of the Python list returned by columns.
So, using size() is not an option here. This leaves us with two potentially valid answers, both of which use len in gap 1.
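The len versus size point can be checked in plain Python. The sketch below uses a hand-written list as a stand-in for what DataFrame.columns returns (a Python list of column-name strings); the names _c0, _c1, _c2 are merely illustrative of the defaults Spark assigns when no header is read.

```python
import builtins

# Stand-in for df.columns, which is a plain Python list of strings.
columns = ["_c0", "_c1", "_c2"]

# Python has no built-in called size ...
has_builtin_size = hasattr(builtins, "size")

# ... whereas len works on any list, including the one columns returns.
column_count = len(columns)

print(has_builtin_size, column_count)  # False 3
```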
We have to pick between gaps 2 and 3 being spark.read or pyspark.DataFrameReader. Looking at the documentation (linked below), DataFrameReader is a class in the pyspark.sql module, not in the top-level pyspark package, so it cannot be reached as pyspark.DataFrameReader. Moreover, spark.read makes sense because on Databricks, spark refers to the current Spark session (pyspark.sql.SparkSession), and spark.read therefore returns a DataFrameReader (also see documentation below). Note that read is a property, not a method, so it is written without parentheses. This leaves only one correct answer option.
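The chain of objects in the correct answer can be spelled out step by step. The sketch below assumes that spark is an already active SparkSession, as it is on Databricks; the wrapper name count_csv_columns is hypothetical and exists only to label each intermediate object.

```python
def count_csv_columns(spark, file_path):
    """Count the columns of a CSV file, ignoring lines that start with '#'.

    Assumes `spark` is an active pyspark.sql.SparkSession (hypothetical
    wrapper for illustration; not part of the PySpark API).
    """
    reader = spark.read                       # DataFrameReader; a property, so no ()
    df = reader.csv(file_path, comment="#")   # DataFrame built from non-comment lines
    return len(df.columns)                    # df.columns is a Python list of names
```

In a notebook where spark is defined, count_csv_columns(spark, filePath) evaluates the same expression as the one-line answer.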
More info:
- pyspark.sql.functions.size - PySpark 3.1.2 documentation
- pyspark.sql.DataFrameReader.csv - PySpark 3.1.2 documentation
- pyspark.sql.SparkSession.read - PySpark 3.1.2 documentation
Static notebook | Dynamic notebook: See test 3
