Valid Associate-Developer-Apache-Spark dumps shared by EduDump.com to help you pass the Associate-Developer-Apache-Spark exam. EduDump.com now offers the newest Associate-Developer-Apache-Spark exam dumps; the exam questions have been updated and the answers corrected.
Which of the following code blocks prints the number of rows in which the expression Inc. appears in the string-type column supplier of DataFrame itemsDf?
Correct Answer: E
Explanation

Correct code block:

    accum = sc.accumulator(0)

    def check_if_inc_in_supplier(row):
        if 'Inc.' in row['supplier']:
            accum.add(1)

    itemsDf.foreach(check_if_inc_in_supplier)
    print(accum.value)

To answer this question correctly, you need to know about both the DataFrame.foreach() method and accumulators. When Spark runs the code, it executes the function passed to foreach() on the executors. Each executor works on its own copy of the variables captured by that function, so changes made there are never sent back to the driver. This is why simply using a Python variable counter, like in the two examples that start with counter = 0, will not work. You need to tell Spark explicitly that the counter is a special shared variable, an Accumulator, which is managed by the driver and which all executors can add to.

If you have used Pandas in the past, you might be familiar with the iterrows() method. Notice that PySpark DataFrames have no such method.

The two examples that start with print do not work, since DataFrame.foreach() does not have a return value.

More info: pyspark.sql.DataFrame.foreach - PySpark 3.1.2 documentation
Static notebook | Dynamic notebook: See test 3
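The accumulator pattern from the explanation above can be tried end to end with a minimal sketch like the one below. The helper names (mentions_inc, count_inc_rows, main), the local SparkSession settings, and the three sample rows are assumptions made for illustration and are not part of the exam question. The null check in mentions_inc is also an addition, since the original code block assumes supplier is never null.

```python
def mentions_inc(row):
    """Return True if the row's supplier value contains 'Inc.'.

    Works with a pyspark Row or any mapping that supports row["supplier"].
    The None check is an added safeguard, not part of the exam answer.
    """
    supplier = row["supplier"]
    return supplier is not None and "Inc." in supplier


def count_inc_rows(spark, items_df):
    """Count matching rows with an accumulator updated inside foreach()."""
    # The accumulator is created on the driver; executors can only add to it.
    accum = spark.sparkContext.accumulator(0)

    def check_if_inc_in_supplier(row):
        if mentions_inc(row):
            accum.add(1)

    # foreach() is an action that returns None, so the result is read
    # from the accumulator on the driver afterwards.
    items_df.foreach(check_if_inc_in_supplier)
    return accum.value


def main():
    # Imported here so that mentions_inc can be used without Spark installed.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder.master("local[2]")
        .appName("accumulator-sketch")
        .getOrCreate()
    )
    # Hypothetical sample data standing in for itemsDf.
    items_df = spark.createDataFrame(
        [(1, "Sports Company Inc."), (2, "YetiX"), (3, "Sports Company Inc.")],
        ["itemId", "supplier"],
    )
    print(count_inc_rows(spark, items_df))
    # The same count expressed with the DataFrame API instead of foreach().
    print(items_df.filter(items_df.supplier.contains("Inc.")).count())
    spark.stop()
```

Calling main() in an environment with PySpark installed runs both approaches; for this sample data each should report 2. Outside of an exam setting, the filter-and-count form is usually the simpler choice, while the accumulator form is what this question is testing.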