The code block shown below should return all rows of DataFrame itemsDf that have more than 3 items in column itemNameElements. Choose the answer that correctly fills the blanks in the code block to accomplish this.

Example of DataFrame itemsDf:

+------+----------------------------------+-------------------+------------------------------------------+
|itemId|itemName                          |supplier           |itemNameElements                          |
+------+----------------------------------+-------------------+------------------------------------------+
|1     |Thick Coat for Walking in the Snow|Sports Company Inc.|[Thick, Coat, for, Walking, in, the, Snow]|
|2     |Elegant Outdoors Summer Dress     |YetiX              |[Elegant, Outdoors, Summer, Dress]        |
|3     |Outdoors Backpack                 |Sports Company Inc.|[Outdoors, Backpack]                      |
+------+----------------------------------+-------------------+------------------------------------------+

Code block:

itemsDf.__1__(__2__(__3__)__4__)
Correct Answer: D
Explanation

Correct code block:

itemsDf.filter(size("itemNameElements")>3)

Output of code block:

+------+----------------------------------+-------------------+------------------------------------------+
|itemId|itemName                          |supplier           |itemNameElements                          |
+------+----------------------------------+-------------------+------------------------------------------+
|1     |Thick Coat for Walking in the Snow|Sports Company Inc.|[Thick, Coat, for, Walking, in, the, Snow]|
|2     |Elegant Outdoors Summer Dress     |YetiX              |[Elegant, Outdoors, Summer, Dress]        |
+------+----------------------------------+-------------------+------------------------------------------+

The main difficulty with this question is knowing the difference between count and size (refer to the documentation below). size is the correct function here, since it returns the number of elements in an array on a per-row basis. count, by contrast, is an aggregate function: it returns the number of non-null values across the rows of a DataFrame or group, not the length of each array.

The other consideration is the difference between select and filter. Since we want to return the rows of the original DataFrame, filter is the right choice. If we used select instead, we would simply get a single-column DataFrame showing which rows match the criterion, like so:

+----------------------------+
|(size(itemNameElements) > 3)|
+----------------------------+
|true                        |
|true                        |
|false                       |
+----------------------------+

More info:
Count documentation: pyspark.sql.functions.count - PySpark 3.1.1 documentation
Size documentation: pyspark.sql.functions.size - PySpark 3.1.1 documentation