In which order should the code blocks shown below be run to create a DataFrame that shows the mean of column predError of DataFrame transactionsDf per column storeId and productId, where productId should be either 2 or 3 and the returned DataFrame should be sorted in ascending order by column storeId, leaving out any nulls in that column?

DataFrame transactionsDf:
+-------------+---------+-----+-------+---------+----+
|transactionId|predError|value|storeId|productId|   f|
+-------------+---------+-----+-------+---------+----+
|            1|        3|    4|     25|        1|null|
|            2|        6|    7|      2|        2|null|
|            3|        3| null|     25|        3|null|
|            4|     null| null|      3|        2|null|
|            5|     null| null|   null|        2|null|
|            6|        3|    2|     25|        2|null|
+-------------+---------+-----+-------+---------+----+

1. .mean("predError")
2. .groupBy("storeId")
3. .orderBy("storeId")
4. transactionsDf.filter(transactionsDf.storeId.isNotNull())
5. .pivot("productId", [2, 3])
Correct Answer: D
Explanation

Correct code block (code blocks 4, 2, 5, 1, 3, in that order):
transactionsDf.filter(transactionsDf.storeId.isNotNull()).groupBy("storeId").pivot("productId", [2, 3]).mean("predError").orderBy("storeId")

Output of correct code block:
+-------+----+----+
|storeId|   2|   3|
+-------+----+----+
|      2| 6.0|null|
|      3|null|null|
|     25| 3.0| 3.0|
+-------+----+----+

This question is fairly convoluted and requires careful thought about the order of operations. It also involves the pivot method, which you may not have used much yet.

Code block 4 comes first in all answer options, so the question is really about the ordering of the remaining four code blocks.

The question states that the returned DataFrame should be sorted by column storeId. Sorting only makes sense once the aggregated result exists, so code block 3, which contains the orderBy operator, belongs at the very end. This leaves only two answer options.

To choose between them, it helps to know how pivot is used in PySpark. A common pattern is groupBy, then pivot, then an aggregating function such as mean. In the documentation linked below you can see that pivot is a method of pyspark.sql.GroupedData, which means you have to call groupBy before pivoting. The only answer option matching this requirement is the one in which code block 2 (groupBy) comes before code block 5 (pivot).

More info: pyspark.sql.GroupedData.pivot - PySpark 3.1.2 documentation
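To check the values in the output table by hand, the following is a minimal sketch in plain Python (no Spark involved) that mimics what the chain filter, groupBy, pivot, mean, orderBy computes on this data. The names ROWS and pivot_mean are hypothetical and exist only for this illustration; None stands in for Spark's null, and the sketch assumes that mean ignores null inputs and yields null for an empty cell, which is consistent with the output shown above.

```python
from collections import defaultdict

# Rows of transactionsDf as
# (transactionId, predError, value, storeId, productId, f).
# None stands in for Spark's null.
ROWS = [
    (1, 3, 4, 25, 1, None),
    (2, 6, 7, 2, 2, None),
    (3, 3, None, 25, 3, None),
    (4, None, None, 3, 2, None),
    (5, None, None, None, 2, None),
    (6, 3, 2, 25, 2, None),
]


def pivot_mean(rows, pivot_values=(2, 3)):
    """Hypothetical helper: emulate filter -> groupBy -> pivot -> mean -> orderBy."""
    # Code block 4: drop rows whose storeId is null.
    kept = [r for r in rows if r[3] is not None]

    # Code blocks 2 and 5: one output row per storeId, one column per pivot value.
    cells = defaultdict(list)
    store_ids = set()
    for _, pred_error, _, store_id, product_id, _ in kept:
        store_ids.add(store_id)  # every group appears, even if all its cells are empty
        if product_id in pivot_values and pred_error is not None:
            cells[(store_id, product_id)].append(pred_error)

    # Code block 1: mean per cell; an empty cell stays None.
    def mean(values):
        return sum(values) / len(values) if values else None

    # Code block 3: sort ascending by storeId.
    return [
        (s, *(mean(cells.get((s, p), [])) for p in pivot_values))
        for s in sorted(store_ids)
    ]


result = pivot_mean(ROWS)
for row in result:
    print(row)
# (2, 6.0, None)
# (3, None, None)
# (25, 3.0, 3.0)
```

Store 3 still appears even though its only predError is null, and the productId 1 row of store 25 contributes to no column because 1 is not in the pivot list. Both effects match the table above.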