Valid Associate-Developer-Apache-Spark Dumps shared by EduDump.com for Helping Passing Associate-Developer-Apache-Spark Exam! EduDump.com now offer the newest Associate-Developer-Apache-Spark exam dumps, the EduDump.com Associate-Developer-Apache-Spark exam questions have been updated and answers have been corrected get the newest EduDump.com Associate-Developer-Apache-Spark dumps with Test Engine here:
The code block shown below should return only the average prediction error (column predError) of a random subset, without replacement, of approximately 15% of rows in DataFrame transactionsDf. Choose the answer that correctly fills the blanks in the code block to accomplish this. transactionsDf.__1__(__2__, __3__).__4__(avg('predError'))
Correct Answer: B
Explanation Correct code block: transactionsDf.sample(withReplacement=False, fraction=0.15).select(avg('predError')) You should remember that getting a random subset of rows means sampling. This, in turn should point you to the DataFrame.sample() method. Once you know this, you can look up the correct order of arguments in the documentation (link below). Lastly, you have to decide whether to use filter, where or select. where is just an alias for filter(). filter() is not the correct method to use here, since it would only allow you to filter rows based on some condition. However, the question asks to return only the average prediction error. You can control the columns that a query returns with the select() method - so this is the correct method to use here. More info: pyspark.sql.DataFrame.sample - PySpark 3.1.2 documentation Static notebook | Dynamic notebook: See test 2