The code block displayed below contains at least one error. The code block should return a DataFrame with only one column, result. That column should include all values in column value from DataFrame transactionsDf raised to the power of 5, and a null value for rows in which there is no value in column value. Find the error(s).

Code block:

    from pyspark.sql.functions import udf
    from pyspark.sql import types as T

    transactionsDf.createOrReplaceTempView('transactions')

    def pow_5(x):
        return x**5

    spark.udf.register(pow_5, 'power_5_udf', T.LongType())
    spark.sql('SELECT power_5_udf(value) FROM transactions')
Correct Answer: D
Explanation

Correct code block:

    from pyspark.sql.functions import udf
    from pyspark.sql import types as T

    transactionsDf.createOrReplaceTempView('transactions')

    def pow_5(x):
        if x:
            return x**5
        return x

    spark.udf.register('power_5_udf', pow_5, T.LongType())
    spark.sql('SELECT power_5_udf(value) AS result FROM transactions')

First, it is important to understand how the pow_5 function handles empty values. In the code block from the question, pow_5 cannot handle them: Spark passes a null in column value to the function as Python's None, and Python's ** operator raises an error when applied to None. The corrected function returns its input unchanged when the input is empty, so those rows yield null. The guard if x: also matches 0, but returning 0 unchanged is still correct because 0**5 is 0.

Second, the order of arguments when registering the UDF with Spark via spark.udf.register matters. In the code snippet in the question, the arguments for the SQL function name and the actual Python function are switched: the name must come first, followed by the function and then the return type. You can read more about the arguments of spark.udf.register and see some examples of its usage in the documentation (referenced below).

Finally, you should recognize that the original code block is missing an expression to rename the column created by the UDF. The renaming is done by the AS result alias in the SQL statement. If you omit the alias, you end up with the column name power_5_udf(value) and not result.

More info: pyspark.sql.functions.udf - PySpark 3.1.1 documentation
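The null-handling point in the explanation can be checked in plain Python, without a Spark session. The sketch below is only an illustration under stated assumptions: the name pow_5_unsafe is hypothetical and stands in for the function from the question, and the guarded version uses an explicit `is None` check, which is a variant of the explanation's `if x:` guard rather than the exam's own wording. It assumes Spark hands nulls to a Python UDF as None.

```python
def pow_5_unsafe(x):
    # Mirrors the function from the question: no guard for missing values.
    return x ** 5


def pow_5(x):
    # Guarded variant (assumption: explicit None check instead of `if x:`).
    # Returning None lets Spark store a null in the result column.
    if x is None:
        return None
    return x ** 5


values = [2, 0, None, -3]
results = [pow_5(v) for v in values]
print(results)  # [32, 0, None, -243]

try:
    pow_5_unsafe(None)
except TypeError as exc:
    # Python cannot apply ** to None, so the unguarded function fails.
    print("unguarded version fails:", exc)
```

In a Spark job, the guarded function would be the one passed as the second argument to spark.udf.register, after the SQL function name.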