Valid Associate-Developer-Apache-Spark Dumps shared by EduDump.com for Helping Passing Associate-Developer-Apache-Spark Exam! EduDump.com now offer the newest Associate-Developer-Apache-Spark exam dumps, the EduDump.com Associate-Developer-Apache-Spark exam questions have been updated and answers have been corrected get the newest EduDump.com Associate-Developer-Apache-Spark dumps with Test Engine here:
Which of the following code blocks returns all unique values across all values in columns value and productId in DataFrame transactionsDf in a one-column DataFrame?
Correct Answer: D
Explanation transactionsDf.select('value').union(transactionsDf.select('productId')).distinct() Correct. This code block uses a common pattern for finding the unique values across multiple columns: union and distinct. In fact, it is so common that it is even mentioned in the Spark documentation for the union command (link below). transactionsDf.select('value', 'productId').distinct() Wrong. This code block returns unique rows, but not unique values. transactionsDf.agg({'value': 'collect_set', 'productId': 'collect_set'}) Incorrect. This code block will output a one-row, two-column DataFrame where each cell has an array of unique values in the respective column (even omitting any nulls). transactionsDf.select(col('value'), col('productId')).agg({'*': 'count'}) No. This command will count the number of rows, but will not return unique values. transactionsDf.select('value').join(transactionsDf.select('productId'), col('value')==col('productId'), 'outer') Wrong. This command will perform an outer join of the value and productId columns. As such, it will return a two-column DataFrame. If you picked this answer, it might be a good idea for you to read up on the difference between union and join, a link is posted below. More info: pyspark.sql.DataFrame.union - PySpark 3.1.2 documentation, sql - What is the difference between JOIN and UNION? - Stack Overflow Static notebook | Dynamic notebook: See test 3