Which of the following code blocks returns a copy of DataFrame itemsDf where the column supplier has been renamed to manufacturer?
Correct Answer: C
Explanation
itemsDf.withColumnRenamed("supplier", "manufacturer")
Correct! This uses the DataFrame method withColumnRenamed, which takes the existing column name (supplier) as its first argument and the new column name (manufacturer) as its second.
Note that the question asks for "a copy of DataFrame itemsDf". This may be confusing if you are not familiar with Spark yet. RDDs (Resilient Distributed Datasets) are the foundation of Spark DataFrames and are immutable. As such, DataFrames are immutable, too. Any command that changes anything in the DataFrame therefore necessarily returns a copy, or a new version, of it that has the changes applied.
itemsDf.withColumnsRenamed("supplier", "manufacturer")
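Incorrect. In the Spark version this question targets (see the PySpark 3.1.1 documentation linked below), the DataFrame API does not have a withColumnsRenamed() method. Newer releases (PySpark 3.4 and later) do add a method of that name, but it expects a single dictionary that maps existing names to new names, so this call with two strings would not work there either.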
itemsDf.withColumnRenamed(col("manufacturer"), col("supplier"))
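No. Watch out - although Column objects created with col() are accepted by many methods of the DataFrame API, withColumnRenamed is not one of them. As outlined in the documentation linked below, withColumnRenamed expects strings. In addition, the arguments here are in the wrong order: the existing column name (supplier) must come first and the new name (manufacturer) second.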
itemsDf.withColumn(["supplier", "manufacturer"])
Wrong. While DataFrame.withColumn() exists in Spark, it has a different purpose than renaming columns.
withColumn is typically used to add a column to a DataFrame (or to replace an existing column of the same name). It takes the column name as its first argument and a Column expression as its second, not a single list of names. Learn more via the documentation that is linked below.
itemsDf.withColumn("supplier").alias("manufacturer")
No. While DataFrame.withColumn() exists, it requires two arguments. Furthermore, the alias() method on DataFrames would not rename a column: DataFrame.alias() assigns an alias to the DataFrame as a whole, which can be useful for referring to the inputs of a join. However, this is far outside the scope of this question. If you are curious nevertheless, check out the link below.
More info: pyspark.sql.DataFrame.withColumnRenamed - PySpark 3.1.1 documentation (https://bit.ly/3aSB5tm), pyspark.sql.DataFrame.withColumn - PySpark 3.1.1 documentation (https://bit.ly/2Tv4rbE), and pyspark.sql.DataFrame.alias - PySpark 3.1.2 documentation (https://bit.ly/2RbhBd2)
Static notebook | Dynamic notebook: See test 1 (https://flrs.github.io/spark_practice_tests_code/#1/31.html, https://bit.ly/sparkpracticeexams_import_instructions)