Valid Associate-Developer-Apache-Spark Dumps shared by EduDump.com for Helping Passing Associate-Developer-Apache-Spark Exam! EduDump.com now offer the newest Associate-Developer-Apache-Spark exam dumps, the EduDump.com Associate-Developer-Apache-Spark exam questions have been updated and answers have been corrected get the newest EduDump.com Associate-Developer-Apache-Spark dumps with Test Engine here:
Which of the following code blocks creates a new one-column, two-row DataFrame dfDates with column date of type timestamp?
Correct Answer: C
Explanation This question is tricky. Two things are important to know here: First, the syntax for createDataFrame: Here you need a list of tuples, like so: [(1,), (2,)]. To define a tuple in Python, if you just have a single item in it, it is important to put a comma after the item so that Python interprets it as a tuple and not just a normal parenthesis. Second, you should understand the to_timestamp syntax. You can find out more about it in the documentation linked below. For good measure, let's examine in detail why the incorrect options are wrong: dfDates = spark.createDataFrame([("23/01/2022 11:28:12",),("24/01/2022 10:58:34",)], ["date"]) This code snippet does everything the question asks for - except that the data type of the date column is a string and not a timestamp. When no schema is specified, Spark sets the string data type as default. dfDates = spark.createDataFrame(["23/01/2022 11:28:12","24/01/2022 10:58:34"], ["date"]) dfDates = dfDates.withColumn("date", to_timestamp("dd/MM/yyyy HH:mm:ss", "date")) In the first row of this command, Spark throws the following error: TypeError: Can not infer schema for type: <class 'str'>. This is because Spark expects to find row information, but instead finds strings. This is why you need to specify the data as tuples. Fortunately, the Spark documentation (linked below) shows a number of examples for creating DataFrames that should help you get on the right track here. dfDates = spark.createDataFrame([("23/01/2022 11:28:12",),("24/01/2022 10:58:34",)], ["date"]) dfDates = dfDates.withColumnRenamed("date", to_timestamp("date", "yyyy-MM-dd HH:mm:ss")) The issue with this answer is that the operator withColumnRenamed is used. This operator simply renames a column, but it has no power to modify its actual content. This is why withColumn should be used instead. In addition, the date format yyyy-MM-dd HH:mm:ss does not reflect the format of the actual timestamp: "23/01/2022 11:28:12". dfDates = spark.createDataFrame(["23/01/2022 11:28:12","24/01/2022 10:58:34"], ["date"]) dfDates = dfDates.withColumnRenamed("date", to_datetime("date", "yyyy-MM-dd HH:mm:ss")) Here, withColumnRenamed is used instead of withColumn (see above). In addition, the rows are not expressed correctly - they should be written as tuples, using parentheses. Finally, even the date format is off here (see above). More info: pyspark.sql.functions.to_timestamp - PySpark 3.1.2 documentation and pyspark.sql.SparkSession.createDataFrame - PySpark 3.1.1 documentation Static notebook | Dynamic notebook: See test 2