Problem Scenario 30: You have been given three CSV files in hdfs, as below.

EmployeeName.csv with the fields (id, name)
EmployeeManager.csv with the fields (id, managerName)
EmployeeSalary.csv with the fields (id, salary)

Using Spark and its API, generate the joined output in the format below and save it as a text file (comma-separated) for final distribution. The output must be sorted by id.

id,name,salary,managerName

EmployeeManager.csv
E01,Vishnu
E02,Satyam
E03,Shiv
E04,Sundar
E05,John
E06,Pallavi
E07,Tanvir
E08,Shekhar
E09,Vinod
E10,Jitendra

EmployeeName.csv
E01,Lokesh
E02,Bhupesh
E03,Amit
E04,Ratan
E05,Dinesh
E06,Pavan
E07,Tejas
E08,Sheela
E09,Kumar
E10,Venkat

EmployeeSalary.csv
E01,50000
E02,50000
E03,45000
E04,45000
E05,50000
E06,45000
E07,50000
E08,10000
E09,10000
E10,10000
Correct Answer:
See the explanation for the step-by-step solution and configuration.

Explanation:
Solution:

Step 1: Create all three files in hdfs in a directory called spark1 (we will do this using Hue). However, you can first create them in the local filesystem and then upload them to hdfs.

Step 2: Load the EmployeeManager.csv file from hdfs and create a pair RDD keyed by id.
val manager = sc.textFile("spark1/EmployeeManager.csv")
val managerPairRDD = manager.map(x => (x.split(",")(0), x.split(",")(1)))

Step 3: Load the EmployeeName.csv file from hdfs and create a pair RDD keyed by id.
val name = sc.textFile("spark1/EmployeeName.csv")
val namePairRDD = name.map(x => (x.split(",")(0), x.split(",")(1)))

Step 4: Load the EmployeeSalary.csv file from hdfs and create a pair RDD keyed by id.
val salary = sc.textFile("spark1/EmployeeSalary.csv")
val salaryPairRDD = salary.map(x => (x.split(",")(0), x.split(",")(1)))

Step 5: Join all the pair RDDs. Each record of the result has the shape (id, ((name, salary), managerName)).
val joined = namePairRDD.join(salaryPairRDD).join(managerPairRDD)

Step 6: Sort the joined results by id.
val joinedData = joined.sortByKey()

Step 7: Generate comma-separated data. Build a string for each record; saving a tuple instead would wrap every output line in parentheses.
val finalData = joinedData.map(v => v._1 + "," + v._2._1._1 + "," + v._2._1._2 + "," + v._2._2)

Step 8: Save the output in hdfs as a text file. Spark creates result.txt as a directory containing one part file per partition.
finalData.saveAsTextFile("spark1/result.txt")
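
The join logic of the solution can be checked without a cluster. The following is a minimal sketch on plain Scala collections, not the Spark code itself: the helper names `toPairs` and `joinEmployees` are hypothetical, each id is assumed to occur at most once per file, and only the first three rows of each sample file are used. It is written script-style, so it can be pasted into the Scala REPL or spark-shell.

```scala
// Parse "id,value" lines into an id -> value map (stands in for a pair RDD).
// Assumption: ids are unique within each file.
def toPairs(lines: Seq[String]): Map[String, String] =
  lines.map { line =>
    val fields = line.split(",")
    fields(0) -> fields(1)
  }.toMap

// Inner join on id, sort by id, and emit "id,name,salary,managerName".
def joinEmployees(names: Seq[String],
                  salaries: Seq[String],
                  managers: Seq[String]): Seq[String] = {
  val nameById = toPairs(names)
  val salaryById = toPairs(salaries)
  val managerById = toPairs(managers)
  nameById.keys.toSeq.sorted.flatMap { id =>
    // Keep an id only if it appears in all three inputs (inner-join semantics).
    for {
      salary  <- salaryById.get(id)
      manager <- managerById.get(id)
    } yield Seq(id, nameById(id), salary, manager).mkString(",")
  }
}

// First three rows of each sample file from the problem statement.
val names    = Seq("E01,Lokesh", "E02,Bhupesh", "E03,Amit")
val salaries = Seq("E01,50000", "E02,50000", "E03,45000")
val managers = Seq("E01,Vishnu", "E02,Satyam", "E03,Shiv")

joinEmployees(names, salaries, managers).foreach(println)
// E01,Lokesh,50000,Vishnu
// E02,Bhupesh,50000,Satyam
// E03,Amit,45000,Shiv
```

The printed lines have the same format that the part files under spark1/result.txt should contain when each record is written as a comma-joined string.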