To process input key-value pairs, your mapper needs to load a 512 MB data file into memory. What is the best way to accomplish this?
Correct Answer: B
Explanation/Reference:
Hadoop provides a distributed cache mechanism to make files that Map/Reduce jobs need available locally on every task node. For a large, read-only side file like the 512 MB data file here, this is the intended facility.

Use Case
Let's flesh out the use case a little so the code snippet is easy to follow. We have a key-value file that we need in our map tasks. For simplicity, let's say we need to replace every keyword we encounter during parsing with some other value. So what we need is:
* a key-value file (let's use a Java Properties file)
* mapper code that uses it

The mapper code that uses it:

import java.io.FileReader;
import java.io.IOException;
import java.util.Properties;

import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class DistributedCacheMapper extends Mapper<LongWritable, Text, Text, Text> {

    private Properties cache;

    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        super.setup(context);
        // Local copies of the files registered in the distributed cache
        Path[] localCacheFiles = DistributedCache.getLocalCacheFiles(context.getConfiguration());

        if (localCacheFiles != null) {
            // expecting only a single file here
            for (Path localCacheFile : localCacheFiles) {
                cache = new Properties();
                cache.load(new FileReader(localCacheFile.toString()));
            }
        } else {
            // do your error handling here
        }
    }

    @Override
    public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        // use the cache here:
        // if the value contains some keyword, look it up with cache.get(...)
        // and take some action, or replace it with something else
    }
}

Note:
* Distribute application-specific, large, read-only files efficiently. DistributedCache is a facility provided by the Map-Reduce framework to cache files (text, archives, jars, etc.) needed by applications. Applications specify the files to be cached via URLs (hdfs:// or http://) through the JobConf. The DistributedCache assumes that files specified via hdfs:// URLs are already present on the FileSystem at the path given by the URL.

Reference: Using Hadoop Distributed Cache
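For completeness, here is the driver side that registers the file with the cache in the first place. This is a minimal sketch, not part of the original answer: the HDFS path hdfs:///cache/keywords.properties, the job name, and the class name DistributedCacheDriver are illustrative assumptions; DistributedCache.addCacheFile and the Job setup calls are the standard (pre-YARN) API that matches the mapper above.

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class DistributedCacheDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "distributed cache example"); // job name is an assumption
        job.setJarByClass(DistributedCacheDriver.class);
        job.setMapperClass(DistributedCacheMapper.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);

        // Register the side file; per the note above, it must already exist
        // at this HDFS path (the path itself is an assumption for this sketch)
        DistributedCache.addCacheFile(new URI("hdfs:///cache/keywords.properties"),
                job.getConfiguration());

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

The framework copies the cached file to each task node before the job's tasks start, so the setup() method in the mapper reads it once from local disk per task rather than pulling 512 MB over the network for every record.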