Question: Page Rank algorithm Hadoop implementation — Add the final MapReduce stage so the whole PageRank calculation job can be submitted and run correctly. In HadoopPageRank.java, look for the string "place holder:".

Page Rank algorithm Hadoop implementation

Add the final MapReduce stage so the whole PageRank calculation job can be submitted and run correctly. In HadoopPageRank.java, look for the string place holder:. This is where you should be adding code to start the final MR stage. For the final MR handling, you will need to add two new classes that have names HadoopPageRankResultMapper.java and HadoopPageRankResultReducer.java.

--------------------------------------------------------------

HadoopPageRank.java

import org.apache.hadoop.conf.Configured; import org.apache.hadoop.fs.FileSystem; import org.apache.hadoop.fs.Path; import org.apache.hadoop.io.DoubleWritable; import org.apache.hadoop.io.Text; import org.apache.hadoop.mapreduce.Job; import org.apache.hadoop.mapreduce.lib.input.FileInputFormat; import org.apache.hadoop.mapreduce.lib.input.TextInputFormat; import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat; import org.apache.hadoop.util.Tool; import org.apache.hadoop.util.ToolRunner;

public class HadoopPageRank extends Configured implements Tool { public int run(String[] args) throws Exception { // house keeping int iteration = 0; Path inputPath = new Path(args[0]); Path basePath = new Path(args[1]); FileSystem fs = FileSystem.get(getConf()); fs.delete(basePath, true); fs.mkdirs(basePath); // configure initial job Job initJob = Job.getInstance(getConf(),"pageRank"); initJob.setJarByClass(HadoopPageRank.class); initJob.setMapperClass(HadoopPageRankInitMapper.class); initJob.setReducerClass(HadoopPageRankInitReducer.class); initJob.setOutputKeyClass(Text.class); initJob.setOutputValueClass(Text.class); initJob.setInputFormatClass(TextInputFormat.class); Path outputPath = new Path(basePath, "iteration_" + iteration); FileInputFormat.addInputPath(initJob, inputPath); FileOutputFormat.setOutputPath(initJob, outputPath); // let initJob run and wait for finish if ( !initJob.waitForCompletion(true) ) { return -1; } // calculate the page ranks int totalIterations = Integer.parseInt(args[2]); while ( iteration iteration ++; inputPath = outputPath; // new input is the old output outputPath = new Path(basePath, "iteration_" + iteration); Job mainJob = Job.getInstance(getConf(),"Iteration " + iteration); mainJob.setJarByClass(HadoopPageRank.class); mainJob.setMapperClass(HadoopPageRankMainJobMapper.class); mainJob.setReducerClass(HadoopPageRankMainJobReducer.class); mainJob.setOutputKeyClass(Text.class); mainJob.setOutputValueClass(Text.class); mainJob.setInputFormatClass(TextInputFormat.class); FileInputFormat.setInputPaths(mainJob, inputPath); FileOutputFormat.setOutputPath(mainJob, outputPath); if ( !mainJob.waitForCompletion(true) ) { return -1; } } // collect the result, highest rank first - you will need to finish this up Job resultJob = Job.getInstance(getConf(),"final result"); resultJob.setJarByClass(HadoopPageRank.class);

/* * place holder: * here is the place you will need to add a final Map/Reduce code */ FileInputFormat.setInputPaths(resultJob, outputPath); FileOutputFormat.setOutputPath(resultJob,new Path(basePath, "result")); if ( !resultJob.waitForCompletion(true) ) { return -1; } return 0; } public static void main(String[] args) throws Exception { int exitCode = ToolRunner.run(new HadoopPageRank(), args); System.exit(exitCode); } }

-----------------------------------------------------------------------------------------------

HadoopPageRankInitMapper.java

import java.io.IOException;

import org.apache.hadoop.io.LongWritable; import org.apache.hadoop.io.Text; import org.apache.hadoop.mapreduce.Mapper;

public class HadoopPageRankInitMapper extends Mapper {,>

@Override public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException { if ( value == null || value.charAt(0) == '#' ) { return; } int tabIndex = value.find("\t"); String nodeA = Text.decode(value.getBytes(), 0, tabIndex); String nodeB = Text.decode(value.getBytes(), tabIndex + 1, value.getLength() - (tabIndex + 1)); context.write(new Text(nodeA), new Text(nodeB)); } }

______________________________________________________________________________

HadoopPageRankInitReducer.java

import java.io.IOException;

import org.apache.hadoop.io.Text; import org.apache.hadoop.mapreduce.Reducer;

public class HadoopPageRankInitReducer extends Reducer {,>

public static final long TOTAL_WEB_PAGES = 4; @Override public void reduce(Text key, Iterable values, Context context) throws IOException, InterruptedException { boolean first = true; String links = (1.0 / TOTAL_WEB_PAGES) + "\t";

for (Text value : values) { if (!first) links += ","; links += value.toString(); first = false; } context.write(key, new Text(links)); } }

_______________________________________

HadoopPageRankMainJobMapper.java

import org.apache.hadoop.io.LongWritable; import org.apache.hadoop.io.Text; import org.apache.hadoop.mapreduce.Mapper;

import java.io.IOException;

public class HadoopPageRankMainJobMapper extends Mapper {,>

@Override public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException { if ( value == null || value.getLength() == 0 ) { return; } int tabIdx1 = value.find("\t"); int tabIdx2 = value.find("\t", tabIdx1 + 1); // extract tokens from the current line String page = Text.decode(value.getBytes(), 0, tabIdx1); String pageRank = Text.decode(value.getBytes(), tabIdx1 + 1, tabIdx2 - (tabIdx1 + 1)); String outlinks = Text.decode(value.getBytes(), tabIdx2 + 1, value.getLength() - (tabIdx2 + 1)); // calculate contribution to each target page String[] allNextPages = outlinks.split(","); if ( allNextPages == null || allNextPages.length == 0 ) { return; } double currentPR = Double.parseDouble(pageRank.toString()); int totalNumOfNextPages = allNextPages.length; for (String nextPage : allNextPages) { Text rankContribution = new Text(currentPR/totalNumOfNextPages + ""); context.write(new Text(nextPage), rankContribution); } // put the original links so the reducer is able to produce the correct output context.write(new Text(page), new Text("|" + outlinks)); } }

______________________________________________________________

HadoopPageRankMainJobReducer.java

import java.io.IOException;

import org.apache.hadoop.io.Text; import org.apache.hadoop.mapreduce.Reducer;

public class HadoopPageRankMainJobReducer extends Reducer { @Override public void reduce(Text key, Iterable values, Context context) throws IOException, InterruptedException { if ( values == null ) { return; } String links = ""; double receivedContribution = 0.0; for (Text value : values) { String content = value.toString(); if (content.startsWith("|")) { links += content.substring("|".length()); } else { receivedContribution += Double.parseDouble(content); },>

} double newPageRank = receivedContribution; context.write(key, new Text(newPageRank + "\t" + links)); }

}

Step by Step Solution

There are 3 Steps involved in it

1 Expert Approved Answer
Step: 1 Unlock blur-text-image
Question Has Been Solved by an Expert!

Get step-by-step solutions from verified subject matter experts

Step: 2 Unlock
Step: 3 Unlock

Students Have Also Explored These Related Programming Questions!