15619 Project1.2 Guide

Project1.2 Parallel Programming using EMR

Goal: filter and analyze a large dataset using Hadoop MapReduce on AWS EMR.

MapReduce configuration

  1. Do not use “.” (periods) or, in general, any other non-alphanumeric characters in your bucket name (the bucket to which your mapper and reducer code is uploaded); otherwise the EMR job might fail.
  2. You may want to preserve your cluster by unchecking the “Terminate on failure” option and adding steps manually in the EMR web console.
  3. Config:

jar -cvf mapper.jar Mapper.class
jar -cvf reducer.jar Reducer.class

phoenixpan@Ghost:~/Desktop$ javac -cp test.jar Main.java    # doesn't work
phoenixpan@Ghost:~/Desktop$ java -cp test.jar Main          # works

java -cp Mapper.jar Mapper
java -cp Reducer.jar Reducer

-files s3://ccproject0102/Mapper.jar,s3://ccproject0102/Reducer.jar
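
The commands above look like the mapper and reducer fields of a Hadoop streaming step, where each program reads input lines from stdin and writes tab-separated key/value lines to stdout. A minimal sketch of a Mapper class that fits that contract is shown here. The filter rule (skip blank lines) and the choice of the first whitespace-separated field as the key are assumptions made for illustration, not the project's actual logic.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

public class Mapper {
    // Hypothetical rule: drop blank lines, otherwise emit "<first field>\t1".
    // Returns null when the line should be filtered out.
    static String mapLine(String line) {
        String trimmed = line.trim();
        if (trimmed.isEmpty()) {
            return null;
        }
        String key = trimmed.split("\\s+")[0];
        return key + "\t1";
    }

    public static void main(String[] args) throws IOException {
        // Streaming contract: one input record per stdin line,
        // one "key<TAB>value" record per stdout line.
        BufferedReader in = new BufferedReader(
                new InputStreamReader(System.in, StandardCharsets.UTF_8));
        String line;
        while ((line = in.readLine()) != null) {
            String out = mapLine(line);
            if (out != null) {
                System.out.println(out);
            }
        }
    }
}
```

Because the class only touches stdin and stdout, it can be checked locally by piping a sample file into java -cp Mapper.jar Mapper before submitting the step.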

Extra practice: find common friends

mapr
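
One common way to approach the common-friends exercise is sketched here as an in-memory simulation of the map and reduce phases. It is not the course's reference solution. The class name CommonFriends, the "A,B" pair-key format, and the sample graph are all hypothetical, and the sketch assumes friendships are mutual, so every pair key receives exactly two friend lists.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

public class CommonFriends {
    // Map phase: for person p with friend list F, emit (sorted pair of p and f, F)
    // for every f in F. The "shuffled" map stands in for Hadoop's shuffle step.
    static void map(String person, List<String> friends,
                    Map<String, List<Set<String>>> shuffled) {
        for (String friend : friends) {
            String key = person.compareTo(friend) < 0
                    ? person + "," + friend
                    : friend + "," + person;
            shuffled.computeIfAbsent(key, k -> new ArrayList<>())
                    .add(new TreeSet<>(friends));
        }
    }

    // Reduce phase: intersect the friend lists that arrived for one pair.
    static Set<String> reduce(List<Set<String>> lists) {
        Set<String> common = new TreeSet<>(lists.get(0));
        for (Set<String> other : lists.subList(1, lists.size())) {
            common.retainAll(other);
        }
        return common;
    }

    static Map<String, Set<String>> commonFriends(Map<String, List<String>> graph) {
        Map<String, List<Set<String>>> shuffled = new TreeMap<>();
        graph.forEach((person, friends) -> map(person, friends, shuffled));
        Map<String, Set<String>> result = new TreeMap<>();
        shuffled.forEach((pair, lists) -> result.put(pair, reduce(lists)));
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical sample graph with mutual friendships.
        Map<String, List<String>> graph = new TreeMap<>();
        graph.put("A", Arrays.asList("B", "C", "D"));
        graph.put("B", Arrays.asList("A", "C", "D"));
        graph.put("C", Arrays.asList("A", "B"));
        graph.put("D", Arrays.asList("A", "B"));
        commonFriends(graph).forEach(
                (pair, common) -> System.out.println(pair + "\t" + common));
    }
}
```

Sorting the two names inside the key is what makes both people's records land on the same reducer; without it, "A,B" and "B,A" would be treated as different keys.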