Project 1.2: Parallel Programming using EMR

Goal: filter and analyze a large dataset using Hadoop MapReduce on AWS EMR.

MapReduce configuration

Do not use “.” (periods) or, in general, any other non-alphanumeric characters in the name of the bucket that holds your mapper and reducer code; otherwise the EMR job might fail.
To keep your cluster alive after a failed step, uncheck the “Terminate on failure” option and add steps manually in the EMR web console.
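The bucket-naming advice above can be checked before uploading anything. This is a minimal sketch; the class name `BucketNameCheck` and method `isSafeBucketName` are hypothetical. It accepts only lowercase letters and digits (S3 bucket names must be lowercase and 3 to 63 characters long), which is stricter than S3 itself but matches the "alphanumeric only" rule of thumb.

```java
import java.util.regex.Pattern;

// Hypothetical pre-flight check mirroring the naming advice above:
// only lowercase letters and digits, 3 to 63 characters (S3's length limit).
// Periods, hyphens, and any other symbols are rejected on purpose.
public class BucketNameCheck {

    private static final Pattern SAFE = Pattern.compile("[a-z0-9]{3,63}");

    static boolean isSafeBucketName(String name) {
        return name != null && SAFE.matcher(name).matches();
    }

    public static void main(String[] args) {
        // Check every bucket name passed on the command line.
        for (String name : args) {
            System.out.println(name + " -> " + (isSafeBucketName(name) ? "ok" : "rename"));
        }
    }
}
```

For example, `java BucketNameCheck ccproject0102 cc.project` prints `ccproject0102 -> ok` followed by `cc.project -> rename`.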
Config:

jar -cvf Mapper.jar Mapper.class
jar -cvf Reducer.jar Reducer.class
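Assuming the job runs as a Hadoop Streaming step (which the `java -cp Mapper.jar Mapper` commands suggest), `Mapper.class` is an ordinary Java program that reads raw input lines from stdin and writes tab-separated `key\tvalue` records to stdout. The sketch below shows that shape only; the per-line logic (emit each lowercase token with a count of 1) is a hypothetical placeholder, not the project's actual filtering rules.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical streaming mapper: stdin lines in, "key\tvalue" lines out.
// The token-count logic is a placeholder for the real filter.
public class Mapper {

    // Turn one input line into zero or more "key\tvalue" records.
    static List<String> map(String line) {
        List<String> out = new ArrayList<>();
        for (String token : line.trim().split("\\s+")) {
            if (!token.isEmpty()) {              // blank lines yield one empty token
                out.add(token.toLowerCase(Locale.ROOT) + "\t1");
            }
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        BufferedReader in = new BufferedReader(
                new InputStreamReader(System.in, StandardCharsets.UTF_8));
        String line;
        while ((line = in.readLine()) != null) {
            for (String record : map(line)) {
                System.out.println(record);
            }
        }
    }
}
```

Keeping the logic in a static `map` method makes it easy to test locally, e.g. `cat sample.txt | java Mapper`, before packaging it with `jar`.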

To run a class packaged in a jar, use java, not javac (javac compiles .java sources; it does not run classes):
$ javac -cp test.jar Main.java   (does not work)
$ java -cp test.jar Main         (works)

Mapper and reducer commands:
java -cp Mapper.jar Mapper
java -cp Reducer.jar Reducer
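On the reducer side, a streaming job relies on Hadoop sorting the mapper output by key, so all records for one key reach `Reducer` on consecutive lines. The sketch below (a hypothetical example, assuming integer values as produced by a counting mapper) only has to detect when the key changes and flush the running total; summing is a placeholder for whatever aggregation the project actually needs.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;

// Hypothetical streaming reducer: input is "key\tvalue" lines already
// sorted by key; output is one "key\tsum" line per distinct key.
public class Reducer {

    static void reduce(BufferedReader in, PrintWriter out) throws IOException {
        String currentKey = null;
        long sum = 0;
        String line;
        while ((line = in.readLine()) != null) {
            int tab = line.indexOf('\t');
            if (tab < 0) continue;                     // skip records without a tab
            String key = line.substring(0, tab);
            long value;
            try {
                value = Long.parseLong(line.substring(tab + 1).trim());
            } catch (NumberFormatException e) {
                continue;                              // skip non-numeric values
            }
            if (currentKey != null && !currentKey.equals(key)) {
                out.println(currentKey + "\t" + sum);  // key changed: emit previous group
                sum = 0;
            }
            currentKey = key;
            sum += value;
        }
        if (currentKey != null) {
            out.println(currentKey + "\t" + sum);      // emit the last group
        }
        out.flush();
    }

    public static void main(String[] args) throws IOException {
        reduce(new BufferedReader(new InputStreamReader(System.in, StandardCharsets.UTF_8)),
               new PrintWriter(new OutputStreamWriter(System.out, StandardCharsets.UTF_8)));
    }
}
```

Because the reducer holds only one key's total at a time, memory use stays constant regardless of input size; this only works if the input is sorted, so a local test should pipe through `sort` first (e.g. `cat mapped.txt | sort | java Reducer`).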

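Arguments (-files copies both jars into each task's working directory, so the commands above can find them):
-files s3://ccproject0102/Mapper.jar,s3://ccproject0102/Reducer.jar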

mapr