Social Networking Timeline with Heterogeneous Backends

Overall Architecture

[Figure: overall architecture]

The three data sources are MySQL, HBase, and MongoDB. We need to load the given data into these databases and optimize them to meet the required performance.

Source code: https://github.com/PhoenixPan/CloudComputing/tree/master/Project3.4

Task1: Implementing Basic Login with MySQL on RDS

Dataset:

  1. users.csv [UserID, Password]
  2. userinfo.csv [UserID, Name, Profile Image URL]

Request format: GET /task1?id=[UserID]&pwd=[Password]
Response format: {"name":"my_name", "profile":"profile_image_url"}

If the ID and password combination is found in the database, return and display the user's name and profile image (as in the image below); otherwise, return and display an error message.

[Figure: Task 1 result]
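The credential check can be sketched as follows. This is a minimal in-memory stand-in: it assumes users.csv and userinfo.csv have been loaded into maps keyed by UserID. Against MySQL on RDS, the same check would be one parameterized query; the table and column names in the comment are hypothetical. The error payload is also an assumption, because the required error format is not given here.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the Task 1 lookup. The maps stand in for the two MySQL tables;
// against RDS the same check would be one parameterized query, e.g. (hypothetical names):
//   SELECT i.name, i.profile FROM users u JOIN userinfo i ON u.id = i.id
//   WHERE u.id = ? AND u.password = ?
class LoginSketch {
    private final Map<String, String> passwords = new HashMap<>();   // users.csv: UserID -> Password
    private final Map<String, String[]> profiles = new HashMap<>();  // userinfo.csv: UserID -> {Name, Profile Image URL}

    void addUser(String id, String pwd) {
        passwords.put(id, pwd);
    }

    void addInfo(String id, String name, String profileUrl) {
        profiles.put(id, new String[] {name, profileUrl});
    }

    // Returns {"name":...,"profile":...} on success, or a hypothetical error object otherwise.
    String login(String id, String pwd) {
        String stored = passwords.get(id);
        String[] info = profiles.get(id);
        if (stored == null || !stored.equals(pwd) || info == null) {
            return "{\"error\":\"invalid id or password\"}";
        }
        return "{\"name\":\"" + escape(info[0]) + "\",\"profile\":\"" + escape(info[1]) + "\"}";
    }

    // Minimal JSON string escaping: backslashes and double quotes only.
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }
}
```

Using a parameterized query in the real service, instead of concatenating the ID and password into SQL, also protects the login endpoint against SQL injection.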

Task2: Storing Social Graph using HBase

Dataset:
links.csv [Followee, Follower]

Request format: GET /task2?id=[UserID]
Response format:

{
  "followers":[
    {
      "name":"follower_name_1",
      "profile":"profile_image_url_1"
    },
    {
      "name":"follower_name_2",
      "profile":"profile_image_url_2"
    }
  ]
}

In HBase, each row is keyed by a user ID, and its columns hold the IDs of the users who follow that user: follower1, follower2, follower3, ....

  1. Get the given user's row, which lists all of their followers
  2. Extract the followers' IDs
  3. Look up each follower's name and profile image in the MySQL database, just as in Task1
  4. Sort the followers by name in ascending alphabetical order and return them
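Here is a sketch of the follower lookup. It assumes the HBase row for the requested user has already been fetched and its columns turned into a list of follower IDs. The profile map stands in for the MySQL query from Task1. All class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

class FollowersSketch {
    // followerIds: IDs read from the requested user's HBase row (assumed already fetched).
    // profiles: stand-in for the MySQL userinfo lookup, UserID -> {Name, Profile Image URL}.
    static String followersJson(List<String> followerIds, Map<String, String[]> profiles) {
        List<String[]> rows = new ArrayList<>();
        for (String id : followerIds) {
            String[] info = profiles.get(id);
            if (info != null) {            // skip IDs that have no profile row
                rows.add(info);
            }
        }
        // Sort by name using String's natural (case-sensitive) ascending order.
        rows.sort(Comparator.comparing((String[] r) -> r[0]));

        // Build {"followers":[...]}; names and URLs are assumed to need no JSON escaping.
        StringBuilder sb = new StringBuilder("{\"followers\":[");
        for (int i = 0; i < rows.size(); i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append("{\"name\":\"").append(rows.get(i)[0])
              .append("\",\"profile\":\"").append(rows.get(i)[1]).append("\"}");
        }
        return sb.append("]}").toString();
    }
}
```

The names come from MySQL, so sorting has to happen after the lookup. In a real deployment, batching the lookups into a single `IN (...)` query would avoid one database round trip per follower.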

Eventually, we can display something like:
[Figure: Task 2 result]

Task3: Build Homepage using MongoDB

Dataset: posts.csv [posts in JSON format]
Request format: GET /task3?id=[UserID]
Response format: {"posts":[{post1_json}, {post2_json}, ...]}

Find the posts whose “uid” matches the requested user ID, sort them in ascending timestamp order, and return them as the response.

  1. Remove the _id field from each document retrieved from MongoDB.
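The filter, sort, and `_id` removal can be sketched on posts that have already been parsed into maps. The sketch assumes each post has an integer `uid` and a `timestamp` string whose fixed-width format sorts correctly as text; both are assumptions about the dataset. With the MongoDB Java driver, the same query would typically be written with the builders `Filters.eq("uid", id)`, `Projections.excludeId()`, and `Sorts.ascending("timestamp")`, so the database does the work instead of the application.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class PostsSketch {
    // Keeps posts whose "uid" equals the requested ID, drops the "_id" field,
    // and sorts by "timestamp" ascending (string comparison; assumes a fixed-width format).
    static List<Map<String, Object>> postsFor(List<Map<String, Object>> allPosts, int uid) {
        List<Map<String, Object>> result = new ArrayList<>();
        for (Map<String, Object> post : allPosts) {
            if (Integer.valueOf(uid).equals(post.get("uid"))) {
                Map<String, Object> copy = new LinkedHashMap<>(post);  // leave the input untouched
                copy.remove("_id");                                    // MongoDB's internal key is not part of the response
                result.add(copy);
            }
        }
        result.sort(Comparator.comparing((Map<String, Object> p) -> (String) p.get("timestamp")));
        return result;
    }
}
```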

Example results:
[Figure: Task 3 example result]

Task4: Put Everything Together

Request format: http://backend-public-dns:8080/MiniSite/task4?id=99
Response format:
A single JSON object containing the user's name, the user's profile image URL, an array of followers, and an array of posts.

jsonResponse.put("name", name);
jsonResponse.put("profile", profile);
jsonResponse.put("followers", followersArray);
jsonResponse.put("posts", postsArray);

As the title suggests, we put the previous three tasks together. We need to:

  1. Get the user's profile
  2. Get all of the user's followers
  3. Get the 30 most recent posts of each user, sorted by timestamp and then by post ID
  4. Return the 30 most recent posts among all the posts selected in step 3
  • This is a complicated system; I describe the design of such a News Feed system in a separate article.
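Steps 3 and 4 can be sketched as a two-stage top-k. Taking only 30 posts per user is safe because the overall 30 most recent posts can include at most 30 from any single user. The sketch assumes each user's list is already sorted newest first. It orders by timestamp and then by post ID, both descending; the tie-break direction is an assumption. `Post` is a hypothetical class, not the project's actual type.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class TimelineSketch {
    // Hypothetical post type: timestamp as a sortable string, pid as a numeric post ID.
    static final class Post {
        final String timestamp;
        final long pid;

        Post(String timestamp, long pid) {
            this.timestamp = timestamp;
            this.pid = pid;
        }
    }

    // Newest first: timestamp descending, then post ID descending (assumed tie-break).
    static final Comparator<Post> NEWEST_FIRST =
        Comparator.comparing((Post p) -> p.timestamp)
                  .thenComparingLong(p -> p.pid)
                  .reversed();

    // perUser: one list per user, each already sorted NEWEST_FIRST.
    static List<Post> latest(List<List<Post>> perUser, int k) {
        List<Post> candidates = new ArrayList<>();
        for (List<Post> posts : perUser) {
            candidates.addAll(posts.subList(0, Math.min(k, posts.size())));  // step 3: top k per user
        }
        candidates.sort(NEWEST_FIRST);                                     // step 4: overall top k
        return new ArrayList<>(candidates.subList(0, Math.min(k, candidates.size())));
    }
}
```

Sorting all candidates is the simplest choice. A bounded priority queue or a k-way merge of the per-user lists would avoid sorting every candidate when a user follows many people.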

Final Result:
[Figure: Task 4 result]

Bonus Task: Basic Recommendation

Request format: http://backend-public-dns:8080/MiniSite/task5?id=<user_id>
Response format: the 10 users that appear most frequently

We were asked to implement a very simple yet effective recommendation model: collaborative filtering. Simply put, in the directed follow graph, we need to find every user R such that min_distance(me, R) = 2, that is, users other than myself who are followed by someone I follow but whom I do not follow directly. For example:

  1. Given:
    A follows {B, C, D}
    B follows {C, E, A}
    C follows {F, G}
    D follows {G, H}
  2. To recommend users to A, we combine the followees of B, C, and D and count each one:
    {A: 1, C: 1, E: 1, F: 1, G: 2, H: 1}
  3. Then we remove A itself and A's direct followees {B, C, D}
  4. Eventually we have {G: 2, E: 1, F: 1, H: 1}

In our project, we:

  1. Find all followees (E) of the user
  2. Find all followees of those followees (EE)
  3. Count the occurrences of each EE in a HashMap
  4. Remove the direct followees (E) and the user themselves
  5. Return the 10 users that appear most frequently
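The counting steps can be sketched as below. The sketch assumes the follow graph is available as an in-memory map from a user ID to the set of IDs that user follows; in the project, this data comes from the database. Besides the direct followees, it also drops the requesting user, as the worked example does with A. Ties in frequency are broken by ascending user ID; this is an assumption, since the ordering of ties is not specified here.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

class RecommendSketch {
    // follows: user ID -> IDs that user follows (hypothetical in-memory form of the graph).
    static List<String> recommend(Map<String, Set<String>> follows, String me, int limit) {
        Set<String> direct = follows.getOrDefault(me, Collections.emptySet());  // E
        Map<String, Integer> counts = new HashMap<>();
        for (String e : direct) {                                               // EE: followees of each E
            for (String ee : follows.getOrDefault(e, Collections.emptySet())) {
                counts.merge(ee, 1, Integer::sum);                               // count occurrences
            }
        }
        counts.keySet().removeAll(direct);  // drop users already followed directly
        counts.remove(me);                  // a user is never recommended to themselves

        List<String> ranked = new ArrayList<>(counts.keySet());
        // Most frequent first; ties by ascending user ID (assumed).
        ranked.sort((a, b) -> {
            int byCount = Integer.compare(counts.get(b), counts.get(a));
            return byCount != 0 ? byCount : a.compareTo(b);
        });
        return new ArrayList<>(ranked.subList(0, Math.min(limit, ranked.size())));
    }
}
```

For the A, B, C, D graph in the example above, `recommend(graph, "A", 10)` returns `[G, E, F, H]`: G first with a count of 2, then the count-1 users in ID order.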

Final Result:
[Figure: Bonus task result]