News Feed System

News Feed System

Basics

  1. Your news feed consists of the timelines of all your friends or users that you are following
  2. Core algorithm: merge k sorted array/list/any
  3. Daily active users(DAU), monthly active users(MAU), query per second(QPS). Peak value and growth expectation
  4. Number of reads and writes
  5. Examples:
    • Twitter
    • Facebook
    • 朋友圈

Pull model and Push model

Pull model

Get the first N tweets from K(all of your) following users, merge them, and then take the first N tweets.

Problem: Long waiting period. It takes K times blocking (阻塞型) DB read request to load the news feed.

Solution: Cache, for both timeline and news feed:

  1. Cache timeline: cache the most recent n tweets for each user (DB read –> cache read)
  2. Cache news feed: If read within x seconds from the last time, read from cache. If not, only merge the K tweets after a certain timestamp (such as since the last update)

Push mode

Store a news feed list in the database for each user. When a user publishes a new tweet, do a fanout to push the tweet to all the news feed lists of his/her followers. The fanout is performed asynchronously in another task.

Problem: Delayed display for some users. If we have a star user who has 100 million followers, the fanout will have 100 million write operations, which could take hours and is unacceptable for those unlucky followers.

Solution: Rank users by active level; find star users(who have a lot of followers) and use pull model for them.

Comparison

Item pull push
DB read K (num of following) 1
DB write 1 K (num of follower)
Demand for resources expensive (memory) cheap (disk)
Delay low high (to hear from stars)
Active level users post frequently users post unfrequently