- Your news feed consists of the timelines of all your friends or users that you are following
- Core algorithm: merge k sorted array/list/any
- Daily active users(DAU), monthly active users(MAU), query per second(QPS). Peak value and growth expectation
- Number of reads and writes
Get the first N tweets from K(all of your) following users, merge them, and then take the first N tweets.
Problem: Long waiting period. It takes K times blocking (阻塞型) DB read request to load the news feed.
Solution: Cache, for both timeline and news feed:
- Cache timeline: cache the most recent n tweets for each user (DB read –> cache read)
- Cache news feed: If read within x seconds from the last time, read from cache. If not, only merge the K tweets after a certain timestamp (such as since the last update)
Store a news feed list in the database for each user. When a user publishes a new tweet, do a fanout to push the tweet to all the news feed lists of his/her followers. The fanout is performed asynchronously in another task.
Problem: Delayed display for some users. If we have a star user who has 100 million followers, the fanout will have 100 million write operations, which could take hours and is unacceptable for those unlucky followers.
Solution: Rank users by active level; find star users(who have a lot of followers) and use pull model for them.
|DB read||K (num of following)||1|
|DB write||1||K (num of follower)|
|Demand for resources||expensive (memory)||cheap (disk)|
|Delay||low||high (to hear from stars)|
|Active level||users post frequently||users post unfrequently|