Saturday, July 4, 2020

System design interview: how to design a simple Twitter search system


Methodology: READ MF!

[Originally from the Post: System design interview: how to design a chat system (e.g., Facebook Messenger, WeChat or WhatsApp)]

Let's remind ourselves of the "READ MF!" methodology.

Requirements

Simple text-based search with "AND" and "OR" support. E.g., give me all the tweets that contain both the words "Black" AND "Life", sorted by published time in descending order.

search(api_dev_key, search_terms, maximum_results_to_return, sort, page_token)
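As a rough illustration of the requirement above, here is a minimal in-memory sketch of AND/OR search sorted by published time. The names `Tweet`, `TweetIndex`, and the `operator` parameter are hypothetical and not part of the API signature above; `api_dev_key`, `sort`, and `page_token` are left out for brevity, and tokenization is assumed to be a plain lowercase whitespace split.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Tweet:
    tweet_id: int
    text: str
    published_at: int  # Unix timestamp in seconds (assumed representation)


class TweetIndex:
    """Toy inverted index: word -> set of ids of tweets containing that word."""

    def __init__(self):
        self.postings = defaultdict(set)
        self.tweets = {}

    def add(self, tweet):
        self.tweets[tweet.tweet_id] = tweet
        for word in tweet.text.lower().split():
            self.postings[word].add(tweet.tweet_id)

    def search(self, search_terms, operator="AND", maximum_results_to_return=10):
        posting_sets = [self.postings.get(term.lower(), set()) for term in search_terms]
        if not posting_sets:
            return []
        if operator == "AND":
            # A tweet must contain every term.
            ids = set.intersection(*posting_sets)
        else:
            # "OR": a tweet must contain at least one term.
            ids = set.union(*posting_sets)
        # Newest first, then cut to the requested size.
        ranked = sorted(
            (self.tweets[i] for i in ids),
            key=lambda tweet: tweet.published_at,
            reverse=True,
        )
        return ranked[:maximum_results_to_return]


index = TweetIndex()
index.add(Tweet(1, "Black Life matters", 100))
index.add(Tweet(2, "black coffee", 200))
index.add(Tweet(3, "life is good", 300))
index.add(Tweet(4, "Life in black and white", 400))

and_ids = [t.tweet_id for t in index.search(["Black", "Life"], "AND")]
or_ids = [t.tweet_id for t in index.search(["Black", "Life"], "OR")]
```

Here `and_ids` holds only the tweets that contain both words, newest first, while `or_ids` holds every tweet that contains either word. A real system would shard the postings across many boxes rather than keep them in one process.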


A follow-up question would be to sort those results in a different order. E.g., give me all the tweets that contain both the words "Black" AND "Life", sorted by number of comments in descending order.

Estimation

  • Storage: assume Twitter has 1B users and 50% of them post one tweet per day. Each tweet is up to 280 characters, roughly 300 bytes. Per day: 1B * 0.5 * 300 bytes = 150 GB of data
  • Write QPS = 1B users * 0.5 tweets per user per day / 86,400 seconds = 5,787 QPS, roughly 6,000 QPS
  • Query QPS = assuming 3x the write QPS = 18,000 QPS
In general we don't worry too much about write QPS, since writes can be processed asynchronously. 18,000 read QPS is easy to handle as long as we have enough boxes and most queries are served from memory (without hitting disk).
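The back-of-the-envelope numbers above can be reproduced with a few lines of arithmetic. This is only a sketch of the estimate: the constant names are made up for illustration, and every input is one of the assumptions stated in the bullets.

```python
# All inputs are the assumptions from the estimation above.
USERS = 1_000_000_000
DAILY_POSTER_RATIO = 0.5      # share of users posting one tweet per day
BYTES_PER_TWEET = 300         # 280 characters, rounded up
SECONDS_PER_DAY = 86_400
READ_TO_WRITE_RATIO = 3       # reads assumed to be 3x writes

tweets_per_day = int(USERS * DAILY_POSTER_RATIO)
storage_gb_per_day = tweets_per_day * BYTES_PER_TWEET / 1e9
write_qps = tweets_per_day / SECONDS_PER_DAY
read_qps = write_qps * READ_TO_WRITE_RATIO

print(f"storage per day: {storage_gb_per_day:.0f} GB")
print(f"write QPS: {write_qps:.0f}")
print(f"read QPS: {read_qps:.0f}")
```

The unrounded read figure comes out a little under 18,000 because the estimate above rounds the write QPS up to 6,000 before multiplying by 3.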

Key designs and terms


As usual, you can also divide the system into three major layers: presentation, service, and data.

A few key terminologies.

For the follow-up question, we can either build another index keyed on the new sort key, or fetch the results on the fly and sort them (a local optimum, since only the fetched candidates get ranked). Either way, follow the "Index -> Search -> Rank" steps.
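The two options for the follow-up question can be sketched as below. This is a simplified illustration, not a production design: `MATCHES` is hypothetical sample data standing in for the output of the search step, and the function names and the `comment_count` field are assumptions made for the example.

```python
import heapq

# Hypothetical matched tweets, as the search step might return them.
MATCHES = [
    {"tweet_id": 1, "published_at": 100, "comment_count": 50},
    {"tweet_id": 3, "published_at": 300, "comment_count": 20},
    {"tweet_id": 7, "published_at": 200, "comment_count": 35},
]


def rank_on_the_fly(matches, sort_key, limit):
    """Rank at query time: keep only the top `limit` candidates by sort_key."""
    return heapq.nlargest(limit, matches, key=lambda tweet: tweet[sort_key])


def build_sorted_index(matches, sort_key):
    """Rank at index time: precompute the full ordering so a query just slices it."""
    return sorted(matches, key=lambda tweet: tweet[sort_key], reverse=True)


most_commented = rank_on_the_fly(MATCHES, "comment_count", 2)
newest = rank_on_the_fly(MATCHES, "published_at", 3)
presorted_by_comments = build_sorted_index(MATCHES, "comment_count")
```

Ranking on the fly needs no extra storage but repeats the work on every query. A precomputed ordering moves that cost to write time, and it has to be kept up to date as comment counts change.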



