Saturday, February 29, 2020

Leetcode solution 146. LRU Cache

Problem Statement 

Design and implement a data structure for Least Recently Used (LRU) cache. It should support the following operations: get and put.
get(key) - Get the value (will always be positive) of the key if the key exists in the cache, otherwise return -1.
put(key, value) - Set or insert the value if the key is not already present. When the cache reached its capacity, it should invalidate the least recently used item before inserting a new item.
The cache is initialized with a positive capacity.
Follow up:
Could you do both operations in O(1) time complexity?
Example:
LRUCache cache = new LRUCache( 2 /* capacity */ );

cache.put(1, 1);
cache.put(2, 2);
cache.get(1);       // returns 1
cache.put(3, 3);    // evicts key 2
cache.get(2);       // returns -1 (not found)
cache.put(4, 4);    // evicts key 1
cache.get(1);       // returns -1 (not found)
cache.get(3);       // returns 3
cache.get(4);       // returns 4
 Problem link

Video Tutorial

You can find the detailed video tutorial here

Thought Process

The LRU cache has become an absolute standard interview question that we have to know how Java's LinkedHashMap or Python's OrderedDict works under the cover (A doubly linked list and a hashmap). I would actually recommend implementing a doubly linked list + hashmap solution in real interviews.

Follow up is how to implement a least frequent cache (LFU)?
  • A doubly linkedlist + hashmap + A min heap (to count), however, both get and put would be O(LgN) since the heap would be modified due to the frequency increased by 1
  • A doubly linkedlist + hashmap + ArrayList<Frequency, doubly linked list), Node structure would also carry a Frequency field.  For each frequency, we keep track of the elements and move them around when accessed. Get would be O(1), Put could potentially be worst case unbounded (we have to loop from frequency 0 to integer.max_value), or could be optimized to O(N)

Solutions

Using LinkedHashMap

Time Complexity: O(1) for both get and set
Space Complexity: O(N) Where N is the number of items inserted

Using Doubly LinkedList & Hashmap

Time Complexity: O(1) for both get and set
Space Complexity: O(N) Where N is the number of items inserted

References

Saturday, February 22, 2020

Leetcode solution 72. Edit Distance

Problem Statement 

Given two words word1 and word2, find the minimum number of operations required to convert word1 to word2.
You have the following 3 operations permitted on a word:
  1. Insert a character
  2. Delete a character
  3. Replace a character
Example 1:
Input: word1 = "horse", word2 = "ros"
Output: 3
Explanation: 
horse -> rorse (replace 'h' with 'r')
rorse -> rose (remove 'r')
rose -> ros (remove 'e')
Example 2:
Input: word1 = "intention", word2 = "execution"
Output: 5
Explanation: 
intention -> inention (remove 't')
inention -> enention (replace 'i' with 'e')
enention -> exention (replace 'n' with 'x')
exention -> exection (replace 'n' with 'c')
exection -> execution (insert 'u')
 Problem link

Video Tutorial

You can find the detailed video tutorial here

Thought Process

Brute force sounds like really overkill because we have to list all the possible ways and it is exponential time complexity. On a high level, the recursion looks like
  • if cannot convert A to B, return -1 (the recursion termination function); if A == B, then return 0
  • try 3 different ways, insert, remove and replace, cut that character and continue the recursion. Compare the minimum of that 3 methods (exclude -1)
Edit distance is a classic Dynamic Programming (DP) problem, just like Coin Change Problem. It follows the DP template perfectly (asking for extreme values)



The mathematical induction function is
DP[i][j] is the minimum operations needed to convert String[0-i] to String[0-j]
Initial values: dp[0][j] = j and dp[i][0] = i
If (A[i] == B[j]) dp[i][j] =  dp[i-1][j-1]  
Else min(dp[i][j-1], dp[i-1][j], dp[i-1][j-1]) +1;

Solutions

DP

Time Complexity: O(M*N) where M is word1 length and N is word2 length
Space Complexity: O(M*N) since we need an extra 2D array

References

Saturday, February 8, 2020

Leetcode solution 243: Shortest Word Distance


Problem Statement 

Given a list of words and two words word1 and word2, return the shortest distance between these two words in the list.
Example:
Assume that words = ["practice", "makes", "perfect", "coding", "makes"].
Input: word1 = “coding”, word2 = “practice”
Output: 3
Input: word1 = "makes", word2 = "coding"
Output: 1
Note:
You may assume that word1 does not equal to word2, and word1 and word2 are both in the list.
 Problem link

Video Tutorial

You can find the detailed video tutorial here

Thought Process

It is a bit confusing at first to be confused with the "Edit Distance" problem. However, there are duplicate words in the array, it's just a matter of finding the minimum distance between two words, where words could be found in multiple places in the array.

Brute force way (for each word, find the closest distance with the other word) would give you O(N^2) time complexity where N is the array size. An O(N) optimization would be having two indices for each word and keep updating the minimum distance. It is greedy because the closer the two elements are, the smaller the distance would be.

You can refer to Leetcode official solution for a detailed explanation.

Solutions

Implementation

Time Complexity: O(N) where N is the array size
Space Complexity: O(1) Constant space

References

Saturday, February 1, 2020

System design interview: how to design a chat system (e.g., Facebook Messenger, WeChat or WhatsApp)

System design interview: how to design a chat system (e.g., Messenger, WeChat or WhatsApp)

Methodology: READ MF!

update Dec 2022: Extend READ MF to BREAD MF (Bread My Friend) :). B stands for Business here, where even in design interviews, before the requirement, more senior folks tend to ask and want to understand the business use case, why we want to build this. No need to deep dive into it, but get the point checked. 

For system design interview questions, normally we should follow the "READ MF!" steps. You can easily remember this as "Read! Mother Fucker!", similar to RTFM, with an attitude to your interviewer (Because if I can design this complicated system in 60 minutes, why would I waste my time interviewing here? JK)

READ MF! (Requirement, Estimation, Architecture, Details, Miscellaneous, Future)

  1. Requirement clarification, keep it simple first and make some assumptions to make your life easier, later we can drill down or improve.
  2. Estimation on scalability (QPS, storage size, network bandwidth per sec) the scalability determines the final architecture since we should not over engineering things, keep it simple first 
  3. Design system Architecture & Layers, major services and responsibilities, front-end/edge layer, service layer & storage layer 
  4. Drill down Details into each component, e.g, data models on storage layers, API interface between micro services 
  5. Miscellaneous. Talk about design trade-offs, bottlenecks, peak traffic handling, failover plan, monitoring & alerting, security concerns. Talk things that you are most comfortable with
  6. Future work on if new features are added, how system would be extended to support them

Key designs and terms 



  • WeSocket(On OSI 7, application layer) is more suitable for real time chatting over HTTP due to the bidirectional nature (HTTP long polling is also not efficient)
  • Maintaining connection efficiently. We should reuse those socket connections since it's not efficient to recreate them (still under IP&TCP), also utilize each machine in Chat Service to host more connections. In real world, WhatsApp tech stack Erlang + FreeBSD can host 1 Million connections per commodity host. 
  • Data storage solutions. For large volumn writes and range query (for latest messages), we might not want to go with relational database (write is not super efficient for large scale) and we want to read from memory as well. We could cache over a relational DB but we can also choose HBase (LSM Tree), Cassandra(SSTables) or even Redis
  • Sent, Delivered & Seen State are easy given the above graph. Sent is when Chat Service returns success. Delivered when message is written into the main coveration storage, Seen is when user pulls the latest chat history or connection is open chat service successfully pushed the message. Tying is simply a listener on the client side.

Baozi Youtube Video

Existing Resources (Credits to original authors)

  •  Rick Reed on F8 about WhatsApp Design [Video]
    • Culture is important! 57 engineers in total to reach 1B WhatsApp users, including clients and server sides 
    • Design principle: Just enough engineering. Only provide the most essential features. E.g., simple layers between services, go as native as possible for performance gains, simple data replication by keeping a hot copy
    • Small number of servers, each server support 1M connections. Not just about cost saving, also about easier to maintain (since you have less servers), How to scale to Millions of Simultaneous Connections: Video, Slide  
    • Transient messages when users are offline: WhatsApp won’t delete those messages unless all recipients get it
  • High Scalability: The WhatsApp Architecture Facebook Bought For $19 Billion
  • Tushar Roy on Youtube