Saturday, February 8, 2020

Leetcode solution 243: Shortest Word Distance


Problem Statement 

Given a list of words and two words word1 and word2, return the shortest distance between these two words in the list.
Example:
Assume that words = ["practice", "makes", "perfect", "coding", "makes"].
Input: word1 = “coding”, word2 = “practice”
Output: 3
Input: word1 = "makes", word2 = "coding"
Output: 1
Note:
You may assume that word1 does not equal to word2, and word1 and word2 are both in the list.
 Problem link

Video Tutorial

You can find the detailed video tutorial here

Thought Process

It is a bit confusing at first to be confused with the "Edit Distance" problem. However, there are duplicate words in the array, it's just a matter of finding the minimum distance between two words, where words could be found in multiple places in the array.

Brute force way (for each word, find the closest distance with the other word) would give you O(N^2) time complexity where N is the array size. An O(N) optimization would be having two indices for each word and keep updating the minimum distance. It is greedy because the closer the two elements are, the smaller the distance would be.

You can refer to Leetcode official solution for a detailed explanation.

Solutions

Implementation

Time Complexity: O(N) where N is the array size
Space Complexity: O(1) Constant space

References

Saturday, February 1, 2020

System design interview: how to design a chat system (e.g., Facebook Messenger, WeChat or WhatsApp)

System design interview: how to design a chat system (e.g., Messenger, WeChat or WhatsApp)

Methodology: READ MF!

For system design interview questions, normally we should follow the "READ MF!" steps. You can easily remember this as "Read! Mother Fucker!", similar to RTFM, with an attitude to your interviewer (Because if I can design this complicated system in 60 minutes, why would I waste my time interviewing here? JK)

READ MF! (Requirement, Estimation, Architecture, Details, Miscellaneous, Future)
  1. Requirement clarification, keep it simple first and make some assumptions to make your life easier, later we can drill down or improve.
  2. Estimation on scalability (QPS, storage size, network bandwidth per sec) the scalability determines the final architecture since we should not over engineering things, keep it simple first 
  3. Design system Architecture & Layers, major services and responsibilities, front-end/edge layer, service layer & storage layer 
  4. Drill down Details into each component, e.g, data models on storage layers, API interface between micro services 
  5. Miscellaneous. Talk about design trade-offs, bottlenecks, peak traffic handling, failover plan, monitoring & alerting, security concerns. Talk things that you are most comfortable with
  6. Future work on if new features are added, how system would be extended to support them

Key designs and terms 



  • WeSocket(On OSI 7, application layer) is more suitable for real time chatting over HTTP due to the bidirectional nature (HTTP long polling is also not efficient)
  • Maintaining connection efficiently. We should reuse those socket connections since it's not efficient to recreate them (still under IP&TCP), also utilize each machine in Chat Service to host more connections. In real world, WhatsApp tech stack Erlang + FreeBSD can host 1 Million connections per commodity host. 
  • Data storage solutions. For large volumn writes and range query (for latest messages), we might not want to go with relational database (write is not super efficient for large scale) and we want to read from memory as well. We could cache over a relational DB but we can also choose HBase (LSM Tree), Cassandra(SSTables) or even Redis
  • Sent, Delivered & Seen State are easy given the above graph. Sent is when Chat Service returns success. Delivered when message is written into the main coveration storage, Seen is when user pulls the latest chat history or connection is open chat service successfully pushed the message. Tying is simply a listener on the client side.

Baozi Youtube Video

Existing Resources (Credits to original authors)

  •  Rick Reed on F8 about WhatsApp Design [Video]
    • Culture is important! 57 engineers in total to reach 1B WhatsApp users, including clients and server sides 
    • Design principle: Just enough engineering. Only provide the most essential features. E.g., simple layers between services, go as native as possible for performance gains, simple data replication by keeping a hot copy
    • Small number of servers, each server support 1M connections. Not just about cost saving, also about easier to maintain (since you have less servers), How to scale to Millions of Simultaneous Connections: Video, Slide  
    • Transient messages when users are offline: WhatsApp won’t delete those messages unless all recipients get it
  • High Scalability: The WhatsApp Architecture Facebook Bought For $19 Billion
  • Tushar Roy on Youtube

Saturday, January 25, 2020

Leetcode solution 1292: Maximum Side Length of a Square with Sum Less than or Equal to Threshold

Problem Statement 

Given a m x n matrix mat and an integer threshold. Return the maximum side-length of a square with a sum less than or equal to threshold or return 0 if there is no such square.

Example 1:

Input: mat = [[1,1,3,2,4,3,2],[1,1,3,2,4,3,2],[1,1,3,2,4,3,2]], threshold = 4
Output: 2
Explanation: The maximum side length of square with sum less than 4 is 2 as shown.
Example 2:
Input: mat = [[2,2,2,2,2],[2,2,2,2,2],[2,2,2,2,2],[2,2,2,2,2],[2,2,2,2,2]], threshold = 1
Output: 0
Example 3:
Input: mat = [[1,1,1,1],[1,0,0,0],[1,0,0,0],[1,0,0,0]], threshold = 6
Output: 3
Example 4:
Input: mat = [[18,70],[61,1],[25,85],[14,40],[11,96],[97,96],[63,45]], threshold = 40184
Output: 2

Constraints:
  • 1 <= m, n <= 300
  • m == mat.length
  • n == mat[i].length
  • 0 <= mat[i][j] <= 10000
  • 0 <= threshold <= 10^5
 Problem link

Video Tutorial

You can find the detailed video tutorial here

Thought Process

There is an absolute brute force way is to calculate the sum of each square in every iteration. There are previous similar problems that we can have a prefix sum matrix, similar to the rolling sum idea on one dimensional array.

A prefix sum matrix is a matrix that for each cell[i, j] represents the sum in original matrix from matrix[0, 0] to matrix [i, j] (inclusive)

The advantage is later we can get any rectangle sum simply performing below

sum from [i, j] to [i + len][j + len] = prefixSum[i + len][j + len] - prefixSum[i - 1][j + len] - prefixSum[i + len][j - 1] + prefixSum[i - 1][j - 1]
Two more optimizations we can do
  • We can apply binary search instead of a linear one to find the max length
  • We can also keep a record of the sum while building up the prefix sum matrix. The reason this works is if it has a square say 3x3 that meets the requirement, then it would definitely have a square 2x2 meet the requirement. So we can start from 1x1, 2x2 till we reach the max. Pretty smart! Kudos to @drunkpiano


Solutions

Prefix sum implementation

Time Complexity: O(M*N*min(M,N)) where M is row N is col, essentially O(N^3)
Space Complexity: O(M*N) since we need

References

Saturday, January 18, 2020

Leetcode solution 229: Major Element II


Problem Statement 

Given an integer array of size n, find all elements that appear more than ⌊ n/3 ⌋ times.
Note: The algorithm should run in linear time and in O(1) space.
Example 1:
Input: [3,2,3]
Output: [3]
Example 2:
Input: [1,1,1,3,3,2,2,2]
Output: [1,2]
 Problem link

Video Tutorial

You can find the detailed video tutorial here

Thought Process

It's a follow up question on Majority Element, given the hint, there should only be at most 2 elements that can satisfy the requirement (You cannot have 3 elements that all appear more than n/3 times). We can perform the voting algorithm on two potential candidates and later do an actual sum to eliminate the false positive (e.g., [1, 2, 1] would have both 1 and 2 as potential candidate, but 2 should not be included)

You can refer to Majority Element for a detailed explanation on the voting algorithm.

An extension could be changing n/3 to n/k. We just need extra arrays to store votes and candidates. It will take O(Nk) time complexity

Solutions

Voting implementation

Time Complexity: O(N) where N is the array size
Space Complexity: O(1) Constant space

References

Saturday, January 11, 2020

Leetcode solution 169: Major Element


Problem Statement 

Given an array of size n, find the majority element. The majority element is the element that appears more than ⌊ n/2 ⌋ times.
You may assume that the array is non-empty and the majority element always exist in the array.
Example 1:
Input: [3,2,3]
Output: 3
Example 2:
Input: [2,2,1,1,1,2,2]
Output: 2
 Problem link

Video Tutorial

You can find the detailed video tutorial here

Thought Process

Simple but great question that can be solved in many ways, the voting one approves to be the most time and space efficient solution.

You can refer to Leetcode official solution for a detailed explanation.

Solutions

Voting implementation

Time Complexity: O(N) where N is the array size
Space Complexity: O(1) Constant space

References

Saturday, January 4, 2020

Leetcode solution 6: ZigZag Conversion


Problem Statement 

The string "PAYPALISHIRING" is written in a zigzag pattern on a given number of rows like this: (you may want to display this pattern in a fixed font for better legibility)
P   A   H   N
A P L S I I G
Y   I   R
And then read line by line: "PAHNAPLSIIGYIR"
Write the code that will take a string and make this conversion given a number of rows:
string convert(string s, int numRows);
Example 1:
Input: s = "PAYPALISHIRING", numRows = 3
Output: "PAHNAPLSIIGYIR"
Example 2:
Input: s = "PAYPALISHIRING", numRows = 4
Output: "PINALSIGYAHRPI"
Explanation:

P     I    N
A   L S  I G
Y A   H R
P     I
 Problem link

Video Tutorial

You can find the detailed video tutorial here

Thought Process

It's a very good implementation problem, by simply simulating each character in the string to each row, it would take extra O(N) space to hold up the array where N is the string size. However, there is definitely a difference between a good and poor implementation.

There is also a math formula we can use, please see attached pdf for solutions.

Solutions

Simulation poor implementation

Time Complexity: O(N) where N is the string size
Space Complexity: O(N) since we used an extra array to store each character in the string

Simulation good implementation

Time Complexity: O(N) where N is the string size
Space Complexity: O(N) since we used an extra array to store each character in the string

References