This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Time Complexity: O(M*N) where M, N is row and col of matrix
Space Complexity: O(M*N) since we used list to store the result, where M, N is row and col of matrix
System design interview: how to design a feeds system (e.g., Twitter, Instagram and Facebook news feed)
Methodology: READ MF!
Please use this "READ MF!" framework for software engineer system interview purpose.
Key designs and terms
So far the best detailed explanation on designing twitter is from Raffi (used to be VP of Twitter) on QCon. Presentation, Slides [All credits go to QCon). Raffi is very smart and articulate, really solid guy!
Read heavy system, not write heavy, optimized for user timeline. To be clear, there are two timelines, one is user's own tweets (easy to do), the other is the main timeline which is an aggregation of all the tweets from the people that user follows.
Pre-calculate all the timelines. This is the interesting part of the design vs using a mysql and having index to query in realtime, which would not be scalable. When a tweet is posted, the tweets service would
Store this tweet in memory, and that later would be flushed to a main DB
Call the fanout deliver service to publish this tweet to all the users' timeline that followed this particular user. It could simply store a tweet ID (and later the content could be retrieved from the Tweets Cache) or hydrating the entire text content is also fine (note how we want to handle eidt or delete, twitter probably doesn't allow delete)
Call search service to index (Lucene). The search index is also hosted in memory on Redis. Note search here needs to fanout to all search clusters but due to the in memory hosting, it's acceptable.
Always remember disk access is at least 100k times slower than memory access, e.g, disk is 10ms vs 100ns on memory. https://gist.github.com/jboner/2841832
With the pre-caculate timeline design, there might be race conditions when celebrity (people with millions of followers) starts to talking to each other with replies. E.g., celebrity A tweets something, takes 30 sec to deliver to all the followers, celebrity B replies before deliver finishes, some followers follow both A and B might see B's reply first before A's original post. One cheat workaround is to sort by timestamp or tweet IDs, but they are also experimenting with only pre-calculate non-celebrity tweets, and when generating timeline, realtime fetching the celebrity tweets. It depends which way is better and also the user experience. This is a good stop point to talk to your interviewers in real world about tradeoffs.
Since twitter is heavily relying on cache, you might want to checkout how they optimized the caching with twemproxy.
It's a straight forward implementation question, we can simply just simulate the spiral order and print. You can choose to do it either iteratively (see reference to download the official Leetcode solution) or use recursion.
There is this awesome one line solution from this guy which is pretty insane.
defspiralOrder(self, matrix):
return matrix and list(matrix.pop(0)) + self.spiralOrder(zip(*matrix)[::-1])
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Time Complexity: O(M*N) where M, N is row and col of matrix
Space Complexity: O(M*N) since we used list to store the result, where M, N is row and col of matrix
You are given coins of different denominations and a total amount of money amount.
Write a function to compute the fewest number of coins that you need to
make up that amount. If that amount of money cannot be made up by any
combination of the coins, return -1.
Example 1:
This is a classic problem and greedy might be the first thing comes into mind. For example, sort the coins on denomination and always use the largest amount coin. However, the optimal solution might not include the largest denomination.
for example , [1, 3, 4] and target value 6. The optimal is two 3s, not [4, 1, 1].
It seems we have to list out all the different combinations and find the min value, that leads to a brute fore solution as described here using recursion. It would be an exponential time complexity.
We can use a hash map as a lookup to remember for each amount, what would be the min cost. That would significantly reduce the time complexity from exponential to O(S*N) where S is the amount and N is the coins array size. It's not obvious why but once you see the DP solution, you can understand better, it's the same time complexity.
Another way is to use dynamic programming because the problem is asking for a single yes/no, extreme value(min, max), building up a lookup table using formula is the coding pattern. We have similar problems, Wild Card Matching, Regular Expression Matching
The idea is to build a lookup table for each amount, what's the minimal number of coins needed given current denomination.
lookup[i] = min(lookup[i], lookup[i - coin value] + 1) given i - coin value >= 0
Solutions
Dynamic Programming
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
从growth at all cost 到profitability mindset,都是被Uber Wework 弄得。软银老板Masa的头发估计又得脱落几根,不过airbnb也许是2020年被大家追捧的一个新星,从cash at hand / total funds raised 的比例来看,就是下一个google,Facebook和zoom的节奏啊!
Problem Statement
Given an array of citations (each citation is a non-negative integer) of a researcher, write a function to compute the researcher's h-index.
According to the definition of h-index on Wikipedia: "A scientist has index h if h of his/her N papers have at leasth citations each, and the other N − h papers have no more thanh citations each."
Example:
Input:citations = [3,0,6,1,5]Output: 3
Explanation: [3,0,6,1,5] means the researcher has 5 papers in total and each of them had
received 3, 0, 6, 1, 5 citations respectively.
Since the researcher has 3 papers with at least3 citations each and the remaining
two with no more than3 citations each, her h-index is 3.
Note: If there are several possible values for h, the maximum one is taken as the h-index.
Hint 1 An easy approach is to sort the array first. Hint 2 What are the possible values of h-index? Hint 3 A faster approach is to use extra space.
It’s not the most straightforward question to understand I have to admit that. I normally start with small examples to help me understand
The requirement is: his/her N papers have at least h citations each, and the other N − h papers have no more than h citations each
What happens if citation contains only 1 element?
Citations = [100] , N = 1, h = 1, N-h = 0
This gives us hint that h has an upper bound of N
What happens if citation contains only 2 elements? And we can assign some large citation values
Citations = [1000, 2000] , N = 2, h = 2. It seems h-index has not too much to do with the citation values itself, rather it’s about the index and the values.
We should now have a super brute force way, since h is bounded to be [0, N], we just go through each value and try to see if it could meet the h criteria (e.g., we start from N, then we see if there is at least N papers that have at least N citations) This would result in a O(N^2) time complexity.
Naturally, we normally can think of if we can sort the array to reduce the time complexity. After sort, using the below graph, the problem becomes find the citation more than the citation array index, and that citation index could be the h-index (that's why it's called h-index, not h-value, it's about index). This would give a O(NlgN) time complexity. Code is attached in the References section.
Finally, we can think about bucket sort since h-index is bounded from [0, N], and if paper citations are larger than N, it's just considered on counting on index N.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Each node in the graph contains a val (int) and a list (List[Node]) of its neighbors.
class Node {
public int val;
public List<Node> neighbors;
}
Test case format:
For simplicity sake, each node's value is the same as the node's index (1-indexed). For example, the first node with val = 1, the second node withval = 2, and so on. The graph is represented in the test case using an adjacency list.
Adjacency list is a collection of unordered lists used to represent a finite graph. Each list describes the set of neighbors of a node in the graph.
The given node will always be the first node with val = 1. You must return thecopy of the given nodeas a reference to the cloned graph.
Example 1:
Input: adjList = [[2,4],[1,3],[2,4],[1,3]]
Output: [[2,4],[1,3],[2,4],[1,3]]
Explanation: There are 4 nodes in the graph.
1st node (val = 1)'s neighbors are 2nd node (val = 2) and 4th node (val = 4).
2nd node (val = 2)'s neighbors are 1st node (val = 1) and 3rd node (val = 3).
3rd node (val = 3)'s neighbors are 2nd node (val = 2) and 4th node (val = 4).
4th node (val = 4)'s neighbors are 1st node (val = 1) and 3rd node (val = 3).
Example 2:
Input: adjList = [[]]
Output: [[]]
Explanation: Note that the input contains one empty list. The graph consists of only one node with val = 1 and it does not have any neighbors.
Example 3:
Input: adjList = []
Output: []
Explanation: This an empty graph, it does not have any nodes.
Example 4:
Input: adjList = [[2],[1]]
Output: [[2],[1]]
Constraints:
1 <= Node.val <= 100
Node.valis unique for each node.
Number of Nodes will not exceed 100.
There is no repeated edges and no self-loops in the graph.
The Graph is connected and all nodes can be visited starting from the given node.
It's a basic graph problem that can be solved in 3 different ways: BFS using queue, DFS using recursion, DFS using stack(very similar to BFT using queue). The trick is using a map to keep a one-to-one mapping between the old nodes and the copied nodes, meanwhile, we can also use the map to avoid a cycle when performing BFS or DFS.
Solutions
BFS using queue
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Time Complexity: O(N)
Space Complexity: O(N) the extra queue and map
DFS using recursion
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Time Complexity: O(N)
Space Complexity: O(N) the extra map
DFS using stack
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Time Complexity: O(N)
Space Complexity: O(N) the extra queue and stack
The main test method
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
There are N children standing in a line. Each child is assigned a rating value.
You are giving candies to these children subjected to the following requirements:
Each child must have at least one candy.
Children with a higher rating get more candies than their neighbors.
What is the minimum candies you must give?
Example 1:
Input: [1,0,2]
Output: 5
Explanation: You can allocate to the first, second and third child with 2, 1, 2 candies respectively.
Example 2:
Input: [1,2,2]
Output: 4
Explanation: You can allocate to the first, second and third child with 1, 2, 1 candies respectively.
The third child gets 1 candy because it satisfies the above two conditions.
The straight forward idea is go through the ranking array, for each ranking, we try to make sure it has more points than its left and right neighbor. Note that it’s not going to work if you just do that because while you are increasing the values, it’s a chain effect that will affect the previous values as well, therefore we have to go all the way to the beginning.
Use below ranking as an example, assume everyone has 1 point already
Ranking: [5, 3, 1]
Points [1, 0, 0]
When at 5, [1, 0, 0]
When at 3, [2, 1, 0]
When at 1, [2, 2, 1]
you might think 5 points is enough, but it’s wrong. The culprit is when you at 1, and increase child at 3, you have the potential to break the assumption that child with 3 ranking always has less than its left neighbor if that neighbor has more points. Therefore, we have to go all the way to the beginning and comparison, thus resulting O(N^2) time complexity
That said, we have to go all the way to the beginning when we increase any value, so in the end it looks like [3, 2, 1]
Hint: if you cannot solve it in one pass, can we solve it in two passes?
A more clever thought is trying to distribute the candies in two passes. Previously we are trying to ensure at certain point both the left and right constrains should met. With two passes, namely first pass (from left to right) ensure all the left neighbors constrains are satisfied, and second pass ensures all the right neighbors contains are satisfied.
Left to right pass: Ensure I have more points than my left neighbor if my ranking is higher. Else I just have 1 candy (this is minimum)
Right to left pass: Ensure I have more points than my right neighbor if my ranking is higher.
Solutions
Naive solution
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Time Complexity: O(N^2)
Space Complexity: O(N) the extra candies array
Two pass linear solution
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Normally we just implement as we do at work, which is keep increasing the number and do an AND. It will still pass the OJ, just need to pay attention to overflow situation (using a long would solve the problem)
Now we are pushing ourselves, can we solve it more than linear. Only log/binary is faster than linear. The idea is to find the common left bits of m and n, and later shift n (total number of digits - common length digits) because the right part would end up to be 0.
For example, from 4 to 7, thte common left part is 1, the range and value would be 100 (which is n left shift twice)
1 00
1 01
1 10
1 11
Solutions
Linear solution
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Time Complexity: O(N) essentially n - m
Space Complexity: O(1) no extra space is needed
Logarithmic solution
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Given a binary tree, return the zigzag level order traversal of its nodes' values. (ie, from left to right, then right to left for the next level and alternate between).
For example:
Given binary tree [3,9,20,null,null,15,7],
It’s purely an implementation problem to be honest. Very similar to to binary tree level order traversal, we just need to use some data structure to hold the values while we control the traversal order. I choose to use a deque (i.e., doubled linked list) to keep the order. You can also use a queue like many other online solutions but note when calling add(0, value) to an array list is not very efficient since to have array copy every time.
Solutions
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This is a great problem and I highly recommend we solve it in both DFS and BFS for level order traversal because it covers everything we need to know about trees in interviews.
This would be a better test case because right subtree would not be deep enough to cover the left subtree
1 <---
/ \
2 3 <---
\
5 <---
[1, 3, 5]
Level order traversal using BFS(queue)
It should be relatively natural for us think about using level order traversal and we can simply use a queue and always find the right most element when we pop out element from each level.
Level order traversal using DFS(map)
A pre-order traversal way, just like order level traversal using DFS with a map(key is depth, value is node). This method utilizes a depth to value map to record the right most value on each order while performing a pre-order traversal, namely record the node value, go left then go right. This way, right value can always overwrite the left value thus keeping a “right side view”
I personally don't like the leetcode official solution, where in DFS you don’t need two stacks and in
BFS you don’t need the maps.
Follow up, how about implement a left side view? Trivial right?
Solutions
Level order traversal using BFS(queue)
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Time Complexity: O(N) since we visit each node once
Space Complexity: O(N), more precisely the number of element on the last level, aka queue size when it’s a complete tree
Level order traversal using DFS(map)
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Time Complexity: O(N) since we visit each node once
Space Complexity: O(lgN) because we are using a map and map size should be tree height, which worst case could be O(N)
A dieter consumes calories[i] calories on the i-th day.
Given an integer k, for every consecutive sequence of k days (calories[i], calories[i+1], ..., calories[i+k-1] for all 0 <= i <= n-k), they look at T, the total calories consumed during that sequence of k days (calories[i] + calories[i+1] + ... + calories[i+k-1]):
If T < lower, they performed poorly on their diet and lose 1 point;
If T > upper, they performed well on their diet and gain 1 point;
Otherwise, they performed normally and there is no change in points.
Initially, the dieter has zero points. Return the total number of points the dieter has after dieting for calories.length days.
Note that the total points can be negative.
Example 1:
Input: calories = [1,2,3,4,5], k = 1, lower = 3, upper = 3
Output: 0
Explanation: Since k = 1, we consider each element of the array separately and compare it to lower and upper.
calories[0] and calories[1] are less than lower so 2 points are lost.
calories[3] and calories[4] are greater than upper so 2 points are gained.
Example 2:
Input: calories = [3,2], k = 2, lower = 0, upper = 1
Output: 1
Explanation: Since k = 2, we consider subarrays of length 2.
calories[0] + calories[1] > upper so 1 point is gained.
Example 3:
Input: calories = [6,5,0,0], k = 2, lower = 1, upper = 5
Output: 0
Explanation:
calories[0] + calories[1] > upper so 1 point is gained.
lower <= calories[1] + calories[2] <= upper so no change in points.
calories[2] + calories[3] < lower so 1 point is lost.
This is an easy problem but might be a bit hard to code it up correctly in one pass. We can use a sliding window to keep a rolling sum and compare it with upper and lower bound.
A few caveats in implementation:
We can simplify using two pointers start and end by keeping an index i (start) and i - k (end)
We can only have one loop instead of having two separate loops
sliding window
Solutions
Sliding window
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters