Sunday, November 18, 2018

Leetcode Solution 438: Find All Anagrams in a String


Problem Statement 


Given a string s and a non-empty string p, find all the start indices of p's anagrams in s.
Strings consists of lowercase English letters only and the length of both strings s and p will not be larger than 20,100.
The order of output does not matter.
Example 1:
Input:
s: "cbaebabacd" p: "abc"

Output:
[0, 6]

Explanation:
The substring with start index = 0 is "cba", which is an anagram of "abc".
The substring with start index = 6 is "bac", which is an anagram of "abc".
Example 2:
Input:
s: "abab" p: "ab"

Output:
[0, 1, 2]

Explanation:
The substring with start index = 0 is "ab", which is an anagram of "ab".
The substring with start index = 1 is "ba", which is an anagram of "ab".
The substring with start index = 2 is "ab", which is an anagram of "ab".
  Problem link

Video Tutorial

You can find the detailed video tutorial here


Thought Process

This looks like a string pattern matching problem, we might start go down the KMP or Rabin Karp route (calculate string hash value). However, the unique requirement is they need to be anagrams. We already know several way to compare if two words are anagrams. Sort, bucketize it. Assuming the longer string length is N, the shorter pattern string length is M, if we do it in brute force way, i.e., for each M length substring in N, we check if they are anagrams, it is O(N*M) assuming we have to repopulate all the maps. Obviously we can optimize this by keeping a sliding window only insert and delete one character at a time. That way, it is only O(N*256), still O(N). (note, 256 is the ASCII char length since every time we need to compare if two ASCII char maps are the same or not)


It is also a good problem to have a pattern on coding for two pointers (sliding) window problem.
  • Initialize the two maps
  • Keep a start and end pointer, move one at a time
  • Don't forget to compare the last element

Solutions

Sliding Window using Two Pointers

Time Complexity:O(N), len(s) is N, len(p) is M. We only loop through s once, each time we compare anagrams with a constant 256 loop, thus O(256*N) = O(N)
Space Complexity: O(N)

References

Sunday, November 4, 2018

Leetcode Solution 49: Group Anagrams


Problem Statement 


Given an array of strings, group anagrams together.
Example:
Input: ["eat", "tea", "tan", "ate", "nat", "bat"],
Output:
[
  ["ate","eat","tea"],
  ["nat","tan"],
  ["bat"]
]
Note:
  • All inputs will be in lowercase.
  • The order of your output does not matter.
  Problem link

Video Tutorial

You can find the detailed video tutorial here


Thought Process

Every time we hear about anagrams, we automatically think would relate to how to find two words are anagrams, where we can solve in O(N), easy. That would result in a brute force solution.

Another quick thought would be sort each word and keep a map to track it, however, sorting is also not free.

Now we think harder, how can we avoid the O(MlgM) sorting (M is the word length) as the map's lookup key? Now the question becomes how we can generate a unique value (ideally string) for the same set of characters regardless of its order?  This is similar to the MD5 or SHA1 or hash value of certain object. So we can keep an array of chars, count the chars and translate that array into a string. Note simply count the sum of the array won't work, e.g., "ac" and "bb" will result in the same value.

Solutions

Brute force, for every pair of string in the array, check if they are anagrams. 

Time Complexity:O(M*N^2), M is the word length, and N is the string array length
Space Complexity: O(N)

Sort each string as key

As described above,we sort each string and use it as the lookup key so we can group the anagrams.
Time Complexity: O(N*M*LgM), M is the word length, and N is the string array length
Space Complexity: O(N)

"Hash" each string as key

Kind similar to hashing, we covert each string as key by generating a unique value using linear algorithm instead of sorting NlgN algorithm. 

Time Complexity: O(N*M), M is the word length, and N is the string array length
Space Complexity: O(N)

References