Saturday, March 28, 2020

Leetcode solution 190: Reverse Bits

Problem Statement 

Reverse bits of a given 32 bits unsigned integer.

Example 1:
Input: 00000010100101000001111010011100
Output: 00111001011110000010100101000000
Explanation: The input binary string 00000010100101000001111010011100 represents the unsigned integer 43261596, so return 964176192 which its binary representation is 00111001011110000010100101000000.
Example 2:
Input: 11111111111111111111111111111101
Output: 10111111111111111111111111111111
Explanation: The input binary string 11111111111111111111111111111101 represents the unsigned integer 4294967293, so return 3221225471 which its binary representation is 10111111111111111111111111111111.

  • Note that in some languages such as Java, there is no unsigned integer type. In this case, both input and output will be given as signed integer type and should not affect your implementation, as the internal binary representation of the integer is the same whether it is signed or unsigned.
  • In Java, the compiler represents the signed integers using 2's complement notation. Therefore, in Example 2 above the input represents the signed integer -3 and the output represents the signed integer -1073741825.

Follow up:
If this function is called many times, how would you optimize it?
Problem link

Video Tutorial

You can find the detailed video tutorial here

Thought Process

Use this problem to bring awareness of bit manipulation in interviews, it's rare but sometimes do get asked by some interviewers.

We can directly use Java's Integer.reverse() method, not suitable for interview purpose.

Simple simulation, reading bits from the right most and adding them up while shifting left.

Regarding the follow up, caching is an obvious solution. There are two ways to cache
  • Cache all the integers,  2^32 = 4294967296, total memory needed is 2^32*4B = 16GB roughly, easily handled by one server 
  • Cache each byte, divide the 32 bits into 4 bytes. Memory footprint is only 2^8*1B = 256B, significantly smaller than 16GB
If you are curious on how Java SDK implemented this method, it uses the algorithm in Hackers Delight book section 7-1
 1195       /**
 1196        * Returns the value obtained by reversing the order of the bits in the
 1197        * two's complement binary representation of the specified {@code int}
 1198        * value.
 1199        *
 1200        * @return the value obtained by reversing order of the bits in the
 1201        *     specified {@code int} value.
 1202        * @since 1.5
 1203        */
 1204       public static int reverse(int i) {
 1205           // HD, Figure 7-1
 1206           i = (i & 0x55555555) << 1 | (i >>> 1) & 0x55555555;
 1207           i = (i & 0x33333333) << 2 | (i >>> 2) & 0x33333333;
 1208           i = (i & 0x0f0f0f0f) << 4 | (i >>> 4) & 0x0f0f0f0f;
 1209           i = (i << 24) | ((i & 0xff00) << 8) |
 1210               ((i >>> 8) & 0xff00) | (i >>> 24);
 1211           return i;
 1212       }


Simple simulation with follow up question

Time Complexity: O(N), where N is number of bits in the input integer
Space Complexity: O(1), no extra space needed 


Saturday, March 21, 2020

System design interview: how to design comments and reply, likes button and total views on Youtube

System design interview: how to design comments and reply, likes button and total views on Youtube

Methodology: READ MF!

[Originally from the Post: System design interview: how to design a chat system (e.g., Facebook Messenger, WeChat or WhatsApp)]
Remind ourselves with the "READ MF!" methodology.

This is a follow up on the previous post: System design interview: how to design a video platform (e.g, Youtube, Netflix)


First, let's quickly about requirements. Likes and views are relatively straightforward, users can torllerate a bit delay and inaccuracy. For the majority case, as long as user clicks the button, it pluses one, then it's fine. Sometimes it's not the case if videos have too many likes, e.g., if a Youtube video already has 10K likes, you plus 1, it still shows 10K, it's just the UI tricks you that it's toggled.

For comments reply, there are a few different styles. The major ones such as Youtube, Instagram and TikTok uses following style. It displays comments (directly reply to video) based on order of likes and timestamp (descending) and any reply to comments are only 1 on 1, meaning A@B How are you, then B@A I am fine thank you and you? There is no more indentation needed.

Reddit uses the "block building" reply-to style (中文俗称"盖楼"), where it shows which reply replies to which reply, and it needs to show the indentations about those replies.


For viral videos, say normally it has around 10M views
For likes, assuming 20% people liked a video, 10M * 20% = 2M likes
For comments, 1% people would leave a comment (we are lazy) 10M * 10% = 100K comments

For the majority normal videos, it would probably has 1000 views and 100 likes top and maybe 10 comments, a relationship DB could solve it pretty well.


Key designs and terms 

Comments design

If you start building your product, just bootstrap it with a relational DB
Introduce a comments table shard by video UUID, add a reply_to_uuid to know which comment the reply is targeted to and leave it null for root comment. Build an index on the reply_to_uuid

Select * from comments where reply_to_uuid is null order by comment likes desc, timestamp desc

If you need to see the replies to those comments, just

Select * from comments where reply_to_uuid is the_target_comment_uuid order by comment likes desc, timestamp desc

Even if your product becomes Youtube scale, the comments would be around 100K for viral videos, the above solution would still works fine. Simply add more capacity to better shard your comments using consistent hashing, cache the comments would do the trick.

If you need to build the Reddit tree structure, just sort it in memory. If the problem can fit into memory, it becomes much easier.

The extreme case is your comments section becoming a chat, then we can do something like an append only in memory DB or redis cache keep appending the values to the queue with async backup to DB.

Views and Likes count design

Similarly, when you bootstrap the project, keep a counter in DB or in memory cache solves your problem when traffic is low. If within one machine, you don't even need locks just use compare and swap (CAS), atomic operations for counting, thread safe.

If your product starts to become popular, add more capacity using consistent hashing. Add in memory cache like Redis to count the values (memory access time 100us vs disk access time 10ms. 100Kx improvement). Could be further optimized using distributed counter, aggregating the results together when read.

If you product becomes YouTube scale, then use offline counting. Build a pipeline to promote the videos from cold to hot/viral once the view counts hit a certain threshold (say 1M). Use async messaging like Kafka to ingest from those logs and pump it to data warehouse, query it and update the values on a cron schedule. Of course on the UI side, you need to toggle the like button, plus 1 if needed (Sometimes you would see a 100K likes video, even if clicked the like, the count would not be increased)

 Baozi Youtube Video

References (Credits to original authors)

Saturday, March 14, 2020

包子聊天系列 - Perf Review到底怎么回事儿?如何拿到好的Rating?

包子聊天系列 - Perf Review到底怎么回事儿?如何拿到好的Rating?

大家没事儿的时候当做podcast听吧,对于new grad新兵快速往上升职还是有帮助的。我们不是大佬,说的也不一定都对,但是我们确实是老呀,So show some respect son!😃😃😃😃

Tuesday, March 10, 2020


【作者】 包子小旋风老师  【包子铺里聊IT 微信公众号
如果你没有绿卡,可以买,但比较麻烦,你必须要有打猎证才可以买枪。获取打猎证需要经过hunter education,可以选择线下上课2,3天,或者线上课程和线下半天课,然后通过多选题的考试。大概是100题要对90题以上。
如果你有绿卡,你只需要获取Firearm Safety Certificate,你可以上网搜搜题库,然后打电话去你附近的枪店,说你想要考这个证,有时可以随时答题,有时要预约,去店里做多选题,大概是30题主要对25以上就通过了,题目是基本安全须知。
什么枪适合家防Home Defense?
对于第一把枪,大多数菜鸟会选择一把便宜的,400-500之间。推荐买最popular的大品牌的枪,大品牌的枪可靠性更高。枪的品牌非常多,较熟悉的散弹枪品牌有Remington, Winchester, Benelli, Beretta, Browning, Mossberg, Ruger, Marlin等等,这些牌子的入门级shotgun都可以入手。包子君自用的是史上最Popular的Remington 870。在很多电影里都出现它,新手必备。
手枪的话,常见的品牌除了上面这些散弹枪品牌,还有Glock, Sig, Smirth&Wesson, CZ USA等等。其中最Popular的是Glock 19,口径9mm的。
选定了一款散弹枪后,需要注意三个参数:第一个是口径也叫gauge或者caliber,代表枪管的直径,常见的是12gauge和20guage,12的子弹比20的粗,火药更多后坐力更大,钢珠更多,枪身也更重建议175以上的男生用12gauge,身材小一点的男生和女生,都适合20gauge。如果你让你的女票或老婆去打12gauge,那么从今以后去靶场,她肯定不会跟你一起了...第二个枪管的长度barrel length。默认的散弹枪的长度大多28寸,家防的应用场景,需要枪管更短才灵活一般用18寸左右的。如果你想回头拿去打猎,那你需要换个28-30的枪管来增加有效距离第三个参数是子弹的长度和钢珠的大小,最简单的方法是告诉你是用来家防的,让他推荐子弹,别自己去选,多半会选错。
去你附近的枪店(连锁的Sportsman Warehouse, BassPro, Big 5, Dicks或者gun range),带上你的real ID的驾照或者护照,绿卡或Visa,和上面说的证件就可以去挑选枪支了,选好后,要做背景调查(费用37.5),填很多的表格之后,过10天之后去店里取。和任何购物一样,枪也时常会有打折,你可以在这些店的网站上刷deal和评论。
绝对不是,你需要练习,练习,练习!千万不要觉得自己有了枪,遇到坏人拿出来就能把他吓走。你一定要去靶场射击练习至少10次。试想,坏人闯入你家,你迅速的拿到了枪,情况紧急,你如果不能2秒内架好枪,完成瞄准,随时可以射击来形成威慑,坏人很有可能已经射击或者冲到你的面前夺枪。射击和所有运动一样,多练习就会提高,如果你没有个1000发射击经验,遇到紧急情况,肯定无法发挥作用,还不如没有枪,可以去专心考虑逃跑或者别的方法脱困。湾区附近打飞盘和手枪的地方包子君推荐Sunnyvale Rod & Gun Club,这里的工作人员都是Volunteer,对新手非常友好。


新冠肺炎来袭, 视频onsite指南

【作者】 包子霹雳火老师  【包子铺里聊IT 微信公众号

这两天在北美最火的话题应该就是出门要不要戴口罩,以及今天要不要去公司上班了,小编连续在家work from home了两天就感觉得一个人上班是多么的孤单寂寞冷,无比想念大家一起在公司上班有说有笑,谈天谈地的日子。

随着新冠肺炎目前在北美有愈演愈烈之势,再加上蜜汁自信的美国政府跟毫无能力的基础医疗,很多大公司已经开始鼓励自己的员工work from home同时也有一些公司开始取消onsite面试来减少病毒传播的风险。



  •  视频onsite如何更好地做自我介绍
    • 很多小伙伴可能把自己的简历准备得很是熟练,尤其是介绍past experience的时候很多时候在onsite中都可以通过画图的方式来体现自己的projectimpactcomplexity. 但是当onsite转到视频之后这些准备好的图可能一下就无法展现出来了,这时候小伙伴就应该换一种思维去想面试官想要从你的自我介绍中得到什么?
    • 自我介绍一般都是面试官给面试者一个熟悉环境,放松的时间,所以从面试官本身来讲更多是期待面试者能够自然地总结自己过去的经历。这个时候就需要面试者能够有以下的一些能力
      • High-level 讲述自己组或者自己负责serviceresponsibilitybusiness impact
      • 指出在整个system中或者整个project中自我的contribution. 这里的contribution不仅仅是写的code, 还可以是很多自己driveprocess
      • Detail-oriented to key components, 最后面试者要能够详细的讲出自己contribution里面的technical details。这一步一般在视频onsite的时候会被省略掉,但是如果面试官问起来,面试者还需要能够用最简单明了的语言去描述。
    • 总上所述,面试者在准备自我介绍的过程中,可以试着把自我介绍介绍给非CS的小伙伴,看看别人是否能大概知道你在公共做或者学校project里有什么样的作用.

  • 视频onsite如何更好地做online coding
    • 在视频onsite过程中,对于面试这最熟悉的应该就是coding的部分了但是跟普通onsite不同,在视频过程中你的表情,说话语气,包括神态都会放大般的展现在面试官的面前。所以在此,小编要给大家提醒一下几点
    • 不要自言自语,尤其是说中文。因为在小编过往模拟面试的过程中,有些面试者还时不时得自言自语两句中文,这在视频面试中是一个很大的red flag.
    • 跟面试官沟通,沟通,沟通!!!重要的事情说三次。Onsite面试的时候有些面试者喜欢那道题之后自己思考,但是在视频面试中,面试者一定要学会think loud and keep talking. 要让面试官知道你的想法是什么,你的思路是否正确,或者你stuck在哪里。对于面试官来讲,面试的过程是一个选择自己潜在同事的过程,而不是一个考试的过程,所以面试官注重的一点就是能否跟面试者有有效的沟通。
    • Take feedback. 在一般onsite面试过程中,如果面试者的solution有问题,面试官都会给一些hint或者一些feedback通过各式各样的交流。但是在视频面试过程中,面试官可能就会通过举反例或者提问的方式向面试者提供一些feedback,这时候面试者就应该去jump out然后去想这些feedback是否对你的solution有用.
    • 最后,还是要鼓励大家熟悉一些自己coding language. Online coding有些公司是需要跑code的,所以对自己的language一定要熟悉,小编见过很多人喜欢用python只是因为简单好写,但是到了online coding的部分,如果你选了你不熟悉的语言,不知道怎么debug,那么从面试官的角度来讲就是你technical skills不过关了。

  • 视频onsite如何更好地做system design
    • 对于System design的问题视频onsite确实会有很大的挑战。首先面试者没有自己擅长的白板,这样很多架构图就无法展现出来。这时候对于面试者的挑战就变成了如何清晰地展现自己处理问题的思路和如何利用好bullet point来表达自己的观点了。
    • 通常来讲,面试者先要把system designrequirement清晰地记录下来,不明确的要去跟面试官沟通,把需求确定好。
    • 其次就是要把整个system分成不同的部分,这时候就要用bullet point把自己的思维一步步得展现给面试官。就当自己在写ppt的概要
    • 做完上述两步之后,面试者就可以跟面试官讨论然后着重挑其中的一个componentdetail design. 比如db schema, API signature 然后customer use scenario , error handling. 这些都可以用简单的statement写好
    • 最后面试者需要做出一些summary的总结,列举出pros and cons.向面试官展现出一个完整的设计思路

  • 视频onsite如何更好地做behavior question
    • 对于behavior 来说,应该是对于视频onsite来说最直观的了。重要的一点就是管理好自己的表情,表达方式,毕竟onsite面试官看到的是一个整体的人,而在视频onsite中看到的只是面试者的脸。所以任何小的问题都会被放大。