
Sunday, June 5, 2022

Publisher Subscriber Model

Microservices benefit from loose data coupling, which a publish-subscribe model provides. In this model, a publishing service produces events that downstream services consume.

Designing microservice interactions involves event handling and consistency checks. Here we look into a pub-sub architecture to evaluate its advantages and disadvantages compared to a request-response architecture.



This type of architecture relies on message queues, such as RabbitMQ or Kafka, to pass events between services. The architecture is common both in real-world systems and in interviews.
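As a toy illustration of the interaction (hypothetical names; RabbitMQ and Kafka provide durable, networked versions of the same idea), a minimal in-memory broker might look like:

```python
from collections import defaultdict

class Broker:
    """Toy in-memory message broker: publishers and subscribers
    only know about topics, never about each other."""

    def __init__(self):
        self._subscribers = defaultdict(list)  # topic -> list of callbacks

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, event):
        # The publisher does not know (or care) who consumes the event.
        for callback in self._subscribers[topic]:
            callback(event)

broker = Broker()
received = []
broker.subscribe("order.created", received.append)
broker.publish("order.created", {"order_id": 42})
print(received)  # [{'order_id': 42}]
```

Note that adding a second subscriber to "order.created" requires no change to the publisher, which is the decoupling the next section lists as the main advantage.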


If transactions do not require strong consistency guarantees, an event model is a good fit for microservices. Here are the main advantages:

  • Decouples a system's services.
  • Publishers and subscribers can be added easily, without the other side needing to know.
  • Converts many point-to-point failure modes into a single point of failure (the message broker), which is easier to monitor.
  • Interaction logic can be moved into the services or the message broker.


Disadvantages:

  • An extra layer of interaction slows services down.
  • Cannot be used in systems that require strong consistency of data.
  • Adds cost to the team for redesigning, learning, and maintaining the message queues.


This model provides the basis for event-driven systems.

Saturday, June 4, 2022

Capacity Planning and Estimation

 


Eg: Estimate the hardware requirements to set up a system like YouTube.

Eg: Estimate the number of petrol pumps in the city of Mumbai.


Let's start with storage requirements:

About 1 billion active users.

Assume 1 in 1,000 users uploads a video each day.

Which means 1 million new videos a day.


What's the size of each video?

Assume the average length of a video to be 10 minutes. 

Assume a 10 minute video to be of size 1 GB. Or...

A video is a sequence of images. 10 minutes is 600 seconds. At 25 frames per second, that is 25 * 600 = 15,000 frames.

If each raw frame is about 1 MB, the video is (1.5 * 10^4) * (10^6) bytes = 15 GB.

This raw estimate is far too high because it ignores compression: a typical 10-minute video is closer to 700 MB, so we must either revise our estimate or hope the interviewer corrects us. We'll round up and work with 1 GB per video.
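A quick script to sanity-check the raw-frame arithmetic (assuming 25 frames per second and 1 MB per uncompressed frame):

```python
seconds = 10 * 60        # 10-minute video
fps = 25                 # assumed frame rate
frame_mb = 1             # assumed size of one raw (uncompressed) frame

frames = fps * seconds             # 15,000 frames
raw_gb = frames * frame_mb / 1000  # 15.0 GB raw
print(frames, raw_gb)              # compression brings this down near 1 GB
```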


As each video is about 1 GB, the storage requirement per day is 1 GB * 1 million = 1 PB. 
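The daily upload volume follows directly from the assumptions above (all figures are the estimates we chose, not measured data):

```python
active_users = 1_000_000_000  # ~1 billion active users
upload_fraction = 1 / 1000    # assumed fraction uploading per day
gb_per_video = 1              # assumed average video size in GB

videos_per_day = int(active_users * upload_fraction)  # 1,000,000 videos
raw_pb_per_day = videos_per_day * gb_per_video / 1_000_000
print(raw_pb_per_day)  # 1.0 PB of originals per day
```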



This is the bare minimum storage requirement to store the original videos. If we want to have redundancy for fault tolerance and performance, we have to store copies. I'll choose 3 copies. 

That's 3 petabytes of raw data storage.

What about video formats and encoding? Let's assume a single container format, MP4, and transcode each 720p upload down to 480p, 360p, 240p, and 144p. Each step down in resolution roughly halves the video size.


If X is the original storage requirement = 1 PB,

We have X + X/2 + X/4 + X/8 ≈ 2X.

With redundancy, that's 2X * 3 = 6X.
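The transcoding multiplier and redundancy can be checked the same way:

```python
X = 1.0  # PB of 720p originals per day
# Each lower resolution (480p, 360p, 240p, 144p) is assumed
# to roughly halve the size of the step above it.
processed = X + X/2 + X/4 + X/8  # 1.875 PB, i.e. ~2X
with_copies = processed * 3      # 3 copies for redundancy
print(processed, with_copies)    # 1.875 5.625 -- rounded to 2 PB and 6 PB
```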


That's 6 PB (processed) + 3 PB (raw) ≈ 10 PB of new data per day. At 10 TB per drive, that is about 1,000 hard drives, and at roughly $100 per TB of deployed storage (drives plus servers and operations), the cost is about 1 million dollars per day.


Over a 3-year plan (roughly 1,000 days), we can expect about a 1 billion dollar storage price.


Now let's look at the real numbers:

Video upload speed = 3 * 10^4 minutes of footage per minute.

That's 3 * 10^4 * 1440 ≈ 4.5 * 10^7 minutes of footage per day.

Video encoding can reduce 1 hour of film to about 1 GB. 4.5 * 10^7 minutes is 7.5 * 10^5 hours, so the requirement is roughly 10^6 GB, that is, 1 PB per day.
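Checking the real-number estimate (the upload rate and 1 GB/hour encoded size are the figures assumed above):

```python
min_per_min = 3e4                 # minutes of footage uploaded per minute
min_per_day = min_per_min * 1440  # ~4.3e7 minutes per day
hours_per_day = min_per_day / 60  # ~7.2e5 hours per day
gb_per_hour = 1                   # assumed encoded size per hour of video
pb_per_day = hours_per_day * gb_per_hour / 1e6
print(pb_per_day)                 # ~0.7 PB, i.e. roughly 1 PB per day
```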


So our original estimate is close to what the real numbers say.



