
Thursday, June 16, 2022

SPOF (Single Point of Failure)

A single point of failure (SPOF) in computing is a critical component whose failure can take down the entire system. A lot of time and resources are spent removing single points of failure from an architecture/design.



Single points of failure often appear when setting up coordinators and proxies. These services distribute load and discover services as they join and leave the system. Because they perform critical centralized tasks, they are especially prone to becoming SPOFs.


One way to mitigate the problem is to run multiple instances of every component in the service. The dependency graph then becomes more flexible, allowing the system to fail over to another instance instead of failing requests.
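The failover idea can be sketched as a client-side retry across replicas. The replica callables and names below are purely illustrative stand-ins for service instances, not a real client library.

```python
import random

def call_with_failover(replicas, request, max_attempts=3):
    """Try a request against randomly chosen replicas, failing over on error.

    `replicas` is a list of callables standing in for service instances.
    """
    candidates = random.sample(replicas, min(max_attempts, len(replicas)))
    last_error = None
    for replica in candidates:
        try:
            return replica(request)
        except ConnectionError as exc:
            last_error = exc  # this instance is down; try the next one
    raise RuntimeError("all replicas failed") from last_error

# Usage: two healthy replicas and one that is down.
def healthy(req):
    return f"handled:{req}"

def down(req):
    raise ConnectionError("instance unreachable")

print(call_with_failover([down, healthy, healthy], "ping"))  # handled:ping
```

Because any instance can serve the request, a single failed instance no longer fails the whole call path.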


Another approach is to keep backups that allow a quick switch-over on failure. Backups are especially useful for components that hold data, such as databases.


Allocating more resources, distributing the system, and replicating data are all ways of mitigating SPOFs, which is why designs include horizontal scaling capabilities and partitioning.

Wednesday, June 15, 2022

CDN (Content Delivery Network) Explained

Let's discuss CDNs in detail. Below are the points you would generally consider while serving static pages, images, and similar content.

Use-case

  • An example: your server serves static and dynamic HTML pages, images, etc.

Caching

  • To make serving faster and more efficient, the first step you would take is to cache the content and serve it from the cache.

Device Customised Data

  • Different HTML pages and images are served to different devices (desktop, mobile, etc.). With, say, 5 device types and 100 countries, that is 500 distinct variants the cache has to serve.

Performance Consideration

  • You want to serve your pages to users fast; they lose interest in a product that takes too long to load.
Global Cache
  • You would cache the content outside the server to serve it fast. To avoid a single point of failure, use a distributed cache, with the data spread across multiple cache servers.
Shard Caches
  • To serve requests even faster, given the many variants discussed above, you might shard the cache by location or country. A request from the US, for example, would go to a dedicated set of cache boxes serving US-related content.
Localized Caches
  • One more problem: if the company and its cache servers sit in the US but the user base spans many countries, requests from other regions are slow. To serve them efficiently, place localized caches in each region, say a data centre in India to serve Indian requests, and so on.
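One minimal sketch of the routing described above: first pick the region's shard group, then hash the variant key so the same (region, device) page keeps hitting the same warm cache box. The shard names and region table here are hypothetical.

```python
import hashlib

# Hypothetical shard layout: each region gets its own group of cache boxes.
REGION_SHARDS = {
    "US": ["cache-us-1", "cache-us-2"],
    "IN": ["cache-in-1", "cache-in-2"],
    "EU": ["cache-eu-1"],
}

def shard_for(region, device):
    """Route to the region's shard group, then hash the variant key
    so identical requests always land on the same box."""
    group = REGION_SHARDS.get(region, REGION_SHARDS["US"])  # default region
    key = f"{region}:{device}".encode()
    return group[int(hashlib.sha1(key).hexdigest(), 16) % len(group)]

# Deterministic: repeat lookups for the same variant hit the same shard.
assert shard_for("IN", "mobile") == shard_for("IN", "mobile")
```

Hashing keeps the mapping stable without any central lookup table, which is what makes the shard itself cheap to scale.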
Why should you use a CDN?
  • If you build all of this yourself, you have to take care of every point above, which mainly comes down to the following:
    • Available in different countries
    • Follows regulations
    • Serves the latest content
What are the benefits of a CDN?
  • A specialized solution like a CDN takes care of all of the above points, so you can focus on your business logic and on expanding it further.
    • A good example is Akamai, which specializes in:
      • Hosting boxes close to the users.
      • Following regulations.
      • Allowing content to be pushed to the boxes via a UI.
      • Setting expiry times in CDNs.
        • Sometimes you only need to cache content for 60 seconds or 60 minutes; all of this is configurable via the UI.
    • Another good example is Amazon S3.
      • Super cheap
      • Very reliable
      • Easy to use
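The expiry idea above can be sketched as a tiny TTL cache. Real CDNs expose the same behaviour through an expiry/max-age setting rather than code like this; the class and key names below are invented for illustration.

```python
import time

class TTLCache:
    """Tiny expiring cache, assuming a per-entry TTL in seconds."""

    def __init__(self):
        self._store = {}  # key -> (value, expires_at)

    def set(self, key, value, ttl_seconds):
        self._store[key] = (value, time.monotonic() + ttl_seconds)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._store[key]  # expired: evict and report a miss
            return None
        return value

cache = TTLCache()
cache.set("home.html", "<html>...</html>", ttl_seconds=60)
print(cache.get("home.html") is not None)  # True while the 60 s TTL holds
```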

A detailed explanation of CDNs: https://learnwithnitin.blogspot.com/2014/02/content-delivery-network-cdn.html


Happy Learning :) 

Sunday, June 5, 2022

Why do Databases fail? Anti-patterns to avoid!

Databases are often used to store various types of information, but one case where they become a problem is when they are used as a message broker.

A database is rarely designed for messaging features, and hence is a poor substitute for a specialized message queue. When designing a system, this usage is considered an anti-pattern.




Here are possible drawbacks:

  • Polling intervals have to be tuned: too long and the system becomes slow to react; too short and the database is put under heavy read load.
  • The DB becomes heavy on both reads and writes, while databases are usually good at only one of the two.
  • Manual delete procedures have to be written to remove consumed messages.
  • Scaling is difficult conceptually and physically.
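A minimal sketch of the anti-pattern itself, using SQLite as the "queue", makes the polling and manual-delete costs concrete. The table and column names are made up for illustration.

```python
import sqlite3

# A table used as a message queue: producers INSERT rows,
# consumers poll on an interval and DELETE what they consume.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE queue (id INTEGER PRIMARY KEY, body TEXT)")
conn.execute("INSERT INTO queue (body) VALUES ('job-1')")
conn.commit()

def poll_once(conn):
    """One polling pass: read the oldest message and delete it.

    Every pass costs a read even when the queue is empty, which is
    exactly the wasted load the bullet points above describe."""
    row = conn.execute("SELECT id, body FROM queue ORDER BY id LIMIT 1").fetchone()
    if row is None:
        return None          # empty poll: pure overhead
    conn.execute("DELETE FROM queue WHERE id = ?", (row[0],))
    conn.commit()            # the manual delete step noted above
    return row[1]

print(poll_once(conn))  # job-1
print(poll_once(conn))  # None -- yet the polling timer fires again anyway
```

Every consumer repeats this loop on a timer, so read load grows with the number of consumers regardless of how many messages actually exist.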


Disadvantages of a Message Queue:

  • Adds more moving parts to the system.
  • Cost of setting up the MQ along with training is large.
  • May be overkill for a small service.


It is important to be able to reason about why a system does or does not need a message queue; these reasons let us argue the merits and demerits of the two approaches.


However, there are also blogs arguing that databases are perfectly fine as message queues. A deep understanding of the pros and cons helps evaluate how effective each would be in a given scenario.


In general, for a small application, databases are fine, as they bring no additional moving parts to the system. For complex message-sending requirements, it is useful to have an abstraction such as a message queue handle delivery for us.

Publisher Subscriber Model

Microservices benefit from loose data coupling, which is provided by a publish subscribe model. In this model, events are produced by a publishing service and consumed by downstream services.

Designing the microservice interactions involves event handling and consistency checks. We look at a pub-sub architecture to evaluate its advantages and disadvantages compared to a request-response architecture.



This type of architecture relies on message queues for event passing; examples are RabbitMQ and Kafka. The architecture is common both in real systems and in interviews.


If transactions do not need strong consistency guarantees, an event model is a good fit for microservices. Here are the main advantages:

  • Decouples a system's services.
  • Subscribers and publishers can be added easily, without the others needing to know.
  • Converts multiple points of failure into a single, well-understood point of failure (the broker).
  • Interaction logic can be moved into the services or the message broker.


Disadvantages:

  • An extra layer of interaction slows services down.
  • Cannot be used in systems that require strong consistency of data.
  • Additional cost to the team for redesigning, learning, and maintaining the message queues.


This model provides the basis for event driven systems.
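The decoupling above can be sketched with a toy in-memory broker: the publisher never learns who, or how many, consumed the event. A real deployment would put RabbitMQ or Kafka here; the topic and event names below are invented.

```python
from collections import defaultdict

class Broker:
    """In-memory publish/subscribe sketch (illustration only)."""

    def __init__(self):
        self._subscribers = defaultdict(list)  # topic -> list of callbacks

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, event):
        # The publisher is unaware of its consumers: adding a new
        # downstream service is just one more subscribe() call.
        for callback in self._subscribers[topic]:
            callback(event)

broker = Broker()
received = []
broker.subscribe("order.created", received.append)   # e.g. a billing service
broker.subscribe("order.created", received.append)   # e.g. an email service
broker.publish("order.created", {"order_id": 42})
print(len(received))  # 2 -- both subscribers saw the event
```

Note the delivery here is synchronous and in-process; a real broker adds persistence, retries, and network hops, which is where the extra latency in the disadvantages list comes from.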

Saturday, June 4, 2022

Capacity Planning and Estimation

 


Eg: Estimate the hardware requirements to set up a system like YouTube.

Eg: Estimate the number of petrol pumps in the city of Mumbai.


Let's start with storage requirements:

About 1 billion active users.

I assume 1 in 1,000 users uploads a video each day.

Which means 1 million new videos a day.


What's the size of each video?

Assume the average length of a video to be 10 minutes. 

Assume a 10 minute video to be of size 1 GB. Or...

A video is a sequence of images. 10 minutes is 600 seconds; at 24 frames per second, that is 24 * 600 = 14,400 frames.

If each frame is about 1 MB, that comes to roughly 14,400 MB, or about 14 GB.

This estimate is far too high, since raw frame sizes ignore video compression, so we must either revise it or hope the interviewer corrects us. A normal 10-minute video is about 700 MB, which supports the ~1 GB assumption.


As each video is about 1 GB, the storage requirement per day is 1 GB * 1 million = 1 PB.



This is the bare minimum storage requirement to store the original videos. If we want to have redundancy for fault tolerance and performance, we have to store copies. I'll choose 3 copies. 

That's 3 petabytes of raw data storage.

What about video formats and encoding? Assume a single encoding, MP4, with the original 720p video also stored at 480p, 360p, 240p, and 144p. Each step down roughly halves the size.


If X is the original storage requirement (1 PB),

we have X + X/2 + X/4 + X/8 + ... ≈ 2X.

With 3-way redundancy, that's 2X * 3 = 6X.


That's 6 PB (processed) + 3 PB (raw) ≈ 10 PB of new data per day. At roughly 10 TB per drive, that is on the order of 1,000 hard drives, and the storage cost works out to about 1 million dollars per day.


Over a 3-year plan, that works out to storage costs on the order of a billion dollars.


Now let's look at the real numbers:

Video upload rate ≈ 3 * 10^4 minutes of footage per minute.

That's 3 * 10^4 * 1440 ≈ 4.3 * 10^7 minutes of footage per day.

Video encoding can reduce a 1-hour film to about 1 GB, so the requirement is roughly 4.3 * 10^7 / 60 ≈ 7 * 10^5 GB, close to 1 PB per day.


So the original cost is similar to what the real numbers say.
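The estimate above can be checked with a few lines of arithmetic; every input below is an assumption from this post, not measured data.

```python
# Back-of-envelope numbers from the estimate above.
active_users = 1_000_000_000
videos_per_day = active_users // 1000          # 1 in 1,000 users uploads daily
gb_per_video = 1                               # assumed ~1 GB per 10-minute video

raw_pb_per_day = videos_per_day * gb_per_video / 1_000_000   # GB -> PB
encoded_pb = raw_pb_per_day * 2                # X + X/2 + X/4 + ... ~= 2X
with_copies = encoded_pb * 3                   # 3 replicas of processed data
total_pb = with_copies + raw_pb_per_day * 3    # plus 3 copies of the raw video

print(raw_pb_per_day)  # 1.0 PB of new raw video per day
print(total_pb)        # 9.0 PB/day, i.e. ~10 PB with headroom
```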



