
Wednesday, May 11, 2022

How Netflix onboards new content - Video Processing at scale 🎥

Every day, #Netflix handles billions of requests for movies, trailers and other video content. Delivering at such a large scale takes an #engineering marvel. This #video explains how Netflix onboards new video content onto its platform, from chunking the video to collating 4-second shots into scenes.

Amazon S3 is used to store the video chunks. Netflix also provides Open Connect servers to internet service providers (ISPs), which act as a local cache of movies. Most requests to Netflix can be served from this cache, and only the remaining ones are sent over the wider network. This reduces the bandwidth and time Netflix needs to operate at scale. Synergy at its finest.

Video Formats and Resolutions

  • Different formats
    • High quality
    • Medium quality
    • Low quality
  • Different resolutions
    • 1080p
    • 720p
    • 480p
  • Storage combinations: F formats × R resolutions -> V = F × R versions of each video
  • Netflix breaks each video into smaller chunks so that every chunk can be processed independently, in parallel, on a separate processor.
  • One resolution, one format, one chunk - that's one task
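To make the F x R -> V arithmetic concrete: with the three quality formats and three resolutions listed above, each video is stored as 3 × 3 = 9 versions, and if it is split into N chunks there are 9 × N independent tasks. The sketch below enumerates those tasks; the video ID, the chunk count and the task dictionary are made-up illustrations, not Netflix's actual job format.

```python
from itertools import product

# Formats and resolutions as listed in the post.
FORMATS = ["high", "medium", "low"]
RESOLUTIONS = ["1080p", "720p", "480p"]


def build_tasks(video_id, num_chunks, formats=FORMATS, resolutions=RESOLUTIONS):
    """Return one task per (format, resolution, chunk) combination.

    Each task is independent, so a scheduler could hand them to different
    processors in parallel."""
    return [
        {"video": video_id, "format": f, "resolution": r, "chunk": c}
        for f, r, c in product(formats, resolutions, range(num_chunks))
    ]


tasks = build_tasks("movie-42", num_chunks=10)  # hypothetical video and chunk count
versions = len(FORMATS) * len(RESOLUTIONS)      # V = F x R
print(versions)    # 9
print(len(tasks))  # 90
```

Because each task touches exactly one format, one resolution and one chunk, adding a new resolution only adds tasks; it never makes an existing task bigger.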
Chunk Processing
  • Instead of cutting the video at fixed timestamps, Netflix splits it at scene boundaries, which gives a more seamless user experience.
  • Splitting by scene can be made finer grained: instead of a 3-minute piece, the video is cut into pieces of about 4 seconds each, called shots, and consecutive shots are collated to form a scene.
  • A prediction algorithm detects when a user is not watching in an engaged way but is skipping forward through the movie. In that case it sends only the part the user asked for, rather than the upcoming content, because the user is probably jumping between different points in the buffer.
  • If, on the other hand, the user is watching in engaged mode, it does not just send the requested part: it predictively fetches the upcoming parts so that they are already on the user's device when playback reaches them.
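The post does not explain how Netflix decides which shots belong to the same scene. As a rough illustration only, the sketch below cuts a video into 4-second shots, gives each shot a made-up numeric "feature" (standing in for whatever visual signal a real scene detector would use), and merges consecutive shots into one scene while that feature stays similar. The `Shot` class, the feature values and the 0.2 threshold are all hypothetical.

```python
from dataclasses import dataclass

SHOT_SECONDS = 4  # shot length mentioned in the post


@dataclass
class Shot:
    start: int      # seconds from the beginning of the video
    feature: float  # hypothetical visual signature, e.g. average brightness


def collate_into_scenes(shots, threshold=0.2):
    """Group consecutive shots into scenes.

    A new scene starts when a shot's feature differs from the previous
    shot's by more than `threshold` (an assumed heuristic)."""
    scenes = []
    for shot in shots:
        if scenes and abs(shot.feature - scenes[-1][-1].feature) <= threshold:
            scenes[-1].append(shot)  # similar to the previous shot: same scene
        else:
            scenes.append([shot])    # visual jump: start a new scene
    return scenes


features = [0.10, 0.12, 0.15, 0.80, 0.82, 0.30]
shots = [Shot(start=i * SHOT_SECONDS, feature=f) for i, f in enumerate(features)]
scenes = collate_into_scenes(shots)
print([(s[0].start, s[-1].start + SHOT_SECONDS) for s in scenes])
# [(0, 12), (12, 20), (20, 24)]
```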
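The engaged-versus-skipping behaviour described above can be sketched as a simple policy. Here "engagement" is approximated by counting recent seek actions, and an engaged viewer gets a few extra chunks prefetched. The function names, the seek threshold and the lookahead of 3 chunks are assumptions for illustration, not Netflix's actual prediction algorithm.

```python
def is_engaged(recent_actions, max_seeks=2):
    """Hypothetical heuristic: the viewer counts as engaged if they skipped
    (sought) at most `max_seeks` times in the recent action window."""
    return sum(1 for a in recent_actions if a == "seek") <= max_seeks


def chunks_to_send(requested, recent_actions, total_chunks, lookahead=3):
    """Return the chunk indices to send for one request.

    Engaged viewers get the requested chunk plus up to `lookahead` future
    chunks (prefetching); skipping viewers get only what they asked for."""
    if is_engaged(recent_actions):
        end = min(requested + 1 + lookahead, total_chunks)
        return list(range(requested, end))
    return [requested]


print(chunks_to_send(10, ["play", "play", "pause", "play"], total_chunks=100))  # [10, 11, 12, 13]
print(chunks_to_send(10, ["seek", "seek", "seek", "play"], total_chunks=100))   # [10]
print(chunks_to_send(98, ["play"], total_chunks=100))                           # [98, 99]
```

Prefetching only for engaged viewers avoids wasting bandwidth on chunks a skipping viewer would jump past anyway.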
Storage
  • Netflix uses Amazon S3 to store the video content.
    • S3 is where people typically store static data, meaning data that is written once and not changed afterwards.
    • It is much cheaper than storing the same data in a database.
Open Connect for video caching
  • Netflix's servers are usually in the U.S., which means they are geographically concentrated. For a viewer in a distant place such as India, every request and response has to travel a long way, and video involves a lot of data, so streaming it directly from the U.S. would be slow.
  • Netflix extended the concept of caching to ISPs: when a request for a movie reaches the ISP, it is first looked up in a local cache, for example one holding Indian movies.
  • This cache is called Open Connect.
  • Lots of bandwidth saved
  • Lots of time saved
  • Much better user experience
  • 90% of Netflix traffic is served by these ISP boxes that Netflix provides.
  • For new content, an overnight job can copy the new titles into the caches when the request load on the servers is low.
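A minimal sketch of the Open Connect idea, assuming a plain in-memory dictionary for both the ISP cache and the distant origin: requests are served from the local cache when possible, fall back to the origin otherwise, and new titles are copied into the cache only during an assumed off-peak window (01:00 to 05:59). `OpenConnectCache` and `fill_off_peak` are hypothetical names; real Open Connect appliances are far more involved.

```python
class OpenConnectCache:
    """ISP-side cache in front of a distant origin.

    `origin` is a hypothetical stand-in for Netflix's central storage:
    a dict mapping title -> video bytes."""

    def __init__(self, origin):
        self.origin = origin
        self.local = {}
        self.hits = 0
        self.misses = 0

    def get(self, title):
        if title in self.local:  # served from inside the ISP: fast, no backbone traffic
            self.hits += 1
            return self.local[title]
        self.misses += 1         # long round trip to the origin
        return self.origin[title]

    def fill_off_peak(self, titles, hour, off_peak_hours=range(1, 6)):
        """Copy new titles into the cache only during off-peak hours
        (the window is an assumption). Returns True if the fill ran."""
        if hour not in off_peak_hours:
            return False
        for title in titles:
            self.local[title] = self.origin[title]
        return True


origin = {"movie-a": b"...", "movie-b": b"...", "movie-c": b"..."}
cache = OpenConnectCache(origin)
print(cache.fill_off_peak(["movie-a", "movie-b"], hour=14))  # False: peak hours
print(cache.fill_off_peak(["movie-a", "movie-b"], hour=3))   # True
for title in ["movie-a", "movie-b", "movie-a", "movie-c"]:
    cache.get(title)
print(cache.hits, cache.misses)  # 3 1
```

This sketch does not add a title to the cache on a miss; it fills the cache only through the overnight job, matching the description above. Many real caches also fill on a miss.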

Tuesday, May 10, 2022

Database Sharding - Key Concepts

Sharding a database is a common scalability strategy when designing server-side systems. Server-side architectures use concepts like sharding to become more scalable, reliable and performant.

Sharding is horizontal partitioning of data according to a shard key. The shard key determines which database an entry is persisted to. Routing each request to the right shard is commonly done by a layer in front of the databases, such as a reverse proxy.
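The post does not say how the shard key is mapped to a database. One common option, shown here as an assumption rather than the author's method, is to hash the key and take the result modulo the number of shards. `shard_for` is a hypothetical helper; it uses `hashlib` because Python's built-in `hash()` of strings changes between processes, which would send the same key to different shards after a restart.

```python
import hashlib


def shard_for(key, num_shards):
    """Map a shard key to a shard index using a stable hash."""
    digest = hashlib.md5(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


# Spread 10,000 userIds over 4 shards.
counts = [0] * 4
for user_id in range(1, 10001):
    counts[shard_for(user_id, 4)] += 1
print(counts)  # each count should be close to 2500

# How many keys change shard if we grow from 4 to 5 shards?
moved = sum(1 for k in range(1, 10001) if shard_for(k, 4) != shard_for(k, 5))
print(moved / 10000)  # roughly 0.8
```

The second part shows a drawback: going from 4 to 5 shards moves roughly 80% of the keys, because with modulo hashing a key keeps its shard only when h mod 4 equals h mod 5. This is one reason the number of shards is hard to change later.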


Let's take a common example: a pizza. Break it into slices and call your friends over. Each friend gets one slice, so effectively you have partitioned the pizza according to each friend's share. In the same way, we can have several servers, each taking its share of the request load.

How can we convert our technical requirement to the pizza model?

  • Each server handles the requests for one partition, say userIds 1-100 on the 1st server, 101-200 on the 2nd server and so on.
  • This kind of partitioning, which uses a key to break the data into pieces and allocate them to different servers, is called horizontal partitioning.
  • The servers we are talking about here are database servers.
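The userId ranges above are range partitioning. A minimal sketch, assuming 100 userIds per shard and three hypothetical database servers named `db-1` to `db-3`:

```python
SHARD_SIZE = 100                 # userIds per shard, as in the example
SERVERS = ["db-1", "db-2", "db-3"]  # hypothetical database server names


def shard_for_user(user_id, shard_size=SHARD_SIZE):
    """Range partitioning: userIds 1-100 -> shard 0, 101-200 -> shard 1, ..."""
    if user_id < 1:
        raise ValueError("userIds start at 1")
    return (user_id - 1) // shard_size


def server_for_user(user_id):
    """Return the database server that owns this userId."""
    shard = shard_for_user(user_id)
    if shard >= len(SERVERS):
        raise LookupError(f"no shard configured for userId {user_id}")
    return SERVERS[shard]


print(server_for_user(42))   # db-1
print(server_for_user(150))  # db-2
```

Range partitioning keeps neighbouring userIds together, which makes range scans cheap, but a range that contains many very active users can overload one server.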
While tuning a database this way, we always have to make sure it keeps its key attributes:
  • Consistency
  • Availability
What should you shard your data on?
  • We use userId in our case, but in location-based applications like Tinder you could shard on location. A query such as "find me all the users in city X" then falls into one specific shard, and only that shard needs to be read.
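A sketch of sharding on location, assuming a hypothetical fixed mapping from city to shard (the city names are illustrative). Because all users of a city live on one shard, the "users in city X" query reads exactly one shard; the `reads` counter is only there to make that visible.

```python
from collections import defaultdict


class LocationShardedStore:
    """Each city's users live on exactly one shard."""

    def __init__(self, num_shards, city_to_shard):
        self.shards = [defaultdict(list) for _ in range(num_shards)]
        self.city_to_shard = city_to_shard  # hypothetical static mapping
        self.reads = [0] * num_shards       # shard reads, for illustration

    def add_user(self, user_id, city):
        self.shards[self.city_to_shard[city]][city].append(user_id)

    def users_in_city(self, city):
        shard = self.city_to_shard[city]
        self.reads[shard] += 1              # only this one shard is touched
        return list(self.shards[shard][city])


store = LocationShardedStore(2, {"Bengaluru": 0, "Mumbai": 1, "Delhi": 1})
store.add_user(1, "Bengaluru")
store.add_user(2, "Mumbai")
store.add_user(3, "Mumbai")
print(store.users_in_city("Mumbai"))  # [2, 3]
print(store.reads)                    # [0, 1]
```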
Problems with sharding
  • Joins between shards: the rows that need to be joined may live on different servers.
  • Fixed number of shards
    • A hierarchical sharding approach overcomes this problem: an overloaded shard is broken into sub-shards, and a master for that shard decides which mini shard each request is routed to.
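A rough sketch of the hierarchical idea: the top level still has a fixed number of shards (here routed by userId range), but a hot shard can be replaced by a `SplitShard` that acts as the master and routes each key to one of its sub-shards. The class names and the split at userId 51 are hypothetical.

```python
import bisect


class Shard:
    """A leaf shard; in a real system this would be a database server."""
    def __init__(self, name):
        self.name = name


class SplitShard:
    """A shard broken into sub-shards. It plays the role of the master
    that decides which mini shard a key is routed to."""
    def __init__(self, name, boundaries, children):
        # boundaries[i] is the smallest key handled by children[i + 1]
        self.name = name
        self.boundaries = boundaries
        self.children = children

    def route(self, key):
        return self.children[bisect.bisect_right(self.boundaries, key)]


def find_shard(shards, user_id, shard_size=100):
    """Pick the top-level shard by userId range, then descend into sub-shards."""
    node = shards[(user_id - 1) // shard_size]
    while isinstance(node, SplitShard):
        node = node.route(user_id)
    return node


# Shard 0 (userIds 1-100) became hot and was split into 1-50 and 51-100.
shards = [
    SplitShard("shard-0", [51], [Shard("shard-0a"), Shard("shard-0b")]),
    Shard("shard-1"),
]
print(find_shard(shards, 42).name)   # shard-0a
print(find_shard(shards, 77).name)   # shard-0b
print(find_shard(shards, 150).name)  # shard-1
```

Only the data of the split shard has to move; keys on the other shards keep their location.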
Best Practices
  • Create indexes on shards
    • The index can be on an attribute completely different from userId.
      • A good example is a query like "find all the people in New York with age greater than 50".
  • Use a master-slave architecture on each shard to avoid a single point of failure (SPOF).
    • Read requests can go to the slaves.
    • Write requests always go to the master.
    • If the master fails, the slaves elect a new master among themselves.
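A sketch of a per-shard secondary index for the New York example, assuming shards keyed by userId (100 per shard) and a local index on (city, age) kept as a sorted list. Because the indexed attribute is not the shard key, the query has to be sent to every shard and the results merged (scatter-gather). `IndexedShard` and `query_older_than` are hypothetical names.

```python
import bisect
from collections import defaultdict


class IndexedShard:
    """A shard keyed by userId with a local secondary index on (city, age)."""

    def __init__(self):
        self.rows = {}                     # userId -> (city, age)
        self.city_age = defaultdict(list)  # city -> sorted [(age, userId)]

    def insert(self, user_id, city, age):
        self.rows[user_id] = (city, age)
        bisect.insort(self.city_age[city], (age, user_id))

    def older_than(self, city, min_age):
        entries = self.city_age.get(city, [])
        # Index of the first entry whose age is strictly greater than min_age.
        start = bisect.bisect_right(entries, (min_age, float("inf")))
        return [uid for _, uid in entries[start:]]


def query_older_than(shards, city, min_age):
    """Scatter the query to every shard and gather the matching userIds."""
    result = []
    for shard in shards:
        result.extend(shard.older_than(city, min_age))
    return sorted(result)


shards = [IndexedShard(), IndexedShard()]
people = [(7, "New York", 62), (42, "New York", 35), (150, "New York", 51),
          (160, "Boston", 70), (180, "New York", 50)]
for uid, city, age in people:
    shards[(uid - 1) // 100].insert(uid, city, age)  # route by userId range
print(query_older_than(shards, "New York", 50))       # [7, 150]
```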
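A simplified sketch of master-slave replication for one shard, with hypothetical `Node` and `ReplicatedShard` classes. Writes go to the master and are copied to the live slaves synchronously, reads rotate across live slaves, and when the master is found dead the live slave with the lowest id is promoted. That promotion rule is a placeholder: real systems use an election or consensus protocol and often replicate asynchronously, which is exactly where consistency gets hard.

```python
class Node:
    """A database server holding a copy of the shard's data."""
    def __init__(self, node_id):
        self.node_id = node_id
        self.data = {}
        self.alive = True


class ReplicatedShard:
    def __init__(self, master, slaves):
        self.master = master
        self.slaves = list(slaves)
        self._next = 0  # round-robin position for reads

    def write(self, key, value):
        if not self.master.alive:
            self.elect_new_master()
        self.master.data[key] = value      # writes always go to the master
        for slave in self.slaves:
            if slave.alive:
                slave.data[key] = value    # synchronous copy (simplification)

    def read(self, key):
        live = [s for s in self.slaves if s.alive]
        if not live:                       # no slaves left: fall back to master
            return self.master.data.get(key)
        node = live[self._next % len(live)]
        self._next += 1
        return node.data.get(key)

    def elect_new_master(self):
        live = [s for s in self.slaves if s.alive]
        if not live:
            raise RuntimeError("no live slave to promote")
        new_master = min(live, key=lambda s: s.node_id)  # placeholder rule
        self.slaves.remove(new_master)
        self.master = new_master


shard = ReplicatedShard(Node(0), [Node(1), Node(2)])
shard.write("user:42", "Alice")
print(shard.read("user:42"))   # Alice
shard.master.alive = False     # simulate a master failure
shard.write("user:43", "Bob")  # the failure is noticed, node 1 is promoted
print(shard.master.node_id)    # 1
print(shard.read("user:43"))   # Bob
```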
Conclusion
  • Conceptually sharding is easy, but in practice it is quite tricky, because keeping the data consistent is really tough.
  • If you are starting a new system, consider other mechanisms first, such as indexing, or NoSQL databases that use these kinds of concepts internally.
  • Indexing and ready-made solutions are the way to go before thinking of implementing sharding on your own.
