Sharding a database is a common scalability strategy used when designing server side systems. The server side system architecture uses concepts like sharding to make systems more scalable, reliable and performant.
Sharding is horizontal partitioning of data according to a shard key. This shard key determines which database the entry to be persisted is sent to. Some common strategies for this are reverse proxies.
Let's take a common example of pizza and break it into slices and call your friends over. Each of your friend is going to get one slice of pizza. What you have done effectively is partitioned the pizza according to each friend's share. Just like that we can have serves which are going to be taking the load of the requests.
How we can convert our tech requirement to the pizza model.
- Basically each server going to handle requests based on partition say 1-100 userIds on 1st server and 101-200 on 2nd server and so on.
- This kind of partitioning which uses some sort of a key to break the data into pieces and allocate that to different servers is called horizontal partitioning.
- Servers which we are talking about here are database servers.
- Consistency
- Availability
- We are using userId in our case but in applications like tinder which use location, you could shard on the location and if a person says find me all the users in city X and X may fall in one specific sharda and all you need to do just read through this shard.
- Joins between shards
- Fixed number of shards
- With hierarchichal sharding approach we can overcome this problem, we can break one shard into sub-shards and there can one master sort of thing which can decide which mini shard request needs to route.
- Create index on shards
- This index can be on completely different attribute compared to userId.
- One of the good example is like find me all the people on NewYork having age greater than 50.
- Use Master Slave architecture on the shards to avoid SPOF.
- Read requests can go to slaves.
- Write requests always goes to master.
- In case master fails, slaves choose one master among themselves.
- Conceptually it's easy but in terms of doing practically it's quite tricky because consistency is really tough to do.
- If starting with a new system, take other mechanisms into consideration like indexing, noSQL databases which internally uses these these kind of concepts.
- Indexing and ready made solutions would be way to go before to think of implementing sharding by our own.
No comments:
Post a Comment