
Friday, May 20, 2022

Whatsapp System Design - High Level Architecture

Let's design WhatsApp :)

Prioritized requirements

  • We must implement one-to-one chat messaging.
  • We must also show users what stage a message is currently at (Sent, Delivered and Read receipts).
  • Group messaging is also supported.
  • Users can share image, audio and video files.
  • We will also show the Online/Last seen status of users.
  • Chats will be temporary on our side (i.e. they will be stored on the client side rather than kept permanently on our servers).

One to One messaging and Read Receipts


Whenever users want to send a message, they send a request to our server. This request is received by the gateway service. The client applications maintain a TCP connection with the gateway service and send their messages over it.


Once the server has received the message and stored it, our system must notify the sender that the message has been sent. So, in parallel with forwarding the message to the recipient, we send a response to the sender confirming that the server has the message. (Note: To ensure that the message will be delivered, we store it in the database and keep retrying until the recipient gets it.) This takes care of Sent receipts.


When the recipient receives the message, it sends a response (an acknowledgement) to our system. This response is routed to the session service, which uses its mapping to find the gateway server the sender is connected to and sends the Delivered receipt.


The process for Read receipts is the same. As soon as the user reads the message, the client sends an acknowledgement and we repeat the process above.


Note: The response from the client consists of sender and receiver fields.
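The receipt flow above can be sketched as a small state tracker. This is only an illustration: the names `Status`, `Receipt` and `ReceiptTracker` are made up for this example, and the rule that a status can only move forward (Sent, then Delivered, then Read) is an assumption, not something the design above prescribes.

```python
from dataclasses import dataclass
from enum import IntEnum


class Status(IntEnum):
    SENT = 1
    DELIVERED = 2
    READ = 3


@dataclass
class Receipt:
    message_id: str
    sender: str    # user who wrote the original message
    receiver: str  # user who received or read it
    status: Status


class ReceiptTracker:
    """Keeps the highest status seen so far for each message (hypothetical helper)."""

    def __init__(self):
        self._status = {}

    def apply(self, receipt: Receipt) -> bool:
        """Record the receipt; return True if the sender should be notified."""
        current = self._status.get(receipt.message_id)
        if current is not None and receipt.status <= current:
            return False  # duplicate or late receipt, nothing new to show
        self._status[receipt.message_id] = receipt.status
        return True

    def status_of(self, message_id: str):
        return self._status.get(message_id)
```

With this rule, a Delivered receipt that arrives after a Read receipt is simply ignored, so the sender never sees the ticks go backwards.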


Components required

  • Gateway Service
    • This service consists of multiple servers.
    • It will receive all the requests from the users.
    • It maintains the TCP connections with the users.
    • Furthermore, it also interacts with all the internal services.
  • Session Service: The gateway service is distributed. So if we want to send messages from one user to another, we must know which user is connected to which gateway server. This is handled by the session service. It maps each user (userID) to a particular gateway server.
  • Database: All the mappings must be persisted in non-volatile storage. For that we need a database.
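A minimal sketch of the userID to gateway mapping kept by the session service might look like this. It is an in-memory stand-in only; the class and method names are hypothetical, and a real service would persist the mapping in the database mentioned above.

```python
class SessionService:
    """Maps each userID to the gateway server holding that user's connection."""

    def __init__(self):
        self._gateway_of = {}

    def connect(self, user_id: str, gateway_id: str) -> None:
        # Called when a user opens a connection on a gateway server.
        self._gateway_of[user_id] = gateway_id

    def disconnect(self, user_id: str) -> None:
        # Called when the connection is closed; unknown users are ignored.
        self._gateway_of.pop(user_id, None)

    def route(self, receiver_id: str):
        # Returns the gateway to forward to, or None if the user is offline.
        return self._gateway_of.get(receiver_id)
```

When `route` returns `None`, the message stays in the database and is retried later, as described in the Sent receipts note.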

Trade-offs

  • Storing the mapping in gateway service v/s Storing it in session service
    • If we store the mapping in gateway service then we can access it faster. To get the mappings from session service we have to make a network call.
    • Gateway servers have limited memory. If we store the mapping in the gateway, we have to reduce the number of TCP connections each server can hold.
    • The gateway service is distributed across multiple servers, so the mapping would be duplicated on many of them. Also, every time there is an update we would have to update the data on each and every server.
  • So we can conclude that storing the mapping in the session service is a better idea.


  • Using HTTP for messaging v/s Websockets (WSS)
    • Plain HTTP is request/response, so only the client can start an exchange. The only way to receive messages is to keep asking the server whether there is anything new (polling, or long polling, where the server holds the request open until a message arrives).
    • WebSocket (WSS when run over TLS) is a full-duplex protocol: once the connection is open, both client and server can send messages to each other at any time.
  • As we do not need to constantly send requests to the server, using WebSockets will be more efficient.

Diagram




Last Seen Timestamps of users


We want to show other users whether a user is online or when he/she was last seen. To implement this we can store a table that contains the userID and the LastSeenTimestamp. Whenever a user performs an activity (like sending or reading a message), that request is sent to the server, and we update the key-value pair with the time at which the request was sent. We must also consider the requests sent by the application rather than by the user (like polling for messages). These requests do not count as user activity, so we won't log them. We can have an additional flag (something like application_activity) to differentiate the two.


We also need to define a threshold. If the time since the user was last seen is below the threshold, then instead of showing the exact time difference we will just show online.


For example, if user X was last seen 3 seconds ago and the threshold is 5 seconds, then other users will see X as online.
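The last seen logic can be sketched as follows. The 5 second threshold is taken from the example above; the class name, the `presence` strings and passing the current time in as a plain number of seconds are assumptions made to keep the sketch small and testable.

```python
ONLINE_THRESHOLD_SECONDS = 5  # example value from the text, not a recommendation


class LastSeenService:
    """Stores userID -> last seen timestamp (in seconds) in memory."""

    def __init__(self):
        self._last_seen = {}

    def record(self, user_id: str, now: float, application_activity: bool = False) -> None:
        # Requests made by the app itself (e.g. polling) are not user activity.
        if application_activity:
            return
        self._last_seen[user_id] = now

    def presence(self, user_id: str, now: float) -> str:
        seen = self._last_seen.get(user_id)
        if seen is None:
            return "unknown"
        elapsed = now - seen
        if elapsed < ONLINE_THRESHOLD_SECONDS:
            return "online"
        return f"last seen {int(elapsed)}s ago"
```

For instance, after `record("x", now=100.0)`, calling `presence("x", now=103.0)` reports the user as online because 3 seconds is below the threshold.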


Components Required

  • Last Seen service: Every time there is user activity, it is routed to this service. It persists the key-value pair in a non-volatile database.
  • Database


Group Messaging


Each group will have many users. Whenever a participant in a group sends a message, we first find the list of users present in the group. Once the session service has the list of users, it finds the gateway servers those users are connected to and sends the message to each of them.
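As a rough sketch of this fan-out step, the function below takes the member list and the userID to gateway mapping and groups the recipients by gateway server. The function name and the choice to return offline users separately are assumptions for illustration.

```python
from collections import defaultdict


def plan_fanout(sender_id, group_members, gateway_of):
    """Group the recipients of a group message by the gateway they are on.

    gateway_of is a dict of userID -> gateway server ID; users missing
    from it are treated as offline.
    """
    per_gateway = defaultdict(list)
    offline = []
    for user_id in group_members:
        if user_id == sender_id:
            continue  # the sender does not need a copy of their own message
        gateway = gateway_of.get(user_id)
        if gateway is None:
            offline.append(user_id)
        else:
            per_gateway[gateway].append(user_id)
    return dict(per_gateway), offline
```

Grouping by gateway means one request per gateway server instead of one per user, which matters as groups grow.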


Note: We should also limit the number of users in a group. If a group has a lot of users, every message fans out into a very large number of deliveries. We could instead ask the client applications to pull new messages from our system, but then our messages won't be real-time.

  • We do not want the gateway service to parse messages, because we want to minimize its memory usage and maximize the number of TCP connections it can hold. So we will use a message parser to convert the raw message into a structured format.
  • We have a mapping of groupID to userID. This is a one-to-many relationship. The group messaging service has multiple servers, so the same data could end up duplicated across them. To reduce this redundancy we use consistent hashing: we hash the groupID and send the request to the server selected by the result.
  • We also need a message queue in case there are any failures while sending requests. Once we hand a request to the message queue, it keeps retrying until the message is sent. If it reaches the maximum number of retries, it tells the sender that delivery failed and we can notify the user.
  • While sending messages in a group we must take care of three things
    • Retries - Message queue takes care of that.
    • Idempotency - Each message should take effect only once, even if it is sent more than once. We can achieve this by sending messages to the queue at least once and giving each message a unique ID. If the service has already seen the ID, the message has already been processed, so the duplicate is ignored.
    • Ordering of messages - Messages in a group must be ordered by their timestamps. To ensure this we always assign the messages of a particular group to the same thread from the thread pool.
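The consistent hashing of groupID mentioned in the list above could look roughly like the ring below. This is a sketch under assumptions: the MD5 hash, the 100 virtual nodes per server and the `HashRing` name are choices made for this example, not part of the design.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    # Any stable hash works; MD5 is used here only for its even spread.
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Maps a groupID to one group messaging server using a hash ring."""

    def __init__(self, servers, replicas: int = 100):
        # Each server is placed on the ring at `replicas` points.
        self._ring = sorted(
            (_hash(f"{server}#{i}"), server)
            for server in servers
            for i in range(replicas)
        )
        self._points = [point for point, _ in self._ring]

    def server_for(self, group_id: str) -> str:
        # Walk clockwise to the first point after the key, wrapping around.
        index = bisect.bisect(self._points, _hash(group_id)) % len(self._ring)
        return self._ring[index][1]
```

The same groupID always lands on the same server, and when a server is added only the groups that now belong to the new server move, so most of the groupID to userID data stays where it is.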

Components Required

  • Group Messaging service: It stores the mapping of groupID to userID and provides this data to the session service.
  • Message parser service: It receives the unparsed message from the gateway service and converts it to a structured format before sending it to other services.
  • Message queue
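Retries and idempotency can be combined in a small sketch like the one below. It is not a real message queue: `DeliveryQueue`, its `send` callback (which returns True on success) and the in-memory set of seen IDs are all hypothetical stand-ins.

```python
class DeliveryQueue:
    """Retries a send up to max_retries times and ignores repeated message IDs."""

    def __init__(self, send, max_retries: int = 3):
        self._send = send              # callable(payload) -> bool, assumed
        self._max_retries = max_retries
        self._seen_ids = set()         # IDs of messages already delivered

    def submit(self, message_id: str, payload) -> str:
        """Return 'delivered', 'duplicate' or 'failed'."""
        if message_id in self._seen_ids:
            return "duplicate"  # already processed, so do not send again
        for _ in range(self._max_retries):
            if self._send(payload):
                self._seen_ids.add(message_id)
                return "delivered"
        return "failed"  # the caller can now notify the user
```

Because the ID is recorded only after a successful send, a message that failed every retry can still be resubmitted later with the same ID.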

Diagram



Sending Image, Audio and Video files


We can use a distributed file service to store the files, as it is much more efficient and cost effective than storing them as BLOBs in a database. Every time a user sends an image, we store it in the file service and fetch it from there when we need to deliver it.
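One way to picture this is that the chat message carries only a reference to the file, not the file itself. The sketch below uses an in-memory dictionary in place of a distributed file system, and deriving the file ID from a SHA-256 hash of the content is an assumption made for the example, not something the design requires.

```python
import hashlib


class FileStore:
    """In-memory stand-in for the distributed file service."""

    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes) -> str:
        # Identical files get the same ID, so they are stored only once.
        file_id = hashlib.sha256(data).hexdigest()
        self._blobs[file_id] = data
        return file_id

    def get(self, file_id: str) -> bytes:
        return self._blobs[file_id]


def build_media_message(sender: str, receiver: str, data: bytes, store: FileStore) -> dict:
    # The message stays small: it holds a file ID instead of the bytes.
    return {"sender": sender, "receiver": receiver, "file_id": store.put(data)}
```

The recipient's client would then use the `file_id` from the message to download the file when it is needed.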


Components required

  • Distributed File System

Diagram



Some more optimizations

  • Graceful degradation: On some occasions our system might receive so many messages that it gets overloaded. In such cases we can temporarily shut down features that are not critical to our service (like read receipts or last seen status).
  • Rate limiting: In some situations we may not be able to handle any more requests. In such cases we can limit the number of requests we accept and drop the extra ones. However, this results in a bad user experience.
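Rate limiting can be implemented in several ways. The sketch below uses a token bucket, which is one common choice and not something this design specifies; the class name and parameters are illustrative, and the current time is passed in as seconds to keep the example deterministic.

```python
class TokenBucket:
    """Allows bursts of up to `capacity` requests, refilled at a steady rate."""

    def __init__(self, capacity: int, refill_per_second: float, now: float = 0.0):
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.tokens = float(capacity)  # start with a full bucket
        self.updated_at = now

    def allow(self, now: float) -> bool:
        # Add the tokens earned since the last call, capped at capacity.
        elapsed = now - self.updated_at
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)
        self.updated_at = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # over the limit: the request is dropped
```

A gateway server could keep one bucket per user and drop any request for which `allow` returns False.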

Happy Learning :) 
