Search This Blog

Friday, May 20, 2022

Whatsapp System Design - High Level Architecture

 Let's design whatsapp :) 

Prioritized requirements

  • We must implement one-to-one chat messaging.
  • We must also show the users, what stage the message is currently on. (Sent, Delivered and Read Receipts)
  • Groups messaging is also allowed.
  • Users can share image, audio and video files.
  • We will also show the Online/Last seen status of users.
  • Chat will be temporary. (i.e. They will be stored on the client side)

One to One messaging and Read Receipts


Whenever users want to send a message they send a request to our server. This request is received by the gateway service. Then the client applications maintain a TCP Connection with the gateway service to send messages.


Once the server sends the message to the recipient, our system must also notify the sender that the message has been delivered. So we also send a parallel response to the sender that the message has been delivered. (Note: To ensure that message will be delivered we store the message in database and keep retrying till the recipient gets the message.) This takes care of Sent receipts.


When the recipient receives the message it sends a response (or acknowledgement) to our system. This response is then routed to session service. It finds the sender from the mapping and sends the Delivery receipts.


The process to send the Read receipts is also the same. As soon as user reads the message we perform the above process.


Note: The response from the client consists of sender and receiver fields.


Components required

  • Gateway Service
    • This service consists of multiple servers.
    • It will receive all the requests from the users.
    • It maintains the TCP connections with the users.
    • Furthermore, it also interacts with all the internal services.
  • Session Service Gateway service is also distributed. So if we want to send messages from one user to another we must know which user is connected to which gateway server. This is handled by session service. It maps each user (userID) to the particular gateway server.
  • Database All the mappings must be persisted in a non volatile storage. For that we need a database.

Trade-offs

  • Storing the mapping in gateway service v/s Storing it in session service
    • If we store the mapping in gateway service then we can access it faster. To get the mappings from session service we have to make a network call.
    • Gateway services have limited memory. If we store the mapping the gateway we have to reduce the number of TCP connections.
    • Gateway service is distributed. So there are multiple servers. In that case there will be a lot of duplication. Also every time there is an update we have to update the data in each and every server.
  • So we can conclude that storing the mapping in the session service is a better idea.


  • Using HTTP for messaging v/s Websockets (WSS)
    • HTTP can only send messages from client to server. The only way we can allow messaging is by constantly sending request to server to check if there is any new message (Long Polling).
    • WSS is a peer to peer protocol that allows client and server to send messages to each other.
  • As we do not need to constantly send requests to server, using XMPP will be more efficient.

Diagram




Last Seen Timestamps of users


We want to show other users whether any user is online or when was he/she last seen. To implement this we can store a table that contains the userID and the LastSeenTimestamps. Whenever any user makes an activity (like sending or reading message) that request is sent to the server. The time at which the request is sent we update the key value pair. We must also consider the requests sent by the application and not by the user (like polling for messages etc.) These requests do not count as user activity so we won't be logging them. We can have an additional flag (something like application_activity) to differentiate the two.


We also need to define a threshold. If the last seen is below the threshold then instead of showing the exact time difference we will just show online.


For e.g. if the last seen of user X is 3sec and the threshold is 5sec then other users will see X as online.


Components Required

  • Last Seen service Every time there is an user activity it is routed to this service. It persists they key value pair in a non volatile database.
  • Database


Group Messaging


Each group will have many users. Whenever a participant in a group sends a message we first find the list of users present in the group. Once the session service has the list of users it finds the gateway services that the users are connected to and then sends the message.


Note: We should also limit the number of users in a group. If there are a lot of users then it can cause fanout. We can ask the client applications to pull new messages from our system but our messages won't be realtime in such case.

  • We do not want the gateway service to parse messages because we want to minimize the memory usage and maximize the TCP connections. So we will use a message parser to convert the unparsed to sensible message.
  • We have a mapping of groupID to userID. This is one to many relationship. Group messaging service has a multiple servers so there can be data redundancy. In order to reduce redundancy we use consistent hashing. We hash the groupID and send the request to the server according to the result.
  • We also need to use a message queue incase there is any failures while sending requests. Once we give a request to message queue it ensures that message will be sent. If we reach maximum number of retries it tells the sender that it failed and we can notify the user.
  • While sending messages in a group we must take care of three things
    • Retries - Message queue takes care of that.
    • Idempotency - It means that each message should be sent only once. We can achieve this by sending messages to queue at least once but each message will have an unique ID. If the service has already seen the ID then it means that message is already sent so the new message is ignored.
    • Ordering of messages - Messages must be ordered by the their timestamps in a group. To ensure this we always assign the messages of a particular group to a particular thread from the thread pool.

Components Required

  • Group Messaging service It stores the mapping of groupID to userID and provides this data to the session service.
  • Message parser service It receives the unparsed message from the gateway service and converts it to sensible format before sending it to other services.
  • Message queue

Diagram



Sending Image, Audio and Video files


We can use a distributed file service to store the files as they are much more efficient and cost effective compared to storing images as BLOBs in database. Every time an user sends an image we can store it in file service and when we can get the image when we need to send it.


Components required

  • Distributed File System

Diagram



Some more optimizations

  • Graceful degradations On some occasions our system might get so many messages that our systems get overloaded. In such cases we can temporarily shut down services that are not critical to our service (like sending read receipts or last seen status etc).
  • Rate Limiting In some situations it might happen that we cannot handle any more requests. In such cases we can rate limit the number of requests and drop extra requests. However this results in bad user experience.

Happy Learning :) 

Tuesday, May 17, 2022

Distributed Caching - Key Features

Caching in distributed systems is an important aspect for designing scalable systems. We first discuss what is a cache and why we use it. We then talk about what are the key features of a cache in a distributed system.

The cache management policies of LRU and Sliding Window are mentioned here. For high performance, the cache eviction policy must be chosen carefully. To keep data consistent and memory footprint low, we must choose a write through or write back consistency policy.


Cache management is important because of its relation to cache hit ratios and performance. We talk about various scenarios in a distributed environment.



Use-cases of Cache

  • Save network calls
  • Avoid recomputations
  • Reduce db load
Store everything in cache?
  • As we know response times are much faster for response time to fetch details from cache insetad of db so does that mean we can store lot of data in cache?
    • Well you can't do for mutiple reasons
      • Firstly hardware on which cache runs is usually much more expensive than that of a normal database.
      • Secondly if you store ton of data in cache then search time will increase and as seacrh time keeps on increasing, it makes lesser sense to use the cache.
When to load and evict data from cache?
  • It's entirely depends on our cache policy we use.
    • First popular policy called as LRU (Least Recent Used).
      • kick out bottom most entries.
        • As an example if celebrity made a post/comment, ppl would want to load that and slowly it would be least used one
    • There is one more LFU (Least Frequently Used) but it's not frequently used in real world mostly :)
What problems poor eviction policy can cause?
  • Imaging you are asking for something from cache and it says I don't have it most of the time and you again going to ask DB so you are making more network calls.
    • So the first problem is Extra Calls.
  • Second problem when you have very small cache and imaging making entry for X and then making entry for Y and deleting for X.
    • This concept is called Thrashing.
  • Data Consistency
    • As an example server2 makes an update call and update the DB and now if server1 asks for X profile but it will fetch outdated profile. (would be even severe in terms of passwords updattion etc)


Where cache can be placed?
  • It can be placed closed to the database or can be placed close to the server.
    • There are benefits for both and drawbacks for both.
  • If you want to place close to the server, how close you can place, well you can place it in memory itself.
    • If you do this, amount of memory in your server is going to be used up by your cache.
      • If number of results is really small and you need to save on the network calls then you can just keep it in memory.
      • If let's say server2 fails, it's in-memory cache also fails
      • What if data on S1 and data on S2 are not consistent that means they are not in sync.
  • Putting cache near to db is like global cache.
    • In this case even if S2 crashes, S1 will keep serving requests and there won't be any data inconsistency.
    • Although it will be slightly slower but it's more accurate.
    • You can also scale this independently and servers would be resilient too.

How to make sure data is consistent in cache?
  • There are two approaches to achieve it
    • Write-through
      • You will make update entry in the cache and update further to the database.
      • Possible problems when servers having in-memory cache and let's say S1 making an update call and updated cache bur data would be inconsistent in S2 cache.
    • Write-back
      • Once you hit the database, make sure you make an entry in the cache.
      • Possible problem in write-back is performance
  • Both approaches having advantages and disadvantages.
    • Hybrid sort of solution would be best based on the use-cases.

Happy Learning :) 

Thursday, May 12, 2022

Appium Architecture - Core Concepts

What is Appium?

It’s a NodeJS based open-source tool for automating mobile applications. It supports native, mobile web, and hybrid applications on iOS mobile, Android mobile, and Windows desktop platforms.

Using Appium, you can run automated tests on physical devices or emulators, or both.



Let’s understand the above Appium architecture diagram.

  • Appium is a client-server architecture. The Appium server communicates with the client through the HTTP JSONWire Protocol using JSON objects.
  • Once it receives the request, it creates a session and returns the session ID, which will be used for communication so that all automation actions will be performed in the context of the created session.
  • Appium uses the UIAutomator test framework to execute commands on Android devices and emulators.
  • Appium uses the XCUITest test framework to execute commands on Apple mobile devices and simulators.
  • Appium uses WinAppDriver to execute commands for Windows Desktop apps. It is bundled with Appium and does not need to be installed separately.

Appium - Android visual interaction flow

Let’s understand the interaction flow between the code and the Android device via the Appium server.

  • The client sends the request to the Appium server through the HTTP JSONWire Protocol using JSON objects.
  • Appium sends the request to UIAutomator2.
  • UIAutomator2 communicates to a real device/simulator using bootstrap.jar which acts as a TCP server.
  • bootstrap.jar executes the command on the device and sends the response back.
  • Appium server sends back the command execution response to the client.

Appium - iOS visual interaction flow

Let’s understand the interaction flow between the code and the iOS device via the Appium server.

  • The client sends the request to the Appium server through the HTTP JSONWire Protocol using JSON objects.
  • Appium sends the request to XCUITest.
  • XCUITest communicates to a real device/simulator using bootstrap.js which acts as a TCP server.
  • bootstrap.js executes the command on the device and sends the response back.
  • The Appium server sends the command execution response to the client.

Whiteboard Sessions
  • IOS flow architecture

  • Android flow architecture

  • Drivers which appium supports
    • UI Automator2 (Android)
    • Espresso (Android)
    • WinApp (Windows)
    • MAC Driver (Mac OS)
    • XCUITest (IOS above 9.3 version)
    • UI Automation (IOS below 9.3 version)
    • Tizen (for samsung)


Happy Learning :) 

Wednesday, May 11, 2022

System Design : Tinder as a microservice architecture

Requirements

Prioritized requirements
  • System should store all the relevant profile data like name, age, location and profile images.
  • Users should be recommended matches based on their previous choices.
  • System should store the details when a match occurs.
  • System should allow for direct messaging between two users if they have matches.
Requirements not a part of our design
  • Allowing moderators to remove profile.
  • Payment and subscriptions for extra features.
  • Allowing only limited number of swipes to users without subscription

Estimation

  • We will store 5 profile images per users.
  • We are assuming the number of active user is 10 million.
  • We are assuming the number of matches is 0.1% of total active users i.e., 1000 matches daily.

Requirement 1: Profile creation, authentication and storage

Description

First the system should allow a new user to create an account and once the account has been created then it needs to provide the user with an authentication token. This token will be used by the API gateway to validate the user.

System needs to store profile name, age, location and description in a relational database. However, there are 2 ways to store images.

  • We can store images as file in File systems.
  • We can store images as BLOB in databases.

Components required

  • API Gateway Service
    • It will balance load across all the instances.
    • It will interact with all the services
    • It will also validate authentication token of the users
    • It will redirect users to the required service as per their request.
  • Distributed File System to store images
    • We will use CDN to serve the static images faster.
  • Relational database to store user information
  • We will store the user ID, name, age, location, gender, description, (user preferences) etc.

Trade-offs

  • Storing images as File v/s Storing images as BLOB

          Features provided by database when store images as BLOB

      • Mutability : When we store an image as a BLOB in database we can change its value. However, this is useless as because an update on image is not going to be a few bits. We can simply create a new image file.
      • Transaction guarantee : We are not going to do an atomic operation on the image. So this feature is useless.
      • Indexes : We are not going to search image by its content (which is just 0's and 1's) so, this is useless.
      • Access control : Storing images as BLOB's in database provides us access control, but we can achieve the same mechanisms using the file system.
            Features provided by file system when store images as files
    • They are comparatively cheap.
    • They are comparatively faster because they store large files separately.
    • Files are static, so we can use CDN for faster access.
  • So for storing user images we will use Distributed File System

      Diagram

      Requirement 2: One to one Chat messaging

      Description

      System should allow one to one chat messaging between two users if and only if they have matched. So we have to connect one client to another.

      To achieve this we use XMPP protocol that allows peer to peer communication.

      Components required

      • Relational database

        • We will use this database to store the user ID and connection ID
      • Cache

        • We do not want to access database every time a client sends a message, so we can use a cache to store the user ID and connection ID.

      Trade-offs

      • Use of HTTP for chat v/s Use of XMPP for one to one messaging
        • When we use HTTP XMPP we can only message from client to server. The only way we can allow messaging is by constantly sending request to server to check if there is any new message (polling).
        • XMPP is a peer to peer protocol that allows client and server to send messages to each other.
        • As we do not need to constantly send requests to sever, using XMPP will be more efficient.

      Diagram

      Requirement 3: Matching right swiped user

      Description

      Server should store the following information

      • Users who have matched (both have right swiped each other)
      • One or both the users have left swiped each other.

      This service would also allow the chat service to check if the users are matched and then allow one to one chat messaging.

      Components required

      • Relational database

      • We will use this database to store the user IDs of both the user
      • We will use indexes on the user ID to make queries faster.

      Trade-offs

      • Storing match details on the client v/s Storing match details on the server

        • One benefit of storing the match details on the client we save storage on the server side. However, as we are storing only the user IDs it is not significant.
        • If we store match data on client side then all data is lost when user uninstalls the applications but if we store it on the server then the data is not lost.
        • Benefit of storing the details on the server side is that it becomes a source of truth. And as the details on the server cannot be tampered so, it is more secure.
        • So we store the relevant details on the server side

      Diagram

      Requirement 4: Server recommendations to users

      Description

      Server should be able to recommend profiles to users. These recommendations should take into consideration the age and gender preferences. Server should also be able to recommend profiles that are geographically close to the user.

      Components required

      • Relational database
      • We can do horizontal partitioning (sharding) on the database based on the location. Also, we can put indexes on the name and age, so we can do efficient query processing.
      • For every database partition we will have a master slave architecture. This will allow the application to work even if the primary database fails.

      Diagram

      Database design

      API Design

      Profile service
      • POST /user/signup - Creates new account
      • GET /user/login - Sends the user authentication token
      • GET /user/:userID - Gets the profile of the user ID
      • PUT /user/:userID - Update user details
      • DELETE /user/:userID - Removes the user account
      Session service
      • GET /session/users/:connectionID - Returns both the users that have the connection ID.
      • DELETE /session/connection/:connectionID - Deletes all the data that have the connection ID.
      • POST /session/connection/:userID1/:userID2 - Adds user ID1 and user ID2 with the same connection ID.
      Matcher service
      • GET /match - Return all the matches of the logged-in user.
      • DELETE /match/:userID - Deletes the user ID from the match list.
      Recommendation service
      • GET /recommendation - Returns a collection of most appropriate profiles for logged-in user.

      Happy Learning :) 

      My Profile

      My photo
      can be reached at 09916017317