
Saturday, September 11, 2021

Performance testing with Vegeta

Load testing is an important part of releasing a reliable API or application. Load testing with Vegeta gives you confidence that the application will work well under a defined load. In this post, we will discuss how to use Vegeta for your load testing needs with some GET request examples. Because it is just a Go binary, it is much easier to set up and use than you might think. Let's get started.


What is Load testing?

In plain terms, load testing means testing an application by simulating concurrent requests to determine how it behaves in a real-world-like scenario. Basically, it tests how the application will respond when multiple users try to use it simultaneously.

There are many ways to load test applications/APIs and Vegeta is one of the easiest tools to perform load testing on your APIs or applications.

Prerequisites for this tutorial

Before jumping into the main topic, let’s look at some prerequisites:

  • You are good with using the command line (installing and executing CLI apps)
  • Your application/API is deployed on a server (staging/production) to test it. Local tests are fine too, but they might not give an accurate picture of how the server will behave under load.
  • You have some experience with load testing (you may have used Locust or JMeter in the past)

Alternatives and why Vegeta

Load testing can be done in multiple ways, and there are many SaaS offerings for it too. Still, locally installed tools are a great way to load test your application or API. I have used Locust in the past; its setup and execution are not as easy and straightforward as with Vegeta.

Another option is to go with JMeter. Apache JMeter is a fully-featured load testing tool, which also means learning its concepts and facing a steep learning curve.

Vegeta is a Go binary (and library), so installing and using it is a breeze. There are not many concepts to understand and learn.

To start with, simply provide a URL and the number of requests per second you want the URL to be hit with. Vegeta will hit the URL at that rate and can show the HTTP response codes and response times in an easy-to-comprehend graph.

The best thing about Vegeta is that there is no need to install Python or Java to get started. Next, let’s install Vegeta and begin load testing.

Install Vegeta

Let us look at how Vegeta officially defines itself:

Vegeta is a versatile HTTP load testing tool built out of a need to drill HTTP services with a constant request rate. It can be used both as a command-line utility and a library.

The easiest way to begin load testing with Vegeta is to download the right executable from its GitHub releases page. At the time of writing, the current version is v12.8.3.

Install on Linux

If you are on a 64-bit Linux you can make Vegeta work with the following set of commands:

cd ~/downloads

wget https://github.com/tsenart/vegeta/releases/download/v12.8.3/vegeta-12.8.3-linux-amd64.tar.gz

tar -zxvf vegeta-12.8.3-linux-amd64.tar.gz

chmod +x vegeta

./vegeta --version

If you want to execute Vegeta from any path, you can add a symlink to it in a directory that is on your PATH with a command like ln -s ~/downloads/vegeta ~/bin/vegeta (this assumes ~/bin is on your PATH). It will then work in a new terminal tab.

Install on Mac

You can also install Vegeta on a Mac with the following command:

brew update && brew install vegeta

If you already have Go installed on your machine and GOBIN in your PATH, you can also install Vegeta with:

go get -u github.com/tsenart/vegeta

Check if it installed properly with:

vegeta --version

You should see a version number displayed.

Your first Vegeta load testing command

There are multiple ways to use the Vegeta load testing tool. One of the simplest is to get the output on the command line for faster analysis. To run your first Vegeta load testing command, execute the following:

echo "GET http://httpbin.org/get" | vegeta attack -duration=5s -rate=5 | vegeta report --type=text

So what just happened here?

  1. We echoed the URL, in this case httpbin.org/get, and piped it to vegeta attack
  2. vegeta attack is the main command; it ran the Vegeta load test at 5 requests per second for 5 seconds
  3. The last but equally important command was vegeta report, which shows the report of the attack as text.

You can see a sample output below:

Text output of 5 RPS for 5 seconds

The Vegeta load testing tool ran an attack of 25 requests spread over 5 seconds at 5 RPS. The minimum response time was 240 ms and the maximum was 510 ms, with a 100% success rate. This means all the requests came back with a 200. Next, let's look at how to get a more graphical output.

Vegeta Load testing with graphical output

Another representation of Vegeta load testing results is an easy to understand graph. We can get a graph output with the below command:

cd && echo "GET http://httpbin.org/get" | vegeta attack -duration=30s -rate=10 -output=results-veg-httpbin-get.bin && cat results-veg-httpbin-get.bin | vegeta plot --title="HTTP Bin GET 10 rps for 30 seconds" > http-bin-get-10rps-30seconds.html

Let’s analyze how we used Vegeta for load testing httpbin.org here:

  1. We went to the user's home directory with the cd command
  2. Then we set up the URL for vegeta attack by echoing GET http://httpbin.org/get
  3. In this step we “attack” (a.k.a. load test) the httpbin servers at 10 requests per second for a duration of 30 seconds (so 300 requests in total). We also specified that we want the output in the results-veg-httpbin-get.bin file
  4. This result is a binary file that can’t be read easily, so next we read its contents with cat and passed them to vegeta plot with a fancy title, redirecting the output to an HTML file
  5. When we open the created HTML file, we can see a graph like the one below:
Graph output of 10 RPS for 30 seconds with Vegeta

So we sent 300 requests and all of them came back with a 200. The maximum response time was 552 milliseconds, and one of the fastest was 234 milliseconds. This gives us a clear picture that httpbin can easily handle 10 requests per second for 30 seconds.
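
The request totals in both examples follow directly from the rate multiplied by the duration. A minimal shell sketch of that arithmetic is shown here; expected_requests is a hypothetical helper name used only for illustration and is not part of Vegeta:

```shell
# expected_requests RATE DURATION_SECONDS
# Prints how many requests a constant-rate attack sends in total.
# Hypothetical helper for illustration; it is not a Vegeta command.
expected_requests() {
  rate="$1"
  duration="$2"
  echo $(( rate * duration ))
}

expected_requests 5 5     # first example (-rate=5 -duration=5s): prints 25
expected_requests 10 30   # graph example (-rate=10 -duration=30s): prints 300
```

Working out the total up front is a quick sanity check against the request count shown in the report.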

I would advise you not to try this too many times, as httpbin.org might block your IP thinking you are DDoSing their system.

By now, you should have a general idea of how to use Vegeta for load testing your own services.

My service uses an Auth token

Well, not all services are open to everyone. Most will use a JWT or some other way to authenticate and authorize users. To test such services, you can use a command like the one below:

cd && echo "GET http://httpbin.org/get" | vegeta attack -header "authorization: Bearer <your-token-here>" -duration=40s -rate=10 -output=results-veg-token.bin && cat results-veg-token.bin | vegeta plot --title="HTTP Get with token" > http-get-token.html

This example uses the same pattern as the one above. The main difference is the use of the -header parameter in the vegeta attack command.

If you want to test an HTTP POST with a custom body please refer to the Vegeta docs. It is best to test the GET APIs to know the load unless you have a write-heavy application/API.
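
As a rough sketch of what a POST attack could look like: vegeta attack accepts a -body flag naming a file whose contents are sent as the request body. The httpbin.org/post endpoint, the JSON payload and the post_attack function name below are illustrative assumptions, so check the Vegeta docs for the flags available in your installed version.

```shell
# Sketch only: endpoint, payload and function name are illustrative assumptions.
body_file="$(mktemp)"
printf '%s\n' '{"name": "load-test-user"}' > "$body_file"

# post_attack BODY_FILE
# Sends 2 POST requests per second for 5 seconds, using the file as the body.
post_attack() {
  echo "POST http://httpbin.org/post" |
    vegeta attack -body="$1" -header "Content-Type: application/json" \
      -duration=5s -rate=2 |
    vegeta report --type=text
}

# Defining the function sends nothing; run the attack explicitly with:
#   post_attack "$body_file"
```

The rate is kept deliberately low here because write endpoints usually have side effects on the system under test.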

How do I load test multiple URLs?

Testing multiple URLs with different HTTP methods is also relatively easy with Vegeta. Let’s have a look at this in the example below with a couple of GET requests:

  1. Create a targets.txt file (the filename can be anything) containing your URLs, each prefixed by the HTTP verb. In the one below I am load testing 2 GET URLs:

     GET http://httpbin.org/get
     GET http://httpbin.org/ip

  2. Now, similar to the first example with the text output, run this command in the folder where the targets.txt file was created: vegeta attack -duration=5s -rate=5 --targets=targets.txt | vegeta report --type=text
  3. We will see a text output like below:
Text output of multiple GET URLs with Vegeta

As we have seen, load testing multiple URLs with Vegeta is a breeze. Vegeta load testing can easily be done for other HTTP verbs like POST and PUT too. Please refer to the Vegeta docs for details.
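
When the list of URLs grows, the targets file can be generated instead of typed by hand. The sketch below assumes a hypothetical make_targets helper and reuses the httpbin URLs from this post; it only writes the file and does not start an attack.

```shell
# make_targets BASE_URL PATH...
# Prints one "GET <url>" line per path, in the format used by targets.txt.
# Hypothetical helper; the base URL and paths are examples only.
make_targets() {
  base_url="$1"
  shift
  for p in "$@"; do
    printf 'GET %s%s\n' "$base_url" "$p"
  done
}

targets_file="$(mktemp)"
make_targets "http://httpbin.org" /get /ip > "$targets_file"
cat "$targets_file"
```

The generated file can then be passed to the attack in the same way as before: vegeta attack -duration=5s -rate=5 --targets="$targets_file" | vegeta report --type=text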

Conclusion

This post was like scratching the surface with a primer on load testing with Vegeta. There are many advanced things that can be done with Vegeta load testing. Vegeta has been very useful on multiple occasions. I had once used Vegeta to load test Google Cloud Functions and Google Cloud Run with the same code to see the response time difference between those two for a talk. The graph comparing both the services made the difference crystal clear.

In another instance, we tested a new public-facing microservice that was replacing a part of an old monolith. Load testing with Vegeta was very useful for comparing response times under similar requests-per-second loads.

Load testing the application or API you want to go to production with is crucial.

We once had to open up an API to a much higher load than it would normally get. Our load testing with Vegeta really helped us determine the resources and level of horizontal scaling the API would need to work without issue.

Thanks to Vegeta, it was much easier than using another tool or service.

Thursday, September 2, 2021

API Automation Guidelines

 As automation engineers, we need to follow a few guidelines.

A few of the guidelines are listed below:

  • No code change in the master branch directly - work on feature branches

  • Build the project locally before raising a PR

  • Run the test(s) locally before raising a PR

  • There has to be at least 1 person who reviews a PR

    • Post your PR link on the Slack channel, tagging the concerned people. The reviewer merges the PR and updates the Slack thread with a comment

    • Reviewer has to ensure that the newly added tests are passing on the pipeline before merging

  • Ensure we add a proper commit message while committing any code

    • Example: “automated customer cancel in order flow” or “modified X to achieve Y”. Basically, write a meaningful commit message instead of just “commit”, “fixed”, etc.

  • A test method should be 40-50 lines long at most

    • Break it into private methods if needed

    • Name the test method such that there is NO need of documenting its behaviour - test method names should start with "verify******"

  • Do NOT let any PR stay open beyond 3-4 days. Either get it merged within this period or close the current one (if it is spilling over 3-4 days) and create another after a local rebase

  • Put all assertions in Test classes (use return in helper methods to get what needs to be compared for assertions)

  • Always add a message to assertions so it is logged upon a failure. It gives good context on the issue upfront in the report

  • Ensure the correct tags are attached to the scenarios/tests before raising a PR (Smoke, Regression, ServiceType)

  • Don’t use “System.out.println” in the code; use the TestNG logger only.

  • Add allure annotations properly so test reports can be used effectively.

  • Test your code with all negative cases. Avoid null pointer exceptions in your code.

  • Add logging for each API call (request call, request payload and response JSON are the minimum).

  • Add any other logging your test case needs so it is helpful for debugging later

  • Avoid adding redundant code and create a helper method instead.

  • Always add health check verification for the new APIs.
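
The commit message guideline above can be partly automated. The following is a minimal sketch under stated assumptions: is_meaningful_commit is a hypothetical helper, and both the list of rejected one-word messages and the three-word minimum are illustrative choices rather than team rules.

```shell
# is_meaningful_commit MESSAGE
# Returns 0 when the message looks descriptive, 1 otherwise.
# The rejected words and the 3-word minimum are assumptions for illustration.
is_meaningful_commit() {
  msg="$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')"
  case "$msg" in
    commit|fixed|fix|update|changes|wip) return 1 ;;
  esac
  # Require at least three words so the message says what changed.
  word_count="$(printf '%s' "$msg" | wc -w | tr -d ' ')"
  [ "$word_count" -ge 3 ]
}

if is_meaningful_commit "automated customer cancel in order flow"; then
  echo "accepted"
fi
```

A check like this could be called from a Git commit-msg hook, which receives the path of the message file as its first argument, so that weak messages are caught before the PR is raised.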

QA Process Guidelines

 As part of QA, we need to follow a few guidelines to deliver the product quickly and with quality.

A few of the guidelines are as follows:

  • QA Process

    • Add the Postman collection for new APIs to the existing collection as part of the sprint story (create a sub-task)

    • QA should review the PRD before picking up any sprint task and catch issues at that stage

    • Identify critical scenarios before dev starts coding and share your feedback upfront

    • Add test cases in JIRA and share them with dev/PM for review before a build is dropped to QA.

    • Ask the dev team to test the critical scenarios themselves and publish the test results.

    • Ask the dev team to publish the unit test report

    • Look at the wider impact of the sprint story from an application perspective instead of checking only the specific task

    • Create a checklist for production release

    • Each team member should monitor Crashlytics report after the release

    • Always add first-level analysis to the defect, including request/response, back-end and mobile application logs

    • Verify logs as part of sprint story testing. If logs have been added for the feature, they help to debug any issue with it

    • QA should obtain all the access needed to debug issues so they can do first-level analysis by themselves

    • Enhance the App Regression Suite with every release

  • Ask dev to verify the APK on their end for a few critical cases before sharing it with QA

  • QA should own deployment on staging/QA env

 

  • Prod Bugs RCA

    • Template to write RCA in JIRA

      • Description of the issue

        • Application Version

        • Android/iOS Version

      • Steps to reproduce

      • Impact

        • App crash impact

        • Orders impacted

      • The root cause of the issue (how it was introduced)

      • How we can avoid this or similar issues in the future

      • Time taken to reproduce the issue

      • Tag/Label issue type (Functional/Performance/Infra)

  • QA Involvement

    • Get a detailed understanding of the issue so we can tag the issue correctly

    • The QA of the POD/team should take responsibility for the issue raised

 

  • Automation

    • Each team member should follow automation guidelines while writing code

    • Identify smoke test suite to automate

    • Everyone in the team must automate test cases

    • At least 50% of sprint tasks must be automated

 

  • Postman Collection

    • Each team member is responsible for maintaining the Postman collection for their deliverables.

NFR Template/Checklist for JIRA


To turn NFRs (non-functional requirements) into a predefined template/checklist, we came up with a few critical points to start with. The checklist is auto-populated whenever someone creates a story in the project.

The idea is to push NFRs into the initial phases of discussion, such as design and development, with QA acting as a cross-check. Apart from the predefined template/checklist, anyone can work on other points too; checklists for those are already published in Confluence under Guidelines. Having a predefined checklist in each story ensures that we have NFR discussions alongside functional ones for every deliverable going to production.


NFR List, with Checklist Points and Comments (if any)

  • Logging
    • Have we ensured we are not logging access logs?
      • Access logs are the request logs containing the API path, status code, latencies and any other information about the request. We can avoid logging these since we already have this information in the istio-proxy logs.
    • Have we ensured we didn't add any secrets to the logs (DB passwords, keys, etc.)?
    • Have we ensured that the payload gets logged in the event of an error?
    • Have we ensured that the logging level can be configured dynamically?
    • Have we ensured that the entire sequence of events in a particular flow can be identified using an identifier like orderId or similar?
      • The logs added should be meaningful enough that anyone looking at them, regardless of whether they have context on the code, can understand the flow.
      • For new features, it may be important to log at info level to help ensure the feature is working as expected in production. Once we are confident that it is, we can change these logs to debug unless they are required. Devs can take a call based on the requirement.
    • Have we ensured that we are using logging levels diligently?

  • Timeouts
    • Have we ensured that we have set a timeout for database calls?
    • Have we ensured that we have set a timeout for API calls?
    • Have we ensured that timeouts are derived from dependent component timeouts?
      • An API might depend internally on a few other components (APIs, DB queries, etc.). It is important that the overall API timeout is set after careful consideration of the dependent component timeouts.
    • Have we ensured that we have set an HTTP timeout?
      • Today, in most of our services we set timeouts at the client (caller). But we should also start setting timeouts for requests on the server (callee). This way we kill the request on the server if it exceeds the timeout, regardless of whether the client closes the connection or not.

  • Response Codes
    • Have we ensured that we are sending 2xx only for successful scenarios?
    • Have we ensured that we are sending 500 only for unexpected errors (excluding timeouts)?
    • Have we ensured that we are sending 504 for a timeout error?

  • Perf
    • Have we ensured that we perf tested any new API we build, to get a benchmark that we can compare against expectations and track going forward?
      • We should identify the parameters below as part of the perf test, plus any additional info as needed:
        • Max number of requests a pod can handle with the allocated resources
        • CPU usage
        • Memory usage
        • Response times
    • Have we ensured we perf tested existing APIs if there are changes around them, to make sure we didn’t impact existing benchmark results?

  • Feature Toggle
    • Have we ensured that we have a feature toggle for new features, so we can go back to the old state at any point until we are confident of the new changes? We may need toggles that enable a feature only for specific users or cities.

  • Resiliency
    • Have we ensured that we are resilient to failures of dependent components (database, services)?

  • Metrics
    • Have we ensured that we are capturing the right metrics in Prometheus?
      • Below are some of the metrics that could be captured based on need or criticality:
        • Business metrics (example: number of payment gateway failures)
        • Business logic failures (example: number of rider prioritization requests that failed)
        • Any other errors that would help assess the impact in a critical flow

  • Security
    • Have we ensured that the right authentication scheme is active at the gateway level?
      • This is applicable when we are adding any endpoint on Kong (gateway).
      • One of the authentication plugins (jwt, key-auth, basic-auth) must be defined either at the route level or at the service level.
      • For Kong gateway endpoints, the acl plugin must be added and the same group must be present on the consumer definition.
    • Have we ensured that proper rate limiting is applied at the gateway level?
      • This is applicable when we are adding any endpoint on Kong (gateway). Team leads are the code owners, so one of them has to check this when approving the PR.
      • The rate limiting plugin needs to be enabled at the route/service level in the PR raised against kong-config.
    • Have we ensured that we are retrieving the userId from the JWT?
      • If the request is coming from Kong, the userId in the request body should be matched against the headers. For fetching any user-related information, we have to read the userId only from the header populated by Kong (x-consumer-username).
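
For the Timeouts items in the checklist above, one possible way to derive an overall API timeout from its dependencies is to add up the dependency timeouts plus a margin. This is only a sketch: api_timeout_ms is a hypothetical helper, the millisecond values are made up, and the sum applies only when the dependencies are called one after another rather than in parallel.

```shell
# api_timeout_ms MARGIN_MS DEP_TIMEOUT_MS...
# Sums the timeouts of dependencies that are called one after another and
# adds a margin for the service's own processing.
# Assumption: dependencies are called sequentially, not in parallel.
api_timeout_ms() {
  total="$1"
  shift
  for dep in "$@"; do
    total=$(( total + dep ))
  done
  echo "$total"
}

# Example: 200 ms margin, a DB query at 500 ms, a downstream API at 1000 ms.
api_timeout_ms 200 500 1000   # prints 1700
```

If the overall timeout were set lower than this sum, the caller could give up while a dependency is still within its own allowed time.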

 


It is populated in all Jira stories across projects as a predefined NFR checklist, as shown in the screenshot below.




Security Test Checklist - Cheatsheet

As part of the engineering team, especially when we are dealing with scale and are responsible for the quality of the end product we ship to the outside world, it becomes important to look at every deliverable from a security perspective.

Please follow the checklist below as part of your regular deliverables:

  • Broken Object Level Authorization

    • Let’s say a user generates a document with ID=322. They should only be allowed access to that document. If they specify ID=109 or some other ID, the service should return a 403 (Forbidden) error. To test this issue, what parameters can you experiment with? You could pass any ID in the URL, as part of the query parameters, or in the body (in XML or JSON). Try changing them to see what the service returns to you.

  • Broken User Authentication

    • Here, you can test whether the session token gets reassigned after each successful login or after the access level gets escalated in the application. In case your application removes or somehow changes the session token, check whether it returns a 401 error. We must not allow the session token for the next session to be predictable. It should be as random as possible.

  • Excessive Data Exposure

    • For example, you have an interface that displays four fields: First Name, Position, Email Address, and Photo. However, if you look at the API response, you may find more data, including some sensitive data like Birth Date or Home Address. The second type of Excessive Data Exposure occurs when UI data and API data are both returned correctly, but parameters are filtered on the front end and are not verified in any way on the back end. You may be able to specify in the request what data you need, but the back end does not check whether you really have permission to access that data.

  • Lack of Resources & Rate Limiting

    • A common approach is that the API should not accept more than N requests per second. However, this strategy is not quite correct. Even if one client generates more traffic than another, your API should stay stable for all clients.

      This can be resolved using special status codes, for example, 429 (Too Many Requests). Using this status code, you can implement some form of Rate Limiting. There are also special proprietary headers. For example, GitHub uses its X-RateLimit-*. These headers help regulate how many requests the client can send during a specific unit of time.

    • The second scenario is that you may not have enough parameter checks on the request. Suppose you have an application that returns a list of user types and accepts a parameter like size=10. What happens if an attacker changes this to 200000? Can the application cope with such a large request?

  • Broken Function Level Authorization

    • This is concerned with vertical levels of authorization: the user attempting to gain more access rights than allowed. For example, a regular user trying to become an admin. To find this vulnerability, you must first understand how various roles and objects in the application are connected. Secondly, you must clearly understand the access matrix implemented in the application.

  • Mass Assignment

    • Avoid providing convenient mass assignment functions (when assigning parameters in bulk).

  • Security Misconfiguration

    • What can you test here? First of all, unnecessary HTTP methods must be disabled on the server. Do not show any unnecessary user errors at all. Do not pass technical details of the error to the client. If your application uses Cross-Origin Resource Sharing (CORS), that is, if it allows another application from a different domain to access your application’s cookies, then these headers must be appropriately configured to avoid additional vulnerabilities. Any access to internal files must also be disabled.

  • Injections

    • In my opinion, modern frameworks, modern development methods, and architectural patterns block us from the most primitive SQL or XSS injections. For example, you can use the object-relational mapping model to avoid SQL injection. This does not mean that you need to forget about injections at all. Such problems are still possible throughout a huge number of old sites and systems. Besides XSS and SQL, you should look for XML injections, JSON injections, and so on.

  • Improper Assets Management

    • CI/CD pipelines have access to various secrets and confidential data, such as accounts used to sign the code. Ensure you do not leave hard-coded secrets in the code and don’t “commit” them to the repository, no matter whether it is public or private.

  • Insufficient Logging & Monitoring

    • The main idea here is that whatever happens to your application, you must be sure that you can track it. You should always have logs that show precisely what the attacker was trying to do. Also, have systems in place to identify suspicious traffic, and so on. We must also check for secrets, credentials and confidential information in the logs. We have seen cases where the username/password of a database was logged.
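
The check for secrets in logs mentioned above can start as a simple keyword scan. The sketch below makes several assumptions: find_secrets_in_log is a hypothetical helper, the keyword list is only a starting point, and the sample log lines are invented for the demonstration.

```shell
# find_secrets_in_log LOG_FILE
# Prints log lines (with line numbers) that look like they contain credentials.
# Returns 0 when at least one suspicious line is found, 1 when none are.
# The keyword list is an assumption; tune it for your own log format.
find_secrets_in_log() {
  grep -n -i -E '(password|passwd|secret|api[_-]?key|token)[[:space:]]*[=:]' "$1"
}

# Invented sample log used only to demonstrate the scan.
sample_log="$(mktemp)"
cat > "$sample_log" <<'EOF'
INFO order 322 created for user 42
DEBUG connecting to db with password=hunter2
INFO order 322 dispatched
EOF

find_secrets_in_log "$sample_log"
```

A scan like this only catches obvious key=value leaks, so it complements a review of what the code logs rather than replacing it.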
