
Saturday, September 11, 2021

Performance testing with Vegeta

Load testing is an important part of releasing a reliable API or application. Vegeta load testing will give you the confidence that your application will work well under a defined load. In this post, we will discuss how to use Vegeta for your load testing needs, with some GET request examples. As it is just a Go binary, it is much easier to set up and use than you might think. Let's get started.


What is Load testing?

Load testing, in plain terms, means testing an application by simulating concurrent requests to determine its behavior in a real-world-like scenario. Basically, it tests how the application will respond when multiple simultaneous users try to use it.

There are many ways to load test applications/APIs, and Vegeta is one of the easiest tools for performing load tests on your APIs or applications.

Prerequisites for this tutorial

Before jumping into the main topic, let's look at some prerequisites:

  • You are comfortable using the command line (installing and executing CLI apps)
  • Your application/API is deployed on a server (staging/production) so you can test it. Local tests are fine too, but they might not give an accurate picture of how the server will behave under load.
  • You have some experience with load testing (maybe you have used Locust or JMeter in the past)

Alternatives and why Vegeta

Load testing can be done in multiple ways, and there are many SaaS offerings for it too. Still, locally installed tools are a great way to load test your application or API. I have used Locust in the past; its setup and execution are not as easy and straightforward as Vegeta's.

Another option is to go with JMeter. Apache JMeter is a fully featured load testing tool, which also means you need to learn its concepts and climb a fairly steep learning curve.

Vegeta is a Go binary (and library), so installing and using it is a breeze. There are not many concepts to understand and learn.

To start, you simply provide a URL and tell Vegeta how many requests per second you want the URL to be hit with. Vegeta will hit the URL at the rate provided and can report the HTTP response codes and response times in an easy-to-comprehend graph.

The best thing about Vegeta is that there is no need to install Python or Java to get started. Next, let's install Vegeta to begin load testing.

Install Vegeta

Let us look at how Vegeta officially defines itself:

Vegeta is a versatile HTTP load testing tool built out of a need to drill HTTP services with a constant request rate. It can be used both as a command-line utility and a library.

The easiest way to begin load testing with Vegeta is to download the right executable from its GitHub releases page. At the time of writing, the current version is v12.8.3.

Install on Linux

If you are on 64-bit Linux, you can get Vegeta working with the following set of commands:

cd ~/downloads

wget https://github.com/tsenart/vegeta/releases/download/v12.8.3/vegeta-12.8.3-linux-amd64.tar.gz

tar -zxvf vegeta-12.8.3-linux-amd64.tar.gz

chmod +x vegeta

./vegeta --version

If you want to execute Vegeta from any path, you can symlink it into a directory on your PATH with a command like ln -s ~/downloads/vegeta ~/bin/vegeta; it will then work in a new CLI tab.

Install on Mac

You can also install Vegeta on a Mac with the following command:

brew update && brew install vegeta

If you already have Go installed on your machine and GOBIN in your PATH, you can also fetch it with go get to start your Vegeta load testing journey:

go get -u github.com/tsenart/vegeta

Check if it installed properly with:

vegeta --version

You should see a version number displayed.

Your first Vegeta load testing command

There are multiple ways to use the Vegeta load testing tool; one of the simplest is to get the output on the command line for faster analysis. To run your first Vegeta load testing command, execute the following:

echo "GET http://httpbin.org/get" | vegeta attack -duration=5s -rate=5 | vegeta report --type=text

So what just happened here?

  1. We echoed the URL, in this case httpbin.org/get, and passed it to vegeta attack
  2. vegeta attack is the main command; it ran the Vegeta load test at 5 requests per second for 5 seconds
  3. The last, but equally important, command executed was vegeta report, to show the report of the attack as text.

You can see a sample output below:

Text output of 5 RPS for 5 seconds

The Vegeta load testing tool ran an attack of 25 requests spread over 5 seconds at 5 RPS. The minimum response time was 240 ms and the maximum was 510 ms, with a 100% success rate, meaning all the requests came back as a 200. Next, let's have a look at how we can get a more graphical output.
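If the screenshot does not render for you, the text report for such a run looks roughly like the below (illustrative values consistent with the numbers above; the exact fields vary slightly between Vegeta versions):

Requests      [total, rate]                     25, 5.21
Duration      [total, attack, wait]             5.05s, 4.8s, 250ms
Latencies     [min, mean, 50, 90, 95, 99, max]  240ms, 320ms, 300ms, 450ms, 480ms, 505ms, 510ms
Bytes In      [total, mean]                     6400, 256.00
Bytes Out     [total, mean]                     0, 0.00
Success       [ratio]                           100.00%
Status Codes  [code:count]                      200:25
Error Set: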

Vegeta Load testing with graphical output

Another representation of Vegeta load testing results is an easy-to-understand graph. We can get a graph output with the command below:

cd && echo "GET http://httpbin.org/get" | vegeta attack -duration=30s -rate=10 -output=results-veg-httpbin-get.bin && cat results-veg-httpbin-get.bin | vegeta plot --title="HTTP Bin GET 10 rps for 30 seconds" > http-bin-get-10rps-30seconds.html

Let’s analyze how we used Vegeta for load testing httpbin.org here:

  1. We went to the user home directory with the cd command
  2. Then we set up the URL for vegeta attack by echoing GET http://httpbin.org/get
  3. This step is where we “attack” (a.k.a. load test) the httpbin servers at 10 requests per second for a 30-second duration (so 300 requests in total); we also specified that we want the output in the results-veg-httpbin-get.bin file
  4. This result is a binary file that can’t be read easily, so next we read its contents with cat and pass them to vegeta plot, with a fancy title and filename, to get an HTML file
  5. When we open the created HTML file, we can see a graph like the one below:
Graph output of 10 RPS for 30 seconds with Vegeta

So we sent 300 requests and all of them came back with a 200. The maximum response time was 552 milliseconds and the fastest was 234 milliseconds. This gives us a clear picture that httpbin can easily handle 10 requests per second for 30 seconds.

I would advise you not to run this too many times; httpbin.org might block your IP, thinking you are DDoSing their system.

By now you get the idea of how to use Vegeta for load testing your own services.

My service uses an Auth token

Not all services are open to everyone; most will use a JWT or some other way to authenticate and authorize users. To test such services, you can use a command like the one below:

cd && echo "GET http://httpbin.org/get" | vegeta attack -header "authorization: Bearer <your-token-here>" -duration=40s -rate=10 -output=results-veg-token.bin && cat results-veg-token.bin | vegeta plot --title="HTTP Get with token" > http-get-token.html

This example uses the same pattern as the one above; the main difference is the -header parameter in the vegeta attack command.

If you want to test an HTTP POST with a custom body, please refer to the Vegeta docs. It is usually enough to load test the GET APIs, unless you have a write-heavy application/API.
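That said, for a quick idea of the shape, a POST target in a Vegeta targets file looks something like the below, where payload.json is a hypothetical file containing the request body:

POST http://httpbin.org/post
Content-Type: application/json
@./payload.json

You would then run vegeta attack with --targets pointing at that file, exactly as in the multiple-URLs example coming up next.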

How do I load test multiple URLs?

Testing multiple URLs with different HTTP methods is also relatively easy with Vegeta. Let’s have a look at this in the example below with a couple of GET requests:

  1. Create a targets.txt file (the filename can be anything) with content like the below: a list of your URLs, each prefixed by the HTTP verb. In the one below I am load testing two GET URLs:

GET http://httpbin.org/get

GET http://httpbin.org/ip

  2. Now, similar to the first example with the text output, run this command in the folder where the targets.txt file was created: vegeta attack -duration=5s -rate=5 --targets=targets.txt | vegeta report --type=text
  3. We will see a text output like the one below:
Text output of multiple GET URLs with Vegeta

As we have seen, load testing multiple URLs with Vegeta is a breeze. Vegeta load testing can easily be done with other HTTP verbs like POST and PUT; please refer to the Vegeta docs.

Conclusion

This post has only scratched the surface with a primer on load testing with Vegeta; there are many advanced things that can be done with it. Vegeta has been very useful to me on multiple occasions. I once used it to load test Google Cloud Functions and Google Cloud Run with the same code, to see the response time difference between the two for a talk. The graph comparing both services made the difference crystal clear.

In another instance, we tested a new public-facing microservice that was replacing a part of an old monolith. Vegeta load testing was very useful for understanding the response time difference at similar requests-per-second loads.

Load testing the application or API you want to go to production with is crucial.

We once had to open up an API to a much higher load than it would normally get. Our load testing with Vegeta really helped us determine the resources and level of horizontal scaling the API would need to work without issue.

All thanks to Vegeta, it was much easier than it would have been using another tool or service.

Thursday, September 2, 2021

NFR Template/Checklist for JIRA


To make NFRs part of a predefined template/checklist, we came up with a few critical points to start with; the checklist is auto-populated whenever someone creates a story in the project.

The idea is to push NFRs into the initial phases, like design and development, and then as a cross-check into QA. Beyond the predefined template/checklist, anyone can work on other points too; a fuller checklist has already been published in Confluence under Guidelines. Having the predefined checklist in each story ensures we have NFR discussions alongside the functional ones for anything delivered to production.


The checklist below is grouped by NFR area, with checklist points and comments where applicable.

Logging
  • Have we ensured we are not logging access logs? (Access logs are the request logs containing the API path, status code, latencies and other information about the request. We can avoid logging these since we already have this information in the istio-proxy logs.)
  • Have we ensured we didn't add any secrets (DB passwords, keys, etc.) to the logs?
  • Have we ensured that the payload gets logged in the event of an error?
  • Have we ensured that the logging level can be dynamically configured?
  • Have we ensured that the entire sequence of events in a particular flow can be identified using an identifier like an orderId? (The logs added should be meaningful enough that anyone looking at them, regardless of whether they have context on the code, can understand the flow. For new features, it may be important to log at info level to help ensure the feature is working as expected in production; once we are confident it is, we can change these logs to debug unless still required. Devs can take a call based on the requirement.)
  • Have we ensured that we are using logging levels diligently?

Timeouts
  • Have we ensured that we have set a timeout for database calls?
  • Have we ensured that we have set a timeout for API calls?
  • Have we ensured that timeouts are derived from dependent component timeouts? (An API might internally depend on a few other components: APIs, DB queries, etc. It is important that the overall API timeout is set after careful consideration of the dependent component timeouts.)
  • Have we ensured that we have set an HTTP timeout? (Today, in most of our services we set timeouts at the client, the caller. But we should also start setting timeouts for requests on the server, the callee. That way we kill the request on the server if it exceeds a timeout, regardless of whether the client closes the connection.)

Response Codes
  • Have we ensured that we are sending 2xx only for successful scenarios?
  • Have we ensured that we are sending 500 only for unexpected errors (excluding timeouts)?
  • Have we ensured that we are sending 504 for a timeout error?

Perf
  • Have we ensured that we perf test any new API we build, to get a benchmark we can compare against expectations and track going forward? (As part of the perf test we should identify the parameters below, plus any other additional info as needed: the max number of requests a pod can handle with the allocated resources; CPU usage; memory usage; response times.)
  • Have we ensured that we perf test existing APIs when there are changes around them, to make sure we didn't impact the existing benchmark results?

Feature Toggle
  • Have we ensured that we have a feature toggle for new features, so we can go back to the old state at any given point until we are confident in the new changes? We may need toggles such that a feature is enabled only for specific users or a specific city.

Resiliency
  • Have we ensured that we are resilient to failures of dependent components (databases, services)?

Metrics
  • Have we ensured that we are capturing the right metrics in Prometheus? (Some metrics that could be captured based on need or criticality: business metrics, e.g. the number of payment gateway failures; business logic failures, e.g. the number of rider prioritization requests that failed; any other errors that would help assess the impact in a critical flow.)

Security
  • Have we ensured that the right authentication scheme is active at the gateway level? (Applicable when adding any endpoint on Kong, the gateway: one of the authentication plugins (jwt, key-auth, basic-auth) must be defined at either the route level or the service level; for gateway Kong endpoints, the acl plugin must be added and the same group must be present on the consumer definition.)
  • Have we ensured that proper rate limiting is applied at the gateway level? (Applicable when adding any endpoint on Kong. Team leads are the code owners, so one of them has to check this when approving the PR; the rate-limiting plugin needs to be enabled at the route/service level in the PR raised against kong-config.)
  • Have we ensured that we are retrieving the userId from the JWT? (If the request comes through Kong, the userId in the request body should be matched against the headers. For fetching any user-related information, we must read the userId only from the header populated by Kong, x-consumer-username.)

 


This checklist would be populated in all Jira stories across projects as a predefined NFR checklist, as shown in the screenshot below.




Friday, July 23, 2021

How to do NFR (Non Functional Requirement) Testing

 

What is NFR testing?

NFR testing validates that your system meets all the non-functional requirements (e.g. concurrent requests, transactions per second, response times, sub-service failures, etc.) expected of it.

Why is NFR testing so important?

Today, applications and the ecosystems in which they run have changed drastically. Older applications used to run in closed, barely distributed, named-('pet')-system environments where everything was most probably within your control. Now, with the arrival of the cloud and microservices, the ecosystem has changed drastically, and the importance of testing NFRs has risen considerably.

Why should you care?

I think you should care because:

1) It’s important - actually very important : If a system fulfills all its functional requirements but fails just one critical NFR (be it related to capacity, speed, security, availability, scalability, etc.), then that system will soon become outdated or fail to deliver at all in today’s fast-paced digital world.

2) It’s free : Free in the sense that you don’t have to invest heavily (in tools or manpower) to do NFR testing. It’s best done by the developers themselves, with freely available open source tools.

3) It’s fun : It is fun for developers to test the system they built: to see how it performs in the real world, and to fine-tune, improve and re-engineer their code before it’s too late.

NFR testing steps

Now that you know what NFR testing is, its importance, and why you should care about it, let me explain how you might do it.

0) Be Agile - Don’t wait till the end.

NFR testing should be planned from the beginning of the project as NFRs can have a big effect on the coding/architecture of your application.

Suppose you have an NFR which states that your application should handle very high traffic, say 1000 requests per second. If you start NFR testing from the beginning of the project, you may learn early in the development cycle whether your application architecture can handle it. This may lead to changes in the code or design, or the adoption of some coding practice which lets you achieve it. For example, you may have to use thread pools to minimize the time spent creating and spawning new threads. You may also use multicasting and aggregation patterns, or even swap out one framework for another to achieve better response times.

Say another NFR states that the system should not be overloaded in case of failure in any one component of the system. Again, if you test this NFR from the beginning, you can find out whether your system can cope with this requirement or not. You may decide to build your application to fail fast in case of any error, using the Hystrix circuit breaker from Netflix.

The above examples clearly show that if we are Agile (and early) with our NFR testing, it can help verify the coding approach / architecture of our application and let us change it, if required, early in the development cycle, thereby saving a lot of time and money later.

1) Plan

Being Agile in your NFR testing is not enough - you have to plan for it properly. Planning is important as it will help you decide which NFR tests to run when you get constrained by time or resources. Believe me - you will. When planning your NFR testing, consider the following points to get the most out of it:

a) Be requirement-driven - Locate all NFRs and map them to your tests.

To better understand this, when creating an NFR test plan make sure you map all your NFR tests with the corresponding NFRs.

Suppose you created a test in which you are going to verify that, in the circumstance where one of your sub-services fails, your system fails fast without overloading. This is a really important and good test, as it validates your system's resilience in case of failures, and also how your system resources are used in such a scenario.

But if you can map this to a corresponding NFR which states ‘Failure to connect or receive a response from the Sub-Service is handled gracefully, logged, and reported back to the consuming application within X seconds of the call’, then not only will the test achieve the goals above, it will also help you provide your stakeholders with statistics and data for the corresponding NFR, boosting their confidence in the system being built.

b) Be fear driven too – If something is new, test the hell out of it

Suppose you have a system which uses MongoDB 2.6 as the database and you’re now told to upgrade to MongoDB 3.0. In this scenario MongoDB 3.0 is new to your solution and may impact you in unknown ways - most probably in a positive way, but you don’t know. There will always be concern/fear about the effect of this change. You should address this concern/fear with high priority. It is therefore better to run all NFR tests related to your database before other, unrelated NFR tests such as sub-service failure.

c) Prioritize

When you define your NFR tests, make sure you prioritize them too, as you may/will be constrained by time and manpower.

Let's say you have three NFR tests:

  1. DB failover and recovery
  2. Handling transactions at 100tps
  3. Graceful handling of sub-service failure.

Now, taking the above example, since your last test you have changed your database from MongoDB 2.6 to MongoDB 3.0. Suppose that when the system was built all the NFRs were “MUST”, but in this scenario you may not have enough time to re-run them all, so you have to prioritize. In this example, it is clear that we “MUST” run NFR tests 1 and 2, as they specifically depend on the database. Test 3 can be termed nice-to-have: as that part of the system has not changed, the test is less likely to be affected.

NFRs can also be prioritized with respect to the environment you are running them in. Suppose you have three database-related NFRs:

  1. DB failover and recovery
  2. DB Handling transaction at 100tps
  3. Data center failure and recovery.

Now, NFR 1 and 2 were already tested on the Dev and Test environments, but NFR 3 was not, as those environments don’t have the infrastructure to support it. Pre-prod does have this infrastructure, so when doing NFR testing on pre-prod, you should prioritize NFR 3, as it is the one you have never tested before and it has high significance in production.

2) Setup

Setting up the proper environment for your NFR testing is very important; a wrong setup may nullify/invalidate your testing or create doubt around the test results. Consider the points below when setting up your NFR test environment.

a) Only prod-like truly counts - in Config, Data, Platform and Monitoring

When setting up your NFR test environment, you should make sure it has the same configuration, physical architecture and monitoring as you will have in production (or as close to it as you can get to tell you what you want to know). If any aspect of this (CPU, RAM, network settings/config) deviates from the production configuration, then the results you get from your tests will be skewed in some way and be less valid when you present them to your stakeholders.

Take the example of an NFR where your system is supposed to handle 100tps. Perhaps you set up your environment with 2 CPUs of 4 cores each, but your production box has 4 CPUs with 8 cores. In this circumstance, any test you run can have two outcomes:

  1. you will get good results (system meets or exceeds your NFR target)
  2. you will get bad results (system fails to meet your NFR target).

In the case of a good result, you should be happy that you have written very optimized code which works well on limited hardware. But if you get a bad result, you may unnecessarily suspect that there is some issue with your application, and waste time and manpower optimizing code which is already optimized - the result was skewed by a hardware limitation.

Also, if you reverse the scenario and use a superior configuration in your test environment compared to the production box, you might gain a false sense of security. You can easily imagine the impact.

b) N.b. Run-state - Normal? Peak Load? Failure/DR scenario? Latency?

A system can operate under different conditions and loads. You should ideally set up your test environment to simulate all possible conditions (normal load, peak load, failure scenarios, network latency, etc.). Environment setup may also include creating data, tests and scripts to simulate these conditions. You should note down the run-state as part of the NFR result. Run your NFR tests under all such conditions to prove the predictable functioning of the system.
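As an example, on Linux one common way to simulate network latency between your test client and the system is tc with netem (requires root; the eth0 interface and the 100ms delay below are assumptions you should adjust for your setup):

# add 100ms of delay to every packet leaving eth0
sudo tc qdisc add dev eth0 root netem delay 100ms

# remove the delay again once the run is recorded
sudo tc qdisc del dev eth0 root netem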

c) Be repeatable - config, dataset, request load, and scenario steps

You should ensure that the test environment setup is easily configurable and repeatable. If your test requires some data to be preloaded into the system, then that data should be easy to re-create and should be available to anyone going to run the test. All configuration files should be stored in a central repository so that anyone running the tests can access them as and when required.

3) Perform - Run your NFR tests

The next obvious thing after planning, prioritizing your test runs and setting up the test environment is to actually run the tests. Again, there are several aspects to this:

a) Request Mix - Use case & background, success & failure

When running your NFR tests, make sure they reflect the real use cases as closely as possible. Suppose your system caters for four types of request (add, update, delete and read). The nature of each of these requests is different, and can have a significant effect on overall response times. Now suppose your test is to validate the NFR that you must support 100tps; your NFR test should not just send one type of request at 100tps, but a mix of all four request types. You should also take care of the percentage of each request type in the mix (e.g. 20% add, 5% update, 1% delete and 74% read - you can get this info from your product owner, a user or a business analyst). You should also consider mixing bad and invalid requests into your test load, to simulate the real-life scenario where not everything is going according to plan elsewhere.

There may also be a test that sends only successful requests; in that scenario too, you should pass all four types of request with the agreed percentage for each type.

This brings up a key point: request mix is very important, as it doesn’t just validate that your system can handle all sorts of requests, but also shows how your system behaves when all sorts of requests arrive at the same time. A lack of request mix can skew your results and give a false impression of high performance. For example, if you just send read requests, the response times will be quick, but if you send all add requests, the response times will be longer than for reads. And if you add in some bad requests, with the resultant error handling and compensatory transactions, you get a very different picture.

So, take care of the request mix when creating the test cases for your NFRs.
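If you load test with a tool like Vegeta (covered in an earlier post on this blog), one crude way to approximate a request mix is to repeat entries in the targets file in proportion, since Vegeta cycles through the targets. The URLs and the body file below are hypothetical:

GET https://your-service/items/1
GET https://your-service/items/2
GET https://your-service/items/3
POST https://your-service/items
Content-Type: application/json
@./add-item.json

Here roughly three out of every four requests are reads and one is an add; the same idea extends to updates, deletes and a few deliberately invalid requests.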

b) Load - N.b Req/sec & number of users

Again, the transactions per second and the number of concurrent users have a huge impact on the performance of the system, so take care with them and make sure you run your test at the same transactions per second specified in the NFRs, or at what you expect in production. Suppose you know that your system will be hit with at most 6tps in production (tps being controlled by your load balancer); then you should run your test at 6tps only.
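With a tool like Vegeta (from the earlier post), pinning a test to exactly that rate is a one-liner; the URL below is a placeholder for your own service:

echo "GET https://your-service/api/orders" | vegeta attack -rate=6 -duration=60s | vegeta report --type=text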

c) Events - Fail/Slow down & Recover/Speed up.

You will have NFRs related to system failure/recovery or slowdown/speedup. So, your tests should also be run under conditions of sub-system failure and recovery, and system slowdown and speedup, to cover these NFRs.

4) Record

So you’ve run your tests and have your results - you’re done, right? Well, not really. Recording your NFR test results is a must; without it you will never be able to prove your system's performance, or go back to the results in the future when you learn something new. The following are some basic guidelines on what you should record in your NFR findings, and how.

a) Measure - Throughput and Response time (inc. the percentiles).

The key elements to measure in your NFR testing are usually throughput and response time, and to view these in relation to the 95th and 99th percentiles. Percentiles are important (especially in a microservice-based system), so it's important you know how they work - for example, a p95 of 800ms means 95% of requests completed in 800ms or less. A good introduction can be found on Wikipedia.
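To make the mechanics concrete, here is a crude way to pull a p95 out of a plain file of response times (one value in milliseconds per line; the file name is hypothetical):

# sort the latencies and pick the value 95% of the way through the list
sort -n response-times-ms.txt | awk '{a[NR]=$1} END {print "p95 =", a[int(NR*0.95)], "ms"}'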

b) Write up using templates - Recording the same thing, in the same way, every time

The best way to record test results is to use templates. Create a generic template which covers what you want to capture, and use it whenever recording the results. Templates give many benefits, a few of which are:

  • create once, use many times
  • they help to easily compare the results
  • you get consistency
  • they are easy to understand

Recording your NFR test results is important, and it is most important to record the same measured variables and environment settings after each run (e.g. throughput, response time, JVM and CPU usage, HTTP and Tomcat thread pool sizes, and response codes). If you don’t record the same set of parameters on each run, it becomes hard to validate the test results and see any improvement or side effect after each new release of code.

A practical example: for an NFR test of 100req/sec, you ran a first test and captured the throughput, response time, and CPU and memory usage, getting 70tps, a 2000ms average per request, 30% average CPU usage and 20% average memory usage. To improve the throughput, you decided to use more HTTP and Tomcat threads. This time you captured only the throughput and response times (100tps and 1800ms). This looks good - you have achieved your NFR of 100tps. But you have not captured the CPU or memory information. To your surprise, the CPU usage was very high at 90% and memory usage was almost 100%, which is not a desirable state for a system under normal load. Therefore, your approach of increasing the thread pool was not as effective as you expected, and this can trip you up later when you scale even more.

So remember to capture the same set of data every time, and have a fixed template to capture it.
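A low-tech way to enforce this is a small script that appends the same set of numbers after every run. The file names below are assumptions, and the vegeta report line only applies if you load test with Vegeta as in the earlier post; swap in your own tool's report command otherwise:

# capture the same evidence after every run
RUN=run-01
date                                       >> nfr-$RUN.txt   # when the run happened
vegeta report --type=text results-$RUN.bin >> nfr-$RUN.txt   # throughput, latencies, status codes
top -b -n 1 | head -n 15                   >> nfr-$RUN.txt   # CPU and memory snapshot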

c) Save Evidence - logs, Graphs etc.

It is also important to capture the NFR test results in the raw, as this is the data used to validate your findings. There is no point in just writing on a wiki page that you ran the test and achieved a throughput of 100tps, with response times under 1 second and no errors, unless you can back it up with some sort of data (logs, graphs, etc.).

If you can capture logs and graphs from a test tool such as JMeter, this is strong evidence that can be very helpful in supporting your conclusions.

d) Summarise - For the TL;DR crew

In the end, you should write a summary of the NFR test results for your TL;DR crew. If they can understand and agree with the NFR test results, then most probably your stakeholders will too.

5) Investigate

One of the most important tasks of NFR testing is not just to run the tests and capture the data, but also to capture all failures and errant behaviours of the system and attribute an explanation to each of them.

a) Attribute all errors - write them up too

Every system will throw errors at some point, but if we don’t have an explanation for each, then the system is unreliable. So you should make sure that not only do you capture the errors, but also that you attribute a proper explanation to each of them. Without a valid explanation for failures, your NFR testing becomes invalid. If you don’t have any explanation for an error or errant behaviour of the system, then you’ll need to investigate. This may lead to code changes, documentation updates or simply a deeper understanding of the system. More importantly, it can lead to finding bugs in the system, or at least tuning of the logs being captured. Again, going back to the example of the 100tps NFR: when running this test, one of your requests failed. If you don’t provide any explanation for this error, your NFR test becomes invalid. But if you check the logs and find an error related to a “tcp connection timeout”, then you can attribute the failure to this error and proceed.

b) Re-run - only changing one thing at a time

If your investigation requires it, you should re-run your test to validate your findings. It is good practice to raise the logging level when investigating. When it comes to configuration changes, make sure you change only one parameter at a time, to verify the effect of each change. Suppose that, to raise the capacity from an observed 75tps to 100tps, you changed the default thread pool size from 10 to 20 and also modified the memory setting from 200mb to 500mb; you have modified two things. Now suppose that after this change you were able to get 100tps. What will you do - keep 20 as the default thread pool size, or keep the memory at 500mb? You can’t decide until you know the effect of each change individually. So the best way is to make one change at a time and re-run your test to verify the effect of the change made.

6) Improve

NFR testing is not just there to check that your application meets the NFR specification, but also to improve your application. NFR testing always leads to finding bottlenecks in a system, and helps to improve it. When improving the system, we should use the pointers below.

a) Tune - pool sizes, timeouts, queries, caches, indexes etc.

When improving/tuning the system to meet the NFR specification, you should start with configurable settings such as thread pool sizes, timeouts, queries, caches and indexes. A code change should be among your last options, unless you can clearly associate an issue with the code.

b) N.b. Tune the slowest step.

When it comes to tuning, it is always good to find the slowest bit and tune that. If you are able to fine-tune it, then most probably other bits will improve too. This is not a universal law - sometimes speeding up one step can flood downstream elements, which in turn has an even more detrimental effect on performance. The best way to find the slowest bit is to view the response time graphs for each request; through them you can easily figure out which service is taking the most time, and start tuning there.

c) Fix - Bugs, slowness, weakness etc.

When running NFR tests, if you find an error, either there will be a valid explanation for it or there will be some bug causing it. In the latter case, fix it. Similarly, if your system is slow and does not meet your NFR specification, look at improving the system as explained in 6.a.

d) N.b your NFR target - Don’t get carried away if NFRs == met, break

The key thing to note when improving the system is: if your system is able to meet your NFRs, do not over-optimize it. Suppose that when running an NFR test at 30tps you were getting Hystrix thread-denied errors with a thread group size of 10. You changed the thread group size to 15 and it worked fine. There is no need to then test whether it also works with a thread group size of 11, 12 or 13 - that is unnecessary.

7) Goto 0

If there are still some NFRs left to test, repeat the cycle.
