Search This Blog

Thursday, September 2, 2021

NFR Template/Checklist for JIRA


To make NFR as predefined template/checklist, we came up with few critical points to start with and it would be auto-populated as and when someone creates any story to the project.

Idea is to pushing NFR in initial phase discussion like designing and developing and as a cross check goes to QA. Apart from predefined template/checklist, anyone can work on other points too for which checklist already been published in Confluence under Guidelines and having predefined checklist in each story would ensure we are having NFR discussions too along with functional towards any deliverables to production.


NFR ListChecklist_PointsComments if any
Logging
Have we ensured we are not logging access logs?Access logs represent the request logs containing the API Path, status code, latencies & and any information about the request. We can avoid logging this since we already have this information in the istio-proxy logs
Have we ensured we didn't add any sort of secrets in logs (DB passwords, keys, etc) ?
Have we ensured that payload gets logged in the event of an error ?
Have we ensured that logging level can be dyanamic configured ?
Have we ensured that entire sequence of events in particular flow can be identified using an identifier like orderId or anything- The logs added should be meaningful enough such that anyone looking at the logs, regardless of whether they have context on the code should be able to understand the flow.
- For new features, it maybe important that the logs are logged as info to help ensure the feature is working is expected in production. Once we have confidence that the feature is working as expected, we could change these logs to debug unless required. Devs could take a call based on the requirement.
Have we ensured that we are using logging levels diligently ?
Timeouts
Have we ensured that we have set a timeout for database calls ?
Have we ensured that we have set a timeout for API call ?
Have we ensured that timeouts are derived from dependent component timeouts ?An API might have dependencies on few other components (APIs, DB queries, etc) internally. It is important the overall API timeout is considered after careful consideration of the dependent component timeouts.
Have we ensured that we have set a HTTP timeout ?Today, in most of our services we set timeouts at the client (caller). But we should also start looking at setting timeouts for requests on the server (callee). This way we ensure we kill the request in the server if it exceeds a timeout regardless of whether the client closes the connection or not.
Response Codes
Have we ensured that we are sending 2xx only for successfull scenarios ?
Have we ensured that we are sending 500 only for unexpected errors (excluding timeouts) ?
Have we ensured that we are sending 504 for a timeout error ?
Perf
Have we ensured that we did perf testing of any new API we build to get benchmark of the same we can go as per the expectations and can track accordingly going forward ?
We should identify below parameters as part of the perf test & any other additional info as per need:
- Max number of requests a pod can handle with the allocated resources
- CPU usage
- Memory usage
- Response times


Have we ensured we did perf testing of existing APIs if there are changes around it to make sure we didn’t impact existing benchmark results ?
Feature ToggleHave we ensured that we have feature toggle for new features to be able to go back to the old state at any given point until we are confident of the new changes. We may need to have toggles like feature will be enabled for specific users or city ?
ResiliencyHave we ensured that we are resilient to failures of dependent components (database, services ) ?
MetricsHave we ensured that we are capturing the right metrics in prometheous ?Below are some of the metrics that could be captured based on need or criticality:
- Business metrics (example: number of payment gateway failures)
- Business logic failures (example: number of rider prioritization requests that failed)
- Or any other errors which would be important to help assess the impact in a critical flow could be captured as metrics.
Security
Have we ensured that right authentication scheme is active at the gateway level ?This is applicable when we are adding any end point on Kong(Gateway). 
- any of the authentication plugins (jwt,key-auth/basic-auth) must be defined either at the route level or on the service level
- for gateway kong end points, acl plugin must be added and same group must be present on the consumer definition.
Have we ensured that proper rate limiting applied at the gateway level ?This is applicable when we are adding any end point on Kong(Gateway).Team leads are the code owners, so one of them have to check this when approving the PR. 
- rate limiting plugin needs to be enabled on the route / service level on the PR raised against kong-config. 
Have we ensured that we are retreiving the userId from JWT ?if requests is coming from kong, userid in requestbody should be matched with headers. Or for fetching any user related information, we have to read the userId only from the header populated by kong (x-consumer-username).

 


It would be populated in all Jira stories across projects as a predefined NFR checklist as given below screenshot.




Security Test Checklist - Cheatsheet

As part of engineering team, when specially we are dealing with scale and playing a role towards quality of end product we ship to outside world, it becomes quite important to make sure we looked into from security perspective for all the deliverables.

Please follow below checklist as part of your regular deliverables

  • Broken Object Level Authorization

    • Let’s say a user generates a document with ID=322. They should only be allowed access to that document. If you specify ID=109 or some other ID, the service should return the 403 (Forbidden) error. To test this issue, what parameters can you experiment with? You could pass any ID in the URL or as part of Query parameters or Body (in XML or JSON). Try changing them to see what the service returns to you.

  • Broken User Authentication

    • Here, you can test whether session token gets reassigned after each successful login procedure or after the access level gets escalated in the application. In case your application removes or somehow changes the session token, check to see whether it returns a 401 error. We must not allow the possibility of predicting the session token for each next session. It should be as random as possible.

  • Excessive Data Exposure

    • For example, you have an interface that displays three fields: First Name, Position, Email Address, and Photo. However, if you look at the API response, you may find more data, including some sensitive data like Birth Date or Home Address. The second type of Excessive Data Exposure occurs when UI data and API data are both returned correctly, but parameters are filtered on the front end and are not verified in any way on the back end. You may be able to specify in the request what data you need, but the back-end does not check whether you really have permission to access that data.

  • Lack of Resources & Rate Limiting

    • API should not send more than N requests per second. However, this strategy is not quite correct. If your client generates more traffic than another client, your API should be stable for all clients.

      This can be resolved using special status codes, for example, 429 (Too Many Requests). Using this status code, you can implement some form of Rate Limiting. There are also special proprietary headers. For example, GitHub uses its X-RateLimit-*. These headers help regulate how many requests the client can send during a specific unit of time.

    • The second scenario is related to the fact that you may not have enough parameter checks in the request. Suppose you have an application that returns a list of user types like size=10. What happens if an attacker changes this to 200000? Can the application cope with such a large request?

  • Broken Function Level Authorization

    • This is concerned with vertical levels of authorization —the user attempting to gain more access rights than allowed. For example, a regular user trying to become an admin. To find this vulnerability, you must first understand how various roles and objects in the application are connected. Secondly, you must clearly understand the access matrix implemented in the application.

  • Mass Assignment

    • Avoid providing convenient mass assignment functions (when assigning parameters in bulk).

  • Security Misconfiguration

    • What can you test here? First of all, unnecessary HTTP methods must be disabled on the server. Do not show any unnecessary user errors at all. Do not pass technical details of the error to the client. If your application uses Cross-Origin Resource Sharing (CORS), that is, if it allows another application from a different domain to access your application’s cookies, then these headers must be appropriately configured to avoid additional vulnerabilities. Any access to internal files must also be disabled.

  • Injections

    • In my opinion, modern frameworks, modern development methods, and architectural patterns block us from the most primitive SQL or XSS injections. For example, you can use the object-relational mapping model to avoid SQL injection. This does not mean that you need to forget about injections at all. Such problems are still possible throughout a huge number of old sites and systems. Besides XSS and SQL, you should look for XML injections, JSON injections, and so on.

  • Improper Assets Management

    • CI/CD pipelines have access to various secrets and confidential data, such as accounts used to sign the code. Ensure you do not leave hard-coded secrets in the code and don’t “commit” them to the repository, no matter whether it is public or private.

  • Insufficient Logging & Monitoring

    • The main idea here is that whatever happens to your application, you must be sure that you can track it. You should always have logs that show precisely what the attacker was trying to do. Also, have systems in place to identify suspicious traffic, and so on.Also we must check for secrets/credentials/confidential info in log. Have seen this in some cases where we log username/password of databases.

My Profile

My photo
can be reached at 09916017317