Originally posted on https://email@example.com
I’m often asked, “What’s the best way to performance test?” This blog presents guidance and best practices for effective performance testing, based on my experience.
While performance tuning can at times be as much an art form as a technical undertaking, performance testing should be a data-driven, precise and almost scientific exercise. In school, many of us were taught about independent variables (the thing you change; in our case, inbound load), control variables (the thing(s) you keep the same; in our case, the system infrastructure and configuration) and dependent variables (the thing(s) being measured; in our case, throughput, resource consumption and response times).
The same principles apply to a technical IT system or platform, as illustrated below:
The primary purpose of performance testing is to provide an opportunity, in a non-business-critical environment, to understand system/platform characteristics as the system is subjected to inbound load, and to iteratively optimise configuration so that the system performs as expected in a live Production setting.
The key point — performance testing is a critical exercise which must be completed before a Go Live event. The disaster scenario is skipping performance testing and then having to fire-fight in a Production setting, impacting the end-user experience, company reputation or worse. So make sure a system is comprehensively performance tested :)
The best performance test scenarios exercise the system as it will be used in a Production setting, using real world load metrics, against customer defined SLAs. The last point is important; performance testing should focus on the system being “performant enough plus headroom” against customer defined SLAs; it is not an exercise to see “how fast it can go”.
Performance testing must be performed in a “Production like” environment; i.e. an environment with exactly the same instance counts, network topology, compute resources, network resources, application configuration and so on as Production. In the case of ForgeRock Identity Cloud, the ForgeRock SaaS platform, three fully isolated environments are provisioned for every customer: Development, Staging and Production (and soon UAT). Full isolation means there is no “noisy neighbour” syndrome, where one customer experiencing a busy period impacts others, as no resources are shared between environments and each customer has its own dedicated tenant.
The ForgeRock Identity Cloud Staging environment is a Production mirror in terms of sizing and resource allocation, so it fulfils our “Production like” requirement above. For more on the ForgeRock load testing policy, check out this link.
The following summarises some of the best practices for effective performance testing:
- Follow the science lesson — Ensure the independent variables (i.e. the test plan and load profiles) are carefully planned, the control variables (i.e. the system/platform) are kept the same, and the dependent variables (system characteristics) are effectively measured.
- Establish clearly defined customer SLAs — Have a clearly defined set of SLAs which map directly to a customer’s business requirements. Each test scenario should include requests per minute, responses per minute, average response times in ms and low/normal/high load profiles for each scenario. The high load profile should map to key customer events like Black Friday for retail customers, key sporting events for digital media customers, etc.
- Test against these SLAs — Test only against these customer-defined SLAs; the purpose is not to see how “fast” the system can go, but whether the system is performant enough to meet the customer’s current and projected SLA requirements.
- Monitor the complete system — Ensure all components are monitored and timed, and make this as granular as possible; this helps isolate any performance bottlenecks. From a ForgeRock perspective, this should also include monitoring components outside the platform, for example external APIs the system calls out to. A nice feature of ForgeRock Orchestration Journeys is the ability to inject zero-code Timer Nodes into the login experience to measure the time interval between each step of the login flow and so isolate bottlenecks.
- Test using a Production like profile — Test using a “blended” rather than an “isolated” profile, meaning multiple test scenarios are executed in parallel (e.g. registration, login, forgotten credential, etc.) rather than, say, registration tests alone.
- Measure effectively to remove anomalies — Measure min, max, average and percentile-based results rather than flat results, to prevent a handful of long request/response times from skewing the overall picture.
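To make the measurement and SLA points above concrete, here is a minimal sketch in Python of percentile-based reporting gated against an SLA with headroom. The sample data, SLA threshold and headroom factor are illustrative assumptions, not real customer numbers:

```python
import statistics

def latency_stats(samples_ms):
    """Summarise response times: min/max/mean plus tail percentiles."""
    ordered = sorted(samples_ms)

    def pct(p):
        # Nearest-rank percentile: the value below which p% of samples fall.
        idx = max(0, round(p / 100 * len(ordered)) - 1)
        return ordered[idx]

    return {
        "min": ordered[0],
        "max": ordered[-1],
        "mean": statistics.mean(ordered),
        "p95": pct(95),
        "p99": pct(99),
    }

# 97 fast responses plus a handful of slow outliers (all in ms).
samples = [50] * 97 + [900, 1200, 2000]
stats = latency_stats(samples)

# The mean (89.5 ms) is dragged upwards by three outliers, while p95
# (50 ms) shows the vast majority of users saw fast responses.
SLA_P95_MS = 250   # assumed customer SLA for 95th percentile latency
HEADROOM = 0.8     # pass only if we sit within 80% of the SLA
sla_met = stats["p95"] <= SLA_P95_MS * HEADROOM
```

Gatling and most load testing tools report these percentiles out of the box; the point is to gate pass/fail on percentiles plus headroom, not on the average alone.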
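Outside the platform, the same granular-timing idea behind Timer Nodes can be approximated with a simple wrapper around each step of a flow. The step names below are hypothetical and the sleeps simulate work:

```python
import time
from contextlib import contextmanager

timings_ms = {}

@contextmanager
def timer(step):
    """Record elapsed wall-clock time (in ms) for a named step."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings_ms[step] = (time.perf_counter() - start) * 1000

# Hypothetical login flow: time each stage to isolate the bottleneck.
with timer("collect_credentials"):
    time.sleep(0.01)   # simulated work
with timer("validate_credentials"):
    time.sleep(0.03)   # simulated work (the slow step)
with timer("issue_session"):
    time.sleep(0.005)  # simulated work

slowest_step = max(timings_ms, key=timings_ms.get)
```

Per-step timings like these make it obvious which stage of a journey to tune first, rather than guessing from an end-to-end number.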
As discussed, only a customer can define the SLAs and the test scenarios based on their usage pattern. From a ForgeRock perspective the following are some functional areas which should always be exercised against each load profile:
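A “blended” load profile like the one described earlier can be modelled as a weighted mix of scenarios. The scenario names and weights below are illustrative assumptions; in practice they would come from the customer’s real-world usage pattern:

```python
import random

# Illustrative blend: most traffic is logins, with smaller shares of
# registrations and forgotten-credential journeys.
SCENARIO_WEIGHTS = {
    "login": 0.80,
    "registration": 0.15,
    "forgotten_credential": 0.05,
}

def pick_scenarios(total_requests, rng):
    """Draw a scenario per simulated request according to the blend."""
    names = list(SCENARIO_WEIGHTS)
    weights = list(SCENARIO_WEIGHTS.values())
    counts = dict.fromkeys(names, 0)
    for _ in range(total_requests):
        counts[rng.choices(names, weights=weights)[0]] += 1
    return counts

# Seeded RNG so a test plan is reproducible run to run.
counts = pick_scenarios(10_000, random.Random(42))
```

Load tools such as Gatling express the same idea natively (e.g. weighted scenario injection); the sketch simply shows the blend every request-generation loop should honour.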
There you have it: a condensed view of the key fundamentals of performance testing, where it should be executed, and some general and ForgeRock-specific best practices. For more on sample ForgeRock Gatling test scenarios, check out this repo, which can be used as a starting point.
Thanks for reading.