Immutable Deployment Pattern for ForgeRock Access Management (AM) Configuration without File Based Configuration (FBC)

shokard · September 10, 2019, 4:00am

Introduction

The standard production-grade deployment pattern for ForgeRock AM is to use replicated sets of Configuration Directory Server (DS) instances to store all of AM's configuration. This pattern has worked well in the past, but is less suited to today's immutable, DevOps-enabled environments.

This blog presents an alternative view of how an immutable deployment pattern could be applied to AM ahead of the full File Based Configuration (FBC) planned for AM in version 7.0 of the ForgeRock Platform. This pattern could also ease the eventual transition to FBC.

Current Common Deployment Pattern

Currently, most customers deploy AM with externalised Configuration, Core Token Service (CTS) and Userstore instances.

The following diagram illustrates such a topology spread over two sites. The focus is on the DS Config Stores, hence the CTS and DS Userstore connections and replication topology have been simplified. Note that this blog is still applicable to single-site deployments.

Dual site AM deployment pattern. Focus is on the DS Configuration stores

In this topology AM uses connection strings to the DS Config stores to enable an all-active Config store architecture: within each site, each AM targets one DS Config store as primary and the second as failover. Note that in this model there is no cross-site failover for AM to Config store connections (possible, but discouraged). The DS Config stores do communicate across sites for replication to form a full mesh, as do the User and CTS stores.

A slight variation on this model, and one applicable to cloud environments, is to place a load balancer between AM and its DS Config Stores. However, we have observed many customers experience problems with features such as Persistent Searches failing due to dropped connections. Hence, where possible, Consulting Services recommends the use of AM Connection Strings.

Note that AM Connection Strings specific to each AM can only be used if each AM has a unique FQDN, for example: https://openam1.example.com:8443/openam, https://openam2.example.com:8443/openam and so on.
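To make the per-AM primary/failover idea concrete, here is a minimal Python sketch. It does not reproduce AM's actual connection string syntax: the `AmInstance` type, the `build_config_store_map` and `pick_config_store` helpers and the `ds-config-*` host names are all hypothetical. It shows why a unique FQDN per AM is needed: the FQDN is the key of the mapping, so two AMs sharing one could not be given different primary stores.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AmInstance:
    fqdn: str         # must be unique for per-AM connection strings
    primary_ds: str   # DS Config store this AM targets first
    failover_ds: str  # second DS Config store in the same site


def build_config_store_map(instances):
    """Map each AM FQDN to its ordered, same-site DS Config stores.

    Hypothetical helper that mirrors the idea of per-AM connection
    strings; it is not AM's real configuration format.
    """
    mapping = {}
    for am in instances:
        if am.fqdn in mapping:
            # Two AMs sharing an FQDN cannot be given different primaries.
            raise ValueError(f"duplicate AM FQDN: {am.fqdn}")
        mapping[am.fqdn] = [am.primary_ds, am.failover_ds]
    return mapping


def pick_config_store(mapping, fqdn, is_up):
    """Return the first reachable store for this AM (no cross-site fallback)."""
    for ds in mapping[fqdn]:
        if is_up(ds):
            return ds
    raise RuntimeError(f"no DS Config store reachable for {fqdn}")


if __name__ == "__main__":
    ams = [
        AmInstance("openam1.example.com", "ds-config-a1", "ds-config-a2"),
        AmInstance("openam2.example.com", "ds-config-a2", "ds-config-a1"),
    ]
    stores = build_config_store_map(ams)
    # openam1 fails over to ds-config-a2 when ds-config-a1 is down.
    print(pick_config_store(stores, "openam1.example.com",
                            lambda ds: ds != "ds-config-a1"))
```

Giving the two AMs opposite primaries spreads load across both DS Config stores in the site, which is what makes the Config store layer all-active.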

For more on AM Connection Strings, see the ForgeRock AM documentation.

Problem Statement

This model has worked well in the past; the DS Config stores contain all the stuff AM needs to boot and operate plus a handful of runtime entries.

However, times are a changing!

The advent of Open Banking introduces potentially hundreds of thousands of OAuth2 clients, the number of AM policy entries is ever increasing, and with UMA thrown in for good measure, the previously small, minimal-footprint and fairly static DS Config Stores are suddenly much more dynamic and contain many thousands of entries. Managing the stuff AM needs to boot and operate alongside all this runtime data suddenly becomes much more complex.

TADA! Roll up the new DS App and Policy Stores. These new data stores address this by separating the stuff AM needs to boot and operate from long-lived, environment-specific data such as policies, OAuth2 clients, SAML entities, etc. Nice!

However, one problem remains: it is still difficult to do stack-by-stack deployments, blue/green deployments, rolling deployments and/or immutable-style deployments, because DS Config Store replication is in place and needs to be very carefully managed during deployment.

Some common issues:

Making a change to one AM can quite easily have a ripple effect through DS replication, impacting and/or impairing the other AM nodes, whether in the same site or remote. This behaviour can make customers more hesitant to introduce patches, configuration or code changes.
In a dual-site environment the typical deployment pattern is to stop cross-site replication, force traffic to site B, disable site A, upgrade site A, test it in isolation, force traffic back to the newly deployed site A, ensure production is functional, disable traffic to site B, push replication from site A to site B and re-enable replication, and upgrade site B before finally returning to normal service.
Complexity increases further if App and Policy stores are not in use, as the in-service DS Config stores may have new OAuth2 clients, UMA data, etc. created during the transition, which need to be preserved. So in the above scenario an LDIF export of such data from site B's DS Config Stores needs to be taken and imported into site A before site A goes live (to catch changes made while site A's deployment was in progress), and after site B is disabled another LDIF export needs to be taken from B and imported into A to catch any last-minute changes between the first LDIF export and the switchover. Sheesh!
Even in a single-site deployment, managing replication as well as the AM upgrade/deployment itself introduces risk and several potential break points.
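The dual-site sequence in the list above can be written down as an explicit runbook. The Python sketch below is purely illustrative: the step labels are descriptions, not commands of any ForgeRock tool, and `Site`, `Topology` and `dual_site_upgrade` are hypothetical names. It replays the steps against a tiny model of the two sites, adds the two LDIF catch-ups when App/Policy stores are absent, and checks one invariant after every step: at least one site is serving traffic.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    version: str
    serving: bool = True


@dataclass
class Topology:
    sites: dict
    replication: bool = True
    log: list = field(default_factory=list)


def dual_site_upgrade(topo, new_version, has_app_policy_stores):
    """Replay the manual dual-site upgrade, checking availability after each step."""
    a, b = topo.sites["A"], topo.sites["B"]

    def step(name, action=None):
        if action is not None:
            action()
        topo.log.append(name)
        # Invariant: at least one site must serve production traffic.
        if not (a.serving or b.serving):
            raise RuntimeError(f"outage after step: {name}")

    step("stop cross-site replication", lambda: setattr(topo, "replication", False))
    step("force traffic to site B, disable site A", lambda: setattr(a, "serving", False))
    step("upgrade site A", lambda: setattr(a, "version", new_version))
    step("test site A in isolation")
    if not has_app_policy_stores:
        # Catch runtime data created on B while A was being upgraded.
        step("LDIF export from B, import into A (first catch-up)")
    step("force traffic back to site A", lambda: setattr(a, "serving", True))
    step("ensure production is functional")
    step("disable traffic to site B", lambda: setattr(b, "serving", False))
    if not has_app_policy_stores:
        # Catch last-minute changes made on B before the switchover.
        step("LDIF export from B, import into A (final catch-up)")
    step("push replication from A to B, re-enable replication",
         lambda: setattr(topo, "replication", True))
    step("upgrade site B", lambda: setattr(b, "version", new_version))
    step("return site B to normal service", lambda: setattr(b, "serving", True))
    return topo
```

Without App/Policy stores the run takes twelve steps, with them ten, each of which is a potential break point.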

New Deployment Model

The real enabler for a new deployment model for AM is the introduction of App and Policy stores, which will be replicated across sites. They fully separate the stuff AM needs to boot and run from environmental runtime data. In such a model the DS Config stores return to a minimal footprint, containing only AM boot data, while the App and Policy Stores contain the long-lived environmental runtime data that is typically subject to zero-loss SLAs and long-term preservation.
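The effect of this split on redeployment can be sketched in a few lines of Python. This is a toy model under stated assumptions: `AppPolicyStore`, `AmNode` and `redeploy` are hypothetical, and real stores are DS instances, not dictionaries. It shows that rebuilding a node from a new image replaces only its boot configuration, while runtime data such as OAuth2 clients, held in the shared App/Policy store, survives without an LDIF export/import.

```python
from dataclasses import dataclass, field


@dataclass
class AppPolicyStore:
    """Shared, replicated store for long-lived runtime data (illustrative)."""
    entries: dict = field(default_factory=dict)


@dataclass
class AmNode:
    image_tag: str
    boot_config: dict           # minimal boot data, baked into the image
    app_policy: AppPolicyStore  # shared by every node, never rebuilt


def redeploy(node, new_tag, new_boot_config):
    """Rebuild a node from a new image.

    Boot configuration is replaced wholesale; runtime data created while the
    old node was live (policies, OAuth2 clients, SAML entities) is untouched
    because it lives in the shared App/Policy store.
    """
    return AmNode(new_tag, dict(new_boot_config), node.app_policy)
```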

Another enabler is a different configuration pattern for AM, where each AM effectively has the same FQDN and serverId. This allows AM to be built once and then cloned into an image, enabling rapid expansion and contraction of the AM farm without having to interact with the DS Config Store to add/delete instances or go through the build process again and again.

The final key component, and one central to this new model, is Affinity Based Load Balancing for the Userstore, CTS, App and Policy stores. Affinity both simplifies the configuration and enables an all-active datastore architecture that is immune to data misses caused by replication delay.

Affinity is a unique feature of the ForgeRock platform and is used extensively by many customers. For more on Affinity, see the ForgeRock documentation.
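Affinity can be pictured as a deterministic mapping from an entry's DN to one DS server, so every AM sends reads and writes for the same entry to the same replica and does not read stale data during replication delay. The Python sketch below illustrates that idea only; it is not ForgeRock's implementation, and the CRC32 hash, the simple lower-case DN normalisation, the `pick_server` helper and the server names are assumptions. When the preferred server is down it walks the list to the next available one.

```python
import zlib


def pick_server(dn, servers, is_up=lambda s: True):
    """Choose a DS server for an entry DN with affinity.

    While it is up, the same DN always maps to the same preferred server,
    so all AM nodes read and write that entry on one replica. On failure,
    walk the list in order from the preferred position.
    """
    if not servers:
        raise ValueError("no servers configured")
    # Simplified normalisation; real DN normalisation is more involved.
    key = dn.strip().lower().encode("utf-8")
    start = zlib.crc32(key) % len(servers)
    for offset in range(len(servers)):
        candidate = servers[(start + offset) % len(servers)]
        if is_up(candidate):
            return candidate
    raise RuntimeError("no DS server available")
```

`zlib.crc32` is used rather than Python's built-in `hash()` because string hashing in Python is randomised per process, whereas a mapping that every AM node must agree on needs a stable hash. The same server list must also be configured, in the same order, on every AM.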

The proposed topology below illustrates this new deployment model and is applicable to both active-active and active-standby deployments. Note that cross-site replication for the User, App and CTS stores is depicted, but may well not be required for global/isolated deployments.

Localised DS Config Store for each AM with replication disabled

As the DS Config store footprint will be minimal, the proposal is to move the DS Config Stores local to AM, with each AM built with exactly the same FQDN and serverId. This enables immutable configuration and massively simplifies stack-by-stack, blue/green and rolling deployments. Each local DS Config Store lives in isolation, and replication is not enabled between these stores.

In lieu of replication, each DS Config Store can be provisioned either by executing the same build script on each host or, more quickly, by building one AM-DS Config store instance/Pod in full, cloning it and deploying the complete image as each new AM-DS instance. The latter approach removes the need to interact with Amster to build additional instances and, for example, to pull configuration artefacts from Git. With this model any new configuration change requires a new package/Docker image/AMI, etc., i.e. an immutable build.
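One way to enforce "any configuration change requires a new image" is to derive the image tag from a digest of the configuration artefacts baked into the image: identical configuration always produces the same tag, and any change produces a new one. The Python sketch below is hypothetical; the `am-ds-config` prefix, the 12-character truncation and the helper names are assumptions, not part of any ForgeRock tooling.

```python
import hashlib
from pathlib import Path


def load_tree(root):
    """Read every file under root into {relative/posix/path: bytes}."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): p.read_bytes()
        for p in root.rglob("*")
        if p.is_file()
    }


def config_digest(files):
    """SHA-256 over all (path, content) pairs in a stable, unambiguous order."""
    h = hashlib.sha256()
    for name in sorted(files):
        for part in (name.encode("utf-8"), files[name]):
            # Length-prefix each part so ("ab", "c") and ("a", "bc") differ.
            h.update(len(part).to_bytes(8, "big"))
            h.update(part)
    return h.hexdigest()


def image_tag(files, prefix="am-ds-config"):
    """Hypothetical tag scheme: prefix plus the first 12 hex digits of the digest."""
    return f"{prefix}:{config_digest(files)[:12]}"
```

A build pipeline might call `image_tag(load_tree(...))` on its configuration directory and use the result as the image tag, so a rebuilt image with unchanged configuration is recognisably the same release.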

At boot time AM uses its local address to connect to its DS Config Store, and Affinity to connect to the Userstore, CTS and App/Policy stores.
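The boot-time wiring can be expressed as a profile that is identical on every clone. In the hypothetical Python sketch below, `Store`, `BootProfile`, `problems` and `same_image` are illustrative names and the host names and ports are placeholders; this is not AM's bootstrap format. `problems` checks the rules from this section (a local, unreplicated DS Config store; Affinity for the Userstore, CTS and App/Policy stores), and `same_image` confirms that clones are interchangeable only when their profiles match exactly.

```python
from dataclasses import dataclass

LOCAL_HOSTS = {"localhost", "127.0.0.1"}


@dataclass(frozen=True)
class Store:
    servers: tuple           # "host:port" strings in preference order
    affinity: bool = False   # route requests by entry DN across servers
    replicated: bool = False


@dataclass(frozen=True)
class BootProfile:
    fqdn: str        # identical on every cloned AM
    server_id: str   # identical on every cloned AM
    config: Store
    userstore: Store
    cts: Store
    app_policy: Store


def problems(profile):
    """List deviations from the local-config, affinity-elsewhere pattern."""
    issues = []
    cfg = profile.config
    # Simplified host parsing: assumes "host:port" without IPv6 brackets.
    if any(s.rsplit(":", 1)[0] not in LOCAL_HOSTS for s in cfg.servers):
        issues.append("DS Config store must be local to AM")
    if cfg.replicated:
        issues.append("DS Config store must not be replicated")
    for name in ("userstore", "cts", "app_policy"):
        if not getattr(profile, name).affinity:
            issues.append(f"{name} should use affinity")
    return issues


def same_image(profiles):
    """Clones are interchangeable only if every boot profile is identical."""
    return len(set(profiles)) <= 1
```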

Note: the use of "Embedded" DS Config stores, where a DS instance is deployed in the same JVM as AM, is not recommended.

Advantages of this model:

As the DS Config Stores are not replicated, most AM configuration and code-level changes can be implemented or rolled back (using a new image or similar) without impacting any other AM instance and without the complexity of managing replication. Blue/green, rolling and stack-by-stack deployments and upgrades are massively simplified, as is rollback.
Simplifies expansion and contraction of the AM pool, especially if an image/clone of a full AM instance and its associated DS Config instance is used. This cloning approach also protects against configuration changes in Git or other code repositories inadvertently rippling to new AM instances; the same code and configuration base is deployed everywhere.
Promotes the cattle vs pets paradigm: for any new configuration, deploy a new image/package.
This approach does not require any additional instances: the existing DS Config Stores are repurposed as App/Policy stores, and the DS Config Stores are hosted locally to AM (or in a small container in the same Pod as AM).
The existing DS Config Stores can be quickly repurposed as App/Policy Stores; no new instances or data-level deployment steps are required other than tuning the JVM and potentially uprating storage, enabling rapid switching from DS Config to App/Policy Stores.
Enabler for FBC: when FBC becomes available, the local DS Config stores are simply stopped in favour of FBC. Also, if the transition to FBC becomes problematic, rollback is easy: fire up the local DS Config stores and revert.
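With immutable units, a rolling deployment amounts to replacing nodes one at a time with clones of the new image, rollback is the same operation with the previous image, and scaling is adding or removing clones. The Python sketch below is hypothetical: the pool is modelled as a list of image tags, `health_check` stands in for whatever readiness probe the platform provides, and `rolling_replace` and `scale` are illustrative names.

```python
def rolling_replace(nodes, new_image, health_check):
    """Replace each unit's image in turn; on the first failure, roll all back.

    nodes: list of image tags, one per AM + local DS Config unit (mutated).
    Returns True if the rollout completed, False if it was rolled back.
    """
    previous = list(nodes)
    for i in range(len(nodes)):
        nodes[i] = new_image  # instantiate a fresh unit from the new image
        if not health_check(i, nodes[i]):
            # No replication to unwind: simply redeploy the old images.
            nodes[:] = previous
            return False
    return True


def scale(nodes, count, image):
    """Grow or shrink the pool; new units are clones of the given image."""
    if count < 0:
        raise ValueError("count must be non-negative")
    del nodes[count:]
    nodes.extend([image] * (count - len(nodes)))
```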

Disadvantages of this model:

No DS Config Store failover: if the local DS Config Store fails, the AM connected to it also fails and does not recover. However, this fits well with the cattle vs pets paradigm; if a local component fails, kill the whole instance and instantiate a new one.
Any log systems with logic based on individual AM FQDNs (Splunk, etc.) would need their configuration modified to take into account that each AM now has the same FQDN.
This deployment pattern is only suitable for customers with mature DevOps processes. The expectation is that no changes are made in production; instead a new release/build is produced and promoted to production. If, for example, a customer makes changes via REST or the UI directly, those changes will not be replicated to the other AM instances in the cluster, which would severely impair performance and stability.
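The first and last points in this list can be handled by one reconciliation rule: a unit whose AM or local DS Config store is unhealthy, or whose live configuration no longer matches the configuration baked into its image, is replaced rather than repaired. The Python sketch below is hypothetical; the `Unit` fields and `units_to_replace` are illustrative, and how health and configuration digests are collected is left out.

```python
from dataclasses import dataclass


@dataclass
class Unit:
    name: str
    image_digest: str  # digest of the configuration baked into the image
    live_digest: str   # digest of the configuration in the local DS Config store
    am_healthy: bool
    ds_healthy: bool


def units_to_replace(units):
    """Return (name, reason) for every unit that should be killed and recreated."""
    out = []
    for u in units:
        if not (u.am_healthy and u.ds_healthy):
            # AM has no DS Config failover: treat AM + DS as one unit.
            out.append((u.name, "unhealthy"))
        elif u.live_digest != u.image_digest:
            # A change made via REST or the UI stays local to this unit.
            out.append((u.name, "config drift"))
    return out
```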

Conclusions

This suggested model would significantly improve a customer's ability to take on new configuration/code changes and potentially roll back without impacting other AM servers in the pool. It also makes effective use of the App/Policy stores without additional kit, allows easy transition to FBC and enables DevOps-style deployments.