Problem statement
Most database technologies (Cloud DB as a Service offerings, traditional DBs, LDAP services, etc.) typically run in a single primary mode, with multiple secondary nodes to ensure high availability. The main rationale is it’s the only surefire way to ensure data consistency, and integrity is maintained.
If an active topology was enabled, replication delay (the amount of time it takes for a data WRITE operation on one node to propagate to a peer) may cause the following to occur:
- The client executes a WRITE operation on node 1.
- The client then executes a READ operation soon after the WRITE.
- In an all active topology this READ operation may target node 2, but because the data has not yet been replicated (due to load, network latency, etc) you get a data miss and application level chaos ensues.
Another scenario is lock counters:
- User A is an avid Manchester UTD football fan (alas more of a curse than a blessing nowadays!) and is keen to watch the game. In haste, they try to login but supply an incorrect password. The lock counter increments by +1 on User A’s profile on node 1. Counter moves from 0 to 1.
- User A, desperate to catch the game then quickly tries to login again, but again supplies an incorrect password.
- This time, if replication is not quick enough, node 2 may be targeted and thus the lock counter moves from 0 to 1 instead of from 1 to 2. Fail!!!
These scenarios and others like it mandate a single primary topology, which for high load environments, results in high cost, as the primary needs to be vertically scaled to handle all of the load (plus headroom) and wasted compute resource as the secondaries (same spec as the primary) are sat idle costing $$$ for no gain.
Tada — Roll up Affinity Based Load Balancing
ForgeRock Directory Services (DS) is the high performance, high scale, LDAP-based persistent layer product within the ForgeRock Identity Platform. Any DS instance can take both WRITE and READ operations at scale; for many customers enabling an all active infrastructure without Affinity Based Load Balancing is viable.
However, for high scale customers and/or those who need to guarantee absolute consistency of data, then Affinity Based Load Balancing, a technology unique to the ForgeRock Identity Platform is the key to enabling an all active persistence layer. Nice!
Affinity what now?
It is a load balancing algorithm built into the DS SDK which is part of both the ForgeRock Directory Proxy product and the ForgeRock Access Management (AM) product.
It works like this:
For each and every inbound LDAP request which contains a distinguished name (DN) like uid=Darinder,ou=People,o=ForgeRock, the SDK takes a hash and allocates the result to a specific DS instance. In the case of AM, all servers in the pool compute the same hash, and thus send all READ/MODIFY/DELETE requests for uid=Darinder to say DS Node 1 (origin node).
A request with a different DN (e.g. uid=Ronaldo,ou=People,o=ForgeRock) is again hashed but may be sent to DS Node 2; all READ/MODIFY/DELETE operations for uid=Ronaldo target this specific origin node and so on. This means all READ/MODIFY/DELETE operations for a specific DN always target the same DS instance, thus eliminating issues caused by replication delay and solving the scenarios (and others) described in the Problem statement above. Sweet!
The following topology depicts this architecture:
All requests from AM1 and AM2 for uid=Darinder target DS Node 1. All requests from AM1 and AM2 for uid=Ronaldo target DS Node 2.
What else does this trick DS SDK do then?
Well… The SDK also makes sure ADD requests are spread evenly across all DS nodes in the pool to not overloaded one DS node while the others remain idle.
Also for a bit of icing on top, the SDK is instance aware if the origin DS node becomes unavailable (in our example, say DS Node 1 for uid=Darinder), the SDK detects this and re-routes all requests for uid=Darinder to another DS node in the pool, and then (here’s the cherry) ensures all further requests remains sticky to this new DS node (it becomes the new origin node). Assuming data has been replicated in time; there will be no functional impact.
Oh, and when the original DS node comes back online, all requests fail back for any DNs where it was the origin server (so, in our case, uid=Darinder would flip back to DS Node 1). Booom!
Which components of the ForgeRock Platform support Affinity Based Load Balancing?
- Directory Proxy
- ForgeRock AM’s DS Core Token Service (CTS)
- ForgeRock AM’s DS User / Identity Store
- ForgeRock AM’s App and Policy Stores
Note: the AM Configuration Store does not support affinity but this is intentional as AM configuration will soon move to file-based configuration (FBC) and in the interim customers can look to deploy like this.
What are the advantages of Affinity?
- As the title says, Affinity Based Load Balancing enables an active persistent storage layer
- Instead of having a single massively vertically scaled primary DS instance, DS can be horizontally scaled so all nodes are primary to increase throughput and maximise compute resource.
- As the topology is all active, smaller (read: cheaper) instances can be used; thus, significantly reducing costs, especially in a Cloud environment.
- Eliminates functional, data integrity, and data consistency issues causes by replication delay.
More Innnnput!
To learn more about how to configure ForgeRock AM for Affinity Based Load Balancing check out this.