Hi folks, I have a question about multi-region DS topology design. This could be an interesting design question, and I welcome any input and advice.
We are setting up a new ForgeRock stack that spans 4 data centers (let’s say Data Center A, B, C and D). A and B are both on the East Coast and close to each other. C and D are both on the West Coast and close to each other. The East Coast and West Coast are two separate regions and hence distant from each other. Within each data center, there will be one primary DS instance and one secondary DS instance for each DS profile (Config, CTS and User, on separate servers). Taking CTS as an example, each data center has 2 DS CTS instances (DS1 and DS2), so with 4 data centers that is 8 instances in total.
Next, we want to set up replication across all DS instances, meaning all 8 instances go into one replication pool, so that any data center can act as a backup when things go south. Since this produces a fair number of replication connections (replication between any two DS instances), should we use dedicated standalone replication servers to avoid having too many replication connections?
Assuming we use dedicated standalone replication servers, at least two instances are needed per data center for high availability. In that case, should the instances of all DS profiles (Config, CTS and User) replicate through those two replication servers in their own data center?
If we don’t use dedicated standalone replication servers, meaning the replication service is deployed alongside each DS service on the same server, will performance become a concern? Furthermore, if we decide to add a third instance within each data center (DS3), will a replication pool of 12 instances become a problem? I guess the general question is: is there a threshold number of instances in the DS pool beyond which dedicated standalone replication servers should be used?
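To put rough numbers on the connection count, here is a back-of-the-envelope sketch in Python. It assumes, for one replication domain (for example CTS), that every replication server keeps a link to every other replication server and that each DS replica holds a single connection to one replication server at a time. The function names and the "2 standalone replication servers per data center" default are illustrative assumptions, not DS settings.

```python
DATA_CENTERS = 4  # A, B, C and D from the description above


def replication_links(num_replicas: int, num_replication_servers: int) -> dict:
    """Count links for one replication domain under the stated assumptions.

    Assumption: replication servers form a full mesh, and each replica
    connects to exactly one replication server at a time.
    """
    rs_to_rs = num_replication_servers * (num_replication_servers - 1) // 2
    ds_to_rs = num_replicas
    return {"rs_to_rs": rs_to_rs, "ds_to_rs": ds_to_rs, "total": rs_to_rs + ds_to_rs}


def combined(instances_per_dc: int) -> dict:
    """Every instance runs both the directory and the replication service."""
    n = DATA_CENTERS * instances_per_dc
    return replication_links(num_replicas=n, num_replication_servers=n)


def standalone(instances_per_dc: int, rs_per_dc: int = 2) -> dict:
    """Directory-only replicas plus dedicated replication servers per data center."""
    return replication_links(
        num_replicas=DATA_CENTERS * instances_per_dc,
        num_replication_servers=DATA_CENTERS * rs_per_dc,
    )


if __name__ == "__main__":
    for per_dc in (2, 3):
        print(per_dc, "instances per DC, combined DS/RS:", combined(per_dc))
        print(per_dc, "instances per DC, standalone RS: ", standalone(per_dc))
```

Under these assumptions, 8 combined instances give 28 server-to-server links and 12 give 66, while two standalone replication servers per data center stay at 28 server-to-server links however many replicas are added. This only counts links; it says nothing about how much traffic each link carries.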
The target user number is 10 million. I appreciate any input on this. Thanks!
Thanks for reaching out to the Community!
Have you considered the use of Replication Groups in your environment? According to DS documentation:
Define replication groups so that replicas connect first to local replication servers, only going outside the group when no local replication servers are available. This limits the replication traffic over slow network links to messages between replication servers, except when all local replication servers are down.
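The selection rule in that quote can be illustrated with a small Python sketch. This is a simplified model, not the DS implementation: the `ReplicationServer` class, the `pick_replication_server` function, and the server names are hypothetical, and the real server selection may weigh other factors as well.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ReplicationServer:
    name: str          # hypothetical server name
    group_id: int      # replication group the server belongs to
    available: bool = True


def pick_replication_server(
    replica_group_id: int, servers: List[ReplicationServer]
) -> Optional[ReplicationServer]:
    """Prefer an available server in the replica's own group.

    Only when no local server is available does the replica fall back to
    an available server outside its group. Returns None if nothing is up.
    """
    candidates = [s for s in servers if s.available]
    if not candidates:
        return None
    local = [s for s in candidates if s.group_id == replica_group_id]
    return (local or candidates)[0]


if __name__ == "__main__":
    servers = [
        ReplicationServer("rs-west-1", group_id=2),
        ReplicationServer("rs-east-1", group_id=1),
        ReplicationServer("rs-east-2", group_id=1),
    ]
    # A replica in group 1 (East) picks a local server even though a
    # West server is listed first.
    print(pick_replication_server(1, servers))
```

With one group per data center or per region, cross-region traffic in this model is limited to the links between replication servers unless every local replication server is down.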
Please review the following for further details:
For further reference, here are some details on DS deployment patterns:
I hope this helps!
Thanks very much for your response. Yes, I did see the Replication Group documentation. If dedicated replication servers are used in each DC, then the DS instances in each DC will be set up as a replication group so that they connect to the local replication servers first. The same can probably be done when DS and RS run together on the same server, meaning without dedicated replication servers, so that DS connects within its local group first too. However, what I’m trying to work out is whether it is justified to add one more layer of dedicated replication servers, and if so, what the catches are. Let me know if this makes sense.
I hate to give you the generic consultant answer for this, but it depends. Let’s take a step back and look at what you are actually replicating.
Config Store - first off, are you sure you are going to be replicating this in production? The model I am seeing most frequently now is to keep your configuration static in production, which means replication would not be needed. Assuming you are going to use replication, how frequently are you expecting changes to be made?
Core Token Service - does the business actually require session tokens to synchronize across data centers? What I typically observe is user traffic routed to a specific data center via a load balancer (assuming hot/hot) based on geography or other conditions, with traffic remaining within that data center for the user’s session. Only in the event of the data center becoming unavailable would a user be routed to a different data center, and in those cases it can be viewed as acceptable to ask the user to log in again. Do you truly need CTS replication across data centers?
User Store - You have 10 million users, but what data is actually in your user store, and how frequently are you making adds/removes/updates? This information certainly should be replicated across data centers, and the transaction volume would be the key metric I’d look at before even considering a dedicated replication server.
Generally speaking, I tend to lean towards not introducing dedicated replication servers (even within data centers) and would instead look at using them only if a need arises during performance testing, such as large replication delays. Even then I’d focus on factors external to DS, such as network latency or OS configuration, before looking at a dedicated replication server. With this in mind, I can’t stress enough how important it is to approach performance testing with a clear definition of your service level objectives.
I know this doesn’t really give you a clear-cut answer, but I hope it helps you establish some criteria to inform your decision on whether or not to use dedicated replication servers.