Deploying ForgeRock Directory Services on a Kubernetes Multi-Cluster using Google Cloud Multi-cluster Services (MCS)

This article shows you how to deploy a multi-region ForgeRock Directory Services (DS) solution that spans two Google Kubernetes Engine (GKE) clusters in different regions, using Google Cloud Multi-cluster Services (MCS).

This solution lets pods in one GKE cluster discover pods in another GKE cluster, and it simplifies configuration by automating the creation of external DNS records and firewall rules. Note: We deploy the standard stateful applications (DS-CTS and DS-IDREPO) in each of the two GKE clusters, and scale them out and back using the native Kubernetes scaling approach in the Cloud Console.

Introduction

For DS replication to work properly, the following criteria must be met:

  1. All servers in the topology must be able to connect to each other; their network must be routed.
  2. FQDNs must be unique and resolvable by all servers.
  3. The server ID assigned to each server in the topology must be unique.
  4. The DS replication bootstrap server settings must include at least one server from each cluster in the topology.
  5. The certificates used to establish server identities must be verifiable by using the same CA or by properly configuring the key stores.

The method described in this document explains how to put the configuration in place according to these requirements.

Prerequisites

Two GKE clusters running version 1.18.12+ with the following configuration:
  • Provisioned in the same VPC network
  • VPC native
  • Workload Identity enabled

Note: We used this version; the configuration might work on 1.17 or earlier.

  • The same namespace name is used on both GKE clusters (for example, multi-region); see the provisioning sketch after the prerequisites.

Note: This restriction is imposed by the secret-agent solution used to retrieve DS certificates. This restriction may not apply to an alternative DS certificates storage/reconciliation solution.

  • At least two nodes in each GKE cluster, to test scaling out and back.
Note: We tested with the following configuration:
  • Node pool with 2 machines of e2-standard-8 type (8 vCPU, 32 GB memory)
  • Skaffold v1.19.0+
  • Google Cloud SDK v331.0.0
  • The APIs required for MCS are enabled:
gcloud services enable gkehub.googleapis.com --project <my-project-id>
gcloud services enable dns.googleapis.com --project <my-project-id>
gcloud services enable trafficdirector.googleapis.com --project <my-project-id>
gcloud services enable cloudresourcemanager.googleapis.com --project <my-project-id>
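
To make these prerequisites concrete, the following sketch shows one way to provision the two clusters and create the shared namespace. The cluster names, regions, and kubectl contexts are placeholders for illustration only, not the values used in our tests.

# Hypothetical example: two VPC-native GKE clusters in different regions of the
# same VPC, with Workload Identity enabled.
gcloud container clusters create us-cluster \
    --region us-east1 \
    --enable-ip-alias \
    --workload-pool=<my-project-id>.svc.id.goog \
    --num-nodes 2 \
    --machine-type e2-standard-8

gcloud container clusters create eu-cluster \
    --region europe-west1 \
    --enable-ip-alias \
    --workload-pool=<my-project-id>.svc.id.goog \
    --num-nodes 2 \
    --machine-type e2-standard-8

# Create the same namespace in both clusters (context names are placeholders).
kubectl --context us-cluster create namespace multi-region
kubectl --context eu-cluster create namespace multi-region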

Limitations

Currently, MCS only configures a single DNS entry for a headless service, which returns all pod IPs, so it is not possible to address pods individually unless logic is added to DS to work with the returned pod IPs. This means a separate Kubernetes service is required for each DS pod in each cluster. This works for a couple of pods, but would not scale to large numbers of pods. Addressing this is on the MCS roadmap.

Enable MCS

Follow these steps:

  1. Enable the MCS API:

    gcloud services enable multiclusterservicediscovery.googleapis.com \
        --project <my-project-id>
    
  2. Enable the MCS feature:

    gcloud alpha container hub multi-cluster-services enable \
      --project <my-project-id>
    
  3. Register your clusters to an environment. Do not use any symbols in the membership name; use plain alphanumeric characters only. These names are also used as part of the FQDNs when configuring server identifiers.

    gcloud container hub memberships register <membershipname> \
       --gke-cluster <zone>/<cluster-name> \
       --enable-workload-identity
    

    Note: Choose a name to uniquely identify the cluster.

  4. Verify MCS is enabled:
    
    gcloud alpha container hub multi-cluster-services describe
    
    

    Look for lifecycleState: ENABLED in the output.

Configure the secret agent parameters

If your DS installation does not use the secret-agent operator to manage the certificates used for server identity verification (item 5 in the Introduction), you can skip this step.

Configure access to Google Cloud Secret Manager

Follow the instructions to configure secret-agent to work with Workload Identity: (Instructions). This is required for both clusters to share the same secrets as required by DS.
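
The exact commands depend on your project, so the following is only a minimal sketch of the Workload Identity binding, assuming a Google service account named secret-agent; the Kubernetes namespace and service account used by the secret-agent operator are placeholders that you should take from your actual installation.

# Hypothetical example: let the secret-agent Kubernetes service account use
# Secret Manager through Workload Identity. All names are placeholders.
gcloud iam service-accounts create secret-agent --project <my-project-id>

gcloud projects add-iam-policy-binding <my-project-id> \
    --member "serviceAccount:secret-agent@<my-project-id>.iam.gserviceaccount.com" \
    --role roles/secretmanager.admin

gcloud iam service-accounts add-iam-policy-binding \
    secret-agent@<my-project-id>.iam.gserviceaccount.com \
    --role roles/iam.workloadIdentityUser \
    --member "serviceAccount:<my-project-id>.svc.id.goog[<secret-agent-namespace>/<secret-agent-ksa>]"

# Annotate the Kubernetes service account so Workload Identity maps it to the
# Google service account. Repeat in the second cluster.
kubectl annotate serviceaccount <secret-agent-ksa> \
    -n <secret-agent-namespace> \
    iam.gke.io/gcp-service-account=secret-agent@<my-project-id>.iam.gserviceaccount.com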

Configure the secret agent properties in SAC

The multi-cluster-secrets/kustomization.yaml requires the following changes:

  1. secretsManagerPrefix is changed to ensure uniqueness of stored secrets.

  2. secretsManager is changed to GCP, the chosen cloud provider.

  3. gcpProjectID is changed so that the Secret Manager API is used in the correct project.

multi-cluster-secrets/kustomization.yaml (latest version):

resources:
  - ../../../base/secrets

patchesStrategicMerge:
  - |-
    #Patch the SAC
    apiVersion: secret-agent.secrets.forgerock.io/v1alpha1
    kind: SecretAgentConfiguration
    metadata:
      name: forgerock-sac
    spec:
      appConfig:
        secretsManagerPrefix: "multi-cluster"
        secretsManager: GCP # none, AWS, Azure, or GCP
        gcpProjectID: engineering-devops

Configure the ServiceExport objects

MCS requires a Kubernetes service that can be exposed externally to other clusters for multi-cluster communication. To expose the service, a ServiceExport object is required in each cluster. The metadata.name of the ServiceExport object must match the name of the service. For DS, we expose the DS headless service.

us-export.yaml (latest version):

kind: ServiceExport
apiVersion: net.gke.io/v1
metadata:
 namespace: prod
 name: ds-idrepo-us
---
kind: ServiceExport
apiVersion: net.gke.io/v1
metadata:
 namespace: prod
 name: ds-cts-us

The ServiceExport objects must be deployed first as they take approximately 5 minutes to sync to clusters registered in your environ.

In a US cluster:

kubectl create -f etc/multi-cluster/mcs/files/us-export.yaml

In an EU cluster:

kubectl create -f etc/multi-cluster/mcs/files/eu-export.yaml
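
Once both ServiceExport objects are deployed, you can verify that they have synced by checking for the corresponding ServiceImport objects that MCS creates in each cluster. This is a hedged example: it assumes the MCS controller has already installed the ServiceExport and ServiceImport CRDs, and the namespace must match the one used in the export files.

# Services exported from this cluster.
kubectl get serviceexports -n prod

# After a few minutes, services exported by the other cluster appear here.
kubectl get serviceimports -n prod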

Set up DS

Both DS-CTS and DS-IDREPO will be deployed on two clusters to simulate the ForgeRock stack.

This uses a ForgeOps configuration based on:

  • Kustomize: a standalone tool to customize Kubernetes objects through a kustomization.yml file.

  • Skaffold: a command-line tool that facilitates continuous development for Kubernetes applications, and handles the workflow for building, pushing, and deploying your application.

The examples show how to configure DS to be deployed on the US cluster. Apply a similar configuration for the other cluster.

Prepare Kustomize definitions

Make the DS server ID unique

To make the server ID of each pod in our topology unique, the DS service name must contain a cluster-specific suffix. This is done by adding the cluster suffix in the kustomization.yaml file in each region's Kustomize overlay folder, for example (latest version):

patches:
     - target:
         kind: Service
         name: ds-cts
       patch: |-
         - op: replace
           path: /metadata/name
           value: ds-cts-us


Configure the cluster topology

For DS to configure the correct server identifiers, the following environment variables must be set. These settings are used in docker-entrypoint.sh to ensure that the DS pod identities are unique across both clusters.

See kustomize/overlay/multi-cluster/mcs/<region>/kustomization.yaml:

              env:
              - name: DS_CLUSTER_TOPOLOGY
                value: "eu,us"
              - name: MCS_ENABLED
                value: "true"

DS_CLUSTER_TOPOLOGY must match the membership names used when registering the clusters to the hub (step 3 in Enable MCS), because the membership name is used as part of the FQDN required to reference pods behind a headless service.

Using these values, DS can dynamically configure the DS_BOOTSTRAP_REPLICATION_SERVERS and DS_ADVERTISED_LISTEN_ADDRESS variables, which results in FQDNs of the following form:

HOSTNAME.MEMBERSHIP_NAME.SERVICE_NAME.NAMESPACE.svc.clusterset.local

Where:

  • HOSTNAME = pod hostname.

  • MEMBERSHIP_NAME = cluster membership name, as configured in step 3 of Enable MCS.

  • SERVICE_NAME = DS service name.

  • NAMESPACE = Kubernetes namespace (the same on both clusters).

An example FQDN for ds-idrepo-0 in the US cluster would look like:

ds-idrepo-0.us.ds-idrepo-us.prod.svc.clusterset.local
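
The derivation itself is performed by the ForgeOps docker-entrypoint.sh. The snippet below is not that script; it is only a simplified sketch of the kind of logic involved, assuming the pod knows its own cluster membership name (LOCAL_MEMBERSHIP, a placeholder) and that server 0 of each cluster is used as a bootstrap replication server on the replication port 8989.

# Simplified illustration only - not the real ForgeOps entrypoint logic.
# Example inputs: HOSTNAME=ds-idrepo-0, DS_CLUSTER_TOPOLOGY="eu,us",
# LOCAL_MEMBERSHIP=us, SERVICE_BASE=ds-idrepo, NAMESPACE=prod.
DOMAIN="svc.clusterset.local"

# Advertised listen address for this pod.
DS_ADVERTISED_LISTEN_ADDRESS="${HOSTNAME}.${LOCAL_MEMBERSHIP}.${SERVICE_BASE}-${LOCAL_MEMBERSHIP}.${NAMESPACE}.${DOMAIN}"

# One bootstrap replication server per cluster listed in DS_CLUSTER_TOPOLOGY.
DS_BOOTSTRAP_REPLICATION_SERVERS=""
for cluster in $(echo "${DS_CLUSTER_TOPOLOGY}" | tr ',' ' '); do
  server="${SERVICE_BASE}-0.${cluster}.${SERVICE_BASE}-${cluster}.${NAMESPACE}.${DOMAIN}:8989"
  DS_BOOTSTRAP_REPLICATION_SERVERS="${DS_BOOTSTRAP_REPLICATION_SERVERS:+${DS_BOOTSTRAP_REPLICATION_SERVERS},}${server}"
done

echo "${DS_ADVERTISED_LISTEN_ADDRESS}"
echo "${DS_BOOTSTRAP_REPLICATION_SERVERS}"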

Prepare Skaffold profiles

Add the following profile to the skaffold.yaml file. Repeat for the EU cluster, swapping us for eu.

skaffold.yaml (latest version)

# Multi-cluster DS : US profile
- name: mcs-us
  build:
    artifacts:
    - *DS-CTS
    - *DS-IDREPO
    tagPolicy:
      sha256: { }
  deploy:
    kustomize:
      path: ./kustomize/overlay/multi-cluster/mcs/us

Deploy Skaffold profiles

Once the configuration for all clusters is in place, you can start the topology. Below is an example of a Skaffold command to run the preconfigured profile.

Deploy to the US cluster:

skaffold run --profile mcs-us

And for the EU cluster:

skaffold run --profile mcs-eu
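
Because the DS instances are deployed as standard Kubernetes StatefulSets, scaling out and back needs no additional tooling: use the Cloud Console or kubectl. The example below is a sketch; the StatefulSet name and namespace are assumptions, so check kubectl get statefulsets in your own deployment first.

# Scale the CTS store in the US cluster out to three replicas, watch the new
# pod join the topology, then scale back to two.
kubectl scale statefulset ds-cts --replicas=3 -n prod
kubectl get pods -n prod -w

kubectl scale statefulset ds-cts --replicas=2 -n prod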

Load tests

Addrate load test

Some basic load was applied to a deployment consisting of three replicated servers (one in the EU cluster and two in the US cluster) to make sure the setup did not have any major problems, independent of absolute numbers. An addrate load of CTS-like entries was run against the server in Europe for 30 minutes. The Grafana screenshot below shows the behavior of the two servers in the US:

Both US servers closely follow the client load, as demonstrated by the low replication delay. There are some outliers, but replication recovers easily.

More extensive testing was carried out on the kube-dns solution, and the results were comparable. See that documentation for more in-depth addrate and modrate test results.

Pricing

The only additional costs are Cloud DNS costs for the dynamically generated DNS records.

Pros and cons

Pros:

  1. Native Kubernetes solution: only modifies Kubernetes objects.
  2. Simple installation: automatic generation of DNS records and firewall rules.
  3. Scale out/scale back using Kubernetes: no additional administration.
  4. No additional scripts required.
  5. Supported by Google.
  6. So far, tests are reassuring; replication latency is acceptable.

Cons:

  1. Specific configuration of server identifiers is handled in docker-entrypoint.sh, and requires the correct values to be set to work correctly.
  2. MCS managed services generate healthchecks based on the service endpoint, which requires a client secret. This currently fails, as the healthcheck is unconfigurable.
  3. We expose the whole DS service to each cluster, even though we only need to expose port 8989. This isn't configurable.

