An Incremental Migration with IDM

Introduction

Traditionally, ForgeRock® Identity Management (IDM) upgrades are handled either in place or by leveraging the migration service. Both approaches require removing the IDM instance from update traffic. This is acceptable if the migration can complete in a reasonable time, often overnight when load is at its lowest. But at some point, a large enough dataset will incur a migration time that exceeds the organization's expectations. Simply running the migration while updates are still being applied does not help. First, the resulting data in the new instance will likely be inconsistent. Second, running another migration to pick up the missed updates will take just as long, since it scans the same source dataset again, and the process will never converge.

This article proposes a solution whereby successive incremental migrations are performed under live traffic, each one reducing the source dataset, until the last migration, performed off update traffic, can be completed within an acceptable window. It relies on a timestamp column that is set on every database update; the timestamp is then used to craft a source query that scopes the source dataset down to a time window small enough for the final migration, the only one with downtime.

There is a catch, though. All DELETE requests will be missed, so this method is applicable if you can ensure there won’t be any during the migration. If not, then it is still possible to recover the missed deletions, and this is explained at the end of the article.

This is an alternative to the method described in this article (LINK) IDM: Zero Downtime upgrade strategy using a blue/green deployment.

Prerequisites

The instance to be migrated is an IDM 5.5 installation that uses PostgreSQL 9.6 for the repository, configured with the generic tables. However, you should be able to adapt the steps described here to other supported database vendors and to earlier IDM releases.

The second instance is an IDM 6.5 install with an empty repository, from which the migration service will be launched.

Updating the database schema on 5.5

Apply the following change to the repository schema:

ALTER TABLE  openidm.managedobjects
ADD COLUMN IF NOT EXISTS updated_at TIMESTAMPTZ;

ALTER TABLE  openidm.relationships
ADD COLUMN IF NOT EXISTS updated_at TIMESTAMPTZ;

CREATE INDEX objects_updated_at_idx ON openidm.managedobjects (updated_at DESC NULLS LAST);
CREATE INDEX rels_updated_at_idx ON openidm.relationships (updated_at DESC NULLS LAST);

DROP FUNCTION IF EXISTS trigger_set_timestamp() CASCADE;

CREATE OR REPLACE FUNCTION trigger_set_timestamp()
RETURNS TRIGGER AS $$
BEGIN
 NEW.updated_at = NOW();
 RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER set_timestamp_managedobject
BEFORE UPDATE ON openidm.managedobjects
FOR EACH ROW
EXECUTE PROCEDURE trigger_set_timestamp();

CREATE TRIGGER set_timestamp_relationship
BEFORE UPDATE ON openidm.relationships
FOR EACH ROW
EXECUTE PROCEDURE trigger_set_timestamp();


First, an ‘updated_at’ column is added to both the managedobjects and relationships tables, and an index on the column is created for each table in order to speed up queries.

Second, a trigger is attached to each table that executes the trigger_set_timestamp() function. This sets the timestamp to the current date and time whenever a record is updated.
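To verify the trigger at the database level, update any managed object through IDM and check that its updated_at value is populated. Here is a minimal sketch in psql, assuming the repository database and role are both named openidm (adjust connection details to your environment):

psql -U openidm -d openidm -c \
  "SELECT objectid, updated_at FROM openidm.managedobjects ORDER BY updated_at DESC NULLS LAST LIMIT 5;"

The most recently updated objects should appear at the top with a non-null updated_at value.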

Let’s also add a new repository query named find-updated-at, in repo.jdbc.json:

{
   "dbType" : "POSTGRESQL",
   "useDataSource" : "default",
   "maxBatchSize" : 100,
   "maxTxRetry" : 5,
   "queries" : {
       "genericTables" : {
           "credential-query" : [...],
           "find-updated-at" : "SELECT fullobject::text FROM ${_dbSchema}.${_mainTable} obj INNER JOIN ${_dbSchema}.objecttypes objtype ON objtype.id = obj.objecttypes_id WHERE objtype.objecttype = ${_resource} AND updated_at >= ${at}::timestamp"
        }
    }
}


Let’s check it out:

Update a managed user and invoke this REST call (using a date and time that is just before the update):

$ curl -u openidm-admin:password -G \
  --data-urlencode '_queryId=find-updated-at' \
  --data-urlencode 'at=2020-01-21 05:51:00' \
  'http://openidm55.example.com:8080/openidm/repo/managed/user'


This should return a single entry.

Launching the first migration

The IDM 5.5 instance is still receiving updates, and contrary to what is described in the Installation Guide for 6.5, we are not cutting the instance off from write traffic. However, it is highly advisable to test this first in a development environment to ensure that the production instance can sustain the additional load without a performance drop. It is also best to optimize the IDM 5.5 instance so that the migration takes as little time as possible.
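Just before launching, it helps to capture the reference time in the exact format expected by the find-updated-at query. A minimal sketch (the file name is arbitrary):

# Record the migration start time; it becomes the lower bound of the next incremental pass.
date '+%Y-%m-%d %H:%M:%S' | tee migration-start.txt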

With the reference time noted, start the migration from the 6.5 instance:

$ curl --request POST \
  -u openidm-admin:password \
  --header 'content-type: application/json' \
  'http://openidm65.example.com:8080/openidm/migration?_action=migrate'


And let it finish, however many days it may take.

Perform the next incremental migration

Any update performed during the migration will have set the timestamp accordingly, so we can query the affected entries with the ‘find-updated-at’ query, using the time recorded just before launching the migration. Therefore, we should be able to provide a ‘sourceQuery’ in the repo managed object and relationships mappings. However, due to this issue, it is not possible to fine-tune the mappings in the migration configuration, so the workaround is to use the reconciliation service instead. Note that the same workaround can be used to perform the initial migration in the previous step, optimized as described in the Integrator’s Guide.

To achieve this, let’s modify the IDM 6.5 configuration so that the migration is launched via the reconciliation service rather than the migration service. First, we query the sync mappings used by the migration service:

curl --request POST \
  -u openidm-admin:password \
  -H 'content-type: application/json' \
  'http://openidm65.example.com:8080/openidm/migration?_action=mappingConfigurations' | jq .


Which returns an array of mappings:

[
  {
    "name": "repoInternalRole_repoInternalRole",
    "source": "external/migration/repo/internal/role",
    "target": "repo/internal/role",
    "runTargetPhase": false,
    "reconSourceQueryPaging": false,
    "reconSourceQueryPageSize": 1000,
    "allowEmptySourceSet": true,
    "enableLinking": false,
    "properties": [],
    "policies": [],
    "onCreate": {
      "type": "groovy",
[...]


Paste this response into sync.json, as the value of the "mappings" array:

{
 "mappings" :  [
   {
    "name": "repoInternalRole_repoInternalRole",
    "source": "external/migration/repo/internal/role",
    "target": "repo/internal/role",
    "runTargetPhase": false,
    "reconSourceQueryPaging": false,
    "reconSourceQueryPageSize": 1000,
    "allowEmptySourceSet": true,
    "enableLinking": false,
    "properties": [],
    "policies": [],
    "onCreate": {
      "type": "groovy",
[...]
  }
 ]
}
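If you prefer to script this step rather than editing the file by hand, the same response can be wrapped into the structure sync.json expects with jq, and then applied over IDM's config endpoint. A minimal sketch, using the hostname and credentials assumed above (the intermediate file name is arbitrary):

# Fetch the migration mappings and wrap them in the structure expected by sync.json.
curl --request POST \
  -u openidm-admin:password \
  -H 'content-type: application/json' \
  'http://openidm65.example.com:8080/openidm/migration?_action=mappingConfigurations' \
  | jq '{mappings: .}' > sync-mappings.json

# Apply the result as the sync configuration.
curl --request PUT \
  -u openidm-admin:password \
  -H 'content-type: application/json' \
  --data @sync-mappings.json \
  'http://openidm65.example.com:8080/openidm/config/sync'

Either way, the end result is the same sync configuration as the manual edit above.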


Also, to prevent a Groovy runtime error, update the script:

bin/defaults/script/update/mapLegacyObject.groovy

Comment out the line `def remoteVersion = remoteVersion as Version`. The result should look like this:

[...]
import org.forgerock.json.JsonValue

def sourceObject = source as JsonValue;
def targetObject = target as JsonValue;
def mappingTarget = mappingConfig.target.getObject() as String
// def remoteVersion = remoteVersion as Version


We are now ready to perform an incremental migration:

$ curl --request POST \
  -u openidm-admin:password \
  -H 'content-type: application/json' \
  --data '{
	"sourceQuery": {
		"_queryId" : "find-updated-at",
		"at" : "2020-01-20 05:43:07"
	}
}' 'http://openidm65.example.com:8080/openidm/recon?_action=recon&mapping=repoManagedUser_repoManagedUser'


This should migrate only the records that were updated during the previous migration. Repeat the process for the relationships mapping, and for all other custom objects, if any.
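Reconciliation runs asynchronously, so you can list the reconciliation runs and their state to follow progress before launching the next mapping. A sketch, using the same credentials as above:

curl -u openidm-admin:password \
  'http://openidm65.example.com:8080/openidm/recon' | jq .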

Of course, updates will still be performed during this new migration, but because only the records modified since the previous run are migrated, this second migration should take less time, and therefore fewer updates should occur while it runs. The process is then repeated again and again, each time noting the time at which the migration is launched and using it to scope the source dataset for the next run.
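Each subsequent pass thus follows the same pattern: record a new reference time, then reconcile from the time recorded before the previous pass. A sketch combining the earlier snippets (same assumed file name, hostnames, and credentials):

# Use the previous pass's start time as the lower bound, and record a new one for the next pass.
PREVIOUS_START=$(cat migration-start.txt)
date '+%Y-%m-%d %H:%M:%S' > migration-start.txt

curl --request POST \
  -u openidm-admin:password \
  -H 'content-type: application/json' \
  --data "{\"sourceQuery\": {\"_queryId\": \"find-updated-at\", \"at\": \"${PREVIOUS_START}\"}}" \
  'http://openidm65.example.com:8080/openidm/recon?_action=recon&mapping=repoManagedUser_repoManagedUser'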

Once the delta is small enough, remove the 5.5 instance from traffic, perform the last incremental migration with minimal downtime, and then switch network traffic over to the newly migrated instance.

What about DELETE?

Unless you are a powerful wizard, placing a timestamp on something that no longer exists is rather impossible. At the end of the migration, the 6.5 instance may therefore end up with entries that have been deleted on the 5.5 instance. One way to remediate this would be to perform a reconciliation with a target phase, but that entails scanning the entire dataset again, which is not a viable solution. The other solution is to inspect the access log:

$ cat access.audit.json | jq 'select(.request.operation == "DELETE" ) | .http.request.path'
"http://openidm.example.com:8080/openidm/managed/device/5a0a5af8-651d-4512-af82-23618f2c8a61"
"http://openidm.example.com:8080/openidm/managed/user/615b4d19-39be-4941-8050-885c1c8c116b/device/bbcc5f13-34ad-40e5-ada1-6b77cfc42909"
"http://localhost:8080/openidm/managed/user/57a4be19-51af-4cbf-af13-2589f35e9f85/devices/47ee3600-0b24-424e-b91a-eb544fc0a9e2"
"http://localhost:8080/openidm/managed/device/28c529ea-6eff-4ef2-a1df-785b38dff59a"
"http://localhost:8080/openidm/managed/user/b8e770b3-7f6e-4a60-b290-60118cb6439a"

In this extract, we can see that one ‘user’ entry and two ‘device’ entries have been deleted, as well as two relationships. These deletions can simply be replayed on the IDM 6.5 instance after each incremental migration.
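Replaying the deletions can also be scripted. A minimal sketch, assuming the access log is at hand, the admin credentials used earlier, and that every DELETE recorded in the extract should indeed be replayed against the 6.5 instance:

# Extract the DELETE paths from the access log, strip the original host, and replay them against 6.5.
cat access.audit.json \
  | jq -r 'select(.request.operation == "DELETE") | .http.request.path' \
  | sed 's|^http://[^/]*/openidm||' \
  | while read -r path; do
      curl --request DELETE \
        -u openidm-admin:password \
        --header 'If-Match: *' \
        "http://openidm65.example.com:8080/openidm${path}"
    done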

Conclusion

This solution requires minimal changes to the system in production, and has minimal impact on it. It is, however, highly recommended to rehearse the entire process in a test environment before applying it to the production system.

The underlying idea behind this article is to reduce downtime by reducing datasets so that each migration fits within off-peak hours. Another area of research would be to reduce the migrated dataset by partitioning the data: turn traffic off for the first partition, migrate it, switch traffic over to the new instance for that partition, then move on to the next partition the next day. This is more involved, though; it assumes the data can be partitioned, and it assumes that the migrated (6.5) instance is already considered production, which implies a higher level of testing to master the process under production conditions. And of course, ForgeRock® Identity Gateway (IG) would be the perfect candidate to operate the partial switches.

In the IDM: Zero Downtime upgrade strategy using a blue/green deployment post, we described a methodology that achieves a zero-downtime migration, but it is more involved and has an impact on the production system due to the additional implicit syncs (unless the systems are scaled up to support the additional load). With the incremental approach, production is impacted only by the additional requests coming from the second instance, and subsequent migrations should become lighter and lighter, until the last one, whose downtime should stay within a handful of hours.

These articles are intended to show that there are many ways of accomplishing a migration: with the migration service, without it, with a mixture of both, or with neither.

You could also imagine a migration methodology that does not involve the migration or reconciliation service at all, by having an external client application retrieve entries from the old instance and provision them to the second instance, while at the application level all updates are systematically replayed to the second instance (IG would be a good candidate for this). In fact, such an approach can end up being more performant. A migration can also operate at the database level, either transforming the data in place while the instance is taken offline, or live with the appropriate triggers. Whether this is possible depends on the nature of the schema, the data content changes, and the database vendor (essentially, whether it has JSON support). Some of these methods result in an exact copy of the original data (with transformations), which is what is meant here: the same object IDs. Other methods won't.

The last approach that has not yet been listed is to use an intermediate store, such as ForgeRock® Directory Services (DS), to move the data over. The LDAP connector comes for free, with LiveSync capabilities. Modelling relationships is perhaps more challenging, so your data model should be a good fit; however, part of the extra computing will be offloaded to the target instance.

So, choosing the best approach depends mainly on your requirements, the nature of the data, the provisioning dataflows... and, of course, your stock of creativity!
