This is a summary of the University of Washington's experience migrating from Grouper 1.6 to Grouper 2.1.2. As you know, the principal difficulty in this upgrade is the database conversion.
Our group service (GWS), a RESTful API using Grouper's API as a back-end, is a core service at the UW. It is involved in a great many computer-to-computer and user-to-computer interactions, all day and night. We see about 200,000 - 300,000 queries per day mid-quarter, and more than that around the quarter breaks. Even during the night there is no hour with fewer than several thousand hits. Maybe 5% of these are updates.
So a first requirement was that this service not be interrupted.
Our DBAs have been after us for some time to move to PostgreSQL 9. It has some capabilities that really benefit their operations.
Our sysadmins have been after us for some time to move our database off old hardware and onto new, virtual systems.
So this migration seemed like a good opportunity to accomplish both those tasks as well.
Updating our own code to work with the 2.1.2 API was not difficult, as the new API is mostly compatible with 1.6. Chiefly we had to deal with the new requirement that subject lookups need a session. There is supposedly a setting, "subjects.startRootSessionIfOneIsntStarted=true", that defeats the new rule, but I found it ineffective. Since that setting seemed like a temporary shortcut anyway, we fixed our code to always have sessions.
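The shape of that fix can be sketched as a try/finally wrapper around each lookup. This is a minimal, self-contained illustration, not our GWS code: the stub class below stands in for Grouper's GrouperSession (startRootSession/stop), and the lookup method stands in for a SubjectFinder-style call that fails without an open session.

```java
public class SessionedLookup {

    // Stand-in for Grouper's GrouperSession; the real class lives in
    // edu.internet2.middleware.grouper. Names and behavior here are ours.
    static class GrouperSessionStub {
        static GrouperSessionStub startRootSession() { return new GrouperSessionStub(); }
        void stop() { /* release the session */ }
    }

    // Stand-in for a subject lookup that now requires a session.
    static String findSubjectId(GrouperSessionStub session, String netid) {
        if (session == null) throw new IllegalStateException("subject lookup needs a session");
        return "subject:" + netid;   // placeholder result
    }

    // The pattern we adopted: start a root session, do the lookup,
    // and always stop the session in a finally block.
    static String lookupWithSession(String netid) {
        GrouperSessionStub session = GrouperSessionStub.startRootSession();
        try {
            return findSubjectId(session, netid);
        } finally {
            session.stop();
        }
    }
}
```

With the session always supplied by the wrapper, the code no longer depends on the startRootSessionIfOneIsntStarted shortcut at all.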
A couple of minor items:
This was the bigger problem. Our tests turned up a couple of issues with the upgrade procedures:
For some time I've imagined clustering independent group systems, using messaging to keep them all in agreement. Because our GWS service is RESTful, the only things that need to be sent from one system to the other are resource representations. There are some difficulties to be solved before such a cluster could be put into general service, but even without those this method can provide an easy upgrade path:
Now we have two independent, mutually up-to-date GWS systems. The production GWS web service is itself a cluster, so each member can be dropped from the cluster, upgraded and connected to gws2, and put back into the cluster. During this migration some users will be on gws1 and some on gws2. When it's complete, all our users will be on the upgraded service (gws2), without ever knowing they had moved.
Our upgrade pretty much went as outlined in the previous step. At the end we took the old system (gws1) out of service and retired its hardware.
The cluster approach shows promise as a means to a very scalable and resilient groups service. We'll be looking into this.
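The core of that replication idea can be sketched in a few lines. This is our own illustration of the principle, not GWS code: each member holds resource representations keyed by path, and because RESTful PUTs are idempotent, an update can simply be applied locally and replayed to a peer.

```java
import java.util.HashMap;
import java.util.Map;

public class ResourceReplication {

    // One cluster member's view of the service: resource path -> representation.
    static class Member {
        final Map<String, String> resources = new HashMap<>();

        // Apply an update locally, then forward the same representation to a peer
        // (in a real cluster this would travel over a messaging system).
        void put(String path, String representation, Member peer) {
            resources.put(path, representation);
            if (peer != null) peer.applyFromPeer(path, representation);
        }

        // Receiving side: PUTs are idempotent, so replaying one is always safe.
        void applyFromPeer(String path, String representation) {
            resources.put(path, representation);
        }
    }

    // Demonstration: an update made on one member appears on the other.
    static boolean demo() {
        Member gws1 = new Member();
        Member gws2 = new Member();
        gws1.put("/group/u_example", "{\"name\":\"u_example\"}", gws2);
        return gws1.resources.get("/group/u_example")
                .equals(gws2.resources.get("/group/u_example"));
    }
}
```

Because only representations cross the wire, the two members can run entirely different Grouper versions and database back-ends, which is exactly what made this an upgrade path.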