XML Import / Export for Grouper

As of v1.5.0, this tool does not yet export/import all of the features introduced in 1.5.  This will be addressed shortly.  Examples of data not yet covered are membership enabled/disabled dates, the attribute framework, group aliases, and permission management.

User auditing is exported/imported to a separate file, and this can be done in the same command as the registry.  e.g. to export the user audits and the registry as GrouperSystem:

gsh -xmlexport -userAuditFilename f:/temp/prodAudit.xml GrouperSystem f:/temp/prod.xml

e.g. to import the user audits and the registry as GrouperSystem:

gsh -xmlimport -userAuditFilename f:/temp/prodAudit.xml GrouperSystem f:/temp/prod.xml

As of v1.4.0 the invocation of these tools has moved from Ant to gsh (GrouperShell).

Grouper v1.2.0+ includes XML import / export tools. Exported XML may be used for:

Imported XML may be used for:

In general, exported data can be imported into the same Grouper instance it was exported from**, or into a different instance. Stems, Groups, and Group Types are created if not already present, or updated if they already exist (depending on the import options provided).

The XML formats for import and export are very similar; however, there are some differences.

    The export format:

    while the import format:

Any tool which can create XML, in the correct format, can be used as a loader.

*To successfully load Subject data, the new Grouper instance must be configured with the same Subject Sources. The export tool does not export Subject registries. Subjects which cannot be resolved will be logged, but otherwise ignored.

 **The initial version of the import tool did not maintain system attributes, i.e. uuid, date created, etc. Since v1.3.0 system attributes are maintained by default, which is the desired behavior when migrating a registry. However, this can cause a problem if you want to copy part of the registry by exporting it and importing it into a new stem, because the uuids of the imported groups and stems already exist. v1.4.0 introduces a new command line argument, -ignoreInternal (see below), which ensures that uuids and other internal attributes are ignored.

Usage

Usage is similar to, though simpler than, before.

Export:

C:\mchyzer\grouper\trunk\grouper\bin>gsh -xmlexport
Using GROUPER_HOME: C:\mchyzer\grouper\trunk\grouper\bin\..
Using GROUPER_CONF: C:\mchyzer\grouper\trunk\grouper\bin\../conf
Using JAVA: java
using MEMORY: 64m-512m
Usage:
args: -h, Prints this message
args:
[-noprompt] filename
e.g. gsh -xmlexport f:/temp/prod.xml

-includeComments, Put comments about foreign keys in XML
-noprompt, Do not prompt user to confirm the database that
will be updated
filename, The file to import


C:\mchyzer\grouper\trunk\grouper\bin>gsh -xmlexport whatever.xml
Using GROUPER_HOME: C:\mchyzer\grouper\trunk\grouper\bin\..
Using GROUPER_CONF: C:\mchyzer\grouper\trunk\grouper\bin\../conf
Using JAVA: java
using MEMORY: 64m-512m
This db user 'grouper' and url 'jdbc:mysql://localhost:3306/grouper' are allowed to be changed in the grouper.properties
Continuing...
Grouper starting up: version: 1.6.0, build date: 2010/02/09 02:24:03, env: <no label configured>
grouper.properties read from: C:\mchyzer\grouper\trunk\grouper\conf\grouper.properties
Grouper current directory is: C:\mchyzer\grouper\trunk\grouper\bin
log4j.properties read from: C:\mchyzer\grouper\trunk\grouper\conf\log4j.properties
Grouper is logging to file: C:\mchyzer\grouper\trunk\grouper\bin\..\logs\grouper_error.log, at min level WARN for package: edu.internet2.middleware.grouper, based on log4j.properties
grouper.hibernate.properties: C:\mchyzer\grouper\trunk\grouper\conf\grouper.hibernate.properties
grouper.hibernate.properties: grouper@jdbc:mysql://localhost:3306/grouper
sources.xml read from: C:\mchyzer\grouper\trunk\grouper\conf\sources.xml
sources.xml groupersource id: g:gsa
sources.xml jdbc source id: jdbc: GrouperJdbcConnectionProvider
Starting: 163 records in the DB to be exported
DONE: 02:32:54: exported 163 records to: C:\mchyzer\grouper\trunk\grouper\bin\whatever.xml
C:\mchyzer\grouper\trunk\grouper\bin>

Import:

C:\mchyzer\grouper\trunk\grouper\bin>gsh -xmlimport
Using GROUPER_HOME: C:\mchyzer\grouper\trunk\grouper\bin\..
Using GROUPER_CONF: C:\mchyzer\grouper\trunk\grouper\bin\../conf
Using JAVA: java
using MEMORY: 64m-512m
Usage:
args: -h, Prints this message
args:
[-recordReport]
[-noprompt] filename
e.g. gsh -xmlimport f:/temp/prod.xml

-recordReport, Print a file which lists each insert/update
In addition to import
-noprompt, Do not prompt user to confirm the database that
will be updated
filename, The file to import


C:\mchyzer\grouper\trunk\grouper\bin>gsh -xmlimport whatever.xml -recordReport
Using GROUPER_HOME: C:\mchyzer\grouper\trunk\grouper\bin\..
Using GROUPER_CONF: C:\mchyzer\grouper\trunk\grouper\bin\../conf
Using JAVA: java
using MEMORY: 64m-512m
This db user 'grouper' and url 'jdbc:mysql://localhost:3306/grouper' are allowed to be changed in the grouper.properties
Continuing...
Grouper starting up: version: 1.6.0, build date: 2010/02/09 02:24:03, env: <no label configured>
grouper.properties read from: C:\mchyzer\grouper\trunk\grouper\conf\grouper.properties
Grouper current directory is: C:\mchyzer\grouper\trunk\grouper\bin
log4j.properties read from: C:\mchyzer\grouper\trunk\grouper\conf\log4j.properties
Grouper is logging to file: C:\mchyzer\grouper\trunk\grouper\bin\..\logs\grouper_error.log, at min level WARN for package: edu.internet2.middleware.grouper, based on log4j.properties
grouper.hibernate.properties: C:\mchyzer\grouper\trunk\grouper\conf\grouper.hibernate.properties
grouper.hibernate.properties: grouper@jdbc:mysql://localhost:3306/grouper
sources.xml read from: C:\mchyzer\grouper\trunk\grouper\conf\sources.xml
sources.xml groupersource id: g:gsa
sources.xml jdbc source id: jdbc: GrouperJdbcConnectionProvider
grouper import: reading document: C:\mchyzer\grouper\trunk\grouper\bin\whatever.xml, version: 1.6.0
XML file contains 163 records
02:34:58: Beginning import: database contains 155 records
Ending import: processed 163 records
Ending import: database contains 163 records
Ending import: 8 inserts, 1 updates, and 154 skipped records
DONE: 02:34:59: imported 163 records from: C:\mchyzer\grouper\trunk\grouper\bin\whatever.xml
Wrote record report log to: C:\mchyzer\grouper\trunk\grouper\bin\grouperImportRecordReport_2010_02_09__02_34_58_685.txt

C:\mchyzer\grouper\trunk\grouper\bin>more C:\mchyzer\grouper\trunk\grouper\bin\grouperImportRecordReport_2010_02_09__02_34_58_685.txt
Update: Group: 197c460aff064eb6876b63d500c5ee22, etc:userReceiver
Insert: AttributeDefNameSet: 3e6915e7b4f144b38fe7e5143a60c9b4,
Insert: AuditEntry: f7be69a260514b6db7c3982e997cc012
Insert: AuditEntry: e8bc311da27c468281c4d8867305a998
Insert: AuditEntry: de69f0556d4648169b94ffcb7936cf77
Insert: AuditEntry: faa8130871e549e3947f2d3afaeae460
Insert: AuditEntry: f31a5288f8564b2c8e41a5f693a4f914
Insert: AuditEntry: e5a2c9ef662c483691bd92f8e65d1daa
Insert: AuditEntry: f2227db7415e44659f61e1703a02c81c

C:\mchyzer\grouper\trunk\grouper\bin>

Summary

Since so many new columns and tables have been added to Grouper, especially in 1.5, and since exporting/importing these with the current design would be difficult, we decided to rewrite the Grouper export/import.  It will have these differences from the current version:

  1. Doesn't store the XML document in memory (SAX)
  2. Versioned
  3. Doesn't manually marshal XML (will use xstream)
  4. Will keep logic in beans (more object oriented)
  5. Handles all data columns in the database (e.g. uuids).  Note: the import will need to look up the business key to see whether the record already exists with a different UUID, and keep the existing UUID if it does; it will not change any UUIDs on import
  6. Handles all the new tables (e.g. the new attribute framework), though I didn't think we need to import the "set" tables, e.g. groupSet.  We can calculate that data after import.  This is a tradeoff between size of file, speed of import (probably faster if the "set" tables are exported), and data integrity (probably better to recalculate everything after import)
  7. Sorted output so the XML can be diffed, though uuids might make diff tools think there are differences when really there might not be
  8. The export settings will be at the top of the export file (e.g. are we exporting the entire registry?)
  9. Basic features will be implemented in the first pass, then we can do the advanced features.  e.g. we will export all the data as GrouperSystem
  10. The XML export is not intended to be used for reporting or provisioning; web services and SQL can be used for that
  11. It's not really possible to have a readonly import mode without a huge transaction that will bog down the db
  12. Should have good logging (depending on level), and should periodically print status to stdout, e.g. every 30 seconds.  It should try to say how many records have been processed and how many are left to go (will need a preparse for this)
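
Items 1 and 12 above fit together: a streaming (SAX) pass can count the records in the file before the real import starts, without holding the document in memory. The following is only a minimal sketch of such a preparse, not the actual Grouper implementation. The class name RecordPreparse, the method countRecords, and the assumption that each record is a direct child of the root element are all hypothetical; the real export format may nest records differently.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, for illustration only (not part of Grouper).
class RecordPreparse {

    /**
     * Streams the XML with SAX and counts the elements that are direct
     * children of the root element. Assumption: one such element per record.
     */
    static long countRecords(InputStream xml) throws Exception {
        final long[] count = {0};
        DefaultHandler handler = new DefaultHandler() {
            private int depth = 0;

            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                depth++;
                // depth 1 is the root element, depth 2 is a record
                if (depth == 2) {
                    count[0]++;
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                depth--;
            }
        };
        SAXParserFactory.newInstance().newSAXParser().parse(xml, handler);
        return count[0];
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample document; element names are placeholders.
        String sample = "<export><rec/><rec><field/></rec><rec/></export>";
        long total = countRecords(
            new ByteArrayInputStream(sample.getBytes(StandardCharsets.UTF_8)));
        System.out.println("records to import: " + total);
    }
}
```

With the total known up front, the import pass could report "processed N of total" at whatever interval is chosen, at the cost of reading the file twice.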

Notes

Issues

