...

  1. When configuring grouper.client.properties as described below, references to databases are those configured in grouper-loader.properties.
  2. The "grouper" database is the same database defined in grouper.hibernate.properties, so there is no need to define it in grouper-loader.properties.
  3. When specifying tableFrom or tableTo, depending on your username/schema being used and assumptions your DB driver makes, you may or may not need to specify schema.tablename.

Create the real time status table

In 2.4 you need to manually create the status tables.  These tables list the provisioners (e.g. provision to targetA), the jobs (e.g. full and incremental), and the groups, users, memberships, and logs.  This gives many advantages in performance and troubleshooting over the previous provisioning.

Run this from GSH or the java command line.  

Code Block
GSH:
args = new String[1];
args[0] = "false";
edu.internet2.middleware.grouper.app.tableSync.TableSyncCreateTables.main(args);

Example: demo server syncDDL is created with the above.  If you have a gsh container, you probably want to volume bind /opt/grouper/grouper.apiBinary/ddlScripts to a location outside the container.  After you confirm the DDL is good, run the generated SQL with gsh -registry -runsqlfile ddlScripts/sqlfile_generated.sql to create the status tables.

Overall flow of syncs

  1. See if the job needs to run or should exit

    Sync type | Check when | How check
    full | at startup | If a full sync has run in the last X (configurable, e.g. 12 hours), then don't run
    full | at startup | 1. register that this job wants to run; 2. see if another job is running; 3. if so, wait until it isn't running; 4. register as running if nothing else is running; 5. wait a couple more seconds; 6. if nothing else is running, then run
    incremental | at startup | if another job is running or pending, then wait
    incremental | throughout job | if another job is running or pending, then exit


  2. Select all of something.  Note, if there is a source and destination query, do one in a thread, and the other in the current thread.  Handle exceptions appropriately.  Wait for both to finish before proceeding.

    Sync type | Sync subtype | Select what | From where | Example | More info
    full | fullSyncFull | select all records and all columns | 1. source; 2. destination | select * from table | get all records from both sides
    full | fullSyncGroupings | select all distinct groupings | 1. source; 2. destination | select distinct grouping_col from table | either: 1. select all one-column primary keys; 2. select one column that groups sets of records together (e.g. group_name of memberships)
    full | fullSyncChangeFlag | select all primary keys and a column that is a change flag | 1. source; 2. destination | select uuid, last_updated from table | The change flag can be a last updated date (to the millisecond), a checksum string, or something similar
    full | fullSyncMetadata | select all distinct groupings | 1. source; 2. destination | select distinct grouping_col from table | 1. if a grouping is not in the destination, then add it; 2. if a grouping is in the destination and not the source, then delete it. Useful if groups get renamed or tags get added/removed. Does not sync all memberships and does not count as a full sync
    incremental | incrementalAllColumns | get all incrementals that have happened since the last check, including all columns | 1. source or change log table | select * from table where last_updated > last_checked | Use if the source table has a last_updated or numeric increasing column. Note: this will not process deletes if run off the source table, since deleted rows won't be there
    incremental | incrementalPrimaryKey | get all incrementals, where each row has the primary key to sync | 1. change log table | select primary_key_col0, primary_key_col1 from change_log_table where last_updated > last_checked | Use if the change_log_table doesn't have all columns; it might also have deletes
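
The note in step 2 (run one query in a thread, the other in the current thread, and wait for both) could be sketched roughly as follows. This is a minimal sketch, not Grouper's actual implementation: the class name ParallelSelectSketch, the method selectBoth, and the use of Supplier objects in place of real JDBC queries are all assumptions made for illustration.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.function.Supplier;

// Hypothetical sketch: none of these names are part of the Grouper API.
class ParallelSelectSketch {

  /** Holds the rows read from each side. */
  static class BothSides<T> {
    final List<T> source;
    final List<T> destination;

    BothSides(List<T> source, List<T> destination) {
      this.source = source;
      this.destination = destination;
    }
  }

  /**
   * Runs the source query in another thread and the destination query in the
   * current thread, and returns only after both have finished.
   */
  static <T> BothSides<T> selectBoth(Supplier<List<T>> sourceQuery,
      Supplier<List<T>> destinationQuery) {
    // Start the source query in a worker thread.
    CompletableFuture<List<T>> sourceFuture = CompletableFuture.supplyAsync(sourceQuery);

    List<T> destinationRows;
    try {
      // Run the destination query in the current thread.
      destinationRows = destinationQuery.get();
    } catch (RuntimeException e) {
      // Still wait for the worker to finish before reporting the failure.
      try {
        sourceFuture.join();
      } catch (CompletionException ignored) {
        // The destination failure is the one reported in this sketch.
      }
      throw e;
    }

    // join() blocks until the source query is done; a failure in the worker
    // surfaces here as a CompletionException wrapping the original cause.
    List<T> sourceRows = sourceFuture.join();
    return new BothSides<>(sourceRows, destinationRows);
  }
}
```

In a real job the two suppliers would wrap the source and destination select statements from the table above.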



  3. Initial compare

    Sync type | Sync subtype | Initial compare
    full | fullSyncFull | If a primary key exists in the destination and not the source, then delete that primary key in the destination. If all columns of a row in the source match all columns of the row in the destination, then remove it from both lists. Compare all records and batch up the inserts/updates/deletes, done
    full | fullSyncGroupings | If a grouping exists in the destination and not the source, then delete that grouping in the destination
    full | fullSyncChangeFlag | If a primary key exists in the destination and not the source, then delete that primary key in the destination. If the change flag of a row in the source matches the change flag of the row in the destination, then remove it from both lists
    incremental | * | NA
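
The fullSyncChangeFlag initial compare above can be illustrated with a small sketch. It assumes, purely for illustration, that each side has been loaded into a map from a single-string primary key to its change flag; the class ChangeFlagCompareSketch and its members are hypothetical names, not Grouper classes.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

// Hypothetical sketch of the change-flag initial compare.
class ChangeFlagCompareSketch {

  /** Outcome of the initial compare. */
  static class Result {
    // Primary keys in the destination but not the source.
    final Set<String> deleteFromDestination = new HashSet<>();
    // Primary keys whose rows still need to be retrieved and compared.
    final Set<String> stillToProcess = new HashSet<>();
  }

  static Result compare(Map<String, String> sourceFlags, Map<String, String> destinationFlags) {
    Result result = new Result();

    // Key exists in destination and not source: delete it in the destination.
    for (String key : destinationFlags.keySet()) {
      if (!sourceFlags.containsKey(key)) {
        result.deleteFromDestination.add(key);
      }
    }

    // Matching change flags mean the row is in sync, so it is dropped from
    // both lists; everything else stays on the list for later processing.
    for (Map.Entry<String, String> sourceRow : sourceFlags.entrySet()) {
      String key = sourceRow.getKey();
      boolean inDestination = destinationFlags.containsKey(key);
      boolean sameFlag = inDestination
          && Objects.equals(sourceRow.getValue(), destinationFlags.get(key));
      if (!sameFlag) {
        result.stillToProcess.add(key);
      }
    }
    return result;
  }
}
```

Only the keys left in stillToProcess need their full rows selected in the batching step, which is what makes the change flag cheaper than comparing all columns.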



  4. Switch job type?

    Note: if an incremental switches to a grouping or full sync, then it won't yield to a real full sync...


    Sync type | Sync subtype | If this occurs | Switch to
    incremental | * | Number of records is greater than X (configurable, e.g. 10k), and if grouping, there are fewer than Y groupings (configurable, e.g. 5) | 1. Capture current timestamp or max record in change_log (call it "a"); 2. Full sync everything; 3. Update last processed to "a", skip records until "a"
    incremental | * | If grouping, and if there's a fullSyncGroupings job, and if a grouping has more than X (configurable, e.g. 5k) | 1. Capture current timestamp or max record in change_log (call it "a"); 2. Do a grouping sync on those groupings (e.g. all records for a group); 3. Skip records in that grouping until "a"
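
The two switch conditions above can be read as a small decision function. The sketch below encodes them literally as written; the class SwitchJobTypeSketch, the constant names, and the order in which the two rules are checked are assumptions for illustration, and the thresholds simply reuse the example values from the table.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: names and rule ordering are assumptions.
class SwitchJobTypeSketch {

  enum Decision { STAY_INCREMENTAL, SWITCH_TO_FULL_SYNC, SWITCH_TO_GROUPING_SYNC }

  // Example thresholds from the table; the real values are configurable.
  static final int MAX_INCREMENTAL_RECORDS = 10_000;
  static final int MAX_GROUPINGS_FOR_FULL_SYNC = 5;
  static final int MAX_RECORDS_PER_GROUPING = 5_000;

  /** Groupings with more pending records than the per-grouping threshold. */
  static List<String> largeGroupings(Map<String, Integer> pendingPerGrouping) {
    List<String> large = new ArrayList<>();
    for (Map.Entry<String, Integer> grouping : pendingPerGrouping.entrySet()) {
      if (grouping.getValue() > MAX_RECORDS_PER_GROUPING) {
        large.add(grouping.getKey());
      }
    }
    return large;
  }

  static Decision decide(int pendingRecords, boolean usesGroupings,
      boolean hasFullSyncGroupingsJob, Map<String, Integer> pendingPerGrouping) {
    // Rule 1: too many records, and (if grouping) fewer than Y groupings.
    boolean fewGroupings = !usesGroupings
        || pendingPerGrouping.size() < MAX_GROUPINGS_FOR_FULL_SYNC;
    if (pendingRecords > MAX_INCREMENTAL_RECORDS && fewGroupings) {
      return Decision.SWITCH_TO_FULL_SYNC;
    }
    // Rule 2: grouping, a fullSyncGroupings job exists, and a grouping is large.
    if (usesGroupings && hasFullSyncGroupingsJob
        && !largeGroupings(pendingPerGrouping).isEmpty()) {
      return Decision.SWITCH_TO_GROUPING_SYNC;
    }
    return Decision.STAY_INCREMENTAL;
  }
}
```

In either switch case the job would first capture the current timestamp or max change_log record ("a") so that already-covered incremental records can be skipped afterwards.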



  5. Batch up requests (e.g. process a certain number of records at once).  Note: this can be done in several threads


    Sync type | Sync subtype | Select what | From where | Number of records | Example
    full | fullSyncFull | NA, this job is already done | | |
    full | fullSyncGroupings | select all columns from source and destination that are between two grouping indexes | 1. source; 2. destination | Approx 10k or 100k, might be unknown if the grouping column is not the primary key. Based on grouping size configuration, e.g. for groups, might be 5k groups at once; for people, might be 50k people at once | select * from table where grouping_col > ? and grouping_col < ?
    full | fullSyncChangeFlag | select all primary keys and a column that is a change flag | 1. source; 2. destination | 50-900, constrained by bind var max of 1000 | select * from table where primary_key_col0 = ? and primary_key_col1 = ?
    incremental | incrementalFullAllColumns | get all incrementals that have happened since the last check, including all columns | 1. destination | 50-900, constrained by bind var max of 1000 | select * from table where primary_key_col0 = ? and primary_key_col1 = ?
    incremental | incrementalFullPrimaryKey | get all incrementals, where each row has the primary key to sync | 1. source; 2. destination | 50-900, constrained by bind var max of 1000 | select * from table where primary_key_col0 = ? and primary_key_col1 = ?
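
To illustrate how a batch size in the 50-900 range can follow from the bind variable limit, here is a minimal sketch that splits a list of primary keys into batches. It assumes each key consumes one bind variable per primary key column in a single statement; that reading, and the names BindVarBatchSketch, batchSize, and partition, are assumptions rather than documented Grouper behavior.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of batching primary keys under a bind variable limit.
class BindVarBatchSketch {

  /** Largest number of keys per statement that stays within the bind variable budget. */
  static int batchSize(int columnsPerKey, int bindVarBudget) {
    if (columnsPerKey < 1 || bindVarBudget < columnsPerKey) {
      throw new IllegalArgumentException("budget must fit at least one key");
    }
    return bindVarBudget / columnsPerKey;
  }

  /** Splits keys into consecutive batches of at most batchSize elements. */
  static <T> List<List<T>> partition(List<T> keys, int batchSize) {
    if (batchSize < 1) {
      throw new IllegalArgumentException("batchSize must be at least 1");
    }
    List<List<T>> batches = new ArrayList<>();
    for (int start = 0; start < keys.size(); start += batchSize) {
      int end = Math.min(start + batchSize, keys.size());
      // Copy the view so each batch is independent of the original list.
      batches.add(new ArrayList<>(keys.subList(start, end)));
    }
    return batches;
  }
}
```

With a budget of 900 bind variables and a two-column primary key, batchSize(2, 900) gives 450 keys per select, which stays under the limit of 1000. Each batch could then be handed to a separate thread, as the note in step 5 allows.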



  6. Process records

    Note: these use prepared statement batching, so deletes, updates, and inserts each happen in batches of 100 (configurable)

    1. If record exists in source and not in destination by primary key, then insert the destination record
    2. If record exists in destination and not source by primary key, then delete the destination record
    3. If a primary key exists in source and destination, but all cols do not match, then update the destination record
    4. If a primary key exists in source and destination, and all cols match, then ignore the record
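
The four rules above amount to a key-by-key diff of source against destination. A minimal sketch, assuming rows are held in memory as maps from a single-string primary key to the list of column values; ProcessRecordsSketch and Plan are hypothetical names, and the actual SQL execution and statement batching are left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: decides which keys to insert, update, or delete.
class ProcessRecordsSketch {

  /** Primary keys grouped by the operation needed on the destination. */
  static class Plan {
    final List<String> inserts = new ArrayList<>();
    final List<String> updates = new ArrayList<>();
    final List<String> deletes = new ArrayList<>();
  }

  static Plan plan(Map<String, List<Object>> source, Map<String, List<Object>> destination) {
    Plan plan = new Plan();

    for (Map.Entry<String, List<Object>> sourceRow : source.entrySet()) {
      String key = sourceRow.getKey();
      if (!destination.containsKey(key)) {
        // Rule 1: in source and not in destination, so insert.
        plan.inserts.add(key);
      } else if (!sourceRow.getValue().equals(destination.get(key))) {
        // Rule 3: in both, but the columns differ, so update.
        plan.updates.add(key);
      }
      // Rule 4: in both and all columns match, so ignore.
    }

    for (String key : destination.keySet()) {
      if (!source.containsKey(key)) {
        // Rule 2: in destination and not in source, so delete.
        plan.deletes.add(key);
      }
    }
    return plan;
  }
}
```

Each of the three lists would then be executed against the destination in prepared statement batches.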

  7. Move pointer forward to max number/date processed

...