SQL Source is an External Identity Source Plugin designed to integrate with upstream sources using SQL tables or views.

Configuration

  1. SQL Source is part of the SqlConnector Registry Plugin, which must be activated.
  2. If there isn't already a Server defined in the Servers Registry for the desired Source Database, add a new one.
    1. The Plugin for the Server is CoreServer.SqlServers.
    2. Select the appropriate RDBMS type. Currently, only PostgresMariaDB, and MySQL are fully supported. Other options are supported by the underlying framework but are not regularly tested.
  3. When adding a new External Identity Source, the Plugin is SqlConnector.SqlSources.
  4. After creating the External Identity Source, select the appropriate target Server, as added above.
  5. The Table Mode may be set to either Flat or Relational, see Preparing Source Data below for more information, and for related configurations.
  6. The Source Table is the exact name of the table to be queried in Flat mode. In Relational mode, it is the name of the primary table.

Check Threshold

If set, Check Threshold will prevent sync processing from proceeding when the specified percentage of records have changed.

(warning) Check Threshold is only supported when the source data includes a modified timestamp or when Archive Tables are in use.

In Flat Mode, each row only ever counts as one change, even if multiple columns within that row have changed. The maximum percentage of changes in Flat Mode is 100 (if every row had at least one change).

In Relational Mode, each row in each table counts as one change (although only once, if multiple columns within that row have changed). The maximum percentage of changes in Relational Mode can therefore exceed 100, since the total record count is only based on the primary table, but the change count is based on each related table.

The Threshold may be overridden for the next sync job by ticking Threshold Override. This setting will cause the next sync job to skip the threshold check, and then automatically remove the override.

Preparing Source Data

Inbound data may be represented using any supported database technique, including physical tables, foreign tables, views, and materialized views, so long as the representation meets the requirements described in this documentation.

SqlSource supports two modes for importing data.

  • In Flat Mode, there is a single inbound table with one row per inbound External Identity, with at most one value supported for related models such as Name and Email Address.
  • In Relational Mode, there is a primary table with one row per inbound External Identity, and a set of related tables allowing for multiple values for associated models.

Flat Mode is simpler to set up, but Relational Mode allows for more complex records. There is a small performance penalty for Relational Mode (since multiple tables or views need to be queried), but this should be negligible under most circumstances.

Flat Mode

Flat Mode expects a single Source Table whose definition is consistent with the following (the name may be any valid SQL name):

CREATE TABLE my_source_table (
    id                  integer PRIMARY KEY,
    source_key          character varying(1024),
    honorific           character varying(32),
    given               character varying(128) NOT NULL,
    middle              character varying(128),
    family              character varying(128),
    suffix              character varying(32),
    affiliation         character varying(32),
    date_of_birth       date,
    valid_from          timestamp without time zone,
    valid_through       timestamp without time zone,
    title               character varying(128),
    organization        character varying(128),
    department          character varying(128),
    manager_identifier  character varying(512),
    sponsor_identifier  character varying(512),
    mail                character varying(256),
    identifier          character varying(512),
    telephone_number    character varying(64),
    address             character varying(512),
    url                 character varying(256),
    pronouns            character varying(64),
   
-- if the modified column is present, it will be used to detect records changed since the last full sync
    modified            timestamp without time zone
);

where

  • id: A unique integer :that serves as a Primary Key. This column is used to generate a Role Key, see Defining Multiple Roles, below.
  • source_key: The System of Record Identifier, which should be unique within the table, though see Defining Multiple Roles, below.
  • honorific, given, middle, family, and suffix: Used to construct the (Primary) Name, which will be assigned the type as set in the plugin configuration. Which elements are actually required is governed by CO Settings Name Required Fields (and Name Permitted Fields).
  • affiliation, date_of_birth, valid_from, valid_through, title, organization, department, manager_identifier, sponsor_identifier: Used to construct the External Identity Role.
  • mail: Used to construct the Email Address, which will be assigned the type as set in the plugin configuration.
  • identifier: Used to construct the Identifier, which will be assigned the type as set in the plugin configuration
  • telephone_number: Used to construct the Telephone Number, which will be assigned the type as set in the plugin configuration. Only the number field will be populated.
  • address: A comma separated set of fields used to construct the Address, which will be assigned the type as set in the plugin configuration. The value consists of the concatenated fields street,locality,state,postal_code.
  • url: Used to construct the set of fields used to construct the Url, which will be assigned the type as set in the plugin configuration.
  • pronouns: Used to construct a Preferred Pronouns record, which will be assigned the type as set in the plugin configuration.
  • modified: If this column is defined, the time this record was last updated (GMT). See Handling Change Detection, below.

In addition, any column prefixed a_ (eg: a_favorite_color) will be treated as an Ad Hoc Attribute. (The a_ prefix will be removed to construct the tag name.) These columns should be defined as character varying(256).

For search and retrieval of larger data sets, indexes should be maintained on the following fields:

  • source_key
  • given (lowercase index)
  • family (lowercase index)
  • mail
  • identifier
  • modified (if the column is defined)

Defining Multiple Roles

Multiple External Identity Roles may be specified for the same External Identity in Flat Mode by use of the id column (which will become the Role Key) and the source_key (which must be the same for each External Identity Role for the same External Identity). Some attributes can be resolved different ways, so Flat Mode uses the following technique:

  1. The row ID is the Role Key.
  2. The first Date of Birth seen will be used as the Date of Birth for the External Identity. Different External Identity Roles should not have different Dates of Birth for the same External Identity, though they may be left blank.
  3. Single valued Role attributes (affiliation, title, organization, valid_from, etc) are specific to each External Identity Role.
  4. Multi-valued attributes that can be attached to an External Identity Role or an External Identity (address, telephone_number, url, ad hoc attributes) will be attached to the External Identity Role.
  5. Multi-valued attributes that can only be attached to an External Identity (name, identifier, mail, pronouns) will be attached to the External Identity. Duplicate values will be removed.

If these rules are insufficient, use Relational Mode instead.

Relational Mode

Relational Mode expects a primary Source Table (the name of which is exactly as configured in the Plugin configuration) and a series of one or more related Source Tables (the names of which are the Source Table name, an underscore (_), and the fixed string describing the related table). Only the _names and _roles tables are required, all others will be silently skipped if not present. The tables must be consistent with these SQL definitions:

CREATE TABLE my_source_table (
    id                  integer PRIMARY KEY,
    source_key          character varying(1024) UNIQUE,
    date_of_birth       date,
    -- if the modified column is present, it will be used to detect records changed since the last full sync     modified            timestamp without time zone
);
 
CREATE TABLE my_source_table_names (
    id              integer PRIMARY KEY,
    source_key      character varying(1024) REFERENCES my_source_table (source_key),
    honorific       character varying(32),
    given           character varying(128) NOT NULL,
    middle          character varying(128),
    family          character varying(128),
    suffix          character varying(32),
    type            character varying(32) NOT NULL,
    language        character varying(16)
);
 
CREATE TABLE my_source_table_addresses (
    id              integer PRIMARY KEY,
    source_key      character varying(1024) REFERENCES my_source_table (source_key),
    street          text NOT NULL,
    room            character varying(64),
    locality        character varying(128),
    state           character varying(128),
    postal_code     character varying(16),
    country         character varying(128),
    type            character varying(32) NOT NULL,
    language        character varying(16)
);
 
CREATE TABLE my_source_table_email_addresses (
    id              integer PRIMARY KEY,
    source_key      character varying(1024) REFERENCES my_source_table (source_key),
    mail            character varying(256) NOT NULL,
    type            character varying(32) NOT NULL,
    verified        boolean
);
 
CREATE TABLE my_source_table_identifiers (
    id              integer PRIMARY KEY,
    source_key      character varying(1024) REFERENCES my_source_table (source_key),
    identifier      character varying(512) NOT NULL,
    type            character varying(32) NOT NULL
);

CREATE TABLE my_source_table_roles (
    id                  integer PRIMARY KEY,
    source_key          character varying(1024) REFERENCES my_source_table (source_key),
    affiliation         character varying(32),
    date_of_birth       date,
    valid_from          timestamp without time zone,
    valid_through       timestamp without time zone,
    title               character varying(128),
    organization        character varying(128),
    department          character varying(128),
    manager_identifier  character varying(512),
    sponsor_identifier  character varying(512)
);
 
CREATE TABLE my_source_table_telephone_numbers (
    id              integer PRIMARY KEY,
    source_key      character varying(1024) REFERENCES my_source_table (source_key),
    country_code    character varying(3),
    area_code       character varying(8),
    number          character varying(64) NOT NULL,
    extension       character varying(16),
    type            character varying(32) NOT NULL
);
 
CREATE TABLE my_source_table_urls (
    id              integer PRIMARY KEY,
    source_key      character varying(1024) REFERENCES my_source_table (source_key),
    url             character varying(256) NOT NULL,
    type            character varying(32) NOT NULL
);
 
CREATE TABLE my_source_table_ad_hoc_attributes (
    id              integer PRIMARY KEY,
    source_key      character varying(1024) REFERENCES my_source_table (source_key),
    tag             character varying(128) NOT NULL,
    value           character varying(256)
);

where

  • id: A unique integer that serves as a Primary Key. This column is not currently used by SqlSource for Relational Mode but may be used in a future release.
  • source_key: The System of Record Identifier, which must be unique within the primary table only. For related tables the source_key is effectively a foreign key to the primary table.
  • affiliation, date_of_birth, valid_from, valid_through, title, organization, department, manager_identifier, sponsor_identifier: Used to construct the External Identity.
  • modified: If this column is defined, the time this record was last updated (GMT). Only applies to the primary table.
  • honorific, given, middle, family, suffix, type, language, and primary_name: Used to construct a Name, at least one of which must be provided. Which elements are actually required is governed by CO Settings Name Required Fields (and Name Permitted Fields).
  • street, room, locality, state, postal_code, country, type, and language: Used to construct an Address.
  • mail, type, and verified: Used to construct an Email Address. If verified is not provided, the Email Address will be considered unverified.
  • identifier and type: Used to construct an Identifier.
  • country_code, area_code, number, extension, and type: Used to construct a Telephone Number. Only the number field will be populated.
  • url and type: Used to construct the set of fields used to construct a Url
  • tag and value: Used to construct the set of fields used to construct an Ad Hoc Attribute

For search and retrieval of larger data sets, indexes should be maintained on the following fields:

  • source_key (for all tables)
  • given (lowercase index)
  • family (lowercase index)
  • mail
  • identifier
  • modified (if the column is defined)

Handling Change Detection

When running a full sync, SqlSource needs to be able to detect changes to determine which records to process. Several techniques are available.

  1. Modified Timestamps are the most efficient.
  2. Archive Tables are less efficient, but do not require the incorporation of a modified timestamp into each record.
  3. If neither Modified Timestamps nor Archive Tables are available, SqlSource will revert to record-by-record comparisons, which are signfiicantly less efficient. For smaller datasets, however, this may be sufficient.
    1. (info) Check Threshold is unavailable when using record-by-record comparisons.

Modified Timestamps

If a modified column is present in the Source Table, its values will be used to determine when the record was last changed. If present, values should be provided for all rows; new records should be given a value of the time they were added.

In Relational Mode, only the primary table supports the modified column. Changes to related tables will only be processed if the primary table's modified value is updated.

Modified Timestamps cannot be used to detect deleted rows, since the row is no longer available to be detected via a timestamp. As such, this approach is not suitable where the removal of a row is used to indicate loss of eligibility. There are several alternatives:

  1. Change the inbound SQL view to use valid_through to indicate loss of eligibility, instead of removing the row.
  2. Occasionally run Sync Job with the force=1 option to process all rows, which will include deletes. This might be suitable for smaller data sets, or where it is acceptable to check for deletes less often than adds/updates.
  3. Use Archive Tables instead, described below.

Archive Tables

(warning) Archive Tables are not yet available in Registry PE.

Debugging

Table Not Found In Datasource

If the specified table is not in fact in the database, the error message "Table my_source_table for model SourceRecord was not found in datasource sourcedb." probably means the ownership or permissions on the Source Tables are incorrect, and Registry is unable to query them. (The actual database issued permission error is masked by Cake, but may be visible in the database error logs.)

Archive Tables Appear To Be Ignored

Check that

  1. The Source Table has a modified column (in which case that is used instead)
  2. The ownership or permissions on the Archive Tables is correct (otherwise Registry is unable to query or update them)
  3. If there is already data in the Source Tables that has been processed, that the Archive Tables have been bootstrapped.

Note that manually running individual records (eg: as a test) will not populate the Archive Tables.

See Also

Changes From Earlier Versions

As of Registry v5.0.0

  • The sorid column is now source_key.
  • The o column is now organization.
  • The ou column is now department.
  • Support for Pronouns is available.
  • In Flat Mode, Ad Hoc Attributes are prefixed a_. Previously, any extra columns were treated as an Ad Hoc Attribute.
  • It is now possible to specify multiple External Identity Roles for a single External Identity, in both Flat and Relational modes. In Relational Mode, many attributes moved from the primary table to the new _roles table.
  • In Relational Mode, primary_name is no longer specified for _names records.
  • In Relational Mode, login is no longer specified for _identifier records (AR-Identifier-1).
  • Check Threshold is available.
  • No labels