SQL Source is an External Identity Source Plugin designed to integrate with upstream sources using SQL tables or views.
Configuration
- SQL Source is part of the SqlConnector Registry Plugin, which must be activated.
- If there isn't already a Server defined in the Servers Registry for the desired Source Database, add a new one.
- The Plugin for the Server is
CoreServer.SqlServers
. - Select the appropriate RDBMS type. Currently, only Postgres, MariaDB, and MySQL are fully supported. Other options are supported by the underlying framework but are not regularly tested.
- The Plugin for the Server is
- When adding a new External Identity Source, the Plugin is
SqlConnector.SqlSources
. - After creating the External Identity Source, select the appropriate target Server, as added above.
- The Table Mode may be set to either Flat or Relational, see Preparing Source Data below for more information, and for related configurations.
- The Source Table is the exact name of the table to be queried in Flat mode. In Relational mode, it is the name of the primary table.
Check Threshold
If set, Check Threshold will prevent sync processing from proceeding when the specified percentage of records have changed.
Check Threshold is only supported when the source data includes a modified
timestamp or when Archive Tables are in use.
In Flat Mode, each row only ever counts as one change, even if multiple columns within that row have changed. The maximum percentage of changes in Flat Mode is 100 (if every row had at least one change).
In Relational Mode, each row in each table counts as one change (although only once, if multiple columns within that row have changed). The maximum percentage of changes in Relational Mode can therefore exceed 100, since the total record count is only based on the primary table, but the change count is based on each related table.
The Threshold may be overridden for the next sync job by ticking Threshold Override. This setting will cause the next sync job to skip the threshold check, and then automatically remove the override.
Preparing Source Data
Inbound data may be represented using any supported database technique, including physical tables, foreign tables, views, and materialized views, so long as the representation meets the requirements described in this documentation.
SqlSource supports two modes for importing data.
- In Flat Mode, there is a single inbound table with one row per inbound External Identity, with at most one value supported for related models such as Name and Email Address.
- In Relational Mode, there is a primary table with one row per inbound External Identity, and a set of related tables allowing for multiple values for associated models.
Flat Mode is simpler to set up, but Relational Mode allows for more complex records. There is a small performance penalty for Relational Mode (since multiple tables or views need to be queried), but this should be negligible under most circumstances.
Flat Mode
Flat Mode expects a single Source Table whose definition is consistent with the following (the name may be any valid SQL name):
|
where
id
: A unique integer :that serves as a Primary Key. This column is used to generate a Role Key, see Defining Multiple Roles, below.source_key
: The System of Record Identifier, which should be unique within the table, though see Defining Multiple Roles, below.honorific
,given
,middle
,family
, andsuffix
: Used to construct the (Primary) Name, which will be assigned the type as set in the plugin configuration. Which elements are actually required is governed by CO Settings Name Required Fields (and Name Permitted Fields).affiliation
,date_of_birth
,valid_from
,valid_through
,title
,organization
,department
,manager_identifier
,sponsor_identifier
: Used to construct the External Identity Role.mail
: Used to construct the Email Address, which will be assigned the type as set in the plugin configuration.identifier
: Used to construct the Identifier, which will be assigned the type as set in the plugin configurationtelephone_number
: Used to construct the Telephone Number, which will be assigned the type as set in the plugin configuration. Only thenumber
field will be populated.address
: A comma separated set of fields used to construct the Address, which will be assigned the type as set in the plugin configuration. The value consists of the concatenated fieldsstreet,locality,state,postal_code
.url
: Used to construct the set of fields used to construct the Url, which will be assigned the type as set in the plugin configuration.pronouns
: Used to construct a Preferred Pronouns record, which will be assigned the type as set in the plugin configuration.modified
: If this column is defined, the time this record was last updated (GMT). See Handling Change Detection, below.
In addition, any column prefixed a_
(eg: a_favorite_color
) will be treated as an Ad Hoc Attribute. (The a_
prefix will be removed to construct the tag name.) These columns should be defined as character varying(256)
.
For search and retrieval of larger data sets, indexes should be maintained on the following fields:
- source_key
- given (lowercase index)
- family (lowercase index)
- identifier
- modified (if the column is defined)
Defining Multiple Roles
Multiple External Identity Roles may be specified for the same External Identity in Flat Mode by use of the id
column (which will become the Role Key) and the source_key
(which must be the same for each External Identity Role for the same External Identity). Some attributes can be resolved different ways, so Flat Mode uses the following technique:
- The row ID is the Role Key.
- The first Date of Birth seen will be used as the Date of Birth for the External Identity. Different External Identity Roles should not have different Dates of Birth for the same External Identity, though they may be left blank.
- Single valued Role attributes (affiliation, title, organization, valid_from, etc) are specific to each External Identity Role.
- Multi-valued attributes that can be attached to an External Identity Role or an External Identity (address, telephone_number, url, ad hoc attributes) will be attached to the External Identity Role.
- Multi-valued attributes that can only be attached to an External Identity (name, identifier, mail, pronouns) will be attached to the External Identity. Duplicate values will be removed.
If these rules are insufficient, use Relational Mode instead.
Relational Mode
Relational Mode expects a primary Source Table (the name of which is exactly as configured in the Plugin configuration) and a series of one or more related Source Tables (the names of which are the Source Table name, an underscore (_
), and the fixed string describing the related table). Only the _names
and _roles
tables are required, all others will be silently skipped if not present. The tables must be consistent with these SQL definitions:
|
where
id
: A unique integer that serves as a Primary Key. This column is not currently used by SqlSource for Relational Mode but may be used in a future release.source_key
: The System of Record Identifier, which must be unique within the primary table only. For related tables the source_key is effectively a foreign key to the primary table.affiliation
,date_of_birth
,valid_from
,valid_through
,title
,organization
,department
,manager_identifier
,sponsor_identifier
: Used to construct the External Identity.modified
: If this column is defined, the time this record was last updated (GMT). Only applies to the primary table.honorific
,given
,middle
,family
,suffix
,type
,language
, andprimary_name
: Used to construct a Name, at least one of which must be provided. Which elements are actually required is governed by CO Settings Name Required Fields (and Name Permitted Fields).street
,room
,locality
,state
,postal_code
,country
,type
, andlanguage
: Used to construct an Address.mail
,type
, andverified
: Used to construct an Email Address. Ifverified
is not provided, the Email Address will be considered unverified.identifier
andtype
: Used to construct an Identifier.country_code
,area_code
,number
,extension
, andtype
: Used to construct a Telephone Number. Only thenumber
field will be populated.url
andtype
: Used to construct the set of fields used to construct a Url.tag
andvalue
: Used to construct the set of fields used to construct an Ad Hoc Attribute.
For search and retrieval of larger data sets, indexes should be maintained on the following fields:
- source_key (for all tables)
- given (lowercase index)
- family (lowercase index)
- identifier
- modified (if the column is defined)
Handling Change Detection
When running a full sync, SqlSource needs to be able to detect changes to determine which records to process. Several techniques are available.
- Modified Timestamps are the most efficient.
- Archive Tables are less efficient, but do not require the incorporation of a modified timestamp into each record.
- If neither Modified Timestamps nor Archive Tables are available, SqlSource will revert to record-by-record comparisons, which are signfiicantly less efficient. For smaller datasets, however, this may be sufficient.
- Check Threshold is unavailable when using record-by-record comparisons.
Modified Timestamps
If a modified
column is present in the Source Table, its values will be used to determine when the record was last changed. If present, values should be provided for all rows; new records should be given a value of the time they were added.
In Relational Mode, only the primary table supports the modified
column. Changes to related tables will only be processed if the primary table's modified
value is updated.
Modified Timestamps cannot be used to detect deleted rows, since the row is no longer available to be detected via a timestamp. As such, this approach is not suitable where the removal of a row is used to indicate loss of eligibility. There are several alternatives:
- Change the inbound SQL view to use
valid_through
to indicate loss of eligibility, instead of removing the row. - Occasionally run Sync Job with the
force=1
option to process all rows, which will include deletes. This might be suitable for smaller data sets, or where it is acceptable to check for deletes less often than adds/updates. - Use Archive Tables instead, described below.
Archive Tables
Archive Tables are not yet available in Registry PE.
Debugging
Table Not Found In Datasource
If the specified table is not in fact in the database, the error message "Table my_source_table for model SourceRecord was not found in datasource sourcedb." probably means the ownership or permissions on the Source Tables are incorrect, and Registry is unable to query them. (The actual database issued permission error is masked by Cake, but may be visible in the database error logs.)
Archive Tables Appear To Be Ignored
Check that
- The Source Table has a
modified
column (in which case that is used instead) - The ownership or permissions on the Archive Tables is correct (otherwise Registry is unable to query or update them)
- If there is already data in the Source Tables that has been processed, that the Archive Tables have been bootstrapped.
Note that manually running individual records (eg: as a test) will not populate the Archive Tables.
See Also
Changes From Earlier Versions
As of Registry v5.0.0
- The
sorid
column is nowsource_key
. - The
o
column is noworganization
. - The
ou
column is nowdepartment
. - Support for Pronouns is available.
- In Flat Mode, Ad Hoc Attributes are prefixed
a_
. Previously, any extra columns were treated as an Ad Hoc Attribute. - It is now possible to specify multiple External Identity Roles for a single External Identity, in both Flat and Relational modes. In Relational Mode, many attributes moved from the primary table to the new
_roles
table. - In Relational Mode,
primary_name
is no longer specified for_names
records. - In Relational Mode,
login
is no longer specified for_identifier
records (AR-Identifier-1). - Check Threshold is available.