The Format Identifier Assignment Plugin generates Identifiers based on a format specification and attributes associated with the subject entity.
Supported Contexts
- Department
- Group
- Person
Configuration
- The Format Identifier Assigner is part of the CoreAssigner Registry Plugin, which is activated by default.
- When adding a new Identifier Assignment, the Plugin is CoreAssigner.FormatAssigners.
- Plugin configuration options are described below.
Format
Identifier formats are specified using three components:
- Substitutions: In a format specification, a substitution is replaced with some other string. Substitutions are delimited with parentheses.
- Collision Numbers: A number appended to a string that is not unique in order to make it unique. For example, the string j.smith with a collision number added might become j.smith.3.
- Sequenced Segments: In a format specification, sequenced segments allow adding additional components to a string to help generate a unique result. An Identifier is first generated from a format without any sequenced segments. If that Identifier is not unique, sequenced segments are added one at a time until a unique Identifier is generated.
If no format is specified, Identifiers will simply be assigned as an integer, e.g. 109 or 523788.
Substitutions
Substitutions replace a segment of a format with some other value, usually (but not necessarily) associated with the entity for which the Identifier is being assigned. Not all Substitutions are supported in all contexts. Substitutions are delimited with parentheses, for example (G). The following Substitutions are available:
- Person Name
  - (G): Given Primary Name
  - (M): Middle Primary Name
  - (F): Family Primary Name
  - (g): Given Primary Name (lowercased)
  - (m): Middle Primary Name (lowercased)
  - (f): Family Primary Name (lowercased)
- Department, Group Name
  - (N): Entity name
  - (n): Entity name (lowercased)
- Identifier
  - (I): See below
- Random, see below
  - (h): Hexadecimal characters (0-9a-f)
  - (L): Random letters (A-Z, but no O to avoid confusion with zero)
  - (l): Random letters (a-z, but no l to avoid confusion with one)
- Collision Number
  - (#): See below
Substitutions can be limited in length using the colon notation (X:n). For example, an initial can be used in lieu of the full given name by using (g:1). (This does not apply to Collision Numbers, see below.)
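The Person Name Substitutions and the (X:n) length limit can be illustrated with a minimal sketch. It assumes a hypothetical person record with given, middle, and family fields; the real plugin reads the Primary Name from Registry, so treat this only as an illustration of how (g:1) differs from (G).

```python
import re

# Hypothetical person record; field names are assumptions for illustration.
PERSON = {"given": "Albert", "middle": "", "family": "Einstein"}

# Substitution letter -> (name field, lowercase the value?)
NAME_SUBSTITUTIONS = {
    "G": ("given", False),
    "M": ("middle", False),
    "F": ("family", False),
    "g": ("given", True),
    "m": ("middle", True),
    "f": ("family", True),
}

# Matches (X) or (X:n), where X is one of the Person Name letters.
TOKEN = re.compile(r"\(([GMFgmf])(?::(\d+))?\)")


def expand_names(fmt: str, person: dict) -> str:
    """Replace Person Name substitutions in fmt, honoring (X:n) length limits."""

    def replace(match: re.Match) -> str:
        field, lower = NAME_SUBSTITUTIONS[match.group(1)]
        value = person[field]
        if lower:
            value = value.lower()
        if match.group(2) is not None:
            value = value[: int(match.group(2))]  # keep only the first n characters
        return value

    return TOKEN.sub(replace, fmt)


print(expand_names("(g:1).(f)@myvo.org", PERSON))  # a.einstein@myvo.org
print(expand_names("(G).(F)@myvo.org", PERSON))    # Albert.Einstein@myvo.org
```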
Identifier Substitutions
Existing Identifiers can be embedded in the format string using the parameter (I/name), where name is the alphanumeric database value of the Registry Type (not the Display Name), for example (I/uid)@myvo.org. This capability is available in all contexts. Note that an Identifier of the specified type must already exist or the Identifier Assignment will fail. Identifier Assignments can be ordered, so it is possible to ensure that the first Identifier is generated before the second Identifier that uses it.
If more than one Identifier of a given type exists, which one is used for the Substitution is non-deterministic.
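The (I/name) behavior described above can be sketched as follows. The mapping of Registry Type values to lists of existing Identifiers is an assumed data shape, and MissingIdentifierError is a hypothetical name; the sketch simply picks the first value where Registry makes no guarantee.

```python
import re

# Matches (I/name), where name is the Registry Type database value.
IDENTIFIER_TOKEN = re.compile(r"\(I/([A-Za-z0-9]+)\)")


class MissingIdentifierError(Exception):
    """Hypothetical error: the format references an Identifier type the entity lacks."""


def expand_identifiers(fmt: str, identifiers: dict) -> str:
    """Replace (I/name) with an existing Identifier of type `name`.

    `identifiers` maps a Registry Type value (e.g. "uid") to a list of
    identifier strings; this shape is an assumption for illustration.
    """

    def replace(match: re.Match) -> str:
        type_name = match.group(1)
        values = identifiers.get(type_name)
        if not values:
            # Mirrors the documented rule: assignment fails if the type is missing.
            raise MissingIdentifierError(f"no Identifier of type {type_name!r}")
        # Registry does not define which one wins; this sketch takes the first.
        return values[0]

    return IDENTIFIER_TOKEN.sub(replace, fmt)


print(expand_identifiers("(I/netid)@myvo.org", {"netid": ["rdm75"]}))  # rdm75@myvo.org
```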
Random Substitutions
Random Substitutions operate differently from Collision Numbers. Random Substitutions are generated once as part of the Identifier construction, and are not guaranteed to make a unique string. In contrast, if a Collision Number does not generate a unique string, it will be replaced until a unique string is found (or a limit is reached).
Random Substitutions support width specifiers, so (for example) (l:5) will generate a five-character string of lowercase letters, such as hxnwp.
Random sequences can, and probably should, be combined with collision numbers. For example, (L:3)(#:2) will generate a string like DGP23. If that Identifier is already in use, the random string portion (DGP) will be preserved, but a new collision number will be generated, resulting in an Identifier like DGP77.
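The difference between Random Substitutions and Collision Numbers can be sketched for the format (L:3)(#:2). This is a minimal illustration, assuming Random Collision Mode with a range of 1 to 99 and assuming the 10-attempt limit from the Plugin Application Rules applies to redrawing the collision number; the function names are hypothetical.

```python
import random
import string

# Alphabets as described above: uppercase omits "O", lowercase omits "l".
ALPHABETS = {
    "h": "0123456789abcdef",
    "L": string.ascii_uppercase.replace("O", ""),
    "l": string.ascii_lowercase.replace("l", ""),
}

MAX_ATTEMPTS = 10  # assumed to correspond to the documented 10-attempt limit


def random_segment(kind: str, width: int, rng: random.Random) -> str:
    """Generate a Random Substitution such as (l:5) or (L:3)."""
    return "".join(rng.choice(ALPHABETS[kind]) for _ in range(width))


def assign_l3_cn2(taken: set, rng: random.Random):
    """Sketch of (L:3)(#:2): the random prefix is generated once and kept,
    and only the two-digit collision number is redrawn on a collision."""
    prefix = random_segment("L", 3, rng)
    for _ in range(MAX_ATTEMPTS):
        # Assumed collision range 1-99, zero-padded to the (#:2) width.
        candidate = f"{prefix}{rng.randint(1, 99):02d}"
        if candidate not in taken:
            return candidate
    return None  # no unique Identifier within the attempt limit
```

Seeding two generators identically shows the prefix surviving a collision: `assign_l3_cn2({first}, random.Random(42))` keeps the same three letters as `first = assign_l3_cn2(set(), random.Random(42))` but ends with a different number.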
Collision Numbers
A Collision Number is simply a type of substitution that uses a number to generate a unique identifier. How the number is generated is controlled by Collision Mode, and Minimum and Maximum Collision Values, described below.
Only one Collision Number is permitted in a format.
The Collision Number can be made fixed width by specifying the number of characters n in the Substitution as (#:n).
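Fixed-width Collision Numbers can be sketched as zero-padding, consistent with the C(#:8) example further down this page, where 109 becomes C00000109. The helper name is hypothetical, and how Registry handles a number longer than n is not described here; this sketch leaves such numbers unchanged rather than truncating them.

```python
from typing import Optional


def format_collision_number(n: int, width: Optional[int] = None) -> str:
    """Render a Collision Number; (#:width) pads with leading zeros."""
    if width is None:
        return str(n)  # plain (#)
    return str(n).zfill(width)  # (#:width); does not truncate longer numbers


print("C" + format_collision_number(109, 8))  # C00000109
print("C" + format_collision_number(523788))  # C523788
```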
Sequenced Segments
Sequenced Segments are fragments of an Identifier format that are added one at a time in order to help generate a unique Identifier. This can be useful for situations such as:
- Adding a collision number, but only to the second Identifier of a given form.
- Inserting a middle name, but only if an Identifier already exists using only the given and family names.
A Sequenced Segment is denoted in brackets as a number followed by a colon, and includes the text (including Substitutions) to be used when that Sequenced Segment is in effect. When assigning Identifiers, all Sequenced Segments will initially be ignored. Then, starting with 1 and incrementing by 1 each time, Sequenced Segments will be added in until a unique Identifier is generated. Currently, up to 9 Sequenced Segments may be defined.
For example, consider the format (G)[1:.(M:1)].(F)[2:.(#)]@myvo.org. This somewhat confusing string will first generate Werner.Heisenberg@myvo.org. If that isn't unique, it will then generate Werner.K.Heisenberg@myvo.org. Finally, it will generate Werner.K.Heisenberg.1@myvo.org. (The Minimum Collision Value should probably be set to 2 when used with sequenced segments. That would generate Werner.K.Heisenberg.2@myvo.org instead, which is presumably less confusing if there is already a Werner.K.Heisenberg@myvo.org assigned.)
There are actually two types of Sequenced Segments: additive and single use. Additive Sequenced Segments are denoted with [ and ], and are inserted starting with their designated sequence and remain in place for future Identifier attempts. Single use Sequenced Segments are indicated with an additional = inserted after the open bracket. So, for example, the segment [1:.(M:1)] will be inserted into the second and each subsequently generated Identifier candidate, while the segment [=1:.(M:1)] will only be inserted into the second generated candidate (and no subsequent candidates).
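The additive and single use rules above can be sketched by computing the format used for each attempt. This is an illustration, not the plugin's parser: it assumes segment text never contains "]", and the fill() helper is a naive stand-in for real Substitution expansion with hypothetical values.

```python
import re

# [n:text] is additive and [=n:text] is single use; n is a single digit.
# Assumes segment text never contains "]".
SEGMENT = re.compile(r"\[(=?)(\d):([^\]]*)\]")


def candidate_templates(fmt: str) -> list:
    """Return the format used for attempt 0 (no segments), 1, 2, ..."""
    highest = max((int(seq) for _, seq, _ in SEGMENT.findall(fmt)), default=0)
    templates = []
    for step in range(highest + 1):

        def keep(match, step=step):
            single_use, seq, text = match.group(1), int(match.group(2)), match.group(3)
            if single_use:
                return text if seq == step else ""  # only at its own step
            return text if seq <= step else ""  # from its step onward

        templates.append(SEGMENT.sub(keep, fmt))
    return templates


def fill(template: str, values: dict) -> str:
    """Naive token replacement for the demo only."""
    for token, value in values.items():
        template = template.replace(token, value)
    return template


fmt = "(G)[1:.(M:1)].(F)[2:.(#)]@myvo.org"
values = {"(G)": "Werner", "(M:1)": "K", "(F)": "Heisenberg", "(#)": "1"}
for template in candidate_templates(fmt):
    print(fill(template, values))
# Werner.Heisenberg@myvo.org
# Werner.K.Heisenberg@myvo.org
# Werner.K.Heisenberg.1@myvo.org
```

Swapping [1:...] for [=1:...] in the same sketch drops the middle initial again at step 2, which is the single use behavior.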
Example Formats
Description | Format | Example Identifiers |
---|---|---|
Identifier consisting of the letter C followed by a collision number | C(#) | C109, C523788 |
Identifier consisting of the letter C followed by an eight character collision number | C(#:8) | C00000109, C00523788 |
Use given and family names to generate an Email Address | (G).(F)@myvo.org | Albert.Einstein@myvo.org |
Use first initial and family name to generate a lowercase Email Address | (g:1).(f)@myvo.org | a.einstein@myvo.org |
Create a Network ID (netid) based on initials and a collision number | (g:1)(m:1)(f:1)(#) | rdm75 |
Generate an Email Address based on the Network ID | (I/netid)@myvo.org | rdm75@myvo.org |
Collision Mode
The Collision Mode controls how collision numbers are assigned. Supported modes are:
- Random: The Collision Number is generated randomly.
- Sequential: The Collision Number is generated sequentially (for each string constructed prior to assigning the collision number), using the next unassigned integer beginning with the Minimum Collision Value.
Sequential Collision Number assignment requires storing state in the database, using the format_assigner_sequences table. No specific action is required for this; however, deployers may wish to be aware of the (minor) additional overhead of using Sequential Collision Numbers.
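The two Collision Modes can be sketched side by side. In this minimal illustration an in-memory dictionary keyed by the string constructed before the collision number stands in for the format_assigner_sequences table; that shape and the class name are assumptions. Unlike the real plugin, the sketch does not check whether a sequential value is already assigned.

```python
import random


class CollisionNumberGenerator:
    """Sketch of Random and Sequential Collision Modes (hypothetical class)."""

    def __init__(self, mode: str, minimum: int = 1,
                 maximum: int = 2_147_483_647, rng=None):
        if mode not in ("random", "sequential"):
            raise ValueError(f"unknown collision mode: {mode}")
        self.mode = mode
        self.minimum = minimum
        self.maximum = maximum
        self.rng = rng or random.Random()
        self.next_value = {}  # stand-in for format_assigner_sequences

    def draw(self, affix: str) -> int:
        if self.mode == "random":
            # Random: any value in [minimum, maximum].
            return self.rng.randint(self.minimum, self.maximum)
        # Sequential: per-affix counter starting at the Minimum Collision Value.
        value = self.next_value.get(affix, self.minimum)
        self.next_value[affix] = value + 1
        return value


gen = CollisionNumberGenerator("sequential", minimum=2)
print(gen.draw("rdm%s"), gen.draw("rdm%s"), gen.draw("abc%s"))  # 2 3 2
```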
Permitted Characters
The substitutions described in this document are controlled by the Permitted Characters. Consider someone with the given name "Mary Anne" and the family name "Johnson-Smith". It might not be desirable to allow spaces or dashes in the generated identifier, so specifying AlphaNumeric Only as the permitted characters would result in an identifier like "maryanne.johnsonsmith" instead of "mary anne.johnson-smith". AlphaNumeric and Dot, Dash, Underscore would generate "maryanne.johnson-smith".
If any Sequenced Segment generates text consisting only of non-permitted characters, it will be skipped.
Auto-generated Identifiers are subject to Identifier Validation. Identifier Validator Plugins can be used to further constrain auto-generated Identifiers.
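The "Mary Anne Johnson-Smith" example can be reproduced with a small filter applied to substituted values only, equivalent to the format (g).(f). This sketch assumes ASCII character classes for the two settings named above; how the plugin treats non-ASCII letters is not described here, and the function names are hypothetical.

```python
import re

# Assumed ASCII character classes for two Permitted Characters settings.
PERMITTED = {
    "AlphaNumeric Only": re.compile(r"[^A-Za-z0-9]"),
    "AlphaNumeric and Dot, Dash, Underscore": re.compile(r"[^A-Za-z0-9._-]"),
}


def filter_value(value: str, setting: str) -> str:
    """Strip characters not permitted by the given setting."""
    return PERMITTED[setting].sub("", value)


def build(given: str, family: str, setting: str) -> str:
    # Equivalent to (g).(f): the "." comes from the format itself,
    # so only the substituted name values are filtered.
    return f"{filter_value(given, setting).lower()}.{filter_value(family, setting).lower()}"


print(build("Mary Anne", "Johnson-Smith", "AlphaNumeric Only"))
# maryanne.johnsonsmith
print(build("Mary Anne", "Johnson-Smith", "AlphaNumeric and Dot, Dash, Underscore"))
# maryanne.johnson-smith
```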
Minimum Length
If set, any generated Identifier must have at least the number of characters specified by Minimum Length. If not, Sequenced Segments will be applied until the minimum length is achieved, or the available Sequenced Segments have been exhausted.
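One way to read the Minimum Length rule is as a filter over the candidates produced by successive Sequenced Segments. This sketch assumes the candidates are already fully expanded and that a candidate must be both long enough and unique; the function name is hypothetical.

```python
def choose(candidates: list, taken: set, minimum_length: int = 0):
    """Return the first candidate that meets Minimum Length and is unused.

    `candidates` are expanded strings for sequence steps 0, 1, 2, ...
    """
    for candidate in candidates:
        if len(candidate) < minimum_length:
            continue  # too short: move on to the next Sequenced Segment
        if candidate not in taken:
            return candidate
    return None  # Sequenced Segments exhausted


print(choose(["ab", "abc", "abcd"], set(), minimum_length=3))  # abc
```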
Minimum Collision Value
For Random Collision Numbers, the minimum value that may be assigned. For Sequential Collision Numbers, the first value to be assigned.
The Minimum Collision Value is useful for avoiding collision numbers starting with the number 1, which may be confused with the letter l.
Maximum Collision Value
For Random Collision Numbers, the maximum value that may be assigned. Currently, the maximum may not exceed the value returned by PHP's mt_getrandmax() function, which is typically 2,147,483,647.
Maximum Collision Values cannot be set for Sequential Collision Numbers.
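The Minimum and Maximum Collision Value constraints can be summarized as a configuration check. This is a sketch under assumptions: it hardcodes the typical mt_getrandmax() value rather than querying PHP, and the minimum-not-above-maximum check is an added sanity rule not stated on this page.

```python
from typing import Optional

MT_GETRANDMAX = 2_147_483_647  # typical value of PHP's mt_getrandmax()


def validate_collision_range(mode: str, minimum: int,
                             maximum: Optional[int] = None) -> None:
    """Raise ValueError if the collision settings break the rules above."""
    if mode == "sequential":
        if maximum is not None:
            raise ValueError("Maximum Collision Value cannot be set for Sequential mode")
        return
    upper = MT_GETRANDMAX if maximum is None else maximum
    if upper > MT_GETRANDMAX:
        raise ValueError("Maximum Collision Value exceeds mt_getrandmax()")
    if minimum > upper:
        # Assumed sanity rule, not stated in the documentation.
        raise ValueError("Minimum Collision Value exceeds the maximum")
```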
Pre-populating Identifier Assignment Collision Numbers
This section is for advanced use cases.
It is possible to manually pre-populate sequential collision numbers, which may be useful when migrating data from another system. There is not currently a user interface to handle this (CO-386), so the steps must be handled manually.
First, define an Identifier Assignment using the Format Assigner as described above, if not already done. Obtain the ID for the Format Assigner, which can be found via the plugin's configuration page. In a URL like the following, the ID is 3:
http://localhost/registry-pe/core-assigner/format-assigners/edit/3
Next, determine the affix or affixes. These are equivalent to the format with parameters substituted (with %s replacing (#)). For example, a format used to generate identifiers consisting of a person's initials might be (G:1)(M:1)(F:1)(#). This would translate into a set of rows, one for each initial sequence.
Note that rows in this table are not automatically created until an Identifier with a given affix is assigned.
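Computing an affix as described above can be sketched for a single person. The person record and the name Ruth Dorothy Miller are hypothetical, only Person Name substitutions and a bare (#) are handled, and the columns of format_assigner_sequences are not covered here.

```python
import re

FIELDS = {"G": "given", "M": "middle", "F": "family"}
# Person Name substitution with optional length limit, e.g. (G:1).
NAME = re.compile(r"\(([GMFgmf])(?::(\d+))?\)")


def affix_for(fmt: str, person: dict) -> str:
    """Substitute name parameters and replace (#) with %s."""

    def replace(match: re.Match) -> str:
        letter, width = match.group(1), match.group(2)
        value = person[FIELDS[letter.upper()]]
        if letter.islower():
            value = value.lower()
        if width:
            value = value[: int(width)]
        return value

    return NAME.sub(replace, fmt).replace("(#)", "%s")


person = {"given": "Ruth", "middle": "Dorothy", "family": "Miller"}  # hypothetical
affix = affix_for("(G:1)(M:1)(F:1)(#)", person)
print(affix)       # RDM%s
print(affix % 75)  # RDM75
```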
Plugin Application Rules
- A maximum of 10 attempts will be made to assign an Identifier.
See Also
- Registry PE Identifier Assignment
- Registry Table: format_assigners
- Registry Table: format_assigner_sequences
Changes From Earlier Versions
As of Registry v5.0.0
- The functionality previously provided by the core Identifier Assignment code is now provided by the Format Assigner plugin.