...

  • Collision Numbers: A number added to a string that is not unique in order to make it unique. For example, the string j.smith with a collision number added becomes j.smith.3. Collision numbers can be assigned sequentially or randomly.
  • Parameters: In a format specification, a parameter is a placeholder, such as (#), that is replaced with some other string when the identifier is generated.
  • Sequenced Segment: In a format specification, sequenced segments are incorporated into an identifier in order to generate a unique string. An identifier is first generated from a format without any sequenced segments. If that identifier is not unique, sequenced segments are added until a unique identifier is generated.

Collision Numbers

Identifier formats can be a bit tricky, so let's start with the easier ones. The parameter (#) means "replace with a collision number". A collision number is the next number that will generate a unique identifier. If your identifier is assigned using the sequential algorithm, it is the next unassigned integer beginning with the minimum value you configured. For identifiers assigned using the random algorithm, the collision number is selected randomly. Only one collision number is permitted in a format.
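To make the sequential case concrete, here is a minimal Python sketch. It is not Registry code: the function name, the in-memory set of identifiers in use, and the max_tries limit are all assumptions made for illustration.

```python
def next_sequential_identifier(fmt, in_use, minimum=1, max_tries=1000):
    """Replace the single (#) in fmt with the first integer >= minimum
    that yields an identifier not already present in in_use.

    Hypothetical helper: the real lookup of used identifiers is not shown here.
    """
    if fmt.count("(#)") != 1:
        # Only one collision number is permitted in a format.
        raise ValueError("this sketch expects exactly one (#) in the format")
    for n in range(minimum, minimum + max_tries):
        candidate = fmt.replace("(#)", str(n))
        if candidate not in in_use:
            return candidate
    raise RuntimeError("no unique identifier found within max_tries")


taken = {"j.smith.1", "j.smith.2"}
print(next_sequential_identifier("j.smith.(#)", taken))  # j.smith.3
```

The minimum argument stands in for the configured minimum value, so a format such as C(#) with a minimum of 100 would start at C100.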

...

The collision number can be made fixed width by specifying the number of characters n in the parameter as (#:n). For example, the format C(#:8) will generate C00000109 or C00523788.
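The fixed-width behaviour amounts to zero-padding. The sketch below illustrates it in Python; fill_collision_number is a hypothetical helper, not part of Registry.

```python
import re


def fill_collision_number(fmt, number):
    """Substitute (#) or (#:n) in fmt with number, zero-padded to n digits
    when a width is given. Hypothetical helper for illustration only."""
    def repl(match):
        width = match.group(1)  # None when the parameter is plain (#)
        return str(number).zfill(int(width)) if width else str(number)

    # count=1 because only one collision number is permitted in a format.
    return re.sub(r"\(#(?::(\d+))?\)", repl, fmt, count=1)


print(fill_collision_number("C(#:8)", 109))     # C00000109
print(fill_collision_number("C(#:8)", 523788))  # C00523788
print(fill_collision_number("C(#)", 109))       # C109
```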

Name Substitutions

It is also possible to generate identifiers based on one or more components of a CO Person name (as defined in cm_names). The following parameters are available:

...

So (G).(F)@myvo.org might generate Albert.Einstein@myvo.org. To use initials instead of a full name, simply limit the length of the name to 1 character. (g:1).(f)@myvo.org would generate a.einstein@myvo.org instead.
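The two examples above can be reproduced with a short Python sketch. It covers only the four parameters that appear in those examples, and it assumes (inferring from them) that the uppercase form keeps the name as entered while the lowercase form lowercases it. substitute_names is a hypothetical helper.

```python
import re


def substitute_names(fmt, given, family):
    """Replace (G), (g), (F), (f), optionally with a :n length limit.

    Assumption inferred from the examples: lowercase parameters
    lowercase the name, uppercase parameters leave it unchanged.
    """
    values = {
        "G": given,
        "g": given.lower(),
        "F": family,
        "f": family.lower(),
    }

    def repl(match):
        value = values[match.group(1)]
        limit = match.group(2)
        # For name parameters the length is a maximum, so truncate.
        return value[: int(limit)] if limit else value

    return re.sub(r"\(([GgFf])(?::(\d+))?\)", repl, fmt)


print(substitute_names("(G).(F)@myvo.org", "Albert", "Einstein"))
# Albert.Einstein@myvo.org
print(substitute_names("(g:1).(f)@myvo.org", "Albert", "Einstein"))
# a.einstein@myvo.org
```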

Identifier Substitutions

As of Registry v3.3.0, existing identifiers can be embedded in the format string, using the parameter (I/name) where name is the alphanumeric name of the Extended Type (not the display name), for example (I/uid)@myvo.org. This capability is available in all contexts. Note that an identifier of the specified type must already exist or the Identifier Assignment will fail. Also as of Registry v3.3.0, Identifier Assignments can be ordered, so it is possible to ensure that the first identifier is generated before the second identifier that uses it.
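As a rough illustration of why the referenced identifier must already exist, the following Python sketch resolves (I/name) against a dictionary of a person's current identifiers. The dictionary, the function name, and the use of KeyError to signal failure are assumptions for the example, not Registry behaviour.

```python
import re


def substitute_identifiers(fmt, existing):
    """Replace each (I/name) with the existing identifier of that type.

    existing is a hypothetical mapping of Extended Type name -> identifier.
    """
    def repl(match):
        type_name = match.group(1)
        if type_name not in existing:
            # Mirrors the documented rule: the assignment fails if the
            # referenced identifier has not been created yet.
            raise KeyError(f"no identifier of type {type_name!r} exists yet")
        return existing[type_name]

    return re.sub(r"\(I/([A-Za-z0-9]+)\)", repl, fmt)


identifiers = {"uid": "aeinstein"}
print(substitute_identifiers("(I/uid)@myvo.org", identifiers))
# aeinstein@myvo.org
```

Calling the function with an empty dictionary raises the error, which is the situation that ordering the Identifier Assignments is meant to prevent.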

Note that while a length specifier for (#) specifies a fixed width padded with zeros, when used with name-based parameters such as (G), the length specifier indicates a maximum width.

Random Substitutions

As of Registry v3.3.0, several types of random strings can be generated:

  • (h): Hexadecimal characters (0-9a-f)
  • (L): Random letters (A-Z, but no O to avoid confusion with zero)
  • (l): Random letters (a-z, but no l to avoid confusion with one)

Random substitutions support width specifiers. For example, (l:5) will generate a five-character string of lowercase letters, such as hxnwp.
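The three alphabets and the width specifier can be sketched in Python as follows. The helper name is hypothetical, and the default width of 1 when no specifier is given is an assumption, since the text above does not state it.

```python
import random
import re
import string

# Alphabets as described in the list above.
ALPHABETS = {
    "h": "0123456789abcdef",
    "L": string.ascii_uppercase.replace("O", ""),  # no O (looks like zero)
    "l": string.ascii_lowercase.replace("l", ""),  # no l (looks like one)
}


def substitute_random(fmt, rng=random):
    """Replace (h), (L), (l), optionally with :n, by random characters.

    Assumption: a parameter without a width produces one character.
    """
    def repl(match):
        alphabet = ALPHABETS[match.group(1)]
        width = int(match.group(2) or 1)
        return "".join(rng.choice(alphabet) for _ in range(width))

    return re.sub(r"\(([hLl])(?::(\d+))?\)", repl, fmt)


print(substitute_random("(l:5)"))       # five random lowercase letters
print(substitute_random("ID-(h:8)"))    # ID- followed by eight hex characters
```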

(warning) It's important to understand the difference between random substitutions and collision numbers. Random substitutions are generated once as part of the identifier construction, and are not guaranteed to make a unique string. In contrast, if a collision number does not generate a unique string, it will be replaced until a unique string is found (or a limit is reached).

Random sequences can, and probably should, be combined with collision numbers. For example, (L:3)(#:2) will generate a string like DGP23. If that Identifier is already in use, the random string portion (DGP) will be preserved, but a new collision number will be generated, resulting in an Identifier like DGP77.
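The retry behaviour for a format like (L:3)(#:2) can be sketched in Python as below. This is an illustration under stated assumptions: the function name, the set of identifiers in use, the 0-99 range for a two-digit random collision number, and the max_tries limit are all hypothetical.

```python
import random
import string


def assign_with_random_prefix(in_use, rng=random, max_tries=100):
    """Sketch of (L:3)(#:2) with a randomly assigned collision number."""
    letters = string.ascii_uppercase.replace("O", "")
    # The random substitution is generated once and then preserved.
    prefix = "".join(rng.choice(letters) for _ in range(3))
    for _ in range(max_tries):
        # Only the collision number is regenerated on each attempt.
        candidate = f"{prefix}{rng.randrange(100):02d}"
        if candidate not in in_use:
            return candidate
    raise RuntimeError("limit reached without finding a unique identifier")


print(assign_with_random_prefix(set()))  # three letters plus two digits
```

Because the prefix is fixed before the loop starts, a collision changes only the trailing digits, matching the DGP23 to DGP77 example above.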

Sequenced Segments

These formats can't guarantee a unique identifier if your organization is non-trivial in size, so a collision number should be added. (G).(F)(#)@myvo.org would generate Albert.Einstein1@myvo.org.

...

The good news is you may not need to know all of this. Various common default formats are available via a drop-down menu, and you may be able to just use one of those.

Permitted Characters

The substitutions described above are controlled by the permitted characters. Consider someone with the given name "Mary Anne" and the family name "Johnson-Smith". You might not want to allow spaces and dashes in the generated identifier, so specifying "AlphaNumeric Only" as the permitted characters will result in identifiers like "maryanne.johnsonsmith" instead of "mary anne.johnson-smith". "AlphaNumeric and Dot, Dash, Underscore" would generate "maryanne.johnson-smith".
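The example above can be sketched in Python for a format equivalent to (g).(f). The regular expressions are an approximation of the two options named above, and the sketch assumes the filter applies to the substituted name values rather than to literal text in the format, which is what the retained dot in "maryanne.johnsonsmith" suggests.

```python
import re

# Characters to strip under each option (approximation for illustration).
DISALLOWED = {
    "AlphaNumeric Only": r"[^A-Za-z0-9]",
    "AlphaNumeric and Dot, Dash, Underscore": r"[^A-Za-z0-9.\-_]",
}


def filter_value(value, permitted):
    """Remove characters that the chosen option does not permit."""
    return re.sub(DISALLOWED[permitted], "", value)


def build_identifier(given, family, permitted):
    """Hypothetical equivalent of the format (g).(f)."""
    g = filter_value(given.lower(), permitted)
    f = filter_value(family.lower(), permitted)
    return f"{g}.{f}"  # the dot is literal format text, so it is kept


print(build_identifier("Mary Anne", "Johnson-Smith", "AlphaNumeric Only"))
# maryanne.johnsonsmith
print(build_identifier("Mary Anne", "Johnson-Smith",
                       "AlphaNumeric and Dot, Dash, Underscore"))
# maryanne.johnson-smith
```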

...

(warning) Auto-generated identifiers are subject to Identifier Validation. Identifier Validator Plugins can be used to further constrain auto-generated identifiers.

Assigning Identifiers on Demand

...