BulkLoad Shell allows for faster loading of large initial datasets. BulkLoad Shell is available as of Registry v3.3.0. The benefits of BulkLoad Shell are typically seen for datasets of at least 50k records.
It is important to understand that BulkLoad bypasses the normal Registry data processing engine, and as such is only suitable for an initial bulk load of data.
Tabs (\t) are not supported in attribute values. Newlines (\n) are supported for fields that support multiple lines, but must be double escaped (\\n).
The following tables are supported. In general, any field specified in the Registry Data Model documentation is supported. Foreign keys and other attribute metadata will be automatically inserted by BulkLoad, and so should be omitted. CoOrgIdentityLinks will also be automatically inserted.
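The tab and newline rules above could be applied when preparing values with a small helper like the following. This is a minimal sketch, not part of BulkLoad: the helper name `prepare_value` and the `description` field are illustrative assumptions, and "double escaped" is read here as meaning that the file should contain the characters `\\n`.

```python
import json


def prepare_value(value: str) -> str:
    """Prepare a text value for a BulkLoad attribute (hypothetical helper).

    Rejects tabs and turns each real newline into the two characters
    backslash + n, so that json.dumps() writes it double escaped.
    """
    if "\t" in value:
        raise ValueError("tabs are not supported in attribute values")
    # Normalize Windows line endings first, then escape the newlines.
    return value.replace("\r\n", "\n").replace("\n", "\\n")


# "description" is only an example of a field that might hold several lines.
description = prepare_value("Line one\nLine two")
print(json.dumps({"description": description}))
# prints {"description": "Line one\\nLine two"}
```

Rejecting tabs outright, rather than silently replacing them, keeps data problems visible before the load starts.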
As of Registry v4.0.0, BulkLoad Shell supports three types of Plugin tables: CoGroup, CoPerson and Configuration. CoPerson tables are those with a foreign key to CoPerson (which will automatically be inserted); CoGroup tables operate similarly. Configuration tables have no dependent foreign keys. The plugin models must be declared in the JSON File metadata (described below).
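As a rough illustration of declaring plugin models, the File Metadata line could be generated as follows. The function name `file_metadata` is hypothetical. The `CoPerson` and `Configuration` keys follow the examples on this page; the `CoGroup` key is assumed by analogy with the three plugin table types named above.

```python
import json


def file_metadata(co_person=(), co_group=(), configuration=()):
    """Build the File Metadata line declaring plugin models (sketch)."""
    plugin_models = {}
    if co_person:
        plugin_models["CoPerson"] = list(co_person)
    if co_group:
        # Assumed key name, by analogy with CoPerson and Configuration.
        plugin_models["CoGroup"] = list(co_group)
    if configuration:
        plugin_models["Configuration"] = list(configuration)
    if not plugin_models:
        # No plugin models declared: an empty metadata object.
        return "{}"
    # Compact separators keep the metadata on a single line.
    return json.dumps({"meta": {"pluginModels": plugin_models}}, separators=(",", ":"))


print(file_metadata(
    co_person=["SshKeyAuthenticator.SshKey"],
    configuration=["MyPlugin.GreenObject"],
))
```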
The inbound file consists of multiple JSON objects, one to a line. The first object is a File Metadata object, which is described below. Subsequent lines each hold one object record, as described below.
Data is generally represented as a complete primary object (CoGroup, CoPerson), with related data (EmailAddress, Identifier, CoPersonRole, etc) nested within. Foreign keys are automatically inserted by BulkLoad Shell.
Values must be their actual database values, not application enums. For example, a valid CoPerson:status value is "A", not "Active". In general, the values are documented in the Data Model.
The following Metadata attributes are available:
The Metadata line may be an empty JSON object ({}) if no Metadata is required.
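The overall file layout (a metadata object on the first line, then one record object per line) could be produced along these lines. This is a sketch only; `render_bulkload` is a hypothetical helper, not part of Registry.

```python
import json


def render_bulkload(records, metadata=None):
    """Return inbound file content: metadata line first, then one record per line."""
    # An empty metadata object ({}) is written when no metadata is given.
    lines = [json.dumps(metadata or {}, separators=(",", ":"))]
    for record in records:
        # With the default indent=None, json.dumps emits no raw newlines,
        # so each record stays on a single line.
        lines.append(json.dumps(record, separators=(",", ":")))
    return "\n".join(lines) + "\n"


person = {
    "CoPerson": {"status": "A"},
    "Name": [{"given": "Myrtle", "family": "Jefferson",
              "type": "official", "primary_name": True}],
}
print(render_bulkload([person]), end="")
```

Building each record as a native dictionary and serializing it in one step avoids hand-assembled JSON, which is where stray newlines and quoting errors tend to come from.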
The JSON Schema definition of the File Metadata Object is available here.
The JSON Schema definition for Data Objects is available here.
It is possible to reference primary objects defined earlier in the inbound file using Cross References. This enables scenarios such as
When preparing the inbound data, each relevant record is given a cross reference label (identified in the record metadata via the xref attribute). Subsequent objects may refer to the Registry internal foreign key assigned for the newly created object, via @{label} notation, which can be used anywhere in any non-metadata attribute of a suitable type (eg: not boolean, etc).
Cross Reference labels must be alphanumeric.
Labels are parsed after the JSON document is parsed, and so must not create invalid JSON. This mostly means that they must be placed in quotes when used to create a foreign key (eg: co_group_id). This will technically convert the value to a JSON string rather than an integer, but PHP will coerce it back to an integer when needed.
Cross References are available as of Registry v4.0.0.
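Before running a load, it may be useful to check that every @{label} reference points at a label defined on an earlier line. The following is a sketch of such a pre-flight check, not BulkLoad's own parsing logic. The names `find_refs` and `check_xrefs` are hypothetical, and "alphanumeric" is assumed here to mean ASCII letters and digits.

```python
import json
import re

XREF_USE = re.compile(r"@\{([^}]*)\}")
VALID_LABEL = re.compile(r"[A-Za-z0-9]+")  # assumption: ASCII alphanumerics only


def find_refs(value):
    """Yield every label used via @{label} inside a decoded JSON value."""
    if isinstance(value, str):
        yield from XREF_USE.findall(value)
    elif isinstance(value, dict):
        for item in value.values():
            yield from find_refs(item)
    elif isinstance(value, list):
        for item in value:
            yield from find_refs(item)


def check_xrefs(lines):
    """Return a list of problems found in the cross references of a file."""
    problems = []
    defined = set()
    for number, line in enumerate(lines, start=1):
        record = json.loads(line)
        meta = record.get("meta", {})
        # References are only looked for in non-metadata attributes.
        body = {key: val for key, val in record.items() if key != "meta"}
        for label in find_refs(body):
            if label not in defined:
                problems.append(f"line {number}: @{{{label}}} used before definition")
        label = meta.get("xref")
        if label is not None:
            if VALID_LABEL.fullmatch(str(label)):
                # Register only after scanning the body, so a label is
                # visible to subsequent records only.
                defined.add(label)
            else:
                problems.append(f"line {number}: label {label!r} is not alphanumeric")
    return problems
```

An empty list from `check_xrefs(open("infile.json"))` would mean that no problems were found by this particular check; it says nothing about whether the referenced attribute is of a suitable type.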
Each primary object (CO Group, CO Person) may have Metadata attributes, of which the following are available:
objectType: CoGroup or CoPerson. This attribute is not currently required.
xref: a cross reference label. For example, if the label abc1 is assigned to a CoPerson object, the object could be referred to in a later object via { "co_person_id": "@{abc1}" }.
Record Metadata is available as of Registry v4.0.0.
Each subsequent line consists of a single JSON object representing a single CO Person. The members of the object are labeled using the model name (CamelCase, singular) in this layout:
Records must be "pre matched". If a CO Person has multiple Org Identities, they must be placed in the same JSON object.
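One way to "pre match" source data is to group Org Identities by person before writing records. The sketch below assumes that the source rows are already tagged with a person key by some upstream matching process. The function name `prematch`, the row shape, and the default status "A" are illustrative assumptions.

```python
def prematch(rows):
    """Merge rows sharing a person key into one record per CO Person.

    rows: iterable of (person_key, org_identity) pairs, where org_identity
    is a dict already laid out as a nested OrgIdentity entry.
    """
    people = {}
    for person_key, org_identity in rows:
        record = people.setdefault(
            person_key,
            {"CoPerson": {"status": "A"}, "OrgIdentity": []},
        )
        # All Org Identities for the same person land in the same JSON object.
        record["OrgIdentity"].append(org_identity)
    return list(people.values())


rows = [
    ("p1", {"OrgIdentity": {"affiliation": "member"}}),
    ("p2", {"OrgIdentity": {"affiliation": "employee"}}),
    ("p1", {"OrgIdentity": {"affiliation": "employee"}}),
]
records = prematch(rows)
print(len(records))  # prints 2
```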
Records can be linked to Organizational Identity Sources, to link them for purposes of future updates. To do so, include a suitable OrgIdentitySourceRecord, linked to the appropriate OIS configuration (via the org_identity_source_id foreign key). The OrgIdentity as would be returned by the appropriate OIS backend must be included. The OrgIdentity should include the SORID as one of its Identifiers.
Do not specify CO Person Roles that would be created via Pipelines attached to the OIS. BulkLoad will create them.
CO Group Mappings are not supported, because creating these mappings would require instantiating the OIS plugin backend, which would degrade performance. Instead, create the desired CoGroupMember records in the JSON record.
Each subsequent line consists of a single JSON object representing a single CO Group. The members of the object are labeled using the model name (CamelCase, singular) in this layout:
Nested Groups are not supported.
CO Group Records are supported as of Registry v4.0.0.
Plugin Configuration Records may be included one per line. See the examples, below.
The later examples here are shown with newlines for readability; however, the actual file should include no newlines except to separate records.
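A record written with newlines for readability can be collapsed to the required single-line form by parsing and re-serializing it. This is a minimal sketch using Python's standard json module; `to_single_line` is a hypothetical helper name.

```python
import json


def to_single_line(pretty_json: str) -> str:
    """Collapse a pretty-printed JSON object into one compact line."""
    # Parsing and re-dumping removes all formatting whitespace; newlines
    # inside string values remain escaped, so the result is one line.
    return json.dumps(json.loads(pretty_json), separators=(",", ":"))


pretty = """{
  "CoPerson": {
    "status": "A"
  }
}"""
print(to_single_line(pretty))  # prints {"CoPerson":{"status":"A"}}
```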
{}
{"CoPerson":{"status":"A"},"Name":[{"given":"Myrtle","family":"Jefferson","type":"official","primary_name":true}]}
{"CoPerson":{"status":"A"},"Name":[{"given":"Novella","family":"Torres","type":"official","primary_name":true}]}
{
  "CoPerson": { "status": "A" },
  "Name": [
    { "given": "Myrtle", "family": "Jefferson", "type": "official", "primary_name": true }
  ],
  "Identifier": [
    { "identifier": "d3b5b15c-3ce2-4ce5-9752-acb941ed0e78", "type": "reference", "login": false, "status": "A" },
    { "identifier": "476-56-5741", "type": "national", "login": "false", "status": "A" }
  ],
  "EmailAddress": [
    { "mail": "MyrtleWJefferson@university.edu", "type": "official", "verified": true }
  ],
  "CoGroupMember": [
    { "co_group_id": 37, "member": true, "owner": false }
  ],
  "CoPersonRole": [
    {
      "affiliation": "employee",
      "title": "Employee",
      "o": null,
      "ou": "Biology",
      "TelephoneNumber": [
        { "country_code": "1", "number": "507-798-2339", "type": "campus" }
      ]
    }
  ],
  "OrgIdentity": [
    {
      "OrgIdentity": { "affiliation": "member" },
      "Name": { "given": "Myrtle", "family": "Jefferson", "type": "official", "primary_name": true },
      "Identifier": [
        { "identifier": "24n9vBgj@social.com", "type": "eppn", "login": true, "status": "A" }
      ],
      "EmailAddress": [
        { "mail": "myrtle787@social.com", "type": "personal", "verified": true }
      ]
    }
  ]
}
{
  "CoPerson": { "status": "A" },
  "Name": [
    { "given": "Novella", "family": "Torres", "type": "official", "primary_name": true }
  ],
  "Identifier": [
    { "identifier": "0dc187ef-5d10-412d-83d2-b55221b556b5", "type": "reference", "login": false, "status": "A" },
    { "identifier": "509-74-0614", "type": "national", "login": "false", "status": "A" }
  ],
  "EmailAddress": [
    { "mail": "NovellaDTorres@university.edu", "type": "official", "verified": true }
  ],
  "CoGroupMember": [
    { "co_group_id": 22, "member": true, "owner": false }
  ],
  "OrgIdentitySourceRecord": [
    {
      "org_identity_source_id": 2,
      "sorid": "hrms2",
      "source_record": "{\"0\":\"hrms2\",\"1\":\"\",\"2\":\"\",\"3\":\"Novella\",\"4\":\"\",\"5\":\"Torres\",\"6\":\"\",\"7\":\"\",\"8\":\"\",\"9\":\"\",\"10\":\"\",\"11\":\"\",\"12\":\"NovellaDTorres@university.edu\",\"13\":\"913-626-2317\",\"14\":\"1\",\"15\":\"509-74-0614\",\"16\":\"Employee\",\"17\":\"Linguistics\",\"18\":\"0dc187ef-5d10-412d-83d2-b55221b556b5\"}",
      "reference_identifier": "0dc187ef-5d10-412d-83d2-b55221b556b5",
      "OrgIdentity": {
        "OrgIdentity": { "affiliation": "employee", "title": "Employee", "o": null, "ou": "Linguistics" },
        "Address": [],
        "Name": [
          { "given": "Novella", "family": "Torres", "type": "official", "primary_name": true }
        ],
        "EmailAddress": [
          { "mail": "NovellaDTorres@social.com", "type": "personal", "verified": true }
        ],
        "Identifier": [
          { "identifier": "0dc187ef-5d10-412d-83d2-b55221b556b5", "type": "reference", "login": false, "status": "A" },
          { "identifier": "509-74-0614", "type": "national", "login": "false", "status": "A" },
          { "identifier": "hrms2", "type": "sorid", "login": "false", "status": "A" }
        ],
        "TelephoneNumber": [
          { "country_code": "1", "number": "913-626-9988", "type": "mobile" }
        ]
      }
    }
  ]
}
{
  "meta": {
    "pluginModels": {
      "CoPerson": [ "SshKeyAuthenticator.SshKey" ],
      "Configuration": [ "MyPlugin.GreenObject" ]
    }
  }
}
{ ... }
{"meta":{"pluginModels":{"CoPerson":["SshKeyAuthenticator.SshKey"],"Configuration":["MyPlugin.GreenObject"]}}}
{"CoPerson":{"status":"A"},"SshKey":[{"ssh_key_authenticator_id":1,"type":"ssh-dss","skey":"abc123"}]}
{"GreenObject":{"my_attribute":"my_value"}}
{"GreenObject":{"my_attribute":"my_other_value"}}
{"meta":{"pluginModels":{"CoPerson":["SshKeyAuthenticator.SshKey","UnixCluster.UnixClusterAccount"],"Configuration":["UnixCluster.UnixClusterGroup"]}}}
{"meta":{"objectType":"CoGroup","xref":"gobj1"},"CoGroup":{"co_id":"2","name":"Primary Unix Cluster","description":"Primary Unix Cluster users","open":false,"status":"A","group_type":"S","auto":false,"nesting_mode_all":false}}
{"meta":{"objectType":"CoPerson","xref":"obj1"},"CoPerson":{"status":"A"},"Name":[{"given":"Nelly","family":"Brasel","type":"official","primary_name":true}],"CoGroupMember":[{"co_group_id":"@{gobj1}","member":true,"owner":false}],"UnixClusterAccount":[{"unix_cluster_id":1,"sync_mode":"F","status":"A","username":"nbrasel","uid":10001,"gecos":"Nelly Brasel","login_shell":"\/bin\/bash","home_directory":"\/home\/nbrasel","primary_co_group_id":44}]}
{"meta":{"plugin":"UnixCluster","objectType":"UnixClusterGroup","xref":"obj2"},"UnixClusterGroup":{"unix_cluster_id":1,"co_group_id":"@{gobj1}"}}
cake bulk_load [-a actor] [-t dbtype] coid infile
where
-a actor: value for the actor_identifier column, defaults to Bulk Load Shell
Be careful with file ownerships here. You will need to run this command as the web server user or as root, depending on whether or not the web server user can read the input file. If running as root, make sure any files generated in the cache directory (wherever
BulkLoad can process ~500k records on a single vCPU with 2GB of RAM and a local database in about 10 minutes. Larger datasets may require somewhat more memory, but additional vCPUs are unlikely to help much. Make sure the database server has sufficient disk space available. (A few GB should be sufficient, depending on the size of the dataset.) Communications to a database server over a network (vs local to the same server) may result in slower run times.
See also: Registry Installation - PHP (Memory Considerations)
./Console/cake bulk_load coid infile.json
Run the cake database command to rebuild the indexes again.