Terminology:
- "data field" is a user attribute. The word "attribute" is avoided because it overlaps with the attribute framework
This is a suggestion for how user data could flow to Grouper in a future state
The problem this is trying to solve
- User data in subjects, provisioners, and loaders solve similar problems
- Real-time data is handled in multiple ways
- Security of who is allowed to see what (by person aka row or data field aka value)
- Efficiency of being able to query data without reaching out to other resources
- Ability to use data from multiple sources at one time
- Reduce the number of network calls in various places
- Reduce the SQL and LDAP syncs required to make things work
- Troubleshooting access is difficult when the history of data field changes is not known
- Unresolvable subjects are a pain... history of data fields of users will help
Setup entity resolvers
The first configuration step is to set up entity resolvers
For users
- SQL queries
- LDAP filters
- WS calls
Returns
- Single valued data fields for users
- Multi-valued data fields for users
- Multi-valued rows of data fields for users (e.g. affiliation rows that have affiliation and dept)
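As a rough illustration of the three return shapes above, here is a minimal sketch. The EntityData class, the merge function, and the "later resolver wins" conflict rule are assumptions made for this example and are not part of Grouper.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class EntityData:
    """Hypothetical container for what resolvers return for one user."""
    single: dict[str, str] = field(default_factory=dict)                 # single-valued fields
    multi: dict[str, list[str]] = field(default_factory=dict)            # multi-valued fields
    rows: dict[str, list[dict[str, str]]] = field(default_factory=dict)  # multi-valued rows


def merge(parts: list[EntityData]) -> EntityData:
    """Combine results from several resolvers (e.g. SQL, LDAP, WS) into one record."""
    merged = EntityData()
    for part in parts:
        # Assumption: on a conflict, the later resolver in the list wins.
        merged.single.update(part.single)
        for name, values in part.multi.items():
            merged.multi.setdefault(name, []).extend(values)
        for name, rows in part.rows.items():
            merged.rows.setdefault(name, []).extend(rows)
    return merged


from_sql = EntityData(single={"emailAddress": "a@b.c"})
from_ldap = EntityData(
    multi={"dept": ["math", "physics"]},
    rows={"affiliation": [{"affiliation": "staff", "dept": "math"}]},
)
user = merge([from_sql, from_ldap])
```

Keeping rows separate from multi-valued fields preserves which affiliation goes with which dept, which a flat list of values would lose.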
Two types of data fields
- Informational
- e.g. name, description, email, etc
- Needed for provisioning or UI or WS
- Access related
- e.g. dept, title, school, DN
- Needed for loading groups, jexl scripted groups, provisioning events
Point in time
- Grouper can store point in time information about data fields
Assumption
All institutions are either
- OK with a full sync of user data fields on a schedule, which determines how up to date the data is (e.g. every 30 minutes, hourly, daily)
- or: Can get events of when data changes in source systems
- or: Queries to source systems have last updated dates or change logs for real time updates
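The third option (last updated dates) could look roughly like the sketch below. The source row layout, the incremental_sync function, and the high water mark approach are assumptions for illustration only.

```python
from datetime import datetime

# Hypothetical source rows: (subject_id, field_name, value, last_updated)
SOURCE = [
    ("12345678", "emailAddress", "a@b.c", datetime(2021, 2, 3, 9, 0)),
    ("12345678", "dept", "math", datetime(2021, 2, 3, 9, 30)),
    ("87654321", "dept", "physics", datetime(2021, 2, 3, 10, 15)),
]


def incremental_sync(local, source, high_water_mark):
    """Copy rows changed after high_water_mark into local and return the new mark."""
    new_mark = high_water_mark
    for subject_id, field_name, value, last_updated in source:
        if last_updated > high_water_mark:
            local[(subject_id, field_name)] = value
            new_mark = max(new_mark, last_updated)
    return new_mark


local_copy = {}
mark = incremental_sync(local_copy, SOURCE, datetime(2021, 2, 3, 9, 15))
```

Only rows changed after the previous mark are copied, so a frequent run stays cheap. A scheduled full sync would still be needed to catch deletes, which this sketch does not handle.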
Grouper gets that data
- Copies to Grouper database
- Could process the data a tad
- "Virtual data fields" can have logic and make a complicated description data field (across multiple resolver sources)
- It's possible that this could help the problem of having too many subject sources, though this isn't intended to be an identity system
- Can assign security so Grouper knows who is allowed to read which data
- Each data field could have a group assigned who can see the data
- There are real-time events or timestamps that ensure data is up to date
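A minimal sketch of a virtual data field combined with per-field read security follows. The field names, the group names, and both functions are hypothetical; the rule that a description is built only from the fields a reader may see is an assumption of this example.

```python
# Hypothetical stored data fields for one user, keyed by system name.
fields = {"firstName": "Ada", "lastName": "Lovelace", "dept": "math"}

# Hypothetical mapping of data field -> group whose members may read it.
viewable_by = {"firstName": "grp:public", "lastName": "grp:public", "dept": "grp:hr"}


def visible_fields(all_fields, reader_groups):
    """Keep only the data fields the reader is allowed to see."""
    return {
        name: value
        for name, value in all_fields.items()
        if viewable_by.get(name) in reader_groups
    }


def description_for(all_fields, reader_groups):
    """Virtual data field: a description computed from whatever the reader may see."""
    visible = visible_fields(all_fields, reader_groups)
    name = " ".join(visible[k] for k in ("firstName", "lastName") if k in visible)
    return f"{name} ({visible['dept']})" if "dept" in visible else name
```

For example, description_for(fields, {"grp:public"}) yields "Ada Lovelace", while a reader who is also in grp:hr gets "Ada Lovelace (math)".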
Subject source
- Points to Grouper's database
- Instead of being configured against sources, it would be configured against entity resolver data
- Can use data from multiple sources
- All identifiers must be unique
- Note, if entity resolver data is secure and available over UI/WS then the subject doesn't need any fields... e.g. Penn would not need first name and last name etc in the subject configuration.
- Subject is really just a collection of prioritized identifiers (e.g. employeeId is highest priority) and attributes
- People who are allowed to see various entity resolver data fields would see description a certain way, name a certain way, and whatever data fields they can see when they need it
- Imagine a more detailed subject page for people who can see the data... easier to troubleshoot access
- If an employee ID does change (and no other conflicts), the user could be resolved by other identifiers and it might "just work"
uuid, idIndex, subjectType (group/person/app/thing), search strings, sort strings, resolvable, etc
grouper_members_identifiers
grouper_member_idIndex, subject_identifier (unique)
When data fields are referenced, it is also a two-part process. If the subject is a group (and the user is allowed to see it), go to the group table(s). If it is anything other than a group, go to the data field tables
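The idea of a subject as a set of prioritized identifiers, resolved in two parts, can be sketched as follows. The GROUPS and IDENTIFIERS dictionaries stand in for database tables, and all names and values are hypothetical.

```python
# Hypothetical in-memory stand-ins for the group table and the identifier table.
GROUPS = {"etc:sysadmingroup": "member-001"}
IDENTIFIERS = {  # subject_identifier -> member id; dict keys are unique by construction
    "12345678": "member-012",    # employeeId (highest priority)
    "jsmith": "member-012",      # netId
    "jsmith@b.c": "member-012",  # email
}


def resolve(identifier):
    """Two-part lookup: groups first, then entity identifiers."""
    if identifier in GROUPS:
        return GROUPS[identifier]
    return IDENTIFIERS.get(identifier)


def resolve_any(identifiers_in_priority_order):
    """Try each known identifier for a user in priority order; first hit wins."""
    for identifier in identifiers_in_priority_order:
        member = resolve(identifier)
        if member is not None:
            return member
    return None
```

If the employee ID changes to a value not yet known, resolve_any(["99999999", "jsmith"]) still finds member-012 through the lower-priority identifier, which is the "just work" case described above.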
Loaders
- Loaders and jexl scripted groups can be written on top of entity data
- Non admins can securely use that data since Grouper knows who is allowed to see what
- When the entity resolver knows that data changed real-time, it knows which loader/jexl scripted group to update
- Not all data about users will be in entity resolvers... more than what was in the subject source, but not everything
- If there is peripheral data you can make SQL/LDAP loaders for that
- Privileges for loaded groups could be loaded with users who can see all the related data fields
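One way the entity resolver could know which loader or jexl scripted group to update is an index from data field to dependent jobs. This is a sketch under that assumption; the job names and the DEPENDS_ON registry are hypothetical.

```python
from collections import defaultdict

# Hypothetical registry: which loader or jexl scripted group reads which data fields.
DEPENDS_ON = {
    "loader:mathStaff": {"dept", "affiliation"},
    "jexl:activeFaculty": {"affiliation", "title"},
    "loader:allEmails": {"emailAddress"},
}

# Invert the registry once so a change event can be routed with one lookup per field.
BY_FIELD = defaultdict(set)
for job, field_names in DEPENDS_ON.items():
    for field_name in field_names:
        BY_FIELD[field_name].add(job)


def jobs_to_update(changed_fields):
    """Return the jobs that read at least one of the changed data fields."""
    jobs = set()
    for field_name in changed_fields:
        jobs |= BY_FIELD.get(field_name, set())
    return sorted(jobs)
```

A change to affiliation would route to both jexl:activeFaculty and loader:mathStaff, while a change to a field nothing depends on routes nowhere.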
UI/WS
- Imagine more data fields than subject data fields available over WS/UI securely in one query
Provisioning
- No more "subject link"
- You can provision any entity resolver data easily
- When data changes, Grouper can tell a provisioner to recalc a user
Summary
In summary, here is a metaphor: we used to have SQL credentials in multiple places, then we made an external system layer to re-use them. This suggestion is similar: have a data layer that can be re-used across things. It includes real-time updates, security, and data manipulation configured centrally. Why? If we want to be ABAC and data field-based, we need to organize our data fields
Data model
grouper_members
Existing table can be stripped down since data is in the entity tables
- id (012)
- subject_id (12345678)
- idIndex
- subjectType (group / person / app / thing)
- search strings
- sort strings
- resolvable
grouper_members_identifiers
Make sure identifiers are unique.
When subjects are looked up, it can be a two-part process (instead of N-part for N subject sources).
- Look at groups in the group table
- Look at entities (including GrouperSystem, users, apps, things) in the data_field tables based on data fields that are marked as identifiers
- id (737)
- member_id (012)
- subject_identifier (12345678)
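Uniqueness of identifiers could be enforced by the database itself. The sketch below uses an in-memory SQLite database purely for illustration; the column types, the UNIQUE constraint, and the try_add helper are assumptions, not a proposed Grouper DDL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE grouper_members (
    id TEXT PRIMARY KEY,
    subject_id TEXT NOT NULL
);
CREATE TABLE grouper_members_identifiers (
    id INTEGER PRIMARY KEY,
    member_id TEXT NOT NULL REFERENCES grouper_members(id),
    subject_identifier TEXT NOT NULL UNIQUE  -- one identifier maps to one member
);
""")
conn.execute("INSERT INTO grouper_members VALUES ('012', '12345678')")
conn.execute("INSERT INTO grouper_members_identifiers VALUES (737, '012', '12345678')")


def try_add(identifier_id, member_id, subject_identifier):
    """Insert an identifier; return False if it is already taken."""
    try:
        conn.execute(
            "INSERT INTO grouper_members_identifiers VALUES (?, ?, ?)",
            (identifier_id, member_id, subject_identifier),
        )
        return True
    except sqlite3.IntegrityError:
        return False


def member_for(subject_identifier):
    """Look up the member id for an identifier, or None if unknown."""
    row = conn.execute(
        "SELECT member_id FROM grouper_members_identifiers WHERE subject_identifier = ?",
        (subject_identifier,),
    ).fetchone()
    return row[0] if row else None
```

With the constraint in place, a sync that tries to give an existing identifier to a second member fails at insert time instead of producing an ambiguous lookup later.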
grouper_data_field
Types of data fields for user or rows
- id (234)
- system_name (emailAddress)
- display_name (Email)
- type (user)
- cardinality (single-valued)
- description
- viewable_by_group_id abc123
- id (567)
- system_name (org)
- display_name (Org)
- type (row)
- cardinality (single-valued)
- description
- viewable_by_group_id xyz234
grouper_data_row
Type of data field rows available for users
- id (123)
- system_name (affiliation)
- display_name (Affiliation)
- description
- viewable_by_group_id xyz234
grouper_data_row_field
Which fields are in which rows
- id (538)
- grouper_data_row_id (123)
- grouper_data_field_id (567)
grouper_data_member_field
Assignment of a data field to an entity. When data is synced to the data field tables it will need to do some matching and assign a new grouper_members row if existing not found
- id (480)
- member_id (012)
- grouper_data_field_id (234)
- value_id (789)
grouper_data_member_field_pit
History of data field to entity
- id (480)
- member_id (012)
- grouper_data_field_id (234)
- value_id (789)
- started_on 1/2/3
- ended_on
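A point in time lookup against this table might work as sketched below. The in-memory PIT list stands in for the table, the dictionary lookup through value_id is skipped for brevity, and the sample values are made up. It assumes a single-valued data field and treats an empty ended_on as "still current".

```python
from datetime import date

# Hypothetical rows of grouper_data_member_field_pit:
# (member_id, grouper_data_field_id, value, started_on, ended_on); ended_on None = current.
PIT = [
    ("012", 234, "old@b.c", date(2020, 1, 1), date(2021, 2, 3)),
    ("012", 234, "a@b.c", date(2021, 2, 3), None),
]


def value_as_of(member_id, field_id, when):
    """Return the value of a single-valued data field on the given date, or None."""
    for row_member, row_field, value, started_on, ended_on in PIT:
        if row_member != member_id or row_field != field_id:
            continue
        # The interval is half-open: started_on is included, ended_on is excluded.
        if started_on <= when and (ended_on is None or when < ended_on):
            return value
    return None
```

Using a half-open interval means that on the day a value changes, exactly one row matches.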
grouper_data_member_row
Assignment of a row of data to an entity
- id (321)
- member_id (012)
- grouper_data_row_id (123)
grouper_data_member_row_pit
History of assignment of a row of data to an entity
- id (321)
- member_id (012)
- grouper_data_row_id (123)
- started_on 1/2/3
- ended_on
grouper_data_member_row_field
Assignment of a field to a row assignment
- id (637)
- grouper_member_data_row_id (321)
- grouper_data_field_id (567)
- value_id (654)
grouper_data_member_row_field_pit
History of assignment of a field to a row assignment
- id (637)
- grouper_member_data_row_id (321)
- grouper_data_field_id (567)
- value_id (654)
- started_on 2/3/2021
- ended_on 4/5/2021
grouper_dictionary
Keep data field values here to reduce data redundancy
- id (789)
- value (a@b.c)
- id (654)
- value (math)
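The dictionary idea is essentially value interning: each distinct value is stored once and referenced by id. A minimal in-memory sketch follows, assuming sequential ids; the Dictionary class is a hypothetical stand-in for the table.

```python
class Dictionary:
    """Hypothetical in-memory stand-in for the grouper_dictionary table."""

    def __init__(self):
        self._id_by_value = {}
        self._value_by_id = {}
        self._next_id = 1

    def intern(self, value):
        """Return the id for a value, storing the value only the first time it is seen."""
        existing = self._id_by_value.get(value)
        if existing is not None:
            return existing
        new_id = self._next_id
        self._next_id += 1
        self._id_by_value[value] = new_id
        self._value_by_id[new_id] = value
        return new_id

    def lookup(self, value_id):
        """Return the value stored under an id."""
        return self._value_by_id[value_id]


dictionary = Dictionary()
math_id = dictionary.intern("math")
same_id = dictionary.intern("math")   # a second user in the same dept reuses the row
email_id = dictionary.intern("a@b.c")
```

Values such as dept names repeat across many users, so the field assignment tables carry only a small id per assignment.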
grouper_data_field_sec_group
List of security groups for data field columns and rows
Could be who is allowed to see a column, who is allowed to see a rowGroup, or who is in the row
- id
- group_id_index
grouper_data_field_sec_group_mem_cache
Cache these memberships so lookups are fast. Cache this in memory too for long-running processes.
- id
- sec_group_id
- mem_id_index
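For the in-memory part, a long-running process could keep memberships per security group and refresh them after a time to live. This is only a sketch: the SecGroupMemberCache class, the load_members callback, and the 60 second default are assumptions.

```python
import time


class SecGroupMemberCache:
    """In-memory cache of security group memberships with a simple time to live."""

    def __init__(self, load_members, ttl_seconds=60.0, clock=time.monotonic):
        # load_members is a hypothetical callback: sec_group_id -> iterable of mem_id_index
        self._load_members = load_members
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # sec_group_id -> (expires_at, frozenset of mem_id_index)

    def is_member(self, sec_group_id, mem_id_index):
        now = self._clock()
        entry = self._entries.get(sec_group_id)
        if entry is None or entry[0] <= now:
            # Missing or expired: reload the whole group once, then answer from memory.
            entry = (now + self._ttl, frozenset(self._load_members(sec_group_id)))
            self._entries[sec_group_id] = entry
        return mem_id_index in entry[1]


# Usage with a fake backing store that records how often it is queried.
calls = []


def fake_load(sec_group_id):
    calls.append(sec_group_id)
    return {10001, 10002} if sec_group_id == 1 else set()


cache = SecGroupMemberCache(fake_load)
first = cache.is_member(1, 10001)
second = cache.is_member(1, 10003)
```

Both checks are answered from a single load of group 1. If real-time membership events are available, they could evict entries instead of waiting for the time to live to expire.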
grouper_data_field_row_sec
Row level security for data
- id (941)
- grouper_data_field_id (234)
- group_id_of_result_member
- viewable_by_group_id rst567
grouper_data_field_row_pop_group
- id
- group_id_of_result_member
- viewable_by_group_id rst567