This is in Grouper v5+
Terminology
- A "data field" is a user attribute. The word "attribute" is avoided here since it overlaps with the attribute framework
This is a suggestion for how user data could flow to Grouper in a future state
The problem this is trying to solve
...
- This is named "data field" since "attribute" is used in many other places, e.g. the attribute framework
Description
- Data field values are assigned to users or groups, or are globally available
- The data can be single or multi-valued
- The data can be structured in a row (the data in a cell can still be multi-valued)
- The data is stored in Grouper and kept up to date by real-time updates and full syncs
- Point in time history will be maintained
- Security on data fields will ensure that private data remains private
- The data will be stored efficiently so it does not take a lot of space and queries are efficient
- Data fields are documented with examples so users can easily request access, see what data fields represent and how to use them
- Data fields can be configured in the UI
- The data can be used:
- To construct ABAC policies on groups (scripted group based on data field values)
- Subject sources can be replaced by a data field source (this is the future direction, and all subject sources will eventually need to be migrated)
- Provisioning data about users to other systems
- Reports about access and users
- Etc
Data field flow
...
[Gliffy diagram: data field flow]
Setup entity resolvers
The first configuration step is to set up entity resolvers
For users
- SQL queries
- LDAP filters
- WS calls
Returns
- Single valued data fields for users
- Multi-valued data fields for users
- Multi-valued rows of data fields for users (e.g. affiliation rows that have affiliation and dept)
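The three return shapes above could be pictured roughly as follows. This is only a sketch: `ResolvedUser`, the `emailAlias` field, and the sample values are hypothetical names invented for illustration, not part of Grouper.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ResolvedUser:
    """Hypothetical container for one user's resolver output (not a Grouper class)."""
    subject_id: str
    # One value per data field, e.g. emailAddress
    single_valued: Dict[str, str] = field(default_factory=dict)
    # Several values per data field, e.g. all email aliases
    multi_valued: Dict[str, List[str]] = field(default_factory=dict)
    # Several rows per row type; each row maps column name to value
    rows: Dict[str, List[Dict[str, str]]] = field(default_factory=dict)


user = ResolvedUser(
    subject_id="12345678",
    single_valued={"emailAddress": "a@b.c"},
    multi_valued={"emailAlias": ["a@b.c", "alias@b.c"]},
    rows={
        "affiliation": [
            {"affiliation": "Staff", "dept": "MATH"},
            {"affiliation": "Student", "dept": "PHYS"},
        ]
    },
)

# Rows keep related values together, so dept can be read per affiliation
staff_depts = [r["dept"] for r in user.rows["affiliation"] if r["affiliation"] == "Staff"]
```

Keeping affiliation and dept in the same row is what allows a policy to ask for "Staff in MATH" rather than "Staff somewhere and MATH somewhere".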
Two types of data fields
- Informational
- e.g. name, description, email, etc
- Needed for provisioning or UI or WS
- Access related
- e.g. dept, title, school, DN
- Needed for loading groups, jexl scripted groups, provisioning events
Point in time
- Grouper can store point in time information about data fields
Assumption
All institutions are either
- OK with a full sync of user data fields on a schedule, which determines how up to date the data is (e.g. every 30 minutes, hourly, daily)
- or: Can get events of when data changes in source systems
- or: Queries to source systems have last updated dates or change logs for real time updates
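The third option, using last updated dates, might look something like this sketch. The `last_updated` column name, the `changed_since` helper, and the sample rows are assumptions made for illustration only.

```python
from datetime import datetime


def changed_since(source_rows, last_sync):
    """Return only the rows whose last_updated is after the previous sync."""
    return [row for row in source_rows if row["last_updated"] > last_sync]


# Hypothetical rows as a source system query might return them
source_rows = [
    {"subject_id": "12345678", "dept": "MATH", "last_updated": datetime(2021, 2, 3, 9, 0)},
    {"subject_id": "23456789", "dept": "PHYS", "last_updated": datetime(2021, 2, 3, 11, 30)},
]

last_sync = datetime(2021, 2, 3, 10, 0)

# Incremental run: only the row changed after the last sync is processed
incremental = changed_since(source_rows, last_sync)

# Full sync: every row is processed regardless of timestamp
full = changed_since(source_rows, datetime.min)
```

A scheduled full sync would presumably still be needed to catch deletes, which a timestamp filter alone cannot see.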
Grouper gets that data
- Copies to Grouper database
- Could process the data a tad
- "Virtual data fields" can have logic and make a complicated description data field (across multiple resolver sources)
- It's possible that this could help with the problem of having too many subject sources, though this isn't intended to be an identity system
- Can assign security so Grouper knows who is allowed to read which data
- Each data field could have a group assigned who can see the data
- There are real-time events or timestamps that ensure data is up to date
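As a rough illustration of "each data field could have a group assigned who can see the data", the sketch below filters a user's field values by the viewer's group memberships. The function name, the group ids, and the deny-by-default behavior for fields without a group are all assumptions, not a description of how Grouper would implement this.

```python
def visible_fields(field_values, viewable_by_group, viewer_group_ids):
    """Keep only the fields whose viewer group the caller belongs to.

    Fields with no assigned group are hidden (an assumed deny-by-default choice).
    """
    return {
        name: value
        for name, value in field_values.items()
        if viewable_by_group.get(name) in viewer_group_ids
    }


# Hypothetical data field values for one user
field_values = {"emailAddress": "a@b.c", "org": "MATH", "title": "Analyst"}

# Hypothetical mapping of data field to the group allowed to read it
viewable_by_group = {"emailAddress": "abc123", "org": "xyz234"}

help_desk_view = visible_fields(field_values, viewable_by_group, {"abc123"})
admin_view = visible_fields(field_values, viewable_by_group, {"abc123", "xyz234"})
```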
Subject source
- Points to Grouper's database
- Instead of being configured against the source, it would be configured against entity resolver data
- Can use data from multiple sources
- Note, if entity resolver data is secure and available over UI/WS then the subject doesn't need as many fields... e.g. Penn would not need first name and last name etc.
- Subject could just be id
- People who are allowed to see various entity resolver data fields would see description a certain way, name a certain way, and whatever data fields they can see when they need it
- Imagine a more detailed subject page for people who can see the data... easier to troubleshoot access
Members table
- Members table can be stripped down since data is in the entity tables
Loaders
- Loaders and jexl scripted groups can be written on top of entity data
- Non admins can securely use that data since Grouper knows who is allowed to see what
- When the entity resolver knows that data changed in real time, it knows which loader/jexl scripted group to update
- Not all data about users will come from entity resolvers... more than what was in the subject source, but not everything
- If there is peripheral data you can make SQL/LDAP loaders for that
- Privileges for loaded groups could be loaded with users who can see all the related data fields
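One way to picture how a change could be routed to the right loader or scripted group is a reverse index from data field to the groups that depend on it. This is a sketch under assumptions: the group names, the `build_index` and `groups_to_recalculate` helpers, and the idea that each group declares its fields up front are all hypothetical.

```python
from collections import defaultdict


def build_index(group_to_fields):
    """Invert group -> fields into field -> groups that depend on it."""
    index = defaultdict(set)
    for group, fields in group_to_fields.items():
        for field_name in fields:
            index[field_name].add(group)
    return index


def groups_to_recalculate(index, changed_fields):
    """Return the sorted names of groups affected by the changed fields."""
    affected = set()
    for field_name in changed_fields:
        affected |= index.get(field_name, set())
    return sorted(affected)


# Hypothetical scripted groups and the data fields their scripts read
group_to_fields = {
    "app:math:staff": {"org", "affiliation"},
    "app:vpn:employees": {"affiliation"},
    "app:mail:users": {"emailAddress"},
}

index = build_index(group_to_fields)
affected = groups_to_recalculate(index, {"affiliation"})
```

With such an index, a real-time event for one field touches only the dependent groups rather than triggering a recalculation of everything.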
UI/WS
- Imagine more data fields than subject data fields available over WS/UI securely in one query
Provisioning
- No more "subject link"
- You can provision any entity resolver data easily
- When data changes, Grouper can tell a provisioner to recalc a user
Summary
In summary, here is a metaphor: we used to have SQL credentials in multiple places, then we made an external system layer to re-use them. This suggestion is similar: have a data layer that can be re-used across things. It includes real-time updates, security, and data manipulation configured centrally. Why? If we want to be ABAC and data field-based, we need to organize our data fields.
Data model
grouper_members
Existing table
- id (012)
- subject_id (12345678)
grouper_data_field
Types of data fields for user or rows
- id (234)
- system_name (emailAddress)
- display_name (Email)
- type (user)
- cardinality (single-valued)
- description
- viewable_by_group_id (abc123)
- id (567)
- system_name (org)
- display_name (Org)
- type (row)
- cardinality (single-valued)
- description
- viewable_by_group_id (xyz234)
grouper_data_row
Type of data field rows available for users
- id (123)
- system_name (affiliation)
- display_name (Affiliation)
- description
grouper_data_row_field
Which fields are in which rows
- id (538)
- grouper_data_row_id (123)
- grouper_data_field_id (567)
grouper_data_field_row_sec
Row level security for data
- id (941)
- grouper_data_field_id (234)
- group_id_of_result_member (pcm428)
- viewable_by_group_id (rst567)
grouper_member_data_field
Assignment of a data field to an entity
- id (480)
- member_id (012)
- grouper_data_field_id (234)
- value_id (789)
- created_on (1/2/3)
- updated_on (2/3/2021)
grouper_member_data_row
Assignment of a row of data to an entity
- id (321)
- member_id (012)
- grouper_data_row_id (123)
- created_on (4/5/2021)
grouper_member_data_row_field
Assignment of a field to a row assignment
- id (637)
- grouper_member_data_row_id (321)
- grouper_data_field_id (567)
- value_id (654)
- created_on (4/5/2021)
- updated_on (2/3/2021)
grouper_dictionary
Keep data field values here to reduce data redundancy
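A minimal sketch of how a dictionary table could reduce redundancy, using an in-memory SQLite database. The column layout of grouper_dictionary (`id`, `value`), the unique constraint, the helper functions, and the field id 900 are assumptions for illustration; the real schema may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE grouper_dictionary (
    id INTEGER PRIMARY KEY,
    value TEXT NOT NULL UNIQUE          -- each distinct value stored once
);
CREATE TABLE grouper_member_data_field (
    id INTEGER PRIMARY KEY,
    member_id TEXT NOT NULL,
    grouper_data_field_id INTEGER NOT NULL,
    value_id INTEGER NOT NULL REFERENCES grouper_dictionary(id)
);
""")


def value_id_for(conn, value):
    """Return the dictionary id for a value, inserting it if it is new."""
    conn.execute("INSERT OR IGNORE INTO grouper_dictionary (value) VALUES (?)", (value,))
    row = conn.execute("SELECT id FROM grouper_dictionary WHERE value = ?", (value,)).fetchone()
    return row[0]


def assign(conn, member_id, field_id, value):
    """Assign a data field value to a member by reference to the dictionary."""
    conn.execute(
        "INSERT INTO grouper_member_data_field (member_id, grouper_data_field_id, value_id) "
        "VALUES (?, ?, ?)",
        (member_id, field_id, value_id_for(conn, value)),
    )


# Two members share the same value for a hypothetical data field with id 900
assign(conn, "012", 900, "MATH")
assign(conn, "013", 900, "MATH")

dictionary_rows = conn.execute("SELECT COUNT(*) FROM grouper_dictionary").fetchone()[0]
assignment_rows = conn.execute("SELECT COUNT(*) FROM grouper_member_data_field").fetchone()[0]
```

Both assignments point at the same dictionary row, so a long or frequently repeated value is stored only once.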
- In this diagram, the green data field resolvers are either cached (e.g. source systems), or not (e.g. a one-off which doesn't need the overhead of PIT etc.)
- A provisioner target could have entity data field values for users
Previous state
[Gliffy diagram: previous state]
Data field and row diagram
[Gliffy diagram: data fields and rows]
Example usages:
- Provision any identifier to a target without having to "resolve the subject"
- Make a JEXL scripted group: People who have a payroll data row where org is "MATH" and have an affiliation data row where affiliation name is Staff or Faculty with an End date in the next month. Put a rule on that group for the Business Analyst to review people who might need to be renewed.
- Load users from Zoom and match accounts to users in Grouper by any of their email addresses
- A staff member creates a report where a column indicates whether the user in the service is an Employee
- A help desk worker can see the history of affiliations and troubleshoot access by seeing that the user's payroll org recently changed
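The scripted group example above (payroll org "MATH" plus a Staff or Faculty affiliation ending soon) could be expressed as the following predicate. This is plain Python rather than JEXL, and the row names, the `end_date` column, and treating "the next month" as 31 days are assumptions for illustration.

```python
from datetime import date, timedelta


def needs_renewal_review(rows, today, window_days=31):
    """True if the user has a MATH payroll row and an expiring Staff/Faculty affiliation."""
    has_math_payroll = any(r.get("org") == "MATH" for r in rows.get("payroll", []))
    cutoff = today + timedelta(days=window_days)
    has_expiring_affiliation = any(
        r.get("affiliation") in ("Staff", "Faculty")
        and r.get("end_date") is not None
        and today <= r["end_date"] <= cutoff
        for r in rows.get("affiliation", [])
    )
    return has_math_payroll and has_expiring_affiliation


today = date(2021, 4, 5)

# Hypothetical data rows for two users
expiring_user = {
    "payroll": [{"org": "MATH"}],
    "affiliation": [{"affiliation": "Staff", "end_date": date(2021, 4, 20)}],
}
other_user = {
    "payroll": [{"org": "PHYS"}],
    "affiliation": [{"affiliation": "Faculty", "end_date": date(2021, 4, 20)}],
}

in_group = needs_renewal_review(expiring_user, today)
not_in_group = needs_renewal_review(other_user, today)
```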
Configure data field privacy realms
A privacy realm is a privacy configuration for one or many data fields. Re-use realms as much as you can to reduce the number of configurations.
Configure data fields
Each data field or row column is configured as a data field.
Configure data rows
Rows are configured as a "table". The columns are data fields.
Configure data providers
A data provider is a set of queries that load data into Grouper in real time or by full sync.
Configure data provider queries
These select data from the target to populate Grouper with data field values. A single provider can have multiple queries. Each query has one provider.
Configure data provider real time query
This helps the change log know which data to update.
...
- value (a@b.c)
...