You are viewing an old version of this page. View the current version.

Compare with Current View Page History

« Previous Version 26 Next »

Terminology:

  • "data field" is a user attribute, do not want use attribute since it overlaps with attribute framework

This is a suggestion for how user data could flow to Grouper in future state

The problem this is trying to solve

  • User data in subjects, provisioners, and loaders solve similar problems
  • Real time data solved in multiple ways
  • Security of who is allowed to see what (by person aka row or data field aka value)
  • Efficiency of being able to query data without reaching out to other resources
  • Ability to use data from multiple sources at one time
  • Reduce the number of network calls in various places
  • Reduce the SQL and LDAP syncs required to make things work
  • Troubleshooting access is difficult when the history of data field changes is not known
  • Unresolvable subjects are a pain... history of data fields of users will help


dataReimagined

  • In this diagram, the green data field resolvers are either cached (e.g. source systems), or not (one off which doesnt need the overhead of PIT etc)
    • A provisioner target could have entity data field values for users

Setup entity resolvers

The first configuration step is to set up entity resolvers

For users

  • SQL queries
  • LDAP filters
  • WS calls

Returns

  • Single valued data fields for users
  • Returns multi-valued data fields for users
  • Multivalued rows of data fields for users (e.g. affiliation rows that have affiliation and dept)

Two types of data fields

  • Informational
    • e.g. name, description, email, etc
    • Needed for provisioning or UI or WS
  • Access related
    • e.g. dept, title, school, DN
    • Needed for loading groups, jexl scripted groups, provisioning events

Point in time

  • Grouper can store point in time information about data fields

Merge / match / split

  • Similar to current subject sources, the upstream IDM needs to do the merging of entities.  Identifier changes might create conflicts, but could also flow through without incident.
  • We should have a "copy access from" function to copy from another user from a PIT, which could help with that though

A datafield could be dynamic in that it could not compute a certain value but instead look at the calling user (GrouperSession), and perhaps some env data (nonStaffView), and pick another attribute based on a script.  For example the name might depend on if the calling user is a staff member or if it is being provisioned for staff use only

Assumption

All institutions are either

  • OK with full sync of user data fields on a schedule and thats how up to date they are (e.g. every 30 minutes, hourly, daily)
  • or: Can get events of when data changes in source systems
  • or: Queries to source systems have last updated dates or change logs for real time updates

Data field resolver loads the data

  • Copies to Grouper database
  • Could process the data a tad
  • "Virtual data fields" can have logic and make a complicated description data field (across multiple resolver sources)
    • Its possible that this could help the problem of having too many subject sources though this isn't intended to be an identity system
  • Can assign security so Grouper knows who is allowed to read which data
    • Each data field could have a group assigned who can see the data
  • There are real time events or timestamps that ensures data is up to date
  • Each attribute should only come from one resolver
  • Active flag will signify if the user can be added to Grouper groups / privileges (i.e. "resolvable")


Subject source

  • Points to Grouper's database
    • Instead being configured against sources would be configured against entity resolver data
    • Can use data from multiple sources
    • All identifiers must be unique
  • Note, if entity resolver data is secure and available over UI/WS then the subject doesnt need any fields... e.g. Penn would not need first name and last name etc in the subject configuration.
  • Subject is really just a collection of prioritized identifiers (e.g. employeeId is highest priority) and attributes
  • People who are allowed to see various entity resolver data fields would see description a certain way, name a certain way, and whatever data fields they can see when they need it
  • Imagine a more detailed subject page for people who can see the data... easier to troubleshoot access
  • If an employee ID does change (and no other conflicts), the user could be resolved by other identifiers and it might "just work"


When data fields are referenced, also a two part process.  If a group (and user allowed to see), go to group table(s), if anything other than a group, then its the data field tables

Loaders

  • Loaders and jexl scripted groups can be written on top of entity data
  • Non admins can securely use that data since Grouper knows who is allowed to see what
  • When the entity resolver knows that data changed real-time, it knows which loader/jexl scripted group to update
  • Not all data about users will be entity resolvers... more than what was in subject source, but not everything
  • If there is peripheral data you can make SQL/LDAP loaders for that
  • Privileges for loaded groups could be loaded with users who can see all the related data fields

UI/WS

  • Imagine more data fields than subject data fields available over WS/UI securely in one query

Provisioning

  • No more "subject link"
  • You can provision any entity resolver data easily
    • Provisioners would not need their own data resolvers and can just use centrally cached or non-cached resolvers
  • When data changes, Grouper can tell a provisioner to recalc a user

Summary

In summary here is a metaphor... we used to have SQL credentials in multiple places, then we made an external system layer to re-use that.  This suggested is similar.  Have a data layer that can we re-used across things.  Includes real-time updates, security, and data manipulation configured centrally...  why?  if we want to be ABAC and data field-based, we need to organize our data fields


Subject security

Could have certain groups of subjects who are able to see certain other groups subjects.   We will need to get requirements for this and see how it fits in with the data field and data row security

Data model

grouper_members

Existing table can be stripped down since data is in the entity tables

  • id (012)
    • subject_id (12345678)
    • idIndex
    • subjectType (group / person / app / thing)
    • search strings
    • sort strings
    • resolvable

grouper_members_identifiers

Make sure unique identifiers.

When subjects are looked up, it can be a two part process (instead of N-part for N subject sources).

  1. Look at groups in group table,
  2. Look at entities (including GrouperSystem, users, apps, things) in the data_field tables based on data fields that are marked as identifiers
  • id (737)
    • member_id (012)
    • subject_identifier (12345678)

grouper_data_field

Types of data fields for user or rows

  • id (234)
    • system_name (emailAddress)
    • display_name (Email)
    • data_type (boolean, string, integer)
    • type (user)
    • cardinality (single-valued)
    • description
    • viewable_by_group_id abc123
  • id (567)
    • system_name (org)
    • display_name (Org)
    • data_type (boolean, string, integer)
    • type (row)
    • cardinality (single-valued)
    • description
    • viewable_by_group_id xyz234

grouper_data_row

Type of data field rows available for users

  • id (123)
    • system_name (affiliation)
    • display_name (Affiliation)
    • description
    • viewable_by_group_id xyz234

grouper_data_row_field

Which fields are in which rows

  • id (538)
    • grouper_data_row_id (012)
    • grouper_data_field_id (567)

grouper_data_member_field

Assignment of a data field to an entity.   When data is synced to the data field tables it will need to do some matching and assign a new grouper_members row if existing not found

  • id (480)
    • member_id (012)
    • grouper_data_field_id (234)
    • value_id (789)

grouper_data_member_changelog

Events that happen to data fields to be processed by loaders/provisioners/etc.  Keep data for a week then delete

  • id (480)
    • member_id (012)
    • grouper_data_field_id (234)
    • old_value_id
    • new_value_id
    • date
    • action
    • grouper_data_row_id (123)

grouper_data_member_field_pit

History of data field to entity

  • id (480)
    • member_id (012)
    • grouper_data_field_id (234)
    • value_id (789)
    • started_on 1/2/3
    • ended_on

grouper_data_member_row

Assignment of a row of data to an entity

  • id (321)
    • member_id (012)
    • grouper_data_row_id (123)

grouper_data_member_row_pit

HIstory of assignment of a row of data to an entity

  • id (321)
    • member_id (012)
    • grouper_data_row_id (123)
    • started_on 1/2/3
    • ended_on

grouper_data_member_row_field

Assignment of a field to a row assignment

  • id (637)
    • grouper_data_row_field_id (538)
    • value_id (654)

grouper_data_member_row_field_pit

History of assignment of a field to a row assignment

  • id (637)
    • grouper_data_row_field_id (538)
    • value_id (654)
    • started_on 4/5/2021
    • ended_on

grouper_dictionary

Keep data field values here to reduce data redundancy

  • id (789)
    • value (a@b.c)
  • id (654)
    • value (math)


grouper_data_field_sec_group_mem_cache

Cache these memberships so lookups are fast.  Cache this in memory too for long running processes.  The groups that are cached are... any groups that secure fields, any groups that secure rows, etc

  • id
    • sec_group_id
    • mem_id_index

grouper_data_field_row_sec

Row level security for data

  • id (941)
    • grouper_data_field_id (234)
    • group_id_of_result_member
    • viewable_by_group_id rst567

grouper_data_field_row_pop_group

  • id
    • group_id_of_result_member
    • viewable_by_group_id rst567


  • No labels