Per-Entity Metadata Working Group - 2016-08-17
Agenda and Notes
[EtherPad used to create these notes: Agenda_and_Notes_-_2016-08-17.etherpad]
Dial in from a Phone:
Dial one of the following numbers:
+1.408.740.7256
+1.888.240.2560
+1.408.317.9253
Meeting ID: 195646158 #
Meeting URL (for VoIP and video): https://bluejeans.com/195646158
Wiki space: https://spaces.at.internet2.edu/x/T4PmBQ
Attendees
- Scott Koranda (LIGO)
- Nick Roy (Internet2/InCommon)
- Ian Young
- Paul Engle (Rice U)
- Paul Caskey (Internet2)
- IJ (Internet2)
- Michael Domingues (University of Iowa)
- Scott Cantor (tOSU)
- Tom Scavo (InCommon/Internet2)
- David Walker (Internet2)
- John Kazmerzak (University of Iowa)
- Phil Pishioneri (Penn State)
- Chris Phillips (CANARIE) (arrived late, 10:30 a.m. EDT)
Agenda and Notes
- NOTE WELL: All Internet2 Activities are governed by the Internet2 Intellectual Property Framework. - http://www.internet2.edu/policies/intellectual-property-framework/
- NOTE WELL: The call is being recorded.
- Agenda bash
- DRAFT slides for the 8/24/2016 InCommon TAC webinar
- https://docs.google.com/presentation/d/1YJiDpFUshWKpP77iBw1qvQeREHsRgVL8vTsvt3JEhfA/edit?usp=sharing
- Tiered (no pun intended) architecture
- HA (high-availability) CDN-based solution operated by TSG
- SC: Sounds doable, assuming timeouts can be set short enough. We would want to add code that avoids servers that are not behaving well for a while, then tries them again (a sketch of this avoid-and-retry logic follows below).
- This probably does not obviate the need for a local distribution server when there are very high availability needs on a campus (e.g., for local services, or critical off-campus services).
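- A minimal sketch of the avoid-and-retry behavior SC describes, in Python (the class, failure threshold, and cool-down period are all invented for illustration):

```python
import time

# Per-server avoidance sketch: after a few consecutive failures, skip the
# server for a cool-down period, then allow retries. Thresholds are arbitrary.
class ServerHealth:
    def __init__(self, max_failures=3, cooldown_seconds=300.0):
        self.max_failures = max_failures
        self.cooldown_seconds = cooldown_seconds
        self.failures = 0
        self.avoid_until = 0.0

    def usable(self):
        # A server is usable unless it is inside its cool-down window.
        return time.monotonic() >= self.avoid_until

    def record_success(self):
        self.failures = 0

    def record_failure(self):
        self.failures += 1
        if self.failures >= self.max_failures:
            # Too many consecutive failures: avoid this server for a while.
            self.avoid_until = time.monotonic() + self.cooldown_seconds
            self.failures = 0
```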
- Could the IdP, for example, be configured to prefetch metadata for entities with a relying-party configuration? Or perhaps just from a list of "top five" (number arbitrary) critical SPs.
- Yes (though relying-party overrides don't always refer specifically to a single SP); it just requires code to be written. It could also be done with a scheduled task/cron job that pulls per-entity files down into the on-disk backing cache (a sketch follows).
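- A minimal sketch of such a scheduled prefetch task, assuming an MDQ-style per-entity endpoint; the base URL, cache directory, and entityIDs are placeholders, and the filename scheme is a local choice:

```python
#!/usr/bin/env python3
"""Hypothetical cron job: prefetch per-entity metadata for critical SPs
into the on-disk directory an IdP uses as its file-backed cache."""
import pathlib
import urllib.parse
import urllib.request

MDQ_BASE = "https://mdq.example.org/entities/"   # hypothetical endpoint
CACHE_DIR = pathlib.Path("/var/cache/metadata")  # hypothetical backing cache

# A short list of critical SPs (illustrative entityIDs; cf. the "top five" idea above).
CRITICAL_SPS = [
    "https://sp1.example.edu/shibboleth",
    "https://sp2.example.edu/shibboleth",
]

def prefetch(entity_id):
    # Per the MDQ protocol, the entityID is URL-encoded into the request path.
    url = MDQ_BASE + urllib.parse.quote(entity_id, safe="")
    with urllib.request.urlopen(url, timeout=5) as resp:
        data = resp.read()
    # One file per entity; the filename scheme here is a local choice.
    out = CACHE_DIR / (urllib.parse.quote(entity_id, safe="") + ".xml")
    out.write_bytes(data)

if __name__ == "__main__":
    CACHE_DIR.mkdir(parents=True, exist_ok=True)
    for eid in CRITICAL_SPS:
        prefetch(eid)
```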
- DavidW offered to analyze log files to determine the rate at which metadata is reused before it expires (i.e., how effective client-side caching will be)
- Second tier operated by community? Perhaps also CDN based? Is this the role for samlbits.org?
- We can/should certainly recommend this. Final decision would be InCommon's.
- Are we looking at primary/secondary CDNs, or two CDNs that are used relatively equally?
- samlbits.org is appropriate as secondary, but probably not primary.
- Clients can achieve higher availability when configured with the primary CDN first and the secondary as a backup (see the failover sketch below)
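- A rough sketch of that primary-then-secondary behavior with short timeouts (endpoint URLs are hypothetical; a real deployment would likely rely on the metadata resolver's own retry logic):

```python
import urllib.request

# Hypothetical primary and secondary per-entity metadata endpoints.
ENDPOINTS = [
    "https://mdq-primary.example.org/entities/",
    "https://mdq-secondary.example.org/entities/",
]

def fetch_with_failover(encoded_entity_id, timeout=3.0):
    """Try the primary first; fall back to the secondary on error or timeout."""
    last_error = None
    for base in ENDPOINTS:
        try:
            with urllib.request.urlopen(base + encoded_entity_id, timeout=timeout) as resp:
                return resp.read()
        except OSError as err:  # covers URLError and socket timeouts
            last_error = err    # remember the failure and try the next tier
    raise RuntimeError(f"all metadata endpoints failed: {last_error}")
```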
- What's the practicality of achieving 5 9's by using two independent CDNs? (Some quick arithmetic follows these questions.)
- Can it meet response time requirements, as well as availability?
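- Back-of-the-envelope arithmetic for the two-CDN question, assuming the CDNs fail independently (a strong assumption; correlated outages would reduce the benefit):

```python
# Availability of a primary/secondary pair: the service is down only
# when both members are down at the same time.
def combined_availability(primary, secondary):
    return 1 - (1 - primary) * (1 - secondary)

print(combined_availability(0.999, 0.999))  # two 3-nines CDNs -> ~0.999999 (six nines)
print(combined_availability(0.99, 0.99))    # two 2-nines CDNs -> ~0.9999 (four nines)
```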
- What requirements does that put on Shibboleth and SimpleSAMLphp?
- What are the current gaps in Shibboleth and SimpleSAMLphp?
- Service Level Requirements
- Availability (How many 9's?)
- Striking the best balance between what existing CDNs can reasonably deliver and what we can ask of the Shibboleth and SimpleSAMLphp teams to implement for caching
- Consensus on whether (and what type of) persistent caching (across restarts) is expected of IdPs and SPs?
- Perhaps different scenarios (e.g., federation only for external services vs. federation also for internal services, in which case the availability of the campus Internet connection becomes an issue)?
- We'll want to address these considerations in the report. There are more factors affecting availability to clients than just server reliability.
- What are the target platforms?
- Shibboleth, SimpleSAMLphp
- Ping, AD?
- (DHW) Do we care about platforms that do not consume metadata automatically?
- Response time
- Retrieving metadata from the aggregate by an IdP or SP
- Signing a new aggregate
- What if we move from daily to hourly signing? That is a separate question from how we actually deliver the service.
- This affects cache timeouts and, therefore, the effectiveness of client-side caching: setting an overly long cache timeout could prevent upstream changes from being picked up, but this really depends on whether the cache gets hit first or second (a toy calculation follows)
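- A toy calculation of the interaction (the intervals are illustrative, not proposed values): hourly signing buys little if the client-side cache timeout dominates.

```python
# Worst case: a change lands just after a signing run (waits up to one
# signing interval) and a client fetched just before publication (serves
# the stale copy for up to one cache lifetime).
def max_staleness_hours(signing_interval_h, cache_timeout_h):
    return signing_interval_h + cache_timeout_h

print(max_staleness_hours(24, 6))  # daily signing, 6 h cache -> up to 30 h stale
print(max_staleness_hours(1, 6))   # hourly signing, same cache -> up to 7 h stale
```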
- (DHW) Perhaps combine availability and response time? Without much thought, something like...
- 99% of days in a year have 99.999% of response times less than 100 ms
- (Response times during an outage are considered to be greater than 100 ms.)
- No day of the year has > 8.6 seconds of outage/degraded response (4 9's for a day); the arithmetic is shown below
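- For reference, the arithmetic behind the 8.6-second figure (a sanity check, not an additional requirement):

```python
# Allowed downtime is the unavailable fraction times the period length.
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

def allowed_downtime_seconds(availability, period_seconds):
    return (1 - availability) * period_seconds

print(allowed_downtime_seconds(0.9999, SECONDS_PER_DAY))         # ~8.64 s/day (four nines)
print(allowed_downtime_seconds(0.99999, 365 * SECONDS_PER_DAY))  # ~315 s/year (five nines)
```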
- Other service requirements?
- All should be expressed as business requirements.
- Distributing split aggregates
- We ran out of time; we'll address this first next week.
- Is this a good idea? How does it fit in our roadmap?
- (SK input) Yes, it's a good idea; it should be in the roadmap in the near future
- (DHW) Does the end of our roadmap include aggregates?
- (SK input) No
- Should production of split aggregates have the same stages?