Per-Entity Metadata Working Group - 2016-08-03
Agenda and Notes
[EtherPad used to create these notes: Agenda_and_Notes_-_2016-08-03.etherpad]
===>> Note the new PIN and meeting URL <<===
Dial in from a Phone:
Dial one of the following numbers:
+1.408.740.7256
+1.888.240.2560
+1.408.317.9253
195646158 #
Meeting URL (for VOIP and video): https://bluejeans.com/195646158
Wiki space: https://spaces.at.internet2.edu/x/T4PmBQ
Attendees
- David Walker, Internet2
- Ian Young
- Phil Pishioneri, Penn State
- Michael Domingues, University of Iowa
- Paul Engle, Rice U
- Tom Scavo, InCommon/Internet2
- Tommy Doan, Southern Methodist University
- Scott Cantor, tOSU
- Tom Mitchell, GENI
- John Kazmerzak, University of Iowa
- Rhys Smith, Jisc
- Paul Caskey, Internet2
- Walter Hoehn, Memphis
- Chris Phillips, CANARIE
https://public.etherpad-mozilla.org/p/Agenda_and_Notes_-_2016-08-03
Agenda and Notes
- NOTE WELL: All Internet2 Activities are governed by the Internet2 Intellectual Property Framework. - http://www.internet2.edu/policies/intellectual-property-framework/
- NOTE WELL: The call is being recorded.
- Agenda bash
- Should we talk about (functional) requirements for the service before risks?
- Qualities of the service -- what is expected, and how closely the existing service meets it (why each 'requirement' or expectation is suggested) -CP
- Some people have been assuming a "DNS" model: the service is very reliable and does not usually require special client-side mechanisms to accommodate failures.
- What are the risks for a per-entity metadata service, and what are the possible mitigations?
- I suggest we list risks along with their likelihood, impact, and potential mitigation (DHW)
- Risks from last week's call (https://spaces.at.internet2.edu/x/pYIABg) and subsequent electronic mail discussion
- Availability
- Expectations: ability to query for a given piece of metadata at any time
- Failure of the distribution service for IdPs and SPs for longer than ??
- Failure of the aggregation/signing service for longer than ??
- Security
- Q: Will MDQ differ materially in security from the existing aggregate?
- Scott/Michael -- no difference at this time.
- Disclosure of the signing key
- IdPs and SPs that do not verify signatures
- Clients not checking metadata signatures
- Service Delivery
For reference: availability terms and the downtime each allows
3 9’s allowed downtime: 8.76 hr/yr, 43.8 min/month, 10.1 min/week
4 9’s allowed downtime: 52.6 min/yr, 4.38 min/month, 1.01 min/week
5 9’s allowed downtime: 5.26 min/yr, 26.3 sec/month, 6.05 sec/week
- Expectations:
- Q: Should perfect reliability be assumed? (Scott C)
- Observations:
- Rhys -- as reliable as the current delivery model, and as reliable as possible. Since the service delivers static content, it could be put on a commercial CDN if necessary.
- Chris -- similar to Rhys, but to deliver a 5 9's-like experience, caching at various levels would contribute to the whole. +1 to the CDN comment.
- Different clients will take different approaches to solving availability.
- There are mitigations that don't involve mods to the IdP/SP code (e.g., HTTP caching proxies)
- There are no 100% solutions.
- What is an acceptable level?
- High 90s (for the aggregation/signing portion of the infrastructure)
- At least Akamai (for the distribution portion of the infrastructure) (Walter H)
- At least 2 9's, probably 3 or 4.
- Consensus (in this call) is that we need at least 3-4 nines of reliability in the distribution service, and preferably better.
- Note that retrieving (reading) an MDQ artifact/response is DIFFERENT from being able to UPDATE the content of the MDQ response.
- These should be considered separate qualities.
- e.g., you may need 5 9's for reading/publishing the content, but can tolerate lower reliability for changing the data (due to the cost of offering that reliability)
- Clients start up with nothing cached.
- Should we recommend something for that?
- Is it something that's nice to have or something we *should* have before rolling this out?
- CONCLUSION: Consensus that existing client-side caching is sufficient.
- We can, however, tell them what they can do to increase reliability.
- If you point out that people can add additional caching, this might invite questions about the reliability of the service. Consensus that it is not worth mentioning at all.
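For readers unfamiliar with what "client-side caching" means in this discussion, the following is a minimal sketch, not a description of any particular IdP/SP implementation. The class name, the injected `fetch` callable, and the `max_age` parameter are all hypothetical. The sketch also ignores signature verification and metadata validity windows (e.g., validUntil), which a real client would need to honor before serving a stale copy.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class CachingMetadataClient:
    """Hypothetical client: serve per-entity metadata from a local cache."""

    def __init__(self, fetch: Callable[[str], bytes], max_age: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch          # assumed to raise OSError when the service is unreachable
        self._max_age = max_age      # seconds before a cached entry is considered stale
        self._clock = clock
        self._cache: Dict[str, Tuple[float, bytes]] = {}

    def get(self, entity_id: str) -> Optional[bytes]:
        cached = self._cache.get(entity_id)
        now = self._clock()
        # Fresh cache hit: no network request at all.
        if cached is not None and now - cached[0] < self._max_age:
            return cached[1]
        try:
            data = self._fetch(entity_id)
        except OSError:
            # Service unreachable: fall back to the stale copy if one exists.
            # A client that has just started up has nothing to fall back on.
            return cached[1] if cached is not None else None
        self._cache[entity_id] = (now, data)
        return data
```

The cold-start case returns `None`, which is the "clients start up with nothing cached" situation raised above: caching protects a running client from outages but not one that restarts during an outage.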
- Does MDQ change the calculus about using federation infrastructure for storing /local/ service metadata?
- ANS: YES.
- Risk: if a (single-path) Internet connection goes down, the site loses access to metadata for its local services.
- This could be an argument for enterprise-provided distribution infrastructure.
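The "nines" figures quoted under Service Delivery can be reproduced with a few lines of arithmetic. This is a sketch under stated assumptions: a 365-day year and a month taken as one twelfth of a year. Assuming a 30-day month instead gives slightly smaller monthly figures (for example, 4.32 min rather than 4.38 min for 4 9's). The function name is illustrative only.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600   # assumption: non-leap year
SECONDS_PER_WEEK = 7 * 24 * 3600


def allowed_downtime_seconds(nines: int) -> dict:
    """Return allowed downtime in seconds per year, month, and week."""
    unavailability = 10.0 ** (-nines)   # e.g. 3 nines -> 0.001
    return {
        "year": unavailability * SECONDS_PER_YEAR,
        "month": unavailability * SECONDS_PER_YEAR / 12,  # assumption: month = year / 12
        "week": unavailability * SECONDS_PER_WEEK,
    }


if __name__ == "__main__":
    for n in (3, 4, 5):
        d = allowed_downtime_seconds(n)
        print(f"{n} 9's: {d['year'] / 60:.2f} min/yr, "
              f"{d['month'] / 60:.2f} min/month, {d['week'] / 60:.2f} min/week")
```

The same function can be applied separately to the read path and the update path, since the notes treat those as distinct qualities with potentially different targets.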
- Further discussion of risks
- Responsiveness / Capacity
- Operations
- Expectations: Ability to sign metadata
- Q: is it 'real time'?
- Q: Is it 'online signing'?
- The service is up, but unusably slow
- Capacity is not sufficiently elastic
- Rate of update
- Rate of query
- Malfunctioning entity...
- Cost
- Cost of elastic capacity not budgeted
- Rhys: You can use the Azure CDN with current UK federation level of traffic (50 TB/year) --> 200 GBP per month
- Staff time and attention not sufficient
- Your favorite risk here...
- Requirements for availability and scalability
- Next call is August 10, 2016 @ 10:00 AM (America/New York)
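For reference, the CDN cost figure quoted by Rhys under Cost (50 TB/year at about 200 GBP per month) implies a rough average price per unit of traffic. The sketch below only restates that arithmetic. It assumes 1 TB = 1000 GB and that the monthly figure scales linearly with traffic, neither of which is stated in the notes; the function name is hypothetical.

```python
def implied_cost_per_gb(tb_per_year: float, gbp_per_month: float) -> float:
    """Average GBP per GB implied by a yearly traffic volume and a monthly bill."""
    gb_per_year = tb_per_year * 1000.0      # assumption: decimal terabytes
    gbp_per_year = gbp_per_month * 12.0
    return gbp_per_year / gb_per_year


if __name__ == "__main__":
    rate = implied_cost_per_gb(tb_per_year=50, gbp_per_month=200)
    print(f"Implied average: {rate:.3f} GBP per GB")
```

A figure like this could feed a budget estimate for elastic capacity, but actual CDN pricing is usually tiered, so it should be treated as an order-of-magnitude check only.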