Current issues in attributes
Issues around attributes have become the central theme in federated identity, particularly as the understandings around authentication coalesce. Carried as payloads in authentication assertions or via other channels, attributes are at the core of both privacy preservation and scalable access control.
For the purposes of this paper, attributes are understood to mean a set of data objects about individuals, with a controlled set of values the objects can have. Their common syntax and semantics are agreed on (formally or informally) by a community of interest and are used for exchanges, typically around access control, within that community. Protocols such as SAML and OpenID Connect are used to implement the exchanges. Attributes of interest include citizenship, address, accessibility needs, preferred language, licensed content access entitlements, etc. Note that some attributes, particularly in federated instances, do not exist per se within a directory but are calculated from directory values for on-the-wire transmission. For example, because legal age varies by jurisdiction, the attribute “over legal age” might be best managed by dynamic calculation based on the jurisdictions of the IdP and RP.
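A calculated attribute of this kind can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the jurisdiction codes, the threshold table, and the rule that the stricter of the IdP and RP thresholds applies are all hypothetical and are not drawn from any standard or from this report.

```python
from datetime import date

# Hypothetical legal-age table keyed by jurisdiction code. The codes and
# values are placeholders for illustration, not legal facts.
LEGAL_AGE = {"JUR-A": 18, "JUR-B": 21}


def age_on(birth_date: date, today: date) -> int:
    """Whole years elapsed between birth_date and today."""
    had_birthday = (today.month, today.day) >= (birth_date.month, birth_date.day)
    return today.year - birth_date.year - (0 if had_birthday else 1)


def over_legal_age(birth_date: date, idp_jurisdiction: str,
                   rp_jurisdiction: str, today: date) -> bool:
    """Derive the on-the-wire value; the birth date itself is not released.

    Assumption: the higher of the two jurisdictions' thresholds applies.
    """
    threshold = max(LEGAL_AGE[idp_jurisdiction], LEGAL_AGE[rp_jurisdiction])
    return age_on(birth_date, today) >= threshold
```

The point of the sketch is that only a yes/no value leaves the IdP, while the directory continues to hold the underlying birth date.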
One particularly important set of attributes for a person is called identifiers: values that have a one-to-one relationship with that person and so are potentially privacy-destructive. Because of that, care must be taken in the selection of identifiers for each specific use case. Important characteristics of identifiers include opacity, scope of uniqueness, human readability, reassignment, etc. Identifiers in widespread use include email addresses, display names, login accounts, eduPersonTargetedID, MBUN, etc.
Discussions go back to the Tao of Attributes workshop at NIST in 2009. More recently, conversations have been occurring in Kantara, IETF, REFEDS, the NSTIC IDESG and OASIS. This report is based upon that set of conversations as well as involvement in the international identity space. It should be noted that many of the issues and findings could be considered as part of the attribute metadata discussions now forming under ISOC. Activities now beginning around that topic could inform many of the issues identified below.
In addition, some of the Scalable Privacy deliverables played a key role in shaping this report. The citizen centric attribute registry that was established at resulted in interactions with several sectors about their needs and approaches. And, most recently, the ongoing development of PrivacyLens, a next-gen consent manager, has elevated a number of concerns about some of the attributes discussed below. All of these deliverables can be found on the project website at https://spaces.at.internet2.edu/display/scalepriv/Scalable+Privacy
The state of understanding about identifiers:
For all their simplicity, there has been little effort to leverage identifiers to preserve privacy. Part of the reason is concern over how to represent the issues to users; part may be managing public perceptions; part is that the marketplace prefers identity-rich information to the privacy that identifiers can provide.
While more than a dozen distinct characteristics of identifiers can be discussed, in practice a smaller set of concepts is most relevant to the ecosystem:
• Persistence
Persistence is a measure of the length of time during which an identifier can be reliably associated with a particular subject. A very short-term identifier might be associated with an application session. A permanent identifier is associated with its entry for its lifetime (which is not necessarily "forever", so permanence is just a relative notion). Examples of persistent identifiers include social security numbers, passport numbers, eppn, etc.
• Reassignment
Many identifiers do not specifically guarantee that a given value will always refer to a single subject forever. Reassignment means associating an identifier value with one subject and then, at some point in the (possibly distant) future, assigning the same value to a different subject. Examples of non-reassignable identifiers include social security numbers, passport numbers, the ORCID identifier, etc.
• Privacy-preserving with Opacity
Some identifiers are designed to preserve a subject's privacy and limit the ability of unrelated applications to correlate activity by comparing the values they receive. Such identifiers are therefore required by design to be opaque, and to have no particular relationship to a subject's legal identity or other identifiers. Note that this definition still permits sharing/commonality of the identifier among multiple applications if they are deemed to be equivalent to a single application for privacy purposes. Examples include epTID, pairwise (IdP-SP) opaque identifiers, etc.
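One way a pairwise opaque identifier could be derived is with a keyed hash over the IdP, the SP, and the local account. The sketch below is an assumption-laden illustration, not the algorithm of any particular IdP product: the function name, the input ordering, and the choice of HMAC-SHA256 are all hypothetical.

```python
import hashlib
import hmac


def pairwise_id(secret: bytes, idp_entity: str, sp_entity: str,
                local_user: str) -> str:
    """Return an opaque value that is stable for one (IdP, SP, user) triple.

    Assumption: `secret` is known only to the IdP, so an SP cannot
    recompute or reverse the value.
    """
    # A NUL separator keeps ("ab", "c") and ("a", "bc") from colliding.
    message = "\x00".join((idp_entity, sp_entity, local_user)).encode("utf-8")
    return hmac.new(secret, message, hashlib.sha256).hexdigest()
```

Two SPs receive unrelated values for the same user, so they cannot correlate activity by comparing identifiers. Applications deemed equivalent for privacy purposes could share a value by being passed the same `sp_entity` string.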
• Human Palatability
An identifier that is human-palatable is intended to be memorable and reproducible by typical human users, in contrast to identifiers that are, for example, the randomly generated sequences of bits in opaque identifiers. There is a natural tension between palatability on one side and privacy and non-reassignment on the other, and they are often in opposition. The world does not have a popular solution to all three problems at once today, which feeds into the oft-noted recommendation that many applications really need to use multiple identifiers for different purposes. Examples of human-palatable identifiers include surname, Display Name, email address, etc.
• The scope of uniqueness
All identifiers must have some degree of uniqueness within a particular "namespace" in which the identifiers are being created and managed. Sometimes this namespace is explicitly made part of the identifier (as in the case of a "scoped" identifier, see below), in which case the identifier is globally unique. In other cases, the namespace may be implicit, in which case the identifier cannot stand alone without the namespace being articulated and stored in some form. This becomes particularly relevant when applications are truly federated (supporting multiple Identity Providers accessing the same data), or otherwise store identifiers in a common way. Because of DNS, email addresses are considered to be globally unique. Generally, human names are not globally unique. Display names may not be unique even within a given domain.
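The difference between an explicit and an implicit namespace can be illustrated with a small Python sketch. The `value@scope` syntax, the `ScopedId` type, and the decision to lowercase the scope are assumptions made for this example only.

```python
from typing import NamedTuple, Optional


class ScopedId(NamedTuple):
    """An identifier whose namespace travels with the value."""
    value: str
    scope: str

    def __str__(self) -> str:
        return f"{self.value}@{self.scope}"


def parse_scoped(text: str, default_scope: Optional[str] = None) -> ScopedId:
    """Parse 'value@scope'; an unscoped value needs the namespace supplied."""
    value, sep, scope = text.rpartition("@")
    if sep:
        if not value or not scope:
            raise ValueError(f"malformed scoped identifier: {text!r}")
        # Assumption: scopes are DNS-style names and compare case-insensitively.
        return ScopedId(value, scope.lower())
    if default_scope is None:
        raise ValueError("unscoped identifier needs an explicit namespace")
    return ScopedId(text, default_scope.lower())
```

An application that accepts users from several Identity Providers would store the full `ScopedId`, since the same local value may be issued independently in two namespaces.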
• Privacy preserving transactions with Non-Correlating identifiers and unlinkability
Transactional identifiers can be handed out by an identity provider in several ways that can provide additional privacy protections. These include:
Handing out a different identifier to each web site that an authenticated user visits. This allows the many benefits of a stateful user experience but stops correlation attacks across sites. It does not address linkability of a user's actions within a single site. Examples include epTID from the eduPerson schema.
Handing out a different short-term identifier each time an authenticated user goes to a web site. This provides ultimate unlinkability of a user's actions, but at some cost to the user experience. IdP software such as Shibboleth can be configured to generate a new short-term identifier for each web session.
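The per-session case can be sketched as a small issuer that hands out random handles and forgets them when the session ends. This is a minimal illustration and not a description of how Shibboleth or any other IdP implements it; the class and method names are hypothetical.

```python
import secrets
from typing import Dict, Optional


class TransientIdIssuer:
    """Issue a fresh random handle for every session (hypothetical sketch)."""

    def __init__(self) -> None:
        # handle -> local account; held only by the IdP.
        self._subjects: Dict[str, str] = {}

    def issue(self, local_user: str) -> str:
        """Return a new unguessable handle for this session."""
        handle = secrets.token_urlsafe(16)
        self._subjects[handle] = local_user
        return handle

    def resolve(self, handle: str) -> Optional[str]:
        """Map a live handle back to the account, or None if unknown."""
        return self._subjects.get(handle)

    def end_session(self, handle: str) -> None:
        """Forget the handle so it can no longer be tied to the user."""
        self._subjects.pop(handle, None)
```

Because each visit yields an unrelated value, the web site cannot link two sessions of the same user from the identifier alone, which is also why it cannot offer a stateful experience across sessions.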
The state of understanding about attributes:
The issues around attributes are being wrestled with in many forums and standards groups. Because the field is early in its development and the variety of use cases is inherently unbounded, foundational issues are paramount. These include:
• LOA of Attributes:
The prospect of a level of assurance (LOA) for attributes in general remains unclear. We have come to use the Stephen Colbert word “truthiness” to describe a wide variety of approaches for an RP to determine the veracity of an attribute in an assertion. Marketplace approaches such as attribute verifiers as a service, regulatory approaches, mining of self-asserted values for consistency, and other efforts are all working on this central issue. Ironically, at the same time it is not clear that there is much RP interest in truthiness beyond a coarse-grained measure.
• Attribute bundles
Attributes tend to travel in bundles. Attributes, particularly in bundles, are linked to, but distinct from, end-entity tags used as trust marks. There will likely be a several-to-many mapping of attribute bundles to tags. For example, a “vanilla bundle” of an opaque identifier and a display name may serve a variety of application needs and their tags. A basic “chocolate bundle” might consist of a login identifier and a givenname+surname pair. A frequent add-in, perhaps, might be an ORCID identifier for R&E purposes. FICAM has proposed an “identity resolution” bundle, based on a set of attributes (some parsed to be partially identifiable) developed by NASPO to provide high assurance in binding new federated credentials to an existing account.
There is a balance between bundles, add-ins and usability. In particular, complexity may adversely affect informed consent. And user patience for this active involvement in privacy management is uncertain. The ability to minimize the impacts of consent on usability will be important.
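The bundle and add-in idea can be sketched as a lookup table plus a release filter. The attribute names below are hypothetical, and the suggestion that consent be asked once per bundle or add-in (rather than once per attribute) is an assumption made to illustrate the usability trade-off, not an established practice.

```python
# Hypothetical attribute names; bundle contents follow the examples in the text.
BUNDLES = {
    "vanilla": frozenset({"opaque_id", "display_name"}),
    "chocolate": frozenset({"login_id", "given_name", "surname"}),
}


def attributes_to_release(directory_entry, bundle, add_ins=()):
    """Return only the attributes named by the bundle and its add-ins."""
    wanted = BUNDLES[bundle] | frozenset(add_ins)
    return {name: value for name, value in directory_entry.items()
            if name in wanted}


def consent_items(bundle, add_ins=()):
    """One consent line per bundle or add-in, not one per attribute."""
    return [f"bundle:{bundle}"] + [f"add-in:{name}" for name in add_ins]
```

Under this sketch a user releasing the chocolate bundle with an ORCID add-in sees two consent items instead of four, which is one way complexity might be kept from eroding informed consent.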
• Attribute Metadata
It is not yet clear how attribute metadata should be shared. Is it attached to each attribute, or, at the other end of the granularity spectrum, is it shared at the federation level? If so, is it part of the regular federation metadata or should other vehicles, such as attribute registries, be used? Solutions will reflect issues around the dynamic vs. static nature of the attribute metadata, the need to manage that information, etc.
While different players will assign different weights of importance, and will convey the metadata information in different ways, we should agree on the basic attribute metadata dimensions to be considered. This is of particular import as it is likely that any controls on downstream use of attributes will be expressed in metadata.
• Shared syntax, semantics and meta-attributes for mappings between federations
The syntax and semantics of attributes are the most fundamental pieces of attribute metadata. Because of what they are intended to capture, even attributes used globally may have widely different local interpretations. Names, dates, and over legal age are key attributes in many countries, but vary in their syntax, if not their semantics. For these common attributes, mappings will need to happen. Where these mappings happen will vary widely, from hubs in hub-and-spoke federations to the RPs themselves. Meta-attributes (a few key concepts such as those listed in the examples) will be useful for establishing the mappings.
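A mapping of this kind, using a date as the shared concept, might look like the following sketch. The federation names, local attribute names, and date formats are invented for illustration; the only idea taken from the text is that each community's syntax is translated through a common meta-attribute, wherever in the ecosystem that translation runs.

```python
from datetime import datetime

# Hypothetical per-federation profiles for one meta-attribute, "birth date".
# Each profile records the local attribute name and its local date syntax.
BIRTH_DATE_PROFILES = {
    "fed-a": {"attribute": "dateOfBirth", "format": "%Y-%m-%d"},
    "fed-b": {"attribute": "birthdate", "format": "%d/%m/%Y"},
}


def translate_birth_date(attributes, source, target):
    """Re-express the source federation's birth date in the target's syntax."""
    src = BIRTH_DATE_PROFILES[source]
    dst = BIRTH_DATE_PROFILES[target]
    # Parse into a neutral date value, then format for the target community.
    parsed = datetime.strptime(attributes[src["attribute"]], src["format"]).date()
    return {dst["attribute"]: parsed.strftime(dst["format"])}
```

The same function could run at a hub, at an IdP, or at an RP; only the profile table needs to be shared.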
• End-entity categories and Granularity and Composition of trust marks:
While not strictly about attributes, both of these issues deeply affect attribute release and use. End-entity categories allow RP’s and IdP’s to announce what sets of behaviors they follow in attribute release and use. Trust marks come in two forms for two audiences: machine-readable marks, used as components of a computable trust framework; and human-readable marks, used by other IdP administrators or end-users to determine which attributes to release to a relying party. In this latter case, the issue of composition of trust marks in the UI becomes critical; users do not want to be overwhelmed by the complexity of guidance provided.
• Consent and active End User Privacy Management
Perhaps the greatest challenge in federated identity today is managing the release of attributes from the user and IdP to the RP. In the cases where there is an underlying contract, the IdP and RP can negotiate attribute release as part of that agreement and the need for consent is obviated. But almost all the cases coming forward now are without contracts (beyond the user click-through), and many are international. In these instances, the guidance and mechanics of consent are quite tangled. Beneath the principle of user control lies a morass of legal uncertainty and great doubts about the willingness and capabilities of most end-users to manage it. Doing this right, so that the whole spectrum of users (from the privacy fundamentalists through the pragmatists to the don’t cares in the famous Westin Privacy Index) is well served, will be critical and difficult.
• Aggregation of attributes
Attributes are aggregated at several functional areas in the ecosystem: at IdP’s in a repository for the users to manage coherently, at RP’s to help make an authorization decision, at gateways which augment IdP attributes with gateway-managed permissions, at attribute and policy servers that users might manage, etc. Where aggregation occurs has profound consequences on privacy and consent, on business models, on the user experience, and on many other parts of the ecosystem.
• Business models
As authentication becomes a more mature industry, much of the frontier, including business start-ups, is shifting to attributes. The business models offer validation, aggregation, management and other attribute-oriented services. Because there is little understanding and even fewer standards around attributes, the marketplace that is emerging is fractured, conflicting, and sometimes beyond legal or ethical norms.
This report recommends some follow-on actions.
• It would be helpful to link and leverage various attribute registry efforts underway. The federal government should participate, but not lead, in such discussions.
• A few critical attributes, most notably name, are inconsistently defined across a variety of communities of interest and their registries. Two responses seem appropriate. The first is to agree on mappings, using a “meta-attribute”, and on where (IdP, SP or elsewhere) the mappings are done. The second is to use the attribute registry linkage described above to minimize further corrosion. Again, the federal government should participate in these conversations and, as a significant community of interest itself, will have considerable influence on the approaches taken.
• There is an immediate need for a common set of metadata items about identifiers. Conversations about identifiers often tend to be difficult because participants have different assumptions about their characteristics. Many of these possible distinctions among identifier characteristics will need to be reflected in the UX component of privacy management, yet another driver to focus on this task. NIST, as part of its work in developing taxonomies and glossaries in support of the IDESG, could do this.
• We have to enable the use of privacy-preserving identifiers as options in transactions. That means examining the UX issues and how to build effective educational advice, both for users and for RP’s. Again, good practices work coming from the IDESG should provide this advice.
• As noted above, we should agree on the basic attribute metadata dimensions that will help structure the “attribute ecosystem”. While different RP’s and IdP’s might weigh the importance of various metadata dimensions differently, we should have a common understanding of what those dimensions are. This topic has been proposed for work in several different venues, but has not gelled into a structured activity. The IDESG could have a starting discussion and develop a straw man, but consensus building should happen in another forum.
• As noted above, aggregation of attributes occurs naturally and widely across the identity ecosystem. The issue has recently become central to some NSTIC privacy discussions. Guidelines and good practices in this area would be welcome. Again, lacking any other alternative, a structural framing of the issues by IDESG, with NIST support, would be welcome.
• There is a dearth of good practices and codes of conduct in the identity ecosystem. The only one with any traction is the EU Code of Conduct. But even that document, on the proper use and disposal of attributes, is EU-internal. Attribute exchanges from EU to non-EU countries and then exchanges among the non-EU countries are not managed. The NSTIC activity has laudable goals in this area.