IDtrust 2010
9th Symposium on Identity and Trust on the Internet

Program Notes

Transportation
There will be a shuttle bus leaving the Gaithersburg Holiday Inn at 8:00 a.m. Tuesday and Wednesday morning to travel to NIST. The shuttle will return to the hotel at the end of the sessions on Tuesday and Wednesday. There will be no shuttle bus on Thursday; please carpool or use the hotel shuttle.

Wireless
802.11b wireless access points will be available with at least SSH, IPsec, HTTP, DNS, FTP, POP, IMAP, and SMTP connectivity.

Proceedings at ACM Digital Library
The proceedings are also available in the ACM International Conference Proceeding Series archive: Proceedings of the 9th Symposium on Identity and Trust on the Internet (ISBN 978-1-60558-895-7).

Blogging
Participants and observers are encouraged to use the tag "idtrust2010" when blogging and tweeting about the symposium.

Program

Tuesday, April 13, 2010 - Full Day

8:00  Bus departs from Gaithersburg Holiday Inn for NIST
8:30 - 9:00  Registration and Continental Breakfast
9:00 - 9:10  Welcome and Opening Remarks
  Program Chair: Carl Ellison, Independent (Slides: pptx)
9:10 - 10:00  Keynote Talk I
  The Central Role of Identity Authentication in Immigration Reform
  Bruce Morrison, Morrison Public Affairs Group
10:00 - 10:15  Break
10:15 - 11:40  Session 2: Technical Papers - Identity Providers and Federation
  Session Chair: Peter Alterman, National Institutes of Health
  Federated Login to TeraGrid (Presentation slides: pptx pdf)
    Jim Basney, National Center for Supercomputing Applications
    Terry Fleury, National Center for Supercomputing Applications
    Von Welch, National Center for Supercomputing Applications
  An Identity Provider to manage Reliable Digital Identities for SOA and the Web (Presentation slides: pdf)
    Ivonne Thomas, Hasso-Plattner-Institute
    Christoph Meinel, Hasso-Plattner-Institute
  CardSpace-Liberty Integration for CardSpace Users (Presentation slides: ppt)
    Haitham Al-Sinani, Royal Holloway, University of London
    Waleed Alrodhan, Royal Holloway, University of London
    Chris Mitchell, Royal Holloway, University of London
11:40 - 12:00  Break
12:00 - 1:00  Session 3: Technical Papers - Policy Conflict Resolution
  Session Chair: David Chadwick, University of Kent
  An Attribute-based Authorization Policy Framework with Dynamic Conflict Resolution (Presentation slides: ppt)
    Apurva Mohan, Georgia Tech
    Douglas Blough, Georgia Tech
  Computational Techniques for Increasing PKI Policy Comprehension by Human Analysts (Presentation slides: ppt)
    Gabriel Weaver, Dartmouth College
    Sean Smith, Dartmouth College
    Scott Rea, Dartmouth College
1:00 - 2:00  Lunch
2:00 - 3:30  Session 4: Panel - Identity Proofing
  Panel Moderator: Elaine Newton, National Institute of Standards and Technology
  Darrell Williams, Department of Homeland Security
  Jim McCabe, American National Standards Institute (ANSI) (Slides: ppt pdf)
  Brian Zimmer, Coalition for a Secure Driver's License
3:30 - 4:00  Break
4:00 - 5:30  Session 5: Panel - Four Bridges Forum: How Federated Identity Trust Hubs Improve Identity Management
  Panel Moderator: Peter Alterman, National Institutes of Health
  Tim Pinegar, Federal PKI Architecture (Slides: pptx pdf)
  Mollie Shields-Uehling, SAFE-BioPharma Association (Slides: ppt)
  Scott Rea, HEBCA Operating Authority (Slides: ppt)
  Jeff Nigriny, CertiPath (Slides: ppt)
5:30  Bus departs for Gaithersburg Holiday Inn
6:00  Social Gathering and Dinner Buffet - Gaithersburg Holiday Inn

Wednesday, April 14, 2010 - Full Day

8:00  Bus departs from Gaithersburg Holiday Inn for NIST
8:30 - 9:00  Registration and Continental Breakfast
9:00 - 9:50  Keynote Talk II
  Internet Voting: Threat or Menace?
  (Presentation slides: ppt pdf)
  Jeremy Epstein, SRI International
9:50 - 10:10  Break
10:10 - 11:10  Session 7: Panel - End-to-End and Internet Voting
  Panel Moderator: Neal McBurnett, Internet2
  Poorvi Vora, George Washington University (Slides: ppt)
  Jeremy Epstein, SRI International
11:10 - 11:40  Invited Talk - Using the DNS as a Trust Infrastructure with DNSSEC
  Scott Rose, NIST (Slides: pptx)
11:40 - 12:00  Break
12:00 - 1:00  Session 8: Technical Papers - Privacy
  Session Chair: Stephen Whitlock, Boeing
  Efficient and Privacy-Preserving Enforcement of Attribute-Based Access Control (Presentation slides: pdf)
    Ning Shang, Purdue University
    Federica Paci, University of Trento
    Elisa Bertino, Purdue University
  Privacy-Preserving DRM (Presentation slides: ppt)
    Radia Perlman, Intel Labs
    Charlie Kaufman, Microsoft
    Ray Perlner, NIST
1:00 - 2:00  Lunch
2:00 - 2:30  Session 9: Invited Talk - Hash Competition
  Introduction: Carl Ellison, Independent
  Bill Burr, National Institute of Standards and Technology (Slides: pdf ppt)
2:30 - 3:00  Session 10: Technical Papers - Biometrics
  Session Chair: David Chadwick, University of Kent
  Biometrics-Based Identifiers for Digital Identity Management (Presentation slides: pdf)
    Abhilasha Bhargav-Spantzel, Intel Corporation
    Anna Squicciarini, Pennsylvania State University
    Elisa Bertino, Purdue University
    Xiangwei Kong, Dalian University of Technology
    Weike Zhang, Dalian University of Technology
3:00 - 3:30  Break
3:30 - 4:00  Session 11: Invited Talk - Personal Identity Platforms
  Bill MacGregor, National Institute of Standards and Technology (Slides: ppt)
4:00 - 5:30  Session 12: Panel - The Path to Citizen Identity Federation Worldwide: How Kantara Initiative Programs are enabling Citizen Identity Federation
  Panel Moderator: Roger Martin, Kantara Initiative
  Vikas Mahajan, AARP
  Jim Zok, Computer Sciences Corporation
  Elaine Newton, National Institute of Standards and Technology
  Jack Leipold, Social Security Administration
  Bill Young, Department of Internal Affairs
5:30  Bus departs for Gaithersburg Holiday Inn
Dinner (on your own)

Thursday, April 15, 2010 - Half Day

8:00  No bus - please share rides to NIST
8:30 - 9:00  Registration and Continental Breakfast
9:00 - 10:00  Session 13: Technical Papers - Infrastructure
  Session Chair: Peter Alterman, National Institutes of Health
  Practical and Secure Trust Anchor Management and Usage (Presentation slides: ppt pdf)
    Carl Wallace, Cygnacom Solutions
    Geoff Beier, Cygnacom Solutions
  A Proposal for Collaborative Internet-scale Trust Infrastructures Deployment: the Public Key System (PKS) (Presentation slides: pdf)
    Massimiliano Pala, Dartmouth College
10:00 - 10:30  Break
10:30 - 11:30  Session 14: Panel - Levels of Assurance for Attributes
  Panel Moderator: Carl Ellison, Independent (Slides: pptx)
  David Chadwick, University of Kent (Slides: ppt)
  Ken Klingenstein, Internet2 (Slides: ppt)
  Chris Louden, Protiviti (Slides: ppt)
  Peter Alterman, National Institutes of Health (Slides: ppt)
11:30 - 12:15  Session 15: RUMP Session (Work in Progress)
  Session Chair: Neal McBurnett, Internet2
  Deployment Experience for the PKI Resource Query Protocol for Grids and FBPKI (Presentation slides: odp pdf)
    Massimiliano Pala, Dartmouth College
  Preferred model for multiple federations: 'Interfederation' a la the Internet or 'Superposition of federations' a la the credit card industry?
    Bill MacGregor's question, NIST
12:15 - 12:30  Wrap Up

See Also

This workshop is part of the IDtrust Symposium Series:
• 2011: 10th Symposium on Identity and Trust on the Internet (IDtrust 2011)
• 2010: 9th Symposium on Identity and Trust on the Internet (IDtrust 2010)
• 2009: 8th Symposium on Identity and Trust on the Internet (IDtrust 2009)
• 2008: 7th Symposium on Identity and Trust on the Internet (IDtrust 2008)
• 2007: 6th Annual PKI R&D Workshop
• 2006: 5th Annual PKI R&D Workshop
• 2005: 4th Annual PKI R&D Workshop
• 2004: 3rd Annual PKI R&D Workshop
• 2003: 2nd Annual PKI Research Workshop
• 2002: 1st Annual PKI Research Workshop

IDtrust 2010 Opening Remarks, 13 April 2010

Sponsors
NIST
Internet2
Federal Public Key Infrastructure Policy Authority (FPKIPA)
OASIS IDtrust Member Section

Special Thanks
To the program committee, who did a great job evaluating and selecting papers
To Neal McBurnett, who carried a heavy load and provided stability through this process
To Radia Perlman, for coordinating the selection of panels
And most especially, to Sara Caswell, who made this symposium happen

Brief History of IDtrust
2001: PKI Labs
2002: the 1st PKI Research Workshop
2004: name change: PKI R&D Workshop
2008: IDtrust

Longer History
1976: New Directions in Cryptography
1978: RSA
1978: Loren Kohnfelder
1980s: X.500 and X.509
1990s: commercial CAs
Where’s the beef?
Bridges, Federation, Applications, SSO
Real security policies

The Real Job: Making a Security Decision
Deciding what policy to enforce; and
Translating that policy into a form computers and other humans can understand.
Our end customers have real security problems, not just a desire to deploy PKI. PKI, bridges, federation, smart cards, etc., are tools available to address the real problems and are no longer ends in themselves.
Today’s Keynote
Bruce Morrison: The Central Role of Identity Authentication in Immigration Reform

Federated Login to TeraGrid

Jim Basney, Terry Fleury, Von Welch
jbasney@illinois.edu, tfleury@illinois.edu, vwelch@illinois.edu
National Center for Supercomputing Applications
University of Illinois
1205 West Clark Street
Urbana, Illinois 61801

ABSTRACT
We present a new federated login capability for the TeraGrid, currently the world's largest and most comprehensive distributed cyberinfrastructure for open scientific research. Federated login enables TeraGrid users to authenticate using their home organization credentials for secure access to TeraGrid high performance computers, data resources, and high-end experimental facilities. Our novel system design links TeraGrid identities with campus identities and bridges from SAML to PKI credentials to meet the requirements of the TeraGrid environment.

Categories and Subject Descriptors
K.6.5 [Management of Computing and Information Systems]: Security and Protection—Authentication

General Terms
Security

Keywords
PKI, SAML, identity federation, grid computing, TeraGrid, MyProxy, GridShib, Shibboleth

1. INTRODUCTION
TeraGrid (http://www.teragrid.org) is an open scientific discovery infrastructure combining leadership class resources at eleven partner sites to create an integrated, persistent computational resource. TeraGrid serves over 4,500 researchers from over 300 colleges, universities, and research institutions in the United States. TeraGrid resources are allocated to researchers by peer review. Researchers must authenticate to TeraGrid resource providers and charge their usage to project accounts. TeraGrid supports authentication via passwords, SSH public keys, and X.509 certificates.

In this article, we present the design and implementation of a new system that enables researchers to use the authentication method of their home organization for access to TeraGrid. Participating in the InCommon Federation (http://www.incommonfederation.org) enables TeraGrid to accept authentication assertions from U.S. institutions of higher education, so researchers can use their existing campus login to authenticate to TeraGrid resources. This federated login capability brings multiple benefits:

• It mitigates the need for researchers to manage authentication credentials specific to TeraGrid in addition to their existing campus credentials. Simplifying researchers' access to TeraGrid helps them to better focus on doing science.

• Reducing or eliminating the need for a TeraGrid password eases the burden on TeraGrid staff, by reducing the number of helpdesk calls requesting password resets and avoiding the need to distribute passwords to researchers in the first place.

• Using the campus login to access TeraGrid helps to integrate campus computing resources with TeraGrid resources. Researchers should be able to easily combine resources on campus with resources from TeraGrid and other national cyberinfrastructure. Harmonizing security interfaces across the infrastructure is a positive step towards this goal.

• Federated login enables the provisioning of TeraGrid resources according to campus-based identity vetting and authorization. TeraGrid resources could be allocated to a university class or department, and TeraGrid could rely on the university to determine who on their campus is authorized to use the resource allocation (e.g., who is enrolled in the class or who is a department member), thereby eliminating the need for per-user accounting by TeraGrid staff and giving the campus greater flexibility and control in managing the TeraGrid allocation.

Federated login is being applied in many environments to simplify authenticated access to resources and services. In this article, we focus on the unique challenges we faced in implementing federated login for TeraGrid. A primary technical challenge was the need to support multiple usage models, from interactive browser and command-line access to multi-stage, unattended batch workflows. Another challenge was the need to establish trust among campuses, TeraGrid members, and peer grids (such as Open Science Grid, http://www.opensciencegrid.org, and the Enabling Grids for E-sciencE, http://www.eu-egee.org) in the mechanisms and procedures underlying the federated login capability. In the remainder of the article, we discuss these and other challenges and present our solution in detail.

[Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. IDtrust '10, April 13-15, 2010, Gaithersburg, MD. Copyright 2010 ACM ISBN 978-1-60558-895-7/10/04 ...$10.00.]

2. BACKGROUND
Before presenting the federated login capability we developed for TeraGrid, we first provide background information about the previously existing TeraGrid authentication architecture and the InCommon Federation.

2.1 TeraGrid Authentication Architecture

[Figure 1 (diagram omitted): TeraGrid single sign-on provides certificates for secure access to TeraGrid resources. The user's password is verified against the TeraGrid Kerberos KDC, the user's distinguished name is looked up in the TeraGrid Central Database, and the MyProxy CA issues a certificate used from the TeraGrid User Portal or TeraGrid Client Toolkit to access TeraGrid resources.]
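The single sign-on flow summarized in Figure 1 can be sketched as follows. This is a toy model with invented usernames, passwords, DN formats, and function names; it stands in for the MyProxy/Kerberos exchange rather than implementing the actual protocol:

```python
from datetime import datetime, timedelta, timezone

# Mock stand-ins for the TeraGrid Kerberos KDC and Central Database.
KDC = {"jdoe": "correct-horse"}                # username -> password (illustrative)
TGCDB = {"jdoe": "/C=US/O=NCSA/CN=John Doe"}   # username -> distinguished name

def issue_short_lived_certificate(username, password, lifetime_hours=12):
    """Model the Figure 1 flow: verify the password against the KDC,
    look up the user's distinguished name in the central database,
    and return a stand-in for a short-lived certificate."""
    if KDC.get(username) != password:
        raise PermissionError("Kerberos verification failed")
    dn = TGCDB[username]
    now = datetime.now(timezone.utc)
    return {"subject": dn,
            "not_before": now,
            "not_after": now + timedelta(hours=lifetime_hours)}
```

A client (portal or command-line toolkit) would call `issue_short_lived_certificate("jdoe", "correct-horse")` once per session and then present the returned certificate to resource providers instead of the password.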
The TeraGrid allocations process provisions TeraGrid user accounts and assigns TeraGrid-wide usernames and passwords, which grant single sign-on access to TeraGrid resources. Our work, which we describe subsequently, leverages this existing architecture without modifying it in order not to disrupt access for existing users.

2.1.1 TeraGrid Allocations
As described in the Introduction, TeraGrid resources are allocated to researchers by peer review. Principal Investigators (PIs) submit proposals for resource allocations to a resource allocations committee, which consists of volunteers selected from the faculty and staff of U.S. universities, laboratories, and other research institutions. All members serve a term of 2–5 years and have expertise in computational science or engineering. Each proposal is assigned to two committee members for review. The committee members can also solicit an external review. After several weeks of review, the entire committee convenes to discuss the relative merits of each proposal and award time based on availability of resources. To apply, the PI must be a researcher or educator at a U.S. academic or non-profit research institution. Proposals are judged on scientific merit, potential for progress, numerical approach, and justification for resources. Allocations are typically awarded for one year, though multi-year allocations may be granted for well-known PIs. PIs can submit renewal or supplemental proposals to the committee to extend their allocation.

PIs are instructed not to share their accounts with others. Instead, they use the Add User Form on the TeraGrid User Portal (https://portal.teragrid.org) to request accounts for their project members. PIs can also use this form to remove project members. PIs submit name, telephone, email, and postal address information for the users on their project. For users on multiple projects, each project PI must complete the required information separately for each user to request the user to have access to the project's resources. The PI is notified by postal mail whenever a user is added to their project. All users are required to sign the TeraGrid User Responsibility Form, which educates users about secure and appropriate computing practices.

When a PI's proposal is accepted, or when an active PI requests an account for a project member, TeraGrid allocations staff members enroll the PI or project member in the TeraGrid Central Database, assign a TeraGrid-wide username and initial password to the researcher, and send the username and password via postal mail to the researcher. The letter distributed with the initial password instructs the researcher to change the password and store the letter in a secure place. If the researcher forgets the password, he or she can call the helpdesk and request that the password be reset to the initial value. If the researcher has lost the letter with the initial password, he or she can call the helpdesk and request that a new letter be sent to their postal address on record. Alternatively, a researcher can reset his or her password via the TeraGrid User Portal, which authenticates the request via the researcher's registered email address. In the future, TeraGrid researchers will be able to set their username and password when they request an account, eliminating the need for passwords to be sent via postal mail.

The process of enrolling a new user into the TeraGrid Central Database also assigns a unique certificate subject distinguished name to the user. The distinguished name includes the user's first and last names, with an optionally appended serial number in case of name conflicts. The database management system ensures that distinguished names are uniquely assigned and are never re-assigned to a different user.

As described later, our federated login solution relies on the fact that the TeraGrid Central Database contains a record for every TeraGrid user, as well as the fact that every TeraGrid user has a TeraGrid-wide username and password.

2.1.2 TeraGrid Single Sign-On
The researcher's TeraGrid-wide username and password enables single sign-on access to all TeraGrid resources. Researchers can use TeraGrid single sign-on from the TeraGrid User Portal (TGUP) and from the command-line (via the TeraGrid Client Toolkit). Upon entering their username and password, researchers obtain a short-lived certificate from a MyProxy (http://myproxy.ncsa.uiuc.edu) Certificate Authority (CA) [1, 6] operated by NCSA. Researchers use this certificate to authenticate to remote login, data transfer, batch job submission, and other services. Furthermore, researchers can delegate a proxy certificate [15] to remote login sessions and batch jobs, allowing those sessions/jobs to access resources on their behalf. Figure 1 presents the TeraGrid single sign-on system architecture.

The TeraGrid PKI consists of CAs (including the NCSA MyProxy CA) operated by TeraGrid member institutions and other partners. TeraGrid resource providers accept a consistent set of CAs to facilitate single sign-on across the TeraGrid resources. The TeraGrid Security Working Group reviews requests to add or remove CAs and operates by consensus across the TeraGrid members. According to the policy of the working group, new CAs must be accredited by the International Grid Trust Federation (IGTF, http://www.igtf.net), the de facto standards body for defining levels of assurance for PKIs in production academic grids around the world. As discussed subsequently, IGTF accreditation was an important step in deploying a new federated CA in TeraGrid in support of single sign-on with federated login.

TeraGrid runs a Kerberos domain to validate usernames and passwords. Kerberos is not typically exposed to end users directly but is instead used by other services (such as the MyProxy CA) as an authentication service.

2.2 InCommon Federation
The InCommon Federation enables users to use their local identity, assigned by their campus, to access services such as academic publications and educational materials, and to collaborate with partners outside the borders of the campus. InCommon facilitates the adoption of standard policies by federation participants on technology issues, legal issues, and acceptable uses of identity information. Several U.S. federal agencies (e.g., NSF, NIH) have joined InCommon, and national-scale infrastructures such as the Ocean Observatories Initiative (http://ooi.oceanleadership.org) are exploring its use. InCommon promises to provide a standard interface to the differing campus identity management systems and allow outside leverage of local identities without the need to understand the nuances at each campus.

[Figure 2 (diagram omitted): The InCommon Federation defines standard behavior, attributes, and protocols. The campus identity provider converts the user's campus identity into standard SAML format for access to web services. The diagram shows identity providers and service providers as federation members; the identity provider relies on the campus authentication system (e.g., Kerberos or Active Directory) and user attributes (e.g., LDAP).]
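The distinguished-name assignment described in Section 2.1.1 (first and last name, with an optional serial number appended on conflicts, and names never re-assigned) can be sketched as below. The DN prefix and function name are illustrative, not TeraGrid's actual format:

```python
def assign_distinguished_name(first, last, existing_dns):
    """Build a certificate subject DN from the user's name, appending a
    serial number on conflicts. `existing_dns` is the set of DNs already
    issued; entries are never removed, so DNs are never re-assigned."""
    base = f"/C=US/O=TeraGrid/CN={first} {last}"
    if base not in existing_dns:
        existing_dns.add(base)
        return base
    serial = 1
    while f"{base} {serial}" in existing_dns:
        serial += 1
    dn = f"{base} {serial}"
    existing_dns.add(dn)
    return dn
```

For example, enrolling two users named John Smith yields `.../CN=John Smith` and then `.../CN=John Smith 1`.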
As depicted in Figure 2, the operational components of Many federation members use the Shibboleth9 software the InCommon Federation are the identity providers, ser- for expressing and exchanging identity information between vice providers, and the federation that brings them together. organizations. Shibboleth allows organizations to federate Identity providers convert the user’s campus identity (identi- identity information. In practical terms, this means a user fier and/or attributes) into the standard SAML format, pro- from one institution can authenticate at their home insti- viding single sign-on to multiple service providers and sup- tution and have the resulting identity (identifier and/or at- porting anonymity, pseudonymity, and other privacy con- tributes) made available to a second institution for the pur- trols. SAML identity providers rely on campus authentica- poses of accessing resources at that second institution. Shib- tion systems (such as Kerberos) and attribute stores (such boleth is commonly used in privacy-preserving applications, as LDAP) to authenticate users and provide identity infor- where access to resources is granted based on the user’s at- mation. Service providers consume SAML assertions from tributes (e.g., “University of Illinois student”) without re- identity providers to determine a user’s identifier and/or at- quiring disclosure of the user’s name or other identifying tributes for making access control decisions and providing a information. For example, many universities partner with personalized user experience. SAML metadata, distributed online content providers to enable students to access jour- centrally by the federation, identifies the federation mem- nal articles using Shibboleth attributes. 
Shibboleth imple- bers and provides public keys, resource endpoints (URLs), ments the SAML Web Browser Single Sign-On protocols,10 and other information about the members that helps iden- which work well for browser-based applications but do not tity providers and service providers establish trust and in- translate directly to the command-line, complex-workflow, teroperate. unattended/batch processes that make up a significant pro- portion of TeraGrid computing workloads. 3. APPROACH As of January 2010, the InCommon Federation includes Recall that our goal is to enable TeraGrid researchers to over 200 universities, representing over 4 million users. Of use the authentication method of their home organization for the 38 institutions that each represent over 50 TeraGrid access to TeraGrid. We achieve this goal by implementing a users, 24 (67%) are currently InCommon members. While federated login capability that leverages the InCommon Fed- 7 eration to provide a bridge from campus authentication to http://www.igtf.net the existing TeraGrid authentication architecture. In this 8 http://ooi.oceanleadership.org section, we present the details of our developed solution, 9 http://shibboleth.internet2.edu 10 11 http://saml.xml.org/saml-specifications http://www.protectnetwork.org 3 Campus InCommon/SAML TeraGrid SSO/X.509 It is important to note that the account linking process TeraGrid Trusted does not replace the TeraGrid allocations process. Rather, Campus Web Identity Web Identity SAML Portal/ Providers the account linking process relies on the allocations pro- Browser Authn Provider Service cess for identity vetting and authorization of TeraGrid users. Provider Account User Verify Identity Linking DB The federated login capability provides only a new authen- Access tication method for vetted TeraGrid researchers. 
X.509 TeraGrid users may link identities from multiple identity providers to their TeraGrid account, allowing researchers Campus TG Authn InCommon MyProxy Resources associated with multiple research institutions to log in to Metadata CA Service TeraGrid using whichever identity provider is convenient at the time. However, to avoid account sharing (which is a violation of TeraGrid policy), researchers may link at most Figure 3: Federated login to TeraGrid relies on one identity from each identity provider with their TeraGrid translation of credentials between the campus do- account. For example, a professor may not link his or her main, InCommon, and the TeraGrid single sign-on graduate students’ campus identities with his or her Tera- system. Grid account. Instead, the TeraGrid policy requires each professor, graduate student, etc., to obtain their own indi- vidual TeraGrid account. After login, TeraGrid users may which at its core combines account linking and credential view and delete their account links. translation. Our solution builds on the InCommon Federa- Account links expire one year after creation, at which tion and existing TeraGrid authentication architecture de- point the user is required to perform the account linking scribed in the previous section. process again, to re-verify the binding between the user’s Figure 3 shows a conceptual overview of the credential federated identity and his or her TeraGrid account. This pe- translation processes. The translation at left between the riodic verification of the binding protects against stale or re- campus domain and InCommon is handled by Shibboleth or assigned campus identities (e.g., when a student graduates). a similar SAML identity provider. 
The translation at right When federating with each campus, TeraGrid staff members between InCommon and the existing TeraGrid single sign- confirm with the campus operators that campus procedures on system constitutes our contribution and the focus of this ensure that identities are never re-assigned within a one year paper. This translation uses the account linking process to interval. bind SAML identities to existing TeraGrid identities. 3.2 Credential Translation 3.1 Account Linking The account linking process facilitates a browser-based, The account linking process binds the researcher’s cam- federated login to TeraGrid systems. However, as discussed pus identity, conveyed via InCommon/SAML, to his or her previously, a significant proportion of TeraGrid use cases TeraGrid identity, as stored in the TeraGrid Central Data- and workloads are command-line, complex-workflow, and/or base (TGCDB). When the researcher visits the TeraGrid unattended/batch processes, which are not well supported federated login web site, which implements a standard In- by browser-based authentication (i.e., SAML Web Browser Common SAML service provider using the Shibboleth soft- Single Sign-On). So, the TeraGrid federated login employs ware, he or she sees a prompt to select an InCommon iden- credential translation to convert the browser-based creden- tity provider (i.e., the researcher’s home campus) in order tial to a credential that supports these use cases. to initiate authentication. The Shibboleth software redi- Specifically, the TeraGrid federated login converts the au- rects the researcher to the selected identity provider, where thentication assertion, provided by an InCommon-member the researcher logs in. The identity provider then redirects identity provider, to an X.509 certificate, provided by a cer- the researcher back to the TeraGrid site with a SAML au- tificate authority (CA) trusted by TeraGrid. TeraGrid has thentication assertion, according to the SAML protocols. 
a significant investment in a certificate-based single sign-on At this point the account linking component is activated. infrastructure. Support for certificate-based authentication It first searches the account-link database (actually a ta- in remote login (GSISSH), job submission (GRAM), and ble in the existing user database) for an entry matching file transfer (GridFTP) protocols enables today’s interactive the researcher’s authenticated campus (SAML) identifier. If TeraGrid use cases. Furthermore, proxy certificate delega- found, the entry identifies the TeraGrid username linked to tion [15] enables complex, multi-tier workflows and batch that campus identity, allowing the researcher’s TeraGrid lo- processing in TeraGrid. gin to proceed. If no entry is found, the federated login site Through TeraGrid’s federated login capability, TeraGrid prompts the researcher for his or her TeraGrid-wide user- researchers can use their campus login to obtain certificates name and password. If the username and password verify for web and desktop applications. After federated login, the (via the TeraGrid Kerberos service), the federated login site TeraGrid web site presents a menu of options. Researchers creates a new entry in the account-link database linking the can launch remote login and file transfer applets in their TeraGrid account with the campus identity. Then the re- browser, authenticating with a certificate loaded into their searcher’s TeraGrid login can proceed with that TeraGrid- browser session. Additionally, researchers can launch an ap- wide username. When the researcher returns to the site at plication that delivers a certificate to the local filesystem, a later time, the account-link entry will be in place, so the ready to be used with desktop applications such as those researcher will be able to log in using his or her campus provided by the TeraGrid Client Toolkit. 
In summary, the researcher's federated login to TeraGrid requires multiple credential translation steps. First, the local campus identity provider translates a local campus credential (such as a Kerberos username and password) to a SAML authentication assertion as specified by InCommon. Then, TeraGrid's federated login system translates the SAML assertion to an X.509 certificate. Finally, TeraGrid resource providers translate the certificate to a local resource login (i.e., a Unix account).

3.3 Trust Establishment

Establishing trust is critical to successfully bridging from campus identity providers to TeraGrid resource providers. Deploying the TeraGrid federated login required negotiation with InCommon members (to release identities to TeraGrid) and accreditation of our CA by IGTF (so the certificates will be accepted by TeraGrid members).

3.3.1 Campus Federation

When TeraGrid became a member of the InCommon Federation, it was not automatically entitled to obtain authentication assertions from InCommon-member identity providers. First, TeraGrid needed to register its federated login service provider with the federation, so its information would be included in the federation metadata, enabling it to be recognized by identity providers. This registration is a lightweight task, requiring only a few minutes of effort. Following that registration, the identity providers need to configure their local policies to release identity information to TeraGrid's federated login service, which took significant effort to arrange. Specifically, the federated login service depends on receiving a persistent user identifier from the identity provider via the eduPersonPrincipalName (ePPN) or eduPersonTargetedID (ePTID) attribute defined by the eduPerson specification (http://middleware.internet2.edu/eduperson).

In our effort to have identity providers release ePPNs or ePTIDs to TeraGrid, we encountered three categories of identity providers:

• The first type of identity provider was willing to release ePPNs or ePTIDs to any InCommon-member service provider by default. In this case, after reviewing the published policies of the identity provider, we asked a TeraGrid user associated with that identity provider to help us with testing. After a successful test (i.e., a valid assertion with ePPN or ePTID was received), we added that identity provider to the supported list.

• The second type of identity provider was willing to release ePPNs or ePTIDs on request. In this case, we sent email to the contact address found in InCommon Federation metadata, explaining our application and requesting the needed attribute. Once we received a reply that our request was approved, we proceeded with testing as in the first case.

• The third type of identity provider required local sponsorship and review of our request. In this case, we sent a list of TeraGrid PIs affiliated with the institution to the identity provider contact and worked with them to identify sponsors and follow the local approval process. For some of these campuses, the review is still in progress or stalled.

Since federating with campuses was a manual, campus-by-campus process, and there was no method to discern what behavior a campus would present until it was engaged, we focused our efforts on campuses with over 50 TeraGrid users. Of the 38 target institutions, 24 (67%) were InCommon members. To date, we have successfully federated with 16 of those. We have also federated by request with 11 additional campuses outside our initial target list, bringing our current total number of supported campuses to 27.

3.3.2 PKI Federation

Translating SAML authentication assertions from InCommon members to certificates accepted by TeraGrid resource providers and peer grids required us to deploy a certificate authority (CA) and obtain accreditation of the CA from the International Grid Trust Federation (IGTF), to satisfy TeraGrid Security Working Group policies. The IGTF consists of three regional Policy Management Authorities (PMAs); The Americas Grid PMA (TAGPMA, http://www.tagpma.org) covers the U.S. region.

Worldwide participation in the IGTF ensures that certificates issued by accredited CAs can be accepted by TeraGrid and peer grids around the world. While today's academic SAML federations are national in scope, with limited international inter-federation, translating SAML assertions to internationally accepted certificates supports international science projects such as the Worldwide Large Hadron Collider Computing Grid (WLCG, http://lcg.web.cern.ch).

The IGTF currently supports accreditation under three CA profiles: Classic, Member Integrated Credential Services (MICS), and Short-Lived Credential Services (SLCS) (http://www.tagpma.org/authn_profiles). For Classic CAs, subscriber identity vetting is performed by registration authority (RA) staff. In contrast, MICS and SLCS CAs leverage an existing identity management system for vetting certificate requests. We pursued accreditation for our federated CA under the SLCS profile, since our CA leverages the TeraGrid Central Database and identity providers in the InCommon Federation.

SLCS CAs issue short-lived certificates. The short certificate lifetime acts as a countermeasure against credential theft and misuse. The maximum lifetime of one million seconds (about twelve days) was determined through a requirements-gathering process in the Global Grid Forum [12] and was later incorporated into the SLCS profile.

IGTF profiles require that CAs operate according to community standards. Each CA must publish a Certificate Policy and Certification Practices Statement (CP/CPS) according to RFC 3647 [7]. NCSA's CP/CPS documents are published on the NCSA CA web site (http://ca.ncsa.uiuc.edu). Certificates and Certificate Revocation Lists (CRLs) must conform to RFC 5280 [8] and the Open Grid Forum Grid Certificate Profile [10]. Additionally, since SLCS CAs are online and automated, and therefore subject to network-based attacks, the SLCS profile requires that the CA private key be protected in a FIPS 140 level 2 rated hardware security module [13].

The TAGPMA review process includes a presentation to the TAGPMA membership at a regularly scheduled meeting and a checklist-based review of the CA's policies and operations, followed by a vote for acceptance by the TAGPMA membership.
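The SLCS lifetime constraint discussed above can be made concrete with a short sketch. The function name and API are hypothetical — this is not the NCSA CA's actual code — but any SLCS-compliant issuer must apply an equivalent cap of 1,000,000 seconds (about 11.6 days).

```python
# Illustration of the SLCS profile's certificate-lifetime cap described
# above. Names are hypothetical, not the NCSA CA's actual API.

SLCS_MAX_LIFETIME_SECONDS = 1_000_000  # ~11.6 days, per the SLCS profile

def clamp_lifetime(requested_seconds):
    """Return the lifetime to issue, never exceeding the SLCS maximum."""
    if requested_seconds <= 0:
        raise ValueError("lifetime must be positive")
    return min(requested_seconds, SLCS_MAX_LIFETIME_SECONDS)
```

A 12-hour request is honored as-is, while a request beyond the profile maximum is silently reduced to the cap rather than rejected — one plausible policy choice for a short-lived credential service.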
NCSA began the TAGPMA review process for the federated CA in March 2009 and obtained certification in May 2009. NCSA has been a TAGPMA member since 2005, and this was our third CA to be accredited via the TAGPMA process. Approved CAs are included in the IGTF CA distribution, as well as the TERENA Academic CA Repository (TACAR, http://www.tacar.org).

3.4 System Architecture

Figure 4 presents the components of the TeraGrid federated login system.

[Figure 4: The TeraGrid federated login system provides certificates, issued by a MyProxy CA, for web and desktop applications. The web application binds campus identities to TeraGrid identities via an account-link database.]

The federated login web application is a SAML service provider, which consumes SAML authentication assertions from InCommon-member identity providers, via the Shibboleth software implementation. The web application has a local PostgreSQL database that stores the account linking information. We decided to (initially) maintain this information in a local database separate from the TeraGrid Central Database (TGCDB), to obtain local database performance and simplify the initial implementation. However, we plan to migrate it to the TGCDB (also PostgreSQL) when we integrate the federation functionality with the TeraGrid User Portal (see Section 6.1).

The web application interacts with two MyProxy CA instances (via the simple MyProxy protocol [2]) for verifying TeraGrid passwords and obtaining short-lived certificates. The first MyProxy CA instance was already in existence (certified by TAGPMA in March 2007) serving TeraGrid single sign-on. It verifies the user's TeraGrid-wide username and password and issues short-lived certificates. In the federated login application, we use this MyProxy instance to verify TeraGrid (Kerberos) passwords at account linking time. Since the web application already contained MyProxy client libraries, using the MyProxy interface to Kerberos rather than interacting with Kerberos directly simplified the web application. The second MyProxy CA instance is the new federated CA, certified by TAGPMA in May 2009. It issues certificates based on federated login. It trusts the federated login web application to properly validate SAML authentication assertions (using Shibboleth) and map campus identities to TeraGrid usernames. The web application sends the authenticated TeraGrid username to MyProxy, which issues a short-lived certificate corresponding to that username. The web application authenticates to MyProxy using its own trusted certificate. The federated MyProxy instance will only accept requests properly authenticated using that certificate. Both MyProxy instances map TeraGrid usernames to certificate subject distinguished names via the TGCDB.

When the TeraGrid user launches one of the browser applets that require a certificate for authentication to TeraGrid resources, the federated login web application, via the MyProxy API, generates a new RSA keypair associated with the user's web session (via state in the web server referenced by a session cookie) and issues a certificate request containing the RSA public key to MyProxy, which returns a short-lived, signed certificate for the user to the web application. The applets can then access the private key and certificate for authentication on the user's behalf. Similarly, when the TeraGrid user selects the credential retrieval desktop application, the browser downloads and launches the application via Java Web Start [11]. The desktop application then generates a new RSA keypair and issues a certificate request to the web application, which passes it to MyProxy and returns the signed certificate to the desktop application, which writes the certificate and private key to the filesystem for access by TeraGrid client applications. The credential retrieval application and components of the web application are reused from the GridShib CA software developed by the GridShib project [18] (http://gridshib.globus.org).

3.5 Current Status

The TeraGrid federated login service (https://go.teragrid.org) is in production, supporting logins from 27 institutions. After accreditation by TAGPMA in May 2009, the site entered a friendly-user beta testing period, where we solicited test users from each supported campus to try the service and give their feedback. We announced the service to all TeraGrid researchers via TeraGrid News on September 1, 2009.

As of February 2010, we have 72 entries in the identity-mapping table from 21 (of the 27 available) institutions, and we have issued over 800 certificates. The most popular application is the remote login GSI-SSHTerm applet (https://sourceforge.net/projects/gsi-sshterm), followed closely by the credential retrieval desktop application.
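The account-link store described above could look roughly like the following. The production service uses PostgreSQL; here sqlite3 serves only as a self-contained stand-in, and the table name and columns are hypothetical — the paper does not publish the actual schema.

```python
# Minimal sketch of an account-link store like the one described in the
# system architecture. sqlite3 is a stand-in for PostgreSQL; the schema
# is hypothetical.
import sqlite3

def make_link_db():
    db = sqlite3.connect(":memory:")
    db.execute("""
        CREATE TABLE account_link (
            saml_id           TEXT PRIMARY KEY,  -- persistent campus identifier
            identity_provider TEXT NOT NULL,     -- entity ID of the campus IdP
            teragrid_username TEXT NOT NULL,     -- linked TeraGrid-wide account
            enabled           INTEGER NOT NULL DEFAULT 1
        )
    """)
    return db

def add_link(db, saml_id, idp, username):
    db.execute(
        "INSERT INTO account_link (saml_id, identity_provider, teragrid_username)"
        " VALUES (?, ?, ?)", (saml_id, idp, username))

def lookup(db, saml_id):
    """Return the linked TeraGrid username, or None if no active link exists."""
    row = db.execute(
        "SELECT teragrid_username FROM account_link"
        " WHERE saml_id = ? AND enabled = 1", (saml_id,)).fetchone()
    return row[0] if row else None
```

Keeping this table local to the web application, as the authors note, trades eventual TGCDB integration for faster lookups and a simpler initial implementation.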
4. SECURITY CONSIDERATIONS

Security was a primary consideration throughout the design and deployment of the federated login service. We highlight security considerations of particular interest in this section.

4.1 Trust Architecture

Adding federated identity to the TeraGrid single sign-on model gives rise to two meaningful changes to the trust relationships in the TeraGrid security architecture.

First, the InCommon identity providers add a new set of trusted entities. Identity providers are trusted to correctly authenticate users, disallow the reuse of identifiers, and adhere to other basic policies, as discussed in the following section. Identity providers also play a role in incident response, as discussed in Section 4.6.

Second, the federated MyProxy CA outsources authentication to the web front-end. In the current TeraGrid User Portal, a user presents a username and password, which are passed to the MyProxy CA for validation before issuance of a credential. In the federated identity model, the web application presents just a username to the MyProxy CA and authenticates using a trusted certificate specific to the web application instead of the user. The MyProxy CA trusts that the web application has done appropriate authentication of the user. This increases the ramifications of a compromised web application.

The MyProxy CA could be modified to require and validate some proof that the web application actually authenticated the user. One way to provide this validation could be to implement SAML delegation (http://docs.oasis-open.org/security/saml/Post2.0/sstc-saml-delegation.html). The ShibGrid project [14] modified MyProxy to validate SAML authentication assertions obtained by the web application. While that implementation does not use SAML delegation, it provides some additional protection. This capability could be added to the TeraGrid service, but it would increase the complexity of the solution.

4.2 Peering with Identity Providers

As discussed in Section 3.3.1, federating with campus identity providers is a manual process. Identity providers decide whether they are willing to release user identifiers to the TeraGrid service. Likewise, TeraGrid staff members, in their role as administrators of the federation service, decide whether to peer with a given campus identity provider. The federated login service is explicitly configured with a list of trusted identity providers (i.e., not all InCommon-member identity providers are automatically accepted). Our review process confirms that the identity provider: (1) serves TeraGrid users; (2) is operated by a known and respected organization; and (3) operates a trustworthy authentication service and provides globally-unique and non-reassigned identifiers, so that subscribers are uniquely identified.

So far, the issue of identifier re-assignment has blocked us from peering with a few campus identity providers. Our annual verification process allows us to support campuses that re-assign identifiers only after a hiatus period of one year or more. We have found that in some cases campuses will re-assign identifiers more quickly for a subset of their population (e.g., undergraduate students and/or visitors), and we are working with those campuses to identify a method to distinguish between those identities that meet our requirements (i.e., those not re-assigned more quickly than our threshold) and those that do not. InCommon's new Identity Assurance program (http://www.incommonfederation.org/assurance) may help with this issue.

4.3 Disallowing Account Sharing

As discussed in Section 2.1.1, TeraGrid policy forbids account sharing. This policy is primarily for clarity during incident response, since multiple users sharing an account complicates the process of determining if suspect account activity was performed by the authorized account holder or by an unauthorized party using the stolen password of the account holder. To enforce this policy, we allow only one identifier per identity provider to be linked with a particular TeraGrid identity.

4.4 Web Application Security

We use multiple methods in the web front end to protect against web-based attacks. The web front end accepts connections only via HTTPS, which provides certificate-based authentication of the service to the web browser and privacy of network data (including SAML assertions, cookies, and certificate requests). To protect against cross-site request forgery (CSRF) attacks, the GridShib CA software uses standard anti-CSRF mechanisms (cookies and hidden form fields) to ensure that web sessions follow an approved workflow, i.e., requiring the user to always visit the login page before requesting a certificate, so a malicious site cannot redirect the user's browser directly to the certificate-request form to force a malicious certificate issuance.

The account-link database is configured to allow only local access, and anonymous read access to the database is disabled. The username and password for accessing the database are stored outside publicly accessible web space and are readable only by the web server process. This configuration gives the server-side web application read and write access to the database while preventing all client-side web access.

The trusted certificate used to request user certificates from the federated MyProxy CA is stored on the web server outside publicly accessible web space and is readable only by the web server process.

Remote login to the web server is restricted to a small set of remote hosts through the use of an iptables-based firewall. Additionally, SSH access is limited to a small number of administrators, who must log in with a one-time password (OTP), e.g., by using a CRYPTOCard token generator.

4.5 MyProxy CA Security

The back-end MyProxy CA is secured according to IGTF standards. The CA private key is protected in FIPS 140 level 2 rated hardware security modules. The servers are located on a dedicated network, behind a hardware firewall with a restrictive policy, with network-based and host-based intrusion detection. The firewall allows network connections to the MyProxy CA instance used by the web application only from the host on which that application resides. System logs are streamed to a dedicated syslog collector host, where they are monitored by the NCSA security team. The CA issues a certificate revocation list (CRL) daily or immediately after any revocation.

4.6 Incident Response

The federated login system architecture provides multiple methods for responding to account compromises and other security incidents. In case a federated identity is deemed suspect, the account link for that identity can be disabled in the account-link database by administrators so it can no longer be used to obtain certificates. In case an identity provider is deemed suspect, it can be removed by an administrator from the list of trusted identity providers so assertions from that provider can no longer be used to log in.
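The two incident-response controls described above — disabling a single suspect account link, and dropping a suspect identity provider entirely — can be sketched as follows. The data structures are hypothetical stand-ins for the account-link database and the service's trusted-provider configuration.

```python
# Sketch of the incident-response controls described above (hypothetical
# stand-ins for the account-link database and trusted-IdP configuration).

def disable_account_link(account_links, saml_id):
    """Mark a suspect federated identity so it can no longer obtain certificates."""
    if saml_id in account_links:
        account_links[saml_id]["enabled"] = False

def remove_identity_provider(trusted_idps, idp_entity_id):
    """Stop accepting assertions from a suspect identity provider entirely."""
    trusted_idps.discard(idp_entity_id)

def may_issue_certificate(account_links, trusted_idps, saml_id, idp_entity_id):
    """A certificate is issued only for an enabled link from a trusted IdP."""
    link = account_links.get(saml_id)
    return (idp_entity_id in trusted_idps
            and link is not None
            and link["enabled"])
```

Note that either control alone is sufficient to block issuance, which matches the paper's point that the architecture offers multiple independent response methods.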
Extensive CA logging enables administrators to quickly identify certificates associated with a compromise so they can be revoked.

TeraGrid incident response is coordinated through the security working group. In response to compromise, TeraGrid resource providers can locally disable accounts, and TeraGrid staff can centrally disable or reset TeraGrid-wide passwords.

InCommon metadata contains operational contact information for each identity provider that TeraGrid security staff can utilize during incident response. Additionally, work is underway in the Committee on Institutional Cooperation (http://www.cic.net) Identity Management Taskforce to propose a set of policies and additional available information for incident response in federated identity environments such as InCommon.

Like all IGTF CAs, the federated NCSA CA publishes operational contact information on its home page and in metadata files included in the IGTF CA distribution. The IGTF Risk Assessment Team (http://tagpma.es.net/wiki/bin/view/IGTF-RAT) is available for coordinating response to incidents and vulnerabilities impacting IGTF CAs.

5. LESSONS LEARNED

In this section we discuss some of the lessons learned during the deployment of our solution and establishment of trust with identity providers in InCommon.

5.1 Effort for Trust Establishment

As we described previously in Section 3.3.1, while InCommon defines standard (SAML) profiles for identity and attribute transmission and an automated means of metadata distribution, simply being a member of InCommon as a service provider does not guarantee that any particular identity provider will release user attributes to that service provider. Nor does it provide guarantees about identifier persistence, in that ePPN identifiers can potentially be re-issued (e.g., after a student leaves, the student's identifier could be re-assigned to a new incoming student).

The process of contacting identity providers to arrange attribute release and establish their policies on identifier re-issuance is very time consuming. This manual, campus-by-campus effort will be very difficult to scale to the hundreds of campuses associated with TeraGrid researchers, not to mention the thousands of research institutions in the U.S. from which future TeraGrid users might come.

We look forward to deployment of user-driven attribute release in the InCommon Federation, which would avoid the need for manual policy changes by campus operators. User-driven attribute release, via tools such as uApprove (http://www.switch.ch/aai/support/tools), allows users to review and consent to the release of requested attributes when they access the service.

5.2 Testing

Another complexity encountered during attribute release testing was that the identity provider administrators at campuses were rarely TeraGrid users. This meant that only our end users, who are not generally Shibboleth experts, could test the system end-to-end, as they were the only ones with accounts at both the identity provider and the TeraGrid. Adding a simple test application that could be used by identity provider operators to more fully test the attribute release process, without needing to have a TeraGrid account, would be a useful addition to this trust establishment procedure.

5.3 Software Issues

A major source of issues during our beta testing period was the lack of constraint on the contents of eduPersonTargetedID (ePTID) values. We found significant variety in the formatting and character sets of ePTID values across campuses, which clashed with several assumptions in our software:

• The various ePTID values triggered exceptions in the GridShib CA identifier sanitizing routines, which attempted to sanitize data from the identity provider to protect against accidental or malicious string encoding that could cause problems. These routines were too aggressive in removing "invalid characters", thereby corrupting the identifiers, and we were forced to abandon such sanitization.

• There was also an assumption in the original software that the identifiers would be usable as filenames, to maintain an audit record of issued credentials (a requirement of IGTF accreditation). However, some of the characters were meaningful to the file manipulation routines (e.g., forward slashes, which represent a path separator under Unix). Hence the approach of using the ePTID was abandoned, and instead we used a hash of the distinguished name with a constrained character set.

• Finally, our web site originally displayed the ePTID value to the user after login. While this approach worked with eduPersonPrincipalName values, which are reasonably similar to users' campus usernames and email addresses, we found that the lengthy ePTID string with its broad range of characters distracted and confused users, who expect to see their friendly campus username.

In summary, we have learned to treat ePTIDs as opaque blobs unsuitable for use as a string representation of an identifier, and we have strengthened the underlying GridShib CA identifier-handling code to support the full range of ePTID values.
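The filename fix described above — hashing the distinguished name to a constrained character set — can be illustrated with a short sketch. The exact scheme used by the GridShib CA may differ; this only demonstrates the idea of mapping an arbitrary, untrusted identifier to a fixed-charset, path-safe name.

```python
# Illustration of deriving a constrained-charset audit-file name from an
# arbitrary identifier, as described above. The actual GridShib CA scheme
# may differ; SHA-256 is an assumption here.
import hashlib

def audit_filename(distinguished_name):
    """Map an arbitrary identifier to a fixed-charset, path-safe file name."""
    digest = hashlib.sha256(distinguished_name.encode("utf-8")).hexdigest()
    return digest  # 64 lowercase hex characters: safe on any filesystem
```

Because the output alphabet is just lowercase hex, characters that are meaningful to file-manipulation routines (such as the forward slash) can never appear, and the mapping stays deterministic for audit purposes.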
6. FUTURE WORK

We consider this work to be just a first step toward enabling federated login to TeraGrid and other U.S. cyberinfrastructure. We envision the following future work.

6.1 Integration with TeraGrid User Portal

The next step for the TeraGrid effort is to integrate federated login with the TeraGrid User Portal (TGUP). Currently, the federated login site is separate from the TGUP, and the TGUP itself requires login with a TeraGrid-wide username and password. Integration with the TGUP will provide a more coherent experience to TeraGrid researchers, as well as make TGUP functionality (such as management of TeraGrid allocations) accessible via federated login.

The TeraGrid project is in the process of integrating the Partnership Online Proposal System (POPS, https://pops-submit.teragrid.org) with the user portal, which opens up the possibility of federated logins for TeraGrid proposal submission, potentially eliminating the need for TeraGrid-specific passwords as described in the following section.

6.2 Eliminating TeraGrid Passwords

The account linking process as described so far requires TeraGrid researchers to log in with their TeraGrid username and password at least once per year to maintain the link with their campus identity. This method provides a transition for existing TeraGrid users from daily use of a TeraGrid-specific password to daily use of campus credentials for TeraGrid access, but it does not entirely obviate the need for TeraGrid-specific passwords.

In the future, we plan to integrate account linking with the TeraGrid allocations process, giving TeraGrid researchers the option of never using a TeraGrid-specific password. In this scenario, TeraGrid researchers would authenticate with their campus identity when submitting a proposal for TeraGrid access. A researcher's campus identity will be linked with the proposal at that point, so if the proposal is accepted and TeraGrid access is granted, the researcher's TeraGrid account will be linked with the campus identity when the TeraGrid account is created.

Likewise, project members to be added to a TeraGrid allocation will first authenticate with their campus identity and register a TeraGrid account linked with that campus identity. Then, the project PI will look up the prospective member's account and add the member to the TeraGrid project. Thus, PIs and other project members will have their campus identities linked with their TeraGrid accounts when the TeraGrid accounts are created, so researchers will be able to access TeraGrid resources using their campus logins without ever having a TeraGrid-specific password. These linked identities could be re-verified each year as part of the allocations renewal process.

It is an open question whether TeraGrid could ever truly eliminate TeraGrid-specific passwords for all users. While we expect many users would prefer to use a federated login, some users may still desire TeraGrid-specific passwords due to preference or special requirements.

6.3 Access Based on Attributes

There is a small amount of access to TeraGrid today that is not based on the peer-review process previously described, but is instead granted to a class or workshop for educational purposes. In theory, this access could be granted based on a user's attribute, namely their membership in the class, if it were asserted by their identity provider. Working with campuses to grant access to TeraGrid resources based on such attributes is another area of future investigation.

6.4 Alternative Authentication Technologies

While InCommon and SAML appear to be the most popular technology for federated identity at the home institutions of most TeraGrid users, other web-based authentication methods such as OpenID (http://openid.net) are popular in the commercial space. We plan on investigating the support of these technologies in our federation model.

6.5 CILogon

Expanding federated login to other U.S. cyberinfrastructure is another area of future work. Relying on the TeraGrid allocations process for identity vetting restricts the availability of the TeraGrid federated login service to registered TeraGrid users. The CILogon project (http://www.cilogon.org) is deploying a modified version of the TeraGrid federated login service that removes the TeraGrid dependencies. The CILogon Service will directly leverage campus identity vetting for certificate issuance. The InCommon Silver Identity Assurance Profile, which maps to NIST Level of Assurance (LOA) 2 [5], provides identity assertions which meet IGTF SLCS profile requirements [3].

Scaling the CILogon Service to serve the national cyberinfrastructure will be a significant challenge. Federating with thousands of U.S. research institutions will require moving beyond the manual campus-by-campus trust establishment process. Providing a usable method for choosing among thousands of available identity providers for a given login is an unsolved challenge. Certainly today's interfaces, where users select their identity provider from a list, will not scale.

7. RELATED WORK

The two areas of related work we find most relevant to the TeraGrid federated login service are (1) similar efforts to bridge SAML and PKI for grids in Europe and (2) TeraGrid's Science Gateways program.

7.1 European SAML-PKI Bridging Efforts

Many European countries have established national SAML federations, with multiple national-scale efforts to link with PKIs in support of cyberinfrastructure.

In Switzerland, SWITCH operates the SWITCHaai federation (http://www.switch.ch/aa), deployed by most Swiss universities, supporting e-learning, e-conferencing, and document exchange services. The IGTF-accredited SWITCH Short Lived Credential Service (SLCS) issues certificates based on successful authentication at a SWITCHaai identity provider.

In Germany, the IGTF-accredited DFN-SLCS CA (http://www.pki.dfn.de) issues certificates to users of the DFN-AAI federation (https://www.aai.dfn.de) of universities, technical colleges, and research organizations in Germany.

In the UK, JANET, the national education and research network, operates the UK Access Management Federation for Education and Research (http://www.ukfederation.org.uk), with over 700 members. The SARoNGS Credential Translation Service [16] issues certificates to users of the UK National Grid Service (http://www.ngs.ac.uk) based on successful authentication in the UK Access Management Federation.

Additionally, the Trans-European Research and Education Networking Association (TERENA) has recently developed the TERENA Certificate Service (TCS, https://www.terena.org/activities/tcs), which leverages the national SAML-based federations across Europe to deliver certificates to tens of thousands of grid users. Initial TCS partners include the national grid projects and SAML federations of Denmark, Finland, Netherlands, Norway, and Sweden.

Our work to implement federated login for TeraGrid benefited from the examples provided by these related efforts and from discussions in IGTF on lessons learned and best practices for bridging SAML and PKI for grids.

7.2 TeraGrid Science Gateways Program

Considering that our work to deploy federated login for TeraGrid is motivated by the desire to make secure access to TeraGrid more convenient for researchers as well as to reduce TeraGrid's identity management burdens (e.g., password resets), we find similar motivations in the security design of the TeraGrid Science Gateway program [4, 17]. TeraGrid science gateways (http://www.teragrid.org/gateways) provide community-based access to TeraGrid resources, typically via web portals with custom interfaces and applications for specific science communities. The gateway program is part of TeraGrid's effort to serve the larger science community, while continuing to provide high-end computing services to a smaller number of leading-edge researchers. TeraGrid's gateways are designed to serve orders of magnitude more users than can be supported by TeraGrid's existing accounting procedures.

To achieve this goal, TeraGrid provides community allocations to gateways. Gateway PIs and staff are registered in the TeraGrid Central Database (TGCDB), but the gateways manage their own user registration. Gateways access community accounts on TeraGrid resources, with the gateway taking responsibility for isolating its users from one another, so the TeraGrid resource providers are not burdened with managing orders of magnitude more local accounts. Since TeraGrid's federated login capability is based on TGCDB registration, science gateway users do not benefit directly. However, we hope science gateways will provide their own federated login capability. For one proposal, see [9].
8. ALTERNATIVE APPROACHES

A question often posed is what would be needed to implement a user authentication solution based entirely on SAML or entirely on PKI, instead of a SAML-to-PKI bridge. Significant components are missing from each approach, as we describe in the following subsections; this led us to the bridge approach.

8.1 End-to-End PKI Solution?

The TeraGrid has a PKI solution in place with its existing single sign-on system as described in Section 2.1.2. However, ideally TeraGrid would not need to issue certificates, but would instead rely on certificates issued by the user's home organization, taking advantage of the in-person vetting that is (or at least could be) accomplished by that organization. However, despite some progress, we are seeing very limited deployment of externally usable PKIs at universities, as compared with the number of universities that have joined the InCommon Federation. It is the broad and increasing adoption of InCommon in the organizations representing TeraGrid users that led us to build on it, rather than any technical aspect of the SAML technology.

Note that users with credentials from trusted certificate authorities at universities that do operate a PKI can bind, through existing mechanisms in the TeraGrid User Portal, the identity asserted by those credentials to their existing TeraGrid account and access the TeraGrid with those credentials. In order for such certificate authorities to be considered trusted by the TeraGrid, they must have achieved accreditation by the International Grid Trust Federation as described in Section 2.1.2.

8.2 End-to-End SAML Solution?

To replace the PKI currently in use for single sign-on in the TeraGrid today would not only require that TeraGrid modify a large software deployment base, but would also require addressing functional limitations in SAML, namely:

• Support for clients other than web browsers. Many of the science applications supported by TeraGrid involve desktop applications rather than, or in addition to, web browsers.

• Delegation support. Our architecture supports authentication on behalf of the user by the web application. It also supports authentication by unattended processes, for example, when the initiating user is offline. (SAML delegation may address this requirement.)

• International federation support. SAML federations have not (yet) reached the global scope of the International Grid Trust Federation as needed to support large grid applications.

Until these issues are addressed, we do not envision a migration away from PKI to be a practical option for TeraGrid.

9. CONCLUSION

We have presented TeraGrid's new federated login capability, which enables TeraGrid users to authenticate using their home organization credentials for secure access to high performance computers, data resources, and high-end experimental facilities. This capability binds campus identities to TeraGrid identities (via account linking) and issues certificates based on SAML assertions (via credential translation). It is the first effort to leverage federated authentication for access to national-scale research cyberinfrastructure in the United States.

It is our opinion that the world is unlikely to ever settle on a single authentication technology, due to varied technical requirements as well as significant social and economic issues. Therefore, we believe that the bridging approach described in this article is not simply a short-term hack, but rather an approach that will continue to be required and further refined over time.

10. ACKNOWLEDGMENTS

This material is based upon work supported by the National Science Foundation under Grant No. 0503697.

11. REFERENCES

[1] T. Barton, J. Basney, T. Freeman, T. Scavo, F. Siebenlist, V. Welch, R. Ananthakrishnan, B. Baker, M. Goode, and K. Keahey. Identity Federation and Attribute-based Authorization through the Globus Toolkit, Shibboleth, GridShib, and MyProxy. In Proceedings of the 5th Annual PKI R&D Workshop, April 2006.

[2] J. Basney. MyProxy Protocol. Global Grid Forum GFD-E.54, November 2005.

[3] J. Basney. Mapping InCommon Bronze and Silver Identity Assurance Profiles to TAGPMA SLCS Requirements, March 2009. http://sl.cilogon.org/incommon-slcs-map.pdf.

[4] J. Basney, S. Martin, J. Navarro, M. Pierce, T. Scavo, L. Strand, T. Uram, N. Wilkins-Diehr, W. Wu, and C. Youn. The Problem Solving Environments of TeraGrid, Science Gateways, and the Intersection of the Two. IEEE International Conference on eScience, pages 725–734, 2008.

[5] W. E. Burr, D. F. Dodson, and W. T. Polk. Electronic Authentication Guideline. NIST Special Publication 800-63, April 2006.
Simplifying Public Key Credential Management Through Online Certificate Authorities and PAM. In Proceedings of the 5th Annual PKI R&D Workshop, April 2006. [7] S. Chokhani, W. Ford, R. Sabett, C. Merrill, and S. Wu. Internet X.509 Public Key Infrastructure Certificate Policy and Certification Practices Framework. IETF RFC 3647, November 2003. [8] D. Cooper, S. Santesson, S. Farrell, S. Boeyen, R. Housley, and W. Polk. Internet X.509 Public Key Infrastructure Certificate and Certificate Revocation List (CRL) Profile. IETF RFC 5280, May 2008. [9] T. Fleury, Y. Liu, T. Scavo, and V. Welch. A Web Browser SSO Model for Science Gateways. In Proceedings of the 2009 TeraGrid Conference, June 2009. [10] D. Groep, M. Helm, J. Jensen, M. Sova, S. Rea, R. Karlsen-Masur, U. Epting, and M. Jones. Grid Certificate Profile. Open Grid Forum GFD-C.125, March 2008. [11] A. Herrick. Java Network Launching Protocol & API Specification. JSR-56, 2005. [12] S. Mullen, M. Crawford, M. Lorch, and D. Skow. Site Requirements for Grid Authentication, Authorization and Accounting. Global Grid Forum GFD-I.032, October 2004. [13] NIST. Security Requirements for Cryptographic Modules. Federal Information Processing Standards (FIPS) Publication 140-2, May 2001. [14] D. Spence, N. Geddes, J. Jensen, A. Richards, M. Viljoen, A. Martin, M. Dovey, M. Norman, K. Tang, A. Trefethen, D. Wallom, R. Allan, and D. Meredith. ShibGrid: Shibboleth Access for the UK National Grid Service. In Proceedings of the International Conference on e-Science and Grid Computing, December 2006. [15] S. Tuecke, V. Welch, D. Engert, L. Pearlman, and M. Thompson. Internet X.509 Public Key Infrastructure Proxy Certificate Profile. IETF RFC 3820, June 2004. [16] X. D. Wang, M. Jones, J. Jensen, A. Richards, D. Wallom, T. Ma, R. Frank, D. Spence, S. Young, C. Devereux, and N. Geddes. Shibboleth Access for Resources on the National Grid Service (SARoNGS). 
International Symposium on Information Assurance and Security, 2:338–341, 2009. [17] V. Welch, J. Barlow, J. Basney, D. Marcusiu, and 11 Goal •  Enable researchers to use the authentication method of their home organization for access to TeraGrid •  Researchers don’t need to use TeraGrid-specific credentials Federated Login to •  Avoid distribution of TeraGrid-specific passwords TeraGrid •  Avoid TeraGrid password reset requests •  Better integrate TeraGrid with campus resources Jim Basney •  Provision TeraGrid resources according to campus-based Terry Fleury identity vetting and authorization Von Welch National Center for Supercomputing Applications University of Illinois at Urbana-Champaign This material is based upon work supported by the National Science Foundation under Grant No. 0503697 Federated Login to TeraGrid Challenges TeraGrid •  Support TeraGrid usage models •  Interactive browser and command-line access •  Multi-stage, unattended batch workflows •  Establish trust among campuses, TeraGrid members, and peer grids (OSG, EGEE) Federated Login to TeraGrid Federated Login to TeraGrid 1 TeraGrid Allocations TeraGrid Single Sign-On •  Resources allocated by peer review TeraGrid Kerberos •  Project principal investigators add user accounts via the KDC TeraGrid User Portal Central Database •  Central Database (TGCDB) contains records for all users verify look up user password •  TeraGrid-wide username and password assigned to TeraGrid UI distinguished name every user TeraGrid MyProxy CA obtain user User User Portal certificate access TeraGrid TeraGrid Client Toolkit Resources Federated Login to TeraGrid Federated Login to TeraGrid TeraGrid PKI InCommon Federation •  TeraGrid PKI consists of CAs operated by TeraGrid •  InCommon facilitates use of campus identity with member institutions and other partners external service providers •  TeraGrid resource providers trust a consistent set of Cas •  By supporting adoption of standard mechanisms and policies •  
Provides consistent experience for users •  By distributing metadata that identifies members •  Determined by consensus through Security Working Group •  Uses SAML Web Browser Single Sign-On protocols •  CAs accredited by International Grid Trust Federation (IGTF) •  Shibboleth implementation from Internet2 •  Work well for browser-based applications, but not command-line or batch workflows •  InCommon represents >200 institutions (>4m users) •  Of 38 institutions with over 50 TG users, 24 (67%) are currently InCommon members Federated Login to TeraGrid Federated Login to TeraGrid 2 InCommon Federation Our Approach InCommon •  Account Linking Federation WWW •  Bind the researcher’s campus identity (conveyed via InCommon/ member member SAML) to his/her existing TeraGrid identity (TGCDB) Service Service Provider member Provider •  InCommon motivates our use of SAML •  Rely on the existing TeraGrid allocations process for identity Identity vetting and authorization Provider •  Rely on campus for authentication of a persistent user identifier Web Browser Campus •  Credential Translation •  Convert from a browser-based (SAML) credential to a certificate Authentication System User Attributes for command-line, workflow, and batch processes (e.g., Kerberos or (e.g., LDAP) User Active Directory •  Deliver certificate to desktop and web session •  Rely on the existing TeraGrid PKI •  Adding a new certificate authority Federated Login to TeraGrid Federated Login to TeraGrid Our Approach Campus InCommon/SAML TeraGrid SSO/X.509 TeraGrid Trusted Campus Web Identity Web SAML Providers Identity Portal/ Browser Authn Provider Service Provider Account User Experience User Verify Identity Linking DB Access X.509 Campus TG InCommon MyProxy Resources Authn CA Metadata Service Federated Login to TeraGrid Federated Login to TeraGrid 3 Federated Login to TeraGrid Federated Login to TeraGrid (one-time only) Federated Login to TeraGrid Federated Login to TeraGrid 4 TeraGrid Federated Login 
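The account-linking step above binds a campus identifier (asserted via InCommon/SAML) to an existing TeraGrid identity, and the security-considerations slides later add the rule that only one identifier per identity provider may be linked with a given TeraGrid identity. A minimal sketch of that bookkeeping — the class, field names, and identifiers are all hypothetical, not TeraGrid's actual schema:

```python
# Hypothetical sketch of an account-link table: maps an (IdP, campus
# identifier) pair to a TeraGrid username, allowing at most one campus
# identifier per IdP to be linked with a given TeraGrid identity.

class AccountLinkStore:
    def __init__(self):
        self._links = {}  # (idp_entity_id, campus_id) -> teragrid_username

    def link(self, idp_entity_id, campus_id, teragrid_username):
        # Reject a second identifier from the same IdP for this TeraGrid
        # identity (account sharing complicates incident response).
        for (idp, cid), tg in self._links.items():
            if idp == idp_entity_id and tg == teragrid_username and cid != campus_id:
                raise ValueError("another identifier from this IdP is already linked")
        self._links[(idp_entity_id, campus_id)] = teragrid_username

    def resolve(self, idp_entity_id, campus_id):
        # Federated login: translate the asserted campus identity into the
        # linked TeraGrid identity, if any.
        return self._links.get((idp_entity_id, campus_id))

store = AccountLinkStore()
store.link("urn:mace:incommon:uiuc.edu", "jdoe@illinois.edu", "jdoe_tg")
assert store.resolve("urn:mace:incommon:uiuc.edu", "jdoe@illinois.edu") == "jdoe_tg"
```

Incident response then maps naturally onto this structure: disabling an account link removes one entry, and disabling trust in an identity provider removes every entry for that IdP.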
TeraGrid Federated Login System
(diagram: on the user's desktop, a web browser and a Federated Login Applet interact with the user's SAML identity provider and with a customized web application acting as SAML service provider; the web application checks the account link database in the TeraGrid Central Database, and a credential retriever obtains certificates from the MyProxy CA (Kerberos) or the GridShib CA (federated), backed by the TeraGrid Kerberos KDC)

Trust Establishment
• Campus and InCommon
• TeraGrid PKI

Trust Establishment Process: Campus
• Join the InCommon Federation
• Add service provider to InCommon metadata
• Request identity providers to release identity information (a manual, campus-by-campus process)
  - Some released identifiers automatically to all InCommon members
  - Some released identifiers on email request
  - Some required local sponsorship and review
• Current status:
  - Targeted 38 campuses with over 50 TeraGrid users
  - 24 (67%) are InCommon members
  - 16 (of the 24) successfully federated to date
  - 11 additional campuses federated outside the target list

Trust Establishment Process: PKI
• Publish Certificate Policy and Certification Practices Statement (CP/CPS) according to RFC 3647
• Present CA to the regional IGTF policy management authority — The Americas Grid PMA (TAGPMA)
• Checklist-based review by TAGPMA of the CA's policies and operations
• Vote for acceptance by TAGPMA members
• Current status:
  - Submitted to TAGPMA (March 2009)
  - Approved by TAGPMA (May 2009)
  - CA certificate included in the TERENA Academic CA Repository (TACAR)

Security Considerations
• Changes to TeraGrid trust architecture
  - Adding InCommon identity providers as trusted entities
  - Adding web authentication as a trusted method
• Peering with identity providers (IdPs)
  - The IdP decides whether to release identifiers to TeraGrid
  - TeraGrid decides whether to accept IdP assertions; the review includes:
    - IdP serves TeraGrid users
    - IdP is operated by a known and respected organization
    - IdP operates a trustworthy authentication service
    - IdP provides globally-unique and non-reassigned identifiers
• Web application security
  - Use HTTPS for privacy and authentication
  - Cross-Site Request Forgery (CSRF) attack protections (cookies and hidden form fields)
  - Locked-down servers (firewalls, OTP for admin access, etc.)
• CA security
  - FIPS 140 level 2 rated hardware security modules
  - Locked-down servers
• Disallowing account sharing
  - Account sharing complicates incident response
  - Allow only one identifier per identity provider to be linked with a given TeraGrid identity
• Incident response — actions may include:
  - Disable account links
  - Disable identity provider trust
  - Revoke certificates
  - Coordinate response with the TeraGrid security working group, InCommon, and IGTF

Related Work
• Federated CAs (some accredited by IGTF) in Europe:
  - Switzerland: SWITCH SLCS CA for the SWITCHaai federation
  - Germany: DFN-SLCS CA for the DFN-AAI federation
  - UK: SARoNGS Credential Translation Service for the UK Access Management federation
  - TERENA Certificate Service for national federations (Denmark, Finland, Netherlands, Norway, Sweden, and more)
• TeraGrid Science Gateways
  - Web-based community access to TeraGrid resources
  - Gateways manage their own user registration and authentication
  - May independently support federated login

Status
• In production at https://go.teragrid.org since Sep 2009
• Supporting logins from 27 institutions
• Issued >800 certificates so far
• Work in progress:
  - Integrate with the TeraGrid User Portal (https://portal.teragrid.org)
  - CILogon Project (www.cilogon.org): provide certificates to all InCommon members (not just TeraGrid users)
• Other possible future work for TeraGrid:
  - Phase out TeraGrid passwords
  - Attribute-based authorization
  - Support for OpenID

Questions? Comments? Contact: jbasney@illinois.edu


An Identity Provider to manage Reliable Digital Identities for SOA and the Web

Ivonne Thomas and Christoph Meinel
Hasso-Plattner-Institute for IT-Systems Engineering
Prof.-Dr.-Helmert-Str. 2-3, D-14482 Potsdam
ivonne.thomas@hpi.uni-potsdam.de, meinel@hpi.uni-potsdam.de

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. IDtrust '10, April 13-15, 2010, Gaithersburg, MD. Copyright 2010 ACM ISBN 978-1-60558-895-7/10/04 ...$10.00.

ABSTRACT
In this paper, we describe the implementation of our identity provider, based on open web service standards, which has been extended to distinguish between different qualities of identity attributes, thereby enabling a relying party to distinguish between verified and unverified digital identities. Our contribution is the definition and representation of identity meta information for identity attributes on the identity provider side and the conveyance of this information as Identity Attribute Context Classes to a relying party. As a main result, we propose a format and semantics to include identity attribute meta information into the security tokens which are sent from the identity provider to a relying party in addition to the attribute value itself.

Categories and Subject Descriptors
K.6.5 [Management of Computing and Information Systems]: Security and Protection—Authentication

General Terms
Security

Keywords
SOA Security, Identity Management, Identity Provider, Attribute Management

1. INTRODUCTION
Digital Identity Management broadly refers to the establishment and controlled use of a person's "real-life" identity as digital identities in computer networks. Looking at the current online world, performing transactions such as online banking and online shopping or communicating in social networks has become an inherent part of life. Here, personal, identity-related data plays a major role, since for many activities a service provider requires details about the identity of a user, be it to offer personalized services or to hold the user liable in case anything bad happens. Examples include the purchase of a good, which requires payment and delivery, or the provision of tailored recommendations based on the history of past purchases.

A digital identity usually comprises a limited set of attributes of a "real-life identity" that characterizes this entity (cf. also [23] or [7]). Unfortunately, managing numerous digital identities and associated authentication credentials is cumbersome for most computer users. Users not only have difficulty remembering their passwords, they also bear a great burden in keeping their account information up-to-date.

To overcome the limitations of the closed domain, open identity management models emerged as a way of sharing identity information across several trust domains in a controlled manner. The basic idea is having several places to manage a user's identity data (so-called identity providers) and to exchange identity attributes between entities holding identity information (the identity providers) and those consuming it (the relying parties). Open protocols and standards exist to exchange identity attributes as security tokens between identity providers and relying parties (cf. e.g. the OASIS Identity Metasystem Interoperability specification 1.0 [19]).

Nevertheless, when we look at the Internet today, we still find an environment of mostly isolated domains. The reasons for the predominance of the isolated model are comprehensible. Isolation allows organizations to retain control over their identity management systems. As organizations usually have different legal and technical requirements for identity management, they find it difficult to give up this control.

However, with regard to the Internet, we can find many identity attributes which do not require strong verification. Often the user can enter information into his account which does not require any verification. It really depends on what a digital identity is used for. If the user logs on to a site to prove on repeat visits that it is the same user, it does not matter whether his digital identity matches his "real-life identity", as long as it is always the same digital identity he uses to log on. Only when critical transactions are performed, such as ordering an item or paying for a service, is the integrity of the provided user data required to hold the user liable in case anything bad happens. Current approaches for sharing identity data between domains as proposed by the open identity management models mainly consider the attribute value itself, but hardly how this value was collected or whether any verification process took place.
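The distinction the introduction draws — the same attribute value may be self-asserted or verified, and a relying party should be able to treat the two differently — can be illustrated with a small sketch. The data shapes and verification labels here are invented for illustration, not the paper's actual token format:

```python
# Illustrative sketch: attribute assertions that carry meta information
# about how the value was collected, so a relying party can distinguish
# verified from self-asserted values.

attributes = {
    "givenName":   {"value": "Alice",      "verification": "self-asserted"},
    "dateOfBirth": {"value": "1990-04-01", "verification": "verified"},
}

def acceptable_for(attribute, required_verification):
    # A verified value satisfies both a "verified" and a "self-asserted"
    # requirement; a self-asserted value satisfies only the latter.
    order = ["self-asserted", "verified"]
    return order.index(attribute["verification"]) >= order.index(required_verification)

# A pseudonymous display name is fine self-asserted; an age check is not.
assert acceptable_for(attributes["givenName"], "self-asserted")
assert acceptable_for(attributes["dateOfBirth"], "verified")
assert not acceptable_for(attributes["givenName"], "verified")
```

The point of the sketch is that the decision is made per attribute, not per identity provider — the theme the paper develops in its trust model.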
In order to enable service providers to rely on information from a foreign source, an identity management for the Internet should be able to deal with strongly verified attributes alongside attributes without any verification, which are managed by the users themselves. Moreover, it should allow a relying party (such as a service) to assess the value of received identity information in terms of correctness and integrity.

In Thomas et al. [22], we argued that this assessment should be done at the granularity level of the identity data — meaning that the decision to trust should not only be made between the issuing and the relying party at an all-encompassing level, but separately for each identity attribute that is exchanged. To give an example, we could consider a university which is trusted to make correct assertions about whether a user is a student, but not about whether this user pays his telephone bills.

In this paper, we concentrate on the information required in addition to the attribute value itself to make correct assertions about the credibility of an identity attribute. This identity meta information is all information, beyond the attribute value itself, that enables a relying party to decide whether it trusts the received value with regard to an intended transaction. To be specific, we provide an identity provider which

• is based on open web service standards, such as WS-Trust, SAML and WS-MetadataExchange,
• allows the definition of identity meta data, and
• conveys identity meta data as so-called Attribute Context Classes in SAML security tokens to a relying party.

The rest of this paper is structured as follows. Section 2 shows what a scenario could look like that opens up current identity islands by using identity information from many sources across the Internet. In Section 3 we lay some foundations by giving a short introduction to claim-based identity management and the Identity Metasystem. This is followed by an overview of related work in the area of assurance frameworks as well as a discussion of their limitations in Section 4. After this, Section 5 introduces the trust model that we use to identify and classify the identity meta data that a relying party requires to assess identity information from a foreign source. Section 6 describes the implementation of our identity provider with regard to the definition and exchange of meta data between independent trust domains. In the centre of this section is our extension to the SAML 2.0 token format to convey meta information as part of the security token. Finally, Section 7 concludes the paper and highlights future work.

2. MOTIVATING EXAMPLE
Basically, we can make two observations with regard to the storage and administration of identity information on the Internet today. The first observation is that basically every service provider on the Internet manages information which is specific to its domain, namely the information which was created during the interaction between a customer and the system, such as a customer number. A second observation is that information stored in independent domains is often redundant, because certain pieces of a subject's identity are required by every service or web site provider. Examples include the name and address of a person or his birthday. Hence, basically every service or web site provider has identity information, i.e. information about its users' digital identities, which it could provide to other participants (given the user's consent), and basically every service or web site provider also consumes certain information which it requests from the user and which it does not necessarily need to manage itself.

A possible solution towards a more effective management of identity information is demonstrated in Figure 1. Instead of entering the same information into different user accounts, the user could reference another account which already contains this information. For example, the newspaper publisher would receive the assertion that its customer is a student directly from the user's university, and the information about the user's bank account directly from the bank.

Figure 1: Use case showing independent identity domains and potential reliance on other Identity Providers. (The figure shows four domains — a bank (last name, first name, birthday, credit card number, account number), the federal registration office (last name, first name, birthday, permanent address), a newspaper publisher (last name, first name, birthday, "is a student", delivery address, customer ID, account number) and a university (last name, first name, birthday, "is a student", student number, permanent address) — with arrows indicating possible reliance.)

3. BACKGROUND

3.1 Claim-based Identity Management
In order to implement a scenario such as the one introduced in Section 2, identity management concepts are required that take the decentralized nature of the Internet into account. Open identity management models evolved to address exactly this requirement. Instead of having isolated identity silos as with the traditional approaches, open identity management models are based on the idea of having several places to manage a user's identity data (so-called identity providers) and to share the identity information between these places and the places where this information is needed.

Claim-based identity management offers a concrete implementation of such an open identity management model. It uses the notion of claims to describe identity attributes. A claim is an identity attribute named with an abstract identifier (e.g. a URI), which applications and services can use to specify the attributes they need, for example a name or a user's address. Given as URIs, claims provide a platform-independent way to present identity information and are well integrated into the open web service standards such as SAML [8], WS-Trust [15] or WS-Policy [6], which can be used to request and exchange identity information as claims.

3.2 The Identity Metasystem
As claim-based identity management provides interoperability among different identity systems, it is also used as one possibility to implement a related concept, the concept of an Identity Metasystem. Identity Metasystems provide an identity layer on top of existing identity systems and promise an easier management of digital identities across the Internet. This layer abstracts from concrete technologies and provides the necessary mechanisms to describe, exchange and distribute identity information across identity management solutions.

An Information Card holds all necessary meta data about the interaction between the user and the identity provider, including the URI to contact the IdP, the authentication to the IdP, the claims the IdP can assert, as well as the supported token types. It is important to note that Information Cards do not contain any claim values, only the information on how to connect to an identity provider to obtain asserted claims as security tokens.

Finally, the identity selector is a piece of software on the user's system which handles the communication between the relying party and the identity provider and provides a consistent user interface to manage Information Cards. Upon request, the identity selector retrieves the policy of the relying party, matches the requirements with the Information Cards of the user, and presents the user with a selection of suitable identity providers, from which he can choose. The identity selector takes care of performing the authentication procedure between the user and the IdP (e.g. by requesting a password or digital signature) and sends a request for a security token to the identity provider. Upon successful authentication, the identity provider answers with a security token, which the user can use to prove his identity to the relying party.
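The card-matching step of the identity selector described above — compare the relying party's required claims against the claims each Information Card says its IdP can assert — can be sketched as follows. The claim URIs, IdP addresses, and data shapes are invented for illustration:

```python
# Sketch of the identity selector's matching step: a card is suitable if
# its IdP can assert every claim the relying party's policy requires.

required_claims = {
    "http://example.org/claims/name",
    "http://example.org/claims/student",
}

information_cards = [
    {"idp": "https://idp.university.example",
     "claims": {"http://example.org/claims/name",
                "http://example.org/claims/student",
                "http://example.org/claims/studentNumber"}},
    {"idp": "https://idp.bank.example",
     "claims": {"http://example.org/claims/name",
                "http://example.org/claims/accountNumber"}},
]

def suitable_cards(cards, required):
    # Note that only claim *names* are matched here; Information Cards
    # carry no claim values, so the actual values are fetched from the
    # IdP as a security token after the user picks a card.
    return [c["idp"] for c in cards if required <= c["claims"]]

assert suitable_cards(information_cards, required_claims) == ["https://idp.university.example"]
```

Only the university card qualifies for this policy; the bank's IdP cannot assert the "student" claim, which anticipates the paper's point that providers are suitable for different attributes to different extents.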
RELATED WORK The need to trust on information received from a foreign party is inherent to open identity management systems. If a relying party has to rely on identity information received from a foreign party, the need for assurance that the infor- mation is reliable is a natural requirement prior to using it. In order to address this need, several initiatives around the world have defined assurance frameworks which cluster trust requirements into different levels of trust. A level of trust or level of assurance (LoA) reflects the degree of confidence that a relying party can assign to the assertions made by another identity provider with respect to a user’s identity Figure 2: Participants involved in the Identity information. Metasystem (FMC Block Diagram [12]) 4.1 Assurance Frameworks In the area of authentication trust level, the UK Office To do so, the Identity Metasystem distinguishes three dif- of the e-Envoy has published a document called ”Registra- ferent types of participants as denoted in Figure 2: the con- tion and Authentication – E-Government Strategy Frame- sumer of identity information (relying parties), authorities work Policy and Guideline” [20]. In this document the initial which manage and provide users’ digital identities (identity registration process of a person with the system as well as provider) as well as a component to choose a digital iden- the authentication process for a user’s engagement in an tity, called identity selector, and the user. In fact, putting e-government transaction are defined. Depending on the the user in the center of all decision processes regarding his severity of consequences that might arise from unauthorized identity and creating a consistent and justifiable user experi- access, four authentication trust levels are defined, reaching ence belongs to the main principles of Identity Metasystems. 
from Level 0 for minimal damage up to Level 3 for substan- These principles which explain sucesses and failures of iden- tial damage. The more severe the likely consequences, the tity management systems have been written down by Kim more confidence in an asserted identity will be required when Cameron in the Laws of Identity [7]. engaging in a transaction. For example, for filing an income The relying party is a service or Web site, which requires tax return electronically, an authentication trust level of two a certain set of user attributes / claims to perform a cer- is needed, which is reached when the client can present a tain action. Instead of managing this information itself, it credential (preferable a digital certificate) and can proof his allows users to authenticate themselves at a federated iden- right to that credential, e.g. by signing it with his private tity provider and then relies on the assertion issued by this key. identity provider. The e-Authentication Initiative, another approach, is a An identity provider (IdP) holds digital identities of reg- major project of the e-government program of the US. The istered users for the purpose of provisioning these identities, core concept is a federated architecture with multiple e- or portions of them, to a party willing to rely on this in- government applications and credential providers. In order formation (the relying party). Upon successful registration to assist agencies in determining the appropriate level of the identity provider issues a so-called Information Card, identity assurance for electronic transactions, the initiative 28 has published a policy called ”E- Authentication Guidance Also, using existing assurance frameworks, it is hard to for Federal Agencies” (OMB M-04-04) [10]. The document reflect possible changes of a user’s identity trust level over defines four assurance levels, which are based on the risks time. 
associated with an authentication error. The four assurance levels range from "little or no confidence in the asserted identity" to "very high confidence in the asserted identity". In order to determine the required level of assurance, a risk assessment is performed for each transaction, in which the potential harm and its likelihood of occurrence are identified. The technical requirements that apply to each assurance level are described in a recommendation of the National Institute of Standards and Technology (NIST) called the "Electronic Authentication Guideline" (NIST 800-63) [17]. This document states specific technical requirements for each of the four levels for the token type, the authentication protocol, as well as the types of attacks which need to be prevented.

A quite comprehensive approach that extends the OMB/NIST levels has been proposed by InCommon, a federation of more than 100 members from industry, government and the higher education sector [11]. InCommon uses the Shibboleth specifications and defines an Identity Assurance Assessment Framework. Aspects covered are Business, Policy and Operational Factors; Registration and Identity Proofing; Digital Electronic Credential Technology; Credential Issuance and Management; Security and Management of Authentication Events; Identity Information Management; the Identity Assertion Content; as well as the Technical Environment.

Further approaches have been developed as part of the Liberty Alliance project's Identity Assurance Framework [1] as well as in the context of the European Stork project [5].

4.2 Limitations

Current approaches for assurance frameworks as described in the previous section provide a comprehensive assessment of identity providers by gathering trust requirements with regard to all the processes, technologies, technical infrastructure and further protection in place that have an influence on the degree of confidence in the assertions made by an identity provider. The result is a global trust semantics which allows a classification of identity providers with respect to different levels of trust. Such a classification can serve as input to policy frameworks as well as a basis for contracts and inter-organizational agreements.

Although current approaches provide a quite comprehensive assessment, a number of limitations exist. Existing assurance frameworks mostly refer to the identity as a whole, but do not refer to trust requirements of specific attributes. It is, for example, not possible to distinguish between self-asserted attributes an identity provider might manage besides attributes that were verified. Especially with regard to platforms of non-institutional providers such as Facebook, users often prefer using pseudonyms when acting in these communities. In fact, in blogs and forum discussions, anonymity of users is a frequent requirement. Also, for over-18 services, anonymity of the users is often favored while at the same time a verified assertion of a user's age is required. For these purposes, an identity provider could manage self-asserted attributes besides verified attributes. When doing so, reflecting these differences in the assertions is a major requirement.

As identity proofing processes are cost-intensive and time-consuming due to the effort required to verify a user's identity attributes, verification of an attribute might not be desired as long as a user is not involved in transactions that demand a higher trust level. Therefore a user might decide to register with an identity provider without proper identity proofing, having, for example, his/her name self-asserted, and undergo identity proofing only upon concrete requirement. This requires a different trust level per user and does not allow rating an identity provider as a whole.

Furthermore, identity providers are inherently different due to their affiliation with an organization or institution and might be suitable for asserting certain identity attributes only to a limited extent. For example, a banking identity provider will be particularly suitable to assert that a user can pay for a certain service, but might have weak records of the user's status as a student, while for a university's identity provider it would probably be the opposite. In fact, such a diversity of identity provisioning sources is intended in the user-centric model, which aims at reflecting the way identities are managed in the real world.

Taking all these facts into account, current approaches are likely to work for federations in which members have similar trust requirements, but are less likely to work when applied to the open market and user-centric models. In our approach we aim at providing identity meta information for identity attributes in order to allow an identity provider to manage a mix of verified and non-verified attributes and, more importantly, in order to enable a relying party to distinguish between these different qualities of trust.

4.3 Levels of Assurance for Attributes

Work regarding trust levels for attributes has been conducted by Chadwick et al. [9]. Chadwick et al. build on NIST's concept of assurance levels. Similar to our work, they propose to have separate metrics for identity proofing processes (expressed in the Registration LOA) and the authentication of a subject (expressed in the Authentication LOA). Authentication LOA and Registration LOA are combined into a Session LOA and sent in each assertion from an identity provider to a service provider. Compared to this, our work is targeted more towards the relying party side. In our work, we aim at providing more choices for a relying party's access control decisions by conveying not only a trust level, but also trust-related information to be evaluated during access control. For this purpose, we propose to extend existing protocols by so-called Attribute Context Classes that contain, besides a basic trust level, further meta data to enable the relying party to assess the trustworthiness of the received information.

5. A LAYERED TRUST MODEL

This section presents our trust model, used by a relying party such as a service provider to accept identity information from a foreign partner and to perform access control decisions based on the received information. In this model we basically distinguish between two types of trust. First, a trust relationship is required between the service provider and the identity provider in order to trust the correctness of the assertions; second, for a concrete transaction, the service provider has to decide whether the identity-based information in the assertions is sufficient to reach a certain trust level which is required to perform the request. While in the first case the trust relationship is of a long-running kind, the trust establishment in the second case is part of identity-based access control mechanisms. We call the first kind of trust organizational trust and the second kind identity trust. The following sections give a detailed characterization and comparison.
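The two-layered decision just described, first an out-of-band judgment about the issuer, then a per-transaction check of the asserted information, can be sketched as follows. This is a minimal illustration only; the names (TRUSTED_ISSUERS, Assertion, required_claims) are hypothetical and not taken from the paper's prototype.

```python
# Minimal sketch of the two-layered decision a relying party makes.
# All names here are illustrative, not the prototype's actual API.
from dataclasses import dataclass

# Organizational trust: which issuers we accept, configured out-of-band.
TRUSTED_ISSUERS = {"https://idp.example.org"}

@dataclass
class Assertion:
    issuer: str
    claims: dict  # claim type -> value

def accept(assertion: Assertion, required_claims: set) -> bool:
    # Step 1: organizational trust -- do we trust the issuer at all?
    if assertion.issuer not in TRUSTED_ISSUERS:
        return False
    # Step 2: identity trust -- does the token carry the claims the
    # access control policy requires for this transaction?
    return required_claims.issubset(assertion.claims)

print(accept(Assertion("https://idp.example.org", {"emailAddress": "a@b.de"}),
             {"emailAddress"}))  # True
```

The point of the sketch is the ordering: the issuer check is a long-running, configured relationship, while the claim check is repeated per service call.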
5.1 The Concept of Organizational Trust

Organizational trust refers to the quality of the trust relationship between the participants of a SOA or web-based scenario. When service consumers and service providers are located within the same trust domain, registration, authentication and management of participants happen under the same administrative control and are, therefore, usually fully trusted. However, with regard to cross-organizational scenarios involving services from different organizations, trust between the participants of a SOA is not given by default. Models for identity management such as federated identity management establish cross-organizational trust by setting up federation agreements and contracts to extend the trust domain of an organization to the federation. Federation or not, whenever organizational borders are crossed, the question of whether the partner is trusted arises. Factors such as past experience, the minimum trust settings for, for example, registration and authentication of users, or the reputation of a company are important properties to assess the trustworthiness of the potential business partner. Also, the kind of business relationship is an important factor. A B2B relationship is usually much more trustworthy than a B2C relationship due to contracts which manifest certain obligations and procedures of the business partners. In order to classify different qualities of trust relationships, assurance frameworks exist to help business partners assess their identity management services (cf. Section 4). However, a detailed assessment is not always feasible. Sometimes the decision to trust is founded on much fewer assessments. Especially in the user-centric model, a relying party such as an online store might decide to trust an identity provider based on soft criteria such as the reputation or global image of the company running the identity service rather than on verifiable facts.

In our trust model, we assume that some form of assessment has been done by the relying party and has led to a classification of identity providers into two (trusted, not trusted) or more levels of trust. It is important to note that this decision is specific to a relying party and can be based on strong contracts, the certification of an identity provider by a trusted authority, or past experiences, just as any other trust criteria that the service provider regards as appropriate.

5.1.1 Formalism

On an abstract level, we can express the quality of any trust relationship as a mapping from a set of Trust Criteria (TC) to a level of trust or level of assurance (LoA):

isTrusted_uni : (TC_1, ..., TC_n) ↦ LoA

This is exactly what assurance frameworks do. Assurance frameworks define a mapping from certain trust criteria to a level of trust, which in almost all frameworks is one of {1, 2, 3, 4}. In the trust model underlying our implementation, we use a simplified variant of this function with two trust levels {trusted, untrusted}. Our trust criterion is the identity provider as a whole (Issuer):

isTrusted : Issuer ↦ {trusted, untrusted}

5.2 The Concept of Identity Trust

Identity trust refers to the trust an entity such as a service provider has in the identity of a subject and its behavior. While the organizational trust level indicates the credibility of the issuer of assertions, the identity trust level indicates the trustworthiness of the subject about which assertions are made. Identity trust is established by credentials that verify properties of the subject. In the claim-based identity management model, these required properties to build up trust (trust requirements) are expressed as claims and exchanged in security tokens. In order to assess the trust in the identity of a subject, such as a user, a relying party needs to assess the received tokens. Hereby, several factors influence the trustworthiness. In order to identify these factors, we use our model of a digital identity.

[Figure 3: diagram relating Subject, Account, Digital Identity, Subject Attribute, Credential, Identity Provider, Data Transfer Object, Token, Issuer, Assertion and global claim types (ClaimType).]

Figure 3: Model of a digital identity based on [13]

Figure 3 shows our model of a digital identity, which we extended from Menzel et al. [13]. This model shows the major relationships between the identity provider, the concept of a digital identity, accounts, as well as token and authentication credentials. As can be seen from the picture, a digital identity consists of several Subject Attributes and is held in an Account. Each Account can comprise several Digital Identities. Using this model, we can identify the aspects that have an influence on the overall trust in an identity. These are (as marked in green):

TC-1 Trust in the authentication process and the subject-to-account mapping.
TC-1 refers to the trust that an identity provider associates a specific subject with the correct record in the identity provider database during an authentication event.
TC-2 Trust in the subject's attributes.
TC-2 refers to the process of identity/attribute proofing and the mechanisms used to verify a specific attribute.

TC-3 Trust in the token.
TC-3 refers to the characteristics of the data transfer between the identity provider and a service provider, e.g. the nature of the token and the mechanisms used to protect the token from being forged, replayed or altered.

All these factors are subject to vary between different digital identities of the same or different users within an identity provider. In this case, a relying party needs to check them per transaction. For example, if an identity provider offers various ways of authentication, the relying party needs to know whether the user typed in a password or presented a signed certificate. The same holds for the subject's attributes. If the process of identity proofing varies between different attributes or different users, a relying party needs to know whether the user presented her/his ID card upon registration or whether the name was self-asserted. Of course, if these factors are static, it is reasonable to consider them part of the organizational trust relationship, as is usually done in current frameworks. As we aim in our identity provider to provide digital identities with varying qualities of user attributes, we focus on TC-2 and define a metric on the subject attributes.

5.3 Formalisms

We define AttributeTrust to be a function which returns the strength of the attribute proofing process in dependence of the issuer and a certain attribute:

AttributeTrust : (Issuer, Attribute) ↦ AttributeLoA

As with the isTrusted function defined in 5.1.1, it requires a common semantics of the AttributeLoA. Again, it is possible to cluster different trust requirements into levels of assurance. Caution has to be taken, as trust requirements usually differ between attribute groups; for example, processes to verify a name might be different from processes to verify a membership or the ownership of a specific email address. In the trust model underlying our implementation, we use a variant of this function which uses two trust levels with a common semantics {verified, unverified} and leave the specifics for each attribute to be checked separately. We define isVerified to be a function which returns whether an identity attribute/claim was verified by the identity provider:

isVerified : (Issuer, Attribute) ↦ {verified, unverified}

Depending on the needs, we plan to extend this function in future implementations.

To derive the overall credibility, we combine the results of the functions isTrusted and isVerified. The way in which both results are combined shall be defined by a function h, which can be application-specific or globally defined. The function h describes in which way the fact whether an identity attribute has been verified is combined with the fact whether this has been done by a trusted identity provider. Following our observation, we would define the credibility of a claim to be 1 only if the claim was verified and issued by a trusted issuer; in all other cases, it is 0. A mathematical definition for h is given below.

credibility(issuer, claim) = h(isTrusted(issuer), isVerified(issuer, claim))

with h e.g. defined as

h : {trusted, untrusted} × {verified, unverified} ↦ {1, 0}
h(b1, b2) = 1, if b1 = trusted and b2 = verified
h(b1, b2) = 0, otherwise

Of course, alternative definitions of h are possible to model other trust behavior. In [22], we give for example the following definition of h, which distinguishes three different levels of trust:

h : {trusted, untrusted} × {verified, unverified} ↦ {2, 1, 0}
h(b1, b2) = 2, if b1 = trusted and b2 = verified
h(b1, b2) = 1, if b1 = untrusted and b2 = verified
h(b1, b2) = 0, otherwise

Please refer to [22] for further details.

5.4 Comparison

Table 1 summarizes and compares the concepts of organizational trust and identity trust. As organizational trust refers to the quality of the trust relationship between organizations, it implicitly answers the question: "Can we trust the issuer of a token?". The decision to trust another entity as an identity provider in a SOA or web-based infrastructure is drawn before any messages start flying around. Usually, federation agreements or similar contracts are negotiated and signed when setting up the federation. These decisions are then configured in the infrastructure. Compared to this, identity trust is the trust between the subject of the transaction and the service provider. It is service-call specific and therefore is negotiated each time a call for a new transaction is received.

Table 1: Comparison of Identity Trust and Organizational Trust
  Organizational Trust: refers to the quality of the trust relationship between organizations; answers "Can we trust the issuer of a security token?"; determined out-of-band; configurable.
  Identity Trust: refers to the identity associated with a transaction; answers "Can we trust the subject in the token?"; determined during the service call; negotiable.
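The trust functions of Section 5.3 can be sketched directly in code. The issuer and claim data below are illustrative assumptions; the two h variants mirror the binary definition and the three-level definition from [22] given above.

```python
# Sketch of isTrusted, isVerified, h and credibility from Section 5.3.
# The issuer/claim data is illustrative, not part of the paper's prototype.
TRUSTED_ISSUERS = {"https://idp.example.org"}                     # isTrusted input
VERIFIED_CLAIMS = {("https://idp.example.org", "emailAddress")}   # isVerified input

def is_trusted(issuer):
    return "trusted" if issuer in TRUSTED_ISSUERS else "untrusted"

def is_verified(issuer, claim):
    return "verified" if (issuer, claim) in VERIFIED_CLAIMS else "unverified"

def h_binary(b1, b2):
    # credibility 1 only for a verified claim from a trusted issuer
    return 1 if (b1, b2) == ("trusted", "verified") else 0

def h_three_level(b1, b2):
    # variant from [22]: verified claims of untrusted issuers rank in between
    if b2 == "verified":
        return 2 if b1 == "trusted" else 1
    return 0

def credibility(issuer, claim, h=h_binary):
    return h(is_trusted(issuer), is_verified(issuer, claim))

print(credibility("https://idp.example.org", "emailAddress"))               # 1
print(credibility("https://other.example", "emailAddress", h_three_level))  # 0
```

Swapping in a different h is all that is needed to move between the two trust semantics, which is the design point the paper makes: h can be application-specific or globally defined.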
6. IMPLEMENTING AN IDENTITY PROVIDER FOR VERIFIED DIGITAL IDENTITIES

This section describes our implementation of a trust-aware claim-based identity provider. The section starts with a short description of the technical and functional characteristics of the existing identity provider. After this, Section 6.2 shows a use case which demonstrates the use of identity meta data in our identity provider. The next sections give insights into our implementation. We describe how we defined a data structure to express identity meta data as so-called Attribute Context Classes and how we extended the SAML 2.0 assertion specification to send identity meta data as part of security tokens.

6.1 Existing Identity Provider

This section gives a short overview of our implementation of an identity provider, which is the focus of this paper.

6.1.1 Functional Details

Our prototype is an implementation of an identity provider for service-oriented architectures as well as web applications which features

• a security token service in accordance with the WS-Trust specification 1.3 [16]
• an information card provider based on the specifications of SAML 1.1, SAML 2.0 as well as Information Card
• an OpenID Provider according to the OpenID 2.0 Authentication specification [21]

It provides

• security token service functionality, including
  – a WS-MetadataExchange endpoint to request meta data
  – requesting, issuing and signing of security tokens
  – support for authentication via username token or certificate
• information card provider functionality, including
  – issuance of information cards for digital identities
  – creation, editing and deletion of claim types
  – support for various identity selectors
• general identity management system functionality, including
  – creation, editing and deletion of multiple digital identities per user
  – creation, editing and deletion of claims
  – assignment of attributes to digital identities

6.1.2 Technical Details

The prototype is developed in Java utilizing a number of open-source libraries. Most important are Sun's web service stack Metro [4] for handling web services and supporting web service security mechanisms such as the security token service, openid4java to provide support for the OpenID 2.0 Authentication protocol [3], as well as Maven [2] to provide configuration and deployment options. A single web application makes up the prototype, which is deployed and run in Apache Tomcat. The web application offers a web interface as well as a web service-based interface.

6.2 Prototype Use Case

This section describes a small use case which demonstrates the use of identity attributes with different qualities in our identity provider. Figure 4 shows the attribute management page of our identity provider. On this page, a user can manage his/her identity attributes and assign them to digital identities, which are shown on the right-hand side. As can be seen in Figure 4, a user can have several attributes of the same type, such as the E-Mail Address or Given Name. The type of the identity attribute is mapped to the protocol-specific type defined by the protocol which is used to request attributes. In the case of Information Card, the type is mapped to the global claim types, and in the case of OpenID the type refers to the attributes defined by the OpenID community that can be used with OpenID Attribute Exchange (cf. e.g. AXSchema.org). For each stored identity attribute, the information whether this attribute has been verified during the collection of the data is shown. Moreover, additional information about the verification process is available, as can be seen in Figure 5. Figure 5 shows all available identity meta data for a specific attribute type and for all attributes which have been verified. In this example, the user has registered three different email addresses, two of which are verified and one of which is unverified. Looking at the verification details, we find additional information for the two which have been verified.

One important piece of information in the meta data is the source of the identity attribute. The source is the entity which provided the data. For example, it is possible that the verification process is the same, but has been performed by different identity providers. One use case that shows the relevance of this is the following: if, for example, an identity provider is federated with another partner and the user decides to link his/her accounts and to share a certain attribute, so that this attribute is available in both identity providers, the source would indicate the original identity provider that verified that attribute. In case this attribute is issued to another party, the information who verified the attribute will be of interest for the relying party to assess the trustworthiness of the information, as the organizational trust might differ between the issuing and the verifying identity provider. As such a federation scenario still bears many open questions, it is left to future work. In our example in the current implementation, the source is in one case an authority (the company itself) and in the other case the user, who has provided the data. In addition to the source, the verification method is detailed in the identity meta data. Again in our example, this is for the first email address the company itself, which is acting as an email provider, and in the second case, in which the user had entered the data, a verification email had been sent.

Upon request, this information is sent as part of the security token to a requesting party. Our demo application to show the use of the identity provider in a complete scenario is a classical web site for an online store selling music files, which is shown in Figure 6. To complete the purchase, several personal attributes are requested from the user, such as his name, address and payment information. Furthermore, the music store requires a valid email address to deliver the purchased mp3 files to. Therefore, once the store receives a security token from the identity provider, it will check whether the provided email address fulfills this requirement.
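The store-side check just described can be sketched as follows. The claim and status names are made up for illustration and are not the prototype's actual API; they only show the idea of gating the purchase on the verification status carried in the token.

```python
# Illustrative sketch of the MP3 store's relying-party check: the purchase
# may only complete if the email claim in the token is marked as verified.
# Claim/status names are hypothetical, not taken from the prototype.
def email_ok(token_claims: dict) -> bool:
    email = token_claims.get("emailAddress")
    if email is None:
        return False
    # identity meta data sent along with the claim value (cf. Section 6.3)
    return email.get("verificationStatus") == "verified"

token = {"emailAddress": {"value": "staff@company.de",
                          "verificationStatus": "verified"}}
print(email_ok(token))  # True
```

Note that the decision uses the meta data, not just the claim value: a syntactically valid but unverified address would be rejected.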
[Figure 5: screenshot listing, for each registered e-mail address, the source of the identity data (e.g. "issued by an authority (HPI)", "entered by the user") and the verification method (e.g. "issuer is owner", "verification mail has been sent").]

Figure 5: Identity Provider prototype screenshot showing identity meta information for the email address of a user.

6.3 Using Identity Meta Data

This section describes selected implementation aspects with regard to the use of identity meta data in the identity provider.

6.3.1 Identity and Organizational Trust

Given the classification into organizational and identity trust as described in Section 5, this section shows its application in the identity meta system upon which our prototype implementation is based. As said before, there are three different types of participants in the identity meta system: the identity providers, the relying parties and the clients/users. A relying party usually specifies a list of identity providers it trusts to make right assertions. Using the notion of claims, the relying party can express for a list of claims the issuer(s) it will accept tokens from. When receiving a security token, the relying party verifies the issuer of the token by checking whether the signature of the token matches the certificate of one of its trusted identity providers. This is in accordance with our notion of organizational trust. Only upon correct verification will the relying party continue with the information in the token.

The information in the token is required to build up identity trust, that is, the trust that the requesting user is in fact entitled to access the system. Therefore, the relying party lists the required identity information as claims in its policy. Upon retrieval of this information from a user's identity provider, the relying party checks the value of the identity data against its access control policy and makes an entitlement decision. We store for each claim, in accordance with the issuer, certain attribute meta data information. This is on the one hand the information whether a claim value has been verified (the verification status) and on the other hand certain verification details. While the verification status is one of verified, unverified or unknown for all claims, the verification details can differ tremendously between different types of claims. Therefore, we keep the data structure at this point very general and easily adaptable and extensible. The next section goes into detail about this.

In order to model organizational trust, at the moment we simply store for each issuer of security tokens whether we trust this issuer to make right assertions. As a possible refinement in future work, one could also store certain meta information which is specific to this issuer, such as identity provider meta information. Such meta information could include, but is not limited to, the authentication process supported by an identity provider as well as aspects concerning the storage and management of tokens.

6.3.2 Attribute Context Classes

We use so-called Attribute Context Classes to define meta data for claims. The notion of Attribute Context Classes has been inspired by a former specification in the SAML community, namely the so-called Authentication Context Classes [18]. Authentication Context Classes are a concept which was introduced in SAML 2.0 and which allows specifying meta data for the authentication used between two parties. As the security of an authentication mechanism depends highly on the values which characterize such an authentication method, SAML Authentication Context Classes offer the possibility to describe the authentication process in much more detail. While with SAML 1.1 it was only possible to state that an authentication process was performed using a specific authentication method such as, for example, a password, Kerberos or a hardware token, SAML 2.0 now allows specifying how the authentication was performed in addition to the fact that it was performed. This way it is possible to state whether a password with a length of two characters was used or a password with six characters, which was well-chosen and has a limit of three false attempts.

For the identity meta data, we adapted the idea and defined our own data model, which contains the following elements:

• Attribute Context: This data element holds the attribute context, which is comprised of all additional information to the attribute value itself. This element is the upper container for all identity metadata.

• Attribute Data Source: This data element indicates the source from which the attribute value was originally received and is part of the Attribute Context. This can be, for example, another identity provider, some authority such as a certificate authority, or the user himself who entered the data.

• Verification Context: This data element holds the verification context, which comprises all information related to the verification of an identity attribute value. The Verification Context is one specific context within the Attribute Context.

• Verification Status: This data element indicates the verification status of an identity attribute value, which should be one of verified, not verified or unknown. The verification status is part of the verification context.

• Verification Context Declaration: The verification context declaration holds the verification process details. Such a detail could for example be the method that has been used for verifying the correctness of the attribute. Further extensions are possible and should be added here. The verification context declaration besides the verification status makes up the verification context.

Figure 4: Identity Provider prototype screenshot showing the management of verified and unverified identity attributes.

Figure 6: The MP3 Store Scenario: Acting as a relying party of the identity provider.
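The data model of Section 6.3.2 can be sketched in a few Python-style classes. The prototype itself is written in Java, so this is illustrative only; the class and field names follow the elements listed above, while the concrete field types are assumptions.

```python
# Illustrative Python sketch of the Attribute Context Class data model
# from Section 6.3.2; the actual prototype is implemented in Java.
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class VerificationContextDeclaration:
    method: Optional[str] = None                 # e.g. "verification mail has been sent"
    details: dict = field(default_factory=dict)  # deliberately open for extensions

@dataclass
class VerificationContext:
    status: str = "unknown"                      # one of: verified / not verified / unknown
    declaration: Optional[VerificationContextDeclaration] = None

@dataclass
class AttributeContext:
    data_source: Optional[str] = None            # user, another identity provider, an authority
    verification: Optional[VerificationContext] = None

ctx = AttributeContext(
    data_source="user",
    verification=VerificationContext(
        status="verified",
        declaration=VerificationContextDeclaration(method="verification mail has been sent"),
    ),
)
print(ctx.verification.status)  # verified
```

Keeping the declaration a loosely typed container reflects the paper's design choice: verification details differ per claim type, so only the status has a fixed vocabulary.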
6.4 SAML Attribute Statement Extensions

In order to exchange identity meta data as part of SAML assertions, we introduce extensions to the SAML 2.0 schema. These extensions allow specifying an attribute context to hold further information about an attribute value. The XML schema in Listing 1 presents our extensions, which are defined in a new namespace: http://de.hpi.ip/saml20/ext.

Listing 1: XML schema definition of identity metadata extensions

The root element is the AttributeContext, which is added to the complex type Attribute of the SAML 2.0 namespace ... level in the SAML 2.0 type AttributeValue. The AttributeContext contains the data source of the attribute value as well as a verification context, which is meant to contain all information about the verification of the attribute. This includes the verification status besides further information about the verification process, comprised in an element named VerificationContext, such as the verification method. The verification method is dependent on the attribute type. Therefore this element can encompass any element structure and is intended to be extended by a suitable data structure to describe an attribute's verification. All additional elements are listed in the following with a brief explanation of their meanings:

• AttributeContext: This element holds the attribute context. This element can be used within the SAML AttributeStatement element.

• VerificationStatus: This element holds the verification status of an attribute value. It is defined as a general string to allow possible extensions later on.

Listing 2 gives an example that uses the introduced schema. According to our use case described in Section 6.2, the assertion states that the email address of the user is staff@company.de and has been verified. The method used was a verification mail sent to the user.

Listing 2: Example for a SAML security token containing identity meta information
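Since the schema and example listings are not legible in this copy, the following is only a rough illustration of the kind of structure the text describes. The namespace URI and the element names (AttributeContext, VerificationContext, VerificationStatus) are taken from the text; the exact nesting and any further attributes are guesses, not the paper's schema.

```python
# Rough illustration (NOT the paper's exact schema) of a SAML 2.0 attribute
# carrying the AttributeContext extension described in Section 6.4.
import xml.etree.ElementTree as ET

SAML = "urn:oasis:names:tc:SAML:2.0:assertion"
EXT = "http://de.hpi.ip/saml20/ext"   # namespace given in the text

attr = ET.Element(f"{{{SAML}}}Attribute", {"FriendlyName": "emailAddress"})
value = ET.SubElement(attr, f"{{{SAML}}}AttributeValue")
value.text = "staff@company.de"

# extension: identity meta data alongside the attribute value
ctx = ET.SubElement(attr, f"{{{EXT}}}AttributeContext")
ET.SubElement(ctx, f"{{{EXT}}}AttributeDataSource").text = "user"
ver = ET.SubElement(ctx, f"{{{EXT}}}VerificationContext")
ET.SubElement(ver, f"{{{EXT}}}VerificationStatus").text = "verified"

status = attr.find(
    f"{{{EXT}}}AttributeContext/{{{EXT}}}VerificationContext/{{{EXT}}}VerificationStatus")
print(status.text)  # verified
```

The relying party can then read the verification status from the extension elements instead of treating every claim value as equally trustworthy.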
7. CONCLUSION

Past experiences have shown that there will be no single center to the world of information. In order to get from the isolated model, in which each consumer of identity information manages this information himself, to an identity management which takes the decentralized nature of the Internet into account, we argue that consumers of identity information need to be able to assess and distinguish the quality of the information they receive. In particular, with regard to the launch of electronic ID cards as fostered by several European governments, different sources of identity information will have a different quality in terms of correctness and integrity. To have this information integrated into current identity management models is the essence of this paper. Therefore, we defined a data structure to express identity meta data as so-called Attribute Context Classes and extended the SAML 2.0 assertion specification to send identity meta data as part of security tokens. As a proof of concept, we presented an identity provider which is able to manage user-defined digital identities besides verified digital identities. For each identity attribute, a so-called claim, an attribute context is stored to hold information such as the method of verification that has been used to verify the claim.

As part of future work, we plan to extend the definition of required claims in web service policies by a policy reflecting the additional identity meta data required to assess a claim value.

8. REFERENCES

[1] Liberty Identity Assurance Framework. http://www.projectliberty.org/content/download/4315/28869/file/liberty-identity-assurance-framework-v1.1.pdf, 2007.
[2] Apache Maven. http://maven.apache.org/, 2009.
[3] openid4java - Project Hosting on Google Code. http://code.google.com/p/openid4java/, 2009.
[4] The Sun Metro Web Service Framework. https://metro.dev.java.net/, 2009.
[5] The Stork Project Page. http://www.novay.nl/okb/projects/stork/4561, 2010.
[6] S. Bajaj, D. Box, D. Chappell, F. Curbera, G. Daniels, P. Hallam-Baker, M. Hondo, C. Kaler, D. Langworthy, A. Nadalin, N. Nagaratnam, H. Prafullchandra, C. von Riegen, D. Roth, J. Schlimmer, C. Sharp, J. Shewchuk, A. Vedamuthu, Ü. Yalçinalp, and D. Orchard. Web Services Policy 1.2. Technical report, W3C, http://www.w3.org/Submission/WS-Policy/, April 2006.
[7] K. Cameron. The Laws of Identity, 2005.
[8] S. Cantor, J. Kemp, E. Maler, and R. Philpott. Assertions and Protocols for the OASIS Security Assertion Markup Language (SAML) V2.0. OASIS Standard Specification, 2005.
[9] D. W. Chadwick and G. Inman. Attribute aggregation in federated identity management. Computer, 42:33-40, 2009.
[10] e-Authentication Initiative, US. E-Authentication Guidance for Federal Agencies. http://www.whitehouse.gov/omb/memoranda/fy04/m04-04.pdf, 2007.
[11] InCommon Federation. Identity Assurance Assessment Framework. http://www.incommonfederation.org/docs/assurance/InC_IAAF_1.0_Final.pdf, 2008.
[12] A. Knoepfel, B. Groene, and P. Tabeling. Fundamental Modeling Concepts. John Wiley & Sons Ltd, 2005.
[13] M. Menzel and C. Meinel. A security meta-model for service-oriented architectures. In Proceedings of the IEEE International Conference on Services Computing, pages 251-259, 2009.
[14] Microsoft. Microsoft's Vision for an Identity Metasystem, May 2005.
[15] A. Nadalin, M. Goodner, M. Gudgin, A. Barbir, and H. Granqvist. WS-Trust 1.3. OASIS Standard Specification, 2007.
[16] A. Nadalin, M. Goodner, M. Gudgin, A. Barbir, and H. Granqvist. WS-Trust 1.3. http://docs.oasis-open.org/ws-sx/ws-trust/v1.3/ws-trust.pdf, 2007. OASIS Standard.
[17] National Institute of Standards and Technology. Electronic Authentication Guideline. http://csrc.nist.gov/publications/nistpubs/800-63/SP800-63V1_0_2.pdf, 2006.
[18] OASIS. Authentication Context for the OASIS Security Assertion Markup Language (SAML) V2.0. OASIS Standard Specification, March 2005.
[19] OASIS. Identity Metasystem Interoperability Version 1.0. OASIS Standard, July 2009.
[20] Office of the e-Envoy, UK. Registration and Authentication - e-Government Strategy Framework Policy and Guidelines. http://www.cabinetoffice.gov.uk/csia/documents/pdf/RegAndAuthentn0209v3.pdf, 2002.
[21] OpenID Authentication 2.0 - Final Specification. http://openid.net/specs, 2009.
[22] I. Thomas and C. Meinel. Enhancing claim-based identity management by adding a credibility level to the notion of claims. In SCC '09: Proceedings of the 2009 IEEE International Conference on Services Computing, pages 243-250, Washington, DC, USA, 2009. IEEE Computer Society.
[23] P. Windley. Digital Identity. O'Reilly, 2005.

Presentation Slides

An Identity Provider to manage Reliable Digital Identities for SOA and the Web
Ivonne Thomas, Prof. Christoph Meinel
Research School on "Service-Oriented Systems Engineering"
Hasso-Plattner-Institute, University of Potsdam
April 2010
Tuesday, April 13, 2010

Short Introduction
■ PhD student at the Hasso-Plattner-Institute (HPI), University of Potsdam
■ Member of the Research School for Service-Oriented Systems Engineering
■ 3rd year
■ Research focus on
  □ Security
  □ Service-oriented Architectures
ID Trust 2010 | Ivonne Thomas | An Identity Provider to manage Reliable Digital Identities

Motivation
"The law states that South Korean web sites with at least 100,000 daily visitors must force users to register with verifiable real names."
Real Name Policy Act, South Korea
■ Very controversial, BUT:
  □ we find different requirements for the reliability of identity attributes in the online world
  □ users have verified identities besides anonymous identities
  □ users need to decide which identity to use in correspondence with the provider

Identity Assurance Frameworks
■ the need to trust information from a foreign party is inherent to open identity management systems!
■ Basic principle: cluster trust requirements into levels of trust
■ A level of trust (level of assurance, LoA)
□ reflects the degree of confidence that a relying party can assign to the assertions made by another identity provider with respect to a user's identity information
■ Several initiatives have formed and proposed approaches

Identity Assurance Frameworks – Examples
■ UK Office of the e-Envoy: "Registration and Authentication – E-Government Strategy Framework Policy and Guidelines"
■ US e-Authentication Initiative: "E-Authentication Guidance for Federal Agencies" (OMB M-04-04)
■ NIST: "Electronic Authentication Guideline" (NIST 800-63)
■ InCommon Federation: Identity Assurance Assessment Framework, with Bronze and Silver profiles

Assurance Frameworks – Limitations
■ Identity is mostly considered as a whole
□ no distinction between different qualities of trust
■ No changes of a trust level over time
□ identity attributes are gathered during registration and are often fixed
■ Hard to reflect the uniqueness of identity providers with regard to their ability to assert certain identity attributes

Everybody is Identity Provider – Everybody is Relying Party
■ Every participant on the Internet
□ needs identity information
□ has identity information he could share
■ Aim: decentralized storage of identity information to
– reduce redundancy
– ease maintenance
[Figure: four parties holding overlapping attribute sets — Federal Registration Office (Firstname, Lastname, Birthday, Permanent Address), Bank (Firstname, Lastname, Birthday, Credit Card Number, Account Number), University (Firstname, Lastname, Birthday, is a Student, Student Number), Online Store (Firstname, Lastname, Birthday, is a Student, Delivery Address, Customer ID, Account Number)]

AGENDA
■ Motivation & Introduction
■ Related Work: Assurance Frameworks
□ Limitations
■ The Need for Levels of Assurance for Attributes
□ Our Model of a Digital Identity
□ A Layered Trust Model
■ An Identity Provider to manage Reliable Digital Identities
□ Identity Meta Information
□ SAML Attribute Statement Extensions
□ Demo
■ Conclusion

Model of a Digital Identity
[UML class diagram: a Subject registers an Account, which is managed by an Identity Provider and consists of Attributes; a Digital Identity consists of Identity Attributes, each referring to a globally known type (ClaimType); Assertions about Identity Attributes are carried in Tokens issued by an Issuer; Credentials authenticate the Subject]
■ TC-0: Trust into the identity provider
■ TC-1: Trust into the authentication process and the subject-to-account mapping
■ TC-2: Trust into the subject's attributes
■ TC-3: Trust into the token

What about Trust?
■ Claim-based identity management allows
□ a relying party to state the attributes it requires on a per-claim basis
■ Trust
□ is usually defined in a general manner
– between organizations
– complex contracts balance the risk between independent organizations

Layered Trust Model
Trust is required on two levels:
■ between the service provider and the identity provider
□ general requirement to trust the issuer of an assertion
□ = Organizational Trust
■ for a request: between the service provider and the requester
□ for a concrete request, to trust the subject of an assertion
□ = Identity Trust
[Figure: a student at University A uses a lecture subscription service with an identity issued by a remote identity provider at a partner university, across security domains and the Internet]

Comparison
■ Organizational Trust
– refers to the quality of the trust relationship between organizations
– Can we trust the issuer of a security token?
– determined out-of-band
– configurable
■ Identity Trust
– refers to the identity associated with a transaction
– Can we trust the subject in the token?
– determined during the service call
– negotiable

Model of a Digital Identity, revised
■ Organizational Trust
□ TC-1: Trust into the authentication process and the subject-to-account mapping
□ TC-3: Trust into the token
■ Identity Trust
□ TC-2: Trust into the subject's attributes

Formalism
Two aspects:
■ Is the issuer of the assertion trusted? (Organizational Trust)
isTrusted : Issuer → {trusted, untrusted}
■ Has the attribute been verified by the issuer? (Identity Trust)
isVerified : (Issuer, Claim) → {verified, unverified}
■ Trust into a claim (simple form):
h : {trusted, untrusted} × {verified, unverified} → {1, 0}
h(b1, b2) = 1, if b1 = trusted and b2 = verified; 0, otherwise
■ Trust into a claim (refined form):
h : {trusted, untrusted} × {verified, unverified} → {2, 1, 0}
h(b1, b2) = 2, if b1 = trusted and b2 = verified; 1, if b1 = untrusted and b2 = verified; 0, otherwise

Identity Provider
[Figure: identity provider components — claim types and claim values with meta data, an Attribute Service, and an STS]
□ Add, edit, and remove claim types and meta data
□ Compose claim types to digital identities
□ Request identity information and receive security tokens
□ Different protocols are possible: WS-Trust, OpenID

Identity Meta Information

Attribute Context Classes
For each attribute, we store additional trust information in an Attribute Context:
– Attribute Data Source
– Verification Status
– Verification Context
– Verification Context Declaration

SAML Attribute Statement Extensions – Example
[Listing: a SAML attribute statement (xmlns:saml="urn:oasis:names:tc:SAML:2.0:assertion") carrying identity meta information — the attribute values "MaxMustermann" and "staff@company.de" are sent together with meta data marking the data source as "user" and the verification status as "verified"]

Conclusion
□ SOA requires an open, decentralized identity management
□ Everybody is an identity provider as well as a relying party
□ Claims express the identity attributes a relying party requires
□ Use claims meta information in order to enable a relying party to rely on identity data from remote sources
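The trust function h from the Formalism slides can be made concrete with a short sketch. This is an illustration, not code from the talk; the issuer URL and all names are made up, and the refined mapping follows the slide verbatim: (trusted, verified) → 2, (untrusted, verified) → 1, everything else → 0.

```python
# Illustrative sketch of the layered trust evaluation from the
# "Formalism" slides: organizational trust (is the issuer trusted?)
# and identity trust (has the claim been verified?) combine into a
# per-claim trust level. All identifiers here are hypothetical.

def is_trusted(issuer: str, trusted_issuers: set) -> bool:
    """Organizational trust: isTrusted(Issuer) -> {trusted, untrusted}."""
    return issuer in trusted_issuers

def trust_level(issuer_trusted: bool, claim_verified: bool) -> int:
    """Refined h : (b1, b2) -> {2, 1, 0}, exactly as on the slide."""
    if issuer_trusted and claim_verified:
        return 2
    if not issuer_trusted and claim_verified:
        return 1
    return 0

# Example: a verified claim asserted by a trusted identity provider
# receives the highest per-claim trust level.
TRUSTED = {"https://idp.partner-university.test"}
level = trust_level(is_trusted("https://idp.partner-university.test", TRUSTED), True)
```

A relying party could evaluate this function per claim rather than per token, which is the point of the layered model: the same issuer may yield different trust levels for different attributes.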
Mitchell Information Security Group Information Security Group Information Security Group Royal Holloway, University of Royal Holloway, University of Royal Holloway, University of London London London http://www.isg.rhul.ac.uk http://www.isg.rhul.ac.uk http://www.isg.rhul.ac.uk H.Al-Sinani@rhul.ac.uk W.A.Alrodhan@rhul.ac.uk C.Mitchell@rhul.ac.uk ABSTRACT a number of identity management systems have been pro- Whilst the growing number of identity management sys- posed. tems have the potential to reduce the threat of identity at- Identity management deals with uniquely identifying indi- tacks, major deployment problems remain because of the viduals in a system, and with effectively controlling access to lack of interoperability between such systems. In this paper the system resources by managing the rights and privileges we propose a novel scheme to provide interoperability be- associated with digital identities. The most important ser- tween two of the most widely discussed identity management vice provided by an identity management system is authenti- systems, namely Microsoft CardSpace and Liberty. In this cation. Such a system may also support other services, such scheme, CardSpace users are able to obtain an assertion to- as pre-authentication, authorisation, single sign-on, identity ken from a Liberty-enabled identity provider that will satisfy repository management, user self-service registration, and the security requirements of a CardSpace-enabled relying audit. Examples of identity management systems include party. We specify the operation of the integration scheme CardSpace1 , Liberty2 , OpenID3 , and Shibboleth4 [5, 8, 17, and also describe an implementation of a proof-of-concept 46, 50]. prototype. Additionally, security and operational analyses Most identity management architectures involve the fol- are provided. lowing main roles. 1. The identity provider (IdP), which issues an identity Categories and Subject Descriptors token to a user. 
K.6.5 [Management of Computing and Information Systems]: Security and protection 2. The service provider (SP), or the relying party (RP) in CardSpace terminology, which consumes the identity token issued by the IdP in order to identify the user, General Terms before granting him/her access. Security 3. The user, also known as the principal. Keywords 4. The user agent, i.e. software employed by a user to send Identity Management, CardSpace, Liberty Alliance Project, requests to webservers and receive data from them, Interoperability, SAML, Browser Extension such as a web browser. Typically, the user agent pro- cesses protocol messages on behalf of the user, and 1. INTRODUCTION prompts the user to make decisions, provide secrets, In line with the continuing increase in the number of on- etc. line services requiring authentication, there has been a pro- portional rise in the number of digital identities needed for An identity provider supplies a user agent with an authen- authentication purposes. This has contributed to the re- tication token that can be consumed by a particular service cent rapid growth in identity-oriented attacks, such as phish- provider. Whilst one service provider might solely support ing, pharming, etc. In an attempt to mitigate such attacks, CardSpace, another might only support Liberty. Therefore, to make these systems available to the largest possible group ∗This author is sponsored by the Diwan of Royal Court, of users, effective interoperability between systems is needed. Sultanate of Oman. In this paper we investigate a case involving a CardSpace- enabled relying party, a Liberty-enabled identity provider, and a user agent that is (only) CardSpace-enabled. The goal is to develop an approach to integration that is as transpar- Permission to make digital or hard copies of all or part of this work for ent as possible to both identity providers and relying parties. 
personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies 1 http://msdn.microsoft.com/en-us/library/aa480189. bear this notice and the full citation on the first page. To copy otherwise, to aspx republish, to post on servers or to redistribute to lists, requires prior specific 2 http://www.projectliberty.org/ permission and/or a fee. 3 IDtrust ’10, April 13–15, 2010, Gaithersburg, MD http://openid.net/ 4 Copyright 2010 ACM ISBN 978-1-60558-895-7/10/04 ...$10.00. http://shibboleth.internet2.edu/ 12 We have chosen to consider the integration of Liberty The InfoCards themselves do not contain any sensitive with CardSpace because of Liberty’s wide adoption (see sec- information; instead an InfoCard carries metadata that in- tion 2.2.1). Currently, it is a leading identity management dicates the types of personal data that are associated with architecture, that has gained the acceptance of a number of this identity, and from where assertions regarding this data technology-leading companies and organisations. Comple- can be obtained. The data referred to by personal cards is menting this, the wide use of Windows, recent versions of stored on the user machine, whereas the data referred to by which incorporate CardSpace, means that enabling interop- a managed card is held by the identity provider that issued eration between the two systems is likely to be of significance it [6, 16, 18, 24, 34, 35, 38]. for large numbers of identity management users and service By default, CardSpace is supported in Internet Explorer providers. Another reason for choosing Liberty is because of (IE) from version 7 onwards. Extensions to other browsers, the similarity between the message flows in its ID-FF profile such as Firefox5 , and Safari6 also exist. Microsoft has re- and CardSpace. cently released an updated version of CardSpace, known The remainder of the paper is organised as follows. 
Sec- as Windows CardSpace 2.0 Beta 27 . However, in this pa- tion 2 presents an overview of CardSpace and Liberty, and per we refer throughout to the CardSpace version that is section 3 contains the proposed integration scheme. In sec- shipped by default as part of Windows Vista and Windows tion 4, we provide an operational analysis of the scheme and, 7, which has also been approved as an OASIS standard un- in section 5, we describe a prototype implementation. Sec- der the name ‘Identity Metasystem Interoperability Version tion 6 highlights possible areas for related work, and, finally, 1.0’ (IMI 1.0) [28]. section 7 concludes the paper. 2.1.2 CardSpace Personal Cards 2. CARDSPACE AND LIBERTY The core idea introduced in this paper is to use CardSpace personal cards to make Liberty identity providers available We provide an introduction to the CardSpace and Lib- via the CardSpace identity selector. We therefore next de- erty identity management systems. SAML is also briefly scribe CardSpace personal cards. outlined. 2.1 CardSpace Creation of Personal Cards. We first give a general introduction to CardSpace, cover- Prerequisites for use of a CardSpace personal card include: ing relevant operational aspects. 1. a CardSpace-enabled RP; and 2.1.1 Introduction to CardSpace 2. a CardSpace-enabled user agent, e.g. a web browser CardSpace is Microsoft’s implementation of a digital iden- capable of invoking the CardSpace identity selector, tity metasystem, in which users can manage digital identities such as those shipped as part of Windows Vista and issued by a variety of identity providers, and use them in a Windows 7. range of contexts to access online services. In CardSpace, The identity selector allows a user to create a personal digital identities are represented to users as Information card and populate its fields with self-asserted claims. To pro- Cards (or InfoCards). 
From the CardSpace perspective, tect users from disclosing sensitive information, CardSpace InfoCards are XML-based files that list the types of claim restricts the contents of personal cards to non-sensitive data, made by one party about itself or another party. CardSpace such as that published in telephone directories. Personal is designed to reduce reliance on username-password authen- cards currently only support 14 editable claim types, namely tication, and to provide a consistent authentication experi- First Name, Last Name, Email Address, Street, City, State, ence across the Web to improve user understanding of the Postal Code, Country/Region, Home Phone, Other Phone, authentication process. It is claimed that CardSpace is also Mobile Phone, Date of Birth, Gender, and Web Page. Data designed to reflect the seven identity laws promulgated by inserted in personal cards is stored in encrypted form on the Microsoft [6, 10, 17, 34]. user machine. The concept of an InfoCard is inspired by real-world cards, When a user creates a new personal card, CardSpace gen- such as driving licences and credit cards. A user can employ erates an ID and a master key for this card. The card ID is one InfoCard with multiple websites. Alternatively, just as a globally unique identifier (GUID), and the master key is different physical ID cards are used in distinct situations, 32 bytes of random data. separate InfoCards can be used at different websites, help- ing to enhance user privacy and security. If InfoCards are obtained from different IdPs, the credentials referred to by Using Personal Cards. When using personal cards, CardSpace adopts the follow- such cards are stored in distinct locations, potentially im- ing protocol. We describe the protocol for the case where proving reliability and security, as well as giving users flexi- the RP does not employ a security token service (STS8 ). bility in choosing points of trust. There are two types of InfoCards: personal (self-issued) 1. 
User agent → RP. HTTP/S request: GET (login cards and managed cards. Personal cards are created by page). users themselves, and the claims listed in such an InfoCard 5 https://addons.mozilla.org/en-US/firefox/addon/ are asserted by the self-issued identity provider (SIP) that 10292 co-exists with the CardSpace identity selector on the user 6 http://www.hccp.org/safari-plug-in.html machine. In this paper we use personal cards to enable 7 http://technet.microsoft.com/en-us/library/ interoperation between CardSpace and Liberty. Managed dd996657(WS.10).aspx cards, on the other hand, are obtained from remote identity 8 The STS is responsible for security policy and token man- providers. agement within an IdP and, optionally, within an RP [27]. 13 2. RP → user agent. HTTP/S response. A login page Private Personal Identifiers. is returned containing the CardSpace-enabling tags in The private personal identifier (PPID) is a unique iden- which the RP security policy is embedded. tifier linking a specific InfoCard to a particular RP [6, 7, 38]. CardSpace RPs can use the PPID along with a digital 3. User → user agent. The user agent offers the user the signature to authenticate a user. option to use CardSpace (e.g. via a button on the RP When a user uses a personal card at an RP for the first web page); selection of this option causes the agent to time, CardSpace generates a site-specific: invoke the CardSpace identity selector, passing the RP policy to the selector. Note that if this is the first time • PPID by combining the card ID with data taken from that this RP has been contacted, the identity selector the RP certificate; and will display the identity of the RP, giving the user the option either to proceed or to abort the protocol. • signature key pair by combining the card master key with data taken from the RP certificate. 4. User agent → user agent (identity selector → Info- Cards). 
The CardSpace identity selector, after evalu- In both cases, the domain name or IP address of the RP is ating the RP security policy, highlights the InfoCards used if no RP certificate is available. that match the policy, and greys out those that do Since the PPID and key pair are RP-specific, the PPID not. InfoCards previously used for this particular RP does not function as a global user identifier, helping to en- are displayed in the upper half of the selector screen. hance user privacy. In addition, compromising the PPID and key pair for one RP does not allow an adversary to im- 5. User → user agent (user → identity selector). The user personate the user at other RPs. The CardSpace identity chooses a personal card. (Alternatively, the user could selector only displays a shortened version of the PPID to create and choose a new personal card). The user can protect against social engineering attacks and to improve also preview the card (with its associated claims) to readability. see which claim values are being released. Note that When a user first registers with an RP, the RP retrieves the selected InfoCard may contain several claims, but the PPID and the public key from the received authentica- only the claims explicitly requested in the RP security tion token, and stores them. If a personal InfoCard is re-used policy will be passed to the requesting RP. at a site, the supplied authentication token will contain the same PPID and public key as used previously, signed us- 6. User agent user agent (identity selector SIP). ing the corresponding private key. The RP compares the The identity selector creates and sends a SAML-based received PPID and public key with its stored values, and Request Security Token (RST) to the SIP, which re- verifies the digital signature. If all checks succeed it has sponds with a SAML-based Request Security Token assurance that it is the same user. Response (RSTR). The PPID could be used on its own as a shared secret 7. 
User agent → user agent (identity selector → user to authenticate a user to an RP. However, it is recom- agent). The RSTR is then passed to the user agent, mended that the associated (public) signature verification which forwards it to the RP. key, as held by the RP, should also always be used to verify the signed authentication token to provide a more robust 8. RP → user. The RP validates the token, and, if satis- authentication method [6]. fied, grants access to the user. The managed card operational protocol is similar, except 2.1.3 CardSpace Protocols that the remote IdP specified in the InfoCard is contacted In order to maximise interoperability with non-Windows instead of the SIP. The CardSpace identity selector then platforms, CardSpace has been specifically designed to use uses the standard identity metasystem protocols (see sec- open standards-based protocols, notably the WS-* standards, tion 2.1.3) to first retrieve the IdP security policy9 and then the most significant of which are listed below. obtain a security token representing the selected digital iden- WS-Policy/WS-SecurityPolicy is used to describe se- tity from the STS of the remote IdP. The identity selector curity policies [3, 21]. Note that a website can also then passes the received token to the user agent, optionally describe its policy in HTML/XHTML. after first obtaining permission from the user10 [27, 41]. For CardSpace to work, both the RP and the IdP must WS-MetadataExchange is used to fetch security policies be CardSpace-enabled. The problem that we address here and exchange service description metadata over the is the incompatibility issue that will occur if the RP is Internet [4]. Note that a website can also transmit its CardSpace-enabled whereas the IdP is not, but is instead security policy using HTTP/S. Liberty-enabled. Addressing this issue could help to extend the applicability of CardSpace. WS-Trust is used to acquire security tokens (e.g. SAML tokens) from IdPs [2]. 
9 Depending on the IdP security policy, the user may be re- quested to provide credentials for authentication to the se- WS-Security is used to securely deliver security tokens to lected IdP. The authentication methods currently supported RPs [37]. Note that HTTP/S can also be used. by CardSpace include username-password authentication, a KerberosV5 service ticket, an X.509v3 certificate, and a self- 2.1.4 Proof Keys issued token. 10 This may involve presenting the user with a ‘display token’, A SAML security token can be coupled with cryptographic prepared by the remote IdP, listing the claim values asserted evidence to demonstrate the sender’s rightful possession of in the ‘real’ security token; the identity selector will only the token. A ‘proof key’ is a key associated with a security continue if the user is willing to release such values. token, and the data string used to demonstrate the sender’s 14 knowledge of that key (e.g. through the inclusion of a digital signature or MAC computed using the key) is called the ‘proof-of-possession’ of the security token [27, 38]. A security token can be associated with two types of proof key. 1. Symmetric proof keys If a symmetric key token is requested, a symmetric proof key is established between the identity selector and the CardSpace-enabled IdP [38], which is then re- vealed to the RP. This key is used to prove the sub- Figure 1: The Liberty model ject’s rightful possession of the security token. Whilst the use of such a key may optimise token processing in terms of speed and efficiency [36], it involves revealing principal (or a user) can federate its various identities to a the identity of the RP to the IdP, which is not ideal single identity issued by an identity provider, so that the user from a privacy perspective. can access services provided by service providers belonging to the same circle of trust by authenticating just once to 2. Asymmetric proof keys the identity provider. 
This relies on a pre-established re- If an asymmetric key token is requested, the iden- lationship between the identity provider and every service tity selector generates an ephemeral RSA key pair and provider in the circle of trust. sends the public part of the key to the CardSpace- The Liberty specifications are divided into three frame- enabled IdP. The identity selector also sends a sup- works: the identity federation framework (ID-FF) [49], the porting signature to prove ownership of the correspond- identity web services framework (ID-WSF) [47] and the ser- ing private key [38]. If approved by the IdP, the public vice interface specifications (ID-SIS) [30]. In this paper we part is sent to the RP in the security token. The pri- focus on the ID-FF. The ID-FF provides approaches for im- vate part of the RSA key pair is then used to prove plementing federation and SSO, including supporting mech- the subject’s rightful possession of the security token. anisms such as session management and identity/account Although the use of such a key may not be as efficient linkage. as the symmetric approach, it helps to protect user 2.2.2 Liberty Functional Requirements privacy since the identity of the RP does not need to be disclosed to the IdP. The Liberty architecture [49] supports the following ac- tivities. It merits mentioning that the default behaviour of the Identity federation This is the process of linking a user’s CardSpace identity selector is different in the special case SP identity with a specific IdP (given user consent). of browser-based client interactions with a website, in which At the time of federation, two user pseudonyms14 are case ‘bearer’ tokens are requested. Because a web browser created for the IdP-SP association, one for use by each is only capable of submitting a token to a website passively party. De-federation is the reverse process. over HTTP without any proof-of-possession, bearer tokens with no proof keys are used [36]. 
2.2 Liberty

We next give a general introduction to Liberty, covering relevant operational aspects.

2.2.1 Introduction to Liberty

The Liberty Alliance is a large consortium, established in 2001 by approximately 30 organisations; it now has a global membership of more than 150 (see http://www.projectliberty.org/liberty/membership/current_members/). The Liberty Alliance Project (or simply Liberty) builds open, standards-based specifications for federated identity, provides interoperability testing, and helps to prevent identity theft. Liberty also aims to establish best practices and business guidelines for identity federation. According to its website (http://www.projectliberty.org/liberty/adoption/), Liberty has been widely adopted with, as of 2006, more than one billion Liberty-enabled identities and devices. As of mid 2009, the work of the Liberty Alliance is being adopted by the Kantara Initiative (http://kantarainitiative.org/).

[Figure 1: The Liberty model]

Figure 1 shows the general Liberty model, which is essentially a single sign-on (SSO) model [11]. In this model, a principal (or a user) can federate its various identities to a single identity issued by an identity provider, so that the user can access services provided by service providers belonging to the same circle of trust by authenticating just once to the identity provider. This relies on a pre-established relationship between the identity provider and every service provider in the circle of trust.

The Liberty specifications are divided into three frameworks: the identity federation framework (ID-FF) [49], the identity web services framework (ID-WSF) [47] and the service interface specifications (ID-SIS) [30]. In this paper we focus on the ID-FF. The ID-FF provides approaches for implementing federation and SSO, including supporting mechanisms such as session management and identity/account linkage.

2.2.2 Liberty Functional Requirements

The Liberty architecture [49] supports the following activities.

Identity federation. This is the process of linking a user's SP identity with a specific IdP (given user consent). At the time of federation, two user pseudonyms are created for the IdP-SP association, one for use by each party (a pseudonym is an opaque but unique handle for the user, enabling the user's real identity to remain private; pseudonyms can be temporary or persistent, and are included in SAML tokens exchanged between a Liberty IdP and SP). De-federation is the reverse process.

Single sign-on. This feature enables a user to log in once to an IdP in a Liberty circle of trust and subsequently use SPs belonging to this circle without the need to log in again. Global log-out is the reverse process.

Anonymity. A Liberty SP may request a Liberty IdP to supply a temporary pseudonym that will preserve the anonymity of a user. This identifier may be used to obtain information for or about the user (given their consent) without requiring the user to consent to a long term relationship with the SP [49].

2.2.3 Single Sign-on and Federation Profiles

The Liberty ID-FF protocol specification [14] defines the SSO and federation protocol. The ID-FF bindings and profile specification [12] defines profiles, i.e. mappings of ID-FF protocol messages to particular communication protocols (e.g. HTTP [22]). The latter document also describes the common interactions and processing rules for these profiles. The single sign-on and federation protocol has three associated profiles, summarised below.
Liberty artifact profile. This profile involves embedding an artifact (i.e. an opaque handle) in a URI exchanged between the IdP and SP via web redirection, and also requires direct (background) communication between the SP and IdP [49]. The SP uses the artifact to retrieve the full SAML assertion from the IdP. As it requires direct SP-IdP communication, which is inconsistent with the CardSpace approach (in CardSpace, all RP-IdP communications must go through the identity selector on the user machine), the proposed scheme does not support this profile.

Liberty browser post profile. JavaScript-enabled browsers can perform an HTTP redirect between IdPs and SPs by using JavaScript to automatically send a form (containing the authentication data). This profile embeds the entire SAML assertion in an HTML form. As a result, it does not use an artifact and does not require any direct communication between the SP and the IdP. The scheme proposed here supports this profile.
Liberty-enabled client (and proxy) profile. This profile defines interactions between Liberty-enabled clients (and/or proxies), SPs, and IdPs. A Liberty-enabled client (LEC) is a user agent that can directly communicate with the IdP that the user intends to use to support its interactions with an SP. In addition, the LEC sends and receives Liberty messages in the body of HTTP requests/responses using 'post', rather than relying upon HTTP redirects and encoding protocol parameters into URLs. Therefore, LECs do not impose any restrictions on the size of the protocol messages. Interactions between a user agent and an IdP are SOAP-based, and the protocol messages include Liberty-specified HTTP headers.

Although it adds complexity, this profile seems like a natural fit to the proposed scheme. We propose to use the CardSpace identity selector to act as a Liberty-enabled client. In our scheme, the identities of the IdPs are stored on CardSpace personal cards.

2.2.4 Proof Keys

The Liberty ID-FF supports SAML 2.0 assertions as a security token type. The SAML 2.0 specifications offer three proof-of-possession methods (also referred to as subject confirmation methods): Holder-of-Key (HoK), Sender-Vouches, and bearer [13]. The HoK method [45] can be used to address both the symmetric and asymmetric proof-of-possession requirements of a CardSpace-enabled RP.

2.3 SAML

SAML is an XML-based standard for exchanging identity-related information across the Internet. The SAML specifications cover four major elements.

A SAML assertion can contain three types of statement:

1. an authentication statement, asserting that a user was authenticated at a particular time using a particular authentication method;

2. an attribute statement, asserting that a user is associated with certain attributes; and

3. an authorization decision statement, asserting that a particular user is permitted to perform a certain action on a specific resource.

SAML protocols define data structures for sending SAML requests and returning assertions.

SAML bindings map SAML protocol messages onto standard communication protocols, e.g. HTTP.

SAML profiles describe how SAML assertions, protocols and bindings are combined together to support a particular use case.

SAML 1.0 [26] was first adopted as an OASIS standard in 2002; a minor revision, SAML 1.1 [33], was formally adopted in 2003. A major revision led to SAML 2.0 [13], which became a standard in 2005. The differences between versions 1.1 and 2.0 (see https://spaces.internet2.edu/display/SHIB/SAMLDiffs) are significant, and SAML assertions of the two types are incompatible.

Finally note that the CardSpace SIP currently only issues tokens conforming to SAML 1.1 [38], whereas the Liberty specifications require IdPs to generate assertions using SAML 2.0 syntax.
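To illustrate the first of the statement types, the following is a minimal sketch, written by us, of a SAML 2.0 assertion carrying an authentication statement of the kind a Liberty IdP would issue. It is deliberately simplified: a real assertion would be schema-valid, signed by the IdP, and carry conditions such as audience restrictions.

```javascript
// Minimal, simplified SAML 2.0 assertion with an authentication statement.
// (Illustrative only; not schema-validated and unsigned.)
function buildAuthnAssertion(subject, instant) {
  return [
    '<saml:Assertion xmlns:saml="urn:oasis:names:tc:SAML:2.0:assertion" Version="2.0">',
    `  <saml:Subject><saml:NameID>${subject}</saml:NameID></saml:Subject>`,
    `  <saml:AuthnStatement AuthnInstant="${instant}">`,
    '    <saml:AuthnContext>',
    '      <saml:AuthnContextClassRef>urn:oasis:names:tc:SAML:2.0:ac:classes:Password</saml:AuthnContextClassRef>',
    '    </saml:AuthnContext>',
    '  </saml:AuthnStatement>',
    '</saml:Assertion>',
  ].join('\n');
}

const assertion = buildAuthnAssertion('user@example.com', '2010-04-13T09:00:00Z');
console.log(assertion.includes('AuthnStatement'));
```

An attribute statement or an authorization decision statement would occupy the same position in the assertion, which is why a single assertion format can serve all three purposes.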
3. THE INTEGRATION SCHEME

This section provides an overview of the scheme, and also gives a brief description of its protocol flow. However, we first highlight the main differences between the integration scheme proposed here and a previously proposed scheme of this type.

3.1 Previous Work

The integration scheme proposed here builds on a previous proposal for CardSpace-Liberty integration [1], referred to below as the AM scheme. Whilst the scheme proposed here has some properties in common with this previous proposal, for example both approaches concentrate on supporting integration at the client rather than at the server, there are a number of important differences.

Instead of focusing on CardSpace users only, as is the case with the scheme described here, the AM scheme allows for full interoperability even in the case where the SP is Liberty-enabled and the IdP is CardSpace-enabled. However, since no prototype has been developed, issues which might arise during deployment have not been explored. By contrast, the scheme described below has been prototyped, and hence greater confidence can be derived in its practicality.

One important goal for any identity management system is ease of use. However, user interface issues, notably the operation of the integration software on the client platform, have not been explored for the AM scheme, whereas the proposal here addresses this through a combination of a browser extension and the CardSpace interface. In addition, whereas the relationship between the integration software and the web browser is not specified for the AM scheme, this issue has been resolved for the scheme presented here by implementing the functionality in a web browser plug-in residing on the user machine.

The means by which the integration software is triggered is also not clear for the AM scheme.
For example, if the integration software is assumed to run at all times, then problems arise if the user wants to use CardSpace or Liberty without integration. By contrast, several ways of addressing this particular issue are described in sections 3.2.3 and 4.3.

The AM scheme does not address how to handle the private personal identifier (PPID), described in section 2.1.2, when supporting interoperation between RPs and Liberty-enabled IdPs. Additionally, it is not clear whether providing the full address of the IdP is the responsibility of the RP, the integration software, or the user. These issues are addressed in sections 3.2 and 5.

3.2 Integration Protocol

We now present the novel protocol.
3.2.1 System Parties

As stated earlier, the integration scheme addresses the incompatibility issue arising if the RP is CardSpace-enabled and the IdP is Liberty-enabled. The parties involved are as follows.

1. A CardSpace-enabled RP.

2. A CardSpace-enabled user agent (e.g. a suitable web browser).

3. A Liberty-enabled IdP.

4. The integration browser extension (which must first be installed).

Note that there is no need for a Liberty-enabled user agent. Instead the user only needs to install the integration browser extension.

Figure 2 gives a simplified picture of the high-level interactions between system parties on the user machine. The parties shown are the browser extension, the user agent (browser), the identity selector, and the SIP. The arrows indicate information flows.

[Figure 2: Data flows between client parties]
3.2.2 Preconditions

The scheme has the following requirements.

• The user must have an existing relationship with a CardSpace RP.

• The user must have an existing relationship with a Liberty-enabled IdP, and hence the IdP has a means of authenticating the user.

• The CardSpace-enabled RP must not employ an STS (see section 4.7). Instead, the RP must express its security policy using HTML/XHTML, and interactions between the CardSpace identity selector and the RP must be based on HTTP/S via a web browser. This is because of the use of a browser extension (see section 3.2.4) in the scheme, and a browser extension by itself is incapable of managing the necessary communications with an STS.

• The CardSpace-enabled RP must support SAML 2.0 (see section 2.3).

• As well as being able to verify the InfoCard signature, the CardSpace-enabled RP must be able to verify the IdP digital signature in the provided SAML token.

• The Liberty-enabled IdP must be prepared to provide SAML assertions for SPs for which a federation agreement does not exist for the user concerned (it is thus not necessary for the user to Liberty-federate the IdP with the RP, which would in any case be difficult to achieve given that we are not requiring the RP to be Liberty-enabled). In the absence of the IdP-SP-specific user pseudonyms (which would exist if federation had occurred), the IdP must be prepared to use the InfoCard PPID for the user in place of Liberty pseudonyms in the SAML request and response messages (and in the created SAML assertion). This avoids changes to the Liberty message formats, but does require a minor policy/operational change to the Liberty-enabled IdP.

3.2.3 LibertyCards

Either prior to, or during, use of the integration protocol, the user must create a special personal card, referred to as a LibertyCard, which will represent the Liberty IdP. This card must contain the URL of the Liberty IdP it represents, and must also contain a predefined sequence of characters, e.g. the word 'Liberty', which will be used to trigger the integration software (see section 4.3).

The browser extension, described in section 3.2.4, must process the policy statement provided by the RP before it is passed to the identity selector. It must first decide whether or not the RP policy requirements can be met by one or more of the LibertyCards; if not, it leaves the policy statement unchanged and plays no further active part in processing. However, if use of a LibertyCard is appropriate, then the browser extension changes the policy to include the types of claim employed by LibertyCards. For example, if the URL of the Liberty IdP is stored in the web page field of the LibertyCard, then the browser extension must modify the RP security policy to add the web page claim (see section 5.3.1 for further details). Note that adding the claim types to the RP security policy is necessary to ensure that the token supplied by the SIP contains the values of these claims, which can then be processed by the browser extension; otherwise these values would not be available to it. (Unfortunately, whilst necessary for the operation of the browser extension, adding claims to the RP policy means that CardSpace-compliant IdPs for which the user has 'managed' InfoCards, and which might otherwise be acceptable to the RP, cannot be selected by the user.)

One approach that would avoid the need to store the URL of the IdP in a personal card would involve the browser extension prompting the user to enter the URL of the IdP that they wish to contact, after they have selected a card. This could occur as part of step 8 in section 3.2.5. However this approach is not adopted here because it would require the user to manually enter the URL every time a LibertyCard is used, causing usability issues.
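The policy rewriting just described can be sketched as follows. The claim URIs are the standard InfoCard claim URIs for the PPID, city (locality) and web page fields; the function name and the exact rewriting rule are our simplification of the extension's behaviour.

```javascript
// Sketch of the browser extension's policy rewriting (names are ours):
// if the RP asks only for the PPID, add the claim types used by
// LibertyCards (city and web page) so the SIP token will carry the
// trigger word and the Liberty IdP URL.
const PPID =
  'http://schemas.xmlsoap.org/ws/2005/05/identity/claims/privatepersonalidentifier';
const CITY = 'http://schemas.xmlsoap.org/ws/2005/05/identity/claims/locality';
const WEBPAGE = 'http://schemas.xmlsoap.org/ws/2005/05/identity/claims/webpage';

function rewriteRequiredClaims(requiredClaims) {
  // Leave the policy untouched unless the PPID is the only required claim,
  // so that ordinary CardSpace use is unaffected.
  if (requiredClaims.length !== 1 || requiredClaims[0] !== PPID) {
    return requiredClaims;
  }
  return [PPID, CITY, WEBPAGE];
}

const rewritten = rewriteRequiredClaims([PPID]);
console.log(rewritten.length);
```

A policy requesting any other claim passes through unchanged, which is what lets the extension stay out of the way when a LibertyCard is not applicable.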
3.2.4 Browser Extension

The integration scheme is based on a browser extension that is able to:

• automatically execute;

• read and inspect browser-rendered web pages;

• modify rendered web pages if certain conditions hold;

• intercept, inspect and modify messages exchanged between a CardSpace identity selector and a CardSpace-enabled RP (via a browser);

• automatically forward security tokens (via browser-based HTTP redirects) to Liberty-enabled IdPs and to CardSpace-enabled RPs; and

• provide a means for a user to enable or disable it.
3.2.5 Protocol Operation

Figure 3 gives a simplified sketch of the integration scheme. The protocol operates as follows (with step numbers as shown in figure 3). Steps 1, 2, 4-7 and 12 of the integration scheme are the same as steps 1, 2, 3-6 and 8, respectively, of the CardSpace personal card protocol given in section 2.1.2, and hence are not described again here.

3. User agent → user agent (browser extension → browser). The browser extension scans the login page to detect whether the RP website supports CardSpace. If so, it starts to process the browser-rendered login page, including embedding a function into the page to intercept the authentication token that will later be returned by the CardSpace identity selector. If not, the browser extension terminates.
8. User agent → user agent (identity selector → browser extension). Unlike in the 'standard' case, the RSTR is not sent to the RP; instead the browser extension intercepts the RSTR (a SAML authentication response), converts it into a SAML authentication request, and forwards it to the appropriate Liberty-enabled IdP. Note that the detailed format of the SAML authentication request will depend on the Liberty profile being used (see discussion below).

9. Liberty-enabled IdP ↔ user. If necessary, the Liberty-enabled IdP authenticates the user.

10. Liberty-enabled IdP → user agent. The IdP sends a SAML authentication response to the user agent. This response is also Liberty profile-dependent (see discussion below).

11. User agent → RP. The user agent forwards the token to the RP, optionally after first obtaining permission from the user (see section 4.4).

The detailed operation of steps 8 and 10 is dependent on the Liberty profile in use between the user agent and the IdP. The construction of the SAML authentication request in step 8 differs depending on whether the Liberty browser post (LBP) profile or the Liberty-enabled client (LEC) profile is in use. For example, the URI identifier 'http://projectliberty.org/profiles/brws-post' must be used when employing the LBP profile, whereas 'http://projectliberty.org/profiles/lecp' must be used when employing the LEC profile. In addition, when using the LEC profile, the authentication request must be submitted to the IdP as a SOAP [25] request with a Liberty-enabled header, whereas when using LBP, the authentication request to the IdP can be embedded in an HTML form.

The details of steps 10 and 11 differ significantly depending on which of the two Liberty profiles is in use. In the LEC profile, in step 10 the IdP returns the authentication response to the client, which is responsible for forwarding it to the specified SP. In the LBP profile, however, the IdP sends the HTML form carrying the authentication response to the user agent, and redirects the user via the user agent to the specified SP. Such a procedure would deny the browser extension the opportunity to intercept the communication and give the user the choice whether or not to allow the token to be sent to the RP (as is normally the case for CardSpace). We therefore require a small modification to the way that the Liberty-enabled IdP operates. The IdP must be modified to redirect the user agent to a web page at the IdP server, rather than at the RP, thereby giving the browser extension control. This could be achieved by requiring the IdP to set the action attribute of the HTML form to an empty string or to # (observe that, in the standard LBP profile case, the action attribute is set to the URL address of the requesting SP, and the IdP redirects the user agent to that SP; note also that whilst an empty or hash (#) action attribute has been shown to work successfully with IE7 and IE8, other browsers may not support it, in which case the action attribute may need to be set to a relative URL for the IdP login page). In step 11, the browser extension resets the action attribute to the URL address of the appropriate CardSpace RP, and, after obtaining user permission to release the authentication token to the given RP, automatically submits the HTML form, redirecting the user agent to the RP website. This small change to the normal operation of the Liberty IdP helps to enhance user control (see sections 4.4 and 5.3.3), hence implementing Microsoft's first identity law [6, 10, 17, 34].

It merits mentioning that both the LBP and LEC profiles require the SP URL address to be specified as the value of a dedicated element in the SAML authentication request [12]. To keep the changes at the IdP side to a minimum, the value of this field could be set to #, implicitly instructing the IdP to include this value instead of the SP's URL in the action attribute of the HTML form sent back to the user agent. Further discussion of the LBP and LEC profiles is given in section 4.2.

Given that we have assumed that the RP supports SAML 2.0 tokens, there is no need to modify the proof-of-possession data, since the RP can use the Liberty ID-FF supported HoK [45] method (which can be symmetric or asymmetric) to express its proof-of-possession requirements. However, a symmetric proof key should only be used if the user is willing to disclose the identity of the RP to the IdP, and if the RP holds a valid certificate. For browser-based applications (and also where no proof-of-possession is needed), the proposed scheme supports bearer tokens [13, 36, 38].
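The profile-dependence of the request built in step 8 can be sketched as follows. The two profile URIs are the ones quoted above; the XML skeleton and helper function are our simplification of a Liberty ID-FF authentication request, not the full schema.

```javascript
// Sketch (names ours) of how the Liberty profile in use changes the SAML
// authentication request built in step 8: only the ProtocolProfile URI is
// shown varying here.
const PROFILE_URIS = {
  LBP: 'http://projectliberty.org/profiles/brws-post',
  LEC: 'http://projectliberty.org/profiles/lecp',
};

function buildAuthnRequest(profile, requestId) {
  // For LEC the request would be wrapped in a SOAP envelope and POSTed with
  // Liberty-enabled headers; for LBP it is embedded in an HTML form field.
  return (
    `<lib:AuthnRequest xmlns:lib="urn:liberty:iff:2003-08" RequestID="${requestId}">` +
    `<lib:ProtocolProfile>${PROFILE_URIS[profile]}</lib:ProtocolProfile>` +
    `</lib:AuthnRequest>`
  );
}

const lbpRequest = buildAuthnRequest('LBP', '_a1b2c3');
const lecRequest = buildAuthnRequest('LEC', '_d4e5f6');
console.log(lbpRequest.includes('brws-post'));
```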
Finally observe that the additional steps above can be integrated into the current CardSpace framework relatively easily, as the prototype implementation shows.

[Figure 3: Protocol exchanges]

4. DISCUSSION AND ANALYSIS

We now consider implementation and applicability issues of the scheme.

4.1 Differences in Scope

There is a key difference between the Liberty ID-FF and CardSpace frameworks. CardSpace allows IdPs to assert a range of attributes about users (including simple authentication assertions), whereas Liberty ID-FF only supports authentication assertions. In CardSpace, the user attributes to be asserted are specified in a SAML attribute statement contained in a SAML request that can be processed by the local SIP or the remote CardSpace-enabled IdP. However, a Liberty ID-FF conformant IdP is only required to generate SAML authentication statements (and not assert user attributes), which gives rise to an interoperation problem. Two possible solutions are as follows.

1. It could be assumed that the CardSpace RP is only concerned with user authentication (which seems likely to be a common case). In such a case a LibertyCard contains the IdP URL and the trigger word, and a LibertyCard will only be used if the RP policy requests an assertion solely of the PPID attribute, e.g. by including 'http://schemas.xmlsoap.org/ws/2005/05/identity/claims/privatepersonalidentifier' in the list of required claims. In such a case, the browser extension will modify the RP policy to ensure it includes the fields used in LibertyCards (see section 3.2.3). On selection of a LibertyCard, the browser extension (as in step 8 in section 3.2.5) intercepts, creates and forwards a SAML authentication request to the user-selected IdP. While this is a straightforward task, it limits the scope of applicability of the scheme.

2. Alternatively, it could be assumed that the CardSpace-enabled RP is concerned with both user authentication and the assertion of user attributes, and that the RP policy permits assertions (for user attributes only) to be provided by the SIP. In this case, along with requiring the PPID, the RP security policy would also specify the attributes required, leading the identity selector to highlight the user-created LibertyCards that satisfy the requirements. To ensure that no changes are required at either the RP or the IdP, the browser extension could store attribute assertions created by the SIP. The browser extension would then create the SAML authentication request according to the Liberty ID-FF standards, and forward it to the specified IdP. When the browser extension receives the response containing the authentication assertion from the IdP, it would add appropriate attribute assertion(s) from its local cache and then forward the entire package to the RP. However, if the RP security policy dictates that security tokens must be wholly signed by the issuing IdP, then this solution would fail.

The prototype implementation, described in section 5, implements the first approach.

4.2 Liberty Profiles

To maximise applicability, the integration scheme supports both the Liberty browser post (LBP) and Liberty-enabled client (LEC) profiles, introduced in section 2.2.3. However, the prototype described in section 5 only implements the LBP profile.

In the LEC profile, interactions between a user agent and an IdP are SOAP-based, and the protocol messages include Liberty-specified HTTP headers indicating that the sender is Liberty-enabled. Under the LEC profile, the client must submit the authentication request to the IdP as a SOAP request, whereas, when using the LBP profile, the request can be embedded in an HTML form containing a field called 'LAREQ' holding the authentication request protocol message [12, 14]. In order to support both profiles, the integration software must therefore be capable of supporting both forms of communications with the IdP.

The two profiles have many properties in common. For example, they both support SAML. In both profiles, the HTML form containing the authentication response must be sent to the user agent using an HTTP POST; this form must contain the field 'LARES' with value equal to the authentication response, as defined in the Liberty protocol schema [14]. In both profiles, the value of the 'LARES' field must be encoded using a base-64 transformation [23].

Despite the differences between the profiles, the protocol steps given in section 3.2.5 apply to both profiles.
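The LARES encoding common to both profiles can be sketched as follows; the response XML here is a dummy placeholder.

```javascript
// Sketch of the LARES field: the authentication response is base-64
// encoded before being placed in the form field named 'LARES' and POSTed.
// (The response XML is a dummy; a real one follows the Liberty schema.)
const response = '<lib:AuthnResponse>...</lib:AuthnResponse>';
const lares = Buffer.from(response, 'utf8').toString('base64');

// The receiving side decodes the field to recover the original response.
const decoded = Buffer.from(lares, 'base64').toString('utf8');
console.log(decoded === response);
```

Base-64 encoding keeps the XML (angle brackets, quotes, whitespace) safe inside an HTML form value.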
4.3 Triggering the Browser Extension

As stated in section 3.1, the means by which the integration software is triggered needs to be chosen carefully. The approach adopted in section 3.2.3 is to include a trigger sequence (e.g. the word 'Liberty') in a specific field of a LibertyCard; this is also the method used in the prototype described in section 5. However, other approaches could be used, e.g. as follows.

1. The browser extension could start whenever CardSpace is triggered. When a user submits an InfoCard, the browser extension would offer the user two options (based on HTML forms): to continue to use CardSpace as usual, or to use a Liberty-enabled IdP. This approach gives a greater degree of user control, and hence implements Microsoft's first identity law [6, 10, 17, 34]. However, it is not particularly convenient, since it would always require users to choose whether or not to use the integration software.

2. Alternatively, the browser extension could ask the user whether they wish to activate the integration protocol (e.g. via a JavaScript pop-up box). This has advantages and disadvantages similar to those of the first alternative.

4.4 Token Forwarding

The means by which the security token is forwarded to the RP also needs to be chosen carefully. We refer to the numbered protocol steps given in section 3.2.5.

The responsibility for delivering the security token could be given to the Liberty IdP (as is normally the case when using the LBP profile). In this case the RP address could be added to the SAML authentication request (as prepared in step 8) so that the IdP knows which RP it must forward the token to (again as is normally the case for the Liberty profiles). Although this would avoid the need for changes to the normal operation of the Liberty IdP and potentially also help auditing, such an approach has privacy implications since the IdP would learn the identity of the RP.

As a result, as specified in step 11 of the proposed scheme, the responsibility for sending the security token to the RP is given to the user agent. Thus a means is required for giving the browser extension the address of the RP, so that it can forward the token. We next consider three possible ways in which the RP address might be made available.

• The RP address could be stored in the browser extension itself. Whilst this puts the user in control, it is not user-friendly, as it would require users to manually add the address of each RP into the code of the browser extension.

• After the security token is returned from the Liberty IdP, the browser extension could ask the user to enter the RP address, e.g. using a JavaScript pop-up box or an HTML form. This has advantages and disadvantages similar to those of the previous alternative.

• The browser extension could store the RP address encrypted in a cookie as part of step 3, so that it can obtain the address in step 11. In order to adhere to cookie security rules [31], this must be done in such a way that the browser believes it is communicating with the same domain when the cookie is set and when it is retrieved (creation of and access to the cookie can be handled by the browser extension transparently to RPs and IdPs). To achieve this, the browser extension encrypts and stores the RP address in a cookie in step 3, before the identity selector is invoked. As part of step 8, the browser extension retrieves the encrypted value from the cookie and sends it to the IdP as a hidden HTML variable in an HTML form or as a query URL parameter. As part of step 10, the IdP returns the encrypted RP address to the user agent, again as a hidden form variable or as a URL parameter (the use of HTML forms with the POST method is preferable to query URL parameters, since the latter may suffer from size restrictions; hence the former approach is used in the prototype implementation described in section 5). In step 11, the browser extension obtains the encrypted value and decrypts it to obtain the RP address.

Note that the IdP is unable to read the RP address, hence protecting user privacy, since it is encrypted using a key known only to the browser extension. If, however, the IdP needs the RP address for auditing purposes (e.g. for legal reasons), or the IdP policy requires the disclosure of the RP identity (e.g. so it can encrypt the security token using the RP's public key), then the RP address could be sent in plain text to the IdP.
4.5 Defeating Phishing

Use of LibertyCards helps to mitigate the risk of phishing. The LibertyCard contains the URL of the IdP entered by the user, and the user will only be forwarded to that IdP, i.e. the RP will not be able to redirect the user to an IdP of its choice. By contrast, in the Liberty artifact and Liberty browser post profiles (and in OpenID [44, 48]), a malicious SP might redirect a user to a fake IdP, which could then capture the user credentials. This is a particular threat for static credentials, such as usernames and passwords.

4.6 Integration at the Client Side

Some IdPs and RPs/SPs may not be prepared to accept the burden of supporting two identity management systems simultaneously, at least unless there is a significant financial incentive. Currently, major Internet players, such as MSN (http://www.msn.com), do not provide any means of interoperating between identity management systems. As a result, a client-side technique for supporting interoperation could be practically useful.

In addition, building the integration scheme on the client means that the performance of the server is not affected, since the integration overhead is handled by the client. Such an approach also reduces the load on the network.

4.7 STS-enhanced RPs

STS-enhanced RPs are not supported by the integration scheme. This is because use of an STS involves direct communication (i.e. not via a browser) between the CardSpace identity selector and the RP STS [27], which the integration browser extension is currently not capable of intercepting. For example, the identity selector directly contacts the RP STS to obtain its security policy using WS-MetadataExchange.

In the scheme described in this paper, the interaction with the RP uses HTTP/HTML via a web browser. This is a simpler and probably more common scenario for RP interactions [19]. As discussed in section 2.1.3, an RP security policy can be expressed using HTML, and both the policy and the security token can be exchanged using HTTP/S. Therefore, to act as a CardSpace-enabled RP, a website is not required to implement any of the WS-* specifications [19, 27].
Use of browser- be applicable for SAML-compliant IdPs; this, nevertheless, specific client-side scripting languages, e.g. VBScript, was requires certain modifications to the current scheme. For ex- ruled out to ensure the widest applicability [20]. ample, the technical differences24 between Liberty ID-FF 1.2 The implementation uses the Document Object Model and SAML 2.0 must be carefully examined. However, given (DOM) [32] to inspect and manipulate HTML [43] pages that SAML 2.0 is the successor to SAML 1.1, Liberty ID- and XML [9] documents. Since the DOM defines the objects FF 1.2 and Shibboleth 1.3 [15], a mapping seems likely to and properties of all document elements and the methods to be possible. access them, a client-side scripting language can read and Reconfiguring the integration scheme to interoperate with modify the contents of a web page or completely alter its SAML-aware IdPs potentially significantly increases its ap- appearance [20]. plicability and practicality. For example, the exchange of The prototype does not use any of the published Card- identity attributes, which is not supported under the cur- Space application programming interfaces (APIs). This will rent scheme, would then be feasible. The reconfiguration of ease migration of the plug-in to other CardSpace-like sys- the scheme remains possible future work. tems such as the Linux/Mac-based DigitalMe28 and the Fire- fox/Safari InfoCard extensions. 5. PROTOTYPE REALISATION 5.3 Operation of the Prototype This section provides technical details of a prototype im- In this section we consider specific operational aspects of plementation of the integration scheme when used with the the prototype. We refer throughout to the numbered proto- Liberty browser post profile. A number of prototype-specific col steps given in section 3.2.5. properties and possible limitations of the current prototype are also described. 
5.3.1 Prototype-specific Operational Details In step 3, before the HTML login page is displayed, the 5.1 User Registration plug-in uses the DOM to perform the following processes. Prior to use, the user must have accounts with a CardSpace 1. The plug-in scans the web page in the following way29 . RP and a Liberty-enabled IdP. The user must also cre- ate a LibertyCard for the relevant Liberty IdP (or it could (a) It searches through the HTML elements of the be created at the time of use). This involves invoking the web page to detect whether any HTML forms are CardSpace identity selector and inserting the URL of the present. If so, it searches each form, scanning target Liberty IdP in the web page field25 and the trigger through each of its child elements for an HTML word (Liberty) in the city field. For ease of identification, object tag. the user can give the personal card a meaningful name, e.g. (b) If an object tag is found, it retrieves and ex- of the target IdP site. The user can also upload an image amines its type. If it is of type ‘application/x- for the card, e.g. containing the logo of the intended IdP or informationCard’ (which signals website support simply of Liberty. When a user wishes to use a particular for CardSpace), it continues; otherwise it aborts. Liberty IdP, the user simply chooses the corresponding card. An example of a LibertyCard is shown in figure 4. (c) It then searches through the param tags (child elements of the retrieved CardSpace object tag) for the ‘requiredClaims’ tag, which lists the claims required by the RP security policy. (d) If the required claims include attributes other than the PPID claim, then the plug-in terminates, giv- ing CardSpace the opportunity to operate nor- mally. However, if only the PPID claim is re- quested, then the plug-in adds the city and web page claims to the ‘requiredClaims’ tag, marking them as mandatory (see section 3.2.3). 
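The page-scanning and claim-augmentation logic of step 1 can be sketched in JavaScript, the prototype's implementation language. This is an illustrative reconstruction rather than the prototype's actual code: the function names are ours, and the claim URIs shown are the standard CardSpace self-issued-card claim identifiers.

```javascript
// Claim URIs from the CardSpace self-issued claim namespace.
var PPID = 'http://schemas.xmlsoap.org/ws/2005/05/identity/claims/privatepersonalidentifier';
var CITY = 'http://schemas.xmlsoap.org/ws/2005/05/identity/claims/locality';
var WEBPAGE = 'http://schemas.xmlsoap.org/ws/2005/05/identity/claims/webpage';

// Steps 1(a)-(b): look inside every HTML form for an <object> tag of type
// 'application/x-informationCard', which signals CardSpace support.
function findCardSpaceObject(doc) {
  var forms = doc.getElementsByTagName('form');
  for (var i = 0; i < forms.length; i++) {
    var objects = forms[i].getElementsByTagName('object');
    for (var j = 0; j < objects.length; j++) {
      if (objects[j].getAttribute('type') === 'application/x-informationCard') {
        return objects[j];
      }
    }
  }
  return null; // no CardSpace support signalled: abort
}

// Steps 1(c)-(d): given the claim URIs listed in the 'requiredClaims'
// param tag, either abort (claims other than the PPID are requested,
// so CardSpace should operate normally) or return the claim list
// augmented with the city and web page claims.
function augmentRequiredClaims(claims) {
  for (var i = 0; i < claims.length; i++) {
    if (claims[i] !== PPID) return null; // terminate the plug-in
  }
  return claims.concat([CITY, WEBPAGE]); // to be marked as mandatory
}
```

In the prototype, the augmented list would then be written back into the object tag's ‘requiredClaims’ param before the identity selector is invoked.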
Footnotes to the preceding sections:
- The technical differences between Liberty ID-FF 1.2 and SAML 2.0 are summarised at https://spaces.internet2.edu/display/SHIB/SAMLLibertyDiffs.
- The web page field was chosen to contain the Liberty IdP URL since it seems the logical choice; however, this is an implementation option.
- We use the term plug-in to refer to any client-side browser extension, such as a user script, plug-in, etc.
- Throughout the description, the term JavaScript is, for simplicity, used to refer to all variants of the language.
- DigitalMe is available at http://code.bandit-project.org/trac/wiki/DigitalMe.
- The relevant user guide [27] specifies two HTML extension formats for invoking an identity selector from a web page, both of which include placing the CardSpace object tag inside an HTML form; this motivates the web page search method used here.

Figure 4: A LibertyCard

2. The plug-in adds a JavaScript function to the head section of the HTML page to intercept the XML-based authentication token before it is sent back to the RP (such a token will be sent by the identity selector in step 8).

3. The plug-in obtains the current action attribute of the CardSpace HTML form, encrypts it using AES [39] with a secret key known only to the plug-in, and then stores it in a cookie. This attribute specifies the URL address of a web page at the CardSpace-enabled RP to which the authentication token must be forwarded for processing. If the obtained attribute is not a fully qualified domain name address, the JavaScript inherent properties, e.g. document.location.protocol and document.location.host, are used to help reconstruct the full URL address.

4. After storing it, the plug-in changes the current action attribute of the CardSpace HTML form to point to the newly created ‘interception’ function (see step 2 above).

5. The plug-in creates and appends an ‘invisible’ HTML form to the HTML page, to be used later for sending the SAML token request to the Liberty-enabled IdP.

In step 8 the plug-in uses the DOM to perform the following steps.

1. It intercepts the RSTR message sent by the CardSpace identity selector, using the added function (see above).

2. It parses the intercepted token. If the city field contains the word Liberty, the plug-in proceeds; if not, normal operation of CardSpace continues. It also reads the web page field to discover the URL address of the IdP. In addition, all other fields, including the PPID and InfoCard public key with its digital signature, are parsed. The city, web page, and PPID fields are contained in a SAML attribute statement, whereas the public key and signature values are contained in a SAML signature statement.

The plug-in uses an XML parser built into the browser to read and manipulate the intercepted XML token. The plug-in passes the token to the parser, which reads it and converts it into an XML DOM object that can be accessed and manipulated by JavaScript. The DOM views the XML token as a tree structure, thereby enabling JavaScript to traverse the DOM tree to read (and possibly modify) the content of the token elements. New elements can also be created where necessary.

3. It converts the token format from a SAML response message into a SAML request message, compatible with Liberty-conformant IdPs supporting the browser post profile. This involves converting a SAML 1.1-based RSTR into a SAML 2.0 authentication request. Moreover, as outlined in section 3.2.2, the plug-in adds the PPID and the InfoCard public key, along with its signature, to the SAML request message, because the token must be signed by the Liberty-enabled IdP to provide integrity and authenticity services.

4. It writes the entire SAML request message as a hidden variable into the invisible HTML form created earlier.

5. It retrieves the encrypted RP URL from the cookie, and writes it into the invisible form as a hidden variable.

6. It writes the URL address of the Liberty IdP into the action attribute of the invisible form.

7. It auto-submits the HTML form (transparently to the user), using the JavaScript method ‘click()’ on the ‘submit’ tag.

5.3.2 Liberty IdP-specific Details

For steps 8 to 10, we have created an experimental website to act as a Liberty-enabled IdP supporting the Liberty browser post profile. PHP is used to enable the IdP to parse the SAML request and perform the user authentication. The user credentials, i.e. username and password, that the IdP uses to authenticate the user are stored in a MySQL database. They are salted, hashed with SHA-1, and protected against SQL injection attacks. PHP supports a variety of XML parsers, such as XML DOM, the Expat parser, and SimpleXML; the prototype uses XML DOM.

5.3.3 User Consent and Token Forwarding

In step 11, the plug-in operates as follows.

1. It obtains the encrypted value of the RP URL from the appropriate HTML hidden variable, decrypts it using its internally stored secret key, and inserts it into the action attribute of the HTML form carrying the received SAML token.

2. The plug-in then displays the token to the user and requests consent to proceed. The displayed token indicates the types of information the authentication token is carrying, as well as the exact URL address of the RP to which the token will be forwarded. The JavaScript ‘confirm()’ pop-up box is used to achieve this.

3. If the user approves the token, the plug-in seamlessly submits it to the RP using the JavaScript ‘click()’ method.

5.3.4 CardSpace RP-specific Details

To test the prototype, we built an experimental website to act as a CardSpace-enabled RP. On receipt of the SAML authentication token, the RP uses PHP in step 11 to parse and validate the received token. As is the case with the Liberty IdP, the user identifying data is salted, hashed, and stored in a MySQL database that is resistant to SQL injection attacks. The validation process includes verifying the digital signatures and checking the conditions, e.g. time stamps, included in the token. The PPID and the InfoCard public key in the token are compared to the values stored in the RP database, and the authentication status is also checked.

5.3.5 Other Issues

The JavaScript-driven plug-in was built using IE7PRO, an IE extension, chosen to expedite the prototype implementation. Users of the prototype must therefore install IE7PRO, freely available at the IE7PRO website (http://www.ie7pro.com), prior to installing the integration plug-in. To enable or disable the integration prototype, a user can simply tick or un-tick the appropriate entry in the ‘IE7PRO Preferences’ interface. This provides the means to achieve the final objective listed in section 3.2.4.

Finally, note that the integration plug-in does not require any changes to default IE security settings, thereby avoiding potential vulnerabilities resulting from lowering browser security settings.

5.4 Limitations

The current version of the prototype has not been tested with CardSpace relying parties using TLS/SSL. Therefore, we are not able to provide precise operational and performance details in this case. If the RP has a certificate, then the identity selector will, by default, encrypt the SAML-based RSTR message using the public key of the requesting RP. Clearly, the plug-in does not have access to the RP's private key, and hence will not be able to decrypt the token. Therefore, it will not know whether to trigger the integration protocol, and will be unable both to discover which IdP it must contact and to obtain the user identifier (the PPID).

One solution to these issues would be for the plug-in first to ask the user whether the integration protocol should be activated (e.g. via a JavaScript prompt window); if so, it should then forward the SAML token to the RP and notify the RP to wait for another token. The RP should decrypt the token, read the PPID, and then wait. At the same time, the plug-in should prompt the user to enter the URL of the Liberty-enabled IdP, and then create and send a SAML request message to the Liberty IdP, which authenticates the user and responds with a SAML response token. The plug-in could then, optionally, seek user consent, and, if the user approves, the plug-in would then forward the token to the RP. The RP must issue the plug-in with a nonce (and a time-stamp), which the plug-in sends back with the second token, both to link the two tokens together and to help protect against replay and guessing attacks.

One of the most obvious drawbacks to this solution is that it requires changes at the CardSpace-enabled RP, as the RP must be reconfigured to accept two tokens. However, this would not be a major change, since both tokens will be constructed using SAML, and since the RP is not required to contact the Liberty-enabled IdP directly. Therefore, the major overhead remains with the client. Nevertheless, we are working on a revised version of the prototype that is fully compatible with SSL/TLS encryption but without the requirement of RP reconfiguration.

The integration plug-in must scan every browser-rendered web page to detect whether it supports CardSpace, and this may affect system performance. However, informal tests on the prototype suggest that this is not a serious issue. In addition, the plug-in can be configured so that it only operates with certain websites.

The integration plug-in has not been tested with CardSpace 2.0, because it was completed well before its release. Therefore, we are not yet able to provide precise operational details for this version.

Finally, note that some older browsers (or browsers with scripting disabled) may not be able to run the integration plug-in, as it was built using JavaScript. However, most modern browsers support JavaScript (or ECMAScript), and hence building the prototype in JavaScript is not a major usability obstacle.

6. RELATED WORK

The Bandit (http://www.bandit-project.org) and Concordia (http://www.projectconcordia.org) projects are currently developing open source technologies to support interoperation between identity management systems. Unlike the integration scheme proposed in this paper, these systems are not based on client-side models. Concordia has proposed a CardSpace and SAML/WS-Federation integration model. This could be used as the basis for supporting Liberty/CardSpace interoperation by taking advantage of the similarities between the Liberty ID-FF SSO profiles and the SAML SSO profiles.

Another scheme supporting interoperation between CardSpace and Liberty has been proposed by Jørstad et al. [29]. In this scheme, the IdP is responsible for supporting interoperation. The IdP must therefore perform the potentially onerous task of maintaining two different identity management schemes. In addition, this scheme requires the user to possess a mobile phone supporting the Short Message Service (SMS). Moreover, the IdP must always perform the same user authentication technique, regardless of the identity management system the user is attempting to use. The IdP simply sends an SMS to the user, and, in order to be authenticated, the user must confirm receipt of the SMS. This confirmation is also an implicit user approval for the IdP to send a security token to the RP. By contrast, the scheme proposed in this paper does not require use of a handheld device, and does not enforce a specific authentication method.

Finally, we observe that Liberty is apparently also working on a scheme somewhat similar to that described here. No specifications have yet been released, but the plans are described in a presentation available at the Liberty website (http://www.projectliberty.org/liberty/content/download/4541/31033/file/20080ICP-Cardspace-DIDW.pdf).

7. CONCLUSIONS AND FUTURE WORK

We have proposed a means of interoperation between two leading identity management systems, namely CardSpace and Liberty. CardSpace users are able to obtain an assertion token from a Liberty-enabled identity provider that satisfies the security requirements of a CardSpace-enabled relying party. The scheme uses a client-side browser extension, and requires no major changes to servers. It uses the CardSpace identity selector interface to integrate Liberty identity providers with CardSpace relying parties. The scheme extends the use of personal cards to allow for such interoperability.

The integration scheme takes advantage of the similarity between the Liberty ID-FF and the CardSpace frameworks, and this should help to reduce the effort required for full system integration. Also, implementation of the scheme does not require technical co-operation between Microsoft and Liberty.

Planned future work includes investigating the possibility of using the CardSpace identity selector to enable access to identity providers of other identity management systems, such as OpenID and Shibboleth. In addition, we also plan to investigate the possibility of extending the proposed integration protocol to support CardSpace-enabled relying parties that employ security token services.

8. REFERENCES

[1] W. A. Alrodhan and C. J. Mitchell. A client-side CardSpace-Liberty integration architecture. In Proceedings of the 7th Symposium on Identity and Trust on the Internet (IDtrust 08), pages 1–7. ACM, New York, NY, USA, 2008.
[2] S. Anderson et al. Web Services Trust Language (WS-Trust). Actional Corporation, BEA Systems, Computer Associates International, International Business Machines Corporation, Layer 7 Technologies, Microsoft Corporation, Oblix, OpenNetwork Technologies, Ping Identity Corporation, Reactivity, RSA Security, and VeriSign, 2005. http://download.boulder.ibm.com/ibmdl/pub/software/dw/specs/ws-trust/ws-trust.pdf.
[3] S. Bajaj et al. Web Services Policy Framework (WS-Policy). BEA Systems, International Business Machines Corporation, Microsoft Corporation, SAP AG, Sonic Software, and VeriSign, 2006. http://download.boulder.ibm.com/ibmdl/pub/software/dw/specs/ws-polfram/ws-policy-2006-03-01.pdf.
[4] K. Ballinger et al. Web Services Metadata Exchange (WS-MetadataExchange). BEA Systems, Computer Associates International, International Business Machines Corporation, Microsoft Corporation, SAP AG, Sun Microsystems, and webMethods, 2006. http://download.boulder.ibm.com/ibmdl/pub/software/dw/specs/ws-mex/metadataexchange.pdf.
[5] A. Berger. Identity Management Systems — Introducing Yourself to the Internet. VDM Verlag, Saarbrücken, Germany, 2008.
[6] V. Bertocci, G. Serack, and C. Baker. Understanding Windows CardSpace: An Introduction to the Concepts and Challenges of Digital Identities. Addison-Wesley, Reading, Massachusetts, USA, 2008.
[7] K. Bhargavan, C. Fournet, A. D. Gordon, and N. Swamy. Verified implementations of the information card federated identity-management protocol. In Proceedings of the 2008 ACM Symposium on Information, Computer and Communications Security (ASIACCS 08), pages 123–135. ACM, New York, NY, USA, 2008.
[8] D. Birch. Digital Identity Management: Technological, Business and Social Implications. Gower Publishing, Farnham, UK, 2007.
[9] T. Bray, J. Paoli, C. M. Sperberg-McQueen, E. Maler, and F. Yergeau (editors). Extensible Markup Language (XML) 1.0. W3C Recommendation, 5th edition, 2008. http://www.w3.org/TR/xml/.
[10] K. Cameron. The Laws of Identity. Microsoft Corporation, 2005. http://www.identityblog.com/stories/2005/05/13/TheLawsOfIdentity.pdf.
[11] K. Cameron and M. B. Jones. Design Rationale behind the Identity Metasystem Architecture. Microsoft Corporation, 2006. http://www.identityblog.com/wp-content/resources/design_rationale.pdf.
[12] S. Cantor, J. Kemp, and D. Champagne (editors). Liberty ID-FF Bindings and Profiles Specification. Liberty Alliance Project, 2004. http://www.projectliberty.org/liberty/content/download/319/2369/file/draft-liberty-idff-bindings-profiles-1.2-errata-v2.0.pdf.
[13] S. Cantor, J. Kemp, R. Philpott, and E. Maler (editors). Assertions and Protocols for the OASIS Security Assertion Markup Language (SAML) V2.0. OASIS, 2005. http://docs.oasis-open.org/security/saml/v2.0/saml-core-2.0-os.pdf.
[14] S. Cantor and J. Kemp (editors). Liberty ID-FF Protocols and Schema Specification. Liberty Alliance Project, 2005. http://www.projectliberty.org/resource_center/specifications/liberty_alliance_id_ff_1_2_specifications.
[15] S. Cantor (editor). Shibboleth Architecture — Protocols and Profiles, 2005. http://shibboleth.internet2.edu/docs/internet2-mace-shibboleth-arch-protocols-200509.pdf.
[16] D. Chadwick. FileSpace: an alternative to CardSpace that supports multiple token authorisation and portability between devices. In Proceedings of the 8th Symposium on Identity and Trust on the Internet (IDtrust 09), pages 94–102. ACM, New York, NY, USA, 2009.
[17] D. W. Chadwick. Federated identity management. In A. Aldini, G. Barthe, and R. Gorrieri, editors, Foundations of Security Analysis and Design V, FOSAD 2007/2008/2009 Tutorial Lectures, volume 5705 of Lecture Notes in Computer Science, pages 96–120. Springer, Berlin/Heidelberg, Germany, 2009.
[18] D. W. Chadwick and G. Inman. Attribute aggregation in federated identity management. IEEE Computer, 42(5):33–40, 2009.
[19] D. Chappell. Introducing Windows CardSpace. MSDN, 2006. http://msdn.microsoft.com/en-us/library/aa480189.aspx.
[20] N. Daswani, C. Kern, and A. Kesavan. Foundations of Security: What Every Programmer Needs to Know. Apress, Berkeley, CA, USA, 2007.
[21] G. Della-Libera et al. Web Services Security Policy Language (WS-SecurityPolicy). International Business Machines Corporation, Microsoft Corporation, RSA Security, and VeriSign, 2005. http://download.boulder.ibm.com/ibmdl/pub/software/dw/specs/ws-secpol/ws-secpol.pdf.
[22] R. Fielding, J. Gettys, J. Mogul, H. Frystyk, L. Masinter, P. Leach, and T. Berners-Lee. Hypertext Transfer Protocol — HTTP/1.1. RFC 2616, The Internet Society, 1999. http://tools.ietf.org/html/rfc2616.
[23] N. Freed and N. Borenstein. Multipurpose Internet Mail Extensions (MIME) Part One: Format of Internet Message Bodies. RFC 2045, Internet Engineering Task Force, 1996. http://www.ietf.org/rfc/rfc2045.txt.
[24] S. Gajek, J. Schwenk, M. Steiner, and C. Xuan. Risks of the CardSpace protocol. In Proceedings of the 12th International Conference on Information Security (ISC 09), pages 278–293. Springer-Verlag, Berlin/Heidelberg, Germany, 2009.
[25] M. Gudgin, M. Hadley, N. Mendelsohn, J.-J. Moreau, H. F. Nielsen, A. Karmarkar, and Y. Lafon (editors). SOAP Version 1.2 Part 1: Messaging Framework. W3C Recommendation, 2007. http://www.w3.org/TR/soap12-part1/.
[26] P. Hallam-Baker and E. Maler (editors). Assertions and Protocol for the OASIS Security Assertion Markup Language (SAML) V1.0. OASIS, 2002. http://www.oasis-open.org/specs/#samlv1.0.
[27] M. B. Jones. A Guide to Using the Identity Selector Interoperability Profile V1.5 within Web Applications and Browsers. Microsoft Corporation, 2008.
[28] M. B. Jones and M. McIntosh (editors). Identity Metasystem Interoperability Version 1.0 (IMI 1.0). OASIS Standard, 2009. http://docs.oasis-open.org/imi/identity/v1.0/identity.html.
[29] I. Jørstad, D. Van Thuan, T. Jønvik, and D. Van Thanh. Bridging CardSpace and Liberty Alliance with SIM authentication. In Proceedings of the 10th International Conference on Intelligence in Next Generation Networks (ICIN 07), pages 8–13. Adera, BP 196 - 33608 Pessac Cedex, France, 2007.
[30] S. Kellomäki and R. Lockhart (editors). Liberty ID-SIS Employee Profile Service Specification. Liberty Alliance Project, 2005. http://www.projectliberty.org/liberty/content/download/1031/7155/file/liberty-idsis-ep-v1.1.pdf.
[31] D. Kristol. HTTP State Management Mechanism. RFC 2965, Internet Engineering Task Force, 2000. http://tools.ietf.org/html/rfc2965.
[32] A. Le Hors, P. Le Hégaret, L. Wood, G. Nicol, J. Robie, M. Champion, and S. Byrne (editors). Document Object Model (DOM) Level 2 Core Specification. W3C Recommendation, 2000. http://www.w3.org/TR/DOM-Level-2-Core/.
[33] E. Maler, P. Mishra, and R. Philpott (editors). Assertions and Protocol for the OASIS Security Assertion Markup Language (SAML) V1.1. OASIS, 2003. http://www.oasis-open.org/committees/download.php/3406/oasis-sstc-saml-core-1.1.pdf.
[34] M. Mercuri. Beginning Information Cards and CardSpace: From Novice to Professional. Apress, New York, USA, 2007.
[35] Microsoft Corporation. Microsoft's Vision for an Identity Metasystem, May 2005. http://msdn.microsoft.com/en-us/library/ms996422.aspx.
[36] Microsoft Corporation and Ping Identity Corporation. An Implementer's Guide to the Identity Selector Interoperability Profile V1.5, 2008.
[37] A. Nadalin, C. Kaler, R. Monzillo, and P. Hallam-Baker (editors). Web Services Security: SOAP Message Security 1.1 (WS-Security 2004). OASIS Standard Specification, 2006. http://docs.oasis-open.org/wss/v1.1/wss-v1.1-spec-os-SOAPMessageSecurity.pdf.
[38] A. Nanda and M. B. Jones. Identity Selector Interoperability Profile V1.5. Microsoft Corporation, 2008. http://www.identityblog.com/wp-content/resources/2008/Identity_Selector_Interoperability_Profile_V1.5.pdf.
[39] National Institute of Standards and Technology (NIST). Announcing the Advanced Encryption Standard (AES), FIPS 197, November 2001. http://csrc.nist.gov/publications/fips/fips197/fips-197.pdf.
[40] T. Negrino and D. Smith. JavaScript and Ajax for the Web: Visual QuickStart Guide. Peachpit Press, Berkeley, CA, USA, 7th edition, 2008.
[41] R. Oppliger, S. Gajek, and R. Hauser. Security of Microsoft's identity metasystem and CardSpace. In Proceedings of the Kommunikation in Verteilten Systemen (KiVS 07), pages 63–74. VDE Publishing House, Berlin, Germany, 2007.
[42] T. A. Powell and F. Schneider. JavaScript: The Complete Reference. McGraw-Hill Osborne Media, Berkeley, CA, USA, 2nd edition, 2004.
[43] D. Raggett, A. Le Hors, and I. Jacobs (editors). HTML 4.01 Specification. W3C Recommendation, 1999. http://www.w3.org/TR/html401/.
[44] D. Recordon, L. Rae, and C. Messina. OpenID: The Definitive Guide. O'Reilly Media, Sebastopol, CA, USA, 2010.
[45] T. Scavo (editor). SAML V2.0 Holder-of-Key Assertion Profile Version 1.0. OASIS, 2009. http://www.oasis-open.org/committees/download.php/34962/sstc-saml2-holder-of-key-cd-03.pdf.
[46] D. Todorov. Mechanics of User Identification and Authentication: Fundamentals of Identity Management. Auerbach Publications, New York, USA, 2007.
[47] J. Tourzan and Y. Koga (editors). Liberty ID-WSF Web Services Framework Overview. Liberty Alliance Project, 2005. http://www.projectliberty.org/liberty/content/download/1307/8286/file/liberty-idwsf-overview-v1.1.pdf.
[48] R. Ur Rehman. Get Ready for OpenID. Conformix Technologies, Chesterbrook, Pennsylvania, USA, 2008.
[49] T. Wason (editor). Liberty ID-FF Architecture Overview. Liberty Alliance Project, 2003. http://www.telenor.com/rd/idm/liberty-idff-arch-overview-v1.2.pdf.
[50] G. Williamson, D. Yip, I. Sharoni, and K. Spaulding. Identity Management: A Primer. MC Press, Big Sandy, TX, USA, 2009.
CardSpace-Liberty Integration for CardSpace Users (presentation slides)
IDtrust 2010, 13/4/2010
Haitham Al-Sinani (H.Al-Sinani@rhul.ac.uk), Information Security Group, Royal Holloway, University of London (http://isg.rhul.ac.uk/)

Agenda: 1. Introduction; 2. CardSpace; 3. Liberty; 4. Integration Scheme; 5. Analyses; 6. Concluding remarks; 7. Q/A

Introduction: identity difficulties
- Multiple identities for multiple accounts are hard to manage (hence poor security practices) and may result in identity theft.
- This has driven the development of identity management systems (IdMSs): CardSpace, Liberty Alliance, OpenID. These systems are non-interoperable.

CardSpace
- Ships by default with Windows Vista and 7; supports user authentication and exchange of attributes.
- Managed cards are issued by a remote IdP; personal cards are issued by the local self-issued identity provider (SIP).
- Acronyms: RP: relying party, e.g. a website; SIP: self-issued identity provider; CIdS: CardSpace identity selector; UA: user agent, e.g. a web browser (IE8); RST: Request Security Token; RSTR: Request Security Token Response.

CardSpace, SIP mode
1. Request protected resource.
2. RP policy: "Can I have a SAML token, containing First Name, E-mail, PPID, issued by SIP, please?"
3. CIdS highlights the InfoCards that satisfy the RP policy.
4. User picks a card.
5. Token is requested (RST).
6. Token is created by the SIP (RSTR).
7. Token is presented.

CardSpace, SIP mode (more details)
1. UA → RP: HTTP/S request, GET /index.html HTTP/1.1 (login page).
2. RP → UA: HTTP/S response, login page + RP policy.
3. User → UA: CardSpace option clicked, and CIdS invoked.
4. UA ↔ CIdS: RP policy passed; matching InfoCards highlighted, the rest greyed out.
5. User ↔ CIdS: picks/sends an InfoCard.
6. CIdS ↔ SIP: exchange of RST and RSTR.
7. CIdS → UA → RP: RSTR.
8. User ↔ RP: grants/denies access.

Liberty Alliance Project
- A consortium of 150+ companies interested in SSO and identity management; as of 2006, more than one billion Liberty-enabled identities and devices.
- Builds open, standards-based specifications for an 'open' XML-based SSO system.
- Liberty profiles: 'The combination of message content specification and message transport mechanisms for a single client type is termed a Liberty profile [1]' (S. Cantor, J. Kemp, and D. Champagne (editors), Liberty ID-FF Bindings and Profiles Specification, Liberty Alliance Project, 2004).
- The Liberty Artifact, Liberty-Enabled Client (LEC) and Liberty Browser Post profiles are supported in the integration scheme; the Browser Post profile was prototyped.

Liberty Browser Post profile (user agent, service provider, identity provider)
1. Request protected resource.
2. Obtain IdP.
3. Redirect to IdP with AuthRequest.
4. GET.
5. Process AuthRequest.
6. HTML form (POST) to SP containing the assertion.
7. POST.
8. Process assertion.
9. Grant/deny access.

Integration scheme: motivation
- Identity systems are proliferating (CardSpace, Liberty, Shibboleth, OpenID), and each offers a somewhat distinct user experience.
- Different experiences may lead to user confusion, which in turn could lead to phishing, pharming, etc.
- Interoperation could lead to a consistent user experience, and hence better security.
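The browser post profile outlined above delivers the SAML message to the service provider through a self-submitting HTML form (steps 6 and 7). A minimal sketch of constructing such a page follows; this is our own illustrative code, not taken from the Liberty specifications, and the hidden-field name follows the SAML 2.0 HTTP POST binding (Liberty ID-FF defines its own field names).

```javascript
// Escape a string for safe embedding in an HTML attribute value.
function htmlEscape(s) {
  return s.replace(/&/g, '&amp;').replace(/"/g, '&quot;')
          .replace(/</g, '&lt;').replace(/>/g, '&gt;');
}

// Build an HTML page that auto-POSTs a base64-encoded SAML message to the
// target URL as soon as it loads, with a <noscript> fallback button.
function buildAutoPostForm(targetUrl, samlResponseB64) {
  return '<html><body onload="document.forms[0].submit()">' +
         '<form method="post" action="' + htmlEscape(targetUrl) + '">' +
         '<input type="hidden" name="SAMLResponse" value="' +
         htmlEscape(samlResponseB64) + '"/>' +
         '<noscript><input type="submit" value="Continue"/></noscript>' +
         '</form></body></html>';
}
```

In the flow above, the identity provider would return such a page in step 6; the onload handler then performs the POST of step 7 without user interaction.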
Wide adoption Slow adoption As of 2006, more than one billion Liberty-enabled identities & devices Interoperation could extend the applicability of CardSpace Wide adoption Ships by default in Windows Vista/7 As of 2006, more than one billion Liberty-enabled identities & devices World-wide use of Windows Practically useful for large numbers of identity management users and SPs CardSpace personal cards May not be prepared to are used to make Liberty IdPs available via the accept associated -CardSpace identity selector May not be prepared to Practically useful burden accept associated burden -Server performance not affected - Net load reduction The user must create a LibertyCard, which contains (at least): Address of the Liberty IdP Trigger sequence, e.g “Liberty” The integration scheme is built on: Browser extension CardSpace Identity Selector Responsible for intercepting, Responsible for storage of inspecting and modifying web pages Liberty IdPs’ addresses via personal cards, i.e. Responsible for automatically LibertyCards forwarding security tokens Different LibertyCards Responsible for etc. represent different Liberty IdPs Integration Protocol [Detailed View] User agent RP IdP Id selector Plug-in (CardSpace-enabled) (Liberty-enabled) 1 Request protected Resource 6 HTTP auth response 2 User selects a (RP policy embedded in objet tag) LibertyCard Plug-in: pre-process &prepare to intercept SAML token 3 5 User invokes CardSpace 4 Highlight Plug-in: Catch SAML response, 9 modify to Liberty SAML request & 8 forward SAML request SAML request (RST) 7 SAML response (RSTR) SAML response (auth token) 10 Plug-in: Display token, obtain user consent & SIP 11 forward the token 12 Grant/Deny access 13 Acronyms: RP: Relying Party, e.g. website. IdP: Identity Provider , e.g. Website. CIdS: CardSpace Identity Selector. Integration Scheme [summary ] CardSpace RP 2. “Can I have a SAML token, containing PPID, issued by *any*, please?” 3. Process RP Policy 4. 
4. CIdS highlights InfoCards that satisfy the RP policy. 5. User picks a card. 6. Generate Liberty AuthReq. 7. AuthToken is created by the Liberty IdP. 8. Approve token? 9. Token is presented.

Agenda: 1. Introduction 2. CardSpace 3. Liberty 4. Integration Scheme 5. Analyses 6. Concluding remarks 7. Q/A

CardSpace-Liberty Integration for CardSpace Users
IDtrust 2010, 13/4/2010
Haitham Al-Sinani, H.Al-Sinani@rhul.ac.uk
Information Security Group, http://isg.rhul.ac.uk/
Royal Holloway, University of London

An Attribute-based Authorization Policy Framework with Dynamic Conflict Resolution

Apurva Mohan (apurva@gatech.edu) and Douglas M. Blough (doug.blough@ece.gatech.edu)
School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA, USA

ABSTRACT
Policy-based authorization systems are becoming more common as information systems become larger and more complex. In these systems, to authorize a requester to access a particular resource, the authorization system must verify that the policy authorizes the access. The overall authorization policy may consist of a number of policy groups, where each group consists of policies defined by different entities. Each policy contains a number of authorization rules. The access request is evaluated against these policies, which may produce conflicting authorization decisions. To resolve these conflicts and to reach a unique decision for the access request at the rule and policy level, rule and policy combination algorithms are used. In current systems, these rule and policy combination algorithms are defined on a static basis during policy composition, which is not desirable in dynamic systems with fast-changing environments.

In this paper, we motivate the need for changing the rule and policy combination algorithms dynamically based on contextual information. We propose a framework that supports this functionality and also eliminates the need to recompose policies if the owner decides to change the combination algorithm. It provides a novel method to dynamically add and remove specialized policies, while retaining clarity and modularity in the policies. The proposed framework also provides a mechanism to reduce the set of potential target matches, thereby increasing the efficiency of the evaluation mechanism. We developed a prototype system to demonstrate the usefulness of this framework by extending some basic capabilities of the XACML policy language. We implemented these enhancements by adding two specialized modules and several new combination algorithms to the Sun XACML engine.

Categories and Subject Descriptors
K.6.5 [Management of Computing and Information Systems]: Security and protection; D.4.6 [Operating Systems]: Security and protection

General Terms
Security, Languages, Performance

Keywords
Attribute-based authorization, authorization policy, conflict resolution

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
IDtrust '10, April 13-15, 2010, Gaithersburg, MD.
Copyright 2010 ACM ISBN 978-1-60558-895-7/10/04 ...$10.00.

1. INTRODUCTION
As information systems become more complex and distributed in nature, system administrators and users need authorization systems which can help them share their resources, data and applications with a large number of users without compromising security and privacy. Although traditional authorization systems address the basic problem of granting access to only authorized individuals, they do not provide a number of desired features of modern authorization systems. These include 1) easily changing authorization based on accessor roles, group memberships, institutional affiliations, location, etc., 2) multiple authorities jointly making authorization decisions, 3) dynamically changing authorization based on accessor attributes, and 4) GUI-based general-purpose tools for description and management of authorization rules. Some traditional authorization systems provide some of these functions on an ad-hoc basis. Although policies have always been part of authorization systems, they were mostly buried in other functional code and hence were difficult to compose and analyze.

Modern policy-based authorization systems provide most of these features. They have a separate policy module that can be queried to make authorization decisions. This module makes decisions taking into consideration all applicable policies for a particular access request. These policies may be defined by multiple authorities. The policies may have different or even conflicting authorization decisions for the same access request. Policy languages use policy combination algorithms (PCAs) to resolve such conflicts. These algorithms take the authorization decision from each policy as input and apply some standard logic to come up with a final decision.(1)

(1) For efficiency reasons, policy engines only evaluate policies until they reach a final decision based on the combination algorithm.

These PCAs are currently chosen at the time of policy composition and hence they are static. In highly dynamic environments, this is not desirable and there may be a need to select these PCAs dynamically. In this case, it will be useful to have a mechanism to select a suitable PCA based on the dynamic contextual information available to the system. More discussion of this issue, along with a motivating scenario, is presented in Section 3.

PCAs used in current systems are also very restricted. There are a number of conflict resolution logics in general-purpose computing which are not expressible as PCAs in authorization languages. Examples of these logics include hierarchy-based resolution, priority-based resolution, taking a simple majority vote, and taking a weighted majority vote. There is a need to include algorithms such as these as PCAs in authorization languages to provide more functionality and flexibility in defining policies.

Having a context-aware authorization system also provides the capability to define different policies for different contexts. These contexts can be distinguished by contextual data or environmental attributes. In this case, the policies will be modular, making them easy to comprehend and analyze. Without the ability to choose the applicable policies based on contextual information, the policy composer is forced to duplicate each access control rule with and without the contextual information in the same policy. Although the same access control decision can be achieved in both approaches, the latter makes it difficult to analyze the policies and the effect of making changes to them. Also, if policies are chosen dynamically, only a small set of rules will be evaluated for their applicability to a request. This reduces the number of matches with potential policy targets, thereby lowering computation time.

Another advantage of using context-aware authorization is that a specialized policy created for some specific purpose can be added to and removed from consideration dynamically without changing the existing policies. This is especially useful for systems that have to adhere to certain temporary authorization requirements which call for special authorization rules. It is also useful in cases where the specialized policy is composed by some entity other than the one who usually creates and maintains authorization policies.

The main contributions of this paper are: 1) proposing a framework where authorization for a particular access request is decided dynamically based on context information, 2) supporting dynamic conflict resolution where PCAs are chosen at run time based on context information, 3) providing the ability to dynamically include (remove) specialized, short-term or add-on policies to (from) the authorization policy set, 4) increasing the efficiency of policy target matching during authorization, 5) increasing the modularity and clarity of the policies, 6) building a prototype authorization system to demonstrate the concepts, and 7) evaluating the efficiency of policy evaluation for the proposed framework.

2. ATTRIBUTE-BASED AUTHORIZATION SYSTEMS
In this section, we first introduce the basic constructs of attribute-based policy languages. We then describe some basic concepts of attribute-based authorization systems, define attribute-based policies, and describe the policy combination algorithms used in conflict resolution.

2.1 Brief Introduction to Policy Languages
In this sub-section, we introduce the basic elements of attribute-based authorization policy languages. Although here we use the eXtensible Access Control Markup Language (XACML) as an example to introduce the primary elements, these elements are similar in other policy languages as well.

XACML is an OASIS standard that describes a policy language for representing authorization policies and an access control decision request/response language [2]. XACML is based on XML. It describes general access control requirements while allowing for extensions for defining new functions, data types and combination logics. The language has syntax for defining authorization policies and for building a request/response to validate authorization requests against the policies. The response contains one of the four possible outcomes of policy evaluation: Permit, Deny, Indeterminate (an error occurred or some required value was missing, so a decision cannot be made) or Not Applicable (the request can't be answered by this service).

XACML has a Policy Enforcement Point (PEP) that actually protects the resource and a Policy Decision Point (PDP) that evaluates the access request against the policies. The PEP receives the access request from the requesting user and forwards it to the PDP, which makes the decision in consultation with the policies. If the access is allowed, the PEP releases the resource to the requesting user. The main components of an XACML policy are described below:

Policy - An XACML policy contains a set of rules with the subject and environment attributes, resources and corresponding actions. If multiple rules are applicable to a particular request, then the rule combination algorithm (RCA) combines the rules and resolves any conflicts in their decisions. XACML supports the following RCAs: Deny-overrides (Ordered and Unordered), Permit-overrides (Ordered and Unordered), and First-applicable.

Policy Set - A policy set is a container which contains other policies or policy sets. One or more of these policies or policy sets may be applicable to a particular access request. If more than one are applicable, then policy combination algorithms (PCAs) are used to combine the policies and resolve any conflicts in their decisions. XACML supports the following PCAs: Deny-overrides (Ordered and Unordered), Permit-overrides (Ordered and Unordered), First-applicable, and Only-one-applicable.

Target - A target is a set of conditions for the Subject, Resource and Action that must be met for a policy set, policy or rule to apply to a given request.

Rule - The rule is the core representation of the access control logic, with subject, resource, action and environment fields. It is a boolean function which evaluates to true if the subject, resource, action and environment fields in the request match the fields in the rule.

XACML is an attribute-based policy description language and is used for implementing our prototype system. Although we use XACML for discussion and implementation, the model we present in this paper is generic and can be implemented in other policy languages like P3P [4] or EPAL [1].

2.2 Authorization Policy
In an attribute-based system, objects are protected by administrator (or object owner) defined policies. These policies define a set of verifiable attributes (with pre-defined values) against each resource for a set of privileges. These attributes are either characteristics of the user or of the environment. They must be presented to the authorization module and verified by it in order to authorize the accessing user to access the requested object with specific privileges. Since the attributes have to be verifiable, they have to be certified by some entity which is trusted by the authorization module.

An attribute-based authorization policy is formally defined below.

Definition 1: Let SA, RA and EA represent the Subject, Resource and Environmental attributes respectively, each of which is a well-defined set of finite cardinality, given as SA = {sa1, sa2, ..., sal}, RA = {ra1, ra2, ..., ram} and EA = {ea1, ea2, ..., ean}. These attributes can take values val_sai ⊆ dom(sai) (1 < i < l), val_raj ⊆ dom(raj) (1 < j < m) and val_eak ⊆ dom(eak) (1 < k < n).

Attributes can be of two types: one which can take distinct and unconnected values (e.g. 'role' = 'doctor' or 'role' = 'nurse'), and another which can take a single value or a range of values (e.g. 'time' is between t1 and t2, or 'age' ≤ 21). In the latter case, the values that an attribute can take are connected. Without loss of generality, we define the latter group as attributes which can take either a single value or a range of values. For example, for a range-valued saj, the domain and values are defined as follows:

Attribute Type 1 - dom(saj) = [saj_val1, saj_val2, ..., saj_valn], val_saj ∈ dom(saj);
Attribute Type 2 - dom(saj) = [low, high], val_saj = [low', high'] ⊆ dom(saj), where (low' ≥ low) and (high' ≤ high). If val_saj takes a distinct value in [low, high], then low' = high'.

Definition 2: Let ACT define the set of actions which a subject can execute on resources: ACT = {act1, act2, ..., actp}. For example, the set of actions on a file can be {read, write, delete, append, execute}. Let D be the set of decisions that can result as a response to a predicate evaluating to true: D = {d1, d2, ..., dq}.

Definition 3: An access request (AR) is a tuple of the form <s, r, a>, where s ⊆ {SA, EA}, r ⊆ {RA} and a ⊆ {ACT}. It represents that s is requesting to access r with rights a. A rule R has the same format but defines the set s required to access r with rights a.

Definition 4: A policy is a list of rules given as P = (⊕, <R1, R2, ..., Rs>). ⊕ is a combination function, which combines the rules to produce a single decision for the policy.

Definition 5: A policy set (PS) is a container which contains a list of policies. It may also contain other policy sets. It is given as PS = (⊗, <PS1, PS2, ..., PSi>). Each PSt represents either a policy set or a single policy.(2) ⊗ is a combination function, which combines all the policy sets. This combination function is used to combine policies and policy sets and has no direct relation with the rule combination algorithm.

2.3 Combination Algorithms and Conflict Resolution
In a large system, there may be multiple authorities who specify the authorization policies. As such, there can be multiple groups of policies. When a request is evaluated in the system, the authorization module determines which policy sets apply to the particular request. Then it checks which policies among those groups, and which rules among those policies, are applicable to the request. There can be multiple policy sets, and multiple policies in each set, applicable to a single access request. Even within each policy there can be multiple rules which apply to the access request. These rules and policies can have different or even conflicting decisions for the request. As such, a mechanism is needed to resolve these conflicts. Policy languages have rule combination algorithms (RCAs), which evaluate the applicable rules based on the logic of the algorithm and resolve any conflicts in their decisions.

Definition 6: In a single policy, E(AR, Ri) → di, where E represents the evaluation of the i-th rule and di is the corresponding decision. The set of all the decisions is given as D_Rule = <d1, d2, ..., dx>. A Rule Combination Algorithm (RCA) is defined as {RCA φ D_Rule} → d, where d ∈ D. φ represents 'applied to'.

For example, a policy may use 'deny-overrides' as its RCA. In this case, if the algorithm finds even a single rule that denies the access, its final decision is 'deny'; otherwise, its decision is 'permit' if even a single rule permits. If none of the rules either 'permit' or 'deny' the access, then the result is 'Not Applicable'.

For combining the policies and policy groups, policy languages have policy combination algorithms (PCAs). These algorithms work on similar logic to the RCAs. Each policy gives a single decision for the access request. The PCA combines these decisions into a single decision by using the PCA logic.
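The 'deny-overrides' and 'permit-overrides' semantics described above can be sketched directly as functions over a list of per-rule (or per-policy) decisions, as in Definitions 6 and 7. This is a minimal illustration, not the paper's implementation; the decision strings are ours.

```python
# Minimal sketch of the two standard combination algorithms, applied
# to the list of decisions D_Rule (or D_PS) from Definitions 6 and 7.

def deny_overrides(decisions):
    """A single 'deny' wins; else 'permit' if any decision permits;
    else 'not-applicable'."""
    if "deny" in decisions:
        return "deny"
    if "permit" in decisions:
        return "permit"
    return "not-applicable"

def permit_overrides(decisions):
    """A single 'permit' wins; else 'deny' if any decision denies;
    else 'not-applicable'."""
    if "permit" in decisions:
        return "permit"
    if "deny" in decisions:
        return "deny"
    return "not-applicable"
```

The same shape extends to the other logics mentioned in the introduction (e.g. a majority vote would count 'permit' versus 'deny' entries instead of short-circuiting).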
Definition 7: In the final policy list, E(AR, PSi) → di, where E represents the evaluation of the i-th policy set and di is the corresponding decision. The set of all the decisions is given as D_PS = {d1, d2, ..., dx}. A Policy Combination Algorithm (PCA) is defined as {PCA φ D_PS} → d, where d ∈ D.

In current systems, these RCAs and PCAs are static and are determined at the time of composing the policies.

Conceptually, a policy is a deliberate plan to implement authorization to a particular resource or group of resources. A rule is a component of the policy that defines a specific authorization predicate. A policy set is a container that contains a number of logically connected policies. In a multi-authority setting where the authorization policies for a particular resource are defined by a number of entities, all policies for that particular resource form a logical policy set. For example, at a university, the firewall policies protecting a lab computer may be a combination of the policy defined centrally by the office of information technology, a specific department policy, a lab firewall policy, and the administrator-defined policy for that computer. A policy set encompasses all of these policies. The policies can be defined in a number of policy description languages, each with its advantages and disadvantages. In describing the policies in this paper, we will use the syntax and structure of XACML [2].

3. DYNAMIC CONFLICT RESOLUTION
In the last section, we saw how RCAs and PCAs resolve the conflicts among rules and policies to give a unique decision for an access request. We also noted that, in existing systems, these RCAs and PCAs are chosen at the time of composing the policies and hence do not change. This static composition may not be suitable for highly dynamic environments where there is a need to adapt the policies dynamically.
If such a mechanism is available, then it can also 2 In which case the set has a single policy and no PCA. serve as an easy tool for the policy composer, if he wishes 39 to change the RCAs and PCAs without recomposing the pectedly, therefore Alex cannot be expected to recompose authorization policies. policies when an emergency has already occurred. In cur- Some researchers have proposed static conflict detection rent systems, users like Alex do not change their policies on and avoidance, arguing that detecting and resolving conflicts such events. Our novel framework enables users to achieve in systems with a large number of policies in real time can this with little effort and provides an important new func- be a daunting task [26]. We argue that, even though it is tionality. a challenging problem, it is a superior approach. Organi- zation policies, regulatory polices, and user policies change 3.2 Proposed Model regularly. If we perform static conflict analysis, whenever In this section, we present a novel mechanism to dynam- one of the policy changes, new conflicts can arise requir- ically determine the policies applicable to an access request ing some party to change their policies. Also, some policies and to evaluate only the applicable policies. In this model, that conflicted before one of the policies changed and were we evaluate the authorization policies in two stages. In the never composed, may now become acceptable. There is no first stage, we determine which policies are applicable to the mechanism to reconsider these rejected policies. Also, the current access request and we also dynamically determine static model does not take into account adding and remov- which PCA will be used to resolve the conflicts in the au- ing specialized and time limited policies to provide flexibility thorization decisions. In the second stage, we evaluate only in policy composition and maintenance. the applicable policies using the PCA selected in the first stage. 
3.1 Motivating Scenario During stage one, the total applicable policy set (TAPS) is Let us consider a motivating scenario from the health determined by selecting only those policies where at least one care domain. Alex is a patient who stores his personal of the authorization rules is applicable to the current access health record (PHR) with his health maintenance organi- request. If P S1 , P S2 ...P Sn are the authorization policy sets, zation(HMO) called Superior Health Care (SHC). At SHC then the TAPS for a particular AR is given as T AP S = the patients’ PHRs are stored in a repository where the ac- {P S1 , P S2 ....P Sn }. cess to the repository is mediated through a proxy. The The combination algorithm used is ‘all-that-apply’, which proxy stores all the authorization policies. The policies may is a new rule combination algorithm defined in Appendix A. have multiple groups with policies defined by patients like The ‘all-that-apply’ algorithm has been implemented in our Alex himself, the hospital which created the record, SHC’s modified XACML engine (see Section 5). To evaluate TAPS, organizational policies, federal regulatory policies, and so all available policy sets are evaluated as explained in Defini- on. When someone tries to access an EMR for a particular tion 6. If a policy set has at least one rule that applies to the patient, the system will consult the applicable policies to current access request, we include it in the TAPS. To find check whether this access is allowed. Assume that, in nor- an applicable rule, we consider the subject and environment mal circumstances, the policy combination algorithm used attributes in the access request (which is the set {EA∪SA}) is ‘deny-overrides’, which is a secure and stringent policy. along with their boolean relationships. We then match that Suppose that Alex wishes to use a more lenient policy in with the rules in the policy level target. 
We try to find a rule case of an emergency, where he will share his PHR with any with the same set {EA ∪ SA} with the same relationships accessor who is authorized by at least one of the applicable so that at least one of the attribute combinations matches policies. In this case, he needs to dynamically change his with those in the AR. EA, SA and RA are specified in PCA from ‘deny-overrides’ to ‘permit-overrides’ whenever Definition 1. there is an emergency and back to ‘deny-overrides’ once the To aid in determining applicable policy sets, we create a emergency is over. The traditional method would require meta-policy file called the M-Policy. This file contains one him to change his policies twice to achieve this. If Alex want rule for each authorization policy set in the system. This to have several dynamic options, he will have to change his rule is a copy of the policy level target rule included in each policy description each time such a dynamic change occurs. set. This rule is a method, in a language such as XACML, to In the proposed model, Alex can define all such dynamic define whether a particular policy is applicable to the given conditions as an attribute-based policy and the evaluation access request and it makes the processing faster. Includ- of these policies will determine what PCA will be used for ing it in the M-Policy file has two advantages, namely the the current access request. The model extends this concept processing of the M-Policy file is much faster compared to to the selection of the RCA dynamically. It is desirable that evaluating the policy level target rule in each file. These the user has the ability to define several dynamic conditions rules are optional in XACML. If they are not present, pol- simultaneously, need not change his policy descriptions ev- icy evaluation will take longer. Also, we do not use any ery time one such condition changes, and also need not keep rule level targets in the XACML policies. 
As such, we com- track of the dynamic changes. This is one of the key advan- pare the best case performance XACML can offer with our tages of using the proposed system. If Alex tries to achieve TAPS algorithm. The ‘all-that-apply’ algorithm makes it the same effect in current policy-based systems with static possible to evaluate all target rules at the same place. Each conflict analysis, when an emergency occurs he will have rule in the M-Policy is evaluated (refer to Definition 6). If to recompose his policy with ‘permit-overrides’ and resolve a rule evaluates to ‘permit’, it means that the target rule all conflicts created in the process. When the emergency representing the respective policy is true and that policy is is over, he will have to recompose his policies with ‘deny- applicable. We then include that policy in the TAPS. overrides’ and resolve all conflicts again. He cannot create To apply the TAPS algorithm to current XACML based a special policy for an emergency, because his two policies systems, we can create an M-Policy file if all the XACML are inherently contradictory. This puts a heavy burden on policies in the target system have a policy level target and the user and also, by definition an emergency comes unex- no overriding rule level target. In systems where either there 40 are no policy level targets or overriding rule level targets are the prototype. present, an efficient way to implement the TAPS algorithm is to broadly categorize the available policies and use these 4.1 System Design categories to select the applicable policies. 
Although this The proposed system has a two stage authorization pro- selection will neither be fine-grained nor accurate, it will still cess, where in the first stage the applicable policy set and improve the performance of the evaluation system because the applicable PCA is determined and in the second stage by using TAPS we can filter out non-applicable policies at the applicable policies are evaluated to reach an authoriza- an early stage. So, although the performance will not be tion decision. For the first stage, the policy is created with optimal in this case, it will still be better than the current an index rule for each policy in the TAPS. An index rule is performance. of the form < {SA, RA, EA} : P olicyId >, where PolicyId The next step in stage one is to determine the applica- is the index id of a particular policy. For example, if policy ble PCA (P CAapply ) based on a set of environmental at- ‘P1234’ is applicable to requests in an emergency scenario, tributes, which define the specific conditions under which then the index rule will be represented as - each of the PCAs is applicable. These environmental at- < {EM T.EM T License = ‘valid0 } : P 1234 > tributes essentially define the context of the AR. Some of < {CompanyY.Dispatched = ‘true0 } : P 1234 > these attributes might accompany the AR while others can < {EM T.Employer = ‘CompanyY 0 } : P 1234 > be provided by an internal or external system entity. We assume that the dynamic decision of which PCA to select The attribute in the index rule is directly provided by is itself based on a policy. Thus, there is a policy set con- an attribute provider (AP)3 . In this example, the three at- taining the rules governing PCA selection. The PCA rules tributes jointly establish that the EMT’s license is valid, he are defined so that they are mutually exclusive and only one works for company Y and company Y was dispatched to of them is applicable in a particular situation. 
Although the emergency by the 911 operator. These attributes will this might seem complex, it is not really so because there be provided by distinct entities. Using them together can are typically a small number of combination algorithms to establish a complex fact, which cannot be verified by any choose from. This is enforced by using the combination al- single entity in the whole system. Note that if an index rule gorithm ‘only-one-applicable’ to choose among the PCAs. does not contain any attributes i.e. < ∗ : P olicyId >, then ‘only-one-applicable’ returns the applicable PCA if one and it is true by default and that policy is always included. only one rule evaluates to ‘permit’. If zero or more than one For an access request, the attributes present in the request rule (and hence the PCA) evaluates to ‘permit’, then an er- are compared against the index rules and, in many cases, ror code is returned. All rules in the policy set are evaluated only a small number of policies will be included in the TAPS. and the applicable PCA is selected to be used for resolving As a result, the policy evaluation stage will be much faster in conflicts for this access request. these cases. The diagram in Figure 1 describes the dynamic Now in stage two, the final authorization decision is calcu- authorization process. A similar policy is created with an lated by evaluating the TAPS as E(T APS) = {T AP S P CAapply , index rule for each available PCA. Based on the attributes AR} φ DP S → d. As defined in Definition 7, in this eval- in the index rules, we determine which PCA will be applied uation, we consider all policies present in the TAPS and to this particular request. evaluate them against the access request AR. The used in this case is P CAapply , which is calculated in the previous step. As an example, using this model, Alex can create a PCA selection rule to the effect that if the EA = (‘emergency 0 = ‘true0 ), then the PCA ‘permit-overrides’ is used. 
The effect will be to allow access to anyone who can satisfy at least one of the applicable policies. On the other hand, in case where EA = (‘emergency 0 = ‘f alse0 ), PCA ‘deny-overrides’ can be used. This will limit access to holders of those at- tribute combinations that are not denied by any policy and are allowed access by at least one applicable policy. Since this evaluation is done during each access request, the PCA will change dynamically whenever there is an emergency. In addition to providing this novel functionality, our frame- work proposes the use of TAPS to reduce the policy set to be evaluated for each access request. As shown in Section 6, this improves the real time system performance by 4-8 times. Formulation and evaluation of these rules is explained in more detail in Section 4.1. Figure 1: Block diagram of policy evaluation using the proposed framework. 4. SYSTEM DESIGN AND BACKGROUND 3 MODULES An AP is an entity similar to an identity provider. We define an AP as an entity that can certify certain attribute In this section, we will first present the system design for a values for an individual due to its special relationship with generic implementation of this authorization framework, and the individual. For example, an employer can certify an then describe some background modules used for building employee’s role in an organization. 41 4.2 Application Scenario example XACML policy for Alex is shown in Appendix B. To understand the implication of using context informa- An additional benefit of our framework is that SHC can tion in the total applicable policy set (TAPS) evaluation create index rules using attributes like ‘username’5 , ‘datatype’, and using dynamic PCA selection, let us again consider and ‘data source’ to create index rules to quickly select rel- the previous health care domain scenario. Assume that evant policies when a physician tries to access Alex’s PHR. 
Alex’s HMO where he stores his PHRs has access policies These relevant policies form the TAPS for this access re- for data based on criteria like data type, membership type, quest. Suppose policy P880 contains Alex’s disclosure poli- etc. Alex’s policies also apply to his PHR, as described ear- cies, P130 contains data source’s policy, P110 contains HIPPA lier. Now Alex, who lives in Atlanta is planning a trip to policy, P112 contains the electronic privacy act6 , and P21 Florida for a week and he wants his PHR to be accessi- contains the SHC’s disclosure policies. SHC’s index rules for ble to any physician or ‘paramedic in Florida’ during that Alex’s PHR are shown below : week in case he needs medical help. Using our proposed < {‘username = Alex0 } : P 880 > model, he can add a special policy saying < {startdate ≤ < {‘datasourceId = 8148200 } : P 130 > date ≤ enddate} : P 2345 >, where P2345 describes the spe- < {‘datatype = P HR0 } : P 110, P 112 > cial permission to ‘physicians’ in general and ‘paramedics < {∗} : P 21 > in Florida’. Upon evaluating this index rule, Alex’s au- Note that, in the last index rule, the attribute value is left thorization system will compare the current date with the blank, which results in P21 being included every time. Using date range in the index rule and will include P2345 during this efficient evaluation of TAPS, SHC can quickly determine that particular week. Since the proposed model is attribute the policies that need to be evaluated for an access request based, Alex can take advantage of this by adding multiple to Alex’s PHR. We report some performance results of the attribute combinations. Assume that Alex’s location can be efficiency of TAPS evaluation in Section 6. tracked from his mobile phone, which communicates that to his authorization system over a secure channel. Then Alex 5. PROTOTYPE IMPLEMENTATION can set the index rule as follows : < {startdate ≤ date ≤ enddate}, {location = F lorida} : P 2345 >. 
The additional location attribute ensures that the lenient PCA is chosen only when Alex is physically in Florida. (We assume that Alex always carries his mobile phone with him; in essence, the service is tracking a device and not Alex himself.) Alex's mobile phone is used to provide his location, but the PHR will primarily be accessed by the paramedics and physicians using their own systems. In the event that he has to cancel his trip, his more lenient policy will not be in effect and his information will not be available to any paramedic in Florida. He also has the convenience of setting this rule once and then forgetting about it, irrespective of whether he actually makes the trip or not.

It is important here to note the difference between creating a new access rule in Alex's policy and creating an add-on access policy. While the former is possible using current authorization systems, it requires Alex to modify his policy by adding new access rules and probably changing the rule combination algorithm. The effects of both these actions are hard for an average user to comprehend. If Alex has set his RCA to 'deny-overrides' and he wants to add new rules to permit access during that particular week, he will need either to change the RCA to 'permit-overrides' or to change each of the deny rules in the policy. Neither is desirable, because his deny rules will be bypassed. In the proposed system, Alex can add a policy to the policy set defining his access policies and change the PCA to 'permit-overrides' for the specified period. Doing so keeps all of Alex's deny rules unmodified, and his policy set will allow access when at least one of his policies allows access, which is what he intended. This is hard to do in current systems, because the PCA cannot be changed according to dynamic requirements. The resulting policy set is also more modular, and analyzing such a policy set is easier. Finally, it saves the effort and complexity of analyzing the effects of changing the RCA or policy rules, not to mention restoring the original state once the specified time has passed. An example XACML policy for Alex is shown in Appendix B.

5. PROTOTYPE IMPLEMENTATION

In this section, we describe the prototype implementation of the proposed framework, which extends the functionality of the policy language. The implementation is built on Sun's open-source XACML engine, to which we added modules and PCAs implemented in Java; the generated policies are written in XACML. We use the Sun XACML PDP implementation because its loading and evaluation times are both reasonable when compared to other popular XACML implementations like XACMLLight and XACML Enterprise: its overall performance is much better than XACMLLight and close to XACML Enterprise. A detailed comparison of the three implementations is given in [25].

The authorization policy consists of multiple policy sets: the system policy, the patient policy, and the data source policy. The system can be extended to consider the data accessor's policy to ensure that the obligations associated with the access request will be honored.

The authorization module is set up as shown in Figure 2. The 'Policy Load and Evaluation' and 'Ancillary' modules are part of the standard XACML engine, while the 'PSS' and 'PCA Selector' (explained later in this section) are added to the XACML engine. To make the proposed model closely compliant with the existing XACML engine, we have modeled the two new sub-modules as XACML policy sets, so that the XACML policy engine can be used to perform these evaluations as well.

Figure 2: Modified XACML policy engine.

Policy Set Selector (PSS) - The PSS takes the authorization policy as its input, which contains all the available policy sets. The schema of the TAPS as a policy file is shown in Figure 3. It is organized in the Subject, Resource, Action and Environment structure. The PSS evaluates each policy set to find all the sets that are applicable to the access request. The PCA used here is 'all-that-apply', which was developed especially for the PSS; its function is to evaluate all the policy sets and output all that apply. All the policy sets selected by the PSS are stored in a data structure, and only those policy sets are considered in the evaluation phase. As mentioned earlier, this reduces the number of policies to be evaluated for an access request and results in considerable run-time performance improvement. A detailed discussion of the performance improvement is given in Section 6.

Figure 3: Policy set selector module as a XACML policy set.

PCA Selector - The PCA selector reads the PCA selection file, which is described as a XACML policy. This description is created by the entity that is responsible for making sure that all the relevant policies are taken into consideration. This entity should ensure that all the available PCAs are encoded as individual policies as shown in Figure 4. The system can be used as a static system by defining the selected PCA with no attributes (hence always applicable) and defining all the other PCAs with attributes that are never true. Although such a configuration may not provide some of the key benefits of the proposed framework, it may sometimes be required for backward compatibility.

The PCA selector file is a policy set as shown in Figure 4. All the PCAs are described as contained policy sets, and the combination algorithm used is 'only-one-applicable', which is a standard XACML PCA. It returns 'permit' if exactly one of the policy sets is applicable and 'deny' if zero or more than one policy sets are applicable. In case the result is 'permit', the applicable policy set returns the name of the PCA to be used in combining policies. This module provides the novel functionality of selecting the PCA dynamically as described in Section 3.2.

Figure 4: PCA selector module as a XACML policy set.

To continue with the example in Section 3, the PCA selection policy set will be set up as shown in Figure 4. Initially, when there is no emergency, the PCA 'deny-overrides' will be selected; this is indicated by the attribute 'emergency' being set to false. When there is an emergency, the attribute is set to true and the PCA evaluation outputs 'permit-overrides'. The output PCA again becomes 'deny-overrides' once the emergency is over and the corresponding attribute is set back to false.

This attribute can be provided by a number of entities, such as the 'emergency operations center', the '911 operations center', the patient himself, or any other entity that the patient's agent trusts to provide this attribute. Although it might sometimes be difficult to ascertain that this particular patient is involved in an emergency, the patient would give more priority to making his PHR available to medical personnel in an emergency than to his privacy. Since the entire system can be audited, any breach of privacy can be discovered on audit.
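The PCA selector's behavior can be illustrated with a small sketch. This is our own Python simplification, not the paper's XACML policy sets: each candidate PCA carries an applicability condition over context attributes, and, in the style of 'only-one-applicable', a PCA name is returned only when exactly one condition holds.

```python
# Sketch of dynamic PCA selection (illustrative, not the paper's
# XACML encoding). Mirrors 'only-one-applicable': exactly one
# applicable rule yields that PCA; zero or more than one yields
# None (the 'deny' case).

def select_pca(pca_rules, context):
    """Return the single applicable PCA name, or None otherwise."""
    applicable = [name for name, condition in pca_rules if condition(context)]
    return applicable[0] if len(applicable) == 1 else None

# The emergency example from Section 3: 'permit-overrides' during an
# emergency, 'deny-overrides' otherwise.
pca_rules = [
    ("permit-overrides", lambda ctx: ctx.get("emergency") == "true"),
    ("deny-overrides", lambda ctx: ctx.get("emergency") == "false"),
]

print(select_pca(pca_rules, {"emergency": "false"}))  # → deny-overrides
print(select_pca(pca_rules, {"emergency": "true"}))   # → permit-overrides
```

Because the conditions are re-evaluated on every access request, flipping the 'emergency' attribute switches the combining algorithm without touching any policy.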
6. PERFORMANCE EVALUATION

In this section, we discuss the performance evaluation of the various components of the proposed framework. We measure the following parameters: 1) the overhead of evaluating the total applicable policy set (TAPS), 2) the overhead of dynamically selecting the PCA, and 3) the time saved by evaluating just the TAPS (and then the applicable policies) compared to performing a target match on all the available policies (and then evaluating the applicable policies). To measure these parameters, we evaluate: 1) TAPS evaluation time vs. total number of available policies, 2) PCA evaluation time vs. number of attributes in each index rule, and 3) evaluation time vs. number of policies (with and without TAPS). The reasons for choosing these parameters and the evaluation results are discussed in detail in Section 6.2.

6.1 Evaluation Setup

In the evaluation setup, we create XACML policies for the modules described in Section 5. For evaluating the TAPS, we use the schema shown in Figure 3. We set up a XACML policy file with one index rule representing each available policy file (or policy set). Each index rule contains two attributes, both of which are required for access; there are 16 attributes in total, of which we select 2 randomly. For the experiments, we use 1, 2, 4 and 8 index rules per policy file in each run of the experiment. We also vary the total number of available policies from 1 to 10,000, increasing the number of policies by an order of magnitude each time. Most real-world policies use 10-20 user attributes coming from the organization's LDAP server [22], [3]; hence we feel 16 is a representative number. Moreover, this is a configuration parameter and not a limitation, because it can be scaled easily. We also scale the number of attributes in one of the experiments (as described in Section 6.2.2). We believe that most real-world systems use far fewer than 10,000 policies; we evaluate performance up to 10,000 policies to observe system behavior over a broad range.

For selecting the PCA, we use the schema shown in Figure 4. Since we have a fixed number of PCAs in the system, we use this evaluation to scale the number of attributes in each index rule from 2 to 10,000. This gives us an estimate of the evaluation time in a system with a large number of attributes. For evaluating the actual policies, we created policies with 1, 2, 4 and 8 rules per policy to be used in different runs of the experiment, in sets of 10, 100, 1,000, and 10,000 policies. All experiments were run on a single 2.4 GHz Intel Dual Core Pentium machine with 2 GB of physical memory.

6.2 Evaluation Results

In this subsection, we present the performance results for the different cases just described.

6.2.1 Case 1

In this case, we evaluate the time consumed in evaluating the TAPS with a varying number of total available policies. The RCA used is 'all-that-apply', so the evaluation considers all the policies that apply to a particular access request. We change the number of policies from 1 to 10,000, increasing the number of policies by an order of magnitude in each step. We also vary the number of index rules applicable to each policy to 1, 2, 4, and 8 in different runs of the experiment. The result is shown in Figure 5. We observe that the evaluations take almost linear time, as shown in this semi-log graph. The evaluation time is within 2 seconds even with 1,000 policies with 8 rules each, whereas with 100 policies with 8 rules each the evaluation time is within 250 milliseconds.

Figure 5: Evaluation time vs. number of available policies.

6.2.2 Case 2

In this case, we evaluate the applicable PCA from the list of PCAs supported by the system. In our prototype system, we have seven PCAs, each denoted as a policy set with its own index rule. We increase the number of attributes used in each index rule from 2 to 10,000 to understand the effect of scaling the attributes on performance. The run-time performance is shown in Figure 6. We observe that even with 100 attributes per index rule, the total evaluation time is under 280 milliseconds.

Figure 6: Evaluation time vs. number of attributes per index rule.

6.2.3 Case 3

In this case, we evaluate the same set of policies with and without the PSS module and compare the performance of the two systems. The setup is described in Section 6.1. In each policy file, we have a policy target set up, which is the default method XACML uses to check whether the current policy (file) is applicable to the current request. This target can be set up by resources, subjects, actions, or environments; we set up these targets with applicable subject values. This allows us to make a direct comparison with our experimental setup, and it does not limit the use of targets in the experiments conceptually or physically. (Using a target in the policy file is optional in XACML. If no target is used, the only way to check a policy's applicability is to evaluate it and see if it applies to the current request. This would be slower than matching the target, so we believe our comparison is fair: we compare our results against the faster version.)

We first run the test with all the files, letting the XACML engine perform target matches on all the available policies and evaluate the policies whose targets match. Figure 7 shows the result of this evaluation with about 1% of the policies being evaluated.

Figure 7: Evaluation time vs. number of total available policies (conventional XACML).

For comparison with our proposed system, we run the experiment on the same policy set with the PSS module included. We evaluate the TAPS using the index rule method over all the available policies and force the TAPS to be 1% of the total available policies. The resulting TAPS is stored in an array, and the XACML engine then evaluates all the files in this array. The combined time for determining the TAPS and evaluating it is shown in Figure 8. We include 1 percent of the total policies in the TAPS, which we believe is more than most access requests would require, especially in systems with a large number of policies. We chose this percentage to obtain a view of worst-case system performance; we expect that most real systems will have fewer policies to evaluate per access request, and the evaluation times will be lower than what is observed in Figure 8.

Figure 8: Evaluation time vs. number of total available policies (our proposed framework).

Comparing the results in Figure 7 and Figure 8, we observe that using TAPS evaluation with the index rules and then evaluating the applicable policies is about 4-8 times faster than the conventional method.
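The source of the speedup can be illustrated with a small sketch of our own (not the paper's benchmark harness): conventional XACML checks every policy's target per request, which is linear in the number of policies, while an index keyed on attribute/value pairs touches only the matching buckets.

```python
# Sketch contrasting per-policy target matching with indexed
# selection (illustrative; policy and attribute names are made up).

def conventional_match(policies, request_attrs):
    """Target-match every policy; return those whose target matches."""
    return [pid for pid, target in policies
            if all(request_attrs.get(k) == v for k, v in target.items())]

def indexed_match(index, request_attrs):
    """Look up only the index buckets named by the request's attributes."""
    selected = []
    for pair in request_attrs.items():
        selected.extend(index.get(pair, []))
    return selected

# 10,000 policies, each keyed on a single 'subject' value.
policies = [(f"P{i}", {"subject": f"user{i}"}) for i in range(10_000)]
index = {("subject", f"user{i}"): [f"P{i}"] for i in range(10_000)}

request = {"subject": "user42"}
assert conventional_match(policies, request) == indexed_match(index, request) == ["P42"]
```

Both functions return the same applicable set, but the conventional path inspects all 10,000 targets while the indexed path performs one dictionary lookup per request attribute.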
Such a speedup is especially important in large systems with many policies. In the worst-case scenario (10,000 policies, 8 rules/policy), the conventional evaluation takes about 210 seconds, compared to 26 seconds in our system. In a more common scenario (100 policies, 8 rules/policy), the evaluation times are 1.8 seconds and 0.5 seconds, respectively. We argue that this performance improvement is not only significant, but critical for real-time systems.

6.2.4 Case 4

In this case, we fix the total number of available policies at 1,000 and change the percentage of policies applicable to each access request, performing the experiment with 15% applicable policies per access request. We repeat this experiment for 1, 2, 4 and 8 rules per policy, with and without the PSS system, and compare their performance. The results are shown in Figure 9 and Figure 10. We observe that in our proposed model the system evaluation time starts from a very low value and increases linearly, whereas in existing systems it starts near the maximum value and remains almost constant.

Figure 9: Evaluation time vs. number of total available policies (conventional XACML).

Figure 10: Evaluation time vs. number of total available policies (our proposed framework).

7. RELATED WORK

In this section, we review related work in the areas of conflict detection, avoidance, and resolution and compare it to our proposed framework.

7.1 Conflict resolution

Mazzoleni et al. presented a system for integrating authorization policies of different partner organizations [20]. Their core idea is to find the similarity between a set of policies and to use that information to transform the set into a single policy that applies to the request. In their case the PCAs are static and there is no way to choose policies dynamically, whereas in our framework the PCA can be chosen dynamically. Our framework also allows multiple policies for the same resource, one of which can be chosen at run time.

Another idea for policy conflict resolution in active databases was proposed by Chomicki et al. in [10]. Their system is based on the event-condition-action (ECA) paradigm, in which policies are formulated using ECA rules. A policy generates a conflict when its output contains a set of actions that the policy administrator has specified cannot occur together. This work is specific to dynamically resolving conflicts among actions in a system, whereas our focus is on a generic policy-based system to protect resources. In our framework, the policy composers need not have any idea of the possible conflicts in the system, whereas in Chomicki's work the system administrator specifically defines conflicting actions. Moreover, in our system a number of authorities can compose policies, and it is not possible for any one authority to foresee all the possible conflicts in advance.

7.2 Conflict avoidance

One approach to avoiding conflicts in authorization rules is presented by Yu et al. in [26]. They argue that a large number of rules may apply to a service and that detecting and resolving conflicts in real time can be a daunting task. Their system is completely static and assumes that it is always possible to determine priorities ahead of time and avoid conflicts. We argue that this is not possible in dynamic environments, where priority depends on multiple factors like the context of the access request, the authorities defining the policies, mandatory (e.g., regulatory) vs. optional policies, and environmental factors.

Another approach for avoiding conflicts in policy specification is proposed by Agrawal et al. for defining authorization policies for hippocratic databases [5], [6]. Their system allows system administrators to specify system policies for administration and regulatory compliance, and these policies have the highest priority. Users are allowed to specify their privacy preferences as long as their policies do not conflict with the system policies. In our framework, users can specify their preferences even if they conflict with other policies; the users' policies may override other policies or be overridden, based on context information. Agrawal's framework also does not consider changes to system and regulatory policies, which may create new conflicts with previously accepted user policies, or remove conflicts between a new system policy and previously rejected user policies; neither case is handled in their system. In our framework, both cases are handled naturally without any action on anyone's part to resolve the conflict.

7.3 Hybrid Approach

Bertino et al. presented an approach that is a hybrid of conflict avoidance and conflict resolution [9]. In this work, the authors propose a scheme for supporting multiple access control policies in database systems. Here policies may carry 'strong' authorizations, which are without conflicts, or 'weak' authorizations, with possible conflicts. Compared to this framework, we believe that our approach is more generic because it allows conflicting policies to be composed and resolves conflicts based on context information. To implement Bertino's proposed system, there must be some static hierarchy (or a first-specified-rule-overrides-others scheme) for conflict avoidance among strong authorizations. In contrast, our framework allows dynamic overriding among the authorities.

Another approach to resolving policy conflicts in a hybrid manner is proposed by Jin et al. [14]. In their work they note that although resolving conflicts using the static method is easier, it may not be feasible in systems with a large number of policies. The main difference with our framework is that the combination algorithms in their model are defined statically, whereas in our case the combination algorithm is decided at run time based on context information. Also, our framework enables the user to add (or remove) PCAs or policies dynamically, an aspect not considered in [14].

8. CONCLUSION

In this paper, we discussed policy-based authorization systems and attribute-based systems. We focused on the multi-authority case, where multiple policies are used to authorize a single access request. In particular, we exposed the problems in choosing the PCAs ahead of time, i.e., during policy description. We presented a framework to choose the PCA dynamically at run time based on dynamic attributes. The framework also supports choosing the applicable policy sets based on dynamic attributes. This increases the policy evaluation efficiency of the system and modularizes the policies, enhancing their analyzability. Using dynamic attributes to determine applicable policy sets at run time provides a novel method to add and remove specialized policies dynamically. We implemented and evaluated a prototype of the authorization system as a module of a modified version of Sun's XACML engine.

9. REFERENCES

[1] Enterprise Privacy Authorization Language (EPAL). http://www.w3.org/Submission/2003/SUBM-EPAL-20031110/.
[2] eXtensible Access Control Markup Language (XACML). www.oasis-open.org/committees/xacml/.
[3] LDAP authentication attributes. http://docs.sun.com/source/817-7647/ldapauth.html#wp19608.
[4] P3P: The Platform for Privacy Preferences.
http://www.w3.org/P3P/.
[5] R. Agrawal, D. Asonov, R. Bayardo, T. Grandison, C. Johnson, and J. Kiernan. Managing disclosure of private health data with hippocratic databases. IBM Research White Paper, January 2005.
[6] R. Agrawal, P. Bird, T. Grandison, J. Kiernan, S. Logan, and W. Rjaibi. Extending relational database systems to automatically enforce privacy policies. In ICDE, pages 1013-1022, April 2005.
[7] A. Barth, J. Mitchell, and J. Rosenstein. Conflict and combination in privacy policy languages. In Workshop on Privacy in the Electronic Society, October 2004.
[8] E. Bertino, C. Brodie, S. B. Calo, L. F. Cranor, C. Karat, J. Karat, N. Li, D. Lin, J. Lobo, Q. Ni, P. R. Rao, and X. Wang. Analysis of privacy and security policies. IBM Journal of Research and Development, 53, 2009.
[9] E. Bertino, S. Jajodia, and P. Samarati. Supporting multiple access control policies in database systems. In Proceedings of the IEEE Symposium on Security and Privacy, May 1996.
[10] J. Chomicki, M. J. Lobo, and S. Naqvi. Conflict resolution using logic programming. IEEE Transactions on Knowledge and Data Engineering, 15(1), January/February 2003.
[11] K. Fisler, S. Krishnamurthi, L. Meyerovich, and M. Tschantz. Verification and change impact analysis of access control policies. In International Conference on Software Engineering, May 2005.
[12] J. Halpern and V. Weissman. Using first-order logic to reason about policies. In IEEE Computer Security Foundations Workshop, 2003.
[13] J. Jin, G.-J. Ahn, M. J. Covington, and X. Zhang. Toward an access control model for sharing composite electronic health records. In 4th International Conference on Collaborative Computing, 2008.
[14] J. Jin, G.-J. Ahn, H. Hu, M. J. Covington, and X. Zhang. Patient-centric authorization framework for sharing electronic health records. In SACMAT, 2009.
[15] H. Kamoda, M. Yamaoka, S. Matsuda, K. Broda, and M. Sloman. Policy conflict analysis using free variable tableaux for access control in web services environments. In WWW Conference, 2005.
[16] H. Koshutanski and F. Massacci. An access control framework for business processes for web services. In ACM Workshop on XML Security, October 2003.
[17] N. Li, Q. Wang, W. Qardaji, E. Bertino, P. Rao, J. Lobo, and D. Lin. Access control policy combining: Theory meets practice. In ACM SACMAT, 2009.
[18] E. Lupu and M. Sloman. Conflicts in policy-based distributed systems management. IEEE Transactions on Software Engineering, pages 852-869, Nov/Dec 1999.
[19] A. Masoumzadeh, M. Amini, and R. Jalili. Conflict detection and resolution in context-aware authorization. In 21st International Conference on Advanced Information Networking and Applications Workshops, May 2007.
[20] P. Mazzoleni, B. Crispo, S. Sivasubramanian, and E. Bertino. XACML policy integration algorithms. ACM Transactions on Information and System Security (TISSEC), February 2008.
[21] A. Mohan, D. Bauer, D. Blough, M. Ahamad, B. Bamba, R. Krishnan, L. Liu, D. Mashima, and B. Palanisamy. A patient-centric, attribute-based, source-verifiable framework for health record sharing. CERCS Tech Report GIT-CERCS-09-11, Georgia Tech, 2009.
[22] L. Ngo and A. Apon. Using Shibboleth for authorization and authentication to the Subversion version control repository system. In IEEE ITNG, 2007.
[23] P. Rao, D. Lin, E. Bertino, N. Li, and J. Lobo. An algebra for fine-grained integration of XACML policies. CERIAS Tech Report 2008-21, Purdue University, 2008.
[24] M. Rouached and C. Godart. Reasoning about events to specify authorization policies for web services composition. In IEEE International Conference on Web Services (ICWS), September 2007.
[25] F. Turkmen and B. Crispo. Performance evaluation of XACML PDP implementations. In ACM Workshop on Secure Web Services, October 2008.
[26] W. Yu and E. Nayak. An algorithmic approach to authorization rules conflict resolution in software security. In Annual IEEE International Computer Software and Applications Conference, July 2008.

APPENDIX
A. 'ALL-THAT-APPLY' COMBINATION ALGORITHM

Definitions:
Pi = ith authorization policy.
FID = File Identifier.
FID(Pi) = File Identifier for the ith authorization policy file.
TAPS = An array to store FIDs.
M-Policy = A policy file with index rules that define the applicability of authorization policies.

Algorithm:

B. ALEX'S POLICY

Figure 11: An example policy for Alex.

An Attribute-based Authorization Policy Framework with Dynamic Conflict Resolution
Apurva Mohan, Douglas M. Blough, Georgia Institute of Technology
(Presentation Slides)

Contents
• Problem introduction
• Motivating scenario
• Proposed solution
• Performance of the proposed framework
• Conclusion

Introduction
• Policy-based authorization systems
• Role-based vs. attribute-based systems
• Multi-authority systems
• Conflicts in policy decisions

Problem Introduction
• Conflict resolution in current systems is static
• Most policy-based systems do not provide modularity
• Difficult to add or remove special-purpose policies
• Evaluation of a large number of non-applicable rules
• Fast indexing scheme for finding applicable policies

Motivating Scenario
(Diagram: a querier's request passes through the Superior Health Care (SHC) proxy, which evaluates Alex's policy, the data source policy, SHC's policy, and the regulatory policy before responding from the EMR repository.)

Scenario - Cont.
(Diagram: Alex's policy with rules 1-3 combined under 'deny-overrides' in normal conditions and 'permit-overrides' in an emergency.)

Proposed Solution
• Dynamic conflict resolution
• Decide applicable policies based on context
• Dynamically include (remove) specialized policies
• Increase modularity of policies
• Increase the efficiency of policy target matching

Authorization Flow

Proposed Solution - Dynamic Conflict Resolution

Proposed Solution - Applicable Policies

Motivating Scenario Revisited
What Alex wants:
• Only his doctor can access his EMR
• During his trip, 'doctors' or 'paramedics in Florida' can access his EMR
• Attributes used: Alex's location, the doctor's credentials, the paramedics' credentials and location, Alex's trip duration

Motivating Scenario Revisited
(Diagram: a location provider reports Alex's location, Atlanta or Florida, to the proxy server, which evaluates Alex's policy P1: ('doctor' or 'paramedic in FL') and (AlexLocation = FL) and (date = [d1,d2]), along with policies P2 and P3, before a paramedic in FL can access the EMR repository.)

Scenario - Continued
(Diagram as on the previous slide.)

Experimental Setup
• Total Applicable Policy Set evaluation: 1, 2, 4 and 8 rules/policy; 1, 10, 100, 1,000 and 10,000 policies
• PCA selection evaluation: 7 PCAs, 2-10,000 attributes/rule
• Evaluation time: 1, 2, 4 and 8 rules/policy; 1, 10, 100, 1,000 and 10,000 policies

Performance graph - 1
Performance graph - 2
Performance graph - 3
Performance graph - 4
Performance graph - 5
Performance graph - 6

Conclusion
• Proposed a framework for dynamically changing the PCA
• Selecting the applicable policies in a dynamic and efficient manner
• Included modularity in policies
• Add/remove specialized policies dynamically

Questions/Comments?

Computational Techniques for Increasing PKI Policy Comprehension by Human Analysts

Gabriel A. Weaver, Scott Rea, Sean W. Smith
Dartmouth Computer Science Department, Sudikoff Lab: HB 6211, Hanover, NH 03755
gweave01@cs.dartmouth.edu, scott.rea@dartmouth.edu, sws@cs.dartmouth.edu

(This work was supported by the NSF under grant CNS-0448499. The views and conclusions contained in this document are those of the authors and should not be interpreted as necessarily representing the official policies, either expressed or implied, of any of the sponsors.)

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. IDTrust '10, April 13-15, 2010, Gaithersburg, MD. Copyright 2010 ACM ISBN 978-1-60558-895-7/10/04 ... $10.00.

ABSTRACT

Natural-language policies found in X.509 PKI describe an organization's stated policy as a set of requirements for trust. The widespread use of X.509 underscores the importance of understanding these requirements. Although many review processes are defined in terms of the semantic structure of these policies, human analysts are confined to working with page-oriented PDF texts. Our research accelerates PKI operations by enabling machines to translate between policy page numbers and policy reference structure. Adapting technologies supporting the analysis of Classical texts, we introduce two new tools. Our Vertical Variance Reporter helps analysts efficiently compare the reference structure of two policies. Our Citation-Aware HTML enables machines to process human-readable displays of policies in terms of this reference structure. We evaluate these contributions in terms of real-world feedback and observations from organizations that audit or accredit policies.

Categories and Subject Descriptors
D.2.1 [Software Engineering]: Methodologies; D.2.8 [Software Engineering]: Metrics

General Terms
Management, Security, Standardization

Keywords
PKI; Certificate Policy Formalization; XML

1. INTRODUCTION

1.1 Human Analysts and PKI Policy

Information security policies describe an organization's requirements for protecting its computational and informational assets. In X.509 Public Key Infrastructure (PKI), a natural-language certificate policy (CP) is a type of information security policy that documents an organization's set of requirements for trust; furthermore, a Certification Practice Statement (CPS) is a natural-language document that describes how the CP is implemented.

As part of the operation of PKI, human policy analysts must regularly retrieve, review, and work with certificate policies and the corresponding CPS documents. Often, policy review processes (such as audits, grid accreditation, and bridging) involve comparing a policy or practice statement under consideration against a trusted or accredited one. During this process, analysts perform several operations on these natural-language texts.

• Finding and retrieving policies, in practice, is time-consuming and tedious. For instance, in the International Grid Trust Federation (IGTF), although there is a formal distribution of accredited CAs, their corresponding policy documents are not referenced in the distribution metadata. Instead, analysts must manually browse each CA's website (which isn't always listed in the metadata), locate the policy and/or practice statement, and download it.

• Policy comparison requires the analyst to compare sections of one policy or practice statement (e.g., "1.1," "3.2.1") with the corresponding sections in another; in theory, these sections should match, but in practice they often do not (and may be missing or moved).

• Policy transform requires the analyst to manipulate the structure of one policy into another's reference structure (e.g., RFC 2527 or RFC 3647); again, in theory, all policies should match the RFC exactly, but in practice they do not.

• Policy mapping requires a combination of policy comparison and policy transform to determine the equivalency of policies and practices within two different PKIs.

• In compliance evaluation, the analyst examines how well issued certificates comply with relevant sections of policy. For example, do certificates that have been issued to authenticate to the grid comply with a candidate policy?

• Content disambiguation requires the analyst to annotate words and phrases in policy with the specific senses with which they are used. For example, 'reasonable' has a specific legal meaning in Dutch law but not in English law; this caused confusion among policy auditors in the European Union Grid Policy Management Authority (EUGridPMA).

Currently, these review processes are done manually, taking much time and effort. An obstacle hindering all of them is the fact that the processes are defined in terms of the underlying semantic reference structure of the policies, but human analysts are instead confined to working with the page-oriented PDF text, which may or may not match the reference structure. Auditors therefore must manually translate, in their heads, between policy page numbers and the reference structure in order to perform these operations. This forces the operations to be largely manual and/or to operate on the entire document. Figure 1 sketches this situation.

1.2 Our Vision

Our overarching research vision is to accelerate PKI policy operations by building automated tools to eliminate slow and error-prone manual processes. In addition to our team's real-world PKI operations experience, we also bring a secret weapon: experience in building automated tools to assist classics scholars in overcoming a similar obstacle, namely doing semantic analysis on page-navigable reference works [20]. (In this earlier paper, we helped apply simple clustering algorithms and text-mining techniques to empirically illustrate ...)

These tools, in combination with our prior work, provide better quality, reproducible, and reliable data upon which policy auditors can base their trust decisions. Figure 2 sketches how we envision these contributions transforming PKI policy operations.

1.3 This Paper

In Section 2 we describe a set of principles and technologies from the Classics that directly inform our research on PKI policy. Section 3 presents motivation: real-world feedback and observations from organizations (like the FPKIPA-CPWG, EuGridPMA, and TAGPMA) that audit or accredit policies. In Section 4 we describe the design and implementation of our Vertical Variance Reporter and Citation-Aware HTML, and also discuss the next tools we plan to build. Section 5 gives an experimental evaluation of our Vertical Variance Reporter and describes the design of several applications that leverage the properties of our Citation-Aware HTML. Section 6 reviews relevant work. Section 7 describes future research directions building upon this work, and Section 8 concludes.

2. MAPPING CLASSICAL TECHNOLOGIES TO PKI

Our work adapts technologies from the Classics to construct computational tools that accelerate traditionally, exclusively-manual PKI policy operations. PKI policies are reference works. Analysts need to be able to align policy sections for comparison.
Section 5 of RFC 2527 and Section 6 trate how Homeric scholia (scholary comments written in of RFC 3647 effectively define a canonical structure for Cer- manuscripts) were transmitted, arguably rewriting the past tificate Policies (CP) and Certification Practices Statements 200 years of theory regarding their transmission.) (CPS) for authors and users to understand the meaning and As a first step towards achieving this vision, we applied scope of these texts. the Canonical Text Services (CTS) Protocol (a tool we used Traditionally, PKI policy operations require analysts to in classics work [19]) to construct the PKI Policy Repos- manually align policy sections for comparison. However, we itory [18]. Our PKI Policy Repository solved the policy can regard these natural language texts as reference works, retrieval problem. Before, analysts had to manually find with canonical structures for authors and users to under- and then browse each CA’s website. Using the repository, stand the meaning and scope of these texts. (e.g., Section analysts request an arbitrary fragment of policy, the re- 5 of RFC 2527 and Section 6 of RFC 3647 define the struc- quest is encoded as a CTS-URN [10](a hierarchical, machine- ture for Certificate Policies (CP) and Certification Practices actionable, human readable reference string), and the appro- Statements (CPS).) priate passage is retrieved. Using this machine-actionable Prior work in the classics (to which we contributed, in reference framework, we reduced the time to aggregate data fact) provides technologies to help with analogous tasks for for CP comparison by up to 94% (Policy Reporter) and re- the natural language texts that field studies. We can build duced the time to map policies from hours to seconds (Policy on these technologies to solve our PKI problem. In this sec- Mapper). 
tion, we review some principal building blocks the Classics In this current paper, we report on further progress in gives us: achieving this resarch vision. In particular, we focus on the human-computer semantic gap between the machine repre- • a data model for canonical texts sentation of PKI policies (structured by page) and the ways • a historical distinction between physical navigation and in which policy analysts interact with policy (structured by logical reference, and reference scheme). We contribute tools and techniques that use computation to help analysts efficiently compare and • a methodology for working with multiple editions of browse policies: the same work. • Our Vertical Variance Reporter computes and reports 2.1 A Data Model for Canonically Cited Texts differences in the reference structure of two policies. Both theoretical work and hands-on experience with dig- • Our Citation-Aware HTML enables machines to search, ital texts in the Classics (e.g. Homer and Archimedes [17]) to style, and to process human-readable displays of over the past twenty years [11] [9] led us to propose in our policy in terms of this reference structure. previous Classics work [20] that all canonically cited texts possess four properties: We also discuss the tools we plan to build next in order to complete the vision. 1. citable units of a text are ordered 52 CA CA P Website Website o Comparison CA CA P 1. Find & retrieve policies l Website Website o 2. Open policies i l 3. Navigate policies by c Transform i page and find y c sections y 4. Perform policy O 1 3 operations 4 p Mapping V e i r e a Compliance PDF 2 w t Evaluation e Policy i Analyst Content r o n Disambiguation s Computer Policy Operations Semantic Gap Figure 1: Policy analysts operate on PKI policies by their reference structure, but machine representations of policy like PDF are organized by page. This imposes a semantic gap, forcing policy operations to be largely manual. 
2. citable units of a text are organized in a (possibly flat) hierarchy

3. versions of a text are related to a notional text in a conceptual hierarchy

4. citable units may include mixed content

The Canonical Text Services (CTS) library encodes this data model for canonical texts. Our CTS Protocol [19] defines an HTTP protocol in terms of this data model for referencing and retrieving arbitrary passages of a text.

[Figure 2: Our representation of policy allows man and machine to directly operate on PKI policy, resulting in reproducible, more reliable data for helping analysts make policy decisions. Please note that previously implemented items were either built or adapted by us.]

Our initial work applying Classics tools to PKI contributed the PKI Policy Repository, consisting of a CTS server loaded with validated, XML PKI policies. We encoded PKI policies using Text Encoding Initiative (TEI) P5 Lite, an XML standard for representing texts in digital form [2]. Like previous efforts to encode policies using XML [5] [4], we modeled a security policy as a tree. This tree corresponded directly to both the hierarchy in the second property of our data model for canonically cited texts and the outline of provisions in Section 5 of RFC 2527 [7] and Section 6 of RFC 3647 [8]. Given a policy's text, we only mark up this hierarchical reference structure. By keeping the markup light, we reduce the complexity of encoding a policy.

2.2 Physical Navigation and Logical Reference
The Classics also teaches us the important distinction between physical navigation and logical reference. Originally, when texts such as Homer appeared on manuscripts (MSS), one could reference individual books or lines of the poem, but resolving the reference to a passage of text required manually flipping through the physical MSS folios. With the arrival of the book (as opposed to the manuscript), the page number and table of contents enabled scholars to quickly resolve logical references (such as "Book 9 of the Odyssey") to physical pages for that particular printing. However, over time these tools for physical navigation came to be used as a citation mechanism [15]. Disciplines outside of the Classics and law (which stuck with logical citation schemes) began citing works in terms of the page. For example, professors who reference pages rather than logical sections in their syllabi must update their syllabi if the textbook edition or printing changes. CTS advances the historical evolution of text, enabling people and processes to retrieve and navigate texts by their logical structure.

Once policy analysts can use computers to retrieve passages by logical citation, they are no longer required to manually translate, in their heads, between policy page numbers and the reference structure used by many policy operations. In actual practice, policies are represented as untagged PDFs that are structured according to the page. Even services such as Google Books do not allow one to explicitly retrieve or search within a specific section of a text.

Our overall research vision frees the analyst to continually work in logical reference coordinates whether retrieving, comparing, or mapping a certificate policy. Translation from these logical coordinates to a physical coordinate scheme (byte offsets in a file) is outsourced to the computer. Since the computer can perform this translation, many policy operations can also be augmented with computational tools.

2.3 Working with Multiple Editions
Combining the above properties of canonically cited texts with citation by logical reference provides Classical scholars with a framework to analyze multiple editions of a text. Versions of a text are related to a notional text (the work) in a conceptual hierarchy. For example, the various translations and editions of Homer's Odyssey can be viewed as descendants of a notional work. Although versions may differ, they share (more or less) a common logical reference structure. Book 9 of the Odyssey contains Odysseus' adventures with the cyclops Polyphemus regardless of the edition or translation.

Classical scholars also realized that editions may contain slight variations both in logical reference structure and in textual content. To address these problems, Nagy introduced the concepts of vertical variance and horizontal variance, distinguishing between differences in structure and content respectively [14].

In PKI operations, we can view the RFC 2527 and RFC 3647 policy formats as notional works according to which individual CAs author editions. Like Classical scholars, policy analysts analyze multiple editions of a text using a common set of logical reference coordinates. Furthermore, different editions may differ in terms of structure or textual content. Like passages in Homer, PKI policy sections may be added or deleted over time. Unlike Homer, however, PKI policy passages are identified not just by passage reference (e.g., "(9)") but also by headers that describe the purpose of the section (e.g., "Other Business and Legal Matters"). Therefore, passage reference does not necessarily correlate with section semantics. (This would be like Polyphemus the cyclops occurring in Book 6 rather than Book 9 of the Odyssey!) Headers may be relocated and paired with a different passage reference, identifying a different but semantically equivalent section to the corresponding section in the canonical reference structure.

To address these problems in PKI, we developed the Vertical Variance Reporter to compute and report vertical variance between multiple editions of a policy under these conditions, enabling policy analysts to see the mapping between two policies' reference structures.

3. REAL-WORLD MOTIVATION

3.1 Feedback
In our prior PKI policy tool work, we developed the PKI Policy Repository, Policy Reporter, and Policy Builder. When we presented these tools to the FPKIPA-CPWG, EuGridPMA, and TAGPMA, these organizations gave us feedback.

Many analysts agreed that a policy repository was desirable for finding policies, understanding the actual content of real-world policies, and dynamically creating new policies from previously-accredited, well-understood policies. However, they cited three major obstacles preventing the adoption of our approach: encoding speed, policy variation, and display quality. This current paper contributes solutions to the last two concerns as part of a larger strategy to increase encoding speed, and discusses our plan to eliminate the remaining obstacle.

• Encoding Speed. Based upon our prior evaluation of the Policy Reporter, we could encode a policy in 4-6 hours by copying and pasting policy content from a PDF into a TEI-XML file.

• Policy Variation. Once a policy was encoded and loaded into the PKI Policy Repository, analysts could retrieve and run analyses on multiple editions of one or more policy sections, expressed as a set of passage references. However, this approach implicitly assumed that passage references correlate to section semantics. In the real world, headers may be relocated and paired with a different passage reference, identifying a different but semantically-equivalent section to that listed in RFC 2527 or 3647. Analysts urged us to generalize our approach to handle the relocation of headers.

• Display Quality. Our PKI Policy Repository is primarily a service for computer programs; analysts wanted a more human-friendly display of our XML policies. Paragraphs, images, and tables needed to be clearly displayed. Although analysts saw the potential of augmenting their policy operations with computational tools, they required a way to view the XML policy using the traditional typographical conventions that reflect policy structure (for example, using different-sized fonts to denote sections and subsections of a policy).

3.2 Observations
In addition to gaining feedback on our work, attending meetings of these accrediting organizations allowed us to directly observe presentations, discussions, and business procedures which would benefit from our computational framework once it could accommodate vertical variance and provide a better human interface for browsing policies.

Policy analysts manually align policy provisions before they can compare their content. However, the real world makes this task harder than one expects. Sometimes a policy under consideration contains additional sections that do not map to the trusted or accredited policy. Furthermore, such non-standard sections may contradict statements made in other, standard sections of policy (analysts at the FPKIPA-CPWG call this the whitespace problem). Such contradictions, if present in an accredited policy, increase the risk accepted by an accrediting organization. However, a tool that measured the vertical variance of a policy would allow analysts to quickly identify non-standard sections of a candidate policy where these contradictions are likely to occur.

Analysts' current approaches to finding, searching, annotating, and evaluating policies could be accelerated with better human interfaces for browsing policies. Although the IGTF provides a formal distribution of accredited CAs, the corresponding policies themselves are not referenced in the distribution metadata. Analysts searching for terms over the entire text of a PDF policy complained that one could not restrict the search space to a particular section or range of sections. Analysts manually generate matrices consisting of policy sections and comments, so a framework that supported annotation of policy would allow them to dynamically generate these comparison matrices. Researchers at Trinity College, Dublin presented a suite of unit tests for measuring the validity of a certificate relative to a policy [3]; we saw the potential for combining these automated tools with our suite of policy creation and analysis tools to allow policy analysts, both non-technical and technically-inclined, to experiment with how modifying a policy's text impacts certificate validity.

4. OUR COMPUTATIONAL TOOLS
As noted above, the policy analysts at the FPKIPA-CPWG, EUGridPMA, and TAGPMA cited three major obstacles to our prior contribution: encoding speed, policy variation, and display quality. We now discuss the tools we built (and the tools we are still building) to address these obstacles and the further manual bottlenecks we perceive.

4.1 Completed Tools

4.1.1 Vertical Variance Reporter
Our Vertical Variance Reporter addresses the practitioner community's concern over policy variation.

In order to determine the actual reference structure of a policy, rather than imposing an idealized, trusted structure such as RFC 2527 or RFC 3647, we extract section identifiers (passage references and their corresponding headers) from its table of contents. Parsing relies upon a library of regular expressions we built to parse common formats for tables of contents. Iterating through these sections, we output a list of section identifiers for the Vertical Variance Reporter.

Our Vertical Variance Reporter takes two lists of section identifiers as input and computes a mapping between the two that preserves semantic equivalence. Think about the section identifiers in the policy under consideration as being mapped, by some unknown function, to the section identifiers in the accredited policy. We want a way to automatically discover and then calculate this function (or at least a good approximation thereof; the human can do the rest). To do this, we use one of the secret weapons inspired by the Classical notion of vertical variance: a confusion matrix built using the Levenshtein metric for semantic distance.¹ The Vertical Variance Reporter first records the distance between section headers in the source and target policies. Our tool then processes the confusion matrix to report a bidirectional mapping, classifying policy sections as matched, relocated, or unmapped.² In the next few paragraphs, we provide more details about how we compute the confusion matrix and then use it to infer a mapping.

¹We use the Levenshtein distance, but another metric could be used instead.
²It should be noted that this technique may prove useful in clustering documents based upon their reference structure.

We use a confusion matrix to (1) detect passage references in the trusted or accredited policy that are missing from the policy under consideration, (2) identify sections in the policy under consideration whose headers are within epsilon of a section header (via the Levenshtein distance) from the accredited policy, and (3) identify sections in the policy under consideration which are further than epsilon away from any of the target policy headers. The rows of the confusion matrix are indexed by the possible passage references within the source policy given the target. These index values directly correspond to the passage references in the target policy, which are used to index the columns.

Our tool computes the confusion matrix by iterating over each of the passage references in the target policy and first testing whether it is enumerated in the source policy section list. If the target passage reference does not appear in the source list, a −1 is recorded in the confusion matrix for the entire row. If the source section list does contain the target passage reference, then we calculate the Levenshtein distance between the target header for the current target passage reference and each of the headers in the source. Results are recorded in a two-dimensional matrix where rows correspond to possible passage references within a source policy given the target policy and columns correspond to the target policy's passage references.

The Vertical Variance Reporter infers a mapping from two confusion matrices: one comparing sections in the source to those in the target, the other comparing sections in the target to those in the source. In this way, we obtain (1) a list of omitted target references, (2) a list of matched source headers (identified by passage reference), and (3) a list of unmatched source headers. From the target-to-source matrix, we obtain a list of additional source references, a list of matched target headers, and a list of unmatched target headers. By processing these lists, our tool is able to classify a section as mapped or unmapped. Mapped sections may be exact matches, where the passage references in source and target are equal and the normalized Levenshtein score is 1, or fuzzy matches, where the passage references may differ or (inclusively) the score merely exceeds a threshold (we used 0.90). Source sections may be unmapped because their passage reference is not present in the target document and their headers fail to match (additional sections), or simply because their headers failed to match any of the target headers (unmatched sections). Table 1 (located at the end of this paper) shows and discusses excerpts of reports generated by our Vertical Variance Reporter.

4.1.2 Citation-Aware HTML
In order to address the practitioner community's concern over display quality, we developed Citation-Aware HTML, which makes it possible for human analysts to search, to style, and in general to manipulate policy in the browser according to logical reference.

Given a list of section identifiers, we use Lucene [12] to index and search Google's OCR HTML for the corresponding byte offset at which each section begins.³ Our HTML generation process then iterates through these locations, extracting the textual content contained between the start of the section and the next successfully-translated section (or end of file).

³Note that we are using Lucene to translate a logical reference coordinate system to a physical coordinate system (bytes) for our machine representation (an HTML file).

Citation-Aware HTML classifies HTML elements using CTS-URNs via the class attribute and thereby relates the content spanned by those elements to a policy's reference scheme via machine-actionable reference. Our Citation-Aware HTML, like TEI-XML representations of policy, encodes the hierarchy of citable units within a policy. An important consequence of this is that the mapping of citation nodes (citable units represented by the Document Object Model, a DOM) between TEI-XML and HTML is bijective: changes to any citation node in either format can be mirrored in the other, since one can generate either format by processing the other.

Our Citation-Aware HTML format allows humans to view text using traditional typographical conventions that reflect policy structure while gaining the benefits of navigation by logical reference. Although this technique could be applied to any HTML document, parsing Google's OCR allows us to extract CSS styling information so that eventually we can maintain the typographical conventions in the original PDF policy. This will allow us to faithfully reproduce the display of paragraphs, lists, and tables and may be useful for their eventual encoding in TEI-XML. Furthermore, our technique lends itself to several policy-browsing applications whose design we discuss below.

4.2 Tools Still Under Development

4.2.1 Policy Encoding Toolchain
We are addressing the practitioner community's concern over encoding speed with our Policy Encoding Toolchain. Encoding a PDF policy with our Policy Encoding Toolchain requires the following three steps: (1) use Google Docs to generate Google's OCR HTML output for a given PDF policy, (2) parse this HTML to generate a TEI-XML encoding as well as CSS styling information, and (3) generate a high-quality, human-readable view of the policy that faithfully recreates the typography seen in Google's OCR HTML.

Extracting section lists from a policy's table of contents as well as generating Citation-Aware HTML are both components of our toolchain that have value in and of themselves. In order to generate TEI-XML from Google's HTML, we must be able to generate a list of sections describing the reference structure we are trying to represent. Our Vertical Variance Reporter compares the vertical variance of two policies, allowing us to evaluate the quality of the encoding of a policy using a given list of section headers. However, this same tool is also useful to policy analysts in comparing a policy under consideration to a trusted or accredited policy. Our Citation-Aware HTML is a product of our envisioned toolchain. However, this same format has independent utility as a key component of several of our policy-browsing applications, which we will now describe.

4.2.2 Policy Browsing
Policy-browsing applications based upon our Citation-Aware HTML include a search utility for finding policies or searching within arbitrary sections of policy, a policy annotation framework generalizing the idea of using typographical cues (font size, color, etc.) to reflect policy structure, and a policy feedback loop for dynamic certificate validation which relies upon the bijective mapping between HTML and TEI-XML.

Citation-Aware Searching.
Since the class attribute of each citation node is annotated with its corresponding CTS-URN, search engines that index Citation-Aware HTML should, in theory, be CTS-URN aware. This means that one could search for all IGTF policies, all policies from a particular CA, a particular version of a policy, or a particular passage of a policy by searching for a particular CTS-URN. At the very least, retrieval of a particular edition should be possible since Citation-Aware HTML contains a URN in its page metadata. Just as one can use geographic coordinates to restrict a search to a particular region, so can one use CTS-URNs as textual coordinates to restrict a search to a particular region of text.

Policy Annotation Framework.
Although Google's OCR HTML styles content to mimic page typography, for applications like annotating policy, our Citation-Aware HTML enables one to style content with respect to its reference scheme. For example, auditors could highlight various policy sections to indicate the presence of an annotation.⁴ Alternatively, auditors could color-code policy sections to indicate various levels of compliance or issues that need further review.

⁴These annotations could be mined and presented in a matrix.

Policy Feedback Loop.
Our Policy-Driven Feedback Loop allows analysts to empirically explore the effect that changing a policy would have on an actual PKI infrastructure. Figure 3 illustrates our design, which would enable policy analysts to iteratively evaluate the effects of changing policy on certificate validity. First, policy analysts issue a request for a passage of policy against which to check the validity of a corpus of certificates. Using a CTS GetPassage request, the corresponding TEI-XML is retrieved and used to generate a suite of unit tests. The test results are then presented by controlling the styling of our Citation-Aware HTML for the requested policy passage. For example, the RFC 2119 significance level of violated policy assertions could be indicated with different colors, and the number of certificates failing to comply with an assertion could be indicated by font size. Policy writers could then adjust the required value or significance of a policy assertion and POST the updated HTML. Since the mapping between TEI-XML and HTML citation nodes is bijective, we can construct a feedback loop: the HTML citation nodes can be used to recover the XML. New unit tests can then be generated and new results presented back to the analyst.

[Figure 3: Dynamic Policy Evaluation will allow the policy analyst to treat Citation-Aware HTML policies as a form for configuring a certificate policy validation engine. Results of testing the modified policy against a corpus of certificates will be highlighted within the submitted text according to degree of compliance and significance of policy assertion.]

The feedback loop depends upon enriching the reference model for policy with assertions on certificate content. Rather than hand-coding unit tests for every new version of a policy, we hand-tag the expected value, relation, and significance of each machine-enforceable policy statement once within the TEI-XML. Our previously-developed RFC 2119 analysis tool leveraged the well-defined semantics of MUST, SHALL, and OPTIONAL. Since these words are technical terms, we were able to process occurrences of these words as tokens with a specific meaning. Similarly, by enriching our reference model with a representation for assertions on certificate content, we hope to gradually develop a lexicon of technical terms for disambiguating content and gradually make larger and larger portions of human-readable policy machine-actionable.

Using our extended policy representation, we walk the tree of citation nodes of the requested policy passage and generate a unit-test suite, much as a compiler walks an Abstract Syntax Tree (AST). The expected value, relation, and significance encoded by our model of assertions are treated as parameters for generating each unit test. Each citable assertion results in the generation of a unit test whose name encodes its corresponding citation node and significance. The unit tests are executed, the results interpreted, and used to generate a CSS style to be included in the Citation-Aware HTML for the requested passage. Policy analysts may change the values in the assertions, choosing terms from a controlled vocabulary derived from our lexicon.

5. EVALUATION
In this section we present empirical and anecdotal evidence to argue that our Vertical Variance Reporter and Citation-Aware HTML tools satisfy many of the requirements inspired by feedback and observations from real-world policy analysts. (As noted earlier, our other tools are still in development.)

5.1 Vertical Variance Reporter
The Vertical Variance Reporter addresses the need to be able to understand how the structure of policies differs so that one can quickly determine which sections of a policy under consideration can be compared to an accredited or trusted policy. In this section, we discuss results from experimental evaluations of how the section identifier extraction process affects the ability to infer a policy mapping between source and target policies. During the discussion of results, we will also mention how this tool relates to the feedback and observations from real-world policy analysts.

5.1.1 Parsing Sections from Tables of Contents
The Vertical Variance Reporter computes a semantics-preserving mapping between two lists of section identifiers. Our main technique for generating these lists is to parse the table of contents for a policy in Google's OCR HTML output. In order to make claims on how well the reference structure described in a policy's table of contents (TOC) maps to a target reference structure (such as RFC 3647), we need to be sure that we can correctly extract section identifiers from tables of contents formatted in Google's OCR HTML. In the first evaluation, we chose 10 policies, generated Google's OCR HTML, extracted their tables of contents, and parsed them for section identifiers. (As noted earlier, we are currently building a tool to automate this encoding process.)

Table 2 shows results for the final step: parsing section identifiers from tables of contents. As one can see, parsing the table of contents of these policies takes only seconds, and we successfully extract every header contained therein. It should be noted that the extracted headers may contain minor artifacts from the extraction process, such as rogue page numbers and page headers. These artifacts can be easily fixed either with some quick manual editing or a global find-and-replace. The results of Evaluation 1 allow us to say that our section lists accurately reflect the policy structure described in a policy's table of contents.

5.1.2 Computing Vertical Variance Using Tables of Contents
The second evaluation uses our Vertical Variance Reporter to compute the vertical variance between the same 10 source policies and the structure of RFC 2527 or RFC 3647, depending upon the source policy. We use the section lists derived from the tables of contents. This evaluation allows us to see how well the documented structure of a source policy maps to the RFC standard. Results are presented in Table 3.

Looking at the results, we see that the AustrianGrid table of contents closely follows RFC 3647 (containing 267 of the 270 RFC sections) while the TACC Root policy appears to be missing many sections (containing 67 of those 270 sections). Looking at the ULAGrid policy, we see that it contains 271 citable units whereas RFC 3647 only contains 270. This indicates an additional section, which the report will identify. This kind of information is a useful first step for solving the whitespace problem; it identifies sections to policy analysts that are non-standard and therefore may contain potentially contradictory information. Our mapping from the Austrian Grid TOC to RFC 3647 shows that 260 out of 267 citable units were successfully mapped and that the other 7 units were classified as unmapped. Only 65 of the already-reduced 67 sections in the table of contents for TACC Root actually corresponded to sections seen in RFC 3647. Notice that the mapping from RFC 3647 to Austrian Grid is consistent with its inverse, indicating that we are mapping the same 260 citable units in both directions.

5.1.3 Computing Vertical Variance Using Enhanced Section Lists
Evaluation 3 uses additional sources of information to increase the size of the source section list, which we will refer to as TOC+. Increasing our section lists is necessary since the tables of contents of some policies do not contain all of the sections actually contained in the policy. In Table 3, we see that the DFN-PKI 2.2 policy only contains 79 out of 270 possible sections from RFC 3647. However, looking at the policy text, one sees several sections which its table of contents does not enumerate. Because of this, we paired unmatched passage references from Evaluation 2 with section headers from the target policy, searched for them within our source policy, and, if the search returned a unique hit, folded them into our source section header list. Table 3 shows results of this experiment.

Looking at the results, we see that in some cases this made little difference: the TACC Root policy, with only 67 sections inventoried, remained unchanged. On the flip side of the coin, the Austrian Grid policy, with only 3 fewer sections than that of RFC 3647, also remained unchanged. It should be noted that, in general, inferring all mappings took between 9 and 45 seconds. Generating enhanced section lists took between 8 and 76 seconds, depending upon the size of the section list to be augmented. We ran our evaluations on a MacBook Pro running MacOS 10.5 with a 2.33 GHz Intel Core 2 Duo processor and 2 GB of 667 MHz DDR2 SDRAM.

5.1.4 Comparing Enhanced Section Lists to Ground Truth
Evaluation 4 uses a ground-truth list of policy headers to generate results as in Evaluations 2 and 3. We manually went through each policy and compiled a list of headers in the actual CP or CPS. We then ran the Vertical Variance Reporter to infer a mapping between our ground truth lists (GroundTruth) and our enhanced section header lists, allowing us to quantify how well we approximate actual policy structure.
Table 4 shows results of this experiment. technique increased the size of the enhanced section lists Our results in Table 5 indicate that headers extracted us- (|T OC + |). DFN-PKI 2.2 went from having 79 citable units ing our enhanced section list methodology (|T OC + |) ap- to 203 citable units. TACC-MICS’ policy went from 151 proximated the actual structure of policies in our corpus citable units to 270 citable units. This was because TACC- with 90.9% to 100% accuracy. Most policies follow the MICS’ policy did not enumerate level 3 citation nodes (e.g. standard format described in RFC 2527 and RFC 3647. “1.3.2”) but only levels 1 and 2 (e.g. “1”, “1.3” respectively). The FBCA CP was an exception as it contained 28 non- Many of these newly-inventoried sections could be resolved standard provisions with citation depth 4. For example, to an RFC 3647 section: 200 of the 203 citation nodes in the Section 6.2.3.4 is found in FBCA CP but is not found in DFN-PKI 2.2 policy could be mapped to RFC 3647. How- RFC 3647. If one considers only provisions between depths ever, some policies did not benefit at all from this approach, 1-3 inclusive, then we successfully identify between 97.8% 58 and 100% of all actual provisions. Furthermore, we were wick, and Sasse developed a controlled vocabulary for con- able to map our |T OC + | headers to 89.0% to 99.6% of all figuring access control policies expressed in XML [6]. Our GroundT ruth headers. work takes a similar approach, encoding select portions of natural language PKI policies, and deriving a controlled vo- 5.2 Citation-Aware HTML cabulary from a lexicon of observed words and phrases. As discussed earlier, we developed Citation-Aware HTML Our work builds upon established standards and mature in direct response to real-world feedback on our PKI Policy technologies. TEI P5 [2] represents 15 years of research in Repository. In direct feedback, analysts wanted a human- encoding texts with XML. 
The CTS Protocol [19] has been friendly display of XML policies with paragraphs, images, in development for 5 years and is based upon over 20 years and tables within the policies preserved and presented. In of experience [9] in computing with a variety of digitized observing policy organizations, we also saw the potential to texts.5 use better human interfaces for browsing policies to acceler- ate and improve the process of searching, annotating, and 7. FUTURE WORK evaluating policies. Using our tools to quantify vertical variance and browse 5.2.1 Addressing Feedback policy in terms of its underlying structure, we will build an Our Citation-Aware HTML gives policy analysts a more IGTF PKI Repository based upon the policies in its dis- human-friendly display of XML policies with the potential to tribution. Using confusion matrices we will quantify the exactly replicate the presentational results of Google’s OCR structural variance in the IGTF’s policies. Knowing which output. Currently, we have a basic algorithm for encoding sections of policy are semantically comparable, we will then paragraphs. Given that Google does not display embedded be able to quantify their horizontal variance. images or explicitly encode tables in their OCR output, we Two approaches we will employ in quantifying horizontal will hand code image references. The display of paragraph, variance include adding structure to our TEI-XML editions lists and tables will be preserved through styling informa- of policy, and using text mining, much as we did in [20], to tion which we extract from Google’s OCR. However, should identify patterns in content with respect to a text’s struc- individual rows or cells of a table need to be referenced and ture. Extending our markup with other data structures, retrieved by machine, then hand coding their semantic struc- such as assertions, represents a general approach. Most peo- ture within the TEI-XML will become necessary. 
It should ple roughly agree upon the reference structure of a policy. be noted in spite of these limitations, we expect that using The data models arising from interpreting the text varies our Policy Encoding Toolchain to generate XML for most of greatly. We intend to continue to make content machine- the policy combined with manual encoding of images and actionable by extending our markup to include structures tables as needed, will significantly reduce policy encoding of interest and to document content values in a machine- speed. actionable lexicon. However, our approach also enables us to use textual content alone to extract topics relevant to 5.2.2 Leveraging Observations: trust decisions. With the IGTF repository, we will train Our design descriptions for Policy-Aware Searching, a Pol- classifiers to find all information in a document relevant to icy Annotation Framework, and a Policy Feedback Loop for a topic. This is of special interest to the FPKIPA-CPWG. Certificate Validation all rely upon key properties of Citation- Aware HTML to help analysts search, annotate, and evalu- 8. CONCLUSION ate policies. First, we classify citation nodes in the HTML The Vertical Variance Reporter and Citation-Aware HTML with CTS-URNs, a reference string whose semantics are are our solutions to challenges posed by real-world policy re- well-understood and that machines can process, whether to viewers all over the world. Our Vertical Variance Reporter index content for searching or style content according to allows analysts to quickly compare the reference structures some meaningful convention. Secondly, we leverage the sec- of two policies and find semantically-equivalent sections be- ond fundamental property of canonically cited texts to re- tween them. Our Citation-Aware HTML not only gives alize that the mapping between citation nodes in TEI-XML policy analysts a nicely-formatted view of policy but also and HTML is bijective. 
This allows us to create a dynamic allows us to create a variety of applications for searching, policy feedback loop that technical and non-technical policy annotating, and evaluating policy. By aligning the textual analysts can use to dynamically evaluate the consequences coordinate systems of man and machine, we have narrowed of changes in policy. the human-computer security policy gap. Given that human- judgement alone can actually weaken the effects of a security 6. RELATED WORK policy [16], we intend to continue exploring how computa- Semantic HTML and Semantic CSS advocates write HTML tional tools can support human judgements in the analysis and CSS that emphasizes the meaning of the text over its and enforcement of security policy. presentation [13]. Our Citation-Aware HTML subscribes to this philosophy but goes further by embedding URNs to as- 9. REFERENCES sociate semantics with page content. Additionally, others [1] Amit Agarwal. Perform OCR with Google Docs âĂŞ have recommended using Google OCR to convert PDF files Turn Images Into Editable Documents. Retrieved on into text [1]. November 20, 2009 from http://www.labnol.org/ The Policy-Driven Feedback Loop directly builds upon work internet/perform-ocr-with-google-docs/10059/. done by David O’Callaghan at Trinity College, Dublin [3]. 5 We used this experience in designing the CTS Protocol, requiring compatibility His work will provide us with target and source languages for with texts encoded in TEI, DocBook, or any other valid XML format encoding a our policy assertion to unit test compiler. Inglesant, Chad- citation scheme. 59 [2] L. Burnard and S. Bauman. TEI P5: Guidelines for and Data Mining: Examples Using the CITE electronic text encoding and interchange. Text Architecture. In Text Mining Services, page 129, 2009. Encoding Initiative Consortium. Retrieved July, 11:2008, 2007. [3] David O’ Callaghan. Automated Certificate Checks, 2009. [4] V. Casola, A. Mazzeo, N. Mazzocca, and M. Rak. 
An Innovative Policy-Based Cross Certification Methodology for Public Key Infrastructures. In EuroPKI, 2005. [5] V. Casola, A. Mazzeo, N. Mazzocca, and V. Vittorini. Policy Formalization to Combine Separate Systems into Larger Connected Network of Trust. In Net-Con, page 425, 2002. [6] David W. Chadwick and A. Sasse. The Virtuous Circle of Expressing Authorization Policies. In Semantic Web Policy Workshop, 2006. [7] S. Chokhani and W. Ford. RFC 2527: Internet X.509 Public Key Infrastructure Certificate Policy and Certification Practices Framework, March 1999. [8] S. Chokhani, W. Ford, R. Sabett, C. Merrill, and S. Wu. RFC 3657: Internet X.509 Public Key Infrastructure Certificate Policy and Certification Practices Framework, November 2003. [9] Gregory Crane. The Perseus Digital Library. Retrieved May 29, 2009 from http://www.perseus.tufts.edu/hopper/. [10] D.Smith. CTS-URNs: Overview, December 2008. Retrieved May 29, 2009 from http://chs75.harvard. edu/projects/diginc/techpub/cts-urn-overview. [11] C. Dué, M. Ebbott, C. Blackwell, and D. Smith. The Homer Multitext Project, 2007. Retrieved May 29, 2009 from http://chs.harvard.edu/chs/homer_multitext. [12] Welcome to Lucene! Retrieved November 20, 2009 from http://lucene.apache.org/. [13] Antonio Lupetti. CSS coding: semantic approach in naming convention. Retrieved on November 20, 2009 from http://woork.blogspot.com/2008/11/ css-coding-semantic-approach-in-namin%g.html. [14] Gregory Nagy. Editing the Text: West’s Iliad. Homer’s Text and Language, pages 54–56, 2004. [15] L.D. Reynolds and N.G. Wilson. Scribes and scholars. Clarendon Press, 1967. [16] Stephanie A. Trudeau, Sara Sinclair, and Sean Smith. The Effects of Introspection on Creating Privacy Policy. In Workshop on Privacy in the Electronic Society, 2009. [17] G. Weaver. Semantic and Visual Encoding of Diagrams. Technical Report TR2009-654, Dartmouth College, Computer Science, Hanover, NH, August 2009. [18] G. Weaver, S. Rea, and S. Smith. 
A Computational Framework for Certificate Policy Operations. In Public Key Infrastructure: EuroPKI 2009. Springer-Verlag LNCS., 2009. To appear. [19] G. Weaver and D. Smith. Canonical Text Services (CTS). Retrieved May 29, 2009 from http://cts3.sourceforge.net/. [20] G. Weaver and D. Smith. Applying Domain Knowledge from Structured Citation Formats to Text 60 Mapping AustrianGrid Reff Section Class (3647 Reff, Score) S− > T 1.1 MATCH (1.1, 1.0) S− > T 4.9.2 MATCH (4.6.2, 0.92), (4.9.2, 1.0), (4.9.14, 0.94) S− > T 4.9.5 UNMATCHED na S− > T 6.2.10 UNMAPPED (6.2.11, 1.0) T− > S 6.2.11 ADDITIONAL na Passage Ref AustrianGrid Header 3647 Header 1.1 Overview Overview 4.6.2 Who may request renewal Who may request renewal 4.9.2 Who can request revocation Who can request revocation 4.9.5 Time within which CA must process the revocation request Time within which CA must procnal 4.9.14 Who can request suspension Who can request suspension 6.2.10 Cryptographic module rating Method of destroying private key 6.2.11 n/a Cryptographic Module Rating Table 1: Excerpts from a report quantifying the vertical variance of AustrianGrid versus RFC 3647. Row 1 shows that section 1.1 in the Austrian Grid policy exactly matches that of section 1.1 in RFC 3647. However, the mapping from Austrian Grid to RFC 3647 can be more complex. Section headers from the policy under consideration may be ambiguous or not correspond to the accredited policy as shown in rows 2 and 3. Section headers from the accredited policy may be missing in the policy under consideration (as Row 5 seems to indicate for 6.2.11) or relocated. However, looking at Row 4 indicates that section 6.2.11 was moved to section 6.2.10 in the Austrian Grid policy. 
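The section-identifier extraction of Evaluation 1 (pulling dotted identifiers such as "4.9.5" and their headers out of TOC lines) can be sketched in a few lines. This is an illustrative reconstruction under an assumed input format, not the authors' toolchain; `SECTION_ID` and `parse_toc` are hypothetical names.

```python
import re

# Dotted section identifiers such as "1", "1.3", or "4.9.5" at the start
# of a TOC line, followed by header text and an optional dot leader + page number.
SECTION_ID = re.compile(r"^\s*(\d+(?:\.\d+)*)\.?\s+(.*?)\s*(?:\.{2,}\s*\d+)?\s*$")

def parse_toc(toc_lines):
    """Return a list of (identifier, header) pairs parsed from TOC lines."""
    sections = []
    for line in toc_lines:
        m = SECTION_ID.match(line)
        if m:
            sections.append((m.group(1), m.group(2)))
    return sections

toc = [
    "1. INTRODUCTION ........ 4",
    "1.1 Overview ........ 4",
    "4.9.2 Who can request revocation ........ 21",
]
print(parse_toc(toc))
# prints [('1', 'INTRODUCTION'), ('1.1', 'Overview'), ('4.9.2', 'Who can request revocation')]
```

A real extractor would also have to strip the "rogue page numbers and page headers" the paper mentions; here the dot-leader pattern only covers the common case.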
Policy        Version  Time (s)  Reff Misses
AustrianGrid  1.2.0    4         0
DFN-PKI       2.1      2         0
DFN-PKI       2.2      2         0
FBCA          2.11     2         0
IRAN Grid     1.3      5         0
IRAN Grid     2.0      2         0
TACC-MICS     1.1      2         0
TACC-Classic  1.2      5         0
TACC-Root     1.2      2         0
ULAGrid       1.0.0    2         0

Table 2: Evaluation 1 shows how we can parse tables of contents to get an inventory of policy sections. For each of the policies, we parse without missing any sections. This indicates that our section inventories accurately reflect the table of contents (TOC).

Policy            |TOC|:|TOC+|:|RFC|  TOC->RFC Mapped  Unmapped     RFC->TOC Mapped  Unmapped
                                      (TOC / TOC+)     (TOC / TOC+) (TOC / TOC+)     (TOC / TOC+)
AustrianGrid      267:267:270         260 / 260        7 / 7        260 / 260        10 / 10
DFN-PKI-2.1       37:80:270           35 / 78          2 / 2        35 / 78          235 / 192
DFN-PKI-2.2       79:203:270          75 / 200         4 / 3        75 / 200         195 / 70
FBCA CP           281:281:270         242 / 245        39 / 36      242 / 245        28 / 25
IRAN-GRID-1.3     156:156:193         98 / 110         58 / 46      98 / 110         95 / 83
IRAN-GRID-2.0     273:273:270         264 / 264        9 / 9        264 / 264        6 / 6
TACC-MICS 1.1     151:191:270         149 / 190        2 / 1        149 / 190        121 / 80
TACC Classic 1.2  266:270:270         258 / 264        8 / 6        258 / 264        12 / 6
TACC Root 1.2     67:67:270           65 / 65          2 / 2        65 / 65          205 / 205
ULAGrid 1.0.0     271:271:270         268 / 268        3 / 3        268 / 268        2 / 2

Table 3: Evaluations 2 and 3 show how well we can classify policy sections as mapped or unmapped. The second evaluation only uses sections from a policy's table of contents (TOC), while the third evaluation uses an enriched list (TOC+); each Mapped/Unmapped cell gives the TOC-based and TOC+-based counts. In 44 seconds, we generate a report for the AustrianGrid that successfully identifies a mapping for 260 of the 267 sections in that policy. We added section headers from RFC 3647 to the headers parsed from DFN's version 2.2 table of contents, resulting in mapping 200 rather than 75 sections.
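The mapped/unmapped/unmatched classification that these reports quantify can be sketched as a header-similarity search. The use of `difflib` ratios for candidate scores and the 0.9 threshold are my assumptions for illustration; the paper does not specify how its scores (e.g. 0.92) are computed.

```python
from difflib import SequenceMatcher

def classify(source, target, threshold=0.9):
    """Classify each source section against a target reference structure.

    source, target: dicts mapping section identifier -> header text.
    Returns identifier -> (class, candidates), where candidates is a
    list of (target identifier, similarity score).
    """
    report = {}
    for sid, header in source.items():
        cands = []
        for tid, theader in target.items():
            score = SequenceMatcher(None, header.lower(), theader.lower()).ratio()
            if score >= threshold:
                cands.append((tid, round(score, 2)))
        if any(tid == sid for tid, _ in cands):
            cls = "MATCH"      # same identifier carries the same header
        elif cands:
            cls = "UNMAPPED"   # header found under a different identifier
        else:
            cls = "UNMATCHED"  # no sufficiently similar target header
        report[sid] = (cls, sorted(cands))
    return report

target = {"6.2.10": "Method of destroying private key",
          "6.2.11": "Cryptographic module rating"}
source = {"6.2.10": "Cryptographic module rating"}
print(classify(source, target))
# prints {'6.2.10': ('UNMAPPED', [('6.2.11', 1.0)])}
```

The toy input mirrors the AustrianGrid row in which the "Cryptographic module rating" header appears under 6.2.10 rather than 6.2.11.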
CP or CPS         |GroundTruth|:|TOC+|  GroundTruth->TOC+ Mapped  Unmapped  TOC+->GroundTruth Mapped  Unmapped
AustrianGrid      267:267               265                       2         265                       2
DFN-PKI-2.1       80:80                 79                        1         79                        1
DFN-PKI-2.2       207:203               201                       6         201                       2
FBCA CP           309:281               275                       34        275                       6
IRAN-GRID-1.3     157:156               145                       12        145                       11
IRAN-GRID-2.0     273:273               270                       3         270                       3
TACC-MICS 1.1     192:191               188                       4         188                       3
TACC Classic 1.2  270:270               267                       3         267                       3
TACC Root 1.2     68:68                 67                        1         67                        1
ULAGrid 1.0.0     271:271               270                       1         270                       1

Table 4: Evaluation 4 shows how well our method in Evaluation 3 approximates actual policy structure. Looking at TACC Root's CP, we see that only 1 additional provision was identified by manual cataloging rather than automatic extraction. Similarly, only 4 more provisions were identified in DFN-PKI v2.2. In general our approximation is quite good, except for the FBCA CP, in which 28 non-standard provisions with citation depth 4 were identified (e.g. 1.6.2.1).

CP or CPS         |TOC+|/|GroundTruth|  |TOC+ Mapped|/|GroundTruth|
AustrianGrid      100%                  99.3%
DFN-PKI-2.1       100%                  98.8%
DFN-PKI-2.2       98.1%                 97.1%
FBCA CP           90.9%                 89.0%
IRAN-GRID-1.3     99.4%                 92.4%
IRAN-GRID-2.0     100%                  98.9%
TACC-MICS 1.1     99.5%                 97.9%
TACC Classic 1.2  100%                  98.9%
TACC Root 1.2     100%                  98.5%
ULAGrid 1.0.0     100%                  99.6%

Table 5: Using the results in Table 4, we see that our method in Evaluation 3 identified between 90.9% and 100% of all actual provisions. Furthermore, we were able to map the |TOC+| headers to between 89.0% and 99.6% of all GroundTruth headers.

Presentation slides: Computational Techniques for Increasing PKI Policy Comprehension by Human Analysts
Gabriel A. Weaver, Scott Rea, Sean W. Smith, Dartmouth College
IDTrust 2010, Gaithersburg, MD

Introduction
- PKI policies define expectations for trust
- Policy review processes include:
  - PKI compliance audit,
  - mapping for bridging,
  - and grid accreditation.
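The enhancement step of the paper's Evaluation 3 (searching the source policy text for unmatched target headers and folding unique hits into the section list, yielding TOC+) can be sketched as follows. `enhance_section_list` and its inputs are illustrative names under assumed data shapes, not the authors' code.

```python
def enhance_section_list(toc_sections, unmatched_targets, policy_text):
    """Fold target headers that occur exactly once in the policy text
    into the source section list (the TOC+ idea of Evaluation 3).

    toc_sections: dict of identifier -> header parsed from the TOC.
    unmatched_targets: dict of identifier -> header from the target
        structure (e.g. RFC 3647) that had no match previously.
    """
    enhanced = dict(toc_sections)
    text = policy_text.lower()
    for tid, header in unmatched_targets.items():
        # A unique hit in the policy body is evidence the section exists
        # even though the TOC did not enumerate it.
        if tid not in enhanced and text.count(header.lower()) == 1:
            enhanced[tid] = header
    return enhanced

toc = {"1": "Introduction"}
targets = {"1.3.2": "Registration Authorities",
           "9.4": "Privacy of Personal Information"}
policy = "1 Introduction ... 1.3.2 Registration Authorities handle requests."
print(sorted(enhance_section_list(toc, targets, policy)))
# prints ['1', '1.3.2']
```

This mirrors the reported behavior: policies whose TOCs omit deep sections (DFN-PKI 2.2, TACC-MICS) gain entries, while a policy with no recoverable hits is left unchanged.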
Our High-Level Goal

Our Contributions
- We claim: a human-computer semantic gap forces PKI policy operations to be largely manual.
- We bridge: that gap with computational tools to accelerate some of these operations, based upon real-world feedback.
- We propose: future work to accelerate additional policy operations.

Surveying the Semantic Gap

The Problem / Our Approach
- It's cloudy in the human-computer semantic gap. Trust depends upon knowing what to expect.
- A policy we can BOTH understand!

Policy Analysis Operations

Building Tools for Policy Analysts

Bridging the Gap

Formalizing Certificate Policy
- We claim that computationally processing machine-actionable CP/CPSs is more efficient and consistent.
- Identification: CTS-URNs
- Representation: TEI-XML encoding of reference structure (2527, 3647)

Tools for Today

Retrieval

PKI Policy Repository
- Last year: only a handful of policies
- Feedback: needed more policies to be useful and prove viability
- Response: today ~200 IGTF CP/CPSs; beta version on Google AppEngine (slow but stable)
- Demo!
@ http://pkipolicy.appspot.com/

Comparison

PKI Policy Reporter
- Provide more, higher-quality information for comparing CPs.
- Generate a report given a set of policy sections and analyses. Demo!
- Feedback: not all policies rigorously obey the 2527/3647 format; sections may mean different things across versions.
- Response: we created the Vertical Variance Reporter to see how policies structurally differ.

Vertical Variance Reporter

Viewing

Policy Reader
- Feedback: the PKI Policy Repository's interface is not analyst-friendly.
- Response: we developed the PolicyReader to transform TEI-XML policies into a more familiar format. Demo!

Future Work

Policy Searcher
- Feedback: it would be nice to search a PKI Policy Repository.
- Response: we prototyped a PolicySearcher to search a repository. Demo!

Policy Compliance

Conclusions
- We claim that a human-computer semantic gap arises from systems that primarily work on texts as files or a sequence of pages.
- We bridge that gap with computational tools to process these reference structures and try to quantify variance.
- We propose additional tools to go beyond limitations of manual analyses.
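The PolicyReader idea above, rendering TEI-XML in a more familiar format while keeping each HTML node tied to its citation so it can be mapped back to the XML, can be illustrated with a toy transform. The element names and the URN scheme here are simplified assumptions, not the actual TEI schema or CTS-URN syntax.

```python
import xml.etree.ElementTree as ET

def to_citation_aware_html(tei_xml, urn_prefix="urn:cts:pki:policy"):
    """Render TEI-style <div n="..."> sections as HTML whose ids carry
    URN-like citations, so each HTML node maps back to an XML node."""
    root = ET.fromstring(tei_xml)
    parts = []
    for div in root.iter("div"):
        n = div.get("n")
        head = div.findtext("head", default="")
        parts.append('<section id="%s:%s"><h2>%s %s</h2></section>'
                     % (urn_prefix, n, n, head))
    return "\n".join(parts)

tei = """<body>
  <div n="1"><head>Introduction</head></div>
  <div n="4.9.2"><head>Who can request revocation</head></div>
</body>"""
print(to_citation_aware_html(tei))
```

Because the citation is embedded in the `id`, the HTML-to-XML direction of the paper's bijective mapping falls out for free: parsing the `id` recovers the section's place in the TEI source.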
Thank You
http://pkipolicy.appspot.com/
Gabriel.A.Weaver@dartmouth.edu

Other Slides

PKI Policy Mapper
- Transform the content of a CP/CPS in 2527 format into 3647 format.
- Mapping a 2527 to a 3647 CP requires 20% more effort than mapping two 3647 CPs. An average mapping takes 80-120 hours in a bridge context.
- RFC 3647 defines tables; takes many hours.
- Our mapping transforms 2527 to 3647 in seconds. Demo.
- We can flexibly configure the mapping.
- Discovered errors in the transformation table (2.1 -> 2.6.4).

Experimental Evaluation

Additional Results

Previous Evaluation Results
- Policy Translation (Policy Mapper): reduced time to perform the task from a few days to a few seconds.
- Policy Comparison (Policy Reporter): reduced part of the policy comparison process by up to 94%.

Report of the IDSP Workshop on Identity Verification
Presented by: Jim McCabe, Senior Director, IDSP, American National Standards Institute
IDtrust 2010, April 13, 2010

What is IDSP?
- ANSI is a not-for-profit membership organization that administers and coordinates the U.S.
voluntary standards system
- Standards Panels provide a forum where subject matter experts from the private and public sectors work cooperatively to identify standards needed to address emerging national priorities
- The Identity Theft Prevention and Identity Management Standards Panel (IDSP) is a cross-sector coordinating body whose objective is to facilitate the development, promulgation and use of standards and guidelines to combat ID theft and fraud
  - Identify existing standards, guidelines and best practices
  - Analyze gaps and the need for new standards, leading to improvements
  - Make recommendations widely available to businesses, government, consumers

Workshop Participants
- North American Security Products Organization (NASPO)
- National Institute of Standards & Technology (NIST)
- Dept. of Homeland Security (DHS)
- General Services Administration (GSA)
- National Assn for Public Health Statistics & Information Systems (NAPHSIS)
- American Assn of Motor Vehicle Administrators (AAMVA)
- Colorado Div.
of Motor Vehicles
- Coalition for a Secure Driver's License
- Social Security Administration
- Others

The Identity Verification (ID-V) Problem Starts at Issuance
- Fraudsters exploit the circularity of agencies relying on, but not authenticating, primary USA "identity" documents issued by other agencies (birth certificates, Social Security numbers / cards, state-issued driver's licenses / ID cards)
- The Intelligence Reform and Terrorism Prevention Act of 2004 (IRTPA) requires verification of identity prior to issuance of birth certificates
  - IRTPA regulations have not been released even in draft form
- The REAL ID Act of 2005 requires verification of identity prior to issuance of driver's licenses / ID cards
  - Does not provide guidance on how to corroborate a claim of identity under different circumstances

Birth Certificates Especially Problematic
- Birth certificates are considered an acceptable breeder document in many states but are typically not verified by the issuing agency
- No biometric linking the individual to the birth record
- Within 57 jurisdictions, there are 6,400 registrars and 14,000 variations of certified birth certificates
- A person obtaining a certified copy may not have legal rights to the record; some states have "open" records policies
- The birth certificate may not be valid for the person presenting it
- Information on the birth certificate may not be factual
- Death records may be absent or delayed

Solutions in Progress
- National Assn for Public Health Statistics and Information Systems (NAPHSIS) is developing security guidelines
  - Recommending states have "closed" record policies
  - Focusing on physical security of vital records offices
- NAPHSIS is looking to expand the Electronic Verification of Vital Events (EVVE) system, currently only available in some states (Feb 2010 update: 19 states online with EVVE; implementation in progress in 4 more states and NY City)
  - Provides government-to-government verification of
birth and death information
  - An earlier IDSP report encouraged this expansion

Solutions in Progress (contd.)
- HHS CDC / National Center for Health Statistics charged with rulemaking under section 7211 of IRTPA
- Will regulate how states issue vital records, but states may decide not to comply
- Will reduce the number of birth certificate forms to about 57
- An earlier IDSP report noted that rulemaking has been delayed and recommended that these standards are needed now

Workshop Recommendation
- Issuers of primary USA "identity" documents need a process by which they can achieve a level of assurance whether to accept or reject a person's claim of identity
  - One or more practical methods to verify identity with very high confidence, high confidence, some confidence, or low/no confidence
- Guidelines on identity verification should be developed with a view toward eventual development of an American National Standard

Envisioned Benefits
- Enhanced security / credibility of identity vetting processes and foundational identity documents
- Enhanced security / credibility of credentials issued downstream based on the presentation of these foundational documents as evidence of identity
  - Other government credentials (FIPS 201 PIV cards, U.S. passports, Medicare / Medicaid cards)
  - Commercial credentials (credit / charge cards)
- Will help to reduce identity theft
- Will help to protect Americans from terrorist attacks
- And more . . .
Project Phases
- Phase 1 – Concept Formulation – 8 months
  - How to build certainty in a claimed identity
  - Criteria for the acceptance/rejection of a claim
  - Methods for the detection of fraud
  - Deliver draft Guideline
- Phase 2 – Testing – 4 months
  - State vital record offices (birth certificate issuance)
  - State DMVs (DL & ID card issuance)
  - Release of Guideline
- Phase 3 – Standardization – 8-12 months
  - ANSI/NASPO-IDV-2010: Methods for the Verification of Personal Identity

Timeline
- Initial IDSP workshop meetings July – Sept 2008
- Project plan developed / team formed, led by NASPO
- Concept formulation meetings Oct, Dec 2008, Feb 2009
- The IDSP workshop report and the NASPO ID-V project then proceeded on parallel tracks; both were released Oct 2009
- March 29, 2010 – NASPO formally announces its intention to develop an American National Standard
  - Project Initiation Notification System (PINS) 30-day announcement for public comment in the April 9 edition of ANSI Standards Action, www.ansi.org/standardsaction

Conceptual Approach for Identity Verification Guidelines
Presented by: Brian Zimmer, Panel Member, IDSP; President, Coalition for a Secure Driver's License
IDtrust 2010, April 13, 2010

The Chosen Concept for Verified Identity
- An aggregation of evidence / adjudication process
  - Accreditation of Identity Adjudicators
  - An "Identity Resume"
  - An in-person meeting & biometric capture
  - Verification of key items of corroborative evidence
  - Use of acceptance/rejection criteria
  - A two-step exceptions process
  - Binding of the person to the verified identity
  - Possible issuance of an ID-V token or certificate
  - Detailed procedures to be followed for the whole adjudication process

Key Concepts
- Selection and training of identity adjudicators to manage, administer and effect the process (background check)
- Use of an identity resume to define the
identity, gather information, detect fraud and reduce uncertainty
- An in-person meeting to provide opportunities for candidate / adjudicator interaction, observation and biometric capture
- Preparation and implementation of a personalized plan for verification of evidence
- Procedures for verification of the origins and continuous use of identity for both USA- and foreign-born persons

Key Concepts (cont.)
- Use of a contra-indications format for the documentation and presentation of raw results
- Procedures for evaluation and aggregation of evidence and mitigation of significant contra indications
- Procedures for identification of critical combinations of findings to enable fraud detection
- Criteria (thresholds) for acceptance or rejection of the claimed identity, optimized to the needs of the relying party
- A two-step process to deal with problem cases
- Biometric binding of the person to the verified identity, followed by registration of results

Deliverable Content
- Part 1 – Resume Introduction
- Part 2 – Identity Resume RFI
- Part 3 – Resume Preparation Instructions
- Part 4 – Adjudication Process Description
- Part 5 – Adjudicator Responsibilities
- Part 6 – Adjudication Procedures

The Identity Resume
- This is a request for information that will enable:
  a) definition of a person's identity
  b) corroborative evidence to be collected
  c) uncertainty inherent in corroborative evidence to be reduced
  d) symptoms of identity fraud to be detected
- Information items c) & d) in the Resume are based on:
  - for c) – an analysis of the uncertainty or risk associated with each item of corroborative evidence and how to reduce it
  - for d) – an analysis of behavior expected of each type of identity fraud, leading to the inclusion of "imposter traps"

Resume Content
1. Your origins
2. Your early years
3. Your family
4. Any name changes?
5. Your education & training
6. Places you have lived
7. Your licenses
8. Your citizenship
9. Your work history
10. Your memberships
11. Your ownerships
12. Unique events/experiences
13. Your special skills
14. Personal information
15. Your ID Documents
16. Additional corrob. evidence

Adjudicator Responsibilities
- Prior to the In-Person Meeting
- During the In-Person Meeting
- Following the In-Person Meeting
  - Analysis of the Resume
  - Document Authentication
  - Verification of Corroborative Evidence
- Documentation of Findings & Contra Indications
- Assessment of Impacts on Proof of Origin & Use
- Evaluation, Decision & Action

ID-V Process Diagram

Verification of Corroborative Evidence

Contra Indications – The Results Format

Proof of Origin Flags

Longer Term Goal: PIV Synchronization

Custodian of the Results?

Thank You.
- To obtain the IDSP workshop report: http://webstore.ansi.org/identitytheft
- To obtain a summary of the NASPO ID-V project: http://www.naspo.info/PDFiles/ID-V_Project.pdf
- For further information, contact Graham Whitehead, NASPO, gdw@naspo.info
- Jim McCabe, jmccabe@ansi.org, 212.642.8921, www.ansi.org/idsp
- Brian Zimmer, BrianZimmer@IDSecurityNow.org, 202-312-1540, www.secure-license.org

Four Bridges Forum: How Federated Identity Trust Hubs Improve Identity Management
The Federal PKI
Tim Pinegar, Federal PKI Architecture, Protiviti Government Services
tim.pinegar@pgs.protiviti.com
IDTrust 2010 – 4BF – Fed PKI

What is the 4BF?
- A consortium of public key infrastructure (PKI) bridges, each serving a major community of interest;
- Leveraging government and non-government federated identities;
- Based on a common foundation of trust;
- Laying the groundwork for a global trust network.

Who is the 4BF?
- The Federal PKI Architecture (formerly the Federal Bridge Certification Authority, or FBCA), established to enable trusted transactions within the government and between government and its industry partners.
- SAFE-BioPharma Association, founded by global pharmaceutical organizations to develop and manage digital identity and signature standards for the pharmaceutical and healthcare industries.
- CertiPath, establishing interoperable trusted identities for collaboration within the aerospace and defense industry via a standards-based PKI bridge.
- The Higher Education Bridge Certificate Authority (HEBCA), developed to facilitate trusted electronic communications within and between institutions of higher education as well as with federal and state governments.

[Diagram: The Four Bridges – the Federal Bridge CA, CertiPath, SAFE-BioPharma, and Higher Education, with cross-certified members including Boeing, Lockheed Martin, Northrop Grumman, Raytheon, EADS/Airbus, BAE Systems, Exostar, SITA, ARINC, CitiBank; the Common Policy Root and agencies (Defense, Justice, State, Treasury, US Postal Service, Drug Enforcement Administration, US Patent & Trademark Ofc, Government Printing Office, State of Illinois); SSPs (Entrust, VeriSign, Verizon Business), DoD/ECA, GSA/ACES; and pharma members (AstraZeneca, Bristol-Myers Squibb, Genzyme, GlaxoSmithKline, Johnson & Johnson, Merck, Nektar, Organon, Pfizer, Procter & Gamble, Roche, Sanofi-Aventis).]

The Federal Bridge
- The FBCA is the identity trust hub that enables peer-to-peer transactions between its member organizations, both Federal and non-Federal;
- Source of interoperability for ALL Federal Agency HSPD-12 credentials (5.09 million and counting as of 12/2009);
- Enables Agencies to validate each other's PIV cards for physical access;
- Validate desktop and network logins;
- Support high-assurance authentication to Agency Level 3 & 4 applications using government and private sector credentials

4BF Timeline
- 2003 – NIH
and Higher Ed demonstrate Bridge-to-Bridge interoperability.
- 2004 – Aerospace Industry starts CertiPath Bridge.
- 2004 – Pharmaceutical Industry announces SAFE Bridge.
- 2006 – CertiPath cross-certifies with the FPKI.
- 2008 – SAFE-BioPharma cross-certifies with the FPKI.
- 2008 – Inaugural meeting of representatives of the four bridges.
- 2008 – 4BF Audit Working Group is formed to define a standard baseline for PKI audit comprehensiveness and quality that incorporates international standards.
- 2008 – 4BF Agreement to Cooperate is signed.
- 2009 – 4BF launches formal outreach campaign.

Why was the 4BF Formed?
- To address and resolve common issues affecting PKI-based identity federations;
- To stimulate greater use of high-assurance electronic identity credentials by raising awareness of the benefits to relying-party applications;
- To target outreach to government program managers, application owners, and industry partners who can reap immediate benefits from use of PKI bridges; and
- To stimulate global interoperability via the 4BF trust infrastructure.

Benefits of the 4BF?
- Leverage PIV certificates beyond internal agency systems to improve the ROI of PIV system infrastructure.
- Source of interoperability with business partners in the aerospace, defense and bio-pharmaceutical communities
- Trust of 4BF identity credentials provides "real-time" scalability;
- Facilitating identity portability

For Further Information
- Contacts:
  – Judith.Spencer@gsa.gov
  – Tim.Pinegar@pgs.protiviti.com
  – Mollie Shields-Uehling (mollie@safe-biopharma.org)
  – Scott.Rea@Dartmouth.edu
  – Jeff.Nigriny@certipath.com
- Websites:
  – http://www.safe-biopharma.org/index.htm
  – http://www.certipath.com
  – http://www.idmanagement.gov

The 4BF – The Four Bridges Forum
The SAFE-BioPharma Digital Identity and Signature Standard
Mollie Shields Uehling, CEO, SAFE-BioPharma Association

Moving into the Digital Age: BioPharma and Its Partners
- Revolution in life sciences and medical research
- Cost and complexity have created a crisis in R&D productivity
- Need for rapid, close collaboration between pharma, healthcare providers, government agencies and research institutions
- FDA and EMEA moving to fully electronic submission, review and response
- Healthcare mandate for eMRs for every American by 2014 presents a wealth of opportunity for information for research and clinical decision-making
- Fundamental to interoperability in sensitive electronic exchanges of information are trusted identities and legal signatures.
SAFE-BioPharma Association
- Strategic initiative started by the biopharmaceutical industry
  – Member-governed non-profit collaboration, incorporated May 2005
  – Trusted identity and non-repudiable digital signature
  – Single interoperable digital identity across the industry
  – Technology- and vendor-neutral
  – Interoperable with Federal agencies
  – Based on leading government technical and identity-proofing standards
  – Wrapped in a legal, governance and risk-mitigation model
  – Recognized by the world's leading regulatory authorities
- To facilitate the transformation of the industry to fully electronic business and regulatory processes

SAFE-BioPharma 2005-2010
- Regulatory engagement and recognition
  – US, European Union, Japan
- Improving usability
  – Pilots, early adopters
  – Resulted in expansion of the standard
  – Improvements in the identity-proofing process and digital signing options
- Building the interoperable network:
  – Issuers, digital signing, and business applications
  – Cross-certification with FBCA
  – EU qualified certificates; Safe Harbor certification
- Supporting use
  – First, ELNs (basic laboratory research)
  – Then digitally signed regulatory submissions
  – Now workflow between several/many partners for authentication & signing in a federated approach

The SAFE-BioPharma Framework
- Legal, governance, risk mitigation – contract based
- Existing technical and identity standards – NIST, OMB, Federal PKI
- Identity verification
- Manage identity life cycle
- Comply with referenced standards
- Security, audit & control requirements
- Certification
- Accept digitally signed transactions
- Agree to limited liability caps
- Agree to dispute resolution
- Agree to identity assurance
- Agree to self-audit & meet SAFE requirements

A Non-Profit, Member-Driven Standards Association
Board of Directors & PAA: Gary Secrest, J&J, Chair
SAFE-BioPharma CEO: Mollie Shields-Uehling
Staff: Cindy Cullen, CTO; Jon
Schoonmaker, Chief of Operations; Gary Wilson, Prog Mgr; Rich Furr, Head, Reg Afrs; Tanya Newton, Mgr, Reg Afrs; Kevin Chisholm, Exec Asst; John Weisberg, PR & Comm; Kay Bross, Member & Vendor Progs; Legal, Financial, Admin
Member Consortium Working Groups: Technology WG (Justin Bovee, J&J; Keith Respass, Merck); European Union Advisory Group (Isabelle Davias, Sanofi-Aventis; Hans van Leeuwen, Merck; Betsy Fallen, Merck); Federation TF (Merck/Pfizer); User Group (GSK/Lilly); Global Business & Reg (Jennifer Shivers, Lilly)

SAFE-BioPharma Association – Non-Profit Standards Collaboration
Standards:
- Standard development & maintenance
- Certification: products, issuers
- Standards engagement: HL7, HITSP, CDISC, IHE, Kantara
- Working Groups: Technical, Federation, Users Group, Global Business & Reg, Implementation, SAFE EU Advisory Council
- Regulatory alignment: FDA, EMEA, NCAs, MHLW
Standards-Related Services:
- Manage member-funded shared infrastructure
- Operation of the SAFE bridge
- Cross-cert with FBCA
- Vendor partner program
- Implementation tools
- Information/best practices
- Incubating innovation: credentials issuance model, antecedent-data ID proofing, EU qualified digital identities, zero-footprint token, hosted digital signing
Collaborative Association:
- Stakeholder outreach
- Education & advocacy
- Policy engagement
- Industry awareness & engagement
- 4BF – network of trusted bridges
- Forum
- Media: local, national, trade, international

Options for Flexible Use
- Two levels of trust:
  – Basic Assurance for authentication
  – Medium Assurance for trusted identity uniquely linked to digital signature, and EU-qualified
- Three digital signing technologies:
  – Software
  – Hardware (zero-footprint token now undergoing FIPS certification)
  – Roaming
- Three identity-proofing options:
  – Antecedent – enterprise and on-line
  – Trusted agent
  – Notary – including office/home notary services

Member Public Key Infrastructure Options
- Internal infrastructure
  – Cross-certified with SAFE-BioPharma Bridge
  – BMS, J&J
- Outsourced infrastructure
  – Cross-certified with SAFE-BioPharma Bridge: Chosen Security, Citibank, Entrust, IdenTrust, TransSped
- SAFE-BioPharma tiered services infrastructure:
  – External partners
  – Regulatory uses
  – Healthcare providers
  – Members

[Table: Assurance and Identity Proofing Services by provider; EU-qualified options noted; 3rd-Party Antecedent currently available in the USA. *Provided through an outsourced services supplier.]

On-Line Antecedent Process
- ID Vetting Successful:
  – Applicant passes 3rd-Party Antecedent identity proofing
  – Moved to RA queue for processing and Certificate Issuance steps
  – It's a matter of minutes end-to-end.
- ID Vetting Not Successful:
  – Unable to verify identity via 3rd-Party Antecedent
  – Process reverts to the Notary Process, with two service options:
    - User locates a notary
    - RAS/NNA will have a local notary contact the Applicant directly

SAFE-BioPharma and Regulators
- The European Medicines Agency (EMA) and FDA are on paths to requiring fully electronic submissions within the next few years
- FDA and EMA helped write the SAFE-BioPharma standard; engaged since inception
  – FDA has received 10,000s of SAFE-BioPharma submissions since 9/06
  – EMA eCTD pilot successfully completed; EMA ESG to go live this year
- Japan pilot underway – exchanging business, regulatory and clinical documents between pharmas, hospitals, and regulatory agencies

Examples of How SAFE-BioPharma Is Being Used (use case: company)
- ELNs – basic research: Abbott (including China), BMS, GSK, Pfizer, SA
- Contracts, SOWs: J&J, Premier, Oxford, MWB Consulting
- Physician signatures: SNAP Diagnostics
- Purchasing: Premier
- Alliance management: BMS
- External partner authentication: BMS, GSK
- Regulatory submissions: AZ, BMS, GSK, SA, Eli Lilly
- Document management system: McDougall Scientific
- Collaborative research partners: BMS
- Paperless business/regulatory environment: Amarin, MWB Consulting.
Pfizer eLabNotebooks
Company Profile:
- Largest research-based pharmaceutical company
- Global research organizations
- Was using paper laboratory notebooks requiring scientists' signatures on each experiment
- Replaced with electronic notebooks and digital signatures

Pfizer ELN Results – over 1 million digital signatures
Results:
- Less time on paperwork, more in the lab
  – > 3,300 researchers in 280 departments in 20 countries
  – > 550,000 documents signed
  – > 1,000,000 digital signatures
- 3.3 million pages not printed
  – > 16 tons of paper saved
- Better patent defense
  – Signed and time-stamped in a timely manner
- Better compliance with internal regulations
- Easier access to research
  – Electronic search of records
- Faster research cycles
  – More time in the lab, less on paperwork; no more delays to collect witness signatures

SNAP Diagnostics
Company Profile:
- Leader in diagnostic technology for detection of sleep apnea and analysis of snoring problems
- Provides physicians in the U.S., EU, and Latin America with proprietary diagnostic equipment used in home settings
Scope:
- Records of at-home tests are analyzed by company physicians, who advise referring physicians on the therapeutic approach
- SNAP physicians digitally sign diagnoses and send them to the personal physician
Results:
- Eliminated paper in day-to-day reviews of diagnostic information
- Eliminated costs associated with handling, signing, shipping, storing and accessing paper

Premier Purchasing
- Company profile
  – Largest Group Purchasing Organization (GPO) in the U.S.
  – Owned by non-profit hospitals
  – Serves 2,000 U.S.
hospitals and 53,000-plus other healthcare sites
  – Buys from ~700 suppliers
  – http://www.premierinc.com/
- Scope:
  – Eliminate overnight shipping, fax and related workflows for contract origination and amendments
  – Provide SAFE-BioPharma credentials to Premier Sourcing/Procurement employees and their supplier colleagues for signing new and amended supplier contracts
  – eContracting process: ~700 companies and thousands of contracts and/or amendments
- Future:
  – Digitally sign and submit required reports to CMS

National Cancer Institute-Bristol-Myers Squibb: The Business Issue
- NCI – preeminent cancer research institution
- Collaborates with pharma, biotechs, many research institutions and individual researchers
- Lots of contracts, clinical documents, amendments, signatures
- Bristol-Myers Squibb (BMS) conducts collaborative research with NCI's Clinical Trials Evaluation Program (CTEP) and its many collaborative research partners (cooperating groups)
  – Clinical trial agreements
  – Clinical trial documents
  – Clinical materials orders and supply
  – Leverage PIV (Federal credentials), BMS cross-certified SAFE-compliant credentials and SAFE-BioPharma credentials for authentication and digital signing

The NCI-BMS Project
- Credentials – Medium Assurance
  – BMS: BMS SAFE-BioPharma cross-certified credentials; SAFE-BioPharma credentials
  – NCI: PIV cards
  – Cooperative Groups – research organizations – academic, CROs, etc.
    SAFE-BioPharma credentials
- Signature Work Flow
  – SAFE-BioPharma Digital Signing Service Pilot
  – MySignatureBook/MyOneLog-on – a "cloud" service

[Diagram: Digital Signature Service and Cooperative Groups]

Leveraging the Value of the 4BF
- Very challenging economic environment
- Investment flowing to areas that will improve productivity and lower FTE count
- Opportunity to leverage a network of interoperable credentials and to create a network of trusted partners
- Move government-business and B2B processes into the cloud
- Bring value to government program managers and private-sector business process owners by improving business process efficiencies

Value of SAFE-BioPharma Digital Identity and Signature in the Partnered Digital World
- Established, tested standard meeting the needs of governments, regulators, industry
- Ability to identity-proof and issue a certificate in ~15 minutes (including medical license)
- Contract-based system tailored to pharma/healthcare needs (HIPAA compliance; risk mitigation)
- Secure
- Legal enforceability
- Regulatory recognition and acceptance (US, EU, Japan)
- Global standard and set of services
- Links Federal agencies to pharma and healthcare providers
- Provides interoperability
- Interoperable standard that facilitates transition to fully electronic business and regulatory processes

- Please visit the SAFE-BioPharma website: http://safe-biopharma.org/
- Please visit the 4BF website: http://www.the4bf.com/
- Pfizer's Implementation of SAFE-BioPharma Digital Signatures in ELNs: http://www.safe-biopharma.org/images/stories/pfizer%20white%20paper_v1.pdf
- AstraZeneca's Implementation of SAFE-BioPharma for FDA Submissions: http://www.safe-biopharma.org/images/stories/az_safe_final.pdf
- Learn more about the SAFE-BioPharma Implementation Toolkit: http://safe-biopharma.org/index.php?option=com_content&task=view&id=254&Itemid=422
- Watch the SAFE-BioPharma introductory
video: http://www.phillipsvideopost.com/safe
- Contact us for more information:
  – Mollie Shields Uehling, CEO, mollie@safe-biopharma.org, (201) 292-1861, (201) 925-2173 (cell)
  – Jon Schoonmaker, Chief of Operations & Technical Program, jon.schoonmaker@safe-biopharma.org, (301) 610-6060
  – Cindy Cullen, CTO, cindy.cullen@bms.com, (609) 818-4152
  – Kay Bross, Director, Member/Vendor Progs, kbross@safe-biopharma.org, (513) 489-3840 (o), (513) 673-2344 (c)
  – Kevin Chisholm, Admin., Kevin.Chisholm@SAFE-BioPharma.org, (201) 292-1860
  – Rich Furr, Head, Reg. Afrs., RFurr@SAFE-BioPharma.org, (908) 213-1069
  – Gary Wilson, Prog. Mgr, Gwilson@safe-biopharma.org, (781) 962-3172
  – Tanya Newton, Manager, Reg Afrs, tanya.newton@safe-biopharma.org, (610) 252-5922

The 4BF – The Four Bridges Forum
HEBCA - Higher Education Bridge Certificate Authority

Authentication
- Authentication is the process of obtaining an identification credential (e.g. username/password) from a user and validating those credentials against some authority.
  – If the credentials are valid, the entity that submitted the credentials is considered an authenticated identity.
- Authentication relies on two main elements:
  – A credential that is bound to an identity
  – The ability to verify the credential

Authentication Factors
- Three different factors of authentication:
  – Something you know, e.g. password, secret, URI, graphic
  – Something you have, e.g. key, token, smartcard, badge
  – Something you are, e.g. fingerprint, iris scan, face scan, signature

Authentication Factors
- Single-factor authentication is most common
  – Passwords (something you know) are the most common single factor
- At least two-factor authentication is recommended for securing important assets
  – e.g. ATM card + PIN (have + know)
- 2 x single-factor authentication ≠ two-factor authentication
  – e.g.
Password + Graphic is NOT equivalent to Smartcard + PIN (although it may be better than a single instance of one-factor authentication)

Password Authentication
- General issues with authentication using password technology:
  – Passwords are easily shared with others (in violation of access policy)
  – Easily captured over a network if no encrypted channel is used
  – Vulnerable to dictionary attacks even if encrypted channels are used
  – Weak passwords can be guessed or brute-forced offline
  – Vulnerable to keyboard sniffing/logging attacks on public or compromised systems
  – Cannot provide non-repudiation, since passwords generally require that the user be enrolled at the service provider, and so the service provider also knows the user's password
  – Vulnerable to social engineering attacks
  – Single factor of authentication only

Password Authentication
- Specific issues with authentication using password technology:
  – Too many passwords to remember if a different one is required for each application
    - Leads to users writing them down and not storing them securely
    - Leads to use of insecure or weak passwords (more secure ones are generally harder to remember)
    - Leads to higher helpdesk costs due to resetting of forgotten passwords
    - Leads to re-use of passwords outside the institution's domain, where protection mechanisms may be much weaker

Password Authentication
- Specific issues with authentication using password technology (cont.):
  – Potential single point of failure for multiple applications if the same password is used
    - Strong passwords not consistently supported in all applications
    - Weak passwords lead to widespread compromises
    - Passwords not consistently protected for all applications
    - Password expiration not synchronized across applications
    - Limited character set for input
    - No control over use of passwords outside the organization's domain
    - Offline attacks against passwords may be possible

The PKI Solution
- Solution to password vulnerabilities: Public Key Infrastructure (PKI)
  – PKI is based on a key pair: one public key, stored in a certificate, and one private key, stored in a protected file or smartcard
  – Allows exchange of session secrets in a protected (encrypted) manner without disclosing the private key
  – PKI lets users authenticate without giving their passwords away to the service that needs to authenticate them
    - Dartmouth's own published password-hunting experiences show that users happily type their user ID and password into any reasonable-looking web site
    - PKI can be a very effective measure against phishing

PKI Solution
- Solution to password vulnerabilities: Public Key Infrastructure (PKI)
  – PKI lets users directly authenticate across domains
    - Researchers can collaborate more easily
    - Students can easily access materials from other institutions, providing broader educational opportunities
  – PKI allows decentralized handling of authorization
    - Students on a project can get access to a web site or some other resource because Prof. Smith delegated it to them
    - PKI simplifies this process – no need for a centralized bureaucracy, lowering overheads associated with research
  – The private key is never sent across the wire, so it cannot be compromised by sniffing
  – Not vulnerable to dictionary attacks
  – Brute force is
not practical for the given key lengths
  – Facilitates encryption of sensitive data to protect it even if a data stream or source is captured by a malicious entity

PKI Solution
- Solution to password vulnerabilities: Public Key Infrastructure (PKI)
  – 1024-bit keys are better than 128-character passwords (they are not subject to a limited character input set)
    - This is far stronger than just about any password-based authentication system
    - As one researcher said recently, "the Sun will burn out before we break these"
- Quote from Prof. Smith: "In the long run: user authentication and authorization in the broader information infrastructure is a widely recognized grand challenge. The best bet will likely be some combination of PKI and user tokens."
  – Failing to look ahead in our IT choices means failing in our research and educational mission.

Additional PKI Benefits
- Additional drivers for PKI in higher education (besides stronger authentication):
  – Better protection of digital assets from disclosure, theft, tampering, and destruction
  – More efficient workflow in distributed environments
  – Greater ability to collaborate and reliably communicate with colleagues and peers
  – Greater (and more efficient) access to external resources
  – Facilitation of research funding opportunities
  – Compliance

Additional PKI Benefits
- Applications that utilize PKI in higher education:
  – Secure wireless
  – S/MIME email
  – Paperless-office workflow (signed PDF and Word docs)
  – Encrypted file systems (protecting mobile data assets)
  – Strong SSO
  – Shibboleth/Federations
  – GRID computing enabled for federations
  – E-grants facilitation

HEBCA – A Brief History
- HEBCA started life as a pilot project to validate PKI bridge-to-bridge transactions
- Modeled on the successful FBCA, but representing higher education
- Hosted at a govt.
contractor site beginning in 2001, with involvement from several HE institutions
  – Dartmouth College, University of Wisconsin, University of California – Berkeley, University of Alabama, etc.
- EDUCAUSE provided sponsorship to instantiate the infrastructure for real
- Dartmouth College chosen as operating authority in May 2004

HEBCA – A Brief History
- HEBCA rebuilt from the ground up based on the prototype infrastructure
- Policy mapping and technical interoperation completed with FBCA; cross-certification with a limited number of schools and related entities
- HEBCA is ready for production, but still operates in a "Test" mode today
- Steps are underway to migrate the infrastructure to a long-term commercial operation

[Diagram: proposed inter-federations – the Dartmouth-operated HEBCA cross-certified with the FBCA (and via it ACES, NIH, SAFE, CertiPath), with USHER, IGTF, AusCert/CAUDIT, HE JP, other bridges, and participating universities (Texas, Wisconsin, UVA, etc.).]

HEBCA – A Brief History
- HEBCA provides 5 levels of interoperability
  – Test + 4 levels equivalent to NIST SP 800-63
- Audit has been the single most prevalent deterrent to adoption within the community
  – Schools are very consistent and regimented in the processes that they follow for identity authentication and management, but often do not have formal documentation of those processes, nor audit of those processes by independent 3rd parties.
- Authentication has been the service driving the majority of demand

HEBCA
- HEBCA provides an efficient way for participating organizations to establish trust of any identities issued by other participants
- HEBCA uses technological and policy-based processes to assert the level of assurance that community members can place in a given identity certificate.
- As each participant joins HEBCA, their identity credentialing processes are reviewed and an assurance value is assigned to their certificates on a scale recognized within the community.
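The password-versus-PKI argument running through the preceding slides can be sketched in a few lines: first the offline dictionary attack that leaked password hashes invite, then a toy RSA challenge-response showing why a private key never crosses the wire. The hash, wordlist, and tiny primes below are illustrative assumptions only, not production parameters (real keys are 2048 bits or more, with padded signature schemes).

```python
import hashlib
import secrets

# --- Password weakness: offline dictionary attack on a leaked, unsalted hash ---
# Hypothetical leaked hash and tiny wordlist; real attacks use millions of words.
leaked_hash = hashlib.sha256(b"letmein").hexdigest()
wordlist = ["password", "123456", "qwerty", "letmein", "dragon"]

def dictionary_attack(target_hash, words):
    """Hash each candidate word and compare against the leaked hash."""
    for word in words:
        if hashlib.sha256(word.encode()).hexdigest() == target_hash:
            return word
    return None

print(dictionary_attack(leaked_hash, wordlist))  # -> letmein

# --- PKI alternative: toy RSA challenge-response (demo primes, NOT secure) ---
p, q = 61, 53            # tiny demo primes
n = p * q                # public modulus
phi = (p - 1) * (q - 1)
e = 17                   # public exponent, coprime with phi
d = pow(e, -1, phi)      # private exponent (modular inverse, Python 3.8+)

def sign(message: bytes) -> int:
    """Client side: sign a hash of the server's challenge with the private key."""
    digest = int.from_bytes(hashlib.sha256(message).digest(), "big") % n
    return pow(digest, d, n)

def verify(message: bytes, signature: int) -> bool:
    """Server side: check the response using only the public key (e, n)."""
    digest = int.from_bytes(hashlib.sha256(message).digest(), "big") % n
    return pow(signature, e, n) == digest

# The server issues a fresh random challenge; the client proves possession of
# the private key without ever transmitting it (or any reusable password).
challenge = secrets.token_bytes(16)
print(verify(challenge, sign(challenge)))  # -> True
```

Unlike the password check, nothing the verifier stores or observes lets it impersonate the client later, which is the non-repudiation and anti-phishing property the slides attribute to PKI.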
- Instead of each member establishing bilateral trust agreements and reviewing the policies and procedures of all the other participants, they can simply trust the validity of an identity which HEBCA has vetted and asserted across its entire system
- HEBCA's participation in the 4BF enables a far greater community of trust for its participants, beyond just higher education

HEBCA
- NOTE: HEBCA is still only operating in "Test" mode
  – Transition is underway to move operations to a commercial CA vendor (DigiCert Inc.)
  – Root will be re-issued & participants re-cross-certified
  – Expect full production operations by Q4 2010
  – Scott Rea: scott.rea@dartmouth.edu

The 4BF – The Four Bridges Forum
Jeff Nigriny, CertiPath
CertiPath Trust Fabric

The "Bridge" between LACS and PACS
- The traditional LACS space is marked by PKI, OTP, and UID/password, leveraged through smart card logon, federated access gateways, SSL, and S/MIME
- The traditional PACS space is marked by magstripe and prox cards; however, PKI on PIV/PIV-I and CAC is quickly becoming best practice for Federal facilities
- Credentials which work in either application are the missing link to gaining situational awareness through logical and physical networked "intelligence points"

Growing Pains
- PKI in PACS is easier said than done
  – PACS vendors and integrators are commercially aligned to avoid interoperable credentials
- Poor implementations hurt everyone
- All of the supporting infrastructure for interoperable credential usage in LACS is missing for PACS

GSA Trusted PACS Specification
- Version 1 of the Trusted PACS Specification was published by GSA on March 9th, 2010

Policy - LACS & Credentials vs. PACS
- Define the need:
  – LACS & credentials: many, e.g., OMB M-04-04, SP 800-79, ISO 27799, etc.
  – PACS: few, e.g., SP 800-116, DTM-09-012
- Define the form:
  – LACS & credentials: many, e.g., X.509, SP 800-73, SAML
  – PACS: closest to date is TWIC, FRAC
- Define audit/C&A:
  – LACS & credentials: many, e.g., FIPS-201 APL, FISMA, SOX, etc.
  – PACS: none; worse, the FIPS-201 APL is causing confusion
- Define interoperability:
  – LACS & credentials: many, e.g., the 4BF's CPs, OpenID, Kantara
  – PACS: one, the GSA Trusted PACS Specification
- Define the requirement for industry:
  – LACS & credentials: none
  – PACS: none

Internet Voting – Threat or Menace?
April 14, 2010
Jeremy Epstein, Senior Computer Scientist, SRI International, Arlington VA
© SRI International

Outline…
- Background – definitions & requirements
- How Internet voting works
- Some potential solutions
- How can PKI help with Internet voting
- Is it a threat or a menace?

What is Internet voting (i-voting)?
√ 1. Getting information on candidates, contests, etc.
√ 2. Voter registration – get blank form, fill out, submit, receive ACK
√ 3. Absentee ballot request – get request form, fill out, submit, receive ACK
4. Fill out & submit ballot
  √ a. Get blank ballot
  √ b. Fill out
  X c. Return
  √ d.
Receive ACK

Advantages of i-voting
- "More modern"
- Potential for higher turnout, especially for young voters
- Potential for lower cost
- Reduce precinct staffing issues
- Enable military/overseas voters - Uniformed and Overseas Citizens Absentee Voting Act (UOCAVA) compliance

Voting System Requirements
- Allows each authorized voter to vote exactly once
- Accurately records the votes
- Accurately counts the votes
- Voter can be sure his/her vote is counted, without trusting the other side's people, even if the other side's people are the election officials (*)
- Voter can be sure his/her vote is counted, without trusting the company that made or programmed the voting equipment
- No one can learn how he/she voted without his/her cooperation, and no one can prove how he/she voted even with his/her cooperation (avoid coercion)
(*) Election officials are overwhelmingly honest, but the system can't depend on that
None of these are absolutes – all voting systems make some level of compromises

Unique issues with voting (Internet or otherwise)
- Once-every-four-years voters (can't rely on special-purpose devices, software)
- Process must be understandable to everyone
- Must be usable and accessible to all citizens – including low-income, seniors, non-English-speaking
- Process is largely run by minimally trained (but hardworking!) senior citizens
- Many ballot styles – hundreds or thousands per state
- Highly cost sensitive – no one wins an election by promising to invest more in elections!

Email ballot submission is a bad idea…
- No privacy
  – Email is store-and-forward, so any machine (or administrator of a machine) along the way can read the message
- No authentication
  – To/from headers aren't trustworthy
- No integrity
  – Contents of the email may be modified at any hop
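The "to/from headers aren't trustworthy" point is easy to demonstrate: SMTP carries whatever headers the sender writes, and nothing in the base protocol verifies them. A minimal sketch with Python's standard email library (all addresses hypothetical):

```python
from email.message import EmailMessage

# A forged "official" ballot-receipt message: the From header is simply
# whatever the sender chooses to write (hypothetical addresses).
msg = EmailMessage()
msg["From"] = "elections@example.gov"   # forged; nothing in SMTP checks this
msg["To"] = "voter@example.com"
msg["Subject"] = "Your ballot receipt"
msg.set_content("Your vote was recorded.")  # body is equally unauthenticated

print(msg["From"])  # -> elections@example.gov
```

This is exactly the gap the next point says PKI certificates could close, by binding messages to verified identities instead of self-asserted headers.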
PKI can address all of these, if you can get certificates to voters that they then have to find and use successfully once every four years

Mail-in ballots (absentee, VBM)
- Privacy – double envelopes, but no protection against vote selling
- Authentication – signatures, but signature checking is weak
- Integrity – controls on physical mail (stronger than email)
- Lots of historical problems with privacy, esp. in nursing homes
- Overseas VBM has to trust (at least) two countries' mail systems
- Definite risks, but wholesale attacks much harder than email
- Absentee (excused or no-excuses) everywhere in the US
- All-VBM in Oregon, largely VBM in California and Washington

Types of i-voting
- Home-based (personal computer, cellphone, etc)
  – More convenient
  – May be more accessible for voters with disabilities
  – Less expensive for the locality
- Kiosk-based (dedicated controlled system)
  – More even playing field (poor voters aren't at a disadvantage)
  – More controlled environment (physical and software controls, voter authentication by a trusted person, reduced risk of in-person coercion)
  – Essentially no different than a precinct-based system

[Diagrams, copyright © 2010 Andrew Appel, used by permission:]
Simple i-voting Protocol (client and server)
Communications Security (including DNS, BGP, SSL issues)
Insider Attacks (insider may be a contractor/vendor/election official)
Vulnerability in Server to Outside Attacks (P(success) > 0.99)
Vulnerability in Clients

Obvious Questions
- Does the server software add up the votes correctly?
- Are ballots transmitted correctly?
- Can eavesdroppers learn how you voted?
30-80% of clients have malware.

Not-so-obvious questions
• Can my votes be changed after I submit them?
• Is that the real server I’m talking to, or an imposter?
• Am I getting the right blank ballot?
• Are the votes displayed on the screen the same as the votes actually transmitted by the client software?
• Is the “real” server software actually installed on the server computer?

If I can bank online, why can’t I vote online? [slide figure: ZeuS, May 2009. Copyright © 2010 Andrew Appel. Used by permission.]

Other online elections
• Shareowners (board of directors, etc)
– Possibly different threat models – long history of attacks against political elections
– No requirement for ballot anonymity
• Political primaries (e.g., Democrats Abroad 2008)
– Run by parties, not the government
– No requirement for ballot anonymity
– No requirement for auditability
– Not governed by any Federal (or sometimes even state) regulations
• Local elections (e.g., Honolulu neighborhood boards 2009)
– Much lower threat model (less to gain, less to spend)
– Not governed by any Federal (or sometimes even state) regulations

Some Obvious Solutions That (Might) Work
• Using SSNs as authenticators
• Digital signatures on ballots
• End-to-end crypto
• Out-of-band vote confirmation
• Signed vote summary to voter
• Bullet-proof server to store votes
• Paper backup of votes

Using SSNs to authorize voting
• Can we make it easier to authorize voters for online voting?
• Idea: Have voters sign in by providing their SSN and then cast a ballot, which could be encrypted using their SSN as a key.
This avoids having to create a new identifier that voters can’t remember. But: not all US citizens have SSNs, and some non-citizens do (www.ssa.gov/pubs/10096.html); generally speaking it’s illegal; and the SSN isn’t secret!

Digital Signatures on Ballots
• Can we avoid the need to fully trust the client and server computers?
• Idea: Let each voter (digitally) sign her ballot, and post every ballot on a public (Internet) bulletin board.
This is accurate and trustworthy: each voter can verify that her ballot is present, and any member of the public can add up all the posted votes and reconfirm the election results. But: how do we get keys/certificates to voters? And it is a complete loss of voter privacy!

Cryptographic End-to-End Protocols
• Can we allow posting votes without compromising voter privacy?
• Idea: Let each voter (digitally) sign her ballot, and post every ballot on a public (Internet) bulletin board – but use special-purpose encryption protocols to avoid the loss of voter privacy.
Each voter can verify (probabilistically) that her ballot is (very likely) present; any member of the public can (probabilistically) add up all the posted votes and reconfirm election results. But: do these protocols actually work? Can they be explained to voters and policymakers? Are policymakers able to evaluate these protocols? Are there hidden vulnerabilities?

Out-of-Band Vote Confirmation (1)
• Can we avoid the threat of malware on the voter’s computer?
• Idea: Have a chart of images associated with each candidate published in the newspaper; the server sends back the right image to the voter to prove that the voter’s computer transmitted the vote correctly.
This gives warm fuzzies that the voter’s intent is captured; with lots of choices, it increases the effort for a malware author to give a “right” image to the voter. But: extra steps for voters (who aren’t likely to check), and if there is only one image per candidate, malware can provide the image regardless of the vote cast.
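The “SSN as an encryption key” idea fails even before the legal problems: a nine-digit SSN gives at most 10^9 possible keys, a space a single laptop can sweep quickly. A minimal sketch of the attack (all names and values invented; a toy XOR cipher and a deliberately tiny search space stand in for the real thing):

```python
import hashlib

def key_from_ssn(ssn):
    """Derive a symmetric key from an SSN (the flawed idea under discussion)."""
    return hashlib.sha256(ssn.encode()).digest()

def xor_cipher(key, data):
    """Toy cipher; XOR means encryption and decryption are the same operation."""
    return bytes(d ^ k for d, k in zip(data, key))

ballot = b"candidate=17"
ct = xor_cipher(key_from_ssn("078-05-1120"), ballot)

# Attacker: enumerate the (toy) SSN space, looking for a plausible plaintext.
for area in range(1000):               # a real attack sweeps all ~10^9 SSNs
    guess = f"{area:03d}-05-1120"
    if xor_cipher(key_from_ssn(guess), ct).startswith(b"candidate="):
        recovered = guess
        break

assert recovered == "078-05-1120"
```

The brute force succeeds because the “key” has far too little entropy; no choice of cipher fixes that.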
Out-of-Band Vote Confirmation (2)
• Can we avoid the threat of malware on the voter’s computer?
• Idea: Have the server call the voter on the phone to read back the votes.
This allows online voting without the slow process of picking candidates from a phone menu. But: extra steps for voters (who aren’t likely to check); a new opportunity for vote selling; how does a voter prove that the call-back doesn’t match what they wanted? And not all voters have phones!

Signed Email with Vote Summary
• Can we avoid the threat of malware on the voter’s computer?
• Idea: Have the server send an email back to the voter with a digitally signed copy of their votes.
If the voter uses web-based email (e.g., Gmail), they can check the results anywhere, so compromise of the vote-casting computer isn’t catastrophic. But: extra steps for voters (who aren’t likely to check); many voters won’t have access to multiple computers; a new opportunity for vote selling; and teaching voters to check email signatures is hard!

Provide a Bullet-Proof Server to Store Votes
• Can we avoid the risk that someone (insider or outsider) will hack into the server and add or change votes?
• Idea: Have understaffed non-technical election officials set up the system.
• Idea: Have unaccountable outsourced vendors set up the system.
• Idea: Have Google run the election (c.f. Aurora).
• Note: This method was used by Democrats Abroad for their 2008 pilot program.
Faith-based voting? This is perhaps the biggest threat of all.

Paper Backup of Votes
• Can we use paper backups of votes in case there’s a system failure?
• Idea: Print the voted ballots and use those for audits and recounts.
• Note: This method was used in Okaloosa County, Florida for their 2008 pilot program.
The voter can check that the computer recorded the vote correctly by examining the paper, and an audit can verify that the electronic tallies are correct. But does this actually solve a problem?
Why not just mark the paper by hand and send it in the mail?

Use CAC for military voters
• Can we use existing infrastructure to authenticate voters?
• Idea: Have military voters use their Common Access Cards (CAC).
This provides strong remote authentication, with existing processes designed to deter credential sharing. But the military isn’t likely to share CAC authentication with 5000 localities nationwide, and at best this solves only the authentication problem.

Where can PKI help?
• For electronic ballot distribution – allow voters to ensure that they got the right ballot
• For acknowledgement of voter registration requests, absentee ballot requests, and completed ballot receipt – signed emails
• Possibly for authentication for military voters using CAC
• But expecting voters to maintain a PKI certificate for use once every four years is a non-starter

So is it a threat or a menace?
THREAT: a warning that something unpleasant is imminent. MENACE: pose a threat to; present a danger to.

So is it a threat or a menace? (Take 2)
THREAT:
• Lots of political movement towards i-voting because it sounds like a good idea
• Little understanding by elected officials of the technological risks, or of the similarities and differences compared to e-banking
MENACE:
• I-voting is a danger to accurate vote counting given current and reasonably foreseeable technology
• E2E technologies may reduce the risk for kiosk voting systems

Going forward
• Encourage the relatively safe parts
– Online voter registration (backed up with in-person identity checks)
– Online absentee ballot requests
– Online absentee ballot distribution
– Online absentee ballot receipt acknowledgement
• … and stick with mail-in paper for the critical ballot submission

For more reading/viewing
• Recent OVF/UOCAVA Internet voting debate – http://www.youtube.com/OverseasVote
• Open source voting
– Open Source Digital Voting Foundation – www.osdv.org
– Elections by The
People Foundation – www.electionsbythepeople.org
• NIST End to End Voting Systems Workshop – http://csrc.nist.gov/groups/ST/e2evoting/index.html
• Internet Voting: Will We Cast Our Next Votes Online?, Jeremy Epstein, ACM Computing Reviews, December 2009
• A Security Analysis of the Secure Electronic Registration and Voting Experiment (SERVE), David Jefferson et al, January 2004 (updated June 2007), www.servesecurityreport.org
• Report of the National Workshop on Internet Voting: Issues and Research Agenda, Internet Policy Institute, March 2001

Independently-Verifiable Secret-Ballot Elections
Poorvi L. Vora, Department of Computer Science, The George Washington University

Outline
• Current voting technology, limitations
• Cryptographic approach; paradigm shift
• “End-to-end” voting systems
• Electronic E2E voting systems

Current Technology – In the world’s oldest continuous democracy
• Humboldt County, CA: voting machines dropped 197 votes – Wired, 12-8-2008
• Florida’s 13th Congressional District (2006): one in seven votes recorded on voting systems was blank – US Government Accountability Office, 2-8-2008
• Franklin County, Ohio: computer error gave Bush 3,893 extra votes in one precinct – WaPo, 11-6-2004
• In a North Carolina county: 4,500 votes were lost – WaPo, 11-6-2004

Voting Machine Analysis
• Kohno et al (2004): Diebold AccuVote-TS DRE*
– Voters can cast unlimited votes without detection
– Insiders can modify votes and match votes to voters
• Felten (2006) – “Hotel Minibar Keys Open Diebold Voting Machines”
• Bishop, Wagner et al (2007): CA “Top to Bottom Review”
– Voter can insert a virus into code
– Virus can spread through the state’s election system
And so on …
optical scan (Kiayias et al, 2007); Ohio voting machines, OS + DRE (McDaniel et al, 2007); NJ DREs (Appel et al, 2009). *DRE: Direct Recording Electronic

More exhaustive testing?
• Not possible to test large programs for the absence of errors
• Cannot rely only on software and software testing
• Go back to paper, or keep a paper back-up

At least “we” can count paper. BUT:
• Everyone cannot use paper
• Inefficient and inaccurate counts and recounts (e.g. Minnesota Senate election)
• Problems of integrity remain
• “we” = persons with privilege
• Still need to secure cast ballots till counting

Integrity Issues – Are these our only choices?
• Trust: the chain of custody of voting systems/paper back-up, and those who count, OR
• Watch: all locks on all precincts, and all counts

Cryptographic Voting Systems – Paradigm Shift: Audit the Election, Not the Equipment
Instead of checking all the software, and that it will perform several operations correctly every time, determine only that the tally is correct, only this time.

Encrypted Paper Trail
1. Voter casts encrypted vote and takes a copy out of the polling booth
2. Voter checks receipt on website/newspaper
3. Voting system reveals the tally and a digital audit trail to begin the proof of tally correctness
[Slide figure: encrypted receipts (34W1, AC1U, HY40, 9IK1, 2LS7, B8OH, 5TJG, DEV6) posted as a public digital audit trail – a commitment by the voting system for the proof of tally; example tally: 5 McCain, 3 Romney]

Invention of Secure Electronic Voting – Mixnet: David Chaum (1981)
Public-key encryption/decryption using asymmetric-key cryptography. Votes are decrypted and shuffled; each stage performs a partial decryption.
[Slide figure: encrypted receipts (34W1, …) pass through the mix, emerge partially decrypted (5GXT, NZ2Q, …), and finally as plaintext votes (McCain, Romney, …). On the public website: anyone can compute the tally.]
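Chaum’s decrypt-and-shuffle idea can be sketched in a few lines. This is a toy model only (invented keys and votes; an XOR stream construction stands in for real public-key onion encryption): each mix strips one encryption layer and secretly permutes the batch, so the output order is unlinkable to the input order while the multiset of votes is preserved.

```python
import hashlib
import random

def stream(key, n):
    """Expand a key into an n-byte keystream (toy construction)."""
    out, ctr = b"", 0
    while len(out) < n:
        out += hashlib.sha256(key + ctr.to_bytes(4, "big")).digest()
        ctr += 1
    return out[:n]

def xcrypt(key, data):
    """XOR stream cipher: applying it twice with the same key cancels out."""
    return bytes(a ^ b for a, b in zip(data, stream(key, len(data))))

k1, k2 = b"mix-one-key", b"mix-two-key"       # one key per mix server
votes = [b"McCain", b"Romney", b"McCain"]

# Each voter wraps the vote in one layer per mix (inner layer: mix 2).
batch = [xcrypt(k1, xcrypt(k2, v)) for v in votes]

# Each mix strips its layer and shuffles, hiding the input/output linkage.
for key in (k1, k2):
    batch = [xcrypt(key, b) for b in batch]
    random.shuffle(batch)

assert sorted(batch) == sorted(votes)         # same votes, unlinkable order
```

A real mixnet uses public-key encryption so the voter never shares a secret with the mixes, and adds proofs that each mix permuted honestly.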
Tally Audit
4. Public audit performed by auditors. A successful audit verifies the tally without revealing information on votes. Open voting protocols can protect tally integrity or vote secrecy (but not both).

For example: Tally Audit – Jakobsson, Juels, Rivest (2002)
[Slide figure: randomly selected entries of the audit trail (34W1→5GXT, …) are opened for audit; on the public website, anyone can check the opened commitments]

The story so far (in 2002) …
• Very interesting theoretical results: Chaum (1981), Cohen (now Benaloh) and Fischer (1985), Benaloh and Tuinstra (1994), Sako and Kilian (1995)
– Relevant: zero-knowledge proofs and interactive/non-interactive proofs (e.g. Goldwasser-Micali-Rackoff (1985))
• BUT: Computers vote OR humans encrypt votes
• Encryption on trusted machines
– Cannot use in polling booth
– Cannot use to vote from home, because home PCs can have viruses and an adversary can threaten or bribe the voter
Trusted encryption without a trusted encryption device?

E2E Systems: Voter-Verifiable Voting – voters need not trust the encryption device
• Electronic: Chaum (2002-3); Neff (2004); Benaloh (2006); VoteBox (2007)
• Paper ballots: Prêt à Voter (2005); Punchscan (2005); Scratch and Vote (2006); Voting Ducks (2006); Scantegrity (2007)
• Remote: Rijnland Internet Election System (RIES), Netherlands governmental elections (2004, 2006); Helios (2008); not resistant to remote coercion

Example: Prêt à Voter – Ryan et al, 2005
1. System encrypts vote
2. Voters can choose to audit the encryption or cast it
3.
Audit ballot by opening the “onion”
[Slide figure: ballot, “onion”, and receipt; picture from Stefan Popoveniuc, PhD dissertation, GW, 2009]

Scantegrity II – Takoma Park Municipal Election, 2009
Scantegrity II front end + Punchscan back-end (UMBC, GW, MIT, Waterloo, UOttawa)
First fully-voter-verifiable secret-ballot governmental election:
• November 3, 2009: Takoma Park, MD
• Mayor + 6 Council Members
• 1728 votes cast (10,934 registered voters)
• Candidates were ranked by voters (instant runoff voting)
• Unique: public audit of tally; open-source; fully verifiable by voters
[Slide photo: Scantegrity II (2008), UMBC, GW, MIT, Waterloo, UOttawa. Photo by Alex Rivest]

Website Verification
• Immediately after the election (10-11 pm): Scantegrity count announced; codes made available online
• 81 unique ballot verifications, 64 before the Takoma Park complaint deadline (Nov. 6)
• One complaint – codes not clear enough for one voter: the voter noted “0”, the Scantegrity website said “8”; the voter trusted that the Scantegrity code was correct, and an audit check later revealed the Scantegrity code was correct

Audits: (Closed) Manual Vote Count
• November 5, afternoon, jointly by Scantegrity and Takoma Park
• Corroborated the Scantegrity total
• Few differences, due to the difference between machine reading (by scanner) and human determination of voter intent
• Election certified at 7 pm by the Chair, Board of Elections, to the City Council

Audits: Encryption Audit – Lillie Coney*
• Audited ballots through the day: chose about 50 ballots at random, exposed all confirmation codes, took home copies of marked ballots, and checked them against the commitments when opened after the election
• With familiarity, voters, including candidate representatives, can do this too
* Associate Director, Electronic Privacy Information Center, and Public Policy Coordinator for the National Committee for Voting Integrity (NCVI)

Audits: Digital Audit Trail – Dr. Ben Adida* and Dr.
Filip Zagórski+
– Audited the entire digital audit trail and independently confirmed tally correctness
– Provided their own copy of confirmation codes for voter check
– Pointed out discrepancies in documentation
* Helios and Center for Research on Computation and Society, Harvard University
+ Institute of Mathematics and Computer Science, Wroclaw University of Technology, Poland

Universally Verifiable
Anyone can perform the audits performed by Adida and Zagórski:
– The BoE Chair expects other voters will, using software provided by Adida and Zagórski
– Voters can write their own software, using the Scantegrity public spec

Limitations
• The bulletin board (website) needs to be secure
– Ensure that it doesn’t present one code to voters, another to auditors
– Hence Adida and Zagórski made their own copies and requested voters to check
• The cryptographic protocol does not prevent ballot stuffing; we had to use procedures
• Paper ballots are inaccessible to those with motor and visual disabilities

Electronic Independently-Verifiable Elections? Electronic Audit
• Voter: “Vote for Bob”
• System prints the encryption and signs it
• Voter: “I want to audit this encryption”
• System shows that it encrypted a vote for Alice
• The voter knows the system cheated, but has no proof without a hard record of “Vote for Bob”
• If we keep a hard record, it has to be destroyed if the voter chooses to vote, not audit
• Observers are needed during the audit. Can we do that without the voting system detecting an audit?
Conclusions
• Can have better integrity of the election outcome using E2E systems
• Challenges exist in making E2E systems electronic

Acknowledgements
Collaborators: Carback, Chaum, Clark, Coney, Essex, van de Graaf, Hall, Hosp, Popoveniuc, Rivest, Ryan, Shen, Sherman, Wagner. At NIST: Hastings, Kelsey, Peralta, Popoveniuc, Regenscheid. Help with the Takoma Park election: City Clerk and Board of Elections, Takoma Park. Independent auditors: Adida, Coney, Zagórski. Survey: Baumeister. Others: Florescu, Jones, Relan, Rubio, Sonawane. Support: NSF IIS 0505510, NSF CNS 0831149, NSF CNS 0937267; School of Engineering and Applied Science, GW: start-up funds.

Extras

Using the DNS as a Trust Infrastructure with DNSSEC
Scott Rose, Cyber and Network Security Program, NIST {scott.rose@nist.gov}
IDTrust 2010, April 14, 2010

About DNS
• Worldwide database, widest deployed standards-based name system – “PKI without the ‘K’” – Dan Kaminsky
• Essential component of the Internet
– Robust even in the presence of some errors
– Often the first part of any Internet transaction
• Due to its lightweight, distributed nature, attacks are very difficult to detect
– cache poisoning
– response re-writing
• In response, the IETF developed the DNS Security Extensions (DNSSEC)

What DNSSEC Provides
• Cryptographic signatures in the DNS
• Integrates with existing server infrastructure and user clients (i.e. backwards compatible)
• Assures integrity of results returned from DNS queries
– Users can validate source authenticity and data integrity
– Checks the chain of signatures up to the root
– Protects against tampering in caches and during transmission
• Not provided: confidentiality, security against denial-of-service attacks

DNSSEC Chain of Trust
[Slide figure: the DNS root (“.”), with its KSK and ZSK, signs down to TLD KSKs/ZSKs (gov., se.) and on to zone keys and data (opm.gov., nist.gov.); trust anchors are installed on client resolvers]
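The chain-of-trust figure can be modeled in a few lines. A toy sketch (invented zone names and keys; a bare SHA-256 digest stands in for DS records, and real DNSSEC signatures are omitted entirely): each parent publishes a digest of its child’s key, and validation walks from the configured root trust anchor down to the target zone.

```python
import hashlib

def digest(key):
    """Stand-in for a DS record: a digest of the child zone's public key."""
    return hashlib.sha256(key).hexdigest()

# Toy zone data; parents publish DS records for their signed children.
zones = {
    ".":         {"key": b"root-ksk-2010", "ds": {}},
    "gov.":      {"key": b"gov-ksk",       "ds": {}},
    "nist.gov.": {"key": b"nist-ksk",      "ds": {}},
}
zones["."]["ds"]["gov."] = digest(zones["gov."]["key"])
zones["gov."]["ds"]["nist.gov."] = digest(zones["nist.gov."]["key"])

def chain_to(name):
    """Ancestor zones of `name`, root first, e.g. '.', 'gov.', 'nist.gov.'."""
    labels = name.rstrip(".").split(".")
    return ["."] + [".".join(labels[i:]) + "." for i in range(len(labels) - 1, -1, -1)]

def validate(name, trust_anchor):
    """Walk the DS links from the configured trust anchor down to `name`."""
    if zones["."]["key"] != trust_anchor:
        return False                      # wrong or stale trust anchor
    chain = chain_to(name)
    for parent, child in zip(chain, chain[1:]):
        if zones[parent]["ds"].get(child) != digest(zones[child]["key"]):
            return False                  # broken chain: child key not vouched for
    return True

assert validate("nist.gov.", b"root-ksk-2010")
assert not validate("nist.gov.", b"wrong-anchor")
```

This also illustrates the “higher is more useful” point on the next slide: an anchor at the root covers every signed descendant, while an anchor at gov. covers only that subtree.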
• KSKs often serve as the “anchor” of the authentication chain
• The higher up in the tree, the more useful the trust anchor

Deployment is Real
• Several TLDs and lower zones are signed now
– .gov, .org, and country codes like .us, .se, .br…
– .edu, .net and .com are planning to deploy by 2011
– Drivers to deploy in .gov – OMB mandate and FISMA
• Root zone to be signed by July 1, 2010
• What’s missing/still in development?
– Application support
– Stable means to distribute trust anchors
– Full registrar support

DNSSEC Becoming a Feature
• Tools available – from open source software to turnkey appliances
• Becoming available from ISPs (Comcast)
• Integrated into Windows 7 and Windows Server 2008 R2 – managed via group policy
• Some application patches available
– Firefox browser and Thunderbird email client
– Third party plug-ins and patches

So What Does This Get Us?
• A single, distributed, global, lightweight trust infrastructure
• DNS is a lookup protocol – different types of data can be placed in the DNS (for example, digital certs and SSH key hashes) – all would be DNSSEC signed
• Could we use this to bootstrap trust between organizations?
– Both would have a common third-party trust anchor (the root zone, for example)
– Data needed to establish trust in other protocols could be stored in an organization’s DNS zone (and signed)

Examples – Bootstrapping Trust
• Crude transport security – encoded public keys in DNS CERT RRs to set up secure communication, or SSH key hashes (SSHFP RRs)
– CERT RR protected by DNSSEC signature; the IP address of the server is also protected
– Not ideal, but could work: you need to be sure you are actually talking to the actual server (no IP address spoofing)
• Signed email – user public keys encoded in CERT RRs (e.g. scottr@nist.gov becomes “scottr.nist.gov IN CERT …”)

Some Things to Keep in Mind
• DNS has caching and no revoke feature
– Data is considered valid as long as the signature is valid (replay attacks possible)
– DNS updates might not be seen until old data times out of caches
• DNSSEC validation would have to be done by the client, or by a trusted recursive server
– Right now, stub clients on desktop/laptop systems rely on an upstream cache to do most of the work (including validation)
– Do you always trust the recursive server? What about Wi-Fi hotspots?
• No cross-signing
– The hierarchy is built upon the existing DNS hierarchy (so “example.com” can’t authenticate “sub.example.org”)

Resources
• DNSSEC Resources – General Information: http://www.dnssec.net/
• NIST DNSSEC Testbed: http://www.dnsops.gov/
• DNSSEC Deployment Initiative: http://www.dnssec-deployment.org/
• Root Zone DNSSEC Deployment: http://www.root-dnssec.org/

Efficient and Privacy-Preserving Enforcement of Attribute-Based Access Control
Ning Shang∗ (Microsoft Corporation, One Microsoft Way, Redmond, Washington; nishang@microsoft.com), Federica Paci† (University of Trento, Via Sommarive 14, Povo, Trento, 38123; paci@disi.unitn.it), Elisa Bertino (Purdue University, 305 N. University Street, West Lafayette, Indiana; bertino@cs.purdue.edu)
∗This work was done while the author was at Purdue University. †This work was done while the author was at Purdue University.

ABSTRACT
Modern access control models, developed for protecting data from accesses across the Internet, require to verify the identity of users in order to make sure that users have the required permissions for accessing the data. A user’s identity consists of data, referred to as identity attributes, that encode relevant security properties of the users. Because identity attributes often convey sensitive information about users, they have to be protected. The Oblivious Commitment-Based Envelope (OCBE) protocols address the protection requirements of both users and service providers. The OCBE protocols make it possible for a party, referred to as the sender, to send an encrypted message to a receiver such that the receiver can open the message if and only if its committed value satisfies a predicate, and such that the sender does not learn anything about the receiver’s committed value. The possible predicates are the comparison predicates =, ≠, >, <, ≤, ≥. In this paper, we present an extension that improves the efficiency of the EQ-OCBE protocol, that is, the OCBE protocol for equality predicates. Our extension allows a party to decrypt data sent by a service provider if and only if the party satisfies all the equality conditions in the access control policy.

Categories and Subject Descriptors
K.6.5 [Management of Computing and Information Systems]: [Security and protection]

General Terms
Security

Keywords
Identity, Privacy, Agg-EQ-OCBE

1. INTRODUCTION
Modern data access control models, developed for interactions across different domains and the Internet, allow one to specify and enforce access control policies, that is, policies regulating accesses to the protected data, in terms of conditions expressed against user identity attributes. Because such attributes often encode relevant security properties of the users, they have to be protected as well. The implementation of such attribute-based access control models thus requires mechanisms whereby a user obtains access to data if and only if its identity attributes satisfy the service provider’s¹ policy, whereas the service provider learns nothing about the user’s identity attributes.
Several approaches based on anonymous credentials [6, 2, 10, 4, 3] have been proposed to allow users to prove to the service provider that their identity attributes satisfy conditions in the policies, without revealing the identity attributes in clear. These approaches are based on storing cryptographic commitments of attribute values in certificates and using zero-knowledge proof protocols [5] to prove properties of these values. A major drawback of those approaches is that, even though the service provider does not learn the attribute values, it learns whether users’ identity attributes satisfy its policy conditions and may thus infer information about the values of these attributes.
The Oblivious Commitment-Based Envelope (OCBE) protocols [9] are an approach that addresses such shortcoming and can thus satisfy the protection requirements of both the service providers and the users. The OCBE protocols allow a service provider to send an encrypted message, containing the protected data, to a user such that the user can open the message if and only if the committed value of a specified identity attribute satisfies a predicate. Under such a protocol the service provider does not learn anything about the user’s committed value and does not learn whether the value satisfies the conditions in the access control policy. The possible predicates supported by OCBE are the comparison predicates, that is, =, ≠, >, <, ≤, ≥. A major drawback of the OCBE protocol is that it is only able to enforce a condition (consisting of a single predicate) against a single identity attribute. Therefore, if the access control policy requires verifying conditions against several identity attributes, several rounds of the protocol have to be carried out, which results in inefficient access control. Efficient access control systems are crucial for mobile identity systems and mobile devices.
In this paper, we present the Agg-EQ-OCBE² protocol that addresses the efficiency issue of the EQ-OCBE protocol, that is, the OCBE protocol for equality predicates. Our approach provides an efficient approach under which the user can quickly decrypt the data, even when multiple conditions are imposed against its identity attributes. Like the original EQ-OCBE, Agg-EQ-OCBE assures user privacy in that the service provider does not learn the values of the user identity attributes nor whether these attributes verify the access control policies.
The paper is organized as follows. Section 2 reviews the EQ-OCBE protocol. Section 3 presents the Agg-EQ-OCBE protocol. In Section 4 we prove that Agg-EQ-OCBE is secure against a malicious user. Section 5 describes our implementation and performance measurements. Section 6 concludes the paper.
¹We use the term ‘service provider’ to refer to the party managing and securing the protected data.
²‘Agg’ stands for ‘aggregated’.
[Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. IDtrust’10, April 13–15, 2010, Gaithersburg, MD. Copyright 2010 ACM 978-1-60558-895-7/10/04 …$10.00.]

2. OVERVIEW OF THE EQ-OCBE PROTOCOL
We give an overview of the EQ-OCBE protocol in this section. We shall describe the protocol in the more general setting of finite abelian groups. This can be viewed as a natural extension of the originally proposed EQ-OCBE protocol [9].
The EQ-OCBE protocol is built on the Pedersen commitment scheme [12], which is described in [12] in a particular implementation using a subgroup of the multiplicative group of a finite field. Note that this is not intrinsic to the scheme. It can also be implemented using other abelian groups, e.g., elliptic curves over finite fields. We rewrite the Pedersen commitment scheme as follows.

Definition 1. (The Pedersen Commitment Scheme)
Setup: A trusted third party T chooses a finite cyclic group G of large prime order p so that the computational Diffie-Hellman problem³ is hard in G. Write the group operation in G as multiplication. T chooses an element g ∈ G as a generator, and another element h ∈ G such that it is hard to find the discrete logarithm of h with respect to g, i.e., an integer α such that h = g^α. T may or may not know the number α. T publishes G, p, g and h as the system’s parameters.
Commit: The domain of committed values is the set of integers D = {0, 1, …, p − 1}. For a party U to commit a value x ∈ D, it randomly chooses r ∈ D, and computes the commitment c = g^x h^r ∈ G.
Open: U shows the values x and r to open a commitment c. The verifier checks whether c = g^x h^r.

The EQ-OCBE is a Diffie-Hellman-like protocol that allows the user to correctly retrieve the protected data only if the user’s committed value equals the one specified by the policy of the service provider. It involves three communication parties: a user U, a service provider SP, and a trusted party T which generates initialization parameters for the protocol to use.
There are several cryptographic components in EQ-OCBE:
• A semantically secure symmetric-key encryption algorithm E (e.g., AES) with keyspace {0, 1}^k. We use E_Key[Message] to denote the encryption of the plaintext Message with encryption key Key under the encryption algorithm E.
• A finite cyclic group G of large prime order p, over which the computational Diffie-Hellman problem is intractable. The group operation is written multiplicatively.
• A cryptographic hash function H(·) : G → {0, 1}^k.
We shall describe how the EQ-OCBE protocol works in our case of policy enforcement for an equality condition.

Protocol 1. (EQ-OCBE)
Parameter generation: T runs a Pedersen commitment setup program to generate the system parameters Param = ⟨G, g, h⟩. T also outputs p, the order of G.
Commitment: This step is a modified version of the one described in [9]. Instead of requiring T to generate the Pedersen commitment, we let U perform this procedure and ask T to verify the validity of the commitment⁴. To commit to an element x ∈ Z/(p), U randomly chooses r ∈ Z/(p), computes the Pedersen commitment c = g^x h^r, and sends c to T. T asks U to open the commitment c, and checks that U can indeed commit to the value x. T digitally signs c and sends its signature to U. This is an alternative to the CA-Commit step in the original EQ-OCBE protocol, in which T sends c to SP. By adopting a public-key infrastructure, T can go off-line after this step. Later in communications, U sends c as well as its signature from T to SP; SP verifies that the signature is valid, and thus believes that the commitment c is valid. In this way, no further communications are needed between T and U.
Interaction:
• U makes a data service request to SP.
• Based on this request, SP sends its policy definition, which requires the value x₀ ∈ Z/(p) to be committed by U.
• Upon receiving this policy, U sends a Pedersen commitment, c = g^x h^r, signed by T, to SP.
• After verification of T’s signature, SP randomly picks y ∈ Z/(p)*, computes σ = (c·g^(−x₀))^y, and sends to U a pair ⟨η = h^y, C = E_H(σ)[M]⟩, where M is the message containing the requested data.
Open: Upon receiving ⟨η, C⟩ from SP, U computes σ′ = η^r, and decrypts C using H(σ′).
³For a cyclic group G of order q, written multiplicatively, the computational Diffie-Hellman problem is the following problem: given a randomly-chosen generator g of G, and g^a, g^b for random a, b ∈ {0, …, q − 1}, it is computationally intractable to compute the value g^(ab).
⁴We say a Pedersen commitment c is valid if its holder, U, is allowed to commit to the value x.
The adapted EQ-OCBE protocol above guarantees that U can successfully decrypt the ciphertext if its committed value is equal to the one specified in SP’s policy, and that it is computationally infeasible for U to do so otherwise.
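Protocol 1’s algebra is compact enough to check numerically. A toy Python sketch (illustrative small parameters and invented values; a real deployment needs a cryptographically large group and a real cipher such as the AES the paper suggests): when U’s committed x equals the policy value x₀, we have c·g^(−x₀) = h^r, so SP’s σ = (c·g^(−x₀))^y = h^(ry) equals U’s σ′ = η^r = (h^y)^r.

```python
import hashlib
import secrets

# Toy Schnorr group: q and p = 2q + 1 are both prime, and g = 4 generates the
# order-q subgroup of quadratic residues mod p. Real parameters are far larger.
q = 1019
p = 2 * q + 1
g = 4
h = pow(g, 777, p)        # h = g^alpha; T may discard alpha after setup

def H(sigma):
    """Hash a group element to a symmetric key."""
    return hashlib.sha256(str(sigma).encode()).digest()

def xor_cipher(key, data):
    """Toy stand-in for the symmetric scheme E; XOR so decrypt == encrypt."""
    return bytes(d ^ k for d, k in zip(data, key))

# Commitment: U commits to x with randomness r, c = g^x * h^r.
x = 42
r = secrets.randbelow(q)
c = (pow(g, x, p) * pow(h, r, p)) % p

# Interaction: SP enforces the policy "committed value == x0".
x0, M = 42, b"the protected data"
y = 1 + secrets.randbelow(q - 1)                 # nonzero exponent
sigma = pow(c * pow(g, q - x0, p) % p, y, p)     # (c * g^(-x0))^y; g has order q
eta = pow(h, y, p)
C = xor_cipher(H(sigma), M)

# Open: U recomputes sigma' = eta^r; it matches sigma exactly when x == x0.
sigma_prime = pow(eta, r, p)
recovered = xor_cipher(H(sigma_prime), C)
assert recovered == M
```

The obliviousness shows up when x ≠ x₀: then c·g^(−x₀) = g^(x−x₀)·h^r, and σ depends on SP’s secret y, so U cannot reconstruct it from η and r alone.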
SP will not know whether the message M has been successfully decrypted, without further communications with U.

3. AGGREGATION OF EQ-OCBE
The modification of the original EQ-OCBE protocol works for one equality condition. In many cases, we want the user to be able to decrypt a message, containing the protected data, if and only if several equality conditions are all satisfied. We can do this by dividing the encryption key into many shares, then performing the EQ-OCBE protocol multiple times, once for each share. More specifically, this can be done as follows.
• Suppose the user U requests data from the service provider SP.
• SP responds with its policy, which specifies that n values x₁, …, xₙ ∈ Z/(p) need to be committed by U in order that U can be served.
• U then sends to SP its n corresponding commitments c₁, …, cₙ.
• SP chooses n − 1 random messages M₁, …, Mₙ₋₁, which have the same bit length as the to-be-sent message M (containing the data), and sets Mₙ = M ⊕ M₁ ⊕ … ⊕ Mₙ₋₁, where ⊕ denotes the bitwise exclusive-or operation.
• SP and U perform the interaction and open procedures as above n times, once for each encrypted Mᵢ.
• U computes M = M₁ ⊕ M₂ ⊕ … ⊕ Mₙ.
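The key-splitting step above is plain XOR secret sharing: Mₙ is M XORed with the n − 1 random shares, so XORing all n shares back together cancels the randomness, while any missing share leaves M information-theoretically hidden. A minimal sketch (function names invented):

```python
import secrets
from functools import reduce

def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def split_message(M, n):
    """M1..M(n-1) random; Mn = M ^ M1 ^ ... ^ M(n-1), as in the n-round scheme."""
    shares = [secrets.token_bytes(len(M)) for _ in range(n - 1)]
    shares.append(reduce(xor_bytes, shares, M))
    return shares

def combine(shares):
    """U recovers M only if every round succeeded and all Mi are known."""
    return reduce(xor_bytes, shares)

M = b"encryption key"
shares = split_message(M, 5)
assert combine(shares) == M
assert combine(shares[:-1]) != M    # a missing share leaves M hidden
```

This is exactly why the n-round construction enforces the conjunction of all n conditions, at the cost the paper then sets out to remove.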
However, such an approach is not very efficient in terms of bandwidth and computation: for n equality conditions, the number of packets sent in communications and the computational cost both increase by approximately a factor of n. We shall present an aggregated version of the EQ-OCBE protocol, Agg-EQ-OCBE, which handles multiple equality conditions at the same time without significantly increasing the computational cost. Agg-EQ-OCBE also requires less bandwidth than the above n-round EQ-OCBE.

Protocol 2. (Agg-EQ-OCBE)
In addition to E, H(·), and G as in EQ-OCBE, another cryptographic component is used: a cryptographic hash function H1(·) : {0, 1}* → Z/(p).

Parameter generation The system parameters Param = ⟨G, g, h⟩ are generated in the same way as in Protocol 1.

Commitment To commit to an element x ∈ Z/(p), U randomly chooses r ∈ Z/(p), computes the Pedersen commitment of the hash value H1(x), c = g^{H1(x)} h^r, and sends c to T. T asks U to open the commitment c by revealing x and r. After verifying that x can be committed by U and that indeed c = g^{H1(x)} h^r, T digitally signs c and sends the signature to U. U can hold multiple such commitments corresponding to different committed values.

Interaction (with aggregation)
• U makes a data request to SP.
• Based on this request, SP sends its policy, specifying that n values x0^(i), i = 1, . . . , n, must be committed by U; i.e., U must hold n commitments ci = g^{H1(x0^(i))} h^{ri}, i = 1, . . . , n, all signed by T, in order to be served.
• Upon receiving this policy, U picks its n corresponding commitments ci, all signed by T, and sends these commitments together with the signatures to SP. Note that these signatures can be sent in an aggregated way, depending on the requirements and design of the system, as described in [8, 1]. We shall use aggregate signatures in this protocol.
• SP verifies T's signatures, in an aggregated way, for all commitments ci. SP computes the aggregate commitment c = ∏_{i=1}^{n} ci and the value x0 = Σ_{i=1}^{n} H1(x0^(i)) ∈ Z/(p). SP then randomly picks y ∈ Z/(p)*, computes σ = (c g^{−x0})^y, and sends to U the pair ⟨η = h^y, C = E_{H(σ)}[M]⟩, where M is the message related to the requested service.

Open Upon receiving ⟨η, C⟩ from SP, U computes r = Σ_{i=1}^{n} ri and σ′ = η^r. U then decrypts C using H(σ′).

Definition 2. (Soundness of Agg-EQ-OCBE)
An Agg-EQ-OCBE protocol is sound if the user U, whose committed values x0^(i), i = 1, . . . , n, are those specified by SP's policy, can output the plaintext message M with non-negligible probability.

It can be easily seen that Agg-EQ-OCBE is sound. When ci = g^{H1(x0^(i))} h^{ri}, we have

σ = (c g^{−x0})^y = (∏_{i=1}^{n} ci · g^{−x0})^y = ((∏_{i=1}^{n} g^{H1(x0^(i))}) · h^{Σ_{i=1}^{n} ri} · g^{−Σ_{i=1}^{n} H1(x0^(i))})^y = (h^{Σ_{i=1}^{n} ri})^y = (h^r)^y = (h^y)^r = η^r.
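A toy Python sketch of one Agg-EQ-OCBE round over a small Schnorr group (illustrative, insecure parameters; H1 here is SHA-256 reduced mod the group order, whereas the paper's implementation builds H1 on SHA-1 over its Jacobian group):

```python
# Toy Agg-EQ-OCBE round: n commitments, a single envelope (insecure toy sizes).
import hashlib
import secrets

p, q = 2039, 1019                    # 2039 = 2*1019 + 1; both prime
g, h = pow(3, 2, p), pow(7, 2, p)    # bases in the order-q subgroup of Z_p^*

def H(elt: int) -> bytes:
    return hashlib.sha256(str(elt).encode()).digest()

def H1(val: int) -> int:
    """Hash attribute values into the exponent group before committing."""
    return int.from_bytes(hashlib.sha256(str(val).encode()).digest(), "big") % q

attrs = [21, 35, 61]                 # U's attribute values (= SP's policy here)
rs = [secrets.randbelow(q) for _ in attrs]
cs = [pow(g, H1(a), p) * pow(h, r, p) % p for a, r in zip(attrs, rs)]

# SP: aggregate the commitments and the hashed policy values
c = 1
for ci in cs:
    c = c * ci % p
x0 = sum(H1(a) for a in attrs) % q
y = 1 + secrets.randbelow(q - 1)
sigma = pow(c * pow(g, -x0, p) % p, y, p)    # (c * g^{-x0})^y
eta = pow(h, y, p)
M = b"one envelope, n conditions"
C = bytes(a ^ b for a, b in zip(M, H(sigma)))

# U: a single exponentiation with r = sum(r_i) opens the one envelope
r = sum(rs) % q
recovered = bytes(a ^ b for a, b in zip(C, H(pow(eta, r, p))))
assert recovered == M
```

Note that the service provider's cost stays essentially one (σ, η) exponentiation pair regardless of n; only cheap multiplications and additions grow with the number of conditions.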
4. SECURITY ANALYSIS

Due to the unconditional hiding property of the Pedersen commitment scheme, the service provider SP is not able to learn whether any of the user U's attributes satisfy the required conditions in the policy.

The security analysis of EQ-OCBE [9] implies that when a single commitment is considered, it is hard for a user U to obtain useful information if U's committed value is not equal to that specified by SP, i.e., EQ-OCBE is oblivious. It can easily be seen that a similar argument holds for Agg-EQ-OCBE. For the Agg-EQ-OCBE protocol, we have the additional concern that a user U who does not possess all commitments corresponding to the values specified by SP may still be able to correctly decrypt the communications. For example, if SP's policy requires the two commitments c1 = g^{21} h^{r1} and c2 = g^{35} h^{r2} to be presented, a user U who holds the two commitments c3 = g^{18} h^{r3} and c4 = g^{38} h^{r4} can open the envelope, because the two aggregate commitments c1 · c2 and c3 · c4 have 56 = 21 + 35 = 18 + 38 as their exponents for g, although U does not conform to the policy. Agg-EQ-OCBE is designed to prevent such an attack from happening.
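The collision above is easy to reproduce in code. A toy sketch over a small Schnorr group (illustrative parameters only), showing that without the hash H1 the non-conforming pair {18, 38} opens an envelope keyed to the policy set {21, 35}:

```python
# The aggregation collision from the example: without H1, the forged pair
# {18, 38} opens an envelope whose policy is {21, 35}, since 18+38 = 21+35.
import secrets

p, q = 2039, 1019                    # toy safe-prime group; order-q subgroup
g, h = pow(3, 2, p), pow(7, 2, p)

def com(x: int, r: int) -> int:
    """Pedersen commitment with the raw value as exponent (no H1)."""
    return pow(g, x, p) * pow(h, r, p) % p

r1, r2, r3, r4 = (secrets.randbelow(q) for _ in range(4))
c1, c2 = com(21, r1), com(35, r2)    # commitments matching SP's policy
c3, c4 = com(18, r3), com(38, r4)    # a non-conforming user's commitments

x0 = 21 + 35                         # SP's aggregated policy value (= 56)
y = 1 + secrets.randbelow(q - 1)
eta = pow(h, y, p)

# SP blinds whatever aggregate it is presented with:
sigma_forged = pow(c3 * c4 * pow(g, -x0, p) % p, y, p)

# The non-conforming holder still derives sigma, because 18 + 38 = 56 = x0
# leaves no residual g-term in the aggregate:
assert pow(eta, (r3 + r4) % q, p) == sigma_forged
```

Committing to H1(x) instead of x blocks this: finding values {yi} with Σ H1(yi) = Σ H1(xi) is exactly a group 2nd-preimage for H1.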
For the security analysis of Agg-EQ-OCBE, we shall introduce a new and reasonable property for the cryptographic hash function H1(·) : {0, 1}* → Z/(p) used in Agg-EQ-OCBE. This new property relies on the fact that the range of the hash function is a subset of a group, in which group operations can be considered.

Definition 3. (Group 2nd-preimage resistance)
Let (G̃, +) be a finite abelian group of large cardinality (we write |G̃| for the cardinality of G̃). Let H̃ : {0, 1}* → G̃ be an unkeyed hash function. We say that H̃(·) has the property of group 2nd-preimage resistance if for any positive integers m and n sufficiently smaller than |G̃|, and for any given m inputs x1, . . . , xm, it is computationally infeasible to find n inputs y1, . . . , yn, with {x1, . . . , xm} ≠ {y1, . . . , yn}, such that

Σ_{i=1}^{m} H̃(xi) = Σ_{i=1}^{n} H̃(yi).

Note that the group 2nd-preimage resistance property is stronger than the well-known 2nd-preimage resistance property (cf. e.g. [11]) of cryptographic hash functions: the latter is the instance of the former with m = n = 1. It is not yet known whether group 2nd-preimage resistance is a consequence of the three basic properties of a general cryptographic hash function: preimage resistance, 2nd-preimage resistance, and collision resistance.

Given this definition, we can now give a security proof of Agg-EQ-OCBE. Since we assume that E is a semantically secure symmetric-key encryption algorithm, the ability to decrypt a message is equivalent to knowledge of the secret encryption key. When the hash function H is modeled as a random oracle, the user U can compute this secret key H(σ) only if U can compute the value σ = (c g^{−x0})^y. We therefore say that the Agg-EQ-OCBE protocol is secure against the user U when no polynomial-time adversary can win the following game with non-negligible probability.

Game:
Players: challenger C, adversary A
Rules:
- C generates and sends Param = ⟨G, g, h⟩ to A. C chooses and sends x1, . . . , xn ∈ Z/(p) to A. C chooses b ∈ Z/(p)*, and sends h^b to A.
- A chooses y1, . . . , yn, r1, . . . , rn ∈ Z/(p), with {x1, . . . , xn} ≠ {y1, . . . , yn}, and sends yi, ri, 1 ≤ i ≤ n, to C. A outputs a value σ′.
- C computes c = ∏_{i=1}^{n} g^{H1(yi)} h^{ri}, x = Σ_{i=1}^{n} H1(xi), and σ = (c g^{−x})^b.
- A wins the game if σ′ = σ.

Theorem 1. Assume that the computational Diffie-Hellman problem is intractable in G. Model H as a random oracle, and assume that H1 has the property of group 2nd-preimage resistance. Then Agg-EQ-OCBE is secure against the user U.

The proof of Theorem 1 is reported in Appendix A.

5. EXPERIMENTAL RESULTS

We have performed an experimental evaluation to compare the performance of the multiple-round EQ-OCBE and Agg-EQ-OCBE protocols. For multiple-round EQ-OCBE, we generate the Pedersen commitments by committing to the actual values x0^(i) and do not introduce the cryptographic hash function H1(·). For Agg-EQ-OCBE, we use the hash function H1(·) and commit to the hash values H1(x0^(i)). The experiment compares the creation time of σ and η at the service provider's side, which constitutes the most computationally costly part of both protocols, and the derivation time of σ′ from η at the user's side. We also compare the creation time of the aggregate commitment and the creation time of σ and η (the "envelope"), both at the service provider's side.

We do not include communication time and symmetric encryption time in the comparisons, which vary with different network settings and plaintext lengths, in order to focus on the core components of the protocols. For the same reason, we also exclude the signature verification time. We expect Agg-EQ-OCBE to outperform multiple-round EQ-OCBE as the number of involved commitments increases.

In our experiment, we choose the group G to be the group of rational points of the Jacobian variety (the Jacobian group) of the genus 2 curve

C : y^2 = x^5 + 2682810822839355644900736 x^3 + 226591355295993102902116 x^2 + 2547674715952929717899918 x + 4797309959708489673059350

over the prime field Fq, with q = 5 · 10^24 + 8503491 (83 bits). The Jacobian group of this curve has prime order (164 bits)

p = 24999999999994130438600999402209463966197516075699.

The curve data is taken from [7]. The parameter generation program chooses non-unit points g and h in the Jacobian group as the base points for constructing the Pedersen commitments.

In the experiment, we run both multiple-round EQ-OCBE and Agg-EQ-OCBE at the service provider's side for n (1 ≤ n ≤ 50) Pedersen commitments of randomly generated values x0^(i), 1 ≤ i ≤ n. We use the values x0^(i) as the exponents of g for multiple-round EQ-OCBE, and the hash values of the x0^(i) as the exponents for Agg-EQ-OCBE, where the hash function H1(·) : {0, 1}* → Z/(p) is built on SHA-1. We also simulate the aggregation of n commitments at the user's side for Agg-EQ-OCBE. For each n, 1 ≤ n ≤ 50, we run 50 rounds of both protocols on n Pedersen commitments. In each round, the n Pedersen commitments under test are different (randomly chosen), and we take the average running times over the 50 rounds. The experimental results are presented in Figure 1.

[Figure 1: Running time comparison. From top to bottom: computation time at the service provider's side for multiple-round EQ-OCBE vs. Agg-EQ-OCBE; computation time at the user's side for multiple-round EQ-OCBE vs. Agg-EQ-OCBE; computation time at the service provider's side for commitment aggregation vs. envelope creation in Agg-EQ-OCBE.]

The experiment was performed on a machine running GNU/Linux kernel version 2.6.9-67.0.1.ELsmp with 4 AMD Opteron(tm) Processor 850 2390MHz processors and 7.36 GB of memory. Only one processor was used for computation. The code is written in C++ and built with gcc version 3.6.4, optimization flag -O2. The code is built on the G2HEC C++ library [13], which implements the arithmetic operations in the Jacobian groups of genus 2 curves.

The experimental results show that while in multiple-round EQ-OCBE the running time for composing the EQ-OCBE envelopes increases linearly with the number of involved Pedersen commitments, in Agg-EQ-OCBE it is nearly constant. The results also imply that the overhead of the hash computation introduced in Agg-EQ-OCBE takes negligible time. We have obtained similar results for the envelope opening operations executed at the user's side. We can also see that the aggregation of commitments at the service provider's side takes very little time compared to the envelope creation operations. Therefore, Agg-EQ-OCBE is more efficient than the solution based on running EQ-OCBE for multiple rounds.

Acknowledgements
The work reported in this paper has been partially supported by the MURI award FA9550-08-1-0265 from the Air Force Office of Scientific Research.
6. CONCLUSIONS

In this paper, we have proposed Agg-EQ-OCBE, an extension that improves the efficiency of the EQ-OCBE protocol by allowing a user to decrypt data sent by a service provider if and only if the user satisfies several equality conditions. We have proved the security of our Agg-EQ-OCBE protocol. The experimental results show that Agg-EQ-OCBE is more efficient than running the EQ-OCBE protocol iteratively for each equality predicate. Future work includes developing efficient OCBE protocols for inequality predicates.

7. REFERENCES

[1] D. Boneh, C. Gentry, B. Lynn, and H. Shacham. Aggregate and verifiably encrypted signatures from bilinear maps. In Proceedings of Eurocrypt 2003, volume 2656 of LNCS, pages 416–432. Springer-Verlag, 2003.
[2] S. Brands. Rethinking Public Key Infrastructures and Digital Certificates: Building in Privacy. MIT Press, 2000.
[3] J. Camenisch and E. Van Herreweghen. Design and implementation of the idemix anonymous credential system. In Proc. Ninth ACM Conf. Computer and Comm. Security, pages 21–30, 2002.
[4] J. Camenisch and A. Lysyanskaya. An efficient system for non-transferable anonymous credentials with optional anonymity revocation. In Advances in Cryptology – EUROCRYPT 2001, pages 93–118, 2001.
[5] J. Camenisch and M. Stadler. Efficient group signature schemes for large groups. In Advances in Cryptology – CRYPTO '97, pages 410–424, 1997.
[6] D. Chaum. Security without identification: Transaction systems to make big brother obsolete. Comm. ACM, 28(10):1030–1044, 1985.
[7] P. Gaudry and É. Schost. Construction of secure random curves of genus 2 over prime fields. In Advances in Cryptology – EUROCRYPT 2004, volume 3027 of LNCS, pages 239–256. Springer-Verlag, 2004.
[8] L. Harn. Batch verifying multiple RSA digital signatures. Electronics Letters, 34(12):1219–1220, June 1998.
[9] J. Li and N. Li. OACerts: Oblivious attribute certificates. IEEE Transactions on Dependable and Secure Computing, 3(4):340–352, 2006.
[10] A. Lysyanskaya, R. Rivest, A. Sahai, and S. Wolf. Pseudonym systems. In Proc. Sixth Workshop on Selected Areas in Cryptography, pages 184–199, 1999.
[11] A. J. Menezes, P. C. van Oorschot, and S. A. Vanstone. Handbook of Applied Cryptography. CRC Press, Boca Raton, FL, USA, 1996.
[12] T. P. Pedersen. Non-interactive and information-theoretic secure verifiable secret sharing. In CRYPTO '91: Proceedings of the 11th Annual International Cryptology Conference on Advances in Cryptology, pages 129–140. Springer-Verlag, London, UK, 1992.
[13] N. Shang. G2HEC: A Genus 2 Crypto C++ Library. http://www.math.purdue.edu/~nshang/libg2hec.html.

APPENDIX
A. PROOF OF THEOREM 1

Proof. We shall show that if there is an adversary A who wins the game with probability ε, we can construct another adversary B who can either break the group 2nd-preimage resistance property of H1, or solve the computational Diffie-Hellman problem in G, with the same probability ε. Indeed, B executes the following procedures:
• When given a group G, elements h, h^a, h^b ∈ G, and x1, . . . , xn ∈ Z/(p), B gives Param = ⟨G, h^a, h⟩ to A. B also sends x1, . . . , xn and h^b to A. Let g = h^a.
• B receives y1, . . . , yn, r1, . . . , rn, and σ′ from A, where {x1, . . . , xn} ≠ {y1, . . . , yn}.
• B computes x = Σ_{i=1}^{n} H1(xi) and y = Σ_{i=1}^{n} H1(yi), and checks whether x = y. If x ≠ y, B computes r = Σ_{i=1}^{n} ri and outputs δ = (σ′ (h^b)^{−r})^{(y−x)^{−1}}, where (y − x)^{−1} is the multiplicative inverse of y − x in Z/(p).

When A wins the game, we have

σ′ = (∏_{i=1}^{n} g^{H1(yi)} h^{ri} · g^{−x})^b = (g^{y−x} h^r)^b.

If x = y, then the group 2nd-preimage resistance property of H1 fails to hold. Otherwise,

δ = (σ′ (h^b)^{−r})^{(y−x)^{−1}} = g^b = (h^a)^b = h^{ab},

i.e., the computational Diffie-Hellman problem is solved.

Presentation Slides: Efficient and Privacy-Preserving Enforcement of Attribute-Based Access Control
Ning Shang (Purdue University / Microsoft), Federica Paci (Purdue University / University of Trento), Elisa Bertino (Purdue University). April 2010. Presented by N. Shang.
Outline: Introduction; OCBE Overview; Aggregate EQ-OCBE; Summary.
[The slide deck accompanying the paper follows; per-slide animation frames and rotated diagram text are condensed below.]

Attribute-based access control - Approach 0: Without privacy
(1) SP: "Allow access to medical records if u r a doctor of level > 59"
(2) User: "I'm a doctor and my level is 61"
(3) SP: "OK. Here are the keys to access medical records"
Takeaway: SP knows a lot about the user's involved credentials.
Attribute-based access control - Approach 1: Privacy-preserving via ZKPK
(1) SP: "Allow access to medical records if u can prove u r a doctor of level > 59"
(2) Zero-knowledge proof-of-knowledge protocols
(3) SP: "OK. Here are the keys to access medical records"
Takeaway: SP knows whether the user's credentials satisfy the requirements or not.

Attribute-based access control - Approach 2: Privacy-preserving via OCBE
(1) SP: "Allow access to medical records if u r a doctor of level > 59"
(2) User sends commitments ("I'm a doctor", "level = 61")
(3) SP sends Envelope(keys to access medical records)
Takeaways: the user can open the envelope iff its credentials satisfy the policy; SP does not know the outcome of the envelope opening.

OCBE Overview
OCBE: Oblivious Commitment-Based Envelope. [Jiangtao Li and Ninghui Li. OACerts: Oblivious attribute certificates. IEEE Transactions on Dependable and Secure Computing, 3(4):340–352, 2006.]

OCBE cryptographic building blocks
- G = ⟨g⟩: finite cyclic group of order p in which the computational Diffie-Hellman problem is hard
- Pedersen commitment: c = g^x h^r, where g, h ∈ G and r is drawn at random from Fp
- E_K: symmetric-key encryption algorithm with key K
- H(·): cryptographic hash function

EQ-OCBE: equality predicate
Public: Param = ⟨G, p, g, h⟩, E, H(·). The user holds c = g^ID h^r with secret r.
(1) SP announces the predicate EQ_{x0}
(2) U sends c
(3) SP picks random y from Fp and computes σ = (c g^{−x0})^y
(4) SP sends η = h^y and C = E_{H(σ)}[message]
(5) U computes σ′ = η^r and decrypts C with H(σ′)

Other OCBEs
GE-OCBE, LE-OCBE, . . . are OCBE protocols for ≥, ≤, . . . predicates. They are performed in a similar fashion to EQ-OCBE, but are generally more expensive.

OCBE features
- Security & privacy: the identity tokens (commitments) are unconditionally hiding and computationally binding
- X.509 integration: the identity tokens can be put into X.509v3 certificate extension fields

Multiple attributes specified in policy
Conjunction of conditions: "Allow access if you are a doctor of Hospital A in Indiana."

Multiple attributes: a straightforward solution
Split the secret message into shares and run one EQ-OCBE per condition ("doctor", "Hospital A", "Indiana").
This approach works, but it is not very efficient: communication and computation costs increase in proportion to the number of specified attributes.

Question: Can we do better?
Answer: Agg-EQ-OCBE, an aggregate OCBE protocol for equality predicates. It handles multiple equality conditions at the same time without significantly increasing computational cost, and also requires less bandwidth.

Agg-EQ-OCBE ideas
- Make use of the algebraic structure and operations in EQ-OCBE
- Trade more expensive exponentiation operations for less costly addition and multiplication operations

Agg-EQ-OCBE illustration
Public: Param = ⟨G, p, g, h⟩, E, H(·). The user holds c1 = g^{21} h^{r1}, c2 = g^{35} h^{r2} with secrets r1, r2.
(1) SP announces EQ_{x1}, EQ_{x2}
(2) U sends c1, c2
(3) SP picks random y from Fp and computes c = c1 · c2, x0 = x1 + x2, σ = (c g^{−x0})^y
(4) SP sends η = h^y and C = E_{H(σ)}[message]
(5) U computes r = r1 + r2, σ′ = η^r, and decrypts C with H(σ′) to get the message

One problem: collision
Owners of the identity token sets S1 = {c1 = g^{21} h^{r1}, c2 = g^{35} h^{r2}} and S2 = {c3 = g^{18} h^{r3}, c4 = g^{38} h^{r4}} will both open the envelope, since 21 + 35 = 56 = 18 + 38.
Solution: a cryptographic hash.

Aggregate EQ-OCBE
Public: Param = ⟨G, p, g, h⟩, E, H, H1(·). The user holds c1 = g^{H1(21)} h^{r1}, c2 = g^{H1(35)} h^{r2} with secrets r1, r2.
(1) SP announces EQ_{x1}, EQ_{x2}
(2) U sends c1, c2
(3) SP picks random y from Fp and computes c = c1 · c2, x0 = H1(x1) + H1(x2), σ = (c g^{−x0})^y
(4) SP sends η = h^y and C = E_{H(σ)}[message]
(5) U computes r = r1 + r2, σ′ = η^r, and decrypts C with H(σ′) to get the message
Underlying intractability assumptions
- Group 2nd-preimage resistant hash H̃(·): given (x1, . . . , xm), it is hard to find another tuple (y1, . . . , yn) such that Σ_{i=1}^{m} H̃(xi) = Σ_{i=1}^{n} H̃(yi)
- Computational Diffie-Hellman problem: given g^a and g^b, it is hard to compute g^{ab} without knowing a and b

Experimental results
[Timing plots comparing multiple-round EQ-OCBE and Agg-EQ-OCBE; see Figure 1 of the paper.]

Future work
- More application scenarios
- Aggregate GE-OCBE and other OCBE protocols: aggregation works in certain cases, e.g., when the sum of attribute values needs to be ≥ a threshold value

Summary
- Privacy-preserving attribute-based access control concepts and approaches
- OCBE overview
- Aggregate EQ-OCBE
- Experimental data

The End. Thank you! Questions? nshang@cs.purdue.edu
Shang Aggregate EQ-OCBE Privacy-Preserving Attribute-Based Access Control Introduction OCBE Overview Aggregate EQ-OCBE Summary Without privacy 3 Presented by N. Shang Aggregate EQ-OCBE Privacy-Preserving Attribute-Based Access Control Introduction OCBE Overview Aggregate EQ-OCBE Summary Without privacy 59” l e vel > or of r a d o ct s if u l r e cord ica me d ce ss to A l l ow ac (1) “ 3 Presented by N. Shang Aggregate EQ-OCBE Privacy-Preserving Attribute-Based Access Control Introduction OCBE Overview Aggregate EQ-OCBE Summary Without privacy 59” l e vel > or of r a d o ct s if u r e cord ica l 61” me d ev el i s ce ss to my l A l l ow ac t or an d (1) “ d o c I’m a (2) “ 3 Presented by N. Shang Aggregate EQ-OCBE Privacy-Preserving Attribute-Based Access Control Introduction OCBE Overview Aggregate EQ-OCBE Summary Without privacy 59” l e vel > or of r a d o ct s if u l r e cord 1” ica l is 6 ds ” me d e cess t o m y l e v c a l recor a c d i Allow or an me d (1) “ a do c t a c cess mI’ to (2) “ t h e keys are K . Here O (3) “ 3 Presented by N. Shang Aggregate EQ-OCBE Privacy-Preserving Attribute-Based Access Control Introduction OCBE Overview Aggregate EQ-OCBE Summary Without privacy 59” l e vel > or of r a d o ct s if u l r e cord 1” ica l is 6 ds ” me d e cess t o m y l e v c a l recor a c d i Allow or an me d (1) “ a do c t a c cess mI’ to (2) “ t h e keys are K . Here O (3) “ SP knows a lot about user’s involved credentials 3 Presented by N. Shang Aggregate EQ-OCBE Privacy-Preserving Attribute-Based Access Control Introduction OCBE Overview Aggregate EQ-OCBE Summary Attribute-based access control - Approach 1 Privacy-preserving via ZKPK 4 Presented by N. Shang Aggregate EQ-OCBE Privacy-Preserving Attribute-Based Access Control Introduction OCBE Overview Aggregate EQ-OCBE Summary Privacy-preserving via ZKPK 5 Presented by N. 
Shang Aggregate EQ-OCBE Privacy-Preserving Attribute-Based Access Control Introduction OCBE Overview Aggregate EQ-OCBE Summary Privacy-preserving via ZKPK ” vel > 59 t or of le do c ve ura n pro s i f u ca d recor e d ical m s to w a cces A llo (1) “ 5 Presented by N. Shang Aggregate EQ-OCBE Privacy-Preserving Attribute-Based Access Control Introduction OCBE Overview Aggregate EQ-OCBE Summary Privacy-preserving via ZKPK ” vel > 59 t or of le do c ve ura n pro s i f u ca d recor to c o ls e d ical p r o s to m roof w a cces l e d ge p A llo w (1) “ -k n o Ze r o (2) 5 Presented by N. Shang Aggregate EQ-OCBE Privacy-Preserving Attribute-Based Access Control Introduction OCBE Overview Aggregate EQ-OCBE Summary Privacy-preserving via ZKPK ” ve l > 59 l e r of a d o c to r ve u c a n pro s if u cord ls d ica l r e r oto c o s” s t om e p r o of p l r e cord acce s dge e di c a “Allow - k n owle c c e ss m (1) Ze r o to a (2) e key s e t h re ar OK . He (3) “ 5 Presented by N. Shang Aggregate EQ-OCBE Privacy-Preserving Attribute-Based Access Control Introduction OCBE Overview Aggregate EQ-OCBE Summary Privacy-preserving via ZKPK ” ve l > 59 l e r of a d o c to r ve u c a n pro s if u cord ls d ica l r e r oto c o s” s t om e p r o of p l r e cord acce s dge e di c a “Allow - k n owle c c e ss m (1) Ze r o to a (2) e key s e t h re ar OK . He (3) “ SP knows whether the user’s credentials satisfy the requirements or not 5 Presented by N. Shang Aggregate EQ-OCBE Privacy-Preserving Attribute-Based Access Control Introduction OCBE Overview Aggregate EQ-OCBE Summary Attribute-based access control - Approach 2 Privacy-preserving via OCBE 6 Presented by N. Shang Aggregate EQ-OCBE Privacy-Preserving Attribute-Based Access Control Introduction OCBE Overview Aggregate EQ-OCBE Summary Privacy-preserving via OCBE 7 Presented by N. 
Shang Aggregate EQ-OCBE Privacy-Preserving Attribute-Based Access Control Introduction OCBE Overview Aggregate EQ-OCBE Summary Privacy-preserving via OCBE ” vel > 59 t or of le do c s i f ura ord l rec m ed ica s s to ow acce A l l (1) “ 7 Presented by N. Shang Aggregate EQ-OCBE Privacy-Preserving Attribute-Based Access Control Introduction OCBE Overview Aggregate EQ-OCBE Summary Privacy-preserving via OCBE ” vel > 59 t or of le do c s i f ura ord ”) l rec =61 m ed ica e v “l e l s s to or”, ow acce a do ct A l l ’ m (1) “ e nts ( “I i tm ) Comm (2 7 Presented by N. Shang Aggregate EQ-OCBE Privacy-Preserving Attribute-Based Access Control Introduction OCBE Overview Aggregate EQ-OCBE Summary Privacy-preserving via OCBE ” vel > 59 t or of le do c s i f ura ord ”) l rec =61 ed ica e v e l s to m , “l acce s ctor” ds ) A l l ow ’ m a d o a l recor (1) “ ts ( “I ed i c m i tm e n a c c ess m om ys t o (2) C p e(ke n ve l o (3) E 7 Presented by N. Shang Aggregate EQ-OCBE Privacy-Preserving Attribute-Based Access Control Introduction OCBE Overview Aggregate EQ-OCBE Summary Privacy-preserving via OCBE ” vel > 59 t or of le do c s i f ura ord ”) l rec =61 ed ica e v e l s to m , “l acce s ctor” ds ) A l l ow ’ m a d o a l recor (1) “ ts ( “I ed i c m i tm e n a c c ess m om ys t o (2) C p e(ke n ve l o (3) E User can open the envelope iff its credentials satisfy the policy SP does not know the outcome of envelope opening 7 Presented by N. Shang Aggregate EQ-OCBE Privacy-Preserving Attribute-Based Access Control Introduction OCBE Overview Aggregate EQ-OCBE Summary OCBE Overview OCBE: Oblivious Commitment-Based Envelope.1 1 Jiangtao Li and Ninghui Li. OACerts: Oblivious attribute certificates. IEEE Transactions on Dependable and Secure Computing, 3(4):340-352, 2006. 8 Presented by N. 
OCBE cryptographic building blocks
- G = <g>: finite cyclic group of order p in which the computational Diffie-Hellman problem is hard
- Pedersen commitment: c = g^x h^r, where g, h ∈ G and r is chosen at random from F_p
- E_K: symmetric-key encryption algorithm with key K
- H(·): cryptographic hash function

EQ-OCBE: equality predicate. Public parameters: <G, p, g, h>, E, H(·). The user holds a commitment c = g^ID h^r with secret r.
(1) SP sends the predicate EQ_x0.
(2) User sends c.
(3) SP picks y at random from F_p and computes σ = (c · g^-x0)^y and η = h^y.
(4) SP sends C = E_H(σ)[message] and η.
(5) User computes σ' = η^r and decrypts C with H(σ'); this succeeds iff ID = x0.

Other OCBE's
GE-OCBE, LE-OCBE, ... are OCBE protocols for ≥, ≤, ... predicates. They are performed in a similar fashion as EQ-OCBE, but are generally more expensive.

OCBE features
- Security & privacy: the identity tokens (commitments) are unconditionally hiding and computationally binding
- X.509 integration: the identity tokens can be put into X.509v3 certificate extension fields

Multiple attributes specified in policy: a conjunction of conditions, e.g., "Allow access if you are a doctor of Hospital A in Indiana".
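The single-attribute EQ-OCBE exchange can be sketched as follows. This is a minimal illustration with hypothetical toy parameters, not the authors' implementation: the tiny group, the demo generator h = g^2, and the XOR pad standing in for E_K are all insecure placeholder choices.

```python
import hashlib
import secrets

# Toy EQ-OCBE sketch (hypothetical parameters; illustration only).
# Demo group: the order-q subgroup of Z_p* with p = 2q + 1 = 1019.
# A real deployment needs a cryptographically sized group, a second
# generator h with unknown discrete log (here h = g^2 for brevity,
# which would break binding in practice), and real encryption for E_K.
p, q, g, h = 1019, 509, 4, 16

def commit(x, r):
    """Pedersen commitment c = g^x * h^r mod p."""
    return pow(g, x, p) * pow(h, r, p) % p

def kdf(sigma):
    """Stand-in for H: derive a pad from the group element sigma."""
    return hashlib.sha256(str(sigma).encode()).digest()

def sp_envelope(c, x0, key):
    """SP side, steps (3)-(4): sigma = (c * g^-x0)^y, eta = h^y,
    ciphertext = key XOR H(sigma)."""
    y = secrets.randbelow(q - 1) + 1
    sigma = pow(c * pow(g, -x0, p) % p, y, p)
    eta = pow(h, y, p)
    return eta, bytes(a ^ b for a, b in zip(key, kdf(sigma)))

def user_open(eta, ct, r):
    """User side, step (5): sigma' = eta^r; correct iff committed x == x0."""
    return bytes(a ^ b for a, b in zip(ct, kdf(pow(eta, r, p))))

r = secrets.randbelow(q - 1) + 1
c = commit(61, r)                     # user commits to level = 61
eta, ct = sp_envelope(c, 61, b"sixteen-byte-key")
assert user_open(eta, ct, r) == b"sixteen-byte-key"   # predicate satisfied
```

The key identity is σ = (g^(ID-x0) h^r)^y, which collapses to h^(ry) = η^r exactly when ID = x0; the SP never learns whether the user could open the envelope.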
Multiple attributes: a straightforward solution. Run one EQ-OCBE per condition — on "doctor", on "Hospital A", and on "Indiana" — each wrapping the secret message.

This approach works, but it is not very efficient: communication and computation costs increase in proportion to the number of specified attributes.

Question: can we do better?

Answer: Agg-EQ-OCBE, an aggregate OCBE protocol for equality predicates. It handles multiple equality conditions at the same time without significantly increasing the computational cost, and it also requires less bandwidth.

Agg-EQ-OCBE ideas: make use of the algebraic structure and operations in EQ-OCBE, trading more expensive exponentiation operations for less costly addition and multiplication operations.

Agg-EQ-OCBE illustration. Public parameters: <G, p, g, h>, E, H(·). The user holds c1 = g^21 h^r1 and c2 = g^35 h^r2 with secrets r1, r2.
(1) SP sends EQ_x1, EQ_x2.
(2) User sends c1, c2.
(3) SP picks y at random from F_p and computes c = c1 · c2, x0 = x1 + x2, σ = (c · g^-x0)^y, and η = h^y.
(4) SP sends C = E_H(σ)[message] and η.
(5) User computes r = r1 + r2 and σ' = η^r, and decrypts C with H(σ') to get the message.
One problem: collision. Owners of the identity token sets S1 = {c1 = g^21 h^r1, c2 = g^35 h^r2} and S2 = {c3 = g^18 h^r3, c4 = g^38 h^r4} will both open the envelope, since 21 + 35 = 56 = 18 + 38.

Solution: a cryptographic hash. With an additional hash H1(·) in the public parameters, the user commits to hashed attribute values: c1 = g^H1(21) h^r1 and c2 = g^H1(35) h^r2.
(1) SP sends EQ_x1, EQ_x2.
(2) User sends c1, c2.
(3) SP picks y at random from F_p and computes c = c1 · c2, x0 = H1(x1) + H1(x2), σ = (c · g^-x0)^y, and η = h^y.
(4) SP sends C = E_H(σ)[message] and η.
(5) User computes r = r1 + r2 and σ' = η^r, and decrypts C with H(σ') to get the message.

Underlying intractability assumptions
- Group second-preimage-resistant hash H̃(·): given (x1, ..., xm), it is hard to find another tuple (y1, ..., yn) such that Σ_{i=1}^{n} H̃(y_i) = Σ_{i=1}^{m} H̃(x_i).
- Computational Diffie-Hellman problem: given g^a and g^b, it is hard to compute g^ab without knowing a and b.

Experimental results
(Performance charts omitted.)

Future work
- More application scenarios
- Aggregate GE-OCBE and other OCBE protocols: aggregation works in certain cases, e.g., when a sum of attribute values needs to be ≥ a threshold value
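The aggregate protocol, the collision, and the hash-based fix can be sketched together as follows. This is again a toy illustration with hypothetical, insecure parameters; the names `envelope` and `open_envelope` are chosen here and are not from the slides.

```python
import hashlib
import secrets

# Toy Agg-EQ-OCBE sketch (hypothetical parameters; illustration only).
# Demo group: the order-q subgroup of Z_p* with p = 2q + 1 (insecure size).
p, q, g, h = 1019, 509, 4, 16

def H1(x):
    """Attribute-value hash used by the collision fix."""
    return int.from_bytes(hashlib.sha256(str(x).encode()).digest(), "big")

def kdf(sigma):
    return hashlib.sha256(str(sigma).encode()).digest()

def envelope(c, x0, key):
    """SP side: aggregate sigma = (c * g^-x0)^y and eta = h^y --
    one exponentiation pair regardless of how many conditions."""
    y = secrets.randbelow(q - 1) + 1
    sigma = pow(c * pow(g, -x0, p) % p, y, p)
    return pow(h, y, p), bytes(a ^ b for a, b in zip(key, kdf(sigma)))

def open_envelope(eta, ct, r_sum):
    """User side: sigma' = eta^(r1 + r2)."""
    return bytes(a ^ b for a, b in zip(ct, kdf(pow(eta, r_sum, p))))

key = b"content-key-0123"
r1, r2, r3, r4 = 11, 22, 5, 6

# The collision: raw commitments to 18 and 38 satisfy the (21, 35)
# policy because 18 + 38 = 21 + 35 = 56.
c_raw = pow(g, 18, p) * pow(h, r3, p) * pow(g, 38, p) * pow(h, r4, p) % p
eta, ct = envelope(c_raw, 21 + 35, key)
assert open_envelope(eta, ct, r3 + r4) == key   # wrong attributes still open

# The fix: commit to H1(x); sums of hashes no longer collide (w.h.p.),
# and the legitimate holder of (21, 35) still opens the envelope.
c = pow(g, H1(21), p) * pow(h, r1, p) * pow(g, H1(35), p) * pow(h, r2, p) % p
eta, ct = envelope(c, H1(21) + H1(35), key)
assert open_envelope(eta, ct, r1 + r2) == key
```

Note how the SP's work in `envelope` is independent of the number of aggregated conditions: the per-attribute cost is one multiplication (for c) and one hash addition (for x0), which is the exponentiation-for-addition trade the slides describe.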
Summary
- Privacy-preserving attribute-based access control: concepts and approaches
- OCBE overview
- Aggregate EQ-OCBE
- Experimental data

Thank you! Questions? nshang@cs.purdue.edu

Privacy-Preserving DRM
Radia Perlman (Intel, radia@alum.mit.edu), Charlie Kaufman (Microsoft, charliek@microsoft.com), Ray Perlner (NIST, ray.perlner@nist.gov)

ABSTRACT
This paper describes and contrasts two families of schemes that enable a user to purchase digital content without revealing to anyone what item he has purchased. One of the basic schemes is based on anonymous cash, and the other on blind decryption. In addition to the basic schemes, we present and compare enhancements to the schemes for supporting additional features such as variable costs, enforcement of access restrictions (such as "over age 21"), and the ability of a user to monitor and prevent covert privacy-leaking between a content-provider-provided box and the content provider. As we will show, the different variants have different properties in terms of amount of privacy leaking, efficiency, and ability for the content provider to prevent sharing of encryption keys or authorization credentials.

Categories and Subject Descriptors
C.2.0 [Computer Networks]: General - Security and protection. K.4.1 [Computers and Society]: Public Policy Issues - privacy. E.3 [Data]: Encryption

General Terms
Algorithms, Design, Economics, Security, Human Factors.

Keywords
Algorithms, Protocols, Blindable Parameterizable Public Key, Privacy, DRM.

1. INTRODUCTION
Most work in the field of Digital Rights Management (DRM) focuses on the problem of preventing its circumvention. This paper looks at a different problem: how to charge for the use of content while allowing the user to maintain her privacy (in the sense of not revealing to the content provider what content was purchased by which user). In some scenarios, privacy is of greater concern to the user than the payment required. This paper presents and contrasts two basic approaches, plus variants, of systems in which content is distributed in encrypted form, and the user pays to receive a decryption key. The first is based on Chaum's anonymous cash [5]. The second is based on blind decryption [15].

In addition to the basic schemes, we provide various methods of enhancing these schemes for functionality such as different costs for different content, the ability of a third party to create content to be distributed by the content provider, and enforcement of authorization policies. Additionally, we examine the scenario where, for DRM enforcement reasons, there is a sealed box, provided by the content provider on the user's premises, that communicates with the content provider to acquire keys and does the actual decryption. We examine the problem of whether the user can detect or prevent the sealed box from covertly telling the content provider what content the user is decrypting. We show that it is impossible, if the user is only passively monitoring the channel, for the user to know whether the box is indeed leaking information. We then show a mechanism in which the user can cooperate with the box in forming the message to be sent to the content provider, and be assured there is no collusion going on, without impacting the ability of the content provider to enforce DRM.

Although the focus of this paper is not the cryptography, we do introduce a new variant of asymmetric keys: the ability to have a family of blindable keys, parameterized by an arbitrary string, which we will use to encode information such as authorization policies or monetary units. This functionality can be provided by a somewhat unusual use of identity-based encryption (IBE), but we also introduce two alternative algorithms, which lack some of the properties of IBE that are not needed for our application.

We will assume that there are enough items of content distributed by the content provider that the mere fact that a user is doing business with the content provider, and the amount of money the user spends with the content provider, is not a privacy issue. However, as we will show, privacy leaking is not absolute, and some of the solution variants have different tradeoffs.

Encrypted content must be accessed anonymously, though that is not the focus of the paper. Encrypted content might, for instance, be broadcast video, or content posted on the Internet. If the content is broadcast, say from a satellite or via cable TV, there may be no problem with accessing the encrypted content anonymously.

[Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. IDtrust '10, April 13-15, 2010, Gaithersburg, MD. Copyright © 2010 ACM ISBN 978-1-60558-895-7/10/04…$10.00.]
If the encrypted content is downloaded from the Internet, some sort of anonymization technique would be required, e.g., [10], [11], [16].

In addition to the encrypted content for an item, there will be associated metadata that can be used to decrypt the content with the help of the content provider. For example, metadata may include a decryption key for the content, encrypted with the public key of the content provider. Metadata may also contain other information, such as an authorization policy for accessing the content.

Another aspect of DRM, also not the primary focus of this paper, is how to prevent a user from copying content and sharing it with others. There has not been a foolproof technical solution, especially since the analog output of video and audio has to be available. For instance, it is not uncommon for people to carry a camcorder into a theater, record the movie as it is played, and then sell copies later. Various proposed solutions for enforcing DRM include threats of prosecution if caught illegally copying and distributing, watermarking to discover which copy leaked [1], [7], [4], [9], and various software and hardware techniques to prevent copying [14], [12]. Even though there might never be a foolproof technical solution, it is common today for digital content to be distributed with some degree of copy protection, even in software-only systems. This is evidence that content providers believe that copy protection deters a sufficient amount of copying that the complexity (and customer annoyance) of the DRM is of positive value (to the content provider).

DRM enforcement commonly involves using a sealed box (e.g., the box that a video satellite provider installs at the user's house with a subscription to his service). We assume in such deployments:
- The box's only means of communication with anything is through a channel that the user can monitor.
- The user can modify messages to/from the box (the user can place an additional box along the channel that the box uses to communicate with everything else).
- The user cannot examine the logic inside the box to determine whether it is indeed designed not to divulge the user's identity.
This fairly common deployment scenario leads to interesting functional differences between the schemes presented in this paper.

In sections 2 and 3 we present the two basic schemes (anonymous cash in section 2 and blind decryption in section 3). In section 4, we compare the efficiency of the two schemes and conclude that the blind-decryption-based scheme offers superior performance and lower overhead, while the anonymous-cash-based scheme provides the additional functionality of allowing the content provider to do per-item accounting. In section 5, we propose modifications of the two schemes that allow the content provider to charge different amounts for various pieces of content, without compromising the user's privacy. In section 6, we propose similar modifications of the two schemes so that the content provider can enforce authorization policies (such as "over 18", "citizen of US", or "citizen of any country except Monaco or Grenada") that might restrict access to some content. We also discuss the comparative implications on our scheme variants when the authorization policy might be very complex. In section 7 we consider how the user, while communicating with the content provider using a sealed box, can be assured that the box is not covertly leaking information about the user's purchases to the content provider.

2. First Scheme: Basic anonymous-cash-based DRM
2.1 The concept of anonymous cash
Chaum [5] introduced the concept of anonymous cash. The basic idea is that a data structure with a particular syntax, signed with the bank's private key, is worth a fixed amount of cash. The data structure includes a random number large enough to assure that independently chosen values will be unique. The anonymity comes from the construct of blind signatures, where Alice can get the bank to sign something without the bank knowing what it is signing.

Alice chooses a random number R, hashes it, and formats it according to the rules of valid currency. Alice "blinds" it and presents the blinded result to the bank, which signs the result with its private key. Then Alice applies the blinding function's inverse function ("unblind") to obtain a value we will refer to as "the bank's signature on R". The bank will not know the values of R that Alice has purchased, so when R is "spent" the purchase cannot be traced to Alice, though the bank will know how many tokens Alice has purchased. Merchants accepting the anonymous cash can verify it is valid by checking the bank's signature.

The only problem is assuring that Alice doesn't spend the same valid unit of anonymous cash more than once. If there is only one place accepting the anonymous cash (in this case the content provider), then double spending can be prevented by having the content provider remember all the R's that have been spent. Alternately, if the bank issuing the anonymous cash is online, then the cash can be spent with multiple merchants, provided that the bank remembers all the R values used and is consulted by each merchant on each transaction before the anonymous cash is accepted.

Chaum, Fiat, and Naor extended the notion of electronic cash to allow for an offline bank [6]. In this scheme, Alice might successfully spend digital cash multiple times, but once the bank collects the transactions (the spent cash), the culprit's identity will be revealed.
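The blind/sign/unblind round trip described above can be sketched with textbook RSA blind signatures. This is a toy illustration with a hypothetical, insecure key size; the `fdh` formatting function is an assumed stand-in for "the rules of valid currency".

```python
import hashlib
import secrets
from math import gcd

# Chaum-style RSA blind-signature sketch (toy key; illustration only).
p_, q_ = 61, 53
n = p_ * q_                           # 3233 -- far too small for real use
e = 17
d = pow(e, -1, (p_ - 1) * (q_ - 1))   # bank's private exponent

def fdh(R):
    """Hash R and 'format it according to the rules of valid currency'
    (here: simply map it into Z_n; a hypothetical stand-in)."""
    return int.from_bytes(hashlib.sha256(str(R).encode()).digest(), "big") % n

def blind(m):
    """Alice multiplies by b^e so the bank cannot see m."""
    while True:
        b = secrets.randbelow(n - 2) + 2
        if gcd(b, n) == 1:
            return m * pow(b, e, n) % n, b

def bank_sign(blinded):
    """Bank signs blindly with d, never learning m."""
    return pow(blinded, d, n)

def unblind(sig_blinded, b):
    """(m * b^e)^d = m^d * b, so dividing by b yields the signature on m."""
    return sig_blinded * pow(b, -1, n) % n

def verify(m, sig):
    """Any merchant checks the bank's signature with the public key."""
    return pow(sig, e, n) == m

R = secrets.randbelow(2**64)
m = fdh(R)
blinded, b = blind(m)
sig = unblind(bank_sign(blinded), b)
assert verify(m, sig)   # valid cash, yet the bank never saw m
```

Spending then amounts to presenting (R, sig) to the content provider, which verifies the signature and records R to block double spending, exactly as the text describes.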
The latter anonymous cash scheme is more complex and expensive, and our application does not require the off-line assumption. We will therefore use the simple notion of random R's that have been blindly signed in advance to indicate that the holder of the signed R is allowed to trade that R for a unit of merchandise.

2.2 Using anonymous cash for DRM
In our application there is no reason for there to be a third party (the bank) providing general-purpose tokens that can be spent with multiple merchants. Alice can directly purchase tokens from the content provider.

2.2.1 Obtaining cash
This will be done non-anonymously, in a conversation that must be authenticated and encrypted. (In the original figure, a shaded text box indicates encryption.) Alice must pay for the cash through some mechanism such as a credit card, or having a pre-paid account with the content provider debited when she obtains cash.
[Figure: Alice sends the content provider "Blinded R, proof I'm Alice"; the content provider returns "Signature on blinded R".]

2.2.2 Purchasing content
To purchase content, Alice presents the anonymous cash, together with the metadata for the content she wishes to access, and the content provider returns the content key. This interaction must be both anonymous (because the content provider will know what content is being requested and must not know who is requesting it) and encrypted (since otherwise an eavesdropper could steal the cash or the content key). The cloud in the original figure indicates an anonymization infrastructure. Note that an anonymization infrastructure is very expensive in terms of computation and bandwidth [10].
[Figure: Alice, via the anonymization infrastructure, sends "R, signature on R, content ID"; the content provider returns K.]

Since the transaction where Alice is requesting a content key must be anonymous and encrypted, the metadata for an item could simply be the item's ID, and the content provider would keep a table of (content ID, content key) pairs. (In contrast, as we will see in section 3, in the blind decryption scheme, the metadata for an item must be {K}P, i.e., the content key encrypted with the content provider's public key.) However, it might be preferable, even in the anonymous cash scheme, for the metadata to be {K}P rather than simply a "content ID" if:
- the content is to be prepared by a 3rd party; otherwise, it would be necessary for the 3rd party to securely relay the content key for that content to the content provider.
- it were inconvenient for the content provider to securely keep a large table of (content ID, key) pairs.

3. Second scheme: Blind decryption
In this second scheme, we use blind decryption instead of blind signatures. Blind decryption is similar in spirit to blind signatures, but there are more algorithms that work for blind decryption than blind signatures because blind decryption does not require a "public" key. Blind decryption works with various schemes including RSA keys (as with blind signatures), Diffie-Hellman keys, and IBE (identity-based encryption).

3.1 Mechanics of Blind Decryption
3.1.1 RSA Keys
With RSA keys, blind decryption is a simple variant of blind signatures. If the content provider's public RSA key is (e,n), with the private key being (d,n), then the encrypted data key K will consist of K^e mod n. To obtain K, Alice blinds K^e mod n by choosing a random number R, "encrypting" R with the content provider's public key to obtain R^e mod n, multiplying the two quantities together to obtain (K^e · R^e mod n), and presenting the result to the content provider, which uses its private key by raising to d mod n, resulting in K·R mod n, which it returns. Alice divides by R mod n to obtain K.
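The RSA blind-decryption exchange of section 3.1.1 can be sketched as follows, with a toy modulus for illustration only (not a secure parameter choice):

```python
import secrets
from math import gcd

# RSA blind-decryption sketch (toy modulus; illustration only).
p_, q_ = 61, 53
n, e = p_ * q_, 17
d = pow(e, -1, (p_ - 1) * (q_ - 1))   # content provider's private exponent

K = 42                        # content key; the item's metadata holds K^e mod n
metadata = pow(K, e, n)

# Alice blinds the encrypted key with a random R coprime to n:
while True:
    R = secrets.randbelow(n - 2) + 2
    if gcd(R, n) == 1:
        break
blinded = metadata * pow(R, e, n) % n      # K^e * R^e mod n

# The content provider raises to d, never learning which item this is:
response = pow(blinded, d, n)              # K * R mod n

# Alice divides out R to recover the content key:
recovered = response * pow(R, -1, n) % n
assert recovered == K
```

Since (K^e · R^e)^d = K·R mod n, the provider's single private-key operation simultaneously decrypts and stays blind to the item's identity.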
We will call the operations “multiplication” and and the master key generator calculates the corresponding “exponentiation” although, in the literature, elliptic curve private key (using the domain secret), and gives the private operations are usually called “addition” and key to the public key’s rightful owner. “multiplication”. But we find the description with However, in our schemes, there is only one “rightful public multiplication and exponentiation more clear for people key owner” -- the content provider. In the way we use the who are familiar with Diffie-Hellman but not with elliptic BF scheme, the content provider will act as the master key curves. That way the formulae work with both mod p generator, in the sense of knowing the domain secret, but it Diffie-Hellman and with elliptic curves. Note: the Diffie- will not give private keys to anyone (other than calculating Hellman blind decryption we are presenting is a its own private key). Other parties will never know any simplification of one presented in [15], and it works for private keys; they will only know the domain parameters in blind decryption, but would not work as a blind signature order to obtain the content provider public key. scheme. Also, for brevity, assume the operations are being done mod p (rather than having us say “mod p” each time). In “normal” IBE, there would be a family of public keys, parameterized with a string “ID”. At this point in the paper, Assume the content provider’s public Diffie-Hellman key we only need a single public key (the content provider’s is gx, and the private key is x. public key), so we can assume that “ID” is a constant. Later A content key K is of the form gxy. If the encryption in the paper (section 6.3.3) we will want to use a string to algorithm requires a particular form factor for the key, such create a family of keys, but they will all still be public keys as being 128 bits, then some function would be performed belonging to the content provider. 
on gxy to convert it to the right form factor, such as a To create a blindable public key, we will modify a cryptographic hash. simplified version of the Boneh-Franklin IBE scheme. The The metadata associated with the item that is encrypted BF scheme uses a bilinear map ê(P,Q), (usually a twisted with key gxy includes gy. Weil or Tate pairing) which maps two order q elliptic curve In other words, gxy (or more likely a cryptographic hash of points to an order q finite field element, and has the gxy) is used as a symmetric encryption key (for any property that ê(Pa, Qb) = ê(P,Q)ab, for points P, Q and symmetric key algorithm such as AES) to encrypt the integers a, b. The security of BF relies upon the Bilinear content, and the metadata includes gy. To decrypt the Diffie-Hellman assumption that given P, Pr, Ps, Pt, it is content, Alice must obtain gxy. If blinding were not difficult to find ê(P,P)rst. necessary, Alice could send the content provider gy and In the case of the basic IBE scheme, a trusted server called have the content provider apply its private key (i.e., the private key generator chooses a secret integer s and an exponentiate by x) and return gxy mod p. But we need this elliptic curve point P, and it publishes as system parameters operation to be blinded. Ps, P, and a specification of the group that P lives in. The Each item of content distributed by a particular content private key generator can generate a private key provider is encrypted with a different key (a different y was corresponding to any public key, “ID”, by using a special chosen), but they all use the same secret x. The value y is hash function H to map “ID” to an element of the group independently and randomly chosen for each item. generated by P. We will write H(“ID”) as Pt, despite the fact that no party, including the key generator, will be able To blind gy mod p so that the content provider cannot know to compute t. 
This notation (Pt) is simply used here to make which key Alice is purchasing, Alice chooses a value z and the bilinear Diffie-Hellman problem embedded in the computes z-1 mod q, where q is the order of the cyclic scheme more transparent. The private key corresponding to group generated by g. For mod p groups, q is a large factor “ID” is H(“ID”)s, which may also be written as Pts. To of p-1. She raises gy to z to obtain gyz and sends that to the obtain a shared secret key with the holder of the public key, content provider. “ID”, an encryptor chooses a random number r and The content provider raises this to its private key (x) and transmits Pr. The shared secret is then ê(P,P)rst, which is returns to Alice: gxyz. calculated as ê(Ps, H(“ID”))r = ê(Ps, Pt)r by the encryptor and ê (Pr, H(“ID”)s) = ê (Pr, Pts) by the holder of the public Alice unblinds gxyz by exponentiating by z-1 to obtain the key “ID”. content decryption key gxy. Blinding may be added as follows: suppose a message is 3.1.3 Identity-based encryption(IBE) encrypted with ê (Pr, H(“ID”)s), and you know Pr and “ID”. The Boneh-Franklin (BF) scheme used in IBE [2] can also You want to decrypt the message with the help of the “ID” be used by our scheme for blind decryption, although we holder, but you don’t want him to find out which value of 72 Pr was used, since that would unambiguously identify the 4. Comparison of the basic schemes message you are trying to decrypt. You can do this by 4.1 Efficiency choosing a random blinding factor, b, and sending Prb to The blind decryption scheme is dramatically more efficient the holder of “ID”. He will send back ê (Prb, H(“ID”)s) = ê than the anonymous cash scheme because the blind (Prb, Pts) = ê (P,P)brst. You can now get ê(P,P)rst, by raising ê decryption scheme does not need an anonymization (P,P)brst to the b-1(mod q). infrastructure. 
Also, the anonymous cash scheme needs two 3.2 Purchasing Content with Blind Decryption conversations: a (nonanymous) conversation to purchase In the anonymous cash scheme, when Alice is purchasing a tokens, followed by an anonymous, encrypted conversation content key, she must do it anonymously, and the to request (and pay for) a content key. In contrast blind conversation must be encrypted. In our blind decryption decryption only needs a single interaction; debiting Alice’s scheme, it is not necessary for the conversation to be account and having Alice request a content key are done in anonymous or encrypted, but it does need to be integrity- the same (nonanonymous) two-message exchange. protected (signed by Alice). Another important difference is that with the blind There is no need for an anonymizing network. The content decryption scheme, the content provider only requires a provider will know which user (Alice) is accessing an item, single private key operation (to blindly decrypt {K}P). The and it can debit her account at that time, but it will not anonymous cash scheme requires one private key operation know which item Alice is accessing. for the content provider to blindly sign each token, as well as a private key operation to establish the server-side- The protocol for requesting decryption is for Alice to send authenticated encrypted channel required for content key the content provider a message containing Alice’s identity requests. The anonymous cash scheme is also likely to (so her account can be charged for the decryption), along require an additional private key operation to set up the with an encrypted blob (consisting of the blinded encrypted encrypted conversation in which Alice purchases tokens, key) that the content provider will “decrypt” with its although it could be done with a long-term shared secret private key. 
(“Decrypt” is in quotes because the result will key between Alice and the content provider, and many still be encrypted with the blinding function). This message tokens can be purchased in the same conversation. must be signed by Alice, e.g., with a MAC using a secret Additionally, although we showed a protocol, where the key Alice shares with the content provider, or signed with metadata is the content ID, and retrieving the content key is her private key, because her account will be debited for the a table lookup, that scheme requires the content provider to cost of the decryption and we must assure that a third party keep a large database (keys for all the content items). As cannot request a decryption be charged to Alice. It also such, it is likely preferable for the metadata to be {K}P, in must be resilient against replays, so an eavesdropper cannot which case the anonymous cash scheme would require at cause Alice to be charged multiple times for the same least three private key operations for the content provider, decryption. versus one for the blind decryption scheme. A simple method of avoiding replays without adding The main expense of the anonymous cash scheme messages is for Alice to include a timestamp, have the (compared to the blind decryption scheme) is the cost of content provider store the timestamp of the previous the anonymization infrastructure, both in bandwidth and decryption request from Alice, and ensure that the computation, placing computational burdens not just on timestamps from Alice are monotonically increasing. A Alice and the content provider but also on the relay nodes. sequence number could be used instead of a timestamp. Although obtaining the encrypted content (in either Alice will not be anonymous in this scheme. 
She will scheme) might in some cases require an anonymization authenticate to the content provider, and her account will network, there are scenarios (such as acquiring content be debited for each decryption of a content key she through broadcast video) in which the blind decryption requests. The content provider will know that Alice has scheme would not need such a channel. However, the purchased some content, but not which content. anonymous cash scheme will always require the existence [“Alice”, timestamp, B( {K}P )] signed by Alice of an anonymization infrastructure (though in most descriptions of anonymous cash in the literature, this Alice Content important detail is omitted). B(K) 4.2 Per-item accounting The anonymous cash scheme allows the content provider to Using blind decryption to obtain a specific encryption key know how many people have purchased each item of content, (although it does not know specifically which people have purchased which content). In contrast, the blind decryption scheme does not allow this. 73 It might be important in some applications for the content denominations of the content provider’s public key are 1, provider to know how many people have purchased each 10, and 100), the metadata associated with the 14-unit item item, in order to determine the royalty amount for each would contain 5 wrapped keys; ( (unit=10, {K1}P2), content contributor. However, many schemes deployed (unit=1, {K2}P1), (unit=1, {K3}P1), (unit=1, {K4}P1), today (e.g., premium TV channels that show many movies) (unit=1, {K5}P1) ). Alice would need to do 5 blind do not have any mechanism for the content provider to decryptions, each time specifying the unit, e.g., know how many people have watched specific movies. [“Alice”, timestamp, B( {K}P2 ) unit=10 ] signed Payment to receive a premium channel is a flat rate by Alice regardless of how much or which content is accessed within that channel. 
So in many applications this per-item And the 5 keys would be cryptographically combined to accounting is not required. form the content key. Note that if the metadata gives Alice the choice of 5. Variable Charging unwrapping 14 single-unit keys, or 5 variable-unit keys It is possible that some content might cost more than other (e.g., a ten and 4 ones), then these keys could not be simply content. With the anonymous cash scheme, it is simple to be hashed together to form the content key. Either the charge different amounts for different content, since the function would have to be  (where it is easy to make two content provider knows which key is being requested. So, different sets yield the same answer), or if a hash was used, the content provider could require n tokens to purchase an you’d wind up with two different quantities, say K1, and item worth n units. K2. The real content key C could be stored in the metadata This straightforward approach does not work in the blind as {C}K1 and {C}K2, so that C would be retrievable decryption scheme, since the content provider does not whether Alice had computed K1 or K2. know which key it is decrypting. 5.3 Issue: Privacy and large-unit tokens or 5.1 Multiple Keys decryptions In blind decryption, a piece of content that costs n units of If all G-rated content cost 1 unit, and all X-rated content money could require n encryption keys and n decryption cost 10 units, the variable charging could leak information. requests. So for instance, the metadata for an item costing n In the anonymous cash scheme, Alice could buy anything units could contain, for i = 1 through n, {Ki}P. Alice would she wants with (lots of) unit tokens, and the content need to decrypt each of the Ki and then perhaps  them or provider would not know who was purchasing the hash them together to obtain the content key. expensive content. 
Or even the fact that she has purchased Note that requiring n decryptions or requiring n blindly a large denomination note does not mean she is intending signed tokens to purchase an item worth n units puts a to buy a single expensive item, since she could pay for burden of n-1 additional private key operations on the multiple single-unit purchases in the same transaction with content provider in either scheme (either it has to blindly a single, large denomination note. sign n tokens or do n blind decryptions). With the blind decryption scheme, Alice is not anonymous, and has to unwrap the content in the same denominations 5.2 Multiple-value tokens and multiple-value that it was wrapped. To help protect privacy: public keys  Alice could spread decryptions over time, so the Instead of making an item worth n units require n private content provider wouldn’t be able to tell the exact key operations, we can make it require, say, log2 n amount of any item (e.g., for a 14-unit item, she operations, using either anonymous cash or blind could request decryption of the 10-unit key at a decryption, by having the content provider have different different time from requesting the 4 single-unit public key pairs for different denominations of money. keys). For instance, with the anonymous cash scheme the content  The content provider could provide metadata for a provider could have public keys: P1 worth 1 unit, P2 worth 14-unit item that would allow retrieving the item 10 units, and P3 worth 100 units. When Alice purchases using n single-unit decryptions, rather than the anonymous cash, she can specify the denomination that she smaller number of decryptions possible using would like. If she specifies she wants a 100-unit token, the larger denomination keys. Both types of metadata content provider would debit her account 100 units of could be provided, giving Alice the choice. So, the money and blindly sign the token with public key P3. 
To content key could be the  of 14 single-unit purchase something worth 14 units, she could present 14 single tokens, or a 10-unit token plus 4 singles. decryptions in the metadata, or the  of a ten-unit decryption plus four single-units. This savings can also be done with blind decryption. Suppose there was an item worth 14 units. (Assuming the 74 To avoid having users opt for unwrapping content using with a subscription to satellite TV or cable. But software- single unit keys (putting a computational burden on the only DRM schemes are prevalent today, even though they content provider), the content provider could provide other aren’t 100% effective, so they must be sufficiently content (rather than just X-rated content) that is worth more effective at deterring sharing to satisfy the content than one unit, for instance a package of all the Disney providers. movies together, or entire seasons of “Little House on the Assume that for each authorization category (e.g., over 21, Prairie”. Or, the content provider could provide a discount citizen of country X) there is a server that can determine for using the larger-unit keys (the metadata for a 14-unit whether someone is a member of the relevant group or has item could give Alice the choice of unwrapping 14 single- the relevant attribute. If Alice can prove to that server that unit keys, or, say, a 10-unit key and two single-unit keys, she has attribute Z, that server presents her with a secret, so that the item would cost only 12 units if she uses the SZ. To prevent an eavesdropper from stealing the secret, the larger denomination key. In the case of purchasing conversation in which Alice obtains SZ must be encrypted. anonymous cash, the content provider might provide To prevent Alice from sharing SZ with unauthorized users, discounts for large-value tokens, e.g., charging 9 units to some sort of DRM scheme must be in place. obtain a 10-unit token. Since it is common to have multiple users in the same 6. 
Authorization Categories household sharing a system, and they might have different In some cases it is not sufficient to pay for content; one authorizations (e.g. the system may be shared by parents must also be authorized to purchase that particular content. who are over 21 and children who are not), there must be For example, X-rated content might only be legally some ability to maintain multiple distinct accounts. There purchasable by someone over age 21. Or some other will also need to be some sort of login, so that the system content might only be legal to sell to citizens of some knows on which user’s behalf it is acting. The system countries. The system must allow anonymous purchase, but should keep a database, for each user, of items such as only to qualified individuals. authorization secrets, content keys, and anonymous cash In sections 6.1, 6.2, and 6.3 we discuss three methods of tokens. providing for authorization, and if/how each of the two When anyone in the household purchases a content key, it basic schemes can be modified with each of these: would be a matter of policy whether that key would also be  Authorization secrets used as credentials made available to all the household accounts that would be authorized to view that content, or whether each account  Authorization secrets used as content key would need to purchase the content separately. It might be components a privacy concern, for instance, for household members to  Authorization category-specific public keys see which items have already been purchased by some other household member. The various approaches have different tradeoffs in terms of amount of privacy information leaked, efficiency, To lessen the threat of authorized users sharing functionality, and ability to prevent credential sharing. 
authorization secrets with others, given that a DRM scheme Section 6.4 compares the three methods, while section 6.5 is likely not to be 100% effective, the authorization secret gives cryptographic techniques that can be used to make can, in some of our schemes, be changed periodically, and authorization category-specific public keys more efficient. then authorized users will need to get the new value when their old value becomes invalid. In one of our schemes Regardless of the method used to add authorization, the (authorization category-specific public keys), there are no authorization policy for an item must appear in the authorization secrets to share. metadata in cleartext, so that Alice can tell what types of authorization she must obtain in order to purchase the item. 6.1 Authorization secrets as credentials We will use the term “ACL” (Access Control List) to mean This scheme only works with the anonymous cash scheme. the authorization policy associated with an item, and we When Alice is anonymously requesting a decryption, she assume it can consist of any Boolean combinations of presents all the authorization secrets (A1, A2, A3) that groups, roles, identities, attributes, etc. prove she satisfies the ACL for the requested item, along An obvious concern is that any sort of authorization secret with anonymous cash. It will be known which could be copied and sent to non-authorized users. authorization secrets Alice has ever obtained, but not However, this is not a special concern with authorization, whether she ever uses them to purchase ACL-restricted since this is also true of the content keys. The entire system content. For maximum privacy, it might be best for Alice to depends on some sort of DRM enforcement to hinder automatically request all authorization keys for which she sharing of content keys as well as authorization secrets. 
is eligible so as not to leak any hints about what kinds of One mechanism, which we will explore in greater depth in content she might be seeking. An authorization secret section 7, is to use a sealed box like the one that comes would only need to be obtained once (per user), and that 75 would enable that user to access any content that requires secret has changed, the user will have to obtain the new that authorization. secret. 6.2 Authorization secrets as content key components Alice Content provider This variant works with either anonymous cash or blind decryption. We assume that Alice obtains a (symmetric) R, signature on R, A1, A2, A3, content ID encryption key for each authorization category that she qualifies for. As with section 6.1, it will be known which authorization secrets Alice obtained, but not whether she K ever purchases content requiring them. This scheme can handle any Boolean combination of authorization categories. To access an item that requires, say, authorizations X and Y, Alice would need to have obtained authorization secret keys KX and KY, in addition Content provider looks up ACL associated with to the K wrapped inside the metadata. So, the metadata “content ID”, and verifies that A1, A2, and A3 might consist of: ({K}P, {K1}KX, {K2}KY). The decryption are sufficient credentials to satisfy the ACL key for the content could be, for instance, h(K,K1,K2). Alice unwraps {K}P with the help of the content provider, It is straightforward to accommodate complicated but is able to unwrap K1 and K2 because she knows KX and authorization policy, e.g., of legal age in the country of K Y. residence. Since the ACL is part of the metadata, the client The OR operation would require organizing the metadata can calculate what credential secrets need to be sent to to give the client the choice as to what to unwrap. For satisfy the policy. 
The content provider can know what the example, if the ACL was “citizen of US OR citizen of policy for that content is in one of two ways: Canada”, the metadata might contain ((“citizen of US”,  The content provider stores, for each item of {{K}P}KUS) , (“citizen of Canada”, {{K}P}KCANADA)). content, (content ID, key, ACL) If there were an ACL such as “citizen of any country other  To save the content provider from keeping such a than Monaco” this would require a large amount of large table, the metadata for the content would be metadata, since that would be the OR of hundreds of [{K}P, ACL] signed by content provider. countries. In contrast, the authorization claim secrets scheme (6.1) only requires that Alice present the single However, there is a potential for privacy leaking. If there is authorization claim secret for some country other than a group with a very small number of members, and Monaco (we won’t worry about whether someone who is a someone requests access to something requiring being a dual citizen is allowed to see content in this case). member of that group, there is no way to avoid leaking that someone from that group accessed that item. Even if all In this scheme (using the authorization secret as a groups were large, it could be that the intersection of decryption key), it is not as easy to periodically change an several groups could be very small. If access to the item is authorization secret as it would be in scheme 6.1. It could the AND of a bunch of groups, it is unavoidable (with this be done, but it would involve preparing new metadata for scheme) to divulge that someone who is in the intersection all affected content. of all the groups has accessed the item. The issue is with the OR of several groups. 
Suppose the 6.3 Authorization category-specific public keys ACL says that you must be accredited as fluent in at least 3 In this scheme, the content provider has different public languages, and Alice happens to know Bulgarian, Bengali, keys, one for each authorization group. In the blind and Navajo. When the anonymous requester presents those decryption scheme, this would mean that an encryption key three credentials, it will narrow the potential requesters to a for an item would be wrapped with a category-specific very small set, even though each of the groups is large, and public key. In the anonymous cash scheme, it would mean even though the ACL would usually allow for satisfaction that the cash token would be signed with a category- while still being part of a very large potential set (e.g., with specific public key. In other words, in the blind decryption English, French, Spanish). request, Alice would specify “blindly unwrap this using One feature of this scheme (as opposed to the one we will your ‘US-citizen’ key”, and in the anonymous cash present in the following section), is that it is relatively easy purchasing request, Alice would specify “blindly sign this to periodically change the authorization secrets, to mitigate using your ‘US-citizen’ key”. against some stealing of credentials. When an authorization 76 These could be completely independent keys, or they could two schemes: the content provider will know how many be derived cryptographically using any of the methods that decryptions Alice is asking for, for each ACL. we will present in section 6.5. On the other hand, using either authorization scheme raises 6.3.1 Boolean combinations with blind decryption revocation issues to a greater or lesser extent: An With blind decryption, Boolean combinations of authorization secret could be stolen, or Alice might no authorization categories can be handled the same way as in longer be authorized in some category (say, her scheme 6.2. 
In other words, an item requiring membership in an organization has lapsed). If authorizations A1 AND A2 could be encrypted with h(K1 communication to the content provider is done with a ,K2) and include as metadata (A1: {K1}PA1) and (A2: sealed box, or with reasonably trusted DRM software, then {K2}PA2). Alice would have to unwrap both keys to read the content provider could keep the authorization secrets in the item. The keys would have to be half the price of the the client up to date. For instance, if “current member of intended cost of the item. The metadata for A1 OR A2 ACM” is required for some types of content, the content would be similar, but just have a single K, such that provider could communicate with ACM periodically to get unwrapping either quantity will work, as in: ((A1: {K}PA1) its list of members, and then install the “ACM” OR (A2: {K}PA2)), and either of those unwrappings would authorization secret into the boxes (or software) of all the be the actual cost of the item. authorized users, and remove the secret from boxes (DRM software) of users who were, but are no longer, members. 6.3.2 Boolean combinations with anonymous cash With anonymous cash (assuming the metadata is just the Given that even with DRM, authorization secrets might be content ID), it works somewhat like scheme 6.1, in that a stolen by determined attackers, it is an advantage of cash token signed with an authorization-specific key works scheme 6.1 that the secrets can be changed periodically. both as a unit of currency and as proof of authorization. If In contrast, with multiple content provider public keys Alice has to prove A1 OR A2, she merely presents a token (6.3), revocation is very simple. All that is required is that signed with either the A1-specific public key or the A2- the content provider keep track of all of Alice’s specific public key. If Alice has to prove A1 AND A2, authorizations. 
If, for instance, her membership in an during the anonymous content request, she could present organization lapses, that organization would inform the two (half-price) tokens, one signed with A1 and one signed content provider, which would remove membership in that with A2. organization from Alice’s profile and no longer allow Alice to decrypt anything requiring that authorization. With 6.3.3 ACL-specific keys anonymous cash-based authorization-specific content An alternative for Boolean combinations is to have a public provider schemes, once Alice has obtained authorization- key which is specific to the entire ACL, e.g., a specific specific cash tokens it will not be possible to take them public key for “(paid up member of ACM OR IEEE) AND back (unless enforced through the DRM citizen of US”. In other words, in the blind decryption software/hardware). scheme, the metadata would consist of {K}PACL-string. In the anonymous cash scheme, the client would request a cash token signed with the ACL-specific key PACL-string. 6.5 Blindable Parameterizable Keys That approach has the disadvantages of In this section, we present a new cryptographic tool;  requiring a lot of content provider keys (but in blindable parameterizable keys, and give several ways of section 6.5 we will explain how that can be accomplishing this. Armed with such functions, the content practical), and provider can have a family of keys, parameterized by the ACL.  leaking privacy, because although there might be a lot of items of content requiring each of the 6.5.1 Using Identity Based Encryption component authorization categories, there might The notion of keys parameterized by a string sounds a lot be very few (or even just a single one) with the like IBE [17] [2], and indeed the same math can be used for specific combination of those categories in the parameterizable blind decryption (but not blind signatures), ACL. but we are using IBE in a different way. 
We described in section 3.1.3 how to use IBE for blind 6.4 Comparison of 6.3 with 6.1 and 6.2 decryption, but we were not parameterizing the single With authorization-specific content keys, Alice cannot content provider public key. To make the scheme work cheat by stealing authorization secrets, since when she with a different public key for every ACL string, we make requests cash tokens or requests blind decryption, she is not it more like IBE in the sense that the public key used is anonymous, and the content provider checks her derived from the ACL string. The rest of the system still authorizations by looking them up in her profile. However, works as it did in section 3.1.3 – the content provider it has a serious privacy disadvantage relative to the other 77 knows the domain secret and can convert any public key We assume the box is reasonably difficult to tamper with, into a private key, and the clients never need to know any and an additional hindrance would be that tampering with it private keys, just the domain parameters. would be illegal. A plausible deployment of such a “box” might be a smart card or other sealed module that installs 6.5.2 Parameterized Diffie-Hellman into the user’s PC. Parameterization can be done with our Diffie-Hellman variant of blind decryption. Alice would only need to know 7.1 Hindering Copying of Authorization Keys “g” and “p”. The content provider would only need to In many of the variants we have presented, a user collects know a single secret “x”. The metadata for content for content keys and authorization keys. So, an obvious “over 21”, would consist of (gy mod p, “over 21”). The implication is that one person can obtain a key to decrypt a content key for that data would be calculated by calculating piece of content, or an authorization key for “over 21”, and S=h(x, “over 21”) and then raising the metadata to S to widely distribute it. obtain the content key gyS mod p. However, each box will be known to the content provider. 
Alice blinds gy mod p by choosing a random z, calculating Either the content provider will know a public key for each the inverse exponent z-1 for mod p exponentiation, and box, or will have a shared secret key with each box. presenting that along with the string “over 21”. The content Communication is between the server and the box, and any provider uses the string “over 21” to calculate S and returns information that must be kept from the user (such as an gyzS mod p. Alice exponentiates, mod p, by z-1 to obtain gyS authorization key) can either be done through an encrypted mod p, the content key. channel (such as SSL) between the box and the server, or can be returned to the box encrypted with a key known 6.5.3 Parameterizable RSA only to that box. Content and authorization keys, as well as Note that the schemes we present in sections 6.5.1 and the private key for a particular box will be stored inside the 6.5.2 work for blind decryption but not blind signatures, so box, and the box would be designed to make it be very neither of them would work for anonymous cash. A difficult to extract keys from the box. scheme that might work as a blindable parameterizable public key scheme is RSA, where the content provider’s If a determined user does extract keys from a box, all is not public key, instead of being (e,n), is simply the modulus n. lost. It still would be difficult to insert such keys into other The public exponent for a given ACL would be the hash of boxes. In other words, assuming a reasonably competent that ACL string. job of engineering the boxes to be tamper-resistant, it would not only take a great deal of ingenuity and lack of RSA is clearly not secure if multiple users use the same fear of prosecution to extract the keys from one box, but it modulus, since knowledge of a key pair allows you to would take an equal amount of tampering to insert keys factor the modulus [3], but we are not proposing that. 
6.5.3 Parameterizable RSA
Note that the schemes we present in sections 6.5.1 and 6.5.2 work for blind decryption but not blind signatures, so neither of them would work for anonymous cash. A scheme that might work as a blindable parameterizable public key scheme is RSA, where the content provider's public key, instead of being (e, n), is simply the modulus n. The public exponent for a given ACL would be the hash of that ACL string.

RSA is clearly not secure if multiple users use the same modulus, since knowledge of a key pair allows you to factor the modulus [3], but we are not proposing that. Instead we are proposing a single user (the content provider) using modulus n, but using a family of exponent pairs parameterized with a string.

It is a good idea for all the public exponents to be relatively prime, so that Alice can't get the decryption of something encrypted with an exponent that she isn't authorized for by requesting one or more decryptions using exponents she is authorized for and multiplying or dividing the results. With exponents being hashes, this threat is unlikely ever to happen in practice, but it is possible (with some computational cost) to make all the exponents prime by not simply hashing the ACL string, but instead hashing the ACL string, padding with some number (e.g., 32) of zero bits, and then finding the first prime greater than that.

7. DRM-Enforcement Sealed Box
This section considers the implications on the design in the common deployment scenario where the content provider provides a sealed box, and communication between the "user" and the content provider is actually done between the box and the content provider. We assume that the user can communicate with the box, to tell it which content the user would like to access. (We assume the box is reasonably difficult to tamper with, and an additional hindrance would be that tampering with it would be illegal. A plausible deployment of such a "box" might be a smart card or other sealed module that installs into the user's PC.)

7.1 Hindering Copying of Authorization Keys
In many of the variants we have presented, a user collects content keys and authorization keys. So, an obvious implication is that one person can obtain a key to decrypt a piece of content, or an authorization key for "over 21", and widely distribute it.

However, each box will be known to the content provider. Either the content provider will know a public key for each box, or it will have a shared secret key with each box. Communication is between the server and the box, and any information that must be kept from the user (such as an authorization key) can either be sent through an encrypted channel (such as SSL) between the box and the server, or can be returned to the box encrypted with a key known only to that box. Content and authorization keys, as well as the private key for a particular box, will be stored inside the box, and the box would be designed to make it very difficult to extract keys from the box.

If a determined user does extract keys from a box, all is not lost. It still would be difficult to insert such keys into other boxes. In other words, assuming a reasonably competent job of engineering the boxes to be tamper-resistant, it would not only take a great deal of ingenuity and lack of fear of prosecution to extract the keys from one box, but it would take an equal amount of tampering to insert keys into a box, since an untampered-with box would only accept such keys during communication with the content provider.

If the identity key for a particular box were compromised, that might enable simulating an entire box in software (and therefore it would not take much effort to deploy clones), but the compromise of that one box would become known to the content provider quickly (as, for instance, the owner of that box would be charged for all content requested by any clone), and the content provider would revoke the key for that box. Although the content keys and authorization keys known to that compromised box might still be publicly known, it would still be difficult to install these keys into existing boxes.

7.2 Monitoring Privacy Preservation
The box is provided by the content provider, so even if in theory the protocol is intended to enable preserving the user's privacy, the content provider might be motivated to cheat. Communication is between the box and the content provider, but as we said in the introduction, the user can monitor what is transmitted.

In the anonymous cash scheme, when decryptions are requested, this must be done over an encrypted channel, with a key between the box and the content provider. The user cannot tell what the box is saying. The box could easily be (intentionally) leaking its identity when it asks for a decryption of a particular piece of content.

In the blind decryption scheme, it is also possible for the box to cheat in a way that the user cannot detect through passive monitoring. When the box asks for decryption of a piece of content, the communication is not encrypted, so the user can indeed verify that what the box transmits is ["Alice", timestamp, B({K}P1)] signed by Alice. However, there are several ways for the box to cheat in a way that would be undetectable by Alice, even though Alice can see what it is transmitting. First we will explain how the box can cheat, and then explain in section 7.3, with a protocol between Alice and the box, how we can allow Alice to enforce privacy protection without interfering with the (legitimate) DRM-enforcing protocol between the box and the content provider.

7.2.1 Cheating with a weak blinding function
There is no way for the user Alice to know whether the box is truly choosing a random number for the blinding function, or whether it is sneakily identifying the content Alice is purchasing by using a blinding function predictable to the content provider.

An example method for the box to cheat and let the content provider know which item Alice is requesting, without Alice being able to detect that it is cheating, is as follows: the random number it uses for the blinding could be a hash of the secret the box shares with the content provider, and the time. The granularity of time units must be small enough that consecutive decryption requests would have different blinding quantities, but large enough that it is not expensive for the content provider to do a brute force search on all possible blinding functions derived that way until it obtains a K with recognizable formatting. A recognizable format, for instance, might be where K in {K}P was padded with specific structure, e.g., according to the PKCS #1 standard [13].

7.2.2 Cheating by using the integrity check
If the integrity check between the box and the content provider uses a shared secret key, the key will not be known to Alice, because the content provider does not want Alice to be able to ask for content keys. In this case, the box can leak, say, the ID of the content that Alice is requesting, by adding the ID of the content to the integrity check. For example, if the proper integrity check for the message ["Alice", timestamp, B({K}P1)] using the shared secret key is "X", and the ID of the content being requested by Alice is n, then instead of sending X as the integrity check, the box could send X+n. To retrieve "n", the content provider computes the correct integrity check for the message (X) and subtracts it from the integrity check as sent by the box.

There really is no way to fix this, so the integrity check must be a public key-based signature, where Alice must have access to the box's public key so she can verify that the box is providing valid signatures.

However, there is still a problem. In many public key signature schemes, e.g., ElGamal, there is a per-message random number x, where g^x mod p is part of the signature. The box could choose an x that leaks the ID of the content being requested. For example, the box could try lots of x's, until it finds one for which the lower bits of g^x mod p reveal the ID of the content. If it were exactly the ID of the content, Alice would be able to detect this; however, there are ways for the box to do this undetectably to Alice. For example, if the box shares a secret S with the content provider, and if both the box and the content provider remember the timestamp T of the last request to the content provider, the box could compute T encrypted with S, take the bottom n bits of {T}S (where "n" is the number of bits in a content ID), XOR the result with the content ID to obtain the quantity Q, and find an x such that the bottom n bits of g^x mod p is Q.

Thus there really is no way for Alice to passively monitor the channel and be reassured that the box is indeed preserving her privacy, in either the anonymous cash scheme or the blind signature scheme. However, in section 7.3 we will provide a mechanism for Alice to interact with the box and be assured that the box is not colluding with the content provider. The only way this can work is with the blind decryption scheme using public key signatures for the integrity check. We will show how Alice can protect against both methods of the box cheating (weak blinding function and leaky integrity check).

7.2.3 Cheating by using the timestamp, or timing
If the timestamp has sufficient granularity, it would be possible for the box to leak information in the low order bits of the timestamp. Also, it might be possible for the box to covertly signal information to the content provider based on when it sends requests. Both of these threats are easily countered, as explained in section 7.3.3.
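The shared-key integrity-check leak of section 7.2.2 can be illustrated with a toy sketch. This is our own construction, using HMAC-SHA256 as the shared-key integrity check viewed as an integer:

```python
import hmac
import hashlib

def mac_as_int(key: bytes, msg: bytes) -> int:
    # Shared-key integrity check, viewed as an integer the box can add to.
    return int.from_bytes(hmac.new(key, msg, hashlib.sha256).digest(), "big")

shared_key = b"box/provider shared secret"     # unknown to Alice
msg = b'"Alice", timestamp, B({K}P1)'
content_id = 42                                # what the box wants to leak

X = mac_as_int(shared_key, msg)                # the proper integrity check
leaky = X + content_id                         # the box sends X + n instead

# The content provider recomputes X and subtracts it to recover n.
recovered = leaky - mac_as_int(shared_key, msg)
assert recovered == content_id
# Alice cannot recompute X without the shared key, so she cannot tell a
# leaky check from a valid one by passive monitoring.
```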
7.3 User-enforced Privacy Protection
With the anonymous cash approach, the user has no recourse other than trusting that the content provider's box is indeed protecting the user's privacy, because the conversation between the box and the content provider must be encrypted. The DRM system will not allow Alice to monitor the conversation (e.g., by letting the encryption be between Alice and the content provider rather than the box and the content provider) because she is not allowed to see the content key.

However, it is possible, with the blind decryption schemes, to have a protocol between Alice and the box in which Alice can be assured that her privacy is being protected. The basics of the protocol are that the box emits a message it would like to send to the content provider. Because Alice sits between the box and the rest of the world, Alice can choose either to send this message on to the content provider or to intercept the message. If she intercepts the message, she can send it back to the box, together with instructions for modifying the request. The box then modifies the message it would have sent, using Alice's instructions. Alice will be able to verify that the box incorporated Alice's R into the message the box sends to the content provider.

[Figure: the box sends "msg" to Alice; Alice returns "requested modifications" to the box; the box emits the "modified msg", which Alice forwards to the content provider.]

7.3.1 Foiling weak blinding
As we discussed in section 7.2, with the blind decryption scheme, the box could choose blinding functions that are predictable by the content provider, and thereby allow the content provider to discover which content Alice was accessing. This is unavoidable if Alice is merely passively monitoring the channel. However, there is a way (with the blind decryption scheme) for Alice to enforce that there be no such covert channel between the box and content provider. The simplest solution (which doesn't quite work, but we will fix it) is to have Alice insert an extra level of blinding in the message to the content provider, and reverse her level of blinding before passing the result back to the box.

In other words, what we'd like is that the box would transmit
● "Alice", timestamp, B({K}P)
to the content provider, but the message would be intercepted by Alice, who would add an extra level of blinding, say with function B2, and forward to the content provider:
● "Alice", timestamp, B2(B({K}P))
The returned message from the content provider would be
● B2(B(K))
Alice would then unblind with B2's inverse and forward B(K) to the box.

But this would not work. The problem is that the message between the box and the content provider needs to be integrity protected; otherwise, anyone could ask for decryptions, and Alice's account would be debited. Even Alice is not trusted (by the content provider) to generate messages, since the content provider wants to keep decrypted content keys inside the closed system (only accessible by the boxes provided by the content provider). Since the message from the box to the content provider is integrity protected, Alice cannot modify it without invalidating the message.

So, the solution is for Alice to interact with the box in order to influence what it uses for blinding. The constraints are:
● The box cannot trust Alice to do the complete blinding (because Alice is not allowed to see the content key).
● The signed message to the content provider must be generated by the box (since only it is trusted by the content provider to sign messages).
● Alice needs to be able to verify that the box is not attempting to leak information, and that it really is applying the extra level of blinding she requests.

So the protocol is to allow Alice to ask the box to apply an extra level of blinding, with a key that she chooses and specifies to the box. She will be able to verify that her level of blinding has been applied, because she can compare the box's output before and after her blinding function has been applied. The box will be able to unblind with both functions: the blinding function it chose, and the one that Alice chose. The content provider will act as it did before, though if it were attempting to collude with the box, it will notice that the box is no longer colluding with a weak blinding function (since the content provider will not be able to unblind the message from the box to discover what key Alice is attempting to access). If there was no collusion attempt going on between the box and the content provider, the double blinding will be undetectable by the content provider.

7.3.1.1 Using RSA keys
The box originally chooses the blinding function R1, and emits the signed message:
● "Alice", timestamp, R1^e * K^e mod n
Alice intercepts this message, chooses a random R2, and returns the message to the box saying "please add an extra level of blinding using R2." The box then transmits the signed message:
● "Alice", timestamp, R2^e * R1^e * K^e mod n
Alice examines this by dividing by R2^e mod n, to ensure that the result is what the box originally transmitted (R1^e * K^e mod n). If the answer is correct, she forwards the now doubly blinded message to the content provider. The content provider applies its private key and returns:
● R2 * R1 * K mod n
Alice lets the message go to the box, which knows both R1 and R2, and can therefore extract K.

This protocol will work, in the sense that the key will be properly extracted for the content that Alice requested, and also that Alice is assured that the box has not leaked to the content provider the identity of the content she has requested. If the content provider had been attempting to collude with the box by having it use a predictable blinding function, the content provider will notice that it is unable to unblind what it received.
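The RSA double-blinding exchange of section 7.3.1.1 can be sketched with textbook-RSA toy numbers. These are our own demo values; a real deployment needs full-size keys and proper padding:

```python
# Textbook RSA toy key for the content provider: n = 61 * 53.
n, e = 3233, 17
d = pow(e, -1, 3120)        # private exponent; 3120 = 60 * 52

K = 65                      # content key; {K}P = K^e mod n in the metadata
metadata = pow(K, e, n)

R1 = 7                      # the box's blinding value
R2 = 11                     # Alice's extra blinding value

# Box emits "Alice", timestamp, R1^e * K^e mod n (signed).
msg1 = (pow(R1, e, n) * metadata) % n

# After Alice's request, the box emits R2^e * R1^e * K^e mod n.
msg2 = (pow(R2, e, n) * msg1) % n

# Alice verifies her blinding was applied: dividing by R2^e must give msg1.
assert (msg2 * pow(pow(R2, e, n), -1, n)) % n == msg1

# Content provider applies its private key, returning R2 * R1 * K mod n.
reply = pow(msg2, d, n)
assert reply == (R2 * R1 * K) % n

# The box knows both R1 and R2, so it divides them out to extract K.
assert (reply * pow(R1 * R2, -1, n)) % n == K
```

The design point is that Alice can check her layer of blinding without ever seeing K itself: she only compares the box's output before and after her blinding factor is applied.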
7.3.1.2 Using Diffie-Hellman keys
If instead the content provider had a public Diffie-Hellman key, say g^x mod p, then the protocol to extract the encryption key for a piece of content from the metadata for that content, say g^y mod p, would be:
● The box would choose a blinding number z1, exponentiate by z1 mod p, and transmit the signed message:
  ○ "Alice", g^(y*z1) mod p
● Alice would intercept this, choose her own blinding number z2, and say to the box:
  ○ Add blinding using z2
● The box would then transmit the signed message:
  ○ "Alice", g^(y*z1*z2) mod p
● Alice raises g^(y*z1*z2) mod p to her number's inverse exponent and verifies that the result is the original one transmitted by the box, i.e., g^(y*z1) mod p.
● Alice lets the message go through to the content provider, and allows the return message to go through to the box.

7.3.2 Foiling Leaky Signatures
The other method for the box to cheat and collude with the content provider is by leaking information in the integrity check. If the integrity check is a secret key shared between the box and the content provider, there is nothing Alice can do. However, if the integrity check is based on the box's public key, then Alice can ensure there is no cheating, as long as she has access to the box's public key (and she monitors that the signatures the box emits are correct).

With RSA keys, and with PKCS #1 v1 padding, there is no problem. With signatures involving a per-message random number, such as ElGamal, it is possible (as we showed in section 7.2.2) for the box to leak information. As with double blinding, Alice can enforce that the box is not choosing a bad random number x by allowing Alice to contribute to the random number. The box first presents to Alice the message it would like to send, including g^x mod p. Alice then chooses her own random number y and tells the box to include "y" in its signature. Then she tests whether the box modifies g^x mod p to instead be g^(xy) mod p, and still sends a valid signature.

7.3.3 Foiling other attacks

7.3.3.1 Timestamp
The box could, in theory, leak some information in the least significant bits of the timestamp, assuming the timestamp had sufficient granularity that it could do that while still having a timestamp that was plausible to Alice. If it was using a sequence number, then it could not embed information, since the sequence number would be constrained to be one bigger than the last request. In some cases Alice might not be keeping sufficient state to be able to monitor the sequence numbers, and therefore it might be more convenient to use a timestamp. When she is making the request to modify the message, she can also request a specific timestamp, close enough to the actual time so it would still be a valid timestamp, but without the box being able to control the low order bits.

7.3.3.2 Timing
To foil the box leaking information by when it sends requests, Alice can delay a message between the box and the content provider by some amount of time before passing it on. More broadly, there might be a piece of popular content that many users may attempt to access at the time that it is broadcast for the first time. The fact that someone is asking to access something at just that time would be a clue that the user is likely accessing that particular piece of content. To mitigate this issue, the content provider should provide the metadata for content well in advance of the broadcast. Even if the data for the content does not exist, there is no reason why the key with which that content will be encrypted could not be chosen and posted well in advance. Then users can collect the metadata for that content and request decryption of the key(s) well in advance of the existence of the content. While this is not strictly something Alice can enforce, she can at least verify that the content provider is consistently making metadata available early.
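The per-message-randomness leak of section 7.2.2, which section 7.3.2 is designed to foil, can be sketched as follows. This is our own toy construction, with SHA-256 standing in for encrypting T under the shared secret S:

```python
import hashlib

# Toy group: g is a primitive root mod the prime p, so g^x ranges
# over every nonzero residue and the grinding loop always terminates.
p, g = 1019, 2
ID_BITS = 4                        # width of a content ID in this toy

def pad_bits(shared: bytes, T: int) -> int:
    # Low bits of {T}S: a pad both box and provider can compute.
    d = hashlib.sha256(shared + T.to_bytes(8, "big")).digest()
    return d[-1] % (1 << ID_BITS)

shared = b"box/provider secret S"
T = 1271203200                     # timestamp of the last request
content_id = 0b1010

# Box grinds the per-message random x until the low bits of g^x mod p
# equal Q = content_id XOR pad; to Alice, g^x mod p just looks random.
Q = content_id ^ pad_bits(shared, T)
x = 1
while pow(g, x, p) % (1 << ID_BITS) != Q:
    x += 1

# Provider strips the pad off the low bits and recovers the content ID.
leaked = (pow(g, x, p) % (1 << ID_BITS)) ^ pad_bits(shared, T)
assert leaked == content_id
```

The countermeasure of section 7.3.2 works precisely because the box commits to g^x mod p before learning Alice's y, so it cannot grind the combined exponent xy.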
7.3.3.3 Box-initiated encrypted communication
There are times when the content provider needs to transmit encrypted information to the box; e.g., authorization secrets. If this were done by establishing an encrypted channel between the box and the content provider, then the box could transmit any information it wanted without Alice being able to monitor it. For example, it could inform the content provider which items Alice has recently purchased.

There is no reason for the box to be sending encrypted information to the content provider (other than the blinded content key, which we discussed in section 7.3.1). But the content provider does need to send encrypted authorization secrets to the box. Rather than doing this by establishing an encrypted channel between the box and the content provider, authorization secrets can be encrypted by the content provider with the box's public key, or with a shared secret key between the content provider and the box. As long as all of the information from the box to the content provider is unencrypted (again, other than the blinded content key), Alice can prevent the box from leaking information to the content provider.

8. Conclusions
We have examined two families of privacy-preserving DRM schemes, one based on anonymous cash and the other based on blind decryption.

The blind decryption scheme is less expensive, because decryption purchases and decryption requests can occur in the same message. In contrast, the anonymous cash scheme requires a (non-anonymous) communication to purchase tokens and a separate anonymous communication for purchasing decryptions. Also, the anonymous cash scheme requires an anonymization network.

We provided a way (in either scheme) to provide differential costs of items using multiple denomination content provider public keys.

The anonymous cash scheme allows the content provider to do accounting of how many accesses there are for each item of content, which might be important if royalties to the copyright owners of individual items of content are based on number of accesses. The blind decryption scheme does not support this.

We examined several variants for supporting additional authorization. We concluded that authorization encryption keys worked equally well with anonymous cash or blind decryption, and leaked the least privacy information. The authorization claim secret scheme had the advantage that authorization keys could be changed inexpensively. The multiple content provider public key scheme has the privacy disadvantage that the content provider knows the authorization policy of the content that Alice is decrypting. However, it does have the advantage that there are no authorization secrets to steal from authorized users, and revocation of a user's authorization in a category is trivial.

To make it practical to have many content provider public keys, e.g., based on potentially complex authorization categories, we provided a scheme, inspired by IBE, wherein the content provider's Diffie-Hellman key is derived from the authorization string. This is not an IBE scheme because Alice never finds out (or needs to find out) the particular content provider public key. All she needs is the Diffie-Hellman parameters (g and p) and the string (say "citizen of US AND over 21").

The most likely deployment scenario for this type of application is one where communication is not directly between the content provider and an open computer controlled by the user, but rather with a sealed box approved by the content provider and provided by the content provider to sit in the user's house. We examined the implications of this design. In particular, we concluded there is no way in any of the schemes, if the user can only passively monitor all communication to and from the box, to see if the box is indeed performing the privacy protection protocol properly, rather than covertly leaking to the content provider what the user is accessing.

We concluded that only in the blind decryption scenario would it be possible to enhance the system with a protocol between the user (a computer controlled by the user) and the box, so that the box can continue to enforce the legitimate interests of the content provider, but the user can enforce that the box not covertly leak privacy-compromising information to the content provider. We discussed several ways in which the box could covertly pass information to the content provider that would be undetectable to Alice if she were only passively monitoring the communication, and we presented methods for Alice to be assured no such covert channel is going on, by allowing Alice to influence the messages between the box and the content provider.

9. Acknowledgements
We would like to thank John Kelsey, Dave Molnar and Hilarie Orman for their helpful comments and advice.

10. REFERENCES
[1] Bender, W., Gruhl, D., Norimoto, N., "Techniques for data hiding", Proc. of SPIE, 1995.
[2] Boneh, D., Franklin, M., "Identity-Based Encryption from the Weil Pairing", Advances in Cryptology - Proceedings of CRYPTO 2001.
[3] Boneh, D., "Twenty years of attacks on the RSA cryptosystem", Notices of the AMS, 1999.
[4] Boneh, D., Shaw, J., "Collusion-secure fingerprinting for digital data", CRYPTO 1995.
[5] Chaum, D., "Blind signatures for untraceable payments", Advances in Cryptology - Proceedings of Crypto 82, 1983.
[6] Chaum, D., Fiat, A., Naor, M., "Untraceable electronic cash", Proceedings on Advances in Cryptology (Santa Barbara, California, United States), 1990.
[7] Cox, I., Miller, M., Bloom, J., "Digital Watermarking", Morgan Kaufmann Publishers Inc., 2001.
[8] Cox, I., Kilian, J., Leighton, T., Shamoon, T., "Secure Spread Spectrum Watermarking for Multimedia", IEEE Transactions on Image Processing, 1997.
[9] Craver, S., Memon, N., Yeo, B. L., Yeung, M. M., "Resolving rightful ownerships with invisible watermarking techniques: Limitations, attacks and implications", IEEE Journal on Selected Areas in Communications, 1998.
[10] Dingledine, R., Mathewson, N., Syverson, P., "Tor: The Second-Generation Onion Router", Usenix Security Symposium, 2004.
[11] Freedman, M. J., Morris, R., "Tarzan: A peer-to-peer anonymizing network layer", 9th ACM Conference on Computer and Communications Security, 2002.
[12] Iannella, R., "Digital Rights Management (DRM) Architectures", D-Lib Magazine, June 2001.
[13] Jonsson, J., Kaliski, B., "Public-Key Cryptography Standards (PKCS) #1: RSA Cryptography Specifications Version 2.1", RFC 3447, February 2003.
[14] Nair, S., Tanenbaum, A., Gheorghe, G., Crispo, B., "Enforcing DRM policies across applications", Proceedings of the 8th ACM Workshop on Digital Rights Management, 2008.
[15] Perlman, R., "The Ephemerizer: Making Data Disappear", Journal of Information System Security, 2005.
[16] Saint-Jean, F., Johnson, A., Boneh, D., Feigenbaum, J., "Private Web Search", Proceedings of the 2007 ACM Workshop on Privacy in Electronic Society, 2007.
[17] Shamir, A., "Identity-Based Cryptosystems and Signature Schemes", Advances in Cryptology: Proceedings of CRYPTO 84.
[18] Shamir, A., "How to Share a Secret", Communications of the ACM, v. 22 n. 11, pp. 612-613, Nov. 1979.
Slide 1: Privacy-preserving DRM
Radia Perlman, Intel Laboratories; Charlie Kaufman, Microsoft; Ray Perlner, NIST

Slide 2: The problem
• Let Alice purchase content
• Without anyone knowing which content she purchased

Slide 3: Basic approach
• Obtain (encrypted) content somehow
– from satellite TV
– from the Internet (through an anonymizer)
• Purchase the content key

Slide 4: With wrinkles
• Additional authorization
– Over 21
– Citizen of US
– Citizen of any country other than Monaco
• Differential costs (some things cost more than others)
• Implications of content-provider provided sealed box at customer site

Slide 5: Structure of talk
• Two families of schemes
– anonymous cash
– blind decryption
• Comparison of these schemes
• Adding wrinkles with each scheme

Slide 6: Encrypted content has metadata
• The metadata might, for instance, contain the content key encrypted with the content provider's public key
• Presenting the metadata to the content provider allows it to return the content key

Slide 7: Encrypted content: metadata {K}P
[Diagram: content encrypted with K carries metadata {K}P; the content provider knows the private key, so it can decrypt {K}P and return K.]

Slide 8: Encrypted content: metadata content ID
[Diagram: content encrypted with K carries a content ID as metadata; the content provider has a table (ID, K) for all items and can look up K.]

Slide 9: Both schemes use concept of "blinding"
• Alice wants Bob to sign or decrypt "x" with Bob's private key
• Alice creates functions (blind=B, unblind=U) that commute with Bob's public/private key operations
• Sends B(x) to Bob
• Bob applies private key
• Alice takes the result, applies U, to get signed or decrypted x

Slide 10: Anonymous Cash
• Chaum scheme for anonymous cash
• Choose random number R, "blind it", send it to bank to sign, then unblind it. A "token" is R, and the signature on R, say Rsig
• Buying content
– (non-anon) buy tokens, using real money
– In an anonymous, encrypted conversation, present anonymous cash, ask for particular content key

Slide 11: Anonymous Cash Scheme: Buying tokens
[Diagram: Alice sends B(R1), B(R2) and her credit card to the content provider; the provider debits the credit card 2 units and returns B(R1sig), B(R2sig); Alice unblinds to obtain R1sig and R2sig.]

Slide 12: Anonymous Cash Scheme: Purchasing content
[Diagram: through an anonymizing cloud, in an encrypted conversation, Alice sends R1, R1sig, content ID; the content provider returns K.]

Slide 13: Scheme 2: Blind Decryption
[Diagram: Alice sends B({K}P), "Alice"; if the request is valid, the content provider does the decryption, returns B(K), and debits Alice's account. Note: conversation must be signed by Alice, plus have timestamp.]

Slide 14: Comparisons
• Per-item accounting
– Possible in anonymous cash scheme
– Not possible in blind decryption scheme
• Efficiency (see next slide)

Slide 15: Blind decryption more efficient
• One conversation, vs anonymous cash
– one to buy token
– one to purchase content
• One private key operation for content provider, vs in anonymous cash
– blindly sign token
– establish server-side encrypted/authenticated session
• No need for anonymization cloud

Slide 16: First wrinkle: variable charging
• Could be trivial with anon cash: present n tokens to buy something worth n units
• That would require n private key operations for the content provider (actually 2n because of originally signing them)
• Instead, can have different denomination tokens

Slide 17: Variable charging: Anonymous cash
• Content provider has different "denomination" public keys, say P1="one", P2="10", P3="50"
• When purchasing tokens, ask for denominations
– I'm Radia
• I'd like 4 ones: B(R1), B(R2), B(R3), B(R4)
• And 2 tens: B(R5), B(R6)
• And 3 fifties: B(R7), B(R8), B(R9)
– Content provider applies P1 to first 4, P2 to next 2, P3 to next 3
• When (anonymously) purchasing content, provide all the necessary tokens
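The Chaum blinding of slides 9–11 can be illustrated with a textbook-RSA sketch (our own toy numbers; real tokens need full-size keys and hashing of R before signing):

```python
# Textbook RSA toy key for the content provider acting as the "bank".
n, e = 3233, 17             # n = 61 * 53
d = pow(e, -1, 3120)        # private signing exponent

R = 1234                    # Alice's random token value
b = 99                      # blinding factor, coprime to n

# Alice sends B(R) = R * b^e mod n; the provider signs it blindly.
blinded = (R * pow(b, e, n)) % n
blind_sig = pow(blinded, d, n)          # equals R^d * b mod n

# Alice unblinds by dividing out b, leaving an ordinary signature on R.
Rsig = (blind_sig * pow(b, -1, n)) % n
assert Rsig == pow(R, d, n)

# The token (R, Rsig) verifies under the public key, yet the provider
# never saw R while signing, so it cannot link the token to Alice.
assert pow(Rsig, e, n) == R
```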
Slide 18: Anonymous cash, different denominations
• Might be suspicious to get anything more than a one, if all G-rated content was 1, and X-rated was more
• Allow purchase of multiple things in the same transaction, so asking for a large denomination bill isn't suspicious
• Besides, you could purchase with all ones
– content provider could discourage this paranoia by offering a discount for large denominations

Slide 19: Anonymous Cash: Purchasing content that costs 12 units
[Diagram: through an anonymizing cloud, in an encrypted conversation, Alice sends ("one", R1, R1sig), ("one", R2, R2sig), ("ten", R3, R3sig), content ID; the content provider returns K.]

Slide 20: Variable Charging: Blind Decryption
• For an item costing 3 units, metadata would have 3 wrapped keys, K1, K2, K3, and content key is h(K1,K2,K3)
• Could also have different denomination content provider public keys, just like anonymous cash
• Metadata for something worth 12 units:
– "one": {K1}P1, "one": {K2}P1, "ten": {K3}P2
• Request to content provider:
– "one", B({K1}P1), "ten", B({K3}P2), "Alice"
• Can request the keys at different times

Slide 21: Variable Charging: Blind Decryption
• If Alice is nervous buying something worth more than 10 units, metadata could give the choice of unwrapping 12 individual keys or a 10 and 2 ones. Alice's choice
– Could unwrap 12 ones, content key is XOR of all of those, or unwrap 2 ones and 1 ten, and content key is also XOR of those 3.
– Content provider might provide discount for using larger denominations
• Note: the component keys for this content can be purchased at different times

Slide 22: Easy issue: Timing issue
• When something is first broadcast, it might be likely that someone asking for content at that time is buying that content
• So, provide the metadata well in advance

Slide 23: New topic: Additional Authorization
• Suppose you also have to prove "over 21"
• Several schemes, with slightly different properties
– authorization secrets used as encryption keys
– authorization secrets used as credentials
– different content provider public keys

Slide 24: Leaking of authorization secrets
• Obvious concern
• No matter how the secrets are used, what if they leak out?
• No harder to leak these, or to protect these, than content keys
• So we're assuming some sort of DRM, whether hardware or software
– Note: software DRM "can't" be secure, but it is widely deployed

Slide 25: Authorization secrets used as keys
• Metadata would contain
– ACL: "over 21", "US"
– {K}P (blind decryption), or content ID (anonymous cash)
• Alice has already (nonanonymously) obtained and saved K21 and KUS
• Content key would be h(K, K21, KUS)
• Somewhat bulky metadata with the OR of attributes, but everything is doable

Slide 26: Auth secrets as credentials
• Only works with anonymous cash scheme
• Metadata would contain
– ACL: "over 21", "US"
– content ID (anonymous cash)
• Alice has already (nonanonymously) obtained K21 and KUS
• Anonymous, encrypted request
– K21, KUS, content ID
– Content provider checks ACL to make sure all necessary authorizations are proven, returns K

Slide 27: Anonymous Cash Scheme: Purchasing content
[Diagram: through an anonymizing cloud, in an encrypted conversation, Alice sends cash, auth secrets, content ID; the content provider returns K.]

Slide 28: Complex policies
• Easy: ACL is part of metadata. Client figures out what is needed to satisfy it
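Slide 25's "authorization secrets used as keys" variant can be sketched as follows (our own toy key derivation, with SHA-256 standing in for the hash h):

```python
import hashlib

def derive_content_key(K: bytes, *auth_keys: bytes) -> bytes:
    """Content key = h(K, K21, KUS): requires every authorization secret."""
    return hashlib.sha256(b"|".join((K,) + auth_keys)).digest()

K = b"wrapped key from blind decryption or anonymous cash"
K21 = b"authorization secret for 'over 21'"
KUS = b"authorization secret for 'US'"

content_key = derive_content_key(K, K21, KUS)

# A user missing any required authorization secret derives a useless key.
assert derive_content_key(K, K21, b"guess") != content_key
assert derive_content_key(K, K21, KUS) == content_key
```

The content provider never learns which authorizations a given decryption request exercises; the ACL is enforced purely by whether the user holds the right secrets.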
• only relevant if ACL is complicated OR • Revocation of authorization secrets – In credentials scheme, easy to change secret periodically – With auth secrets as keys, you’d have to re-encrypt the data 29 Third possibility: ACL-dependent content-provider keys 30 ACL-dependent public keys Alice Content provider applies public key associated with that ACL “Alice”, “over 21 and not Lithuanian”, B({K}P) B(K) Note: conversation must be signed by Alice, plus have timestamp Content provider checks Alice profile to ensure she has attributes 31 ACL-dependent public keys; cash • When asking for a token, specify which key (e.g., “US citizen”) • When purchasing ACL-dependent content, use the relevant cash 32 Good, bad, and ugly of this variant • Good – No need for authorization secrets – No worry about authorization secrets getting shared – Revocation of Alice’s attributes very easy • Bad – Content provider knows ACL of the content Alice is asking for; could be very few possibilities – But could wrap content with more atomic attributes • Ugly (but not, with cute crypto) – Managing all these public keys 33 Unique ACL • Could be only one piece of content that has the ACL “plumbing license AND alum of NYU” • So instead, you could have two keys in the metadata, one wrapped with “plumbing license” public key, and one wrapped with “alum of NYU” public key, and content key is XOR of the two of them 34 How to do blindable, ACL- parameterized public keys • Use Diffie-Hellman keys – works with elliptic curves, but I’ll explain it with modular exponentiation, where it also works • All Alice knows for content provider’s key are the parameters “g” and “p” • Content provider just needs a single secret, let’s call it “S” 35 Content provider encrypts an item • Choose a random number “y” • Calculate gy mod p • hash S with the ACL, e.g., h(S, “(plumbing license AND alum of NYU) OR member ACM”) = x • Calculate gxy mod p • Content key is h(gxy mod p) 36 Alice wishes to purchase item • Metadata – 
ACL: "(plumbing license AND alum of NYU) OR member ACM"
– g^y mod p
• Unblinded: send all that metadata to the content provider, which derives "x" from the ACL and sends back g^xy mod p
• Blinded: choose z, calculate z^-1, raise g^y mod p to z, send the ACL and g^yz mod p 37

ACL-dependent public keys
Alice → Content provider: "Alice", "(plumbing license AND alum of NYU) OR member ACM", g^yz mod p
Content provider → Alice: g^xyz mod p
Note: conversation must be signed by Alice, plus have a timestamp. Content provider checks Alice's profile to ensure she has the attributes. Alice raises the reply to z^-1 to obtain g^xy mod p. 38

Note
• Reminiscent of "identity based encryption"
• But it's not: nobody but the content provider can know either the public or the private key 39

IBE (Identity Based Encryption)
• This works as well, where the content provider knows the domain secret, its "public key" is the domain parameter, and the ACL is the string 40

RSA
• This variant may work
• Would be nice to have a proof
• "Public key" is just the modulus "n"
• Public exponent is h(ACL string)
• "Private key" is the factorization of n 41

Most subtle wrinkle: sealed box
• Common deployment scenario: sealed box at customer premises, provided by the content provider
• Communication is between the box and the content provider
• Customer can monitor communication, talk to the box, intercept messages 42

Sealed box provided by content provider
(Figure: Alice's computer communicates with the content provider's box, which communicates with the content provider.) 43

Can Alice tell if the box is cheating and leaking privacy information?
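The blinded exchange of slides 36–38 can be sketched with ordinary modular exponentiation. This is an illustrative reconstruction, not the speaker's code: the prime, the generator, and the hash-to-exponent mapping `h_int` are arbitrary demo choices.

```python
import hashlib
import secrets
from math import gcd

# Demo parameters (illustrative only): a Mersenne prime and a small generator.
p = 2**127 - 1
g = 3

def h_int(*parts: bytes) -> int:
    """Hash to an exponent in [1, p-2] -- stands in for h(S, ACL)."""
    d = hashlib.sha256(b"|".join(parts)).digest()
    return int.from_bytes(d, "big") % (p - 1) or 1

# --- Content provider encrypts an item (slide 36) ---
S = b"provider master secret"
acl = b"(plumbing license AND alum of NYU) OR member ACM"
x = h_int(S, acl)                    # ACL-dependent private exponent
y = secrets.randbelow(p - 2) + 1     # per-item random
gy = pow(g, y, p)                    # g^y mod p, stored in the item metadata
content_key = hashlib.sha256(str(pow(gy, x, p)).encode()).digest()  # h(g^xy)

# --- Alice purchases, blinded (slides 37-38) ---
while True:                          # blinding factor must be invertible mod p-1
    z = secrets.randbelow(p - 2) + 2
    if gcd(z, p - 1) == 1:
        break
blinded = pow(gy, z, p)              # Alice sends the ACL and g^yz mod p
reply = pow(blinded, x, p)           # provider raises to x: g^xyz mod p
unblinded = pow(reply, pow(z, -1, p - 1), p)   # Alice raises to z^-1: g^xy mod p
alice_key = hashlib.sha256(str(unblinded).encode()).digest()

assert alice_key == content_key      # Alice gets the content key without learning x
```

The provider sees only g^yz, which the random z makes unlinkable to any particular item's g^y; Alice never learns x, so she cannot derive keys for ACLs she does not qualify for.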
• Anonymous cash scheme
– Absolutely not: communication between the box and the content provider must be encrypted with an end-to-end key
– Alice can't tell anything about the conversation
• Blind decryption
– Looks sort of promising
– Box says, in cleartext, "timestamp, B(metadata)" 44

But box can cheat
• For instance, the blinding function could be purposely weak, so that the content provider can tell what content Alice is accessing
• No way for Alice to be able to detect this is going on
• But there's hope 45

Alice can add an extra level of blinding
(Figure: Alice's computer sits between the box and the content provider; she chooses a 2nd blinding function B2, turns the box's "Alice", B({K}P) into "Alice", B2(B({K}P)) on the way out, and turns the reply B2(B(K)) back into B(K).)
Note: Alice can't cheat and get access to the content keys. The box can't cheat and tell the content provider what key is being decrypted. 46

But it doesn't quite work
• Problem: there needs to be end-to-end authentication between the box and the content provider, because the content provider wants it to be "impossible" to get content keys out of boxes.
• So Alice can't modify messages between the box and the content provider. 47

Solution: Alice can tell the box to add her chosen 2nd level of blinding
(Figure: Alice chooses the 2nd blinding function B2 and supplies it to the box; the box adds B2, sends "Alice", B2(B({K}P)) to the content provider, and the reply B2(B(K)) is seen by both the box and Alice.)
Note: Alice can't cheat and get access to the content keys. The box can't cheat and tell the content provider what key is being decrypted. 48

Summary
• Two basic schemes
– anonymous cash
– blind decryption
• Wrinkles
– variable costs
– supporting arbitrarily complicated ACLs
– allowing Alice to cooperate with the box to preclude the covert channel 49

NIST Hash Competition: Where we are and what we're learning
Bill Burr, NIST, April 14, 2010

Cryptographic Hash Function
• Hash functions take a variable-length message x and reduce it to a shorter fixed-length message digest hash(x).
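The digest-as-stand-in idea is easy to illustrate; the message below is an arbitrary demo value.

```python
import hashlib

# A long message: a signer operates on its fixed-length digest, not the
# whole message, and any verifier recomputes the digest and compares.
contract = b"Party A pays Party B $1,000 on 2010-05-01. " * 100
digest = hashlib.sha256(contract).hexdigest()

assert hashlib.sha256(contract).hexdigest() == digest

# Changing a single byte yields an unrelated digest, which is why the
# 32-byte digest can safely stand in for the multi-kilobyte message.
tampered = contract[:-1] + b"!"
assert hashlib.sha256(tampered).hexdigest() != digest
```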
• Core requirement: use hash(x) as a stand-in for x in digital signatures, MACs, file comparisons, etc.
• Many applications: the "Swiss army knives" of crypto:
– Digital signatures (with public key algorithms)
– Random number generation
– Key update and derivation
– One-way function
– Message authentication codes & user authentication (with a secret key)
– Code recognition (list the hashes of good programs or malware)
– Commitment schemes and random oracles

Hash Function Properties
• Collision Resistance
– Hard to find x1 and x2, x1 ≠ x2, such that hash(x1) = hash(x2)
• By the birthday paradox, for an n-bit hash, it should take 2^(n/2) hashes to find a collision
– Harder to get collision resistance than we knew before 2004
• Xiaoyun Wang: differential collision attacks on SHA-1
• Attacker controls too much (no secret key)
• Preimage Resistance
– Roughly means the hash is "one-way." That is, given y, hard to find x such that y = hash(x).
• For an n-bit hash, should take 2^n hashes to invert
• Second Preimage Resistance
– Given x1, hard to find x2 ≠ x1 such that hash(x1) = hash(x2).

Hash Function Standards
• MD5: 128 bits, badly broken in 2004
– Never was a FIPS
• SHA-0: 160 bits, designed by NSA, 1990
– FIPS 180; quickly withdrawn, publicly broken in 1998
• SHA-1: 160 bits, tweak to SHA-0, 1995
– FIPS 180-1; Wang attack in 2005
• Now down to an estimated 2^52 work factor
• SHA-2: designed by NSA
– 224, 256, 384 & 512-bit variants, FIPS 180-3
– MD5, SHA-0 and SHA-1 are all in one design family, all broken. SHA-2 is a descendent: is it next?
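The 2^(n/2) birthday bound above can be observed directly on a deliberately truncated hash; the 24-bit size is an arbitrary toy choice, so a collision appears after roughly 2^12 ≈ 4,000 trials rather than 2^128.

```python
import hashlib

def h(data: bytes, bits: int = 24) -> int:
    """SHA-256 truncated to `bits` bits -- a toy n-bit hash for the demo."""
    full = int.from_bytes(hashlib.sha256(data).digest(), "big")
    return full >> (256 - bits)

# Hash distinct messages until two of them collide on the 24-bit output.
seen = {}
i = 0
while True:
    m = i.to_bytes(8, "big")
    d = h(m)
    if d in seen:
        m1, m2 = seen[d], m
        break
    seen[d] = m
    i += 1

assert m1 != m2 and h(m1) == h(m2)
print(f"collision after {i + 1} hashes; 2^(24/2) = {2**12}")
```

Finding a *preimage* of a given 24-bit value would instead take about 2^24 trials, which is the gap between the 2^(n/2) and 2^n figures on the slide.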
• Takes a long time to introduce a new hash function into use

SHA-3 Hash Competition
• Motivated by collision attacks on commonly used hash algorithms, particularly MD5 & SHA-1
– No actual collisions yet announced on SHA-1
• SHA-1 collision work factor may be as low as ≈ 2^52 operations
– McDonald, Hawkes and Pieprzyk, Feb 09
• Held 2 hash function workshops in 2005 & 2006
• Proposed criteria for new hash function Jan 2007
– Many comments received
• "SHA-3" Competition announced Nov. 2, 2007

SHA-3 Competition Timeline
01/23/07 Draft submission criteria published
11/02/07 Federal Register announcement of SHA-3 Competition
08/31/08 Preliminary submissions due
10/31/08 Submissions due; 64 received
12/09/08 Announced 51 first round candidates
02/25/09 First SHA-3 Candidate Conference, Leuven, Belgium
07/24/09 Announced 14 second round candidates
09/15/09 Tweaks accepted, second round began
08/23/10–08/24/10 Second SHA-3 Candidate Conference, UCSB
4Q10 Announce finalist candidates
1Q11 Final tweaks of candidates
1Q12 Last SHA-3 Candidate Conference
2Q12 Announce winner
4Q12 FIPS package to Secretary of Commerce

SHA-3 Participation
• SHA-3 Zoo (ECRYPT II / Graz University)
– Independent summary of cryptanalytic results
• eBash (ECRYPT Benchmarking of All Submitted Hashes)
– Systematic benchmarking of hash functions on many platforms
• ATHENa (Automated Tool for Hardware EvaluatioN)
– Inspired by eBash, but for hardware
– George Mason Univ., Virginia Tech., Univ.
Illinois Chicago
– ARRA Grant

The SHA-3 Candidates
• Very international field, & some are hard to tie to a particular country
• US, Canada, China, Singapore, Japan, Korea, Argentina, India, Switzerland, Macedonia, Turkey, Israel, Belgium, France, Norway, Luxembourg, and a number of "pan-European" submissions
– Now down to 14 second round candidates
• 3 are (mainly) from the US
• 1 each from Singapore, Japan, Israel & Turkey
• The rest are mainly Western Europe (some are hard to attribute to one country)

Second Round SHA-3 Candidates
• BLAKE – Swiss, HAIFA
• JH – Singapore, novel construction
• Blue Midnight Wish – Norway, WideP MD
• Keccak – European, Sponge
• CubeHash – US, Sponge variant
• LUFFA – Japan, Sponge variant
• ECHO – France, HAIFA
• SHABAL – France, WideP MD
• Fugue – US, Sponge variant
• SHAvite-3 – Israel, HAIFA
• Grøstl – European, WideP MD
• SIMD – France, WideP MD
• HAMSI – Turkey, MD
• SKEIN – US, WideP MD (more or less)

Merkle-Damgard Chaining Mode
(Figure: the padded message, with length appended, is split into blocks M1 … Mk; a fixed IV and h-bit chaining values feed repeated calls of the compression function F, yielding the message digest.)

Davies-Meyer Compression Function
• Compression function from a block cipher
– Message block is the key
– Old chaining value is the plaintext
– New chaining value is the ciphertext XOR the old chaining value
(Figure: Mi is the key and Hi-1 the plaintext into E; Hi = E(Mi, Hi-1) XOR Hi-1.)

Merkle-Damgard Hash
• Figure illustrates an MD hash with a Davies-Meyer compression function
(Figure: padded message blocks M1 … Mk key successive calls of E; a fixed IV and h-bit chaining values produce the message digest.)

Wide Pipe Merkle-Damgard Hash
• Chaining variable bigger than the hash output
– Greater preimage resistance for big messages
– Prevents extension attacks
(Figure: (h+b)-bit chaining values pass through F; a finalize step truncates to the h-bit message digest.)

HAIFA – HAsh Iterative FrAmework
• Biham & Dunkelman
• Incorporates a salt & a bit count in each compression function call
• SHAvite-3, ECHO & BLAKE are HAIFA
(Figure: as in MD, but each call of the compression function F also takes the salt and a running message count.)

SKEIN – UBI Mode
• Threefish "tweakable" wide block cipher
• Matyas-Meyer-Oseas mode variant
• Message is the plaintext, chaining variable
is the key
• Wide-pipe MD, more or less
(Figure: padded message blocks are fed through chained Threefish calls starting from 0, then an output transform; each call's 128-bit tweak carries the running length, first/final flags, and a type field (msg or out).)

Grøstl
• Wide-pipe MD, truncated in the output transformation Ω
• P & Q are nearly identical fixed permutations
• P & Q use the AES s-boxes
(Figure: each padded message block M1 … Mk passes through Q, the chaining value and block through P, XORed into the chain; Ω produces H(m).)

Keccak – Sponge Model
• Compression function is a fixed permutation
• XOR the message into part of the chaining variable
• Think of the sponge as a stream cipher: absorb the message, then squeeze out the keystream (the message digest)
(Figure: state split into an r-bit rate and a c-bit capacity; message blocks are XORed in during the absorbing phase, and output blocks z0, z1, z2, … are read out during the squeezing phase, with the permutation f between steps.)

Luffa – Modified Sponge
• Separate linear message injection function (MI), then 3 to 5 non-linear fixed 256-bit permutations
(Figure: blocks M1 … Mn pass through MI into the parallel permutations Q0 … Qw-1; their outputs are XORed to produce Z0, … during squeezing.)

Questions and Issues
• Performance issues
– 32 vs 64-bit, low end vs high end, hardware vs software, parallelism (SIMD & MIMD), long messages vs short
• How important are proofs?
• Primitive reuse
– AES s-boxes, AES round function, Threefish wide block cipher, cha-cha stream cipher round
• Does any property above 2^256 matter?
– Time + memory? Greater of time or memory?

Some Comparisons
• Big biters vs lil' nibblers
– Nibblers (64 bits or less): CubeHash, Hamsi, Fugue
– Biters (256–1024 bits): all the rest & CubeHash
• Block cipher vs fixed permutation
– Block cipher: Skein, SHAvite-3, BLAKE, SIMD, Shabal, BMW?
– Fixed permutation: CubeHash, Luffa, Keccak, Grøstl, ECHO, Hamsi, Fugue, JH
• Unusual chaining mode: JH, Shabal
• One engine fits all sizes: Keccak, CubeHash, JH

Confusion & Diffusion
• Since Shannon we've talked about "diffusion" and "confusion" in cryptography
• Roughly speaking
– Diffusion means that one input bit affects the value of every output bit
• Typically get this from XOR & rotation
– Confusion means that the relationship between inputs and outputs is algebraically complex ("non-linear") so we can't solve the equations

Non-linearity
• The compression function needs non-linearity or it would be easy to invert
– Some submissions get non-linearity from S-boxes (substitution tables), often the AES s-boxes
• SHAvite-3, Grøstl, Luffa, Fugue
• But table lookups are subject to cache timing effects resulting in side channels
– Do side channels matter for hash functions?
– Other submissions use only operations
• AND, add, & multiply all contribute to nonlinearity
– Of course you can generate substitution boxes with logic or circuits, or implement logic circuits with tables
• May not be equally efficient or practical depending on the design and platform

Comparison: Non-linearity
                   S-box                   Logic
Mode               AES        Bit-slice    ARX                        Bitwise
Stream (sponge)    Fugue                   CubeHash
Block (sponge)     Luffa                                              Keccak
Wide MD            Grøstl     JH           BMW, Shabal, Skein, SIMD
Narrow MD                     HAMSI
HAIFA              SHAvite-3, ECHO         BLAKE

Basic Operations of SHA-3 Candidates
This chart is from a presentation by Kris Gaj & Jens-Peter Kaps of GMU.

Non-linearity: Skein (ARX)
• ARX: Add, Rotate, XOR
• The non-linearity in the Threefish block cipher comes from many repetitions of the simple MIX operation
(Figure: 512-bit plaintext; a subkey is injected every 4 rounds; each round applies four parallel 64-bit MIX operations followed by a word permutation, 72 rounds in all, yielding the 512-bit ciphertext.)

Non-linearity: Keccak (bitwise)
• All nonlinearity from one simple function, χ
– Only bitwise logic
(Figure: the χ function operates on the 1600-bit Keccak state, viewed as 5×5 slices of 64-bit lanes (words).)

Multiplicative Complexity
• We can
– view crypto functions as logic circuits
– make any circuit from AND and XOR
– represent a circuit as an equation over GF(2), where AND is multiplication (·) and XOR is addition (+)
– solve large systems of linear equations (only additions), but solving nonlinear equations (which also do multiplication) is much harder
• Cryptography needs a lot of nonlinearity to make solving the equations computationally complex.
• How do we measure nonlinearity?
– Multiplicative complexity is one answer

Multiplicative Complexity
• Can divide any circuit into linear and nonlinear components
• But determining the minimum number of AND gates needed is very hard for nontrivial cases (super-exponential)
• Find the lowest known upper bound
• Peralta (NIST) and Boyar (SDU) have developed better heuristics for minimizing multiplicative complexity

Multiplicative Complexity
• Peralta-Boyar heuristics
– Better logic synthesis?
• NIST/SDU patent application
• Application to the AES s-boxes resulted in the simplest known circuit
• Significantly speeded up 2 SHA-3 candidates
• Plan to apply heuristics to SHA-3 finalists
– How many ANDs for each output bit?
• Alternative view for cryptanalysis
• Is anything just too simple?
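The AND-counting view can be made concrete with a tiny GF(2) wrapper. The `Bit` class is a hypothetical helper (not from the talk): XOR and NOT are linear and free, while every AND is counted as one multiplication. The row formula is Keccak's documented χ step, a[i] ^= (~a[i+1]) & a[i+2] along a row of 5 bits.

```python
class Bit:
    """A GF(2) value that counts multiplications (AND gates).

    XOR is GF(2) addition (linear, not counted); AND is GF(2)
    multiplication (nonlinear, counted); NOT is XOR with constant 1.
    """
    and_count = 0

    def __init__(self, v: int):
        self.v = v & 1

    def __xor__(self, other: "Bit") -> "Bit":
        return Bit(self.v ^ other.v)

    def __and__(self, other: "Bit") -> "Bit":
        Bit.and_count += 1            # one multiplication over GF(2)
        return Bit(self.v & other.v)

    def __invert__(self) -> "Bit":
        return Bit(self.v ^ 1)        # linear, free

def chi_row(a):
    # One row of Keccak's chi step: a[i] ^= (~a[i+1]) & a[i+2], indices mod 5.
    return [a[i] ^ ((~a[(i + 1) % 5]) & a[(i + 2) % 5]) for i in range(5)]

bits = [Bit(b) for b in (1, 0, 1, 1, 0)]
out = chi_row(bits)
print([b.v for b in out], "AND gates:", Bit.and_count)  # -> [0, 0, 1, 0, 0] AND gates: 5
```

One AND per output bit is the kind of per-bit multiplicative-complexity figure the slide asks about ("How many ANDs for each output bit?"), and it is also why χ is considered an unusually lightweight source of nonlinearity.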
– Speed improvements

Links
– NIST Hash competition page: http://www.nist.gov/hash-competition
– eBASH benchmarking of hash functions: http://bench.cr.yp.to/ebash.html
– Wikipedia: http://en.wikipedia.org/wiki/SHA-3
– ECRYPT II – The SHA-3 Zoo: http://ehash.iaik.tugraz.at/wiki/The_SHA-3_Zoo
– Classification of the SHA-3 Candidates: http://eprint.iacr.org/2008/511.pdf
– SHA-3 Engineering Comparison (propaganda, but useful): http://www.skein-hash.info/sha3-engineering
– ATHENa – Automated Tool for Hardware EvaluatioN: http://cryptography.gmu.edu/athena/

Biometrics-Based Identifiers for Digital Identity Management

Abhilasha Bhargav-Spantzel, Intel Corporation, 2191 Laurelwood Avenue, Santa Clara, CA 95054 (abhilasha.bhargav-spantzel@intel.com)
Anna Squicciarini, College of Information Sciences and Technology, Pennsylvania State University, University Park, PA 16802-6823 (asquicciarini@ist.psu.edu)
Elisa Bertino, Department of Computer Science, CERIAS, Purdue University, West Lafayette, IN 47906 (bertino@cs.purdue.edu)
Xiangwei Kong, Information Security Research Center, Dalian University of Technology, Liaoning Province 116023 (kongxw@dlut.edu.cn)
Weike Zhang, Patent Examination Collaboration Center, No. 18 South Fourth Street, Zhongguancun, Haidian District, Beijing 100190 (zhangweike@sipo.gov.cn)

General Terms: Algorithms, Security, Experimentation, Human Factors
Keywords: Security, Privacy, Biometrics, Multi-factor Authentication, Identity, Cryptography

ABSTRACT
We present algorithms to reliably generate biometric identifiers from a user's biometric image, which in turn are used for identity verification, possibly in conjunction with cryptographic keys. The biometric identifier generation algorithms employ image hashing functions using singular value decomposition and support vector classification techniques. Our algorithms capture generic biometric features that ensure unique and repeatable biometric identifiers.
We provide an empirical evaluation of our techniques using 2569 images of 488 different individuals for three types of biometric images, namely fingerprint, iris and face. Based on the biometric type and the classification models, as a result of the empirical evaluation we can generate biometric identifiers ranging from 64 bits up to 214 bits. We provide an example use of the biometric identifiers in privacy-preserving multi-factor identity verification based on zero-knowledge proofs. Therefore several identity verification factors, including various traditional identity attributes, can be used in conjunction with one or more biometrics of the individual to provide strong identity verification. We also ensure security and privacy of the biometric data. More specifically, we analyze several attack scenarios. We assure privacy of the biometric using the one-way hashing property, in that no information about the original biometric image is revealed from the biometric identifier.

1. INTRODUCTION
To support online activities, such as commerce, healthcare, entertainment and scientific collaboration, it is crucial to be able to verify and protect the digital identity of the individuals involved. Misuse of identity information can result in identity theft, that is, the act of impersonating another's identity by presenting stolen identifiers or proofs of identities. Identity theft has been receiving increasing attention because of its high financial and social costs. An approach that can help in protecting from identity theft is the privacy-preserving multi-factor verification of identity¹. Such a verification requires an individual to prove his/her identity by proving the knowledge of several identity attributes (also called identifiers). When talking about identifiers, we distinguish between weak and strong identifiers. A strong identifier uniquely identifies an individual in a population, whereas a weak identifier can be applied to
many individuals in a population. The number and types of strong identifiers used in verification should not be fixed a-priori and each Categories and Subject Descriptors party interested in verifying the identity of an individual should be K.6.5 [Management of Computing and Information Systems]: able to require any combination of such identifiers [3]. Biometric Security and protection; E.3 [Data Encryption] data represent an important class of identity attributes. To fully re- alize their potential, identity verification protocols should be able to support the use of biometric data in combination with other digital identifiers, such as a social security number (SSN) or a credit card Permission to make digital or hard copies of all or part of this work for number (CCN). The privacy of the biometric data and other sen- personal or classroom use is granted without fee provided that copies are sitive identifiers should, however, be protected to mitigate attacks not made or distributed for profit or commercial advantage and that copies 1 bear this notice and the full citation on the first page. To copy otherwise, or Effective solutions to protect from identity theft require a com- republish, to post on servers or to redistribute to lists, requires prior specific bination of technical and non-technical measures. Our approach permission and/or a fee. represents one such measure which if used alone, however, may IDtrust ’10, April 13-15, 2010, Gaithersburg, MD not be sufficient to address all possible threats to the security and Copyright ! c 2010 ACM ISBN 978-1-60558-895-7/10/04... $10.00 privacy of identity information. 84 such as identity theft. By privacy of the biometric data we mean BID (small intra-class variation). Another main challenge is to en- that minimal information about the biometric is revealed during the sure the security and privacy of the biometric data. 
In particular, biometric verification process, and that this information cannot be it should not be possible to re-create the BID without the original reused in contexts outside a given biometric verification. biometrics and the final BID should not leak information about the original biometrics. There are additional challenges with respect to The use of biometric data in the context of identity attribute verifi- the protection of the BID from brute force attacks conducted by ex- cation poses several non trivial challenges because of the inherent ploiting meta-data stored at the client. As such several well-known features of the biometric data. In general, two subsequent read- solutions to the problem of BK generation have shown to be vul- ings of a given biometrics do not result in exactly the same biomet- nerable to this threat [35]. ric template2 . Therefore the matching against the stored template is probabilistic. Storing biometric templates in repositories along We develop an approach that does not need to use specific features with other personally identifiable information introduces security of the biometrics. We in fact use generic properties of biometric and privacy risks [16]. Those databases can be vulnerable to at- images that are shown to be suitable for multimodal biometric sys- tacks by insiders or external adversaries and may be searched or tems [45]. Multimodal biometric systems utilize more than one used for purposes other than the intended one. If the stored bio- physiological or behavioral characteristic for enrollment and ver- metric templates of an individual are compromised, there could be ification. This is an original contribution of our work as most of severe consequences for the individual because of the lack of revo- today’s approaches are designed for a specific biometrics and can- cation mechanisms for biometric templates. To overcome the short- not be trivially generalized to other biometrics. 
Additionally in the comings of server-based storage and matching, several efforts have current approach, we depend on cryptographic keys in combination been devoted to the development of techniques based on client side with the biometric data to preserve the privacy of the biometric dur- matching [26, 27]. Such an approach is convenient as it is relatively ing biometric verification. simple and cheap to build biometric verification systems supporting biometric storage at the client end able to support local matching. Nevertheless, systems of this type are not secure if the client de- Our Approach. The method for generating BIDs from biometric vice is compromised; therefore additional security mechanisms are measurements is characterized by two phases [38]. During the first needed. phase the biometric features are analyzed and used to compute a bit string representing these features. Such bit string should have Client side verification systems has lead to research on key genera- uniqueness and repeatability properties. The bit string is then used tion mechanisms that use biometrics [50, 48, 15, 26, 27, 58, 38]. A in the second phase to generate a unique BID with the help of some biometric key (BK for brevity) is never stored at any location and meta-data. If two instances of the bit strings are sufficiently similar, the key generation mechanisms should not allow the re-generation then the BID generated is the same. of the BK without the individuals’ real biometrics. Note that un- der those approaches the biometric template is stored; therefore the verification does not involve biometric matching and instead uses the BK. Current techniques, however, are not sufficient because of several unresolved challenges concerning BK generation [35]. In particular, most BK generation approaches [24] do not differentiate between the cryptographic keys, used in the BK generation process, and the specific information retrieved from the actual biometrics. 
Figure 1: Two main phases of the biometric key generation. For example in [24] the BK is a repeatable string derived from a user biometrics. The final BK is essentially a pre-defined crypto- In our approach, in Phase 1, a biometric hash vector is generated. graphic key which can only be derived from information stored by Such biometric hash vector is a bit string which represents the bio- the user and the users biometric information. As such the BK is metrics and is obtained from the biometrics through an image hash- never stored and cannot be derived without the users biometric in- ing algorithm based on Singular Value Decomposition (SVD) (see formation. Other approaches map biometric data into a unique and Figure 1). In Phase 2, a classifier model based on Support Vec- repeatable binary string [50, 48, 15, 26, 27, 58, 38]. Subsequently, tor Machines (SVM) is used to classify and rank the resulting bio- the binary string would be mapped to an encryption key known as metric hash vector. More specifically, the resulting biometric hash the BK by referring to a look-up table. In this work we focus on vector is classified to obtain a combination of classes which repre- the repeatable binary string, referred to as the biometric identifier sent the user’s unique and repeatable BID. The meta-data needed (BID), that is derived from the biometrics. to execute Phase 1 and 2 consists of the classifier model and the pseudorandom secrets involved in the hashing algorithm. The goal of this paper is to identify the biometric information nec- essary and sufficient to generate a BID, which can in turn be used to The final BID generated at the end of Phase 2 is used for multi- generate a BK or simply as conventional strong identifiers such as factor identity verification. Identity verification based on the use of SSN or CCN. To be used as strong identifiers, BIDs need to satisfy BIDs can be executed according to different strategies. 
For exam- two key properties, namely uniqueness and repeatability. Unique- ple the BID can be used as a password or as an attribute embed- ness of BID ensures that two different individuals do not gener- ded in a digital certificate. In our approach we focus on the use ate the same BID. If each individual is considered as a class in a of BIDs in the context of a privacy-preserving multi-factor cryp- given classifier model [22], then for uniqueness property to hold, tographic protocols for identity verification [3]. More specifically the BIDs should have large inter-class variation. Repeatability of such protocol is based on the notion of proof of identity which con- BID refers to the ability by an individual to re-generate his own sists of a cryptographic token bound to an individual, versus the 2 actual value of the individuals’ identity attribute. A proof is created The digital representation of a biometric is referred to as biometric template. so that only the individual to whom the proof is bound can properly 85 use it. Proofs of identity attributes are built using zero knowledge SVD based image hashing algorithm and the SVM classification proof of knowledge (ZKPK for brevity) techniques [6, 18]. Ef- algorithm. ficient mechanisms have been developed to prove the knowledge of multiple strong identifiers stored as cryptographic commitments 2.1 Preliminary Concepts using aggregated ZKPK protocols [3]. Singular Value Decomposition (SVD). SVD is a well known tech- In our approach the BID is used for identity verification based on nique for factorizing a m × n matrix into a diagonal form. As ZKPK. The BID is used together with a random secret r to generate proven by Golub and Loan [23], if A is a real m-by-n matrix, two a Pedersen commitment [9]. This commitment is used to construct orthogonal matrices exist: a ZKPK proof. This proof is sufficient for verification purposes as it corresponds to the biometrics enrolled in the system. The com- U = [u1 , . . . 
, um ] ∈ Rm×m V = [v1 , . . . , vn ] ∈ Rn×n mitment is enrolled with a party and can be used by any verifying party. The use of ZKPK proof enables us to support two-factor (i.e. such that the BID and the secret random r) verification. At the time of ver- U AV T = diag(σ1 , . . . , σp ) ∈ Rm×n p = min{m, n} ification the individual needs both to provide r and to reconstruct the BID, to prove knowledge of the value committed at enrollment. where V T is the transpose of matrix V and σ1 ≥ σ2 ≥ . . . ≥ To revoke a BID, the commitment corresponding to enrolled bio- σp ≥ 0. σi ’s, i = [1 . . . p], are the singular values of A, and metrics is added to a revocation list which is similar to certificate the vectors uj , j = [1 . . . m], and vk , k = [1 . . . n], are the jth revocation lists [25] in a public key infrastructure. In our approach, left singular vector and the kth right singular vector respectively. we consider the case where a revocation list consists of the biomet- σi (A) denotes the ith largest singular value of A. ric commitments which have been revoked. After a commitment has been published in the revocation list, the individual cannot do The singular values of a matrix A are unique. The singular values a proof of knowledge with that BID because it relies on a revoked σi ’s reflect the variations along the corresponding i singular vec- commitment. tors. It can be shown that computation of the right singular vectors and the singular values can be obtained by computing the eigenvec- tors and eigenvalues of the symmetric matrix M = AT A where Contributions. The key contributions of the paper are as follows. AT is the transpose matrix of A. First we present algorithms for reliable and secure generation of BIDs from different types of biometrics. We focus on techniques Support Vector Machines (SVM). SVM [22] is a classifier based that are suitable for fingerprints, irises and faces. Second, we pro- on statistical learning technique developed by Vapnik et al. [13]. 
It pose an approach for encoding BIDs into cryptographic biomet- aims at finding optimal hyperplanes to determine the boundaries ric commitments that are used in ZKPK at the time of verifica- with the maximal margin separation between every two classes tion. It follows from the zero-knowledge proof protocols that the while training the classifier model. Then additional data, which cryptographic proofs do not leak information except for the fact is not used during the training, is used as test data and can be clas- that the verifier learns that the prover verifies the proof. As such sified using the separate hyperplanes. the verifying party obtains no information about the characteris- tics of the real biometrics from the cryptographic proof. Therefore, Let {xi , yi }, i = [1, . . . , L], be a training data vector, where xi multi-factor verification techniques can use one or more biometrics is the data item and yi , yi ∈ {−1, +1} is a class label. Given an interoperably with one or more non-biometric features to achieve input vector x, SVM constructs a classifier of the form strong identity verification. Our protocols ensure that the privacy f (x) = Sign(ΣL i=1 αi yi K(xi , x) + b) of the biometrics is preserved as the final BID does not reveal any information about the original biometric image. We also present a where: αi , i = [1, . . . , L], is a non-negative Lagrange multiplier; detailed security analysis of the resulting biometric verification sys- each multiplier corresponds to an example from the training data; tem. We provide an empirical analysis of the biometric key gener- b is a bias constant; and K(·, ·) is a kernel function satisfying the ation for different types of biometrics in order to provide evidence conditions of Mercer’s theorem [53]. Some frequently used ker- of the correctness of the proposed algorithms. 
Finally, we briefly nel functions are the polynomial kernel K(xi , xj ) = (xi · xj + discuss several use scenarios for our techniques to identify relevant 1)d and the Gaussian Radial Basis Function (RBF) K(xi , xj ) = infrastructural and organizational requirements for the use of our 2 2 e−|xi −xj | /2γ . Note that there are several approaches adopting technique. SVM for classification problems with three or more classes as well. The rest of the paper is organized as follows. In Section 2 we in- SVM applies to classification of vectors, or uni-attribute time se- troduce the main algorithms for the BID generation. In Section 3 ries. To classify multi-attribute data, which are matrices rather we present the experimental results. In Section 4 we develop a than vectors, the multi-attribute data must be transformed into uni- comprehensive analysis of the proposed solution. In Section 5 we attribute data or vectors. We use the combination of the SVD tech- discuss related work. Finally in Section 6 we make some conclud- nique with SVM which has been explored by previous work [31, ing remarks and additional considerations concerning the use of our 37, 55]. SVD is used to reduce multi-attribute biometric data to approach. feature vectors. 2. BIOMETRIC KEY GENERATION ALGO- 2.2 SVD Image Hashing RITHMS In this section we describe the hashing mechanism used in Phase In this section we first introduce some preliminary concepts related 1 of BID generation. The techniques presented build on the basic to the techniques underlying our proposed solution. Then, we dis- image hashing process described in [30]. The main steps of the cuss the two core algorithms for the BID generation, that is, the algorithm (summarized in Figure 2) are as follows. 86 resulting intermediate hashes are also different with high probabil- ity [36]. 
2.2 SVD Image Hashing

In this section we describe the hashing mechanism used in Phase 1 of BID generation. The techniques presented build on the basic image hashing process described in [30]. The main steps of the algorithm (summarized in Figure 2) are as follows.

Figure 2: Key steps of the biometric image hashing algorithm.

Pre-processing. As a first step, the biometric image may be pre-processed so as to obtain a clear, well focused biometric image I. Pre-processing provides an effective region in a selected biometric image for subsequent feature extraction. We support three types of biometric data: face, iris and fingerprint.

For the specific case of fingerprint images, as a part of pre-processing the region of interest (ROI) is identified (see step 2 of Algorithm 1). The unique characteristics of the fingerprint are known to be around the core point or delta point [54]. The outside portion of a fingerprint is generally prone to small translations and is typically cropped out. Also, a larger area of the central portion of fingertip skin is in contact with the scanner surface as compared to the peripheries, giving a better image. The center is also better for liveness analysis, since data such as the rate of perspiration can be measured. The center region is also more robust to pressure dispersion as compared to the other regions. Importantly, as the experimental results show, it preserves enough information to identify the individuals. The procedure to determine the ROI corresponds to steps 6-15 of Algorithm 1 (see Figure 3). This ROI is then used as the image input for the rest of the algorithm (step 15 of Algorithm 1).

Figure 3: Fingerprint region of interest.

Feature Extraction. Once the image I of size n × n is finalized, the features are extracted based on a random region selection. The selection is executed by choosing p semi-global regions based on a pseudorandom (PR) generator that uses a secret key r. The obtained matrices corresponding to the selected sub-images (denoted by ρi) are then transformed under matrix invariant functions such as SVD.

The random partitioning of the image introduces unpredictability in the hash values and hence increases the security of the overall system. As long as these sub-images are sufficiently unpredictable, the resulting intermediate hashes are also different with high probability [36]. The squares ρi determined in steps 18-23 and used in the partitioning (see Figure 2) are deliberately chosen to be overlapping, to further reduce the vulnerability of the algorithm to malicious tampering. Note that an increased number of squares increases the pseudorandomness in the resulting hash value, and therefore helps in increasing security as explained in Section 4, assuming a secure pseudorandom number generator. As a further advantage, the random partitioning decreases the probability of collision and increases the robustness against noise that may be present in the biometric image. As reported in line 22 of Algorithm 1, the Ai, 1 ≤ i ≤ p, are matrices corresponding to the selected sub-image blocks. Each element of the matrix Ai corresponds to the 256-level grey value of a pixel of the selected sub-image. The encoding of the actual matrix used in the transformation is based on the fact that every element in the matrix has a grey value g, 0 ≤ g ≤ 255, a position v and a direction d. A single pixel may not have a direction, but for a group of pixels the grey value may change, hence defining a concrete direction. Grouping pixels is important, as isolated components may not be robust.

Transformation. Each sub-image Ai, 1 ≤ i ≤ p, is used to perform the SVD transformation. As a result, for each Ai a unitary reduction to the diagonal form is performed to obtain Ui, Si, Vi, 1 ≤ i ≤ p, such that Ai = Ui Si Vi^T. As such, the SVD selects the optimal basis vectors in the L2 norm³ sense such that, for any m × m real matrix A, we have

    (σk, uk, vk) = arg min_{a, x, y} | A − Σ_{l=1}^{k−1} σl ul vl^T − a x y^T |²_F

where: 1 ≤ k ≤ m; a ∈ R; x, y ∈ R^m; σ1 ≥ σ2 ≥ . . . ≥ σm are the singular values; {ui} and {vi} are the corresponding singular vectors; and (·)^T is the transpose operator [30]. By using the SVD we preserve both the magnitude of the important features, in the singular values, and their location geometry, in the singular vectors. The combination of the left-most and right-most singular vectors corresponding to the largest singular values, in turn, captures the important geometric features of an image in the L2 norm sense. Therefore, as a next step, for each Ai the first left singular vector ui and the first right singular vector vi are retrieved. Those vectors are then combined in Γ = {u1, . . . , up, v1, . . . , vp}.

The next step is to form a pseudorandom (based on pseudorandom numbers) smooth secondary image J from Γ. J is formed according to an iterative process, at each step of which an element from Γ is selected and added to J. As a first step, an element is pseudorandomly selected from Γ and set as the first column of J. Then, for the ith column of J, an element from Γ is selected such that it is closest to the (i − 1)th column of J in the L2 norm sense, as denoted in step 39 of Algorithm 1. An element can only be chosen once from Γ; therefore an element chosen at the ith step cannot have been chosen at any of the previous (i − 1) steps. Hence, after 2p steps all the elements of Γ are pseudo-randomly reordered to form the secondary image J of size m × 2p. Note that the secondary image is required to ensure the one-way property of the SVD image hashing algorithm (see the analysis in Section 4).

Once J is formed, SVD is re-applied to it to finally obtain the image hash vector (steps 49-52 of Algorithm 1). The left and right singular vectors are obtained by J = U_J S_J V_J^T. Then the singular vectors corresponding to the largest singular values, that is, the first left (u_J) and the first right (v_J), are chosen. These vectors are simply combined to obtain the final hash value H = {u_J, v_J}.

³ The L2 norm of a vector x = {x1, . . . , xn} is |x| = sqrt( Σ_{k=1}^{n} x_k² ).
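The two-stage hashing just described can be sketched as follows. This is a simplified illustration only: it uses NumPy's generic RNG in place of the keyed pseudorandom generator, ignores the overlap constraint on the ρi, and omits pre-processing.

```python
import numpy as np

def svd_image_hash(image, p=4, m=8, seed=0):
    """Simplified sketch of the two-stage SVD hashing (cf. Algorithm 1)."""
    rng = np.random.default_rng(seed)     # stand-in for the keyed PR generator
    n = image.shape[0]
    gamma = []                            # Γ: first singular vectors per block
    for _ in range(p):
        x, y = rng.integers(0, n - m, size=2)
        A = image[x:x + m, y:y + m]       # random m x m sub-image A_i
        U, S, Vt = np.linalg.svd(A)
        gamma.append(U[:, 0])             # first left singular vector u_i
        gamma.append(Vt[0, :])            # first right singular vector v_i
    # Greedy pseudorandom reordering of Γ into the secondary image J (m x 2p)
    first = int(rng.integers(len(gamma)))
    cols = [gamma[first]]
    remaining = [v for i, v in enumerate(gamma) if i != first]
    while remaining:
        prev = cols[-1]
        k = min(range(len(remaining)),
                key=lambda i: np.linalg.norm(prev - remaining[i]))
        cols.append(remaining.pop(k))     # closest remaining element (L2 sense)
    J = np.column_stack(cols)
    U, S, Vt = np.linalg.svd(J)           # second SVD
    return np.concatenate([U[:, 0], Vt[0, :]])   # H = {u_J, v_J}, length m + 2p

# Example: hash a synthetic 32 x 32 "image"
demo = svd_image_hash(np.arange(1024, dtype=float).reshape(32, 32), p=4, m=8, seed=0)
```

With the same key (here, the same seed) the hash is reproducible, and its length is m + 2p, matching the dimensions stated above.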
Algorithm 1 Generic Biometric Image Hashing Algorithm
Require: Biometric image I
Ensure: The quality of the image is suitable based on the biometric.
1: Input biometric image I
   {Pre-process fingerprint images to calculate ROI}
2: if (type(I) == 'fingerprint') then
3:   point1 = Algorithm_R92(I) {Compute core or delta point}
4:   size = 4 {Set fingerprint ROI threshold size}
5:   count = 0
6:   for each line i in orthogonal directions (N, S, E, W) do
7:     repeat
8:       increment length of line
9:       if line encounters a ridge then
10:        pointi = coordinate of intersection of line and ridge
11:        count++
12:      end if
13:    until (count != size)
14:  end for
15:  I = crop(point2, point3, point4, point5)
16: end if
17: Let the resultant image I ∈ R^{n×n} be of size n × n
   {Random Selection}
18: Let p be the number of rectangles
19: Let ρi be the ith rectangle and m be the height/width of ρi
20: for each i where 1 ≤ i ≤ p do
21:   Randomly position rectangle ρi at (xi, yi) such that xi + m < n and yi + m < n
22:   Let Ai be the "sub-image" formed by taking the portion of the image that is in ρi: Ai ∈ R^{m×m}, 1 ≤ i ≤ p
23: end for
24: {First SVD Transformation}
25: for each Ai where 1 ≤ i ≤ p do
26:   Ai = Ui Si Vi^T {Collect singular vectors corresponding to the largest singular value}
27:   ui = first left singular vector
28:   vi = first right singular vector
29: end for
30: Γ = {u1, . . . , up, v1, . . . , vp}
31: Initialize secondary image J[m, 2p] {Construct secondary image from singular vectors}
32: for all c where 1 ≤ c ≤ 2p do
33:   Initialize variable ec corresponding to an element in Γ
34:   if c = 1 then
35:     ec = PR_Select(Γ)
36:   else
37:     var_loop = true
38:     while var_loop do
39:       ec = min_{k=1..2p} ( Σ_{l=1}^{c−1} (J(l) − Γ(k))² )
40:       if not (ec already chosen for J) then
41:         var_loop = false
42:       end if
43:     end while
44:   end if
45:   for all r where 1 ≤ r ≤ m do
46:     J[r][c] = ec[r]
47:   end for
48: end for
   {Second SVD Transform}
49: J = U_J S_J V_J^T {Collect singular vectors corresponding to the largest singular value}
50: u_J = first left singular vector
51: v_J = first right singular vector
52: H = {u_J, v_J}
53: return Hash Value H

2.3 SVM Classification

As discussed in the previous section, from one input biometric sample a hash vector H = {u_J, v_J} of length m + 2p is obtained. Since the hash vectors obtained from different biometric samples of the same user may be the same or may differ from sample to sample, we train a classifier to determine which hash values correspond to a given user (or class), so that at the time of verification the classifier can identify the correct class of the user. To achieve this goal, several biometric samples of different users are taken. Algorithm 1 is run on each sample to get the corresponding hash vector.

These samples are then divided into training and test data to perform the classification. We use K-fold cross-validation to divide the training and testing data. All sample hash vectors are partitioned into K subsamples. Of the K subsamples, a single subsample is retained as the validation data for testing the model, and the remaining K − 1 subsamples are used as training data. The cross-validation process is then repeated K times (the folds), with each of the K subsamples used exactly once as the validation data. The K results from the folds are then averaged to produce a single estimation [2].

The obtained hash vectors do not greatly differ with respect to the Euclidean distance, as inferred through experimental analysis; therefore we use SVM techniques to map the input hash vectors onto a higher dimensional space where a maximal separating hyperplane can be constructed.

As explained in Section 2.1, the hyperplane constructed using SVM is such that it has the maximum distance to the closest points of the training set. These closest points in the training set are called support vectors. Here we use the Gaussian radial basis kernel function (RBF for brevity) K(Hi, Hj) = e^(−|Hi − Hj|² / 2γ²), where Hi and Hj are two of the training samples and γ > 0.

During training, two specific parameters have to be assessed, namely γ, used in the RBF kernel function, and the penalty parameter C, used in the evaluation of an optimal hyperplane balancing the tradeoff between error and margin. To select the pair with the best CV accuracy, all combinations of C and γ are tried using a grid search method [8]. After training, the SVM model encodes all the classes that this SVM classifier has been trained with.
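A minimal sketch of the grid search over (C, γ); the evaluator here is a hypothetical stand-in for "train the RBF-SVM and run K-fold cross-validation" (e.g., via LIBSVM), and the grid bounds are those reported in Section 3.1:

```python
import itertools
import math

def grid_search(evaluate_cv, Cs, gammas):
    # Return the (C, gamma) pair with the highest CV accuracy.
    # evaluate_cv(C, gamma) -> CV accuracy; supplied by the caller.
    return max(itertools.product(Cs, gammas), key=lambda cg: evaluate_cv(*cg))

# Grid from Section 3.1: C in {2^5, ..., 2^15}, gamma in {2^-5, ..., 2^3}
Cs = [2.0 ** k for k in range(5, 16)]
gammas = [2.0 ** k for k in range(-5, 4)]

# Hypothetical evaluator with a unique peak at C = 2^10, gamma = 2^-1
peaked = lambda C, g: -(math.log2(C) - 10) ** 2 - (math.log2(g) + 1) ** 2
best = grid_search(peaked, Cs, gammas)
```

Exhaustive search over the 11 × 9 grid is cheap compared to the SVM training runs it wraps, which is why grid search is the standard recommendation for RBF parameter selection.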
Note that an increased number of classes increases the number of choices for an attacker executing guessing attacks on the SVM model to guess the right BID. Additional classes can be added to the original SVM classifier model by training additional samples of the given biometrics. These samples have to be carefully added, as added classes which do not resemble the original biometric classes would most likely be easily ruled out by an attacker. We therefore employ a strategy to make the additional classes similar to the original set of classes. For each class in the SVM model we define a protector class which is similar to the original class, so that the cluster formed by the protector class is close to the original SVM class, and yet is different enough to be distinguished as a separate class.

There could be different ways of obtaining the protector classes. The first is to find biometric images of different individuals which look perceptually similar. The second possibility is to add noise to the original biometric image. For example, the face images could be modified to render naturally asymmetric features symmetric, or to change other specific aspects such as the size of face characteristics like the eyes, nose and so on. If there are n original classes, then we add a protector class for each, thus resulting in 2n classes. We also add other spurious classes which are not similar to the original biometric samples (as the protector classes are) but are of the same biometric type.

As a final step, a combination of the classes is chosen based on SVM ranking, which provides the class prediction confidence of the SVM classifier. More specifically, if n is the total number of classes, the final BID is the label of the class with the highest confidence together with an unordered combination of the top t = n/2 class labels, which are listed with decreasing confidence levels. For an attacker to guess the BID, given the SVM classes, the number of choices is n + C(n, t), resulting in a final number of bits of log2(n + C(n, t)), where C(n, t) denotes the binomial coefficient. Considering the FAR for the primary class, the final number of bits would be MIN[log2(n), −log2(FAR)] + log2(C(n, t)). We typically consider a total number of classes n > 69, which leads the number of choices to be > 2^64, thus making it computationally hard for the attacker to guess the right BID.

Figure 4: Plot of different values of the number of sub-images (p), the image size of sub-images (m), and the corresponding CV accuracy.
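The counting argument above can be checked numerically. A small sketch (helper names are ours; we read t = n/2 and C(n, t) as the binomial coefficient):

```python
import math

def bid_choices(n, t=None):
    # eta = n + C(n, t): the primary class plus an unordered set of t labels
    t = n // 2 if t is None else t
    return n + math.comb(n, t)

def bid_bits(n, far=None, t=None):
    t = n // 2 if t is None else t
    if far is None:
        return math.log2(bid_choices(n, t))
    # MIN[log2(n), -log2(FAR)] + log2(C(n, t)) when the FAR is accounted for
    return min(math.log2(n), -math.log2(far)) + math.log2(math.comb(n, t))
```

Under this reading, n = 69 already yields more than 2^64 choices, consistent with the hardness threshold stated above.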
3. EXPERIMENTS

In this section we summarize the experiments we conducted to assess the accuracy and robustness of our approach. We carried out extensive tests for different biometrics, to demonstrate that the relevant criteria required for the security, repeatability and uniqueness of the BID are met. All experiments were conducted on the Microsoft Windows XP Professional 2002 Service Pack 1 operating system, with an Intel(R) Pentium(R) 4 3.20 GHz processor and 512 MB of memory.

3.1 Dataset and Experimental Setup

We tested our hashing algorithm (Algorithm 1, Section 2.2) on fingerprint, iris and face data. Summary information about the data used and the obtained results is reported in Table 2. For fingerprints we used FVC [34] databases. The FVC dataset used consists of overall 324 fingerprint images of 59 individuals, collected using thermal sweeping and optical sensors. We also used 50 images of 10 individuals generated using the synthetic fingerprint generator SFinGe v3.0 [7]. Regarding the iris data, the UBIRIS iris database [44] was used, which consists of 1695 images of 339 individuals' eyes. Finally, for the face data we used the Yale Database of Faces [20], containing 100 images of 10 individuals, and the AT&T Database of Faces [1, 46], containing 400 images of 40 individuals. We evaluated our results using the SVM classification algorithm with K-fold cross-validation (CV). Based on the CV accuracy, the False Acceptance Rate (FAR) was calculated. The FRR is calculated as 1 − CV Accuracy, whereas the FAR is calculated as the number of false accepts divided by the number of tries.

The values used in the experiments for the key parameters of Algorithm 1 are reported in Table 1, where n is the size of the image in pixels, p is the number of sub-images, m is the size in pixels of each of the sub-images, and J is the secondary image.

Table 1: Parameter values for experiments on Algorithm 1.
Image type            | n   | p  | m  | J size   | H size
Fingerprint/Iris/Face | 128 | 50 | 30 | 30 × 100 | 130

To assess the optimal values for p and m, we ran experiments with various possible combinations of the values and used the one which provided the maximum accuracy. For example, for the fingerprint database FVC2004 DB3 B, the value of p was varied in [10, . . . , 100] and the value of m in [10, . . . , 100] (see Figure 4); the highest accuracy was found for p = 50 and m = 30.

The code implementing the various steps is written in MATLAB, and the rand() function of MATLAB is used as the pseudorandom function in steps 21 and 35 of Algorithm 1. The size of the secondary image J is 30 × 100, leading to u_J of size 30 × 1 and v_J of size 100 × 1, thus resulting in a hash vector H = {u_J, v_J} of 130 dimensions.

For the SVM classification we adopted the LIBSVM [8] package to process the hash vectors and build the final classifier model, using the RBF as the kernel function. Based on experimental analysis, C was set to the range {2^5, . . . , 2^15} and γ to {2^−5, . . . , 2^3}. All combinations of C and γ were tried using grid search to select the best CV accuracy based on the input data.

3.2 Experimental Results

We now discuss the results of the experimental evaluation of our approach. First, regarding the time performance, on average the hash vector for any given image is generated in 0.9597 seconds. The generation of the SVM model for about 220 persons' hash vectors takes 3 or 4 hours. At the testing stage, once the model is generated, it takes approximately 0.001 seconds to classify a test image.

Table 2: Summary of the experimental results of all biometric data types.
# | Biometric Type | Database Name            | Description                        | # Images | # Persons | CV Accuracy % | FRR % | FAR %
1 | Fingerprint    | FVC2004, DB3 B           | 300 × 480, thermal sweeping sensor | 54       | 9         | 92.59         | 7.41  | 9.26 × 10^−3
2 | Fingerprint    | FVC2004, DB3 A           | 300 × 480, thermal sweeping sensor | 150      | 30        | 97.33         | 2.67  | 9.21 × 10^−4
3 | Fingerprint    | FVC2004, DB2             | 328 × 364, optical sensor          | 120      | 20        | 85.83         | 14.17 | 7.46 × 10^−3
4 | Fingerprint    | SFinGe v3.0              | 288 × 384, synthetic generator     | 50       | 10        | 88            | 12    | 1.33 × 10^−2
5 | Iris           | UBIRIS.v1 Sessao 1       | 800 × 600, 24-bit color            | 1100     | 220       | 87.73         | 12.27 | 5.6 × 10^−4
6 | Iris           | UBIRIS.v1 Sessao 2       | 800 × 600, 24-bit color            | 595      | 119       | 97.65         | 2.35  | 1.99 × 10^−4
7 | Face           | The Yale Face Database B | 640 × 480, 8-bit gray scale        | 100      | 10        | 99            | 1     | 1.11 × 10^−3
8 | Face           | AT&T Database of Faces   | 92 × 112, 256-level gray scale     | 400      | 40        | 98.25         | 1.75  | 4.49 × 10^−4

Regarding the experimental results, the obtained results largely confirm the correctness of our algorithm: in each of the test cases, the cross-validation accuracy was above 85%. False acceptance rates were within the interval [1.99 × 10^−4, 1.33 × 10^−2], which translates into the assurance that the chances of accepting an incorrect biometric image are low. The worst observed FAR value is 1.33 × 10^−2, which interestingly is obtained for the images generated by the synthetic fingerprint generator, where the conditions for biometric generation were generally better controlled (e.g., there was no unexpected noise because of human interaction). Regarding FRR, the worst observed FRR value occurred in conjunction with the worst accuracy results, since the FRR is dependent on the accuracy (see previous section). The worst rate amounts to 14% (test case n. 3) and is still acceptable, as it is of the same order as that of similar biometric key generators [24]. Additional insights specific to the different types of tested biometrics are discussed in what follows.

Fingerprint. Two types of Fingerprint Verification Competition (FVC) databases [34], corresponding to two types of sensors, were used for the fingerprint biometric experiments. The sensors highly influence the quality of fingerprint images. We define the quality of the fingerprint image according to three criteria [28]: (i) high contrast between ridges and valleys, (ii) the image area foreground, and (iii) little scar or latency. As shown by the results, the CV accuracy is above 85% for each data set considered, which confirms the validity of our approach. A first important consideration suggested by the experimental results is that the algorithm performs better in the case of a large data set (as in test case n. 2 in Table 2), most likely because of the more accurate training and testing during the configuration phase, which helped in finding the optimal configuration parameters. We also notice that on average our algorithm performs better when using the thermal sensor than when using the optical sensor, because the thermal sensor captures better quality fingerprint images. We can explain this result by elaborating on how the quality is affected, in that the quality of the fingerprint image is influenced by several human factors such as skin humidity and pressure. If the skin humidity is lower, the image quality of the optical sensor degrades. The skin humidity does not affect the image quality of the thermal sensor because it is of the sweeping type. Moreover, regarding pressure, for the optical sensor the foreground image is smaller for low pressure, while the fingerprint is smeared for high pressure. This is again not true for the thermal sweeping sensor, where the image quality is not significantly affected.

Note that the last fingerprint data set was composed of artificially generated images. We experimented with synthetic fingerprint images as they potentially supply non-biased images and can be created at a low cost. It was difficult to control the randomness, which lowered the cross-validation classification accuracy to 88%. We believe the results could be improved using a synthetic generator version which generates several samples corresponding to a single individual, maintaining the invariant features of an individual across all samples.

Iris. We used the UBIRIS.v1 Sessao 1 (Session 1) and UBIRIS.v1 Sessao 2 (Session 2) [44, 43] iris databases. For the first image capture session, noise factors such as reflections, luminosity and contrast were minimized. In the second session the capture place was changed to introduce a natural luminosity factor. Images collected in the second session simulated the ones captured by a vision system without, or with minimal, active participation from the subjects, adding possible noise to the resultant images. Note that when capturing iris images, some pre-processing is performed. A sequence of images is obtained rather than a single image. Not all images in the input sequence are clear and sharp enough for recognition. The images may be out of focus, contain interlacing lines caused by eye motion, or have severe occlusions caused by eyelids and eyelashes. Therefore, only high quality images from an input sequence are included in the final database.

Face. We used two databases for these experiments. The first one collected good quality images, in that photos were taken with subjects in frontal pose. Thus the resulting cross-validation accuracy was 99%. The second set of tests was performed on images taken at different times, varying the lighting, facial expressions (open / closed eyes, smiling / not smiling) and facial details (glasses / no glasses). All the images were taken against a dark homogeneous background with the subjects in an upright, frontal position, with tolerance for some side movement. Despite this, the overall cross-validation accuracy for this database was 98.25%, although the false rejection rate increased by 0.75%.

Figure 5: J2 histogram of iris classification.
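As a quick consistency check on Table 2, the FRR column follows directly from the definition FRR = 1 − CV accuracy given in Section 3.1:

```python
def frr_percent(cv_accuracy_percent):
    # Section 3.1: FRR = 1 - CV accuracy (expressed here in percentage points)
    return 100.0 - cv_accuracy_percent

# (CV accuracy %, reported FRR %) pairs copied from Table 2, rows 1-8
table2 = [(92.59, 7.41), (97.33, 2.67), (85.83, 14.17), (88.0, 12.0),
          (87.73, 12.27), (97.65, 2.35), (99.0, 1.0), (98.25, 1.75)]
for cv, frr in table2:
    assert abs(frr_percent(cv) - frr) < 1e-9
```

All eight rows satisfy the relation exactly, so the FRR column carries no information beyond the CV accuracy; the FAR column, computed from false accepts over tries, is the independent measurement.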
Based on such results we analyze privacy aspects and Table 3: Summary of number of SVM classes and entropy. discuss how to prevent from possible attacks. → − it is computationally hard, given the BID hash vector H to recon- 4.1 Uniqueness and Repeatability struct the original biometric image. We prove this result by the A criterion frequently used for assessing uniqueness and repeatabil- following two theorems. First, we prove that it is hard to construct ity in classification is the J2 function [32]. The key idea of the J2 the secondary image from the vector, which is required for recon- function is to compare the within-class distance of the various hash structing the original biometrics. The result (Theorem 2) shows vectors (or elements being classified) belonging to a given class, that even if the second image is constructed or attacked, it is still with the between-class distance among the various classes. There hard to obtain the original biometric image I. Our results are based are two key steps to be taken while evaluating J2 . on the combination of mathematical properties of the SVD and the employed hashing technique. The first step is to evaluate the within-class scatter matrix Sw : Sw = ΣM i=1 Si Pi where M is the total number of classes; Si = E[(x − µi )(x − µi )T ] is the covariance matrix4 for a class denoted T HEOREM 1. Let − → u J and − →v J be the vectors which form the by wi where E is the expected value function, x is any vector in final hash value H(uJ , vJ ), and let λi be non-zero eigen values of class wi and µi is the mean vector of class wi ; and, Pi = ni /N the matrix J T J where J is the secondary image. If there is no λi where ni is the number of samples in class wi and N is the total that is dominant, then it is computationally hard to construct the number of samples in all the classes. secondary image from H(uJ , vJ ). 
The second step is to evaluate the between-class scatter matrix Sb :Sb = ΣM i=1 Pi (µi − µo )(µi − µo ) T where µo = ΣM i=1 Pi µi Because in our theoretical results the assumption that there is no is the global mean vector of all the classes. dominant eigenvalue is crucial, we have carried out extensive an experimental analysis on the biometric images to assess whether From the above a covariance matrix of feature vectors with respect such assumption holds. Our experimental results show that such to the global mean is evaluated as Sm = Sw + Sb . Finally the J2 assumption holds because of the smoothness of the secondary im- criterion is calculated as: J2 = |S m| |Sw | As it is evident from the equa- age. A proof sketch of the theorem is reported in Appendix A. tion, for good repeatability of correct classification (small within- class distance), and uniqueness (large between-class distance) the T HEOREM 2. Given the secondary image it is computationally value of J2 should be large. hard to obtain the original image I. We carried out additional experiments on all the datasets to estimate J2 and obtained average values of J2 for fingerprint as 1.2712 × Proof Sketch in Appendix A. 1081 , iris as 1.5242 × 10303 and face as 3.7389103 . These values of J2 and the corresponding classification accuracy (See Table 2) As a final remark we note that even if the attacker is able to retrieve provide empirical evidence that the algorithm satisfies the unique- the biometric image, it cannot reconstruct the hash vector without ness requirement on the biometric hashes generated based on the the knowledge of the secret random value needed during the selec- biometric datasets provided. tion of the p sub-images and to pseudorandomly combine them to form the secondary image J. For clarity, we provide an example of a J2 histogram for the Iris Session 1 database in Figure 5 (data corresponding to test case n. 5 in Table 2). 
Note that the J2 metric requires the calculation of 4.3 SVM Classes and BID Space within class and between class distances of all the possible pairs From the empirical analysis during the classification experiments of data elements. The y axis in the histogram presents the values provided in Section 3, we observe that if n is the number of classes, of log(J2) class distances between any two classes. For instance and these classes are listed in decreasing order of their confidence for a value (120(x-axis),100(y-axis)) means that there are 100 class level, the highest confidence class is the same and the unordered distances which have the J2 value of 120. If there are all together set of the following t classes where (n − 1) ≥ t ≥ n2 is the same |C| number of total classes then the possible permutations of the for the multiple testing rounds in the K-fold validation. In general, for most SVM classification experiments for all three biometrics, distances to be tested are |C|×|C−1| . 2 the ordering of several of the t classes was swapped with the neigh- boring classes. Therefore for the final label which denoted the final 4.2 Biometric Image Keyed Hashing BID value, we use the class with the highest confidence followed We analyze the one-way security property of the SVD based bio- by an unordered combination of the next t classes. For an attacker metric image hashing algorithm. More specifically, we show that to guess the right key based on# the $ classifier model, the number of 4 Covariance is the measure of how much two random variables choices would be η = n + nt , under the assumption that each vary together. A covariance matrix is a matrix of covariances be- class has the same likelihood. Based on the uniqueness analysis tween elements of a vector. 
from the J2 metric we observe that the samples considered have 91 large inter-class distances, thus avoiding centroid formations that confidentiality properties that the attacker cannot recreate the hash would narrow down the attacker’s number of choices. As part of values given the biometric image and also cannot link a BID to an future work, we plan to further investigate inference-based attacks actual individual. on the SVM model, which could potentially help the attacker make better guesses about the combination of classes used for generating 4.4.2 Security Analysis the BID. Security in our system is given by the difficulty of perpetrating im- personation attacks. As noted from the experiments n in our case ranges in the interval [69, 220]. Based on the value of n, the resulting η ranges in the We make two key assumptions in order to achieve a high-assurance interval [264 , 2214 ]. η is proportional to the number of bits needed BID generation. First, we assume that the sensor which captures to encode the BID. More precisely the number of bits, considering the biometric image is able to detect live images and does not leak the FAR# $for the primary class, is M IN [log2 (n), − log2 (F AR)] + the image or information about the image. Second, we assume that log 2 ( nt ). This results in the number of bits ranging in the interval the pseudorandom hashing secret used in Phase 1 is not compro- [64, 214]. A summary of the experimental data corresponding to mised. If at least one of the two assumptions holds, then the BID the biometric type, n, η and final number of bits of the BID is cannot be compromised, as elaborated further in the analysis below. provided in Table 3. We now focus on an attacker trying to impersonate a given user 4.4 Privacy and Security Analysis based on the BID and show how our approach withstands these We now analyze the relevant privacy and security properties of our types of attacks. 
We analyze the attackers’ options by considering technique, based on the above results. In addition we briefly an- each of the secrets involved in the system. alyze how our commitment technique is employed in the multi- factor approach to identity verification. The various possible points of attack include (A) biometric image; (B) hashing secrets; (C) classifier model used in Phase 2 (see Fig- 4.4.1 Privacy Analysis ure 2); (D) BID and possibly additional secrets and components Privacy in our context includes the following properties: unlink- depending on other cryptographic components used. The secrets ability of the BID to the source biometric image, anonymity and of the system are the hashing secrets used in Phase 1 and the ran- confidentiality. dom commitment secret which is used together with the BID to create the cryptographic commitment. The classifier model is not Unlinkability: Unlinkability refers to the impossibility of linking assumed to be secret. Precisely, the classifier model can be re- the BIB with a source biometric image. This property holds in our vealed without jeopardizing the protocol security if the number of approach as a consequence of the irreversibility results of Theo- classes n is greater than 69. This is because n > 69 (69 is the mini- rems 1 and 2. The one-way nature of the BID generation process mum sample size used in our experiments) would make the number guarantees that there is no way to reconstruct the biometric image of possibilities greater than 264 thus ensuring computational hard- from the BID. ness. As described in Section 4.3, increasing the value of n by adding classes increases the keyspace; making it computationally Confidentiality: Confidentiality refers to keeping the biometrics hard for an attacker to perform a brute force attack. confidential throughout all the processing steps of the BID life- cycle. We protect confidentiality of the image as follows. 
Anonymity: Anonymity refers to the property that prevents an individual from being identifiable within a set of subjects [42]. Our approach also assures anonymity, provided that no other identifying information is used in combination with the BID ZKPK proofs needed for verification. The generated BID, in fact, does not reveal any unique physiological information about the user's identity, which is one of the key problems in typical matching-based biometric verification. Also, it follows from the unlinkability and confidentiality properties that the attacker cannot recreate the hash values given the biometric image and also cannot link a BID to an actual individual.

4.4.2 Security Analysis
Security in our system is given by the difficulty of perpetrating impersonation attacks. We make two key assumptions in order to achieve high-assurance BID generation. First, we assume that the sensor which captures the biometric image is able to detect live images and does not leak the image or information about the image. Second, we assume that the pseudorandom hashing secret used in Phase 1 is not compromised. If at least one of the two assumptions holds, then the BID cannot be compromised, as elaborated further in the analysis below.

We now focus on an attacker trying to impersonate a given user based on the BID and show how our approach withstands these types of attacks. We analyze the attacker's options by considering each of the secrets involved in the system.

The various possible points of attack include (A) the biometric image; (B) the hashing secrets; (C) the classifier model used in Phase 2 (see Figure 2); (D) the BID; and possibly additional secrets and components, depending on the other cryptographic components used. The secrets of the system are the hashing secrets used in Phase 1 and the random commitment secret, which is used together with the BID to create the cryptographic commitment. The classifier model is not assumed to be secret. Precisely, the classifier model can be revealed without jeopardizing the protocol security if the number of classes n is greater than 69. This is because n > 69 (69 is the minimum sample size used in our experiments) would make the number of possibilities greater than 2^64, thus ensuring computational hardness. As described in Section 4.3, increasing the value of n by adding classes increases the keyspace, making it computationally hard for an attacker to perform a brute-force attack.

Case | A | B | C | D | E | Attack Prevention Summary
  1  | × |   |   |   |   | BID cannot be created without the hashing secrets.
  2  | × | × |   |   |   | BID cannot be created without the classifier model.
  3  | × |   | × |   |   | The classifier model does not allow inference of the hashing secret needed to construct the BID.
  4  | × | × | × | × |   | The BID is compromised, but the commitment secret prevents the attacker from creating the ZKPK.
  5  |   |   |   | × |   | The BID is compromised, but the commitment secret prevents the attacker from creating the ZKPK; no other secrets are leaked.
  6  |   | × | × |   | × | All stored information is compromised, but the BID cannot be created without the biometric image.

Table 4: Possible security attacks [key: (A) biometric image; (B) hashing secrets; (C) classifier model; (D) BID; (E) commitment secret; ×: the value is known to the attacker].
To succeed in an impersonation attack the attacker needs to know all the secrets required to create the BK. In order to gather the other secrets, the attacker would have to pass the verification methods and compromise the system. Bypassing the cryptographic ZKPK protocol is computationally hard [18, 5]. Additionally, the cryptographic ZKPK protocol prevents replay attacks: the attacker cannot use the proofs created during a given biometric verification process in any other verification process. Table 4 provides a summary of the various cases in which one or more secrets are compromised, and reports the possible security implications. Cases 1, 2 and 3 address the cases in which the biometric image is known to the attacker, but not the meta-data, which includes the hashing secret and classifier model, nor the random secret in the BID commitment, which are stored by the user. Thus, in these cases the attacker is not able to generate the BID.
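The role of the random commitment secret can be illustrated with a Pedersen-style commitment in the spirit of [9]; this is a minimal sketch with toy group parameters that offer no real security, and it does not reproduce the paper's actual ZKPK construction:

```python
# Toy Pedersen-style commitment: commit = g^bid * h^r mod p.
# The modulus and generators below are illustrative choices only;
# real deployments use large prime-order groups with vetted parameters.
p = 2**61 - 1          # a Mersenne prime (toy modulus)
g, h = 3, 7            # assumed independent generators (toy choice)

def commit(bid: int, r: int) -> int:
    """Commit to `bid` under the random commitment secret `r`."""
    return (pow(g, bid, p) * pow(h, r, p)) % p

c = commit(bid=123456789, r=987654321)
# Without r, the commitment hides the BID; knowing c alone does not
# let an attacker produce the ZKPK, matching case 4 in Table 4.
assert c == commit(123456789, 987654321)   # same secrets -> same commitment
assert c != commit(123456789, 987654322)   # different r -> different value
```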
However, if the attacker knows the BID, then to perform successful verification it also needs the commitment secrets. This scenario is summarized by case 4. As noted earlier, the knowledge of the BID does not reveal any information about the biometric image or the secrets involved, as shown in case 5.

Finally, an interesting case is when the stored information, including the meta-data and the commitment secret, is compromised (case 6). In this case, the attacker's best choice as a source of information is the SVM model. However, as we show in Section 4.3, for a number of classes n > 69 the number of choices is greater than 2^64, which makes it computationally hard for the attacker to guess the right BID.

5. RELATED WORK
Biometrics-based key generation has been extensively investigated in the past years. As mentioned earlier, biometrics-based key generation is characterized by two stages. At the first stage certain biometric features are used to compute a bit string representing that biometrics. The bit string is then used in the second stage to generate a unique cryptographic key with the help of stored meta-data. If two instances of the bit strings are sufficiently similar then the cryptographic key generated is the same. In most approaches, the second stage is independent of the biometrics being used, whereas the first is mostly biometric-specific.

The first approach to biometrics-based key generation is by Soutar et al. [50, 49, 48]. They developed methods for generating a repeatable cryptographic key from fingerprints using optical computing and image processing techniques. Following Soutar's work, several strategies have been proposed for improving the second stage of the key generation. Davida et al. [15] described a second-stage strategy using error correcting codes (ECC) and how it could be used with first-stage approaches for generating a bitstring representing iris scans [14]. The second-stage approach was significantly improved by Juels et al. [26, 27]. The underlying intuition behind the error correction and similar schemes can be understood based on Shamir's secret sharing scheme [47]. The hardness of Shamir's secret sharing scheme is based on the polynomial reconstruction problem, which is a special case of the Reed-Solomon list decoding problem [4]. In the fuzzy vault scheme proposed by Juels [27], based also on ECC, the user adds spurious chaff points which make it infeasible for an attacker to reconstruct the polynomial representing the BK.
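To make the polynomial-reconstruction intuition concrete, here is a minimal sketch of Shamir's scheme [47] over a prime field; the prime, threshold, and share count are illustrative toy choices, and `random` stands in for a cryptographic RNG:

```python
import random

PRIME = 2**127 - 1  # a Mersenne prime; any prime larger than the secret works

def shamir_split(secret: int, k: int, n: int, prime: int = PRIME):
    """Split `secret` into n shares; any k of them reconstruct it.
    The secret is the constant term of a random degree-(k-1) polynomial."""
    coeffs = [secret] + [random.randrange(prime) for _ in range(k - 1)]
    def f(x):
        return sum(c * pow(x, i, prime) for i, c in enumerate(coeffs)) % prime
    return [(x, f(x)) for x in range(1, n + 1)]

def shamir_join(shares, prime: int = PRIME) -> int:
    """Lagrange interpolation at x = 0 recovers the constant term."""
    secret = 0
    for xi, yi in shares:
        num = den = 1
        for xj, _ in shares:
            if xj != xi:
                num = num * (-xj) % prime
                den = den * (xi - xj) % prime
        secret = (secret + yi * num * pow(den, prime - 2, prime)) % prime
    return secret

shares = shamir_split(secret=42, k=3, n=5)
assert shamir_join(shares[:3]) == 42   # any 3 shares suffice
assert shamir_join(shares[2:]) == 42
```

Fewer than k points leave the polynomial undetermined; the fuzzy vault hides the genuine points among chaff points so that only a matching biometric selects enough genuine points to interpolate.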
Since the introduction of the fuzzy vault scheme, several researchers have implemented it in practice [11, 57, 17, 10, 19, 51, 40]. In particular, the most recent work is by Nandakumar et al. [40], where the fuzzy vault implementation is based on the location of minutia points in a fingerprint. They generated 128-bit keys and obtained an accuracy rate of 91% for high quality images and 82.5% for medium quality images. The FRR was approximately 7%, which shows an improvement over several other implementations of this scheme (where the average FRR was 20-30%). From the experimental point of view, we generate 134-bit keys with an accuracy of 94.96% for high quality images and 86.92% for medium quality images. The FRR was on average 9.06%, which is comparable to the above scheme. From the algorithmic point of view, we use a similar concept of chaff points, adding spurious classes to make it hard for the attacker to guess the correct final key. We do not use ECC to retrieve the final key, but plan to investigate how ECC can be used while finding a list of SVM classes uniquely ordered by the confidence measures (see Section 4.3). A major difference of our approach with respect to the stage-one approaches of the various implementations of the fuzzy vault is that their feature extraction is specific to the type of biometrics. Dependence on specific features has led to brute force attacks on several fuzzy vault implementations [35]. In our case, we instead use image analysis, which can be applied to several generic 2D biometric images such as fingerprint, iris and face.
Another scheme which makes use of the polynomial reconstruction problem in the second stage is the scheme proposed by Monrose et al., which was originally used for hardening passwords using keystroke data [39] and then extended for use in cryptographic key generation from voice [38]. Let us consider the case when m biometric features are recorded at stage one. When the system is initialized, the main key κ and 2m shares of κ are generated using a generalized secret sharing scheme. The shares are arranged within an m × 2 table such that κ can be reconstructed from any set of m shares consisting of one share from each row. The selection is based on the biometric features recorded. Monrose et al. show that it is computationally infeasible for an attacker to guess the right shares because of the random or spurious shares present in the table. We also add spurious classes in the SVM classification model to make it infeasible for the attacker to guess the BID. Moreover, the features they capture in stage one for keystrokes [39] are durations and latencies, whereas for voice [38] they are the cepstral coefficients. Their experimental evaluation shows on average about 20-30% FRR. This biometric encoding of voices is not comparable with ours, as we consider different biometrics which can be represented in 2D images.
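The m × 2 share table of Monrose et al. [39] can be illustrated with XOR-based shares; this is a toy stand-in for the generalized secret sharing scheme they use, and the feature bits and 64-bit share size are hypothetical:

```python
import secrets
from functools import reduce
from operator import xor

def build_table(key: int, feature_bits):
    """Build an m x 2 table: row i carries the genuine share in the column
    selected by feature bit i, and a spurious random value in the other cell."""
    m = len(feature_bits)
    shares = [secrets.randbits(64) for _ in range(m - 1)]
    shares.append(reduce(xor, shares, key))  # XOR of all m shares == key
    table = []
    for bit, share in zip(feature_bits, shares):
        row = [secrets.randbits(64), secrets.randbits(64)]
        row[bit] = share           # genuine share goes where the feature points
        table.append(row)
    return table

def reconstruct(table, feature_bits) -> int:
    """Pick one share per row, as dictated by the biometric features."""
    return reduce(xor, (row[b] for row, b in zip(table, feature_bits)), 0)

bits = [1, 0, 1, 1, 0, 0, 1, 0]          # hypothetical keystroke feature bits
table = build_table(0xC0FFEE, bits)
assert reconstruct(table, bits) == 0xC0FFEE
# A wrong selection XORs in spurious values and yields garbage with
# overwhelming probability, which is what makes guessing infeasible.
```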
Several of the techniques have been recently extended in the context of bio-hashing [33, 29, 12]. The approaches closest to ours are the bio-hashing techniques by Goh and Ngo [21, 41], who propose techniques to compute cryptographic keys from face bitmaps. Bio-hashing is defined as a transformation from representations which have a high number of dimensions and high uncertainty (for example, face bitmaps) to representations which have a low number of dimensions and zero uncertainty (the derived keys). Like our work, the goal of using the image hashing techniques is to extract bits from face images so that all similarly looking images will produce almost the same bit sequence. However, that work mainly focuses on the first stage of biometrics-based key generation and proposes the potential use of Shamir's secret sharing techniques [47] in the second stage. With respect to the first stage, Goh and Ngo use principal component analysis (PCA) for analyzing the images. This is similar to our use of SVD, as both SVD and PCA are common techniques for the analysis of multivariate data. There is a direct relation between PCA and SVD in the case in which the principal components are calculated from the covariance matrix. An important capability distinguishing SVD and related methods from PCA methods is the ability of SVD to detect weak signals or patterns in the data, which is important in our case as we propose to use our techniques for generic 2D biometric images. The methodologies we employ for stage one also differ in that the biometric hash vector output from stage one cannot be simply distinguished using a straightforward implementation of Hamming-distance-based analysis as proposed in [21, 41]. We instead combine stage one and stage two with the use of SVM classifiers in stage two, which provides a way to analyze properties such as the inter- and intra-class distances of the biometric hash vectors. We provide a detailed analysis of our approach which has not been developed in earlier bio-hashing work.

There are other biometric cryptosystems in which biometric authentication is completely decoupled from the key release mechanism. The biometric template is stored on the device and, when the biometric match happens, the cryptographic key is released [52]. This approach however has several vulnerabilities and is not related to our key generation approach.
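The PCA-SVD relation mentioned above can be checked numerically: on a mean-centered data matrix, the eigenvalues of the covariance matrix are the squared singular values scaled by 1/(n-1), and the principal components match the right singular vectors up to sign. A small numpy sketch with synthetic data:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))          # synthetic data: 100 samples, 5 features
Xc = X - X.mean(axis=0)                # mean-center, as PCA requires

# PCA route: eigen-decomposition of the covariance matrix
cov = Xc.T @ Xc / (len(Xc) - 1)
eigvals, eigvecs = np.linalg.eigh(cov)          # ascending eigenvalues

# SVD route: singular value decomposition of the centered data
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)   # descending singular values

# Covariance eigenvalues equal squared singular values / (n - 1)
assert np.allclose(np.sort(eigvals), np.sort(S**2 / (len(Xc) - 1)))

# Each principal component matches a right singular vector up to sign
for k in range(5):
    v, w = Vt[k], eigvecs[:, -1 - k]
    assert np.allclose(v, w) or np.allclose(v, -w)
```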
6. CONCLUSION
In this paper we have presented a novel approach for generating BIDs from 2D biometric images. These BIDs can be used together with other identity attributes in the context of multi-factor identity verification techniques. In the proposed approach the secure management of the BID's random secret is an important issue. To address this issue there are approaches that provide a secure and usable way to manage and store those random secrets. One such approach [56] uses cellular phones based on NFC (Near Field Communication) technology and allows users to store secrets on the phone as well as to split them among various phone components (including an external card) and also on an additional external device for increased security. From the user side, configuration is very easy in that the user has a menu with three security levels (low, medium, high) among which to choose. Each such level corresponds to a different splitting strategy. We refer the reader to [56] for more details.
In addition to the technical solution provided in the paper, we have also investigated organizational requirements based on the potential scenarios where our approach would most likely be used^5. In particular, the security of the initial enrollment is crucial for the overall process. We have developed cases in which enrollment has high assurance and is performed at controlled and secure enrollment points. By contrast, in a non-secure enrollment, additional verification steps are needed to attest the biometric key generation software and the storage medium used for storing the user secret keys. We have thus explored the possible media used to store the secrets and benchmarked them to identify the most suitable media. Similar considerations apply to the verification locations, which may be protected or unprotected. Such analysis has been instrumental for clarifying the relevant preconditions that need to be met to successfully apply our approach, and for identifying possible non-technical limitations.

We plan to further investigate possible attacks on the classification model to see if guessing attacks can reduce the entropy of the biometric samples considered. The η provided in Section 4 assumes that there are no guessing attacks, as the J2 value is high. However, there may be additional attacks, such as those discovered by Mihailescu [35] for fuzzy vault schemes, where the entropy of the scheme was significantly reduced as a result of the attacks.

^5 Details concerning the organizational requirements for our biometric verification protocols are reported in a technical report, which we are unable to cite because of the double-blind review requirements.
7. REFERENCES
[1] AT&T Database of Faces. http://www.cl.cam.ac.uk/research/dtg/attarchive/facedatabase.html.
[2] K-fold cross-validation. http://en.wikipedia.org/wiki/Cross-validation.
[3] A. Bhargav-Spantzel, A. C. Squicciarini, R. Xue, and E. Bertino. Practical identity theft prevention using aggregated proof of knowledge. Technical report, CS Department, 2006. CERIAS TR 2006-26.
[4] D. Bleichenbacher and P. Q. Nguyen. Noisy polynomial interpolation and noisy Chinese remaindering. Lecture Notes in Computer Science, 1807:53–77, 2000.
[5] J. Camenisch and A. Lysyanskaya. Efficient non-transferable anonymous multi-show credential system with optional anonymity revocation. In B. Pfitzmann, editor, Advances in Cryptology — EUROCRYPT 2001, volume 2045 of Lecture Notes in Computer Science, pages 93–118. Springer Verlag, 2001.
[6] J. Camenisch and A. Lysyanskaya. Signature schemes and anonymous credentials from bilinear maps. In Advances in Cryptology — CRYPTO '04, 2004.
[7] R. Cappelli. SFinGe: an approach to synthetic fingerprint generation. In International Workshop on Biometric Technologies (BT2004), pages 147–154, Calgary, Canada, June 2004.
[8] C.-C. Chang and C.-J. Lin. LIBSVM: a library for support vector machines, 2001. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.
[9] D. Chaum and T. P. Pedersen. Wallet databases with observers. In CRYPTO '92: Proceedings of the 12th Annual International Cryptology Conference on Advances in Cryptology, pages 89–105, London, UK, 1993. Springer-Verlag.
[10] Y. Chung, D. Moon, S. Lee, S. Jung, T. Kim, and D. Ahn. Automatic alignment of fingerprint features for fuzzy fingerprint vault. In Proceedings of Conference on Information Security and Cryptology, pages 358–369, Beijing, China, Dec. 2005.
[11] T. C. Clancy, N. Kiyavash, and D. J. Lin. Secure smartcard-based fingerprint authentication. In WBMA '03: Proceedings of the 2003 ACM SIGMM Workshop on Biometrics Methods and Applications, pages 45–52, New York, NY, USA, 2003. ACM Press.
[12] T. Connie, A. Teoh, M. Goh, and D. Ngo. PalmHashing: A novel approach for cancelable biometrics. Information Processing Letters, 93(1):1–5, 2005.
[13] C. Cortes and V. Vapnik. Support-vector networks. Machine Learning, 20(3):273–297, 1995.
[14] J. Daugman. Biometric personal identification system based on iris analysis. United States Patent, 1994.
[15] G. Davida, Y. Frankel, and B. Matt. The relation of error correction and cryptography to an offline biometric based identification scheme. In Proceedings of WCC99, Workshop on Coding and Cryptography, 1999.
[16] R. Dhamija and J. D. Tygar. The battle against phishing: Dynamic security skins. In SOUPS '05: Proceedings of the 2005 Symposium on Usable Privacy and Security, pages 77–88, New York, NY, USA, 2005. ACM Press.
[17] Y. C. Feng and P. C. Yuen. Protecting face biometric data on smartcard with Reed-Solomon code. In Proceedings of CVPR Workshop on Privacy Research In Vision, page 29, New York, USA, June 2006.
[18] U. Fiege, A. Fiat, and A. Shamir. Zero knowledge proofs of identity. In STOC '87: Proceedings of the Nineteenth Annual ACM Conference on Theory of Computing, pages 210–217, New York, NY, USA, 1987. ACM Press.
[19] M. Freire-Santos, J. Fierrez-Aguilar, and J. Ortega-Garcia. Cryptographic key generation using handwritten signature. In P. J. Flynn and S. Pankanti, editors, Proceedings of SPIE: Biometric Technology for Human Identification III, volume 6202, 2006.
[20] A. Georghiades, P. Belhumeur, and D. Kriegman. From few to many: Illumination cone models for face recognition under variable lighting and pose. IEEE Pattern Analysis and Machine Intelligence, 23(6):643–660, 2001.
[21] A. Goh and D. C. Ngo. Computation of cryptographic keys from face biometrics. In Communications and Multimedia Security, volume 2828 of LNCS, pages 1–13, 2003.
[22] K.-S. Goh, E. Chang, and K.-T. Cheng. Support vector machine pairwise classifiers with error reduction for image classification. In MULTIMEDIA '01: Proceedings of the 2001 ACM Workshops on Multimedia, pages 32–37, New York, NY, USA, 2001. ACM Press.
[23] G. H. Golub and C. F. V. Loan. Matrix Computations. Johns Hopkins University Press, Baltimore, Maryland, 1983.
[24] F. Hao, R. Anderson, and J. Daugman. Combining crypto with biometrics effectively. IEEE Transactions on Computers, 55(9):1081–1088, 2006.
[25] R. Housley, W. Polk, W. Ford, and D. Solo. Internet X.509 public key infrastructure certificate and certificate revocation list (CRL) profile, 2002.
[26] A. Juels and M. Wattenberg. A fuzzy commitment scheme. In ACM Conference on Computer and Communications Security, pages 28–36, 1999.
[27] A. Juels and M. Sudan. A fuzzy vault scheme. In Proceedings of IEEE International Symposium on Information Theory, 2002.
[28] H. Kang, B. Lee, H. Kim, D. Shin, and J. Kim. A study on performance evaluation of fingerprint sensors. In Audio- and Video-Based Biometric Person Authentication, pages 574–583, 2003.
[29] A. Kong, K.-H. Cheung, D. Zhang, M. Kamel, and J. You. An analysis of biohashing and its variants. Pattern Recognition, 39(7):1359–1368, 2006.
[30] S. S. Kozat, R. Venkatesan, and M. K. Mihcak. Robust perceptual image hashing via matrix invariants. In International Conference on Image Processing, pages 3443–3446, 2004.
[31] C. Li, L. Khan, and B. Prabhakaran. Real-time classification of variable length multi-attribute motions. Knowledge and Information Systems, 10(2):163–183, 2006.
[32] C.-C. Li and K. S. Fu. Machine-assisted pattern classification in medicine and biology. Annual Review of Biophysics and Bioengineering, 9:393–436, 1980.
[33] A. Lumini and L. Nanni. An improved biohashing for human authentication. Pattern Recognition, 40(3):1057–1065, 2007.
[34] D. Maio and D. Maltoni. FVC2004: third fingerprint verification competition. http://bias.csr.unibo.it/fvc2004/, 2004.
[35] P. Mihailescu. The fuzzy vault for fingerprints is vulnerable to brute force attack. Technical report, University of Göttingen, 2007.
[36] M. K. Mihçak and R. Venkatesan. New iterative geometric methods for robust perceptual image hashing. In DRM '01: Revised Papers from the ACM CCS-8 Workshop on Security and Privacy in Digital Rights Management, pages 13–21, London, UK, 2002. Springer-Verlag.
[37] X.-M. Tao, F.-R. Liu, and T.-X. Zhou. A novel approach to intrusion detection based on SVD and SVM. Industrial Electronics Society, 3(2–6):2028–2033, November 2004.
[38] F. Monrose, M. K. Reiter, Q. Li, and S. Wetzel. Cryptographic key generation from voice. In SP '01: Proceedings of the 2001 IEEE Symposium on Security and Privacy, page 202, Washington, DC, USA, 2001. IEEE Computer Society.
[39] F. Monrose, M. K. Reiter, and S. Wetzel. Password hardening based on keystroke dynamics. In CCS '99: Proceedings of the 6th ACM Conference on Computer and Communications Security, pages 73–82, New York, NY, USA, 1999. ACM Press.
[40] K. Nandakumar, A. K. Jain, and S. Pankanti. Fingerprint-based fuzzy vault: Implementation and performance. IEEE Transactions on Information Forensics and Security, 2007 (to appear).
[41] D. C. Ngo, A. B. Teoh, and A. Goh. Biometric hash: High-confidence face recognition. IEEE Transactions on Circuits and Systems for Video Technology, 16(6):771–775, June 2006.
[42] A. Pfitzmann and M. Köhntopp. Anonymity, unobservability, and pseudonymity - a proposal for terminology. Pages 1–9, 2001.
[43] H. Proença and L. A. Alexandre. UBIRIS: a noisy iris image database. In ICIAP 2005: International Conference on Image Analysis and Processing, volume 1, pages 970–977, 2005.
[44] H. Proença and L. A. Alexandre. Toward non-cooperative iris recognition: A classification approach using multiple signatures. IEEE Transactions on Pattern Analysis and Machine Intelligence, Special Issue on Biometrics, 9(4):607–612, July 2007.
[45] A. Ross, A. K. Jain, and J.-Z. Qian. Information fusion in biometrics. Pattern Recognition Letters, 24:2115–2125, September 2003.
[46] F. Samaria and A. Harter. Parameterisation of a stochastic model for human face identification. In IEEE Workshop on Applications of Computer Vision, Sarasota, Florida, December 1994.
[47] A. Shamir. How to share a secret. Communications of the ACM, 22(11):612–613, 1979.
[48] C. Soutar, D. Roberge, A. Stoianov, R. Gilroy, and B. V. Kumar. Biometric Encryption(TM): enrollment and verification procedures. In SPIE 98: Proceedings of Optical Pattern Recognition IX, volume 3386, pages 24–35, 1998.
[49] C. Soutar, D. Roberge, A. Stoianov, R. Gilroy, and B. V. Kumar. Biometric Encryption(TM) using image processing. In SPIE 98: Proceedings of Optical Security and Counterfeit Deterrence Techniques II, volume 3314, pages 178–188, 1998.
[50] C. Soutar and G. J. Tomko. Secure private key generation using a fingerprint. In Proceedings of Cardtech/Securetech Conference, volume 1, pages 245–252, May 1996.
[51] U. Uludag and A. Jain. Securing fingerprint template: Fuzzy vault with helper data. In CVPRW '06: Proceedings of the 2006 Conference on Computer Vision and Pattern Recognition Workshop, page 163, Washington, DC, USA, 2006. IEEE Computer Society.
[52] U. Uludag, S. Pankanti, S. Prabhakar, and A. Jain. Biometric cryptosystems: Issues and challenges. Proceedings of the IEEE, Special Issue on Enabling Security Technologies for Digital Rights Management, volume 92, 2004.
[53] V. N. Vapnik. The Nature of Statistical Learning Theory. Springer-Verlag New York, Inc., New York, NY, USA, 1995.
[54] S. Wang and Y. Wang. Fingerprint enhancement in the singular point area. IEEE Signal Processing Letters, 11(1):16–19, January 2004.
[55] Y. Wang, Y. Sun, M. Liu, P. Lv, and T. Wu. Automatic inspection of small components on loaded PCB based on SVD and SVM. In Mathematics of Data/Image Pattern Recognition, Compression, and Encryption with Applications IX, volume 6315 of SPIE Conference Proceedings, September 2006.
[56] J. Woo, A. Bhargav-Spantzel, A. Squicciarini, and E. Bertino. Verification of receipts from m-commerce transactions on NFC cellular phones. In 10th IEEE Conference on E-Commerce Technology (CEC 08), July 2008.
[57] S. Yang and I. Verbauwhede. Automatic secure fingerprint verification system based on fuzzy vault scheme. In ICASSP '05: Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, volume 5, pages 609–612, Philadelphia, USA, March 2005.
[58] W. Zhang, Y.-J. Chang, and T. Chen. Optimal thresholding for key generation based on biometrics. In ICIP '04: International Conference on Image Processing, pages 3451–3454, 2004.
APPENDIX
Proof. [Theorem 1] If only the final hash value is known to an adversary, then the first step is to approximate the secondary image J (see Figure 2). We prove the hardness by analyzing the following equation, which provides a possible approximation of the secondary image:

J = Σ_{i=1}^{r} √λ_i u_i v_i^T = √λ_1 u_J v_J^T + (√λ_2 u_2 v_2^T + √λ_3 u_3 v_3^T + ... + √λ_r u_r v_r^T)

where r = 2p; p is the number of sub-images created; and λ_i, 1 ≤ i ≤ r, are the non-zero eigenvalues of the matrix J^T J, such that λ_1 > λ_2 > ... > λ_r. Note that J^T is the transpose matrix of J, and a positive square root of λ_i is a singular value. The u_i's and v_i's, i = [1, ..., r], are eigenvectors of JJ^T and J^T J respectively. Since the final hash value [u_J, v_J] is known to the adversary, the values which need to be guessed are λ_1 and the bracketed sum √λ_2 u_2 v_2^T + √λ_3 u_3 v_3^T + ... + √λ_r u_r v_r^T. To guess the λ_i's there are infinitely many solutions, as any nonnegative eigenvalues can lead to specific eigenvectors that are unitary (i.e., satisfy the definition). Any eigenvalue matrix resulting from this construction will give a solution to the equation, and therefore it is computationally hard for the adversary to identify the original values.

If there is a case in which λ_1 is dominant, such that the rest of the values λ_2, ..., λ_r are approximately equal to zero, then one could try to guess λ_1 and possibly approximate the secondary image by J̇ = √λ_1 u_J v_J^T. It is not trivial to theoretically predict the possible distribution of the values of the λ_i's, because they depend on the type of image and on the distribution of the pixel values of those images. Therefore we conducted an experimental evaluation on the biometric images and found that the λ_i's are distributed such that there is no single dominant eigenvalue, because the secondary image J is a smooth image (i.e., the adjacent pixels of the image do not differ beyond a certain threshold, which is determined by the algorithm parameters). We conclude that, because of the hardness of guessing the eigenvalues and the lack of dominant eigenvalues, the reconstruction of the secondary image J from the resultant hash vector H is computationally hard for the biometric types considered. ∎
Proof Sketch. [Theorem 2] If J is known to the adversary, then the first step would be to form each sub-image matrix A_i, where 1 ≤ i ≤ p. Note that a combination of the eigenvectors of all the A_i was used to construct J. Each A_i is of the form A_i = U_i S_i V_i^T. As in the proof of Theorem 1, an infinite number of eigenvalues exist for constructing infinitely many A_i which would satisfy the relation. Moreover, using the same reasoning as before, there are no dominant eigenvalues, as the p sub-images, each of size m × m, are overlapping. Because of the overlap, the most significant eigenvalues do not differ beyond a certain threshold, as determined by the algorithm parameters p and m. In addition, the largest eigenvectors (i.e., the left-most and the right-most vectors of the U_i and V_i matrices respectively) of each sub-image A_i are pseudorandomly combined to form J, resulting in p! orderings that the attacker would need to try. This motivates the need for large values of p (∼ 50). As a result, guessing the order of each sub-image A_i, and hence creating the original image I, is computationally hard. ∎
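The decomposition used in the proof can be checked numerically: the squared singular values of J are the eigenvalues of J^T J, and J is exactly the sum Σ √λ_i u_i v_i^T. A numpy sketch using a random stand-in for the secondary image:

```python
import numpy as np

rng = np.random.default_rng(1)
J = rng.normal(size=(8, 8))                 # stand-in for the secondary image

U, S, Vt = np.linalg.svd(J)
eigvals = np.linalg.eigvalsh(J.T @ J)       # eigenvalues of J^T J, ascending

# sqrt(lambda_i) are the singular values sigma_i
assert np.allclose(np.sort(S), np.sqrt(np.clip(eigvals, 0, None)))

# J = sum_i sigma_i * u_i * v_i^T; knowing only (u_1, v_1), as the hash
# reveals, leaves every remaining term of the sum undetermined.
J_rebuilt = sum(S[i] * np.outer(U[:, i], Vt[i]) for i in range(8))
assert np.allclose(J, J_rebuilt)
```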
Biometrics Based Identifiers for Digital Identity Management
Abhilasha Bhargav-Spantzel, Anna Squicciarini, Elisa Bertino, Xiangwei Kong, Weike Zhang
IDTrust 2010, April 14th 2010

Outline
1 Identity Concepts Overview
2 Biometric Systems Overview
3 Biometric Commitments
4 Backup

Digital Identity
Digital identity: nyms, identity attributes or identifiers:
- strong identifiers (e.g. SSN)
- weak identifiers (e.g. age)
Owner of an identity attribute: the individual who is issued the identity attribute and is authoritative for making the claim.
Identity verification: the claimed attribute is owned by the individual and is valid.
Digital Identity (cont.)
Identity assurance and linkability.
Identity assurance: confidence about ownership validity.

Outline
1 Identity Concepts Overview
2 Biometric Systems Overview
3 Biometric Commitments
  - Our Approach
  - Main Techniques
  - Experiments and Results
  - Analysis
  - Related Work
4 Backup

Biometric Matching Based Systems
(figure)

Biometric Keys: General Idea
Generating cryptographic keys from biometric measurements:
Phase 1: biometric features → bit string. The bit string should have large inter-class variation and small intra-class variation.
Phase 2: bit string → (with metadata) unique key. If two instances of the bit strings are 'similar' then the key generated is the same.
Two Main Phases of the Biometric Key Generation

Biometric Hashing Process

Key Steps of Biometric Hashing Algorithm
1 Random selection of A_i from the biometric image
2 First SVD transform: A_i = U_i S_i V_i^T, 1 ≤ i ≤ p
3 Random selection of eigenvectors to create a secondary image J
4 Second SVD transform: J = U_J S_J V_J^T
5 Final hash vector: H = {u_J, v_J}

SVM Classification
The hash vectors are ranked based on confidence degrees from the SVM
Biometric key: highest-confidence class and top n/2 classes (total n classes)
Attacker choices for a brute-force search: C(n, n/2); for n > 69 the number of choices is > 2^64

Experimental Samples
Iris samples [1695 images - UBIRIS]
Fingerprint samples, thermal (left) and optical (right) sensors [324 images - FVC]
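The brute-force count on the SVM slide above can be checked with a few lines. The counting formula on the original slide is partially garbled; the assumption here is that an attacker must guess which n/2 of the n classes form the key, giving C(n, n/2) choices.

```python
from math import comb, log2

n = 69
# Assumed reading of the slide: the key is the top n/2 of n classes, so a
# brute-force attacker must enumerate n/2-subsets of the n classes.
choices = comb(n, n // 2)
assert log2(choices) > 64  # consistent with the slide's claim of > 2^64
```

For n = 69 this gives roughly 2^65.6 candidate subsets, which matches the 64-bit key size reported for n = 69 in the Biometric Key Analysis table.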
Yale face samples [100 images]; AT&T face samples [400 images]

Summary of Experimental Results
Type                   # Images  # Persons  CV Accuracy %  FAR %
Fingerprint (Thermal)  204       39         94.96          5.09 × 10^-3
Fingerprint (Optical)  120       20         85.83          7.46 × 10^-3
Iris                   1695      339        92.69          3.80 × 10^-4
Face (Yale)            100       10         99             1.11 × 10^-3
Face (AT&T)            400       40         98.25          4.49 × 10^-4

Uniqueness and Repeatability Analysis
The metric to measure uniqueness and repeatability is J2 = |Sm| / |Sw|, where Sm is the inter-class distance and Sw is the intra-class distance.
The average values of J2 calculated were as follows:
Fingerprint: 1.2712 × 10^81
Iris: 1.5242 × 10^303
Face: 3.7389 × 10^3

Biometric Key Analysis
Type         n    Spurious classes  η             # of BK bits
Fingerprint  69   -                 2.84 × 10^19  64
Fingerprint  139  69+1              2.36 × 10^40  134
Iris         220  -                 4.52 × 10^64  214
Iris         119  -                 2.43 × 10^34  114
Face         101  50+1              1.01 × 10^29  96

Biometric Verification System Analysis
Fuzzy Vault Scheme

Fuzzy Vault Scheme (cont.)
Fuzzy Vault Scheme - Shortcomings
Intra-class variability: rotation, translation, # minutiae points
'Helper data' reduces security
Increasing the degree of the polynomial increases complexity, but requires an increased number of minutiae points
Increasing the number of chaff points increases the complexity, but there is an empirical bound because of minutiae locations
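The fuzzy vault construction discussed above can be sketched in miniature. This is a toy "lock" step only (real vaults quantize minutiae and use Reed-Solomon-style decoding to unlock); the field size, seed, and parameters are placeholder choices, not values from the talk.

```python
import random

def lock_vault(secret_coeffs, minutiae, n_chaff, field=97, seed=7):
    # Secret polynomial over a small prime field; genuine minutiae become
    # points ON the polynomial, chaff points deliberately lie OFF it.
    poly = lambda x: sum(c * pow(x, i, field)
                         for i, c in enumerate(secret_coeffs)) % field
    genuine = [(x, poly(x)) for x in minutiae]
    rnd, chaff, used = random.Random(seed), [], set(minutiae)
    while len(chaff) < n_chaff:
        x, y = rnd.randrange(field), rnd.randrange(field)
        if x not in used and y != poly(x):
            used.add(x)
            chaff.append((x, y))
    vault = genuine + chaff
    rnd.shuffle(vault)  # hide which points are genuine
    return vault

vault = lock_vault([13, 5, 2], minutiae=[3, 11, 42, 60], n_chaff=20)
assert len(vault) == 24
```

A user presenting matching minutiae can pick out the genuine points and reinterpolate the degree-2 polynomial; an attacker must search subsets, which is why the shortcomings above (too few minutiae, an empirical bound on chaff) directly limit security.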
Related Work: BioHashing
Goh et al. perform bio-hashing based on principal component analysis (PCA), focusing only on the first phase of key generation.
Our approach:
couples phase one and phase two of key generation
analyzes inter- and intra-class variations
analyzes security and privacy of the biometric verification system

Thank you!
Abhilasha Bhargav-Spantzel, Intel Corporation
email: abhilasha.bhargav-spantzel@intel.com

Backup: Tools Used
Singular Value Decomposition (SVD)
If A is a real m-by-n matrix, then two orthogonal matrices exist, U = [u_1, ..., u_m] ∈ R^(m×m) and V = [v_1, ..., v_n] ∈ R^(n×n), such that
U^T A V = diag(σ_1, ..., σ_p) ∈ R^(m×n), p = min{m, n},
where σ_1 ≥ σ_2 ≥ ... ≥ σ_p ≥ 0. The σ_i are the singular values of A, and the vectors u_i and v_i are the ith left singular vector and the ith right singular vector, respectively.

Support Vector Machines (SVM)
SVM is a classifier based on a statistical learning technique developed by Vapnik et al. It aims at finding optimal hyperplanes that determine the class boundaries with the maximal margin of separation between every two classes. SVM applies to classification of vectors, or uni-attribute time series. To classify multi-attribute biometric image data, which are matrices rather than vectors, the multi-attribute data are transformed into uni-attribute data or vectors using SVD.
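The SVD definition above can be checked numerically; NumPy's `svd` returns the singular values in descending order, matching the σ_1 ≥ ... ≥ σ_p convention:

```python
import numpy as np

A = np.array([[3., 1.],
              [1., 3.],
              [0., 2.]])          # a real m-by-n matrix, m=3, n=2, p=2
U, s, Vt = np.linalg.svd(A)      # U is 3x3 orthogonal, Vt is V^T (2x2)

# U^T A V equals diag(sigma_1, sigma_2) padded to m-by-n, per the definition
Sigma = np.vstack([np.diag(s), np.zeros((1, 2))])
assert np.allclose(U.T @ A @ Vt.T, Sigma)
assert s[0] >= s[1] >= 0         # singular values are sorted and nonnegative
```

The same `U[:, 0]` / `Vt[0, :]` leading singular vectors are what the hashing algorithm collects at each SVD step.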
Key Steps of Biometric Hashing
Algorithm
1: Input biometric image I
2: for each random A_i, where 1 ≤ i ≤ p, do
3:   A_i = U_i S_i V_i^T {First SVD transform}
     {Collect singular vectors corresponding to the largest singular value}
4:   u_i = first left singular vector
5:   v_i = first right singular vector
6: end for
7: Γ = {u_1, ..., u_p, v_1, ..., v_p}
8: Randomly create J[m, 2p] from Γ
9: J = U_J S_J V_J^T {Second SVD transform}
     {Collect singular vectors corresponding to the largest singular value}
10:  u_J = first left singular vector
11:  v_J = first right singular vector
12: H = {u_J, v_J}

Fuzzy Vault Scheme - Shortcomings
Attacks on Fuzzy Vault
In August 2007, Preda Mihailescu presented a brute-force attack on three known implementations of the vault for fingerprints. The vulnerability cannot be avoided by mere parameter selection in the actual frame of the procedure.
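The 12-step hashing algorithm above can be sketched in NumPy. This is an illustrative reading of the listing, not the authors' implementation: the block size, p, and the randomization details are placeholder assumptions.

```python
import numpy as np

def biometric_hash(image, p=4, block=8, seed=0):
    """Two-stage SVD hashing sketch: p random blocks -> leading singular
    vectors -> secondary image J -> hash from J's leading singular vectors."""
    rng = np.random.default_rng(seed)
    gamma = []
    for _ in range(p):
        # Steps 2-6: random block A_i, first SVD, keep leading vectors
        r = rng.integers(0, image.shape[0] - block + 1)
        c = rng.integers(0, image.shape[1] - block + 1)
        U, _, Vt = np.linalg.svd(image[r:r + block, c:c + block])
        gamma.extend([U[:, 0], Vt[0, :]])
    # Steps 7-8: secondary image J[m, 2p] from the collected vectors
    J = np.asarray(rng.permutation(gamma)).T
    # Steps 9-12: second SVD; hash vector H = {u_J, v_J}
    U, _, Vt = np.linalg.svd(J)
    return np.concatenate([U[:, 0], Vt[0, :]])

img = np.random.default_rng(1).random((32, 32))
h = biometric_hash(img)
assert h.shape == (16,)  # block + 2p components
```

Because the block positions and the selection forming J are seeded, the same image and seed always yield the same hash vector, which the SVM stage then classifies.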
Thoughts on Personal Identity Platforms
William I. MacGregor
IDTrust 2010

Foreword
This is a thought experiment…
...to show feasibility…
...and is doubtless reinvention.

National Strategy for Secure Online Transactions
"To improve trustworthiness and security of online transactions by … interoperable trust frameworks and … improved authentication technology and processes … across federal, civil, and private sectors." - SecureIDNews, 1 Apr 2010, by Zack Martin
• Protect Privacy: secure PII & transaction data
• Defeat Fraud: reduce losses & improve recovery
• Promote Confidence: increase trust in online transactions

Three Questions
1. Could leakage of subject authenticators be prevented?
2. What are the characteristics of a solution to Question 1?
3. Does strong attribute assurance require strong identity assurance?

Personal Identity Platform
An answer to Question 1
[Diagram: subject authenticators (PIN, password, passphrase, biometrics) drive subject authentication on the platform; an authentication vector selects among credentials V1: Credential 1 … VN: Credential N, each performing crypto authentication for secure online transactions.]
The subject trusts the PIP to present only the selected credential; the relying party trusts the PIP to perform subject authentication first.

Characteristics of PIP
An answer to Question 2
• The PIP is a trust intermediary between the subject and relying party
• Only the Subject Authentication Vector is known to Credentials
• Credentials belong to the subject because they reside on the subject's PIP
• "Platform authentication" is also "SAML generation" or "session key agreement"

Requirements for a PIP
Another answer to Question 2
• The PIP must be available to, and controlled by, the subject
• The PIP must be a competent computing device or system - HIDs, biometrics, crypto, comm, clock, etc.
• The PIP must be coupled into the subject's transaction stream
What have I left out?

Strong Attribute Assurance
An answer to Question 3
[Message flow among Subject (PIP), Attribute Provider (AP), and Relying Party (RP):]
1. E((Age>=21)?, KDH)
2. S((Age>=21, Bio, H(KDH))?, FPN-Subject)
3. S((Age>=21, Bio, H(KDH)), FPN-AP)
4. E(S((Age>=21, Bio, H(KDH)), FPN-AP), KDH)

The Result
The answer to Question 3: No
• The PIP claims that FPN-Subject is bio authenticated, and the PIP is in session H(KDH)
• The AP claims that subject Age>=21 is bio authenticated, for the PIP in session H(KDH)
• The RP trusts the PIP and AP, so believes the authenticated subject has Age>=21
• The AP does not learn the RP; the RP does not learn any static subject identifier

About Attributes
• Why have Attribute Providers and Identity Providers?
  Go to the source—IDPs aren't all sources
• Why have dynamic attributes?
  Attributes change—shouldn't be in static credentials
• Examples
  Conditions of probation
  Permit to carry
  EMT certification

Thanks for listening!
Useful references
U-Prove
https://connect.microsoft.com/content/content.aspx?contentid=12505&siteid=642
Selective attribute delivery designed to meet privacy objectives.
ISO/IEC 24727
http://csrc.nist.gov/publications/nistir/ir7611/nistir7611_use-of-isoiec24727.pdf
Standard for construction of platforms like PIP.
SASSO
http://www.projectliberty.org/liberty/content/download/3960/26523/file/NTT-SASSO%20liberty%20case%20study.pdf
Implementation of a federated IDP provider in a USIM smart card in a mobile phone.
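The H(KDH) session binding in the Strong Attribute Assurance flow above can be illustrated with a short sketch. Everything here is a hypothetical encoding: an HMAC stands in for the AP's signature, and a random byte string stands in for the Diffie-Hellman session key KDH.

```python
import hashlib
import hmac
import json
import os

k_dh = os.urandom(32)                       # session key shared by PIP and RP
session = hashlib.sha256(k_dh).hexdigest()  # H(KDH): binds claims to the session

# The AP asserts the attribute, bound to the session, without naming the RP.
ap_key = os.urandom(32)                     # AP signing key (HMAC as stand-in)
claim = json.dumps({"age_ge_21": True,
                    "bio_authenticated": True,
                    "session": session})
signature = hmac.new(ap_key, claim.encode(), hashlib.sha256).hexdigest()

# The RP, holding k_dh, checks the claim is bound to *this* session, and
# never sees a static subject identifier.
assert json.loads(claim)["session"] == hashlib.sha256(k_dh).hexdigest()
```

The point of the hash binding is the slide's conclusion: the RP can trust "the authenticated subject in this session is over 21" without strong identity assurance, because only the session, not the subject, is named in the claim.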
Practical and Secure Trust Anchor Management and Usage
Carl Wallace, Cygnacom Solutions, 7925 Jones Branch Drive Suite 5200, McLean, VA 22102, cwallace@cygnacom.com
Geoff Beier, Cygnacom Solutions, 7925 Jones Branch Drive Suite 5200, McLean, VA 22102, gbeier@cygnacom.com

ABSTRACT
Public Key Infrastructure (PKI) security depends upon secure management and usage of trust anchors. Unfortunately, widely used mechanisms, management models and usage practices related to trust anchors undermine security and impede flexibility. In this paper, we identify problems with existing mechanisms, discuss emerging standards and describe a solution that integrates with some widely used applications.

Categories and Subject Descriptors
K.6.5 [Management of Computing and Information Systems]: Security and Protection - authentication.

General Terms
Security

Keywords
Trust anchor management, public key infrastructure (PKI).

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
IDtrust '10, April 13-15, 2010, Gaithersburg, Maryland, U.S.A.

1. INTRODUCTION
Trust anchors (TAs) are used for a variety of purposes. For example, trust anchors are used when a web browser authenticates a web server, when an email client verifies a signature on an email message or prepares an encrypted email message, and when a domain controller authenticates a user logging in with a smart card. In short, a TA is used whenever a PKI is securely used.

Trust Anchor Management Requirements [6] provides the following definition for a TA:

"A Trust Anchor is a public key and associated data used by a relying party to validate a signature on a signed object where the object is either: a public key certificate that begins a certification path terminated by a signature certificate or encryption certificate; or an object, other than a public key certificate or certificate revocation list (CRL), that cannot be validated via use of a certification path."

Trust Anchor Management Requirements [6] also provides a definition for a trust anchor store:

"A trust anchor store is a set of one or more trust anchors stored in a device. A trust anchor store may be managed by one or more trust anchor managers. A device may have more than one trust anchor store, each of which may be used by one or more applications."

In current practice, a trust anchor is a (typically self-signed) certificate that resides in a trust anchor store. Despite their importance, trust anchor stores are usually managed, to a large extent, by software vendors. Trust anchor store users have few or no enforceable constraints available to limit the extent of trust accorded to the trust anchors in the trust anchor store or to the software vendor managing the trust anchor store.

This paper briefly describes current trust anchor management tools and practices, identifies some problems with the status quo and describes an implementation that provides alternative trust anchor management mechanisms for applications that use the Microsoft Crypto API (CAPI) certification path processing interfaces.

2. Current Trust Anchor Management and Usage
In most common scenarios, trust anchors are distributed and managed by operating system and application vendors. TA stores are initialized during software installation and are often changed by software updates. Proprietary operating system-specific or application-specific tools are used to customize trust anchor store contents. These actions may be undone, however, by automated trust anchor store updates or routine software updates.
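As a quick illustration of vendor-managed trust anchor stores (not part of the paper), Python's `ssl` module loads whatever trust anchors the platform ships and reports simple statistics about them:

```python
import ssl

ctx = ssl.create_default_context()
ctx.load_default_certs()        # pull in the OS/vendor-managed trust anchors
stats = ctx.cert_store_stats()  # counts of certs, CA certs, and CRLs trusted
assert {"x509", "crl", "x509_ca"} <= set(stats)
```

An application using this context trusts every anchor the vendor installed, with no per-anchor constraints; that is exactly the situation the paper critiques.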
Copyright© 2010 ACM 978-1-60558-895-7/10/04…$10.00. 97 Synchronization of trust anchor stores from different vendors (or to in the user interface as properties or purposes. Constraints are even the same vendor) requires manual steps using proprietary configured using a dialog like the one shown below in Figure 2. tools. Comparison of trust anchor store contents is a similarly The constraint options are very similar to extended key usage manual affair. values, with a difference being that extended key usage extensions are not processed across a certification path but the constraints configured here appear to be. On Windows Vista SP 2 systems, Most operating systems and applications use certificates to there are 38 purposes available for selection. When a trust represent trust anchor information. In some cases, a collection of anchor is manually installed, all purposes are enabled by default. trust anchors may be represented using a “certificates only” Cryptographic Message Syntax (CMS) SignedData message. Some applications may require distinguished encoding rules (DER) encoded certificates or privacy enhanced mail (PEM) encoded certificates, but this is a fairly minor problem as conversion tools are readily available. The following sections provide an overview of some widely used mechanisms and discuss the primary problems with these mechanisms. 2.1 Overview of selected current mechanisms 2.1.1 Microsoft Windows Many applications that operate on Microsoft Windows platforms use the trust anchor stores built into the operating system. 
A variety of interfaces are available for adding trust anchors to a trust anchor store, including the following:  Right-clicking a certificate file, choosing “Install Certificate” from the resulting menu and selecting a trust anchor store destination,  Installing a certificate into a trust anchor store using the Microsoft Management Console (MMC),  Installing a certificate into a trust anchor store using an application-provided interface, such as Internet Explorer (IE),  Installing a certificate into a trust anchor store using group policy or System Center Configuration Manager (SCCM). Figure 2 Microsoft trust anchor constraints dialog The MMC interface to the trust anchor store is shown below. In addition to manual trust anchor installation, Windows provides automatic trust anchor store update mechanisms, with different versions of Windows providing somewhat different capabilities. When these features are enabled, a trust anchor may be automatically installed with no visual cue provided to the operator, for example, when a certificate file subordinate to that trust anchor is simply inspected using the Windows certificate viewer a corresponding trust anchor may be downloaded and installed. Trust anchors installed automatically do not necessarily have all purposes enabled. Trust anchor stores are maintained in the system registry. Trust anchors are imported and exported as certificates. The certificates are stored in the registry along with property information. When Figure 1 MMC view of a trust anchor store trust anchors are exported, the user-configured constraints are not conveyed along with the exported certificates. Some options, such as MMC and Internet Explorer, allow for the specification of certain trust anchor constraints, which are referred 98 2.1.2 Firefox Firefox does not use Microsoft Windows trust anchor stores. Trust anchors are added to the Firefox trust anchor store using the Certificate Manager dialog shown below in Figure 3. 
This dialog is accessed by invoking the Tools->Options menu and selection the Encryption tab from the Advanced options. Figure 5 Mac OS X version 10.6 trust anchor store Trust anchor information, including usage constraints, can be viewed by right-clicking a trust anchor in the Keychain Access application and choosing Get Info. Nine properties are available. As with Microsoft Windows and Firefox, the properties are very similar to values typically expressed via an extended key usage Figure 3 Firefox trust anchor store extension. Trust anchor constraints can be configured by clicking the Edit button and selecting the desired properties in a dialog like the one shown below, which allows three properties to be enabled. As with Microsoft Windows, the properties are similar to values that are typically expressed via an extended key usage extension. Figure 4 Firefox trust anchor constraints Firefox trust anchors are maintained in a database that resides in the profile of a Firefox user. Trust anchors are imported and exported as certificates. 2.1.3 Mac OS X Figure 6 Mac OS X version 10.6 trust anchor constraints Mac OS X maintains trust anchors in the key chain. Trust anchors are added to the trust anchor by invoking the Keychain As with Microsoft Windows and Mozilla trust anchor stores, trust Access application, as shown below. anchors are exported as files containing X.509 certificates, and no user-specified constraints are conveyed along with these certificate files. 2.2 Primary problems with current mechanisms This paper does not aim to catalog problems with existing trust anchor management mechanisms. However, this section discusses 99 some problems in the areas of trust anchor store management and Bridge CA (FBCA). Each CA that has issued a cross-certificate trust anchor constraints enforcement. to the FBCA creates a large number of potential certification paths that traverse that cross-certificate. 
Some enterprises, such as the Department of Defense, have adopted an approach to cross- 2.2.1 Trust anchor store management certifying with the FBCA that allows application owners to opt Management of trust anchor stores requires usage of proprietary out of the cross-certification by recognizing alternative trust tools. Where necessary, system administrators must take care to anchors that are not connected to the FBCA. A problem arises synchronize the contents of multiple trust anchor stores. This when entities who have “opted out” need to establish a trust requires configuration of trust anchor constraints as well as relationship with another CA that is cross-certified with the ensuring trust anchors are installed in (or removed from) the FBCA. Simply recognizing the CA as a trust anchor will establish necessary trust anchor stores. the trust relationship but causes the entire FBCA community to be recognized as well. This could be avoided if it were possible to constrain a trust anchor using similar mechanisms as those used in Maintenance of trust anchor store contents is complicated by the cross-certificates. fact that software updates frequently adjust trust anchor store contents (sometimes undoing changes made by the system administrator). Automatic trust anchor update mechanisms can 3. Next Generation Specifications The Internet Engineering Task Force (IETF) is presently working create de facto trust anchor stores that contain more trust anchors on several specifications related to trust anchor management and than are visible to administrators using the available tools. usage, including: Trust Anchor Management Protocol (TAMP) [4], Trust Anchor Format (TAF) [3], CMS Content Constraints Trust anchors do not offer any integrity protection or “in-band” (CCC) [2], Using Trust Anchor Constraints during Certification security mechanisms. Confirmation that the correct trust anchor is Path Processing (UTAC) [5]. 
These specifications provide being installed typically requires manual checks. complementary features, but subsets of features can be implemented where the full feature set is not required. 2.2.2 Constraint representation The following subsection briefly introduce each of these As shown in Section 2.1, different trust anchor stores enable the specifications, which were used in the implementation described usage of different, non-standard trust anchor constraints. These in Section 4. constraints are stored using a proprietary format. When trust anchors are exported from the trust anchor store the constraint information is lost. 3.1 Trust Anchor Format TAF [3] provides syntax for representing trust anchors. The primary structure is TrustAnchorChoice: The certification path validation algorithm described in RFC 5280 TrustAnchorChoice ::= CHOICE { [1] only makes use of the public key and name of a trust anchor. cert Certificate, Implementations are free to perform processing beyond that tbsCert [1] EXPLICIT TBSCertificate, required by RFC 5280 [1], such as to impose name constraints or taInfo [2] EXPLICIT TrustAnchorInfo } certificate policy requirements on a trust anchor. However, there is no standardized process for doing so. This lack of This structure provides support for existing trust anchors standardization has resulted in inconsistent means of specifying represented as certificates and provides two mechanisms that constraints and poor interoperability. Complicating matters is the allow relying parties to customize the definition of a trust anchor: fact that trust anchors are almost always represented as TBSCertificate and TrustAnchorInfo. Using the TBSCertificate certificates. Though the signature on the trust anchor’s certificate option, the signature is simply removed from a Certificate provides little security value, it interferes the editing of certificate structure allowing the contents to be edited. Using contents. 
TrustAnchorInfo, a Certificate can be wrapped, with additional or alternative constraints defined in the wrapper or a name and public key can be used with or without additional information. 2.2.3 Constraint enforcement Enterprise PKI operators use cross-certificates to establish trust between enterprises and employ a variety of constraints, i.e., 3.2 Trust Anchor Management Protocol extensions, to limit the degree of trust accorded to the cross- TAMP [4] defines eleven message formats and a set of processing certified PKI. However, cross-certificates are not always a viable rules that can be used to manage trust anchor store contents. Each option. In some cases, however, a trust relationship may only be of these message formats, or content types, can be encapsulated appropriate for a small subset of subscribers to an Enterprise PKI. using a CMS SignedData structure to provide source In these cases, directly trusting a trust anchor is an alternative. authentication and message integrity. The eleven messages Unfortunately, existing trust anchor constraint mechanisms do not consist of five request/response pairs and a generic error message: provide a set of constraint options comparable to those available  TAMPStatusRequest when using a cross-certificate, making direct trust difficult to use.  TAMPStatusResponse  TAMPUpdate For an example of problems caused by lack of trust anchor  TAMPUpdateConfirm constraints, consider the community surrounding the Federal 100  TAMPApexUpdate 3.2.4 Managing TAMP community membership  TAMPApexUpdateConfirm TAMP messages can be created such that all TA stores that  TAMPCommunityUpdate recognize the TA store manager will accept the message, a group of TA stores will accept the message or a specific TA store will  TAMPCommunityUpdateConfirm accept the message. Community identifiers are one means for  SequenceNumberAdjust addressing a group of trust anchor stores. 
TAMP-enabled trust  SequenceNumberAdjustConfirm anchor stores should have the ability to store a list of community  TAMPError identifiers. TA store managers can use these identifiers to create arbitrary groups of trust anchor stores for future management purposes. 3.2.1 Reviewing TA store contents TAMPStatusResponse messages provide a means of representing TAMPCommunityUpdate messages are used to add or remove trust anchor store contents. As with most TAMP community identifiers from a trust anchor store. response/confirm messages, the message can be either verbose or TAMPCommunityUpdateConfirm is used to report the results of terse. A verbose TAMPStatusResponse message provides a processing a TAMPCommunityUpdate message. comprehensive set of information regarding a trust anchor store, including a list of all trust anchors, an indication of which TA is 3.2.5 Managing TAMP sequence numbers the apex trust anchor (if any) and information on TAMP sequence TAMP uses sequence numbers to detect attempts to process old numbers and TAMP communities. A terse TAMPStatusResponse TAMP messages. Each TAMP-enabled trust anchor store provides only trust anchor key ids along with communities of maintains a sequence number for each trust anchor authorized for which the store is a member. A TAMPStatusRequest simply asks TAMP (and may maintain a sequence number for certificate a trust anchor store to provide its contents in the requested holders who have been authorized for TAMP). A message format, i.e., verbose or terse. Use of SequenceNumberAdjust message can be used to convey the TAMPStatusRequest and TAMPStatusResponse can reduce current sequence number to a trust anchor store to reduce the reliance on proprietary tools for TA store management and likelihood of replay. A SequenceNumberConfirm message is simplify comparison of TA store contents. used to indicate the results of processing the SequenceNumberAdjust message. 
3.2.2 Editing TA store contents
TAMPUpdate messages allow new trust anchors to be added to a trust anchor store, existing trust anchors to be changed, or existing trust anchors to be removed. Each TAMPUpdate message contains a set of one or more commands (i.e., add, change, remove). Since TAMPUpdate messages are signed, in-band integrity and source authentication checking is enabled.

3.2.2.1 Subordination rules
TAMP defines a strict set of subordination rules that apply when a TAMPUpdate message is processed. These rules allow limits to be placed on TA store managers. These rules could be used to place constraints on automated updates, for example to ensure an undesirable trust anchor is not restored after it has been removed by a local management action, or to ensure that a trust anchor rekey operation does not exceed locally-imposed constraints on the old key.

3.2.3 Replacing the Apex TA
TAMP [4] introduces the concept of the Apex TA, which is defined as the single trust anchor within a trust anchor store that is superior to all other trust anchors. This concept is primarily used as a disaster recovery technique. Essentially, a trust anchor store is created with a single Apex TA in place. Authority over various management operations is then delegated to other trust anchors that are added to the trust anchor store or to certificate holders. Management operations are conducted by the delegates, with the Apex TA private key maintained in secure storage. As an extra safeguard, a contingency public key can be included in the definition of the Apex TA. The contingency public key corresponds to a private key that is intended to be used once to replace the Apex TA in the event of loss or compromise of the operational Apex TA private key.

3.3 CMS Content Constraints
A basic problem for any trust anchor management protocol is authorization of management operations. Certification authorities are authorized to issue cross-certificates using constraints expressed as certificate extensions, e.g., basicConstraints, certificatePolicies, etc. CCC [2] defines an authorization mechanism that can be used with TAMP. CCC is a generic mechanism for authorizing public key certificate holders to originate specific types of information protected using the Cryptographic Message Syntax (CMS). A set of content types is expressed in the CCC extension. When a CMS-protected message is processed, the originator is authenticated and the CCC extension associated with the originator is inspected to ensure the given content type is permitted.

For TAMP, this mechanism can be used to authorize some entities to manage trust anchor stores and others to review the contents of trust anchor stores, while leaving other entities with no privileges at all. To authorize an entity to manage trust anchor stores, include, in either the entity's certificate or trust anchor, a CCC extension with the TAMPUpdate, CommunityUpdate, SequenceNumberAdjust and TAMPStatusQuery content types permitted. To authorize an entity to review the contents of trust anchor stores, include a CCC extension in the entity's trust anchor or certificate with the TAMPStatusQuery content type permitted.

3.4 Using Trust Anchor Constraints during Certification Path Processing
UTAC [5] augments the certification path processing algorithm specified in RFC 5280 [1] by describing how to use constraints contained in a trust anchor during certification path processing. Essentially, the constraints contained in a trust anchor are intersected with those provided by a user. The results of this intersection are used as the inputs to the RFC 5280 [1] certification path validation algorithm. This allows a trust anchor store manager (i.e., an enterprise) to establish a minimum set of restrictions on the usage of a trust anchor without removing the ability of an application (i.e., a user) to provide inputs to the path validation algorithm. UTAC [5] describes rules for using constraints in a TrustAnchorInfo wrapper relative to constraints resident in a certificate that is wrapped, i.e., the wrapper takes precedence. UTAC processing can be integrated directly into an RFC 5280 path validation implementation or as pre- or post-processing.

4. Integrating Trust Anchor Management with CAPI
The goal of the implementation effort described in this paper was to enable the usage of emerging trust anchor management specifications with commonly deployed commercial off-the-shelf (COTS) products which have been public key-enabled using Microsoft Crypto API (CAPI). This integration aims to enforce constraints associated with a trust anchor. To achieve this, the software must be able to influence the outcome of a certification path validation operation performed by CAPI. Since there is no publicly documented set of APIs intended for this purpose, existing APIs intended for other purposes were evaluated to determine their suitability for integration of trust anchor management functionality. The following interfaces were analyzed: revocation status provider, validation policy provider and certificate store provider.

4.1 Revocation Status Provider
The initial approach considered was to use the revocation status provider interface. Revocation status providers are typically used to provide support for the Online Certificate Status Protocol (OCSP). A revocation status provider is a dynamic link library (DLL) that implements the CertVerifyRevocation API. The provider is registered with the operating system. The registration information consists of the full path and filename of the revocation status provider and is stored in a registry key containing a list of string values. The list of providers can be ordered according to system administrator preference. Providers are invoked in turn until one is found that can provide revocation status information for the certificate in question. When an application validates a certification path, the provider is loaded by CAPI and invoked once for the end entity certificate and each intermediate CA certificate contained in a certification path validated by CertGetCertificateChain or WinVerifyTrust. The provider can cause a path validation operation to fail by indicating the given certificate is revoked.

This approach was not implemented for two reasons. First, the interface is invoked for each certificate in a path, not for an entire certification path. This means the provider would need to maintain state across multiple invocations in order to get a view of the entire path. Second, while this could effectively cause a certification path that violates trust anchor constraints to fail, the error indicated by the provider creates the misimpression that a certificate is revoked. This kind of misreported failure leads to a poor user experience in the desktop applications that are targeted in this effort.

4.2 Validation Policy Provider
Next, the validation policy provider interface was explored. This interface is not as comprehensively documented and is less widely used than the revocation status provider interface. Like the revocation status provider interface, a validation policy provider is registered with the operating system and loaded by CAPI during certification path processing. Unlike the revocation status provider interface, providers do not fail over from one to another. Providers can be registered for a specific validation policy. However, the processing performed by default policy providers is not documented, and replacing the default providers is not recommended. No way could be found to invoke the default providers from a third-party provider. We implemented policy providers for several of the default policies but abandoned the effort due to inconsistent invocation of the installed replacement policy provider. For example, within Microsoft Outlook, the replacement policy provider was invoked when no certification path was found for a message signer but not when a certification path was found.

4.3 Certificate Store Provider
While performing the analysis of the validation policy provider API, we used code interception to inspect and log parameter values. After discarding the revocation status provider and validation policy provider efforts, we focused on finding a means of using code interception as the basis for performing the integration. This required identifying opportunities where code could be loaded prior to the CertGetCertificateChain API and unloaded afterwards, enabling CertGetCertificateChain to be intercepted. For most applications, the certificate store API provides such an opportunity.

We implemented a certificate store provider that is registered with the operating system as a CA store provider in the HKEY_LOCAL_MACHINE registry hive. When the certificate store is loaded, hooks are created for the CertGetCertificateChain API. No certificate store functionality is actually provided. To limit the scope of the provider, configuration information can be saved on a per-application basis. When an application that does not require the trust anchor management services implemented by the provider loads it, no hooks are set. The certificate store provider is loaded into memory but performs no code interception.

A side benefit of this integration approach is the ability to fully replace CAPI certification path processing instead of simply enforcing trust anchor constraints following discovery of a certification path. This enables the usage of the Server-based Certificate Validation Protocol (SCVP) or alternative local certification path processing engines for both path discovery and validation.
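The CCC authorization check described in Section 3.3 reduces to a set-membership test: after the originator of a CMS-protected message is authenticated, the content type the message carries must appear among the types permitted by the originator's CCC extension. The following is a minimal sketch of that check; the string identifiers stand in for the ASN.1 content-type OIDs used in practice and are not taken from the CCC specification.

```python
# Sketch of a CCC-style authorization decision. A "manager" profile
# permits the management content types listed in Section 3.3, while a
# "reviewer" profile permits only status queries. Identifiers are
# illustrative strings rather than real content-type OIDs.

TAMP_MANAGER_TYPES = {
    "TAMPUpdate",
    "CommunityUpdate",
    "SequenceNumberAdjust",
    "TAMPStatusQuery",
}
TAMP_REVIEWER_TYPES = {"TAMPStatusQuery"}

def is_authorized(permitted_types, content_type):
    """Return True if the originator's CCC extension permits the CMS
    content type of the received, already-authenticated message."""
    return content_type in permitted_types
```

With these profiles, an entity holding the manager extension may originate updates, while an entity holding only the reviewer extension is limited to querying store contents.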
Though the software described below supports this option, it is not discussed further in this paper. Nor are issues associated exclusively with the provision of SCVP support.

As noted above, integration via the certificate store API proved workable for most applications that were tested, but not all. For Internet Explorer, it was necessary to build a browser add-on that causes the certificate store to be loaded before the browser can be used to access SSL/TLS-protected websites. The browser add-on simply forces CertOpenStore to be called by validating a path to a hard-coded trust anchor, which was selected from the list of required trust anchors defined in Microsoft knowledge base article number 293781.

5. CAPI Trust Anchor Guard (CAPI TAG)
CAPI Trust Anchor Guard (CAPI TAG) is a set of software tools that enable management of a local or remote trust anchor store using TAMP and enforcement of trust anchor-based constraints for applications that use CAPI for certification path processing.

5.1 Overview
CAPI TAG consists of eight primary components: PKIFTAM, CAPI TAG Store Creator, Store Manager, mod_tam, Process TAMP Message, CAPI TAG, CAPI TAG Config and CAPI TAG Customization Wizard.

5.1.1 PKIFTAM
PKIFTAM.dll provides basic encoding and decoding functionality for structures defined in TAF [3], TAMP [4] and CCC [2]. Additionally, it provides classes that can be integrated with the PKIF library (www.pkiframework.com) to enforce TA constraints using a TA store managed with TAMP.

5.1.2 CAPI TAG Store Creator
CapiTagStoreCreator.exe is used to initialize a CAPI TAG trust anchor store. A trust anchor store can be created using trust anchors from a CAPI trust anchor store or a file folder.

5.1.3 Store Manager
StoreManager.exe is the primary trust anchor management tool. It can be used to manage local trust anchor stores, remote trust anchor stores accessed via HTTP, or remote trust anchor stores accessed via a file containing a TAMPStatusResponse message generated by the target trust anchor store. The user interface in Store Manager is mostly driven by TAMP messages, and all operations are possible regardless of access method, provided the operator possesses an authorized signing key. The primary interface to manage trust anchors using Store Manager is shown below.

Figure 7 Store Manager trust anchor list

Using Store Manager, trust anchors can be added to a TA store, removed from a TA store or edited. When a trust anchor is added, its format can be changed from certificate to TBSCertificate or TrustAnchorInfo, enabling the expression or alteration of constraints. Trust anchor constraints are edited using dialogs provided with the PKIF library. These allow the expression of constraints that align with the standard path validation algorithm inputs as defined in RFC 5280 [1]. The constraints editing dialog is shown below.

Figure 8 Editing trust anchor constraints in Store Manager

5.1.4 mod_tam
mod_tam is an Apache module that serves either or both of the following purposes:

• Routes TAMP messages received via a particular URI to a TA store file for processing
• Periodically checks specified URIs for TAMP messages, which are downloaded and presented to a TA store file for processing

This enables the suite to support either push or pull for TA management. mod_tam is accompanied by an optional system tray notification applet that allows the user to see desktop alerts as TAMP messages are processed.

5.2 Trust Anchor Management
Using CAPI TAG, several trust anchor management models are possible. As shown in Figure 9, the management models considered here are: local management, online remote management, indirect remote management and remote pull. The terms local and remote refer to the relative positions of the trust anchor store and the trust anchor manager's private key. For local management scenarios, the TA store and TA store manager's private key are collocated.¹ For remote management scenarios, the TA store and TA store manager's private key need not be collocated.
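The pull behavior described for mod_tam can be sketched as a simple polling loop: configured URIs are checked for TAMP messages, and any retrieved bytes are handed to a TA store file for processing. The sketch below injects the fetch and process steps as callables; the real mod_tam is an Apache module, so the function names and return values here are hypothetical.

```python
# Sketch of mod_tam-style remote pull: each configured URI is checked
# for a TAMP message, and retrieved message bytes are presented to the
# TA store processor. `fetch` models the HTTP retrieval and `process`
# models TA store processing; both are supplied by the caller.

def poll_tamp_uris(uris, fetch, process):
    """Fetch a TAMP message from each URI, hand any retrieved bytes to
    the TA store, and collect a per-URI result."""
    results = {}
    for uri in uris:
        message = fetch(uri)  # e.g., an HTTP GET in the real module
        if message is None:
            results[uri] = "no message"
        else:
            # Present the message bytes to the trust anchor store file.
            results[uri] = process(message)
    return results
```

A scheduler would invoke such a loop periodically; the per-URI results could drive the desktop notifications mentioned above.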
5.1.5 Process TAMP Message
ProcessTampMessage.exe allows a file containing a TAMP message to be presented to a CAPI TAG trust anchor store for processing. The store can be addressed either as a local file or using an HTTP URI. Unlike Store Manager, the operator of Process TAMP Message need not have any TAMP privileges (or even possess a private key).

5.1.6 CAPI TAG
CapiTag.dll integrates with Microsoft Windows operating systems to provide trust anchor constraints enforcement or alternative certification path processing.

5.1.7 CAPI TAG Config
CapiTagConfig.exe is the primary means for configuring CapiTag.dll for use. It enables the configuration of default settings and application-specific settings. All configuration information is stored in the system registry.

5.1.8 CAPI TAG Customization Wizard
CapiTagCustomizationWizard.exe is used to create transform files (.mst) that can be used to customize the CapiTag.msi installation package for use in a particular environment. The wizard allows customization of the following aspects of a CAPI TAG deployment:

• Inclusion of one or more CAPI TAG trust anchor stores
• Customization of Store Manager PKI settings (i.e., used when validating TAMP messages generated by a CAPI TAG TA store)
• Customization of CAPI TAG trust anchor store PKI settings (i.e., used when validating TAMP messages generated by Store Manager)
• Customization of CAPI TAG PKI settings (i.e., used when enforcing TA constraints or to configure alternative certification path processing)
• Customization of CAPI TAG settings (i.e., default or per-application settings)
• Specification of a customized mod_tam configuration file

Figure 9 Management models

¹ This taxonomy is quite loose. Files accessed over a local area network are considered "local" despite the fact that the TA store resides on a different physical machine. Similarly, files accessed over HTTP to a local mod_tam service are considered "remote".

Given that CAPI TAG trust anchor stores are files, the contents could be prepared in one location and distributed using means like group policy. With minor additions to the current specification suite, additional models, including usage of a subjectInformationAccess-based pointer or trust anchor store-initiated client/server exchange, are possible.

5.2.1 Local management
Using the Store Manager application, a CAPI TAG trust anchor store file can be opened and queried using a TAMPStatusQuery message. The Store Manager operator's private key must be available and the target trust anchor store must recognize the operator as authorized to originate TAMPStatusQuery messages (no other permissions are required to simply review the contents of a TA store). If the operator is authorized to edit TA store contents, changes can be made and saved using Store Manager.

5.2.2 Online remote management
Using the Store Manager application, a CAPI TAG trust anchor store can be managed via HTTP by entering the URI corresponding to the desired trust anchor store. This will establish a connection to a mod_tam service, which will route TAMP messages to/from trust anchor stores collocated with the mod_tam service per the httpd.conf file. As with local management, the operator may be authorized to edit the TA store or simply to review the TA store contents.

5.2.3 Indirect remote management
TA stores can be managed remotely through the exchange of files containing TAMP messages. An entity with at least TAMPStatusQuery privileges can generate a TAMPStatusResponse message using Store Manager. The file containing the response can be provided to another entity with full TAMP privileges, who can then open the file using Store Manager and generate one or more TAMP messages to edit the TA store. These messages can be returned to the requesting entity for processing using the Process TAMP Message utility. To ensure security, the TA store should sign the TAMPStatusResponse.

5.2.4 Remote pull
A TA store manager can prepare TAMP messages using Store Manager for distribution via HTTP. The mod_tam service can be configured to periodically retrieve TAMP messages from zero or more URIs for processing by the indicated trust anchor store associated with the mod_tam instance. In CAPI TAG, automated remote pull is not available without the mod_tam service. TAMP messages can be manually collected and processed using either Process TAMP Message or the "process externally generated TAMP message" feature of Store Manager.

5.3 Trust Anchor Constraints Enforcement
CAPI TAG can be configured to enforce trust anchor constraints on a per-machine, per-user or per-application basis. When an application loads CAPI TAG, the most specific available configuration is used. The order of preference is as follows:

• Current user – application
• Local machine – application
• Current user – default
• Local machine – default

This allows a high degree of configurability for trust anchor stores and application PKI settings. Some applications can be configured to enforce trust anchor constraints, others can be configured to use an SCVP responder (or alternative local certification path processing implementation), and other applications can be configured to use native processing without TA constraints enforcement. This degree of configurability makes it easy to enforce constraints for key applications without impacting any legacy incompatible applications that need to run on the same system.

Figure 10 TA constraints enforcement with CAPI TAG

Since CAPI TAG uses a trust anchor store that is separate from CAPI trust anchor stores, the CAPI TAG trust anchor store manager's actions are not affected by changes made to the CAPI trust anchor store via automated trust anchor store updates or software upgrades. CAPI TAG can be configured to accept trust anchors from CAPI when a path is validated to a trust anchor not present in the CAPI TAG trust anchor store, and can be configured to write trust anchors to a file folder, enabling the trust anchor store manager to adjust the contents of the CAPI TAG trust anchor store as necessary. This feature reduces the difficulty of determining which trust anchors must be present and trusted to ensure that an application continues to function as users expect. CAPI TAG can also be configured to not act for certain types of operations. For example, CAPI TAG can be configured to use only native functionality when a certificate is validated in support of a CAPI trust root list validation operation.

6. Summary
CAPI TAG demonstrates the effectiveness of the emerging IETF trust anchor management specifications in a typical, commercial software environment. CAPI TAG is intended to generate interest and discussion in trust anchor management and usage practices that ensure relying party interests can be satisfied. This section describes some challenges encountered while developing the CAPI TAG products and identifies some areas where additional standardization is potentially required.

6.1 Implementation experience
A primary challenge encountered during the development of the software was the lack of a proper interface for integrating enhanced trust anchor management capabilities and enforcement of trust anchor constraints. Not surprisingly, once an approach was identified it also proved suitable for implementing an SCVP client.

The SCVP-client mode of operation in CAPI TAG required the availability of certificates in order to use existing structures that could not be changed. For CAPI TAG purposes, trust anchors are always represented as either a certificate or a TrustAnchorInfo containing a certificate. It may have been possible to recast trust anchors stored as TBSCertificate or TrustAnchorInfo objects as Certificates with bogus signatures, but this was not explored.

Integration of trust anchor constraints enforcement [5] with the PKIF library was straightforward. Initially support was integrated as wrapper code that resided in an application, but this was moved into the library itself and exposed as an optional feature of the path validation implementation. Constraints enforcement [5] can be implemented independent of the other trust anchor management specifications [2][3][4] using extensions expressed in self-signed certificates. This would be of limited utility at present given the fact that most self-signed certificates do not include constraints of any sort.

At a high level, the implementation of support for the trust anchor management specifications and the integration of that support into existing products consisted of the following activities:

• Define trust anchor store format
• Define and implement trust anchor store interface and access control mechanisms
• Identify code that uses trust anchors and make adjustments to accommodate new formats, where necessary
• Implement trust anchor constraints enforcement as pre/post processing of path validation or integrate with path validation code

Most of the problems associated with the selected integration mechanism could easily be addressed if a means of utilizing alternative certification path processing implementations, similar to that used for installing alternative revocation status providers, were available.

Integration of non-certificate formats into a trust anchor store posed another challenge. This was solved by using a CAPI TAG-specific trust anchor store file format. Several challenges prevented the usage of existing mechanisms. The interfaces to existing trust anchor stores accept (usually self-signed) certificates. Trust anchor management messages were the desired format to support in-band integrity checks, authorization, subordination checks, etc. While it may have been possible to overload CertAddEncodedCertificateToStore to handle TAMP messages, this was not explored. For these reasons, TA store management was implemented as wholly independent of CAPI.

Read/write access to the trust anchor store file is managed by the operating system. CAPI TAG trust anchor store usage only requires read access. Write access can be limited to the mod_tam service, if desired. By default, though, system administrators have write access to CAPI TAG trust anchor store files. Authorization to manage trust anchor store contents via a TAMP interface is enforced using CCC.

Trust anchor constraints enforcement was integrated with an existing public key enablement library (PKIF). Integration of support for alternative formats [3] required a number of changes to the library. These were addressed primarily through the use of abstract interfaces that captured the common elements of the various formats, i.e., all featured a subject name, a public key and extension values.

Following the implementation of support for trust anchor management and trust anchor constraints enforcement, deployment of the capabilities consisted of the following activities:

• Identify the applications of interest (i.e., web browsers, email clients, etc.)
• Identify the trust anchors required by these applications
• Identify entities authorized to manage trust anchor stores
• Initialize trust anchor stores to include desired trust anchors (including constraints) and trust anchor store managers
• Distribute trust anchor stores and enable trust anchor constraint enforcement capabilities
• Manage trust anchor stores using appropriate local, remote, direct or indirect means

6.2 Potential additional standardization needs
Most existing trust anchor constraints mechanisms provide a capability similar to the extended key usage extension. Unfortunately, extended key usage values included in a trust anchor are not processed during certification path validation [1]. Defining an extension and an augmentation of the standard path validation algorithm would be simple and straightforward, and potentially valuable in terms of promoting interoperability. However, the utility of this extension is not entirely clear given that most enterprises do not operate certificate authorities, let alone root certification authorities, on a per extended key usage basis.

The usage of the existing name constraints extension in trust anchors is effective in enterprise environments where naming conventions are rigorously controlled and are generally hierarchically related. The name constraints mechanism is less suited to internet use, where distinguished names vary greatly within a single certification path and server names are often conveyed as a terminal relative distinguished name (RDN) value. Addressing this issue may be more easily accomplished by refining naming practices to enable the usage of existing name constraints mechanisms than by defining alternative constraint mechanisms.

Another name constraints-related issue is the observation that to effectively use name constraints, most or all trust anchors in a given trust anchor store must have an associated name constraint value. To ensure that a particular namespace can only be issued by a given trust anchor, all other trust anchors must be defined to either have an alternative permitted namespace or to exclude the namespace of interest.

Though no in-depth investigation of the utility of trust anchor management tools to counter phishing attacks was conducted, it is possible that better use of existing constraints or definition and adoption of additional constraints could provide useful countermeasures.

7. REFERENCES
[1] Cooper, D., Santesson, S., Farrell, S., Boeyen, S., Housley, R., and W. Polk, "Internet X.509 Public Key Infrastructure Certificate and Certificate Revocation List (CRL) Profile", RFC 5280, May 2008.
[2] Housley, R., Wallace, C., and S. Ashmore, "Cryptographic Message Syntax (CMS) Content Constraints Extension", in progress.
[3] Housley, R., Wallace, C., and S. Ashmore, "Trust Anchor Format", in progress.
[4] Housley, R., Wallace, C., and S. Ashmore, "Trust Anchor Management Protocol (TAMP)", in progress.
[5] Wallace, C. and S. Ashmore, "Using Trust Anchor Constraints During Certification Path Processing", in progress.
[6] Wallace, C. and R. Reddy, "Trust Anchor Management Requirements", in progress.

Practical and Secure Trust Anchor Management and Usage
Symposium on Identity and Trust on the Internet
April 15, 2010
Carl Wallace, Cygnacom Solutions, cwallace@cygnacom.com
Geoff Beier, Cygnacom Solutions, gbeier@cygnacom.com

Scenario #1
• DOD is cross-certified with the Federal Bridge CA (FBCA) via the Interoperability Root CA (IRCA)
  – Enables optional recognition of the FBCA community, i.e., those who install the IRCA as a trust anchor (TA) can interoperate with the FBCA
• In some cases, entities who do not use the IRCA need to communicate with enterprises who are cross-certified with the FBCA
  – For example, Department of State (DOS)
• How can interoperability be enabled without using the IRCA?
  – Installing the DOS trust anchor results in a similar (and worse) level of exposure to the FBCA community than installing the IRCA does

* Diagram courtesy Booz Allen Hamilton

Scenario #2
• Operating systems and applications are pre-loaded with a variety of default TAs
  – Trust graph is unknown (or at least unpublished)
  – Limited options for users or administrators to add constraints
• How can an enterprise ensure CAs validated using a default TA are not issuing certificates that assert names or policies managed by the enterprise?

Common problem
• The challenge in both scenarios stems from an inability to constrain TAs in useful ways

Establishing Trust Relationships Between Enterprises Using a PKI
• Direct bi-lateral cross-certification
  – Mechanics: Enterprise A's root CA issues a certificate to Enterprise B's root CA and vice versa
  – Constraints: Each certificate includes desired name constraints, policy constraints, path length constraint, policy mapping, etc.
– Scope: enterprise-wide • Indirect bi-lateral cross-certification (i.e., Bridge CA) – Mechanics : Both Enterprise A and B root CAs issue certificates to a Bridge CA and vice versa – Constraints: Each certificate includes desired name constraints, policy constraints, path length constraint, policy mapping, etc. – Scope: enterprise-wide • Direct trust/implicit unilateral cross-certification – Mechanics : Enterprise A installs Enterprise B’s root certificate as a trust anchor and vice versa – Constraints: Unconstrained or limited by the extended key usage-like constraints options supported by the trust anchor store – Scope: local 8 4 4/15/2010 Current TA constraint mechanisms 9 Emerging standards • Several specifications are progressing through the IETF to address trust anchor management and trust anchor usage issues – Trust Anchor Format (TAF) • http://tools.ietf.org/html/draft-ietf-pkix-ta-format-04 – Trust Anchor Management Protocol (TAMP) • http://tools.ietf.org/html/draft-ietf-pkix-tamp-07 – CMS Content Constraints (CCC) • http://tools.ietf.org/html/draft-housley-cms-content-constraints-extn-04 – Using Trust Anchor Constraints During Certification Path Processing (UTAC) • http://tools.ietf.org/html/draft-wallace-using-ta-constraints-02.html • Requirements for the specifications listed above are defined in an informational draft – Trust Anchor Management Requirements • http://tools.ietf.org/html/draft-ietf-pkix-ta-mgmt-reqs-05 10 5 4/15/2010 Trust Anchor Format • Self-signed certificates are de facto standard for representing trust anchors – Security requires out-of-band establishment of trust – Format does not lend itself to association of constraints by relying parties • TAF defines three formats for representing a trust anchor – Certificate • Self-signed or otherwise – TBSCertificate • i.e., a Certificate structure without signature – TrustAnchorInfo • Can be as small as name and key (or name and key plus constraints) • Can wrap a certificate to add 
constraints

Trust Anchor Management Protocol
• Primary aim is to reduce the need for out-of-band trust decisions
  – Enables trust anchor stores to be initialized once (in a secure environment) and managed thereafter using TAMP
• Provides support for disaster recovery
  – Via an Apex TA with a contingency key
• Management operations are subject to strict subordination rules

TAMP Message Types
• Eleven TAMP message types (content types, in CMS parlance)
• Five pairs of request/response messages plus TAMPError

  Request (TA Manager-generated)  →  Response (TA Store-generated)
  TAMPUpdate                      →  TAMPUpdateConfirm
  TAMPApexUpdate                  →  TAMPApexUpdateConfirm
  TAMPCommunityUpdate             →  TAMPCommunityUpdateConfirm
  SequenceNumberAdjust            →  SequenceNumberAdjustConfirm
  TAMPStatusQuery                 →  TAMPStatusResponse
  TAMPError

CMS Content Constraints
• Certificate or trust anchor extension that describes the types of content that can be validated using a given public key
  – Content is described in terms of CMS content types and CMS attributes
• Can be used as an authorization mechanism with TAMP

Using Trust Anchor Constraints During Certification Path Processing
• Describes how to use constraints expressed in a trust anchor during certification path processing
  – Essentially describes how to combine values from trust anchor extensions with standard user-supplied path validation inputs
• Processing can be
  – Incorporated into an RFC 5280 compliant implementation
  – Implemented as pre-processing of RFC 5280 inputs and post-processing of RFC 5280 outputs

Using emerging standards to address Scenarios #1 and #2
• Wrap each desired self-signed root certificate in a TrustAnchorInfo structure and associate necessary constraints, i.e., permitted namespaces, excluded namespaces, policies, etc.
  – Uses TAF
• Install the new trust anchor definitions in the TA store
  – Uses TAMP and CCC
• Enforce the constraints contained in the TA during certification path processing
  – Uses UTAC

* Diagram courtesy Booz Allen Hamilton

CAPI Trust Anchor Guard (CAPI TAG)
• Enables applications that use Microsoft CAPI for certification path processing to use trust anchor constraints
• Uses a secondary trust anchor store that provides constraints for trust anchors stored in the native CAPI root store
  – Managed via CAPI TAG tools using TAMP/CCC

Integration with Microsoft CAPI
• Several avenues were explored while searching for a means of providing support for trust anchor constraints to applications enabled using Microsoft CAPI
  – These efforts are described in the IDTrust paper
  – Revocation status provider and validation policy provider interfaces were explored but not used
• The certificate store API was selected for use
  – Serves as a point of entry for intercepting calls to the native certification path processing function
  – Enables support for trust anchor constraints, delegated certification path processing (SCVP), etc.
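The three-step recipe above (wrap the root with constraints, install it, enforce the constraints during path processing) can be sketched in miniature. This is an illustrative Python sketch, not the ASN.1 structures from the TAF/UTAC drafts: the field names and the policy-intersection rule are simplified assumptions.

```python
from dataclasses import dataclass, field

# Hypothetical, simplified stand-in for TAF's TrustAnchorInfo; real TAF
# constraints are expressed as certificate extensions, not plain sets.
@dataclass
class TrustAnchorInfo:
    name: str                                           # subject of the wrapped root
    public_key: bytes                                   # trust anchor public key
    permitted_names: set = field(default_factory=set)   # name constraints
    excluded_names: set = field(default_factory=set)
    policies: set = field(default_factory=set)          # acceptable policy OIDs

@dataclass
class PathValidationInputs:
    trust_anchor: TrustAnchorInfo
    user_initial_policies: set
    permitted_names: set
    excluded_names: set

def initialize_inputs(ta: TrustAnchorInfo, user_policies: set) -> PathValidationInputs:
    """UTAC-style pre-processing: fold the TA's constraints into the initial
    RFC 5280 path-validation inputs before ordinary path processing runs."""
    effective = (user_policies & ta.policies) if ta.policies else user_policies
    return PathValidationInputs(
        trust_anchor=ta,
        user_initial_policies=effective,
        permitted_names=set(ta.permitted_names),
        excluded_names=set(ta.excluded_names),
    )
```

The point of the sketch is only the shape of the pre-processing step: constraints travel with the anchor, and the validator sees them as ordinary initial inputs.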
CAPI TAG components
• CAPI TAG StoreCreator
  – Used to initialize a CAPI TAG trust anchor store
• Store Manager
  – Manages local CAPI TAG trust anchor stores via local file access or HTTP
  – Manages remote CAPI TAG trust anchor stores via HTTP
  – Manages remote CAPI TAG trust anchor stores via a file containing a TAMPStatusResponse message
• mod_tam
  – Routes TAMP messages received via a particular URI to a TA store file for processing
  – Periodically checks specified URIs for TAMP messages, which are downloaded and presented to a TA store file for processing
• Process TAMP Message
  – Enables a file containing a TAMP message to be presented to a CAPI TAG trust anchor store (via HTTP or local file access) for processing

CAPI TAG components
• CAPI TAG
  – Integrates with Microsoft Windows to provide trust anchor constraint enforcement (or alternative certification path processing, e.g., SCVP)
• CAPI TAG Configuration Utility
  – Primary means for configuring CAPI TAG for use
• CAPI TAG Customization Wizard
  – Deployment utility used to create MST files
• PKIFTAM
  – C++ library that provides support for TAF, TAMP and CCC
  – UTAC support is available in the base PKIF library

Management models
• Local management
  – File-based
• Online remote management
  – TAMP over HTTP
• Remote pull
  – TAMP messages via HTTP, RSS, LDAP, FTP, etc.
• Indirect remote management
  – TAMP over sneaker-net, email, etc.
• Additional possibilities
  – Status quo with RFC 5280 inputs and/or TA constraints in certificates
  – Delegation via SCVP
• Potential models
  – Interactive pull (submit response, receive update)
  – Remote pull (per SIA or AIA)

Process flow abstraction

Functional Roles
• Trust anchor store manager
  – Controls the private key corresponding to a trust anchor or certificate with a CCC extension authorizing generation of all TAMP request messages
    • TAMPUpdate, TAMPCommunityUpdate, SequenceNumberAdjust and TAMPStatusQuery messages
• Trust anchor store viewer
  – Controls the private key corresponding to a trust anchor or certificate with a CCC extension authorizing generation of TAMPStatusQuery messages
• Trust anchor store user
  – Is not authorized to generate any TAMP messages but may use a CAPI TAG trust anchor store
  – May present a TAMP message from an authorized source to a CAPI TAG trust anchor store for processing
• System administrator
  – Authorized to edit the system registry and filesystem (i.e., to install or configure CAPI TAG)

Demonstration

A Proposal for Collaborative Internet-scale trust infrastructures deployment: the Public Key System (PKS)
Massimiliano Pala
Department of Computer Science, Dartmouth College, Hanover, NH
pala@cs.dartmouth.edu

ABSTRACT (e.g., certificates) and (3) the context in which the creden- Public Key technology is about multiple parties across dif- tials may be trusted. In on-line environments, relying on ferent domains making assertions that can be chained to- information that is not properly validated can lead to fraud, gether to make trust judgments. Today, the need for more unauthorized access to classified data, or misuse of comput- interoperable and usable trust infrastructures is urgent in ing resources. Public Key cryptography offers the possibil- order to fulfill the security needs of computer and mobile ity to authenticate the identity of a remote party by veri- devices.
Developing, deploying, and maintaining informa- fying one’s capability to use a private key associated with tion technology that provides effective and usable solutions a known public key. Although the link between the public has yet to be achieved. In this paper, we propose a new and the private keys can be easily established through cryp- framework for a distributed support system for trust infras- tographic algorithms, the link between public key and user’s tructure deployment: the Public Key System (PKS). We identity requires an additional component: an infrastructure describe the general architecture based on Distributed Hash for identity and key management. Tables (DHTs), how it simplifies the deployment and usabil- Unfortunately, when leaving closed and controlled envi- ity of federated identities, and how existing infrastructures ronments (like proprietary OSes), the complexity and va- can be integrated into our system. This paper lays down riety of real-world trust infrastructures impacts on the in- the basis for the deployment of collaborative Internet-scale teroperability of trust infrastructures. Solving today’s de- trust infrastructures. ployment issues will provide the required building block for secure communication and authentication in many environ- ments (e.g., Trusted Computing, Computing Grids, wireless Categories and Subject Descriptors and wired network access). We identify the following as the K.6.5 [Management of Computing and Information most important issues related to the deployment of Internet- Systems]: Security and Protection—authentication scale trust infrastructures in open environments. General Terms Problem 1. Unlike the Domain Name System (DNS), which provides a world-wide single Internet host naming Security, Design, Standardization infrastructure, PK technology does not rely on a globally- authoritative infrastructure. 
In order to correctly use the Keywords services offered by a Certification Authority (CA), applica- PKI, Federated Identities, Distributed Systems, Peer-to-peer tions need to be able to “discover” them and take informed trust decisions. Regrettably, there is no support system for trust infrastructures deployment, nor a standardized pro- 1. INTRODUCTION AND MOTIVATIONS tocol (besides PRQP [21] which is capable of providing the Public Key Infrastructures are fundamental building blocks discovery properties for PKI resources) that will allow appli- of the Internet. We rely on Public Key (PK) technology cations to easily interact with different PKIs. For example, for many important activities—e.g. eCommerce, email pro- discovering the address and supported protocol for certifi- tection, and website authentication. Effective use of PK cate renewal from a Certification Authority (CA) is almost requires the relying parties to access the information and impossible for an application. In this paper we propose and resources that enable them to verify (1) the identity of the analyze the design of a Public Key System (PKS) that al- participating entities, (2) the validity of their credentials lows PK-enabled applications to discover resources offered by different CAs. Trust decisions regarding a particular CA can then be facilitated by discovering which trust communi- ties or other organizations already rely upon them (see also Permission to make digital or hard copies of all or part of this work for Problem 4 below). personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies Problem 2. Interaction among different parts of a PKI bear this notice and the full citation on the first page. To copy otherwise, to is often difficult. Current PKIs require applications to inter- republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. 
act with many different services, which are provided through IDTrust ’10, April 13–15, Gaithersburg, MD USA disparate transport protocols. Although many popular ap- Copyright 2010 ACM ISBN 978-1-60558-895-7/10/04 ...$10.00. 108 plications (e.g., browsers and Mail User Agents) are capable to easily discover PK services, our PKS provides the ability of using Public Key Certificates (PKCs), even the simplest to group them according to specific environments to help tasks related to the use of PK technology (e.g., requesting users to manage (or delegate) trust settings. a certificate, renewing a certificate and/or checking the va- lidity of a certificate) requires the application to support a Problem 4. The lack of a standardized method to iden- variety of different protocols. On the developers side, the tify the federation that a trust anchor is a part of impacts on problem with PKIs is the complexity of certificate process- the capability of users to select the context in which the trust ing and the need to support a wide variety of transport pro- anchor should be used. For example, when using a browser tocols. For example, according to [8] the minimum set of to interact with a Federal Agency website, the user is un- required protocols to be supported is composed by FTP [3] able to trust only a subset of the trust anchors present in the and HTTP [9, 16]. Both of these protocols have been relied application’s (or Operating System’s) certificate store (e.g., upon because of historical reasons (e.g., FTP) or because of certificates that are part of the Federal Bridge PKI), instead their wide deployment (HTTP), the availability of many re- all trust anchors are treated equally. The user should be pro- lated services (e.g., HTTP Proxy, HTTP Caching services, vided with the possibility of trusting a specific set of trust etc.) or default access properties (e.g., traffic being allowed anchors by using the familiar concept of federation instead through firewalls by default). 
On top of these, most of CAs of Policy Identifiers embedded in the digital certificates. By use HTTPS, LDAP [26], LDAPS to publish certificates and facilitating a method for disseminating information about CRLs. All of these protocols, require the application de- which organizations or federation use/include/trust a par- veloper to either rely on existing libraries (when these are ticular trust anchor, our system allows for easy deployment available) or to provide her own implementation. In fact, of federated identities. implementing a full HTTP library that is capable of man- aging all the possible HTTP commands, codes, and config- In this paper we present a support system for trust infras- urations could require a lot of additional development time tructures based on Distributed Hash Tables (DHTs) that and costs. When it comes to small devices, the need to re- is suitable for Internet-scale deployment and provides dy- duce the size of libraries and memory usage is well known. namic federation management. Moreover, our work can be Therefore, our proposal provides a simple transport proto- easily integrated with existing infrastructures allowing for col for PKI messages. The protocol is easy to implement a smooth roll over between isolated PKI islands to globally and flexible enough to support current and future needs for available and locally configurable PKI services. communications between different PKI actors. An analy- The rest of the paper is organized as follows. Section 2 sis of all current PKIX protocols (eg., OCSP, CMC, CMM, presents the background and related work. Section 3 de- etc.) showed that supporting a request—response model in scribes the basic principles of PKS, the overlay network de- PKS allows to integrate them with the proposed PKS. As sign and the message format. 
How to deploy federated iden- described in 3.1, we developed a simple challenge—response tities within PKS is explained in section 4, while Section 5 protocol that allows for re-utilization of most of the already details how to integrate existing PKIs with our infrastruc- deployed software. ture. Section 6 contains our conclusions and future work. Problem 3. It is impossible for users and applications 2. RELATED WORK to specify the class of PK services they want to trust. The possibility of identifying a set of service providers based on a An important part of PKIs is the “I“–nfrastructure that is classes of services (e.g., local, eCommerce, eBanking, eMail, needed to manage the trust relationships between entities. organizational, and Internet) will allow better trust manage- We investigated existing trust infrastructure deployment ment in applications. Since people carry many small per- systems and collaborative approaches to provide federated sonal devices for everyday use, they might want to exchange identities. In this section, we provide a description of the information directly and securely (e.g., beam it or radio it). previous work and related technologies. In order for people to interact efficiently with different certi- fication authorities for different purposes, we need to rethink today’s infrastructures to allow for globally and locally avail- 2.1 “I” for Infrastructure able trust infrastructure networks. In fact, real-world trust Throughout the years, research has offered many different infrastructures demand a simple and interoperable way to technologies like PGP [6], SDSI-SPKI [7] and identity-based federate identities. Today, many PKIs are in place to serve encryption (IBE) [4] to authenticate users. Although each a specific purpose. 
The deployment of PKIs for providing of them has its own strengths and weaknesses, an infras- identities to access resources within federations (e.g., com- tructure of some sort is needed in order to provide support puting grid policy bodies like TAGPMA [25] or IGTF [13]) is for trust building. An example of a widely deployed infras- an example of such specializations. Another example is the tructure is represented by the web of trust used in Pretty presence of many CAs in the commercial sector dedicated to Good Privacy (PGP) [6]. Similar to traditional X.509 PKIs, provide only SSL certificates. We should introduce a mecha- PGP uses signed statements (certificates) to establish the nism to support contextual trust. For example, when setting link between a public key and a user's identity. PGP iden- up a mobile device to access the home network, it should be tities are unique in that normal users can endorse them by easy to discover and utilize local PK services; however, when digitally signing other users' keys. Although this approach it comes to accessing services on the Internet, we might want may work in small and well-defined communities where out- to validate certificates/services by using trust anchors asso- of-band (e.g., face to face) identity verification is feasible, ciated with specific bodies: government services, Internet its decentralized authentication scheme would not work for services, and on-line banking services. Given the possibility large-scale, widely distributed deployments (e.g., for the Internet community), automated infrastructure environments (e.g., Trusted Computing), or in high-security environments (e.g., Federal Agencies).
2.2 Peer-to-peer systems In X.509 infrastructures, well-defined liabilities and cer- In this paper we introduce a novel approach to deliver a tificate policies have been defined to provide the flexibility cooperative system to enable interoperable trust infrastruc- and the scalability required by Internet-scale trust infras- tures deployment at the Internet scale based on peer-to-peer tructures. Researchers and standardization bodies, working technologies. Since we envision a Peer-to-peer approach in at both local and global scales, have defined a set of mini- the PKS design, we provide a summary of current Peer-to- mal requirements (or profiles) that can be used as guidelines peer technologies relevant to our work. when deploying these authentication infrastructures [8, 11]. In the first-generation P2P systems (e.g., Gnutella, Kazaa, As a consequence, X.509 PKIs now provide the most widely Napster, etc.) all nodes are both clients and server: any deployed technology for Internet authentication (e.g., WAN node can provide and consume data. Some of these systems, and Interdomain). Regardless of the fact that today identity like Napster, implemented a centralized search service where providers need to participate in federations, no standardized a single server keeps track of the location of the shared data. infrastructure exists to support federated identities and to On the opposite side is Gnutella; in this type of network, help applications and users to correctly manage their trust search is implemented by recursively asking the neighbors settings. for files of interest. The search goes on till a Time To Live To address the need for a trust infrastructure for the In- (TTL) limit is reached. Systems like Kazaa or Skype use ternet, early approaches envisioned the establishment of an a hybrid model where super-peers act as local search hubs. Internet Policy Registration Authority [15]. 
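The first-generation search style described above (Gnutella-style flooding, recursively asking neighbors until a Time To Live limit is reached) can be sketched in a few lines. The topology, file table, and TTL value below are purely illustrative.

```python
# Minimal sketch of first-generation P2P search: recursively ask
# neighbors for a file until the TTL runs out.

def flood_search(node, target, neighbors, ttl, visited=None):
    """Return the set of nodes holding `target` reachable within `ttl` hops."""
    visited = visited if visited is not None else set()
    if ttl < 0 or node in visited:
        return set()
    visited.add(node)
    hits = {node} if target in FILES.get(node, set()) else set()
    for peer in neighbors.get(node, ()):                 # ask each neighbor...
        hits |= flood_search(peer, target, neighbors, ttl - 1, visited)  # ...recursively
    return hits

# Illustrative shared-file table and topology.
FILES = {"A": {"song"}, "C": {"song"}, "D": set()}
TOPOLOGY = {"A": ["B"], "B": ["C", "D"], "C": [], "D": []}
assert flood_search("A", "song", TOPOLOGY, ttl=2) == {"A", "C"}
```

The TTL is exactly the mechanism that makes such searches cheap but incomplete, which is what motivates the DHT-based second generation discussed next.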
Its failure due Super-nodes are automatically chosen by the system based to political, rather than technical, issues showed the im- on their capacities in terms of storage, bandwidth and avail- possibility to centrally manage Internet-wide trust infras- ability. tructures. The absence of a globally-available infrastructure (like the DNS in the case for Internet host naming) led to Because of the introduction of Distributed Hash Tables the establishment of many different and poorly interopera- (DHTs), the second generation of P2P overlay networks pro- ble trust infrastructures. Researchers and Internet working vides major advantages over the first generation by imple- groups have tried to address this problem by studying more menting a predictable (maximum) number of hops needed distributed trust models that use cross-certification and/or to answer a query. DHTs are a distributed version of a bridge CAs. Unfortunately, the difficulties related to path hash table data structure. The combination of (key, value) validation in these more complex trust infrastructures have is used to look-up,retrieve, store, and delete shared data slowed down their adoption in the real world. Moreover, the across peers. The key idea behind the usage of DHTs is need to accept a common certification policy is an obstacle to provide each peer with a unique identifier and assign a to their deployment in open environment. sub-set of the general (key, space) to it. There are several Today, the need to provide services beyond the borders routing protocols based on DHTs often referred to as P2P of a single organization demands for more interoperable en- routing substrates or P2P overlay networks. The first us- vironments. 
Government Agencies, Grid Computing Com- able approach of a DHT-based routing substrate is found in munities and Trust Computing environments provide a clear Chord [24] where a circular address space is used to map example of organizations (or Virtual Organizations in some nodes and the key space. cases) where the need for a globally-available trust infras- tructure is compelling. Several P2P routing substrates followed after Chord. These The lack of a standardized method to provide federated systems introduced more sophisticated (and sometimes quite identities has pushed many communities to provide solu- complex) design to minimize the maximum number of hops tions based on weak credentials (like passwords). For exam- and the overhead introduced by the P2P routing infrastruc- ple, the Federal Government CIO Council established the ture. For example Pastry [1] considers the network locality Identity, Credential, and Access Management (ICAM) Sub- when routing messages through its network. In Pastry, in committee i.e. ICAMSC, with the charter to foster effective addition to the leaf nodes a neighborhood list is maintained ICAM policies and enable trust across organizational, oper- where the M closest peers, in terms of the routing met- ational, physical, and network boundaries [12]. The release rics, are listed. Although it is not used directly in the rout- of an ICAM Trust Framework (TF) has led to several key ing algorithm, the neighborhood list is used for maintaining federations seeking accreditation with the ICAM TF (e.g. locality principals in the routing table. A more complex InCommon Federation [14], OpenID Federation [20]). How- topology is implemented in Content Addressable Network ever, these federations are primarily focused on the lower (CAN) [23]. It uses a “d-dimensional” cartesian coordinate levels of authentication as defined in NIST Special Publi- space mapped on a d-torus. In CAN, a node is responsible cation 800-63 [5] (i.e. 
levels 1 and 2) which do not require for a specific value if the corresponding key hashes in the strong credentials. Hence, they do not have the identity sub-space "owned" by the node itself. binding necessary for providing high level of assurance (LoA) credentials. Other examples of advanced DHT-based overlay networks To avoid political issues that led to the failure of IPRA, are Tapestry [27], Kademlia [18], and P-Grid [2]. Tapestry the globally-available infrastructure should be designed in uses two identifiers: the NodeID and the Application specific such a way that would (a) provide support for federated endpoints or GUID. The main focus of Tapestry is efficiency. identities under well defined authorities, (b) define a de- In particular, it minimizes message latency by construct- ployment framework that helps infrastructure management, ing locally optimal routing tables from initialization and by and (c) facilitates trust decisions for the user. maintaining them in order to reduce routing stretch. Similar to Tapestry, the Kademlia algorithm uses a special notion of deployment. In particular, we enhance the peer-to-peer pro- locality based on the calculation of the "distance" between tocol to support (1) interoperable PKI message exchange two nodes. This distance is computed as the Exclusive Or among CAs, and (2) usable federated identities deployment. of the two node IDs. Kademlia uses the Exclusive Or be- Similar to PEACH, we leverage the possibility to join() cause it shares some properties with the geometric distance the network by using multiple identity-based node identi- formula: the distance between a node and itself is zero, it is fiers. Different from our previous work, we support two symmetric, and it supports the triangle inequality. Kadem- different types of nodes: the PKS responders and the PK lia routing tables consist of a list for each bit of the node Federation Authorities.
id: nodes that can go in the nth list must have a differing The PKS responders act as a PKI proxy for applications. nth bit from the node's own id. Node look-ups proceed by They are capable of (a) answering clients about PKI requests querying the k nodes in its own k-buckets that are the as described in Section 3.2, and (b) forwarding PKI requests on closest ones to the desired key. These nodes will send back the PKS and send back responses to the client application. the k closest entries they know. The iterations continue until The PK Federation Authorities, instead, provide informa- no nodes are returned that are closer than the best previ- tion about the deployed federations by indicating if a par- ous results. Different from any of the previously discussed ticular entity is part of the authorized federation. protocols, P-Grid uses a bit-level approach to provide ef- In order to locate available CAs efficiently on the PKS ficient node look-ups by resolving queries based on prefix network, we use unique node identifiers for each CA. We matching. Instead of using a DHT, P-Grid uses a trie [10], leverage the availability of the CAs' digital certificates by or prefix tree, which is an ordered tree data structure. P- deriving the node's identifier from the fingerprint of the CA Grid partitions the key-space in a granularity adaptive to certificate itself. For example, if CA1 wants to participate the load at that part of the key-space. Unlike DHTs that in the PKS network, it will set up a PKS node and issue perform efficiently only for uniform load-distributions, an a certificate that identifies it as the authoritative PKS re- overlay network based on P-Grid presents peers with simi- sponder. When joining the PKS network, the PKI gateway lar storage load even for non-uniform load distributions.
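The XOR metric and the three properties the text appeals to (zero self-distance, symmetry, triangle inequality) can be written out directly; the node IDs here are illustrative small integers.

```python
def kad_distance(a: int, b: int) -> int:
    """Kademlia's distance between two node IDs: bitwise XOR, read as an integer."""
    return a ^ b

def k_closest(candidates, target, k):
    """The k known nodes closest to `target` under the XOR metric."""
    return sorted(candidates, key=lambda n: kad_distance(n, target))[:k]

# Properties of the metric:
x, y, z = 0b1011, 0b0110, 0b1110
assert kad_distance(x, x) == 0                                        # d(x, x) = 0
assert kad_distance(x, y) == kad_distance(y, x)                       # symmetry
assert kad_distance(x, z) <= kad_distance(x, y) + kad_distance(y, z)  # triangle inequality
```

Node look-ups in Kademlia amount to repeatedly applying something like `k_closest` to the responses received so far, stopping when no closer nodes are returned.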
The node identifier, that is the identifier that will enable the node to be found on the network, is calculated by 3. THE PUBLIC KEY SYSTEM (PKS) using the fingerprint of the CA1's certificate. To validate the identity of the joining node, a simple validation of the pre- The Public Key System (PKS) we propose in this paper is com- sented certificate chain will guarantee that the joining node posed of three main components: the DHT-based overlay has been authorized as a PKS responder for that particular network design and routing properties, the message format, CA. Let n be the PKS responder for CA1, the trust chain: and the support for federated identities. The PKS uses a Peer-to-peer overlay network to route Certificate(n) ← Certificate(CA1) messages to the target CAs and federation authorities. In guarantees the authorization of the node to respond as the particular, we use a simplified version of the Chord protocol PKS responder (a specific extension in the PKS responder's based on the PEACH [22] system. All of the different types certificate might be required). Moreover, because the node of overlay networks discussed in the previous section provide identifier is the hash of the CA's certificate, it enables the a large number of options (e.g., storing keys and values, re- PKS responder only for that particular CA. This approach trieving values, and providing support for multicast traffic). guarantees high scalability and provides a simple approach We selected the PEACH routing algorithm for two reasons. to PKS responder deployment. First, it already provides support for node identifiers based It is important to notice that the PKS network can sup- on public key certificates. Second, the PEACH protocol is
This feature stems easy to support from the developer's point of view: other from the use of the output of the hash function to link a node protocols like Kademlia or P-Grid might provide additional on the PKS network to an identity (e.g., a CA or a PK-FA). features that are not required by our system. In particu- Although our work primarily focuses on X.509 certificates, lar, it does not support many of the operations traditionally PKS is capable of supporting multiple types of public key implemented over peer-to-peer networks (i.e., get(), put(), based identifiers. delete()). Ultimately, the PKS could use any of the peer-to-peer Applications such as browsers or email clients access the overlay networks discussed in Section 2 provided that changes PKS by querying the local PKS server. By looking at the to support identity-based node identifiers are in place. target responder in the PKS network, the local PKS respon- der discovers if a responder for the target CA is available 3.1 The PKS Network and, if so, forwards the application's request to the target node. The response is then routed back to the client. As In our previous work, we designed and prototyped a scal- described in Section 3.2, applications use only one simple able system for PKI resources look-up. In [22], we intro- transport protocol for all PKI-related queries (e.g., OCSP, duced a new peer-to-peer overlay network that makes use of CMM, SCEP, etc.) and do not need to implement any of the a Distributed Hash Table routing protocol (namely, Peach). overlay network operations (e.g., join() or lookup()). If a lo- Results from this work have demonstrated that PKIs can cal PKS responder is not available, one of the pre-configured make effective use of peer-to-peer technologies and have laid servers can be used instead (same approach as in DNS where the path for the next steps in this new field.
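The identifier scheme described above (node IDs derived from the fingerprint of the CA's certificate, placed on a Chord-style circular identifier space) can be sketched minimally. This is an illustrative Python sketch: SHA-256 is the fingerprint algorithm named by the paper, but the certificate bytes and the truncated identifier width m are placeholders.

```python
import hashlib

def pks_node_id(ca_certificate_der: bytes, m: int = 32) -> int:
    """Derive a PKS node identifier from the fingerprint (SHA-256) of a CA
    certificate, truncated to an m-bit identifier space for the demo."""
    return int.from_bytes(hashlib.sha256(ca_certificate_der).digest(), "big") % (2 ** m)

def successor(ring_ids, target_id):
    """First node clockwise from target_id on the circular identifier space."""
    ring = sorted(ring_ids)
    for node in ring:
        if node >= target_id:
            return node
    return ring[0]  # wrap around the ring

# Placeholder bytes standing in for DER-encoded CA certificates.
ring = [pks_node_id(f"CA-{i} certificate".encode()) for i in range(4)]
responder = successor(ring, pks_node_id(b"target CA certificate"))
assert responder in ring
```

Because the identifier is a hash of the certificate itself, anyone holding the certificate can recompute where its responder must sit on the ring, which is the discovery property the paper relies on.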
In this paper we applications and operating systems are provided with the list build on our previous work and extend this approach to pro- of root DNSs). We envisage that local PKS responders (or vide a support system for Public Key trust infrastructures PKI gateways)—as in the case of caching servers for DNS— 111 will regularly be deployed in LANs to facilitate access to Size (bytes) PKS for applications. 0 4 8 Although we provide an overview of all of the main fea- tures of the PKS network, because many operations are sim- ilar to the ones described in PEACH we refer to our previous work for a more exhaustive description of the protocol and CMD_CODE PKT_SIZE its performances. PAYLOAD The PKS Local Routing Table PKSMessage or Our system uses a DHT table together with a hash func- CMD data tion (i.e., SHA-256) to implement efficient routing in PKS. Each participating node is provided with an identifier that is derived by calculating the hash of the responder’s CA cer- Figure 1: Structure of PKS messages. tificate or, in case of a Federation Authority, the authority’s certificate. To support efficient nodes lookup, each node stores a local routing table. Each table carries m entries If the PKS responder also acts as one or multiple federated where m is equal to the number of bits in the node iden- authorities, the set of certificates associated with this role tifiers, that is the size of the output of the selected hash can be expressed as: function. As we use the same algorithms as identified in [17] to build and update the local routing table in PKS, we do η = {x01 , x02 , . . . , x0m } not report the full description here. However, we describe the basic structure of the routing table to provide a clear Let θ be the set of network identifiers related to the joining view of the network properties. Moreover, as the routing PKS responder: algorithm is derived from Chord, all formal proofs still hold θ= {y1 , y2 , . . . , yn } for the PKS. 
The local routing table correlates the nodeIDs to the nodes' network addresses. To optimize lookup operations, the routing table is kept ordered by nodeID. Let id_n be the node identifier for node n, and m be the size (in bits) of the node identifiers; then the stored values of the local routing table range from

    x^n_0 = (id_n + 2^0) mod 2^m

to

    x^n_{m-1} = (id_n + 2^{m-1}) mod 2^m

In general, the value of the i-th entry in the local routing table can be expressed as

    x^n_i = (id_n + 2^i) mod 2^m,   i ∈ [0, 1, ..., m-1]

and therefore the node-identifier space related to the i-th entry is

    γ^n_i = [x^n_i, x^n_{i+1})

Let k be the target node for a query. By looking at the local routing table, node n can find the closest node whose identifier is equal to or precedes k. By iterating this approach it is possible to find the requested node in O(log(m)) operations.

Multiple Identifiers

A PKS responder might need to be identified on the PKS network by multiple nodeIDs.
This happens, for example, when the PKS responder is authoritative for multiple CAs. Moreover, the same node can serve as an authority for one or more federations at once (see Section 4).

To be assigned multiple node identifiers, the joining PKS responder performs multiple join() operations on the network. Let n be the number of certificates the PKS responder possesses. The set of the CA certificates related to the PKS responder (φ) can be expressed as

    φ = {x_1, x_2, ..., x_n}

If the PKS responder also acts as one or multiple federation authorities, the set of certificates associated with this role can be expressed as

    η = {x'_1, x'_2, ..., x'_m}

Let θ be the set of network identifiers related to the joining PKS responder,

    θ = {y_1, y_2, ..., y_n}

and let ψ be the set of network identifiers related to the federation authority role,

    ψ = {y'_1, y'_2, ..., y'_m}

where

    ∀i ∈ [1, 2, ..., n], ∃ x_i, y_i : x_i ∈ φ ∧ y_i ∈ θ ⇒ y_i = H(x_i)

and

    ∀k ∈ [1, 2, ..., m], ∃ x'_k, y'_k : x'_k ∈ η ∧ y'_k ∈ ψ ⇒ y'_k = H(x'_k)

For each x_i the responder is authoritative for, the PKS responder has a different network identifier y_i, which is based on the CA's certificate fingerprint. For each x'_k the responder is the federation authority for, the PKS responder has a different network identifier y'_k, which is based on the federation authority's certificate (not on the federation authority's certificate issuer). For each of these identifiers, the joining peer performs findnode() to find its successor in the network ring and proceeds to register itself in the right position. This approach enables the responder to provide PKS services for different CAs and federation authorities, and can facilitate the deployment of existing CAs in the PKS network.
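The multiple-identifier registration just described can be sketched as follows, assuming SHA-256 as the hash function H. The helper names and the join()/findnode() placeholders are illustrative assumptions:

```python
import hashlib

def H(cert_der: bytes) -> int:
    """Network identifier: SHA-256 fingerprint of a certificate."""
    return int.from_bytes(hashlib.sha256(cert_der).digest(), "big")

def network_identifiers(phi: list, eta: list):
    """Map a responder's CA certificates (phi) and federation-authority
    certificates (eta) to its two sets of nodeIDs (theta and psi)."""
    theta = [H(x) for x in phi]  # identifiers for the CA-responder role
    psi = [H(x) for x in eta]    # identifiers for the PK-FA role
    return theta, psi

# A joining responder would then perform one join()/findnode() call per
# identifier in theta + psi to register itself on the ring.
theta, psi = network_identifiers([b"ca1-cert", b"ca2-cert"], [b"fed-cert"])
assert len(theta) == 2 and len(psi) == 1
```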
3.2 The PKS Message Format

The simplicity of the PKS message format constitutes one of the core features of our system. To minimize the impact of the message format and support the integration of existing PKIX protocols, we opted for a simple binary format. In PKS, each message is composed of a header and a body. The header carries two integers (uint32) that indicate the type of the message (command code) and the size of the body (message length) in bytes. The payload of the message is a DER encoding of a PKSMessage that acts as a wrapper around the PKIX message (e.g., CMS) to be dispatched to the target node.

[Figure 1: Structure of PKS messages. A fixed header carries CMD_CODE (bytes 0-3) and PKT_SIZE (bytes 4-7), followed by the PAYLOAD (a PKSMessage or CMD data).]

The command code is 4 bytes long and specifies the action to be performed on the target node or the return code. The packet length is used to identify the length of the payload and is also 4 bytes long (type uint32_t). When the command code identifies a network-related operation (e.g., lookup(), join(), or leave()), the payload carries the control data. For example, when a lookup is requested, the payload carries the node identifier of the searched PKS responder (the certificate's hash).

A special case is represented by the CMD_PKI_MESSAGE command code. In this case, instead of the CMD data, the payload content is a PKSMessage. The PKS message structure is depicted in Figure 1, and the PKS command codes are reported in Table 1.

The PKSMessage is defined as follows (ASN.1 notation):

    PKSMessage ::= {
        protocol    OBJECT IDENTIFIER,
        -- Identifier for the data protocol
        targetNode  OCTET STRING,
        -- Target Node Identifier (hash)
        rawBytes    OCTET STRING
        -- Binary data (e.g., a CMS message)
    }

The PKSMessage is composed of three fields: protocol, targetNode, and rawBytes. The protocol field carries the object identifier for the data format used; for example, if the body of the message carries an OCSP response, the protocol field will carry the id-ad-ocsp object identifier. The targetNode field bears the node identifier of the target node; this helps the receiving node to correctly process the request in case the node is assigned multiple nodeIDs. Last but not least, the rawBytes field encapsulates the contents of the original PKI message in DER format. The chosen approach simplifies the routing of PKI protocol messages in PKS without requiring any change in the published standards. Moreover, the rawBytes field can encapsulate any form of data, thus providing support for future PKIX (and non-PKIX) protocols.
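A sketch of the on-the-wire framing described above. The paper fixes only the two uint32 header fields; big-endian byte order is an assumption, and the DER encoding of the PKSMessage body is elided here in favor of an opaque payload:

```python
import struct

CMD_PKI_MESSAGE = 0x800 + 0  # from Table 1

def pack_pks_message(cmd_code: int, payload: bytes) -> bytes:
    """Header: two uint32 values (command code, payload length in bytes),
    followed by the payload (a DER-encoded PKSMessage or CMD data)."""
    return struct.pack("!II", cmd_code, len(payload)) + payload

def unpack_pks_message(data: bytes):
    """Split a received frame back into (command code, payload)."""
    cmd_code, size = struct.unpack("!II", data[:8])
    return cmd_code, data[8:8 + size]

wire = pack_pks_message(CMD_PKI_MESSAGE, b"\x30\x03\x02\x01\x00")
assert unpack_pks_message(wire) == (CMD_PKI_MESSAGE, b"\x30\x03\x02\x01\x00")
```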
4. FEDERATION AUTHORITIES

Along with PKS responders, we introduce a special kind of node, namely PKS Federation Authorities (PK-FA). These special nodes serve as responders for determining whether a particular entity is part of a specific set, also called a Federation. PK-FA nodes use identifiers similar to the PKS responders' ones. However, differently from the latter, the Federation Authority identifiers are calculated by using the fingerprint of the PK-FA certificate instead of its issuer's one. For instance, when a node k joins the PKS network as a responder for CA1, its assigned nodeID is

    id_k = hash(x_CA1)

where x_CA1 is the certificate of CA1. Instead, when a node j joins the PKS network as a federation authority, the assigned node identifier is

    id_j = hash(x_j)

where x_j is the certificate of the responder (j) itself. This approach relieves federation authorities from deploying ad-hoc certification authorities (as in the case of bridge CAs) and allows them to use end-entity certificates provided by any third-party CA. By trusting the PK-FA to be authoritative for a specific federation, users and applications are able to query for the participation of an entity in a specific federation.
As an example, let's consider the impact of PK-FA nodes on browsers' trust stores. Today, CAs undergo specific audits and certification processes to be included in browsers and operating systems. By leveraging the PKS features, the number of certificates embedded into applications could drop substantially. In fact, let company χ be a certification/auditing provider and x its PK-FA node in PKS. If a CA has positively passed the certification process, the node x will report that CA as being part of its federation (CAs certified by company χ). By embedding the certificate of node x in the trust store, the application can verify that the certificate presented by a third party has been issued by a CA certified by the company χ. The CA certificate does not need to be embedded as a trust anchor in the application's store. This approach would hold, for example, to verify extended validation certificates (the PK-FA node could be maintained by the CA/Browser Forum authority). The introduced federation authority nodes allow for a dynamic approach to trust anchor management, smaller trust store sizes, and the possibility for policy management bodies and virtual organizations to be easily deployed and supported in applications.

Moreover, applications can leverage the presence of federation support built into PKS and provide more usable interfaces to the user. In fact, users could be given the possibility to choose which (set of) PK-FAs to trust for a particular session. For example, when shopping online a user could enable the α and β credit card federation authorities only, thus providing the application with a trust context based on the familiar concept of federation/organization. On top of knowing that a merchant's website is responding to a verified URL, the user can discover whether her credit card company has an established trust relationship with the merchant. It is worth noticing that queries to a PK-FA can be related to CAs or to End Entities (e.g., a website's certificate or even a user's certificate). If the appropriate authoritative PK-FA is deployed, the system can provide answers to queries like "Is this user's certificate part of the help desk (federation) of organization γ?", "Is this CA part of TAGPMA?", or "Is this CA part of the US Higher Educational Authority?"

4.1 Federation Authority Queries

An important feature of the PKS is the possibility to easily federate identities under well-defined federation authorities. PK-FA nodes provide authoritative answers to the question "Is this entity part of your federation?".

In particular, when an application wants to know if a certification authority is part of a federation, it routes a PKS message with the CMD_LOOKUP_FEDERATION code. The payload of the message is a PKSAuthRequest. The data structure of the federation lookup command is as follows:

    PKSAuthRequest ::= {
        targetAuthority OCTET STRING,
        -- Target Authority Identifier (hash)
        targetEntity    CertIdentifier
        -- Certificate Identifier (hash)
    }
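A client-side sketch of building the federation lookup described above. Only the command codes come from Table 1; the helper names and the simplified (non-DER) payload layout are illustrative assumptions:

```python
import hashlib
import struct

CMD_LOOKUP_FEDERATION = 0x600 + 2  # from Table 1
CMD_SUCCESS = 0x200 + 1

def fingerprint(cert_der: bytes) -> bytes:
    """Certificate identifier: SHA-256 fingerprint."""
    return hashlib.sha256(cert_der).digest()

def build_federation_lookup(authority_cert: bytes, entity_cert: bytes) -> bytes:
    """A PKSAuthRequest carries the target authority identifier and the
    certificate identifier of the entity being checked (both hashes).
    Here the two fields are concatenated rather than DER-encoded."""
    payload = fingerprint(authority_cert) + fingerprint(entity_cert)
    return struct.pack("!II", CMD_LOOKUP_FEDERATION, len(payload)) + payload

def is_member(response: bytes) -> bool:
    """The PK-FA answers CMD_SUCCESS if the entity is in its federation."""
    (code,) = struct.unpack("!I", response[:4])
    return code == CMD_SUCCESS

msg = build_federation_lookup(b"pk-fa cert", b"some CA cert")
assert len(msg) == 8 + 64  # header + two SHA-256 identifiers
```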
To populate the fields of the PKSAuthRequest structure, the application derives the target nodeID from the federation authority's certificate (targetAuthority field). Then, it calculates the entity's certificate identifier (targetEntity field). The message is then routed to the target PK-FA node through the PKS network.

When a CMD_LOOKUP_FEDERATION message is received by a PK-FA node, it responds with CMD_SUCCESS in case the target entity is part of its federation; otherwise, CMD_ERROR followed by the appropriate error code is returned.

We notice that the possibility to support federated identities in PKS provides a technical means to achieve contextual trust in applications, and thus demands well-defined policies.

4.2 Classes of Federation Authorities

In order to provide more flexible authority management, we use different classes of PK-FA, such as Local, Internet, Network, Organizational, and Application. Different classes have different characteristics. For example, authorities can freely join the PKS network in the Organizational class, allowing private organizations to easily deploy their own federation authorities. Other PKS classes might require tighter control over who can join the PKS. For example, the set of participants in the Internet class (which will comprise the authorities that secure the Internet infrastructure, e.g., S-BGP [19]) can be constrained depending on well-identified properties. To support classes of federations, we introduce a special type of PK-FA node, namely Class Federation Authorities. This special set of PK-FA nodes is used to support hierarchical authority deployment.

These authorities use the same message protocol as PK-FA nodes. The most noticeable difference is the usage of the hash of the Public Key associated with the Class Federation Authority instead of the hash of its certificate. This choice is based on the consideration that these types of nodes can be deployed (although further work is needed in this area) to provide a distributed support system for secure DNS.

In order to discover if a federation authority is part of a specific class of federations, the application sends a PKSMessage with a CMD_LOOKUP_FEDERATION command code. By using the hash of the public key associated with the authoritative node for the requested class (e.g., the Internet or Trusted Computing classes), the application is capable of recognizing which class the PK-FA is part of.

This type of system could be used to deploy trusted keys for primary DNS domains (e.g., ".", ".net", ".edu"). We envisage that well-identified authorities like ICANN will run the class federation authorities.
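The addressing distinction above — ordinary PK-FAs are reached through the hash of their certificate, Class Federation Authorities through the hash of their public key — can be sketched as follows (SHA-256 and the helper names are assumptions):

```python
import hashlib

def pkfa_node_id(cert_der: bytes) -> int:
    """Ordinary PK-FA: identifier derived from the certificate fingerprint."""
    return int.from_bytes(hashlib.sha256(cert_der).digest(), "big")

def class_authority_node_id(public_key_der: bytes) -> int:
    """Class Federation Authority: identifier derived from the public key
    itself, independently of any certificate wrapped around it."""
    return int.from_bytes(hashlib.sha256(public_key_der).digest(), "big")

# The same hash primitive, applied to different inputs, yields
# distinct ring positions for the two kinds of authority node.
assert pkfa_node_id(b"authority cert") != class_authority_node_id(b"authority key")
```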
freely join the PKS network in the Organizational class, al- An important feature of PKS is the possibility to integrate lowing for private organizations to easily deploy their own existing PKI services. Figure 2 depicts the design of a PKS federation authorities. Other PKS classes might require responder that allows for integration of existing infrastruc- tighten control over who can join the PKS. For example, the tures in PKS. In particular, to deploy the services offered set of participants in the Internet class—which will comprise by a CA, a PKS responder can act as a bridge between the authorities that secure the Internet infrastructure (e.g., S- PKS and the deployed PKI services. The control flow is as BGB [19])—can be constrained depending on well identified follows: properties. To support classes of federations, we introduce a special type of PK-FA nodes, namely Class Federation (a) The Responder Engine subsystem is responsible for Authorities. This special set of PK-FA nodes are used to providing PKS network services. Besides the overlay support hierarchical authorities deployment. network operations, the Responder Engine is in charge 114 Table 1: PKS opt codes values and description. Command Name Code Description CMD ERROR 0x200 + 0 General Error CMD SUCCESS 0x200 + 1 Cmd Successful CMD GET NODE INFO 0x500 + 0 Get node information CMD GET NODE SUCCESSOR 0x500 + 1 Get node successor CMD GET NODE PREDECESSOR 0x500 + 2 Get node predecessor CMD UPDATE PREDECESSOR 0x600 + 1 Update predecessor info CMD UPDATE SUCCESSOR 0x600 + 2 Update successor info CMD LOOKUP NODE 0x600 + 2 Perform a lookup CMD LOOKUP FEDERATION 0x600 + 2 Federation participation CMD PKI MESSAGE 0x800 + 0 PKI Data Packet of processing PKS messages. In particular, when a We envisage the deployment of Internet PKS to happen message is received via the PKS network, the Respon- in three phases. 
(b) If no integration with the CA core component is possible, the Responder Engine passes the contents of the rawBytes field on to the PKI Client Engine, which is in charge of the communication between the PKS responder and the provided PKI services. The response received from the PKI Client Engine is then returned to the requesting PKS node.

(b') If the CA provides some sort of integration with the core service (e.g., via a plug-in infrastructure), the PKS responder can leverage this tight integration with the CA software in order to efficiently build the PKS response. In this case, the CA core service must provide APIs capable of parsing the PKI request, accessing the data needed to build the response, and building the PKI response. The development costs of providing such an interface can be justified by the faster response times and easier application debugging, as the interaction with the CA core component is not achieved via a client/server approach (as in (c)).
(c) In case PKI services are available only through standard protocols (e.g., HTTP), the PKI client sends the PKI request (extracted from the PKS message) via normal network communication. If a valid response is returned, it is sent back to the Responder Engine. An error message is returned in case the requested service is non-responsive, unknown, or not available.

Where integration with the CA's infrastructure is not possible and the path b → c is used to generate the PKS response, a communication overhead is introduced that can negatively impact the response time of the PKS responder.

We envisage the deployment of Internet PKS to happen in three phases. In particular, we think that initial participation in PKS will be driven by policy bodies and their communities (Phase I). For example, computing grid communities have already expressed interest in our work. These communities can freely deploy their own Federation Authorities. After an experimental deployment, we plan to work closely with Internet communities (Phase II) to identify and deploy the root Class Federation Authorities. As the success of the PKS will depend on the availability of software to support it, we plan to work closely with certification authorities, software vendors, and certificate service providers to stimulate the adoption of PKS on a large scale (Phase III).
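The choice among the integration paths described above amounts to using the tightest integration available, falling back from the CA-core APIs (b') to the PKI Client Engine (b, c). A sketch, with hypothetical class and method names:

```python
class PksGateway:
    """Selects how a PKS responder fulfills an unwrapped PKI request:
    (b') direct CA-core APIs when available, otherwise (b)/(c) the
    PKI Client Engine, which may use standard protocols such as HTTP."""

    def __init__(self, ca_core=None, pki_client=None):
        self.ca_core = ca_core        # plug-in integration, if any
        self.pki_client = pki_client  # client/server fallback

    def respond(self, pki_request: bytes) -> bytes:
        if self.ca_core is not None:                 # path (b')
            return self.ca_core.build_response(pki_request)
        if self.pki_client is not None:              # path (b) -> (c)
            return self.pki_client.forward(pki_request)
        raise RuntimeError("service not available")  # mapped to CMD_ERROR

class EchoClient:
    """Stand-in for a PKI Client Engine talking to an existing service."""
    def forward(self, req: bytes) -> bytes:
        return b"response-for-" + req

gw = PksGateway(pki_client=EchoClient())
assert gw.respond(b"ocsp") == b"response-for-ocsp"
```

The extra hop through the client engine is what introduces the communication overhead noted above for the b → c path.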
6. CONCLUSIONS AND FUTURE WORK

The need for a homogeneous PKI system capable of addressing current problems in trust infrastructure deployments is evident. This work outlines major problems related to current approaches and lists the limitations that come from the lack of a support system for public key infrastructures.

Our future work will be focused on three different areas. First, we will build a PKS protocol simulator to evaluate the performance of the PKS routing protocol. This will help us to measure routing overhead in the PKS network and serve as a validation tool for the correctness of the developed algorithm. The simulation traces will provide valuable information about the developed model and an overview of the scalability properties of the PKS infrastructure. Secondly, after setting up the test-bed environment, we will build and deploy a PKS prototype in collaboration with our peers and domain experts (e.g., members of organizations like IGTF, TAGPMA, TACAR) to keep our work tied to real-world requirements and constraints. Finally, we will promote PKS within IRTF and IETF working groups by writing a PKS Internet Draft (I-D) and encouraging PKIX and PKNG participants to provide feedback on our proposed system.

Ultimately, our proposal provides initial steps toward an Internet-scale trust system that will enable new opportunities for research in federated identity deployment, trust infrastructure deployment, and the usability of digital identities.

7. REFERENCES

[1] Pastry.
[2] K. Aberer, P. Cudré-Mauroux, A. Datta, Z. Despotovic, M. Hauswirth, M. Punceva, and R. Schmidt. P-Grid: A Self-organizing Structured P2P System. SIGMOD Record, 32(3), September 2003. http://lsirpeople.epfl.ch/rschmidt/papers/Aberer03P-GridSelfOrganizing.pdf
[3] A. K. Bhushan. File Transfer Protocol, 1971.
[4] D. Boneh and M. Franklin. Identity Based Encryption from the Weil Pairing. SIAM Journal of Computing, 32(3):586-615, 2003.
[5] W. E. Burr, D. F. Dodson, and W. T. Polk. Electronic Authentication Guideline. Online.
[6] J. Callas, L. Donnerhacke, H. Finney, and D. Shaw. OpenPGP Message Format. Internet Engineering Task Force: RFC 4880, November 2007.
[7] D. Clark, J. Elien, C. Ellison, M. Fredette, A. Morcos, and R. Rivest. Certificate Chain Discovery in SPKI/SDSI. Journal of Computer Security, 9(4):285-322, 2001.
[8] D. Cooper, S. Santesson, S. Farrell, S. Boeyen, R. Housley, and W. Polk. Internet X.509 Public Key Infrastructure Certificate and Certificate Revocation List (CRL) Profile. Internet Engineering Task Force: RFC 5280, May 2008.
[9] R. Fielding, J. Gettys, J. Mogul, H. Frystyk, L. Masinter, P. Leach, and T. Berners-Lee. Hypertext Transfer Protocol - HTTP/1.1, 1999.
[10] E. Fredkin. Trie Memory. Communications of the ACM, 3(9):490-499, 1960.
[11] R. Housley, W. Polk, W. Ford, and D. Solo. Certificate and Certificate Revocation List (CRL) Profile. Internet Engineering Task Force: RFC 3280, 2002.
[12] ICAM. Identity, Credential, and Access Management. Online.
[13] IGTF. The International Grid Trust Federation. Online.
[14] InCommon. InCommon Federation Homepage. Online.
[15] S. Kent. Privacy Enhancement for Internet Electronic Mail: Part II: Certificate-Based Key Management. Internet Engineering Task Force: RFC 1422, February 1993.
[16] R. Khare and S. Lawrence. Upgrading to TLS Within HTTP/1.1, 2000.
[17] M. Pala and S. W. Smith. PEACHES and Peers. In 5th European PKI Workshop: Theory and Practice (EuroPKI 2008), volume 5057 of Lecture Notes in Computer Science, pages 223-238. Springer, June 2008.
[18] P. Maymounkov and D. Mazières. Kademlia: A Peer-to-peer Information System Based on the XOR Metric. In IPTPS '01: Revised Papers from the First International Workshop on Peer-to-Peer Systems, pages 53-65, London, UK, 2002. Springer-Verlag.
[19] D. Meyer and K. Patel. BGP-4 Protocol Analysis. Internet Engineering Task Force: RFC 4274, 2006.
[20] OpenID. OpenID Homepage. Online.
[21] M. Pala. The PKI Resource Query Protocol (PRQP). Internet Engineering Task Force: Internet-Draft, November 2009.
[22] M. Pala and S. W. Smith. PEACHES and Peers. Proceedings of the 5th European PKI Workshop: Theory and Practice, 5057:223-238, June 2008.
[23] S. Ratnasamy, P. Francis, M. Handley, R. Karp, and S. Shenker. A Scalable Content-Addressable Network. In SIGCOMM '01: Proceedings of the 2001 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications, volume 31, pages 161-172. ACM Press, October 2001.
[24] I. Stoica, R. Morris, D. Karger, M. F. Kaashoek, and H. Balakrishnan. Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications. SIGCOMM Computer Communication Review, 31(4):149-160, October 2001.
[25] TAGPMA. The Americas Grid Policy Management Authority. Online.
[26] M. Wahl, T. Howes, and S. Kille. Lightweight Directory Access Protocol (v3), 1997.
[27] B. Y. Zhao, J. D. Kubiatowicz, and A. D. Joseph. Tapestry: An Infrastructure for Fault-tolerant Wide-area Location and Routing. Technical Report UCB/CSD-01-1141, UC Berkeley, April 2001.
A Proposal for Collaborative Internet-Scale Trust Infrastructures Deployment: The Public Key System
Massimiliano Pala
9th IDTrust, NIST, Gaithersburg, MD

Outline
● Motivations
● Model Description
● Message Definition
● The PKS Node
● Federated Identities
● Considerations
● Future Work

The Objective
● Ease deployment of Trust Infrastructures based on Public Key technology in the Internet
  – X.509 PKIs
  – DNSSEC

Motivations - 1
● Heterogeneous deployment environment
  – How easy is it to interact with your PKI?
● The need for Federated Identities
  – FBPKI, HEBCA, 4BF, TACAR, IGTF, etc.
● Many different protocols (X.509)
  – SCVP, CMP, OCSP, TAMP, ...
● Other Public Key Infrastructures (DNSSEC)
  – Future infrastructures (?)

Motivations - 2
● Each day we rely on Public Key technologies for online authentication
  – Web authentication
  – Physical authentication
● No support for Trust Infrastructures deployment in the Internet

Current Needs Demand Solutions...
● DNSSEC to distribute certificates
  – Trust does not follow DNS hierarchies
  – Organizational problems (DNS vs. CA)
● Computing Grids TA distribution
  – Ad-hoc TA distribution
  – No interoperability

Message to take away...
● We need a standardized, scalable and interoperable system for PK support for the Internet

So far...
● It's been a Bumpy Ride!
Problem 1
● No globally authoritative infrastructure
● No easy interaction with different infrastructures
  – PKI Resource Query Protocol (PRQP)
● A Public Key System is needed to allow PK-enabled applications to discover and easily use resources offered by different Authorities.

Problem 2
● Interaction with different parts of a PKI is difficult
  – Many different PKI protocols
  – Many different transport protocols (HTTP, HTTPS, FTP, etc.)
● Applications and certificates
  – Renewal, revocation
● A Public Key System that mandates a simple transport protocol capable of routing all current and future PKI messages.

Problem 3
● Lack of contextual trust
  – Classes of trust (eCommerce, eBanking, eMail)
  – Easy Trust Anchor management
● Mobile devices
  – Local trust in the home environment
● A Public Key System that provides the ability to group TAs according to specific environments, to help users manage (or delegate) trust settings.

Problem 4
● Lack of support for federated identities
● Need to know if a CA/PK is part of a federation
  – Computing Grids, DNSSEC, etc.
● A Public Key System that eases the deployment of federated identities by facilitating a method for disseminating information about which organizations or federations use/include/trust a specific TA.

The Challenge
● To provide a flexible support system for Trust Infrastructures deployment

.... SO ... (very dramatic pause...)

The Public Key System
● A system to support current needs for Trust Infrastructure (TI) deployment
  – Addresses the aforementioned problems
  – Increases interoperability among TIs
● Supports Public Key systems
  – Algorithm agile
  – Backward compatible with deployed TIs
● Internet oriented
  – Scalability

The Public Key System (PKS)
● Peer-to-peer system based on a DHT [Chord]
● Simple operations: lookup(), join()
● Identifiers based on hash(PK) [PEACH]
  – m = bits of the hash function
● Each node keeps a lookup table with m entries

[Slide diagram: node n and its lookup-table intervals, with entry boundaries (N + 2^i) mod 2^m and (N + 2^(i+1)) mod 2^m delimiting the intervals γ_i and γ_(i+1).]

DHT Basics
● ID space: hash(x)
● Lookup table
● Lookup in O(log(m))

The Public Key System (PKS)
[Slide diagram: the n-th node's lookup table.]

The PKS Message Format / Simplified Message System
[Slide diagram: a header carrying CMD_CODE (bytes 0-3) and PKT_SIZE (bytes 4-7), followed by the PAYLOAD (a PKSMessage or CMD data).]

The PKS Nodes
[Slide diagram: a PKS Responder (Message Engine, id_j = hash(CA_x)) bridging the PKS network and a CA Core with existing PKI services (OCSP, SCVP, CMS) via a PKI Client Engine, along paths (a), (b), (b'), and (c).]

Federated Identities
● PKS Federation Authorities (PK-FA)
● A PK-FA provides responses to clients about a CA being part of a federation
  – Is this CA part of the Federal Government?
  – Is this user's certificate part of TACAR?
  – Is this certificate for an Internet DNS server?
Extending the PKS Network
● Let's add a new class of nodes to the PKS network
[Slide diagram: existing PKS nodes on the ring.]

Extending the PKS Network: Federation Authorities
[Slide diagram: PK-FA nodes (Federation Authorities) such as WebTrust, FB-PKI, and IGTF joining the ring.]

The PK-FA Service
[Slide diagram: the PKS responder architecture extended with a new service, the Federation Authority.]

Classes of Federations
● Hierarchical federation infrastructure
● Class Federation Authorities
  – Identifiers based on the PK (not certs)
  – Local, Internet, Network, Organization, and Application
● Deployment of trusted keys for primary DNS domains
  – ".", ".edu", ".net", ".org", ".com", etc.
  – Keys for "." can be used/revoked/replaced

Conclusions
● We rely on PK technology
  – Digital IDs
  – Passports
  – DNSSEC
● We need a Public Key System capable of supporting the use of PK on the Internet
● We proposed a PKS and a possible deployment design based on a collaborative approach

Future Work
● Deploy the system in a test bed
● Study attacks on the PKS network
  – Malicious nodes, etc.
● Define an API for providing access to the PKS
  – Easy integration with existing OSes and apps
● Publish an I-D at the IETF for consideration within the PK-NG WG (IRTF)

Contacts, Questions, etc.
● Email: Massimiliano Pala
● Website: http://www.openca.org/projects/ng/

LOA Panel, 15 April 2008

[Slide diagrams: Pre-computer model (Permission flows from the EE to the RP; RP = Relying Party or Resource Provider); Computer-age model (Permission and Name); PKI model (Name, with "?? CPS ??" between EE and RP).]
[Slide diagram: LoA Partial Threat Model → Full LoA, showing Permission and Name flowing from the EE to the RP with points of vulnerability (*) marked, and the expression of multiple permissions.]

LoA for Attributes
David Chadwick, University of Kent
15 April 2010, IDTrust 2010

Acknowledgements
• This research has received funding from the European Community's Seventh Framework Programme (FP7/2007-2013) under grant agreement n° 216287 (TAS³ - Trusted Architecture for Securely Shared Services).
• The information in this presentation is provided "as is", and no guarantee or warranty is given that the information is fit for any particular purpose. The above referenced consortium members shall have no liability for damages of any kind, including without limitation direct, special, indirect, or consequential damages that may result from the use of these materials, subject to any liability which is mandatory due to applicable law.

Current NIST 800-63 LoA Model
• Guidelines "to remotely authenticate a user's identity to a Federal IT system".
• Two components:
  – Identity proofing and registration of the applicant
  – Authentication mechanism used
• Combined into one LoA value in the range 1 (lowest) to 4 (highest)
• Designed for a single system that both registers and authenticates the user and provides the identity of the user to the Federal IT system (as an identifier and optional attributes)

Deficiencies in the NIST Model (1)
• What if a user has multiple authentication mechanisms provided by an IdP, e.g. username/password and a hardware PKI token?
  – Different LoAs should be provided per login session
• Leads to the concept of a Session LoA, which is dynamically computed from the Registration LoA (fixed) and the Authentication Mechanism LoA (variable)

Deficiencies in the NIST Model (2)
• What if the system is distributed and the user's identity attributes are provided by multiple authorities? Authorisation is what is actually required, not just authentication
• So, you are David Chadwick? But what are you entitled to do?
• In federated identity management, a user's identity is now recognised as being a set of possibly distributed identity attributes, rather than an identifier and optional local attributes (which is what NIST assumes)
• E.g. "the user is a student of university X". This may be sufficient to authorise access to a resource (the typical Shibboleth scenario). The resource does not need to know that the user is David Chadwick, so the identifier is not needed.

The Way Forward Today (for a single IdP)
• In RBAC/ABAC systems access is granted based on the attributes of the user (one of which may, but need not, be a unique identifier)
• We can supplement the set of user attributes with the existing NIST LoA value assigned to the current session in order to provide finer-grained access controls
  – E.g. students with Session LoA 1 can read the module syllabus; students with Session LoA 2 can upload their assignments
• We have had this implemented for several years in our open source software (PERMIS)

A Way Forward Today (for multiple IdPs)
• Users typically have accounts at multiple IdPs and need to provide attributes from several IdPs in order to gain access.
The user configures a linking service to know (some of) these accounts • When the user logs in, a Session LoA is dynamically computed by the authenticating IdP Session LoA = Authentication LoA (if no attributes are asserted) Session LoA = Lowest of Authentication LoA and Registration LoA (if at least one registered attribute is asserted) • The linking service coordinates attribute assertion collection from the multiple IdPs • Each of these attribute assertions need their own LoA but currently we have to munge these to fit the single session LoA by excluding those assertions with a lower LoA 15 April 2010 IDTrust 2010 7 And for Tomorrow - A Model • A user registers with each IDP and is assigned a Registration LoA (according to the procedure that is used) which is attached to the user’s registered attributes. • The user is given one or more authentication tokens/mechanisms by the IdP each with its own Authentication LoA • When the user logs in to an IdP, a Session LoA is dynamically computed for the session according to the formula Session LoA = Authentication LoA (if no attributes are asserted to service provider) Session LoA = Lowest of Authentication LoA and Registration LoA (if at least one registered attribute is asserted to service provider) • All other linked IdPs create their own attribute assertions for this session and include their own LoA in the attribute assertion Assertion LoA = Lowest of Session LoA and Registration LoA • Service Provider has a fine grained ABAC policy in which each identity attribute in a rule has a required LoA. For the rule to be passed the assertion LoA must be GE to the required LoA 15 April 2010 IDTrust 2010 8 Example Use Case • Case: American Medical Schools (AAMC) • Scenario: The American Medicals Schools (AAMC) administer a test for admission into accredited US medical schools. 
Accounts are primarily given to users via e-mail verification to allow for the application process, but full identity proofing (fingerprinting and photo) is then undertaken when the students come to take the test. Campuses could benefit from capturing the value of the AAMC identity-proofing process.
• LoA Details: The initial Registration LoA is low (1) because of the e-mail-only verification, which means that the Session LoA will remain low no matter how good the authentication mechanism is. After the students have taken the test, the Registration LoA is high (say 3) due to the fingerprinting etc., so the Session LoA can rise to the lower of the Authentication LoA and the Registration LoA.

Example Use Case
• Case: Students Using External Identities
• Scenario: A user creates an OpenID under a username you do not know, and the OpenID Provider does no checks on who the user is in the real world. However, it has a good authentication mechanism (LoA 2), so any RP accepting the OpenID has reasonable assurance that it is the same user each time (but not who the user is). The user then turns up as a student at University X. The university can do all its normal checks on the person (e.g. has the right school exam results, has paid fees, is entitled to be in the UK), giving a Registration LoA > 2, but it does not need to issue its own authentication credentials. Instead it checks that the technical quality of the OpenID Provider and the robustness of its processes qualify as LoA 2, and it can then assert the student's identity attributes to service providers with a Session LoA of 2, even though the OpenID Provider does not know who the student is.
• LoA Details: Although the OpenID Registration LoA is the lowest, no attributes are asserted by the OpenID Provider, so the Session LoA is 2 thanks to its good authentication procedures. Once the user registers at the university and is verified, she can continue to use the OpenID, and the university asserts its own attributes with a Session LoA of 2 since its Registration LoA > 2.
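The Session LoA and Assertion LoA rules above reduce to taking minima. A minimal sketch in Python may help; this is not the PERMIS implementation mentioned in the talk, and the attribute names, levels and policy structure are illustrative assumptions:

```python
# A minimal sketch of the LoA rules above. NOT the PERMIS implementation;
# attribute names, levels and the policy structure are illustrative.

def session_loa(authn_loa, asserted_registration_loas):
    """Session LoA = Authentication LoA if no registered attributes are
    asserted, else the lowest of Authentication LoA and the Registration
    LoAs of the asserted attributes."""
    if not asserted_registration_loas:
        return authn_loa
    return min(authn_loa, min(asserted_registration_loas))

def assertion_loa(session, registration_loa):
    """Assertion LoA = lowest of Session LoA and Registration LoA."""
    return min(session, registration_loa)

def rule_passes(asserted, required):
    """A rule passes only if, for every attribute it names, an assertion
    is present with Assertion LoA >= the required LoA."""
    return all(attr in asserted and asserted[attr] >= loa
               for attr, loa in required.items())

# AAMC-style example: e-mail-verified account (Registration LoA 1)
# used with a strong authentication token (Authentication LoA 3).
s = session_loa(3, [1])      # registration caps the session at 1
ok = rule_passes({"student": assertion_loa(s, 1)}, {"student": 2})
print(s, ok)                 # 1 False
```

Note how the registration step, not the authentication token, is the ceiling: until re-proofing raises the Registration LoA, no authentication mechanism can lift the Session LoA.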
Example Use Case
• Case: E-Commerce Site
• Scenario: Shopping online at Amazon, you provide a self-assertion of your name and postal address (for delivery), a signed assertion from Visa that you have a credit card, and a signed assertion from the IEEE that you are a member and thus eligible for a discount. Visa has provided you with a smart card and PIN for authentication.
• LoA Details: Your Session LoA is relatively high (say 3) due to the smart-card authentication mechanism, but your name and address are self-asserted, so they have the lowest LoA (1). Your credit-card attribute is sent by the issuer with a high LoA (3) because of the rigorous registration checks the bank undertook before issuing the card, whereas the IEEE membership attribute has an LoA of 2 due to the limited amount of registration checking the IEEE did.

Conclusions
• Federated identity management systems recognise that users will need to provide attributes from multiple IdPs within a single session, but need to authenticate only once.
• The Session LoA should be dynamically computed based on the authentication mechanism used, the IdP used, and what it asserts.
• Each IdP should be able to provide its own Assertion LoA along with the attributes it asserts.
• This allows the SP to have a fine-grained authorisation policy which places an LoA requirement on each identity attribute.

LOA of Attributes: A Community-Based Approach
Dr Ken Klingenstein, Director, Internet2 Middleware and Security

Topics
• The larger picture – the Tao of Attributes
• The theory of LOA of attributes: parameters, mathematics, contracts, audits
• The practice of LOA of attributes: common community practices, common software and systems, common relying parties
• Early lessons learned
kjk@internet2.edu

Enterprise IdM middleware plumbing

The Attribute Ecosystem
• Authentication is very important, but identity is just one of many
attributes.
• Attributes provide scalable access control, privacy, customization, linked identities, federated roles and more.
• We now have our first transport mechanisms to move attributes around – SAML and federations.
• There will be many sources of attributes, many consumers of attributes, query languages and other transport mechanisms.
• Together, this attribute ecosystem is the "access control" layer of infrastructure.

Attribute use cases are rapidly emerging
• Disaster "first responders" – attributes and qualifications dynamically
• Access-ability use cases
• Public input processes – anonymous but qualified respondents
• Grid relying parties aggregating VO and campus attributes
• The "IEEE" problem
• The "over legal age" use case, and the difference in legal ages across jurisdictions
• Self-asserted attributes – friends, interests, preferences, etc.

The Tao of Attributes workshop 属性之道
• The purpose of the workshop was to start to explore the federal use-case requirements for attributes: aggregation, sources of authority, delegation, query languages, etc.
• Participants were the best and brightest – the folks who invented LDAP, SAML, OpenID, etc.
• Webcast at http://videocast.nih.gov/PastEvents.asp
• Twittered at TAOA
• http://middleware.internet2.edu/tao-of-attributes/

Categories of attributes
• Self-asserted
• Enterprise and organizationally asserted – values assigned by business processes
• Third-party asserted – citizenship by SEVIS, "Verified by Verisign", "Gleaned by Google"

Attribute aggregation at the RP
• From where – gathering attributes from multiple sources: from one IdP or several IdPs, from other sources of authority, from intermediaries such as portals
• When – static and dynamic acquisition: some attributes are volatile (group memberships), others are static (date of birth); some should be acquired per assertion, some once in a boarding process
• Will require a variety of standardized mechanisms – bulk feeds, user-activated links, triggers

Principles of the Tao
• Least privilege / minimal release
• Use data "closest" to the source of authority
• Late and dynamic bindings where possible
• Dynamic identity data increases in value the shorter the exposure
• How much meaning is encoded in the attribute versus context and metadata?
• How much flat attribute proliferation can be managed through a structured data space?

"In theory, there is no difference between theory and practice. But, in practice, there is." – Jan L. A. van de Snepscheut / Yogi Berra

The Theory of LOA of attributes
• Parameters: LoA of authn, integrity of the source systems, integrity of the attribute transports, etc.
• Mathematics – unknown
• Contracts – explicitly defined business processes for assigning values to attributes; managing risk
• Audits – establishing compliance with the contract

Before we practice…
• The limits of 800-63
• Attributes without identity are "creepy"
• The many possible issuers of "over 21"
• The role of identity proofing in LOA of attributes and step-up identity

The Practice of Attributes in R&E
• There exists a set of widely shared attributes that work with consistent LOA for the applications that use them:
– eduPersonAffiliation (the relationship of the subject to the institution)
– ePTID (the binding of a persistent, opaque identifier to an individual)
• Who relies on them today? Microsoft to distribute software, Elsevier to distribute content, student travel services to provide discount travel passes, and many, many others.

LOA, attributes and collaboration
• VOs are at the heart of science, research and collaboration
• Roles and attributes are scoped by the collaboration; the "systems of record" are the PIs

Lessons learned
• Commonality drives rough consensus and working attributes (e.g. student-ness, .edu-ness)
• Provide a few common base attributes (e.g. ePTID, member of the IdP)
• Extensible attribute entitlements – establish syntax and a hint of semantics
• Control the vocabulary
• Principle of parsimony – more value -> more complexity
• Create new schema rather than enlarge the vocabulary
kjk@internet2.edu

Levels of Assurance for Attributes
NIST IDtrust, 4/15/2010
Chris Louden

Agenda
Perspectives:
• Levels of Assurance (LOA)
• Sources of Authority
Disclaimer: Views presented are not necessarily the views of my employer or my clients.
© 2009 Protiviti Inc. An Equal Opportunity Employer. Confidential: This document is for your company's internal use only and may not be copied nor distributed to any other third party.
Levels of Assurance
Two sides to usable Levels of Assurance:
• Assurance needed by the RP
• Assurance provided by the attribute authority

Assurance Needed by the RP
• Some uses require more assurance than others:
– Convenience for the user… "Welcome John"
– Basis of access control
• Is a privilege an attribute?
– Attributes are often more important than identity…
• All police officers can carry a gun / all "John Smith" can carry a gun
• How sure does the RP need to be in this situation?
– Generally risk-based
– Specifically, the risk of a false positive ("this person is not really a police officer")

Is M-04-04 Adoptable for Attributes?
Assurance Level Impact Profiles – potential impact of authentication errors, by assurance level (1-4):

Potential Impact Category                                   1    2    3    4
Inconvenience, distress or damage to standing/reputation    Low  Mod  Mod  High
Financial loss or agency liability                          Low  Mod  Mod  High
Harm to agency programs or public interests                 N/A  Low  Mod  High
Unauthorized release of sensitive information               N/A  Low  Mod  High
Personal safety                                             N/A  N/A  Low  Mod/High
Civil or criminal violations                                N/A  Low  Mod  High

Assurance Provided by the Authority
• Practices used to establish the attribute values
• Practices used to maintain the values
• Proper controls to protect the attribute database
– Basic security controls for data integrity
– Access by subject?
• Trustworthiness of the bindings
– Attribute bound to a common name?
– Attribute bound to a session context?
• Type of authority
– Different assurance for different types…
– Different key practices for different types…

Sources of Authority: Types of Authority
• "Natural Authority"
– Employer for employment
– SSA for SSN
– Department of Motor Vehicles for driver's license number
• "Proper Diligence"
– The service provider checked appropriate sources, gathered appropriate evidence, etc.
• "Trusted Administrator"
– An administrator sets the role and "they ought to know"
– Often used for delegation: a "superuser" grants access to administrators, and they set up others

Sources of Authority: Issues
• What role does the subject have?
– What if the SSA says you're dead?
– Does the subject always reconcile with the source?
– Can the subject reconcile with a "Proper Diligence" authority?
• What do authorities bind attributes to?
– Common name? Authenticated session? Credential identifier?
• Can authorities delegate?
– Do delegates necessarily inherit authority?

Sources of Authority: Needs & Tools
• How do you anchor attribute trust?
– A common trust anchor for attributes and identity?
– Different anchors for different namespaces?
• Do standards allow different authorities for attributes and identity?
– Can the products do that?
• Verify the identity claim and that the IdP is trusted
• Find the attribute authority and request an attribute claim
• Verify the attribute claim and that the authority is trusted for this claim
• Verify these claims are bound to this session
• Verify this attribute is bound to this identity

Levels of Assurance: Needs & Tools
• Common identifiers – is jsmith the policy officer, or is smithj?
• Do we need an 800-63 equivalent? Maybe just best practices for Natural Authorities?
• Do we need an AuthN Context equivalent?

Agenda
Perspectives:
• Levels of Assurance
• Sources of Authority
Chris.Louden@pgs.protiviti.com

LOA of Attributes: An Examination
Peter Alterman, Ph.D.
Senior Advisor for Strategic Initiatives, National Institutes of Health

Fundamentals
• Attributes are consumed by relying-party applications for AuthZ and/or provisioning.
• Attributes may be assigned by many issuers, including relying-party apps, and these issuers are authoritative for them.
• It doesn't look like there will be consensus on the form of attributes any time soon.

Basic Principles
• The issuer of attributes is authoritative for the validity of those attributes.
• Any useful user credential is likely to include attributes from more than one issuer.
• Attributes may be stored or aggregated anywhere.
• Relying-party applications are likely to be both consumers and issuers of attributes.
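The relying-party verification steps listed in the "Needs & Tools" bullets above can be sketched as a simple pipeline. The claim dictionaries and trust registries here are illustrative assumptions, not an existing product API:

```python
# Hedged sketch of the relying-party checks listed above. The claim
# structures and trust registries are illustrative assumptions.

def rp_accepts(identity_claim, attribute_claim,
               trusted_idps, trusted_authorities, session_id):
    # 1. Verify the identity claim came from a trusted IdP
    if identity_claim["idp"] not in trusted_idps:
        return False
    # 2. Verify the attribute authority is trusted *for this claim*
    scope = trusted_authorities.get(attribute_claim["authority"], set())
    if attribute_claim["name"] not in scope:
        return False
    # 3. Verify both claims are bound to this session
    if not (identity_claim["session"] == attribute_claim["session"]
            == session_id):
        return False
    # 4. Verify the attribute is bound to this identity
    return attribute_claim["subject"] == identity_claim["subject"]

ident = {"idp": "idp.example.org", "subject": "jsmith", "session": "s1"}
attr = {"authority": "dmv.example.org", "name": "drivers_license",
        "subject": "jsmith", "session": "s1"}
print(rp_accepts(ident, attr, {"idp.example.org"},
                 {"dmv.example.org": {"drivers_license"}}, "s1"))  # True
```

The point of scoping the authority check per claim is exactly Louden's "trusted for this claim": the DMV being trusted for driver's-license numbers says nothing about it being trusted for, say, employment status.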
Existing Models
• X.500-ish: local repositories hold attributes (assumes all attributes are issued locally) and some are exposed.
• Shibboleth: a user proxy service holds attributes (punts the question of issuer/reliability).
• Silo-land: each relying-party application assigns attributes – usually roles and AuthZ – and stores them locally (since the app issues and stores them, it is authoritative for them).

Key Shortcomings of Existing Models
• Transaction protocols are technology-specific – this requires intermediate functionality.
• Attribute exchange is pairwise today – it will not scale – and includes discovery and validation (see above).
• There is no trust infrastructure for attributes comparable to that for identity.

The Million Dollar Question
• In a federated world, how can a relying-party application know it can trust an attribute issued by another entity?

Proposed Solutions
• Keep the siloed approach, where each application issues and manages attributes locally.
• Local Back-End Attribute Exchanges (BAEs) store attributes and pointers to issuing entities' data stores.
• Wait for Government to issue attribute policies comparable to identity policies.
• Select an industry entity (Internet Society, OASIS, ISO, etc.) to host the design, development and construction of a global attribute management infrastructure, such as an uber-BAE.

Why LOA of Attributes Is More Trouble Than It's Worth
• Any separation of attribute validation from the issuer introduces trust and security threats which rapidly degrade the utility of attributes.
• Proxied attribute validation requiring LOA also requires a common body of policy, an authoritative source for that policy, and a high-assurance assessment infrastructure.
• Informal agreements don't scale – this reintroduces the pairwise model, and there is no way to mediate among multiple pairwise models.

Attribute LOA Should Be Binary (but no solution is without its issues)
• Let the issuer validate attributes.
Then the answer is either Y or N (yes, it's like the X.509 model).
• This requires attributes to include a pointer to the issuer, and would require the issuer to maintain a repository.

Caboose
• Because of our experience and the general culture of our business, we are inclined to find elegant, complex solutions to issues. That should be avoided like the plague in this case.
• Contact info: peter.alterman@nih.gov

PKI Resources Query Protocol Deployment
Massimiliano Pala, OpenCA Project Manager
Rump Session, IDTrust 2010

PKI Resources Discovery
• Pointers to resources:
– Extensions in certificates
– Ad-hoc configurations in apps
– Advertised on the CA's web pages
• The PKI Resource Query Protocol:
– Working item at the PKIX WG
– Experimental track

PKI Resource Discovery Protocol
[Diagram: a Client Application asks the Resource Query Authority "Where is the CMS Gateway associated with CA1?" and receives the answer "The CMS Gateway for CA1 is at http://.../../". PRQP defines the message format between a client (or OS???) and a server.]

PRQP & Document Status
• Simple client-server protocol
• Defines two types of messages: PRQP Request and PRQP Response
• Updated at the beginning of 2010 (v04): small fixes, plus new OIDs for Grid services

Updated OIDs

Deployment in TACAR
• TACAR Project: the TERENA Academic CA Repository
– Identification/authorisation procedures
– Holds most of the EuGridPMA root CAs and National Research and Education Networks
• PRQP management is included in the new CA Management Panel
• Server hosted at Dartmouth College
– Certificate issued by TERENA's CA
– Responder for all of TACAR's CAs

Deployment in FBPKI
• Initial deployment of the open-source software in the ICAM test lab
• Evaluation of deploying the protocol within the FBPKI architecture
• Just started! Expect some news in the next few months

Available Software
• Open-source implementation (PRQPD) available from OpenCA Labs
– OpenCA PKI support for PRQP built in from v1.1.0+
– UNIX operating system(s)
• Based on the LibPKI library
– An easy-to-use PKI library; new release available (v0.5.0)
• Client implemented (?) in PKIF

Conclusions
• Move PRQP from Experimental to Standard Track
– Move to a standards-track I-D
• Extend support for major clients: Firefox, operating systems
• Continue the development of the PRQP server at OpenCA Labs

Questions & Contacts
• Dartmouth College: pala@cs.dartmouth.edu
• OpenCA: madwolf@openca.org
• Website: http://www.openca.org/projects/prqpd and http://www.openca.org/wiki/
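The request/response shape of the PRQP exchange shown in the diagram can be illustrated with a toy lookup. Field names, the OID and the lookup table below are made up for illustration; the real protocol uses ASN.1/DER-encoded messages, not Python dictionaries:

```python
# Illustrative sketch of a PRQP-style exchange: a client asks a Resource
# Query Authority (RQA) where a given service for a given CA lives, and
# gets back a URI. All values here are made up for illustration.

RQA_TABLE = {
    # (CA identifier, service OID) -> service URI  (illustrative values)
    ("CA1", "1.2.3.4.5"): "http://ca1.example.org/cms-gateway",
}

def prqp_request(ca_id, service_oid):
    """Build a request naming the CA and the service being sought."""
    return {"ca": ca_id, "service": service_oid}

def prqp_respond(request):
    """The RQA answers with the service URI, or a notFound status."""
    uri = RQA_TABLE.get((request["ca"], request["service"]))
    return {"status": "ok", "uri": uri} if uri else {"status": "notFound"}

print(prqp_respond(prqp_request("CA1", "1.2.3.4.5")))
# {'status': 'ok', 'uri': 'http://ca1.example.org/cms-gateway'}
```

The design point this mirrors is that the client need not carry per-CA configuration or parse certificate extensions: it only needs to know the RQA, which centralises the pointers to resources.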