Our partners at NIH brought to our attention a policy gap associated with AWS’s AI services that we believe all institutions ought to be aware of. We will dive into this issue in more detail in NIH’s write-up below. In summary, the issue lies in AWS’s default data collection policy: for certain AI services, customers are opted in by default and must explicitly opt out of having their data used to train AWS’s AI models. We met with AWS, who confirmed NIH’s concerns and said they are working on a fix. In the meantime, we want the community to be aware of the issue and to have the information needed to mitigate concerns.

Although the write-up below was initially intended for AWS administrators at NIH overseeing accounts under the STRIDES Initiative, the steps they took and the solution they implemented are applicable to any research and education (R&E) institution using AWS. NIH operates as a large enterprise leveraging AWS, managing organization-wide policies like many other enterprises in the R&E community. If your institution has an AWS Organization and you have permission to manage organizational policies, you can follow the same steps NIH took to mitigate issues with AWS’s default data collection policy.

When the cloud team at NIH learned that some AWS AI services require users to opt out of the collection of their data for training AI models, they drafted the documentation and instructions below. With NIH’s permission, we are sharing them with you to help you address this concern more quickly.

If you have any questions or feedback about this topic, feel free to contact me at tmanik@internet2.edu. Enjoy the write-up below from NIH.


Background Information: 

NIH implemented the AWS artificial intelligence (AI) services opt-out policy because AWS’s default service terms for machine learning and AI are not in line with HHS and NIH needs and expectations around data protection and use. Specifically, default AWS account settings allowed certain AWS AI services to use and store customer data in the training of AI models.

Which AWS AI Services are included in the opt-out policy?

Per AWS service terms as of February 14, 2024, AWS may “use and store AI Content that is processed by each of the foregoing AI Services to develop and improve the applicable AI Service and its underlying technologies...” The AI services covered by the opt-out policy are Amazon CodeGuru Profiler, Amazon CodeWhisperer Individual, Amazon Comprehend, Amazon Lex, Amazon Polly, Amazon Rekognition, Amazon Textract, Amazon Transcribe, and Amazon Translate.

The guide below walks through attaching an AWS Organizations AI services opt-out policy that assigns the “optOut” setting, preventing potential use and/or storage of customer data, including outside of the region(s) where customers intended it to be stored.

Opt-out Guide

Objective:

This memo provides instructions for STRIDES AWS account administrators who are managing CBA or other accounts outside of the STRIDES Enterprise Organization. It describes how to enable a policy to opt out of allowing AWS artificial intelligence (AI) services to collect, store, and use customer content for AWS’s AI model training. Such policies may be enabled at the Organization, Organizational Unit (OU), or Account level, and will flow down to the Accounts within that container.

Enabling these opt-out policies addresses concerns regarding management of customer data in certain AWS AI services. Per AWS Service Terms on ML and AI, AWS may extract certain customer data from the use of specific services to develop and improve the applicable AI Service and its underlying technologies. These services are: Amazon CodeGuru Profiler, Amazon CodeWhisperer Individual, Amazon Comprehend, Amazon Lex, Amazon Polly, Amazon Rekognition, Amazon Textract, Amazon Transcribe, and Amazon Translate. The Service Terms hold that AWS has implicit consent for this usage. This usage may include taking customer data outside of the region where customers intended for it to be stored.

Enabling an opt-out policy in an AWS organization, OU, or Account withdraws any implied consent. It explicitly indicates that AWS does not have permission for data collection from customer use of these or future services connected to artificial intelligence/machine learning (AI/ML).

STRIDES has enabled this opt-out policy in its development and production Enterprise Organizations, following NIH engineering and change control policies, including testing these services in test accounts and verifying that there has been no impact on the pertinent services.

Note that Microsoft and Google have explicitly assured STRIDES that customer data is not extracted for AI/ML service training.

Scope:

STRIDES recommends that these policies be deployed organization-wide within a STRIDES organization from the management (payer) account. The policy will flow down to all other accounts in the organization.

As highlighted in the AWS-provided documentation, the policy to be attached to the organization root specifies “default” as the service name instead of the individual services that AWS collects data on. This encompasses all currently available AI services and automatically includes any AI services that might be added in the future.
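For comparison, a narrower policy can name individual services rather than “default”. The sketch below is illustrative only; the lowercase service keys (for example, "rekognition") follow the naming used in AWS’s AI services opt-out policy documentation, so confirm the exact identifiers against the reference documentation before relying on them.

{
  "services": {
    "rekognition": {
      "opt_out_policy": {
        "@@assign": "optOut"
      }
    },
    "comprehend": {
      "opt_out_policy": {
        "@@assign": "optOut"
      }
    }
  }
}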

When you specify an opt-in or opt-out preference for a service, that setting is global and applies to all AWS Regions. Setting the value from within one AWS Region replicates to all other Regions.

When the opt-out policies are implemented, customer data previously extracted for training services is deleted from the training sets. Per AWS, “When you opt out of content use by an AWS AI service, that service deletes all of the associated historical content that was shared with AWS before you set the option.” Furthermore, “data deleted in response to an opt-out action would be of a copy, and not of any customer-held data.”

Please reach out to STRIDES for advice if account administrators want to opt in to allowing AWS extraction of customer data from specific services or within certain accounts. We do not provide guidance on those cases but can point to relevant AWS documentation.

Preparation:

Review the required permissions and procedures to view, create, attach, and delete these policies in the AWS documentation on AI services opt-out policies (see Reference documentation below).
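As a rough reference, the identity performing these steps needs AWS Organizations permissions for policy management in the management account. The statement below is a minimal sketch of the relevant actions, not an NIH- or AWS-vetted least-privilege policy; adjust it to your own change control practices.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "ManageAiServicesOptOutPolicies",
      "Effect": "Allow",
      "Action": [
        "organizations:EnablePolicyType",
        "organizations:CreatePolicy",
        "organizations:AttachPolicy",
        "organizations:DetachPolicy",
        "organizations:DeletePolicy",
        "organizations:DescribePolicy",
        "organizations:DescribeEffectivePolicy",
        "organizations:ListRoots",
        "organizations:ListPolicies",
        "organizations:ListPoliciesForTarget",
        "organizations:ListTargetsForPolicy"
      ],
      "Resource": "*"
    }
  ]
}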

Deployment steps:

  1. Sign in to the AWS Organizations console. You must sign in as an IAM user, assume an IAM role, or sign in as the root user (not recommended) in the organization’s management account.
  2. Enable AI services opt-out policies from the Policies page of the AWS Organizations console in the management (top-level) account.
  3. On the AI services opt-out policies page, choose Create policy.
  4. On the Create new AI services opt-out policy page, enter a Policy name.
  5. Create the policy document per the example below.
  6. Choose Create policy at the lower-right corner of the page.
  7. On the AI services opt-out policies page, choose the name of the policy that you want to attach.
  8. On the Targets tab, choose Attach.
  9. Choose the radio button next to the root, OU, or account that you want to attach the policy to. You might have to expand OUs to find the OU or account that you want.
  10. Choose Attach policy.
  11. The list of attached AI services opt-out policies on the Targets tab is updated to include the new policy. The policy change takes effect immediately. (A scripted equivalent of these steps is sketched below.)
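
For administrators who prefer to script the deployment, the console steps above can also be performed through the AWS Organizations API. The following is a minimal sketch using Python and boto3, assuming credentials for the organization’s management account; the policy name ("ai-services-opt-out") and the member account ID are placeholders.

import json

import boto3

# The same "default"/optOut policy shown in the Policy document section below.
OPT_OUT_POLICY = {
    "services": {
        "@@operators_allowed_for_child_policies": ["@@none"],
        "default": {
            "@@operators_allowed_for_child_policies": ["@@none"],
            "opt_out_policy": {
                "@@operators_allowed_for_child_policies": ["@@none"],
                "@@assign": "optOut",
            },
        },
    }
}

org = boto3.client("organizations")

# Step 2: enable the AI services opt-out policy type on the organization root.
root_id = org.list_roots()["Roots"][0]["Id"]
try:
    org.enable_policy_type(RootId=root_id, PolicyType="AISERVICES_OPT_OUT_POLICY")
except org.exceptions.PolicyTypeAlreadyEnabledException:
    pass  # The policy type was already enabled; nothing to do.

# Steps 3-6: create the policy ("ai-services-opt-out" is a placeholder name).
policy_id = org.create_policy(
    Name="ai-services-opt-out",
    Description="Opt out of AWS AI services data collection",
    Type="AISERVICES_OPT_OUT_POLICY",
    Content=json.dumps(OPT_OUT_POLICY),
)["Policy"]["PolicySummary"]["Id"]

# Steps 7-10: attach the policy to the organization root so it flows down.
org.attach_policy(PolicyId=policy_id, TargetId=root_id)

# Optional check: view the effective policy for a member account (placeholder ID).
effective = org.describe_effective_policy(
    PolicyType="AISERVICES_OPT_OUT_POLICY",
    TargetId="111122223333",
)
print(effective["EffectivePolicy"]["PolicyContent"])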

Policy document:

{
  "services": {
    "@@operators_allowed_for_child_policies": [
      "@@none"
    ],
    "default": {
      "@@operators_allowed_for_child_policies": [
        "@@none"
      ],
      "opt_out_policy": {
        "@@operators_allowed_for_child_policies": [
          "@@none"
        ],
        "@@assign": "optOut"
      }
    }
  }
}


Reference documentation: