From AI advancements to security best practices, the July 24, 2024 NET+ AWS Tech Share covered a wide spectrum of crucial topics. For those of you who couldn't make it to our session, we've got a nice recap to share with you. Let's dive into the key points!
We kicked things off with announcements of upcoming events:
- AWS LZA Community of Practice: next meeting on Tuesday, August 6, 2024
- Internet2 TechEx (Boston): December 9-13, 2024
  - Registration now open
  - AWS GameDay on Monday, December 9, 2024
- Internet2 CommEx (Anaheim): April 28-May 1, 2025 (save the date)
  - Call for proposals now open
- Cloud Forum 2025: May 20-22, 2025 (save the date)
During the open discussion, participants covered several AI-related topics. Old Dominion University is planning a GenAI project similar to Vanderbilt's and is using AI tool licenses already in place. The University of Rhode Island is also pursuing a GenAI initiative but is encountering UI development challenges. The University of Virginia has completed its M365 Copilot rollout and noted a trend toward offering multiple AI models, while Loyola Marymount University opened up a discussion on the pros and cons of open-sourcing AI models, especially LLMs. Additionally, AWS announced a new AI-focused foundational certification.
Pivoting from AI conversations, the University of Wisconsin-Madison shared their approach to managing Security Hub alerts using AWS SNS, Terraform, and Lambda functions, including a neat tip on using Lambda to deduplicate Security Hub findings for reporting. After hearing this, other institutions were eager to hold future discussions on Security Hub reporting best practices.
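To illustrate the deduplication idea, here is a minimal sketch of what such a Lambda handler might look like. This is not UW-Madison's actual implementation: the dedup key (generator plus resource ARN), the ASFF field names, and the SNS event shape are all assumptions for illustration.

```python
import json


def dedupe_findings(findings):
    """Keep one finding per (GeneratorId, resource ARN) pair, preferring the
    most recently updated record. Field names follow the AWS Security Finding
    Format (ASFF)."""
    latest = {}
    for finding in findings:
        for resource in finding.get("Resources", [{}]):
            key = (finding.get("GeneratorId"), resource.get("Id"))
            prev = latest.get(key)
            # ISO-8601 timestamps compare correctly as strings.
            if prev is None or finding.get("UpdatedAt", "") > prev.get("UpdatedAt", ""):
                latest[key] = finding
    return list(latest.values())


def handler(event, context):
    """Lambda entry point for findings delivered as JSON via SNS."""
    findings = []
    for record in event.get("Records", []):
        message = json.loads(record["Sns"]["Message"])
        findings.extend(message.get("detail", {}).get("findings", []))
    unique = dedupe_findings(findings)
    # A real handler would forward `unique` to a report bucket or ticketing system.
    return {"received": len(findings), "unique": len(unique)}
```

The pure `dedupe_findings` function keeps the Lambda easy to unit-test without any AWS wiring.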
That's it for this month's AWS NET+ Tech Share update. Don't miss out on these insightful discussions and the opportunity to connect with peers and industry experts. Feel free to provide feedback and/or share any ideas you want the community to discuss in the future at tmanik@internet2.edu. If you would like to learn more about the NET+ AWS program or just want to join Internet2’s Cloud Infrastructure Community Program, email netplus@internet2.edu.
Our partners at NIH brought to our attention a policy gap associated with AWS’s AI services that we believe all institutions ought to be aware of; NIH’s write-up below covers the issue in detail. In summary, the issue lies in AWS’s default data collection policy, which is set to opt-in. We met with AWS, who confirmed NIH’s concerns and said they are working on it. In the meantime, we want the community to be aware of the issue and to have the information needed to mitigate concerns.
Although the write-up below was initially intended for AWS administrators at NIH overseeing accounts using the STRIDES Initiative, the steps they took and the solution they implemented are applicable to any research and education (R&E) institution using AWS. NIH operates as a large enterprise leveraging AWS, managing organization-wide policies like many other enterprises in the R&E community. If your institution has an AWS Organization and you have permission to manage organizational policies, you can adopt the same steps NIH took to mitigate issues with AWS’s default data collection policy.
When the cloud team at NIH learned that some AWS AI services require users to opt out of having their data collected for training AI models, they drafted the documentation and instructions below. With NIH’s permission, we are sharing them with you to help you address this concern more quickly.
If you have any questions or feedback about this topic, feel free to contact me at tmanik@internet2.edu. Enjoy the write-up below from NIH.
Background Information:
The AWS Artificial Intelligence opt-out policy implementation occurred because AWS’s default service terms for machine learning and AI are not in line with HHS and NIH needs and expectations around data protection and use. Specifically, default AWS account settings had allowed certain AWS AI services to use and store customer data in the training of AI models.
Which AWS AI Services are included in the opt-out policy?
Per AWS service terms as of February 14, 2024, AWS may “use and store AI Content that is processed by each of the foregoing AI Services to develop and improve the applicable AI Service and its underlying technologies...”:
- Amazon CodeGuru Profiler
- Amazon CodeWhisperer Individual
- Amazon Comprehend
- Amazon Lex
- Amazon Polly
- Amazon Rekognition
- Amazon Textract
- Amazon Transcribe
- Amazon Translate
The attached guide sets AWS AI services opt-out policies to the “opt-out” setting, preventing potential use and/or storage of customer data, including storage outside the region(s) where customers intended it to reside.
Opt-out Guide
Objective:
This memo provides instructions for STRIDES AWS account administrators who are managing CBA or other accounts outside of the STRIDES Enterprise Organization. It describes how to enable a policy to opt out of allowing AWS artificial intelligence (AI) services to collect, store, and use customer content for AWS’s AI model training. Such policies may be enabled at the Organization, Organization Unit (OU), or Account level, and will flow down to Accounts within a container.
Enabling these opt-out policies addresses concerns regarding management of customer data in certain AWS AI services. Per AWS Service Terms on ML and AI, AWS may extract certain customer data from the use of specific services to develop and improve the applicable AI Service and its underlying technologies. These services are: Amazon CodeGuru Profiler, Amazon CodeWhisperer Individual, Amazon Comprehend, Amazon Lex, Amazon Polly, Amazon Rekognition, Amazon Textract, Amazon Transcribe, and Amazon Translate. The Service Terms hold that AWS has implicit consent for this usage. This usage may include taking customer data outside of the region where customers intended for it to be stored.
Enabling an opt-out policy in an AWS organization, OU, or Account withdraws any implied consent. It explicitly indicates that AWS does not have permission for data collection from customer use of these or future services connected to artificial intelligence/machine learning (AI/ML).
STRIDES has enabled this opt-out policy in its development and production Enterprise Organizations, following NIH engineering and change control policies including testing of these services in test accounts, and verifying that there has been no impact on the pertinent services.
Note that Microsoft and Google have explicitly assured STRIDES that customer data is not extracted for AI/ML service training.
Scope:
STRIDES recommends that these policies be deployed organization-wide within a STRIDES organization from the master (payer) account. The policy will flow down to other accounts.
As highlighted in the AWS-provided documentation, the policy to be attached to the Root Organization Unit (OU) specifies “default” as the service name instead of the individual services that AWS collects data on. This would encompass all currently available AI services and implicitly and automatically include any AI services that might be added in the future.
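To make the contrast concrete, an AI services opt-out policy can also target individual services instead of (or alongside) `default`. The fragment below is a sketch: the `rekognition` key and the `optIn` override are shown only for illustration, and the exact service key names should be checked against AWS’s AI services opt-out policy syntax reference.

```json
{
  "services": {
    "default": {
      "opt_out_policy": {
        "@@assign": "optOut"
      }
    },
    "rekognition": {
      "opt_out_policy": {
        "@@assign": "optIn"
      }
    }
  }
}
```

Using `default`, as NIH recommends, avoids having to enumerate services and automatically covers AI services AWS adds later.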
When you specify an opt in or opt out preference for a service, that setting is global and applied to all AWS Regions. Setting the value from within one AWS Region replicates to all other Regions.
When the opt-out policies are implemented, customer data previously extracted for training services is deleted from the training sets. Per AWS, “When you opt out of content use by an AWS AI service, that service deletes all of the associated historical content that was shared with AWS before you set the option.” Furthermore, “data deleted in response to an opt-out action would be of a copy, and not of any customer-held data.”
Please reach out to STRIDES for advice if account administrators want to opt in to allowing AWS extraction of customer data from specific services or within certain accounts. We do not provide guidance on those cases but can point to relevant AWS documentation.
Preparation:
Review the required permissions and procedures to view, create, attach, and delete these opt-out policies in the AWS documentation on AI services opt-out policies.
Deployment steps:
- Sign in to the AWS Organizations console. You must sign in as an IAM user, assume an IAM role, or sign in as the root user (not recommended) in the organization’s management account.
- Manually enable the AI services opt-out policy type from the Policies page under AWS Organizations in the console of the top-level (management) account.
- On the AI services opt-out policies page, choose Create policy.
- On the Create new AI services opt-out policy page, enter a Policy name.
- Create the policy document per the example below.
- Choose Create policy at the lower-right corner of the page.
- On the AI services opt-out policies page, choose the name of the policy that you want to attach.
- On the Targets tab, choose Attach.
- Choose the radio button next to the root, OU, or account that you want to attach the policy to. You might have to expand OUs to find the OU or account that you want.
- Choose Attach policy.
- The list of attached AI services opt-out policies on the Targets tab is updated to include the new addition. The policy change takes effect immediately.
Policy document:
```json
{
  "services": {
    "@@operators_allowed_for_child_policies": ["@@none"],
    "default": {
      "@@operators_allowed_for_child_policies": ["@@none"],
      "opt_out_policy": {
        "@@operators_allowed_for_child_policies": ["@@none"],
        "@@assign": "optOut"
      }
    }
  }
}
```
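For teams that manage their organization programmatically rather than through the console, the same policy can be created and attached with the AWS SDK. The sketch below uses boto3 and assumes it runs with Organizations admin permissions in the management account; the policy name is illustrative, and production use would follow your own change-control process.

```python
import json

# The opt-out policy from the guide, with child-policy operators locked
# down so lower OUs and accounts cannot override it.
OPT_OUT_POLICY = {
    "services": {
        "@@operators_allowed_for_child_policies": ["@@none"],
        "default": {
            "@@operators_allowed_for_child_policies": ["@@none"],
            "opt_out_policy": {
                "@@operators_allowed_for_child_policies": ["@@none"],
                "@@assign": "optOut",
            },
        },
    }
}


def apply_opt_out(root_id: str) -> str:
    """Create the AI services opt-out policy and attach it to the given root.

    Must run with Organizations permissions in the management account.
    Returns the new policy ID."""
    import boto3  # third-party; imported lazily so the policy dict is usable without it

    org = boto3.client("organizations")
    try:
        org.enable_policy_type(RootId=root_id, PolicyType="AISERVICES_OPT_OUT_POLICY")
    except org.exceptions.PolicyTypeAlreadyEnabledException:
        pass  # the policy type is already enabled on this root
    policy = org.create_policy(
        Name="ai-services-opt-out",  # illustrative name
        Description="Opt out of AWS AI services data collection",
        Type="AISERVICES_OPT_OUT_POLICY",
        Content=json.dumps(OPT_OUT_POLICY),
    )
    policy_id = policy["Policy"]["PolicySummary"]["Id"]
    org.attach_policy(PolicyId=policy_id, TargetId=root_id)
    return policy_id
```

Attaching at the root, as the guide recommends, lets the policy flow down to every OU and account in the organization.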
Reference documentation:
Although the research and education community is largely multi-cloud, that doesn’t mean all platforms are resourced equally. Some organizations fully support Google Cloud, some that focus on other cloud platforms tolerate its use, and others actively block it because they lack the staff and/or skills to set it up.
Internet2 is working with Google to create a GCP Admin Basic Training course. We’ve just started working with the course developer and trainer from ROI Training that Google has asked to do this. Yesterday he joined the monthly NET+ GCP Tech Share to hear from a group of experienced GCP admins from the R&E community. I’m happy to say, you guys did not hold back!
The discussion covered everything from the basics like the separation of GWE and GCP management in many institutions, to down-in-the-weeds topics like best practices around API usage and org policies. We discussed how to best deal with the notifications admins get and useful esoterica like the Essential Contacts list. Naturally, we dealt with the biggies, like understanding billing IDs and folder structure, the “joys” of abandoned projects, dealing with research and teaching credits, and, of course, the difference between Google Drive storage and Google Cloud Storage.
Let’s face it, there is a lot that makes GCP administration unique. Sadly, I’ve never been able to find lessons or learning paths covering the administration of Google Cloud, even in the vast Google Cloud Skills Boost library. There is a Professional Google Workspace Administrator certification, but no equivalent for Google Cloud administration. We are going to take a shot at correcting that.
It was clear from the discussion that there is plenty of material here, not just for a basic course, but an advanced one as well. The plan is to run this as a pre-conference tutorial session at the 2024 Technology Exchange in Boston in December. If your institution is just starting out with Google Cloud or you just want to have a deeper bench so you can take a holiday and be left in peace on the beach, you should send staff to this training.
If you have additional topics you think should be covered in a Google Cloud Admin training, basic or advanced, please post them to the NET+ GCP tech list netplus-gcp-campus-technical@internet2.edu or send them to me bflynn@internet2.edu.
With active audience participation from several institutions and organizations for the entirety of the event, a single blog post may not do justice in summarizing our recent AWS Town Hall. However, I think we can list important key takeaways for you and your team to keep in mind. So, what exactly attracted so many people to attend a Zoom meeting? Two words: GenAI. Actually, that may be three words.
To break it down, there were two main topics of discussion:
- Brief Internet2 NET+ program updates
- Presentation by Vanderbilt’s Dr. Jules White and his team
General NET+ updates
- Refresher on the NET+ AWS Program.
- Sneak peek at an upcoming Internet2 program: the Cloud Infrastructure Community Program (CICP).
- CICP gives access to best practices and to strategic, technical, and informational gatherings on AWS and Google Cloud topics.
NET+ GenAI Updates
- Tiffany Frank introduced herself as the new technical program manager for AI programs at Internet2.
- Collaborative opportunities within AI: GenAI working group, upcoming AI service brokerage opportunities via NET+, and training on building with LLMs via CLASS.
- Update on the LLM Gateway Service Evaluation.
Dr. Jules White from Vanderbilt University Presents: "Vanderbilt’s Future of Learning and the Open Source Amplify GenAI Enterprise Platform"
- Dr. White and his team built a web app that allows students and faculty to interact with different LLMs to perform various tasks.
- These tasks range from getting help from the LLM to build quizzes for a certain topic, to analyzing expense policies, to creating PowerPoints.
- A cool note to add is that their platform allows end users to build their own agents and/or prompt templates for certain repetitive tasks.
- Key takeaway: LLMs are changing the way we solve problems, and we should use them to our advantage given the resources available within the research and education community.
There were many questions from the audience. Below are some questions and answers I thought were relevant to point out here:
- How big is your team?
- A team of three full-time employees plus Dr. White, who is juggling his responsibilities as a professor.
- Is the Amplify AI app open-sourced? If so, can the research and education community contribute to it?
- Yes and yes. Here is the GitHub repository: https://github.com/gaiin-platform.
- Are you fine-tuning your LLMs?
- No, but we use RAG. We plan on launching adaptive RAG soon.
- Are there limits on the app?
- No limits. The average user is not costly for us. The average user will get the value they need very quickly. The cost comes from long conversations.
- How do you recommend other institutions build/introduce a similar tool to their institution?
- Start with an interdisciplinary group who are doing cool things with GenAI and introduce them to your tool. These people are your champions. Continue to bring on board more users, which then creates a ripple effect.
In short, what started as a presentation turned into a collaborative session among individuals from various institutions, discussing their ideas for their own GenAI projects with the presenters and other participants.
If you could not make it, we hope this summary is informative. We have made the slides and recording of the meeting accessible to the community. We hope to see you at our next AWS Town Hall meeting and at the next AWS Tech Share on 7/24.
Feel free to give feedback on this post below or send me an email at tmanik@internet2.edu. Let me know what you’d like to be included in future blog posts or if you have any follow-up questions. I’m still experimenting with the length and format of the blog post, so your feedback on this aspect is greatly appreciated. Till next time.
The AWS Landing Zone Accelerator (LZA) Community of Practice (CoP) held its kick-off meeting last Tuesday (7/2), and it was a great success.
You might be wondering what the AWS LZA Community of Practice is. The AWS LZA CoP is a place for anyone in the research and education community to ask questions, share ideas, and give feedback on the AWS LZA solution.
You may also wonder, "What is the AWS LZA?" or "Why should I care about it?" These questions were answered during the kick-off meeting. In short, the AWS LZA is an open-source solution developed by AWS to automate the setup of foundational security and operational best practices in a multi-account AWS environment. Below I’ve shared some links to resources for more details.
Interesting, right? We think so! The kick-off meeting had 18 attendees from 8 institutions along with AWS, Internet2, and TD SYNNEX Public Sector. Participants ranged from those who have implemented and customized the LZA to those who were there to learn and see if it might be suitable for their institution.
The discussion was excellent. Here are some key highlights:
- Tufts University shared their experience implementing a new, greenfield LZA.
- Washington State University and the University of Oklahoma discussed their goals, such as aligning with security standards and centralizing AWS accounts.
- The University of Denver shared their proof of concept and sought operational insights for a production deployment.
- AWS mentioned that institutions with an NDA will have an opportunity to preview future roadmap items for AWS LZA through this CoP. Contact your account team for assistance if you're unsure about your NDA status.
What's next for the AWS LZA CoP? The long-term goal is to help the research and education community understand, use, and improve AWS LZA through collaboration with other institutions, AWS, and Internet2. It is also to give input to the AWS LZA team on how the project can better meet institutions' needs. In the next meeting, we’ll address unanswered questions like "What are the different flavors of LZA?" and "How can we use LZA with other tools in addition to CloudFormation?" We hope to see you there.
Resources from the call:
- What is AWS LZA? https://aws.amazon.com/solutions/implementations/landing-zone-accelerator-on-aws/
- AWS LZA for Education: https://aws.amazon.com/blogs/publicsector/announcing-landing-zone-accelerator-for-education/
- AWS recently released an API for AWS LZA: https://github.com/aws-samples/lza-account-creation-workflow
- Someone on the call asked if there was detailed documentation for the LZA. The AWS team shared a resource that offers a comprehensive walkthrough of the LZA's components in a workshop format: https://catalog.workshops.aws/landing-zone-accelerator/en-US
- In the call we highlighted the networking section: https://catalog.workshops.aws/landing-zone-accelerator/en-US/workshop-advanced/network-configuration