A streaming-first approach is part of the UCSD Enterprise Renewal Strategy.
Convinced execs to invest in the middleware layer.
Big rules.
Value proposition: Better, faster, cheaper.
Ent App ops.
Data Ops
DataEng.
ML Ops.
Designing it for the future, for ML.
In the Data Eng layer, there were opportunities during the pandemic to join data quickly and provide value.
Focus on automation and observability.
Moving away from batch.
Long term, needed data pipelines.
Seeing more and more use cases being thrown on the platform.
What we call our platform, borrowed from the industry term.
Kafka.
NiFi.
WSO2 API manager.
GoAnywhere.
Airflow.
Always adjusting the talent pools, along with process and tech.
Started with NiFi.
With each iteration on the roadmap, we had to show value: the first iteration focused on speed; the next, quality; now, cost. Costs are high, so we need to get those down.
Version 1 - 3, now working on 4.
From a handful of people. Now 150 workloads. 100 APIs. 75 people.
Pure open source early on, but that also hindered us.
Developers wanted to run current release.
Brought in service contracts to support.
Not always chasing latest release in OSS.
Instead, using vendor-supported packages: Cloudera NiFi, Confluent Kafka.
First thing we needed was observability, e.g. for NiFi.
Today, we swivel-chair because there are so many tools.
Need to improve our observability.
Want to move towards AI ops.
Learn from the environment to fine tune it.
Example: tried to shut down the dev and QA environments for HANA.
Brought in Airflow.
Was able to do this with Airflow, but it was difficult because users were not transparent about when they use it.
Had to look at logs for usage patterns to get to more just-in-time computing.
Also, need to trim costs.
Trying to achieve just in time compute.
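The idea of mining logs for usage patterns can be sketched roughly as below. This is a minimal illustration, not UCSD's actual implementation: the function names, the sample timestamps, and the hour-of-day bucketing are all assumptions made for the example. It counts logged accesses per hour of day and collapses the hours with no activity into candidate shutdown windows.

```python
from collections import Counter
from datetime import datetime

def idle_hours(access_times, min_hits=1):
    """Return the hours of day (0-23) with fewer than min_hits logged accesses."""
    hits = Counter(ts.hour for ts in access_times)
    return [h for h in range(24) if hits[h] < min_hits]

def idle_windows(hours):
    """Collapse a sorted list of idle hours into (start, end_exclusive) windows."""
    windows = []
    for h in hours:
        if windows and windows[-1][1] == h:
            # Extend the current window by one hour.
            windows[-1] = (windows[-1][0], h + 1)
        else:
            windows.append((h, h + 1))
    return windows

# Hypothetical sample: access timestamps already parsed from environment logs.
sample = [datetime(2024, 3, 4, h, 15) for h in (8, 9, 9, 10, 13, 14, 16, 17)]
print(idle_windows(idle_hours(sample)))
# prints [(0, 8), (11, 13), (15, 16), (18, 24)]
```

Windows are not merged across midnight in this sketch, and a real analysis would likely look at several weeks of logs per weekday before a scheduler such as Airflow is allowed to stop an environment.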
2 UCSD campuses using activity hubs.
Published RACIs, MOUs, SLAs.
In the past, had operational level agreements, but had to up our game.
Ashish.
CoE.
Need to foster collaboration with all the developers.
Want to expose information to campus developers.
What are the use cases that drive the streaming approach? What are the real-time event needs?
----------
Brian - any idea of the total budget for the integration / data layer(s)?
Brian DeMeulle (he/him/his) to Everyone (11:22 AM)
Construction costs ~$2.5M. Run costs currently $500k annual, but we’re working on approaches to optimize and shrink that down to $300k.
Mona Zarei Guerra to Everyone (11:22 AM)
I like "design for future" mindset.
Nick Austin - Kansas State University to Everyone (11:22 AM)
I missed it. What is the ESB or ETL tool you are using to move data?
Also, can you provide 1 or 2 examples of the projects that are needing to utilize Spark?
Irene Walsh (UC Berkeley) to Everyone (11:22 AM)
For the big rule 'cloud first', was the expectation to use public cloud or hybrid cloud to maintain some focus on on-prem infrastructure?
Brian DeMeulle (he/him/his) to Everyone (11:23 AM)
Public cloud initially, but we have a hybrid cloud operating model so we have flexibility to move workloads as needed.
ETL is performed through Kafka/NiFi
For remaining batch, some light ETL is available through MFT
Louis King to Everyone (11:25 AM)
Are you querying across the activity hubs or are queries limited to a single activity hub?
Brian DeMeulle (he/him/his) to Everyone (11:26 AM)
We blend data across the activity hubs and materialize views for the blended data sets
Louis King to Everyone (11:27 AM)
How are you governing access to the materialized data and ensuring that it is consistent with the business rules of the enterprise purpose-built systems?
Brian DeMeulle (he/him/his) to Everyone (11:27 AM)
Data access requests are vetted with the data owners/stewards, who authorize access.
Dave Berry - University of Edinburgh (He/Him) to Everyone (11:28 AM)
How long do you keep data for?
Brian DeMeulle (he/him/his) to Everyone (11:29 AM)
In the Activity Hubs, currently we are keeping everything, but we are planning on tiering out older data for cost and performance reasons.
Garrett King (CMU) to Everyone (11:29 AM)
Any tie-ins with Identity & Access Management (provisioning identity data, directory services, role-based access)?
Brian DeMeulle (he/him/his) to Everyone (11:30 AM)
There is some connection with our legacy IAM. We’re in the process of deploying a new IAM architecture which will provision data access based on roles.
J.J. Du Chateau (Wisconsin) to Everyone (11:31 AM)
All of the diagram arrows point towards use of data, presumably READ use. How are the APIs or this infrastructure used to Create, Update or Delete data?
Brian DeMeulle (he/him/his) to Everyone (11:33 AM)
We have existing APIs for create/update into our enterprise systems. These diagrams are for data pipelines and downstream data consumption, which is why they don’t have those other APIs shown. There is no create/update allowed into the Activity Hubs.
Nick Austin - Kansas State University to Everyone (11:37 AM)
For the limited access API cataloging, do you still utilize the data steward process for access to these APIs, and how do you maintain the security?
Me to Everyone (11:38 AM)
Is the pattern catalog publicly available? URL?
Jim Phelps (UW) He/Him to Everyone (11:38 AM)
Is your Patterns & Cookbooks site open to the world to read?
Jim Phelps (UW) He/Him to Everyone (11:39 AM)
Ha! Jinx
Brian DeMeulle (he/him/his) to Everyone (11:39 AM)
Yes. We still require approval even if programmatic access is requested for downstream apps. We allow role-based access and manage the security controls through an annual “renewal” process.
I don’t believe we have published the patterns yet. But we can look into sharing those.
Jim Phelps (UW) He/Him to Everyone (11:41 AM)
Brian - Org and Talent would be interesting to hear about
Brian DeMeulle (he/him/his) to Everyone (11:42 AM)
Sure. Happy to do that. Either during this or at another session.
Jim Phelps (UW) He/Him to Everyone (11:46 AM)
Vendor Package = WSO2?
Brian DeMeulle (he/him/his) to Everyone (11:46 AM)
API management
Also, Confluent Kafka
Cloudera NiFi
Russell Connacher (he/him) to Everyone (11:48 AM)
SAP HANA seems much more expensive than, say, Redis. Is it that much better for your use cases?
Brian DeMeulle (he/him/his) to Everyone (11:48 AM)
Currently, yes. However, we are going to be performing some experiments with alternate platforms to manage ongoing cost.
And see how the performance is on those alternate platforms
Garrett King (CMU) to Everyone (11:56 AM)
Have to sign off, thanks everyone/Ashish/Brian/Scott, very useful
Russell Connacher, UC Berkeley (he/him) to Everyone (11:56 AM)
Has there been any student push-back over privacy over the LM activity streams?
Brian DeMeulle (he/him/his) to Everyone (11:59 AM)
No. This was all vetted through our privacy office. And our privacy office is paranoid.
Russell Connacher, UC Berkeley (he/him) to Everyone (11:59 AM)
(yes, they tend to be)