1. Logistics

  • 17 February 2023
  • Rupert Berk, University of Washington
  • Jill Peterson, University of Colorado Boulder

2. Data Mesh

2.1. Expectations for this Presentation

  • Rupert and Jill are not necessarily data mesh experts.
  • Both institutions are still learning, exploring, and experimenting, and hope to foster continued conversation and feedback by sharing.

2.2. Context

  • There is often a great divide between operational and analytical data, leading to fragile and challenging architectures (from Zhamak Dehghani's article "Data Mesh Principles and Logical Architecture" on Martin Fowler's website; see the link in the Resources section below).
  • It's important to note the use of "sociotechnical" in the following definition from Zhamak Dehghani's O'Reilly book, "Data Mesh". There is a strong "cultural" aspect to this approach:

Data mesh is a decentralized sociotechnical approach to share, access, and manage analytic data in complex and large-scale environments — within or across organizations.

  • Current pain points as captured from an Itana community poll taken during the call:

    ...and what level of awareness or execution is there for data mesh? About half of respondents were either unaware of data mesh or only aware of it:

2.3. Why talk about data mesh now at the UW and CU Boulder?

  • Why is the Data Mesh pattern of interest (at least at the University of Washington) at this time?:

    ...and why at CU Boulder at this time?:
  • In addition, CU Boulder noted the following motivating factors for data mesh work:
    • Loss of institutional knowledge, legacy processes, and technical debt related to data.
    • Recurring challenges or failures with efforts to build new, centralized data warehouses or “data lakes”.
    • A need for real-time, event-driven frameworks to solve particular business needs.
      • One example is where students expect or need some sort of change to take effect right away.

2.4. Data Mesh Principles & Concepts

    • Domain-Driven Data Ownership: one of the major ideas of Data Mesh. It decentralizes and distributes responsibility for data to the people who are closest to it, a major "shift left" approach.
      • This sometimes includes a need to shift some IT responsibilities to teams that, traditionally, may have been more "business" oriented. This can be a challenge.
    • Data as a Product: the importance of applying product thinking to data and ensuring that Data Mesh products have happy customers, service-level agreements, and access interfaces such as files, views, and APIs.
    • Data Products are Built on Other Data Products: any data product can build upon other data products, "bundling" their offering to make a new product.
    • Self-Service Infrastructure as a Platform: there needs to be a common self-service data infrastructure platform that teams orchestrating the components of a data mesh can use (e.g., data product developer experience plane, data infrastructure plane, mesh supervision plane...).
      • Each business team that works with or owns data shouldn't have to stand up its own full stack of IT processes; instead, it can "plug in" to a consistent platform.
      • This tends to be where vendors and SaaS platforms in the data space are focusing their efforts.
      • Note, however, that even with vendor solutions, significant internal work may still be required to facilitate this.
    • Federated Computational Governance: this is the magic! It brings the federation to life across decentralized and distributed governance responsibilities ("shift left"), then stitches everything back together with metadata management, workflow tools, and more.
      • A culture of automation is often key here.
    • Note! Data Mesh is not a flip of a switch; there are degrees of adoption, features, and maturity in any implementation, as illustrated by the levels of principle adoption shown below:
      • Building a data mesh will almost always be an iterative process.
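To make "data products built on other data products" and "computational" governance a little more concrete, here is a minimal sketch of policy-as-code. It is a hypothetical Python illustration, not a description of any vendor platform or of UW or CU Boulder tooling: the `DataProduct` fields (`owner_domain`, `output_formats`, `sla_hours`, `contains_pii`, `upstream`) and the policy functions are invented for the example. Each product declares the products it builds on, and a shared set of global policies is evaluated automatically across a product and its whole lineage.

```python
from __future__ import annotations

from dataclasses import dataclass, field


# Hypothetical metadata a domain team might publish for each data product.
@dataclass
class DataProduct:
    name: str
    owner_domain: str                      # domain team accountable for the product
    output_formats: list[str]              # access interfaces, e.g. "csv", "view", "api"
    sla_hours: int | None = None           # freshness promise to consumers, if declared
    contains_pii: bool = False
    upstream: list[DataProduct] = field(default_factory=list)  # products this one builds on


# Global policies: written once as code, evaluated automatically for every product.
# Each returns an error message, or None if the product complies.
def check_has_owner(p: DataProduct) -> str | None:
    return None if p.owner_domain else "no owning domain"


def check_has_interface(p: DataProduct) -> str | None:
    return None if p.output_formats else "no access interface"


def check_has_sla(p: DataProduct) -> str | None:
    return None if p.sla_hours is not None else "no SLA declared"


def check_pii_lineage(p: DataProduct) -> str | None:
    # A product built on PII-bearing inputs must itself be flagged as containing PII.
    if any(u.contains_pii for u in p.upstream) and not p.contains_pii:
        return "consumes PII upstream but is not flagged as PII"
    return None


POLICIES = [check_has_owner, check_has_interface, check_has_sla, check_pii_lineage]


def governance_report(product: DataProduct) -> dict[str, list[str]]:
    """Apply every global policy to a product and to everything upstream of it."""
    violations: dict[str, list[str]] = {}
    stack, seen = [product], set()
    while stack:
        p = stack.pop()
        if p.name in seen:
            continue  # a product may be reached through more than one path
        seen.add(p.name)
        problems = [msg for msg in (check(p) for check in POLICIES) if msg]
        if problems:
            violations[p.name] = problems
        stack.extend(p.upstream)
    return violations
```

For example, a hypothetical "department-headcount" product that builds on a PII-bearing HR product but declares neither an SLA nor a PII flag would be reported with both violations, while the upstream HR product passes. The point of the design is that domain teams own their products, but the rules are defined once and applied everywhere by automation rather than by manual review.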

2.5. Emerging Vendor Data Mesh Solutions

  • Nearly all cloud providers are offering a myriad of data-related tools and solutions to support a data mesh architecture or otherwise help manage data in complex environments, for example:
  • Data "lakes" tend to be a big part of these offerings, attempting to centralize data and make it available via specific tools and processes.
    • The trend is to move away from "linked servers" and consolidate the publishing and consuming of data into a consistent framework.
  • Gartner's Predicts 2023 for Data Management gives the impression that Data Mesh is "dead on the vine" and was perhaps left behind on the data management "hype cycle".
    • However, the supporting commentary essentially reflects that there are no genuine implementations as yet, and that organizations should wait for suitable, fully formed data ecosystems to arrive from the major cloud vendors.
    • There may be some real shortcomings to a truly self-service data architecture.
    • Gartner's suggestions shouldn't necessarily deter teams from working towards implementations of data mesh principles.

2.6. CU Boulder Spring 2023 Data Mesh Experiment

  • CU Boulder is undertaking some experiments with Data Mesh, aiming to exercise the Data Mesh principles properly with the tools already at hand (e.g., Spring Boot and various open-source tools, Streams Plug-In, RabbitMQ, Open Policy Agent) and with an appreciation of business domains.
    • Starting with two data "quantums": essentially two data products, where one consumes the output of the other.
    • The use case is automatically getting new students enrolled into the proper orientation programs in Canvas.
    • Looking down the track to ensure the data products are discoverable (less important today with just two data products!).
    • Also seeking to provide a wider range of formats for accessing these data, prioritized by what the products' customers most want to use. Currently, it's mainly CSV or REST.
    • Opportunities to take this experiment to the next level by adding in things like SLAs, observability, and discoverability...
    • Looking to integrate federated data governance in the longer term, but currently hand-coding the known rules around data access and data privileges.
    • Ideally, business domains would eventually take over some of this work.
      • This connects to earlier comments about domain-driven data ownership - teams close to the data and the business should own and manage data products.
      • Realistically, though, central IT will still maintain some shared infrastructure to facilitate this.
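The two-quantum setup above can be sketched in a few lines. This is a hypothetical Python illustration, not CU Boulder's implementation (which uses tools such as Spring Boot, RabbitMQ, and Open Policy Agent): the student records, the college-to-program mapping, the consumer roles, and the function names are all invented for the example, and the final step of pushing enrollments into Canvas is left out. Quantum 1 publishes newly admitted students; quantum 2 builds on that output, applies a hand-coded field-level access rule, and exposes the result as CSV.

```python
import csv
import io


# Quantum 1 (hypothetical data): publishes newly admitted students.
def new_students_product() -> list[dict[str, str]]:
    return [
        {"student_id": "S001", "name": "Ada", "college": "Engineering", "admit_term": "Fall 2023"},
        {"student_id": "S002", "name": "Ben", "college": "Arts & Sciences", "admit_term": "Fall 2023"},
    ]


# Hand-coded mapping from college to orientation program (hypothetical values).
ORIENTATION_BY_COLLEGE = {
    "Engineering": "ENGR Orientation",
    "Arts & Sciences": "A&S Orientation",
}

# Hand-coded access rule, standing in for a future federated governance policy:
# which output fields each consumer role is allowed to see.
ALLOWED_FIELDS = {
    "lms_integration": {"student_id", "program"},
    "advisor": {"student_id", "name", "program"},
}

OUTPUT_FIELDS = ("student_id", "name", "program")


def orientation_enrollment_product(consumer_role: str) -> str:
    """Quantum 2: builds on quantum 1's output and returns CSV text for one consumer."""
    allowed = ALLOWED_FIELDS.get(consumer_role)
    if allowed is None:
        raise PermissionError(f"role {consumer_role!r} may not read this product")

    rows = []
    for student in new_students_product():
        program = ORIENTATION_BY_COLLEGE.get(student["college"])
        if program is None:
            continue  # no matching orientation program; a real system would flag this
        row = {"student_id": student["student_id"], "name": student["name"], "program": program}
        rows.append({k: v for k, v in row.items() if k in allowed})  # drop disallowed fields

    fieldnames = [f for f in OUTPUT_FIELDS if f in allowed]
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

Keeping the access rule in one table like `ALLOWED_FIELDS` is one way to make hand-coded rules easy to lift into a policy engine later; with Open Policy Agent, for instance, an equivalent rule would be expressed in OPA's Rego language and evaluated outside the service.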

2.7. Discussion & Questions

  • Stephanie Warner (UW Milwaukee)
    • Challenges where people or teams are reluctant to let others use their data: the “that’s my data” mentality.
    • Jill noted that her CU Boulder team is using data they already have to avoid this issue with the experiment.
      • However, they don’t have full control of the data and lineage, which presents challenges.
    • Rupert noted that there are often disconnects between the data that is available and the data that the business actually needs.
  • Ron Janik (Bradley University)
    • Exploring data mesh seriously and planning experiments.
    • Disagrees with Gartner's assessments about data mesh being obsolete (suspicious that Gartner is trying to promote vendor products).
    • Reality of IT work is a more incremental approach (as opposed to a "big bang" vendor rollout).
    • Working to address data/domain ownership:
      • Domain owners need more modern applications and tools.
      • Domain owners have been purchasing cloud/SaaS offerings on their own, which has been fracturing the environment when it comes to data.
        • Data often ends up being "curated" in various applications but not shared elsewhere.
    • Finding that you can’t always “code” every solution.
    • Bradley has a Data Governance committee, but it tends to be oriented more toward policy than action.
    • If you really want domain responsibility… two things are required:
      1. Data has to be made available to the institution.
      2. Domain owners need to be accountable and share their data
        • Policy and standards with "teeth" need to be implemented to ensure this happens
  • Jason Clevenger (?)
    • Question about "hybrid" architectures to help move forward in a transitional way (lay the foundation for a data mesh architecture and path for incremental improvements). Is this possible?
    • Rupert noted…
      • Simple cloud data tools like S3 buckets can be used to start assembling a loose “data lake”.
      • Governance and workflow tooling to support it is key.
      • It is hard to say how best to pave a transitional hybrid path.
    • Ron Janik felt a hybrid approach was counterproductive at Bradley - too hard to support and too complex.
    • Jill noted that everything in the data realm is already segmented at CU Boulder, so a hybrid approach is almost required
      • New models of accomplishing things can hopefully live alongside legacy processes as they determine the best way forward.
  • Dimuthu Tilakaratne (?)
    • Are there challenges with data literacy and tooling?
    • Jill responded:
      • Ideally, there are different ways to consume data that democratize its use and access.
      • Data lineage, data dictionaries, and metadata management are essential here.
  • Jeff Kennedy (The University of Auckland)
    • It sounds like not only can data mesh happen incrementally, it almost must (see Jill’s comment above).
    • But “when” do you know you have a data mesh - how many increments are needed?
      • Interesting to think about.
    • Pave a path to retire or rework legacy applications and processes.

3. Further Information

3.1. Slide Pack

3.2. Resources

3.3. Gartner

Note that these resources require a Gartner account:

  • Ronthal, A., Feinberg, D., Beyer, M., Zaidi, E., Chien, M., & Thanaraj, R. (2022) Predicts 2023: Data Management Solutions Finally Leverage Foundational Concepts, Gartner Research, Article ID #G00778878, available at https://www.gartner.com/document/4021919
    The continued emergence of data ecosystems built on active metadata and data fabrics will enable efficiency, automation, augmentation, financial governance and sustainability. Data and analytics leaders should use these predictions to plan for and invest in an ecosystem-driven future.
  • Thanaraj, R. & Beyer, N. (2022) Data Fabric or Data Mesh: How to Decide Your Future Data Management Architecture, Gartner Research, Article ID #G00770696, available at https://www.gartner.com/document/4015368
    Many Gartner clients struggle when deciding between fabric and mesh approaches. We clarify these two concepts for data and analytics leaders with benefits, case studies and a decision path to choose their future data management architecture.
  • Beyer, M., Cook, H., Zaidi, E., De Simoni, G., Chien, M., Showell, N., & Thanaraj, R. (2022) Infographic: Strategic Comparison of Data Mesh and Data Fabric, Gartner Research, Article ID #G00769853, available at https://www.gartner.com/document/4015363
    Enterprise architects and data and analytics leaders are asking for substantive comparison and contrast between data fabric and data mesh. We provide a high-level parallel and differential summary of the primary differences and emphasize the greatest risk scenario for both data fabric and mesh.

3.4. ZOOM Chat

08:05:12 From  Rupert Berk - UWash (he/him)  to  Everyone:
    Please take the following 2 question poll on Data Mesh:: https://docs.google.com/forms/d/e/1FAIpQLSe2wc9NomvpGyVDL6uz8_3Z6sxxBJMGY52BJmr1Z3_GRDvNqA/viewform
08:29:15 From  Rupert Berk - UWash (he/him)  to  Everyone:
    https://docs.google.com/presentation/d/1Dp-OKciDplP7uIbuggIjxw7-JjBfKxhLbYOMZstZPDk/edit#slide=id.g208c4306e10_0_6
08:50:11 From  jeff kennedy  to  Everyone:
    ¿ What i think i heard was that the shift toward Data Mesh can be effected incrementally across the data estate (so, it's not the "flip of a switch").  Assuming the first increments are built in data domains where going The Data Mesh Way will make a positive difference: what does that look like in terms of business-outcome value?; is there are critical mass of increments needed for that value to be worthwhile?
08:50:42 From  jeff kennedy  to  Everyone:
    ! can't be an architect _without_ being a bit of an evangelist !
08:51:38 From  J.J. Du Chateau (Wisconsin)  to  Everyone:
    Reacted to "! can't be an archit..." with 👍
08:51:50 From  Kunta Hutabarat - CU Boulder  to  Everyone:
    Reacted to "! can't be an archit..." with 👍
08:52:00 From  Jim Phelps (UW)  to  Everyone:
    I think a lot of what I do as an architect is being an evangelist for the strategies I think are important
08:52:08 From  Glenn Donaldson (Ohio State)  to  Everyone:
    Reacted to "I think a lot of wha..." with 👍🏽
08:52:10 From  Ron Janik - Braadley Univ.  to  Everyone:
    Reacted to "I think a lot of wha..." with 👍
08:52:13 From  Ron Janik - Braadley Univ.  to  Everyone:
    Reacted to "! can't be an archit..." with 👍
08:52:13 From  J.J. Du Chateau (Wisconsin)  to  Everyone:
    Reacted to "I think a lot of wha..." with 👍
08:52:14 From  Glenn Donaldson (Ohio State)  to  Everyone:
    Great Presentation - Thank you both!
08:52:26 From  Ron Janik - Braadley Univ.  to  Everyone:
    Reacted to "Great Presentation -..." with 👍
08:52:38 From  jeff kennedy  to  Everyone:
    ¿ Appreciated too the call-out for the importance of "sociotechnical" in the Data Mesh definition, and wonder conversely whether or not the goals of "traditional" data governance in higher education can be ever realized: that is, could we solve the problems seeding the ground for Data Mesh by doing-data-governance-properly?
08:52:53 From  jeff kennedy  to  Everyone:
    Reacted to "I think a lot of wha..." with 👍
08:54:27 From  Ron Janik - Braadley Univ.  to  Everyone:
    Replying to "¿ Appreciated too th..."    
    It may be a "the chicken or the egg" situation, at least as I see it. :)
08:54:38 From  Ryan Thomas, Illinois  to  Everyone:
    Whether Data Mesh or something akin, a lot of guidance is one must have good, mature Metadata Management capabilities.  I.e., metadata management capabilities might be a good place to start.
08:55:47 From  jeff kennedy  to  Everyone:
    Reacted to "It may be a "the chi..." with 🐔
08:56:19 From  Dave Goldhammer (CU Boulder)  to  Everyone:
    If there is time, I’m curious if there are higher-ed institutions that would be considered “high” on the data mesh maturity model.
08:56:39 From  Beth Schaefer - University of Wisconsin-Milwaukee  to  Everyone:
    Reacted to "I think a lot of wha..." with 👍
08:56:47 From  Beth Schaefer - University of Wisconsin-Milwaukee  to  Everyone:
    Reacted to "! can't be an archit..." with 👍
08:57:05 From  Rupert Berk - UWash (he/him)  to  Everyone:
    Replying to "¿ Appreciated too th..."    
    I totally agree that standing up a data governance structure would support this.
08:57:44 From  Jim Phelps (UW)  to  Everyone:
    The “how many students do we have” problem
08:58:30 From  jeff kennedy  to  Everyone:
    Data Literacy = great question!  Equally, if there are lovely data-and-information products furnished by a Data Mesh xor by-not-a-data-mesh does meshiness make a difference there?  Data Literacy is a major impediment for us here, even with traditional | legacy information assets.
08:59:53 From  Ron Janik - Braadley Univ.  to  Everyone:
    Reacted to "Data Literacy = grea..." with 🤔
09:00:02 From  Ryan Thomas, Illinois  to  Everyone:
    Especially given what Gartner said in 2022, it’s probably good to avoid putting too many eggs in a term like “Data Mesh”, but rather focus on the principles and target benefits.  The ones articulated in Data Mesh are excellent… and more important than the phrase “Data Mesh”.
09:00:11 From  J.J. Du Chateau (Wisconsin)  to  Everyone:
    Reacted to "Especially given wha..." with 👍
09:00:16 From  Ryan Thomas, Illinois  to  Everyone:
    Reacted to "Especially given wha…" with 👍
09:00:18 From  Ryan Thomas, Illinois  to  Everyone:
    Removed a 👍 reaction from "Especially given wha…"
09:01:38 From  Stephanie Warner - UW Milwaukee  to  Everyone:
    Replying to "Data Literacy = grea..."    
    Great point.  This very morning I began working on a structure for releasing shared data models to our Authors.  This includes a data literacy component, marketing and validation - and this is just from a single source.
09:01:40 From  Ron Janik - Braadley Univ.  to  Everyone:
    great conversation and presentation everyone. Thank you!

3.5. Participants
