Data Mesh

Book description

We're at an inflection point in data, where our data management solutions no longer match the complexity of organizations, the proliferation of data sources, and the scope of our aspirations to get value from data with AI and analytics. In this practical book, author Zhamak Dehghani introduces data mesh, a decentralized sociotechnical paradigm drawn from modern distributed architecture that provides a new approach to sourcing, sharing, accessing, and managing analytical data at scale.

Dehghani guides practitioners, architects, technical leaders, and decision makers on their journey from traditional big data architecture to a distributed and multidimensional approach to analytical data management. Data mesh treats data as a product, considers domains as a primary concern, applies platform thinking to create self-serve data infrastructure, and introduces a federated computational model of data governance.

  • Get a complete introduction to data mesh principles and their constituents
  • Design a data mesh architecture
  • Guide a data mesh strategy and execution
  • Navigate organizational design to a decentralized data ownership model
  • Move beyond traditional data warehouses and lakes to a distributed data mesh

Table of contents

  1. Foreword
  2. Preface
    1. Why I Wrote This Book and Why Now
    2. Who Should Read This Book
    3. How to Read This Book
    4. Conventions Used in This Book
    5. O’Reilly Online Learning
    6. How to Contact Us
    7. Acknowledgments
  3. Prologue: Imagine Data Mesh
    1. Data Mesh in Action
      1. A Culture of Data Curiosity and Experimentation
      2. An Embedded Partnership with Data and ML
      3. The Invisible Platform and Policies
      4. Limitless Scale with Autonomous Data Products
      5. The Positive Network Effect
    2. Why Transform to Data Mesh?
    3. The Way Forward
  4. I. What Is Data Mesh?
  5. 1. Data Mesh in a Nutshell
    1. The Outcomes
    2. The Shifts
    3. The Principles
      1. Principle of Domain Ownership
      2. Principle of Data as a Product
      3. Principle of the Self-Serve Data Platform
      4. Principle of Federated Computational Governance
    4. Interplay of the Principles
    5. Data Mesh Model at a Glance
    6. The Data
      1. Operational Data
      2. Analytical Data
    7. The Origin
  6. 2. Principle of Domain Ownership
    1. A Brief Background on Domain-Driven Design
    2. Applying DDD’s Strategic Design to Data
    3. Domain Data Archetypes
      1. Source-Aligned Domain Data
      2. Aggregate Domain Data
      3. Consumer-Aligned Domain Data
    4. Transition to Domain Ownership
      1. Push Data Ownership Upstream
      2. Define Multiple Connected Models
      3. Embrace the Most Relevant Domain Data: Don’t Expect a Single Source of Truth
      4. Hide the Data Pipelines as Domains’ Internal Implementation
    5. Recap
  7. 3. Principle of Data as a Product
    1. Applying Product Thinking to Data
      1. Baseline Usability Attributes of a Data Product
    2. Transition to Data as a Product
      1. Include Data Product Ownership in Domains
      2. Reframe the Nomenclature to Create Change
      3. Think of Data as a Product, Not a Mere Asset
      4. Establish a Trust-But-Verify Data Culture
      5. Join Data and Compute as One Logical Unit
    3. Recap
  8. 4. Principle of the Self-Serve Data Platform
    1. Data Mesh Platform: Compare and Contrast
      1. Serving Autonomous Domain-Oriented Teams
      2. Managing Autonomous and Interoperable Data Products
      3. A Continuous Platform of Operational and Analytical Capabilities
      4. Designed for a Generalist Majority
      5. Favoring Decentralized Technologies
      6. Domain Agnostic
    2. Data Mesh Platform Thinking
      1. Enable Autonomous Teams to Get Value from Data
      2. Exchange Value with Autonomous and Interoperable Data Products
      3. Accelerate Exchange of Value by Lowering the Cognitive Load
      4. Scale Out Data Sharing
      5. Support a Culture of Embedded Innovation
    3. Transition to a Self-Serve Data Mesh Platform
      1. Design the APIs and Protocols First
      2. Prepare for Generalist Adoption
      3. Do an Inventory and Simplify
      4. Create Higher-Level APIs to Manage Data Products
      5. Build Experiences, Not Mechanisms
      6. Begin with the Simplest Foundation, Then Harvest to Evolve
    4. Recap
  9. 5. Principle of Federated Computational Governance
    1. Apply Systems Thinking to Data Mesh Governance
      1. Maintain Dynamic Equilibrium Between Domain Autonomy and Global Interoperability
      2. Embrace Dynamic Topology as a Default State
      3. Utilize Automation and the Distributed Architecture
    2. Apply Federation to the Governance Model
      1. Federated Team
      2. Guiding Values
      3. Policies
      4. Incentives
    3. Apply Computation to the Governance Model
      1. Standards as Code
      2. Policies as Code
      3. Automated Tests
      4. Automated Monitoring
    4. Transition to Federated Computational Governance
      1. Delegate Accountability to Domains
      2. Embed Policy Execution in Each Data Product
      3. Automate Enablement and Monitoring over Interventions
      4. Model the Gaps
      5. Measure the Network Effect
      6. Embrace Change over Constancy
    5. Recap
  10. II. Why Data Mesh?
  11. 6. The Inflection Point
    1. Great Expectations of Data
    2. The Great Divide of Data
    3. Scale: Encounter of a New Kind
    4. Beyond Order
    5. Approaching the Plateau of Return
    6. Recap
  12. 7. After the Inflection Point
    1. Respond Gracefully to Change in a Complex Business
      1. Align Business, Tech, and Now Analytical Data
      2. Close the Gap Between Analytical and Operational Data
      3. Localize Data Changes to Business Domains
      4. Reduce Accidental Complexity of Pipelines and Copying Data
    2. Sustain Agility in the Face of Growth
      1. Remove Centralized and Monolithic Bottlenecks
      2. Reduce Coordination of Data Pipelines
      3. Reduce Coordination of Data Governance
      4. Enable Autonomy
    3. Increase the Ratio of Value from Data to Investment
      1. Abstract Technical Complexity with a Data Platform
      2. Embed Product Thinking Everywhere
      3. Go Beyond the Boundaries
    4. Recap
  13. 8. Before the Inflection Point
    1. Evolution of Analytical Data Architectures
      1. First Generation: Data Warehouse Architecture
      2. Second Generation: Data Lake Architecture
      3. Third Generation: Multimodal Cloud Architecture
    2. Characteristics of Analytical Data Architecture
      1. Monolithic
      2. Centralized Data Ownership
      3. Technology Oriented
    3. Recap
  14. III. How to Design the Data Mesh Architecture
  15. 9. The Logical Architecture
    1. Domain-Oriented Analytical Data Sharing Interfaces
      1. Operational Interface Design
      2. Analytical Data Interface Design
      3. Interdomain Analytical Data Dependencies
    2. Data Product as an Architecture Quantum
      1. A Data Product’s Structural Components
      2. Data Product Data Sharing Interactions
      3. Data Discovery and Observability APIs
    3. The Multiplane Data Platform
      1. A Platform Plane
      2. Data Infrastructure (Utility) Plane
      3. Data Product Experience Plane
      4. Mesh Experience Plane
      5. Example
    4. Embedded Computational Policies
      1. Data Product Sidecar
      2. Data Product Computational Container
      3. Control Port
    5. Recap
  16. 10. The Multiplane Data Platform Architecture
    1. Design a Platform Driven by User Journeys
    2. Data Product Developer Journey
      1. Incept, Explore, Bootstrap, and Source
      2. Build, Test, Deploy, and Run
      3. Maintain, Evolve, and Retire
    3. Data Product Consumer Journey
      1. Incept, Explore, Bootstrap, Source
      2. Build, Test, Deploy, Run
      3. Maintain, Evolve, and Retire
    4. Recap
  17. IV. How to Design the Data Product Architecture
  18. 11. Design a Data Product by Affordances
    1. Data Product Affordances
    2. Data Product Architecture Characteristics
    3. Design Influenced by the Simplicity of Complex Adaptive Systems
      1. Emergent Behavior from Simple Local Rules
      2. No Central Orchestrator
    4. Recap
  19. 12. Design Consuming, Transforming, and Serving Data
    1. Serve Data
      1. The Needs of Data Users
      2. Serve Data Design Properties
      3. Serve Data Design
    2. Consume Data
      1. Archetypes of Data Sources
      2. Locality of Data Consumption
      3. Data Consumption Design
    3. Transform Data
      1. Programmatic Versus Nonprogrammatic Transformation
      2. Dataflow-Based Transformation
      3. ML as Transformation
      4. Time-Variant Transformation
      5. Transformation Design
    4. Recap
  20. 13. Design Discovering, Understanding, and Composing Data
    1. Discover, Understand, Trust, and Explore
      1. Begin Discovery with Self-Registration
      2. Discover the Global URI
      3. Understand Semantic and Syntax Models
      4. Establish Trust with Data Guarantees
      5. Explore the Shape of Data
      6. Learn with Documentation
      7. Discover, Explore, and Understand Design
    2. Compose Data
      1. Consume Data Design Properties
      2. Traditional Approaches to Data Composability
      3. Compose Data Design
    3. Recap
  21. 14. Design Managing, Governing, and Observing Data
    1. Manage the Life Cycle
      1. Manage Life-Cycle Design
      2. Data Product Manifest Components
    2. Govern Data
      1. Govern Data Design
      2. Standardize Policies
      3. Data and Policy Integration
      4. Linking Policies
    3. Observe, Debug, and Audit
      1. Observability Design
    4. Recap
  22. V. How to Get Started
  23. 15. Strategy and Execution
    1. Should You Adopt Data Mesh Today?
    2. Data Mesh as an Element of Data Strategy
    3. Data Mesh Execution Framework
      1. Business-Driven Execution
      2. End-to-End and Iterative Execution
      3. Evolutionary Execution
    4. Recap
  24. 16. Organization and Culture
    1. Change
    2. Culture
      1. Values
    3. Reward
      1. Intrinsic Motivations
      2. Extrinsic Motivations
    4. Structure
      1. Organization Structure Assumptions
      2. Discover Data Product Boundaries
    5. People
      1. Roles
      2. Skillset Development
    6. Process
      1. Key Process Changes
    7. Recap
  25. Index
  26. About the Author

Product information

  • Title: Data Mesh
  • Author(s): Zhamak Dehghani
  • Release date: March 2022
  • Publisher(s): O'Reilly Media, Inc.
  • ISBN: 9781492092391