Data Governance: The Definitive Guide

Book description

As you move data to the cloud, you need a comprehensive approach to data governance, along with well-defined and agreed-upon policies, to ensure your organization meets compliance requirements. Data governance incorporates the ways people, processes, and technology work together to ensure data is trustworthy and can be used effectively. This practical guide shows you how to implement and scale data governance throughout your organization.

Chief information, data, and security officers and their teams will learn the strategy and tooling needed to democratize data and unlock its value while enforcing security, privacy, and other governance standards. Through good data governance, you can inspire customer trust, help your organization identify business efficiencies, generate more competitive offerings, and improve customer experience. This book shows you how.

You'll learn:

  • Data governance strategies addressing people, processes, and tools
  • Benefits and challenges of a cloud-based data governance approach
  • How data governance is conducted from ingest to preparation and use
  • How to handle the ongoing improvement of data quality
  • Challenges and techniques in governing streaming data
  • Data protection for authentication, security, backup, and monitoring
  • How to build a data culture in your organization

Table of contents

  1. Preface
    1. Why Your Business Needs Data Governance in the Cloud
    2. Framework and Best Practices for Data Governance in the Cloud
      1. Data Governance Framework
      2. Operationalizing Data Governance in Your Organization
      3. The Business Benefits of Robust Data Governance
    3. Who Is This Book For?
    4. Conventions Used in This Book
    5. O’Reilly Online Learning
    6. How to Contact Us
    7. Acknowledgments
  2. 1. What Is Data Governance?
    1. What Data Governance Involves
      1. Holistic Approach to Data Governance
      2. Enhancing Trust in Data
      3. Classification and Access Control
      4. Data Governance Versus Data Enablement and Data Security
    2. Why Data Governance Is Becoming More Important
      1. The Size of Data Is Growing
      2. The Number of People Working and/or Viewing the Data Has Grown Exponentially
      3. Methods of Data Collection Have Advanced
      4. More Kinds of Data (Including More Sensitive Data) Are Now Being Collected
      5. The Use Cases for Data Have Expanded
      6. New Regulations and Laws Around the Treatment of Data
      7. Ethical Concerns Around the Use of Data
    3. Examples of Data Governance in Action
      1. Managing Discoverability, Security, and Accountability
      2. Improving Data Quality
    4. The Business Value of Data Governance
      1. Fostering Innovation
      2. The Tension Between Data Governance and Democratizing Data Analysis
      3. Manage Risk (Theft, Misuse, Data Corruption)
      4. Regulatory Compliance
      5. Considerations for Organizations as They Think About Data Governance
    5. Why Data Governance Is Easier in the Public Cloud
      1. Location
      2. Reduced Surface Area
      3. Ephemeral Compute
      4. Serverless and Powerful
      5. Labeled Resources
      6. Security in a Hybrid World
    6. Summary
  3. 2. Ingredients of Data Governance: Tools
    1. The Enterprise Dictionary
      1. Data Classes
      2. Data Classes and Policies
      3. Data Classification and Organization
      4. Data Cataloging and Metadata Management
      5. Data Assessment and Profiling
      6. Data Quality
      7. Lineage Tracking
      8. Key Management and Encryption
      9. Data Retention and Data Deletion
      10. Workflow Management for Data Acquisition
      11. IAM—Identity and Access Management
      12. User Authorization and Access Management
    2. Summary
  4. 3. Ingredients of Data Governance: People and Processes
    1. The People: Roles, Responsibilities, and Hats
      1. User Hats Defined
      2. Data Enrichment and Its Importance
    2. The Process: Diverse Companies, Diverse Needs and Approaches to Data Governance
      1. Legacy
      2. Cloud Native/Digital Only
      3. Retail
      4. Highly Regulated
      5. Small Companies
      6. Large Companies
    3. People and Process Together: Considerations, Issues, and Some Successful Strategies
      1. Considerations and Issues
      2. Processes and Strategies with Varying Success
    4. Summary
  5. 4. Data Governance over a Data Life Cycle
    1. What Is a Data Life Cycle?
    2. Phases of a Data Life Cycle
      1. Data Creation
      2. Data Processing
      3. Data Storage
      4. Data Usage
      5. Data Archiving
      6. Data Destruction
    3. Data Life Cycle Management
      1. Data Management Plan
    4. Applying Governance over the Data Life Cycle
      1. Data Governance Framework
      2. Data Governance in Practice
      3. Example of How Data Moves Through a Data Platform
    5. Operationalizing Data Governance
      1. What Is a Data Governance Policy?
      2. Importance of a Data Governance Policy
      3. Developing a Data Governance Policy
      4. Data Governance Policy Structure
      5. Roles and Responsibilities
      6. Step-by-Step Guidance
      7. Considerations for Governance Across a Data Life Cycle
    6. Summary
  6. 5. Improving Data Quality
    1. What Is Data Quality?
    2. Why Is Data Quality Important?
      1. Data Quality in Big Data Analytics
      2. Data Quality in AI/ML Models
    3. Why Is Data Quality a Part of a Data Governance Program?
    4. Techniques for Data Quality
      1. Scorecard
      2. Prioritization
      3. Annotation
      4. Profiling
    5. Summary
  7. 6. Governance of Data in Flight
    1. Data Transformations
    2. Lineage
      1. Why Lineage Is Useful
      2. How to Collect Lineage
      3. Types of Lineage
      4. The Fourth Dimension
      5. How to Govern Data in Flight
    3. Policy Management, Simulation, Monitoring, Change Management
    4. Audit, Compliance
    5. Summary
  8. 7. Data Protection
    1. Planning Protection
      1. Lineage and Quality
      2. Level of Protection
      3. Classification
    2. Data Protection in the Cloud
      1. Multi-Tenancy
      2. Security Surface
      3. Virtual Machine Security
    3. Physical Security
      1. Network Security
      2. Security in Transit
    4. Data Exfiltration
      1. Virtual Private Cloud Service Controls (VPC-SC)
      2. Secure Code
      3. Zero-Trust Model
    5. Identity and Access Management
      1. Authentication
      2. Authorization
      3. Policies
      4. Data Loss Prevention
      5. Encryption
      6. Differential Privacy
      7. Access Transparency
    6. Keeping Data Protection Agile
      1. Security Health Analytics
      2. Data Lineage
      3. Event Threat Detection
    7. Data Protection Best Practices
      1. Separated Network Designs
      2. Physical Security
      3. Portable Device Encryption and Policy
      4. Data Deletion Process
    8. Summary
  9. 8. Monitoring
    1. What Is Monitoring?
    2. Why Perform Monitoring?
    3. What Should You Monitor?
      1. Data Quality Monitoring
      2. Data Lineage Monitoring
      3. Compliance Monitoring
      4. Program Performance Monitoring
      5. Security Monitoring
    4. What Is a Monitoring System?
      1. Analysis in Real Time
      2. System Alerts
      3. Notifications
      4. Reporting/Analytics
      5. Graphic Visualization
      6. Customization
    5. Monitoring Criteria
    6. Important Reminders for Monitoring
    7. Summary
  10. 9. Building a Culture of Data Privacy and Security
    1. Data Culture: What It Is and Why It’s Important
    2. Starting at the Top—Benefits of Data Governance to the Business
      1. Analytics and the Bottom Line
      2. Company Persona and Perception
    3. Intention, Training, and Communications
      1. A Data Culture Needs to Be Intentional
      2. Training: Who Needs to Know What
    4. Beyond Data Literacy
      1. Motivation and Its Cascading Effects
    5. Maintaining Agility
      1. Requirements, Regulations, and Compliance
      2. The Importance of Data Structure
      3. Scaling the Governance Process Up and Down
    6. Interplay with Legal and Security
      1. Staying on Top of Regulations
      2. Communication
      3. Interplay in Action
      4. Agility Is Still Key
    7. Incident Handling
      1. When “Everyone” Is Responsible, No One Is Responsible
    8. Importance of Transparency
      1. What It Means to Be Transparent
      2. Building Internal Trust
      3. Building External Trust
      4. Setting an Example
    9. Summary
  11. A. Google’s Internal Data Governance
    1. The Business Case for Google’s Data Governance
    2. The Scale of Google’s Data Governance
    3. Google’s Governance Process
    4. How Does Google Handle Data?
      1. Privacy Safe—ADH as a Case Study
  12. B. Additional Resources
  13. Index

Product information

  • Title: Data Governance: The Definitive Guide
  • Author(s): Evren Eryurek, Uri Gilad, Valliappa Lakshmanan, Anita Kibunguchy-Grant, Jessi Ashdown
  • Release date: March 2021
  • Publisher(s): O'Reilly Media, Inc.
  • ISBN: 9781492063490