Foundations of Scalable Systems

Book description

In many systems, scalability becomes the primary driver as the user base grows. Attractive features and high utility breed success, which brings more requests to handle and more data to manage. But organizations reach a tipping point when design decisions that made sense under light loads suddenly become technical debt. This practical book covers design approaches and technologies that make it possible to scale an application quickly and cost-effectively.

Author Ian Gorton takes software architects and developers through the foundational principles of distributed systems. You'll explore the essential ingredients of scalable solutions, including replication, state management, load balancing, and caching. Specific chapters focus on the implications of scalability for databases, microservices, and event-based streaming systems.

You will focus on:

  • Foundations of scalable systems: Learn basic design principles of scalability, its costs, and architectural trade-offs
  • Designing scalable services: Dive into service design, caching, asynchronous messaging, serverless processing, and microservices
  • Designing scalable data systems: Learn data system fundamentals, NoSQL databases, and eventual consistency versus strong consistency
  • Designing scalable streaming systems: Explore stream processing systems and scalable event-driven processing

Table of contents

  1. Preface
    1. Why Scalability?
    2. Who This Book Is For
    3. What You Will Learn
    4. Note for Educators
    5. Conventions Used in This Book
    6. Using Code Examples
    7. O’Reilly Online Learning
    8. How to Contact Us
    9. Acknowledgments
  2. I. The Basics
  3. 1. Introduction to Scalable Systems
    1. What Is Scalability?
    2. Examples of System Scale in the Early 2000s
    3. How Did We Get Here? A Brief History of System Growth
    4. Scalability Basic Design Principles
    5. Scalability and Costs
    6. Scalability and Architecture Trade-Offs
      1. Performance
      2. Availability
      3. Security
      4. Manageability
    7. Summary and Further Reading
  4. 2. Distributed Systems Architectures: An Introduction
    1. Basic System Architecture
    2. Scale Out
    3. Scaling the Database with Caching
    4. Distributing the Database
    5. Multiple Processing Tiers
    6. Increasing Responsiveness
    7. Systems and Hardware Scalability
    8. Summary and Further Reading
  5. 3. Distributed Systems Essentials
    1. Communications Basics
      1. Communications Hardware
      2. Communications Software
    2. Remote Method Invocation
    3. Partial Failures
    4. Consensus in Distributed Systems
    5. Time in Distributed Systems
    6. Summary and Further Reading
  6. 4. An Overview of Concurrent Systems
    1. Why Concurrency?
    2. Threads
    3. Order of Thread Execution
    4. Problems with Threads
      1. Race Conditions
      2. Deadlocks
    5. Thread States
    6. Thread Coordination
    7. Thread Pools
    8. Barrier Synchronization
    9. Thread-Safe Collections
    10. Summary and Further Reading
  7. II. Scalable Systems
  8. 5. Application Services
    1. Service Design
      1. Application Programming Interface (API)
      2. Designing Services
      3. State Management
    2. Application Servers
    3. Horizontal Scaling
    4. Load Balancing
      1. Load Distribution Policies
      2. Health Monitoring
      3. Elasticity
      4. Session Affinity
    5. Summary and Further Reading
  9. 6. Distributed Caching
    1. Application Caching
    2. Web Caching
      1. Cache-Control
      2. Expires and Last-Modified
      3. ETag
    3. Summary and Further Reading
  10. 7. Asynchronous Messaging
    1. Introduction to Messaging
      1. Messaging Primitives
      2. Message Persistence
      3. Publish–Subscribe
      4. Message Replication
    2. Example: RabbitMQ
      1. Messages, Exchanges, and Queues
      2. Distribution and Concurrency
      3. Data Safety and Performance Trade-Offs
      4. Availability and Performance Trade-Offs
    3. Messaging Patterns
      1. Competing Consumers
      2. Exactly-Once Processing
      3. Poison Messages
    4. Summary and Further Reading
  11. 8. Serverless Processing Systems
    1. The Attractions of Serverless
    2. Google App Engine
      1. The Basics
      2. GAE Standard Environment
      3. Autoscaling
    3. AWS Lambda
      1. Lambda Function Life Cycle
      2. Execution Considerations
      3. Scalability
    4. Case Study: Balancing Throughput and Costs
      1. Choosing Parameter Values
      2. GAE Autoscaling Parameter Study Design
      3. Results
    5. Summary and Further Reading
  12. 9. Microservices
    1. The Movement to Microservices
      1. Monolithic Applications
      2. Breaking Up the Monolith
      3. Deploying Microservices
      4. Principles of Microservices
    2. Resilience in Microservices
      1. Cascading Failures
      2. Bulkhead Pattern
    3. Summary and Further Reading
  13. III. Scalable Distributed Databases
  14. 10. Scalable Database Fundamentals
    1. Distributed Databases
    2. Scaling Relational Databases
      1. Scaling Up
      2. Scaling Out: Read Replicas
      3. Scale Out: Partitioning Data
      4. Example: Oracle RAC
    3. The Movement to NoSQL
      1. NoSQL Data Models
      2. Query Languages
      3. Data Distribution
    4. The CAP Theorem
    5. Summary and Further Reading
  15. 11. Eventual Consistency
    1. What Is Eventual Consistency?
      1. Inconsistency Window
      2. Read Your Own Writes
    2. Tunable Consistency
    3. Quorum Reads and Writes
    4. Replica Repair
      1. Active Repair
      2. Passive Repair
    5. Handling Conflicts
      1. Last Writer Wins
      2. Version Vectors
    6. Summary and Further Reading
  16. 12. Strong Consistency
    1. Introduction to Strong Consistency
    2. Consistency Models
    3. Distributed Transactions
      1. Two-Phase Commit
      2. 2PC Failure Modes
    4. Distributed Consensus Algorithms
      1. Raft
      2. Leader Election
      3. Strong Consistency in Practice
      4. VoltDB
      5. Google Cloud Spanner
    5. Summary and Further Reading
  17. 13. Distributed Database Implementations
    1. Redis
      1. Data Model and API
      2. Distribution and Replication
      3. Strengths and Weaknesses
    2. MongoDB
      1. Data Model and API
      2. Distribution and Replication
      3. Strengths and Weaknesses
    3. Amazon DynamoDB
      1. Data Model and API
      2. Distribution and Replication
      3. Strengths and Weaknesses
    4. Summary and Further Reading
  18. IV. Event and Stream Processing
  19. 14. Scalable Event-Driven Processing
    1. Event-Driven Architectures
    2. Apache Kafka
      1. Topics
      2. Producers and Consumers
      3. Scalability
      4. Availability
    3. Summary and Further Reading
  20. 15. Stream Processing Systems
    1. Introduction to Stream Processing
    2. Stream Processing Platforms
    3. Case Study: Apache Flink
      1. DataStream API
      2. Scalability
      3. Data Safety
    4. Conclusions and Further Reading
  21. 16. Final Tips for Success
    1. Automation
    2. Observability
    3. Deployment Platforms
    4. Data Lakes
    5. Further Reading and Conclusions
  22. Index
  23. About the Author

Product information

  • Title: Foundations of Scalable Systems
  • Author(s): Ian Gorton
  • Release date: June 2022
  • Publisher(s): O'Reilly Media, Inc.
  • ISBN: 9781098106065