Graph Algorithms

Book description

Learn how graph algorithms can help you leverage relationships within your data to develop intelligent solutions and enhance your machine learning models. With this practical guide,developers and data scientists will discover how graph analytics deliver value, whether they’re used for building dynamic network models or forecasting real-world behavior.

Mark Needham and Amy Hodler from Neo4j explain how graph algorithms describe complex structures and reveal difficult-to-find patterns—from finding vulnerabilities and bottlenecksto detecting communities and improving machine learning predictions. You’ll walk through hands-on examples that show you how to use graph algorithms in Apache Spark and Neo4j, two of the most common choices for graph analytics.

  • Learn how graph analytics reveal more predictive elements in today’s data
  • Understand how popular graph algorithms work and how they’re applied
  • Use sample code and tips from more than 20 graph algorithm examples
  • Learn which algorithms to use for different types of questions
  • Explore examples with working code and sample datasets for Spark and Neo4j
  • Create an ML workflow for link prediction by combining Neo4j and Spark

Publisher resources

View/Submit Errata

Table of contents

  1. Preface
    1. What’s in This Book
    2. Conventions Used in This Book
    3. Using Code Examples
    4. O’Reilly Online Learning
    5. How to Contact Us
    6. Acknowledgments
  2. Foreword
  3. 1. Introduction
    1. What Are Graphs?
    2. What Are Graph Analytics and Algorithms?
    3. Graph Processing, Databases, Queries, and Algorithms
      1. OLTP and OLAP
    4. Why Should We Care About Graph Algorithms?
    5. Graph Analytics Use Cases
    6. Conclusion
  4. 2. Graph Theory and Concepts
    1. Terminology
    2. Graph Types and Structures
      1. Random, Small-World, Scale-Free Structures
    3. Flavors of Graphs
      1. Connected Versus Disconnected Graphs
      2. Unweighted Graphs Versus Weighted Graphs
      3. Undirected Graphs Versus Directed Graphs
      4. Acyclic Graphs Versus Cyclic Graphs
      5. Sparse Graphs Versus Dense Graphs
      6. Monopartite, Bipartite, and k-Partite Graphs
    4. Types of Graph Algorithms
      1. Pathfinding
      2. Centrality
      3. Community Detection
    5. Summary
  5. 3. Graph Platforms and Processing
    1. Graph Platform and Processing Considerations
      1. Platform Considerations
      2. Processing Considerations
    2. Representative Platforms
      1. Selecting Our Platform
      2. Apache Spark
      3. Neo4j Graph Platform
    3. Summary
  6. 4. Pathfinding and Graph Search Algorithms
    1. Example Data: The Transport Graph
      1. Importing the Data into Apache Spark
      2. Importing the Data into Neo4j
    2. Breadth First Search
      1. Breadth First Search with Apache Spark
    3. Depth First Search
    4. Shortest Path
      1. When Should I Use Shortest Path?
      2. Shortest Path with Neo4j
      3. Shortest Path (Weighted) with Neo4j
      4. Shortest Path (Weighted) with Apache Spark
      5. Shortest Path Variation: A*
      6. Shortest Path Variation: Yen’s k-Shortest Paths
    5. All Pairs Shortest Path
      1. A Closer Look at All Pairs Shortest Path
      2. When Should I Use All Pairs Shortest Path?
      3. All Pairs Shortest Path with Apache Spark
      4. All Pairs Shortest Path with Neo4j
    6. Single Source Shortest Path
      1. When Should I Use Single Source Shortest Path?
      2. Single Source Shortest Path with Apache Spark
      3. Single Source Shortest Path with Neo4j
    7. Minimum Spanning Tree
      1. When Should I Use Minimum Spanning Tree?
      2. Minimum Spanning Tree with Neo4j
    8. Random Walk
      1. When Should I Use Random Walk?
      2. Random Walk with Neo4j
    9. Summary
  7. 5. Centrality Algorithms
    1. Example Graph Data: The Social Graph
      1. Importing the Data into Apache Spark
      2. Importing the Data into Neo4j
    2. Degree Centrality
      1. Reach
      2. When Should I Use Degree Centrality?
      3. Degree Centrality with Apache Spark
    3. Closeness Centrality
      1. When Should I Use Closeness Centrality?
      2. Closeness Centrality with Apache Spark
      3. Closeness Centrality with Neo4j
      4. Closeness Centrality Variation: Wasserman and Faust
      5. Closeness Centrality Variation: Harmonic Centrality
    4. Betweenness Centrality
      1. When Should I Use Betweenness Centrality?
      2. Betweenness Centrality with Neo4j
      3. Betweenness Centrality Variation: Randomized-Approximate Brandes
    5. PageRank
      1. Influence
      2. The PageRank Formula
      3. Iteration, Random Surfers, and Rank Sinks
      4. When Should I Use PageRank?
      5. PageRank with Apache Spark
      6. PageRank with Neo4j
      7. PageRank Variation: Personalized PageRank
    6. Summary
  8. 6. Community Detection Algorithms
    1. Example Graph Data: The Software Dependency Graph
      1. Importing the Data into Apache Spark
      2. Importing the Data into Neo4j
    2. Triangle Count and Clustering Coefficient
      1. Local Clustering Coefficient
      2. Global Clustering Coefficient
      3. When Should I Use Triangle Count and Clustering Coefficient?
      4. Triangle Count with Apache Spark
      5. Triangles with Neo4j
      6. Local Clustering Coefficient with Neo4j
    3. Strongly Connected Components
      1. When Should I Use Strongly Connected Components?
      2. Strongly Connected Components with Apache Spark
      3. Strongly Connected Components with Neo4j
    4. Connected Components
      1. When Should I Use Connected Components?
      2. Connected Components with Apache Spark
      3. Connected Components with Neo4j
    5. Label Propagation
      1. Semi-Supervised Learning and Seed Labels
      2. When Should I Use Label Propagation?
      3. Label Propagation with Apache Spark
      4. Label Propagation with Neo4j
    6. Louvain Modularity
      1. When Should I Use Louvain?
      2. Louvain with Neo4j
    7. Validating Communities
    8. Summary
  9. 7. Graph Algorithms in Practice
    1. Analyzing Yelp Data with Neo4j
      1. Yelp Social Network
      2. Data Import
      3. Graph Model
      4. A Quick Overview of the Yelp Data
      5. Trip Planning App
      6. Travel Business Consulting
      7. Finding Similar Categories
    2. Analyzing Airline Flight Data with Apache Spark
      1. Exploratory Analysis
      2. Popular Airports
      3. Delays from ORD
      4. Bad Day at SFO
      5. Interconnected Airports by Airline
      6. Summary
  10. 8. Using Graph Algorithms to Enhance Machine Learning
    1. Machine Learning and the Importance of Context
      1. Graphs, Context, and Accuracy
    2. Connected Feature Engineering
      1. Graphy Features
      2. Graph Algorithm Features
    3. Graphs and Machine Learning in Practice: Link Prediction
      1. Tools and Data
      2. Importing the Data into Neo4j
      3. The Coauthorship Graph
      4. Creating Balanced Training and Testing Datasets
      5. How We Predict Missing Links
      6. Creating a Machine Learning Pipeline
      7. Predicting Links: Basic Graph Features
      8. Predicting Links: Triangles and the Clustering Coefficient
      9. Predicting Links: Community Detection
    4. Summary
    5. Wrapping Things Up
  11. A. Additional Information and Resources
    1. Other Algorithms
    2. Neo4j Bulk Data Import and Yelp
    3. APOC and Other Neo4j Tools
    4. Finding Datasets
    5. Assistance with the Apache Spark and Neo4j Platforms
    6. Training
  12. Index

Product information

  • Title: Graph Algorithms
  • Author(s): Mark Needham, Amy E. Hodler
  • Release date: May 2019
  • Publisher(s): O'Reilly Media, Inc.
  • ISBN: 9781492047681