Chapter 37: Further Reading

Foundational Texts

  • Hamilton, W. L. (2020). Graph Representation Learning. Morgan & Claypool. The most comprehensive textbook on GNNs and graph representation learning, covering spectral methods, message passing, and applications. Freely available at https://www.cs.mcgill.ca/~wlh/grl_book/.

  • Bronstein, M. M., Bruna, J., Cohen, T., and Veličković, P. (2021). Geometric Deep Learning: Grids, Groups, Graphs, Geodesics, and Gauges. arXiv:2104.13478. A unifying framework that views CNNs, GNNs, and Transformers through the lens of symmetry and invariance. Freely available at https://geometricdeeplearning.com/.

  • Wu, Z., Pan, S., Chen, F., Long, G., Zhang, C., and Yu, P. S. (2021). "A Comprehensive Survey on Graph Neural Networks." IEEE Transactions on Neural Networks and Learning Systems, 32(1), 4--24. An excellent survey covering spectral and spatial GNN methods, with a clear taxonomy.

Seminal Papers

Core Architectures

  • Kipf, T. N. and Welling, M. (2017). "Semi-Supervised Classification with Graph Convolutional Networks." ICLR 2017. The paper that introduced GCN with the symmetric normalization trick, making spectral graph convolutions practical. One of the most cited papers in deep learning; a minimal sketch of the propagation rule appears after this list.

  • Hamilton, W. L., Ying, R., and Leskovec, J. (2017). "Inductive Representation Learning on Large Graphs." NeurIPS 2017. Introduced GraphSAGE with neighbor sampling for scalable, inductive GNNs. Widely adopted in industry.

  • Veličković, P., Cucurull, G., Casanova, A., Romero, A., Liò, P., and Bengio, Y. (2018). "Graph Attention Networks." ICLR 2018. Introduced attention-based aggregation for graphs. The GATv2 follow-up (Brody et al., 2022) fixed an expressiveness limitation in the original attention mechanism.

  • Xu, K., Hu, W., Leskovec, J., and Jegelka, S. (2019). "How Powerful Are Graph Neural Networks?" ICLR 2019. A theoretical analysis connecting GNN expressiveness to the Weisfeiler-Lehman test, leading to the Graph Isomorphism Network (GIN).
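
The symmetric normalization trick from the Kipf and Welling paper above is compact enough to sketch directly. Below is a minimal NumPy illustration of one GCN layer, H' = ReLU(D^{-1/2} (A + I) D^{-1/2} H W), where degrees are computed after adding self-loops. Dense matrices stand in here for the sparse operations any real implementation would use; this is a sketch, not the paper's released code.

    import numpy as np

    def gcn_layer(A, H, W):
        """One GCN propagation step: H' = ReLU(D^-1/2 (A+I) D^-1/2 H W)."""
        A_hat = A + np.eye(A.shape[0])          # add self-loops
        d = A_hat.sum(axis=1)                   # degrees after self-loops
        D_inv_sqrt = np.diag(1.0 / np.sqrt(d))  # D^-1/2
        return np.maximum(D_inv_sqrt @ A_hat @ D_inv_sqrt @ H @ W, 0.0)  # ReLU

    # Toy graph: a 3-node path, 2 input features, 4 output features
    A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
    H = np.random.randn(3, 2)
    W = np.random.randn(2, 4)
    print(gcn_layer(A, H, W).shape)  # (3, 4)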

Message Passing and Molecular Applications

  • Gilmer, J., Schoenholz, S. S., Riley, P. F., Vinyals, O., and Dahl, G. E. (2017). "Neural Message Passing for Quantum Chemistry." ICML 2017. Unified various GNN architectures under the message passing framework and applied them to molecular property prediction; a sketch of the framework appears after this list.

  • Schütt, K. T., Kindermans, P.-J., Sauceda, H. E., Chmiela, S., Tkatchenko, A., and Müller, K.-R. (2017). "SchNet: A Continuous-filter Convolutional Neural Network for Modeling Quantum Interactions." NeurIPS 2017. Introduced continuous-filter convolutions for 3D molecular geometry.

  • Gasteiger, J., Groß, J., and Günnemann, S. (2020). "Directional Message Passing for Molecular Graphs." ICLR 2020. DimeNet: incorporates bond angles for more expressive molecular representations.
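
The message passing framework from the Gilmer et al. paper above reduces each layer to two learned functions: a message function M applied across edges and an update function U applied per node. A minimal sketch with plain Python callables standing in for the learned functions (the full framework also feeds edge features to M, omitted here for brevity):

    def message_passing_step(adj, h, message_fn, update_fn):
        """One MPNN step: m_v = sum of M(h_v, h_w) over neighbors w; h_v' = U(h_v, m_v).

        adj: dict mapping each node to a list of neighbors
        h:   dict mapping each node to a feature vector (list of floats)
        """
        new_h = {}
        for v in adj:
            messages = [message_fn(h[v], h[w]) for w in adj[v]]
            m_v = ([sum(vals) for vals in zip(*messages)]
                   if messages else [0.0] * len(h[v]))
            new_h[v] = update_fn(h[v], m_v)
        return new_h

    # Toy run: messages copy the neighbor state; updates average self and message
    adj = {0: [1], 1: [0, 2], 2: [1]}
    h = {0: [1.0], 1: [0.0], 2: [2.0]}
    h = message_passing_step(adj, h,
                             message_fn=lambda hv, hw: hw,
                             update_fn=lambda hv, mv: [(a + b) / 2
                                                       for a, b in zip(hv, mv)])
    print(h)  # {0: [0.5], 1: [1.5], 2: [1.0]}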

Knowledge Graphs

  • Schlichtkrull, M., Kipf, T. N., Bloem, P., van den Berg, R., Titov, I., and Welling, M. (2018). "Modeling Relational Data with Graph Convolutional Networks." ESWC 2018. R-GCN: extending GCN to multi-relational graphs with basis decomposition.

  • Bordes, A., Usunier, N., García-Durán, A., Weston, J., and Yakhnenko, O. (2013). "Translating Embeddings for Modeling Multi-relational Data." NeurIPS 2013. TransE: the foundational knowledge graph embedding method; its translation-based scoring function is sketched after this list.
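
TransE's scoring idea, referenced above, fits in one line: a relation acts as a translation in embedding space, so a true triple (head, relation, tail) should satisfy head + relation ≈ tail. A toy sketch with hand-picked two-dimensional embeddings standing in for learned ones:

    import numpy as np

    def transe_score(h, r, t):
        """TransE plausibility of triple (h, r, t): -||h + r - t|| (higher is better)."""
        return -np.linalg.norm(h + r - t)

    # Hypothetical embeddings: 'capital_of' behaves as a translation vector
    paris = np.array([1.0, 0.0])
    france = np.array([1.0, 1.0])
    capital_of = np.array([0.0, 1.0])
    print(transe_score(paris, capital_of, france))  #  0.0: plausible
    print(transe_score(france, capital_of, paris))  # -2.0: implausible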

Graph Transformers

  • Ying, C., Cai, T., Luo, S., Zheng, S., Ke, G., He, D., Shen, Y., and Liu, T.-Y. (2021). "Do Transformers Really Perform Bad for Graph Representation?" NeurIPS 2021. Graphormer: a Transformer-based architecture for graphs that won the quantum chemistry track (PCQM4M) of the inaugural OGB Large-Scale Challenge.

  • Rampášek, L., Galkin, M., Dwivedi, V. P., Luu, A. T., Wolf, G., and Beaini, D. (2022). "Recipe for a General, Powerful, Scalable Graph Transformer." NeurIPS 2022. GPS: a modular framework combining local message passing with global attention.

Pooling and Graph-Level Learning

  • Ying, Z., You, J., Morris, C., Ren, X., Hamilton, W. L., and Leskovec, J. (2018). "Hierarchical Graph Representation Learning with Differentiable Pooling." NeurIPS 2018. DiffPool: learnable hierarchical graph coarsening; the pooling step is sketched after this list.

  • Li, Y., Tarlow, D., Brockschmidt, M., and Zemel, R. (2016). "Gated Graph Sequence Neural Networks." ICLR 2016. Introduced GRU-based node updates for message passing and attention-based graph-level readouts.
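
DiffPool's coarsening step, mentioned above, is a pair of matrix products: a row-stochastic soft assignment matrix S (n nodes by k clusters) pools both the feature matrix and the adjacency matrix. A minimal NumPy sketch of the pooling arithmetic, with a random row-normalized S standing in for the assignment the paper learns with a separate GNN:

    import numpy as np

    def diffpool_step(A, X, S):
        """Coarsen a graph: A' = S^T A S (k x k), X' = S^T X (k x f)."""
        return S.T @ A @ S, S.T @ X

    n, k, f = 6, 2, 3                       # 6 nodes -> 2 clusters, 3 features
    A = np.random.rand(n, n); A = (A + A.T) / 2
    X = np.random.randn(n, f)
    S = np.random.rand(n, k)
    S = S / S.sum(axis=1, keepdims=True)    # rows sum to 1, like a softmax
    A_coarse, X_coarse = diffpool_step(A, X, S)
    print(A_coarse.shape, X_coarse.shape)   # (2, 2) (2, 3)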

Over-Smoothing and Depth

  • Rong, Y., Huang, W., Xu, T., and Huang, J. (2020). "DropEdge: Towards Deep Graph Convolutional Networks on Node Classification." ICLR 2020. Randomly removing edges as a regularization technique that enables deeper GNNs; a sketch appears after this list.

  • Xu, K., Li, C., Tian, Y., Sonobe, T., Kawarabayashi, K., and Jegelka, S. (2018). "Representation Learning on Graphs with Jumping Knowledge Networks." ICML 2018. Combining representations from all layers to preserve information at different scales.
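
DropEdge itself, referenced above, amounts to a few lines: before message passing during training, drop a random fraction of edges and compute on what remains. A minimal sketch over an edge list (real implementations resample every epoch and drop both directions of an undirected edge together, details omitted here):

    import random

    def drop_edge(edges, p=0.2, seed=None):
        """Keep each edge independently with probability 1 - p (training only)."""
        rng = random.Random(seed)
        return [e for e in edges if rng.random() >= p]

    edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
    print(drop_edge(edges, p=0.4, seed=0))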

Benchmarks and Datasets

  • Wu, Z., Ramsundar, B., Feinberg, E., Gomes, J., Geniesse, C., Pappu, A. S., Leswing, K., and Pande, V. (2018). "MoleculeNet: A Benchmark for Molecular Machine Learning." Chemical Science, 9, 513--530. The standard benchmark suite for molecular property prediction.

  • Hu, W., Fey, M., Zitnik, M., Dong, Y., Ren, H., Liu, B., Catasta, M., and Leskovec, J. (2020). "Open Graph Benchmark: Datasets for Machine Learning on Graphs." NeurIPS 2020. OGB: large-scale, realistic graph benchmarks with standardized evaluation.

  • Dwivedi, V. P., Joshi, C. K., Luu, A. T., Laurent, T., Bengio, Y., and Bresson, X. (2023). "Benchmarking Graph Neural Networks." JMLR, 24(43), 1--48. A systematic comparison of GNN architectures with controlled experimental conditions.

Software and Libraries

  • PyTorch Geometric: https://pyg.org/. The dominant GNN library for PyTorch, with efficient message passing and 40+ built-in architectures. A minimal usage example appears at the end of this section.

  • Deep Graph Library (DGL): https://www.dgl.ai/. An alternative GNN framework supporting PyTorch, TensorFlow, and MXNet, with strong distributed training support.

  • RDKit: https://www.rdkit.org/. The standard open-source cheminformatics library for molecular featurization, fingerprints, and graph construction from SMILES. A SMILES-to-graph sketch appears at the end of this section.

  • OGB: https://ogb.stanford.edu/. Open Graph Benchmark with standardized datasets, splits, and evaluation code.
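
To make the library pointers above concrete, here is a minimal PyTorch Geometric model for node classification. GCNConv and the (x, edge_index) calling convention are the library's own, but treat the exact surface as subject to change across versions; this is a sketch, not a template from the PyG documentation.

    import torch
    import torch.nn.functional as F
    from torch_geometric.nn import GCNConv

    class GCN(torch.nn.Module):
        """Two-layer GCN for node classification."""
        def __init__(self, in_dim, hidden_dim, num_classes):
            super().__init__()
            self.conv1 = GCNConv(in_dim, hidden_dim)
            self.conv2 = GCNConv(hidden_dim, num_classes)

        def forward(self, x, edge_index):
            x = F.relu(self.conv1(x, edge_index))
            x = F.dropout(x, p=0.5, training=self.training)
            return self.conv2(x, edge_index)

    # Toy graph: 4 nodes with 3 features; edge_index is a 2 x num_edges tensor
    x = torch.randn(4, 3)
    edge_index = torch.tensor([[0, 1, 2, 3], [1, 0, 3, 2]])
    model = GCN(in_dim=3, hidden_dim=8, num_classes=2)
    print(model(x, edge_index).shape)  # torch.Size([4, 2])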
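
And a matching RDKit sketch of graph construction from SMILES: parse a molecule, then read atomic numbers as node features and bonds as an undirected edge list. MolFromSmiles, GetAtoms, and GetBonds are RDKit's actual API, used here in their simplest form; real featurization pipelines extract far richer atom and bond descriptors.

    from rdkit import Chem

    def smiles_to_graph(smiles):
        """Parse a SMILES string into (atomic numbers, undirected edge list)."""
        mol = Chem.MolFromSmiles(smiles)
        if mol is None:
            raise ValueError(f"Could not parse SMILES: {smiles}")
        atoms = [atom.GetAtomicNum() for atom in mol.GetAtoms()]
        edges = [(b.GetBeginAtomIdx(), b.GetEndAtomIdx()) for b in mol.GetBonds()]
        return atoms, edges

    atoms, edges = smiles_to_graph("CCO")  # ethanol
    print(atoms)  # [6, 6, 8]
    print(edges)  # [(0, 1), (1, 2)]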