Position-Aware Graph Neural Networks

2021-01-08

Citation

You, Jiaxuan, Rex Ying, and Jure Leskovec. "Position-aware Graph Neural Networks." ArXiv abs/1906.04817 (2019). [Paper Link]

Background

Many graph neural networks produce near-identical embeddings for two nodes that reside in topologically similar neighborhoods. The authors here use a message-passing architecture to address this, which they call a "Position-Aware Graph Neural Network" or, more mercifully, "P-GNN".

Anchor Nodes

P-GNNs side-step the similar-topology problem by incorporating into the network the distances between a node and a set of \(k\) "anchor" nodes.

Anchor nodes are re-sampled from the graph at each forward pass, and an "aggregation scheme" is applied at each layer. This aggregation scheme weights nodes according to their distance to the anchor nodes.

According to the authors, the ideal \(k\) is \(O(\log^2 n)\), a bound that follows from the Bourgain theorem.
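To make that concrete, here is a minimal sketch of Bourgain-style anchor-set sampling. The function name, the constant \(c\), and the per-scale inclusion probability of \(1/2^i\) follow my reading of the paper's setup; treat this as illustrative rather than the authors' exact code.

```python
import math
import random

def sample_anchor_sets(nodes, c=1):
    """Sample O(log^2 n) anchor sets, Bourgain-style (illustrative sketch).

    For each scale i = 1..log2(n), draw c*log2(n) sets; every node joins
    a given set independently with probability 1/2^i, so larger i yields
    smaller, more "local" anchor sets.
    """
    n = len(nodes)
    log_n = max(1, math.ceil(math.log2(n)))
    anchor_sets = []
    for i in range(1, log_n + 1):
        for _ in range(c * log_n):
            members = [v for v in nodes if random.random() < 1.0 / 2 ** i]
            if members:  # skip empty draws
                anchor_sets.append(members)
    return anchor_sets
```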

Because computing exact shortest-path distances can be slow for large graphs, the authors also propose a P-GNN-Fast model, which uses an approximate, truncated distance.
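A hedged sketch of what that approximation might look like: run a breadth-first search that stops after \(q\) hops, and convert distances to weights with \(s(v, u) = 1/(d(v, u) + 1)\), treating anything beyond the horizon as weight 0. The helper name and the default \(q = 2\) are my assumptions.

```python
from collections import deque

def truncated_similarity(adj, source, q=2):
    """BFS from `source` out to q hops (hypothetical helper, not the
    authors' code) and return s(v, u) = 1 / (d + 1) for reached nodes.

    Nodes farther than q hops are simply absent, i.e. weight 0, which is
    what keeps the distance computation cheap on large graphs.
    """
    dist = {source: 0}
    frontier = deque([source])
    while frontier:
        v = frontier.popleft()
        if dist[v] == q:  # stop expanding at the q-hop horizon
            continue
        for u in adj[v]:
            if u not in dist:
                dist[u] = dist[v] + 1
                frontier.append(u)
    return {u: 1.0 / (d + 1) for u, d in dist.items()}
```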

Message-Passing Networks, Prior Art

Message-passing networks are characterised by pooling the features of a node's neighbors into "messages" that are passed on to other parts/layers of the network. By that broad definition, P-GNNs are hardly the first message-passing networks. The authors point to:

  • Graph Convolutional Networks
  • GraphSAGE
  • Graph Attention Networks
  • Message Passing Neural Networks

They each aggregate these messages in their own interesting way, from simple means to schemes that incorporate edge and global graph information.
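The simplest of these aggregators is a plain mean over neighbor features; real GCN/GraphSAGE layers wrap trainable weights and non-linearities around this pooling step, so the sketch below shows only the aggregation itself.

```python
import numpy as np

def mean_message(h, adj, v):
    """Pool node v's neighborhood into one message by averaging features.

    h: dict node -> feature vector; adj: dict node -> neighbor list.
    """
    return np.mean([h[u] for u in adj[v]], axis=0)
```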

None of these methods, according to the authors, are particularly resilient to the case where distinct nodes have similar local topologies.

Some Relevant Definitions

The authors here define embeddings to be position-aware if "the embedding of two nodes can be used to (approximately) recover their shortest path distance in the network".

This is contrasted with structure-aware embeddings, which are "a function of up to \(q\)-hop network neighborhood" of some node \(v\).

The "position-aware" property is particularly well suited to link-detection and community detection tasks.

Methodology

Basically, the method boils down to:

  1. Passing messages between a node and its graph neighbors (something that message-passing networks commonly do).

  2. Additionally, passing messages between a node and a random sample of nodes in the graph (referred to as "anchor" nodes). These anchor nodes are re-sampled at every forward pass.

Item 2 prevents two nodes that belong to isomorphic neighborhoods from appearing identical when they sit in distant parts of the graph, since their distances to the sampled anchors will differ.

Architecture Overview

Figure 2 from You et al. depicts the P-GNN architecture. First, anchor-set selection: random sets of anchor nodes are sampled. These anchor nodes are then passed to a function \(F\), which computes anchor-set messages; the messages are aggregated by a function \(AGG_M\), which computes a matrix \(M\) for each anchor set. Finally, \(AGG_S\) combines the matrices from each anchor-set into one matrix \(M\). A vector \(z\) can then be computed by multiplying by a trainable weight, or the hidden value can be passed to the next layer.

Figure 2 from You et al., Copyright 2019 J. You, R. Ying, J. Leskovec

The steps are as follows:

  1. Randomly sample \(K\) anchor-sets from the graph.
  2. For each node \(v_i\) and each anchor-set \(S_k\) (where \(k\) indexes the anchor-sets), compute an anchor-set message through a function \(F\). This message "combines feature information of two nodes with their network distance".
  3. These messages are then transformed by a non-linear, trainable aggregation function \(AGG_M\) into a matrix \(M_{v_i}\), whose row \(k\) holds the aggregated message from anchor-set \(S_k\).
  4. A second non-linear, trainable aggregation function \(AGG_S\) is then employed to reduce \(M_{v_i}\) into node \(v_i\)'s message \(h_{v_i} \in \mathbb{R}^r\), where \(r\) is some chosen hidden dimensionality.
  5. A trainable weight \(w\) gives a "fixed-size position-aware embedding" \(z_{v_i}\) when multiplied against each row of \(M_{v_i}\). It would appear that \(z_{v_i} \in \mathbb{R}^K\), where \(K\) is the number of anchor-sets.
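Putting the steps together, here is a minimal NumPy sketch of one P-GNN layer. The mean aggregators, the ReLU, and all of the names (`W_m`, `w`, `s`) are assumptions layered on the paper's Figure 2, not the authors' reference implementation.

```python
import numpy as np

def pgnn_layer(h, s, anchor_sets, W_m, w):
    """One P-GNN layer, sketched from Figure 2 of You et al.

    h:           dict node -> feature vector in R^d
    s:           s[v][u], distance-based weight between nodes v and u
    anchor_sets: list of K lists of anchor nodes
    W_m:         trainable (2d x r) matrix standing in for AGG_M's transform
    w:           trainable (r,) vector producing the position-aware output

    The mean pooling and ReLU here are assumptions, not confirmed choices.
    """
    relu = lambda x: np.maximum(x, 0.0)
    z, h_next = {}, {}
    for v, h_v in h.items():
        rows = []
        for S in anchor_sets:
            # F: combine v's features with each anchor's, scaled by distance
            msgs = [s[v][u] * np.concatenate([h_v, h[u]]) for u in S]
            # AGG_M: order-invariant mean over the anchor set, then transform
            rows.append(relu(np.mean(msgs, axis=0) @ W_m))
        M = np.stack(rows)                 # K x r, one row per anchor-set
        z[v] = M @ w                       # z_v in R^K (step 5)
        h_next[v] = relu(M.mean(axis=0))   # AGG_S: h_v for the next layer
    return z, h_next
```

Note that \(z_{v_i}\) is read off as the final embedding, while \(h_{v_i}\) is what feeds forward into the next layer, matching steps 4 and 5 above.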