Citation
You, Jiaxuan, Rex Ying, and Jure Leskovec. "Position-aware Graph Neural Networks." arXiv abs/1906.04817 (2019). [Paper Link]
Background
Many graph neural networks produce indistinguishable embeddings for two nodes whose local neighborhoods are topologically similar, even when those nodes sit in very different parts of the graph. The authors here use a message-passing architecture to create what they call a "Position-aware Graph Neural Network," or more mercifully, "P-GNN".
Anchor Nodes
P-GNNs sidestep the similar-topology problem by incorporating into the network the distances between each node and a set of \(k\) "anchor" nodes.
Anchor nodes are resampled from the graph at each forward pass, and an aggregation scheme is applied at every layer. This aggregation scheme weights messages according to each node's network distance to the anchor nodes.
According to the authors, the Bourgain theorem implies that \(k = O(\log^2 n)\) anchor sets suffice to preserve distances with low distortion.
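A minimal sketch of how such anchor-sets might be sampled, following the Bourgain-style recipe the paper builds on. The function name and the constant `c` are my own choices, not the authors' code:

```python
import math
import random

def sample_anchor_sets(nodes, c=1, seed=0):
    """Sample K = c * log^2(n) anchor-sets (a sketch).

    For i = 1..log2(n) and j = 1..c*log2(n), build an anchor-set by
    including each node independently with probability 1/2^i, so the
    set sizes are exponentially distributed.
    """
    rng = random.Random(seed)
    n = len(nodes)
    log_n = max(1, math.ceil(math.log2(n)))
    anchor_sets = []
    for i in range(1, log_n + 1):          # controls expected set size n/2^i
        for _ in range(c * log_n):         # c*log(n) sets per size scale
            p = 1.0 / (2 ** i)
            anchor_sets.append([v for v in nodes if rng.random() < p])
    return anchor_sets
```

With `n = 64` and `c = 1` this yields \(\lceil\log_2 64\rceil^2 = 36\) anchor-sets, resampled fresh on every call (mirroring the per-forward-pass resampling described above).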
Because computing exact shortest-path distances can be slow on large graphs, the authors also propose a P-GNN-Fast variant that uses approximate distances.
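One plausible way to approximate distances, assuming the fast variant truncates breadth-first search at \(q\) hops and maps a distance \(d\) to the weight \(1/(d+1)\), with weight 0 beyond \(q\) hops. The helper names are mine:

```python
from collections import deque

def q_hop_distance(adj, source, q):
    """BFS truncated at q hops (a sketch of an approximate-distance scheme).

    adj: dict mapping node -> list of neighbors.
    Returns node -> distance for nodes within q hops; farther nodes
    are absent, i.e. treated as infinitely far away.
    """
    dist = {source: 0}
    frontier = deque([source])
    while frontier:
        u = frontier.popleft()
        if dist[u] == q:        # do not expand past the q-hop horizon
            continue
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                frontier.append(v)
    return dist

def distance_score(dist, u, q):
    """Map a (possibly missing) distance to the weight s = 1/(d+1)."""
    d = dist.get(u)
    return 0.0 if d is None or d > q else 1.0 / (d + 1)
```

On a path graph `0-1-2-3-4`, a 2-hop BFS from node 0 reaches only nodes 0, 1, and 2; node 4 simply gets weight 0 rather than costing a full shortest-path computation.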
Message-Passing Networks, Prior Art
Message-passing networks are characterised by pooling features of a node's neighbors into "messages" that are passed on to other parts/layers of the network. By that broad definition, P-GNNs are hardly the first message-passing networks. The authors point to:
- Graph Convolutional Networks
- GraphSAGE
- Graph Attention Networks
- Message Passing Neural Networks
They all use their own interesting ways of aggregating these messages, from simple means to considering edge and global graph information.
According to the authors, none of these methods is particularly resilient to the case where distinct nodes have similar local topologies.
Some Relevant Definitions
The authors define embeddings to be position-aware if "the embedding of two nodes can be used to (approximately) recover their shortest path distance in the network".
This is contrasted with structure-aware embeddings, which are "a function of up to \(q\)-hop network neighborhood" of some node \(v\).
The position-aware property is particularly well suited to link prediction and community detection tasks.
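The distinction can be made concrete with a toy sketch (mine, not from the paper): use 1-dimensional Weisfeiler-Lehman color refinement as a stand-in for any structure-aware embedding. On a 6-cycle, every node's \(q\)-hop neighborhood looks identical, so nodes 0 and 3 receive the same color even though they are three hops apart, i.e. their shortest-path distance cannot be recovered:

```python
def wl_colors(adj, rounds=3):
    """1-D Weisfeiler-Lehman refinement: each node's color depends
    only on its q-hop neighborhood, so it is structure-aware."""
    colors = {v: 0 for v in adj}
    for _ in range(rounds):
        # signature = own color + multiset of neighbor colors
        sigs = {v: (colors[v], tuple(sorted(colors[u] for u in adj[v])))
                for v in adj}
        palette = {s: i for i, s in enumerate(sorted(set(sigs.values())))}
        colors = {v: palette[sigs[v]] for v in adj}
    return colors

# 6-cycle: all nodes end up with one color, yet 0 and 3 are 3 hops apart
cycle = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
```

A position-aware embedding, by definition, must separate nodes 0 and 3; anchor-node distances provide exactly the signal that refinement like this lacks.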
Methodology
Basically, the method boils down to:

1. Passing messages between a node and its graph neighbors (something message-passing networks commonly do).

2. Additionally, passing messages between a node and a random sample of nodes in the graph (the "anchor" nodes). These anchor nodes are resampled at every forward pass.

Item 2 prevents two nodes with isomorphic neighborhoods from receiving identical embeddings when they sit in distinct, distant parts of the graph.
Architecture Overview
The steps are as follows:
- Randomly sample \(K\) anchor-sets from the graph.
- For each node \(v_i\) and each anchor-set with index \(k\), compute an anchor-set message \(S_k\) through a function \(F\). This message "combines feature information of two nodes with their network distance".
- The messages \(S_k\) are then transformed by a nonlinear, trainable aggregation function \(AGG_M\) into a matrix \(M_{v_i}\), whose row \(k\) is the aggregated message for anchor-set \(k\).
- A second nonlinear, trainable aggregation function \(AGG_S\) then reduces \(M_{v_i}\) into node \(v_i\)'s message \(h_{v_i} \in \mathbb{R}^r\), where \(r\) is some chosen hidden dimensionality.
- A trainable weight vector \(w\) is also learned; multiplying each anchor-set message by \(w\) yields a "fixed-size position-aware embedding" \(z_{v_i}\). It would appear that \(z_{v_i} \in \mathbb{R}^K\), where \(K\) is the number of anchor-sets.
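The steps above can be sketched in a few lines of numpy. This is a simplified illustration under my own assumptions (\(F\) reduced to a linear map weighted by a precomputed distance score, \(AGG_S\) a mean followed by tanh), not the authors' implementation:

```python
import numpy as np

def pgnn_layer(features, scores, W, w):
    """One position-aware layer (a simplified sketch; names are mine).

    features : (n, d) node features
    scores   : (n, K) distance scores between each node and each anchor-set
    W        : (d, r) weights standing in for the message function F
    w        : (r,)   weights producing the position-aware output z

    Returns (z, h):
      z : (n, K) position-aware embedding, one coordinate per anchor-set
      h : (n, r) structure-aware message, fed to the next layer
    """
    msgs = features @ W                        # transformed features, (n, r)
    # M: row k is the message from anchor-set k, shape (n, K, r)
    M = scores[:, :, None] * msgs[:, None, :]
    # AGG_S: reduce anchor-set messages to h_{v_i} in R^r
    h = np.tanh(M.mean(axis=1))
    # project each anchor-set message with w -> z_{v_i} in R^K
    z = M @ w
    return z, h
```

Note that \(z\) deliberately has one coordinate per anchor-set, so its dimensions are tied to graph positions, while \(h\) keeps the usual hidden size \(r\) for stacking layers.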