Path census & structural coefficients

PathCensus is the basis which structural similarity and complementarity coefficients are derived from. In its raw form it is a set of counts of wedge and head triples (2-paths) and quadruples (3-paths) traversing an (i, j) edge as well as counts of corresponding triangles (3-cycles) and quadrangles (4-cycles). In the weighted case there are separate counts of triangles and quadrangles for wedge and head paths as average weights are defined differently for these two cases.

Note

Path/cycle counts and structural coefficients are returned as pandas.Series or pandas.DataFrame objects indexed properly with integer node indices corresponding to the ordering of rows in the underlying adjacency matrix of the network.

Note

Path census calculations are relatively efficient as the main workhorse functions are just-in-time (JIT) compiled to highly optimized C code using numba package. Moreover, the path census algorithm is based on a state-of-the-art graphlet counting algorithm proposed by Ahmed et al. [ANRD15] which has worst-case asymptotic computational complexity of \(O(md^2_{\text{max}})\), where \(m\) is the number of edges and \(d_{\text{max}}\) is the maximum degree.

Moreover, some additional optimizations are used to speed up the calculations in the case of highly heterogeneous degree distributions (e.g. power laws). See min_di argument in PathCensus.count_paths().

Node- and graph-level counts are derived from edge-level counts according to simple aggregations rules. They are defined in definition classes implemented in pathcensus.definitions submodule.

See also

pathcensus.definitions.PathDefinitionsUnweighted for the naming scheme used for unweighted counts.

pathcensus.definitions.PathDefinitionsWeighted for the naming scheme used for weighted counts.

Below a node-level census for a simple triangle graph is counted.

>>> import numpy as np
>>> from pathcensus import PathCensus
>>> G = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]])
>>> P = PathCensus(G)
>>> # Census calculations can be also parallelized
>>> # by default all available threads are used
>>> P = PathCensus(G, parallel=True)
>>> # But the number of threads can be set explicitly
>>> P = PathCensus(G, num_threads=2)
>>> P.census("nodes")
   t  tw  th  q0  qw  qh
i
0  1   2   2   0   0   0
1  1   2   2   0   0   0
2  1   2   2   0   0   0

Structural similarity

Structural similarity coefficients (PathCensus.similarity()) as well as their corresponding clustering (PathCensus.tclust()) and closure coefficients (PathCensus.tclosure()) are defined quite simply in terms of ratios of 3-cycles (triangles) to 2- (triples) counted at the levels of edges, nodes or globaly within an entire graph. The figure below presents a summary of the underlying geometric motivation as well as the main properties of structural similarity coefficients, including the differences relative to local clustering and closure coefficient [WS98, YBL19].

Overview of the properties of structural similarity coefficients

Overview of the properties of structural similarity coefficients.

Below node-wise structural similarity coefficients are counted for a simple triangle graph.

>>> import numpy as np
>>> from pathcensus import PathCensus
>>> G = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]])
>>> P = PathCensus(G)
>>> P.similarity("nodes")
i
0    1.0
1    1.0
2    1.0
dtype: float64

Structural complementarity

Structural complementarity (PathCensus.complementarity()) coefficients and their corresponding clustering (PathCensus.qclust()) and closure coefficients (PathCensus.qclosure()) are defined in terms of ratios of 4-cycles (quadrangles) to 3-paths (quadruples) and can be defined at the levels of edges, nodes and entire graphs. The figure below present a summary of the underlying geometric motivation and some of the main properties.

Overview of the properties of structural complementarity coefficients

Overview of the properties of structural complementarity coefficients.

Below we calculate complementarity coefficients for nodes in a 4-clique graph. Note that complementarity coefficients are all zeros as there are not quadrangles without any chordal edges.

>>> import numpy as np
>>> from pathcensus import PathCensus
>>> G = np.array([[0,1,1,1],[1,0,1,1],[1,1,0,1],[1,1,1,0]])
>>> P = PathCensus(G)
>>> P.complementarity("nodes")
i
0    0.0
1    0.0
2    0.0
3    0.0
dtype: float64

Weighted coefficients

Structural coefficients can be also defined for weighted networks in which case paths and cycles are weighted according to the arithmetic average over edge weights defining an underlying path (so closing or chordal edges in triangles/quadrangles are ignored). This can be seen as an extension of the weighted clustering coefficient proposed by Barrat et al. [BBPastorSatorrasV04]. Indeed, our formulation of the weighted clustering based on triangles is equivalent to it. The figure below presents a summary of the weighting rules.

Overview of weighted path/cycle counts

Overview of weighted path/cycle counts.

Edge weights should be detected automatically in most cases provided that a standard name of edge weight attribute ("weight") is used. However, weighted computations may be also enabled/disabled explicitly by using weighted argument.

>>> import numpy as np
>>> from pathcensus import PathCensus
>>> G = np.array([[0,2,3],[2,0,11],[3,11,0]])
>>> PathCensus(G).census("nodes")
   twc   thc    tw    th  q0wc  q0hc  qw  qh
i
0  2.5  6.75   5.0  13.5   0.0   0.0   0.0   0.0
1  6.5  4.75  13.0   9.5   0.0   0.0   0.0   0.0
2  7.0  4.50  14.0   9.0   0.0   0.0   0.0   0.0
>>> PathCensus(G, weighted=False).census("nodes")
   t  tw  th  q0  qw  qh
i
0  1   2   2   0   0   0
1  1   2   2   0   0   0
2  1   2   2   0   0   0