Path census & structural coefficients
PathCensus
is the basis which structural similarity and
complementarity coefficients are derived from. In its raw form
it is a set of counts of wedge and head triples (2-paths) and quadruples
(3-paths) traversing an (i, j)
edge as well as counts of corresponding
triangles (3-cycles) and quadrangles (4-cycles).
In the weighted case there are separate counts of triangles
and quadrangles for wedge and head paths as average weights are defined
differently for these two cases.
Note
Path/cycle counts and structural coefficients are returned
as pandas.Series
or pandas.DataFrame
objects indexed
properly with integer node indices corresponding to the ordering of
rows in the underlying adjacency matrix of the network.
Note
Path census calculations are relatively efficient as the main
workhorse functions are just-in-time (JIT) compiled to highly
optimized C code using numba
package. Moreover,
the path census algorithm is based on a state-of-the-art
graphlet counting algorithm proposed by
Ahmed et al. [ANRD15]
which has worst-case asymptotic computational complexity of
\(O(md^2_{\text{max}})\), where \(m\) is the number
of edges and \(d_{\text{max}}\) is the maximum degree.
Moreover, some additional optimizations are used to speed up
the calculations in the case of highly heterogeneous degree
distributions (e.g. power laws). See min_di
argument in
PathCensus.count_paths()
.
Node- and graph-level counts are derived from edge-level counts according
to simple aggregations rules. They are defined in definition classes
implemented in pathcensus.definitions
submodule.
See also
pathcensus.definitions.PathDefinitionsUnweighted
for the naming scheme used for unweighted counts.
pathcensus.definitions.PathDefinitionsWeighted
for the naming scheme used for weighted counts.
Below a node-level census for a simple triangle graph is counted.
>>> import numpy as np
>>> from pathcensus import PathCensus
>>> G = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]])
>>> P = PathCensus(G)
>>> # Census calculations can be also parallelized
>>> # by default all available threads are used
>>> P = PathCensus(G, parallel=True)
>>> # But the number of threads can be set explicitly
>>> P = PathCensus(G, num_threads=2)
>>> P.census("nodes")
t tw th q0 qw qh
i
0 1 2 2 0 0 0
1 1 2 2 0 0 0
2 1 2 2 0 0 0
Structural similarity
Structural similarity coefficients (PathCensus.similarity()
)
as well as their corresponding clustering (PathCensus.tclust()
)
and closure coefficients (PathCensus.tclosure()
)
are defined quite simply in terms of ratios of 3-cycles (triangles)
to 2- (triples) counted at the levels of edges, nodes or globaly within an
entire graph. The figure below presents a summary of the underlying geometric
motivation as well as the main properties of structural similarity
coefficients, including the differences relative to local clustering and
closure coefficient
[WS98, YBL19].
Below node-wise structural similarity coefficients are counted for a simple triangle graph.
>>> import numpy as np
>>> from pathcensus import PathCensus
>>> G = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]])
>>> P = PathCensus(G)
>>> P.similarity("nodes")
i
0 1.0
1 1.0
2 1.0
dtype: float64
Structural complementarity
Structural complementarity (PathCensus.complementarity()
)
coefficients and their corresponding clustering
(PathCensus.qclust()
) and closure coefficients
(PathCensus.qclosure()
) are defined in terms of ratios of 4-cycles
(quadrangles) to 3-paths (quadruples) and can be defined at the levels
of edges, nodes and entire graphs. The figure below present a summary
of the underlying geometric motivation and some of the main properties.
Below we calculate complementarity coefficients for nodes in a 4-clique graph. Note that complementarity coefficients are all zeros as there are not quadrangles without any chordal edges.
>>> import numpy as np
>>> from pathcensus import PathCensus
>>> G = np.array([[0,1,1,1],[1,0,1,1],[1,1,0,1],[1,1,1,0]])
>>> P = PathCensus(G)
>>> P.complementarity("nodes")
i
0 0.0
1 0.0
2 0.0
3 0.0
dtype: float64
Weighted coefficients
Structural coefficients can be also defined for weighted networks in which case paths and cycles are weighted according to the arithmetic average over edge weights defining an underlying path (so closing or chordal edges in triangles/quadrangles are ignored). This can be seen as an extension of the weighted clustering coefficient proposed by Barrat et al. [BBPastorSatorrasV04]. Indeed, our formulation of the weighted clustering based on triangles is equivalent to it. The figure below presents a summary of the weighting rules.
Edge weights should be detected automatically in most cases provided that
a standard name of edge weight attribute ("weight"
) is used.
However, weighted computations may be also enabled/disabled explicitly
by using weighted
argument.
>>> import numpy as np
>>> from pathcensus import PathCensus
>>> G = np.array([[0,2,3],[2,0,11],[3,11,0]])
>>> PathCensus(G).census("nodes")
twc thc tw th q0wc q0hc qw qh
i
0 2.5 6.75 5.0 13.5 0.0 0.0 0.0 0.0
1 6.5 4.75 13.0 9.5 0.0 0.0 0.0 0.0
2 7.0 4.50 14.0 9.0 0.0 0.0 0.0 0.0
>>> PathCensus(G, weighted=False).census("nodes")
t tw th q0 qw qh
i
0 1 2 2 0 0 0
1 1 2 2 0 0 0
2 1 2 2 0 0 0