""":class:`PathCensus` is the basis which structural similarity and
complementarity coefficients are derived from. In its raw form
it is a set of counts of wedge and head triples (2-paths) and quadruples
(3-paths) traversing an ``(i, j)`` edge as well as counts of corresponding
triangles (3-cycles) and quadrangles (4-cycles).
In the weighted case there are separate counts of triangles
and quadrangles for wedge and head paths as average weights are defined
differently for these two cases.
.. note::
Path/cycle counts and structural coefficients are returned
as :class:`pandas.Series` or :class:`pandas.DataFrame` objects indexed
properly with integer node indices corresponding to the ordering of
rows in the underlying adjacency matrix of the network.
.. note::
Path census calculations are relatively efficient as the main
workhorse functions are just-in-time (JIT) compiled to highly
optimized C code using :py:mod:`numba` package. Moreover,
the path census algorithm is based on a state-of-the-art
graphlet counting algorithm proposed by
:cite:t:`ahmedEfficientGraphletCounting2015`
which has worst-case asymptotic computational complexity of
:math:`O(md^2_{\\text{max}})`, where :math:`m` is the number
of edges and :math:`d_{\\text{max}}` is the maximum degree.
Moreover, some additional optimizations are used to speed up
the calculations in the case of highly heterogeneous degree
distributions (e.g. power laws). See ``min_di`` argument in
:meth:`PathCensus.count_paths`.
Node- and graph-level counts are derived from edge-level counts according
to simple aggregations rules. They are defined in definition classes
implemented in :py:mod:`pathcensus.definitions` submodule.
.. seealso::
:class:`pathcensus.definitions.PathDefinitionsUnweighted`
for the naming scheme used for unweighted counts.
:class:`pathcensus.definitions.PathDefinitionsWeighted`
for the naming scheme used for weighted counts.
Below a node-level census for a simple triangle graph is counted.
.. doctest:: census-triangle
>>> import numpy as np
>>> from pathcensus import PathCensus
>>> G = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]])
>>> P = PathCensus(G)
>>> # Census calculations can be also parallelized
>>> # by default all available threads are used
>>> P = PathCensus(G, parallel=True)
>>> # But the number of threads can be set explicitly
>>> P = PathCensus(G, num_threads=2)
>>> P.census("nodes")
t tw th q0 qw qh
i
0 1 2 2 0 0 0
1 1 2 2 0 0 0
2 1 2 2 0 0 0
Structural similarity
---------------------
Structural similarity coefficients (:meth:`PathCensus.similarity`)
as well as their corresponding clustering (:meth:`PathCensus.tclust`)
and closure coefficients (:meth:`PathCensus.tclosure`)
are defined quite simply in terms of ratios of 3-cycles (triangles)
to 2- (triples) counted at the levels of edges, nodes or globaly within an
entire graph. The figure below presents a summary of the underlying geometric
motivation as well as the main properties of structural similarity
coefficients, including the differences relative to local clustering and
closure coefficient
:cite:p:`wattsCollectiveDynamicsSmallworld1998,yinLocalClosureCoefficient2019`.
.. figure:: /figures/sim.svg
:align: center
:alt: Overview of the properties of structural similarity coefficients
Overview of the properties of structural similarity coefficients.
Below node-wise structural similarity coefficients are counted for
a simple triangle graph.
.. doctest:: simcoef-nodes
>>> import numpy as np
>>> from pathcensus import PathCensus
>>> G = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]])
>>> P = PathCensus(G)
>>> P.similarity("nodes")
i
0 1.0
1 1.0
2 1.0
dtype: float64
Structural complementarity
--------------------------
Structural complementarity (:meth:`PathCensus.complementarity`)
coefficients and their corresponding clustering
(:meth:`PathCensus.qclust`) and closure coefficients
(:meth:`PathCensus.qclosure`) are defined in terms of ratios of 4-cycles
(quadrangles) to 3-paths (quadruples) and can be defined at the levels
of edges, nodes and entire graphs. The figure below present a summary
of the underlying geometric motivation and some of the main properties.
.. figure:: /figures/comp.svg
:align: center
:alt: Overview of the properties of structural complementarity coefficients
Overview of the properties of structural complementarity coefficients.
Below we calculate complementarity coefficients for nodes
in a 4-clique graph. Note that complementarity coefficients are all
zeros as there are not quadrangles without any chordal edges.
.. doctest:: compcoefs-nodes
>>> import numpy as np
>>> from pathcensus import PathCensus
>>> G = np.array([[0,1,1,1],[1,0,1,1],[1,1,0,1],[1,1,1,0]])
>>> P = PathCensus(G)
>>> P.complementarity("nodes")
i
0 0.0
1 0.0
2 0.0
3 0.0
dtype: float64
Weighted coefficients
---------------------
Structural coefficients can be also defined for weighted networks in which
case paths and cycles are weighted according to the arithmetic average
over edge weights defining an underlying path
(so closing or chordal edges in triangles/quadrangles are ignored).
This can be seen as an extension of the weighted clustering coefficient
proposed by :cite:t:`barratArchitectureComplexWeighted2004`.
Indeed, our formulation of the weighted clustering based on triangles
is equivalent to it.
The figure below presents a summary of the weighting rules.
.. figure:: /figures/weighted.svg
:align: center
:alt: Overview of weighted path/cycle counts
Overview of weighted path/cycle counts.
Edge weights should be detected automatically in most cases provided that
a standard name of edge weight attribute (``"weight"``) is used.
However, weighted computations may be also enabled/disabled explicitly
by using ``weighted`` argument.
.. doctest:: coefs-weighted
>>> import numpy as np
>>> from pathcensus import PathCensus
>>> G = np.array([[0,2,3],[2,0,11],[3,11,0]])
>>> PathCensus(G).census("nodes")
twc thc tw th q0wc q0hc qw qh
i
0 2.5 6.75 5.0 13.5 0.0 0.0 0.0 0.0
1 6.5 4.75 13.0 9.5 0.0 0.0 0.0 0.0
2 7.0 4.50 14.0 9.0 0.0 0.0 0.0 0.0
>>> PathCensus(G, weighted=False).census("nodes")
t tw th q0 qw qh
i
0 1 2 2 0 0 0
1 1 2 2 0 0 0
2 1 2 2 0 0 0
"""
from __future__ import annotations
from typing import Union, Literal, Optional, Any, Tuple, Dict
import numpy as np
from scipy.sparse import csr_matrix
import numba
import pandas as pd
from . import types, adjacency
from .core.graph import Graph
from .core.parallel import count_paths_parallel
from .types import UInt, Float
from .definitions import PathDefinitionsUnweighted, PathDefinitionsWeighted
[docs]
class PathCensus:
"""Path census and structural coefficients calculations
for undirected graphs.
Attributes
----------
graph
:class:`pathcensus.core.graph.Graph` instance
for calculating path census.
counts
Data frame with path/cycle counts per edge.
Initialization may be postponed.
Notes
-----
Naming scheme used for denoting counts is documented in the docstring
for ``definitions`` attribute (i.e. ``self.definitions``).
.. testsetup:: pathcensus
import numpy as np
from pathcensus import PathCensus
"""
# pylint: disable=too-many-public-methods
def __init__(
self,
graph: types.GraphABC,
weighted: Optional[bool] = None,
validate: bool = True,
adj_kws: Optional[Dict] = None,
count_paths: bool = True,
**kwds: Any
) -> None:
"""Initialization method.
Parameters
----------
graph
Graph-like object registered with
:py:class:`pathcensus.types.GraphABC` abstract base class
and a registered single dispatch method registered on
:py:class:`pathcensus.utils.adjacency`. Sparse matrices
work out of the box.
weighted
Should the graph be interpreted as weighted.
Determined automatically if ``None``.
validate
Should input graph be validate for correctness
(i.e. checked if is undirected).
adj_kws
Additional keyword params passed to
:py:func:`pathcensus.utils.adjacency`.
count_paths
Should `counts` attribute be initialized immediately.
It can be initialized later using :py:meth:`count` method.
**kwds
Passed to :py:meth:`count_paths`.
"""
adj_kws = adj_kws or {}
graph_kws = dict(weighted=weighted, validate=validate, **adj_kws)
self.graph = self.get_graph(graph, **graph_kws)
# Setup path definition objects
if self.weighted:
self.definitions = PathDefinitionsWeighted()
else:
self.definitions = PathDefinitionsUnweighted()
self.counts = None
if count_paths:
self.count(**kwds)
[docs]
def count(self, **kwds: Any) -> None:
"""Count paths and set `self.counts` attribute.
``**kwds`` are passed to :py:meth:`count_paths`.
"""
E, counts = self.count_paths(self.graph, **kwds)
self.counts = self._make_counts(E, counts)
# Properties --------------------------------------------------------------
@property
def n_nodes(self) -> int:
return self.graph.n_nodes
@property
def vcount(self) -> int:
return self.n_nodes
@property
def n_edges(self) -> int:
return self.counts.shape[0]
@property
def ecount(self) -> int:
return self.n_edges
@property
def weighted(self) -> bool:
return self.graph.weighted
@property
def degree(self) -> np.ndarray:
"""Get degree sequence of the underlying graph."""
return self.graph.D
@property
def strength(self) -> np.ndarray:
"""Get strength sequence of the underlying graph
(or degree sequence in the unweighted case).
"""
if self.weighted:
return self.graph.S
return self.graph.D
@property
def tdf(self) -> pd.DataFrame:
"""Data frame with triple/triangle counts per edge."""
cols = [
name for name in self.definitions.get_column_names()
if name in self.definitions["sim"]
]
return self.counts[cols]
@property
def qdf(self) -> pd.DataFrame:
"""Data frame with quadruple/quadrangle counts per edge."""
cols = [
name for name in self.definitions.get_column_names()
if name in self.definitions["comp"]
]
return self.counts[cols]
# Static & class methods --------------------------------------------------
[docs]
@classmethod
def get_graph(
cls,
graph: types.GraphABC,
weighted: Optional[bool] = None,
validate: bool = True,
**kwds: Any
) -> Graph:
"""Get graph object for path counting.
Parameters
----------
graph
A compatibe graph object registered with `paths.types.GraphABC`.
n_nodes
Number of nodes. Used only when `graph` is passed as an edgelist.
weighted
Should the graph be interpreted as weighted graph.
If ``None`` then it is determined based on the number of
unique values of non-zero values in the adjacency matrix.
validate
Should input graph be validate for correctness
(i.e. checked if is undirected).
**kwds
Passed to :py:func:`pathcensus.utils.adjacency`.
"""
A = adjacency(graph, **kwds)
if validate and (A != A.T).count_nonzero() > 0:
raise AttributeError("only undirected graphs are accepted")
n_nodes = A.shape[0]
E = np.ascontiguousarray(np.array(A.nonzero(), dtype=UInt).T)
if weighted is None:
weighted = any(A.data != 1)
if weighted:
W = A.data.astype(Float)
else:
W = None
G = Graph(n_nodes, E, W)
return G
[docs]
@classmethod
def count_paths(
cls,
graph: Union[types.GraphABC, Graph],
*,
parallel: Optional[bool] = None,
num_threads: Optional[int] = None,
graph_kws: Optional[Dict] = None,
min_di: bool = True,
**kwds: Any
) -> Tuple[int, np.ndarray]:
"""Count paths and cycles in a graph.
Parameters
----------
graph
:py:class:`pathcensus.core.graph.Graph` instance.
or graph-like object that can be converted to it.
parallel
Should parallel counting algorithm be used.
When ``None`` it is used by default for graphs
with at least one million edges.
num_threads
Number of threads to use when ``parallel=True``.
batch_size
Batch size to use when running with ``parallel=True``.
graph_kws
Additional keyword arguments passed to :py:meth:`get_graph`.
Used only when `graph` is not already in the JIT-compiled form.
min_di
Should `di < dj` rule for iterating over edges be used.
This way the most expensive loop of the `PathCensus` algorithm
for computing edge-wise path/cycle counts always iterates over
neighbors of the lower degree node in an ``(i, j)`` edge.
Almost always should be set to ``True``.
The argument is used mostly for testing purposes.
**kwds
Passed to :py:func:`pathcensus.core.parallel.count_paths_parallel`
when ``parallel=True``.
Notes
-----
The ``parallel=True`` argument may not work and lead to segmentation
faults on some MacOS machines.
Returns
-------
n_nodes
Number of nodes.
counts
Path and cycles counts.
"""
if isinstance(graph, types.GraphABC):
graph = cls.get_graph(graph, **(graph_kws or {}))
if min_di:
E = graph.get_min_di_edges()
else:
E = graph.get_edges()
if parallel is None:
# Use parallel algorithm when at least 100k edges
parallel = graph.n_edges >= 1e5
if not num_threads or num_threads <= 0:
# pylint: disable=no-member
num_threads = numba.config.NUMBA_NUM_THREADS
if parallel and num_threads > 1:
orig_num_threads = numba.get_num_threads()
numba.set_num_threads(num_threads)
try:
E, counts = count_paths_parallel(graph, **kwds)
finally:
numba.set_num_threads(orig_num_threads)
else:
E, counts = graph.count_paths(E)
return E, counts
# Auxiliary methods -------------------------------------------------------
[docs]
def get_counts(
self,
mode: Literal[Meta.mode] = Meta.mode[0], # type: ignore
) -> pd.DataFrame:
"""Get (possibly aggregated) path counts.
Parameters
----------
mode
Should node, edge or global counts be calculated.
"""
self._check_mode(mode)
counts = self.counts
if mode == "nodes":
counts = counts.groupby(level="i") \
.sum() \
.reindex(np.arange(self.n_nodes), copy=False) \
.fillna(0)
elif mode == "global":
counts = counts.sum()
return counts
# Similarity coefficients -------------------------------------------------
[docs]
def tclust(
self,
*,
undefined: Literal[Meta.undef] = Meta.undef[0], # type: ignore
counts: Optional[pd.DataFrame] = None
) -> pd.Series:
"""Triangle-based local clustering (node-wise).
It is equivalent to local clustering coefficient
:cite:p:`wattsCollectiveDynamicsSmallworld1998`.
Parameters
----------
undefined
If ``'nan'`` the nodes with undefined values are treated
as NaNs. If ``'zero'`` then they are considered zeros.
counts
Path counts data frame to use. Mostly for internal use.
Notes
-----
It is defined as the ratio of triangles including a focal node ``i``
to the number of wedge triples centered at it:
.. math::
s^W_i = \\frac{2T_i}{t^W_i}
.. figure:: /figures/t-wedge.svg
:align: center
:alt: Wedge triple
Wedge triple.
Examples
--------
.. doctest:: pathcensus
>>> # Triangle graph
>>> A = np.array([[0,1,1], [1,0,1], [1,1,0]])
>>> PathCensus(A).tclust()
i
0 1.0
1 1.0
2 1.0
dtype: float64
"""
df = counts if counts is not None else self.get_counts("nodes")
num = df[self._a("twc")]
denom = df[self._a("tw")]
return self._divide(num, denom, undefined=undefined)
[docs]
def tclosure(
self,
*,
undefined: Literal[Meta.undef] = Meta.undef[0], # type: ignore
counts: Optional[pd.DataFrame] = None
) -> pd.Series:
"""Triangle-based local closure coefficient (node-wise).
It is equivalent to local closure coefficient
:cite:p:`yinLocalClosureCoefficient2019`.
Parameters
----------
undefined
If ``'nan'`` the nodes with undefined values are treated
as NaNs. If ``'zero'`` then they are considered zeros.
counts
Path counts data frame to use. Mostly for internal use.
Notes
-----
It is defined as the ratio of the number of triangles including
a focal node ``i`` to the number of head triples starting from it:
.. math::
s^H_i = \\frac{2T_i}{t^H_i}
.. figure:: /figures/t-head.svg
:align: center
:alt: Head triple
Head triple.
Examples
--------
.. doctest:: pathcensus
>>> # Triangle graph
>>> A = np.array([[0,1,1],[1,0,1],[1,1,0]])
>>> PathCensus(A).tclosure()
i
0 1.0
1 1.0
2 1.0
dtype: float64
"""
df = counts if counts is not None else self.get_counts("nodes")
num = df[self._a("thc")]
denom = df[self._a("th")]
return self._divide(num, denom, undefined=undefined)
[docs]
def similarity(
self,
mode: Literal[Meta.mode] = Meta.mode[0], # type: ignore
*,
undefined: Literal[Meta.undef] = Meta.undef[0], # type: ignore
counts: Optional[pd.DataFrame] = None
) -> Union[pd.Series, float]:
"""Structural similarity coefficients.
Parameters
----------
mode
Should it be calculated for nodes, edges or globally
(equivalent to global clustering).
undefined
If ``'nan'`` the nodes with undefined values are treated
as NaNs. If ``'zero'`` then they are considered zeros.
counts
Path counts data frame to use. Mostly for internal use.
Notes
-----
It is defined as the ratio of triangles including a focal node ``i``
to the total number of both wedge and head triples:
.. math::
s_i = \\frac{4T_i}{t^W_i + t^H_i}
See Also
--------
simcoefs : structural similarity coefficients
coefs : structural coefficients
Examples
--------
.. doctest:: pathcensus
>>> # Triangle graph
>>> A = np.array([[0,1,1], [1,0,1], [1,1,0]])
>>> PathCensus(A).similarity("edges")
i j
0 1 1.0
2 1.0
1 0 1.0
2 1.0
2 0 1.0
1 1.0
dtype: float64
>>> PathCensus(A).similarity("nodes")
i
0 1.0
1 1.0
2 1.0
dtype: float64
>>> PathCensus(A).similarity("global")
1.0
"""
df = counts if counts is not None else self.get_counts(mode)
num = df[self._a("twc")] + df[self._a("thc")]
denom = df[self._a("tw")] + df[self._a("th")]
return self._divide(num, denom, undefined=undefined)
# Complementarity coefficients --------------------------------------------
[docs]
def qclust(
self,
*,
undefined: Literal[Meta.undef] = Meta.undef[0], # type: ignore
counts: Optional[pd.DataFrame] = None
) -> pd.Series:
"""Quadrangle-based local clustering coefficient (node-wise).
Parameters
----------
undefined
If ``'nan'`` the nodes with undefined values are treated
as NaNs. If ``'zero'`` then they are considered zeros.
counts
Path counts data frame to use. Mostly for internal use.
Notes
-----
It is defined as the ratio of quadrangles including a focal node ``i``
and the number of wedge quadruples with ``i`` at the second position
(this is to avoid double counting and make the number of wedge and
head quadruples per quadrangle equal):
.. math::
c^W_i = \\frac{2Q_i}{q^W_i}
.. figure:: /figures/q-wedge.svg
:align: center
:alt: Wedge quadruple
Wedge quadruple.
Examples
--------
.. doctest:: pathcensus
>>> # Quadrangle graph
>>> A = np.array([[0,1,0,1],[1,0,1,0],[0,1,0,1],[1,0,1,0]])
>>> PathCensus(A).qclust()
i
0 1.0
1 1.0
2 1.0
3 1.0
dtype: float64
"""
df = counts if counts is not None else self.get_counts("nodes")
num = self._qcount(df, which="wedge")
denom = df[self._a("qw")]
return self._divide(num, denom, undefined=undefined)
[docs]
def qclosure(
self,
*,
undefined: Literal[Meta.undef] = Meta.undef[0], # type: ignore
counts: Optional[pd.DataFrame] = None
) -> pd.Series:
"""Quadrangle-based local closure coefficient.
Parameters
----------
undefined
If ``'nan'`` the nodes with undefined values are treated
as NaNs. If ``'zero'`` then they are considered zeros.
counts
Path counts data frame to use. Mostly for internal use.
Notes
-----
It is defined as the ratio of quadrangles including a focal node ``i``
and the number of head quadruples starting from it.
.. math::
c^H_i = \\frac{2Q_i}{q^H_i}
.. figure:: /figures/q-head.svg
:align: center
:alt: Head quadruple
Head quadruple.
Examples
--------
.. doctest:: pathcensus
>>> # Quadrangle graph
>>> A = np.array([[0,1,0,1],[1,0,1,0],[0,1,0,1],[1,0,1,0]])
>>> PathCensus(A).qclosure()
i
0 1.0
1 1.0
2 1.0
3 1.0
dtype: float64
"""
df = counts if counts is not None else self.get_counts("nodes")
num = self._qcount(df, which="head")
denom = df[self._a("qh")]
return self._divide(num, denom, undefined=undefined)
[docs]
def complementarity(
self,
mode: Literal[Meta.mode] = Meta.mode[0], # type: ignore
*,
undefined: Literal[Meta.undef] = Meta.undef[0], # type: ignore
counts: Optional[pd.DataFrame] = None
) -> Union[pd.Series, float]:
"""Structural complementarity coefficients.
Parameters
----------
mode
Should it be calculated for nodes, edges or globally
(equivalent to global clustering).
undefined
If ``'nan'`` the nodes with undefined values are treated
as NaNs. If ``'zero'`` then they are considered zeros.
counts
Path counts data frame to use. Mostly for internal use.
Notes
-----
The node-wise coefficient is defined as the ratio of quadrangles
including a focal node ``i`` and the total number of both wedge
and head quadruples:
.. math::
c_i = \\frac{4Q_i}{q^W_i + q^H_i}
The edge-wise coefficient is defined as the ratio of quadrangles
including an ``(i, j)`` edge and the number of quadruples starting
at it:
.. math::
c_{ij} = \\frac{2Q_{ij}}{q_{ij}}
The global coefficient is defined as the ratio of sums of quadrangles
to the sum of quadruples (wedge or head):
.. math::
c
= \\frac{2\\sum_i Q_i}{\\sum_i q^W_i}
= \\frac{2\\sum_i Q_i}{\\sum_i q^H_i}
See Also
--------
compcoefs : structural complementarity coefficients
coefs : structural coefficients
Examples
--------
.. doctest:: pathcensus
>>> # Quadrangle graph
>>> A = np.array([[0,1,0,1],[1,0,1,0],[0,1,0,1],[1,0,1,0]])
>>> PathCensus(A).complementarity("edges")
i j
0 1 1.0
3 1.0
1 0 1.0
2 1.0
2 1 1.0
3 1.0
3 0 1.0
2 1.0
dtype: float64
>>> PathCensus(A).complementarity("nodes")
i
0 1.0
1 1.0
2 1.0
3 1.0
dtype: float64
>>> PathCensus(A).complementarity("global")
1.0
"""
df = counts if counts is not None else self.get_counts(mode)
num = self._qcount(df, which="wedge") \
+ self._qcount(df, which="head")
denom = df[self._a("qw")] + df[self._a("qh")]
return self._divide(num, denom, undefined=undefined)
# Summaries ---------------------------------------------------------------
[docs]
def simcoefs(
self,
mode: Literal[Meta.mode] = Meta.mode[0], # type: ignore
*,
undefined: Literal[Meta.undef] = Meta.undef[0], # type: ignore
census: bool = False,
counts: Optional[pd.DataFrame] = None
) -> pd.DataFrame | pd.Series:
"""Calculate similarity coefficients including clustering
and closure coefficients when ``mode="nodes"`` or their
node-wise averages when ``mode="global"``.
Parameters
----------
mode
Should node, edge or global counts be calculated.
undefined
If ``'nan'`` the nodes with undefined values are treated
as NaNs. If ``'zero'`` then they are considered zeros.
census
If ``True`` then path census data is added.
as columns in the front of the data frame.
counts
Path counts data frame to use. Mostly for internal use.
Examples
--------
.. doctest:: pathcensus
>>> # Triangle graph
>>> A = np.array([[0,1,1], [1,0,1], [1,1,0]])
>>> PathCensus(A).simcoefs("edges")
sim
i j
0 1 1.0
2 1.0
1 0 1.0
2 1.0
2 0 1.0
1 1.0
>>> PathCensus(A).simcoefs("nodes")
sim tclust tclosure
i
0 1.0 1.0 1.0
1 1.0 1.0 1.0
2 1.0 1.0 1.0
>>> PathCensus(A).simcoefs("global")
sim_g 1.0
sim 1.0
tclust 1.0
tclosure 1.0
dtype: float64
"""
counts = counts if counts is not None else self.get_counts(mode)
kwds = dict(undefined=undefined)
if mode == "edges":
coefs = pd.DataFrame({
"sim": self.similarity(mode, counts=counts, **kwds),
}, index=counts.index)
elif mode == "nodes":
coefs = pd.DataFrame({
"sim": self.similarity(mode, counts=counts, **kwds),
"tclust": self.tclust(counts=counts, **kwds),
"tclosure": self.tclosure(counts=counts, **kwds),
})
else:
coefs = pd.Series({
"sim_g": self.similarity(mode, counts=counts, **kwds),
**self.simcoefs(mode="nodes", **kwds).mean()
})
if census:
paths = self.census(mode)
coefs = pd.concat([coefs, paths], axis=mode != "global")
return coefs
[docs]
def compcoefs(
self,
mode: Literal[Meta.mode] = Meta.mode[0], # type: ignore
*,
undefined: Literal[Meta.undef] = Meta.undef[0], # type: ignore
census: bool = False,
counts: Optional[pd.DataFrame] = None
) -> pd.DataFrame:
"""Calculate complementarity coefficients including clustering
and closure coefficients when ``mode="nodes"`` or their
node-wise averages when ``mode="global"``.
Parameters
----------
mode
Should node, edge or global counts be calculated.
undefined
If ``'nan'`` the nodes with undefined values are treated
as NaNs. If ``'zero'`` then they are considered zeros.
census
If ``True`` then path census data is added.
counts
Path counts data frame to use. Mostly for internal use.
Examples
--------
.. doctest:: pathcensus
>>> # Quadrangle graph
>>> A = np.array([[0,1,0,1],[1,0,1,0],[0,1,0,1],[1,0,1,0]])
>>> PathCensus(A).compcoefs("edges")
comp
i j
0 1 1.0
3 1.0
1 0 1.0
2 1.0
2 1 1.0
3 1.0
3 0 1.0
2 1.0
>>> PathCensus(A).compcoefs("nodes")
comp qclust qclosure
i
0 1.0 1.0 1.0
1 1.0 1.0 1.0
2 1.0 1.0 1.0
3 1.0 1.0 1.0
>>> PathCensus(A).compcoefs("global")
comp_g 1.0
comp 1.0
qclust 1.0
qclosure 1.0
dtype: float64
"""
counts = counts if counts is not None else self.get_counts(mode)
kwds = dict(undefined=undefined)
if mode == "edges":
coefs = pd.DataFrame({
"comp": self.complementarity(mode, counts=counts, **kwds),
}, index=counts.index)
elif mode == "nodes":
coefs = pd.DataFrame({
"comp": self.complementarity(mode, counts=counts, **kwds),
"qclust": self.qclust(counts=counts, **kwds),
"qclosure": self.qclosure(counts=counts, **kwds)
})
else:
coefs = pd.Series({
"comp_g": self.complementarity(mode, counts=counts, **kwds),
**self.compcoefs(mode="nodes", **kwds).mean()
})
if census:
paths = self.census(mode)
coefs = pd.concat([coefs, paths], axis=mode != "global")
return coefs
[docs]
def coefs(
self,
mode: Literal[Meta.mode] = Meta.mode[0], # type: ignore
**kwds
) -> pd.DataFrame:
"""Calculate structural coefficients.
Parameters
----------
mode
Should node, edge or global counts be calculated.
undefined
If ``'nan'`` the nodes with undefined values are treated
as NaNs. If ``'zero'`` then they are considered zeros.
census
If ``True`` then path census data is added.
counts
Path counts data frame to use. Mostly for internal use.
See Also
--------
simcoefs : structural similarity coefficients
compcoefs: structural complementarity coefficients
"""
if "counts" not in kwds:
kwds["counts"] = self.get_counts(mode)
census = kwds.pop("census", False)
skw = kwds.copy()
ckw = kwds
scoefs = self.simcoefs(mode, **skw)
ccoefs = self.compcoefs(mode, **ckw)
coefs = pd.concat([scoefs, ccoefs], axis=mode != "global")
if census:
paths = self.census(mode)
coefs = pd.concat([coefs, paths], axis=mode != "global")
return coefs
[docs]
def census(
self,
mode: Literal[Meta.mode] = Meta.mode[0], # type: ignore
*,
counts: Optional[pd.DataFrame] = None
) -> pd.DataFrame:
"""Calculate path census.
Parameters
----------
mode
Should node, edge or global counts be calculated.
counts
Path counts data frame to use. Mostly for internal use.
Examples
--------
.. doctest:: pathcensus
>>> # Triangle graph
>>> A = np.array([[0,1,1], [1,0,1], [1,1,0]])
>>> PathCensus(A).census()
t tw th q0 qw qh
i
0 1 2 2 0 0 0
1 1 2 2 0 0 0
2 1 2 2 0 0 0
"""
self._check_mode(mode)
if counts is None:
counts = self.get_counts(mode).copy()
arules = self.definitions.aggregation.get(mode, {})
for k, v in arules.items():
if self.weighted:
counts[k] /= v
else:
counts[k] //= v
return counts
# Serialization -----------------------------------------------------------
[docs]
def dump(self) -> Tuple[
int, np.ndarray, Optional[np.ndarray], Optional[np.ndarray]
]:
"""Dump to raw data in the form of arrays and the number of nodes.
Returns
-------
n_nodes
Number of nodes.
E
Edgelist array.
W
Optional edge weights array.
counts
Path counts array. May be ``None``.
"""
return self.n_nodes, self.graph.E, self.graph.W, self.counts
[docs]
@classmethod
def from_dump(
cls,
n_nodes: int,
E: np.ndarray,
W: Optional[np.ndarray] = None,
counts: Optional[np.ndarray] = None,
adj_kws: Optional[Dict] = None,
**kwds: Any
) -> PathCensus: # type: ignore
"""Construct from the output of :py:meth:`dump`.
Parameters
----------
n_nodes
Number of nodes.
E
Edgelist array.
W
Optional edge weights array.
counts
Optional path counts array. It is calculated on-the-fly
when ``None``.
adj_kws
Passed to :py:func:`pathcensus.utils.adjacency`.
**kwds
Passed to :py:meth:`count_paths`
when `counts` is ``None``.
"""
if W is None:
weighted = False
W = np.full(len(E), 1)
else:
weighted = True
i, j = E[:, 1:].T
A = csr_matrix((W, (i, j)), shape=(n_nodes, n_nodes))
adj_kws = adj_kws or {}
paths = cls(A, weighted=weighted, adj_kws=adj_kws, count_paths=False)
if counts is None:
paths.count(**kwds)
else:
paths.counts = counts
return paths
# Internals ---------------------------------------------------------------
def _a(self, name: str) -> str:
"""Resolve possibly aliased path name."""
return self.definitions.resolve(name)
def _make_counts(
self,
E: np.ndarray,
counts: np.ndarray
) -> Tuple[np.ndarray, np.ndarray]:
"""Make path counts data frame."""
swaps = self.definitions.get_swap_rules()
cols = self.definitions.get_column_ids()
names = self.definitions.get_column_names()
E2 = E.copy()
E2[:, [0, 1]] = E[:, [1, 0]]
E = np.vstack((E, E2))
if not self.weighted:
counts = counts.astype(UInt)
counts2 = counts.copy()
for u, v in swaps:
counts2[:, [u, v]] = counts[:, [v, u]]
counts = np.vstack((counts, counts2))
# Drop unnecessary columns
counts = counts[:, cols]
counts = pd.DataFrame(
data=counts,
columns=names,
index=pd.MultiIndex.from_arrays(
arrays=E.T,
names=["i", "j"]
)
)
return counts.sort_index()
def _check_undef(self, val: str) -> None:
if val not in self.Meta.undef:
raise ValueError(f"'undefined' has to be one of {self.Meta.undef}")
def _check_mode(self, val: str) -> None:
if val not in self.Meta.mode:
raise ValueError(f"'mode' has to be one of {self.Meta.mode}")
def _divide(
self,
x: Union[int, float],
y: Union[int, float],
*,
undefined: Literal[Meta.undef] = Meta.undef[0], # type: ignore
) -> float:
with np.errstate(invalid="ignore"):
out = x / y
if undefined == "zero":
if np.isscalar(out):
if np.isnan(out) or np.isinf(out):
out = 0.0
else:
out[np.isnan(out) | np.isinf(out)] = 0.0
else:
if np.isscalar(out):
if np.isinf(out):
out = np.nan
else:
out[np.isinf(out)] = np.nan
return out
def _qcount(
self,
df: pd.DataFrame,
which: Literal["wedge", "head"],
) -> pd.Series:
"""Get quadrangle count."""
if which == "wedge":
col = "q0wc"
elif which == "head":
col = "q0hc"
else:
raise ValueError("incorrect 'which' value")
q = df[self._a(col)].copy()
return q
def _rev_index(self, s: pd.Series) -> pd.Series:
s = s.copy()
s.index.names = s.index.names[::-1]
return s.swaplevel()