Source code for pathcensus.pathcensus

""":class:`PathCensus` is the basis which structural similarity and
complementarity coefficients are derived from. In its raw form
it is a set of counts of wedge and head triples (2-paths) and quadruples
(3-paths) traversing an ``(i, j)`` edge as well as counts of corresponding
triangles (3-cycles) and quadrangles (4-cycles).
In the weighted case there are separate counts of triangles
and quadrangles for wedge and head paths as average weights are defined
differently for these two cases.

.. note::
    Path/cycle counts and structural coefficients are returned
    as :class:`pandas.Series` or :class:`pandas.DataFrame` objects indexed
    properly with integer node indices corresponding to the ordering of
    rows in the underlying adjacency matrix of the network.

.. note::
    Path census calculations are relatively efficient as the main
    workhorse functions are just-in-time (JIT) compiled to highly
    optimized C code using :py:mod:`numba` package. Moreover,
    the path census algorithm is based on a state-of-the-art
    graphlet counting algorithm proposed by
    :cite:t:`ahmedEfficientGraphletCounting2015`
    which has worst-case asymptotic computational complexity of
    :math:`O(md^2_{\\text{max}})`, where :math:`m` is the number
    of edges and :math:`d_{\\text{max}}` is the maximum degree.

    Moreover, some additional optimizations are used to speed up
    the calculations in the case of highly heterogeneous degree
    distributions (e.g. power laws). See ``min_di`` argument in
    :meth:`PathCensus.count_paths`.

Node- and graph-level counts are derived from edge-level counts according
to simple aggregations rules. They are defined in definition classes
implemented in :py:mod:`pathcensus.definitions` submodule.

.. seealso::

    :class:`pathcensus.definitions.PathDefinitionsUnweighted`
    for the naming scheme used for unweighted counts.

    :class:`pathcensus.definitions.PathDefinitionsWeighted`
    for the naming scheme used for weighted counts.

Below a node-level census for a simple triangle graph is counted.

.. doctest:: census-triangle

    >>> import numpy as np
    >>> from pathcensus import PathCensus
    >>> G = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]])
    >>> P = PathCensus(G)
    >>> # Census calculations can be also parallelized
    >>> # by default all available threads are used
    >>> P = PathCensus(G, parallel=True)
    >>> # But the number of threads can be set explicitly
    >>> P = PathCensus(G, num_threads=2)
    >>> P.census("nodes")
       t  tw  th  q0  qw  qh
    i
    0  1   2   2   0   0   0
    1  1   2   2   0   0   0
    2  1   2   2   0   0   0

Structural similarity
---------------------

Structural similarity coefficients (:meth:`PathCensus.similarity`)
as well as their corresponding clustering (:meth:`PathCensus.tclust`)
and closure coefficients (:meth:`PathCensus.tclosure`)
are defined quite simply in terms of ratios of 3-cycles (triangles)
to 2- (triples) counted at the levels of edges, nodes or globaly within an
entire graph. The figure below presents a summary of the underlying geometric
motivation as well as the main properties of structural similarity
coefficients, including the differences relative to local clustering and
closure coefficient
:cite:p:`wattsCollectiveDynamicsSmallworld1998,yinLocalClosureCoefficient2019`.

.. figure:: /figures/sim.svg
    :align: center
    :alt: Overview of the properties of structural similarity coefficients

    Overview of the properties of structural similarity coefficients.

Below node-wise structural similarity coefficients are counted for
a simple triangle graph.

.. doctest:: simcoef-nodes

    >>> import numpy as np
    >>> from pathcensus import PathCensus
    >>> G = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]])
    >>> P = PathCensus(G)
    >>> P.similarity("nodes")
    i
    0    1.0
    1    1.0
    2    1.0
    dtype: float64

Structural complementarity
--------------------------

Structural complementarity (:meth:`PathCensus.complementarity`)
coefficients and their corresponding clustering
(:meth:`PathCensus.qclust`) and closure coefficients
(:meth:`PathCensus.qclosure`) are defined in terms of ratios of 4-cycles
(quadrangles) to 3-paths (quadruples) and can be defined at the levels
of edges, nodes and entire graphs. The figure below present a summary
of the underlying geometric motivation and some of the main properties.

.. figure:: /figures/comp.svg
    :align: center
    :alt: Overview of the properties of structural complementarity coefficients

    Overview of the properties of structural complementarity coefficients.

Below we calculate complementarity coefficients for nodes
in a 4-clique graph. Note that complementarity coefficients are all
zeros as there are not quadrangles without any chordal edges.

.. doctest:: compcoefs-nodes

    >>> import numpy as np
    >>> from pathcensus import PathCensus
    >>> G = np.array([[0,1,1,1],[1,0,1,1],[1,1,0,1],[1,1,1,0]])
    >>> P = PathCensus(G)
    >>> P.complementarity("nodes")
    i
    0    0.0
    1    0.0
    2    0.0
    3    0.0
    dtype: float64

Weighted coefficients
---------------------

Structural coefficients can be also defined for weighted networks in which
case paths and cycles are weighted according to the arithmetic average
over edge weights defining an underlying path
(so closing or chordal edges in triangles/quadrangles are ignored).
This can be seen as an extension of the weighted clustering coefficient
proposed by :cite:t:`barratArchitectureComplexWeighted2004`.
Indeed, our formulation of the weighted clustering based on triangles
is equivalent to it.
The figure below presents a summary of the weighting rules.

.. figure:: /figures/weighted.svg
    :align: center
    :alt: Overview of weighted path/cycle counts

    Overview of weighted path/cycle counts.

Edge weights should be detected automatically in most cases provided that
a standard name of edge weight attribute (``"weight"``) is used.
However, weighted computations may be also enabled/disabled explicitly
by using ``weighted`` argument.

.. doctest:: coefs-weighted

    >>> import numpy as np
    >>> from pathcensus import PathCensus
    >>> G = np.array([[0,2,3],[2,0,11],[3,11,0]])
    >>> PathCensus(G).census("nodes")
       twc   thc    tw    th  q0wc  q0hc  qw  qh
    i
    0  2.5  6.75   5.0  13.5   0.0   0.0   0.0   0.0
    1  6.5  4.75  13.0   9.5   0.0   0.0   0.0   0.0
    2  7.0  4.50  14.0   9.0   0.0   0.0   0.0   0.0
    >>> PathCensus(G, weighted=False).census("nodes")
       t  tw  th  q0  qw  qh
    i
    0  1   2   2   0   0   0
    1  1   2   2   0   0   0
    2  1   2   2   0   0   0
"""
from __future__ import annotations
from typing import Union, Literal, Optional, Any, Tuple, Dict
import numpy as np
from scipy.sparse import csr_matrix
import numba
import pandas as pd
from . import types, adjacency
from .core.graph import Graph
from .core.parallel import count_paths_parallel
from .types import UInt, Float
from .definitions import PathDefinitionsUnweighted, PathDefinitionsWeighted


[docs] class PathCensus: """Path census and structural coefficients calculations for undirected graphs. Attributes ---------- graph :class:`pathcensus.core.graph.Graph` instance for calculating path census. counts Data frame with path/cycle counts per edge. Initialization may be postponed. Notes ----- Naming scheme used for denoting counts is documented in the docstring for ``definitions`` attribute (i.e. ``self.definitions``). .. testsetup:: pathcensus import numpy as np from pathcensus import PathCensus """ # pylint: disable=too-many-public-methods
[docs] class Meta: """Container class with various metadata such as lists of possible values of arguments of different methods of :class:`PathCensus`. **Fields** mode Allowed values of ``mode`` argument in structural coefficients methods. undef Allowed values of ``undefined`` argument in structural coefficients methods. """ mode = ("nodes", "edges", "global") undef = ("nan", "zero")
def __init__( self, graph: types.GraphABC, weighted: Optional[bool] = None, validate: bool = True, adj_kws: Optional[Dict] = None, count_paths: bool = True, **kwds: Any ) -> None: """Initialization method. Parameters ---------- graph Graph-like object registered with :py:class:`pathcensus.types.GraphABC` abstract base class and a registered single dispatch method registered on :py:class:`pathcensus.utils.adjacency`. Sparse matrices work out of the box. weighted Should the graph be interpreted as weighted. Determined automatically if ``None``. validate Should input graph be validate for correctness (i.e. checked if is undirected). adj_kws Additional keyword params passed to :py:func:`pathcensus.utils.adjacency`. count_paths Should `counts` attribute be initialized immediately. It can be initialized later using :py:meth:`count` method. **kwds Passed to :py:meth:`count_paths`. """ adj_kws = adj_kws or {} graph_kws = dict(weighted=weighted, validate=validate, **adj_kws) self.graph = self.get_graph(graph, **graph_kws) # Setup path definition objects if self.weighted: self.definitions = PathDefinitionsWeighted() else: self.definitions = PathDefinitionsUnweighted() self.counts = None if count_paths: self.count(**kwds)
[docs] def count(self, **kwds: Any) -> None: """Count paths and set `self.counts` attribute. ``**kwds`` are passed to :py:meth:`count_paths`. """ E, counts = self.count_paths(self.graph, **kwds) self.counts = self._make_counts(E, counts)
# Properties -------------------------------------------------------------- @property def n_nodes(self) -> int: return self.graph.n_nodes @property def vcount(self) -> int: return self.n_nodes @property def n_edges(self) -> int: return self.counts.shape[0] @property def ecount(self) -> int: return self.n_edges @property def weighted(self) -> bool: return self.graph.weighted @property def degree(self) -> np.ndarray: """Get degree sequence of the underlying graph.""" return self.graph.D @property def strength(self) -> np.ndarray: """Get strength sequence of the underlying graph (or degree sequence in the unweighted case). """ if self.weighted: return self.graph.S return self.graph.D @property def tdf(self) -> pd.DataFrame: """Data frame with triple/triangle counts per edge.""" cols = [ name for name in self.definitions.get_column_names() if name in self.definitions["sim"] ] return self.counts[cols] @property def qdf(self) -> pd.DataFrame: """Data frame with quadruple/quadrangle counts per edge.""" cols = [ name for name in self.definitions.get_column_names() if name in self.definitions["comp"] ] return self.counts[cols] # Static & class methods --------------------------------------------------
[docs] @classmethod def get_graph( cls, graph: types.GraphABC, weighted: Optional[bool] = None, validate: bool = True, **kwds: Any ) -> Graph: """Get graph object for path counting. Parameters ---------- graph A compatibe graph object registered with `paths.types.GraphABC`. n_nodes Number of nodes. Used only when `graph` is passed as an edgelist. weighted Should the graph be interpreted as weighted graph. If ``None`` then it is determined based on the number of unique values of non-zero values in the adjacency matrix. validate Should input graph be validate for correctness (i.e. checked if is undirected). **kwds Passed to :py:func:`pathcensus.utils.adjacency`. """ A = adjacency(graph, **kwds) if validate and (A != A.T).count_nonzero() > 0: raise AttributeError("only undirected graphs are accepted") n_nodes = A.shape[0] E = np.ascontiguousarray(np.array(A.nonzero(), dtype=UInt).T) if weighted is None: weighted = any(A.data != 1) if weighted: W = A.data.astype(Float) else: W = None G = Graph(n_nodes, E, W) return G
[docs] @classmethod def count_paths( cls, graph: Union[types.GraphABC, Graph], *, parallel: Optional[bool] = None, num_threads: Optional[int] = None, graph_kws: Optional[Dict] = None, min_di: bool = True, **kwds: Any ) -> Tuple[int, np.ndarray]: """Count paths and cycles in a graph. Parameters ---------- graph :py:class:`pathcensus.core.graph.Graph` instance. or graph-like object that can be converted to it. parallel Should parallel counting algorithm be used. When ``None`` it is used by default for graphs with at least one million edges. num_threads Number of threads to use when ``parallel=True``. batch_size Batch size to use when running with ``parallel=True``. graph_kws Additional keyword arguments passed to :py:meth:`get_graph`. Used only when `graph` is not already in the JIT-compiled form. min_di Should `di < dj` rule for iterating over edges be used. This way the most expensive loop of the `PathCensus` algorithm for computing edge-wise path/cycle counts always iterates over neighbors of the lower degree node in an ``(i, j)`` edge. Almost always should be set to ``True``. The argument is used mostly for testing purposes. **kwds Passed to :py:func:`pathcensus.core.parallel.count_paths_parallel` when ``parallel=True``. Notes ----- The ``parallel=True`` argument may not work and lead to segmentation faults on some MacOS machines. Returns ------- n_nodes Number of nodes. counts Path and cycles counts. """ if isinstance(graph, types.GraphABC): graph = cls.get_graph(graph, **(graph_kws or {})) if min_di: E = graph.get_min_di_edges() else: E = graph.get_edges() if parallel is None: # Use parallel algorithm when at least 100k edges parallel = graph.n_edges >= 1e5 if not num_threads or num_threads <= 0: # pylint: disable=no-member num_threads = numba.config.NUMBA_NUM_THREADS if parallel and num_threads > 1: orig_num_threads = numba.get_num_threads() numba.set_num_threads(num_threads) try: E, counts = count_paths_parallel(graph, **kwds) finally: numba.set_num_threads(orig_num_threads) else: E, counts = graph.count_paths(E) return E, counts
# Auxiliary methods -------------------------------------------------------
[docs] def get_counts( self, mode: Literal[Meta.mode] = Meta.mode[0], # type: ignore ) -> pd.DataFrame: """Get (possibly aggregated) path counts. Parameters ---------- mode Should node, edge or global counts be calculated. """ self._check_mode(mode) counts = self.counts if mode == "nodes": counts = counts.groupby(level="i") \ .sum() \ .reindex(np.arange(self.n_nodes), copy=False) \ .fillna(0) elif mode == "global": counts = counts.sum() return counts
# Similarity coefficients -------------------------------------------------
[docs] def tclust( self, *, undefined: Literal[Meta.undef] = Meta.undef[0], # type: ignore counts: Optional[pd.DataFrame] = None ) -> pd.Series: """Triangle-based local clustering (node-wise). It is equivalent to local clustering coefficient :cite:p:`wattsCollectiveDynamicsSmallworld1998`. Parameters ---------- undefined If ``'nan'`` the nodes with undefined values are treated as NaNs. If ``'zero'`` then they are considered zeros. counts Path counts data frame to use. Mostly for internal use. Notes ----- It is defined as the ratio of triangles including a focal node ``i`` to the number of wedge triples centered at it: .. math:: s^W_i = \\frac{2T_i}{t^W_i} .. figure:: /figures/t-wedge.svg :align: center :alt: Wedge triple Wedge triple. Examples -------- .. doctest:: pathcensus >>> # Triangle graph >>> A = np.array([[0,1,1], [1,0,1], [1,1,0]]) >>> PathCensus(A).tclust() i 0 1.0 1 1.0 2 1.0 dtype: float64 """ df = counts if counts is not None else self.get_counts("nodes") num = df[self._a("twc")] denom = df[self._a("tw")] return self._divide(num, denom, undefined=undefined)
[docs] def tclosure( self, *, undefined: Literal[Meta.undef] = Meta.undef[0], # type: ignore counts: Optional[pd.DataFrame] = None ) -> pd.Series: """Triangle-based local closure coefficient (node-wise). It is equivalent to local closure coefficient :cite:p:`yinLocalClosureCoefficient2019`. Parameters ---------- undefined If ``'nan'`` the nodes with undefined values are treated as NaNs. If ``'zero'`` then they are considered zeros. counts Path counts data frame to use. Mostly for internal use. Notes ----- It is defined as the ratio of the number of triangles including a focal node ``i`` to the number of head triples starting from it: .. math:: s^H_i = \\frac{2T_i}{t^H_i} .. figure:: /figures/t-head.svg :align: center :alt: Head triple Head triple. Examples -------- .. doctest:: pathcensus >>> # Triangle graph >>> A = np.array([[0,1,1],[1,0,1],[1,1,0]]) >>> PathCensus(A).tclosure() i 0 1.0 1 1.0 2 1.0 dtype: float64 """ df = counts if counts is not None else self.get_counts("nodes") num = df[self._a("thc")] denom = df[self._a("th")] return self._divide(num, denom, undefined=undefined)
[docs] def similarity( self, mode: Literal[Meta.mode] = Meta.mode[0], # type: ignore *, undefined: Literal[Meta.undef] = Meta.undef[0], # type: ignore counts: Optional[pd.DataFrame] = None ) -> Union[pd.Series, float]: """Structural similarity coefficients. Parameters ---------- mode Should it be calculated for nodes, edges or globally (equivalent to global clustering). undefined If ``'nan'`` the nodes with undefined values are treated as NaNs. If ``'zero'`` then they are considered zeros. counts Path counts data frame to use. Mostly for internal use. Notes ----- It is defined as the ratio of triangles including a focal node ``i`` to the total number of both wedge and head triples: .. math:: s_i = \\frac{4T_i}{t^W_i + t^H_i} See Also -------- simcoefs : structural similarity coefficients coefs : structural coefficients Examples -------- .. doctest:: pathcensus >>> # Triangle graph >>> A = np.array([[0,1,1], [1,0,1], [1,1,0]]) >>> PathCensus(A).similarity("edges") i j 0 1 1.0 2 1.0 1 0 1.0 2 1.0 2 0 1.0 1 1.0 dtype: float64 >>> PathCensus(A).similarity("nodes") i 0 1.0 1 1.0 2 1.0 dtype: float64 >>> PathCensus(A).similarity("global") 1.0 """ df = counts if counts is not None else self.get_counts(mode) num = df[self._a("twc")] + df[self._a("thc")] denom = df[self._a("tw")] + df[self._a("th")] return self._divide(num, denom, undefined=undefined)
# Complementarity coefficients --------------------------------------------
[docs] def qclust( self, *, undefined: Literal[Meta.undef] = Meta.undef[0], # type: ignore counts: Optional[pd.DataFrame] = None ) -> pd.Series: """Quadrangle-based local clustering coefficient (node-wise). Parameters ---------- undefined If ``'nan'`` the nodes with undefined values are treated as NaNs. If ``'zero'`` then they are considered zeros. counts Path counts data frame to use. Mostly for internal use. Notes ----- It is defined as the ratio of quadrangles including a focal node ``i`` and the number of wedge quadruples with ``i`` at the second position (this is to avoid double counting and make the number of wedge and head quadruples per quadrangle equal): .. math:: c^W_i = \\frac{2Q_i}{q^W_i} .. figure:: /figures/q-wedge.svg :align: center :alt: Wedge quadruple Wedge quadruple. Examples -------- .. doctest:: pathcensus >>> # Quadrangle graph >>> A = np.array([[0,1,0,1],[1,0,1,0],[0,1,0,1],[1,0,1,0]]) >>> PathCensus(A).qclust() i 0 1.0 1 1.0 2 1.0 3 1.0 dtype: float64 """ df = counts if counts is not None else self.get_counts("nodes") num = self._qcount(df, which="wedge") denom = df[self._a("qw")] return self._divide(num, denom, undefined=undefined)
[docs] def qclosure( self, *, undefined: Literal[Meta.undef] = Meta.undef[0], # type: ignore counts: Optional[pd.DataFrame] = None ) -> pd.Series: """Quadrangle-based local closure coefficient. Parameters ---------- undefined If ``'nan'`` the nodes with undefined values are treated as NaNs. If ``'zero'`` then they are considered zeros. counts Path counts data frame to use. Mostly for internal use. Notes ----- It is defined as the ratio of quadrangles including a focal node ``i`` and the number of head quadruples starting from it. .. math:: c^H_i = \\frac{2Q_i}{q^H_i} .. figure:: /figures/q-head.svg :align: center :alt: Head quadruple Head quadruple. Examples -------- .. doctest:: pathcensus >>> # Quadrangle graph >>> A = np.array([[0,1,0,1],[1,0,1,0],[0,1,0,1],[1,0,1,0]]) >>> PathCensus(A).qclosure() i 0 1.0 1 1.0 2 1.0 3 1.0 dtype: float64 """ df = counts if counts is not None else self.get_counts("nodes") num = self._qcount(df, which="head") denom = df[self._a("qh")] return self._divide(num, denom, undefined=undefined)
[docs] def complementarity( self, mode: Literal[Meta.mode] = Meta.mode[0], # type: ignore *, undefined: Literal[Meta.undef] = Meta.undef[0], # type: ignore counts: Optional[pd.DataFrame] = None ) -> Union[pd.Series, float]: """Structural complementarity coefficients. Parameters ---------- mode Should it be calculated for nodes, edges or globally (equivalent to global clustering). undefined If ``'nan'`` the nodes with undefined values are treated as NaNs. If ``'zero'`` then they are considered zeros. counts Path counts data frame to use. Mostly for internal use. Notes ----- The node-wise coefficient is defined as the ratio of quadrangles including a focal node ``i`` and the total number of both wedge and head quadruples: .. math:: c_i = \\frac{4Q_i}{q^W_i + q^H_i} The edge-wise coefficient is defined as the ratio of quadrangles including an ``(i, j)`` edge and the number of quadruples starting at it: .. math:: c_{ij} = \\frac{2Q_{ij}}{q_{ij}} The global coefficient is defined as the ratio of sums of quadrangles to the sum of quadruples (wedge or head): .. math:: c = \\frac{2\\sum_i Q_i}{\\sum_i q^W_i} = \\frac{2\\sum_i Q_i}{\\sum_i q^H_i} See Also -------- compcoefs : structural complementarity coefficients coefs : structural coefficients Examples -------- .. doctest:: pathcensus >>> # Quadrangle graph >>> A = np.array([[0,1,0,1],[1,0,1,0],[0,1,0,1],[1,0,1,0]]) >>> PathCensus(A).complementarity("edges") i j 0 1 1.0 3 1.0 1 0 1.0 2 1.0 2 1 1.0 3 1.0 3 0 1.0 2 1.0 dtype: float64 >>> PathCensus(A).complementarity("nodes") i 0 1.0 1 1.0 2 1.0 3 1.0 dtype: float64 >>> PathCensus(A).complementarity("global") 1.0 """ df = counts if counts is not None else self.get_counts(mode) num = self._qcount(df, which="wedge") \ + self._qcount(df, which="head") denom = df[self._a("qw")] + df[self._a("qh")] return self._divide(num, denom, undefined=undefined)
# Summaries ---------------------------------------------------------------
[docs] def simcoefs( self, mode: Literal[Meta.mode] = Meta.mode[0], # type: ignore *, undefined: Literal[Meta.undef] = Meta.undef[0], # type: ignore census: bool = False, counts: Optional[pd.DataFrame] = None ) -> pd.DataFrame | pd.Series: """Calculate similarity coefficients including clustering and closure coefficients when ``mode="nodes"`` or their node-wise averages when ``mode="global"``. Parameters ---------- mode Should node, edge or global counts be calculated. undefined If ``'nan'`` the nodes with undefined values are treated as NaNs. If ``'zero'`` then they are considered zeros. census If ``True`` then path census data is added. as columns in the front of the data frame. counts Path counts data frame to use. Mostly for internal use. Examples -------- .. doctest:: pathcensus >>> # Triangle graph >>> A = np.array([[0,1,1], [1,0,1], [1,1,0]]) >>> PathCensus(A).simcoefs("edges") sim i j 0 1 1.0 2 1.0 1 0 1.0 2 1.0 2 0 1.0 1 1.0 >>> PathCensus(A).simcoefs("nodes") sim tclust tclosure i 0 1.0 1.0 1.0 1 1.0 1.0 1.0 2 1.0 1.0 1.0 >>> PathCensus(A).simcoefs("global") sim_g 1.0 sim 1.0 tclust 1.0 tclosure 1.0 dtype: float64 """ counts = counts if counts is not None else self.get_counts(mode) kwds = dict(undefined=undefined) if mode == "edges": coefs = pd.DataFrame({ "sim": self.similarity(mode, counts=counts, **kwds), }, index=counts.index) elif mode == "nodes": coefs = pd.DataFrame({ "sim": self.similarity(mode, counts=counts, **kwds), "tclust": self.tclust(counts=counts, **kwds), "tclosure": self.tclosure(counts=counts, **kwds), }) else: coefs = pd.Series({ "sim_g": self.similarity(mode, counts=counts, **kwds), **self.simcoefs(mode="nodes", **kwds).mean() }) if census: paths = self.census(mode) coefs = pd.concat([coefs, paths], axis=mode != "global") return coefs
[docs] def compcoefs( self, mode: Literal[Meta.mode] = Meta.mode[0], # type: ignore *, undefined: Literal[Meta.undef] = Meta.undef[0], # type: ignore census: bool = False, counts: Optional[pd.DataFrame] = None ) -> pd.DataFrame: """Calculate complementarity coefficients including clustering and closure coefficients when ``mode="nodes"`` or their node-wise averages when ``mode="global"``. Parameters ---------- mode Should node, edge or global counts be calculated. undefined If ``'nan'`` the nodes with undefined values are treated as NaNs. If ``'zero'`` then they are considered zeros. census If ``True`` then path census data is added. counts Path counts data frame to use. Mostly for internal use. Examples -------- .. doctest:: pathcensus >>> # Quadrangle graph >>> A = np.array([[0,1,0,1],[1,0,1,0],[0,1,0,1],[1,0,1,0]]) >>> PathCensus(A).compcoefs("edges") comp i j 0 1 1.0 3 1.0 1 0 1.0 2 1.0 2 1 1.0 3 1.0 3 0 1.0 2 1.0 >>> PathCensus(A).compcoefs("nodes") comp qclust qclosure i 0 1.0 1.0 1.0 1 1.0 1.0 1.0 2 1.0 1.0 1.0 3 1.0 1.0 1.0 >>> PathCensus(A).compcoefs("global") comp_g 1.0 comp 1.0 qclust 1.0 qclosure 1.0 dtype: float64 """ counts = counts if counts is not None else self.get_counts(mode) kwds = dict(undefined=undefined) if mode == "edges": coefs = pd.DataFrame({ "comp": self.complementarity(mode, counts=counts, **kwds), }, index=counts.index) elif mode == "nodes": coefs = pd.DataFrame({ "comp": self.complementarity(mode, counts=counts, **kwds), "qclust": self.qclust(counts=counts, **kwds), "qclosure": self.qclosure(counts=counts, **kwds) }) else: coefs = pd.Series({ "comp_g": self.complementarity(mode, counts=counts, **kwds), **self.compcoefs(mode="nodes", **kwds).mean() }) if census: paths = self.census(mode) coefs = pd.concat([coefs, paths], axis=mode != "global") return coefs
[docs] def coefs( self, mode: Literal[Meta.mode] = Meta.mode[0], # type: ignore **kwds ) -> pd.DataFrame: """Calculate structural coefficients. Parameters ---------- mode Should node, edge or global counts be calculated. undefined If ``'nan'`` the nodes with undefined values are treated as NaNs. If ``'zero'`` then they are considered zeros. census If ``True`` then path census data is added. counts Path counts data frame to use. Mostly for internal use. See Also -------- simcoefs : structural similarity coefficients compcoefs: structural complementarity coefficients """ if "counts" not in kwds: kwds["counts"] = self.get_counts(mode) census = kwds.pop("census", False) skw = kwds.copy() ckw = kwds scoefs = self.simcoefs(mode, **skw) ccoefs = self.compcoefs(mode, **ckw) coefs = pd.concat([scoefs, ccoefs], axis=mode != "global") if census: paths = self.census(mode) coefs = pd.concat([coefs, paths], axis=mode != "global") return coefs
[docs] def census( self, mode: Literal[Meta.mode] = Meta.mode[0], # type: ignore *, counts: Optional[pd.DataFrame] = None ) -> pd.DataFrame: """Calculate path census. Parameters ---------- mode Should node, edge or global counts be calculated. counts Path counts data frame to use. Mostly for internal use. Examples -------- .. doctest:: pathcensus >>> # Triangle graph >>> A = np.array([[0,1,1], [1,0,1], [1,1,0]]) >>> PathCensus(A).census() t tw th q0 qw qh i 0 1 2 2 0 0 0 1 1 2 2 0 0 0 2 1 2 2 0 0 0 """ self._check_mode(mode) if counts is None: counts = self.get_counts(mode).copy() arules = self.definitions.aggregation.get(mode, {}) for k, v in arules.items(): if self.weighted: counts[k] /= v else: counts[k] //= v return counts
# Serialization -----------------------------------------------------------
[docs] def dump(self) -> Tuple[ int, np.ndarray, Optional[np.ndarray], Optional[np.ndarray] ]: """Dump to raw data in the form of arrays and the number of nodes. Returns ------- n_nodes Number of nodes. E Edgelist array. W Optional edge weights array. counts Path counts array. May be ``None``. """ return self.n_nodes, self.graph.E, self.graph.W, self.counts
[docs] @classmethod def from_dump( cls, n_nodes: int, E: np.ndarray, W: Optional[np.ndarray] = None, counts: Optional[np.ndarray] = None, adj_kws: Optional[Dict] = None, **kwds: Any ) -> PathCensus: # type: ignore """Construct from the output of :py:meth:`dump`. Parameters ---------- n_nodes Number of nodes. E Edgelist array. W Optional edge weights array. counts Optional path counts array. It is calculated on-the-fly when ``None``. adj_kws Passed to :py:func:`pathcensus.utils.adjacency`. **kwds Passed to :py:meth:`count_paths` when `counts` is ``None``. """ if W is None: weighted = False W = np.full(len(E), 1) else: weighted = True i, j = E[:, 1:].T A = csr_matrix((W, (i, j)), shape=(n_nodes, n_nodes)) adj_kws = adj_kws or {} paths = cls(A, weighted=weighted, adj_kws=adj_kws, count_paths=False) if counts is None: paths.count(**kwds) else: paths.counts = counts return paths
# Internals --------------------------------------------------------------- def _a(self, name: str) -> str: """Resolve possibly aliased path name.""" return self.definitions.resolve(name) def _make_counts( self, E: np.ndarray, counts: np.ndarray ) -> Tuple[np.ndarray, np.ndarray]: """Make path counts data frame.""" swaps = self.definitions.get_swap_rules() cols = self.definitions.get_column_ids() names = self.definitions.get_column_names() E2 = E.copy() E2[:, [0, 1]] = E[:, [1, 0]] E = np.vstack((E, E2)) if not self.weighted: counts = counts.astype(UInt) counts2 = counts.copy() for u, v in swaps: counts2[:, [u, v]] = counts[:, [v, u]] counts = np.vstack((counts, counts2)) # Drop unnecessary columns counts = counts[:, cols] counts = pd.DataFrame( data=counts, columns=names, index=pd.MultiIndex.from_arrays( arrays=E.T, names=["i", "j"] ) ) return counts.sort_index() def _check_undef(self, val: str) -> None: if val not in self.Meta.undef: raise ValueError(f"'undefined' has to be one of {self.Meta.undef}") def _check_mode(self, val: str) -> None: if val not in self.Meta.mode: raise ValueError(f"'mode' has to be one of {self.Meta.mode}") def _divide( self, x: Union[int, float], y: Union[int, float], *, undefined: Literal[Meta.undef] = Meta.undef[0], # type: ignore ) -> float: with np.errstate(invalid="ignore"): out = x / y if undefined == "zero": if np.isscalar(out): if np.isnan(out) or np.isinf(out): out = 0.0 else: out[np.isnan(out) | np.isinf(out)] = 0.0 else: if np.isscalar(out): if np.isinf(out): out = np.nan else: out[np.isinf(out)] = np.nan return out def _qcount( self, df: pd.DataFrame, which: Literal["wedge", "head"], ) -> pd.Series: """Get quadrangle count.""" if which == "wedge": col = "q0wc" elif which == "head": col = "q0hc" else: raise ValueError("incorrect 'which' value") q = df[self._a(col)].copy() return q def _rev_index(self, s: pd.Series) -> pd.Series: s = s.copy() s.index.names = s.index.names[::-1] return s.swaplevel()