pathcensus package
Subpackages
Submodules
pathcensus.definitions module
Path counting and aggregation definitions.
- class pathcensus.definitions.PathDefinitions[source]
Bases:
object
Base class for path definitions.
See also
pathcensus.definitions.PathDefinitionsUnweighted
unweighted definitions
pathcensus.definitions.PathDefinitionsWeighted
weighted definitions
- property aggregation: Dict
Aggregation rules for computing node and global counts from edge counts. The integers specify the factor by which the sum over edge counts is divided to obtain the corresponding node/global count.
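For instance (an illustrative reading of the scheme): each triangle containing a node \(i\) lies on exactly two of its incident edges, so summing the edge-wise triangle counts \(t_{ij}\) over the neighbors of \(i\) counts every triangle twice, and the node-level count uses a divisor of 2, i.e. \(T_i = \frac{1}{2}\sum_j t_{ij}\).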
- property aliases: Dict
Aliases mapping actual path names to raw names.
- property definitions: Dict
Raw path definitions. Weighted and unweighted definitions are derived from this.
- get_column_ids() List[int] [source]
Get indices of the path columns to keep once reversed counting is done.
- get_column_names() List[str] [source]
Get names of the path columns used once reversed counting is done.
- get_swap_rules() List[Tuple[int, int]] [source]
Get swap rules for counting reversed paths.
They define the indices of pairs of columns which need to be swapped in order to obtain reversed paths. Note that reverses of wedge paths are head paths and vice versa. In the case of weighted paths, the wedge/head cycle counts also need to be swapped.
- property npaths: int
Number of different path/cycle counts.
- class pathcensus.definitions.PathDefinitionsUnweighted[source]
Bases:
PathDefinitions
Unweighted path definitions.
Similarity-related paths
- t
Triangles.
- tw
Wedge-triples around
i
(i.e. k-i-j paths).
- th
Head-triples originating from
i
(i.e. i-j-k paths).
Complementarity-related paths
- q0
Strong quadrangles.
- qw
Wedge-quadruples around
i
(i.e. k-i-j-l paths).
- qh
Head-quadruples originating from
i
(i.e. i-j-k-l paths).
- property aliases: Dict
Aliases mapping actual path names to raw names.
- property definitions: Dict
Raw path definitions. Weighted and unweighted definitions are derived from this.
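The unweighted naming scheme above maps onto the columns of the census data frames used throughout this documentation. A minimal inspection sketch (assuming the class can be instantiated without arguments; return values are not asserted here):
>>> from pathcensus.definitions import PathDefinitionsUnweighted
>>> defs = PathDefinitionsUnweighted()
>>> n = defs.npaths                 # number of distinct path/cycle counts
>>> cols = defs.get_column_names()  # e.g. ['t', 'tw', 'th', 'q0', 'qw', 'qh']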
- class pathcensus.definitions.PathDefinitionsWeighted[source]
Bases:
PathDefinitions
Weighted path definitions.
Similarity-related paths
- twc
Closed wedge triples (triangles) weighted by the ij and ik edges.
- thc
Closed head triples (triangles) weighted by the ij and jk edges.
- tw
Wedge triples.
- th
Head triples.
Complementarity-related paths
- q0wc
Closed wedge quadruples with no chords (strong quadrangles) weighted by the ij, jk and il edges.
- qw
Wedge quadruples.
- q0hc
Closed head quadruples with no chords (strong quadrangles) weighted by the ij, jk and kl edges.
- qh
Head quadruples.
pathcensus.graph module
Arbitrary classes whose instances can be interpreted
as scipy
sparse matrices or 2D square numpy
arrays
can be registered as abstract subclasses of GraphABC
.
This way all main classes/functions implemented in pathcensus
can automatically interpret them as graph-like objects, allowing
seamless integration with many different data formats and third-party
packages such as networkx
or igraph
.
In order to register a class, a function for converting its instances to
scipy.sparse.spmatrix
(CSR format) needs to be defined.
The conversion is handled by the pathcensus.graph.adjacency()
function,
which can be overloaded through the single dispatch mechanism. In particular,
it should be called on arrays/sparse matrices extracted from graph classes
to ensure a standardized format. See the example below.
Graph classes defined by networkx
, igraph
and graph_tool
are registered automatically provided
the packages are installed.
Note
pathcensus
uses the A_{ij} = 1
convention to indicate
that a node i sends a tie to a node j. Functions converting
graph-like objects to arrays / sparse matrices need to be aware
of that.
Below is an example in which a custom conversion from the list-of-lists
format is registered. Arguably, the implementation below is naive and
handles the conversion by simply converting to a numpy
array,
without checking whether the array is really 2D and square, but it illustrates
the main logic of registering custom graph-like classes.
>>> import numpy as np
>>> from pathcensus import GraphABC, adjacency, PathCensus
>>> from scipy.sparse import spmatrix
>>> def _adj_list(graph: list) -> spmatrix:
... """Adjacency matrix from list of lists."""
... A = np.array(graph)
... return adjacency(A)
>>> GraphABC.register_type(list, _adj_list)
>>> # A triangle graph
>>> A = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
>>> # Calculate path census
>>> paths = PathCensus(A)
>>> paths.census("global")
t tw th q0 qw qh
0 1 3 3 0 0 0
- class pathcensus.graph.GraphABC[source]
Bases:
ABC
Abstract Base Class (ABC) for registering different graphs classes.
Any kind of graph object from different libraries can be registered as a subclass of this ABC, as long as a function for converting it into a sparse adjacency matrix is provided at registration time. In particular,
scipy.sparse.spmatrix
objects are automatically recognized as graph-like objects. This allows all graph-based functions/methods/classes defined in
pathcensus
to operate flexibly on any sort of graph-like objects / graph implementations. Provided the packages are installed, methods for handling graph objects from
networkx
, igraph
and graph_tool
are registered automatically. See the class method
register_type()
for more info.
- classmethod register_type(subclass: type, adj: Callable[[...], spmatrix]) None [source]
Register type as a subclass of
pathcensus.graph.GraphABC
.
- Parameters
subclass – Any graph-like class.
adj – Function for converting subclass graphs to sparse adjacency matrices. It should use the following signature: (graph, **kwds) -> spmatrix. The returned matrix must be in the format in which (i, j) indicates an edge from i to j (in case the network is directed). Using
**kwds
is optional and in general it is best to implement adj in such a way that using
**kwds
is not necessary, in particular for detecting whether a graph is weighted and converting it to a weighted adjacency matrix if needed. This way
pathcensus
will be able to automatically choose weighted methods for weighted graphs.
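As an illustration, the sketch below registers networkx graphs by hand (in practice they are registered automatically when networkx is installed, so this is a hypothetical re-implementation, not the packaged converter). It passes the standard "weight" edge attribute through, so weighted graphs yield weighted adjacency matrices:
>>> import networkx as nx
>>> from scipy import sparse as sp
>>> from pathcensus import GraphABC, adjacency
>>> def _adj_nx(graph: nx.Graph, **kwds) -> sp.spmatrix:
...     """Sketch of a weight-aware converter for networkx graphs."""
...     # ``weight="weight"`` makes the matrix weighted whenever the standard
...     # edge attribute is present; unweighted edges default to 1.
...     A = nx.to_scipy_sparse_array(graph, weight="weight", format="csr")
...     # Standardize via ``adjacency()`` as recommended above.
...     return adjacency(sp.csr_matrix(A))
>>> GraphABC.register_type(nx.Graph, _adj_nx)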
pathcensus.inference module
Approximate inference for arbitrary graph statistics, including structural coefficients, can be conducted using samples from appropriate Exponential Random Graph Models. The following generic algorithm can be used to solve a wide range of inferential problems:
1. Calculate statistics of interest on an observed graph.
2. Sample R randomized instances from an appropriate null model.
3. Calculate graph statistics on the null model samples.
4. Compare observed and null model values.
The Inference
class implements the above approach.
It is compatible with any registered class of graph-like objects
and any properly implemented subclass of
pathcensus.nullmodels.base.ERGM
representing a null model
to sample from.
See also
pathcensus.graph
for seamless pathcensus
integration
with arbitrary graph-like classes.
pathcensus.nullmodels
for available null models.
This simulation-based approach is relatively efficient for graph- and node-level statistics but can be very computationally expensive when used for edge-level analyses. Hence, in this case it is often useful to apply various coarse-graining strategies to reduce the number of unique combinations of values of the sufficient statistics.
See also
pathcensus.graph.GraphABC
for the abstract class
for graph-like objects.
pathcensus.nullmodels
for compatible ERGM classes.
pathcensus.inference.Inference.coarse_grain()
for coarse-graining methods.
Below is a simple example of an estimation of p-values of node-wise structural similarity coefficients in an Erdős–Rényi random graph. The result, of course, should not be statistically significant. We use the default significance level of \(\alpha = 0.05\) and Benjamini-Hochberg FDR correction for multiple testing.
>>> import numpy as np
>>> from scipy import sparse as sp
>>> from pathcensus import PathCensus
>>> from pathcensus.inference import Inference
>>> from pathcensus.nullmodels import UBCM
>>> np.random.seed(34)
>>> # Generate ER random graph (roughly)
>>> A = sp.random(100, 100, density=0.05, dtype=int, data_rvs=lambda n: np.ones(n))
>>> A = (A + A.T).astype(bool).astype(int)
>>> ubcm = UBCM(A)
>>> err = ubcm.fit()
>>> infer = Inference(A, ubcm, lambda g: PathCensus(g).similarity())
>>> data, null = infer.init_comparison(100)
>>> pvals = infer.estimate_pvalues(data, null, alternative="greater")
>>> # Structural similarity coefficient values
>>> # should not be significant more often than 5% of times
>>> # (BH FDR correction is used)
>>> (pvals <= 0.05).mean() <= 0.05
True
- class pathcensus.inference.Inference(graph: GraphABC, model: ERGM, statistics: Callable[[GraphABC], Data], *, aggregate_by: Literal[_aggregate_by] = 'stats')[source]
Bases:
object
Generic approximate statistical inference based on arbitrary null models with node-level sufficient statistics.
The methods implemented by this class are based on sampling from null models, so they may not be very efficient, in particular for edge-level statistics. On the other hand, they allow conducting statistical inference for arbitrary graph statistics.
- graph
Graph-like object representing an observed network.
- model
Fitted instance of a subclass of
pathcensus.nullmodels.ERGM
.
- statistics
Function for calculating graph statistics of interest with the following signature:
(graph, **kwds) -> DataFrame / Series
The first argument must be a graph-like object (e.g. a sparse matrix),
**kwds
can be used to pass additional arguments (only keyword args are allowed) if necessary. The return value must be either a
pandas.DataFrame
or a
pandas.Series
.
- aggregate_by
Mode of aggregation for determining the null distribution. If
"stats"
then the null distribution is aggregated within unique combinations of values of the sufficient statistics (possibly coarse-grained, see
init_comparison()
). If
"units"
then the null distribution is aggregated within individual units (e.g. nodes). This is often useful in analyses at the level of nodes but may require too many samples for edge-level analyses.
- class Levels(units: Tuple[str, ...], stats: Tuple[str, ...], other: Tuple[str, ...])[source]
Bases:
object
Container class for storing information on unit, sufficient statistics and other index levels in observed and null model data.
- other: Tuple[str, ...]
- stats: Tuple[str, ...]
- units: Tuple[str, ...]
- __call__(graph: GraphABC, _stats: Optional[ndarray] = None, **kwds: Any) Union[Series, DataFrame] [source]
This method should be called to actually calculate graph statistics.
- Parameters
graph – Graph-like object to calculate statistics for.
stats – Array of sufficient statistics for nodes. If
None
then self.model.statistics is used.
**kwds – Passed to the graph statistics function.
- static add_index(data: Union[Series, DataFrame], idx: Mapping[str, Sequence], *, prepend: bool = False, drop_unnamed: bool = True, copy: bool = True) Union[Series, DataFrame] [source]
Add index to a data frame or series.
- Parameters
data – Data frame or series.
idx – Mapping from index names to sequences of values.
prepend – Should new indexes be prepended or appended to the existing indexes.
drop_unnamed – Should unnamed indexes be dropped during the process. Unnamed indexes are usually generic indexes which are redundant after adding additional indexes.
copy – Should a copy be returned.
- add_stats_index(data: Union[Series, DataFrame], stats: Optional[ndarray] = None) Union[Series, DataFrame] [source]
Add indexes with sufficient statistics.
- Parameters
data – Data frame or series with graph statistics.
stats – Array of sufficient statistics. Use
self.model.statistics
if
None
.
- static adjust_pvalues(pvals: Data, *, alpha: float = 0.05, copy: bool = True, **kwds: Any) Data [source]
Adjust p-values for multiple testing.
The Benjamini-Hochberg-Yekutieli two-stage procedure implemented in
statsmodels.stats.multitest.fdrcorrection_twostage()
is used.
- Parameters
pvals – Data frame / series with p-values for different coefficients in columns.
alpha – Desired type I error rate after the adjustment.
copy – Should a copy of
pvals
be returned.
**kwds – Additional arguments passed to
statsmodels.stats.multitest.fdrcorrection_twostage()
- estimate_pvalues(data: pd.DataFrame, null: pd.DataFrame, *, alternative: Literal[_alternative] = 'greater', adjust: bool = True, resolution: int = 1000, **kwds: Any) Data [source]
Estimate p-values of node/edge/global coefficients based on sampling from a configuration model (as returned by
init_comparison()
).
- Parameters
data – Data frame with observed graph statistics.
null – Data frame with the simulated null distribution of graph statistics.
alternative – Type of test to perform. Currently only one-sided tests are supported.
adjust – Should p-values be adjusted. Benjamini-Hochberg FDR correction is used by default when
True
.
resolution – Resolution of the p-value estimation. It specifies the number of quantiles to compare observed values against. For instance, if
resolution=100
then p-values will be accurate only up to
0.01
. This parameter controls the amount of memory consumed by the estimation process.
**kwds – Passed as additional arguments to
adjust_pvalues()
when
adjust=True
.
See also
adjust_pvalues
p-value adjustment method
- Returns
P-values for statistics as
pandas.Series
(for one graph statistic) or
pandas.DataFrame
(for multiple statistics).
- Return type
pvalues
- filter_index(data: Data, target: Data, *, how: Literal[_filter_index] = 'values', levels: Optional[Union[Sequence[str], Sequence[int]]] = None) Data [source]
Filter
data
by index with respect to
target
.
- Parameters
data – Data to filter.
target – Dataset with target index.
how – How the index should be filtered: either by unique combinations of values or just constrained to the range of values for separate levels in
target
.
levels – Levels to use for filtering. If
None
then either
self.levels.units
or
self.levels.stats
is used, depending on the value of
self.aggregate_by
.
- Returns
Filtered copy of
data
.
- Return type
data
- index_names = ('i', 'j')
- init_comparison(n: int, *, filter_index: Union[bool, Literal[_filter_index]] = False, sample_index: bool = False, null_kws: Optional[Dict] = None, **kwds: Any) Tuple[Data, Data, Levels] [source]
Initialize data for a comparison with a null model and determine index level names.
- Parameters
n – Number of null model samples.
filter_index – If
True
or
"values"
then
null
will be filtered to contain only observations with index values matching those in
data
, with the levels used for the comparison selected based on
self.aggregate_by
. If
"range"
then null model samples will be filtered to be within the range of index values in the observed data.
sample_index – If
False
then the
_sample
index with sample ids is dropped from the
null
data frame with null model samples.
null_kws – Keyword args passed to
simulate_null()
.
**kwds – Passed to the
statistics()
method used for calculating the statistics of interest.
Notes
Estimating distributions of edge-wise statistics conditional on the sufficient statistics of the participating nodes may require a really large number of samples and in general is not feasible for large networks (in particular weighted ones). The same applies, although probably to a slightly lesser degree, to node-wise statistics when
use_observed_stats=False
is passed to
simulate_null()
.
Efficient methods for solving these problems will be implemented in the future.
- Returns
data – Observed graph statistics.
null – Null distribution samples.
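A sketch of passing options through, reusing the objects from the module-level example above:
>>> data, null = infer.init_comparison(
...     100,
...     filter_index="range",          # keep null samples within the observed index ranges
...     null_kws={"progress": False},  # forwarded to simulate_null()
... )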
- property n_nodes: int
Number of nodes in the observed network.
- postprocess(data: Union[Series, DataFrame], target: Union[Series, DataFrame]) Union[Series, DataFrame] [source]
Postprocess data after running a comparison.
This mainly involves sanitizing index names after aggregation as well as setting the proper shape and types for outputs, so that
data
has the same general form as
target
.
- postprocess_index(data: Union[Series, DataFrame]) Union[Series, DataFrame] [source]
Postprocess index after running a comparison.
This involves getting rid of temporary index names used when running comparisons as well as any unnamed indexes. Moreover, sufficient statistics indexes are removed if
self.aggregate_by == "units"
, and observed values of sufficient statistics are used in the index (instead of coarse-grained values) if
self.aggregate_by == "stats"
.
- simulate_null(n: int, *, progress: bool = False, progress_kws: Optional[Dict] = None, use_observed_stats: bool = True, **kwds: Any) Union[Series, DataFrame] [source]
Get data frame of null model samples of structural coefficients.
- Parameters
n – Number of samples.
progress – Should a progress bar be shown.
progress_kws – Keyword arguments for customizing the progress bar when
progress=True
. Passed to
tqdm.tqdm()
.
use_observed_stats – If
True
then simulated data is indexed with sufficient statistics from the observed network. This often helps to accumulate enough observations faster, at the expense of not fully exact conditioning.
**kwds – Keyword arguments passed to
statistics()
.
- Returns
Data frame with simulated null distribution.
- Return type
null
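A direct usage sketch, again assuming the fitted model from the module-level example (the exact layout of the returned frame depends on the statistics function):
>>> null = infer.simulate_null(100, progress=True, use_observed_stats=True)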
pathcensus.pathcensus module
PathCensus
is the basis from which structural similarity and
complementarity coefficients are derived. In its raw form
it is a set of counts of wedge and head triples (2-paths) and quadruples
(3-paths) traversing an (i, j)
edge, as well as counts of the corresponding
triangles (3-cycles) and quadrangles (4-cycles).
In the weighted case there are separate counts of triangles
and quadrangles for wedge and head paths, as average weights are defined
differently for these two cases.
Note
Path/cycle counts and structural coefficients are returned
as pandas.Series
or pandas.DataFrame
objects indexed
properly with integer node indices corresponding to the ordering of
rows in the underlying adjacency matrix of the network.
Note
Path census calculations are relatively efficient as the main
workhorse functions are just-in-time (JIT) compiled to highly
optimized machine code using the numba
package. Moreover,
the path census algorithm is based on a state-of-the-art
graphlet counting algorithm proposed by
Ahmed et al. [ANRD15]
which has worst-case asymptotic computational complexity of
\(O(md^2_{\text{max}})\), where \(m\) is the number
of edges and \(d_{\text{max}}\) is the maximum degree.
Moreover, some additional optimizations are used to speed up
the calculations in the case of highly heterogeneous degree
distributions (e.g. power laws). See min_di
argument in
PathCensus.count_paths()
.
Node- and graph-level counts are derived from edge-level counts according
to simple aggregation rules. They are defined in the definition classes
implemented in pathcensus.definitions
submodule.
See also
pathcensus.definitions.PathDefinitionsUnweighted
for the naming scheme used for unweighted counts.
pathcensus.definitions.PathDefinitionsWeighted
for the naming scheme used for weighted counts.
Below, a node-level census is computed for a simple triangle graph.
>>> import numpy as np
>>> from pathcensus import PathCensus
>>> G = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]])
>>> P = PathCensus(G)
>>> # Census calculations can be also parallelized
>>> # by default all available threads are used
>>> P = PathCensus(G, parallel=True)
>>> # But the number of threads can be set explicitly
>>> P = PathCensus(G, num_threads=2)
>>> P.census("nodes")
t tw th q0 qw qh
i
0 1 2 2 0 0 0
1 1 2 2 0 0 0
2 1 2 2 0 0 0
Structural similarity
Structural similarity coefficients (PathCensus.similarity()
)
as well as their corresponding clustering (PathCensus.tclust()
)
and closure coefficients (PathCensus.tclosure()
)
are defined quite simply in terms of ratios of 3-cycles (triangles)
to 2-paths (triples) counted at the level of edges, nodes or globally within an
entire graph. The figure below presents a summary of the underlying geometric
motivation as well as the main properties of structural similarity
coefficients, including the differences relative to local clustering and
closure coefficients
[WS98, YBL19].
Below, node-wise structural similarity coefficients are computed for a simple triangle graph.
>>> import numpy as np
>>> from pathcensus import PathCensus
>>> G = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]])
>>> P = PathCensus(G)
>>> P.similarity("nodes")
i
0 1.0
1 1.0
2 1.0
dtype: float64
Structural complementarity
Structural complementarity (PathCensus.complementarity()
)
coefficients and their corresponding clustering
(PathCensus.qclust()
) and closure coefficients
(PathCensus.qclosure()
) are defined in terms of ratios of 4-cycles
(quadrangles) to 3-paths (quadruples) and can be defined at the levels
of edges, nodes and entire graphs. The figure below presents a summary
of the underlying geometric motivation and some of the main properties.
Below we calculate complementarity coefficients for the nodes of a 4-clique graph. Note that the complementarity coefficients are all zero, as the graph contains no quadrangles without chordal edges.
>>> import numpy as np
>>> from pathcensus import PathCensus
>>> G = np.array([[0,1,1,1],[1,0,1,1],[1,1,0,1],[1,1,1,0]])
>>> P = PathCensus(G)
>>> P.complementarity("nodes")
i
0 0.0
1 0.0
2 0.0
3 0.0
dtype: float64
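By contrast, a chordless quadrangle (4-cycle) yields maximal complementarity:
>>> G = np.array([[0,1,0,1],[1,0,1,0],[0,1,0,1],[1,0,1,0]])
>>> PathCensus(G).complementarity("nodes")
i
0 1.0
1 1.0
2 1.0
3 1.0
dtype: float64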
Weighted coefficients
Structural coefficients can also be defined for weighted networks, in which case paths and cycles are weighted by the arithmetic average of the edge weights defining the underlying path (so closing or chordal edges in triangles/quadrangles are ignored). This can be seen as an extension of the weighted clustering coefficient proposed by Barrat et al. [BBPastorSatorrasV04]. Indeed, our formulation of weighted clustering based on triangles is equivalent to it. The figure below presents a summary of the weighting rules.
Edge weights should be detected automatically in most cases, provided that
the standard edge weight attribute name ("weight"
) is used.
However, weighted computations may also be enabled/disabled explicitly
using the
weighted
argument.
>>> import numpy as np
>>> from pathcensus import PathCensus
>>> G = np.array([[0,2,3],[2,0,11],[3,11,0]])
>>> PathCensus(G).census("nodes")
twc thc tw th q0wc q0hc qw qh
i
0 2.5 6.75 5.0 13.5 0.0 0.0 0.0 0.0
1 6.5 4.75 13.0 9.5 0.0 0.0 0.0 0.0
2 7.0 4.50 14.0 9.0 0.0 0.0 0.0 0.0
>>> PathCensus(G, weighted=False).census("nodes")
t tw th q0 qw qh
i
0 1 2 2 0 0 0
1 1 2 2 0 0 0
2 1 2 2 0 0 0
- class pathcensus.pathcensus.PathCensus(graph: GraphABC, weighted: Optional[bool] = None, validate: bool = True, adj_kws: Optional[Dict] = None, count_paths: bool = True, **kwds: Any)[source]
Bases:
object
Path census and structural coefficient calculations for undirected graphs.
- graph
pathcensus.core.graph.Graph
instance for calculating path census.
- counts
Data frame with path/cycle counts per edge. Initialization may be postponed.
Notes
The naming scheme used for denoting counts is documented in the docstring of the
definitions
attribute (i.e.
self.definitions
).
- class Meta[source]
Bases:
object
Container class with various metadata such as lists of possible values of arguments of different methods of
PathCensus
.
Fields
- mode
Allowed values of
mode
argument in structural coefficients methods.
- undef
Allowed values of
undefined
argument in structural coefficients methods.
- mode = ('nodes', 'edges', 'global')
- undef = ('nan', 'zero')
- census(mode: Literal[Meta.mode] = 'nodes', *, counts: Optional[pd.DataFrame] = None) pd.DataFrame [source]
Calculate path census.
- Parameters
mode – Should node, edge or global counts be calculated.
counts – Path counts data frame to use. Mostly for internal use.
Examples
>>> # Triangle graph
>>> A = np.array([[0,1,1], [1,0,1], [1,1,0]])
>>> PathCensus(A).census()
t tw th q0 qw qh
i
0 1 2 2 0 0 0
1 1 2 2 0 0 0
2 1 2 2 0 0 0
- coefs(mode: Literal[Meta.mode] = 'nodes', **kwds) pd.DataFrame [source]
Calculate structural coefficients.
- Parameters
mode – Should node, edge or global counts be calculated.
undefined – If
'nan'
then nodes with undefined values are treated as NaNs. If
'zero'
then they are considered zeros.
census – If
True
then path census data is added.
counts – Path counts data frame to use. Mostly for internal use.
- compcoefs(mode: Literal[Meta.mode] = 'nodes', *, undefined: Literal[Meta.undef] = 'nan', census: bool = False, counts: Optional[pd.DataFrame] = None) pd.DataFrame [source]
Calculate complementarity coefficients including clustering and closure coefficients when
mode="nodes"
or their node-wise averages whenmode="global"
.- Parameters
mode – Should node, edge or global counts be calculated.
undefined – If
'nan'
then nodes with undefined values are treated as NaNs. If
'zero'
then they are considered zeros.
census – If
True
then path census data is added.
counts – Path counts data frame to use. Mostly for internal use.
Examples
>>> # Quadrangle graph
>>> A = np.array([[0,1,0,1],[1,0,1,0],[0,1,0,1],[1,0,1,0]])
>>> PathCensus(A).compcoefs("edges")
comp
i j
0 1 1.0
  3 1.0
1 0 1.0
  2 1.0
2 1 1.0
  3 1.0
3 0 1.0
  2 1.0
>>> PathCensus(A).compcoefs("nodes")
comp qclust qclosure
i
0 1.0 1.0 1.0
1 1.0 1.0 1.0
2 1.0 1.0 1.0
3 1.0 1.0 1.0
>>> PathCensus(A).compcoefs("global")
comp_g comp qclust qclosure
0 1.0 1.0 1.0 1.0
- complementarity(mode: Literal[Meta.mode] = 'nodes', *, undefined: Literal[Meta.undef] = 'nan', counts: Optional[pd.DataFrame] = None) Union[pd.Series, float] [source]
Structural complementarity coefficients.
- Parameters
mode – Should it be calculated for nodes, edges or globally (equivalent to global clustering).
undefined – If
'nan'
then nodes with undefined values are treated as NaNs. If
'zero'
then they are considered zeros.
counts – Path counts data frame to use. Mostly for internal use.
Notes
The node-wise coefficient is defined as the ratio of quadrangles including a focal node
i
to the total number of both wedge and head quadruples:
\[c_i = \frac{4Q_i}{q^W_i + q^H_i}\]
The edge-wise coefficient is defined as the ratio of quadrangles including an
(i, j)
edge to the number of quadruples starting at it:
\[c_{ij} = \frac{2Q_{ij}}{q_{ij}}\]
The global coefficient is defined as the ratio of the sum of quadrangles to the sum of quadruples (wedge or head):
\[c = \frac{2\sum_i Q_i}{\sum_i q^W_i} = \frac{2\sum_i Q_i}{\sum_i q^H_i}\]
Examples
>>> # Quadrangle graph
>>> A = np.array([[0,1,0,1],[1,0,1,0],[0,1,0,1],[1,0,1,0]])
>>> PathCensus(A).complementarity("edges")
i j
0 1 1.0
  3 1.0
1 0 1.0
  2 1.0
2 1 1.0
  3 1.0
3 0 1.0
  2 1.0
dtype: float64
>>> PathCensus(A).complementarity("nodes")
i
0 1.0
1 1.0
2 1.0
3 1.0
dtype: float64
>>> PathCensus(A).complementarity("global")
1.0
- count(**kwds: Any) None [source]
Count paths and set self.counts attribute.
**kwds
are passed tocount_paths()
.
- classmethod count_paths(graph: Union[GraphABC, Graph], *, parallel: Optional[bool] = None, num_threads: Optional[int] = None, graph_kws: Optional[Dict] = None, min_di: bool = True, **kwds: Any) Tuple[int, ndarray] [source]
Count paths and cycles in a graph.
- Parameters
graph –
pathcensus.core.graph.Graph
instance or a graph-like object that can be converted to it.
parallel – Should the parallel counting algorithm be used. When
None
it is used by default for graphs with at least one million edges.
num_threads – Number of threads to use when
parallel=True
.
batch_size – Batch size to use when running with
parallel=True
.
graph_kws – Additional keyword arguments passed to
get_graph()
. Used only when the graph is not already in the JIT-compiled form.
min_di – Should the di < dj rule for iterating over edges be used. This way the most expensive loop of the PathCensus algorithm for computing edge-wise path/cycle counts always iterates over the neighbors of the lower-degree node in an
(i, j)
edge. It should almost always be set to
True
; the argument is used mostly for testing purposes.
**kwds – Passed to
pathcensus.core.parallel.count_paths_parallel()
when
parallel=True
.
Notes
The
parallel=True
argument may not work and may lead to segmentation faults on some macOS machines.
- Returns
n_nodes – Number of nodes.
counts – Path and cycle counts.
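A usage sketch of calling the counting routine directly (most users should rely on the PathCensus constructor instead):
>>> import numpy as np
>>> from pathcensus import PathCensus
>>> G = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]])
>>> n_nodes, counts = PathCensus.count_paths(G, parallel=False)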
- dump() Tuple[int, ndarray, Optional[ndarray], Optional[ndarray]] [source]
Dump raw data in the form of arrays and the number of nodes.
- Returns
n_nodes – Number of nodes.
E – Edgelist array.
W – Optional edge weights array.
counts – Path counts array. May be
None
.
- property ecount: int
- classmethod from_dump(n_nodes: int, E: ndarray, W: Optional[ndarray] = None, counts: Optional[ndarray] = None, adj_kws: Optional[Dict] = None, **kwds: Any) PathCensus [source]
Construct from the output of
dump()
.
- Parameters
n_nodes – Number of nodes.
E – Edgelist array.
W – Optional edge weights array.
counts – Optional path counts array. It is calculated on-the-fly when
None
.
adj_kws – Passed to
pathcensus.utils.adjacency()
.
**kwds – Passed to
count_paths()
when counts is
None
.
- get_counts(mode: Literal[Meta.mode] = 'nodes') pd.DataFrame [source]
Get (possibly aggregated) path counts.
- Parameters
mode – Should node, edge or global counts be calculated.
- classmethod get_graph(graph: GraphABC, weighted: Optional[bool] = None, validate: bool = True, **kwds: Any) Graph [source]
Get graph object for path counting.
- Parameters
graph – A compatible graph object registered with
pathcensus.graph.GraphABC
.
n_nodes – Number of nodes. Used only when the graph is passed as an edgelist.
weighted – Should the graph be interpreted as a weighted graph. If
None
then it is determined based on the number of unique non-zero values in the adjacency matrix.
validate – Should the input graph be validated for correctness (i.e. checked whether it is undirected).
**kwds – Passed to
pathcensus.utils.adjacency()
.
- property n_edges: int
- property n_nodes: int
- qclosure(*, undefined: Literal[Meta.undef] = 'nan', counts: Optional[pd.DataFrame] = None) pd.Series [source]
Quadrangle-based local closure coefficient.
- Parameters
undefined – If
'nan'
then nodes with undefined values are treated as NaNs. If
'zero'
then they are considered zeros.
counts – Path counts data frame to use. Mostly for internal use.
Notes
It is defined as the ratio of quadrangles including a focal node
i
to the number of head quadruples starting from it:
\[c^H_i = \frac{2Q_i}{q^H_i}\]
Examples
>>> # Quadrangle graph
>>> A = np.array([[0,1,0,1],[1,0,1,0],[0,1,0,1],[1,0,1,0]])
>>> PathCensus(A).qclosure()
i
0 1.0
1 1.0
2 1.0
3 1.0
dtype: float64
- qclust(*, undefined: Literal[Meta.undef] = 'nan', counts: Optional[pd.DataFrame] = None) pd.Series [source]
Quadrangle-based local clustering coefficient (node-wise).
- Parameters
undefined – If
'nan'
then nodes with undefined values are treated as NaNs. If
'zero'
then they are considered zeros.
counts – Path counts data frame to use. Mostly for internal use.
Notes
It is defined as the ratio of quadrangles including a focal node
i
to the number of wedge quadruples with
i
at the second position (this is to avoid double counting and to make the number of wedge and head quadruples per quadrangle equal):
\[c^W_i = \frac{2Q_i}{q^W_i}\]
Examples
>>> # Quadrangle graph
>>> A = np.array([[0,1,0,1],[1,0,1,0],[0,1,0,1],[1,0,1,0]])
>>> PathCensus(A).qclust()
i
0 1.0
1 1.0
2 1.0
3 1.0
dtype: float64
- property qdf: DataFrame
Data frame with quadruple/quadrangle counts per edge.
- simcoefs(mode: Literal[Meta.mode] = 'nodes', *, undefined: Literal[Meta.undef] = 'nan', census: bool = False, counts: Optional[pd.DataFrame] = None) pd.DataFrame [source]
Calculate similarity coefficients including clustering and closure coefficients when
mode="nodes"
or their node-wise averages whenmode="global"
.- Parameters
mode – Should node, edge or global counts be calculated.
undefined – If
'nan'
then nodes with undefined values are treated as NaNs. If
'zero'
then they are considered zeros.
census – If
True
then path census data is added as columns at the front of the data frame.
counts – Path counts data frame to use. Mostly for internal use.
Examples
>>> # Triangle graph
>>> A = np.array([[0,1,1], [1,0,1], [1,1,0]])
>>> PathCensus(A).simcoefs("edges")
sim
i j
0 1 1.0
  2 1.0
1 0 1.0
  2 1.0
2 0 1.0
  1 1.0
>>> PathCensus(A).simcoefs("nodes")
sim tclust tclosure
i
0 1.0 1.0 1.0
1 1.0 1.0 1.0
2 1.0 1.0 1.0
>>> PathCensus(A).simcoefs("global")
sim_g sim tclust tclosure
0 1.0 1.0 1.0 1.0
- similarity(mode: Literal[Meta.mode] = 'nodes', *, undefined: Literal[Meta.undef] = 'nan', counts: Optional[pd.DataFrame] = None) Union[pd.Series, float] [source]
Structural similarity coefficients.
- Parameters
mode – Should it be calculated for nodes, edges or globally (equivalent to global clustering).
undefined – If
'nan'
then nodes with undefined values are treated as NaNs. If
'zero'
then they are considered zeros.
counts – Path counts data frame to use. Mostly for internal use.
Notes
It is defined as the ratio of triangles including a focal node
i
to the total number of both wedge and head triples:
\[s_i = \frac{4T_i}{t^W_i + t^H_i}\]
Examples
>>> # Triangle graph
>>> A = np.array([[0,1,1], [1,0,1], [1,1,0]])
>>> PathCensus(A).similarity("edges")
i j
0 1 1.0
  2 1.0
1 0 1.0
  2 1.0
2 0 1.0
  1 1.0
dtype: float64
>>> PathCensus(A).similarity("nodes")
i
0 1.0
1 1.0
2 1.0
dtype: float64
>>> PathCensus(A).similarity("global")
1.0
- property strength: ndarray
Get strength sequence of the underlying graph (or degree sequence in the unweighted case).
- tclosure(*, undefined: Literal[Meta.undef] = 'nan', counts: Optional[pd.DataFrame] = None) pd.Series [source]
Triangle-based local closure coefficient (node-wise).
It is equivalent to local closure coefficient [YBL19].
- Parameters
undefined – If
'nan'
then nodes with undefined values are treated as NaNs. If
'zero'
then they are considered zeros.
counts – Path counts data frame to use. Mostly for internal use.
Notes
It is defined as the ratio of the number of triangles including a focal node
i
to the number of head triples starting from it:
\[s^H_i = \frac{2T_i}{t^H_i}\]
Examples
>>> # Triangle graph
>>> A = np.array([[0,1,1],[1,0,1],[1,1,0]])
>>> PathCensus(A).tclosure()
i
0 1.0
1 1.0
2 1.0
dtype: float64
- tclust(*, undefined: Literal[Meta.undef] = 'nan', counts: Optional[pd.DataFrame] = None) pd.Series [source]
Triangle-based local clustering (node-wise).
It is equivalent to local clustering coefficient [WS98].
- Parameters
undefined – If
'nan'
then nodes with undefined values are treated as NaNs. If
'zero'
then they are considered zeros.
counts – Path counts data frame to use. Mostly for internal use.
Notes
It is defined as the ratio of triangles including a focal node
i
to the number of wedge triples centered at it:
\[s^W_i = \frac{2T_i}{t^W_i}\]
Examples
>>> # Triangle graph
>>> A = np.array([[0,1,1], [1,0,1], [1,1,0]])
>>> PathCensus(A).tclust()
i
0 1.0
1 1.0
2 1.0
dtype: float64
- property tdf: DataFrame
Data frame with triple/triangle counts per edge.
- property vcount: int
- property weighted: bool
pathcensus.types module
Custom type definitions.
pathcensus.utils module
Utility functions.
- pathcensus.utils.relclose(x1: ndarray, x2: ndarray, rtol: float = 1e-06) ndarray [source]
Are two arrays relatively close.
rtol
defines the maximum allowed difference between
x1
and
x2
relative to the magnitude of
x2
.
- pathcensus.utils.relerr(x1: ndarray, x2: ndarray) ndarray [source]
Relative error
|(x1 - x2)| / |x2|
.
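A usage sketch of the two helpers (illustrative values; outputs not shown):
>>> import numpy as np
>>> from pathcensus.utils import relclose, relerr
>>> x1 = np.array([1.0, 2.0])
>>> x2 = np.array([1.0, 2.0 + 1e-8])
>>> ok = relclose(x1, x2)   # elementwise, with the default rtol=1e-6
>>> err = relerr(x1, x2)    # |(x1 - x2)| / |x2|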
- pathcensus.utils.rowsums(X: Union[ndarray, spmatrix]) ndarray [source]
Calculate row sums of a matrix.
- pathcensus.utils.set_seed(all: Optional[int] = None, *, random: Optional[int] = None, numpy: Optional[int] = None, numba: Optional[int] = None) None [source]
Set seeds of random number generators.
- Parameters
random – Seed value for
random
generator.numpy – Seed value for
numpy
generator.numba – Seed value for py:mod:numba generator.
all – Seed value used for all generators. Cannot be used jointly with other arguments.
- Raises
ValueError – If ‘all’ is used with other arguments or no seed is set.
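A usage sketch; passing only the first argument seeds all generators at once:
>>> from pathcensus.utils import set_seed
>>> set_seed(303)                          # seeds random, numpy and numba jointly
>>> set_seed(random=1, numpy=2, numba=3)   # or seed each generator separately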