pathcensus package
Subpackages
Submodules
pathcensus.definitions module
Path counting and aggregation definitions.
- class pathcensus.definitions.PathDefinitions[source]
Bases:
objectBase class for path definitions.
See also
pathcensus.definitions.PathDefinitionsUnweightedunweighted definitions
pathcensus.definitions.PathDefinitionsWeightedweighted definitions
- property aggregation: Dict
Aggregation rules for computing node and global counts from edge counts. The integers specify the factor to divide by the sum over edge counts to get a corresponding node/global count.
- property aliases: Dict
Aliases mapping actual path names to raw names.
- property definitions: Dict
Raw path definitions. Weighted and unweighted definitions are derived from this.
- get_column_ids() List[int][source]
Get indices of path columns to leave once reversed counting is done.
- get_column_names() List[str][source]
Get names of path columns used once the reverse counting is done.
- get_swap_rules() List[Tuple[int, int]][source]
Get swap rules for counting reversed paths.
They define indices of pairs of columns which need to be swaped in order to get reversed paths. Note that reverses of wedge paths are head paths and vice versa. In the case of weighted paths also wedge/head cycle counts need to be reversed.
- property npaths: int
Number of different path/cycle counts.
- class pathcensus.definitions.PathDefinitionsUnweighted[source]
Bases:
PathDefinitionsUnweighted path definitions.
Similarity-related paths
- t
Triangles.
- tw
Wedge-triples around
i(i.e.k-i-jpaths).- th
Head-triples originating from
i(i.e.i-j-kpaths).
Complementarity-related paths
- q0
Strong quadrangles.
- qw
Wedge-quadruples around
i(i.e.k-i-j-lpaths).- qh
Head-quadruples originating from
i(i.e.i-j-k-lpaths).
- property aliases: Dict
Aliases mapping actual path names to raw names.
- property definitions: Dict
Raw path definitions. Weighted and unweighted definitions are derived from this.
- class pathcensus.definitions.PathDefinitionsWeighted[source]
Bases:
PathDefinitionsWeighted path definitions.
Similarity-related paths
- twc
Closed wedge triples or triangles weighted by
ijandikedges.- thc
Closed head triples or triangles weighted by
ijandjkedges.- tw
Wedge triples.
- th
Head triples.
Complementarity-related paths
- q0wc
Closed wedge quadruples with no chords (strong quadrangles) weighted by
ij,jk, andiledges.- qw
Wedge quadruples.
- q0hc
Closed head quadruples with no chords (strong quadrangles) weighted by
ij,jkandkledges.- qh
Head quadruples.
pathcensus.graph module
Arbitrary classes of which instances can be interpreted
as scipy sparse matrices or 2D square numpy arrays
can be registered as abstract subclasses of GraphABC.
This way all main classes/functions implemented in pathcensus
can automatically interpret them as graph-like objects allowing
seemless integration with many different data formats and third-party
packages such as networkx or igraph.
In order to register a class a function for converting its instances to
scipy.sparse.spmatrix (CRS format) needs to be defined.
The conversion is handled by the pathcensus.graph.adjacency() function
which can be overloaded through the single dispatch mechanism. In particular,
it should be called on arrays/sparse matrices extracted from graph classes
to ensure standardized format. See the example below.
Graph classes defined by networkx, igraph
and graph_tool are registered automatically provided
the packages are installed.
Note
pathcensus uses the A_{ij} = 1 convention to indicate
that a node i sends a tie to a node j. Functions converting
graph-like objects to arrays / sparse matrices need to be aware
of that.
Below is an example in which a custom conversion from a list of list
format is registered. Arguably, the below implementation is naive and
handles the conversion by simply converting to a numpy array,
without checking wheter the array is really 2D and square, but it illustrates
the main logic of registering custom graph-like classes.
>>> import numpy as np
>>> from pathcensus import GraphABC, adjacency, PathCensus
>>> def _adj_list(graph: list) -> spmatrix:
... """Adjacency matrix from list of lists."""
... A = np.array(graph)
... return adjacency(A)
>>> GraphABC.register_type(list, _adj_list)
>>> # A triangle graph
>>> A = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
>>> # Calculate path census
>>> paths = PathCensus(A)
>>> paths.census("global")
t 1
tw 3
th 3
q0 0
qw 0
qh 0
dtype: uint64
- class pathcensus.graph.GraphABC[source]
Bases:
ABCAbstract Base Class (ABC) for registering different graphs classes.
Any kind of graph object from different libraries can be registered as a subclass of ABC as long as also a function for converting it into a sparse adjacency matrix is provided at the registration time. In particular,
scipy.sparse.spmatrixobjects are automatically recognized as graph-like objects.This allows all graph-based functions/methods/class defined in
pathcensusto operate flexibly on any sort of graph-like objects / graph implementations.Provided the packages are installed methods for handling graph objects from
networkx,igraphandgraph_toolare automatically registered.See class method
register_graph()for more info.- classmethod register_type(subclass: type, adj: Callable[[...], spmatrix]) None[source]
Register type as a subclass of
pathcensus.graph.GraphABC.- Parameters:
subclass – Any graph-like class.
adj – Function for converting subclass graphs to sparse adjacency matrices. It should use the following signature (graph, **kwds) -> spmatrix. The return matrix must be in the format in which (i, j) indicate an edge from i to j (in the case a network is directed). Using
**kwdsis optional and in general it is best to implement adj in such a way that using**kwdsis not necessary, in particular for detecting whether a graph is weighted and converting it to a weighted adjacency matrix if necessary. This waypathcensuswill be able automatically choose to use weighted methods for weighted graphs.
- pathcensus.graph.adjacency(graph: GraphABC) spmatrix[source]
- pathcensus.graph.adjacency(graph: ndarray) spmatrix
- pathcensus.graph.adjacency(graph: spmatrix) spmatrix
- pathcensus.graph.adjacency(graph: sparray) spmatrix
- pathcensus.graph.adjacency(graph: Graph, **kwds: Any) spmatrix
Get (unweighted) adjacency matrix of a graph.
pathcensus.inference module
Approximate inference for arbitrary graph statistics, including structural coefficients, can be conducted using samples from appropriate Exponential Random Graph Models. The following generic algorithm can be used to solve a wide range of inferential problems:
Calculate statistics of interest on an observed graph.
Sample
Rrandomized instances from an appropriate null model.Calculate graph statistics on null model samples.
Compare observed and null model values.
Inference class implements the above approach.
It is comaptible with any registered class of graph-like objects
and any properly implemented subclass of
pathcensus.nullmodels.base.ERGM representing a null model
to sample from.
See also
pathcensus.graph for seemless pathcensus integration
with arbitrary graph-like classes.
pathcensus.nullmodels for available null models.
This simulation-based approach is relatively efficient for graph- and node-level statistics but can be very computationally expensive when used for edge-level analyses. Hence, is this case it is often useful to use various coarse-graining strategies to reduce the number of unique combinations of values of sufficient statistics.
See also
pathcensus.graph.GraphABC for the abstract class
for graph-like objects.
pathcensus.nullmodels for compatible ERGM classes.
pathcensus.inference.Inference.coarse_grain()
for coarse-graining methods.
Below is a simple example of an estimation of p-values of node-wise structural similarity coefficients in an Erdős–Rényi random graph. The result, of course, should not be statistically significant. We use the default significance level of \(\alpha = 0.05\) and Benjamini-Hochberg FDR correction for multiple testing.
>>> import numpy as np
>>> from scipy import sparse as sp
>>> from pathcensus import PathCensus
>>> from pathcensus.inference import Inference
>>> from pathcensus.nullmodels import UBCM
>>> np.random.seed(34)
>>> # Generate ER random graph (roughly)
>>> A = sp.random(100, 100, density=0.05, dtype=int, data_rvs=lambda n: np.ones(n))
>>> A = (A + A.T).astype(bool).astype(int)
>>> ubcm = UBCM(A)
>>> err = ubcm.fit()
>>> infer = Inference(A, ubcm, lambda g: PathCensus(g).similarity())
>>> data, null = infer.init_comparison(100)
>>> pvals = infer.estimate_pvalues(data, null, alternative="greater")
>>> # Structural similarity coefficient values
>>> # should not be significant more often than 5% of times
>>> # (BH FDR correction is used)
>>> (pvals <= 0.05).mean() <= 0.05
True
- class pathcensus.inference.Inference(graph: GraphABC, model: ERGM, statistics: Callable[[GraphABC], Data], *, aggregate_by: Literal[_aggregate_by] = 'stats')[source]
Bases:
objectGeneric approximate statistical inference based on arbitrary null models with node-level sufficient statistics.
The methods implemented by this class are based on sampling from null models so they are may not be very efficient, in particular for edge-level statistics. On the other hand, they allow to conduct statistical inferency for arbitrary graph statistics.
- graph
Graph-like object representing an observed network.
- model
Fitted instance of a subclass of
pathcensus.nullmodels.ERGM.
- statistics
Function for calculating graph statistics of interest with the following signature:
(graph, **kwds) -> DataFrame / Series
The first argument must be a graph-like object (e.g. a sparse matrix),
**kwdscan be used to pass additional arguments (only keyword args are allowed) if necessary. The return value must be either apandas.DataFrameorpandas.Series.
- aggregate_by
Mode of aggregation for determining null distribution. If
"stats"then null distribution is aggregated within unique combinations of values of the sufficient statistics (possibly coarse-grained, seeinit_comparison()). If"units"then null distribution is aggregated within individual units (e.g. nodes). This is often useful in analyses at the level of nodes but may require too many samples for edge-level analyses.
- class Levels(units: Tuple[str, ...], stats: Tuple[str, ...], other: Tuple[str, ...])[source]
Bases:
objectContainer class for storing information on unit, sufficient statistics and other index levels in observed and null model data.
- other: Tuple[str, ...]
- stats: Tuple[str, ...]
- units: Tuple[str, ...]
- __call__(graph: GraphABC, _stats: ndarray | None = None, **kwds: Any) Series | DataFrame[source]
This method should be called to actually calculate graph statistics.
- Parameters:
graph – Graph-like object to calculate statistics for.
stats – Array of sufficient statistics for nodes. If
Nonethen self.model.statistics is used.**kwds – Passed to graph statistics function.
- static add_index(data: Series | DataFrame, idx: Mapping[str, Sequence], *, prepend: bool = False, drop_unnamed: bool = True, copy: bool = True) Series | DataFrame[source]
Add index to a data frame or series.
- Parameters:
data – Data frame or series.
idx – Mapping from index names to sequences of values.
prepend – Should new indexes be prepended or appended to the existing indexes.
drop_unnamed – Should unnamed indexes be droppped during the process. Unnamed indexes are usually generic indexes which are redundant after adding additional indexes.
copy – Should a copy be returned.
- add_stats_index(data: Series | DataFrame, stats: ndarray | None = None) Series | DataFrame[source]
Add indexes with sufficient statistics.
- Parameters:
data – Data frame or series with graph statistics.
stats – Array of sufficient statistics. Use
self.model.statisticsifNone.
- static adjust_pvalues(pvals: Data, *, alpha: float = 0.05, copy: True = <class 'bool'>, **kwds: Any) Data[source]
Adjust p-values for multiple testing.
Benjamini-Hochberg-Yekuteli two-stage procedure implemented in
statsmodels.multitest.fdrcorrection_twostage()is used.- Parameters:
pvals – Data frame / series with p-values for different coefficients in columns.
alpha – Desired type I error rate after the adjustement.
copy – Should copy of
pvalsbe returned.**kwds – Additional arguments passed to
statsmodels.multitest.fdrcorrection_twostage()
- estimate_pvalues(data: pd.DataFrame, null: pd.DataFrame, *, alternative: Literal[_alternative] = 'greater', adjust: bool = True, resolution: int = 1000, **kwds: Any) Data[source]
Estimate p-values of node/edge/global coefficients based on sampling from a configuraiton model (as returned by
init_comparison()).- Parameters:
data – Data frame with observed graph statistics.
null – Data frame with simulated null distribution of graph statistics.
alternative – Type of test two perform. Currently only one-sided tests are supported.
adjust – Should p-values be adjusted. Benjamini-Hochberg FDR correction is used by default when
True.resolution – Resolution of p-value estimation. It specifies the number of quantiles to comapre observed values against. For instance, if
resolution=100then p-values will be accurate only up to0.01. This parameter controls the amount of memory consumed by the estimation process.**kwds – Passed as additional arguents to
adjust_pvalues()whenadjust=True.
See also
adjust_pvaluesp-value adjustment method
- Returns:
P-values for statistics as
pandas.Series(for one graph statistic) orpandas.DataFrame(for multiple statistics).- Return type:
pvalues
- filter_index(data: Data, target: Data, *, how: Literal[_filter_index] = 'values', levels: Sequence[str] | Sequence[int] | None = None) Data[source]
Filter
databy index with respect totarget.- Parameters:
data – Data to filter.
target – Dataset with target index.
how – How index should be filtered. Either by unique combinations of values or just contained to the range of values for separate levels in
target.levels – Levels to use for filtering. If
Nonethen eitherself.levels.unitsorself.levels.statsis used depending on the value ofself.aggregate_by.
- Returns:
Filtered copy of
data.- Return type:
data
- index_names = ('i', 'j')
- init_comparison(n: int, *, filter_index: bool | Literal[_filter_index] = False, sample_index: bool = False, null_kws: Dict | None = None, **kwds: Any) Tuple[Data, Data, Levels][source]
Initialize data for a comparison with a null model and determine index level names.
- Parameters:
n – Number of null model samples.
filter_index – If
Trueor"values"thennullwill be filtered to contain only observations with index values matching those indatawith levels used for the comparison selected based onself.aggregate_by. If"range"then null model samples will be filtered to be in the range of index values in the observed data.sample_index – If
Falsethen_sampleindex with sample ids is dropped fromnulldata frame with null model samples.null_kws – Keyword args passed to
simulate_null().**kwds – Passed to
statistics()method used for calculating statistics of interest.
Notes
Estimating distributions of edge-wise statistics conditional on sufficient statistics of the participating nodes may require really large number of samples and in general is not really feasible for large networks (in particular weighted). The same applies, although probably to slightly lesser degree, to node-wise statistics when
use_observed_stats=Falseis passed tosimulate_null().Efficient methods for solving these problems will be implemented in the future.
- Returns:
data – Observed graph statistics.
null – Null distribution samples.
- property n_nodes: int
Number of nodes in the observed network.
- postprocess(data: Series | DataFrame, target: Series | DataFrame) Series | DataFrame[source]
Postprocess data after running a comparison.
This mainly involves sanitizing index names after aggregation as well as setting proper shape and types for outputs so
datahas the same general form astarget.
- postprocess_index(data: Series | DataFrame) Series | DataFrame[source]
Postprocess index after running a comparison.
This involves getting rid of temporary index names used when running comparisons as well as any unnamed indexes. Moreover sufficient statistics indexes are removed if
self.aggregate_by == "unit"or ensuring that observed values of sufficient statistics are used in the index (instead of coarse-grained values) ifself.aggregate == "stats".
- simulate_null(n: int, *, progress: bool = False, progress_kws: Dict | None = None, use_observed_stats: bool = True, **kwds: Any) Series | DataFrame[source]
Get data frame of null model samples of strucutral coefficients.
- Parameters:
n – Number of samples.
progress – Should progress bar be showed.
progress_kws – Keyword arguments for customizing progress bar when
progress=True. Passed totqdm.tqdm().use_observed_stats – If
Truethen simulated data is indexed with sufficient statistics from the observed network. This often helps to accumulate enough observations faster at the expense of not fully exact conditioning.**kwds – Keyword arguments passed to
statistics().
- Returns:
Data frame with simulated null distribution.
- Return type:
null
pathcensus.pathcensus module
PathCensus is the basis which structural similarity and
complementarity coefficients are derived from. In its raw form
it is a set of counts of wedge and head triples (2-paths) and quadruples
(3-paths) traversing an (i, j) edge as well as counts of corresponding
triangles (3-cycles) and quadrangles (4-cycles).
In the weighted case there are separate counts of triangles
and quadrangles for wedge and head paths as average weights are defined
differently for these two cases.
Note
Path/cycle counts and structural coefficients are returned
as pandas.Series or pandas.DataFrame objects indexed
properly with integer node indices corresponding to the ordering of
rows in the underlying adjacency matrix of the network.
Note
Path census calculations are relatively efficient as the main
workhorse functions are just-in-time (JIT) compiled to highly
optimized C code using numba package. Moreover,
the path census algorithm is based on a state-of-the-art
graphlet counting algorithm proposed by
Ahmed et al. [ANRD15]
which has worst-case asymptotic computational complexity of
\(O(md^2_{\text{max}})\), where \(m\) is the number
of edges and \(d_{\text{max}}\) is the maximum degree.
Moreover, some additional optimizations are used to speed up
the calculations in the case of highly heterogeneous degree
distributions (e.g. power laws). See min_di argument in
PathCensus.count_paths().
Node- and graph-level counts are derived from edge-level counts according
to simple aggregations rules. They are defined in definition classes
implemented in pathcensus.definitions submodule.
See also
pathcensus.definitions.PathDefinitionsUnweighted
for the naming scheme used for unweighted counts.
pathcensus.definitions.PathDefinitionsWeighted
for the naming scheme used for weighted counts.
Below a node-level census for a simple triangle graph is counted.
>>> import numpy as np
>>> from pathcensus import PathCensus
>>> G = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]])
>>> P = PathCensus(G)
>>> # Census calculations can be also parallelized
>>> # by default all available threads are used
>>> P = PathCensus(G, parallel=True)
>>> # But the number of threads can be set explicitly
>>> P = PathCensus(G, num_threads=2)
>>> P.census("nodes")
t tw th q0 qw qh
i
0 1 2 2 0 0 0
1 1 2 2 0 0 0
2 1 2 2 0 0 0
Structural similarity
Structural similarity coefficients (PathCensus.similarity())
as well as their corresponding clustering (PathCensus.tclust())
and closure coefficients (PathCensus.tclosure())
are defined quite simply in terms of ratios of 3-cycles (triangles)
to 2- (triples) counted at the levels of edges, nodes or globaly within an
entire graph. The figure below presents a summary of the underlying geometric
motivation as well as the main properties of structural similarity
coefficients, including the differences relative to local clustering and
closure coefficient
[WS98, YBL19].
Overview of the properties of structural similarity coefficients.
Below node-wise structural similarity coefficients are counted for a simple triangle graph.
>>> import numpy as np
>>> from pathcensus import PathCensus
>>> G = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]])
>>> P = PathCensus(G)
>>> P.similarity("nodes")
i
0 1.0
1 1.0
2 1.0
dtype: float64
Structural complementarity
Structural complementarity (PathCensus.complementarity())
coefficients and their corresponding clustering
(PathCensus.qclust()) and closure coefficients
(PathCensus.qclosure()) are defined in terms of ratios of 4-cycles
(quadrangles) to 3-paths (quadruples) and can be defined at the levels
of edges, nodes and entire graphs. The figure below present a summary
of the underlying geometric motivation and some of the main properties.
Overview of the properties of structural complementarity coefficients.
Below we calculate complementarity coefficients for nodes in a 4-clique graph. Note that complementarity coefficients are all zeros as there are not quadrangles without any chordal edges.
>>> import numpy as np
>>> from pathcensus import PathCensus
>>> G = np.array([[0,1,1,1],[1,0,1,1],[1,1,0,1],[1,1,1,0]])
>>> P = PathCensus(G)
>>> P.complementarity("nodes")
i
0 0.0
1 0.0
2 0.0
3 0.0
dtype: float64
Weighted coefficients
Structural coefficients can be also defined for weighted networks in which case paths and cycles are weighted according to the arithmetic average over edge weights defining an underlying path (so closing or chordal edges in triangles/quadrangles are ignored). This can be seen as an extension of the weighted clustering coefficient proposed by Barrat et al. [BBPastorSatorrasV04]. Indeed, our formulation of the weighted clustering based on triangles is equivalent to it. The figure below presents a summary of the weighting rules.
Overview of weighted path/cycle counts.
Edge weights should be detected automatically in most cases provided that
a standard name of edge weight attribute ("weight") is used.
However, weighted computations may be also enabled/disabled explicitly
by using weighted argument.
>>> import numpy as np
>>> from pathcensus import PathCensus
>>> G = np.array([[0,2,3],[2,0,11],[3,11,0]])
>>> PathCensus(G).census("nodes")
twc thc tw th q0wc q0hc qw qh
i
0 2.5 6.75 5.0 13.5 0.0 0.0 0.0 0.0
1 6.5 4.75 13.0 9.5 0.0 0.0 0.0 0.0
2 7.0 4.50 14.0 9.0 0.0 0.0 0.0 0.0
>>> PathCensus(G, weighted=False).census("nodes")
t tw th q0 qw qh
i
0 1 2 2 0 0 0
1 1 2 2 0 0 0
2 1 2 2 0 0 0
- class pathcensus.pathcensus.PathCensus(graph: GraphABC, weighted: bool | None = None, validate: bool = True, adj_kws: Dict | None = None, count_paths: bool = True, **kwds: Any)[source]
Bases:
objectPath census and structural coefficients calculations for undirected graphs.
- graph
pathcensus.core.graph.Graphinstance for calculating path census.
- counts
Data frame with path/cycle counts per edge. Initialization may be postponed.
Notes
Naming scheme used for denoting counts is documented in the docstring for
definitionsattribute (i.e.self.definitions).- class Meta[source]
Bases:
objectContainer class with various metadata such as lists of possible values of arguments of different methods of
PathCensus.Fields
- mode
Allowed values of
modeargument in structural coefficients methods.- undef
Allowed values of
undefinedargument in structural coefficients methods.
- mode = ('nodes', 'edges', 'global')
- undef = ('nan', 'zero')
- census(mode: Literal[Meta.mode] = 'nodes', *, counts: pd.DataFrame | None = None) pd.DataFrame[source]
Calculate path census.
- Parameters:
mode – Should node, edge or global counts be calculated.
counts – Path counts data frame to use. Mostly for internal use.
Examples
>>> # Triangle graph >>> A = np.array([[0,1,1], [1,0,1], [1,1,0]]) >>> PathCensus(A).census() t tw th q0 qw qh i 0 1 2 2 0 0 0 1 1 2 2 0 0 0 2 1 2 2 0 0 0
- coefs(mode: Literal[Meta.mode] = 'nodes', **kwds) pd.DataFrame[source]
Calculate structural coefficients.
- Parameters:
mode – Should node, edge or global counts be calculated.
undefined – If
'nan'the nodes with undefined values are treated as NaNs. If'zero'then they are considered zeros.census – If
Truethen path census data is added.counts – Path counts data frame to use. Mostly for internal use.
- compcoefs(mode: Literal[Meta.mode] = 'nodes', *, undefined: Literal[Meta.undef] = 'nan', census: bool = False, counts: pd.DataFrame | None = None) pd.DataFrame[source]
Calculate complementarity coefficients including clustering and closure coefficients when
mode="nodes"or their node-wise averages whenmode="global".- Parameters:
mode – Should node, edge or global counts be calculated.
undefined – If
'nan'the nodes with undefined values are treated as NaNs. If'zero'then they are considered zeros.census – If
Truethen path census data is added.counts – Path counts data frame to use. Mostly for internal use.
Examples
>>> # Quadrangle graph >>> A = np.array([[0,1,0,1],[1,0,1,0],[0,1,0,1],[1,0,1,0]]) >>> PathCensus(A).compcoefs("edges") comp i j 0 1 1.0 3 1.0 1 0 1.0 2 1.0 2 1 1.0 3 1.0 3 0 1.0 2 1.0 >>> PathCensus(A).compcoefs("nodes") comp qclust qclosure i 0 1.0 1.0 1.0 1 1.0 1.0 1.0 2 1.0 1.0 1.0 3 1.0 1.0 1.0 >>> PathCensus(A).compcoefs("global") comp_g 1.0 comp 1.0 qclust 1.0 qclosure 1.0 dtype: float64
- complementarity(mode: Literal[Meta.mode] = 'nodes', *, undefined: Literal[Meta.undef] = 'nan', counts: pd.DataFrame | None = None) pd.Series | float[source]
Structural complementarity coefficients.
- Parameters:
mode – Should it be calculated for nodes, edges or globally (equivalent to global clustering).
undefined – If
'nan'the nodes with undefined values are treated as NaNs. If'zero'then they are considered zeros.counts – Path counts data frame to use. Mostly for internal use.
Notes
The node-wise coefficient is defined as the ratio of quadrangles including a focal node
iand the total number of both wedge and head quadruples:\[c_i = \frac{4Q_i}{q^W_i + q^H_i}\]The edge-wise coefficient is defined as the ratio of quadrangles including an
(i, j)edge and the number of quadruples starting at it:\[c_{ij} = \frac{2Q_{ij}}{q_{ij}}\]The global coefficient is defined as the ratio of sums of quadrangles to the sum of quadruples (wedge or head):
\[c = \frac{2\sum_i Q_i}{\sum_i q^W_i} = \frac{2\sum_i Q_i}{\sum_i q^H_i}\]Examples
>>> # Quadrangle graph >>> A = np.array([[0,1,0,1],[1,0,1,0],[0,1,0,1],[1,0,1,0]]) >>> PathCensus(A).complementarity("edges") i j 0 1 1.0 3 1.0 1 0 1.0 2 1.0 2 1 1.0 3 1.0 3 0 1.0 2 1.0 dtype: float64 >>> PathCensus(A).complementarity("nodes") i 0 1.0 1 1.0 2 1.0 3 1.0 dtype: float64 >>> PathCensus(A).complementarity("global") 1.0
- count(**kwds: Any) None[source]
Count paths and set self.counts attribute.
**kwdsare passed tocount_paths().
- classmethod count_paths(graph: GraphABC | Graph, *, parallel: bool | None = None, num_threads: int | None = None, graph_kws: Dict | None = None, min_di: bool = True, **kwds: Any) Tuple[int, ndarray][source]
Count paths and cycles in a graph.
- Parameters:
graph –
pathcensus.core.graph.Graphinstance. or graph-like object that can be converted to it.parallel – Should parallel counting algorithm be used. When
Noneit is used by default for graphs with at least one million edges.num_threads – Number of threads to use when
parallel=True.batch_size – Batch size to use when running with
parallel=True.graph_kws – Additional keyword arguments passed to
get_graph(). Used only when graph is not already in the JIT-compiled form.min_di – Should di < dj rule for iterating over edges be used. This way the most expensive loop of the PathCensus algorithm for computing edge-wise path/cycle counts always iterates over neighbors of the lower degree node in an
(i, j)edge. Almost always should be set toTrue. The argument is used mostly for testing purposes.**kwds – Passed to
pathcensus.core.parallel.count_paths_parallel()whenparallel=True.
Notes
The
parallel=Trueargument may not work and lead to segmentation faults on some MacOS machines.- Returns:
n_nodes – Number of nodes.
counts – Path and cycles counts.
- dump() Tuple[int, ndarray, ndarray | None, ndarray | None][source]
Dump to raw data in the form of arrays and the number of nodes.
- Returns:
n_nodes – Number of nodes.
E – Edgelist array.
W – Optional edge weights array.
counts – Path counts array. May be
None.
- property ecount: int
- classmethod from_dump(n_nodes: int, E: ndarray, W: ndarray | None = None, counts: ndarray | None = None, adj_kws: Dict | None = None, **kwds: Any) PathCensus[source]
Construct from the output of
dump().- Parameters:
n_nodes – Number of nodes.
E – Edgelist array.
W – Optional edge weights array.
counts – Optional path counts array. It is calculated on-the-fly when
None.adj_kws – Passed to
pathcensus.utils.adjacency().**kwds – Passed to
count_paths()when counts isNone.
- get_counts(mode: Literal[Meta.mode] = 'nodes') pd.DataFrame[source]
Get (possibly aggregated) path counts.
- Parameters:
mode – Should node, edge or global counts be calculated.
- classmethod get_graph(graph: GraphABC, weighted: bool | None = None, validate: bool = True, **kwds: Any) Graph[source]
Get graph object for path counting.
- Parameters:
graph – A compatibe graph object registered with paths.types.GraphABC.
n_nodes – Number of nodes. Used only when graph is passed as an edgelist.
weighted – Should the graph be interpreted as weighted graph. If
Nonethen it is determined based on the number of unique values of non-zero values in the adjacency matrix.validate – Should input graph be validate for correctness (i.e. checked if is undirected).
**kwds – Passed to
pathcensus.utils.adjacency().
- property n_edges: int
- property n_nodes: int
- qclosure(*, undefined: Literal[Meta.undef] = 'nan', counts: pd.DataFrame | None = None) pd.Series[source]
Quadrangle-based local closure coefficient.
- Parameters:
undefined – If
'nan'the nodes with undefined values are treated as NaNs. If'zero'then they are considered zeros.counts – Path counts data frame to use. Mostly for internal use.
Notes
It is defined as the ratio of quadrangles including a focal node
iand the number of head quadruples starting from it.\[c^H_i = \frac{2Q_i}{q^H_i}\]Head quadruple.
Examples
>>> # Quadrangle graph >>> A = np.array([[0,1,0,1],[1,0,1,0],[0,1,0,1],[1,0,1,0]]) >>> PathCensus(A).qclosure() i 0 1.0 1 1.0 2 1.0 3 1.0 dtype: float64
- qclust(*, undefined: Literal[Meta.undef] = 'nan', counts: pd.DataFrame | None = None) pd.Series[source]
Quadrangle-based local clustering coefficient (node-wise).
- Parameters:
undefined – If
'nan'the nodes with undefined values are treated as NaNs. If'zero'then they are considered zeros.counts – Path counts data frame to use. Mostly for internal use.
Notes
It is defined as the ratio of quadrangles including a focal node
iand the number of wedge quadruples withiat the second position (this is to avoid double counting and make the number of wedge and head quadruples per quadrangle equal):\[c^W_i = \frac{2Q_i}{q^W_i}\]Wedge quadruple.
Examples
>>> # Quadrangle graph >>> A = np.array([[0,1,0,1],[1,0,1,0],[0,1,0,1],[1,0,1,0]]) >>> PathCensus(A).qclust() i 0 1.0 1 1.0 2 1.0 3 1.0 dtype: float64
- property qdf: DataFrame
Data frame with quadruple/quadrangle counts per edge.
- simcoefs(mode: Literal[Meta.mode] = 'nodes', *, undefined: Literal[Meta.undef] = 'nan', census: bool = False, counts: pd.DataFrame | None = None) pd.DataFrame | pd.Series[source]
Calculate similarity coefficients including clustering and closure coefficients when
mode="nodes"or their node-wise averages whenmode="global".- Parameters:
mode – Should node, edge or global counts be calculated.
undefined – If
'nan'the nodes with undefined values are treated as NaNs. If'zero'then they are considered zeros.census – If
Truethen path census data is added. as columns in the front of the data frame.counts – Path counts data frame to use. Mostly for internal use.
Examples
>>> # Triangle graph >>> A = np.array([[0,1,1], [1,0,1], [1,1,0]]) >>> PathCensus(A).simcoefs("edges") sim i j 0 1 1.0 2 1.0 1 0 1.0 2 1.0 2 0 1.0 1 1.0 >>> PathCensus(A).simcoefs("nodes") sim tclust tclosure i 0 1.0 1.0 1.0 1 1.0 1.0 1.0 2 1.0 1.0 1.0 >>> PathCensus(A).simcoefs("global") sim_g 1.0 sim 1.0 tclust 1.0 tclosure 1.0 dtype: float64
- similarity(mode: Literal[Meta.mode] = 'nodes', *, undefined: Literal[Meta.undef] = 'nan', counts: pd.DataFrame | None = None) pd.Series | float[source]
Structural similarity coefficients.
- Parameters:
mode – Should it be calculated for nodes, edges or globally (equivalent to global clustering).
undefined – If
'nan'the nodes with undefined values are treated as NaNs. If'zero'then they are considered zeros.counts – Path counts data frame to use. Mostly for internal use.
Notes
It is defined as the ratio of triangles including a focal node
ito the total number of both wedge and head triples:\[s_i = \frac{4T_i}{t^W_i + t^H_i}\]Examples
>>> # Triangle graph >>> A = np.array([[0,1,1], [1,0,1], [1,1,0]]) >>> PathCensus(A).similarity("edges") i j 0 1 1.0 2 1.0 1 0 1.0 2 1.0 2 0 1.0 1 1.0 dtype: float64 >>> PathCensus(A).similarity("nodes") i 0 1.0 1 1.0 2 1.0 dtype: float64 >>> PathCensus(A).similarity("global") 1.0
- property strength: ndarray
Get strength sequence of the underlying graph (or degree sequence in the unweighted case).
- tclosure(*, undefined: Literal[Meta.undef] = 'nan', counts: pd.DataFrame | None = None) pd.Series[source]
Triangle-based local closure coefficient (node-wise).
It is equivalent to local closure coefficient [YBL19].
- Parameters:
undefined – If
'nan'the nodes with undefined values are treated as NaNs. If'zero'then they are considered zeros.counts – Path counts data frame to use. Mostly for internal use.
Notes
It is defined as the ratio of the number of triangles including a focal node
ito the number of head triples starting from it:\[s^H_i = \frac{2T_i}{t^H_i}\]Head triple.
Examples
>>> # Triangle graph >>> A = np.array([[0,1,1],[1,0,1],[1,1,0]]) >>> PathCensus(A).tclosure() i 0 1.0 1 1.0 2 1.0 dtype: float64
- tclust(*, undefined: Literal[Meta.undef] = 'nan', counts: pd.DataFrame | None = None) pd.Series[source]
Triangle-based local clustering (node-wise).
It is equivalent to local clustering coefficient [WS98].
- Parameters:
undefined – If
'nan'the nodes with undefined values are treated as NaNs. If'zero'then they are considered zeros.counts – Path counts data frame to use. Mostly for internal use.
Notes
It is defined as the ratio of triangles including a focal node
ito the number of wedge triples centered at it:\[s^W_i = \frac{2T_i}{t^W_i}\]Wedge triple.
Examples
>>> # Triangle graph >>> A = np.array([[0,1,1], [1,0,1], [1,1,0]]) >>> PathCensus(A).tclust() i 0 1.0 1 1.0 2 1.0 dtype: float64
- property tdf: DataFrame
Data frame with triple/triangle counts per edge.
- property vcount: int
- property weighted: bool
pathcensus.types module
Custom type defintions.
pathcensus.utils module
Utility functions.
- pathcensus.utils.relclose(x1: ndarray, x2: ndarray, rtol: float = 1e-06) ndarray[source]
Are two arrays relatively close.
rtoldefines the maximum allowed relative difference betweenx1andx2relative to the magnitude ofx2.
- pathcensus.utils.relerr(x1: ndarray, x2: ndarray) ndarray[source]
Relative error
|(x1 - x2)| / |x2|.
- pathcensus.utils.set_seed(all: int | None = None, *, random: int | None = None, numpy: int | None = None, numba: int | None = None) None[source]
Set seeds of random number generators.
- Parameters:
random – Seed value for
randomgenerator.numpy – Seed value for
numpygenerator.numba – Seed value for py:mod:numba generator.
all – Seed value used for all generators. Cannot be used jointly with other arguments.
- Raises:
ValueError – If ‘all’ is used with other arguments or no seed is set.