pathcensus package
Subpackages
Submodules
pathcensus.definitions module
Path counting and aggregation definitions.
- class pathcensus.definitions.PathDefinitions[source]
Bases:
object
Base class for path definitions.
See also
pathcensus.definitions.PathDefinitionsUnweighted
unweighted definitions
pathcensus.definitions.PathDefinitionsWeighted
weighted definitions
- property aggregation: Dict
Aggregation rules for computing node and global counts from edge counts. The integers specify the factor by which the sum over edge counts is divided to obtain the corresponding node/global count.
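For instance (an illustrative reading of the scheme): each triangle containing a node \(i\) lies on exactly two of its incident edges, so summing the edge-wise triangle counts \(t_{ij}\) over the neighbors of \(i\) counts every triangle twice, and the node-level count uses a divisor of 2, i.e. \(T_i = \frac{1}{2}\sum_j t_{ij}\).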
- property aliases: Dict
Aliases mapping actual path names to raw names.
- property definitions: Dict
Raw path definitions. Weighted and unweighted definitions are derived from this.
- get_column_ids() List[int] [source]
Get indices of the path columns to keep once reversed counting is done.
- get_column_names() List[str] [source]
Get names of the path columns used once reversed counting is done.
- get_swap_rules() List[Tuple[int, int]] [source]
Get swap rules for counting reversed paths.
They define the indices of pairs of columns which need to be swapped in order to obtain reversed paths. Note that reverses of wedge paths are head paths and vice versa. In the case of weighted paths, the wedge/head cycle counts also need to be swapped.
- property npaths: int
Number of different path/cycle counts.
- class pathcensus.definitions.PathDefinitionsUnweighted[source]
Bases:
PathDefinitions
Unweighted path definitions.
Similarity-related paths
- t
Triangles.
- tw
Wedge-triples around
i
(i.e. k-i-j paths).
- th
Head-triples originating from
i
(i.e. i-j-k paths).
Complementarity-related paths
- q0
Strong quadrangles.
- qw
Wedge-quadruples around
i
(i.e. k-i-j-l paths).
- qh
Head-quadruples originating from
i
(i.e. i-j-k-l paths).
- property aliases: Dict
Aliases mapping actual path names to raw names.
- property definitions: Dict
Raw path definitions. Weighted and unweighted definitions are derived from this.
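The unweighted naming scheme above maps onto the columns of the census data frames used throughout this documentation. A minimal inspection sketch (assuming the class can be instantiated without arguments; return values are not asserted here):
>>> from pathcensus.definitions import PathDefinitionsUnweighted
>>> defs = PathDefinitionsUnweighted()
>>> n = defs.npaths                 # number of distinct path/cycle counts
>>> cols = defs.get_column_names()  # e.g. ['t', 'tw', 'th', 'q0', 'qw', 'qh']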
- class pathcensus.definitions.PathDefinitionsWeighted[source]
Bases:
PathDefinitions
Weighted path definitions.
Similarity-related paths
- twc
Closed wedge triples (triangles) weighted by the ij and ik edges.
- thc
Closed head triples (triangles) weighted by the ij and jk edges.
- tw
Wedge triples.
- th
Head triples.
Complementarity-related paths
- q0wc
Closed wedge quadruples with no chords (strong quadrangles) weighted by the ij, jk and il edges.
- qw
Wedge quadruples.
- q0hc
Closed head quadruples with no chords (strong quadrangles) weighted by the ij, jk and kl edges.
- qh
Head quadruples.
pathcensus.graph module
Arbitrary classes whose instances can be interpreted
as scipy
sparse matrices or 2D square numpy
arrays
can be registered as abstract subclasses of GraphABC
.
This way all main classes/functions implemented in pathcensus
can automatically interpret them as graph-like objects, allowing
seamless integration with many different data formats and third-party
packages such as networkx
or igraph
.
In order to register a class, a function for converting its instances to
scipy.sparse.spmatrix
(CSR format) needs to be defined.
The conversion is handled by the pathcensus.graph.adjacency()
function,
which can be overloaded through the single dispatch mechanism. In particular,
it should be called on arrays/sparse matrices extracted from graph classes
to ensure a standardized format. See the example below.
Graph classes defined by networkx
, igraph
and graph_tool
are registered automatically provided
the packages are installed.
Note
pathcensus
uses the A_{ij} = 1
convention to indicate
that a node i sends a tie to a node j. Functions converting
graph-like objects to arrays / sparse matrices need to be aware
of that.
Below is an example in which a custom conversion from the list-of-lists
format is registered. Arguably, the implementation below is naive and
handles the conversion by simply converting to a numpy
array,
without checking whether the array is really 2D and square, but it illustrates
the main logic of registering custom graph-like classes.
>>> import numpy as np
>>> from pathcensus import GraphABC, adjacency, PathCensus
>>> from scipy.sparse import spmatrix
>>> def _adj_list(graph: list) -> spmatrix:
... """Adjacency matrix from list of lists."""
... A = np.array(graph)
... return adjacency(A)
>>> GraphABC.register_type(list, _adj_list)
>>> # A triangle graph
>>> A = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
>>> # Calculate path census
>>> paths = PathCensus(A)
>>> paths.census("global")
t tw th q0 qw qh
0 1 3 3 0 0 0
- class pathcensus.graph.GraphABC[source]
Bases:
ABC
Abstract Base Class (ABC) for registering different graphs classes.
Any kind of graph object from different libraries can be registered as a subclass of this ABC, as long as a function for converting it into a sparse adjacency matrix is provided at registration time. In particular,
scipy.sparse.spmatrix
objects are automatically recognized as graph-like objects. This allows all graph-based functions/methods/classes defined in
pathcensus
to operate flexibly on any sort of graph-like objects / graph implementations. Provided the packages are installed, methods for handling graph objects from
networkx
, igraph
and graph_tool
are registered automatically. See the class method
register_type()
for more info.
- classmethod register_type(subclass: type, adj: Callable[[...], spmatrix]) None [source]
Register type as a subclass of
pathcensus.graph.GraphABC
.
- Parameters
subclass – Any graph-like class.
adj – Function for converting subclass graphs to sparse adjacency matrices. It should use the following signature: (graph, **kwds) -> spmatrix. The returned matrix must be in the format in which (i, j) indicates an edge from i to j (in case the network is directed). Using
**kwds
is optional and in general it is best to implement adj in such a way that using
**kwds
is not necessary, in particular for detecting whether a graph is weighted and converting it to a weighted adjacency matrix if needed. This way
pathcensus
will be able to automatically choose weighted methods for weighted graphs.
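As an illustration, the sketch below registers networkx graphs by hand (in practice they are registered automatically when networkx is installed, so this is a hypothetical re-implementation, not the packaged converter). It passes the standard "weight" edge attribute through, so weighted graphs yield weighted adjacency matrices:
>>> import networkx as nx
>>> from scipy import sparse as sp
>>> from pathcensus import GraphABC, adjacency
>>> def _adj_nx(graph: nx.Graph, **kwds) -> sp.spmatrix:
...     """Sketch of a weight-aware converter for networkx graphs."""
...     # ``weight="weight"`` makes the matrix weighted whenever the standard
...     # edge attribute is present; unweighted edges default to 1.
...     A = nx.to_scipy_sparse_array(graph, weight="weight", format="csr")
...     # Standardize via ``adjacency()`` as recommended above.
...     return adjacency(sp.csr_matrix(A))
>>> GraphABC.register_type(nx.Graph, _adj_nx)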
pathcensus.inference module
Approximate inference for arbitrary graph statistics, including structural coefficients, can be conducted using samples from appropriate Exponential Random Graph Models. The following generic algorithm can be used to solve a wide range of inferential problems:
1. Calculate statistics of interest on an observed graph.
2. Sample R randomized instances from an appropriate null model.
3. Calculate graph statistics on the null model samples.
4. Compare observed and null model values.
The Inference
class implements the above approach.
It is compatible with any registered class of graph-like objects
and any properly implemented subclass of
pathcensus.nullmodels.base.ERGM
representing a null model
to sample from.
See also
pathcensus.graph
for seamless pathcensus
integration
with arbitrary graph-like classes.
pathcensus.nullmodels
for available null models.
This simulation-based approach is relatively efficient for graph- and node-level statistics but can be very computationally expensive when used for edge-level analyses. Hence, in this case it is often useful to apply various coarse-graining strategies to reduce the number of unique combinations of values of the sufficient statistics.
See also
pathcensus.graph.GraphABC
for the abstract class
for graph-like objects.
pathcensus.nullmodels
for compatible ERGM classes.
pathcensus.inference.Inference.coarse_grain()
for coarse-graining methods.
Below is a simple example of an estimation of p-values of node-wise structural similarity coefficients in an Erdős–Rényi random graph. The result, of course, should not be statistically significant. We use the default significance level of \(\alpha = 0.05\) and Benjamini-Hochberg FDR correction for multiple testing.
>>> import numpy as np
>>> from scipy import sparse as sp
>>> from pathcensus import PathCensus
>>> from pathcensus.inference import Inference
>>> from pathcensus.nullmodels import UBCM
>>> np.random.seed(34)
>>> # Generate ER random graph (roughly)
>>> A = sp.random(100, 100, density=0.05, dtype=int, data_rvs=lambda n: np.ones(n))
>>> A = (A + A.T).astype(bool).astype(int)
>>> ubcm = UBCM(A)
>>> err = ubcm.fit()
>>> infer = Inference(A, ubcm, lambda g: PathCensus(g).similarity())
>>> data, null = infer.init_comparison(100)
>>> pvals = infer.estimate_pvalues(data, null, alternative="greater")
>>> # Structural similarity coefficient values
>>> # should not be significant more often than 5% of times
>>> # (BH FDR correction is used)
>>> (pvals <= 0.05).mean() <= 0.05
True
- class pathcensus.inference.Inference(graph: GraphABC, model: ERGM, statistics: Callable[[GraphABC], Data], *, aggregate_by: Literal[_aggregate_by] = 'stats')[source]
Bases:
object
Generic approximate statistical inference based on arbitrary null models with node-level sufficient statistics.
The methods implemented by this class are based on sampling from null models, so they may not be very efficient, in particular for edge-level statistics. On the other hand, they allow conducting statistical inference for arbitrary graph statistics.
- graph
Graph-like object representing an observed network.
- model
Fitted instance of a subclass of
pathcensus.nullmodels.ERGM
.
- statistics
Function for calculating graph statistics of interest with the following signature:
(graph, **kwds) -> DataFrame / Series
The first argument must be a graph-like object (e.g. a sparse matrix),
**kwds
can be used to pass additional arguments (only keyword args are allowed) if necessary. The return value must be either a
pandas.DataFrame
or a
pandas.Series
.
- aggregate_by
Mode of aggregation for determining the null distribution. If
"stats"
then the null distribution is aggregated within unique combinations of values of the sufficient statistics (possibly coarse-grained, see
init_comparison()
). If
"units"
then the null distribution is aggregated within individual units (e.g. nodes). This is often useful in analyses at the level of nodes but may require too many samples for edge-level analyses.
- class Levels(units: Tuple[str, ...], stats: Tuple[str, ...], other: Tuple[str, ...])[source]
Bases:
object
Container class for storing information on unit, sufficient statistics and other index levels in observed and null model data.
- other: Tuple[str, ...]
- stats: Tuple[str, ...]
- units: Tuple[str, ...]
- __call__(graph: GraphABC, _stats: Optional[ndarray] = None, **kwds: Any) Union[Series, DataFrame] [source]
This method should be called to actually calculate graph statistics.
- Parameters
graph – Graph-like object to calculate statistics for.
stats – Array of sufficient statistics for nodes. If
None
then self.model.statistics is used.
**kwds – Passed to the graph statistics function.
- static add_index(data: Union[Series, DataFrame], idx: Mapping[str, Sequence], *, prepend: bool = False, drop_unnamed: bool = True, copy: bool = True) Union[Series, DataFrame] [source]
Add index to a data frame or series.
- Parameters
data – Data frame or series.
idx – Mapping from index names to sequences of values.
prepend – Should new indexes be prepended or appended to the existing indexes.
drop_unnamed – Should unnamed indexes be dropped during the process. Unnamed indexes are usually generic indexes which are redundant after adding additional indexes.
copy – Should a copy be returned.
- add_stats_index(data: Union[Series, DataFrame], stats: Optional[ndarray] = None) Union[Series, DataFrame] [source]
Add indexes with sufficient statistics.
- Parameters
data – Data frame or series with graph statistics.
stats – Array of sufficient statistics. Use
self.model.statistics
if
None
.
- static adjust_pvalues(pvals: Data, *, alpha: float = 0.05, copy: bool = True, **kwds: Any) Data [source]
Adjust p-values for multiple testing.
The Benjamini-Hochberg-Yekutieli two-stage procedure implemented in
statsmodels.stats.multitest.fdrcorrection_twostage()
is used.
- Parameters
pvals – Data frame / series with p-values for different coefficients in columns.
alpha – Desired type I error rate after the adjustment.
copy – Should a copy of
pvals
be returned.
**kwds – Additional arguments passed to
statsmodels.stats.multitest.fdrcorrection_twostage()
- estimate_pvalues(data: pd.DataFrame, null: pd.DataFrame, *, alternative: Literal[_alternative] = 'greater', adjust: bool = True, resolution: int = 1000, **kwds: Any) Data [source]
Estimate p-values of node/edge/global coefficients based on sampling from a configuration model (as returned by
init_comparison()
).
- Parameters
data – Data frame with observed graph statistics.
null – Data frame with the simulated null distribution of graph statistics.
alternative – Type of test to perform. Currently only one-sided tests are supported.
adjust – Should p-values be adjusted. Benjamini-Hochberg FDR correction is used by default when
True
.
resolution – Resolution of the p-value estimation. It specifies the number of quantiles to compare observed values against. For instance, if
resolution=100
then p-values will be accurate only up to
0.01
. This parameter controls the amount of memory consumed by the estimation process.
**kwds – Passed as additional arguments to
adjust_pvalues()
when
adjust=True
.
See also
adjust_pvalues
p-value adjustment method
- Returns
P-values for statistics as
pandas.Series
(for one graph statistic) or
pandas.DataFrame
(for multiple statistics).
- Return type
pvalues
- filter_index(data: Data, target: Data, *, how: Literal[_filter_index] = 'values', levels: Optional[Union[Sequence[str], Sequence[int]]] = None) Data [source]
Filter
data
by index with respect to
target
.
- Parameters
data – Data to filter.
target – Dataset with target index.
how – How the index should be filtered: either by unique combinations of values or just constrained to the range of values for separate levels in
target
.
levels – Levels to use for filtering. If
None
then either
self.levels.units
or
self.levels.stats
is used, depending on the value of
self.aggregate_by
.
- Returns
Filtered copy of
data
.
- Return type
data
- index_names = ('i', 'j')
- init_comparison(n: int, *, filter_index: Union[bool, Literal[_filter_index]] = False, sample_index: bool = False, null_kws: Optional[Dict] = None, **kwds: Any) Tuple[Data, Data, Levels] [source]
Initialize data for a comparison with a null model and determine index level names.
- Parameters
n – Number of null model samples.
filter_index – If
True
or
"values"
then
null
will be filtered to contain only observations with index values matching those in
data
, with the levels used for the comparison selected based on
self.aggregate_by
. If
"range"
then null model samples will be filtered to be within the range of index values in the observed data.
sample_index – If
False
then the
_sample
index with sample ids is dropped from the
null
data frame with null model samples.
null_kws – Keyword args passed to
simulate_null()
.
**kwds – Passed to the
statistics()
method used for calculating the statistics of interest.
Notes
Estimating distributions of edge-wise statistics conditional on the sufficient statistics of the participating nodes may require a really large number of samples and in general is not feasible for large networks (in particular weighted ones). The same applies, although probably to a slightly lesser degree, to node-wise statistics when
use_observed_stats=False
is passed to
simulate_null()
.
Efficient methods for solving these problems will be implemented in the future.
- Returns
data – Observed graph statistics.
null – Null distribution samples.
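A sketch of passing options through, reusing the objects from the module-level example above:
>>> data, null = infer.init_comparison(
...     100,
...     filter_index="range",          # keep null samples within the observed index ranges
...     null_kws={"progress": False},  # forwarded to simulate_null()
... )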
- property n_nodes: int
Number of nodes in the observed network.
- postprocess(data: Union[Series, DataFrame], target: Union[Series, DataFrame]) Union[Series, DataFrame] [source]
Postprocess data after running a comparison.
This mainly involves sanitizing index names after aggregation as well as setting the proper shape and types for outputs, so that
data
has the same general form as
target
.
- postprocess_index(data: Union[Series, DataFrame]) Union[Series, DataFrame] [source]
Postprocess index after running a comparison.
This involves getting rid of temporary index names used when running comparisons as well as any unnamed indexes. Moreover, sufficient statistics indexes are removed if
self.aggregate_by == "units"
, and observed values of sufficient statistics are used in the index (instead of coarse-grained values) if
self.aggregate_by == "stats"
.
- simulate_null(n: int, *, progress: bool = False, progress_kws: Optional[Dict] = None, use_observed_stats: bool = True, **kwds: Any) Union[Series, DataFrame] [source]
Get data frame of null model samples of structural coefficients.
- Parameters
n – Number of samples.
progress – Should a progress bar be shown.
progress_kws – Keyword arguments for customizing the progress bar when
progress=True
. Passed to
tqdm.tqdm()
.
use_observed_stats – If
True
then simulated data is indexed with sufficient statistics from the observed network. This often helps to accumulate enough observations faster, at the expense of not fully exact conditioning.
**kwds – Keyword arguments passed to
statistics()
.
- Returns
Data frame with simulated null distribution.
- Return type
null
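A direct usage sketch, again assuming the fitted model from the module-level example (the exact layout of the returned frame depends on the statistics function):
>>> null = infer.simulate_null(100, progress=True, use_observed_stats=True)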
pathcensus.pathcensus module
PathCensus
is the basis from which structural similarity and
complementarity coefficients are derived. In its raw form
it is a set of counts of wedge and head triples (2-paths) and quadruples
(3-paths) traversing an (i, j)
edge, as well as counts of the corresponding
triangles (3-cycles) and quadrangles (4-cycles).
In the weighted case there are separate counts of triangles
and quadrangles for wedge and head paths, as average weights are defined
differently for these two cases.
Note
Path/cycle counts and structural coefficients are returned
as pandas.Series
or pandas.DataFrame
objects indexed
properly with integer node indices corresponding to the ordering of
rows in the underlying adjacency matrix of the network.
Note
Path census calculations are relatively efficient as the main
workhorse functions are just-in-time (JIT) compiled to highly
optimized machine code using the numba
package. Moreover,
the path census algorithm is based on a state-of-the-art
graphlet counting algorithm proposed by
Ahmed et al. [ANRD15]
which has worst-case asymptotic computational complexity of
\(O(md^2_{\text{max}})\), where \(m\) is the number
of edges and \(d_{\text{max}}\) is the maximum degree.
Moreover, some additional optimizations are used to speed up
the calculations in the case of highly heterogeneous degree
distributions (e.g. power laws). See min_di
argument in
PathCensus.count_paths()
.
Node- and graph-level counts are derived from edge-level counts according
to simple aggregation rules. They are defined in the definition classes
implemented in pathcensus.definitions
submodule.
See also
pathcensus.definitions.PathDefinitionsUnweighted
for the naming scheme used for unweighted counts.
pathcensus.definitions.PathDefinitionsWeighted
for the naming scheme used for weighted counts.
Below, a node-level census is computed for a simple triangle graph.
>>> import numpy as np
>>> from pathcensus import PathCensus
>>> G = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]])
>>> P = PathCensus(G)
>>> # Census calculations can be also parallelized
>>> # by default all available threads are used
>>> P = PathCensus(G, parallel=True)
>>> # But the number of threads can be set explicitly
>>> P = PathCensus(G, num_threads=2)
>>> P.census("nodes")
t tw th q0 qw qh
i
0 1 2 2 0 0 0
1 1 2 2 0 0 0
2 1 2 2 0 0 0
Structural similarity
Structural similarity coefficients (PathCensus.similarity()
)
as well as their corresponding clustering (PathCensus.tclust()
)
and closure coefficients (PathCensus.tclosure()
)
are defined quite simply in terms of ratios of 3-cycles (triangles)
to 2-paths (triples) counted at the level of edges, nodes or globally within an
entire graph. The figure below presents a summary of the underlying geometric
motivation as well as the main properties of structural similarity
coefficients, including the differences relative to local clustering and
closure coefficients
[WS98, YBL19].
Below, node-wise structural similarity coefficients are computed for a simple triangle graph.
>>> import numpy as np
>>> from pathcensus import PathCensus
>>> G = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]])
>>> P = PathCensus(G)
>>> P.similarity("nodes")
i
0 1.0
1 1.0
2 1.0
dtype: float64
Structural complementarity
Structural complementarity (PathCensus.complementarity()
)
coefficients and their corresponding clustering
(PathCensus.qclust()
) and closure coefficients
(PathCensus.qclosure()
) are defined in terms of ratios of 4-cycles
(quadrangles) to 3-paths (quadruples) and can be defined at the levels
of edges, nodes and entire graphs. The figure below presents a summary
of the underlying geometric motivation and some of the main properties.
Below we calculate complementarity coefficients for the nodes of a 4-clique graph. Note that the complementarity coefficients are all zero, as the graph contains no quadrangles without chordal edges.
>>> import numpy as np
>>> from pathcensus import PathCensus
>>> G = np.array([[0,1,1,1],[1,0,1,1],[1,1,0,1],[1,1,1,0]])
>>> P = PathCensus(G)
>>> P.complementarity("nodes")
i
0 0.0
1 0.0
2 0.0
3 0.0
dtype: float64
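By contrast, a chordless quadrangle (4-cycle) yields maximal complementarity:
>>> G = np.array([[0,1,0,1],[1,0,1,0],[0,1,0,1],[1,0,1,0]])
>>> PathCensus(G).complementarity("nodes")
i
0 1.0
1 1.0
2 1.0
3 1.0
dtype: float64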
Weighted coefficients
Structural coefficients can also be defined for weighted networks, in which case paths and cycles are weighted by the arithmetic average of the edge weights defining the underlying path (so closing or chordal edges in triangles/quadrangles are ignored). This can be seen as an extension of the weighted clustering coefficient proposed by Barrat et al. [BBPastorSatorrasV04]. Indeed, our formulation of weighted clustering based on triangles is equivalent to it. The figure below presents a summary of the weighting rules.
Edge weights should be detected automatically in most cases, provided that
the standard edge weight attribute name ("weight"
) is used.
However, weighted computations may also be enabled/disabled explicitly
using the
weighted
argument.
>>> import numpy as np
>>> from pathcensus import PathCensus
>>> G = np.array([[0,2,3],[2,0,11],[3,11,0]])
>>> PathCensus(G).census("nodes")
twc thc tw th q0wc q0hc qw qh
i
0 2.5 6.75 5.0 13.5 0.0 0.0 0.0 0.0
1 6.5 4.75 13.0 9.5 0.0 0.0 0.0 0.0
2 7.0 4.50 14.0 9.0 0.0 0.0 0.0 0.0
>>> PathCensus(G, weighted=False).census("nodes")
t tw th q0 qw qh
i
0 1 2 2 0 0 0
1 1 2 2 0 0 0
2 1 2 2 0 0 0
- class pathcensus.pathcensus.PathCensus(graph: GraphABC, weighted: Optional[bool] = None, validate: bool = True, adj_kws: Optional[Dict] = None, count_paths: bool = True, **kwds: Any)[source]
Bases:
object
Path census and structural coefficient calculations for undirected graphs.
- graph
pathcensus.core.graph.Graph
instance for calculating path census.
- counts
Data frame with path/cycle counts per edge. Initialization may be postponed.
Notes
The naming scheme used for denoting counts is documented in the docstring of the
definitions
attribute (i.e.
self.definitions
).
- class Meta[source]
Bases:
object
Container class with various metadata such as lists of possible values of arguments of different methods of
PathCensus
.
Fields
- mode
Allowed values of
mode
argument in structural coefficients methods.
- undef
Allowed values of
undefined
argument in structural coefficients methods.
- mode = ('nodes', 'edges', 'global')
- undef = ('nan', 'zero')
- census(mode: Literal[Meta.mode] = 'nodes', *, counts: Optional[pd.DataFrame] = None) pd.DataFrame [source]
Calculate path census.
- Parameters
mode – Should node, edge or global counts be calculated.
counts – Path counts data frame to use. Mostly for internal use.
Examples
>>> # Triangle graph
>>> A = np.array([[0,1,1], [1,0,1], [1,1,0]])
>>> PathCensus(A).census()
t tw th q0 qw qh
i
0 1 2 2 0 0 0
1 1 2 2 0 0 0
2 1 2 2 0 0 0
- coefs(mode: Literal[Meta.mode] = 'nodes', **kwds) pd.DataFrame [source]
Calculate structural coefficients.
- Parameters
mode – Should node, edge or global counts be calculated.
undefined – If
'nan'
then nodes with undefined values are treated as NaNs. If
'zero'
then they are considered zeros.
census – If
True
then path census data is added.
counts – Path counts data frame to use. Mostly for internal use.
- compcoefs(mode: Literal[Meta.mode] = 'nodes', *, undefined: Literal[Meta.undef] = 'nan', census: bool = False, counts: Optional[pd.DataFrame] = None) pd.DataFrame [source]
Calculate complementarity coefficients including clustering and closure coefficients when
mode="nodes"
or their node-wise averages whenmode="global"
.- Parameters
mode – Should node, edge or global counts be calculated.
undefined – If
'nan'
then nodes with undefined values are treated as NaNs. If
'zero'
then they are considered zeros.
census – If
True
then path census data is added.
counts – Path counts data frame to use. Mostly for internal use.
Examples
>>> # Quadrangle graph
>>> A = np.array([[0,1,0,1],[1,0,1,0],[0,1,0,1],[1,0,1,0]])
>>> PathCensus(A).compcoefs("edges")
comp
i j
0 1 1.0
  3 1.0
1 0 1.0
  2 1.0
2 1 1.0
  3 1.0
3 0 1.0
  2 1.0
>>> PathCensus(A).compcoefs("nodes")
comp qclust qclosure
i
0 1.0 1.0 1.0
1 1.0 1.0 1.0
2 1.0 1.0 1.0
3 1.0 1.0 1.0
>>> PathCensus(A).compcoefs("global")
comp_g comp qclust qclosure
0 1.0 1.0 1.0 1.0
- complementarity(mode: Literal[Meta.mode] = 'nodes', *, undefined: Literal[Meta.undef] = 'nan', counts: Optional[pd.DataFrame] = None) Union[pd.Series, float] [source]
Structural complementarity coefficients.
- Parameters
mode – Should it be calculated for nodes, edges or globally (equivalent to global clustering).
undefined – If
'nan'
then nodes with undefined values are treated as NaNs. If
'zero'
then they are considered zeros.
counts – Path counts data frame to use. Mostly for internal use.
Notes
The node-wise coefficient is defined as the ratio of quadrangles including a focal node
i
to the total number of both wedge and head quadruples:
\[c_i = \frac{4Q_i}{q^W_i + q^H_i}\]
The edge-wise coefficient is defined as the ratio of quadrangles including an
(i, j)
edge to the number of quadruples starting at it:
\[c_{ij} = \frac{2Q_{ij}}{q_{ij}}\]
The global coefficient is defined as the ratio of the sum of quadrangles to the sum of quadruples (wedge or head):
\[c = \frac{2\sum_i Q_i}{\sum_i q^W_i} = \frac{2\sum_i Q_i}{\sum_i q^H_i}\]
Examples
>>> # Quadrangle graph
>>> A = np.array([[0,1,0,1],[1,0,1,0],[0,1,0,1],[1,0,1,0]])
>>> PathCensus(A).complementarity("edges")
i j
0 1 1.0
  3 1.0
1 0 1.0
  2 1.0
2 1 1.0
  3 1.0
3 0 1.0
  2 1.0
dtype: float64
>>> PathCensus(A).complementarity("nodes")
i
0 1.0
1 1.0
2 1.0
3 1.0
dtype: float64
>>> PathCensus(A).complementarity("global")
1.0
- count(**kwds: Any) None [source]
Count paths and set self.counts attribute.
**kwds
are passed tocount_paths()
.
- classmethod count_paths(graph: Union[GraphABC, Graph], *, parallel: Optional[bool] = None, num_threads: Optional[int] = None, graph_kws: Optional[Dict] = None, min_di: bool = True, **kwds: Any) Tuple[int, ndarray] [source]
Count paths and cycles in a graph.
- Parameters
graph –
pathcensus.core.graph.Graph
instance or a graph-like object that can be converted to it.
parallel – Should the parallel counting algorithm be used. When
None
it is used by default for graphs with at least one million edges.
num_threads – Number of threads to use when
parallel=True
.
batch_size – Batch size to use when running with
parallel=True
.
graph_kws – Additional keyword arguments passed to
get_graph()
. Used only when the graph is not already in the JIT-compiled form.
min_di – Should the di < dj rule for iterating over edges be used. This way the most expensive loop of the PathCensus algorithm for computing edge-wise path/cycle counts always iterates over the neighbors of the lower-degree node in an
(i, j)
edge. It should almost always be set to
True
; the argument is used mostly for testing purposes.
**kwds – Passed to
pathcensus.core.parallel.count_paths_parallel()
when
parallel=True
.
Notes
The
parallel=True
argument may not work and may lead to segmentation faults on some macOS machines.
- Returns
n_nodes – Number of nodes.
counts – Path and cycle counts.
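A usage sketch of calling the counting routine directly (most users should rely on the PathCensus constructor instead):
>>> import numpy as np
>>> from pathcensus import PathCensus
>>> G = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]])
>>> n_nodes, counts = PathCensus.count_paths(G, parallel=False)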
- dump() Tuple[int, ndarray, Optional[ndarray], Optional[ndarray]] [source]
Dump raw data in the form of arrays and the number of nodes.
- Returns
n_nodes – Number of nodes.
E – Edgelist array.
W – Optional edge weights array.
counts – Path counts array. May be
None
.
- property ecount: int
- classmethod from_dump(n_nodes: int, E: ndarray, W: Optional[ndarray] = None, counts: Optional[ndarray] = None, adj_kws: Optional[Dict] = None, **kwds: Any) PathCensus [source]
Construct from the output of
dump()
.
- Parameters
n_nodes – Number of nodes.
E – Edgelist array.
W – Optional edge weights array.
counts – Optional path counts array. It is calculated on-the-fly when
None
.
adj_kws – Passed to
pathcensus.utils.adjacency()
.
**kwds – Passed to
count_paths()
when counts is
None
.
- get_counts(mode: Literal[Meta.mode] = 'nodes') pd.DataFrame [source]
Get (possibly aggregated) path counts.
- Parameters
mode – Should node, edge or global counts be calculated.
- classmethod get_graph(graph: GraphABC, weighted: Optional[bool] = None, validate: bool = True, **kwds: Any) Graph [source]
Get graph object for path counting.
- Parameters
graph – A compatible graph object registered with
pathcensus.graph.GraphABC
.
n_nodes – Number of nodes. Used only when the graph is passed as an edgelist.
weighted – Should the graph be interpreted as a weighted graph. If
None
then it is determined based on the number of unique non-zero values in the adjacency matrix.
validate – Should the input graph be validated for correctness (i.e. checked whether it is undirected).
**kwds – Passed to
pathcensus.utils.adjacency()
.
- property n_edges: int
- property n_nodes: int
- qclosure(*, undefined: Literal[Meta.undef] = 'nan', counts: Optional[pd.DataFrame] = None) pd.Series [source]
Quadrangle-based local closure coefficient.
- Parameters
undefined – If
'nan'
then nodes with undefined values are treated as NaNs. If
'zero'
then they are considered zeros.
counts – Path counts data frame to use. Mostly for internal use.
Notes
It is defined as the ratio of quadrangles including a focal node
i
to the number of head quadruples starting from it:
\[c^H_i = \frac{2Q_i}{q^H_i}\]
Examples
>>> # Quadrangle graph
>>> A = np.array([[0,1,0,1],[1,0,1,0],[0,1,0,1],[1,0,1,0]])
>>> PathCensus(A).qclosure()
i
0 1.0
1 1.0
2 1.0
3 1.0
dtype: float64
- qclust(*, undefined: Literal[Meta.undef] = 'nan', counts: Optional[pd.DataFrame] = None) pd.Series [source]
Quadrangle-based local clustering coefficient (node-wise).
- Parameters
undefined – If
'nan'
then nodes with undefined values are treated as NaNs. If
'zero'
then they are considered zeros.
counts – Path counts data frame to use. Mostly for internal use.
Notes
It is defined as the ratio of quadrangles including a focal node
i
to the number of wedge quadruples with
i
at the second position (this is to avoid double counting and to make the number of wedge and head quadruples per quadrangle equal):
\[c^W_i = \frac{2Q_i}{q^W_i}\]
Examples
>>> # Quadrangle graph
>>> A = np.array([[0,1,0,1],[1,0,1,0],[0,1,0,1],[1,0,1,0]])
>>> PathCensus(A).qclust()
i
0 1.0
1 1.0
2 1.0
3 1.0
dtype: float64
- property qdf: DataFrame
Data frame with quadruple/quadrangle counts per edge.
- simcoefs(mode: Literal[Meta.mode] = 'nodes', *, undefined: Literal[Meta.undef] = 'nan', census: bool = False, counts: Optional[pd.DataFrame] = None) pd.DataFrame [source]
Calculate similarity coefficients including clustering and closure coefficients when
mode="nodes"
or their node-wise averages whenmode="global"
.- Parameters
mode – Should node, edge or global counts be calculated.
undefined – If
'nan'
then nodes with undefined values are treated as NaNs. If
'zero'
then they are considered zeros.
census – If
True
then path census data is added as columns at the front of the data frame.
counts – Path counts data frame to use. Mostly for internal use.
Examples
>>> # Triangle graph
>>> A = np.array([[0,1,1], [1,0,1], [1,1,0]])
>>> PathCensus(A).simcoefs("edges")
sim
i j
0 1 1.0
  2 1.0
1 0 1.0
  2 1.0
2 0 1.0
  1 1.0
>>> PathCensus(A).simcoefs("nodes")
sim tclust tclosure
i
0 1.0 1.0 1.0
1 1.0 1.0 1.0
2 1.0 1.0 1.0
>>> PathCensus(A).simcoefs("global")
sim_g sim tclust tclosure
0 1.0 1.0 1.0 1.0
- similarity(mode: Literal[Meta.mode] = 'nodes', *, undefined: Literal[Meta.undef] = 'nan', counts: Optional[pd.DataFrame] = None) Union[pd.Series, float] [source]
Structural similarity coefficients.
- Parameters
mode – Should it be calculated for nodes, edges or globally (equivalent to global clustering).
undefined – If
'nan'
then nodes with undefined values are treated as NaNs. If
'zero'
then they are considered zeros.
counts – Path counts data frame to use. Mostly for internal use.
Notes
It is defined as the ratio of triangles including a focal node
i
to the total number of both wedge and head triples:
\[s_i = \frac{4T_i}{t^W_i + t^H_i}\]
Examples
>>> # Triangle graph
>>> A = np.array([[0,1,1], [1,0,1], [1,1,0]])
>>> PathCensus(A).similarity("edges")
i j
0 1 1.0
  2 1.0
1 0 1.0
  2 1.0
2 0 1.0
  1 1.0
dtype: float64
>>> PathCensus(A).similarity("nodes")
i
0 1.0
1 1.0
2 1.0
dtype: float64
>>> PathCensus(A).similarity("global")
1.0
- property strength: ndarray
Get strength sequence of the underlying graph (or degree sequence in the unweighted case).
- tclosure(*, undefined: Literal[Meta.undef] = 'nan', counts: Optional[pd.DataFrame] = None) pd.Series [source]
Triangle-based local closure coefficient (node-wise).
It is equivalent to local closure coefficient [YBL19].
- Parameters
undefined – If
'nan'
then nodes with undefined values are treated as NaNs. If
'zero'
then they are considered zeros.
counts – Path counts data frame to use. Mostly for internal use.
Notes
It is defined as the ratio of the number of triangles including a focal node
i
to the number of head triples starting from it:
\[s^H_i = \frac{2T_i}{t^H_i}\]
Examples
>>> # Triangle graph
>>> A = np.array([[0,1,1],[1,0,1],[1,1,0]])
>>> PathCensus(A).tclosure()
i
0 1.0
1 1.0
2 1.0
dtype: float64
- tclust(*, undefined: Literal[Meta.undef] = 'nan', counts: Optional[pd.DataFrame] = None) pd.Series [source]
Triangle-based local clustering (node-wise).
It is equivalent to local clustering coefficient [WS98].
- Parameters
undefined – If
'nan'
then nodes with undefined values are treated as NaNs. If
'zero'
then they are considered zeros.
counts – Path counts data frame to use. Mostly for internal use.
Notes
It is defined as the ratio of triangles including a focal node
i
to the number of wedge triples centered at it:
\[s^W_i = \frac{2T_i}{t^W_i}\]
Examples
>>> # Triangle graph
>>> A = np.array([[0,1,1], [1,0,1], [1,1,0]])
>>> PathCensus(A).tclust()
i
0 1.0
1 1.0
2 1.0
dtype: float64
- property tdf: DataFrame
Data frame with triple/triangle counts per edge.
- property vcount: int
- property weighted: bool
pathcensus.types module
Custom type definitions.
pathcensus.utils module
Utility functions.
- pathcensus.utils.relclose(x1: ndarray, x2: ndarray, rtol: float = 1e-06) ndarray [source]
Are two arrays relatively close.
rtol
defines the maximum allowed difference between
x1
and
x2
relative to the magnitude of
x2
.
- pathcensus.utils.relerr(x1: ndarray, x2: ndarray) ndarray [source]
Relative error
|(x1 - x2)| / |x2|
.
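A usage sketch of the two helpers (illustrative values; outputs not shown):
>>> import numpy as np
>>> from pathcensus.utils import relclose, relerr
>>> x1 = np.array([1.0, 2.0])
>>> x2 = np.array([1.0, 2.0 + 1e-8])
>>> ok = relclose(x1, x2)   # elementwise, with the default rtol=1e-6
>>> err = relerr(x1, x2)    # |(x1 - x2)| / |x2|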
- pathcensus.utils.rowsums(X: Union[ndarray, spmatrix]) ndarray [source]
Calculate row sums of a matrix.
- pathcensus.utils.set_seed(all: Optional[int] = None, *, random: Optional[int] = None, numpy: Optional[int] = None, numba: Optional[int] = None) None [source]
Set seeds of random number generators.
- Parameters
random – Seed value for
random
generator.numpy – Seed value for
numpy
generator.numba – Seed value for py:mod:numba generator.
all – Seed value used for all generators. Cannot be used jointly with other arguments.
- Raises
ValueError – If ‘all’ is used with other arguments or no seed is set.
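A usage sketch; passing only the first argument seeds all generators at once:
>>> from pathcensus.utils import set_seed
>>> set_seed(303)                          # seeds random, numpy and numba jointly
>>> set_seed(random=1, numpy=2, numba=3)   # or seed each generator separately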