Approximate statistical inference
Note
Additional examples and a demonstration of the correctness of the inference submodule are presented in the examples subfolder of the GitHub repository.
Approximate inference for arbitrary graph statistics, including structural coefficients, can be conducted using samples from appropriate Exponential Random Graph Models. The following generic algorithm can be used to solve a wide range of inferential problems:
Calculate statistics of interest on an observed graph.
Sample R randomized instances from an appropriate null model.
Calculate graph statistics on null model samples.
Compare observed and null model values.
The Inference class implements the above approach. It is compatible with any registered class of graph-like objects and any properly implemented subclass of pathcensus.nullmodels.base.ERGM representing a null model to sample from.
See also
pathcensus.graph for seamless pathcensus integration with arbitrary graph-like classes.
pathcensus.nullmodels for available null models.
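The generic algorithm above can be sketched in plain NumPy, without pathcensus. This is only an illustration under stated assumptions: the null model is a simple Erdős–Rényi graph with the observed density (standing in for a fitted ERGM), the statistic is the triangle count, and the helper names `triangle_count`, `sample_er` and `monte_carlo_pvalue` are hypothetical and not part of the pathcensus API.

```python
import numpy as np

def triangle_count(A):
    """Number of triangles in an undirected simple graph (dense 0/1 adjacency)."""
    return int(np.trace(A @ A @ A) // 6)

def sample_er(n, p, rng):
    """Sample a symmetric 0/1 adjacency matrix without self-loops from G(n, p)."""
    upper = np.triu(rng.random((n, n)) < p, k=1)
    return (upper | upper.T).astype(int)

def monte_carlo_pvalue(A, statistic, n_samples, rng):
    """One-sided ("greater") Monte Carlo p-value against an ER null model."""
    n = A.shape[0]
    p = A.sum() / (n * (n - 1))          # observed edge density
    observed = statistic(A)              # step 1: statistic on observed graph
    null = np.array([                    # steps 2-3: sample and recompute
        statistic(sample_er(n, p, rng)) for _ in range(n_samples)
    ])
    # step 4: compare; the +1 terms keep the estimate strictly positive
    pvalue = (1 + np.sum(null >= observed)) / (n_samples + 1)
    return observed, null, pvalue

rng = np.random.default_rng(0)
A = sample_er(50, 0.1, rng)              # "observed" graph drawn from the null
observed, null, pvalue = monte_carlo_pvalue(A, triangle_count, 200, rng)
```

Because the observed graph here is itself drawn from the null model, the resulting p-value should typically be unremarkable. With a real analysis the ER sampler would be replaced by samples from a fitted null model.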
This simulation-based approach is relatively efficient for graph- and node-level statistics but can be very computationally expensive when used for edge-level analyses. Hence, in this case it is often useful to apply a coarse-graining strategy that reduces the number of unique combinations of values of sufficient statistics.
See also
pathcensus.graph.GraphABC for the abstract class for graph-like objects.
pathcensus.nullmodels for compatible ERGM classes.
pathcensus.inference.Inference.coarse_grain() for coarse-graining methods.
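To illustrate why coarse-graining helps, the sketch below assumes a degree-based null model in which each node pair is characterised by the degrees of its two endpoints, and maps degrees to logarithmically spaced bins. The function `coarse_grain_degrees` and the randomly generated data are hypothetical; this is not the implementation of `Inference.coarse_grain()`, whose exact strategy is not described here.

```python
import numpy as np

def coarse_grain_degrees(degrees, n_bins):
    """Map each (positive) degree to the index of a log-spaced bin."""
    degrees = np.asarray(degrees)
    edges = np.geomspace(degrees.min(), degrees.max(), n_bins + 1)
    # Use interior bin edges only, so indices run from 0 to n_bins - 1
    return np.digitize(degrees, edges[1:-1])

rng = np.random.default_rng(1)
# Hypothetical data drawn independently, for illustration only
degrees = rng.integers(1, 200, size=500)           # node degrees (all >= 1)
pairs = rng.integers(0, 500, size=(2000, 2))       # node pairs of interest

# Unique unordered combinations of endpoint degrees, before and after binning
raw = np.sort(degrees[pairs], axis=1)
binned = np.sort(coarse_grain_degrees(degrees, 8)[pairs], axis=1)
n_raw = len(np.unique(raw, axis=0))
n_binned = len(np.unique(binned, axis=0))
```

With 8 bins there can be at most 8 * 9 / 2 = 36 unordered bin pairs, so `n_binned` is bounded regardless of how many distinct degree pairs occur in the raw data.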
Below is a simple example of estimating p-values for node-wise structural similarity coefficients in an Erdős–Rényi random graph. The results, of course, should not be statistically significant. We use the default significance level of \(\alpha = 0.05\) and Benjamini-Hochberg FDR correction for multiple testing.
>>> import numpy as np
>>> from scipy import sparse as sp
>>> from pathcensus import PathCensus
>>> from pathcensus.inference import Inference
>>> from pathcensus.nullmodels import UBCM
>>> np.random.seed(34)
>>> # Generate ER random graph (roughly)
>>> A = sp.random(100, 100, density=0.05, dtype=int, data_rvs=lambda n: np.ones(n))
>>> A = (A + A.T).astype(bool).astype(int)
>>> ubcm = UBCM(A)
>>> err = ubcm.fit()
>>> infer = Inference(A, ubcm, lambda g: PathCensus(g).similarity())
>>> data, null = infer.init_comparison(100)
>>> pvals = infer.estimate_pvalues(data, null, alternative="greater")
>>> # Structural similarity coefficient values
>>> # should be significant in at most 5% of cases
>>> # (BH FDR correction is used)
>>> (pvals <= 0.05).mean() <= 0.05
True
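For readers unfamiliar with the Benjamini-Hochberg correction mentioned above, the standard step-up adjustment can be sketched in NumPy as follows. The function name `benjamini_hochberg` is hypothetical, and whether pathcensus computes the adjustment in exactly this way is an assumption.

```python
import numpy as np

def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values in the original order."""
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)
    # Scale the k-th smallest p-value by m / k
    scaled = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity, working down from the largest p-value
    adjusted = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.clip(adjusted, 0.0, 1.0)
    out = np.empty(m)
    out[order] = adjusted
    return out

adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
significant = adjusted <= 0.05
```

Comparing adjusted p-values with the significance level controls the expected proportion of false discoveries among the rejected hypotheses rather than the per-test error rate.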