Approximate statistical inference

Note

Additional examples and a demonstration of the correctness of the inference submodule are available in the examples subfolder of the GitHub repository.

Approximate inference for arbitrary graph statistics, including structural coefficients, can be conducted using samples from appropriate Exponential Random Graph Models. The following generic algorithm can be used to solve a wide range of inferential problems:

Calculate statistics of interest on an observed graph.
Sample R randomized instances from an appropriate null model.
Calculate graph statistics on null model samples.
Compare observed and null model values.
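
The four steps above can be sketched in plain NumPy. This is only an illustration under stated assumptions, not the implementation used by the Inference class: the null model is a simple Erdős–Rényi graph with matched density (a stand-in for a fitted ERGM), the statistic is the triangle count, and the helper names `sample_er`, `triangle_count` and `monte_carlo_pvalue` are hypothetical.

```python
import numpy as np

def sample_er(n, p, rng):
    """Sample a symmetric 0/1 adjacency matrix of an undirected ER graph (no self-loops)."""
    upper = np.triu(rng.random((n, n)) < p, k=1)
    return (upper | upper.T).astype(int)

def triangle_count(A):
    """Number of triangles in an undirected simple graph: trace(A^3) / 6."""
    return int(np.trace(A @ A @ A)) // 6

def monte_carlo_pvalue(observed, null_values):
    """One-sided ("greater") Monte Carlo p-value with the usual +1 correction."""
    null_values = np.asarray(null_values)
    return (1 + np.sum(null_values >= observed)) / (1 + null_values.size)

rng = np.random.default_rng(34)
n, R = 60, 200

# Stand-in for an observed graph (assumption: here it is itself an ER graph)
A = sample_er(n, 0.1, rng)

# 1. Calculate the statistic of interest on the observed graph
observed = triangle_count(A)

# 2. Sample R randomized instances from the null model
#    (ER graph with the same expected density as the observed graph)
p_hat = A.sum() / (n * (n - 1))
samples = [sample_er(n, p_hat, rng) for _ in range(R)]

# 3. Calculate the statistic on the null model samples
null = [triangle_count(S) for S in samples]

# 4. Compare observed and null model values
pval = monte_carlo_pvalue(observed, null)
```

The `+1` terms in `monte_carlo_pvalue` keep the estimate strictly positive, so a finite number of samples can never produce a p-value of exactly zero.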

The Inference class implements the above approach. It is compatible with any registered class of graph-like objects and any properly implemented subclass of pathcensus.nullmodels.base.ERGM representing a null model to sample from.

See also

pathcensus.graph.GraphABC for the abstract class for graph-like objects.

pathcensus.nullmodels for compatible ERGM classes.

pathcensus.inference.Inference.coarse_grain() for coarse-graining methods.

Below is a simple example that estimates p-values for node-wise structural similarity coefficients in an Erdős–Rényi random graph. Because the graph is random, the coefficients should not be statistically significant. We use the default significance level of \(\alpha = 0.05\) and the Benjamini-Hochberg FDR correction for multiple testing.

>>> import numpy as np
>>> from scipy import sparse as sp
>>> from pathcensus import PathCensus
>>> from pathcensus.inference import Inference
>>> from pathcensus.nullmodels import UBCM
>>> np.random.seed(34)
>>> # Generate ER random graph (roughly)
>>> A = sp.random(100, 100, density=0.05, dtype=int, data_rvs=lambda n: np.ones(n))
>>> A = (A + A.T).astype(bool).astype(int)
>>> ubcm = UBCM(A)
>>> err = ubcm.fit()
>>> infer = Inference(A, ubcm, lambda g: PathCensus(g).similarity())
>>> data, null = infer.init_comparison(100)
>>> pvals = infer.estimate_pvalues(data, null, alternative="greater")
>>> # Structural similarity coefficients should be
>>> # significant in no more than 5% of cases
>>> # (BH FDR correction is used)
>>> (pvals <= 0.05).mean() <= 0.05
True
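
For readers unfamiliar with the Benjamini-Hochberg correction mentioned above, the adjustment can be sketched in a few lines of NumPy. This is a minimal, generic version of the textbook procedure and makes no claim about how Inference applies the correction internally; the function name `bh_adjust` is hypothetical.

```python
import numpy as np

def bh_adjust(pvals):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    pvals = np.asarray(pvals, dtype=float)
    m = pvals.size
    order = np.argsort(pvals)
    # Scale the k-th smallest p-value by m / k
    scaled = pvals[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity, working from the largest p-value downwards
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.clip(scaled, 0.0, 1.0)
    return adjusted

raw = [0.01, 0.04, 0.03, 0.005]
adjusted = bh_adjust(raw)
significant = adjusted <= 0.05
```

Comparing adjusted p-values against \(\alpha\) controls the expected share of false discoveries among the rejected hypotheses, rather than the error rate of each test separately.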