Approximate statistical inference

Note

Additional examples and a demonstration of the correctness of the inference submodule are available in the examples subfolder of the GitHub repository.

Approximate inference for arbitrary graph statistics, including structural coefficients, can be conducted using samples from appropriate Exponential Random Graph Models. The following generic algorithm can be used to solve a wide range of inferential problems:

  1. Calculate statistics of interest on an observed graph.

  2. Sample R randomized instances from an appropriate null model.

  3. Calculate graph statistics on null model samples.

  4. Compare observed and null model values.
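The four steps above can be sketched with plain NumPy. This is only an illustration of the generic algorithm, not the implementation used by this module: the helper names (`triangle_count`, `sample_er`, `monte_carlo_pvalue`) are hypothetical, the statistic is a simple triangle count, and a density-matched Erdős–Rényi sampler stands in for a proper ERGM null model.

```python
import numpy as np

def triangle_count(A):
    """Number of triangles in an undirected simple graph with adjacency matrix A."""
    return int(np.trace(A @ A @ A) // 6)

def sample_er(n, p, rng):
    """Sample a symmetric 0/1 adjacency matrix without self-loops
    (Erdős–Rényi graph; an assumed stand-in for an ERGM sampler)."""
    upper = np.triu(rng.random((n, n)) < p, k=1)
    return (upper | upper.T).astype(int)

def monte_carlo_pvalue(observed, null_values):
    """One-sided ('greater') Monte Carlo p-value with the usual +1 correction."""
    null_values = np.asarray(null_values)
    return (1 + np.sum(null_values >= observed)) / (1 + len(null_values))

rng = np.random.default_rng(0)
n = 50
A = sample_er(n, 0.1, rng)  # stand-in for an observed graph

# 1. Calculate the statistic of interest on the observed graph.
observed = triangle_count(A)

# 2. Sample R randomized instances from the null model
#    (here: ER graphs matching the observed density).
density = A.sum() / (n * (n - 1))
R = 200
null_graphs = [sample_er(n, density, rng) for _ in range(R)]

# 3. Calculate the same statistic on every null model sample.
null_values = [triangle_count(G) for G in null_graphs]

# 4. Compare observed and null model values.
pvalue = monte_carlo_pvalue(observed, null_values)
```

The `+1` terms in `monte_carlo_pvalue` count the observed value as one of the samples, so the estimated p-value can never be exactly zero with a finite number of samples.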

The Inference class implements the above approach. It is compatible with any registered class of graph-like objects and any properly implemented subclass of pathcensus.nullmodels.base.ERGM representing a null model to sample from.

See also

pathcensus.graph for seamless pathcensus integration with arbitrary graph-like classes.

pathcensus.nullmodels for available null models.

This simulation-based approach is relatively efficient for graph- and node-level statistics but can be very computationally expensive when used for edge-level analyses. Hence, in this case it is often useful to apply a coarse-graining strategy to reduce the number of unique combinations of values of sufficient statistics.

See also

pathcensus.graph.GraphABC for the abstract class for graph-like objects.

pathcensus.nullmodels for compatible ERGM classes.

pathcensus.inference.Inference.coarse_grain() for coarse-graining methods.
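To illustrate why coarse-graining helps in edge-level analyses, the sketch below bins a node-level sufficient statistic (node degree, as in a configuration-type model) into power-of-two bins and counts the unique combinations over edges before and after binning. It is a generic illustration and not a description of what coarse_grain() does internally; the degree sequence, the edge list, and the helpers `log2_bins` and `unique_pairs` are all hypothetical.

```python
import numpy as np

def log2_bins(values):
    """Map positive integers to the index of their power-of-two bin:
    1 -> 0, 2-3 -> 1, 4-7 -> 2, 8-15 -> 3, ..."""
    values = np.asarray(values)
    edges = 2 ** np.arange(int(values.max()).bit_length() + 1)
    return np.digitize(values, edges) - 1

def unique_pairs(x, y):
    """Count unique unordered (x, y) combinations."""
    pairs = np.sort(np.column_stack([x, y]), axis=1)
    return len(np.unique(pairs, axis=0))

rng = np.random.default_rng(1)
n = 200
# Hypothetical node degrees between 1 and 64
degrees = rng.integers(1, 65, size=n)
# Hypothetical edge list: random pairs of distinct nodes
i = rng.integers(0, n, size=1000)
j = rng.integers(0, n, size=1000)
keep = i != j
i, j = i[keep], j[keep]

# Unique degree combinations over edges, before and after binning
raw = unique_pairs(degrees[i], degrees[j])
binned = log2_bins(degrees)
coarse = unique_pairs(binned[i], binned[j])
```

With degrees up to 64 there are only 7 bins, so `coarse` is at most 28 regardless of the number of edges, whereas `raw` grows with the variety of degree pairs present in the graph.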

Below is a simple example of estimating p-values for node-wise structural similarity coefficients in an Erdős–Rényi random graph. Since the graph is random, the results should, of course, not be statistically significant. We use the default significance level of \(\alpha = 0.05\) and the Benjamini-Hochberg FDR correction for multiple testing.

>>> import numpy as np
>>> from scipy import sparse as sp
>>> from pathcensus import PathCensus
>>> from pathcensus.inference import Inference
>>> from pathcensus.nullmodels import UBCM
>>> np.random.seed(34)
>>> # Generate ER random graph (roughly)
>>> A = sp.random(100, 100, density=0.05, dtype=int, data_rvs=lambda n: np.ones(n))
>>> A = (A + A.T).astype(bool).astype(int)
>>> ubcm = UBCM(A)
>>> err = ubcm.fit()
>>> infer = Inference(A, ubcm, lambda g: PathCensus(g).similarity())
>>> data, null = infer.init_comparison(100)
>>> pvals = infer.estimate_pvalues(data, null, alternative="greater")
>>> # Structural similarity coefficient values
>>> # should be significant in no more than 5% of cases
>>> # (BH FDR correction is used)
>>> (pvals <= 0.05).mean() <= 0.05
True
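For reference, the Benjamini-Hochberg adjustment mentioned in the example can be written in a few lines of NumPy. This is a minimal sketch of the standard step-up procedure; the function name `benjamini_hochberg` is hypothetical, and how the correction is applied inside the library is not shown here.

```python
import numpy as np

def benjamini_hochberg(pvals):
    """Return Benjamini-Hochberg adjusted p-values for a 1D array of p-values."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    # Scale the k-th smallest p-value by m / k
    ranked = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity, working from the largest p-value down
    adjusted = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.minimum(adjusted, 1.0)
    # Restore the original ordering
    out = np.empty(m)
    out[order] = adjusted
    return out

raw_pvals = [0.01, 0.04, 0.03, 0.005]
adj_pvals = benjamini_hochberg(raw_pvals)
significant = adj_pvals <= 0.05
```

Comparing the adjusted p-values with \(\alpha\) controls the expected proportion of false discoveries among the rejected hypotheses rather than the per-test error rate.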