pathcensus.nullmodels package

Submodules

pathcensus.nullmodels.base module

Exponential Random Graph Models (ERGMs) with local constraints are ERGMs whose sufficient statistics are defined at the level of individual nodes (or globally for the entire graph). In other words, their values for each node can be set independently. Unlike ERGMs with non-local constraints, which are notoriously problematic (e.g. due to degenerate convergence and non-projectivity), they are analytically solvable. Prime examples of ERGMs with local constraints are configuration models, which induce maximum entropy distributions over graphs with N nodes under arbitrary constraints on the expected degree sequence and/or strength sequence.
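
As a worked illustration (a standard textbook form, not taken from the pathcensus source), consider the case where the only constraints are the expected degrees. Assume one Lagrange multiplier \(\theta_i\) per node, adjacency entries \(a_{ij} \in \{0, 1\}\) and degrees \(k_i(G) = \sum_{j \neq i} a_{ij}\). The maximum entropy distribution then factorizes over node pairs:

\[
P(G) = \frac{e^{-\sum_i \theta_i k_i(G)}}{Z}
     = \prod_{i<j} p_{ij}^{a_{ij}} (1 - p_{ij})^{1 - a_{ij}},
\qquad
p_{ij} = \frac{x_i x_j}{1 + x_i x_j},
\quad
x_i = e^{-\theta_i}.
\]

Because the edges are independent, fitting such a model reduces to solving the \(N\) equations \(\sum_{j \neq i} p_{ij} = k_i\) for the node-level parameters.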

The pathcensus.nullmodels submodule implements several such ERGMs which are most appropriate for statistical calibration of structural coefficients. They can be applied to simple undirected networks, both unweighted and weighted.

See also

ERGM

base class for ERGMs

pathcensus.nullmodels.ubcm

Undirected Binary Configuration Model (fixed expected degree sequence)

pathcensus.nullmodels.uecm

Undirected Enhanced Configuration Model (fixed expected degree and strength sequences assuming positive integer weights)

Note

The ERGM functionalities provided by pathcensus are simple wrappers around the NEMtropy package.

class pathcensus.nullmodels.base.ERGM(statistics: Union[ndarray, GraphABC], **kwds: Any)[source]

Bases: object

Generic base class for Exponential Random Graph Models with local (i.e. node-level) constraints.

statistics

2D (float) array with sufficient statistics for nodes. First axis is for nodes and second for different statistics.

fit_args

Dictionary with arguments used in the last call of fit(). None if the model has not been fitted yet.

Notes

The following class attributes are required and need to be defined on concrete subclasses.

names

Mapping from names of sufficient statistics to attribute names in the NEMtropy solver class storing fitted model parameters. They must be provided in an order consistent with statistics. This is a class attribute which must be defined on subclasses implementing particular models. The mapping must have a stable order (from Python 3.7 onward an ordinary dict guarantees this). However, it is usually better to use mapping proxy objects instead of dicts, as they are immutable.

labels

Mapping from abbreviated labels to full names of sufficient statistics.

models

Model names as defined in NEMtropy allowed for the specific type of model. Must be implemented on a subclass as a class attribute. The first model on the list will be used by default.
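
The recommendation to use mapping proxy objects for names can be illustrated with the standard library alone. The sketch below is not pathcensus code: NAMES and try_to_mutate are hypothetical names introduced for illustration, and the key/value pairs mirror the names attribute shown for UECM.

```python
from types import MappingProxyType

# Hypothetical stand-in for a `names` class attribute: statistic name ->
# solver attribute name, in the same order as the columns of `statistics`.
NAMES = MappingProxyType({"degree": "x", "strength": "y"})

# Insertion order is preserved, so positions line up with statistic columns.
column_of = {name: idx for idx, name in enumerate(NAMES)}


def try_to_mutate(mapping) -> bool:
    """Return True if item assignment on the mapping succeeds."""
    try:
        mapping["degree"] = "z"
    except TypeError:
        # A mapping proxy is a read-only view and rejects item assignment.
        return False
    return True
```

A plain dict would accept the assignment, which is why a read-only view is the safer choice for a class attribute shared by all instances.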

property Ewijfunc: Callable

JIT-compiled function calculating expected edge weights \(\mathbb{E}[w_{ij}]\) (conditional on being present) based on the model.

property X: Optional[ndarray]

Array with fitted model parameters (1D).

Raises

ValueError – If model is not fitted.

aliases = None
check_fitted() None[source]

Raise ValueError if model is not fitted.

default_fit_kwds = None
property default_model: str
default_rtol = 0.1
property directed: bool

Is model directed.

property error: ndarray

Get maximum overall absolute error of the fit.

property expected_statistics: ndarray

Model-based expected values of sufficient statistics.

extract_statistics(graph: GraphABC) ndarray[source]

Extract array of sufficient statistics from a graph-like object.

fit(model: Optional[str] = None, method: Literal['auto', 'newton', 'fixed-point'] = 'auto', **kwds) float[source]

Fit model parameters to the observed sufficient statistics. The overall maximum absolute error of the fit is available afterwards through the error property.

Parameters
  • model – Type of model to use. Default value defined in self.default_model is used when None.

  • method – Solver method to use. If "auto" then either Newton or fixed-point method is used depending on the number of nodes with the threshold defined by self.fp_threshold.

  • **kwds – Passed to NEMtropy solver method solve_tool.

Notes

Some of the **kwds may be prefilled (but can be overridden) with default values defined in the default_fit_kwds class attribute.

Returns

Fitted model.

Return type

self

property fp_threshold: int

Threshold on the number of nodes after which by default the fixed-point solver is used instead of the Newton method solver.
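
The interplay between method="auto", fp_threshold and default_fit_kwds can be pictured with a small sketch. The helper name resolve_fit_call, the threshold value in the usage lines and the strict ">" comparison are assumptions made for illustration; they are not taken from the pathcensus source.

```python
METHODS = ("auto", "newton", "fixed-point")


def resolve_fit_call(method, n_nodes, fp_threshold, default_fit_kwds, **kwds):
    """Hypothetical helper: pick a solver and merge keyword arguments."""
    if method not in METHODS:
        raise ValueError(f"method must be one of {METHODS}")
    if method == "auto":
        # Assumption: the fixed-point solver takes over above the threshold.
        method = "fixed-point" if n_nodes > fp_threshold else "newton"
    # Defaults are filled in first, so explicit keyword arguments win.
    return method, {**default_fit_kwds, **kwds}


# Usage with made-up numbers: a small graph and a threshold of 500 nodes.
defaults = {"initial_guess": "chung_lu"}
small = resolve_fit_call("auto", 20, 500, defaults)
large = resolve_fit_call("auto", 5000, 500, defaults)
custom = resolve_fit_call("newton", 20, 500, defaults,
                          initial_guess="strengths_minor")
```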

property fullname: str

Full name of model. May be reimplemented on concrete subclass to allow using shortened class names.

get_P(*, dense: bool = False) Union[LinearOperator, ndarray][source]

Get matrix of edge probabilities.

Parameters

dense – If True then a dense array is returned. Otherwise a scipy.sparse.linalg.LinearOperator is returned.

get_W(*, dense: bool = False) Union[LinearOperator, ndarray][source]

Get matrix of expected edge weights.

Parameters

dense – If True then a dense array is returned. Otherwise a scipy.sparse.linalg.LinearOperator is returned.

Raises

NotImplementedError – If called on a model instance which is not weighted.

get_nemtropy_graph() Union[UndirectedGraph, DirectedGraph][source]

Get NEMtropy graph representation instance appropriate for a given type of model.

get_param(stat: Union[int, str]) ndarray[source]

Get parameter array associated with a given sufficient statistic.

None is returned if the model is not yet fitted.

Parameters

stat – Index or label of a sufficient statistic.

get_stat(stat: Union[int, str], expected: bool = False) ndarray[source]

Get sufficient statistic array by index or label.

Parameters
  • stat – Index or label of a sufficient statistic.

  • expected – Should observed or expected statistic be returned.

is_fitted() bool[source]

Check if model instance is fitted (this does not check quality of the fit).

is_valid(rtol: Optional[float] = None) bool[source]

Check if the model is approximately correct, i.e. whether the relative difference |expected - observed| / |observed| is not greater than rtol.

Parameters

rtol – Maximum allowed relative difference. Class attribute default_rtol is used when None.
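
The validity criterion can be written out in a few lines. This is a pure-Python sketch of the stated formula, not the library implementation; it assumes that all observed values are non-zero and reuses 0.1 (the documented default_rtol) as the default tolerance.

```python
def relerr(expected, observed):
    """Elementwise |expected - observed| / |observed|."""
    # Assumption: no observed value is zero (otherwise this divides by zero).
    return [abs(e - o) / abs(o) for e, o in zip(expected, observed)]


def is_valid(expected, observed, rtol=0.1):
    """True if no relative difference exceeds rtol."""
    return max(relerr(expected, observed)) <= rtol


# Made-up observed degrees and model-based expectations.
observed = [2.0, 3.0, 4.0]
close = [2.1, 3.0, 3.9]    # every entry within 10% of the observed value
far = [2.5, 3.0, 4.0]      # first entry is off by 25%
```

With these numbers is_valid(close, observed) holds while is_valid(far, observed) does not.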

property labels: Mapping

Mapping from short labels to full names corresponding to sufficient statistics.

methods = ('auto', 'newton', 'fixed-point')
property models: Tuple[str]
property n_nodes: int

Number of nodes in the underlying graph.

property n_stats: int

Number of sufficient statistics.

property names: Mapping

Mapping from names to NEMtropy solver attribute names corresponding to sufficient statistics.

property parameters: Optional[ndarray]

Array with fitted model parameters shaped as self.statistics.

Raises

ValueError – If model is not fitted.

property pijfunc: Callable

JIT-compiled function calculating \(p_{ij}\)’s based on the model.

property pmv: Callable

JIT-compiled function calculating \(Pv\) where \(P\) is the edge probability matrix and \(v\) is an arbitrary vector.

relerr() ndarray[source]

Get error of the fitted expected statistics relative to the observed sufficient statistics as |expected - observed| / |observed|.

property rpmv: Callable

JIT-compiled function calculating \(vP\) where \(P\) is the edge probability matrix and \(v\) is an arbitrary vector.

property rwmv: Callable

JIT-compiled function calculating \(vW\) where \(W\) is the matrix of expected edge weights and \(v\) is an arbitrary vector.

sample(n: int) Iterable[spmatrix][source]

Generate n instances sampled from the model.

Yields

A – Graph instance represented as a sparse matrix (CSR format)

sample_one() spmatrix[source]

Sample a graph instance as sparse matrix from the model.

Returns

Graph instance represented as a sparse matrix (CSR format).

Return type

A

property solver: Union[UndirectedGraph, DirectedGraph]

NEMtropy graph solver instance.

validate(rtol: Optional[float] = None) None[source]

Raise ValueError if the relative difference |expected - observed| / |observed| is greater than rtol.

Parameters

rtol – Maximum allowed relative difference. Class attribute default_rtol is used when None.

Returns

The same model instance if the error is not raised.

Return type

self

validate_statistics_shape(statistics: ndarray) None[source]

Raise ValueError if statistics has an incorrect shape which is not consistent with the class attribute cls.names.

validate_statistics_values(statistics: ndarray) None[source]

Raise if statistics contain incorrect values.

It must be implemented on a subclass.

Notes

Validation of the shape of statistics is implemented independently in validate_statistics_shape(), a generic method that in most cases does not need to be reimplemented on subclasses.

property weighted: bool

Is model weighted.

property wijfunc: Callable

JIT-compiled function sampling edge weights \(w_{ij}\) based on the model.

property wmv: Callable

JIT-compiled function calculating \(Wv\) where \(W\) is the matrix of expected edge weights and \(v\) is an arbitrary vector.

class pathcensus.nullmodels.base.SoftConfigurationModel(statistics: Union[ndarray, GraphABC], **kwds: Any)[source]

Bases: ERGM

Base class for soft configuration models.

validate_statistics_values(statistics: ndarray) None[source]

Raise if degree sequence contains negative values.

class pathcensus.nullmodels.base.UndirectedSoftConfigurationModel(statistics: Union[ndarray, GraphABC], **kwds: Any)[source]

Bases: SoftConfigurationModel

Base class for undirected soft configuration models.

aliases = mappingproxy({'degree': 'd', 'strength': 's'})
property directed: bool

Is model directed.

pathcensus.nullmodels.base.get_pmv(X: ndarray, v: ndarray, pijfunc: Callable[[ndarray, int, int], float]) ndarray[source]

Calculate \(Pv\) where \(P\) is edge probability matrix and \(v\) an arbitrary vector.

Parameters
  • X – 1D array of model parameters.

  • v – Arbitrary vector.

  • pijfunc – JIT-compiled function (in no-python mode) calculating edge probabilities \(p_{ij}\). It should have the following signature: (X, i, j) -> float, where X is a 1D array of model parameters. The return value must be a float in [0, 1].

pathcensus.nullmodels.base.get_wmv(X: ndarray, v: ndarray, pijfunc: Callable[[ndarray, int, int], float], Ewijfunc: Callable[[ndarray, int, int], float]) ndarray[source]

Calculate \(Wv\) where \(W\) is expected edge weight matrix and \(v\) is an arbitrary vector.

Parameters
  • X – 1D array of model parameters.

  • v – Arbitrary vector.

  • pijfunc – JIT-compiled function (in no-python mode) calculating edge probabilities \(p_{ij}\). It should have the following signature: (X, i, j) -> float, where X is a 1D array of model parameters. The return value must be a float in [0, 1].

  • Ewijfunc – JIT-compiled function (in no-python mode) calculating expected edge weights \(\mathbb{E}[w_{ij}]\). It should have the following signature: (X, i, j) -> float, where X is a 1D array of model parameters. The return value must be a positive float.
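
The point of get_pmv() and get_wmv() is that \(Pv\) and \(Wv\) can be computed without ever storing the full matrix. The sketch below shows the idea in plain, non-JIT Python. It is an illustration under assumptions, not the library code: self-loops are skipped, and the expected weight entry is taken to be \(p_{ij}\) times the conditional expectation returned by Ewijfunc. The function names matvec_p, matvec_w and the toy callables are hypothetical.

```python
def matvec_p(X, v, pijfunc):
    """Compute P @ v entry by entry from a (X, i, j) -> float callable."""
    n = len(v)
    out = [0.0] * n
    for i in range(n):
        for j in range(n):
            if i != j:  # assumption: simple graphs, no self-loops
                out[i] += pijfunc(X, i, j) * v[j]
    return out


def matvec_w(X, v, pijfunc, Ewijfunc):
    """Compute W @ v, assuming W_ij = p_ij * E[w_ij | edge present]."""
    n = len(v)
    out = [0.0] * n
    for i in range(n):
        for j in range(n):
            if i != j:
                out[i] += pijfunc(X, i, j) * Ewijfunc(X, i, j) * v[j]
    return out


# Toy callables: constant edge probability and constant conditional weight.
def toy_pij(X, i, j):
    return 0.5


def toy_Ewij(X, i, j):
    return 2.0
```

Only O(N) memory is needed for the result, at the price of O(N^2) function evaluations per product.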

pathcensus.nullmodels.base.sample_edgelist_unweighted(X: ndarray, n_nodes: int, pijfunc: Callable[[ndarray, int, int], float]) ndarray[source]

Sample edgelist array from an ERGM.

Parameters
  • X – 1D array of model parameters.

  • n_nodes – Number of nodes in the underlying graph.

  • pijfunc – JIT-compiled function (in no-python mode) calculating edge probabilities \(p_{ij}\). It should have the following signature: (X, i, j) -> float, where X is a 1D array of model parameters. The return value must be a float in [0, 1].

Returns

Edgelist array.

Return type

E
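
Since edges are independent in ERGMs with local constraints, sampling amounts to one Bernoulli draw per node pair. The following is a minimal pure-Python sketch of that idea, not the JIT-compiled library routine. The function name sample_edgelist, the explicit rng argument and the choice to report each undirected edge once as (i, j) with i < j are assumptions made for illustration.

```python
import random


def sample_edgelist(X, n_nodes, pijfunc, rng=None):
    """Draw each unordered node pair independently with probability p_ij."""
    rng = rng or random.Random()
    edges = []
    for i in range(n_nodes):
        for j in range(i + 1, n_nodes):  # visit every unordered pair once
            if rng.random() < pijfunc(X, i, j):
                edges.append((i, j))
    return edges


# Toy probability function, hypothetical: the same probability for all pairs,
# read from the first entry of the parameter array.
def constant_pij(X, i, j):
    return X[0]


edges = sample_edgelist([0.3], 10, constant_pij, rng=random.Random(42))
```

Passing a seeded random.Random instance makes the draw reproducible.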

pathcensus.nullmodels.base.sample_edgelist_weighted(X: ndarray, n_nodes: int, pijfunc: Callable[[ndarray, int, int], float], wijfunc: Callable[[ndarray, int, int], Union[int, float]]) Tuple[ndarray, Optional[ndarray]][source]

Sample edgelist array from an ERGM.

Parameters
  • X – 1D array of model parameters.

  • n_nodes – Number of nodes in the underlying graph.

  • weighted – Is the model weighted

  • pijfunc – JIT-compiled function (in no-python mode) calculating edge probabilities \(p_{ij}\). It should have the following signature: (X, i, j) -> float, where X is a 1D array of model parameters. The return value must be a float in [0, 1].

  • wijfunc – JIT-compiled function (in no-python mode) sampling edge weights \(w_{ij}\). It should have the following signature: (X, i, j) -> float/int, where X is a 1D array of model parameters. The return value must be a positive int/float.

Returns

  • E – Edgelist array.

  • W – 1D array with edge weights.

pathcensus.nullmodels.ubcm module

Undirected Binary Configuration Model (UBCM) induces a maximum entropy probability distribution over networks of a given size such that it has a specific expected degree sequence. It can be used to model undirected unweighted networks. See [VBM+21] for details.

See also

UBCM

UBCM class

Examples

>>> # Make simple ER random graph using `igraph`
>>> import random
>>> import numpy as np
>>> import igraph as ig
>>> from pathcensus.nullmodels.ubcm import UBCM
>>> random.seed(101)
>>> G = ig.Graph.Erdos_Renyi(20, p=.2)
>>> # Initialize UBCM directly from the graph object
>>> ubcm = UBCM(G)
>>> # Alternatively, initialize from degree sequence array
>>> D = np.array(G.degree())
>>> ubcm = UBCM(D).fit()
>>> # Check fit error
>>> round(ubcm.error, 6)
0.0
>>> # Check that the fitted expected degree sequence matches
>>> # the observed sequence elementwise (up to 1e-6)
>>> (np.abs(ubcm.ED - ubcm.D) <= 1e-6).all()
True
>>> # Sample a single ensemble instance
>>> ubcm.sample_one()  # doctest: +ELLIPSIS
<20x20 sparse matrix of type '<class 'numpy.uint8'>'
    with ... stored elements in Compressed Sparse Row format>
>>> # Sample multiple instances (generator)
>>> for instance in ubcm.sample(10): pass
class pathcensus.nullmodels.ubcm.UBCM(statistics: Union[ndarray, GraphABC], **kwds: Any)[source]

Bases: UndirectedSoftConfigurationModel

Undirected Binary Configuration Model.

This is a soft configuration model for undirected unweighted networks which belongs to the family of Exponential Random Graph Models (ERGMs) with local constraints. It induces a maximum entropy probability distribution over a set of networks with \(N\) nodes such that it yields a specific degree sequence on average.

statistics

2D (float) array with sufficient statistics for nodes. In this case there is only one sufficient statistic, that is, the degree sequence.

fit_args

Dictionary with arguments used in the last call of fit(). None if the model has not been fitted yet.

Notes

The following important class attributes are also defined:

labels

Mapping from abbreviated labels to full names identifying sufficient statistics.

models

Model names as defined in NEMtropy allowed for the specific type of model.

property D: ndarray

Observed degree sequence.

property ED: ndarray

Expected degree sequence.

default_fit_kwds = mappingproxy({'initial_guess': 'chung_lu'})
property expected_statistics: ndarray

Expected sufficient statistics.

extract_statistics(graph: GraphABC) ndarray[source]

Extract sufficient statistics from a graph-like object.

property fullname: str

Full name of model. May be reimplemented on concrete subclass to allow using shortened class names.

get_nemtropy_graph() UndirectedGraph[source]

Get NEMtropy graph representation instance.

models = ('cm_exp', 'cm')
names = mappingproxy({'degree': 'x'})
property pijfunc: Callable

JIT-compiled routine for calculating \(p_{ij}\).

property weighted: bool

Is model weighted.

pathcensus.nullmodels.ubcm.ubcm_pij(X: ndarray, i: int, j: int) float[source]

Calculate edge probability \(p_{ij}\) in UBCM model.

Parameters
  • X – 1D array of model parameters.

  • i – Node index.

  • j – Node index.
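
For orientation, the sketch below spells out the binary configuration model in plain Python: the closed-form edge probability \(p_{ij} = x_i x_j / (1 + x_i x_j)\) and a simple fixed-point iteration solving \(\sum_{j \neq i} p_{ij} = k_i\). It illustrates the technique only and is not the pathcensus or NEMtropy code. It assumes that X stores the \(x_i\) values directly and that all degrees are positive and smaller than N - 1. The names expected_degrees and fit_fixed_point are hypothetical.

```python
import math


def ubcm_pij(x, i, j):
    """Edge probability p_ij = x_i x_j / (1 + x_i x_j)."""
    xij = x[i] * x[j]
    return xij / (1.0 + xij)


def expected_degrees(x):
    """Expected degree of each node: sum of p_ij over all other nodes."""
    n = len(x)
    return [sum(ubcm_pij(x, i, j) for j in range(n) if j != i)
            for i in range(n)]


def fit_fixed_point(degrees, tol=1e-10, max_iter=10_000):
    """Iterate x_i <- k_i / sum_j x_j / (1 + x_i x_j) until degrees match."""
    n = len(degrees)
    # Chung-Lu style starting point: k_i / sqrt(2m), where 2m = sum of degrees.
    scale = math.sqrt(sum(degrees))
    x = [k / scale for k in degrees]
    error = float("inf")
    for _ in range(max_iter):
        # All entries are updated simultaneously from the previous iterate.
        x = [
            degrees[i]
            / sum(x[j] / (1.0 + x[i] * x[j]) for j in range(n) if j != i)
            for i in range(n)
        ]
        error = max(abs(e - k)
                    for e, k in zip(expected_degrees(x), degrees))
        if error < tol:
            break
    return x, error


# Made-up degree sequence on six nodes.
params, max_abs_error = fit_fixed_point([2, 2, 3, 3, 4, 2])
```

The returned error plays the same role as the maximum absolute error reported after fitting a model.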

pathcensus.nullmodels.uecm module

Undirected Enhanced Configuration Model (UECM) induces a maximum entropy probability distribution over networks of a given size such that it has specific expected degree and strength sequences. It can be used to model undirected weighted networks with edge weights being positive integers (with no upper bound). See [VBM+21] for details.

See also

UECM

UECM class

Examples

>>> import random
>>> import numpy as np
>>> import igraph as ig
>>> from pathcensus.nullmodels.uecm import UECM
>>> # Make a ER random graph with random integer weights
>>> random.seed(27732)
>>> # Seed NumPy too, since the weights are drawn with `np.random`
>>> np.random.seed(27732)
>>> G = ig.Graph.Erdos_Renyi(20, p=.2)
>>> G.es["weight"] = np.random.randint(1, 11, G.ecount())
>>> # Initialize UECM from the graph object
>>> uecm = UECM(G)
>>> # Alternatively initialize from an array of sufficient statistics
>>> # 1st column - degree sequence; 2nd column - strength sequence
>>> D = np.array(G.degree())
>>> S = np.array(G.strength(weights="weight"))
>>> stats = np.column_stack([D, S])
>>> uecm = UECM(stats).fit()
>>> # Check fit error
>>> round(uecm.error, 6)
0.0
>>> # Check that the fitted expected degree sequence matches
>>> # the observed sequence elementwise (up to 1e-6)
>>> (np.abs(uecm.ED - uecm.D) <= 1e-6).all()
True
>>> # Check that the fitted expected strength sequence matches
>>> # the observed sequence elementwise (up to 1e-6)
>>> (np.abs(uecm.ES - uecm.S) <= 1e-6).all()
True
>>> # Sample a single instance
>>> uecm.sample_one()  # doctest: +ELLIPSIS
<20x20 sparse matrix of type '<class 'numpy.int64'>'
    with ... stored elements in Compressed Sparse Row format>
>>> # Sample multiple instances (generator)
>>> for instance in uecm.sample(10): pass
class pathcensus.nullmodels.uecm.UECM(statistics: Union[ndarray, GraphABC], **kwds: Any)[source]

Bases: UndirectedSoftConfigurationModel

Undirected Enhanced Configuration Model.

This is a soft configuration model for undirected weighted networks with unbounded positive integer weights which belongs to the family of Exponential Random Graph Models (ERGMs) with local constraints. It induces a maximum entropy probability distribution over a set of networks with \(N\) nodes such that it yields a specific degree sequence and a specific strength sequence on average.

statistics

2D (float) array with sufficient statistics for nodes. In this case there are two sufficient statistics, that is, the degree sequence and the strength sequence.

fit_args

Dictionary with arguments used in the last call of fit(). None if the model has not been fitted yet.

Notes

The following important class attributes are also defined:

labels

Mapping from abbreviated labels to full names identifying sufficient statistics.

models

Model names as defined in NEMtropy allowed for the specific type of model.

property D: ndarray

Observed degree sequence.

property ED: ndarray

Expected degree sequence.

property ES: ndarray

Expected strength sequence.

property Ewijfunc: Callable

JIT-compiled routine for calculating \(\mathbb{E}[w_{ij}]\) (conditional on the edge being present).

property S: ndarray

Observed strength sequence.

default_fit_kwds = mappingproxy({'initial_guess': 'strengths_minor'})
property expected_statistics: ndarray

Expected sufficient statistics.

extract_statistics(graph: GraphABC) ndarray[source]

Extract sufficient statistics from a graph-like object.

get_nemtropy_graph() UndirectedGraph[source]

Get NEMtropy graph representation instance.

models = ('ecm_exp', 'ecm')
names = mappingproxy({'degree': 'x', 'strength': 'y'})
property pijfunc: Callable

JIT-compiled routine for calculating \(p_{ij}\).

property weighted: bool

Is model weighted.

property wijfunc: Callable

JIT-compiled routine sampling \(w_{ij}\).

pathcensus.nullmodels.uecm.uecm_Ewij(X: ndarray, i: int, j: int) float[source]

Calculate expected edge weight \(\mathbb{E}[w_{ij}]\) (conditional on the edge being present) in UECM model.

Parameters
  • X – 1D array of model parameters.

  • i – Node index.

  • j – Node index.

pathcensus.nullmodels.uecm.uecm_pij(X: ndarray, i: int, j: int) float[source]

Calculate edge probability \(p_{ij}\) in UECM model.

Parameters
  • X – 1D array of model parameters.

  • i – Node index.

  • j – Node index.

pathcensus.nullmodels.uecm.uecm_wij(X: ndarray, i: int, j: int) int[source]

Sample edge weight \(w_{ij}\) in UECM model.

Parameters
  • X – 1D array of model parameters.

  • i – Node index.

  • j – Node index.
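
The three UECM routines can be related to each other through the standard closed forms of the enhanced configuration model: \(p_{ij} = x_i x_j y_i y_j / (1 - y_i y_j + x_i x_j y_i y_j)\), and, given that the edge is present, a geometric weight distribution \(P(w) = (y_i y_j)^{w-1} (1 - y_i y_j)\) for \(w \geq 1\) with mean \(1 / (1 - y_i y_j)\). The plain-Python sketch below illustrates these formulas and is not the JIT-compiled library code. The parameter layout (first half of X holds the degree parameters x, second half the strength parameters y), the helper split_params and the explicit rng argument are assumptions made for illustration.

```python
import random


def split_params(X):
    """Assumed layout: X = [x_1..x_n, y_1..y_n]."""
    n = len(X) // 2
    return X[:n], X[n:]


def uecm_pij(X, i, j):
    """Edge probability in the enhanced configuration model."""
    x, y = split_params(X)
    xx = x[i] * x[j]
    yy = y[i] * y[j]
    return xx * yy / (1.0 - yy + xx * yy)


def uecm_Ewij(X, i, j):
    """Expected weight given the edge is present (mean of a geometric law)."""
    _, y = split_params(X)
    return 1.0 / (1.0 - y[i] * y[j])


def uecm_wij(X, i, j, rng):
    """Draw w >= 1 with P(w) = (y_i y_j)**(w - 1) * (1 - y_i y_j)."""
    _, y = split_params(X)
    yy = y[i] * y[j]  # must lie in [0, 1) for the loop to terminate
    w = 1
    while rng.random() < yy:
        w += 1
    return w


# Made-up parameters for two nodes: x = [1.0, 2.0], y = [0.5, 0.5].
X = [1.0, 2.0, 0.5, 0.5]
rng = random.Random(0)
draws = [uecm_wij(X, 0, 1, rng) for _ in range(20_000)]
sample_mean = sum(draws) / len(draws)
```

With these parameters the sample mean of the drawn weights should lie close to uecm_Ewij(X, 0, 1).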

Module contents

Null model classes implementing different variants of the configuration model.

The classes implemented in this module are simple wrappers around the NEMtropy package.