DensityTree#

class lintsampler.DensityTree(mins, maxs, pdf, vectorizedpdf=False, pdf_args=(), pdf_kwargs={}, min_openings=1, usecache=True, batch=False)#

Tree-like object over which density function is evaluated.

DensityTree uses the parameters mins and maxes to construct the k-dimensional root cell of the tree, evaluating densities on the 2^k corners of the cell with the given pdf function. If min_openings is non-zero, then the root is opened into a series of children, and the children are successively opened, each time with the pdf function being evaluated on the corners. After construction, the refine method can be used to further open the tree. During all of these cell openings, the tree uses a cache to ensure that a density evaluated on a parent is not re-evaluated on a child (although this functionality can be turned off with the usecache flag).

See the examples below for the various usage patterns.

Parameters:
minsscalar or 1D iterable

For k-dimensional structure, length-k array giving coordinate minima along all axes (e.g., bottom-left corner in 2D). In one dimension, can simply provide single number.

maxsscalar or 1D iterable

For k-dimensional structure, length-k array giving coordinate maxima along all axes (e.g., top-right corner in 2D). In one dimension, can simply provide single number.

pdffunction

Probability density function to evaluate on grid. Function should take coordinate vector (or batch of vectors if vectorized; see vectorizedpdf parameter) and return (unnormalised) density (or batch of densities). Additional arguments can be passed to the function via pdf_args and pdf_kwargs parameters.

vectorizedpdfbool, optional

if True, assumes that the pdf function is vectorized, i.e., it accepts (…, k)-shaped batches of coordinate vectors and returns (…)-shaped batches of densities. If False, assumes that the pdf function simply accepts (k,)-shaped coordinate vectors and returns single densities. Default is False.

pdf_argstuple, optional

Additional positional arguments to pass to pdf function; function call is pdf(position, *pdf_args, **pdf_kwargs). Default is empty tuple (no additional positional arguments).

pdf_kwargsdict, optional

Additional keyword arguments to pass to pdf function; function call is pdf(position, *pdf_args, **pdf_kwargs). Default is empty dict (no additional keyword arguments).

min_openingsint, optional

Number of full tree openings to perform on initialisation. This is distinct from any further openings that happen on refining. Default is 1.

usecachebool, optional

Whether to use the cache to store density evaluations, so that densities evaluated on a parent are not later re-evaluated on a child. It is generally recommended to use this, unless a PDF function is so fast that that re-evaluations are cheaper than cache lookups. The cache is not used if batch=True (see below). Default is True.

batchbool, optional

When creating the 2^k child cells of a given parent cell, whether to evaluate their various vertex densities in a single batch. This is incompatible with the density cache, so is switched off by default. However, there might be circumstances where batch=True plus usecache=False is faster than vice versa. Default is False.

Examples

These examples demonstrate the various ways to set up and use an instance of DensityTree.

  • A basic tree in 1D

    In the simplest case, we only need three parameters to construct a tree: the minimum coordinate bound(s), the maximum coordinate bound(s), and the pdf function to evaluate on the tree. In the 1D case, we can just use single numbers for the coordinate bounds, and in this example we’ll use the scipy implementation of the standard normal distribution for the PDF.

    >>> from scipy.stats import norm
    >>> tree = DensityTree(-10, 10, pdf=norm.pdf)
    

    This sets up a one-dimensional tree, bounded by [-10, 10]. Let’s interrogate some of the attributes describing the geometry of the tree:

    >>> tree.mins
    array([-10])
    >>> tree.maxs
    array([10])
    >>> tree.dim
    1
    

    mins and maxs are the same as the input parameters, but cast to arrays. dim is an integer describing the dimensionality of the space.

    The attribute root gives a reference to the root cell of the tree:

    >>> tree.root
    <lintsampler.density_structures.tree._TreeCell at 0x7fd1b2b34740>
    

    This is an instance of the private _TreeCell class.

    By default, min_openings is 1, so the root cell will be opened into (in one dimension) 2 children. These can be accessed via the leaves attribute:

    >>> tree.leaves
    [<lintsampler.density_structures.tree._TreeCell at 0x7fc6f2d6f560>,
     <lintsampler.density_structures.tree._TreeCell at 0x7fc6f3cbf320>]
    

    We can get the total probability mass of the tree via the total_mass attribute:

    >>> tree.total_mass
    3.989422804014327
    

    Because we’re using a properly normalised Gaussian PDF, this should actually be unity. However, our estimate of the integral is quite bad because we are essentially using the trapezoid approximation to the integral with only two trapezoids.

  • 1D tree with more full openings

    In the example above, we saw that the tree was not able to give a good approximation for the integral under the normal distribution with only two leaves, because the resolution is too low. One way to improve this is to increase the min_openings parameter, which sets how many times the initial root cell is opened.

    >>> from scipy.stats import norm
    >>> tree = DensityTree(-10, 10, pdf=norm.pdf, min_openings=3)
    >>> len(tree.leaves)
    8
    >>> tree.total_mass
    1.0850046370702153
    

    Now, we have done 3 full openings, giving us 2^3=8 leaves. As a consequence, the estimate of the total probability mass is greatly improved.

  • 3D tree

    We can go beyond one dimension by providing mins and maxs parameters which are iterables rather than scalars. For the PDF here, we can use the multivariate normal implementation in scipy:

    >>> from scipy.stats import multivariate_normal
    >>> pdf = multivariate_normal(np.zeros(3), np.eye(3)).pdf
    >>> tree = DensityTree([10, 100, 1000], [20, 200, 2000], pdf=pdf)
    >>> tree.dim
    3
    
  • Refinement

    As well as min_openings, another way to increase the resolution of the tree is the refine method. Instead of opening all the leaves in the tree, this uses a Romberg integration method to try to find the leaves which would be most helpful to open (i.e., some combination of the most massive and most erroneous leaves). The only required parameter of refine is tree_tol: a fractional tolerance level, which can be roughly understood as the tolerance level on the total mass error of the tree. See the docstring of that method for more details about its various parameter options and settings.

    >>> from scipy.stats import norm
    >>> tree = DensityTree(-10, 10, pdf=norm.pdf)
    >>> tree.refine(1e-3)
    >>> tree.total_mass
    1.0020293093918315
    
  • Vectorized PDF calls

    By default, it is assumed that the PDF function takes single k-vectors and returns scalar densities. However, if a PDF function is vectorized such that it takes a batch of positions shaped (N, k) and returns a batch of densities shaped (N,) then we can let DensityTree know via the vectorizedpdf flag, which will hopefully speed up tree construction and refinement.

    As it happens, the scipy PDF functions we were using above are all vectorized, so:

    >>> DensityTree(-10, 10, pdf=norm.pdf, vectorizedpdf=True)
    
  • get_leaf_at_pos

    The method get_leaf_at_pos is a useful method which can find the leaf on the tree that contains a given position.

    >>> from scipy.stats import multivariate_normal
    >>> pdf = multivariate_normal(mean=np.ones(2), cov=np.eye(2)).pdf
    >>> tree = DensityTree([-5, -5], [5, 5], pdf=pdf, vectorizedpdf=True)
    >>> tree.refine(1e-2)
    >>> tree.get_leaf_at_pos(np.array([1.02, -3.74]))
    <lintsampler.density_structures.tree._TreeCell at 0x7f9003228680>
    

    This returns a reference to the relevant leaf.

Attributes:
mins1D array-like

Minimum boundary values (i.e., first corner) of the structure.

maxs1D array-like

Maximum boundary values (i.e., last corner) of the structure.

dimint

Dimension of the density structure.

total_massfloat

Total probability mass of the structure.

rootTreeCell

Root cell of tree.

leaveslist

List of TreeCell instances representing leaves of tree.

usecachebool

See corresponding parameter in constructor.

batchbool

See corresponding parameter in constructor.

Methods

choose_cells(u)

Choose cells given 1D array of uniform samples.

get_leaf_at_pos(pos)

Find leaf cell which contains given position.

refine(tree_tol[, leaf_tol, ...])

Refine the tree by opening leaves with large estimated mass errors.

choose_cells(u)#

Choose cells given 1D array of uniform samples.

Method enforced by base class DensityStructure.

Parameters:
u1D array of floats, shape (N,)

Array of uniform samples ~ U(0, 1).

Returns:
mins2D array of floats, shape (N, k)

Coordinate vector of first corner of each cell.

maxs2D array of floats, shape (N, k)

Coordinate vector of last corner of each cell.

corners2^k-tuple of 1D arrays, each length N

Densities at corners of given cells. Conventional ordering applies, e.g., in 3D: (f000, f001, f010, f011, f100, f101, f110, f111)

property dim#

Dimension of the density structure.

Returns:
int

The dimension of the structure.

get_leaf_at_pos(pos)#

Find leaf cell which contains given position.

Walk down the tree until at leaf cell which contains given position.

Parameters:
posscalar or 1D iterable

Position at which to get leaf cell. Can be scalar or length-1 iterable in 1D, otherwise should be iterable with length equal to dimensionality of tree (tree.dim).

Returns:
cellTreeCell

Leaf cell containing given position.

property maxs#

Maximum boundary values (i.e., last corner) of the structure.

Returns:
array_like

1D array, length-k, giving the coordinates of the last corner of the k-dimensional structure.

property mins#

Minimum boundary values (i.e., first corner) of the structure.

Returns:
array_like

1D array, length-k, giving the coordinates of the first corner of the k-dimensional structure.

refine(tree_tol, leaf_tol=0.01, leaf_mass_contribution_tol=0.01, verbose=False)#

Refine the tree by opening leaves with large estimated mass errors.

Refinement uses a strategy based on Romberg integration, and happens in two stages. In the first stage, the tree leaves are repeatedly looped over and opened if their individual ‘errors’ (estimated from the difference between the last two Romberg-integrated masses) are above the given tolerance threshold. In the second stage, the most erroneous leaf on the tree is repeatedly found and opened until the total error on the tree is below the given threshold.

Parameters:
tree_tolfloat

Total tree error tolerance level used in the second refinement loop. This is a fractional error, i.e., a leaf is opened if the tree’s total mass error divided by the tree’s total mass is above this threshold.

leaf_tolfloat, optional

Individual leaf error tolerance level used in the first refinement loop. This is a fractional error, i.e., a leaf is opened if the leaf’s mass error divided by the leaf’s mass is above this threshold. Default: 0.01, or 1% fractional mass error per leaf.

leaf_mass_contribution_tol: float, optional

Individual leaf fractional mass below which the leav will not be refined. The parameter acts as a prefactor to compare leaves to the mean mass of all leaves in the tree. Default: 0.01, or leaves with 1% of the mass of the mean leaf.

verbosebool, optional

If True, print messages at every loop iteration. Default: False.

Returns:
None
property total_mass#

Total probability mass of the structure.

Returns:
float

The total probability mass, summed over all the cells of the structure.