DensityTree#
- class lintsampler.DensityTree(mins, maxs, pdf, vectorizedpdf=False, pdf_args=(), pdf_kwargs={}, min_openings=1, usecache=True, batch=False)#
Tree-like object over which density function is evaluated.
DensityTreeuses the parametersminsandmaxesto construct the k-dimensional root cell of the tree, evaluating densities on the 2^k corners of the cell with the givenpdffunction. Ifmin_openingsis non-zero, then the root is opened into a series of children, and the children are successively opened, each time with thepdffunction being evaluated on the corners. After construction, therefinemethod can be used to further open the tree. During all of these cell openings, the tree uses a cache to ensure that a density evaluated on a parent is not re-evaluated on a child (although this functionality can be turned off with theusecacheflag).See the examples below for the various usage patterns.
- Parameters:
- minsscalar or 1D iterable
For k-dimensional structure, length-k array giving coordinate minima along all axes (e.g., bottom-left corner in 2D). In one dimension, can simply provide single number.
- maxsscalar or 1D iterable
For k-dimensional structure, length-k array giving coordinate maxima along all axes (e.g., top-right corner in 2D). In one dimension, can simply provide single number.
- pdffunction
Probability density function to evaluate on grid. Function should take coordinate vector (or batch of vectors if vectorized; see vectorizedpdf parameter) and return (unnormalised) density (or batch of densities). Additional arguments can be passed to the function via pdf_args and pdf_kwargs parameters.
- vectorizedpdfbool, optional
if True, assumes that the pdf function is vectorized, i.e., it accepts (…, k)-shaped batches of coordinate vectors and returns (…)-shaped batches of densities. If False, assumes that the pdf function simply accepts (k,)-shaped coordinate vectors and returns single densities. Default is False.
- pdf_argstuple, optional
Additional positional arguments to pass to pdf function; function call is pdf(position, *pdf_args, **pdf_kwargs). Default is empty tuple (no additional positional arguments).
- pdf_kwargsdict, optional
Additional keyword arguments to pass to pdf function; function call is pdf(position, *pdf_args, **pdf_kwargs). Default is empty dict (no additional keyword arguments).
- min_openingsint, optional
Number of full tree openings to perform on initialisation. This is distinct from any further openings that happen on refining. Default is 1.
- usecachebool, optional
Whether to use the cache to store density evaluations, so that densities evaluated on a parent are not later re-evaluated on a child. It is generally recommended to use this, unless a PDF function is so fast that that re-evaluations are cheaper than cache lookups. The cache is not used if
batch=True(see below). Default is True.- batchbool, optional
When creating the 2^k child cells of a given parent cell, whether to evaluate their various vertex densities in a single batch. This is incompatible with the density cache, so is switched off by default. However, there might be circumstances where
batch=Trueplususecache=Falseis faster than vice versa. Default is False.
Examples
These examples demonstrate the various ways to set up and use an instance of
DensityTree.A basic tree in 1D
In the simplest case, we only need three parameters to construct a tree: the minimum coordinate bound(s), the maximum coordinate bound(s), and the pdf function to evaluate on the tree. In the 1D case, we can just use single numbers for the coordinate bounds, and in this example we’ll use the
scipyimplementation of the standard normal distribution for the PDF.>>> from scipy.stats import norm >>> tree = DensityTree(-10, 10, pdf=norm.pdf)
This sets up a one-dimensional tree, bounded by [-10, 10]. Let’s interrogate some of the attributes describing the geometry of the tree:
>>> tree.mins array([-10]) >>> tree.maxs array([10]) >>> tree.dim 1
minsandmaxsare the same as the input parameters, but cast to arrays.dimis an integer describing the dimensionality of the space.The attribute root gives a reference to the root cell of the tree:
>>> tree.root <lintsampler.density_structures.tree._TreeCell at 0x7fd1b2b34740>
This is an instance of the private _TreeCell class.
By default,
min_openingsis 1, so the root cell will be opened into (in one dimension) 2 children. These can be accessed via theleavesattribute:>>> tree.leaves [<lintsampler.density_structures.tree._TreeCell at 0x7fc6f2d6f560>, <lintsampler.density_structures.tree._TreeCell at 0x7fc6f3cbf320>]
We can get the total probability mass of the tree via the
total_massattribute:>>> tree.total_mass 3.989422804014327
Because we’re using a properly normalised Gaussian PDF, this should actually be unity. However, our estimate of the integral is quite bad because we are essentially using the trapezoid approximation to the integral with only two trapezoids.
1D tree with more full openings
In the example above, we saw that the tree was not able to give a good approximation for the integral under the normal distribution with only two leaves, because the resolution is too low. One way to improve this is to increase the
min_openingsparameter, which sets how many times the initial root cell is opened.>>> from scipy.stats import norm >>> tree = DensityTree(-10, 10, pdf=norm.pdf, min_openings=3) >>> len(tree.leaves) 8 >>> tree.total_mass 1.0850046370702153
Now, we have done 3 full openings, giving us 2^3=8 leaves. As a consequence, the estimate of the total probability mass is greatly improved.
3D tree
We can go beyond one dimension by providing
minsandmaxsparameters which are iterables rather than scalars. For the PDF here, we can use the multivariate normal implementation in scipy:>>> from scipy.stats import multivariate_normal >>> pdf = multivariate_normal(np.zeros(3), np.eye(3)).pdf >>> tree = DensityTree([10, 100, 1000], [20, 200, 2000], pdf=pdf) >>> tree.dim 3
Refinement
As well as
min_openings, another way to increase the resolution of the tree is therefinemethod. Instead of opening all the leaves in the tree, this uses a Romberg integration method to try to find the leaves which would be most helpful to open (i.e., some combination of the most massive and most erroneous leaves). The only required parameter ofrefineistree_tol: a fractional tolerance level, which can be roughly understood as the tolerance level on the total mass error of the tree. See the docstring of that method for more details about its various parameter options and settings.>>> from scipy.stats import norm >>> tree = DensityTree(-10, 10, pdf=norm.pdf) >>> tree.refine(1e-3) >>> tree.total_mass 1.0020293093918315
Vectorized PDF calls
By default, it is assumed that the PDF function takes single k-vectors and returns scalar densities. However, if a PDF function is vectorized such that it takes a batch of positions shaped (N, k) and returns a batch of densities shaped (N,) then we can let
DensityTreeknow via thevectorizedpdfflag, which will hopefully speed up tree construction and refinement.As it happens, the
scipyPDF functions we were using above are all vectorized, so:>>> DensityTree(-10, 10, pdf=norm.pdf, vectorizedpdf=True)
get_leaf_at_posThe method
get_leaf_at_posis a useful method which can find the leaf on the tree that contains a given position.>>> from scipy.stats import multivariate_normal >>> pdf = multivariate_normal(mean=np.ones(2), cov=np.eye(2)).pdf >>> tree = DensityTree([-5, -5], [5, 5], pdf=pdf, vectorizedpdf=True) >>> tree.refine(1e-2) >>> tree.get_leaf_at_pos(np.array([1.02, -3.74])) <lintsampler.density_structures.tree._TreeCell at 0x7f9003228680>
This returns a reference to the relevant leaf.
- Attributes:
mins1D array-likeMinimum boundary values (i.e., first corner) of the structure.
maxs1D array-likeMaximum boundary values (i.e., last corner) of the structure.
dimintDimension of the density structure.
total_massfloatTotal probability mass of the structure.
- root
TreeCell Root cell of tree.
- leaveslist
List of
TreeCellinstances representing leaves of tree.- usecachebool
See corresponding parameter in constructor.
- batchbool
See corresponding parameter in constructor.
Methods
choose_cells(u)Choose cells given 1D array of uniform samples.
get_leaf_at_pos(pos)Find leaf cell which contains given position.
refine(tree_tol[, leaf_tol, ...])Refine the tree by opening leaves with large estimated mass errors.
- choose_cells(u)#
Choose cells given 1D array of uniform samples.
Method enforced by base class
DensityStructure.- Parameters:
- u1D array of floats, shape (N,)
Array of uniform samples ~ U(0, 1).
- Returns:
- mins2D array of floats, shape (N, k)
Coordinate vector of first corner of each cell.
- maxs2D array of floats, shape (N, k)
Coordinate vector of last corner of each cell.
- corners2^k-tuple of 1D arrays, each length N
Densities at corners of given cells. Conventional ordering applies, e.g., in 3D: (f000, f001, f010, f011, f100, f101, f110, f111)
- property dim#
Dimension of the density structure.
- Returns:
- int
The dimension of the structure.
- get_leaf_at_pos(pos)#
Find leaf cell which contains given position.
Walk down the tree until at leaf cell which contains given position.
- Parameters:
- posscalar or 1D iterable
Position at which to get leaf cell. Can be scalar or length-1 iterable in 1D, otherwise should be iterable with length equal to dimensionality of tree (
tree.dim).
- Returns:
- cellTreeCell
Leaf cell containing given position.
- property maxs#
Maximum boundary values (i.e., last corner) of the structure.
- Returns:
- array_like
1D array, length-k, giving the coordinates of the last corner of the k-dimensional structure.
- property mins#
Minimum boundary values (i.e., first corner) of the structure.
- Returns:
- array_like
1D array, length-k, giving the coordinates of the first corner of the k-dimensional structure.
- refine(tree_tol, leaf_tol=0.01, leaf_mass_contribution_tol=0.01, verbose=False)#
Refine the tree by opening leaves with large estimated mass errors.
Refinement uses a strategy based on Romberg integration, and happens in two stages. In the first stage, the tree leaves are repeatedly looped over and opened if their individual ‘errors’ (estimated from the difference between the last two Romberg-integrated masses) are above the given tolerance threshold. In the second stage, the most erroneous leaf on the tree is repeatedly found and opened until the total error on the tree is below the given threshold.
- Parameters:
- tree_tolfloat
Total tree error tolerance level used in the second refinement loop. This is a fractional error, i.e., a leaf is opened if the tree’s total mass error divided by the tree’s total mass is above this threshold.
- leaf_tolfloat, optional
Individual leaf error tolerance level used in the first refinement loop. This is a fractional error, i.e., a leaf is opened if the leaf’s mass error divided by the leaf’s mass is above this threshold. Default: 0.01, or 1% fractional mass error per leaf.
- leaf_mass_contribution_tol: float, optional
Individual leaf fractional mass below which the leav will not be refined. The parameter acts as a prefactor to compare leaves to the mean mass of all leaves in the tree. Default: 0.01, or leaves with 1% of the mass of the mean leaf.
- verbosebool, optional
If True, print messages at every loop iteration. Default: False.
- Returns:
- None
- property total_mass#
Total probability mass of the structure.
- Returns:
- float
The total probability mass, summed over all the cells of the structure.