Density estimation in Chameleon Statistics
Data ‘samples’ are by definition representatives of some unknown ‘population’ (e.g. surveys). The goal of density estimation is to model the distribution of data in the population based upon the distribution of the samples.
Large sample sets will more accurately reflect the population than small ones. Accurate estimation of population distributions from samples of limited size is an important but challenging problem.
Consider the sample data set on the left. We would like to estimate the population distribution from which these samples are derived. Most density-estimators are based upon one or more of the following techniques:
(a) Spatial binning methods.
Binning methods simply partition the space into regular blocks and count the number of samples in each block. The population density estimate within each block is given by the number of samples per unit volume for the block.
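The block-counting estimate described above can be sketched in one dimension as follows. This is a minimal illustration rather than the method used by any particular product: the function name histogram_density, the range arguments lo and hi, and the extra division by the total sample count (so that the estimate integrates to one) are all assumptions made for the example.

```python
def histogram_density(samples, lo, hi, n_bins):
    """Return a function estimating density from equal-width bins on [lo, hi).

    Hypothetical helper for illustration; samples outside [lo, hi) are ignored
    when counting but still contribute to the total n.
    """
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for x in samples:
        if lo <= x < hi:
            # Clamp the index to guard against floating-point round-up.
            idx = min(int((x - lo) / width), n_bins - 1)
            counts[idx] += 1
    n = len(samples)

    def density(x):
        if not (lo <= x < hi):
            return 0.0
        idx = min(int((x - lo) / width), n_bins - 1)
        # Samples per unit volume, divided by n (assumption) to normalize.
        return counts[idx] / (n * width)

    return density


density = histogram_density([0.1, 0.2, 0.3, 0.6], lo=0.0, hi=1.0, n_bins=2)
print(density(0.25), density(0.75))
```

The estimate is constant within each block and jumps at block boundaries, so the choice of block size and block position directly shapes the result.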
In nearest-neighbor methods, the population density estimate for a test point is obtained by measuring the volume V of the ball, centered on the test point, that contains its k nearest sample points. The associated density estimate is given by the ratio k/V.
Properties: The overall shapes of population distributions are generally modeled well. The density estimate tends to be good inside the clusters, where sample points are plentiful, but it is overestimated in the tails, with the result that the overall distribution is non-integrable. The estimates also show sudden sharp fluctuations. As with kernel methods, small values of k tend to overfit the sample, while large values oversmooth, which leaves the problem of selecting the optimal value of k.
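A one-dimensional sketch of the k/V estimate may help make this concrete. Here the "ball" is an interval of length 2r, where r is the distance from the test point to its k-th nearest sample. The function name knn_density is hypothetical, and the division by the total number of samples is an assumption added so that the result is on the scale of a probability density.

```python
def knn_density(samples, x, k):
    """Estimate density at x from the distance to its k-th nearest sample.

    Hypothetical helper for illustration. Assumes 1 <= k <= len(samples) and
    that the k-th nearest distance is nonzero.
    """
    dists = sorted(abs(s - x) for s in samples)
    r = dists[k - 1]          # radius of the ball holding the k nearest samples
    volume = 2.0 * r          # in one dimension the ball is an interval
    # k / V, divided by the sample count (assumption) to scale as a density.
    return k / (len(samples) * volume)


print(knn_density([0.0, 1.0, 2.0, 3.0, 4.0], x=2.0, k=3))
```

Far from the data, r grows only linearly with distance, so the estimate decays roughly like 1/|x|, which is consistent with the heavy, non-integrable tails noted above.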
(Gaussian) mixture models attempt to find the superposition of Gaussians which best accounts for the sample data.
Properties: Continuous and robust density estimates are obtained with good asymptotic properties. The method can in principle model any shape of cluster, and works best when the population is described well by a mixture of Gaussians. The method typically requires large sample sizes for accuracy. Serious degradation of results can occur as the number of variables increases. In practice the method also has difficulty modeling complex geometries and topologies.
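One common way to fit such a superposition is the expectation-maximization (EM) algorithm. The text above does not say which fitting procedure is intended, so the following one-dimensional sketch is only an assumption about how a mixture might be fitted. The names fit_gmm_1d and gmm_density, the initial unit variances, and the variance floor min_var are all hypothetical choices, and the code has no safeguards against numerical underflow on poorly initialized data.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a one-dimensional Gaussian with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def fit_gmm_1d(samples, init_means, n_iter=50, min_var=1e-6):
    """Fit a 1-D Gaussian mixture by EM, starting from the given means."""
    k = len(init_means)
    n = len(samples)
    means = list(init_means)
    variances = [1.0] * k       # assumed starting variances
    weights = [1.0 / k] * k     # start with equal mixing weights
    for _ in range(n_iter):
        # E-step: responsibility of each component for each sample.
        resp = []
        for x in samples:
            p = [weights[j] * normal_pdf(x, means[j], variances[j]) for j in range(k)]
            total = sum(p)
            resp.append([pj / total for pj in p])
        # M-step: re-estimate weights, means and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, samples)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, samples)) / nj
            variances[j] = max(var, min_var)  # floor avoids collapsing components
    return weights, means, variances


def gmm_density(x, weights, means, variances):
    """Evaluate the fitted mixture density at x."""
    return sum(w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances))


data = [-0.2, 0.0, 0.2, 9.8, 10.0, 10.2]
weights, means, variances = fit_gmm_1d(data, init_means=[1.0, 8.0])
print(weights, means, variances)
```

The number of components and the starting means must be supplied by the user, and a poor choice can lead EM to a poor local optimum.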
In kernel-based methods, each sample point is spread out over a region determined by the “kernel” function (usually flat or bell-shaped).
Properties: Continuous normalized density estimates are obtained. The estimates have good asymptotic properties. The estimation quality depends on a wise choice of the local spread. Too small a spread generates estimates which undulate greatly, and too large a spread leads to oversmoothed estimates and a loss of shape.
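The idea of spreading each sample point with a kernel can be sketched in one dimension as follows. This example assumes a bell-shaped (Gaussian) kernel and a single global spread h; the function name kernel_density and the parameter name h are hypothetical.

```python
import math


def kernel_density(samples, x, h):
    """Estimate density at x by averaging Gaussian kernels of spread h.

    Hypothetical helper for illustration; h > 0 is the same for every sample.
    """
    norm = 1.0 / math.sqrt(2.0 * math.pi)
    total = 0.0
    for s in samples:
        u = (x - s) / h
        total += norm * math.exp(-0.5 * u * u)  # Gaussian kernel centered on s
    # Average over samples and rescale by h so the estimate integrates to one.
    return total / (len(samples) * h)


samples = [0.0, 1.0, 5.0]
for h in (0.1, 0.5, 2.0):
    print(h, kernel_density(samples, 0.5, h))
```

Evaluating the same point with several values of h, as in the loop above, is a simple way to see how a small spread produces spiky estimates and a large spread smooths the structure away.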
Seventh Sense Software actively researches advanced density estimation techniques, and some of our discoveries are summarized on the Algorithms page.