Implementation
This page describes the internal flow of `variance`: the order in which it validates the input, selects the unweighted or weighted branch, checks the denominator, and computes the sum of squared deviations. Unlike the Overview page, the goal here is not to repeat the signature but to show how the function actually builds the variance.
The snippets below are taken from the current `src/mespy/stats_utils.py` implementation. Private helpers are mentioned only to clarify the flow; their full behavior is documented on the `_as_float_vector` and `_validate_weights` pages.
Execution sequence
The implementation follows this sequence:
1. Converts `x` into a one-dimensional, finite `float64` vector.
2. If `w` is provided, validates it as a compatible vector with strictly positive values and a positive sum.
3. If `w` is `None`, takes the unweighted branch with denominator `len(x) - ddof`.
4. If `w` is provided, takes the weighted branch with denominator `sum(w) - ddof`.
5. If the denominator is less than or equal to zero, stops the calculation with `ValueError`.
6. Otherwise, computes the mean appropriate to the branch, then the sum of squared deviations, and divides it by the denominator.
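The first two steps are delegated to private helpers whose exact checks are documented on their own pages. As a rough illustration only, the sketch below shows one way those checks could be written, assuming the conditions listed above plus the empty and non-finite input checks mentioned later on this page. The names `as_float_vector_sketch` and `validate_weights_sketch`, the error messages, and the reading of "compatible" as "same length as `x`" are all assumptions, not the mespy API.

```python
from __future__ import annotations

import numpy as np
from numpy.typing import ArrayLike


def as_float_vector_sketch(name: str, x: ArrayLike) -> np.ndarray:
    """Hypothetical stand-in for step 1: a non-empty, 1-D, finite float64 vector."""
    arr = np.asarray(x, dtype=np.float64)
    if arr.ndim != 1:
        raise ValueError(f"{name} must be one-dimensional")
    if arr.size == 0:
        raise ValueError(f"{name} must not be empty")
    if not np.all(np.isfinite(arr)):
        raise ValueError(f"{name} must contain only finite values")
    return arr


def validate_weights_sketch(values: np.ndarray, w: ArrayLike | None) -> np.ndarray | None:
    """Hypothetical stand-in for step 2: None passes through, otherwise w is checked."""
    if w is None:
        return None  # the caller will take the unweighted branch
    weights = as_float_vector_sketch("w", w)
    # Assumption: "compatible" means the same length as x.
    if weights.size != values.size:
        raise ValueError("w must have the same length as x")
    if np.any(weights <= 0):
        raise ValueError("w must contain strictly positive values")
    # Redundant once every weight is positive; kept to mirror the listed checks.
    if np.sum(weights) <= 0:
        raise ValueError("sum(w) must be positive")
    return weights


values = as_float_vector_sketch("x", [1, 2, 3])
weights = validate_weights_sketch(values, [1, 1, 2])
print(values.dtype, weights)
```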
Unweighted and weighted branches
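```python
import numpy as np
from numpy.typing import ArrayLike

# _as_float_vector and _validate_weights are private helpers defined in the same module.


def variance(x: ArrayLike, w: ArrayLike | None = None, ddof: int | float = 0) -> float:
    values = _as_float_vector("x", x)       # 1-D, finite float64 vector
    weights = _validate_weights(values, w)  # None when w is None

    if weights is None:
        # Unweighted branch: denominator len(x) - ddof.
        n = values.size
        denom = n - ddof
        if denom <= 0:
            # Message in the source (Italian): "non-positive denominator: ..."
            raise ValueError(
                f"denominatore non positivo: len(x) - ddof = {n} - {ddof}"
            )
        mean_x = float(np.mean(values))
        return float(np.sum((values - mean_x) ** 2) / denom)

    # Weighted branch: denominator sum(w) - ddof.
    w_sum = float(np.sum(weights))
    denom = w_sum - ddof
    if denom <= 0:
        raise ValueError(
            f"denominatore non positivo: sum(w) - ddof = {w_sum} - {ddof}"
        )
    mean_x_w = float(np.sum(weights * values) / w_sum)
    return float(np.sum(weights * (values - mean_x_w) ** 2) / denom)
```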
The core of the function is the clear separation between two operational definitions of variance.
- In the unweighted case the effective cardinality is `n = values.size`.
- In the weighted case the normalization quantity becomes `w_sum = sum(weights)`.
- In both cases `ddof` subtracts a correction from the denominator, but the base it is subtracted from differs.
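One way to see why `w_sum` takes over the role of `n`: with integer weights, the weighted formula gives exactly the same result as the unweighted formula applied to the data with each `x[i]` repeated `w[i]` times, for any `ddof`, because `sum(w)` is then the length of the expanded data. The sketch below re-implements the two branches in plain NumPy so it runs without mespy; `unweighted_var` and `weighted_var` are hypothetical names, and input validation is omitted.

```python
import numpy as np


def unweighted_var(values: np.ndarray, ddof: float = 0) -> float:
    # Same formula as the unweighted branch: denominator len(x) - ddof.
    denom = values.size - ddof
    if denom <= 0:
        raise ValueError("non-positive denominator")
    mean_x = float(np.mean(values))
    return float(np.sum((values - mean_x) ** 2) / denom)


def weighted_var(values: np.ndarray, weights: np.ndarray, ddof: float = 0) -> float:
    # Same formula as the weighted branch: denominator sum(w) - ddof.
    w_sum = float(np.sum(weights))
    denom = w_sum - ddof
    if denom <= 0:
        raise ValueError("non-positive denominator")
    mean_x_w = float(np.sum(weights * values) / w_sum)
    return float(np.sum(weights * (values - mean_x_w) ** 2) / denom)


x = np.array([1.0, 2.0, 3.0])
w = np.array([1.0, 1.0, 2.0])
# Repeat each value w[i] times: [1.0, 2.0, 3.0, 3.0], so len(expanded) == sum(w).
expanded = np.repeat(x, w.astype(int))

for ddof in (0, 1):
    print(ddof, weighted_var(x, w, ddof), unweighted_var(expanded, ddof))
```

This equivalence only holds for integer weights; with fractional weights `sum(w)` no longer counts observations, which is why the choice of `ddof` needs more care in the weighted branch.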
Implemented formulas
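In the unweighted branch, with $n = \mathrm{len}(x)$, the function uses

$$
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad
\operatorname{Var}(x) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - \mathrm{ddof}}
$$

In the weighted branch it instead uses

$$
\bar{x}_w = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}, \qquad
\operatorname{Var}_w(x) = \frac{\sum_{i=1}^{n} w_i (x_i - \bar{x}_w)^2}{\sum_{i=1}^{n} w_i - \mathrm{ddof}}
$$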
The function does not delegate to numpy.var(...): it explicitly builds the mean consistent with the selected branch and then sums the squared deviations.
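Even though the function does not call `numpy.var`, its unweighted formula should agree with `numpy.var` for the same `ddof`, and with `ddof=0` the weighted formula should agree with a weighted average of squared deviations computed via `numpy.average`. The check below recomputes both formulas inline on arbitrary sample data; it is a consistency sketch, not a test taken from the mespy suite.

```python
import numpy as np

x = np.array([2.0, 4.0, 4.0, 5.0, 7.0])
w = np.array([1.0, 3.0, 0.5, 2.0, 1.5])

# Unweighted formula: sum of squared deviations over len(x) - ddof.
for ddof in (0, 1):
    manual = np.sum((x - np.mean(x)) ** 2) / (x.size - ddof)
    assert np.isclose(manual, np.var(x, ddof=ddof))

# Weighted formula with ddof=0: sum(w * (x - mean_w)**2) / sum(w).
mean_w = np.sum(w * x) / np.sum(w)
manual_w = np.sum(w * (x - mean_w) ** 2) / np.sum(w)
reference_w = np.average((x - np.average(x, weights=w)) ** 2, weights=w)
assert np.isclose(manual_w, reference_w)
print(manual_w, reference_w)
```

`numpy.average` has no `ddof` parameter, so only the `ddof=0` weighted case has a direct NumPy counterpart here.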
Role of ddof and error conditions
The ddof parameter does not only change the statistical interpretation of the result: it can make the computation impossible.
- In the unweighted case the code requires `len(x) - ddof > 0`.
- In the weighted case it requires `sum(w) - ddof > 0`.

If either condition fails, the function raises `ValueError` before computing the mean or performing the final division.
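Because the weighted check uses `sum(w)` rather than `len(x)`, weights normalized to sum to 1 cannot be combined with `ddof=1`, even when `x` has many observations. The helper below is hypothetical (not part of mespy): it reproduces only the two denominator checks described above, so the failing cases can be run in isolation.

```python
import numpy as np


def check_denominator(x, w=None, ddof=0.0) -> float:
    """Hypothetical helper: return the denominator variance would use, or raise."""
    x = np.asarray(x, dtype=np.float64)
    if w is None:
        denom = x.size - ddof
        label = "len(x) - ddof"
    else:
        denom = float(np.sum(np.asarray(w, dtype=np.float64))) - ddof
        label = "sum(w) - ddof"
    if denom <= 0:
        raise ValueError(f"non-positive denominator: {label} = {denom}")
    return float(denom)


print(check_denominator([1.0, 2.0, 3.0], ddof=1))                   # 2.0
print(check_denominator([1.0, 2.0, 3.0], [1.0, 1.0, 2.0], ddof=1))  # 3.0
try:
    # Normalized weights: sum(w) - ddof = 0, although len(x) - ddof = 2.
    check_denominator([1.0, 2.0, 3.0], [0.25, 0.25, 0.5], ddof=1)
except ValueError as exc:
    print(exc)
```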
The default `ddof=0` is consistent with the descriptive orientation of the package: by default the function computes a population variance, not the sample variance with Bessel’s correction (`ddof=1` in the unweighted branch).
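Since both settings share the same numerator, in the unweighted branch the two conventions differ only by a factor that depends on $n$:

$$
\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}
= \frac{n}{n - 1} \cdot \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n}
$$

So `variance(x, ddof=1)` equals `variance(x)` multiplied by $n/(n-1)$; for `x = [1.0, 2.0, 3.0]` this gives $\tfrac{2}{3} \cdot \tfrac{3}{2} = 1$. The factor approaches 1 as $n$ grows, so the distinction matters mostly for small samples.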
Important interactions between parameters
Some aspects of the behavior are worth making explicit.
- `w=None` does not mean "all weights equal, implemented by hand": it explicitly selects the unweighted branch, based on `np.mean(values)`.
- If all weights are equal and `ddof=0`, the result matches the unweighted branch; with a nonzero `ddof` the two can differ, because the weighted denominator is `sum(w) - ddof`, not `len(x) - ddof`.
- The function also accepts `ddof` as a `float`, so the correction is not limited to integers.
- Errors caused by empty, non-one-dimensional, or non-finite input are always caught at the start by `_as_float_vector`, while errors specific to the weighted branch go through `_validate_weights`.
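The effect of equal weights on the two denominators can be checked directly with NumPy, without mespy. The sketch below assumes weights all equal to 2.0 (an arbitrary choice) and recomputes both formulas inline for an integer and a fractional `ddof`.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
w_equal = np.full(x.size, 2.0)  # all weights equal, but sum(w) = 6.0, not len(x) = 3

ss = np.sum((x - x.mean()) ** 2)                                    # unweighted sum of squares
ss_w = np.sum(w_equal * (x - np.average(x, weights=w_equal)) ** 2)  # weighted sum of squares

for ddof in (0, 1, 0.5):  # ddof may also be a float
    unweighted = ss / (x.size - ddof)
    weighted = ss_w / (w_equal.sum() - ddof)
    print(ddof, unweighted, weighted)
```

With `ddof=0` both columns agree, because the factor 2 cancels between numerator and denominator; with `ddof=1` or `ddof=0.5` it no longer cancels and the results differ.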
Commented example
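```python
from mespy import variance

x = [1.0, 2.0, 3.0]
w = [1.0, 1.0, 2.0]

print(variance(x))          # unweighted branch, denominator len(x) - 0
print(variance(x, w=w))     # weighted branch, denominator sum(w) - 0
print(variance(x, ddof=1))  # unweighted branch, denominator len(x) - 1
```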
In this example the same function is used in three different ways.
- The first call uses `n - ddof` with `ddof=0`.
- The second uses `sum(w) - ddof`, so the normalization depends on the weights.
- The third keeps the unweighted case but applies an explicit correction to the denominator.
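Working the three calls through by hand with the formulas the function implements gives the values they should print, up to floating-point formatting (assuming the helpers pass the inputs through unchanged as `float64` vectors):

$$
\begin{aligned}
\bar{x} &= \frac{1 + 2 + 3}{3} = 2, \qquad \sum_{i} (x_i - \bar{x})^2 = 1 + 0 + 1 = 2\\
\text{first call: } &\frac{2}{3 - 0} \approx 0.6667\\
\text{third call: } &\frac{2}{3 - 1} = 1\\
\bar{x}_w &= \frac{1 \cdot 1 + 1 \cdot 2 + 2 \cdot 3}{1 + 1 + 2} = \frac{9}{4} = 2.25\\
\sum_{i} w_i (x_i - \bar{x}_w)^2 &= 1.5625 + 0.0625 + 2 \cdot 0.5625 = 2.75\\
\text{second call: } &\frac{2.75}{4 - 0} = 0.6875
\end{aligned}
$$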