Implementation

This page describes the internal flow of variance: the order in which input validation, selection of the unweighted or weighted branch, the denominator check, and the final squared-deviation computation are applied. Unlike the Overview page, the goal here is not to repeat the signature but to show how the function actually builds the variance.

The following snippets are taken from the current src/mespy/stats_utils.py implementation. Private helpers are mentioned only to clarify the flow; complete details are documented in _as_float_vector and _validate_weights.

Execution sequence

The implementation follows this sequence:

  1. Convert x into a one-dimensional, finite float64 vector.

  2. If w is provided, validate it as a vector compatible with x, with strictly positive values and a positive sum.

  3. If w is None, take the unweighted branch with denominator len(x) - ddof.

  4. Otherwise, take the weighted branch with denominator sum(w) - ddof.

  5. If the chosen denominator is less than or equal to zero, stop immediately with ValueError, before computing any mean.

  6. Otherwise, compute the mean consistent with the current branch, then divide the sum of squared deviations by the denominator.

Unweighted and weighted branches

def variance(x: ArrayLike, w: ArrayLike | None = None, ddof: int | float = 0) -> float:
    values = _as_float_vector("x", x)
    weights = _validate_weights(values, w)

    if weights is None:
        n = values.size
        denom = n - ddof
        if denom <= 0:
            raise ValueError(
                f"denominatore non positivo: len(x) - ddof = {n} - {ddof}"
            )

        mean_x = float(np.mean(values))
        return float(np.sum((values - mean_x) ** 2) / denom)

    w_sum = float(np.sum(weights))
    denom = w_sum - ddof
    if denom <= 0:
        raise ValueError(
            f"denominatore non positivo: sum(w) - ddof = {w_sum} - {ddof}"
        )

    mean_x_w = float(np.sum(weights * values) / w_sum)
    return float(np.sum(weights * (values - mean_x_w) ** 2) / denom)

The core of the function is the clear separation between two operational definitions of variance.

  • In the unweighted case the effective cardinality is n = values.size.

  • In the weighted case the normalization quantity becomes w_sum = sum(weights).

  • In both cases ddof is subtracted from the denominator, but the base it is subtracted from differs: n in the unweighted branch, sum(w) in the weighted one.

Implemented formulas

In the unweighted branch the function uses

\[ \bar{x} = \frac{1}{n}\sum_i x_i, \qquad \mathrm{Var}(x) = \frac{\sum_i (x_i - \bar{x})^2}{n - \mathrm{ddof}}. \]
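With ddof=0 this reduces to the population variance, and it can be cross-checked against NumPy's np.var, which uses the same n - ddof denominator. A minimal sketch with illustrative values (not taken from the mespy test suite):

```python
import numpy as np

x = np.array([4.0, 7.0, 13.0, 16.0])
ddof = 0

# Unweighted formula exactly as written above
mean_x = x.mean()
var_manual = np.sum((x - mean_x) ** 2) / (x.size - ddof)

# np.var applies the same n - ddof convention
assert np.isclose(var_manual, np.var(x, ddof=ddof))
print(var_manual)  # 22.5
```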

In the weighted branch it instead uses

\[ \bar{x}_w = \frac{\sum_i w_i x_i}{\sum_i w_i}, \qquad \mathrm{Var}_w(x) = \frac{\sum_i w_i (x_i - \bar{x}_w)^2}{\sum_i w_i - \mathrm{ddof}}. \]
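The weighted mean in this formula coincides with np.average(x, weights=w), which gives a quick way to sanity-check the branch. A sketch with arbitrary values:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
w = np.array([1.0, 1.0, 2.0])
ddof = 0

w_sum = w.sum()
mean_w = np.sum(w * x) / w_sum                       # weighted mean: 9/4 = 2.25
var_w = np.sum(w * (x - mean_w) ** 2) / (w_sum - ddof)

# The weighted mean matches NumPy's np.average
assert np.isclose(mean_w, np.average(x, weights=w))
print(var_w)  # 0.6875
```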

The function does not delegate to numpy.var(...): it explicitly builds the mean consistent with the selected branch and then sums the squared deviations.

Role of ddof and error conditions

The ddof parameter does not only change the statistical interpretation of the result: it can make the computation impossible.

  • In the unweighted case the code requires len(x) - ddof > 0.

  • In the weighted case it requires sum(w) - ddof > 0.

  • If one of these conditions fails, the function raises ValueError before performing the final division.
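As a sketch of the failure mode, the guard can be exercised in isolation. The helper below is hypothetical (mespy has no such function); it only mirrors the check that variance performs before dividing:

```python
def check_denominator(denom: float, label: str) -> None:
    # Hypothetical helper mirroring the guard inside variance()
    if denom <= 0:
        raise ValueError(f"non-positive denominator: {label} = {denom}")

# Three observations with ddof=3 make the denominator zero
try:
    check_denominator(3 - 3, "len(x) - ddof")
except ValueError as exc:
    print(exc)  # non-positive denominator: len(x) - ddof = 0
```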

The default ddof=0 is consistent with the descriptive setup of the package: by default the function computes a population variance, not the version with Bessel’s correction.

Important interactions between parameters

Some aspects of the behavior are worth making explicit.

  • Passing w=None does not mean “all weights equal to one” implemented by hand: it explicitly selects the unweighted branch based on np.mean(values).

  • If all weights are equal and ddof=0, the value matches a variance computed with uniform weights, but the denominator in the weighted branch remains sum(w), not len(x).

  • The function also accepts ddof as a float, so the correction is not limited to integers.

  • Errors caused by empty, non-one-dimensional, or non-finite inputs are always intercepted up front by _as_float_vector, while errors specific to the weighted branch go through _validate_weights.
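The second and third points can be illustrated with plain NumPy, standing in for mespy (a sketch, not the library's own code):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
w = np.full_like(x, 2.0)  # equal weights, but sum(w) = 8 while len(x) = 4

mean_x = x.mean()
var_unweighted = np.sum((x - mean_x) ** 2) / x.size

mean_w = np.sum(w * x) / w.sum()
var_weighted = np.sum(w * (x - mean_w) ** 2) / w.sum()

# With ddof=0 the branches agree even though the denominators (4 vs 8)
# differ: the constant weight cancels out of the ratio
assert np.isclose(var_unweighted, var_weighted)

# ddof may be a float, so the correction is not limited to integers
ddof = 0.5
print(np.sum((x - mean_x) ** 2) / (x.size - ddof))  # 5 / 3.5 ≈ 1.4286
```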

Commented example

from mespy import variance

x = [1.0, 2.0, 3.0]
w = [1.0, 1.0, 2.0]

print(variance(x))        # unweighted branch
print(variance(x, w=w))   # weighted branch
print(variance(x, ddof=1))

In this example the same function is used in three different ways.

  • The first call uses n - ddof with ddof=0.

  • The second instead uses sum(w) - ddof, so the normalization depends on the weights.

  • The third keeps the unweighted case but applies an explicit correction to the denominator.
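The three printed values can be verified by hand from the formulas above. The sketch below re-derives them with plain NumPy, so it runs even without mespy installed:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
w = np.array([1.0, 1.0, 2.0])

# Call 1: unweighted, ddof=0 -> mean 2.0, SS = 2.0, denominator 3
v1 = np.sum((x - x.mean()) ** 2) / 3          # 2/3 ≈ 0.6667
# Call 2: weighted -> mean_w = 9/4 = 2.25, SS_w = 2.75, denominator 4
mean_w = np.sum(w * x) / w.sum()
v2 = np.sum(w * (x - mean_w) ** 2) / w.sum()  # 2.75/4 = 0.6875
# Call 3: unweighted, ddof=1 -> denominator 3 - 1 = 2
v3 = np.sum((x - x.mean()) ** 2) / (3 - 1)    # 2/2 = 1.0

print(v1, v2, v3)
```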