Implementation

This page describes the internal flow of covariance: the order in which x and y are validated separately, their shapes are checked for compatibility, the optional weights are validated, and the covariance is finally computed. Unlike the Overview page, the goal here is not to repeat the parameter descriptions but to show how the function actually builds its result.

The following snippets are taken from the current src/mespy/stats_utils.py implementation. Private helpers are mentioned only to clarify the flow; their full behavior is documented under _as_float_vector and _validate_weights.

Execution sequence

The implementation follows this sequence:

  1. Converts x and y into one-dimensional, non-empty, finite float64 vectors.

  2. Checks that the two vectors have the same shape.

  3. If w is provided, validates it against the shape of x.

  4. If w is None, computes E[xy], E[x], and E[y] as simple means.

  5. Otherwise, computes the same three quantities as weighted means, using the same weight vector for all of them.

  6. Returns the difference mean_xy - mean_x * mean_y.

Input validation and the identity used

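import numpy as np
from numpy.typing import ArrayLike

# _as_float_vector and _validate_weights are private helpers defined in the same module.


def covariance(x: ArrayLike, y: ArrayLike, w: ArrayLike | None = None) -> float:
    # Steps 1-2: validate each input on its own, then compare the shapes.
    x_values = _as_float_vector("x", x)
    y_values = _as_float_vector("y", y)

    if x_values.shape != y_values.shape:
        # Message (Italian): "x and y must have the same length"
        raise ValueError("x e y devono avere la stessa lunghezza")

    # Step 3: weights is None exactly when no w was supplied.
    weights = _validate_weights(x_values, w)
    if weights is None:
        # Step 4: unweighted branch, plain arithmetic means.
        mean_xy = float(np.mean(x_values * y_values))
        mean_x = float(np.mean(x_values))
        mean_y = float(np.mean(y_values))
    else:
        # Step 5: weighted branch, one weight vector shared by all three means.
        w_sum = float(np.sum(weights))
        mean_xy = float(np.sum(weights * x_values * y_values) / w_sum)
        mean_x = float(np.sum(weights * x_values) / w_sum)
        mean_y = float(np.sum(weights * y_values) / w_sum)

    # Step 6: Cov(x, y) = E[xy] - E[x]E[y]
    return float(mean_xy - mean_x * mean_y)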

The first part of the flow is entirely dedicated to putting x and y on the same numeric footing.

  • The two inputs are validated independently: each must be one-dimensional, non-empty, and composed only of finite values.

  • Only after this normalization does the function check that x_values.shape == y_values.shape.

  • If the shapes differ, a ValueError is raised and none of the means is computed.
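The private helpers have their own pages, so the following is only a minimal sketch of the validation order described above. It assumes that _as_float_vector rejects anything that is not one-dimensional, non-empty, and finite; the names as_float_vector_sketch and check_same_shape_sketch are hypothetical stand-ins, not part of mespy, and their error messages are invented for illustration.

```python
import numpy as np
from numpy.typing import ArrayLike


def as_float_vector_sketch(name: str, values: ArrayLike) -> np.ndarray:
    """Hypothetical stand-in for the private _as_float_vector (assumed contract)."""
    arr = np.asarray(values, dtype=np.float64)
    if arr.ndim != 1:
        raise ValueError(f"{name} must be one-dimensional")
    if arr.size == 0:
        raise ValueError(f"{name} must not be empty")
    if not np.all(np.isfinite(arr)):
        raise ValueError(f"{name} must contain only finite values")
    return arr


def check_same_shape_sketch(x: ArrayLike, y: ArrayLike) -> tuple[np.ndarray, np.ndarray]:
    """Hypothetical sketch: validate x and y separately, then compare shapes."""
    # Each input is normalized on its own first, as in covariance.
    x_values = as_float_vector_sketch("x", x)
    y_values = as_float_vector_sketch("y", y)
    # Only after both conversions succeed are the shapes compared.
    if x_values.shape != y_values.shape:
        raise ValueError("x and y must have the same length")
    return x_values, y_values
```

Because of this ordering, check_same_shape_sketch([1.0, 2.0], [float("nan")]) fails on the finiteness of y before the length mismatch is ever considered.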

Implemented formula

The function uses the identity

\[ \mathrm{Cov}(x, y) = E[xy] - E[x]E[y]. \]
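The identity follows from expanding the centred product and applying linearity of the mean, which holds equally for the simple and the weighted means defined below:

\[ E\big[(x - E[x])(y - E[y])\big] = E[xy] - E[x]E[y] - E[x]E[y] + E[x]E[y] = E[xy] - E[x]E[y]. \]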

In the unweighted case this means

\[ E[xy] = \frac{1}{n}\sum_i x_i y_i, \qquad E[x] = \frac{1}{n}\sum_i x_i, \qquad E[y] = \frac{1}{n}\sum_i y_i. \]
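As a numerical cross-check (not part of mespy), the sketch below evaluates the three simple means the way the unweighted branch does and compares the result with the centred definition and with NumPy's np.cov(..., bias=True), which also divides by n. The helper names identity_covariance and centred_covariance are hypothetical, and input validation is deliberately omitted.

```python
import numpy as np


def identity_covariance(x, y) -> float:
    """Hypothetical helper: E[xy] - E[x]E[y] with plain means (no validation)."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    mean_xy = float(np.mean(x * y))
    mean_x = float(np.mean(x))
    mean_y = float(np.mean(y))
    return mean_xy - mean_x * mean_y


def centred_covariance(x, y) -> float:
    """Hypothetical helper: average product of deviations from the means."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    return float(np.mean((x - x.mean()) * (y - y.mean())))


x = [1.0, 2.0, 3.0, 5.0]
y = [2.0, 1.0, 4.0, 3.0]
print(identity_covariance(x, y))
print(centred_covariance(x, y))
print(np.cov(x, y, bias=True)[0, 1])  # population covariance: divides by n
```

For this input all three lines print 0.875.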

In the weighted case the three means are replaced by

\[ E_w[xy] = \frac{\sum_i w_i x_i y_i}{\sum_i w_i}, \qquad E_w[x] = \frac{\sum_i w_i x_i}{\sum_i w_i}, \qquad E_w[y] = \frac{\sum_i w_i y_i}{\sum_i w_i}. \]
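Each weighted mean above is an ordinary weighted average, so it should agree with NumPy's np.average(..., weights=w), and the resulting covariance should agree with np.cov(..., aweights=w, bias=True). The sketch below checks this on the example data used later on this page; weighted_identity_covariance is a hypothetical name, and positivity and shape of w are assumed rather than validated.

```python
import numpy as np


def weighted_identity_covariance(x, y, w) -> float:
    """Hypothetical helper: one weight vector shared by E_w[xy], E_w[x] and E_w[y]."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    w = np.asarray(w, dtype=np.float64)  # assumed positive, same shape as x; not checked
    w_sum = float(np.sum(w))
    mean_xy = float(np.sum(w * x * y) / w_sum)
    mean_x = float(np.sum(w * x) / w_sum)
    mean_y = float(np.sum(w * y) / w_sum)
    return mean_xy - mean_x * mean_y


x = [1.0, 2.0, 3.0]
y = [2.0, 4.0, 6.0]
w = [1.0, 1.0, 2.0]

# The three weighted means are plain weighted averages.
print(np.average(np.multiply(x, y), weights=w), np.average(x, weights=w), np.average(y, weights=w))
print(weighted_identity_covariance(x, y, w))
# NumPy's weighted covariance, normalized by sum(w) when bias=True.
print(np.cov(x, y, aweights=w, bias=True)[0, 1])
```

Here the weighted means are 11.5, 2.25 and 4.5, and both covariance lines print 1.375.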

In both cases the final value is

\[ \mathrm{Cov}(x, y) = \mathrm{mean\_xy} - \mathrm{mean\_x}\,\mathrm{mean\_y}. \]

Important interactions and explicit limits

Some implementation choices are worth making explicit.

  • The function has no ddof parameter: it implements only the definition E[xy] - E[x]E[y], which in the unweighted case divides by n (the population covariance) rather than by n - 1.

  • The same weights are used for every mean in the weighted branch; there are no separate weights for x, y, or xy.

  • If w is None, the flow does not go through weighted_mean(...): the means are computed directly with np.mean(...).

  • If w is provided, the code does not normalize it beforehand; normalization happens implicitly through the division by sum(w).

  • Non-finite values and non-positive weights raise errors before the final formula is reached, so the function never silently returns a nan covariance.
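Three of these points can be checked numerically: scaling every weight by the same positive constant leaves the result unchanged, uniform weights reproduce the unweighted branch, and the unweighted value equals NumPy's default sample covariance (ddof=1) times (n - 1)/n. The sketch re-implements both branches locally under the hypothetical name cov_sketch instead of importing mespy, so it assumes inputs that have already been validated.

```python
import numpy as np


def cov_sketch(x, y, w=None) -> float:
    """Hypothetical local re-implementation of both branches (no validation)."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    if w is None:
        # Unweighted branch: plain np.mean, no weighted helper involved.
        return float(np.mean(x * y) - np.mean(x) * np.mean(y))
    w = np.asarray(w, dtype=np.float64)
    w_sum = np.sum(w)  # the only normalization applied to the weights
    mean_xy = np.sum(w * x * y) / w_sum
    mean_x = np.sum(w * x) / w_sum
    mean_y = np.sum(w * y) / w_sum
    return float(mean_xy - mean_x * mean_y)


x = np.array([0.5, 1.5, 2.0, 4.0])
y = np.array([1.0, 0.0, 3.0, 2.5])
w = np.array([2.0, 1.0, 1.0, 3.0])
n = x.size

# Scaling all weights by the same factor does not change the result.
print(np.isclose(cov_sketch(x, y, w), cov_sketch(x, y, 10.0 * w)))
# Uniform weights reproduce the unweighted branch.
print(np.isclose(cov_sketch(x, y, np.ones(n)), cov_sketch(x, y)))
# No ddof: population covariance; np.cov divides by n - 1 by default.
print(np.isclose(cov_sketch(x, y), np.cov(x, y)[0, 1] * (n - 1) / n))
```

All three checks print True.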

Commented example

from mespy import covariance

x = [1.0, 2.0, 3.0]
y = [2.0, 4.0, 6.0]
w = [1.0, 1.0, 2.0]

print(covariance(x, y))      # simple means: 4/3, up to float rounding
print(covariance(x, y, w))   # weighted means: 1.375

Both calls rely on the same mathematical identity; they differ only in how mean_xy, mean_x, and mean_y are built.

  • Without weights, every point contributes equally.

  • With w, every contribution goes through the same weighted normalization.
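Working the example through the formulas above by hand (exact arithmetic; the unweighted value 4/3 is printed as a rounded float):

\[ \text{unweighted: } E[xy] = \frac{2 + 8 + 18}{3} = \frac{28}{3}, \qquad E[x] = 2, \qquad E[y] = 4, \qquad \mathrm{Cov}(x, y) = \frac{28}{3} - 8 = \frac{4}{3}. \]

\[ \text{weighted, } \textstyle\sum_i w_i = 4: \quad E_w[xy] = \frac{2 + 8 + 36}{4} = 11.5, \qquad E_w[x] = \frac{1 + 2 + 6}{4} = 2.25, \qquad E_w[y] = \frac{2 + 4 + 12}{4} = 4.5, \qquad \mathrm{Cov}_w(x, y) = 11.5 - 2.25 \cdot 4.5 = 1.375. \]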