Overview

Purpose

Load a CSV file into a pandas DataFrame with a restricted set of options and explicit checks on required columns and missing values.

Parameters

path: path of the file to read.
sep: column delimiter passed to pandas.read_csv.
decimal: decimal separator; set it to "," for files written in the Italian number format (for example 12,5).
rename_columns: {source_name: destination_name} mapping applied after loading.
required_columns: list of columns that must exist after the optional rename step.
missing: policy for NaN values, one of "error" (raise ValueError), "drop" (remove incomplete rows), or "allow" (keep them).
comment: optional comment character for read_csv.
skip_initial_space: if True, skip spaces immediately after the delimiter.
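
The parsing options above correspond to behaviour that pandas.read_csv already provides. The following is a minimal sketch of that behaviour, calling pandas directly on made-up data held in memory. It assumes that decimal, comment and skip_initial_space are forwarded to the pandas keywords decimal, comment and skipinitialspace, which this page does not state explicitly.

```python
import io

import pandas as pd

# Made-up sample in the Italian style: ";" separates columns, "," marks decimals.
raw = (
    "# exported from a lab instrument\n"
    "misura_n; lunghezza_mm; sigma_mm\n"
    "1; 12,5; 0,25\n"
    "2; 13,0; 0,5\n"
)

df = pd.read_csv(
    io.StringIO(raw),
    sep=";",                # column delimiter
    decimal=",",            # parse "12,5" as the float 12.5
    comment="#",            # skip the fully commented first line
    skipinitialspace=True,  # drop the space that follows each ";"
)

print(df.dtypes)
```

Without skipinitialspace=True the column names would keep their leading space (" lunghezza_mm"), so a later lookup by the clean name would fail.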

Returns

A pd.DataFrame containing the loaded data, with columns renamed and incomplete rows removed if those options were requested.

Errors and exceptions

ValueError if missing is not a supported policy.
ValueError if one or more required columns are missing.
ValueError if the file contains missing values and missing="error".
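
The three cases above can be sketched as a small validation helper. This is not the mespy implementation: check_frame is a hypothetical name, the error messages are invented, and the order of the checks is an assumption. The sketch only shows one way the documented conditions could map onto pandas calls.

```python
import pandas as pd

ALLOWED_POLICIES = ("error", "drop", "allow")  # values listed under Parameters


def check_frame(df, required_columns, missing):
    """Hypothetical helper mirroring the three documented ValueError cases."""
    # Case 1: unsupported policy name.
    if missing not in ALLOWED_POLICIES:
        raise ValueError(f"unsupported missing policy: {missing!r}")

    # Case 2: one or more required columns are absent.
    absent = [col for col in required_columns if col not in df.columns]
    if absent:
        raise ValueError(f"missing required columns: {absent}")

    # Case 3: NaN values are present and the policy is "error".
    if missing == "error" and df.isna().any().any():
        raise ValueError("the data contains missing values")

    # "drop" removes incomplete rows; "allow" returns the frame unchanged.
    return df.dropna() if missing == "drop" else df


# Made-up frame with one incomplete row.
frame = pd.DataFrame({"n": [1, 2], "lunghezza": [12.5, None]})

try:
    check_frame(frame, ["n", "sigma"], "drop")
except ValueError as exc:
    print(f"rejected: {exc}")
```

Because every failure is a ValueError, a caller that needs to tell the cases apart has to inspect the message or validate its inputs beforehand.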

Example

from mespy import load_csv

df = load_csv(
    "data/reference/test_misure.csv",
    rename_columns={"misura_n": "n", "lunghezza_mm": "lunghezza", "sigma_mm": "sigma"},
    required_columns=["n", "lunghezza", "sigma"],
    missing="drop",
)

Practical notes

The required_columns check happens after rename_columns.
missing="drop" removes incomplete rows with DataFrame.dropna().
The function does not convert the DataFrame into numeric arrays; that step is left to the statistics and plotting functions.
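
The notes above can be illustrated with plain pandas on a made-up in-memory frame: rename first, check the required names afterwards, drop incomplete rows, and only then convert to a numeric array. The final to_numpy call is one possible way for calling code to do the conversion. It is an assumption, not something this page prescribes.

```python
import pandas as pd

# Made-up frame standing in for freshly loaded data.
df = pd.DataFrame(
    {
        "misura_n": [1, 2, 3],
        "lunghezza_mm": [12.5, None, 13.0],
        "sigma_mm": [0.1, 0.2, 0.1],
    }
)

# Rename first, so the later check sees the destination names.
df = df.rename(
    columns={"misura_n": "n", "lunghezza_mm": "lunghezza", "sigma_mm": "sigma"}
)

# The required-column check therefore uses the new names, not the original ones.
required = ["n", "lunghezza", "sigma"]
absent = [col for col in required if col not in df.columns]
print("absent columns:", absent)

# Equivalent of missing="drop": remove every row that contains a NaN.
clean = df.dropna()

# Numeric conversion is done by the caller, here with Series.to_numpy.
lengths = clean["lunghezza"].to_numpy(dtype=float)
print(lengths)
```

Note that dropna() keeps the original index labels of the surviving rows. Call reset_index(drop=True) afterwards if contiguous labels are needed.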