Overview

Purpose

Load a CSV file into a pandas DataFrame with a restricted set of options and explicit checks on required columns and missing values.

Parameters

  • path: path of the file to read.

  • sep: column delimiter passed to pandas.read_csv.

  • decimal: decimal separator passed to pandas.read_csv. Set it to "," for files that use the Italian number format (decimal comma).

  • rename_columns: {source_name: destination_name} mapping applied after loading.

  • required_columns: list of columns that must exist after the optional rename step.

  • missing: policy for NaN values. It can be "error", "drop", or "allow".

  • comment: optional comment character for read_csv.

  • skip_initial_space: if True, spaces immediately after the delimiter are ignored.
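As a rough illustration of what the parsing options do, the sketch below calls pandas.read_csv directly on an in-memory file written in the Italian style (semicolon delimiter, decimal comma). It assumes that skip_initial_space is forwarded to the pandas keyword skipinitialspace; that mapping and the sample data are assumptions made for this example, not something taken from the mespy source.

```python
import io

import pandas as pd

# Sample data invented for this example: ";" as delimiter, "," as decimal
# separator, a "#" comment line, and a space after each delimiter.
raw = io.StringIO(
    "# test measurements\n"
    "misura_n; lunghezza_mm; sigma_mm\n"
    "1; 12,5; 0,1\n"
    "2; 12,7; 0,2\n"
)

# Assumption: load_csv passes these options through to pandas.read_csv,
# with skip_initial_space mapped to skipinitialspace.
df = pd.read_csv(raw, sep=";", decimal=",", comment="#", skipinitialspace=True)

print(df.dtypes)
```

With decimal="," the two measurement columns are parsed as floats; without it, values such as "12,5" would be left as strings.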

Returns

A pd.DataFrame containing the loaded data, with columns renamed if rename_columns was given and incomplete rows removed if missing="drop".

Errors and exceptions

  • ValueError if missing is not a supported policy.

  • ValueError if one or more required columns are missing.

  • ValueError if the file contains missing values and missing="error".
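The three error cases above can be sketched as a small validation helper. The function check_frame, the constant ALLOWED_POLICIES, the order of the checks, and the error messages are all hypothetical and written only for illustration; the actual implementation inside mespy may differ.

```python
import pandas as pd

# Hypothetical constant: the three policies named in the parameter list.
ALLOWED_POLICIES = ("error", "drop", "allow")


def check_frame(df, required_columns=None, missing="error"):
    """Hypothetical helper mirroring the three documented ValueError cases."""
    # Case 1: unsupported policy.
    if missing not in ALLOWED_POLICIES:
        raise ValueError(f"unsupported missing policy: {missing!r}")

    # Case 2: one or more required columns are absent.
    absent = [c for c in (required_columns or []) if c not in df.columns]
    if absent:
        raise ValueError(f"missing required columns: {absent}")

    # Case 3: NaN values are present and the policy is "error".
    if missing == "error" and df.isna().any().any():
        raise ValueError("data contains missing values")

    # "drop" removes incomplete rows; "allow" returns the frame unchanged.
    return df.dropna() if missing == "drop" else df


# Usage: a frame with one incomplete row.
frame = pd.DataFrame({"n": [1, 2], "sigma": [0.1, float("nan")]})
try:
    check_frame(frame, required_columns=["n", "sigma"], missing="error")
except ValueError as exc:
    print(f"rejected: {exc}")
```

A caller who prefers to keep going with partial data can pass missing="drop" or missing="allow" instead of catching the exception.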

Example

from mespy import load_csv

df = load_csv(
    "data/reference/test_misure.csv",
    rename_columns={"misura_n": "n", "lunghezza_mm": "lunghezza", "sigma_mm": "sigma"},
    required_columns=["n", "lunghezza", "sigma"],
    missing="drop",
)

Practical notes

  • The required_columns check happens after rename_columns.

  • missing="drop" removes incomplete rows with DataFrame.dropna().

  • The function does not directly convert the DataFrame into numeric arrays: that step remains the responsibility of the statistics and plotting functions.
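To make the last two notes concrete, the sketch below applies DataFrame.dropna() to a small frame and then converts two columns to NumPy arrays, the kind of step that is left to downstream code. The frame and its values are invented for the example, and the column names simply reuse those from the usage example above.

```python
import numpy as np
import pandas as pd

# Invented data: the second row has a missing length.
df = pd.DataFrame(
    {
        "n": [1, 2, 3],
        "lunghezza": [12.5, np.nan, 12.9],
        "sigma": [0.1, 0.2, 0.1],
    }
)

# Same effect as missing="drop": rows containing any NaN are removed.
clean = df.dropna()

# Explicit conversion to numeric arrays, done by the caller rather than by
# the loader.
lengths = clean["lunghezza"].to_numpy(dtype=float)
sigmas = clean["sigma"].to_numpy(dtype=float)

print(lengths, sigmas)
```

Note that dropna() keeps the original index labels, so the remaining rows here are labeled 0 and 2; call reset_index(drop=True) if a contiguous index is needed.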