API Reference¶

This page is generated directly from the docstrings in the pev_synth source via mkdocstrings. It is the authoritative reference for the public surface — the names exported in pev_synth.__all__.

ev-flow on PyPI, pev_synth in Python

Install with pip install ev-flow; import with import pev_synth. Every symbol below is reachable as pev_synth.<name> (for example pev_synth.generate_profiles or pev_synth.Fleet).

Package overview¶

pev_synth — Synthetic EV charging dataset pipeline + library API.

Library API (v2.0)

The public surface for downstream code is:

>>> import pev_synth as ps
>>> ps.list_regions()
['bay_area', 'boston', 'chicago', 'dallas_fort_worth', 'la_basin',
 'new_york_metro', 'seattle', 'us_national']
>>> ps.list_profile_types()
['residential', 'workplace']
>>> fleet = ps.generate_profiles('residential', n=1000,
...                              region='bay_area')
>>> prof = fleet[0]
>>> pa = prof.generate_presence_absence('2001-01-01', '2001-01-08')

See src/pev_synth/api.py and docs/pev_synth_api.md for details.

Pipeline modules (M1..M9, methodology v2.0.0, master seed 20260520)

nhts_loader — NHTS 2017 public-use file loader.
vehicle_archetypes — N-EV archetype sampler (M2).
donor_matcher — NHTS donor-vehicle matcher (M3).
travel_week_builder — one-year travel sequence builder (M4).
plug_in_model — session plug-in / dwell sampler (M5).
soc_trajectory — continuous-time SoC ledger + sessions (M6).
hourly_resampler — 15-min + hourly plug-status rasteriser (M7).
validation_bounds_curator — bound curation (M8).
validator — §10 validation runner + report writer (M9).
regions — 8-region registry (Region dataclass).
_utc_migration — package-internal v1.1 → v2.0 UTC cache migrator (leading underscore = not part of the public API; used only by in-house callers with a v1.1 cache to upgrade).

The library API wraps the artifacts these modules produce.

Generating fleets¶

The primary entry point is generate_profiles, which draws a reproducible subset from the local cache and returns a Fleet.

generate_profiles ¶

generate_profiles(
    profile_type: ProfileType | str = "residential",
    n: int = 1000,
    region: Region | str = "bay_area",
    seed: int = _DEFAULT_SEED,
    data_root: Path | None = None,
    *,
    replicate_id: int = 0,
    r_total: int = 1,
) -> Fleet

Return a Fleet of n EVs of profile_type for region.

Parameters:

Name	Type	Description	Default
`profile_type`	`ProfileType \| str`	`'residential'` or `'workplace'`. The legacy `'fleet_depot'` from v1.1 is de-scoped in v2.0 (plan §2.6) and raises `ValueError`.	`'residential'`
`n`	`int`	Number of EVs to return. Must be a positive integer not exceeding the cached fleet size for the requested `(region, profile_type)`.	`1000`
`region`	`Region \| str`	Either a `Region` instance or its short-name string (e.g. `'bay_area'`, `'dallas_fort_worth'`). The region drives (i) the cache path `data/pev/processed/<region.name>/<profile_type>_ev_synth/` and (ii) the default local tz for tz-naive `t_start` / `t_stop` inputs on every Fleet / Profile time-window method.	`'bay_area'`
`seed`	`int`	Master seed for the random subset selection (default `20260520` — the v2.0 master seed).	`_DEFAULT_SEED`
`data_root`	`Path \| None`	Optional override of the per-region cache directory. Useful for tests against ad-hoc bundles. Defaults to the canonical layout.	`None`
`replicate_id`	`int`	Keyword-only. Index of the replicate whose cache is read when `r_total > 1`; see `Fleet.cache_path`. Ignored when `data_root` is given.	`0`
`r_total`	`int`	Keyword-only. Total number of replicates in the cache layout. `1` resolves to the flat single-cache directory; larger values resolve to `replicates/r{replicate_id}/`. Ignored when `data_root` is given.	`1`

Raises:

Type	Description
`ValueError`	`profile_type='fleet_depot'` (de-scoped); unknown profile types; `n` not positive; `n` exceeds the cached fleet size.
`FileNotFoundError`	The `(region, profile_type)` cache does not exist on disk yet. No fleet caches ship in the wheel or in a fresh checkout; build one with a one-time `python -m pev_synth.nhts_loader` (NHTS 2017 download + processing) followed by `python -m pev_synth.cache_regen one --region <r> --profile-type <t>`. The error message points at that CLI.
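
The subset selection described above can be sketched in plain Python. This is a minimal illustration, not the library's code: `pick_subset` is a hypothetical helper, and it uses the standard library's `random.Random` where `generate_profiles` uses a NumPy generator, so the ids it picks will differ from the library's for the same seed. What it shares is the contract: a positive `n` no larger than the cache, a seeded draw without replacement, and sorted output.

```python
import random


def pick_subset(all_ids, n, seed=20260520):
    """Draw a reproducible, sorted subset of n ids (hypothetical helper)."""
    if n is None or n < 1:
        raise ValueError(f"n must be a positive integer; got {n!r}")
    if n > len(all_ids):
        raise ValueError(
            f"n={n} exceeds the cached fleet size ({len(all_ids)})"
        )
    if n == len(all_ids):
        # Asking for the whole cache skips the random draw entirely.
        return list(all_ids)
    rng = random.Random(seed)
    return sorted(rng.sample(list(all_ids), n))


cached_ids = list(range(1, 101))
first = pick_subset(cached_ids, 10)
second = pick_subset(cached_ids, 10)
print(first == second)
```

Because the draw depends only on the seed and the cached id list, two calls with the same arguments return the same EVs, which is what makes a fleet reproducible across sessions.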

Source code in pev_synth/api.py

def generate_profiles(
    profile_type: ProfileType | str = "residential",
    n: int = 1000,
    region: Region | str = "bay_area",
    seed: int = _DEFAULT_SEED,
    data_root: Path | None = None,
    *,
    replicate_id: int = 0,
    r_total: int = 1,
) -> Fleet:
    """Return a :class:`Fleet` of ``n`` EVs of ``profile_type`` for ``region``.

    Parameters
    ----------
    profile_type:
        ``'residential'`` or ``'workplace'``. The legacy ``'fleet_depot'``
        from v1.1 is de-scoped in v2.0 (plan §2.6) and raises
        ``ValueError``.
    n:
        Number of EVs to return. Must be a positive integer not exceeding
        the cached fleet size for the requested ``(region, profile_type)``.
    region:
        Either a :class:`Region` instance or its short-name string (e.g.
        ``'bay_area'``, ``'dallas_fort_worth'``). The region drives
        (i) the cache path
        ``data/pev/processed/<region.name>/<profile_type>_ev_synth/`` and
        (ii) the default local tz for tz-naive ``t_start`` / ``t_stop``
        inputs on every Fleet / Profile time-window method.
    seed:
        Master seed for the random subset selection (default
        ``20260520`` — the v2.0 master seed).
    data_root:
        Optional override of the per-region cache directory. Useful for
        tests against ad-hoc bundles. Defaults to the canonical layout.

    Raises
    ------
    ValueError
        ``profile_type='fleet_depot'`` (de-scoped); unknown profile types;
        ``n`` not positive; ``n`` exceeds the cached fleet size.
    FileNotFoundError
        The ``(region, profile_type)`` cache does not exist on disk yet.
        No fleet caches ship in the wheel or in a fresh checkout; build
        one with a one-time ``python -m pev_synth.nhts_loader`` (NHTS 2017
        download + processing) followed by ``python -m
        pev_synth.cache_regen one --region <r> --profile-type <t>``. The
        error message points at that CLI.
    """
    _validate_profile_type(profile_type)

    if n is None or n < 1:
        raise ValueError(f"n must be a positive integer; got {n!r}")

    region_obj = _resolve_region(region)

    if data_root is not None:
        root = Path(data_root)
    else:
        # RFC-022.r: r_total > 1 resolves through replicate_dir so the
        # layout matches what cache_regen wrote.
        root = Fleet.cache_path(
            region_obj,
            profile_type,
            replicate_id=replicate_id,
            r_total=r_total,
        )
    fleet_pq = root / "fleet.parquet"
    if not fleet_pq.exists():
        raise FileNotFoundError(
            f"No cached fleet at {fleet_pq}. The cache for "
            f"region={region_obj.name!r}, profile_type={profile_type!r} "
            "has not been generated yet. No fleet caches ship in the wheel "
            "or in a fresh checkout; build one in two steps: (1) one-time "
            "`python -m pev_synth.nhts_loader` to fetch and process the "
            "NHTS 2017 microdata, then (2) "
            f"`python -m pev_synth.cache_regen one --region {region_obj.name} "
            f"--profile-type {profile_type}` (add `--n <N>` to size the "
            "fleet). Alternatively, if you pip-installed ev-flow and keep a "
            "prebuilt data tree elsewhere, set the PEV_SYNTH_DATA_ROOT "
            "environment variable to that directory (the default lookup "
            "path is <repo_root>/data, which only exists in a dev checkout)."
        )
    cached = pd.read_parquet(fleet_pq, columns=["ev_id"])
    cache_size = len(cached)
    if n > cache_size:
        raise ValueError(
            f"n={n} exceeds the cached fleet size ({cache_size}) for "
            f"(region={region_obj.name!r}, "
            f"profile_type={profile_type!r}). "
            "Use pev_synth.regenerate_fleet(...) to rebuild a bigger "
            "fleet (heavy pipeline; v2.0 stub)."
        )

    rng = np.random.default_rng(seed)
    all_ids = cached["ev_id"].astype(int).to_numpy()
    if n == cache_size:
        chosen = all_ids.tolist()
    else:
        chosen = sorted(
            int(x) for x in rng.choice(all_ids, size=n, replace=False)
        )

    meta_file = root / "meta.json"
    meta: dict[str, Any] = {}
    if meta_file.exists():
        try:
            with meta_file.open(encoding="utf-8") as f:
                meta = json.load(f)
        except json.JSONDecodeError:
            meta = {}

    return Fleet(
        data_root=root,
        ev_ids=chosen,
        profile_type=profile_type,
        region=region_obj,
        meta=meta,
    )

regenerate_fleet ¶

regenerate_fleet(
    profile_type: ProfileType | str,
    n: int,
    region: Region | str = "bay_area",
    seed: int = _DEFAULT_SEED,
    data_root: Path | None = None,
) -> Fleet

Run the heavy M1-M7 pipeline to produce a fresh n-EV bundle.

Stub: not implemented at the API level; calling it raises NotImplementedError. The full pipeline runs via python -m pev_synth.cache_regen one.
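
Since the Python entry point is a stub, a rebuild goes through the command line. The sketch below only assembles the argument list for the `cache_regen one` command quoted in the stub's error message; `cache_regen_argv` is a hypothetical helper, and the flags are limited to the three that message names (`--region`, `--profile-type`, `--n`).

```python
import shlex
import sys


def cache_regen_argv(region, profile_type, n=None):
    """Build the argv for the cache_regen CLI (hypothetical helper)."""
    argv = [
        sys.executable, "-m", "pev_synth.cache_regen", "one",
        "--region", region,
        "--profile-type", profile_type,
    ]
    if n is not None:
        # --n sizes the rebuilt fleet; omit it to keep the CLI default.
        argv += ["--n", str(n)]
    return argv


argv = cache_regen_argv("bay_area", "residential", n=5000)
print(shlex.join(argv[1:]))
```

The resulting list can be handed to `subprocess.run(argv, check=True)` once the one-time `python -m pev_synth.nhts_loader` step has been completed.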

Source code in pev_synth/api.py

def regenerate_fleet(
    profile_type: ProfileType | str,
    n: int,
    region: Region | str = "bay_area",
    seed: int = _DEFAULT_SEED,
    data_root: Path | None = None,
) -> Fleet:
    """Run the heavy M1-M7 pipeline to produce a fresh ``n``-EV bundle.

    Stub: not implemented at the api level. The full pipeline runs via
    ``python -m pev_synth.cache_regen one``.
    """
    _validate_profile_type(profile_type)
    _ = (n, region, seed, data_root)  # acknowledged, unused
    raise NotImplementedError(
        "regenerate_fleet is a stub. Run "
        "`python -m pev_synth.cache_regen one --region <name> "
        "--profile-type <type> --n <N>` to rebuild a cache (after a "
        "one-time `python -m pev_synth.nhts_loader`), or use "
        "generate_profiles(...) with n <= cache_size for the already-"
        "generated cache."
    )

Fleet¶

A Fleet is a lazy collection of Profile objects backed by parquet files. Only the static fleet.parquet is read at construction time; larger artifacts are read on demand. Fleet is iterable, indexable (fleet[i] / fleet['ev_0042']), and filterable, and exposes wide, per-EV-column versions of the time-window queries plus a fleet-level aggregate_load.
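
The string form `fleet['ev_0042']` relies on a zero-padded label. The constructor source derives the pad width from the largest ev_id in the Fleet, with a floor of 4. The sketch below applies that width rule; the `ev_` prefix format and the parsing step are assumptions for illustration, and both helper names are hypothetical.

```python
def ev_label(ev_id, max_id):
    """Format an ev_id with the width rule max(4, digits of largest id)."""
    pad = max(4, len(str(max_id)))
    return f"ev_{ev_id:0{pad}d}"


def parse_ev_label(label):
    """Turn a label such as 'ev_0042' back into an int (assumed format)."""
    prefix, _, digits = label.partition("_")
    if prefix != "ev" or not digits.isdigit():
        raise KeyError(label)
    return int(digits)


small = ev_label(42, max_id=999)
large = ev_label(42, max_id=123456)
print(small, large, parse_ev_label(small))
```

The floor keeps labels four digits wide on small fleets, while a fleet whose largest id has six digits pads every label to six.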

Fleet ¶

Fleet(
    data_root: Path,
    ev_ids: list[int],
    profile_type: str,
    region: Region,
    meta: dict[str, Any] | None = None,
)

A collection of Profile instances backed by parquet files.

Fleet is lazy: only fleet.parquet is read at construction time; bigger artifacts (plug_status, sessions, mobility) are read on demand via pyarrow predicate-pushdown by ev_id.

Attributes:

Name	Type	Description
`region`	`Region`	The `Region` this Fleet was generated for. The region's `tz` is the default local zone for tz-naive user input timestamps. Cache timestamps are always stored in UTC.
`profile_type`	`str`	`'residential'` or `'workplace'`.
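
The timezone rule in the `region` attribute can be illustrated with the standard library. This is a sketch under stated assumptions: `to_storage_utc` is a hypothetical helper, and a fixed UTC-8 offset stands in for a region's IANA zone so the example has no dependency on a timezone database. A tz-naive input is read as local wall-clock time in the region, and everything is converted to UTC before it is compared with cached timestamps.

```python
from datetime import datetime, timedelta, timezone

# Assumed fixed offset for the sketch; a real region carries an IANA zone.
REGION_TZ = timezone(timedelta(hours=-8))


def to_storage_utc(ts, default_tz=REGION_TZ):
    """Interpret naive input in the region's zone, then convert to UTC."""
    if ts.tzinfo is None:
        ts = ts.replace(tzinfo=default_tz)
    return ts.astimezone(timezone.utc)


naive = datetime(2001, 1, 1, 0, 0)
aware = datetime(2001, 1, 1, 8, 0, tzinfo=timezone.utc)
print(to_storage_utc(naive).isoformat())
print(to_storage_utc(naive) == to_storage_utc(aware))
```

A timestamp that already carries a zone is left at the same instant; only naive input picks up the region default.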

Source code in pev_synth/api.py

def __init__(
    self,
    data_root: Path,
    ev_ids: list[int],
    profile_type: str,
    region: Region,
    meta: dict[str, Any] | None = None,
) -> None:
    self._data_root = Path(data_root)
    self._ev_ids: list[int] = sorted(int(e) for e in ev_ids)
    self._profile_type: str = str(profile_type)
    self._region: Region = region
    self._meta: dict[str, Any] = dict(meta or {})

    fleet_pq = self._data_root / "fleet.parquet"
    if not fleet_pq.exists():
        raise FileNotFoundError(
            f"fleet.parquet not found at {fleet_pq}. Did you point "
            "data_root at a valid processed bundle?"
        )
    full = pd.read_parquet(fleet_pq)
    # RFC-019: assert uniqueness once; ``.loc[self._ev_ids]`` silently
    # *expands* the frame when ev_id is duplicated, masking a corrupt
    # parquet as "merely large".
    if full["ev_id"].duplicated().any():
        dup_ct = int(full["ev_id"].duplicated().sum())
        raise ValueError(
            f"fleet.parquet at {fleet_pq} has {dup_ct} duplicated ev_id "
            "rows; the primary key must be unique."
        )
    sub = full.loc[full["ev_id"].isin(self._ev_ids)].copy()
    sub = (
        sub.set_index("ev_id", drop=False)
        .loc[self._ev_ids]
        .reset_index(drop=True)
    )
    self._fleet_df: pd.DataFrame = sub
    # Position lookup for __getitem__ by int and by ev_id string.
    self._pos_by_evid: dict[int, int] = {
        int(ev): i
        for i, ev in enumerate(self._fleet_df["ev_id"].tolist())
    }
    # L3: derive the ev_id zero-pad width from the largest ev_id in
    # the Fleet rather than hard-coding ``:04d``. Floor at 4 to
    # preserve the v1.1 ``ev_0042`` aesthetic on small fleets.
    max_id = max(self._ev_ids) if self._ev_ids else 0
    self._ev_id_pad: int = max(4, len(str(max_id)))

    # H6: surface the workplace-cohort caveat (105-vehicle EVWatts
    # cohort, plug-in median ~12:00 LT vs literature-canonical
    # ~09:00) at construction time so a downstream user cannot
    # silently mis-interpret the v2.0 workplace pipeline.
    if self._profile_type == "workplace":
        warnings.warn(
            "Workplace fleets in v2.0 are fit from the 105-vehicle public "
            "EVWatts cohort; plug-in median ~12:00 LT is ~3h later than "
            "the literature-canonical workplace median of ~09:00. "
            "Validator W1-W4 flag this divergence as EXPLAINED_FAIL. "
            "See pev_synth_api.md 'Workplace caveat' for context.",
            RuntimeWarning,
            stacklevel=2,
        )

meta `property` ¶

meta: dict[str, Any]

Read-only view of the bundle's meta.json (best effort).

by_ev_id ¶

by_ev_id(ev_id: int) -> Profile

Return the Profile for the given ev_id.

Raises KeyError if ev_id is not in this Fleet. Use this when you want unambiguous by-id lookup; fleet[ev_id] works the same way only when ev_id is present.

Source code in pev_synth/api.py

def by_ev_id(self, ev_id: int) -> Profile:
    """Return the Profile for the given ev_id.

    Raises ``KeyError`` if ``ev_id`` is not in this Fleet. Use this
    when you want unambiguous by-id lookup; ``fleet[ev_id]`` works the
    same way only when ``ev_id`` is present.
    """
    if isinstance(ev_id, bool):
        raise TypeError(
            f"ev_id must be int, not bool (got {ev_id!r})"
        )
    if not isinstance(ev_id, (int, np.integer)):
        raise TypeError(
            f"ev_id must be int, got {type(ev_id).__name__}"
        )
    ev = int(ev_id)
    if ev not in self._pos_by_evid:
        raise KeyError(f"ev_id {ev} not in this Fleet")
    return Profile(self, ev)

by_position ¶

by_position(i: int) -> Profile

Return the i-th Profile in this Fleet (0-indexed positional).

Raises IndexError if i is out of range. Use this when you want unambiguous positional access; fleet[i] works the same way only when i is NOT also a valid ev_id in this Fleet.
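
The difference between the two accessors is easiest to see on a fleet whose ev_ids overlap with valid positions. `TinyFleet` below is a toy stand-in that returns bare ids instead of Profile objects; it mirrors the type and range checks shown in the source listings for `by_ev_id` and `by_position` but is not the library class.

```python
class TinyFleet:
    """Toy stand-in for Fleet holding only sorted ev_ids (hypothetical)."""

    def __init__(self, ev_ids):
        self._ev_ids = sorted(int(e) for e in ev_ids)
        self._pos_by_evid = {ev: i for i, ev in enumerate(self._ev_ids)}

    @staticmethod
    def _check_int(value, what):
        # bool is a subclass of int, so it has to be rejected explicitly.
        if isinstance(value, bool) or not isinstance(value, int):
            raise TypeError(f"{what} must be int, got {type(value).__name__}")
        return value

    def by_ev_id(self, ev_id):
        ev = self._check_int(ev_id, "ev_id")
        if ev not in self._pos_by_evid:
            raise KeyError(f"ev_id {ev} not in this Fleet")
        return ev

    def by_position(self, i):
        pos = self._check_int(i, "position")
        # Negative positions are rejected rather than wrapped around.
        if not 0 <= pos < len(self._ev_ids):
            raise IndexError(f"position out of range: {pos}")
        return self._ev_ids[pos]


fleet = TinyFleet([1, 2, 5, 9])
print(fleet.by_position(1), fleet.by_ev_id(1))
```

Here the integer 1 is both a valid position and a valid ev_id, and the two methods return different EVs. That is the case in which the explicit accessors are preferable to `fleet[1]`.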

Source code in pev_synth/api.py

def by_position(self, i: int) -> Profile:
    """Return the i-th Profile in this Fleet (0-indexed positional).

    Raises ``IndexError`` if ``i`` is out of range. Use this when you
    want unambiguous positional access; ``fleet[i]`` works the same way
    only when ``i`` is NOT also a valid ev_id in this Fleet.
    """
    if isinstance(i, bool):
        raise TypeError(
            f"position must be int, not bool (got {i!r})"
        )
    if not isinstance(i, (int, np.integer)):
        raise TypeError(
            f"position must be int, got {type(i).__name__}"
        )
    pos = int(i)
    if not (0 <= pos < len(self._ev_ids)):
        raise IndexError(
            f"position out of range: {pos} (Fleet has {len(self._ev_ids)} EVs)"
        )
    return Profile(self, self._ev_ids[pos])

cache_path `staticmethod` ¶

cache_path(
    region: Region | str,
    profile_type: str,
    *,
    replicate_id: int = 0,
    r_total: int = 1,
) -> Path

Resolve the canonical cache directory for one replicate.

RFC-022.r: when r_total > 1 the returned path is <processed_root>/<region>/<profile_type>_ev_synth/replicates/r{replicate_id}/. r_total == 1 stays flat (alias r=0) so legacy single-cache layouts continue to resolve.
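
The layout rule can be sketched with `pathlib`. `replicate_dir` here is a hypothetical reimplementation for illustration, not the package's private helper, and the `data/pev/processed` root is taken from the cache path quoted in the `generate_profiles` parameter table; the installed package may resolve a different root.

```python
from pathlib import Path

# Assumed root, taken from the documented default cache path.
PROCESSED_ROOT = Path("data/pev/processed")


def replicate_dir(region, profile_type, replicate_id=0, r_total=1,
                  root=PROCESSED_ROOT):
    """Resolve a cache directory: flat for one replicate, nested otherwise."""
    base = root / region / f"{profile_type}_ev_synth"
    if r_total > 1:
        return base / "replicates" / f"r{replicate_id}"
    # r_total == 1 keeps the legacy single-cache layout.
    return base


flat = replicate_dir("bay_area", "residential")
nested = replicate_dir("bay_area", "residential", replicate_id=2, r_total=4)
print(flat.as_posix())
print(nested.as_posix())
```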

Source code in pev_synth/api.py

@staticmethod
def cache_path(
    region: Region | str,
    profile_type: str,
    *,
    replicate_id: int = 0,
    r_total: int = 1,
) -> Path:
    """Resolve the canonical cache directory for one replicate.

    RFC-022.r: when ``r_total > 1`` the returned path is
    ``<processed_root>/<region>/<profile_type>_ev_synth/replicates/r{replicate_id}/``.
    ``r_total == 1`` stays flat (alias ``r=0``) so legacy single-cache
    layouts continue to resolve.
    """
    region_obj = _resolve_region(region)
    return _replicate_dir(
        region_obj.name, str(profile_type), int(replicate_id), int(r_total)
    )

profile_ids ¶

profile_ids() -> list[int]

List of ev_id ints in this Fleet.

Source code in pev_synth/api.py

def profile_ids(self) -> list[int]:
    """List of ``ev_id`` ints in this Fleet."""
    return list(self._ev_ids)

ev_ids ¶

ev_ids() -> list[int]

Public alias of profile_ids for FleetReader.

Source code in pev_synth/api.py

def ev_ids(self) -> list[int]:
    """Public alias of :meth:`profile_ids` for :class:`FleetReader`."""
    return list(self._ev_ids)

has_ev ¶

has_ev(ev_id: int) -> bool

Whether ev_id is present in this Fleet (public position lookup).

Source code in pev_synth/api.py

def has_ev(self, ev_id: int) -> bool:
    """Whether ``ev_id`` is present in this Fleet (public position lookup)."""
    return int(ev_id) in self._pos_by_evid

row_for ¶

row_for(ev_id: int) -> _ProfileRow

Public alias of the per-EV static row look-up.

Source code in pev_synth/api.py

def row_for(self, ev_id: int) -> _ProfileRow:
    """Public alias of the per-EV static row look-up."""
    return self._row_for(int(ev_id))

charging_sessions_for ¶

charging_sessions_for(
    ev_ids: list[int],
    t_start: str | Timestamp,
    t_stop: str | Timestamp,
    tz: str | None = None,
) -> DataFrame

Public alias of the per-EV-list session query.

Source code in pev_synth/api.py

def charging_sessions_for(
    self,
    ev_ids: list[int],
    t_start: str | pd.Timestamp,
    t_stop: str | pd.Timestamp,
    tz: str | None = None,
) -> pd.DataFrame:
    """Public alias of the per-EV-list session query."""
    return self._charging_sessions(list(ev_ids), t_start, t_stop, tz=tz)

presence_absence_one ¶

presence_absence_one(
    ev_id: int,
    t_start: str | Timestamp,
    t_stop: str | Timestamp,
    freq: str = "15min",
    tz: str | None = None,
) -> Series

Public alias of _presence_absence_one.

Source code in pev_synth/api.py

def presence_absence_one(
    self,
    ev_id: int,
    t_start: str | pd.Timestamp,
    t_stop: str | pd.Timestamp,
    freq: str = "15min",
    tz: str | None = None,
) -> pd.Series:
    """Public alias of :meth:`_presence_absence_one`."""
    return self._presence_absence_one(int(ev_id), t_start, t_stop, freq, tz=tz)

read_parquet ¶

read_parquet(
    name: str,
    columns: list[str] | None = None,
    ev_ids: list[int] | None = None,
) -> DataFrame

Public alias of the per-EV-id parquet reader.

Source code in pev_synth/api.py

def read_parquet(
    self,
    name: str,
    columns: list[str] | None = None,
    ev_ids: list[int] | None = None,
) -> pd.DataFrame:
    """Public alias of the per-EV-id parquet reader."""
    return self._read_parquet(name, columns=columns, ev_ids=ev_ids)

summary ¶

summary() -> DataFrame

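One row per EV, with columns ev_id, battery_kwh, max_charge_kw, eta_charge, archetype, cluster_z, evse_brand, evse_connector, plug_in_rate_yr, n_sessions_yr, and miles_yr. max_charge_kw is the smaller of OBC_kw and EVSE_home_kw. EVs with no rows in the plug-in events, sessions, or mobility tables get 0 in the corresponding yearly column.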

Source code in pev_synth/api.py

def summary(self) -> pd.DataFrame:
    """One row per EV. Columns include the brief's requested set."""
    df = self._fleet_df.copy()
    # Compute the derived columns.
    df["max_charge_kw"] = np.minimum(
        df["OBC_kw"], df["EVSE_home_kw"]
    ).astype(float)
    # Per-EV mileage & session counts (from sessions + mobility).
    sess = self._read_parquet(
        "sessions.parquet",
        columns=["ev_id", "t_in"],
        ev_ids=self._ev_ids,
    )
    sess_count = sess.groupby("ev_id").size().rename("n_sessions_yr")
    # v2.0 caches always include the ``plug_in`` column; the v1.1
    # fallback was removed per the fail-loud policy.
    plug_events = self._read_parquet(
        "plug_in_events.parquet",
        columns=["ev_id", "plug_in"],
        ev_ids=self._ev_ids,
    )
    plug_rate = (
        plug_events.groupby("ev_id")["plug_in"]
        .mean()
        .rename("plug_in_rate_yr")
    )
    miles = self._read_parquet(
        "mobility.parquet",
        columns=["ev_id", "miles"],
        ev_ids=self._ev_ids,
    )
    miles_yr = miles.groupby("ev_id")["miles"].sum().rename("miles_yr")

    out = pd.DataFrame(
        {
            "ev_id": df["ev_id"].astype(int).values,
            "battery_kwh": df["B_kwh"].astype(float).values,
            "max_charge_kw": df["max_charge_kw"].astype(float).values,
            "eta_charge": df["eta_charge_nominal"].astype(float).values,
            "archetype": df["archetype"].astype(str).values,
            "cluster_z": df["cluster_z"].astype(int).values,
            # v2.1 EVSE enrichment -- cast to str for parity with the
            # ``archetype`` column above (Fleet.summary semantics).
            "evse_brand": df["EVSE_brand"].astype(str).values,
            "evse_connector": df["EVSE_connector"].astype(str).values,
        }
    )
    out = out.set_index("ev_id", drop=False)
    out["plug_in_rate_yr"] = (
        plug_rate.reindex(out.index).astype(float).fillna(0.0).values
    )
    out["n_sessions_yr"] = (
        sess_count.reindex(out.index)
        .astype("Int64")
        .fillna(0)
        .astype(int)
        .values
    )
    out["miles_yr"] = (
        miles_yr.reindex(out.index).astype(float).fillna(0.0).values
    )
    return out.reset_index(drop=True)

filter ¶

filter(**predicates: Any) -> Fleet

Return a new Fleet containing EVs matching the predicates.

Supported keys

archetype: equality (string).
powertrain: equality (string, e.g. 'BEV').
cluster_z: equality (int).
evse_brand: equality on the v2.1 EVSE_brand column (e.g. 'Tesla', 'ChargePoint').
evse_connector: equality on the v2.1 EVSE_connector column (e.g. 'J1772', 'NACS').
battery_kwh_gte / battery_kwh_lt / battery_kwh_gt / battery_kwh_lte: numeric thresholds on B_kwh.
max_charge_kw_gte / max_charge_kw_lt / ...

Multiple predicates are AND-combined. Unknown keys raise KeyError with a list of supported keys.
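
The predicate naming scheme can be sketched on plain dicts. This is an illustration of the semantics only: `matches` is a hypothetical helper, the rows are made-up values keyed by the predicate names (the library maps `battery_kwh` to the `B_kwh` column and computes `max_charge_kw`), and the archetype labels are invented for the example.

```python
import operator

_OPS = {"gte": operator.ge, "gt": operator.gt,
        "lte": operator.le, "lt": operator.lt}
_EQUALITY = {"archetype", "powertrain", "cluster_z",
             "evse_brand", "evse_connector"}
_NUMERIC = {"battery_kwh", "max_charge_kw"}


def _check_keys(predicates):
    """Reject unknown keys up front, before any row is evaluated."""
    for key in predicates:
        column, _, suffix = key.rpartition("_")
        if key not in _EQUALITY and not (column in _NUMERIC and suffix in _OPS):
            raise KeyError(f"unsupported filter key {key!r}")


def matches(row, **predicates):
    """Return True when the row satisfies every predicate (AND-combined)."""
    _check_keys(predicates)
    for key, wanted in predicates.items():
        if key in _EQUALITY:
            ok = row[key] == wanted
        else:
            column, _, suffix = key.rpartition("_")
            ok = _OPS[suffix](float(row[column]), float(wanted))
        if not ok:
            return False
    return True


rows = [  # illustrative values only
    {"ev_id": 1, "archetype": "compact", "battery_kwh": 40.0, "max_charge_kw": 6.6},
    {"ev_id": 2, "archetype": "suv", "battery_kwh": 75.0, "max_charge_kw": 11.0},
    {"ev_id": 3, "archetype": "suv", "battery_kwh": 60.0, "max_charge_kw": 7.2},
]
kept = [r["ev_id"] for r in rows
        if matches(r, archetype="suv", battery_kwh_gte=60, max_charge_kw_lt=11)]
print(kept)
```

The suffix decides inclusivity: `_gte` and `_lte` include the threshold, `_gt` and `_lt` exclude it, which is why the 11 kW vehicle is dropped by `max_charge_kw_lt=11`.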

Source code in pev_synth/api.py

def filter(self, **predicates: Any) -> Fleet:
    """Return a new ``Fleet`` containing EVs matching the predicates.

    Supported keys
    --------------
    * ``archetype``: equality (string).
    * ``powertrain``: equality (string, e.g. ``'BEV'``).
    * ``cluster_z``: equality (int).
    * ``evse_brand``: equality on the v2.1 ``EVSE_brand`` column
      (e.g. ``'Tesla'``, ``'ChargePoint'``).
    * ``evse_connector``: equality on the v2.1 ``EVSE_connector``
      column (e.g. ``'J1772'``, ``'NACS'``).
    * ``battery_kwh_gte`` / ``battery_kwh_lt`` / ``battery_kwh_gt`` /
      ``battery_kwh_lte``: numeric thresholds on ``B_kwh``.
    * ``max_charge_kw_gte`` / ``max_charge_kw_lt`` / ...

    Multiple predicates are AND-combined. Unknown keys raise
    ``KeyError`` with a list of supported keys.
    """
    df = self._fleet_df.copy()
    df["max_charge_kw"] = np.minimum(
        df["OBC_kw"], df["EVSE_home_kw"]
    ).astype(float)

    supported: set[str] = {
        "archetype",
        "powertrain",
        "cluster_z",
        # v2.1 — EVSE enrichment equality predicates.
        "evse_brand",
        "evse_connector",
        "battery_kwh_gte",
        "battery_kwh_gt",
        "battery_kwh_lte",
        "battery_kwh_lt",
        "max_charge_kw_gte",
        "max_charge_kw_gt",
        "max_charge_kw_lte",
        "max_charge_kw_lt",
    }
    for key in predicates:
        if key not in supported:
            raise KeyError(
                f"unsupported filter key {key!r}. "
                f"Supported: {sorted(supported)}"
            )

    mask = pd.Series(True, index=df.index)
    if "archetype" in predicates:
        mask &= df["archetype"].astype(str) == str(predicates["archetype"])
    if "powertrain" in predicates:
        mask &= df["powertrain"].astype(str) == str(
            predicates["powertrain"]
        )
    if "cluster_z" in predicates:
        mask &= df["cluster_z"].astype(int) == int(
            predicates["cluster_z"]
        )
    if "evse_brand" in predicates:
        mask &= df["EVSE_brand"].astype(str) == str(predicates["evse_brand"])
    if "evse_connector" in predicates:
        mask &= df["EVSE_connector"].astype(str) == str(
            predicates["evse_connector"]
        )

    for col, src in (
        ("battery_kwh", "B_kwh"),
        ("max_charge_kw", "max_charge_kw"),
    ):
        v = df[src].astype(float)
        if f"{col}_gte" in predicates:
            mask &= v >= float(predicates[f"{col}_gte"])
        if f"{col}_gt" in predicates:
            mask &= v > float(predicates[f"{col}_gt"])
        if f"{col}_lte" in predicates:
            mask &= v <= float(predicates[f"{col}_lte"])
        if f"{col}_lt" in predicates:
            mask &= v < float(predicates[f"{col}_lt"])

    kept = df.loc[mask, "ev_id"].astype(int).tolist()
    return Fleet(
        data_root=self._data_root,
        ev_ids=kept,
        profile_type=self._profile_type,
        region=self._region,
        meta=self._meta,
    )

generate_presence_absence ¶

generate_presence_absence(
    t_start: str | Timestamp,
    t_stop: str | Timestamp,
    freq: str = "15min",
    tz: str | None = None,
) -> DataFrame

Bool wide matrix: index=timestamps, columns=ev_id.

See Profile.generate_presence_absence for the time-window and year-remap rules. tz=None (default) yields a UTC index; any IANA zone yields the same data on a local-wall-clock index.
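
The shape of the result can be sketched without pandas. The example below builds the half-open time grid and fills a wide table that defaults to False wherever no sample exists, which matches the `fill_value=False` reindex in the source listing. `target_index` and `presence_matrix` are hypothetical names, the nested dict stands in for the DataFrame, and the year-remap step is not modelled.

```python
from datetime import datetime, timedelta, timezone


def target_index(t_start, t_stop, step):
    """Timestamps from t_start up to, but not including, t_stop."""
    out = []
    t = t_start
    while t < t_stop:
        out.append(t)
        t += step
    return out


def presence_matrix(samples, ev_ids, index):
    """Pivot (ev_id, ts, plugged) samples into {ts: {ev_id: bool}}."""
    wide = {ts: {ev: False for ev in ev_ids} for ts in index}
    for ev, ts, plugged in samples:
        if ts in wide and ev in wide[ts]:
            wide[ts][ev] = bool(plugged)
    return wide


t0 = datetime(2001, 1, 1, tzinfo=timezone.utc)
t1 = datetime(2001, 1, 2, tzinfo=timezone.utc)
idx = target_index(t0, t1, timedelta(minutes=15))
samples = [(7, t0, True), (7, t1, True), (99, t0, True)]
wide = presence_matrix(samples, ev_ids=[7, 8], index=idx)
print(len(idx), wide[t0])
```

The sample stamped exactly at `t_stop` and the sample for an EV outside the fleet are both ignored, and EV 8 is reported as absent because it has no samples in the window.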

Source code in pev_synth/api.py

def generate_presence_absence(
    self,
    t_start: str | pd.Timestamp,
    t_stop: str | pd.Timestamp,
    freq: str = "15min",
    tz: str | None = None,
) -> pd.DataFrame:
    """Bool wide matrix: index=timestamps, columns=ev_id.

    See :meth:`Profile.generate_presence_absence` for the time-window
    and year-remap rules. ``tz=None`` (default) yields a UTC index;
    any IANA zone yields the same data on a local-wall-clock index.
    """
    freq_n = _normalize_freq(freq)
    ts0, ts1 = _normalize_window(t_start, t_stop, self._region.tz)

    # Choose the right rasterisation.
    if freq_n == "15min":
        parquet = "plug_status_15min.parquet"
    else:
        parquet = "plug_status_hourly.parquet"

    df = self._read_parquet(
        parquet,
        columns=["ev_id", "ts", "plugged"],
        ev_ids=self._ev_ids,
    )
    # Time window (half-open).
    df = df.loc[(df["ts"] >= ts0) & (df["ts"] < ts1)]
    target_idx = self._target_index(ts0, ts1, freq_n)
    if df.empty:
        wide = pd.DataFrame(
            False,
            index=target_idx,
            columns=[int(e) for e in self._ev_ids],
        )
    else:
        wide = df.pivot(
            index="ts", columns="ev_id", values="plugged"
        ).astype(bool)
        wide = wide.reindex(
            index=target_idx, columns=self._ev_ids, fill_value=False
        )
    wide.index.name = "ts"
    wide.columns.name = "ev_id"
    return _apply_query_tz(wide, tz)

plug_status ¶

plug_status(
    t_start: str | Timestamp,
    t_stop: str | Timestamp,
    freq: str = "15min",
    tz: str | None = None,
) -> DataFrame

Alias of generate_presence_absence.

Source code in pev_synth/api.py

def plug_status(
    self,
    t_start: str | pd.Timestamp,
    t_stop: str | pd.Timestamp,
    freq: str = "15min",
    tz: str | None = None,
) -> pd.DataFrame:
    """Alias of :meth:`generate_presence_absence`."""
    return self.generate_presence_absence(t_start, t_stop, freq, tz=tz)

charging_sessions ¶

charging_sessions(
    t_start: str | Timestamp,
    t_stop: str | Timestamp,
    tz: str | None = None,
) -> DataFrame

Sessions whose [t_in, t_out) intersects the window.

Timestamp columns (t_in, t_out) are returned in tz (UTC by default).
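
Intersection of two half-open intervals reduces to a pair of strict comparisons. The private session query is not shown on this page, so the sketch below is the standard test for `[t_in, t_out)` against `[t_start, t_stop)` rather than a copy of the library's code; `intersects` is a hypothetical name.

```python
from datetime import datetime, timezone


def intersects(t_in, t_out, t_start, t_stop):
    """True when [t_in, t_out) and [t_start, t_stop) share any instant."""
    return t_in < t_stop and t_out > t_start


def utc(day, hour):
    return datetime(2001, 1, day, hour, tzinfo=timezone.utc)


window = (utc(2, 0), utc(3, 0))
overnight = intersects(utc(1, 22), utc(2, 6), *window)
ends_at_start = intersects(utc(1, 20), utc(2, 0), *window)
starts_at_stop = intersects(utc(3, 0), utc(3, 4), *window)
print(overnight, ends_at_start, starts_at_stop)
```

Under this rule a session that began before the window is still returned if it runs into it, while sessions that merely touch a window edge are not.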

Source code in pev_synth/api.py

def charging_sessions(
    self,
    t_start: str | pd.Timestamp,
    t_stop: str | pd.Timestamp,
    tz: str | None = None,
) -> pd.DataFrame:
    """Sessions whose ``[t_in, t_out)`` intersects the window.

    Timestamp columns (``t_in``, ``t_out``) are returned in ``tz``
    (UTC by default).
    """
    return self._charging_sessions(self._ev_ids, t_start, t_stop, tz=tz)

aggregate_load ¶

aggregate_load(
    t_start: str | Timestamp,
    t_stop: str | Timestamp,
    freq: str = "1h",
    tz: str | None = None,
) -> Series

Fleet-aggregate charging power (kW).

Charge-asap baseline: each session draws constant max_kw from t_in until enough wall-energy has flowed to deliver energy_kwh at efficiency η, then idles. tz=None returns the UTC index; any IANA zone returns the equivalent local-wall-clock index.
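
The charge-asap rule can be worked through in pure Python with integer seconds. As in the source listing, the charging duration is `round(energy_kwh / max_kw * 3600)` seconds starting at `t_in`. Two points are assumptions: each bucket is given the time-weighted average power (the private scatter-add helper is not shown on this page), and `energy_kwh` is treated as the energy drawn at the wall, since the listing applies no separate efficiency factor. `charge_asap_load` is a hypothetical name.

```python
def charge_asap_load(sessions, ts0_s, step_s, n_buckets):
    """Average kW per bucket for (t_in_s, energy_kwh, max_kw) sessions."""
    load = [0.0] * n_buckets
    ts1_s = ts0_s + step_s * n_buckets
    for t_in, energy_kwh, max_kw in sessions:
        if max_kw <= 0:
            continue  # a zero-power session contributes no load
        t_end = t_in + round(energy_kwh / max_kw * 3600)
        # Clip the charging interval to the query window.
        a = max(t_in, ts0_s)
        b = min(t_end, ts1_s)
        if b <= a:
            continue
        first = (a - ts0_s) // step_s
        last = (b - 1 - ts0_s) // step_s
        for k in range(first, last + 1):
            lo = ts0_s + k * step_s
            overlap = min(b, lo + step_s) - max(a, lo)
            load[k] += max_kw * overlap / step_s
    return load


sessions = [
    (0, 3.5, 7.0),      # 30 min at 7 kW, starting at the window start
    (1800, 7.0, 7.0),   # 60 min at 7 kW, starting half an hour in
]
hourly = charge_asap_load(sessions, ts0_s=0, step_s=3600, n_buckets=2)
print(hourly)
```

With hourly buckets the two sessions each fill half of the first hour at 7 kW, and the second spills half an hour into the next bucket. Summing `load * step_s / 3600` over the buckets recovers the 10.5 kWh total.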

Source code in pev_synth/api.py

def aggregate_load(
    self,
    t_start: str | pd.Timestamp,
    t_stop: str | pd.Timestamp,
    freq: str = "1h",
    tz: str | None = None,
) -> pd.Series:
    """Fleet-aggregate charging power (kW).

    Charge-asap baseline: each session draws constant ``max_kw`` from
    ``t_in`` until enough wall-energy has flowed to deliver
    ``energy_kwh`` at efficiency η, then idles. ``tz=None`` returns the
    UTC index; any IANA zone returns the equivalent local-wall-clock
    index.
    """
    freq_n = _normalize_freq(freq)
    ts0, ts1 = _normalize_window(t_start, t_stop, self._region.tz)
    idx = self._target_index(ts0, ts1, freq_n)
    dt_h = 0.25 if freq_n == "15min" else 1.0

    sess = self._charging_sessions(self._ev_ids, ts0, ts1)
    if sess.empty:
        s = pd.Series(
            0.0, index=idx, dtype=float, name="aggregate_load_kw"
        )
        return _apply_query_tz(s, tz)

    max_kw = sess["max_kw"].astype(float).values
    # Avoid divide-by-zero (some sessions may have max_kw==0 in
    # pathological data).
    safe_kw = np.where(max_kw > 0.0, max_kw, np.inf)
    # RFC-004: compute duration in integer ns via the exact
    # ``round(energy / kw * 3600)`` → seconds, then ×1e9. Float ×
    # 3.6e12 lost sub-µs precision and could flip a 15-min bucket
    # boundary on multi-hour sessions.
    energy_kwh = sess["energy_kwh"].astype(float).values
    dur_s = np.round(energy_kwh / safe_kw * 3600.0).astype("int64")
    dur_ns = dur_s * 1_000_000_000
    # RFC-005: tz-aware ``.astype("int64")`` emits FutureWarning in
    # pandas 2.x and raises in 3.x. Convert to UTC, drop the tz, then
    # cast to int64 ns since epoch.
    t_in_ns = (
        sess["t_in"]
        .dt.tz_convert("UTC")
        .dt.tz_localize(None)
        .astype("datetime64[ns]")
        .astype("int64")
        .values
    )
    t_end_ns = t_in_ns + dur_ns

    bucket_edges_ns = np.array([t.value for t in idx], dtype="int64")
    step_ns = int(dt_h * 3.6e12)
    n_buckets = len(idx)

    ts0_ns = int(bucket_edges_ns[0])
    ts1_ns = int(bucket_edges_ns[-1]) + step_ns
    a = np.maximum(t_in_ns, ts0_ns)
    b = np.minimum(t_end_ns, ts1_ns)
    valid = b > a
    a = a[valid]
    b = b[valid]
    p = max_kw[valid]
    # RFC-005-style vectorisation: scatter-add session contributions
    # into the bucket grid via ``np.add.at`` instead of an O(N_sess *
    # buckets_per_sess) Python loop.
    load_kw = _aggregate_kw_from_sessions(
        a, b, p, ts0_ns, step_ns, n_buckets
    )

    s = pd.Series(
        load_kw, index=idx, dtype=float, name="aggregate_load_kw"
    )
    return _apply_query_tz(s, tz)

save ¶

save(path: str | Path) -> Path

Write a self-contained bundle under path/{profile_type}_fleet/.

Contents

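fleet.parquet: the static rows for this Fleet's EV subset.
plug_status_15min.parquet: subsetted 15-minute plug-status table.
plug_status_hourly.parquet: subsetted hourly plug-status table.
sessions.parquet: subsetted sessions.
mobility.parquet: subsetted mobility.
plug_in_events.parquet: subsetted events.
meta.json: bundle metadata, comprising profile_type, region (name + tz), n, the subset's ev_ids, source_data_root, api_version, storage_timezone, and any inherited meta.

Every parquet file other than fleet.parquet is written only when it is present in the source cache.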

Source code in pev_synth/api.py

def save(self, path: str | Path) -> Path:
    """Write a self-contained bundle under ``path/{profile_type}_fleet/``.

    Contents
    --------
    * ``fleet.parquet`` — the static rows for this Fleet's EV subset.
    * ``plug_status_15min.parquet``, ``plug_status_hourly.parquet`` —
      subsetted plug-status matrices (15-minute and hourly).
    * ``sessions.parquet`` — subsetted sessions.
    * ``mobility.parquet`` — subsetted mobility.
    * ``plug_in_events.parquet`` — subsetted events (when present).
    * ``meta.json`` — bundle metadata: ``profile_type``, ``region``
      (name + tz), the subset's ``ev_ids``, and any inherited meta.
    """
    target = Path(path) / f"{self._profile_type}_fleet"
    target.mkdir(parents=True, exist_ok=True)

    # 1. fleet.parquet: just the subset rows.
    self._fleet_df.to_parquet(target / "fleet.parquet", index=False)

    # 2. plug_status_15min.parquet, sessions.parquet, mobility.parquet,
    #    plug_in_events.parquet, plug_status_hourly.parquet -- subset
    #    by ev_id.
    for fname in (
        "plug_status_15min.parquet",
        "plug_status_hourly.parquet",
        "sessions.parquet",
        "mobility.parquet",
        "plug_in_events.parquet",
    ):
        src = self._data_root / fname
        if not src.exists():
            continue
        df = pd.read_parquet(
            src, filters=[("ev_id", "in", self._ev_ids)]
        )
        df.to_parquet(target / fname, index=False)

    # 3. meta.json
    from . import __version__ as _api_version

    meta_out = {
        "profile_type": self._profile_type,
        "region": self._region.name,
        "region_tz": self._region.tz,
        "n": len(self._ev_ids),
        "ev_ids": self._ev_ids,
        "source_data_root": str(self._data_root),
        "inherited_meta": self._meta,
        "api_version": _api_version,
        "storage_timezone": _STORAGE_TZ,
    }
    with (target / "meta.json").open("w", encoding="utf-8") as f:
        json.dump(meta_out, f, indent=2, default=str)

    return target
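
The `default=str` argument in the `json.dump` call above is what lets metadata containing non-JSON values be written. A minimal sketch with a made-up metadata dict (the top-level keys mirror `meta_out`; the `built_from` key and all values are illustrative only and not taken from a real bundle):

```python
import json
from pathlib import Path

# Illustrative stand-in for meta_out; "built_from" is a hypothetical key.
meta_out = {
    "profile_type": "residential",
    "region": "bay_area",
    "n": 2,
    "ev_ids": [3, 17],
    "inherited_meta": {"built_from": Path("cache") / "residential"},
}

# The stock encoder rejects the Path object nested in inherited_meta.
try:
    json.dumps(meta_out)
    path_is_serialisable = True
except TypeError:
    path_is_serialisable = False

# default=str converts anything the encoder cannot handle via str().
text = json.dumps(meta_out, indent=2, default=str)
round_tripped = json.loads(text)
```

Note that the conversion is one-way: after a round trip the value is a plain string, so code reading `meta.json` should not expect the original type back.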

load `classmethod` ¶

load(path: str | Path) -> Fleet

Inverse of save. Accepts either the inner bundle directory (.../{type}_fleet) or its parent directory; the parent form is offered as a convenience.

Source code in pev_synth/api.py

@classmethod
def load(cls, path: str | Path) -> Fleet:
    """Inverse of :meth:`save`. Accepts either the inner bundle directory
    (``.../{type}_fleet``) or its parent (the latter is a convenience).
    """
    p = Path(path)
    # If user passed the parent, look for an inner *_fleet directory.
    if not (p / "fleet.parquet").exists():
        candidates = list(p.glob("*_fleet"))
        if len(candidates) == 1:
            p = candidates[0]
        else:
            raise FileNotFoundError(
                f"No fleet.parquet at {p} and ambiguous inner bundles: "
                f"{candidates}"
            )
    meta_file = p / "meta.json"
    if not meta_file.exists():
        raise FileNotFoundError(
            f"meta.json is required to load a Fleet bundle; "
            f"expected at {meta_file}"
        )
    with meta_file.open(encoding="utf-8") as f:
        meta = json.load(f)
    profile_type = meta.get("profile_type")
    if not isinstance(profile_type, str) or not profile_type:
        raise ValueError(
            f"meta.json at {meta_file} is missing required key "
            f"'profile_type' (must be a non-empty string)"
        )
    region_name = meta.get("region")
    if not isinstance(region_name, str) or not region_name:
        raise ValueError(
            f"meta.json at {meta_file} is missing required key "
            f"'region' (must be a non-empty string)"
        )
    region = Region.from_name(region_name)
    fleet_df = pd.read_parquet(p / "fleet.parquet")
    ev_ids = fleet_df["ev_id"].astype(int).tolist()
    return cls(
        data_root=p,
        ev_ids=ev_ids,
        profile_type=profile_type,
        region=region,
        meta=meta,
    )
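
The directory resolution at the top of `load` can be exercised on its own. The sketch below mirrors that logic with a hypothetical helper, `resolve_bundle_dir`, and an empty placeholder file in place of a real `fleet.parquet`; it only demonstrates which directory is chosen, not the loading itself.

```python
import tempfile
from pathlib import Path


def resolve_bundle_dir(path):
    """Hypothetical helper mirroring the directory resolution in ``load``."""
    p = Path(path)
    if (p / "fleet.parquet").exists():
        return p  # caller passed the inner bundle directory
    candidates = sorted(p.glob("*_fleet"))
    if len(candidates) != 1:
        raise FileNotFoundError(
            f"expected exactly one *_fleet directory under {p}, "
            f"found {len(candidates)}"
        )
    return candidates[0]


with tempfile.TemporaryDirectory() as tmp:
    inner = Path(tmp) / "residential_fleet"
    inner.mkdir()
    (inner / "fleet.parquet").touch()  # empty placeholder, not a real parquet file

    from_parent = resolve_bundle_dir(tmp)
    from_inner = resolve_bundle_dir(inner)
    same_dir = from_parent == from_inner
    resolved_name = from_parent.name
```

Passing a parent that holds two bundles (for example both a residential and a workplace fleet) is ambiguous and raises, so in that case pass the inner directory explicitly.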

Profile¶

A Profile is a single synthetic EV. Its static attributes (battery, charge power, archetype, EVSE details) come from one row of the fleet table; its time-window methods (plug_status, charging_sessions, soc_trajectory, trips) read the dynamic artifacts lazily.

Profile ¶

Profile(fleet: Fleet, ev_id: int)

One synthetic EV. All data is loaded lazily from the cache.

The public attributes and properties are listed below. The methods read fleet.parquet (static attributes), plug_status_15min.parquet (presence/absence), sessions.parquet (charging sessions) and mobility.parquet (trips). Filters are pushed down to pyarrow, so the full 35M-row table is never loaded.

All time-window methods accept a tz keyword: tz=None returns results in the UTC storage timezone; pass region.tz (or any IANA zone) to get a local wall-clock index instead.

Source code in pev_synth/api.py

def __init__(self, fleet: Fleet, ev_id: int) -> None:
    self._fleet = fleet
    self._ev_id = int(ev_id)
    self._row_cache: _ProfileRow | None = None

evse_brand `property` ¶

evse_brand: str

EVSE brand string (e.g. 'Tesla', 'ChargePoint', ...).

See docs/literature_review_evse_specs_us.md for the brand universe and prior sources.

evse_connector `property` ¶

evse_connector: str

EVSE connector type (e.g. 'J1772', 'NACS', 'Tesla_NACS_native', 'NEMA_14_50', ...).

summary ¶

summary() -> Series

Per-EV summary Series (one row of the fleet table).

Source code in pev_synth/api.py

def summary(self) -> pd.Series:
    """Per-EV summary Series (one row of the fleet table)."""
    r = self._row
    return pd.Series(
        {
            "ev_id": r.ev_id,
            "profile_type": self.profile_type,
            "region": self._fleet.region.name,
            "archetype": r.archetype,
            "powertrain": r.powertrain,
            "make": r.make,
            "model": r.model,
            "model_year": r.model_year,
            "battery_kwh": r.battery_kwh,
            "max_charge_kw": r.max_charge_kw,
            "obc_kw": r.obc_kw,
            "evse_home_kw": r.evse_home_kw,
            "panel_cap_kw": r.panel_cap_kw,
            # v2.1 EVSE enrichment.
            "evse_brand": r.evse_brand,
            "evse_connector": r.evse_connector,
            "eta_charge": r.eta_charge,
            "soc_min_kwh": r.soc_min_kwh,
            "has_heat_pump": r.has_heat_pump,
            "wh_per_mi_nominal": r.wh_per_mi_nominal,
            "garage_type": r.garage_type,
            "income_tier": r.income_tier,
            "cluster_z": r.cluster_z,
            "donor_household_id": r.donor_household_id,
            "donor_vehicle_id": r.donor_vehicle_id,
        },
        name=f"ev_{r.ev_id:0{self._fleet._ev_id_pad}d}",
    )

generate_presence_absence ¶

generate_presence_absence(
    t_start: str | Timestamp,
    t_stop: str | Timestamp,
    freq: str = "15min",
    tz: str | None = None,
) -> Series

Bool plug-in series over [t_start, t_stop) at freq.

Returns a tz-aware pd.Series[bool] indexed by DatetimeIndex. True means the EV is plugged in at that timestep.

Parameters:

Name	Type	Description	Default
`t_start`	`str \| Timestamp`	Window start (inclusive). Tz-naive inputs are localised to `self.region.tz`. Year is remapped to the cache year (2001).	required
`t_stop`	`str \| Timestamp`	Window end (exclusive). Tz-naive inputs are localised to `self.region.tz`. Year is remapped to the cache year (2001).	required
`freq`	`str`	`'15min'` (default) or `'1h'`.	`'15min'`
`tz`	`str \| None`	`None` (default) → UTC storage-tz index. Any IANA zone (e.g. `self.region.tz`) → that-zone index. The cell values are identical; only the index labels change.	`None`

Source code in pev_synth/api.py

def generate_presence_absence(
    self,
    t_start: str | pd.Timestamp,
    t_stop: str | pd.Timestamp,
    freq: str = "15min",
    tz: str | None = None,
) -> pd.Series:
    """Bool plug-in series over ``[t_start, t_stop)`` at ``freq``.

    Returns a tz-aware ``pd.Series[bool]`` indexed by ``DatetimeIndex``.
    ``True`` means the EV is plugged-in at that timestep.

    Parameters
    ----------
    t_start, t_stop:
        Window endpoints. Tz-naive inputs are localised to
        ``self.region.tz``. Year is remapped to the cache year (2001).
    freq:
        ``'15min'`` (default) or ``'1h'``.
    tz:
        ``None`` (default) → UTC storage-tz index.
        Any IANA zone (e.g. ``self.region.tz``) → that-zone index.
        The cell values are identical; only the index labels change.
    """
    return self._fleet._presence_absence_one(
        self._ev_id, t_start, t_stop, freq, tz=tz
    )
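
The `tz` semantics described above (same cell values, different index labels) can be illustrated with the standard library alone. This is a sketch of the idea, not the package's implementation: the fixed UTC-8 offset is an assumption standing in for a region zone such as America/Los_Angeles in January, and `localise` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone

# Fixed UTC-8 offset used as a stand-in for a region tz in winter, so the
# sketch does not depend on a tz database being installed.
REGION_TZ = timezone(timedelta(hours=-8), "UTC-8")


def localise(ts, region_tz=REGION_TZ):
    """Attach the region tz to a naive timestamp; leave aware ones unchanged."""
    return ts.replace(tzinfo=region_tz) if ts.tzinfo is None else ts


# A tz-naive midnight is read as region-local midnight.
t_start = localise(datetime(2001, 1, 1, 0, 0))

# Four 15-minute steps labelled in UTC (the tz=None behaviour) ...
utc_index = [
    (t_start + timedelta(minutes=15 * i)).astimezone(timezone.utc)
    for i in range(4)
]
# ... and the same instants relabelled in the region zone.
local_index = [t.astimezone(REGION_TZ) for t in utc_index]

print(utc_index[0].isoformat())    # 2001-01-01T08:00:00+00:00
print(local_index[0].isoformat())  # 2001-01-01T00:00:00-08:00
```

Both lists describe identical instants, which is why switching `tz` never changes the values in the returned series.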

plug_status ¶

plug_status(
    t_start: str | Timestamp,
    t_stop: str | Timestamp,
    freq: str = "15min",
    tz: str | None = None,
) -> Series

Alias of generate_presence_absence.

Source code in pev_synth/api.py

def plug_status(
    self,
    t_start: str | pd.Timestamp,
    t_stop: str | pd.Timestamp,
    freq: str = "15min",
    tz: str | None = None,
) -> pd.Series:
    """Alias of :meth:`generate_presence_absence`."""
    return self.generate_presence_absence(t_start, t_stop, freq, tz=tz)

charging_sessions ¶

charging_sessions(
    t_start: str | Timestamp,
    t_stop: str | Timestamp,
    tz: str | None = None,
) -> DataFrame

Sessions whose [t_in, t_out) intersects the window.

Source code in pev_synth/api.py

def charging_sessions(
    self,
    t_start: str | pd.Timestamp,
    t_stop: str | pd.Timestamp,
    tz: str | None = None,
) -> pd.DataFrame:
    """Sessions whose ``[t_in, t_out)`` intersects the window."""
    return self._fleet._charging_sessions(
        [self._ev_id], t_start, t_stop, tz=tz
    )
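
For reference, the usual overlap test for two half-open intervals is sketched below. Whether `Fleet._charging_sessions` treats boundary-touching sessions exactly this way is not shown on this page, so take the edge-case behaviour as an assumption; `intersects` is a hypothetical helper.

```python
from datetime import datetime


def intersects(t_in, t_out, t_start, t_stop):
    """True when [t_in, t_out) and [t_start, t_stop) share at least one instant."""
    return t_in < t_stop and t_out > t_start


window = (datetime(2001, 1, 2), datetime(2001, 1, 3))

# Starts the evening before and runs into the window: included.
overnight = intersects(datetime(2001, 1, 1, 22), datetime(2001, 1, 2, 6), *window)

# Ends exactly at the window start: excluded, because t_out is exclusive.
touching = intersects(datetime(2001, 1, 1, 18), datetime(2001, 1, 2, 0), *window)

# Starts exactly at the window stop: excluded, because t_stop is exclusive.
after = intersects(datetime(2001, 1, 3, 0), datetime(2001, 1, 3, 4), *window)
```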

soc_trajectory ¶

soc_trajectory(
    t_start: str | Timestamp,
    t_stop: str | Timestamp,
    freq: str = "15min",
    tz: str | None = None,
) -> Series

Continuous SoC over [t_start, t_stop) in kWh.

Inside a session, SoC ramps linearly from soc_in_kwh to soc_target_out_kwh over the session window (constant-power proxy consistent with the v1 ledger). Between sessions, SoC declines linearly from the previous soc_target_out_kwh to the next soc_in_kwh (driving discharge).

Source code in pev_synth/api.py

def soc_trajectory(
    self,
    t_start: str | pd.Timestamp,
    t_stop: str | pd.Timestamp,
    freq: str = "15min",
    tz: str | None = None,
) -> pd.Series:
    """Continuous SoC over ``[t_start, t_stop)`` in kWh.

    Inside a session, SoC ramps linearly from ``soc_in_kwh`` to
    ``soc_target_out_kwh`` over the session window (constant-power
    proxy consistent with the v1 ledger). Between sessions, SoC declines
    linearly from the previous ``soc_target_out_kwh`` to the next
    ``soc_in_kwh`` (driving discharge).
    """
    return self._fleet._soc_trajectory_one(
        self._ev_id, t_start, t_stop, freq, tz=tz
    )
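
The piecewise-linear rule in the docstring can be sketched as interpolation between session anchors. This is an illustration under stated assumptions, not the body of `_soc_trajectory_one`: `soc_at` is a hypothetical helper, times are plain floats in hours, and holding SoC flat before the first and after the last anchor is an assumption.

```python
def soc_at(t, sessions):
    """SoC in kWh at time t.

    sessions: (t_in, t_out, soc_in_kwh, soc_target_out_kwh) tuples, sorted by
    t_in and non-overlapping. Hypothetical helper for illustration.
    """
    # Flatten sessions into (time, soc) anchors: start, end, start, end, ...
    anchors = []
    for t_in, t_out, soc_in, soc_out in sessions:
        anchors.append((t_in, soc_in))
        anchors.append((t_out, soc_out))
    if t <= anchors[0][0]:
        return anchors[0][1]
    if t >= anchors[-1][0]:
        return anchors[-1][1]
    # Linear interpolation on the segment that contains t. Segments alternate
    # between charging ramps and between-session discharge.
    for (t0, s0), (t1, s1) in zip(anchors, anchors[1:]):
        if t0 <= t <= t1:
            if t1 == t0:
                return s1
            return s0 + (s1 - s0) * (t - t0) / (t1 - t0)
    raise ValueError("sessions must be sorted by t_in")


sessions = [(0.0, 4.0, 20.0, 40.0), (12.0, 14.0, 30.0, 44.0)]
mid_charge = soc_at(2.0, sessions)    # 30.0, halfway up the first ramp
mid_drive = soc_at(8.0, sessions)     # 35.0, halfway down the discharge
second_ramp = soc_at(13.0, sessions)  # 37.0
```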

trips ¶

trips(
    t_start: str | Timestamp,
    t_stop: str | Timestamp,
    tz: str | None = None,
) -> DataFrame

Trips from mobility.parquet overlapping the window.

Timestamp columns (start_ts, end_ts) are returned in the requested tz (UTC by default).

Source code in pev_synth/api.py

def trips(
    self,
    t_start: str | pd.Timestamp,
    t_stop: str | pd.Timestamp,
    tz: str | None = None,
) -> pd.DataFrame:
    """Trips from ``mobility.parquet`` overlapping the window.

    Timestamp columns (``start_ts``, ``end_ts``) are returned in the
    requested ``tz`` (UTC by default).
    """
    return self._fleet._trips(
        [self._ev_id], t_start, t_stop, tz=tz
    )

Regions¶

ev-flow models eight US regions. Each Region carries the climate, market and timezone context that drives the pipeline; the region's tz is the default local zone applied to timezone-naive t_start / t_stop inputs on every time-window method.

Region `dataclass` ¶

Region(
    name: str,
    display_name: str,
    states: tuple[str, ...],
    cbsas: tuple[int, ...],
    cdivmsar_filter: tuple[int, ...],
    tz: str,
    iso_market: ISOMarket,
    balancing_authority: str | None = None,
    sales_mix_source: str = "",
    sales_mix_overlay: tuple[tuple[str, float], ...] = (),
    temperature_stations: tuple[str, ...] = (),
    income_tier_scheme: str = "acs_national",
    winter_f_T_multiplier: float = 1.0,
    winter_t_c: float | None = None,
    workplace_donor_borrow_states: tuple[str, ...] = (),
    heat_pump_share_override: float | None = None,
    nhts_vintage_mix: tuple[tuple[str, float], ...] = (
        ("2017", 1.0),
    ),
    acs_calibration_enabled: bool = False,
    acs_pums_vintage: str = "2020-2024",
    notes: str = "",
)

One regional slice of the v2.0 EV synthesis pipeline.

Frozen and hashable so a Region can be used as a cache key in downstream modules (donor matcher, travel-week builder, SoC ledger). All fields are populated explicitly per plan §2.2; None is only used for iso_market on the national reference region.

Notes

Citation pointers:

CBSA codes verified against OMB 2023 list1_2023.xlsx.
Winter f_T multipliers are a linear approximation calibrated against Yuksel & Michalek (2015) winter uplift, evaluated at the Dec-Mar mean ambient temperature from NOAA ISD 1991-2020 normals (formula: 1 + 0.018 * (20 - T_C)).
DFW pickup overlay per Cox Automotive Q4-2024 / Q1-2025 Texas EV registration data.
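
As a worked example of the winter multiplier formula quoted above, `1 + 0.018 * (20 - T_C)`, the sketch below evaluates it at a few temperatures. The function name `winter_ft_multiplier` is hypothetical and the temperatures are arbitrary test points, not the values used for any registered region.

```python
def winter_ft_multiplier(t_c):
    """Linear winter uplift from the formula above; t_c is in degrees Celsius."""
    return 1.0 + 0.018 * (20.0 - t_c)


for t_c in (20.0, 10.0, 0.0, -5.0):
    print(f"{t_c:6.1f} C -> {winter_ft_multiplier(t_c):.3f}")
# 20 C gives 1.000 (no uplift); each degree below 20 C adds 0.018.
```

At 0 °C the formula gives 1.36 and at -5 °C it gives 1.45, so a region with a colder Dec-Mar mean carries a proportionally larger multiplier.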

nhts_vintage_mix_map `property` ¶

nhts_vintage_mix_map: Mapping[str, float]

Read-only Mapping view of nhts_vintage_mix.

Convenience accessor mirroring sales_mix_overlay_map. The underlying storage is a canonical tuple-of-pairs so the frozen dataclass stays hashable; this property surfaces it as an immutable MappingProxyType for ergonomic consumer access.

sales_mix_overlay_map `property` ¶

sales_mix_overlay_map: Mapping[str, float]

Read-only Mapping view of sales_mix_overlay.

Backed by types.MappingProxyType, so in-place mutation attempts raise TypeError.
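
The tuple-of-pairs storage with a read-only mapping view is a general pattern for keeping a frozen dataclass hashable. A minimal sketch follows; `RegionSketch` and its values are hypothetical and far smaller than the real `Region`.

```python
from dataclasses import dataclass
from types import MappingProxyType
from typing import Mapping


@dataclass(frozen=True)
class RegionSketch:
    """Hypothetical cut-down stand-in for Region."""

    name: str
    # Stored as a tuple of pairs: a dict field would make the instance unhashable.
    sales_mix_overlay: tuple[tuple[str, float], ...] = ()

    @property
    def sales_mix_overlay_map(self) -> Mapping[str, float]:
        # Fresh read-only view on every access.
        return MappingProxyType(dict(self.sales_mix_overlay))


r = RegionSketch("demo", (("segment_a", 0.25), ("segment_b", 0.75)))

cache = {r: "ok"}  # works because the instance is hashable

try:
    r.sales_mix_overlay_map["segment_a"] = 0.5
    mutation_blocked = False
except TypeError:
    mutation_blocked = True
```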

from_name `classmethod` ¶

from_name(name: str) -> Region

Look up a region by its short name.

Parameters:

Name	Type	Description	Default
`name`	`str`	Region short name, e.g. `"bay_area"`.	required

Returns:

Type	Description
`Region`	The matching frozen `Region` instance.

Raises:

Type	Description
`ValueError`	If `name` is not a registered region. The error message lists the valid names so callers can self-correct.

Source code in pev_synth/regions.py

@classmethod
def from_name(cls, name: str) -> Region:
    """Look up a region by its short name.

    Parameters
    ----------
    name:
        Region short name, e.g. ``"bay_area"``.

    Returns
    -------
    Region
        The matching frozen ``Region`` instance.

    Raises
    ------
    ValueError
        If ``name`` is not a registered region.  The error message lists
        the valid names so callers can self-correct.
    """
    try:
        return REGIONS[name]
    except KeyError as exc:
        valid = sorted(REGIONS.keys())
        raise ValueError(
            f"Unknown region {name!r}; valid names are {valid}"
        ) from exc

list_regions ¶

list_regions() -> list[str]

Return all registered region short-names, sorted alphabetically.

Source code in pev_synth/regions.py

def list_regions() -> list[str]:
    """Return all registered region short-names, sorted alphabetically."""
    return sorted(REGIONS.keys())

list_profile_types ¶

list_profile_types() -> list[str]

Return the supported profile types in v2.0.

fleet_depot was de-scoped in v2.0 and is intentionally absent; only residential and workplace are supported.

Source code in pev_synth/regions.py

def list_profile_types() -> list[str]:
    """Return the supported profile types in v2.0.

    ``fleet_depot`` was de-scoped in v2.0 and is intentionally absent;
    only ``residential`` and ``workplace`` are supported.
    """
    return list(PROFILE_TYPES)

The region registry¶

pev_synth.REGIONS is the canonical mapping of region short-name to Region instance. The eight registered regions are bay_area, boston, chicago, dallas_fort_worth, la_basin, new_york_metro, seattle and us_national. Look one up by name with Region.from_name or by indexing REGIONS directly:

import pev_synth as ps

ps.REGIONS['bay_area'].tz          # 'America/Los_Angeles'
ps.Region.from_name('seattle').display_name   # 'Seattle + Portland'

Plotting¶

Optional matplotlib helpers for visualising fleet output, in the pev_synth.plotting submodule. They are kept separate from the core package so plain import pev_synth has no matplotlib dependency — install the extra with pip install ev-flow[plotting], then from pev_synth import plotting.

plotting ¶

Optional matplotlib plotting helpers for pev_synth.

This module is not imported by import pev_synth and matplotlib is not a core dependency. Import the helpers explicitly:

>>> from pev_synth import plotting
>>> ax = plotting.plot_aggregate_load(fleet.plug_status(t0, t1))

and install the optional extra:

pip install ev-flow[plotting]

Each helper operates directly on the DataFrames returned by the public pev_synth.Fleet / pev_synth.Profile API, so the helpers are unit-testable without a real on-disk cache:

plot_aggregate_load consumes the wide plug-status matrix returned by Fleet.plug_status / Fleet.generate_presence_absence (DatetimeIndex × ev_id columns of bool).
plot_plugin_time_distribution and plot_session_energy_distribution consume the long sessions table returned by Fleet.charging_sessions (columns include ev_id, t_in, t_out, energy_kwh, max_kw, ...).
plot_soc_traces reconstructs per-EV state-of-charge step traces from the sessions table's t_in / t_out / soc_in_kwh / soc_target_out_kwh anchor columns.

Conventions

matplotlib (and pyplot) are imported lazily inside each function via _import_pyplot, which raises a friendly ImportError when the optional extra is missing.
Every helper accepts an optional pre-existing ax and returns the matplotlib.axes.Axes it drew on (creating a fresh figure/axes when ax is None).
No helper ever calls plt.show() — rendering / saving is left to the caller so the helpers stay headless-friendly (e.g. under Agg).

plot_aggregate_load ¶

plot_aggregate_load(
    plug_status_df: DataFrame,
    ax: Axes | None = None,
    *,
    freq: str | None = None,
    label: str = "plugged-in EVs",
    color: str = "#0072B2",
) -> Axes

Plot the fleet-aggregate plugged-in count over time.

Parameters:

Name	Type	Description	Default
`plug_status_df`	`DataFrame`	Wide boolean plug-status matrix as returned by `pev_synth.Fleet.plug_status` / `pev_synth.Fleet.generate_presence_absence`: a `pandas.DataFrame` with a `DatetimeIndex` and one column per `ev_id` whose cells are `True` when that EV is plugged in.	required
`ax`	`Axes \| None`	Pre-existing Axes to draw on. A new figure/axes is created when `None`.	`None`
`freq`	`str \| None`	Optional pandas offset alias (e.g. `"1h"`). When given the per-step count is resampled to `freq` using the mean, smoothing a 15-min matrix onto a coarser grid. When `None` (default) the matrix is plotted at its native resolution.	`None`
`label`	`str`	Legend label for the drawn line.	`'plugged-in EVs'`
`color`	`str`	Line color (defaults to a colorblind-safe blue from the Okabe-Ito palette).	`'#0072B2'`

Returns:

Type	Description
`Axes`	The Axes the curve was drawn on.

Source code in pev_synth/plotting.py

def plot_aggregate_load(
    plug_status_df: pd.DataFrame,
    ax: Axes | None = None,
    *,
    freq: str | None = None,
    label: str = "plugged-in EVs",
    color: str = "#0072B2",
) -> Axes:
    """Plot the fleet-aggregate plugged-in count over time.

    Parameters
    ----------
    plug_status_df:
        Wide boolean plug-status matrix as returned by
        :meth:`pev_synth.Fleet.plug_status` /
        :meth:`pev_synth.Fleet.generate_presence_absence`: a
        :class:`~pandas.DataFrame` with a ``DatetimeIndex`` and one column per
        ``ev_id`` whose cells are ``True`` when that EV is plugged in.
    ax:
        Pre-existing Axes to draw on. A new figure/axes is created when
        ``None``.
    freq:
        Optional pandas offset alias (e.g. ``"1h"``). When given the per-step
        count is resampled to ``freq`` using the mean, smoothing a 15-min
        matrix onto a coarser grid. When ``None`` (default) the matrix is
        plotted at its native resolution.
    label:
        Legend label for the drawn line.
    color:
        Line color (defaults to a colorblind-safe blue from the Okabe-Ito
        palette).

    Returns
    -------
    matplotlib.axes.Axes
        The Axes the curve was drawn on.
    """
    ax = _resolve_ax(ax)
    # Sum across the per-EV columns -> plugged-in count per timestep.
    if plug_status_df.shape[1] == 0:
        count = pd.Series(dtype=float, index=plug_status_df.index)
    else:
        count = plug_status_df.astype(float).sum(axis=1)
    if freq is not None and not count.empty:
        count = count.resample(freq).mean()
    ax.plot(count.index, count.to_numpy(), color=color, label=label)
    ax.set_xlabel("time")
    ax.set_ylabel("plugged-in EVs (count)")
    ax.set_title("Fleet aggregate plug-in load")
    ax.margins(x=0)
    ax.legend(loc="best")
    return ax
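
What the helper plots can be reproduced without pandas or matplotlib. The sketch below mirrors the two numeric steps (`sum(axis=1)`, then `resample(freq).mean()`) on a toy matrix. It assumes the rows are consecutive 15-minute steps starting on the hour; pandas instead bins by clock boundaries.

```python
# Rows are consecutive 15-minute timesteps, columns are three EVs (toy data).
plug_status = [
    [True, False, True],
    [True, True, True],
    [False, True, True],
    [False, False, True],
    [False, False, False],
    [False, True, False],
    [True, False, False],
    [True, True, False],
]

# Plugged-in count per timestep (the row-wise sum).
count = [sum(row) for row in plug_status]

# Hourly mean of the 15-minute counts (what freq="1h" would plot).
steps_per_hour = 4
hourly_mean = [
    sum(count[i:i + steps_per_hour]) / len(count[i:i + steps_per_hour])
    for i in range(0, len(count), steps_per_hour)
]

print(count)        # [2, 3, 2, 1, 0, 1, 1, 2]
print(hourly_mean)  # [2.0, 1.0]
```

Because the resample uses the mean rather than the sum, the y-axis stays in units of "EVs plugged in" at any `freq`.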

plot_plugin_time_distribution ¶

plot_plugin_time_distribution(
    sessions_df: DataFrame,
    ax: Axes | None = None,
    *,
    bins: int = 24,
    color: str = "#009E73",
    t_in_col: str = "t_in",
) -> Axes

Plot a histogram of charging-session plug-in hour-of-day.

Parameters:

Name	Type	Description	Default
`sessions_df`	`DataFrame`	Long charging-session table as returned by `pev_synth.Fleet.charging_sessions`. The `t_in_col` column must be datetime-like (tz-aware is fine; the hour is taken from whatever wall-clock the column carries, so convert to a local tz upstream if you want local hour-of-day).	required
`ax`	`Axes \| None`	Pre-existing Axes to draw on. A new figure/axes is created when `None`.	`None`
`bins`	`int`	Number of histogram bins spanning `[0, 24)` hours (default 24, i.e. one bin per hour).	`24`
`color`	`str`	Bar color (defaults to a colorblind-safe green from the Okabe-Ito palette).	`'#009E73'`
`t_in_col`	`str`	Name of the plug-in timestamp column (default `"t_in"`).	`'t_in'`

Returns:

Type	Description
`Axes`	The Axes the histogram was drawn on.

Source code in pev_synth/plotting.py

def plot_plugin_time_distribution(
    sessions_df: pd.DataFrame,
    ax: Axes | None = None,
    *,
    bins: int = 24,
    color: str = "#009E73",
    t_in_col: str = "t_in",
) -> Axes:
    """Plot a histogram of charging-session plug-in hour-of-day.

    Parameters
    ----------
    sessions_df:
        Long charging-session table as returned by
        :meth:`pev_synth.Fleet.charging_sessions`. The ``t_in_col`` column must
        be datetime-like (tz-aware is fine; the hour is taken from whatever
        wall-clock the column carries, so convert to a local tz upstream if you
        want local hour-of-day).
    ax:
        Pre-existing Axes to draw on. A new figure/axes is created when
        ``None``.
    bins:
        Number of histogram bins spanning ``[0, 24)`` hours (default 24, i.e.
        one bin per hour).
    color:
        Bar color (defaults to a colorblind-safe green from the Okabe-Ito
        palette).
    t_in_col:
        Name of the plug-in timestamp column (default ``"t_in"``).

    Returns
    -------
    matplotlib.axes.Axes
        The Axes the histogram was drawn on.
    """
    ax = _resolve_ax(ax)
    ts = pd.to_datetime(sessions_df[t_in_col])
    hour = ts.dt.hour.to_numpy(dtype=float) + ts.dt.minute.to_numpy(dtype=float) / 60.0
    ax.hist(hour, bins=bins, range=(0.0, 24.0), color=color, edgecolor="black")
    ax.set_xlabel("plug-in hour of day")
    ax.set_ylabel("session count")
    ax.set_title("Plug-in start-time distribution")
    ax.set_xlim(0.0, 24.0)
    ax.set_xticks(range(0, 25, 6))
    return ax
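
The note about converting to a local tz upstream matters because the histogram is built from the wall-clock hour the column carries. The stdlib sketch below shows the same three plug-in instants binned as UTC and as local time. The fixed UTC-8 offset is an assumption standing in for a real region zone, and the timestamps are made up.

```python
from datetime import datetime, timedelta, timezone

LOCAL = timezone(timedelta(hours=-8))  # fixed offset standing in for a local zone

# Made-up plug-in instants stored in UTC.
t_in_utc = [
    datetime(2001, 1, 2, 2, 30, tzinfo=timezone.utc),
    datetime(2001, 1, 2, 3, 45, tzinfo=timezone.utc),
    datetime(2001, 1, 2, 16, 15, tzinfo=timezone.utc),
]


def fractional_hour(ts):
    """Hour of day as a float, matching hour + minute / 60 in the helper."""
    return ts.hour + ts.minute / 60.0


utc_hours = [fractional_hour(t) for t in t_in_utc]
local_hours = [fractional_hour(t.astimezone(LOCAL)) for t in t_in_utc]

# Bin the local hours into `bins` equal-width bins over [0, 24).
bins = 24
hist = [0] * bins
for h in local_hours:
    hist[min(int(h * bins / 24.0), bins - 1)] += 1

print(utc_hours)    # [2.5, 3.75, 16.25]
print(local_hours)  # [18.5, 19.75, 8.25]
```

Plotted as UTC these sessions look like early-morning plug-ins; in local time they are two evening arrivals and one morning arrival.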

plot_session_energy_distribution ¶

plot_session_energy_distribution(
    sessions_df: DataFrame,
    ax: Axes | None = None,
    *,
    bins: int = 30,
    color: str = "#D55E00",
    energy_col: str = "energy_kwh",
) -> Axes

Plot a histogram of per-session delivered energy.

Parameters:

Name	Type	Description	Default
`sessions_df`	`DataFrame`	Long charging-session table as returned by `pev_synth.Fleet.charging_sessions`. The `energy_col` column is the grid-side session energy in kWh.	required
`ax`	`Axes \| None`	Pre-existing Axes to draw on. A new figure/axes is created when `None`.	`None`
`bins`	`int`	Number of histogram bins (default 30).	`30`
`color`	`str`	Bar color (defaults to a colorblind-safe orange from the Okabe-Ito palette).	`'#D55E00'`
`energy_col`	`str`	Name of the session-energy column (default `"energy_kwh"`).	`'energy_kwh'`

Returns:

Type	Description
`Axes`	The Axes the histogram was drawn on.

Source code in pev_synth/plotting.py

def plot_session_energy_distribution(
    sessions_df: pd.DataFrame,
    ax: Axes | None = None,
    *,
    bins: int = 30,
    color: str = "#D55E00",
    energy_col: str = "energy_kwh",
) -> Axes:
    """Plot a histogram of per-session delivered energy.

    Parameters
    ----------
    sessions_df:
        Long charging-session table as returned by
        :meth:`pev_synth.Fleet.charging_sessions`. The ``energy_col`` column is
        the grid-side session energy in kWh.
    ax:
        Pre-existing Axes to draw on. A new figure/axes is created when
        ``None``.
    bins:
        Number of histogram bins (default 30).
    color:
        Bar color (defaults to a colorblind-safe orange from the Okabe-Ito
        palette).
    energy_col:
        Name of the session-energy column (default ``"energy_kwh"``).

    Returns
    -------
    matplotlib.axes.Axes
        The Axes the histogram was drawn on.
    """
    ax = _resolve_ax(ax)
    energy = pd.to_numeric(sessions_df[energy_col], errors="coerce").to_numpy(
        dtype=float
    )
    energy = energy[np.isfinite(energy)]
    ax.hist(energy, bins=bins, color=color, edgecolor="black")
    ax.set_xlabel("session energy (kWh)")
    ax.set_ylabel("session count")
    ax.set_title("Session-energy distribution")
    return ax

plot_soc_traces ¶

plot_soc_traces(
    sessions_df: DataFrame,
    ev_ids: list[int] | None = None,
    ax: Axes | None = None,
    *,
    max_traces: int = 5,
    ev_id_col: str = "ev_id",
    t_in_col: str = "t_in",
    t_out_col: str = "t_out",
    soc_in_col: str = "soc_in_kwh",
    soc_out_col: str = "soc_target_out_kwh",
) -> Axes

Plot per-EV state-of-charge traces reconstructed from sessions.

The pev_synth.Profile API exposes a continuous SoC trajectory only as a pandas.Series (Profile.soc_trajectory), not as a DataFrame. To stay cache-free and DataFrame-driven, this helper instead reconstructs a step trace per EV directly from the sessions table's SoC anchor columns: within each session SoC ramps from soc_in_col to soc_out_col over [t_in, t_out). This mirrors the in-session ramp of Profile.soc_trajectory (the between-session driving discharge is not drawn, as it is not encoded in the sessions table alone).

Parameters:

Name	Type	Description	Default
`sessions_df`	`DataFrame`	Long charging-session table as returned by `pev_synth.Fleet.charging_sessions`.	required
`ev_ids`	`list[int] \| None`	Specific `ev_id` values to plot. When `None` (default) the first `max_traces` distinct EVs present in `sessions_df` are used.	`None`
`ax`	`Axes \| None`	Pre-existing Axes to draw on. A new figure/axes is created when `None`.	`None`
`max_traces`	`int`	Cap on the number of EV traces drawn (default 5) when `ev_ids` is not given.	`5`
`ev_id_col`	`str`	Column-name override for the EV identifier column of the sessions schema.	`'ev_id'`
`t_in_col`	`str`	Column-name override for the plug-in timestamp column of the sessions schema.	`'t_in'`
`t_out_col`	`str`	Column-name override for the plug-out timestamp column of the sessions schema.	`'t_out'`
`soc_in_col`	`str`	Column-name override for the session-start SoC column of the sessions schema.	`'soc_in_kwh'`
`soc_out_col`	`str`	Column-name override for the session-end SoC column of the sessions schema.	`'soc_target_out_kwh'`

Returns:

Type	Description
`Axes`	The Axes the traces were drawn on.

Source code in pev_synth/plotting.py

def plot_soc_traces(
    sessions_df: pd.DataFrame,
    ev_ids: list[int] | None = None,
    ax: Axes | None = None,
    *,
    max_traces: int = 5,
    ev_id_col: str = "ev_id",
    t_in_col: str = "t_in",
    t_out_col: str = "t_out",
    soc_in_col: str = "soc_in_kwh",
    soc_out_col: str = "soc_target_out_kwh",
) -> Axes:
    """Plot per-EV state-of-charge traces reconstructed from sessions.

    The :class:`pev_synth.Profile` API exposes a continuous SoC trajectory only
    as a :class:`~pandas.Series` (:meth:`Profile.soc_trajectory`), not as a
    DataFrame. To stay cache-free and DataFrame-driven, this helper instead
    reconstructs a step trace per EV directly from the sessions table's SoC
    anchor columns: within each session SoC ramps from ``soc_in_col`` to
    ``soc_out_col`` over ``[t_in, t_out)``. This mirrors the in-session ramp of
    :meth:`Profile.soc_trajectory` (the between-session driving discharge is not
    drawn, as it is not encoded in the sessions table alone).

    Parameters
    ----------
    sessions_df:
        Long charging-session table as returned by
        :meth:`pev_synth.Fleet.charging_sessions`.
    ev_ids:
        Specific ``ev_id`` values to plot. When ``None`` (default) the first
        ``max_traces`` distinct EVs present in ``sessions_df`` are used.
    ax:
        Pre-existing Axes to draw on. A new figure/axes is created when
        ``None``.
    max_traces:
        Cap on the number of EV traces drawn (default 5) when ``ev_ids`` is not
        given.
    ev_id_col, t_in_col, t_out_col, soc_in_col, soc_out_col:
        Column-name overrides for the sessions schema.

    Returns
    -------
    matplotlib.axes.Axes
        The Axes the traces were drawn on.
    """
    ax = _resolve_ax(ax)

    if ev_ids is None:
        distinct = list(dict.fromkeys(sessions_df[ev_id_col].tolist()))
        ev_ids = distinct[:max_traces]

    # Colorblind-safe Okabe-Ito cycle + distinct line styles so traces are
    # distinguishable without relying on color alone.
    palette = (
        "#0072B2",
        "#D55E00",
        "#009E73",
        "#CC79A7",
        "#E69F00",
        "#56B4E9",
        "#F0E442",
        "#000000",
    )
    line_styles = ("-", "--", "-.", ":")

    for i, ev in enumerate(ev_ids):
        sub = sessions_df.loc[sessions_df[ev_id_col] == ev].copy()
        if sub.empty:
            continue
        sub = sub.sort_values(t_in_col)
        t_in = pd.to_datetime(sub[t_in_col])
        t_out = pd.to_datetime(sub[t_out_col])
        soc_in = pd.to_numeric(sub[soc_in_col], errors="coerce")
        soc_out = pd.to_numeric(sub[soc_out_col], errors="coerce")
        # Interleave session start/end anchors into a single step trace.
        times = np.empty(2 * len(sub), dtype=object)
        socs = np.empty(2 * len(sub), dtype=float)
        times[0::2] = t_in.to_numpy()
        times[1::2] = t_out.to_numpy()
        socs[0::2] = soc_in.to_numpy(dtype=float)
        socs[1::2] = soc_out.to_numpy(dtype=float)
        ax.plot(
            times,
            socs,
            color=palette[i % len(palette)],
            linestyle=line_styles[i % len(line_styles)],
            marker="o",
            markersize=3,
            label=f"ev_{int(ev):04d}",
        )

    ax.set_xlabel("time")
    ax.set_ylabel("state of charge (kWh)")
    ax.set_title("Per-EV SoC traces")
    if ax.get_legend_handles_labels()[0]:
        ax.legend(loc="best", fontsize="small")
    return ax
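
The interleaving step in the loop above turns each session into two anchor points. A small stdlib sketch with made-up values (times are plain floats in hours rather than timestamps):

```python
# (t_in, t_out, soc_in_kwh, soc_target_out_kwh) for one EV, sorted by t_in.
sessions = [
    (18.0, 23.0, 22.0, 55.0),
    (42.0, 46.0, 30.0, 58.0),
]

times = [None] * (2 * len(sessions))
socs = [None] * (2 * len(sessions))

# Even slots hold session starts, odd slots hold session ends.
times[0::2] = [s[0] for s in sessions]
times[1::2] = [s[1] for s in sessions]
socs[0::2] = [s[2] for s in sessions]
socs[1::2] = [s[3] for s in sessions]

print(times)  # [18.0, 23.0, 42.0, 46.0]
print(socs)   # [22.0, 55.0, 30.0, 58.0]
```

When these arrays are passed to a single `ax.plot` call, matplotlib joins consecutive points, so each session end is connected to the next session start by a straight segment.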

Type aliases¶

ProfileType `module-attribute` ¶

ProfileType = Literal['residential', 'workplace']
