gcages.aneris_helpers

Helpers for working with aneris

A lot of this should be pushed upstream at some point.

Classes:

Name	Description
`AmbiguousHarmonisationMethod`	Error raised when harmonisation methods are ambiguous.
`MissingHarmonisationYear`	Error raised when the harmonisation year is missing.
`MissingHistoricalError`	Error raised when historical data is missing.

Functions:

Name	Description
`harmonise_all`	Harmonise all timeseries in `scenarios` to match `history`
`harmonise_scenario`	Harmonise a single scenario

AmbiguousHarmonisationMethod

Bases: ValueError

Error raised when harmonisation methods are ambiguous.

Source code in src/gcages/aneris_helpers.py

class AmbiguousHarmonisationMethod(ValueError):
    """
    Error raised when harmonisation methods are ambiguous.
    """

MissingHarmonisationYear

Bases: ValueError

Error raised when the harmonisation year is missing.

Source code in src/gcages/aneris_helpers.py

class MissingHarmonisationYear(ValueError):
    """
    Error raised when the harmonisation year is missing.
    """

MissingHistoricalError

Bases: ValueError

Error raised when historical data is missing.

Source code in src/gcages/aneris_helpers.py

class MissingHistoricalError(ValueError):
    """
    Error raised when historical data is missing.
    """

harmonise_all

harmonise_all(
    scenarios: DataFrame,
    history: DataFrame,
    year: int,
    overrides: Series[str] | None = None,
) -> DataFrame

Harmonise all timeseries in scenarios to match history

This is a rewrite of aneris' version of the same function. TODO: MR upstream.

Parameters:

Name	Type	Description	Default
`scenarios`	`DataFrame`	`pd.DataFrame` containing the timeseries to be harmonised	required
`history`	`DataFrame`	`pd.DataFrame` containing the historical timeseries to which `scenarios` should be harmonised.	required
`year`	`int`	The year in which `scenarios` should be harmonised to `history`	required
`overrides`	`Series[str] | None`	If not provided, the default aneris decision tree is used. Otherwise, `overrides` must be a `pd.DataFrame` containing any specifications for overriding the default aneris methods. Each row specifies one override. The override method is specified in the "method" column. The other columns specify which of the timeseries in `scenarios` should use this override by specifying metadata to match (e.g. variable, region). If a cell has a null value (evaluated using `pd.isnull()`), then that scenario characteristic is not used for filtering for that override. For example, a row with "method" equal to "constant_ratio", region equal to "World" and a null variable applies the "constant_ratio" method to all timeseries in the "World" region. In contrast, a row with "method" equal to "constant_ratio", region equal to "World" and variable equal to "Emissions|CO2" applies the "constant_ratio" method only to timeseries with variable equal to "Emissions|CO2" and region equal to "World".	`None`
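The null-as-wildcard matching described for `overrides` can be illustrated with plain pandas. This is a minimal sketch of the documented semantics, not the code gcages or aneris actually runs: the helper `matching_methods` and the example rows are hypothetical, and the table form follows the prose above even though the signature annotates `overrides` as `Series[str]`.

```python
import pandas as pd

# Hypothetical overrides table: one row per override, with the method in the
# "method" column and the metadata to match in the remaining columns.
overrides = pd.DataFrame(
    {
        "variable": [None, "Emissions|CO2"],
        "region": ["World", "Asia"],
        "method": ["constant_ratio", "constant_offset"],
    }
)


def matching_methods(overrides: pd.DataFrame, **metadata: str) -> list[str]:
    """Return the method of every override row that matches `metadata`."""
    matches = pd.Series(True, index=overrides.index)
    for column, value in metadata.items():
        # A null cell means "do not filter on this column" for that row.
        matches &= overrides[column].isnull() | (overrides[column] == value)
    return overrides.loc[matches, "method"].tolist()


# The null variable in the first row matches any variable in "World".
world_ch4 = matching_methods(overrides, variable="Emissions|CH4", region="World")
# The second row only matches the exact variable and region it names.
asia_co2 = matching_methods(overrides, variable="Emissions|CO2", region="Asia")
# No row covers this combination, so the default decision tree would apply.
asia_ch4 = matching_methods(overrides, variable="Emissions|CH4", region="Asia")
```

In this sketch, a result with more than one method would correspond to the ambiguous case, where the overrides do not uniquely specify a method for a timeseries.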

Returns:

Type	Description
`DataFrame`	The harmonised timeseries

Notes

This interface is nowhere near as sophisticated as aneris' other interfaces. It simply harmonises timeseries. It does not check sectoral sums or other possible errors which can arise when harmonising. If you need such features, do not use this interface.

Raises:

Type	Description
`MissingHistoricalError`	No historical data is provided for a given timeseries
`MissingHarmonisationYear`	A value for the harmonisation year is missing or is null in `history`
`AmbiguousHarmonisationMethod`	`overrides` do not uniquely specify the harmonisation method for a given timeseries.
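The first two error conditions can be checked before harmonising. The sketch below is an illustration under stated assumptions, not the library's own `_check_data`: `check_history` is a hypothetical helper, the exception classes are local stand-ins that mirror the documented names so the block runs without gcages installed, and the data are made up. It assumes `scenarios` and `history` share a (variable, region) index and use years as columns.

```python
import pandas as pd


# Local stand-ins mirroring the documented exception names; the real classes
# live in gcages.aneris_helpers and also subclass ValueError.
class MissingHistoricalError(ValueError):
    """No historical data for a timeseries."""


class MissingHarmonisationYear(ValueError):
    """Harmonisation year missing or null in history."""


def check_history(scenarios: pd.DataFrame, history: pd.DataFrame, year: int) -> None:
    """Hypothetical pre-flight check on the inputs to harmonisation."""
    # Every scenario timeseries needs a historical counterpart.
    missing = scenarios.index.difference(history.index)
    if len(missing) > 0:
        raise MissingHistoricalError(f"No history for: {missing.tolist()}")
    # The harmonisation year must be a column and non-null for those rows.
    if year not in history.columns or history.reindex(scenarios.index)[year].isnull().any():
        raise MissingHarmonisationYear(f"History has no usable value for {year}")


idx = pd.MultiIndex.from_tuples(
    [("Emissions|CO2", "World"), ("Emissions|CH4", "World")],
    names=["variable", "region"],
)
scenarios = pd.DataFrame({2015: [40.0, 380.0], 2020: [42.0, 390.0]}, index=idx)
history = pd.DataFrame({2010: [35.0, 360.0], 2015: [39.0, float("nan")]}, index=idx)

caught = None
try:
    check_history(scenarios, history, year=2015)
except ValueError as exc:  # both stand-ins subclass ValueError, as documented
    caught = type(exc).__name__
```

Because all three documented errors derive from `ValueError`, a caller can catch them individually or with a single `except ValueError` clause, as above.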

Source code in src/gcages/aneris_helpers.py

def harmonise_all(
    scenarios: pd.DataFrame,
    history: pd.DataFrame,
    year: int,
    overrides: pd.Series[str] | None = None,
) -> pd.DataFrame:
    """
    Harmonise all timeseries in `scenarios` to match `history`

    This is a rewrite of aneris' version of the same function.
    TODO: MR upstream.

    Parameters
    ----------
    scenarios
        `pd.DataFrame` containing the timeseries to be harmonised

    history
        `pd.DataFrame` containing the historical timeseries to which
        `scenarios` should be harmonised.

    year
        The year in which `scenarios` should be harmonised to `history`

    overrides
        If not provided, the default aneris decision tree is used.

        Otherwise, `overrides` must be a `pd.DataFrame`
        containing any specifications for overriding the default aneris methods.
        Each row specifies one override.
        The override method is specified in the "method" column.
        The other columns specify which of the timeseries in
        `scenarios` should use this override by specifying metadata to match
        (e.g. variable, region).
        If a cell has a null value (evaluated using `pd.isnull()`)
        then that scenario characteristic will not be used for
        filtering for that override.
        For example, if you have a row with "method" equal to "constant_ratio",
        region equal to "World" and variable is null
        then all timeseries in the "World" region will use the "constant_ratio" method.
        In contrast, if you have a row with "method" equal to "constant_ratio",
        region equal to "World" and variable is "Emissions|CO2"
        then only timeseries with variable equal to "Emissions|CO2"
        and region equal to "World" will use the "constant_ratio" method.

    Returns
    -------
    :
        The harmonised timeseries

    Notes
    -----
    This interface is nowhere near as sophisticated as aneris' other interfaces.
    It simply harmonises timeseries.
    It does not check sectoral sums
    or other possible errors which can arise when harmonising.
    If you need such features, do not use this interface.

    Raises
    ------
    MissingHistoricalError
        No historical data is provided for a given timeseries

    MissingHarmonisationYear
        A value for the harmonisation year is missing or is null in `history`

    AmbiguousHarmonisationMethod
        `overrides` do not uniquely specify
        the harmonisation method for a given timeseries.
    """
    try:
        from aneris.harmonize import Harmonizer  # type: ignore # noqa: PLC0415
    except ImportError as exc:
        raise MissingOptionalDependencyError(
            "harmonise_all", requirement="aneris"
        ) from exc

    try:
        from pandas_indexing.core import assignlevel, concat, semijoin  # noqa: PLC0415
        from pandas_indexing.selectors import isin  # noqa: PLC0415
    except ImportError as exc:
        raise MissingOptionalDependencyError(
            "harmonise_all", requirement="pandas_indexing"
        ) from exc

    sidx = scenarios.index  # save in case we need to re-add extraneous indices later

    dfs = []
    group_levels = ["model", "scenario"]
    harm_idx = ["variable", "region"]
    for (model, scenario), msdf in scenarios.groupby(group_levels):
        hist_msdf = history.loc[
            isin(region=msdf.pix.unique("region"))  # type: ignore
            & isin(variable=msdf.pix.unique("variable"))  # type: ignore
        ]
        _check_data(hist_msdf, msdf, year)

        hist_msdf = _convert_units_to_match(start=hist_msdf, match=msdf)

        # need to convert to aneris' internal datastructure
        level_order = ["model", "scenario", "region", "variable", "unit"]
        msdf_aneris = msdf.reorder_levels(level_order)
        # Drop out any years that are all nan before passing to aneris
        msdf_aneris = msdf_aneris.dropna(how="all", axis="columns")
        # Convert to format expected by aneris
        hist_msdf_aneris = hist_msdf.pix.assign(
            model="history", scenario="scen"
        ).reorder_levels(level_order)

        # Drop out the group levels
        msdf_aneris = msdf_aneris.reset_index(group_levels, drop=True)
        hist_msdf_aneris = hist_msdf_aneris.reset_index(group_levels, drop=True)

        harmoniser = Harmonizer(
            msdf_aneris,
            hist_msdf_aneris,
            # have to copy harm index as aneris modifies it for some reason
            harm_idx=harm_idx.copy(),
        )

        # knead overrides
        overrides_kneaded = _knead_overrides(overrides, msdf, harm_idx=harm_idx)  # type: ignore
        result: pd.DataFrame = harmoniser.harmonize(
            year=year, overrides=overrides_kneaded
        )

        # convert out of internal datastructure
        dfs.append(assignlevel(result, model=model, scenario=scenario))

    # realign indices as needed
    result = concat(dfs)
    result = semijoin(result, sidx, how="right").reorder_levels(sidx.names)

    return result

harmonise_scenario

harmonise_scenario(
    indf: DataFrame,
    history: DataFrame,
    year: int,
    overrides: Series[str] | None,
) -> DataFrame

Harmonise a single scenario

Parameters:

Name	Type	Description	Default
`indf`	`DataFrame`	Scenario to harmonise	required
`history`	`DataFrame`	History to harmonise to	required
`year`	`int`	Year to use for harmonisation	required
`overrides`	`Series[str] | None`	Overrides to pass to aneris	required

Returns:

Type	Description
`DataFrame`	Harmonised scenario
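Since this function harmonises a single scenario, its input is expected to differ only in variable, unit and region. The sketch below shows one way to test that condition and to split a multi-scenario frame before calling it. It is an assumption about the intent, not the library's own check: `varies_only_in_variable_unit_region` is a hypothetical helper and the data are made up.

```python
import pandas as pd


def varies_only_in_variable_unit_region(df: pd.DataFrame) -> bool:
    """Hypothetical check: every other index level holds at most one value."""
    allowed = {"variable", "unit", "region"}
    return all(
        df.index.get_level_values(name).nunique() <= 1
        for name in df.index.names
        if name not in allowed
    )


idx = pd.MultiIndex.from_tuples(
    [
        ("model_a", "scen_1", "Emissions|CO2", "World", "Mt CO2/yr"),
        ("model_a", "scen_1", "Emissions|CH4", "World", "Mt CH4/yr"),
        ("model_a", "scen_2", "Emissions|CO2", "World", "Mt CO2/yr"),
    ],
    names=["model", "scenario", "variable", "region", "unit"],
)
all_scenarios = pd.DataFrame({2015: [40.0, 380.0, 41.0], 2020: [42.0, 390.0, 39.0]}, index=idx)

# Split by model and scenario so that each piece is a single scenario.
pieces = {key: sdf for key, sdf in all_scenarios.groupby(["model", "scenario"])}
single_ok = all(varies_only_in_variable_unit_region(sdf) for sdf in pieces.values())
combined_ok = varies_only_in_variable_unit_region(all_scenarios)
```

Each piece produced this way could then be passed on as `indf`, whereas the combined frame spans two scenarios and would not satisfy the single-scenario expectation.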

Source code in src/gcages/aneris_helpers.py

def harmonise_scenario(
    indf: pd.DataFrame,
    history: pd.DataFrame,
    year: int,
    overrides: pd.Series[str] | None,
) -> pd.DataFrame:
    """
    Harmonise a single scenario

    Parameters
    ----------
    indf
        Scenario to harmonise

    history
        History to harmonise to

    year
        Year to use for harmonisation

    overrides
        Overrides to pass to aneris

    Returns
    -------
    :
        Harmonised scenario
    """
    assert_only_working_on_variable_unit_region_variations(indf)

    harmonised = harmonise_all(
        indf,
        history=history,
        year=year,
        overrides=overrides,
    )

    return harmonised