
gcages.aneris_helpers#

Helpers for working with aneris

A lot of this should be pushed upstream at some point.

Classes:

Name Description
AmbiguousHarmonisationMethod

Error raised when harmonisation methods are ambiguous.

MissingHarmonisationYear

Error raised when the harmonisation year is missing.

MissingHistoricalError

Error raised when historical data is missing.
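All three error classes derive from `ValueError`, so a caller can handle them one at a time or together with a single `except ValueError`. The sketch below illustrates this with stand-in classes that mirror the definitions documented on this page, so it runs without gcages installed. The `describe` helper and its messages are hypothetical and are not part of gcages.

```python
# Stand-ins mirroring the class definitions documented on this page.
class AmbiguousHarmonisationMethod(ValueError):
    """Error raised when harmonisation methods are ambiguous."""


class MissingHarmonisationYear(ValueError):
    """Error raised when the harmonisation year is missing."""


class MissingHistoricalError(ValueError):
    """Error raised when historical data is missing."""


def describe(exc: Exception) -> str:
    """Hypothetical helper: map an error to a short hint for the user."""
    try:
        raise exc
    except MissingHistoricalError:
        return "add history for the missing timeseries"
    except MissingHarmonisationYear:
        return "fill in the harmonisation year in history"
    except ValueError:
        # Catches AmbiguousHarmonisationMethod too, as it is a ValueError
        return "some other value problem"


print(describe(MissingHistoricalError("no history")))
print(describe(AmbiguousHarmonisationMethod("two methods match")))
```

More specific `except` clauses have to come before the generic `except ValueError`, otherwise the generic clause handles everything.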

Functions:

Name Description
harmonise_all

Harmonise all timeseries in scenarios to match history

harmonise_scenario

Harmonise a single scenario

AmbiguousHarmonisationMethod #

Bases: ValueError

Error raised when harmonisation methods are ambiguous.

Source code in src/gcages/aneris_helpers.py
class AmbiguousHarmonisationMethod(ValueError):
    """
    Error raised when harmonisation methods are ambiguous.
    """

MissingHarmonisationYear #

Bases: ValueError

Error raised when the harmonisation year is missing.

Source code in src/gcages/aneris_helpers.py
class MissingHarmonisationYear(ValueError):
    """
    Error raised when the harmonisation year is missing.
    """

MissingHistoricalError #

Bases: ValueError

Error raised when historical data is missing.

Source code in src/gcages/aneris_helpers.py
class MissingHistoricalError(ValueError):
    """
    Error raised when historical data is missing.
    """

harmonise_all #

harmonise_all(
    scenarios: DataFrame,
    history: DataFrame,
    year: int,
    overrides: Series[str] | None = None,
) -> DataFrame

Harmonise all timeseries in scenarios to match history

This is a re-write of aneris' version of the same. TODO: MR upstream.

Parameters:

Name Type Description Default
scenarios DataFrame

pd.DataFrame containing the timeseries to be harmonised

required
history DataFrame

pd.DataFrame containing the historical timeseries to which scenarios should be harmonised.

required
year int

The year in which scenarios should be harmonised to history

required
overrides Series[str] | None

If not provided, the default aneris decision tree is used.

Otherwise, overrides must be a pd.DataFrame containing any specifications for overriding the default aneris methods. Each row specifies one override. The override method is specified in the "method" column. The other columns specify which of the timeseries in scenarios should use this override by specifying metadata to match (e.g. variable, region). If a cell has a null value (evaluated using pd.isnull()), then that scenario characteristic is not used for filtering for that override.

For example, a row with "method" equal to "constant_ratio", region equal to "World" and a null variable means that all timeseries in the "World" region use the "constant_ratio" method. In contrast, a row with "method" equal to "constant_ratio", region equal to "World" and variable equal to "Emissions|CO2" means that only timeseries with variable equal to "Emissions|CO2" and region equal to "World" use the "constant_ratio" method.

None
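The matching rule for overrides (a null cell acts as a wildcard) can be sketched in plain pandas. This is only an illustration of the rule as described above, not the code gcages or aneris actually run. The `rows_matching` helper and the example metadata are hypothetical.

```python
import pandas as pd


def rows_matching(override_row: pd.Series, timeseries_meta: pd.DataFrame) -> pd.Series:
    """Return a boolean mask of the timeseries selected by one override row."""
    mask = pd.Series(True, index=timeseries_meta.index)
    for column, value in override_row.drop("method").items():
        if pd.isnull(value):
            # Null cell: this characteristic is not used for filtering
            continue
        mask = mask & (timeseries_meta[column] == value)
    return mask


# Hypothetical metadata for three timeseries
timeseries_meta = pd.DataFrame(
    {
        "region": ["World", "World", "Europe"],
        "variable": ["Emissions|CO2", "Emissions|CH4", "Emissions|CO2"],
    }
)

# Null variable: every timeseries in "World" matches
world_only = pd.Series(
    {"method": "constant_ratio", "region": "World", "variable": None}
)
# Both cells set: only "Emissions|CO2" in "World" matches
world_co2 = pd.Series(
    {"method": "constant_ratio", "region": "World", "variable": "Emissions|CO2"}
)

print(rows_matching(world_only, timeseries_meta).tolist())  # [True, True, False]
print(rows_matching(world_co2, timeseries_meta).tolist())  # [True, False, False]
```

If both rows were supplied together, the "Emissions|CO2" timeseries in "World" would be selected by two rows, which is the kind of overlap to keep in mind when reading the AmbiguousHarmonisationMethod entry under Raises.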

Returns:

Type Description
DataFrame

The harmonised timeseries

Notes

This interface is nowhere near as sophisticated as aneris' other interfaces. It simply harmonises timeseries. It does not check sectoral sums or other possible errors which can arise when harmonising. If you need such features, do not use this interface.

Raises:

Type Description
MissingHistoricalError

No historical data is provided for a given timeseries

MissingHarmonisationYear

A value for the harmonisation year is missing or is null in history

AmbiguousHarmonisationMethod

overrides do not uniquely specify the harmonisation method for a given timeseries.
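The first two conditions can be checked before calling harmonise_all. The following is a rough pre-flight sketch in plain pandas, assuming the index levels used in the source below (model, scenario, region, variable, unit) and integer year columns. `find_history_problems` is a hypothetical helper, not gcages' internal check, and the numbers are made up.

```python
import numpy as np
import pandas as pd

HARM_LEVELS = ["variable", "region"]


def find_history_problems(scenarios: pd.DataFrame, history: pd.DataFrame, year: int):
    """Return (missing_history, missing_year) lists of (variable, region) pairs."""
    hist = history.reset_index()
    needed = scenarios.reset_index()[HARM_LEVELS].drop_duplicates()
    missing_history, missing_year = [], []
    for variable, region in needed.itertuples(index=False, name=None):
        rows = hist[(hist["variable"] == variable) & (hist["region"] == region)]
        if rows.empty:
            # No history at all for this timeseries
            missing_history.append((variable, region))
        elif year not in rows.columns or rows[year].isnull().any():
            # History exists, but not for the harmonisation year
            missing_year.append((variable, region))
    return missing_history, missing_year


names = ["model", "scenario", "region", "variable", "unit"]
scenarios = pd.DataFrame(
    [[38000.0, 30000.0], [350.0, 300.0], [10.0, 9.0]],
    index=pd.MultiIndex.from_tuples(
        [
            ("model_a", "scen_1", "World", "Emissions|CO2", "Mt CO2/yr"),
            ("model_a", "scen_1", "World", "Emissions|CH4", "Mt CH4/yr"),
            ("model_a", "scen_1", "World", "Emissions|N2O", "Mt N2O/yr"),
        ],
        names=names,
    ),
    columns=[2015, 2030],
)
history = pd.DataFrame(
    [[37000.0, 38500.0], [340.0, np.nan]],
    index=pd.MultiIndex.from_tuples(
        [
            ("history", "scen", "World", "Emissions|CO2", "Mt CO2/yr"),
            ("history", "scen", "World", "Emissions|CH4", "Mt CH4/yr"),
        ],
        names=names,
    ),
    columns=[2010, 2015],
)

missing_history, missing_year = find_history_problems(scenarios, history, year=2015)
print(missing_history)  # [('Emissions|N2O', 'World')]
print(missing_year)  # [('Emissions|CH4', 'World')]
```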

Source code in src/gcages/aneris_helpers.py
def harmonise_all(
    scenarios: pd.DataFrame,
    history: pd.DataFrame,
    year: int,
    overrides: pd.Series[str] | None = None,
) -> pd.DataFrame:
    """
    Harmonise all timeseries in `scenarios` to match `history`

    This is a re-write of aneris' version of the same.
    TODO: MR upstream.

    Parameters
    ----------
    scenarios
        `pd.DataFrame` containing the timeseries to be harmonised

    history
        `pd.DataFrame` containing the historical timeseries to which
        `scenarios` should be harmonised.

    year
        The year in which `scenarios` should be harmonised to `history`

    overrides
        If not provided, the default aneris decision tree is used.

        Otherwise, `overrides` must be a `pd.DataFrame`
        containing any specifications for overriding the default aneris methods.
        Each row specifies one override.
        The override method is specified in the "method" column.
        The other columns specify which of the timeseries in
        `scenarios` should use this override by specifying metadata to match
        (e.g. variable, region).
        If a cell has a null value (evaluated using `pd.isnull()`)
        then that scenario characteristic will not be used for
        filtering for that override.
        For example, if you have a row with "method" equal to "constant_ratio",
        region equal to "World" and variable is null
        then all timeseries in the "World" region will use the "constant_ratio" method.
        In contrast, if you have a row with "method" equal to "constant_ratio",
        region equal to "World" and variable is "Emissions|CO2"
        then only timeseries with variable equal to "Emissions|CO2"
        and region equal to "World" will use the "constant_ratio" method.

    Returns
    -------
    :
        The harmonised timeseries

    Notes
    -----
    This interface is nowhere near as sophisticated as aneris' other interfaces.
    It simply harmonises timeseries.
    It does not check sectoral sums
    or other possible errors which can arise when harmonising.
    If you need such features, do not use this interface.

    Raises
    ------
    MissingHistoricalError
        No historical data is provided for a given timeseries

    MissingHarmonisationYear
        A value for the harmonisation year is missing or is null in `history`

    AmbiguousHarmonisationMethod
        `overrides` do not uniquely specify
        the harmonisation method for a given timeseries.
    """
    try:
        from aneris.harmonize import Harmonizer  # type: ignore
    except ImportError as exc:
        raise MissingOptionalDependencyError(
            "harmonise_all", requirement="aneris"
        ) from exc

    try:
        from pandas_indexing.core import assignlevel, concat, semijoin
        from pandas_indexing.selectors import isin
    except ImportError as exc:
        raise MissingOptionalDependencyError(
            "harmonise_all", requirement="pandas_indexing"
        ) from exc

    sidx = scenarios.index  # save in case we need to re-add extraneous indices later

    dfs = []
    group_levels = ["model", "scenario"]
    harm_idx = ["variable", "region"]
    for (model, scenario), msdf in scenarios.groupby(group_levels):
        hist_msdf = history.loc[
            isin(region=msdf.pix.unique("region"))  # type: ignore
            & isin(variable=msdf.pix.unique("variable"))  # type: ignore
        ]
        _check_data(hist_msdf, msdf, year)

        hist_msdf = _convert_units_to_match(start=hist_msdf, match=msdf)

        # need to convert to aneris' internal datastructure
        level_order = ["model", "scenario", "region", "variable", "unit"]
        msdf_aneris = msdf.reorder_levels(level_order)
        # Drop out any years that are all nan before passing to aneris
        msdf_aneris = msdf_aneris.dropna(how="all", axis="columns")
        # Convert to format expected by aneris
        hist_msdf_aneris = hist_msdf.pix.assign(
            model="history", scenario="scen"
        ).reorder_levels(level_order)

        # Drop out the group levels
        msdf_aneris = msdf_aneris.reset_index(group_levels, drop=True)
        hist_msdf_aneris = hist_msdf_aneris.reset_index(group_levels, drop=True)

        harmoniser = Harmonizer(
            msdf_aneris,
            hist_msdf_aneris,
            # have to copy harm index as aneris modifies it for some reason
            harm_idx=harm_idx.copy(),
        )

        # knead overrides
        overrides_kneaded = _knead_overrides(overrides, msdf, harm_idx=harm_idx)  # type: ignore
        result: pd.DataFrame = harmoniser.harmonize(
            year=year, overrides=overrides_kneaded
        )

        # convert out of internal datastructure
        dfs.append(assignlevel(result, model=model, scenario=scenario))

    # realign indices as needed
    result = concat(dfs)
    result = semijoin(result, sidx, how="right").reorder_levels(sidx.names)

    return result
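To make the expected input layout concrete, here is a small sketch of the data preparation part of the loop above, using only pandas. It assumes a scenarios frame indexed by the levels model, scenario, region, variable and unit, with one column per year. The values are made up, and the aneris call itself is left out.

```python
import numpy as np
import pandas as pd

# Index levels in an arbitrary order; the loop reorders them
index = pd.MultiIndex.from_tuples(
    [
        ("Emissions|CO2", "World", "model_a", "scen_1", "Mt CO2/yr"),
        ("Emissions|CO2", "World", "model_a", "scen_2", "Mt CO2/yr"),
        ("Emissions|CO2", "World", "model_b", "scen_1", "Mt CO2/yr"),
    ],
    names=["variable", "region", "model", "scenario", "unit"],
)
scenarios = pd.DataFrame(
    [
        [38000.0, np.nan, 30000.0],
        [39000.0, np.nan, 20000.0],
        [41000.0, np.nan, 35000.0],
    ],
    index=index,
    columns=[2015, 2020, 2030],
)

group_levels = ["model", "scenario"]
level_order = ["model", "scenario", "region", "variable", "unit"]

prepared = {}
for (model, scenario), msdf in scenarios.groupby(group_levels):
    # Reorder the levels and drop year columns that contain only NaN
    tidy = msdf.reorder_levels(level_order).dropna(how="all", axis="columns")
    # Drop the group levels; they are re-added after harmonisation
    prepared[(model, scenario)] = tidy.reset_index(group_levels, drop=True)

for key, frame in prepared.items():
    print(key, list(frame.index.names), list(frame.columns))
```

Each model-scenario pair ends up as its own frame indexed by region, variable and unit, and the all-NaN 2020 column is gone.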

harmonise_scenario #

harmonise_scenario(
    indf: DataFrame,
    history: DataFrame,
    year: int,
    overrides: Series[str] | None,
) -> DataFrame

Harmonise a single scenario

Parameters:

Name Type Description Default
indf DataFrame

Scenario to harmonise

required
history DataFrame

History to harmonise to

required
year int

Year to use for harmonisation

required
overrides Series[str] | None

Overrides to pass to aneris

required

Returns:

Type Description
DataFrame

Harmonised scenario

Source code in src/gcages/aneris_helpers.py
def harmonise_scenario(
    indf: pd.DataFrame,
    history: pd.DataFrame,
    year: int,
    overrides: pd.Series[str] | None,
) -> pd.DataFrame:
    """
    Harmonise a single scenario

    Parameters
    ----------
    indf
        Scenario to harmonise

    history
        History to harmonise to

    year
        Year to use for harmonisation

    overrides
        Overrides to pass to aneris

    Returns
    -------
    :
        Harmonised scenario
    """
    assert_only_working_on_variable_unit_region_variations(indf)

    harmonised = harmonise_all(
        indf,
        history=history,
        year=year,
        overrides=overrides,
    )

    return harmonised