gcages.cmip7_scenariomip.infilling#

Infilling configuration and related things for the CMIP7 ScenarioMIP workflow

Classes:

Name	Description
`CMIP7ScenarioMIPInfiller`	Infiller that follows the same logic as was used in CMIP7 ScenarioMIP

Functions:

Name	Description
`get_complete`	Get a complete set of timeseries
`get_direct_copy_infiller`	Get an infiller which just copies the timeseries from another scenario
`get_direct_scaling_infiller`	Get an infiller which just scales one set of emissions to create the next set
`get_pre_industrial_aware_direct_scaling_infiller`	Build pre-industrial-aware direct scaling infillers for follower/leader pairs.
`get_silicone_based_infiller`	Get an infiller based on silicone
`infill`	Infill an emissions scenario using the provided infillers.
`load_cmip7_scenariomip_ghg_inversions`	Load
`load_cmip7_scenariomip_infilling_db`	Load infilling database for CMIP7 ScenarioMIP harmonisation.

Attributes:

Name	Type	Description
`COMPLETE_EMISSIONS_INPUT_VARIABLES_GCAGES`		Complete set of input emissions using gcages' naming
`complete_index_gcages_names`		Complete index using gcages' names

COMPLETE_EMISSIONS_INPUT_VARIABLES_GCAGES `module-attribute` #

COMPLETE_EMISSIONS_INPUT_VARIABLES_GCAGES = [
    "Emissions|CO2|Biosphere",
    "Emissions|CO2|Fossil",
    "Emissions|BC",
    "Emissions|CH4",
    "Emissions|CO",
    "Emissions|N2O",
    "Emissions|NH3",
    "Emissions|NMVOC",
    "Emissions|NOx",
    "Emissions|OC",
    "Emissions|SOx",
    "Emissions|C2F6",
    "Emissions|C6F14",
    "Emissions|CF4",
    "Emissions|SF6",
    "Emissions|HFC125",
    "Emissions|HFC134a",
    "Emissions|HFC143a",
    "Emissions|HFC227ea",
    "Emissions|HFC23",
    "Emissions|HFC245fa",
    "Emissions|HFC32",
    "Emissions|HFC4310mee",
    "Emissions|CCl4",
    "Emissions|CFC11",
    "Emissions|CFC113",
    "Emissions|CFC114",
    "Emissions|CFC115",
    "Emissions|CFC12",
    "Emissions|CH3CCl3",
    "Emissions|HCFC141b",
    "Emissions|HCFC142b",
    "Emissions|HCFC22",
    "Emissions|Halon1202",
    "Emissions|Halon1211",
    "Emissions|Halon1301",
    "Emissions|Halon2402",
    "Emissions|C3F8",
    "Emissions|C4F10",
    "Emissions|C5F12",
    "Emissions|C7F16",
    "Emissions|C8F18",
    "Emissions|cC4F8",
    "Emissions|SO2F2",
    "Emissions|HFC236fa",
    "Emissions|HFC152a",
    "Emissions|HFC365mfc",
    "Emissions|CH2Cl2",
    "Emissions|CHCl3",
    "Emissions|CH3Br",
    "Emissions|CH3Cl",
    "Emissions|NF3",
]

Complete set of input emissions using gcages' naming

complete_index_gcages_names `module-attribute` #

complete_index_gcages_names = from_product(
    [COMPLETE_EMISSIONS_INPUT_VARIABLES_GCAGES, ["World"]],
    names=["variable", "region"],
)

Complete index using gcages' names

CMIP7ScenarioMIPInfiller #

Infiller that follows the same logic as was used in CMIP7 ScenarioMIP

If you want exactly the same behaviour as in CMIP7 ScenarioMIP, initialise using from_cmip7_scenariomip_config

Methods:

Name	Description
`__call__`	Create an a infilled df for CMIP7 ScenarioMIP's simple climate model run.
`from_cmip7_scenariomip_config`	Initialise from the config used in AR6

Attributes:

Name	Type	Description
`cmip7_ghg_inversions`	`DataFrame`	Green house gasses inversion data frame.
`harmonisation_year`	`int`	Year in which the data was harmonised
`historical_emissions`	`DataFrame`	Historical emissions used for harmonisation
`infilling_db`	`DataFrame`	Infilling leaders data base for each variable.
`pre_industrial_year`	`int`	Pre-Industrial year
`run_checks`	`bool`	If `True`, run checks on both input and output data
`ur`	`UnitRegistry \| None`	UnitRegistry

Source code in src/gcages/cmip7_scenariomip/infilling.py

@define
class CMIP7ScenarioMIPInfiller:
    """
    Infiller that follows the same logic as was used in CMIP7 ScenarioMIP

    If you want exactly the same behaviour as in CMIP7 ScenarioMIP,
    initialise using [`from_cmip7_scenariomip_config`][(c)]
    """

    infilling_db: pd.DataFrame
    """
    Infilling leaders data base for each variable.
    """

    cmip7_ghg_inversions: pd.DataFrame
    """
    Green house gasses inversion data frame.
    """

    historical_emissions: pd.DataFrame
    """
    Historical emissions used for harmonisation
    """
    harmonisation_year: int = 2023
    """
    Year in which the data was harmonised
    """
    pre_industrial_year: int = 1750
    """
    Pre-Industrial year
    """
    run_checks: bool = True
    """
    If `True`, run checks on both input and output data

    If you are sure about your workflow,
    you can disable the checks to speed things up
    (but we don't recommend this unless you really
    are confident about what you're doing).
    """

    ur: UnitRegistry | None = None
    """
    UnitRegistry
    """

    def __call__(self, in_emissions: pd.DataFrame) -> pd.DataFrame:
        """
        Create an a infilled df for CMIP7 ScenarioMIP's simple climate model run.

        Parameters
        ----------
        in_emissions
            Emissions to infill

        Returns
        -------
        :
            Infilled emissions DataFrame
        """
        if self.ur is None:
            try:
                import openscm_units  # noqa: PLC0415

                self.ur = openscm_units.unit_registry
            except ImportError as exc:
                raise MissingOptionalDependencyError(
                    "openscm_units",
                    requirement="openscm_units",
                ) from exc

        try:
            import silicone.database_crunchers  # type: ignore # silicone has no type hints# noqa: PLC0415
        except ImportError as exc:
            raise MissingOptionalDependencyError(
                "get_silicone_based_infiller", requirement="silicone"
            ) from exc

        if self.run_checks:
            assert_index_is_multiindex(in_emissions)
            assert_data_is_all_numeric(in_emissions)
            assert_has_index_levels(
                in_emissions, ["variable", "unit", "model", "scenario"]
            )
            # Check that the infilling database and
            # scenario data are harmonised the same
            history = self.historical_emissions.reset_index(
                level=[
                    lvl
                    for lvl in ["model", "scenario"]
                    if lvl in self.historical_emissions.index.names
                ],
                drop=True,
            )
            assert_harmonised(
                in_emissions,
                history=history,
                harmonisation_time=self.harmonisation_year,
            )

        infilling_wmo = self.infilling_db[
            self.infilling_db.index.get_level_values("model").str.contains("WMO")
        ]

        infilling_silicone = self.infilling_db[
            ~self.infilling_db.index.get_level_values("model").str.contains("WMO")
            & ~self.infilling_db.index.get_level_values("model").str.contains("Velders")
        ]

        # Infill

        # TODO: split this out somehow
        ### Very low marker should use F-gas emissions in line with Kigali
        # We get these from [Velders et al., 2022](https://zenodo.org/records/6520707)

        vl_model, vl_scenario = ("REMIND-MAgPIE 3.5-4.11", "SSP1 - Very Low Emissions")

        mask = in_emissions.index.get_level_values("model").str.contains(
            vl_model
        ) & in_emissions.index.get_level_values("scenario").str.contains(vl_scenario)

        vl_marker = in_emissions[mask]
        unique_var = infilling_silicone.index.get_level_values("variable").unique()
        if not vl_marker.empty:
            lead_vl_marker = "Emissions|CO2|Fossil"
            infillers_silicone_vl_marker = {}
            for variable in [v for v in unique_var if v != lead_vl_marker]:
                infillers_silicone_vl_marker[variable] = get_silicone_based_infiller(
                    infilling_db=infilling_silicone,
                    follower_variable=variable,
                    lead_variables=[lead_vl_marker],
                    silicone_db_cruncher=silicone.database_crunchers.RMSClosest,
                )

            infilled_vl_exception = infill(
                vl_marker,
                infillers_silicone_vl_marker,
            )

        else:
            infilled_vl_exception = None

        # TODO: fix this. The infiller should only return infilled emissions,
        # not complete emissions.
        complete_vl_exception = get_complete(in_emissions, infilled_vl_exception)

        # Silicone
        lead = "Emissions|CO2|Fossil"
        infillers_silicone = {}
        for variable in [v for v in unique_var if v != lead]:
            infillers_silicone[variable] = get_silicone_based_infiller(
                infilling_db=infilling_silicone,
                follower_variable=variable,
                lead_variables=[lead],
                silicone_db_cruncher=silicone.database_crunchers.RMSClosest,
            )

        infilled_silicone = infill(
            complete_vl_exception,
            infillers_silicone,
        )
        complete_silicone = get_complete(complete_vl_exception, infilled_silicone)

        # Infill

        infillers_wmo = {}
        unique_var = infilling_wmo.index.get_level_values("variable").unique()
        for wmo_var in unique_var:
            infillers_wmo[wmo_var] = get_direct_copy_infiller(
                variable=wmo_var,
                copy_from=infilling_wmo,
            )

        infilled_wmo = infill(complete_silicone, infillers_wmo)
        complete_wmo = get_complete(complete_silicone, infilled_wmo)

        # Scale timeseries
        #
        # Surprisingly, this is the most mucking around of all.
        # The hard part here is that the scaling needs to be aware
        # of the fact that the pre-industrial value is different for each tiemseries.
        # The naming mucking around also adds to the fun of course.

        scaling_leaders = {
            "Emissions|C3F8": "Emissions|C2F6",
            "Emissions|C4F10": "Emissions|C2F6",
            "Emissions|C5F12": "Emissions|C2F6",
            "Emissions|C7F16": "Emissions|C2F6",
            "Emissions|C8F18": "Emissions|C2F6",
            "Emissions|cC4F8": "Emissions|CF4",
            "Emissions|SO2F2": "Emissions|CF4",
            "Emissions|HFC236fa": "Emissions|HFC245fa",
            "Emissions|HFC152a": "Emissions|HFC4310mee",
            "Emissions|HFC365mfc": "Emissions|HFC134a",
            "Emissions|CH2Cl2": "Emissions|HFC134a",
            "Emissions|CHCl3": "Emissions|C2F6",
            "Emissions|NF3": "Emissions|SF6",
        }

        infillers_scaling = get_pre_industrial_aware_direct_scaling_infiller(
            historical_emissions=self.historical_emissions,
            cmip7_ghg_inversions_reporting_names=self.cmip7_ghg_inversions,
            scaling_leaders=scaling_leaders,
            harmonisation_year=self.harmonisation_year,
            pre_industrial_year=self.pre_industrial_year,
        )

        infilled_scaling = infill(complete_wmo, infillers_scaling)
        infilled = get_complete(complete_wmo, infilled_scaling)
        infilled.columns.name = "year"

        if self.run_checks:
            pd.testing.assert_index_equal(infilled.columns, in_emissions.columns)

            assert_harmonised(
                infilled,
                history=history,
                harmonisation_time=self.harmonisation_year,
                rounding=5,  # level of data storage in historical data often
            )
            ## Check completeness
            assert_all_groups_are_complete(infilled, complete_index_gcages_names)

        return infilled

    @classmethod
    def from_cmip7_scenariomip_config(
        cls,
        cmip7_scenariomip_infilling_leader_emissions_file: Path,
        cmip7_ghg_inversions_file: Path,
        cmip7_scenariomip_global_historical_emissions_file: Path,
        ur: UnitRegistry | None = None,
        run_checks: bool = True,
    ) -> CMIP7ScenarioMIPInfiller:
        """
        Initialise from the config used in AR6

        Parameters
        ----------
        cmip7_scenariomip_infilling_leader_emissions_file
            File containing the infilling leaders database

            This is for all emissions except GHGs.

        cmip7_ghg_inversions_file
            File containing the infilling database for GHGs inversions

        cmip7_scenariomip_global_historical_emissions_file
            File containing the historical emissions used for harmonisation

        run_checks
            Should checks of the input and output data be performed?

            If this is turned off, things are faster,
            but error messages are much less clear if things go wrong.

        Returns
        -------
        :
            Initialised CMIP7ScenarioMIPInfiller
        """
        # Hardcode as we are matching CMIP7 ScenarioMIP exactly.
        # Users can copy and modify themselves if they wish
        # (or we can introduce a lower layer if lots of users want it)
        PI_YEAR = 1750
        HARMONISATION_YEAR = 2023

        if ur is None:
            try:
                import openscm_units  # noqa: PLC0415

                ur = openscm_units.unit_registry
            except ImportError as exc:
                raise MissingOptionalDependencyError(
                    "openscm_units",
                    requirement="openscm_units",
                ) from exc

        # Still embargoed
        infilling_db = load_cmip7_scenariomip_infilling_db(
            filepath=cmip7_scenariomip_infilling_leader_emissions_file,
            check_hash=False,  # TODO: update when available
        )

        # CMIP7 GHG inversions
        cmip7_ghg_inversions = load_cmip7_scenariomip_ghg_inversions(
            filepath=cmip7_ghg_inversions_file,
        )
        # History
        historical_emissions = load_cmip7_scenariomip_historical_emissions(
            filepath=cmip7_scenariomip_global_historical_emissions_file,
            check_hash=True,
        )

        # Use gcages naming convention.
        infilling_db = update_index_levels_func(
            infilling_db,
            {
                "variable": lambda x: convert_variable_name(
                    x,
                    from_convention=SupportedNamingConventions.CMIP7_SCENARIOMIP,
                    to_convention=SupportedNamingConventions.GCAGES,
                )
            },
            copy=False,
        )
        cmip7_ghg_inversions = update_index_levels_func(
            cmip7_ghg_inversions,
            {
                "variable": lambda x: convert_variable_name(
                    x,
                    from_convention=SupportedNamingConventions.OPENSCM_RUNNER,
                    to_convention=SupportedNamingConventions.GCAGES,
                )
            },
            copy=False,
        )
        historical_emissions = update_index_levels_func(
            historical_emissions,
            {
                "variable": lambda x: convert_variable_name(
                    x,
                    from_convention=SupportedNamingConventions.CMIP7_SCENARIOMIP,
                    to_convention=SupportedNamingConventions.GCAGES,
                )
            },
            copy=False,
        )

        if run_checks:
            assert_harmonised(
                infilling_db,
                history=historical_emissions.reset_index(
                    level=[
                        lvl
                        for lvl in ["model", "scenario"]
                        if lvl in historical_emissions.index.names
                    ],
                    drop=True,
                ),
                harmonisation_time=HARMONISATION_YEAR,
                history_unit_level="unit",
                ur=ur,
            )

        return cls(
            infilling_db=infilling_db,
            historical_emissions=historical_emissions,
            cmip7_ghg_inversions=cmip7_ghg_inversions,
            harmonisation_year=HARMONISATION_YEAR,
            pre_industrial_year=PI_YEAR,
            run_checks=run_checks,
            ur=ur,
        )

cmip7_ghg_inversions `instance-attribute` #

cmip7_ghg_inversions: DataFrame

Green house gasses inversion data frame.

harmonisation_year `class-attribute` `instance-attribute` #

harmonisation_year: int = 2023

Year in which the data was harmonised

historical_emissions `instance-attribute` #

historical_emissions: DataFrame

Historical emissions used for harmonisation

infilling_db `instance-attribute` #

infilling_db: DataFrame

Infilling leaders data base for each variable.

pre_industrial_year `class-attribute` `instance-attribute` #

pre_industrial_year: int = 1750

Pre-Industrial year

run_checks `class-attribute` `instance-attribute` #

run_checks: bool = True

If True, run checks on both input and output data

If you are sure about your workflow, you can disable the checks to speed things up (but we don't recommend this unless you really are confident about what you're doing).

ur `class-attribute` `instance-attribute` #

ur: UnitRegistry | None = None

UnitRegistry

call #

__call__(in_emissions: DataFrame) -> DataFrame

Create an a infilled df for CMIP7 ScenarioMIP's simple climate model run.

Parameters:

Name	Type	Description	Default
`in_emissions`	`DataFrame`	Emissions to infill	required

Returns:

Type	Description
`DataFrame`	Infilled emissions DataFrame

Source code in src/gcages/cmip7_scenariomip/infilling.py

def __call__(self, in_emissions: pd.DataFrame) -> pd.DataFrame:
    """
    Create an a infilled df for CMIP7 ScenarioMIP's simple climate model run.

    Parameters
    ----------
    in_emissions
        Emissions to infill

    Returns
    -------
    :
        Infilled emissions DataFrame
    """
    if self.ur is None:
        try:
            import openscm_units  # noqa: PLC0415

            self.ur = openscm_units.unit_registry
        except ImportError as exc:
            raise MissingOptionalDependencyError(
                "openscm_units",
                requirement="openscm_units",
            ) from exc

    try:
        import silicone.database_crunchers  # type: ignore # silicone has no type hints# noqa: PLC0415
    except ImportError as exc:
        raise MissingOptionalDependencyError(
            "get_silicone_based_infiller", requirement="silicone"
        ) from exc

    if self.run_checks:
        assert_index_is_multiindex(in_emissions)
        assert_data_is_all_numeric(in_emissions)
        assert_has_index_levels(
            in_emissions, ["variable", "unit", "model", "scenario"]
        )
        # Check that the infilling database and
        # scenario data are harmonised the same
        history = self.historical_emissions.reset_index(
            level=[
                lvl
                for lvl in ["model", "scenario"]
                if lvl in self.historical_emissions.index.names
            ],
            drop=True,
        )
        assert_harmonised(
            in_emissions,
            history=history,
            harmonisation_time=self.harmonisation_year,
        )

    infilling_wmo = self.infilling_db[
        self.infilling_db.index.get_level_values("model").str.contains("WMO")
    ]

    infilling_silicone = self.infilling_db[
        ~self.infilling_db.index.get_level_values("model").str.contains("WMO")
        & ~self.infilling_db.index.get_level_values("model").str.contains("Velders")
    ]

    # Infill

    # TODO: split this out somehow
    ### Very low marker should use F-gas emissions in line with Kigali
    # We get these from [Velders et al., 2022](https://zenodo.org/records/6520707)

    vl_model, vl_scenario = ("REMIND-MAgPIE 3.5-4.11", "SSP1 - Very Low Emissions")

    mask = in_emissions.index.get_level_values("model").str.contains(
        vl_model
    ) & in_emissions.index.get_level_values("scenario").str.contains(vl_scenario)

    vl_marker = in_emissions[mask]
    unique_var = infilling_silicone.index.get_level_values("variable").unique()
    if not vl_marker.empty:
        lead_vl_marker = "Emissions|CO2|Fossil"
        infillers_silicone_vl_marker = {}
        for variable in [v for v in unique_var if v != lead_vl_marker]:
            infillers_silicone_vl_marker[variable] = get_silicone_based_infiller(
                infilling_db=infilling_silicone,
                follower_variable=variable,
                lead_variables=[lead_vl_marker],
                silicone_db_cruncher=silicone.database_crunchers.RMSClosest,
            )

        infilled_vl_exception = infill(
            vl_marker,
            infillers_silicone_vl_marker,
        )

    else:
        infilled_vl_exception = None

    # TODO: fix this. The infiller should only return infilled emissions,
    # not complete emissions.
    complete_vl_exception = get_complete(in_emissions, infilled_vl_exception)

    # Silicone
    lead = "Emissions|CO2|Fossil"
    infillers_silicone = {}
    for variable in [v for v in unique_var if v != lead]:
        infillers_silicone[variable] = get_silicone_based_infiller(
            infilling_db=infilling_silicone,
            follower_variable=variable,
            lead_variables=[lead],
            silicone_db_cruncher=silicone.database_crunchers.RMSClosest,
        )

    infilled_silicone = infill(
        complete_vl_exception,
        infillers_silicone,
    )
    complete_silicone = get_complete(complete_vl_exception, infilled_silicone)

    # Infill

    infillers_wmo = {}
    unique_var = infilling_wmo.index.get_level_values("variable").unique()
    for wmo_var in unique_var:
        infillers_wmo[wmo_var] = get_direct_copy_infiller(
            variable=wmo_var,
            copy_from=infilling_wmo,
        )

    infilled_wmo = infill(complete_silicone, infillers_wmo)
    complete_wmo = get_complete(complete_silicone, infilled_wmo)

    # Scale timeseries
    #
    # Surprisingly, this is the most mucking around of all.
    # The hard part here is that the scaling needs to be aware
    # of the fact that the pre-industrial value is different for each tiemseries.
    # The naming mucking around also adds to the fun of course.

    scaling_leaders = {
        "Emissions|C3F8": "Emissions|C2F6",
        "Emissions|C4F10": "Emissions|C2F6",
        "Emissions|C5F12": "Emissions|C2F6",
        "Emissions|C7F16": "Emissions|C2F6",
        "Emissions|C8F18": "Emissions|C2F6",
        "Emissions|cC4F8": "Emissions|CF4",
        "Emissions|SO2F2": "Emissions|CF4",
        "Emissions|HFC236fa": "Emissions|HFC245fa",
        "Emissions|HFC152a": "Emissions|HFC4310mee",
        "Emissions|HFC365mfc": "Emissions|HFC134a",
        "Emissions|CH2Cl2": "Emissions|HFC134a",
        "Emissions|CHCl3": "Emissions|C2F6",
        "Emissions|NF3": "Emissions|SF6",
    }

    infillers_scaling = get_pre_industrial_aware_direct_scaling_infiller(
        historical_emissions=self.historical_emissions,
        cmip7_ghg_inversions_reporting_names=self.cmip7_ghg_inversions,
        scaling_leaders=scaling_leaders,
        harmonisation_year=self.harmonisation_year,
        pre_industrial_year=self.pre_industrial_year,
    )

    infilled_scaling = infill(complete_wmo, infillers_scaling)
    infilled = get_complete(complete_wmo, infilled_scaling)
    infilled.columns.name = "year"

    if self.run_checks:
        pd.testing.assert_index_equal(infilled.columns, in_emissions.columns)

        assert_harmonised(
            infilled,
            history=history,
            harmonisation_time=self.harmonisation_year,
            rounding=5,  # level of data storage in historical data often
        )
        ## Check completeness
        assert_all_groups_are_complete(infilled, complete_index_gcages_names)

    return infilled

from_cmip7_scenariomip_config `classmethod` #

from_cmip7_scenariomip_config(
    cmip7_scenariomip_infilling_leader_emissions_file: Path,
    cmip7_ghg_inversions_file: Path,
    cmip7_scenariomip_global_historical_emissions_file: Path,
    ur: UnitRegistry | None = None,
    run_checks: bool = True,
) -> CMIP7ScenarioMIPInfiller

Initialise from the config used in AR6

Parameters:

Name	Type	Description	Default
`cmip7_scenariomip_infilling_leader_emissions_file`	`Path`	File containing the infilling leaders database This is for all emissions except GHGs.	required
`cmip7_ghg_inversions_file`	`Path`	File containing the infilling database for GHGs inversions	required
`cmip7_scenariomip_global_historical_emissions_file`	`Path`	File containing the historical emissions used for harmonisation	required
`run_checks`	`bool`	Should checks of the input and output data be performed? If this is turned off, things are faster, but error messages are much less clear if things go wrong.	`True`

Returns:

Type	Description
`CMIP7ScenarioMIPInfiller`	Initialised CMIP7ScenarioMIPInfiller

Source code in src/gcages/cmip7_scenariomip/infilling.py

@classmethod
def from_cmip7_scenariomip_config(
    cls,
    cmip7_scenariomip_infilling_leader_emissions_file: Path,
    cmip7_ghg_inversions_file: Path,
    cmip7_scenariomip_global_historical_emissions_file: Path,
    ur: UnitRegistry | None = None,
    run_checks: bool = True,
) -> CMIP7ScenarioMIPInfiller:
    """
    Initialise from the config used in AR6

    Parameters
    ----------
    cmip7_scenariomip_infilling_leader_emissions_file
        File containing the infilling leaders database

        This is for all emissions except GHGs.

    cmip7_ghg_inversions_file
        File containing the infilling database for GHGs inversions

    cmip7_scenariomip_global_historical_emissions_file
        File containing the historical emissions used for harmonisation

    run_checks
        Should checks of the input and output data be performed?

        If this is turned off, things are faster,
        but error messages are much less clear if things go wrong.

    Returns
    -------
    :
        Initialised CMIP7ScenarioMIPInfiller
    """
    # Hardcode as we are matching CMIP7 ScenarioMIP exactly.
    # Users can copy and modify themselves if they wish
    # (or we can introduce a lower layer if lots of users want it)
    PI_YEAR = 1750
    HARMONISATION_YEAR = 2023

    if ur is None:
        try:
            import openscm_units  # noqa: PLC0415

            ur = openscm_units.unit_registry
        except ImportError as exc:
            raise MissingOptionalDependencyError(
                "openscm_units",
                requirement="openscm_units",
            ) from exc

    # Still embargoed
    infilling_db = load_cmip7_scenariomip_infilling_db(
        filepath=cmip7_scenariomip_infilling_leader_emissions_file,
        check_hash=False,  # TODO: update when available
    )

    # CMIP7 GHG inversions
    cmip7_ghg_inversions = load_cmip7_scenariomip_ghg_inversions(
        filepath=cmip7_ghg_inversions_file,
    )
    # History
    historical_emissions = load_cmip7_scenariomip_historical_emissions(
        filepath=cmip7_scenariomip_global_historical_emissions_file,
        check_hash=True,
    )

    # Use gcages naming convention.
    infilling_db = update_index_levels_func(
        infilling_db,
        {
            "variable": lambda x: convert_variable_name(
                x,
                from_convention=SupportedNamingConventions.CMIP7_SCENARIOMIP,
                to_convention=SupportedNamingConventions.GCAGES,
            )
        },
        copy=False,
    )
    cmip7_ghg_inversions = update_index_levels_func(
        cmip7_ghg_inversions,
        {
            "variable": lambda x: convert_variable_name(
                x,
                from_convention=SupportedNamingConventions.OPENSCM_RUNNER,
                to_convention=SupportedNamingConventions.GCAGES,
            )
        },
        copy=False,
    )
    historical_emissions = update_index_levels_func(
        historical_emissions,
        {
            "variable": lambda x: convert_variable_name(
                x,
                from_convention=SupportedNamingConventions.CMIP7_SCENARIOMIP,
                to_convention=SupportedNamingConventions.GCAGES,
            )
        },
        copy=False,
    )

    if run_checks:
        assert_harmonised(
            infilling_db,
            history=historical_emissions.reset_index(
                level=[
                    lvl
                    for lvl in ["model", "scenario"]
                    if lvl in historical_emissions.index.names
                ],
                drop=True,
            ),
            harmonisation_time=HARMONISATION_YEAR,
            history_unit_level="unit",
            ur=ur,
        )

    return cls(
        infilling_db=infilling_db,
        historical_emissions=historical_emissions,
        cmip7_ghg_inversions=cmip7_ghg_inversions,
        harmonisation_year=HARMONISATION_YEAR,
        pre_industrial_year=PI_YEAR,
        run_checks=run_checks,
        ur=ur,
    )

get_complete #

get_complete(
    indf: DataFrame, infilled: DataFrame | None
) -> DataFrame

Get a complete set of timeseries

This is just a convenience function to help deal with the fact that infill can return None.

Parameters:

Name	Type	Description	Default
`indf`	`DataFrame`	Input data	required
`infilled`	`DataFrame \| None`	Results of infilling using infill	required

Returns:

Type	Description
`DataFrame`	Complete data i.e. the combination of `indf` and `infilled`

Source code in src/gcages/cmip7_scenariomip/infilling.py

def get_complete(indf: pd.DataFrame, infilled: pd.DataFrame | None) -> pd.DataFrame:
    """
    Get a complete set of timeseries

    This is just a convenience function to help deal with the fact
    that [infill][gcages.cmip7_scenariomip.infilling.infill] can return `None`.

    Parameters
    ----------
    indf
        Input data

    infilled
        Results of infilling using [infill][gcages.cmip7_scenariomip.infilling.infill]

    Returns
    -------
    :
        Complete data i.e. the combination of `indf` and `infilled`
    """
    try:
        from pandas_indexing.core import concat  # noqa: PLC0415
    except ImportError as exc:
        raise MissingOptionalDependencyError(
            "infill", requirement="pandas_indexing"
        ) from exc

    if infilled is not None:
        complete = concat([indf, infilled])

    else:
        complete = indf

    return complete

get_direct_copy_infiller #

get_direct_copy_infiller(
    variable: str, copy_from: DataFrame
) -> Callable[[DataFrame], DataFrame]

Get an infiller which just copies the timeseries from another scenario

Parameters:

Name	Type	Description	Default
`variable`	`str`	Variable to infill	required
`copy_from`	`DataFrame`	Scenario to copy from	required

Returns:

Type	Description
`Callable[[DataFrame], DataFrame]`	Infiller which can infill data for `variable`

Source code in src/gcages/cmip7_scenariomip/infilling.py

def get_direct_copy_infiller(
    variable: str, copy_from: pd.DataFrame
) -> Callable[[pd.DataFrame], pd.DataFrame]:
    """
    Get an infiller which just copies the timeseries from another scenario

    Parameters
    ----------
    variable
        Variable to infill

    copy_from
        Scenario to copy from

    Returns
    -------
    :
        Infiller which can infill data for `variable`
    """

    def infiller(inp: pd.DataFrame) -> pd.DataFrame:
        try:
            from pandas_indexing.selectors import isin as pix_isin  # noqa: PLC0415
        except ImportError as exc:
            raise MissingOptionalDependencyError(
                "get_direct_copy_infiller", requirement="pandas_indexing"
            ) from exc

        model_l = inp.index.get_level_values("model").unique()
        if len(model_l) != 1:
            raise AssertionError(model_l)
        model = model_l[0]

        scenario_l = inp.index.get_level_values("scenario").unique()
        if len(scenario_l) != 1:
            raise AssertionError(scenario_l)
        scenario = scenario_l[0]

        res = copy_from.loc[pix_isin(variable=variable)].pix.assign(
            model=model, scenario=scenario
        )

        return cast(pd.DataFrame, res)

    return infiller

get_direct_scaling_infiller #

get_direct_scaling_infiller(
    leader: str,
    follower: str,
    scaling_factor: float,
    l_0: float,
    f_0: float,
    f_unit: str,
    calculation_year: int,
    f_calculation_year: int,
) -> Callable[[DataFrame], DataFrame]

Get an infiller which just scales one set of emissions to create the next set

This is basically silicone's constant ratio infiller with smarter handling of pre-industrial levels.

Source code in src/gcages/cmip7_scenariomip/infilling.py

def get_direct_scaling_infiller(  # noqa: PLR0913
    leader: str,
    follower: str,
    scaling_factor: float,
    l_0: float,
    f_0: float,
    f_unit: str,
    calculation_year: int,
    f_calculation_year: int,
) -> Callable[[pd.DataFrame], pd.DataFrame]:
    """
    Get an infiller which just scales one set of emissions to create the next set

    This is basically silicone's constant ratio infiller
    with smarter handling of pre-industrial levels.
    """

    def infiller(inp: pd.DataFrame) -> pd.DataFrame:
        mask = inp.index.get_level_values("variable").str.contains(leader, regex=False)
        lead_df = inp[mask]

        follow_df = (scaling_factor * (lead_df - l_0) + f_0).pix.assign(
            variable=follower, unit=f_unit
        )
        if not np.isclose(follow_df[calculation_year], f_calculation_year).all():
            raise AssertionError

        return cast(pd.DataFrame, follow_df)

    return infiller

get_pre_industrial_aware_direct_scaling_infiller #

get_pre_industrial_aware_direct_scaling_infiller(
    *,
    historical_emissions: DataFrame,
    cmip7_ghg_inversions_reporting_names: DataFrame,
    scaling_leaders: dict[str, str],
    harmonisation_year: int = 2023,
    pre_industrial_year: int = 1750,
) -> dict[str, Any]

Build pre-industrial-aware direct scaling infillers for follower/leader pairs.

This constructs scaling factors that preserve both pre-industrial baselines and harmonisation-year values when scaling follower emissions based on leader emissions.

The scaling preserves two key properties::

f_future(l_harmonisation) = f_harmonisation
f_future(l_pre_industrial) = f_pre_industrial

with linear interpolation between these anchor points.

Parameters:

Name	Type	Description	Default
`historical_emissions`	`DataFrame`	Historical emissions data with MultiIndex including 'variable' level. Must contain harmonisation-year values for all leader/follower variables.	required
`cmip7_ghg_inversions_reporting_names`	`DataFrame`	CMIP7 GHG inversion data with pre-industrial (PI) year values. Must contain PI-year values for all leader/follower variables.	required
`scaling_leaders`	`dict[str, str]`	Mapping of follower variable names to leader variable names.	required
`harmonisation_year`	`int`	Primary harmonisation reference year	`2023`
`pre_industrial_year`	`int`	Pre-industrial reference year	`1750`

Returns:

Type	Description
`dict[str, Any]`	Mapping of follower variable into direct scaling infiller callable.

Raises:

Type	Description
`AssertionError`	If multiple units found for a variable, no valid harmonisation year data, or scaling factor computation yields NaN.

Source code in src/gcages/cmip7_scenariomip/infilling.py

def get_pre_industrial_aware_direct_scaling_infiller(
    *,
    historical_emissions: pd.DataFrame,
    cmip7_ghg_inversions_reporting_names: pd.DataFrame,
    scaling_leaders: dict[str, str],
    harmonisation_year: int = 2023,
    pre_industrial_year: int = 1750,
) -> dict[str, Any]:
    """
    Build pre-industrial-aware direct scaling infillers for follower/leader pairs.

    This constructs scaling factors that preserve both pre-industrial baselines and
    harmonisation-year values when scaling follower emissions based on leader emissions.

    The scaling preserves two key properties::

        f_future(l_harmonisation) = f_harmonisation
        f_future(l_pre_industrial) = f_pre_industrial

    with linear interpolation between these anchor points.

    Parameters
    ----------
    historical_emissions
        Historical emissions data with MultiIndex including 'variable' level.
        Must contain harmonisation-year values for all leader/follower variables.

    cmip7_ghg_inversions_reporting_names
        CMIP7 GHG inversion data with pre-industrial (PI) year values.
        Must contain PI-year values for all leader/follower variables.

    scaling_leaders
        Mapping of follower variable names to leader variable names.

    harmonisation_year
        Primary harmonisation reference year

    pre_industrial_year
        Pre-industrial reference year

    Returns
    -------
    :
        Mapping of follower variable into direct scaling infiller callable.

    Raises
    ------
    AssertionError
        If multiple units found for a variable, no valid harmonisation year data,
        or scaling factor computation yields NaN.
    """
    infillers_scaling = {}

    for follower, leader in scaling_leaders.items():
        lead_mask = historical_emissions.index.get_level_values(
            "variable"
        ).str.contains(leader, regex=False)
        lead_df = historical_emissions[lead_mask]
        follow_mask = historical_emissions.index.get_level_values(
            "variable"
        ).str.contains(follower, regex=False)
        follow_df = historical_emissions[follow_mask]

        lead_mask = cmip7_ghg_inversions_reporting_names.index.get_level_values(
            "variable"
        ).str.contains(leader, regex=False)
        lead_cmip7_inverse_df = cmip7_ghg_inversions_reporting_names[lead_mask]
        follow_mask = cmip7_ghg_inversions_reporting_names.index.get_level_values(
            "variable"
        ).str.contains(follower, regex=False)
        follow_cmip7_inverse_df = cmip7_ghg_inversions_reporting_names[follow_mask]

        f_unit = follow_df.index.get_level_values("unit").unique()
        if len(f_unit) != 1:
            msg = f"Multiple units for {follower=}: {f_unit}"
            raise AssertionError(msg)
        f_unit_str = str(f_unit[0]).replace("-", "")

        l_unit = lead_df.index.get_level_values("unit").unique()
        if len(l_unit) != 1:
            msg = f"Multiple units for {leader=}: {l_unit}"
            raise AssertionError(msg)
        l_unit = l_unit[0].replace("-", "")

        for harmonisation_yr_use in [harmonisation_year, 2021]:
            l_harmonisation_year = (
                lead_df[harmonisation_yr_use].to_numpy().squeeze().item()
            )

            f_harmonisation_year = (
                follow_df[harmonisation_yr_use].to_numpy().squeeze().item()
            )

            if not (pd.isnull(l_harmonisation_year) or pd.isnull(f_harmonisation_year)):
                break
        else:
            msg = f"No valid harmonisation year for {follower=}/{leader=}"
            raise AssertionError(msg)

        f_0 = follow_cmip7_inverse_df[pre_industrial_year].to_numpy().squeeze().item()
        l_0 = lead_cmip7_inverse_df[pre_industrial_year].to_numpy().squeeze().item()

        scaling_factor = (f_harmonisation_year - f_0) / (l_harmonisation_year - l_0)

        if np.isnan(scaling_factor):
            msg = (
                f"NaN scaling_factor for {follower=}/{leader=}: "
                f"{f_harmonisation_year=} {l_harmonisation_year=} "
                f"{f_0=} {l_0=}"
            )
            raise AssertionError(msg)

        infillers_scaling[follower] = get_direct_scaling_infiller(
            leader=leader,
            follower=follower,
            scaling_factor=scaling_factor,
            l_0=l_0,
            f_0=f_0,
            f_unit=f_unit_str,
            calculation_year=harmonisation_yr_use,
            f_calculation_year=f_harmonisation_year,
        )

    return infillers_scaling

get_silicone_based_infiller #

get_silicone_based_infiller(
    infilling_db: DataFrame,
    follower_variable: str,
    lead_variables: list[str],
    silicone_db_cruncher: _DatabaseCruncher,
    derive_relationship_kwargs: dict[str, Any]
    | None = None,
) -> Callable[[DataFrame], DataFrame]

Get an infiller based on silicone

Parameters:

Name	Type	Description	Default
`infilling_db`	`DataFrame`	Infilling database	required
`follower_variable`	`str`	The variable to infill	required
`lead_variables`	`list[str]`	The variables used to infill `follower_variable`	required
`silicone_db_cruncher`	`_DatabaseCruncher`	Silicone cruncher to use	required
`derive_relationship_kwargs`	`dict[str, Any] \| None`	Passed to `silicone_db_cruncher.derive_relationship`	`None`

Returns:

Type	Description
`Callable[[DataFrame], DataFrame]`	Function which can be used to infill `follower_variable` in scenarios

Source code in src/gcages/cmip7_scenariomip/infilling.py

def get_silicone_based_infiller(  # type: ignore # silicone has no type hints
    infilling_db: pd.DataFrame,
    follower_variable: str,
    lead_variables: list[str],
    silicone_db_cruncher: silicone.database_crunchers.base._DatabaseCruncher,
    derive_relationship_kwargs: dict[str, Any] | None = None,
) -> Callable[[pd.DataFrame], pd.DataFrame]:
    """
    Get an infiller based on silicone

    Parameters
    ----------
    infilling_db
        Infilling database

    follower_variable
        The variable to infill

    lead_variables
        The variables used to infill `follower_variable`

    silicone_db_cruncher
        Silicone cruncher to use

    derive_relationship_kwargs
        Passed to `silicone_db_cruncher.derive_relationship`

    Returns
    -------
    :
        Function which can be used to infill `follower_variable` in scenarios
    """
    try:
        import pyam  # type: ignore# noqa: PLC0415
    except ImportError as exc:
        raise MissingOptionalDependencyError(
            "get_silicone_based_infiller", requirement="pyam"
        ) from exc

    if derive_relationship_kwargs is None:
        derive_relationship_kwargs = {}

    silicone_infiller = silicone_db_cruncher(
        pyam.IamDataFrame(infilling_db)
    ).derive_relationship(
        variable_follower=follower_variable,
        variable_leaders=lead_variables,
        **derive_relationship_kwargs,
    )

    def res(inp: pd.DataFrame) -> pd.DataFrame:
        res_h = silicone_infiller(pyam.IamDataFrame(inp)).timeseries()

        return cast(pd.DataFrame, res_h)

    return res

infill #

infill(
    indf: DataFrame,
    infillers: Mapping[
        str, Callable[[DataFrame], DataFrame]
    ],
) -> DataFrame | None

Infill an emissions scenario using the provided infillers.

Parameters:

Name	Type	Description	Default
`indf`	`DataFrame`	Emissions scenario to infill	required
`infillers`	`Mapping[str, Callable[[DataFrame], DataFrame]]`	Infillers to use Each key is the gas the infiller can infill. Each value is the function which does the infilling.	required

Returns:

Type	Description
`DataFrame \| None`	Infilled timeseries. If nothing was infilled, `None` is returned

Source code in src/gcages/cmip7_scenariomip/infilling.py

def infill(
    indf: pd.DataFrame, infillers: Mapping[str, Callable[[pd.DataFrame], pd.DataFrame]]
) -> pd.DataFrame | None:
    """
    Infill an emissions scenario using the provided infillers.

    Parameters
    ----------
    indf
        Emissions scenario to infill

    infillers
        Infillers to use

        Each key is the gas the infiller can infill.
        Each value is the function which does the infilling.

    Returns
    -------
    :
        Infilled timeseries.

        If nothing was infilled, `None` is returned
    """
    try:
        from pandas_indexing.core import concat  # noqa: PLC0415
    except ImportError as exc:
        raise MissingOptionalDependencyError(
            "infill", requirement="pandas_indexing"
        ) from exc

    infilled_l = []
    for variable in infillers:
        for (model, scenario), msdf in indf.groupby(["model", "scenario"]):
            if variable not in msdf.index.get_level_values("variable"):
                infilled_l.append(infillers[variable](msdf))

    if not infilled_l:
        return None

    return concat(infilled_l)

load_cmip7_scenariomip_ghg_inversions #

load_cmip7_scenariomip_ghg_inversions(
    filepath: Path,
) -> DataFrame

Load

Parameters:

Name	Type	Description	Default
`filepath`	`Path`	Path from which to load the file	required

Returns:

Type	Description
`DataFrame`	Green house gases inversion data frame

Source code in src/gcages/cmip7_scenariomip/infilling.py

def load_cmip7_scenariomip_ghg_inversions(
    filepath: Path,
) -> pd.DataFrame:
    """
    Load

    Parameters
    ----------
    filepath
        Path from which to load the file

    Returns
    -------
    :
        Green house gases inversion data frame
    """
    res = load_timeseries_csv(
        filepath,
        lower_column_names=True,
        index_columns=["model", "scenario", "region", "variable", "unit"],
        out_columns_type=int,
        out_columns_name="year",
    )

    return res

load_cmip7_scenariomip_infilling_db #

load_cmip7_scenariomip_infilling_db(
    filepath: Path, check_hash: bool = True
) -> DataFrame

Load infilling database for CMIP7 ScenarioMIP harmonisation.

Parameters:

Name	Type	Description	Default
`filepath`	`Path`	Path from which to load the file	required
`check_hash`	`bool`	Check file hash	`True`

Returns:

Type	Description
`DataFrame`	Infilled emissions

Raises:

Type	Description
`AssertionError`	`filepath` does not have the expected hash. We expect to be reading the file from https://zenodo.org/records/17844114/files/infiling-db_202512021030_202512071232_202511040855_202511040855.csv?download=1

Source code in src/gcages/cmip7_scenariomip/infilling.py

def load_cmip7_scenariomip_infilling_db(
    filepath: Path, check_hash: bool = True
) -> pd.DataFrame:
    """
    Load infilling database for CMIP7 ScenarioMIP harmonisation.

    Parameters
    ----------
    filepath
        Path from which to load the file

    check_hash
        Check file hash

    Returns
    -------
    :
        Infilled emissions

    Raises
    ------
    AssertionError
        `filepath` does not have the expected hash.

        We expect to be reading the file from
        https://zenodo.org/records/17844114/files/infiling-db_202512021030_202512071232_202511040855_202511040855.csv?download=1
    """
    if check_hash:
        fp_hash = get_file_hash(filepath, algorithm="md5")
        if platform.system() in "Windows":
            fp_hash_exp = "3a55491330c0160a0c0abc011766559a"
        else:
            fp_hash_exp = "3a55491330c0160a0c0abc011766559a"

        if fp_hash != fp_hash_exp:
            msg = (
                f"The md5 hash of {filepath} is {fp_hash}. "
                f"This does not match what we expect {fp_hash_exp=}."
            )
            raise AssertionError(msg)

    res = load_timeseries_csv(
        filepath,
        lower_column_names=True,
        index_columns=["model", "scenario", "region", "variable", "unit"],
        out_columns_type=int,
    )
    res.columns.name = "year"

    return res

gcages.cmip7_scenariomip.infilling#

COMPLETE_EMISSIONS_INPUT_VARIABLES_GCAGES module-attribute #

complete_index_gcages_names module-attribute #

CMIP7ScenarioMIPInfiller #

cmip7_ghg_inversions instance-attribute #

harmonisation_year class-attribute instance-attribute #

historical_emissions instance-attribute #

infilling_db instance-attribute #

pre_industrial_year class-attribute instance-attribute #

run_checks class-attribute instance-attribute #

ur class-attribute instance-attribute #

__call__ #

from_cmip7_scenariomip_config classmethod #

get_complete #

get_direct_copy_infiller #

get_direct_scaling_infiller #

get_pre_industrial_aware_direct_scaling_infiller #

get_silicone_based_infiller #

infill #

load_cmip7_scenariomip_ghg_inversions #

load_cmip7_scenariomip_infilling_db #

COMPLETE_EMISSIONS_INPUT_VARIABLES_GCAGES `module-attribute` #

complete_index_gcages_names `module-attribute` #

cmip7_ghg_inversions `instance-attribute` #

harmonisation_year `class-attribute` `instance-attribute` #

historical_emissions `instance-attribute` #

infilling_db `instance-attribute` #

pre_industrial_year `class-attribute` `instance-attribute` #

run_checks `class-attribute` `instance-attribute` #

ur `class-attribute` `instance-attribute` #

call #

from_cmip7_scenariomip_config `classmethod` #