gcages.cmip7_scenariomip.pre_processing#

Pre-processing part of the workflow

This is extremely fiddly because of the way the data is reported. The reporting blends data that is a regional sum with data that has regional detail, and the variable name is a blend of different bits of information (species, sectoral information etc.) with no easy way for a machine to decode what is what. Lots of edge cases have to be hardcoded. For example, Emissions|CO2|Energy is "Emissions", then the species, then the sector, but Emissions|HFC|HFC245 is "Emissions", then the "HFC" string, then the species, i.e. completely different information is provided after each "|".
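
To illustrate why the decoding needs hardcoded edge cases, here is a minimal sketch that handles the two examples above. The names `SPECIES_FAMILIES`, `DecodedVariable` and `decode_variable` are hypothetical and are not part of gcages; the real set of edge cases is larger and is not listed on this page.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical: strings that sit between the table and the species.
# The real list of edge cases is not given on this page.
SPECIES_FAMILIES = {"HFC"}


@dataclass(frozen=True)
class DecodedVariable:
    table: str
    species: str
    sector: str | None


def decode_variable(variable: str, level_separator: str = "|") -> DecodedVariable:
    """Split a variable name into table, species and (optional) sector."""
    parts = variable.split(level_separator)
    table, rest = parts[0], parts[1:]
    if not rest:
        raise ValueError(f"No species in {variable!r}")

    if rest[0] in SPECIES_FAMILIES and len(rest) > 1:
        # e.g. Emissions|HFC|HFC245: second level is a family, third is the species
        species, sector_parts = rest[1], rest[2:]
    else:
        # e.g. Emissions|CO2|Energy: second level is the species, the rest is the sector
        species, sector_parts = rest[0], rest[1:]

    sector = level_separator.join(sector_parts) if sector_parts else None
    return DecodedVariable(table=table, species=species, sector=sector)


co2 = decode_variable("Emissions|CO2|Energy")
hfc = decode_variable("Emissions|HFC|HFC245")
```

The point of the sketch is that the meaning of the second level cannot be inferred from position alone, so a lookup such as `SPECIES_FAMILIES` has to be maintained by hand.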

This module implements the logic for this processing. The complexity comes in the re-aggregation (gcages.cmip7_scenariomip.pre_processing.reaggregation), which has to convert whatever is reported (and a huge number of different possibilities has to be supported) to the sectors used for gridding. From there, the workflow can be standardised (as is done in pre_processor.do_pre_processing).

Modules:

Name	Description
`pre_processor`	Definition of the pre-processor class
`reaggregation`	Reaggregation of timeseries from raw reporting to sectors needed for gridding

Classes:

Name	Description
`CMIP7ScenarioMIPPreProcessingResult`	Result of pre-processing with CMIP7ScenarioMIPPreProcessor
`CMIP7ScenarioMIPPreProcessor`	Pre-processor for CMIP7's ScenarioMIP
`ReaggregatorBasic`	Reaggregator that follows this module's logic
`ReaggregatorLike`	Interface that can be used for re-aggregation
`ToCompleteResult`	Result of calling `to_complete` on a reaggregator

CMIP7ScenarioMIPPreProcessingResult #

Result of pre-processing with CMIP7ScenarioMIPPreProcessor

This has more components than normal, because we need to support both the 'normal' global path and harmonising at the region-sector level.

Attributes:

Name	Type	Description
`assumed_zero_emissions`	`DataFrame \| None`	Emissions that were assumed to be zero during the processing
`global_workflow_emissions`	`DataFrame`	Emissions that can be used with the 'normal' global workflow
`global_workflow_emissions_raw_names`	`DataFrame`	Emissions consistent with those that can be used with the 'normal' global workflow
`gridding_workflow_emissions`	`DataFrame`	Emissions that can be used with the gridding workflow

Source code in src/gcages/cmip7_scenariomip/pre_processing/pre_processor.py

@define
class CMIP7ScenarioMIPPreProcessingResult:
    """
    Result of pre-processing with [CMIP7ScenarioMIPPreProcessor][(m).]

    This has more components than normal,
    because we need to support both the 'normal' global path
    and harmonising at the region-sector level.
    """

    assumed_zero_emissions: pd.DataFrame | None
    """
    Emissions that were assumed to be zero during the processing
    """

    gridding_workflow_emissions: pd.DataFrame
    """
    Emissions that can be used with the gridding workflow
    """

    global_workflow_emissions: pd.DataFrame
    """
    Emissions that can be used with the 'normal' global workflow
    """

    global_workflow_emissions_raw_names: pd.DataFrame
    """
    Emissions consistent with those that can be used with the 'normal' global workflow

    The difference is that these are reported with CMIP7 ScenarioMIP naming,
    which isn't compatible with our SCM runners (for example),
    so is probably not what you want to use,
    but perhaps helpful for plotting and direct comparisons.
    """

assumed_zero_emissions `instance-attribute` #

assumed_zero_emissions: DataFrame | None

Emissions that were assumed to be zero during the processing

global_workflow_emissions `instance-attribute` #

global_workflow_emissions: DataFrame

Emissions that can be used with the 'normal' global workflow

global_workflow_emissions_raw_names `instance-attribute` #

global_workflow_emissions_raw_names: DataFrame

Emissions consistent with those that can be used with the 'normal' global workflow

The difference is that these are reported with CMIP7 ScenarioMIP naming, which isn't compatible with our SCM runners (for example), so is probably not what you want to use, but perhaps helpful for plotting and direct comparisons.

gridding_workflow_emissions `instance-attribute` #

gridding_workflow_emissions: DataFrame

Emissions that can be used with the gridding workflow

CMIP7ScenarioMIPPreProcessor #

Pre-processor for CMIP7's ScenarioMIP

For more details of the logic, see gcages.cmip7_scenariomip.pre_processing.

Methods:

Name	Description
`__call__`	Pre-process

Attributes:

Name	Type	Description
`co2_biosphere_sectors`	`tuple[str, ...]`	Gridding sectors that are assumed to come from the biosphere CO2 reservoir
`co2_fossil_sectors`	`tuple[str, ...]`	Gridding sectors that are assumed to come from the fossil CO2 reservoir
`co2_name`	`str`	Name used for CO2 in variable names
`level_separator`	`str`	The separator between levels in variable names
`n_processes`	`int \| None`	Number of processes to use for parallel processing.
`progress`	`bool`	Should progress bars be shown?
`reaggregator`	`ReaggregatorLike \| None`	Re-aggregator to use when converting raw data to gridding sectors
`run_checks`	`bool`	If `True`, run checks on both input and output data
`table`	`str`	The value used for the top level of variable names
`world_gridding_sectors`	`tuple[str, ...]`	Sectors that are only used for gridding at the world (i.e. regional sum) level

Source code in src/gcages/cmip7_scenariomip/pre_processing/pre_processor.py

@define
class CMIP7ScenarioMIPPreProcessor:
    """
    Pre-processor for CMIP7's ScenarioMIP

    For more details of the logic, see [gcages.cmip7_scenariomip.pre_processing][].
    """

    reaggregator: ReaggregatorLike | None = None
    """
    Re-aggregator to use when converting raw data to gridding sectors

    If not supplied, we guess the re-aggregator during processing
    """

    run_checks: bool = True
    """
    If `True`, run checks on both input and output data

    If you are sure about your workflow,
    you can disable the checks to speed things up
    (but we don't recommend this unless you really
    are confident about what you're doing).
    """

    world_gridding_sectors: tuple[str, ...] = ("Aircraft", "International Shipping")
    """
    Sectors that are only used for gridding at the world (i.e. regional sum) level
    """

    co2_fossil_sectors: tuple[str, ...] = CO2_FOSSIL_SECTORS_GRIDDING
    """
    Gridding sectors that are assumed to come from the fossil CO2 reservoir
    """

    co2_biosphere_sectors: tuple[str, ...] = CO2_BIOSPHERE_SECTORS_GRIDDING
    """
    Gridding sectors that are assumed to come from the biosphere CO2 reservoir
    """

    co2_name: str = "CO2"
    """
    Name used for CO2 in variable names
    """

    table: str = "Emissions"
    """
    The value used for the top level of variable names
    """

    level_separator: str = "|"
    """
    The separator between levels in variable names
    """

    progress: bool = True
    """
    Should progress bars be shown?
    """

    n_processes: int | None = multiprocessing.cpu_count()
    """
    Number of processes to use for parallel processing.

    Set to `None` to process in serial.
    """

    def __call__(
        self, in_emissions: pd.DataFrame
    ) -> CMIP7ScenarioMIPPreProcessingResult:
        """
        Pre-process

        Parameters
        ----------
        in_emissions
            Emissions to pre-process

        Returns
        -------
        :
            Pre-processed emissions
        """
        if self.run_checks:
            assert_index_is_multiindex(in_emissions)
            assert_data_is_all_numeric(in_emissions)

            if in_emissions.columns.name != "year":
                msg = "The input emissions' column name should be 'year'"
                raise AssertionError(msg)

        res_g = apply_op_parallel_progress(
            func_to_call=do_pre_processing,
            reaggregator=self.reaggregator,
            time_name="year",
            run_checks=self.run_checks,
            world_gridding_sectors=self.world_gridding_sectors,
            table=self.table,
            level_separator=self.level_separator,
            co2_fossil_sectors=self.co2_fossil_sectors,
            co2_biosphere_sectors=self.co2_biosphere_sectors,
            co2_name=self.co2_name,
            iterable_input=(
                gdf for _, gdf in in_emissions.groupby(["model", "scenario"])
            ),
            parallel_op_config=ParallelOpConfig.from_user_facing(
                progress=self.progress,
                max_workers=self.n_processes,
            ),
        )

        res_d = defaultdict(list)
        for res_ms in res_g:
            for k, v in asdict(res_ms).items():
                if v is not None:
                    res_d[k].append(v)

        result_initialiser = {k: pd.concat(v) for k, v in res_d.items()}
        if "assumed_zero_emissions" not in result_initialiser:
            result_initialiser["assumed_zero_emissions"] = None

        res = CMIP7ScenarioMIPPreProcessingResult(**result_initialiser)

        return res

co2_biosphere_sectors `class-attribute` `instance-attribute` #

co2_biosphere_sectors: tuple[str, ...] = (
    CO2_BIOSPHERE_SECTORS_GRIDDING
)

Gridding sectors that are assumed to come from the biosphere CO2 reservoir

co2_fossil_sectors `class-attribute` `instance-attribute` #

co2_fossil_sectors: tuple[str, ...] = (
    CO2_FOSSIL_SECTORS_GRIDDING
)

Gridding sectors that are assumed to come from the fossil CO2 reservoir
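
As a rough illustration of how the two sector tuples could be used, the sketch below sums per-sector CO2 into a fossil total and a biosphere total. The sector names are hypothetical stand-ins (the values of `CO2_FOSSIL_SECTORS_GRIDDING` and `CO2_BIOSPHERE_SECTORS_GRIDDING` are not shown on this page), and `split_co2_by_reservoir` is not a gcages function.

```python
# Hypothetical stand-ins: the real default tuples are not shown on this page.
co2_fossil_sectors = ("Energy Sector", "Industrial Sector")
co2_biosphere_sectors = ("Agriculture", "Forest Burning")


def split_co2_by_reservoir(sector_totals, fossil_sectors, biosphere_sectors):
    """Sum per-sector CO2 emissions into fossil and biosphere totals."""
    unassigned = set(sector_totals) - set(fossil_sectors) - set(biosphere_sectors)
    if unassigned:
        # Every sector must be assigned to exactly one reservoir
        raise ValueError(f"Sectors with no assumed reservoir: {sorted(unassigned)}")

    return {
        "fossil": sum(v for k, v in sector_totals.items() if k in fossil_sectors),
        "biosphere": sum(v for k, v in sector_totals.items() if k in biosphere_sectors),
    }


co2_by_sector = {
    "Energy Sector": 30.0,
    "Industrial Sector": 5.0,
    "Agriculture": 1.5,
    "Forest Burning": 2.5,
}
reservoirs = split_co2_by_reservoir(
    co2_by_sector, co2_fossil_sectors, co2_biosphere_sectors
)
```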

co2_name `class-attribute` `instance-attribute` #

co2_name: str = 'CO2'

Name used for CO2 in variable names

level_separator `class-attribute` `instance-attribute` #

level_separator: str = '|'

The separator between levels in variable names

n_processes `class-attribute` `instance-attribute` #

n_processes: int | None = cpu_count()

Number of processes to use for parallel processing.

Set to None to process in serial.

progress `class-attribute` `instance-attribute` #

progress: bool = True

Should progress bars be shown?

reaggregator `class-attribute` `instance-attribute` #

reaggregator: ReaggregatorLike | None = None

Re-aggregator to use when converting raw data to gridding sectors

If not supplied, we guess the re-aggregator during processing

run_checks `class-attribute` `instance-attribute` #

run_checks: bool = True

If True, run checks on both input and output data

If you are sure about your workflow, you can disable the checks to speed things up (but we don't recommend this unless you really are confident about what you're doing).

table `class-attribute` `instance-attribute` #

table: str = 'Emissions'

The value used for the top level of variable names

world_gridding_sectors `class-attribute` `instance-attribute` #

world_gridding_sectors: tuple[str, ...] = (
    "Aircraft",
    "International Shipping",
)

Sectors that are only used for gridding at the world (i.e. regional sum) level

__call__ #

__call__(
    in_emissions: DataFrame,
) -> CMIP7ScenarioMIPPreProcessingResult

Pre-process

Parameters:

Name	Type	Description	Default
`in_emissions`	`DataFrame`	Emissions to pre-process	required

Returns:

Type	Description
`CMIP7ScenarioMIPPreProcessingResult`	Pre-processed emissions

Source code in src/gcages/cmip7_scenariomip/pre_processing/pre_processor.py

def __call__(
    self, in_emissions: pd.DataFrame
) -> CMIP7ScenarioMIPPreProcessingResult:
    """
    Pre-process

    Parameters
    ----------
    in_emissions
        Emissions to pre-process

    Returns
    -------
    :
        Pre-processed emissions
    """
    if self.run_checks:
        assert_index_is_multiindex(in_emissions)
        assert_data_is_all_numeric(in_emissions)

        if in_emissions.columns.name != "year":
            msg = "The input emissions' column name should be 'year'"
            raise AssertionError(msg)

    res_g = apply_op_parallel_progress(
        func_to_call=do_pre_processing,
        reaggregator=self.reaggregator,
        time_name="year",
        run_checks=self.run_checks,
        world_gridding_sectors=self.world_gridding_sectors,
        table=self.table,
        level_separator=self.level_separator,
        co2_fossil_sectors=self.co2_fossil_sectors,
        co2_biosphere_sectors=self.co2_biosphere_sectors,
        co2_name=self.co2_name,
        iterable_input=(
            gdf for _, gdf in in_emissions.groupby(["model", "scenario"])
        ),
        parallel_op_config=ParallelOpConfig.from_user_facing(
            progress=self.progress,
            max_workers=self.n_processes,
        ),
    )

    res_d = defaultdict(list)
    for res_ms in res_g:
        for k, v in asdict(res_ms).items():
            if v is not None:
                res_d[k].append(v)

    result_initialiser = {k: pd.concat(v) for k, v in res_d.items()}
    if "assumed_zero_emissions" not in result_initialiser:
        result_initialiser["assumed_zero_emissions"] = None

    res = CMIP7ScenarioMIPPreProcessingResult(**result_initialiser)

    return res
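
The collection step at the end of `__call__` can be sketched without pandas. In this simplified version, lists stand in for DataFrames and list joining stands in for `pd.concat`; `GroupResult` and `combine` are hypothetical names used only for illustration.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import asdict, dataclass


@dataclass
class GroupResult:
    """Stand-in for a per-(model, scenario) result; lists stand in for DataFrames."""

    emissions: list[str]
    assumed_zero_emissions: list[str] | None


def combine(results: list[GroupResult]) -> GroupResult:
    """Merge per-group results field by field, skipping fields that are None."""
    collected = defaultdict(list)
    for res in results:
        for key, value in asdict(res).items():
            if value is not None:  # a group may not produce the optional field
                collected[key].append(value)

    # Joining the per-group pieces plays the role of pd.concat here
    combined = {k: [row for piece in v for row in piece] for k, v in collected.items()}
    # If no group produced the optional field, it still has to be passed explicitly
    combined.setdefault("assumed_zero_emissions", None)

    return GroupResult(**combined)


some_zero = combine([GroupResult(["a"], None), GroupResult(["b"], ["b_zero"])])
none_zero = combine([GroupResult(["a"], None), GroupResult(["b"], None)])
```

The explicit default matters because the result class has no default for `assumed_zero_emissions`: when every group returns `None` for it, the key never appears in the collected mapping.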

ReaggregatorBasic #

Reaggregator that follows this module's logic

Methods:

Name	Description
`assert_has_all_required_timeseries`	Assert that the data has all the required timeseries
`assert_is_internally_consistent`	Assert that the data is internally consistent
`default_tols_internal_consistency`	Get default tolerances for internal consistency checks
`get_internal_consistency_checking_index`	Get the index which selects only data relevant for checking internal consistency
`to_complete`	Convert the raw data to complete data
`to_gridding_sectors`	Re-aggregate data to the sectors used for gridding

Attributes:

Name	Type	Description
`internal_consistency_tolerances`	`Mapping[str, InternalConsistencyCheckingTolerance]`	Tolerances to apply when checking the internal consistency of the data
`model_regions`	`tuple[str, ...]`	Model regions to use while reaggregating
`region_level`	`str`	Region level in the data index
`unit_level`	`str`	Unit level in the data index
`variable_level`	`str`	Variable level in the data index
`world_region`	`str`	The value used when the data represents the sum over all regions

Source code in src/gcages/cmip7_scenariomip/pre_processing/reaggregation/basic.py

@define
class ReaggregatorBasic:
    """
    Reaggregator that follows this module's logic
    """

    model_regions: tuple[str, ...]
    """Model regions to use while reaggregating"""

    region_level: str = "region"
    """Region level in the data index"""

    unit_level: str = "unit"
    """Unit level in the data index"""

    variable_level: str = "variable"
    """Variable level in the data index"""

    world_region: str = "World"
    """
    The value used when the data represents the sum over all regions

    (Having a value for this is odd,
    there should really just be no region level when data is the sum,
    but this is the data format used so we have to follow this convention.)
    """

    internal_consistency_tolerances: Mapping[
        str, InternalConsistencyCheckingTolerance
    ] = field()
    """
    Tolerances to apply when checking the internal consistency of the data
    """

    @internal_consistency_tolerances.default
    def default_tols_internal_consistency(
        self,
    ) -> Mapping[str, InternalConsistencyCheckingTolerance]:
        """
        Get default tolerances for internal consistency checks
        """
        return get_default_internal_conistency_checking_tolerances()

    def assert_has_all_required_timeseries(self, indf: pd.DataFrame) -> None:
        """
        Assert that the data has all the required timeseries

        Parameters
        ----------
        indf
            Data to check

        Raises
        ------
        NotCompleteError
            `indf` is not complete
        """
        assert_has_all_required_timeseries(
            indf,
            model_regions=self.model_regions,
            world_region=self.world_region,
            region_level=self.region_level,
            variable_level=self.variable_level,
        )

    def assert_is_internally_consistent(self, indf: pd.DataFrame) -> None:
        """
        Assert that the data is internally consistent

        Parameters
        ----------
        indf
            Data to check

        Raises
        ------
        InternalConsistencyError
            The data is not internally consistent
        """
        assert_is_internally_consistent(
            indf,
            model_regions=self.model_regions,
            tolerances=self.internal_consistency_tolerances,
            world_region=self.world_region,
            region_level=self.region_level,
            unit_level=self.unit_level,
            variable_level=self.variable_level,
        )

    def get_internal_consistency_checking_index(self) -> pd.MultiIndex:
        """
        Get the index which selects only data relevant for checking internal consistency

        Returns
        -------
        :
            Internal consistency checking index
        """
        return get_internal_consistency_checking_index(
            model_regions=self.model_regions,
            world_region=self.world_region,
            region_level=self.region_level,
            variable_level=self.variable_level,
        )

    def to_complete(self, raw: pd.DataFrame) -> ToCompleteResult:
        """
        Convert the raw data to complete data

        Parameters
        ----------
        raw
            Raw data

        Returns
        -------
        :
            To complete result
        """
        return to_complete(
            indf=raw,
            model_regions=self.model_regions,
            unit_level=self.unit_level,
            variable_level=self.variable_level,
            region_level=self.region_level,
            world_region=self.world_region,
        )

    def to_gridding_sectors(self, indf: pd.DataFrame) -> pd.DataFrame:
        """
        Re-aggregate data to the sectors used for gridding

        Parameters
        ----------
        indf
            Data to re-aggregate

        Returns
        -------
        :
            Data re-aggregated to the gridding sectors
        """
        return to_gridding_sectors(
            indf=indf, region_level=self.region_level, world_region=self.world_region
        )

internal_consistency_tolerances `class-attribute` `instance-attribute` #

internal_consistency_tolerances: Mapping[
    str, InternalConsistencyCheckingTolerance
] = field()

Tolerances to apply when checking the internal consistency of the data

model_regions `instance-attribute` #

model_regions: tuple[str, ...]

Model regions to use while reaggregating

region_level `class-attribute` `instance-attribute` #

region_level: str = 'region'

Region level in the data index

unit_level `class-attribute` `instance-attribute` #

unit_level: str = 'unit'

Unit level in the data index

variable_level `class-attribute` `instance-attribute` #

variable_level: str = 'variable'

Variable level in the data index

world_region `class-attribute` `instance-attribute` #

world_region: str = 'World'

The value used when the data represents the sum over all regions

(Having a value for this is odd, there should really just be no region level when data is the sum, but this is the data format used so we have to follow this convention.)

assert_has_all_required_timeseries #

assert_has_all_required_timeseries(indf: DataFrame) -> None

Assert that the data has all the required timeseries

Parameters:

Name	Type	Description	Default
`indf`	`DataFrame`	Data to check	required

Raises:

Type	Description
`NotCompleteError`	`indf` is not complete

Source code in src/gcages/cmip7_scenariomip/pre_processing/reaggregation/basic.py

def assert_has_all_required_timeseries(self, indf: pd.DataFrame) -> None:
    """
    Assert that the data has all the required timeseries

    Parameters
    ----------
    indf
        Data to check

    Raises
    ------
    NotCompleteError
        `indf` is not complete
    """
    assert_has_all_required_timeseries(
        indf,
        model_regions=self.model_regions,
        world_region=self.world_region,
        region_level=self.region_level,
        variable_level=self.variable_level,
    )

assert_is_internally_consistent #

assert_is_internally_consistent(indf: DataFrame) -> None

Assert that the data is internally consistent

Parameters:

Name	Type	Description	Default
`indf`	`DataFrame`	Data to check	required

Raises:

Type	Description
`InternalConsistencyError`	The data is not internally consistent

Source code in src/gcages/cmip7_scenariomip/pre_processing/reaggregation/basic.py

def assert_is_internally_consistent(self, indf: pd.DataFrame) -> None:
    """
    Assert that the data is internally consistent

    Parameters
    ----------
    indf
        Data to check

    Raises
    ------
    InternalConsistencyError
        The data is not internally consistent
    """
    assert_is_internally_consistent(
        indf,
        model_regions=self.model_regions,
        tolerances=self.internal_consistency_tolerances,
        world_region=self.world_region,
        region_level=self.region_level,
        unit_level=self.unit_level,
        variable_level=self.variable_level,
    )
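
The library's actual consistency checks are not shown on this page. As one plausible kind of check, the sketch below verifies that the sum over the model regions matches the value reported for the world region, within a tolerance. The function name, the `rtol`/`atol` parameters and the local `InternalConsistencyError` class are assumptions made for this example.

```python
import math


class InternalConsistencyError(ValueError):
    """Local stand-in for the error named in the docs above."""


def assert_regions_sum_to_world(
    values_by_region, model_regions, world_region="World", rtol=1e-3, atol=1e-6
):
    """Raise if the sum over model regions differs from the world value."""
    regional_sum = sum(values_by_region[region] for region in model_regions)
    world = values_by_region[world_region]

    if not math.isclose(regional_sum, world, rel_tol=rtol, abs_tol=atol):
        msg = (
            f"Sum over regions ({regional_sum}) does not match "
            f"{world_region!r} ({world})"
        )
        raise InternalConsistencyError(msg)


# Passes silently: 4.0 + 6.0 matches the reported world value
assert_regions_sum_to_world({"World": 10.0, "A": 4.0, "B": 6.0}, ("A", "B"))
```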

default_tols_internal_consistency #

default_tols_internal_consistency() -> Mapping[
    str, InternalConsistencyCheckingTolerance
]

Get default tolerances for internal consistency checks

Source code in src/gcages/cmip7_scenariomip/pre_processing/reaggregation/basic.py

@internal_consistency_tolerances.default
def default_tols_internal_consistency(
    self,
) -> Mapping[str, InternalConsistencyCheckingTolerance]:
    """
    Get default tolerances for internal consistency checks
    """
    return get_default_internal_conistency_checking_tolerances()

get_internal_consistency_checking_index #

get_internal_consistency_checking_index() -> MultiIndex

Get the index which selects only data relevant for checking internal consistency

Returns:

Type	Description
`MultiIndex`	Internal consistency checking index

Source code in src/gcages/cmip7_scenariomip/pre_processing/reaggregation/basic.py

def get_internal_consistency_checking_index(self) -> pd.MultiIndex:
    """
    Get the index which selects only data relevant for checking internal consistency

    Returns
    -------
    :
        Internal consistency checking index
    """
    return get_internal_consistency_checking_index(
        model_regions=self.model_regions,
        world_region=self.world_region,
        region_level=self.region_level,
        variable_level=self.variable_level,
    )

to_complete #

to_complete(raw: DataFrame) -> ToCompleteResult

Convert the raw data to complete data

Parameters:

Name	Type	Description	Default
`raw`	`DataFrame`	Raw data	required

Returns:

Type	Description
`ToCompleteResult`	To complete result

Source code in src/gcages/cmip7_scenariomip/pre_processing/reaggregation/basic.py

def to_complete(self, raw: pd.DataFrame) -> ToCompleteResult:
    """
    Convert the raw data to complete data

    Parameters
    ----------
    raw
        Raw data

    Returns
    -------
    :
        To complete result
    """
    return to_complete(
        indf=raw,
        model_regions=self.model_regions,
        unit_level=self.unit_level,
        variable_level=self.variable_level,
        region_level=self.region_level,
        world_region=self.world_region,
    )
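
To make the shape of the return value concrete, here is a simplified sketch of the "fill missing timeseries with zeros" idea, using plain dictionaries keyed by (region, variable) instead of DataFrames. `ToCompleteSketch`, `to_complete_sketch` and the `required` argument are hypothetical; how the library decides which timeseries are required is not shown on this page.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ToCompleteSketch:
    """Simplified analogue of ToCompleteResult using dictionaries."""

    complete: dict[tuple[str, str], list[float]]
    assumed_zero: dict[tuple[str, str], list[float]] | None


def to_complete_sketch(raw, required, n_years):
    """Add an all-zero timeseries for every required key missing from raw."""
    missing = [key for key in required if key not in raw]
    zeros = {key: [0.0] * n_years for key in missing}
    complete = {**raw, **zeros}

    # Mirror the documented convention: None means nothing was assumed to be zero
    return ToCompleteSketch(complete=complete, assumed_zero=zeros or None)


raw = {("World", "Emissions|CO2|Energy"): [1.0, 2.0]}
required = [
    ("World", "Emissions|CO2|Energy"),
    ("World", "Emissions|CO2|Industrial Processes"),
]
filled = to_complete_sketch(raw, required, n_years=2)
untouched = to_complete_sketch(raw, required[:1], n_years=2)
```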

to_gridding_sectors #

to_gridding_sectors(indf: DataFrame) -> DataFrame

Re-aggregate data to the sectors used for gridding

Parameters:

Name	Type	Description	Default
`indf`	`DataFrame`	Data to re-aggregate	required

Returns:

Type	Description
`DataFrame`	Data re-aggregated to the gridding sectors

Source code in src/gcages/cmip7_scenariomip/pre_processing/reaggregation/basic.py

def to_gridding_sectors(self, indf: pd.DataFrame) -> pd.DataFrame:
    """
    Re-aggregate data to the sectors used for gridding

    Parameters
    ----------
    indf
        Data to re-aggregate

    Returns
    -------
    :
        Data re-aggregated to the gridding sectors
    """
    return to_gridding_sectors(
        indf=indf, region_level=self.region_level, world_region=self.world_region
    )
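
Conceptually, re-aggregation sums reported sectors into the coarser set of sectors used for gridding. The sketch below shows that idea with a dictionary keyed by (region, sector). The mapping `REPORTED_TO_GRIDDING` and its sector names are invented for the example and do not reproduce the library's mapping.

```python
from collections import defaultdict

# Hypothetical mapping from reported sectors to gridding sectors
REPORTED_TO_GRIDDING = {
    "Energy|Supply": "Energy Sector",
    "Energy|Demand|Industry": "Industrial Sector",
    "Industrial Processes": "Industrial Sector",
}


def to_gridding_sectors_sketch(reported):
    """Sum values keyed by (region, reported sector) into gridding sectors."""
    out = defaultdict(float)
    for (region, sector), value in reported.items():
        # Several reported sectors can feed the same gridding sector
        out[(region, REPORTED_TO_GRIDDING[sector])] += value

    return dict(out)


reported = {
    ("Region A", "Energy|Supply"): 1.0,
    ("Region A", "Energy|Demand|Industry"): 2.0,
    ("Region A", "Industrial Processes"): 3.0,
}
gridding = to_gridding_sectors_sketch(reported)
```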

ReaggregatorLike #

Bases: Protocol

Interface that can be used for re-aggregation

Methods:

Name	Description
`assert_has_all_required_timeseries`	Assert that the data has all the required timeseries
`assert_is_internally_consistent`	Assert that the data is internally consistent
`get_internal_consistency_checking_index`	Get the index which selects only data relevant for checking internal consistency
`to_complete`	Convert the raw data to complete data
`to_gridding_sectors`	Re-aggregate data to the sectors used for gridding

Attributes:

Name	Type	Description
`model_regions`	`tuple[str, ...]`	Model regions to use while reaggregating
`region_level`	`str`	Region level in the data index
`unit_level`	`str`	Unit level in the data index
`variable_level`	`str`	Variable level in the data index
`world_region`	`str`	The value used when the data represents the sum over all regions

Source code in src/gcages/cmip7_scenariomip/pre_processing/pre_processor.py

class ReaggregatorLike(Protocol):
    """
    Interface that can be used for re-aggregation
    """

    model_regions: tuple[str, ...]
    """Model regions to use while reaggregating"""

    region_level: str
    """Region level in the data index"""

    unit_level: str
    """Unit level in the data index"""

    variable_level: str
    """Variable level in the data index"""

    world_region: str
    """
    The value used when the data represents the sum over all regions

    (Having a value for this is odd,
    there should really just be no region level when data is the sum,
    but this is the data format used so we have to follow this convention.)
    """

    def assert_has_all_required_timeseries(self, indf: pd.DataFrame) -> None:
        """
        Assert that the data has all the required timeseries

        Parameters
        ----------
        indf
            Data to check

        Raises
        ------
        NotCompleteError
            `indf` is not complete
        """

    def assert_is_internally_consistent(self, indf: pd.DataFrame) -> None:
        """
        Assert that the data is internally consistent

        Parameters
        ----------
        indf
            Data to check

        Raises
        ------
        InternalConsistencyError
            The data is not internally consistent
        """

    def get_internal_consistency_checking_index(self) -> pd.MultiIndex:
        """
        Get the index which selects only data relevant for checking internal consistency

        Returns
        -------
        :
            Internal consistency checking index
        """

    def to_complete(self, raw: pd.DataFrame) -> ToCompleteResult:
        """
        Convert the raw data to complete data

        Parameters
        ----------
        raw
            Raw data

        Returns
        -------
        :
            To complete result
        """

    def to_gridding_sectors(self, indf: pd.DataFrame) -> pd.DataFrame:
        """
        Re-aggregate data to the sectors used for gridding

        Parameters
        ----------
        indf
            Data to re-aggregate

        Returns
        -------
        :
            Data re-aggregated to the gridding sectors
        """

model_regions `instance-attribute` #

model_regions: tuple[str, ...]

Model regions to use while reaggregating

region_level `instance-attribute` #

region_level: str

Region level in the data index

unit_level `instance-attribute` #

unit_level: str

Unit level in the data index

variable_level `instance-attribute` #

variable_level: str

Variable level in the data index

world_region `instance-attribute` #

world_region: str

The value used when the data represents the sum over all regions

(Having a value for this is odd, there should really just be no region level when data is the sum, but this is the data format used so we have to follow this convention.)

assert_has_all_required_timeseries #

assert_has_all_required_timeseries(indf: DataFrame) -> None

Assert that the data has all the required timeseries

Parameters:

Name	Type	Description	Default
`indf`	`DataFrame`	Data to check	required

Raises:

Type	Description
`NotCompleteError`	`indf` is not complete

Source code in src/gcages/cmip7_scenariomip/pre_processing/pre_processor.py

def assert_has_all_required_timeseries(self, indf: pd.DataFrame) -> None:
    """
    Assert that the data has all the required timeseries

    Parameters
    ----------
    indf
        Data to check

    Raises
    ------
    NotCompleteError
        `indf` is not complete
    """

assert_is_internally_consistent #

assert_is_internally_consistent(indf: DataFrame) -> None

Assert that the data is internally consistent

Parameters:

Name	Type	Description	Default
`indf`	`DataFrame`	Data to check	required

Raises:

Type	Description
`InternalConsistencyError`	The data is not internally consistent

Source code in src/gcages/cmip7_scenariomip/pre_processing/pre_processor.py

def assert_is_internally_consistent(self, indf: pd.DataFrame) -> None:
    """
    Assert that the data is internally consistent

    Parameters
    ----------
    indf
        Data to check

    Raises
    ------
    InternalConsistencyError
        The data is not internally consistent
    """

get_internal_consistency_checking_index #

get_internal_consistency_checking_index() -> MultiIndex

Get the index which selects only data relevant for checking internal consistency

Returns:

Type	Description
`MultiIndex`	Internal consistency checking index

Source code in src/gcages/cmip7_scenariomip/pre_processing/pre_processor.py

def get_internal_consistency_checking_index(self) -> pd.MultiIndex:
    """
    Get the index which selects only data relevant for checking internal consistency

    Returns
    -------
    :
        Internal consistency checking index
    """

to_complete #

to_complete(raw: DataFrame) -> ToCompleteResult

Convert the raw data to complete data

Parameters:

Name	Type	Description	Default
`raw`	`DataFrame`	Raw data	required

Returns:

Type	Description
`ToCompleteResult`	To complete result

Source code in src/gcages/cmip7_scenariomip/pre_processing/pre_processor.py

def to_complete(self, raw: pd.DataFrame) -> ToCompleteResult:
    """
    Convert the raw data to complete data

    Parameters
    ----------
    raw
        Raw data

    Returns
    -------
    :
        To complete result
    """

to_gridding_sectors #

to_gridding_sectors(indf: DataFrame) -> DataFrame

Re-aggregate data to the sectors used for gridding

Parameters:

Name	Type	Description	Default
`indf`	`DataFrame`	Data to re-aggregate	required

Returns:

Type	Description
`DataFrame`	Data re-aggregated to the gridding sectors

Source code in src/gcages/cmip7_scenariomip/pre_processing/pre_processor.py

def to_gridding_sectors(self, indf: pd.DataFrame) -> pd.DataFrame:
    """
    Re-aggregate data to the sectors used for gridding

    Parameters
    ----------
    indf
        Data to re-aggregate

    Returns
    -------
    :
        Data re-aggregated to the gridding sectors
    """

ToCompleteResult #

Result of calling to_complete on a reaggregator

Attributes:

Name	Type	Description
`assumed_zero`	`DataFrame \| None`	The timeseries that were assumed to be zero to make `self.complete`
`complete`	`DataFrame`	Complete pd.DataFrame

Source code in src/gcages/cmip7_scenariomip/pre_processing/reaggregation/common.py

@define
class ToCompleteResult:
    """
    Result of calling `to_complete` on a reaggregator
    """

    complete: pd.DataFrame
    """Complete [pd.DataFrame][pandas.DataFrame]"""

    assumed_zero: pd.DataFrame | None
    """
    The timeseries that were assumed to be zero to make `self.complete`

    If `None`, no timeseries were assumed to be zero.
    """

assumed_zero `instance-attribute` #

assumed_zero: DataFrame | None

The timeseries that were assumed to be zero to make self.complete

If None, no timeseries were assumed to be zero.

complete `instance-attribute` #

complete: DataFrame

Complete pd.DataFrame
