gcages.cmip7_scenariomip.pre_processing#

Pre-processing part of the workflow

This is extremely fiddly because of the way the data is reported, which is, frankly, a mess. It blends data that is a regional sum with data that has regional detail, and the variable name mixes different bits of information (species, sectoral information etc.) with no easy way for a machine to decode which part is which. You have to hardcode lots of edge cases. For example, Emissions|CO2|Energy is "Emissions", then the species, then the sector, but Emissions|HFC|HFC245 is "Emissions", then the "HFC" string, then the species, i.e. completely different information is provided after each "|".
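To make the decoding problem concrete, here is a minimal sketch of the kind of hardcoding this implies. The helper `decode_variable` and the `SPECIES_GROUPS` set are hypothetical names invented for this illustration; they are not part of gcages, and the real edge-case handling is far more extensive.

```python
from typing import Dict, Optional

# Hypothetical: second-level strings that name a family of species
# rather than a species itself. The real list of edge cases is longer.
SPECIES_GROUPS = {"HFC"}


def decode_variable(variable: str, separator: str = "|") -> Dict[str, Optional[str]]:
    """Split a variable name into table, species and sector (if any)."""
    parts = variable.split(separator)
    table, rest = parts[0], parts[1:]

    if rest and rest[0] in SPECIES_GROUPS:
        # e.g. Emissions|HFC|HFC245: the second level is a family,
        # the species only appears at the third level
        species = rest[1] if len(rest) > 1 else None
        sector_parts = rest[2:]
    else:
        # e.g. Emissions|CO2|Energy: species first, then the sector
        species = rest[0] if rest else None
        sector_parts = rest[1:]

    sector = separator.join(sector_parts) if sector_parts else None
    return {"table": table, "species": species, "sector": sector}


co2 = decode_variable("Emissions|CO2|Energy")
hfc = decode_variable("Emissions|HFC|HFC245")
```

The same position in the name (the level after "Emissions") ends up in a different field for the two variables, which is why a purely positional split is not enough.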

This module implements the logic for this processing. The complexity comes in the re-aggregation (gcages.cmip7_scenariomip.pre_processing.reaggregation), which has to convert whatever is reported (and a huge number of different possibilities have to be supported) into the sectors used for gridding. From there, the workflow can be standardised (as is done in pre_processor.do_pre_processing).

Modules:

Name Description
pre_processor

Definition of the pre-processor class

reaggregation

Reaggregation of timeseries from raw reporting to sectors needed for gridding

Classes:

Name Description
CMIP7ScenarioMIPPreProcessingResult

Result of pre-processing with CMIP7ScenarioMIPPreProcessor

CMIP7ScenarioMIPPreProcessor

Pre-processor for CMIP7's ScenarioMIP

ReaggregatorBasic

Reaggregator that follows this module's logic

ReaggregatorLike

Interface that can be used for re-aggregation

ToCompleteResult

Result of calling to_complete on a reaggregator

CMIP7ScenarioMIPPreProcessingResult #

Result of pre-processing with CMIP7ScenarioMIPPreProcessor

This has more components than normal, because we need to support both the 'normal' global path and harmonising at the region-sector level.

Attributes:

Name Type Description
assumed_zero_emissions DataFrame | None

Emissions that were assumed to be zero during the processing

global_workflow_emissions DataFrame

Emissions that can be used with the 'normal' global workflow

global_workflow_emissions_raw_names DataFrame

Emissions consistent with those that can be used with the 'normal' global workflow

gridding_workflow_emissions DataFrame

Emissions that can be used with the gridding workflow

Source code in src/gcages/cmip7_scenariomip/pre_processing/pre_processor.py
@define
class CMIP7ScenarioMIPPreProcessingResult:
    """
    Result of pre-processing with [CMIP7ScenarioMIPPreProcessor][(m).]

    This has more components than normal,
    because we need to support both the 'normal' global path
    and harmonising at the region-sector level.
    """

    assumed_zero_emissions: pd.DataFrame | None
    """
    Emissions that were assumed to be zero during the processing
    """

    gridding_workflow_emissions: pd.DataFrame
    """
    Emissions that can be used with the gridding workflow
    """

    global_workflow_emissions: pd.DataFrame
    """
    Emissions that can be used with the 'normal' global workflow
    """

    global_workflow_emissions_raw_names: pd.DataFrame
    """
    Emissions consistent with those that can be used with the 'normal' global workflow

    The difference is that these are reported with CMIP7 ScenarioMIP naming,
    which isn't compatible with our SCM runners (for example).
    As a result, they are probably not what you want to use,
    but they may be helpful for plotting and direct comparisons.
    """

assumed_zero_emissions instance-attribute #

assumed_zero_emissions: DataFrame | None

Emissions that were assumed to be zero during the processing

global_workflow_emissions instance-attribute #

global_workflow_emissions: DataFrame

Emissions that can be used with the 'normal' global workflow

global_workflow_emissions_raw_names instance-attribute #

global_workflow_emissions_raw_names: DataFrame

Emissions consistent with those that can be used with the 'normal' global workflow

The difference is that these are reported with CMIP7 ScenarioMIP naming, which isn't compatible with our SCM runners (for example). As a result, they are probably not what you want to use, but they may be helpful for plotting and direct comparisons.

gridding_workflow_emissions instance-attribute #

gridding_workflow_emissions: DataFrame

Emissions that can be used with the gridding workflow

CMIP7ScenarioMIPPreProcessor #

Pre-processor for CMIP7's ScenarioMIP

For more details of the logic, see gcages.cmip7_scenariomip.pre_processing.

Methods:

Name Description
__call__

Pre-process

Attributes:

Name Type Description
co2_biosphere_sectors tuple[str, ...]

Gridding sectors that are assumed to come from the biosphere CO2 reservoir

co2_fossil_sectors tuple[str, ...]

Gridding sectors that are assumed to come from the fossil CO2 reservoir

co2_name str

Name used for CO2 in variable names

level_separator str

The separator between levels in variable names

n_processes int | None

Number of processes to use for parallel processing.

progress bool

Should progress bars be shown?

reaggregator ReaggregatorLike | None

Re-aggregator to use when converting raw data to gridding sectors

run_checks bool

If True, run checks on both input and output data

table str

The value used for the top level of variable names

world_gridding_sectors tuple[str, ...]

Sectors that are only used for gridding at the world (i.e. regional sum) level

Source code in src/gcages/cmip7_scenariomip/pre_processing/pre_processor.py
@define
class CMIP7ScenarioMIPPreProcessor:
    """
    Pre-processor for CMIP7's ScenarioMIP

    For more details of the logic, see [gcages.cmip7_scenariomip.pre_processing][].
    """

    reaggregator: ReaggregatorLike | None = None
    """
    Re-aggregator to use when converting raw data to gridding sectors

    If not supplied, we guess the re-aggregator during processing
    """

    run_checks: bool = True
    """
    If `True`, run checks on both input and output data

    If you are sure about your workflow,
    you can disable the checks to speed things up
    (but we don't recommend this unless you really
    are confident about what you're doing).
    """

    world_gridding_sectors: tuple[str, ...] = ("Aircraft", "International Shipping")
    """
    Sectors that are only used for gridding at the world (i.e. regional sum) level
    """

    co2_fossil_sectors: tuple[str, ...] = CO2_FOSSIL_SECTORS_GRIDDING
    """
    Gridding sectors that are assumed to come from the fossil CO2 reservoir
    """

    co2_biosphere_sectors: tuple[str, ...] = CO2_BIOSPHERE_SECTORS_GRIDDING
    """
    Gridding sectors that are assumed to come from the biosphere CO2 reservoir
    """

    co2_name: str = "CO2"
    """
    Name used for CO2 in variable names
    """

    table: str = "Emissions"
    """
    The value used for the top level of variable names
    """

    level_separator: str = "|"
    """
    The separator between levels in variable names
    """

    progress: bool = True
    """
    Should progress bars be shown?
    """

    n_processes: int | None = multiprocessing.cpu_count()
    """
    Number of processes to use for parallel processing.

    Set to `None` to process in serial.
    """

    def __call__(
        self, in_emissions: pd.DataFrame
    ) -> CMIP7ScenarioMIPPreProcessingResult:
        """
        Pre-process

        Parameters
        ----------
        in_emissions
            Emissions to pre-process

        Returns
        -------
        :
            Pre-processed emissions
        """
        if self.run_checks:
            assert_index_is_multiindex(in_emissions)
            assert_data_is_all_numeric(in_emissions)

            if in_emissions.columns.name != "year":
                msg = "The input emissions' column name should be 'year'"
                raise AssertionError(msg)

        res_g = apply_op_parallel_progress(
            func_to_call=do_pre_processing,
            reaggregator=self.reaggregator,
            time_name="year",
            run_checks=self.run_checks,
            world_gridding_sectors=self.world_gridding_sectors,
            table=self.table,
            level_separator=self.level_separator,
            co2_fossil_sectors=self.co2_fossil_sectors,
            co2_biosphere_sectors=self.co2_biosphere_sectors,
            co2_name=self.co2_name,
            iterable_input=(
                gdf for _, gdf in in_emissions.groupby(["model", "scenario"])
            ),
            parallel_op_config=ParallelOpConfig.from_user_facing(
                progress=self.progress,
                max_workers=self.n_processes,
            ),
        )

        res_d = defaultdict(list)
        for res_ms in res_g:
            for k, v in asdict(res_ms).items():
                if v is not None:
                    res_d[k].append(v)

        result_initialiser = {k: pd.concat(v) for k, v in res_d.items()}
        if "assumed_zero_emissions" not in result_initialiser:
            result_initialiser["assumed_zero_emissions"] = None

        res = CMIP7ScenarioMIPPreProcessingResult(**result_initialiser)

        return res

co2_biosphere_sectors class-attribute instance-attribute #

co2_biosphere_sectors: tuple[str, ...] = (
    CO2_BIOSPHERE_SECTORS_GRIDDING
)

Gridding sectors that are assumed to come from the biosphere CO2 reservoir

co2_fossil_sectors class-attribute instance-attribute #

co2_fossil_sectors: tuple[str, ...] = (
    CO2_FOSSIL_SECTORS_GRIDDING
)

Gridding sectors that are assumed to come from the fossil CO2 reservoir
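As a rough illustration of what assigning gridding sectors to a CO2 reservoir can mean, the sketch below sums sector-level CO2 into a fossil total and a biosphere total. Everything here is an assumption made for illustration: the function `split_co2_by_reservoir`, the sector names and the numbers are hypothetical, and the actual defaults live in CO2_FOSSIL_SECTORS_GRIDDING and CO2_BIOSPHERE_SECTORS_GRIDDING, whose contents are not shown on this page.

```python
from typing import Dict, Tuple


def split_co2_by_reservoir(
    co2_by_sector: Dict[str, float],
    fossil_sectors: Tuple[str, ...],
    biosphere_sectors: Tuple[str, ...],
) -> Dict[str, float]:
    """Sum sector-level CO2 into one total per reservoir (hypothetical helper)."""
    unknown = set(co2_by_sector) - set(fossil_sectors) - set(biosphere_sectors)
    if unknown:
        # Every sector has to belong to exactly one reservoir in this sketch
        raise ValueError(f"Sectors not assigned to a reservoir: {sorted(unknown)}")

    return {
        "fossil": sum(co2_by_sector.get(s, 0.0) for s in fossil_sectors),
        "biosphere": sum(co2_by_sector.get(s, 0.0) for s in biosphere_sectors),
    }


# Illustrative sector names and values only
fossil = ("Sector A", "Sector B")
biosphere = ("Sector C", "Sector D")
totals = split_co2_by_reservoir(
    {"Sector A": 30.0, "Sector B": 5.0, "Sector C": 2.0, "Sector D": -1.0},
    fossil_sectors=fossil,
    biosphere_sectors=biosphere,
)
```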

co2_name class-attribute instance-attribute #

co2_name: str = 'CO2'

Name used for CO2 in variable names

level_separator class-attribute instance-attribute #

level_separator: str = '|'

The separator between levels in variable names

n_processes class-attribute instance-attribute #

n_processes: int | None = cpu_count()

Number of processes to use for parallel processing.

Set to None to process in serial.

progress class-attribute instance-attribute #

progress: bool = True

Should progress bars be shown?

reaggregator class-attribute instance-attribute #

reaggregator: ReaggregatorLike | None = None

Re-aggregator to use when converting raw data to gridding sectors

If not supplied, we guess the re-aggregator during processing

run_checks class-attribute instance-attribute #

run_checks: bool = True

If True, run checks on both input and output data

If you are sure about your workflow, you can disable the checks to speed things up (but we don't recommend this unless you really are confident about what you're doing).

table class-attribute instance-attribute #

table: str = 'Emissions'

The value used for the top level of variable names

world_gridding_sectors class-attribute instance-attribute #

world_gridding_sectors: tuple[str, ...] = (
    "Aircraft",
    "International Shipping",
)

Sectors that are only used for gridding at the world (i.e. regional sum) level

__call__ #

__call__(
    in_emissions: DataFrame,
) -> CMIP7ScenarioMIPPreProcessingResult

Pre-process

Parameters:

Name Type Description Default
in_emissions DataFrame

Emissions to pre-process

required

Returns:

Type Description
CMIP7ScenarioMIPPreProcessingResult

Pre-processed emissions
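The sketch below shows an input that should satisfy the checks performed when run_checks is True: a MultiIndex on the rows, all-numeric data and a column axis named "year". The function `looks_like_valid_input` is a hypothetical stand-in written with plain pandas, not the library's own assertion helpers, and the values are made up. The level names follow the defaults used elsewhere on this page (model, scenario, region, variable, unit).

```python
import pandas as pd

# Illustrative data only
index = pd.MultiIndex.from_tuples(
    [
        ("model_a", "scenario_a", "World", "Emissions|CO2|Energy", "Mt CO2/yr"),
        ("model_a", "scenario_a", "World", "Emissions|CH4|Energy", "Mt CH4/yr"),
    ],
    names=["model", "scenario", "region", "variable", "unit"],
)
in_emissions = pd.DataFrame(
    [[10.0, 9.0], [1.5, 1.2]],
    index=index,
    columns=pd.Index([2020, 2030], name="year"),
)


def looks_like_valid_input(df: pd.DataFrame) -> bool:
    """Hypothetical stand-in for the input checks in ``__call__``."""
    return (
        isinstance(df.index, pd.MultiIndex)
        and all(pd.api.types.is_numeric_dtype(dtype) for dtype in df.dtypes)
        and df.columns.name == "year"
    )


is_valid = looks_like_valid_input(in_emissions)
```

A frame like this could then be passed to an instance of the pre-processor. Whether it also passes the reaggregator's completeness checks depends on which timeseries are reported, which this sketch does not attempt to cover.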

Source code in src/gcages/cmip7_scenariomip/pre_processing/pre_processor.py
def __call__(
    self, in_emissions: pd.DataFrame
) -> CMIP7ScenarioMIPPreProcessingResult:
    """
    Pre-process

    Parameters
    ----------
    in_emissions
        Emissions to pre-process

    Returns
    -------
    :
        Pre-processed emissions
    """
    if self.run_checks:
        assert_index_is_multiindex(in_emissions)
        assert_data_is_all_numeric(in_emissions)

        if in_emissions.columns.name != "year":
            msg = "The input emissions' column name should be 'year'"
            raise AssertionError(msg)

    res_g = apply_op_parallel_progress(
        func_to_call=do_pre_processing,
        reaggregator=self.reaggregator,
        time_name="year",
        run_checks=self.run_checks,
        world_gridding_sectors=self.world_gridding_sectors,
        table=self.table,
        level_separator=self.level_separator,
        co2_fossil_sectors=self.co2_fossil_sectors,
        co2_biosphere_sectors=self.co2_biosphere_sectors,
        co2_name=self.co2_name,
        iterable_input=(
            gdf for _, gdf in in_emissions.groupby(["model", "scenario"])
        ),
        parallel_op_config=ParallelOpConfig.from_user_facing(
            progress=self.progress,
            max_workers=self.n_processes,
        ),
    )

    res_d = defaultdict(list)
    for res_ms in res_g:
        for k, v in asdict(res_ms).items():
            if v is not None:
                res_d[k].append(v)

    result_initialiser = {k: pd.concat(v) for k, v in res_d.items()}
    if "assumed_zero_emissions" not in result_initialiser:
        result_initialiser["assumed_zero_emissions"] = None

    res = CMIP7ScenarioMIPPreProcessingResult(**result_initialiser)

    return res
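The collection step at the end of `__call__` (gather each field from the per-(model, scenario) results, skip `None`, concatenate, then fall back to `None` if nothing was collected) can be sketched without any dependencies. In this sketch, plain lists stand in for DataFrames and list concatenation stands in for `pd.concat`; `GroupResult` and `merge_results` are hypothetical names, not part of gcages.

```python
from collections import defaultdict
from dataclasses import dataclass, fields
from typing import Iterable, List, Optional


@dataclass
class GroupResult:
    """Hypothetical per-(model, scenario) result; lists stand in for DataFrames."""

    emissions: List[float]
    assumed_zero: Optional[List[str]]


def merge_results(results: Iterable[GroupResult]) -> GroupResult:
    collected = defaultdict(list)
    for res in results:
        for f in fields(res):
            value = getattr(res, f.name)
            if value is not None:
                # Only keep fields that actually hold data
                collected[f.name].append(value)

    # Concatenate the pieces of each field (pd.concat in the code above)
    merged = {k: [x for piece in v for x in piece] for k, v in collected.items()}
    # If no group assumed anything to be zero, the field stays None
    merged.setdefault("assumed_zero", None)

    return GroupResult(**merged)


nothing_assumed = merge_results([GroupResult([1.0], None), GroupResult([2.0], None)])
something_assumed = merge_results(
    [GroupResult([1.0], ["Emissions|CO2|Energy"]), GroupResult([2.0], None)]
)
```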

ReaggregatorBasic #

Reaggregator that follows this module's logic

Methods:

Name Description
assert_has_all_required_timeseries

Assert that the data has all the required timeseries

assert_is_internally_consistent

Assert that the data is internally consistent

default_tols_internal_consistency

Get default tolerances for internal consistency checks

get_internal_consistency_checking_index

Get the index which selects only data relevant for checking internal consistency

to_complete

Convert the raw data to complete data

to_gridding_sectors

Re-aggregate data to the sectors used for gridding

Attributes:

Name Type Description
internal_consistency_tolerances Mapping[str, Mapping[str, float]] | Mapping[str, Mapping[str, PINT_SCALAR]]

Tolerances to apply when checking the internal consistency of the data

model_regions tuple[str, ...]

Model regions to use while reaggregating

region_level str

Region level in the data index

unit_level str

Unit level in the data index

variable_level str

Variable level in the data index

world_region str

The value used when the data represents the sum over all regions

Source code in src/gcages/cmip7_scenariomip/pre_processing/reaggregation/basic.py
@define
class ReaggregatorBasic:
    """
    Reaggregator that follows this module's logic
    """

    model_regions: tuple[str, ...]
    """Model regions to use while reaggregating"""

    region_level: str = "region"
    """Region level in the data index"""

    unit_level: str = "unit"
    """Unit level in the data index"""

    variable_level: str = "variable"
    """Variable level in the data index"""

    world_region: str = "World"
    """
    The value used when the data represents the sum over all regions

    (Having a value for this is odd:
    there should really just be no region level when data is the sum,
    but this is the data format used so we have to follow this convention.)
    """

    internal_consistency_tolerances: (
        Mapping[str, Mapping[str, float]] | Mapping[str, Mapping[str, PINT_SCALAR]]
    ) = field()
    """
    Tolerances to apply when checking the internal consistency of the data
    """

    @internal_consistency_tolerances.default
    def default_tols_internal_consistency(
        self,
    ) -> Mapping[str, Mapping[str, float]] | Mapping[str, Mapping[str, PINT_SCALAR]]:
        """
        Get default tolerances for internal consistency checks
        """
        return get_default_internal_conistency_checking_tolerances()

    def assert_has_all_required_timeseries(self, indf: pd.DataFrame) -> None:
        """
        Assert that the data has all the required timeseries

        Parameters
        ----------
        indf
            Data to check

        Raises
        ------
        NotCompleteError
            `indf` is not complete
        """
        assert_has_all_required_timeseries(
            indf,
            model_regions=self.model_regions,
            world_region=self.world_region,
            region_level=self.region_level,
            variable_level=self.variable_level,
        )

    def assert_is_internally_consistent(self, indf: pd.DataFrame) -> None:
        """
        Assert that the data is internally consistent

        Parameters
        ----------
        indf
            Data to check

        Raises
        ------
        InternalConsistencyError
            The data is not internally consistent
        """
        assert_is_internally_consistent(
            indf,
            model_regions=self.model_regions,
            tolerances=self.internal_consistency_tolerances,
            world_region=self.world_region,
            region_level=self.region_level,
            unit_level=self.unit_level,
            variable_level=self.variable_level,
        )

    def get_internal_consistency_checking_index(self) -> pd.MultiIndex:
        """
        Get the index which selects only data relevant for checking internal consistency

        Returns
        -------
        :
            Internal consistency checking index
        """
        return get_internal_consistency_checking_index(
            model_regions=self.model_regions,
            world_region=self.world_region,
            region_level=self.region_level,
            variable_level=self.variable_level,
        )

    def to_complete(self, raw: pd.DataFrame) -> ToCompleteResult:
        """
        Convert the raw data to complete data

        Parameters
        ----------
        raw
            Raw data

        Returns
        -------
        :
            To complete result
        """
        return to_complete(
            indf=raw,
            model_regions=self.model_regions,
            unit_level=self.unit_level,
            variable_level=self.variable_level,
            region_level=self.region_level,
            world_region=self.world_region,
        )

    def to_gridding_sectors(self, indf: pd.DataFrame) -> pd.DataFrame:
        """
        Re-aggregate data to the sectors used for gridding

        Parameters
        ----------
        indf
            Data to re-aggregate

        Returns
        -------
        :
            Data re-aggregated to the gridding sectors
        """
        return to_gridding_sectors(
            indf=indf, region_level=self.region_level, world_region=self.world_region
        )

internal_consistency_tolerances class-attribute instance-attribute #

internal_consistency_tolerances: (
    Mapping[str, Mapping[str, float]]
    | Mapping[str, Mapping[str, PINT_SCALAR]]
) = field()

Tolerances to apply when checking the internal consistency of the data

model_regions instance-attribute #

model_regions: tuple[str, ...]

Model regions to use while reaggregating

region_level class-attribute instance-attribute #

region_level: str = 'region'

Region level in the data index

unit_level class-attribute instance-attribute #

unit_level: str = 'unit'

Unit level in the data index

variable_level class-attribute instance-attribute #

variable_level: str = 'variable'

Variable level in the data index

world_region class-attribute instance-attribute #

world_region: str = 'World'

The value used when the data represents the sum over all regions

(Having a value for this is odd: there should really just be no region level when data is the sum, but this is the data format used so we have to follow this convention.)

assert_has_all_required_timeseries #

assert_has_all_required_timeseries(indf: DataFrame) -> None

Assert that the data has all the required timeseries

Parameters:

Name Type Description Default
indf DataFrame

Data to check

required

Raises:

Type Description
NotCompleteError

indf is not complete

Source code in src/gcages/cmip7_scenariomip/pre_processing/reaggregation/basic.py
def assert_has_all_required_timeseries(self, indf: pd.DataFrame) -> None:
    """
    Assert that the data has all the required timeseries

    Parameters
    ----------
    indf
        Data to check

    Raises
    ------
    NotCompleteError
        `indf` is not complete
    """
    assert_has_all_required_timeseries(
        indf,
        model_regions=self.model_regions,
        world_region=self.world_region,
        region_level=self.region_level,
        variable_level=self.variable_level,
    )

assert_is_internally_consistent #

assert_is_internally_consistent(indf: DataFrame) -> None

Assert that the data is internally consistent

Parameters:

Name Type Description Default
indf DataFrame

Data to check

required

Raises:

Type Description
InternalConsistencyError

The data is not internally consistent

Source code in src/gcages/cmip7_scenariomip/pre_processing/reaggregation/basic.py
def assert_is_internally_consistent(self, indf: pd.DataFrame) -> None:
    """
    Assert that the data is internally consistent

    Parameters
    ----------
    indf
        Data to check

    Raises
    ------
    InternalConsistencyError
        The data is not internally consistent
    """
    assert_is_internally_consistent(
        indf,
        model_regions=self.model_regions,
        tolerances=self.internal_consistency_tolerances,
        world_region=self.world_region,
        region_level=self.region_level,
        unit_level=self.unit_level,
        variable_level=self.variable_level,
    )
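This page does not spell out what "internally consistent" covers. One plausible component, given that the data mixes regional detail with a `world_region` total, is that the sum over the model regions matches the reported world total within a tolerance. The sketch below shows only that idea with plain pandas; `regional_sum_matches_world` and its `rtol` argument are hypothetical and are not the library's implementation or tolerance format.

```python
import pandas as pd


def regional_sum_matches_world(
    df: pd.DataFrame,
    model_regions: tuple,
    world_region: str = "World",
    region_level: str = "region",
    rtol: float = 1e-3,
) -> bool:
    """Hypothetical check: sum over model regions equals the world total."""
    regions = df.index.get_level_values(region_level)
    other_levels = [name for name in df.index.names if name != region_level]

    # Sum the model regions for each remaining index combination
    regional_sum = (
        df[regions.isin(list(model_regions))].groupby(level=other_levels).sum()
    )
    # The reported total, with the region level dropped so the indexes line up
    world = df[regions == world_region].droplevel(region_level)

    regional_sum = regional_sum.reindex(world.index)
    within_tolerance = (regional_sum - world).abs() <= rtol * world.abs()
    return bool(within_tolerance.all().all())


variable = "Emissions|CO2|Energy"
index = pd.MultiIndex.from_tuples(
    [("region_1", variable), ("region_2", variable), ("World", variable)],
    names=["region", "variable"],
)
columns = pd.Index([2020, 2030], name="year")

# Illustrative values only
consistent = pd.DataFrame(
    [[1.0, 2.0], [3.0, 4.0], [4.0, 6.0]], index=index, columns=columns
)
inconsistent = pd.DataFrame(
    [[1.0, 2.0], [3.0, 4.0], [4.0, 7.0]], index=index, columns=columns
)

ok = regional_sum_matches_world(consistent, ("region_1", "region_2"))
not_ok = regional_sum_matches_world(inconsistent, ("region_1", "region_2"))
```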

default_tols_internal_consistency #

default_tols_internal_consistency() -> (
    Mapping[str, Mapping[str, float]]
    | Mapping[str, Mapping[str, PINT_SCALAR]]
)

Get default tolerances for internal consistency checks

Source code in src/gcages/cmip7_scenariomip/pre_processing/reaggregation/basic.py
@internal_consistency_tolerances.default
def default_tols_internal_consistency(
    self,
) -> Mapping[str, Mapping[str, float]] | Mapping[str, Mapping[str, PINT_SCALAR]]:
    """
    Get default tolerances for internal consistency checks
    """
    return get_default_internal_conistency_checking_tolerances()

get_internal_consistency_checking_index #

get_internal_consistency_checking_index() -> MultiIndex

Get the index which selects only data relevant for checking internal consistency

Returns:

Type Description
MultiIndex

Internal consistency checking index

Source code in src/gcages/cmip7_scenariomip/pre_processing/reaggregation/basic.py
def get_internal_consistency_checking_index(self) -> pd.MultiIndex:
    """
    Get the index which selects only data relevant for checking internal consistency

    Returns
    -------
    :
        Internal consistency checking index
    """
    return get_internal_consistency_checking_index(
        model_regions=self.model_regions,
        world_region=self.world_region,
        region_level=self.region_level,
        variable_level=self.variable_level,
    )

to_complete #

to_complete(raw: DataFrame) -> ToCompleteResult

Convert the raw data to complete data

Parameters:

Name Type Description Default
raw DataFrame

Raw data

required

Returns:

Type Description
ToCompleteResult

To complete result

Source code in src/gcages/cmip7_scenariomip/pre_processing/reaggregation/basic.py
def to_complete(self, raw: pd.DataFrame) -> ToCompleteResult:
    """
    Convert the raw data to complete data

    Parameters
    ----------
    raw
        Raw data

    Returns
    -------
    :
        To complete result
    """
    return to_complete(
        indf=raw,
        model_regions=self.model_regions,
        unit_level=self.unit_level,
        variable_level=self.variable_level,
        region_level=self.region_level,
        world_region=self.world_region,
    )

to_gridding_sectors #

to_gridding_sectors(indf: DataFrame) -> DataFrame

Re-aggregate data to the sectors used for gridding

Parameters:

Name Type Description Default
indf DataFrame

Data to re-aggregate

required

Returns:

Type Description
DataFrame

Data re-aggregated to the gridding sectors

Source code in src/gcages/cmip7_scenariomip/pre_processing/reaggregation/basic.py
def to_gridding_sectors(self, indf: pd.DataFrame) -> pd.DataFrame:
    """
    Re-aggregate data to the sectors used for gridding

    Parameters
    ----------
    indf
        Data to re-aggregate

    Returns
    -------
    :
        Data re-aggregated to the gridding sectors
    """
    return to_gridding_sectors(
        indf=indf, region_level=self.region_level, world_region=self.world_region
    )

ReaggregatorLike #

Bases: Protocol

Interface that can be used for re-aggregation

Methods:

Name Description
assert_has_all_required_timeseries

Assert that the data has all the required timeseries

assert_is_internally_consistent

Assert that the data is internally consistent

get_internal_consistency_checking_index

Get the index which selects only data relevant for checking internal consistency

to_complete

Convert the raw data to complete data

to_gridding_sectors

Re-aggregate data to the sectors used for gridding

Attributes:

Name Type Description
model_regions tuple[str, ...]

Model regions to use while reaggregating

region_level str

Region level in the data index

unit_level str

Unit level in the data index

variable_level str

Variable level in the data index

world_region str

The value used when the data represents the sum over all regions
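Because this interface is a `typing.Protocol`, a class does not need to inherit from it: having the right attributes and methods is enough. The sketch below demonstrates that with a deliberately cut-down, hypothetical protocol (`HasToGriddingSectors`, with one attribute and one method) and lists in place of DataFrames. It is not the full interface described here.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class HasToGriddingSectors(Protocol):
    """Hypothetical, cut-down version of the interface."""

    world_region: str

    def to_gridding_sectors(self, indf: list) -> list: ...


class PassThroughReaggregator:
    """Satisfies the protocol structurally: no inheritance needed."""

    def __init__(self, world_region: str = "World") -> None:
        self.world_region = world_region

    def to_gridding_sectors(self, indf: list) -> list:
        # A real reaggregator would re-aggregate here; we just copy
        return list(indf)


def run(reaggregator: HasToGriddingSectors, data: list) -> list:
    """Accept anything that provides the protocol's members."""
    return reaggregator.to_gridding_sectors(data)


reaggregator = PassThroughReaggregator()
out = run(reaggregator, [1, 2])
```

The `runtime_checkable` decorator is only there so that `isinstance` can be used in the sketch; static type checkers verify protocol conformance without it.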

Source code in src/gcages/cmip7_scenariomip/pre_processing/pre_processor.py
class ReaggregatorLike(Protocol):
    """
    Interface that can be used for re-aggregation
    """

    model_regions: tuple[str, ...]
    """Model regions to use while reaggregating"""

    region_level: str
    """Region level in the data index"""

    unit_level: str
    """Unit level in the data index"""

    variable_level: str
    """Variable level in the data index"""

    world_region: str
    """
    The value used when the data represents the sum over all regions

    (Having a value for this is odd:
    there should really just be no region level when data is the sum,
    but this is the data format used so we have to follow this convention.)
    """

    def assert_has_all_required_timeseries(self, indf: pd.DataFrame) -> None:
        """
        Assert that the data has all the required timeseries

        Parameters
        ----------
        indf
            Data to check

        Raises
        ------
        NotCompleteError
            `indf` is not complete
        """

    def assert_is_internally_consistent(self, indf: pd.DataFrame) -> None:
        """
        Assert that the data is internally consistent

        Parameters
        ----------
        indf
            Data to check

        Raises
        ------
        InternalConsistencyError
            The data is not internally consistent
        """

    def get_internal_consistency_checking_index(self) -> pd.MultiIndex:
        """
        Get the index which selects only data relevant for checking internal consistency

        Returns
        -------
        :
            Internal consistency checking index
        """

    def to_complete(self, raw: pd.DataFrame) -> ToCompleteResult:
        """
        Convert the raw data to complete data

        Parameters
        ----------
        raw
            Raw data

        Returns
        -------
        :
            To complete result
        """

    def to_gridding_sectors(self, indf: pd.DataFrame) -> pd.DataFrame:
        """
        Re-aggregate data to the sectors used for gridding

        Parameters
        ----------
        indf
            Data to re-aggregate

        Returns
        -------
        :
            Data re-aggregated to the gridding sectors
        """

model_regions instance-attribute #

model_regions: tuple[str, ...]

Model regions to use while reaggregating

region_level instance-attribute #

region_level: str

Region level in the data index

unit_level instance-attribute #

unit_level: str

Unit level in the data index

variable_level instance-attribute #

variable_level: str

Variable level in the data index

world_region instance-attribute #

world_region: str

The value used when the data represents the sum over all regions

(Having a value for this is odd: there should really just be no region level when data is the sum, but this is the data format used so we have to follow this convention.)

assert_has_all_required_timeseries #

assert_has_all_required_timeseries(indf: DataFrame) -> None

Assert that the data has all the required timeseries

Parameters:

Name Type Description Default
indf DataFrame

Data to check

required

Raises:

Type Description
NotCompleteError

indf is not complete

Source code in src/gcages/cmip7_scenariomip/pre_processing/pre_processor.py
def assert_has_all_required_timeseries(self, indf: pd.DataFrame) -> None:
    """
    Assert that the data has all the required timeseries

    Parameters
    ----------
    indf
        Data to check

    Raises
    ------
    NotCompleteError
        `indf` is not complete
    """

assert_is_internally_consistent #

assert_is_internally_consistent(indf: DataFrame) -> None

Assert that the data is internally consistent

Parameters:

Name Type Description Default
indf DataFrame

Data to check

required

Raises:

Type Description
InternalConsistencyError

The data is not internally consistent

Source code in src/gcages/cmip7_scenariomip/pre_processing/pre_processor.py
def assert_is_internally_consistent(self, indf: pd.DataFrame) -> None:
    """
    Assert that the data is internally consistent

    Parameters
    ----------
    indf
        Data to check

    Raises
    ------
    InternalConsistencyError
        The data is not internally consistent
    """

get_internal_consistency_checking_index #

get_internal_consistency_checking_index() -> MultiIndex

Get the index which selects only data relevant for checking internal consistency

Returns:

Type Description
MultiIndex

Internal consistency checking index

Source code in src/gcages/cmip7_scenariomip/pre_processing/pre_processor.py
def get_internal_consistency_checking_index(self) -> pd.MultiIndex:
    """
    Get the index which selects only data relevant for checking internal consistency

    Returns
    -------
    :
        Internal consistency checking index
    """

to_complete #

to_complete(raw: DataFrame) -> ToCompleteResult

Convert the raw data to complete data

Parameters:

Name Type Description Default
raw DataFrame

Raw data

required

Returns:

Type Description
ToCompleteResult

To complete result

Source code in src/gcages/cmip7_scenariomip/pre_processing/pre_processor.py
def to_complete(self, raw: pd.DataFrame) -> ToCompleteResult:
    """
    Convert the raw data to complete data

    Parameters
    ----------
    raw
        Raw data

    Returns
    -------
    :
        To complete result
    """

to_gridding_sectors #

to_gridding_sectors(indf: DataFrame) -> DataFrame

Re-aggregate data to the sectors used for gridding

Parameters:

Name Type Description Default
indf DataFrame

Data to re-aggregate

required

Returns:

Type Description
DataFrame

Data re-aggregated to the gridding sectors

Source code in src/gcages/cmip7_scenariomip/pre_processing/pre_processor.py
def to_gridding_sectors(self, indf: pd.DataFrame) -> pd.DataFrame:
    """
    Re-aggregate data to the sectors used for gridding

    Parameters
    ----------
    indf
        Data to re-aggregate

    Returns
    -------
    :
        Data re-aggregated to the gridding sectors
    """

ToCompleteResult #

Result of calling to_complete on a reaggregator

Attributes:

Name Type Description
assumed_zero DataFrame | None

The timeseries that were assumed to be zero to make self.complete

complete DataFrame

Complete pd.DataFrame

Source code in src/gcages/cmip7_scenariomip/pre_processing/reaggregation/common.py
@define
class ToCompleteResult:
    """
    Result of calling `to_complete` on a reaggregator
    """

    complete: pd.DataFrame
    """Complete [pd.DataFrame][pandas.DataFrame]"""

    assumed_zero: pd.DataFrame | None
    """
    The timeseries that were assumed to be zero to make `self.complete`

    If `None`, no timeseries were assumed to be zero.
    """

assumed_zero instance-attribute #

assumed_zero: DataFrame | None

The timeseries that were assumed to be zero to make self.complete

If None, no timeseries were assumed to be zero.

complete instance-attribute #

complete: DataFrame

Complete pd.DataFrame
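The relationship between `complete` and `assumed_zero` can be sketched as follows: any required timeseries that is missing from the raw data is filled with zeros, and the filled rows are also returned separately (or `None` if nothing had to be filled). This is only an illustration of the result's shape under that assumption. `CompleteSketch`, `fill_missing_with_zeros`, the variable names and the values are hypothetical, and the library's `to_complete` may do considerably more.

```python
from typing import NamedTuple, Optional

import pandas as pd


class CompleteSketch(NamedTuple):
    """Hypothetical stand-in for ToCompleteResult."""

    complete: pd.DataFrame
    assumed_zero: Optional[pd.DataFrame]


def fill_missing_with_zeros(
    raw: pd.DataFrame, required_index: pd.MultiIndex
) -> CompleteSketch:
    missing = required_index.difference(raw.index)
    if missing.empty:
        # Nothing had to be assumed
        return CompleteSketch(complete=raw, assumed_zero=None)

    # One all-zero row per missing timeseries
    assumed_zero = pd.DataFrame(0.0, index=missing, columns=raw.columns)
    complete = pd.concat([raw, assumed_zero])
    return CompleteSketch(complete=complete, assumed_zero=assumed_zero)


names = ["region", "variable"]
required = pd.MultiIndex.from_product(
    [["World"], ["Emissions|CO2|Energy", "Emissions|CO2|Waste"]], names=names
)
raw = pd.DataFrame(
    [[10.0, 9.0]],
    index=pd.MultiIndex.from_tuples([("World", "Emissions|CO2|Energy")], names=names),
    columns=pd.Index([2020, 2030], name="year"),
)

first_pass = fill_missing_with_zeros(raw, required)
second_pass = fill_missing_with_zeros(first_pass.complete, required)
```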