gcages.cmip7_scenariomip.pre_processing.reaggregation#
Reaggregation of timeseries from raw reporting to sectors needed for gridding
The idea here is that we receive raw data following some variable specification. Based on this, we reaggregate to the variables needed for gridding (see gcages.cmip7_scenariomip.gridding_emissions). To do the reaggregation sensibly, two things must be true:
- all the timeseries we require must be there
- the data must be internally consistent
  - including consideration of any optional timeseries
Reaggregation is a data problem, i.e. the hard part is making sure that the data we receive matches our data model. As a result, the code is tightly coupled to the data we expect. This is why the code that supports each data model lives in a standalone module rather than in a general solution. When we tried to write a general solution from the start, it proved extremely difficult, we think because it creates couplings that are very hard to reason through.
Modules:
| Name | Description |
|---|---|
| basic | Basic reaggregation |
| common | Common components used across different re-aggregation strategies |
Classes:
| Name | Description |
|---|---|
| ReaggregatorBasic | Reaggregator that follows this module's logic |
| ToCompleteResult | Result of calling to_complete on a reaggregator |
ReaggregatorBasic #
Reaggregator that follows this module's logic
Methods:
| Name | Description |
|---|---|
| assert_has_all_required_timeseries | Assert that the data has all the required timeseries |
| assert_is_internally_consistent | Assert that the data is internally consistent |
| default_tols_internal_consistency | Get default tolerances for internal consistency checks |
| get_internal_consistency_checking_index | Get the index which selects only data relevant for checking internal consistency |
| to_complete | Convert the raw data to complete data |
| to_gridding_sectors | Re-aggregate data to the sectors used for gridding |
Attributes:
| Name | Type | Description |
|---|---|---|
| internal_consistency_tolerances | Mapping[str, Mapping[str, float]] \| Mapping[str, Mapping[str, PINT_SCALAR]] | Tolerances to apply when checking the internal consistency of the data |
| model_regions | tuple[str, ...] | Model regions to use while reaggregating |
| region_level | str | Region level in the data index |
| unit_level | str | Unit level in the data index |
| variable_level | str | Variable level in the data index |
| world_region | str | The value used when the data represents the sum over all regions |
Source code in src/gcages/cmip7_scenariomip/pre_processing/reaggregation/basic.py
internal_consistency_tolerances class-attribute instance-attribute #
internal_consistency_tolerances: (
Mapping[str, Mapping[str, float]]
| Mapping[str, Mapping[str, PINT_SCALAR]]
) = field()
Tolerances to apply when checking the internal consistency of the data
model_regions instance-attribute #
model_regions: tuple[str, ...]
Model regions to use while reaggregating
region_level class-attribute instance-attribute #
region_level: str = 'region'
Region level in the data index
unit_level class-attribute instance-attribute #
unit_level: str = 'unit'
Unit level in the data index
variable_level class-attribute instance-attribute #
variable_level: str = 'variable'
Variable level in the data index
world_region class-attribute instance-attribute #
world_region: str = 'World'
The value used when the data represents the sum over all regions
(Having a value for this is odd. Ideally there would be no region level at all when the data is the sum over all regions, but this is the data format in use, so we follow its convention.)
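The convention can be illustrated with a small pandas sketch. It is not taken from the gcages source: the region names, variable name and values are made up, and the index levels simply use the default level names documented above. It sums over the model regions and labels the result with the world region value.

```python
import pandas as pd

# Made-up regional data; the level names match the documented defaults.
index = pd.MultiIndex.from_tuples(
    [
        ("Emissions|CO2", "model_a|Region 1", "Mt CO2/yr"),
        ("Emissions|CO2", "model_a|Region 2", "Mt CO2/yr"),
    ],
    names=["variable", "region", "unit"],
)
regional = pd.DataFrame(
    [[10.0, 12.0], [5.0, 4.0]], index=index, columns=[2020, 2030]
)

# Sum over the region level, keeping the variable and unit levels.
world = regional.groupby(level=["variable", "unit"]).sum()

# Re-insert a region level whose only value is the world region label.
world.index = pd.MultiIndex.from_tuples(
    [(variable, "World", unit) for variable, unit in world.index],
    names=regional.index.names,
)
```

The sum still carries a region level, which is exactly the oddity the note above describes.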
assert_has_all_required_timeseries #
assert_has_all_required_timeseries(indf: DataFrame) -> None
Assert that the data has all the required timeseries
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| indf | DataFrame | Data to check | required |
Raises:
| Type | Description |
|---|---|
| NotCompleteError | The data does not have all the required timeseries |
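As a rough illustration of what such a check involves, the sketch below compares a set of required index entries against the data's index using plain pandas. The helper `find_missing_timeseries` and the variable names are hypothetical and are not part of gcages. The documented method raises `NotCompleteError`, whereas this sketch only returns the missing entries.

```python
import pandas as pd


def find_missing_timeseries(
    indf: pd.DataFrame, required: pd.MultiIndex
) -> pd.MultiIndex:
    """Return the entries of `required` that are absent from `indf`'s index."""
    # Compare only on the levels that `required` specifies
    # (assumes `required` has at least two levels).
    extra_levels = [n for n in indf.index.names if n not in required.names]
    present = indf.index.droplevel(extra_levels).reorder_levels(required.names)
    return required.difference(present)


# Made-up data with one of two required timeseries present.
indf = pd.DataFrame(
    [[10.0, 12.0]],
    index=pd.MultiIndex.from_tuples(
        [("Emissions|CO2|Energy Sector", "World", "Mt CO2/yr")],
        names=["variable", "region", "unit"],
    ),
    columns=[2020, 2030],
)
required = pd.MultiIndex.from_product(
    [["Emissions|CO2|Energy Sector", "Emissions|CO2|Industrial Sector"], ["World"]],
    names=["variable", "region"],
)

missing = find_missing_timeseries(indf, required)
```

Returning the missing entries rather than a boolean makes it straightforward to build an informative error message.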
assert_is_internally_consistent #
assert_is_internally_consistent(indf: DataFrame) -> None
Assert that the data is internally consistent
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| indf | DataFrame | Data to check | required |
Raises:
| Type | Description |
|---|---|
| InternalConsistencyError | The data is not internally consistent |
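One plausible form of such a check is comparing a reported total against the sum of its direct sub-sectors within a tolerance. The sketch below assumes that form. It is not the gcages implementation: the helper name, the variable names, and the `rtol`/`atol` keys of the tolerance mapping are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Assumed layout of the nested tolerance mapping; the key names are a guess.
tolerances = {"Emissions|CO2": {"rtol": 1e-3, "atol": 1e-6}}


def total_matches_components(
    indf: pd.DataFrame, total: str, rtol: float, atol: float
) -> bool:
    """Check that `total` equals the sum of its direct children, within tolerance."""
    variables = indf.index.get_level_values("variable")
    child_depth = total.count("|") + 1
    is_total = np.array([v == total for v in variables])
    # Only direct children count, so deeper levels are not double counted.
    is_child = np.array(
        [v.startswith(f"{total}|") and v.count("|") == child_depth for v in variables]
    )
    reported = indf.loc[is_total].droplevel("variable")
    summed = indf.loc[is_child].groupby(level=["region", "unit"]).sum()
    # Align rows and columns; anything missing on one side becomes NaN and fails.
    reported, summed = reported.align(summed, join="outer")
    close = np.isclose(reported.to_numpy(), summed.to_numpy(), rtol=rtol, atol=atol)
    return bool(close.all())


indf = pd.DataFrame(
    [[15.0, 16.0], [10.0, 12.0], [5.0, 4.0]],
    index=pd.MultiIndex.from_tuples(
        [
            ("Emissions|CO2", "World", "Mt CO2/yr"),
            ("Emissions|CO2|Energy", "World", "Mt CO2/yr"),
            ("Emissions|CO2|Industry", "World", "Mt CO2/yr"),
        ],
        names=["variable", "region", "unit"],
    ),
    columns=[2020, 2030],
)

consistent = total_matches_components(indf, "Emissions|CO2", **tolerances["Emissions|CO2"])
```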
default_tols_internal_consistency #
default_tols_internal_consistency() -> (
Mapping[str, Mapping[str, float]]
| Mapping[str, Mapping[str, PINT_SCALAR]]
)
Get default tolerances for internal consistency checks
get_internal_consistency_checking_index #
get_internal_consistency_checking_index() -> MultiIndex
Get the index which selects only data relevant for checking internal consistency
Returns:
| Type | Description |
|---|---|
| MultiIndex | Internal consistency checking index |
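An index like this can be used to subset a DataFrame before running checks. The sketch below shows one way to do that with plain pandas, assuming the checking index covers the variable and region levels only. That assumption, the variable names and the region names are illustrative and not taken from gcages.

```python
import pandas as pd

indf = pd.DataFrame(
    [[15.0, 16.0], [10.0, 12.0], [4.0, 5.0]],
    index=pd.MultiIndex.from_tuples(
        [
            ("Emissions|CO2", "World", "Mt CO2/yr"),
            ("Emissions|CO2|Energy", "World", "Mt CO2/yr"),
            ("Emissions|CO2|Energy", "model_a|Region 1", "Mt CO2/yr"),
        ],
        names=["variable", "region", "unit"],
    ),
    columns=[2020, 2030],
)

# Hypothetical checking index: every listed variable at the world level.
checking_index = pd.MultiIndex.from_product(
    [["Emissions|CO2", "Emissions|CO2|Energy"], ["World"]],
    names=["variable", "region"],
)

# Drop the levels the checking index does not have, then select matching rows.
comparable = indf.index.droplevel("unit")
selected = indf.loc[comparable.isin(checking_index)]
```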
to_complete #
to_complete(raw: DataFrame) -> ToCompleteResult
Convert the raw data to complete data
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| raw | DataFrame | Raw data | required |
Returns:
| Type | Description |
|---|---|
| ToCompleteResult | To complete result |
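The documented result carries both the complete data and the timeseries that were assumed to be zero. A minimal sketch of that idea follows, assuming completion means adding zero-valued rows for optional timeseries that were not reported. `CompleteSketch` and `fill_optional_with_zeros` are hypothetical stand-ins, not the gcages classes or methods, and the variable names are made up.

```python
from dataclasses import dataclass
from typing import Optional

import pandas as pd


@dataclass
class CompleteSketch:
    """Hypothetical stand-in with the same two fields as ToCompleteResult."""

    complete: pd.DataFrame
    assumed_zero: Optional[pd.DataFrame]


def fill_optional_with_zeros(
    raw: pd.DataFrame, optional: pd.MultiIndex
) -> CompleteSketch:
    """Add zero-valued rows for optional timeseries missing from `raw`."""
    missing = optional.difference(raw.index)
    if missing.empty:
        return CompleteSketch(complete=raw, assumed_zero=None)
    assumed_zero = pd.DataFrame(0.0, index=missing, columns=raw.columns)
    return CompleteSketch(
        complete=pd.concat([raw, assumed_zero]), assumed_zero=assumed_zero
    )


names = ["variable", "region", "unit"]
raw = pd.DataFrame(
    [[10.0, 12.0]],
    index=pd.MultiIndex.from_tuples(
        [("Emissions|CO2|Energy", "World", "Mt CO2/yr")], names=names
    ),
    columns=[2020, 2030],
)
# Made-up optional timeseries that the raw data does not report.
optional = pd.MultiIndex.from_tuples(
    [("Emissions|CO2|Other", "World", "Mt CO2/yr")], names=names
)

result = fill_optional_with_zeros(raw, optional)
```

Keeping `assumed_zero` separate from `complete` lets a caller report exactly which values were filled in rather than supplied.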
to_gridding_sectors #
to_gridding_sectors(indf: DataFrame) -> DataFrame
Re-aggregate data to the sectors used for gridding
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| indf | DataFrame | Data to re-aggregate | required |
Returns:
| Type | Description |
|---|---|
| DataFrame | Data re-aggregated to the gridding sectors |
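Conceptually, this kind of re-aggregation maps reported variables onto target sectors and sums the rows that land on the same sector. The sketch below shows that pattern in plain pandas. The mapping, the variable names and the `reaggregate` helper are invented for illustration and do not reflect the actual gridding sector definitions or the gcages implementation.

```python
import pandas as pd

# Hypothetical mapping from reported variables to gridding sectors.
sector_map = {
    "Emissions|CO2|Energy|Supply": "Emissions|CO2|Energy Sector",
    "Emissions|CO2|Energy|Demand|Industry": "Emissions|CO2|Industrial Sector",
    "Emissions|CO2|Industrial Processes": "Emissions|CO2|Industrial Sector",
}


def reaggregate(indf: pd.DataFrame, mapping: dict) -> pd.DataFrame:
    """Rename variables to their target sector, then sum duplicated rows."""
    renamed = indf.rename(index=mapping, level="variable")
    return renamed.groupby(level=list(indf.index.names)).sum()


indf = pd.DataFrame(
    [[10.0, 12.0], [3.0, 2.5], [2.0, 1.5]],
    index=pd.MultiIndex.from_tuples(
        [(variable, "World", "Mt CO2/yr") for variable in sector_map],
        names=["variable", "region", "unit"],
    ),
    columns=[2020, 2030],
)

gridding = reaggregate(indf, sector_map)
```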
ToCompleteResult #
Result of calling to_complete on a reaggregator
Attributes:
| Name | Type | Description |
|---|---|---|
| assumed_zero | DataFrame \| None | The timeseries that were assumed to be zero to make the data complete |
| complete | DataFrame | Complete pd.DataFrame |