
Draft and test transform methods

Draft transform.py

Success

Write methods to transform the input dataframe into the output dataframe of the ETL process.

The input data is type hinted using the schema_external dataframe model. NOTE: The return data will be type hinted in a later step, after schema.py is defined.

For this example, the temperature columns are converted from °C to °F. The column metadata in the schema identifies which columns should be converted.

transform.py
diff --git a/able_weather/datasets/weather/open_meteo/runner/transform.py b/able_weather/datasets/weather/open_meteo/runner/transform.py
index 0a771df..86f2048 100644
--- a/able_weather/datasets/weather/open_meteo/runner/transform.py
+++ b/able_weather/datasets/weather/open_meteo/runner/transform.py
@@ -1,14 +1,44 @@
 """
-This file transforms the input extracted by `extract_external` and/or
-the `extract` module of other ETL processes developed using the
-`able-workflow-copier` template. The output of this transformation is
-loaded to disk by the `load` module of this process.
-
-The contents of the `runner` module require extra
-dependencies (specified as `project.optional-dependencies.runner` in
-`pyproject.toml`) and are not imported required to be imported by other
-ETL processes that only need to extract and validate the datasets.
-This allows for a clear separation of concerns and keeps the dependencies
-light for other ETL processes that do not require the full functionality
-of this process.
+Convert the Open Meteo dataset from °C to °F
 """
+
+import pandas as pd
+from pandera.typing.pandas import DataFrame
+
+from able_weather.datasets.weather.open_meteo.runner import (
+    schema_external,
+)
+
+
+def celsius_to_fahrenheit(celsius: float) -> float:
+    return (celsius * 9 / 5) + 32
+
+
+def transform(data: DataFrame[schema_external.OpenMeteoSchema]) -> pd.DataFrame:
+    """
+    Convert temperature from Celsius to Fahrenheit in the Open Meteo dataset.
+    """
+
+    # Get column metadata from the schema to find temperature columns
+    col_metadata = (
+        (schema_external.OpenMeteoSchema.get_metadata() or {}).get(
+            "OpenMeteoSchema", {}
+        )
+        or {}
+    ).get("columns", {}) or {}
+
+    col_units = {
+        col: (col_metadata.get(col, {}) or {}).get("units")
+        for col in col_metadata.keys()
+    }
+
+    # Convert temperature columns from Celsius to Fahrenheit
+    for col in data.columns:
+        if col in col_units and col_units[col] == "°C":
+            data[col] = data[col].apply(celsius_to_fahrenheit)
+            data.rename(
+                columns={col: col.replace("_deg_c", "_deg_f")},
+                inplace=True,
+            )
+
+    return data
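
The nested metadata lookup in transform() can be easier to follow in isolation. The following is a minimal sketch that uses a plain dictionary as a stand-in for the value returned by OpenMeteoSchema.get_metadata(). The dictionary shape is inferred from how transform() reads it, and the names EXAMPLE_METADATA, units_by_column, and celsius_columns, as well as the "%" units entry, are hypothetical and used only for illustration.

```python
from typing import Any, Optional

# Hypothetical stand-in for OpenMeteoSchema.get_metadata(). The shape is
# inferred from transform(); the column entries are illustrative only.
EXAMPLE_METADATA: dict[str, Any] = {
    "OpenMeteoSchema": {
        "columns": {
            "temperature_deg_c_2m": {"units": "°C"},
            "relative_humidity_2m": {"units": "%"},
            "date": None,  # a column with no metadata attached
        }
    }
}


def units_by_column(
    metadata: Optional[dict[str, Any]], model_name: str
) -> dict[str, Optional[str]]:
    """Map each column name to its "units" value, or None when absent."""
    # Each `or {}` guards against a missing key or an explicit None.
    columns = ((metadata or {}).get(model_name) or {}).get("columns") or {}
    return {col: (meta or {}).get("units") for col, meta in columns.items()}


def celsius_columns(
    metadata: Optional[dict[str, Any]], model_name: str
) -> list[str]:
    """Return the names of the columns whose units are "°C"."""
    units = units_by_column(metadata, model_name)
    return [col for col, unit in units.items() if unit == "°C"]
```

Under these assumptions, celsius_columns(EXAMPLE_METADATA, "OpenMeteoSchema") selects only temperature_deg_c_2m, and passing None as the metadata yields an empty result rather than an error.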

While writing the transformation, it became apparent that the column names should include the units to avoid confusion. Accordingly, extract_external.py, schema_external.py, test_extract_external.py, and test_schema_external.py were all updated so that the column names contain units.
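
One consequence of encoding units in the column names is that str.replace returns the name unchanged when the "_deg_c" token is missing, so a converted column could silently keep a name without the new units. The following is a sketch of a stricter rename helper, assuming the "_deg_c" and "_deg_f" naming used in transform.py. The helper fahrenheit_column_name and the two constants are hypothetical and not part of the project.

```python
# Name tokens assumed from the column names used in transform.py.
CELSIUS_TOKEN = "_deg_c"
FAHRENHEIT_TOKEN = "_deg_f"


def fahrenheit_column_name(celsius_name: str) -> str:
    """Return the Fahrenheit column name for a Celsius column name.

    Raises ValueError when the name does not contain the Celsius token,
    instead of returning the name unchanged.
    """
    if CELSIUS_TOKEN not in celsius_name:
        raise ValueError(
            f"column {celsius_name!r} does not contain {CELSIUS_TOKEN!r}"
        )
    return celsius_name.replace(CELSIUS_TOKEN, FAHRENHEIT_TOKEN)
```

A helper like this could be called in place of the inline col.replace(...) so that a schema column tagged with °C units but named without "_deg_c" fails loudly.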

Check that the code passes lint and typechecks

tox run-parallel --quiet -f py312 lint -f py312 typecheck

Ensure that the previously passing unit tests still pass

tox run -e py312-package-unit-runner -- --remote-data=any

Write and run test_transform.py

Success

Write unit tests to ensure the transformation works as intended. The tests can use test data from the data/tests/ directory or simple hard-coded test data.

Write unit tests to confirm that columns with °C metadata units are converted to °F, their column names change, and other columns are untouched. Use a simple hard-coded dataframe for test data.

test_transform.py
diff --git a/tests/able_weather/datasets/weather/open_meteo/runner/test_transform.py b/tests/able_weather/datasets/weather/open_meteo/runner/test_transform.py
index e69de29..0fc611c 100644
--- a/tests/able_weather/datasets/weather/open_meteo/runner/test_transform.py
+++ b/tests/able_weather/datasets/weather/open_meteo/runner/test_transform.py
@@ -0,0 +1,45 @@
+import pandas as pd
+
+from able_weather.datasets.weather.open_meteo.runner import transform
+
+
+def make_sample_df() -> pd.DataFrame:
+    """Create a tiny dataframe with temperatures in Celsius."""
+    return pd.DataFrame(
+        {
+            "date": pd.date_range("2023-01-01", periods=2, freq="h", tz="UTC"),
+            "temperature_deg_c_2m": [0.0, 100.0],
+            "apparent_temperature_deg_c": [0.0, 10.0],
+            "dew_point_temperature_deg_c_2m": [0.0, 5.0],
+            "relative_humidity_2m": [100.0, 50.0],
+        }
+    )
+
+
+def test_celsius_to_fahrenheit() -> None:
+    """Verify basic Celsius→Fahrenheit conversion."""
+    assert transform.celsius_to_fahrenheit(0.0) == 32.0
+    assert transform.celsius_to_fahrenheit(100.0) == 212.0
+
+
+def test_transform_temperature_conversion() -> None:
+    """Temperature columns should be converted and renamed."""
+    df = make_sample_df()
+    result = transform.transform(df.copy())
+
+    # temperature_deg_c_2m should be converted and renamed
+    assert "temperature_deg_f_2m" in result.columns
+    assert "temperature_deg_c_2m" not in result.columns
+    assert result.loc[0, "temperature_deg_f_2m"] == 32.0
+    assert result.loc[1, "temperature_deg_f_2m"] == 212.0
+
+    # Other Celsius columns should also be converted and renamed
+    assert result.loc[0, "apparent_temperature_deg_f"] == 32.0
+    assert result.loc[1, "apparent_temperature_deg_f"] == 50.0
+    assert result.loc[0, "dew_point_temperature_deg_f_2m"] == 32.0
+    assert result.loc[1, "dew_point_temperature_deg_f_2m"] == 41.0
+
+    # Relative humidity should remain unchanged
+    assert "relative_humidity_2m" in result.columns
+    assert result.loc[0, "relative_humidity_2m"] == 100.0
+    assert result.loc[1, "relative_humidity_2m"] == 50.0
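
The tests above compare floats with ==, which works here because the chosen inputs (0, 5, 10, and 100 °C) convert to values that are exactly representable. For other inputs, a tolerance-based comparison may be safer. The following is a minimal sketch using only the standard library. The helper assert_close and its default tolerance are assumptions for illustration, and the conversion function is repeated so the block is self-contained.

```python
import math


def celsius_to_fahrenheit(celsius: float) -> float:
    # Same formula as in transform.py, repeated to keep this block standalone.
    return (celsius * 9 / 5) + 32


def assert_close(actual: float, expected: float, abs_tol: float = 1e-9) -> None:
    """Fail when actual differs from expected by more than abs_tol."""
    assert math.isclose(actual, expected, rel_tol=0.0, abs_tol=abs_tol), (
        f"{actual!r} is not within {abs_tol} of {expected!r}"
    )


def test_celsius_to_fahrenheit_with_tolerance() -> None:
    """Check conversions without relying on exact float equality."""
    assert_close(celsius_to_fahrenheit(37.0), 98.6)
    assert_close(celsius_to_fahrenheit(-40.0), -40.0)
```

In a pytest suite, pytest.approx offers a similar tolerance-based comparison.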

Then run the tests and debug the functionality if needed. This unit test does not require the remote data, so --remote-data=any can be omitted.

tox run -e py312-package-unit-runner

Commit and CI

Commit the changes, push to GitHub, and ensure all the continuous integration tests pass. NOTE: The CI tests will skip any tests marked with remote-data.