Reproducing GRAIN Workflow
This page outlines the process to recreate GRAIN data for Egypt. The same process may be used to generate GRAIN canals for any other country by changing the related input files.
The GRAIN Git repository provides a Python Jupyter notebook titled 'GRAIN_sample_workflow.ipynb', which has been set up to be run cell by cell to generate the GRAIN data. It can be found in src->sample_workflow.
I.Python Environment set up
All Python modules required to run the workflow notebook are listed in the grain_environment.yml file. The .yml file can be used with either mamba or conda to create the environment using the following bash command:
This command requires a working installation of conda or mamba (Python environment managers) on your machine.
II.Prerequisite Data
The sample workflow requires some prerequisite data to run: OpenStreetMap (OSM) data for Egypt, the trained GRAIN ML model, and other supporting datasets. They are hosted on Zenodo and need to be downloaded and unzipped into the sample_workflow folder.
Zenodo Link: https://zenodo.org/records/17608198
The following Bash commands can also be used to download and unzip the data:
cd src/sample_workflow
curl -L -o GRAIN_sample_data_EGYPT.zip "https://zenodo.org/records/17608198/files/GRAIN_sample_data_EGYPT.zip?download=1"
unzip GRAIN_sample_data_EGYPT.zip
Once the file has been downloaded and unzipped, ensure that there is now a folder named GRAIN_sample_data_EGYPT inside the sample_workflow folder.
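For readers who prefer to stay in Python, the download, unzip, and folder check described above can be sketched with the standard library alone. The URL and archive name are taken from the Bash command; the function names (`fetch_sample_data`, `extract_archive`, `find_missing`) are hypothetical helpers and are not part of the GRAIN repository.

```python
import urllib.request
import zipfile
from pathlib import Path

# URL copied from the curl command above.
ZENODO_URL = (
    "https://zenodo.org/records/17608198/files/"
    "GRAIN_sample_data_EGYPT.zip?download=1"
)


def extract_archive(zip_path, target_dir):
    """Unzip zip_path into target_dir and return target_dir as a Path."""
    target_dir = Path(target_dir)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target_dir)
    return target_dir


def find_missing(data_dir, expected):
    """Return the entries of `expected` that do not exist under data_dir."""
    data_dir = Path(data_dir)
    return [name for name in expected if not (data_dir / name).exists()]


def fetch_sample_data(workflow_dir="."):
    """Download the Zenodo archive into workflow_dir and unzip it there."""
    zip_path = Path(workflow_dir) / "GRAIN_sample_data_EGYPT.zip"
    urllib.request.urlretrieve(ZENODO_URL, zip_path)
    return extract_archive(zip_path, workflow_dir)
```

Calling `fetch_sample_data()` from inside src/sample_workflow should leave a GRAIN_sample_data_EGYPT folder there. `find_missing("GRAIN_sample_data_EGYPT", ["egypt_waterway.parquet", "ML_model_random_forest.pkl"])` then returns an empty list when those two inputs are in place.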
III.Running the GRAIN sample workflow
- Open the GRAIN_sample_workflow.ipynb Jupyter notebook and set the kernel to the GRAIN Python environment that contains the required modules.
- Run cell [3] ##imports to import all the required modules. Ensure that the Python scripts feature_engineering.py and GRAIN_helper_functions.py are present in the sample_workflow folder.
- Cell [4] may be run to suppress any warnings that would otherwise be printed. This is optional.
- Cell [5] specifies the file and folder paths to all the required prerequisite data. Ensure that the data has been downloaded as described in step II, and that the GRAIN_sample_data_EGYPT folder is present in sample_workflow.
## File paths for input data
egypt_osm_waterways_fp = './GRAIN_sample_data_EGYPT/egypt_waterway.parquet' #(1)!
ml_model_fp = './GRAIN_sample_data_EGYPT/ML_model_random_forest.pkl'
sword_data_folder = './GRAIN_sample_data_EGYPT/SWORD_v16_shp/'
sword_fileName_format = '{}_sword_reaches_hb{}_v16.shp'
hydrobasins_l2_folder = './GRAIN_sample_data_EGYPT/HydroBasins_world_L2'
hydrobasin_l6_file = './GRAIN_sample_data_EGYPT/hydrobasins_allBasins_l6_geoParquet_EPSG4326.parquet'
hydrobasins_l2_fileName_format = 'hybas_{}_lev02_v1c.shp'
world_countries_filePath = './GRAIN_sample_data_EGYPT/world-administrative-boundaries.geojson'
sword_continent_map = './GRAIN_sample_data_EGYPT/sword_continents.json'
koppen_class_map = './GRAIN_sample_data_EGYPT/koppen_class_label.json'
koppen_geiger_fp = './GRAIN_sample_data_EGYPT/koppen_geiger_0p00833333.cog'
dem_cog_fp = './GRAIN_sample_data_EGYPT/dem_data/World_e-Atlas-UCSD_SRTM30-plus_v8.cog'
esa_cci_cog_path = "./GRAIN_sample_data_EGYPT/ESACCI-LC-L4-LCCS-Map-300m-P1Y-2015-v2.0.7.cog"
##Output folder path
save_path = './GRAIN_sample_data_EGYPT/sample_outputs/' #(2)!
- (1) See the 'Downloading and Processing OSM PBF data' section at the end of this page to understand how to obtain the OSM parquet files.
- (2) You can change save_path to any other folder of your choice.
- Run cells [6] and [7], which declare functions to perform topology-based promotion of canal segments and to assign the canal use case.
- Run cell [8]. It contains the core function that runs the GRAIN ML model for a given country. Additional code may be added after line 279 of cell [8] to export the GRAIN data in further formats. For instance, to export a GeoJSON file, add the corresponding export line there.
- Run cell [9], which calls the run_grain_ml_model function. If you are recreating GRAIN data for a country other than Egypt, change the country name in line 3 of this cell.
This should create the GRAIN canal files in .parquet, .shp, and any other user-specified formats within the folder specified in save_path.
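As an illustration of the optional extra export mentioned for cell [8], the following is a minimal sketch rather than the notebook's actual code. It assumes the final canal layer is a GeoPandas GeoDataFrame, whose `to_file(path, driver=...)` method writes formats such as GeoJSON. The helper name `export_extra_formats`, the `canals` argument, and the `stem` file name are hypothetical, since the variable names used inside cell [8] are not shown on this page.

```python
from pathlib import Path


def export_extra_formats(canals, save_path, stem="GRAIN_canals", drivers=None):
    """Write `canals` once per driver into save_path; return the paths written.

    `canals` is assumed to be a geopandas GeoDataFrame (any object with a
    compatible `to_file(path, driver=...)` method works).
    """
    if drivers is None:
        # Map of driver name -> file suffix; GeoJSON only by default.
        drivers = {"GeoJSON": ".geojson"}
    out_dir = Path(save_path)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for driver, suffix in drivers.items():
        out_file = out_dir / f"{stem}{suffix}"
        canals.to_file(out_file, driver=driver)
        written.append(out_file)
    return written
```

Inside the notebook, this could be called with the GeoDataFrame that cell [8] already writes to .parquet and .shp, passing the same save_path. The single-line equivalent is a direct `to_file` call with `driver="GeoJSON"` on that GeoDataFrame.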
IV.Downloading and Processing OSM PBF data
OSM data for any country not present in GRAIN, or updated data for existing countries, can be downloaded from Geofabrik.de. Ensure that country-level files are downloaded. These can be accessed by clicking on the continent names on the Geofabrik download page.

The OSM data provided within the prerequisite data folder contains only the waterways and is in .parquet format. Note that conversion to .parquet is not strictly necessary for smaller countries. However, it is highly recommended for larger OSM files due to their file size. This can be achieved as follows.
- Filtering for waterways: Use the Osmium Toolkit for processing OSM data. Osmium-tool is a command-line utility for filtering, converting, and extracting information from large .osm.pbf files efficiently.
  Once installed on your machine, the OSM data can be filtered for waterways using the following terminal command. Example shown for Egypt.
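The filtering step can also be driven from Python. The sketch below builds an `osmium tags-filter` call and runs it with `subprocess`. It rests on several assumptions: the input name `egypt-latest.osm.pbf` follows Geofabrik's usual naming, the filter expression `nwr/waterway` (nodes, ways, and relations that carry a waterway key) may differ from the expression used to build the published GRAIN inputs, and both function names are hypothetical.

```python
import subprocess


def build_waterway_filter_cmd(input_pbf, output_pbf):
    """Return the osmium command that keeps objects tagged with `waterway`."""
    return [
        "osmium", "tags-filter",
        str(input_pbf),
        "nwr/waterway",      # assumed filter: any object with a waterway key
        "-o", str(output_pbf),
        "--overwrite",       # replace the output file if it already exists
    ]


def filter_waterways(input_pbf, output_pbf):
    """Run the filter; raises CalledProcessError if osmium reports an error."""
    subprocess.run(build_waterway_filter_cmd(input_pbf, output_pbf), check=True)
```

For example, `filter_waterways("egypt-latest.osm.pbf", "egypt_waterway.osm.pbf")` would produce the filtered file used in the conversion steps, provided osmium-tool is installed and on the PATH.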
This file can now be converted to .parquet format in several ways. Note that osmium-tool itself cannot export directly to Parquet, as it only supports OSM-native formats (e.g., .osm, .osc, .pbf) and a few export formats such as GeoJSON. Therefore, one of the following tools must be used to convert the filtered .osm.pbf file into a GeoParquet file suitable for the GRAIN workflow:
- Using QGIS
  The filtered file egypt_waterway.osm.pbf can be loaded into QGIS directly. QGIS automatically parses OSM geometry and attributes. Once loaded:
  - Open the Layers panel
  - Right-click the waterways layer → Export → Save Features As…
  - Select GeoParquet as the output format
  - Save as egypt_waterway.parquet
- Using Python and pyrosm / GeoPandas
  Python provides a fully programmatic way to convert OSM PBF files into GeoParquet. For example:
from pyrosm import OSM

osm = OSM("egypt_waterway.osm.pbf")
# Extract every feature that carries a waterway tag
waterways = osm.get_data_by_custom_criteria(custom_filter={"waterway": True}, filter_type="keep", keep_nodes=False)
# Save to GeoParquet
waterways.to_parquet("egypt_waterway.parquet")
- Using osmium export (optional)
  Osmium provides osmium export to generate .geojson or NDJSON. Although it cannot write Parquet directly, you can convert the GeoJSON output to Parquet via Python:
osmium export egypt_waterway.osm.pbf -o egypt_waterway.geojson
  Then in Python:
import geopandas as gpd

gdf = gpd.read_file("egypt_waterway.geojson")
gdf.to_parquet("egypt_waterway.parquet")
  This two-step method is useful if pyrosm is not available.
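Before converting the GeoJSON export to Parquet, it can be useful to confirm that it contains the waterway types you expect, such as canals. The following standard-library sketch counts features per `waterway` tag value. It assumes the export is a single GeoJSON FeatureCollection with OSM tags stored as feature properties, and the function name `count_waterway_types` is hypothetical. The whole file is loaded into memory, so this suits small and medium-sized countries only.

```python
import json
from collections import Counter


def count_waterway_types(geojson_path):
    """Count features per `waterway` tag value in a GeoJSON FeatureCollection."""
    with open(geojson_path, encoding="utf-8") as fh:
        collection = json.load(fh)
    counts = Counter()
    for feature in collection.get("features", []):
        # `properties` may be null in GeoJSON, so fall back to an empty dict.
        props = feature.get("properties") or {}
        counts[props.get("waterway", "<untagged>")] += 1
    return counts
```

Calling `count_waterway_types("egypt_waterway.geojson")` returns a `Counter` keyed by tag value, which makes an empty or unexpectedly small canal count easy to spot before running the workflow.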