# RDM Pipeline

The Robust Decision Making (RDM) pipeline is the core of OSeMOSYS-RDM, enabling systematic exploration of uncertainty in energy system models.

## What is RDM?

Robust Decision Making is a decision support methodology that:

- Explores a wide range of plausible futures
- Identifies strategies that perform well across many scenarios
- Helps decision-makers understand vulnerabilities
- Supports adaptive policy design

## Pipeline Stages

### 1. Base Future Generation

The base future (Future 0) establishes the reference scenario.

```bash
python scripts/run_base_future.py
```

**What happens:**

1. Reads scenario from `src/workflow/0_Scenarios/`
2. Extracts model structure to `B1_Model_Structure.xlsx`
3. Solves the optimization problem
4. Exports results to CSV format

### 2. Uncertainty Sampling

Latin Hypercube Sampling (LHS) generates parameter combinations.

**Why LHS?**

- Ensures uniform coverage of the uncertainty space
- More efficient than random sampling
- Stratified sampling across all dimensions

**Configuration:**

```
# In Interface_RDM.xlsx, Setup sheet:
Number_of_Runs: 100  # Number of futures to generate
```

### 3. Future Generation and Solving

Each future is created by modifying baseline parameters.

```python
# Pseudocode of the process
for future_id in range(1, N+1):
    # Generate parameter modifications from LHS sample
    modifications = apply_uncertainties(baseline, lhs_sample[future_id])

    # Create scenario file
    scenario_file = create_scenario_file(modifications, future_id)

    # Preprocess data file (add sets, compute CRF/PvAnnuity)
    preprocess_data(scenario_file)

    # Solve optimization
    solve(scenario_file, solver)

    # Export results
    export_results(future_id)
```

#### Automatic Data Preprocessing

Before each future is solved, the data file is automatically pre-processed by `preprocess_data.py`. This step:

1. Parses bracket-format parameters (`OutputActivityRatio`, `InputActivityRatio`, `EmissionActivityRatio`, etc.)
2. Calculates `CapitalRecoveryFactor` and `PvAnnuity` for each technology
3. Adds preprocessed sets (`MODExTECHNOLOGYperFUELout`, `MODExTECHNOLOGYperFUELin`, `MODEperTECHNOLOGY`, etc.)

This is required for OSeMOSYS model formulation v5.4 and reduces matrix generation time.

#### EV UDC Sign Correction

When using User-Defined Constraints (UDC) to model EV penetration caps, LHS perturbations can flip coefficient signs, causing infeasibility. The workflow includes automatic post-perturbation sign correction:

- Scans `UDCMultiplierTotalCapacity`, `UDCMultiplierNewCapacity`, and `UDCMultiplierActivity`
- Ensures conventional technology coefficients (e.g., diesel/gasoline) remain **negative**
- Ensures electric technology coefficients remain **positive**
- Clamps flipped values to a small epsilon (0.001)

This is configured via three fields in `Interface_RDM.xlsx` → `Setup` sheet:

| Field | Description | Example |
|---|---|---|
| `EV_Conventional_Patterns` | Semicolon-separated substrings for conventional technologies | `DSL;DSH;GSL` |
| `EV_Electric_Pattern` | Substring for electric technologies | `ELC` |
| `EV_UDCs` | Semicolon-separated EV penetration UDC names | `2TRAHTREVCAP;2TRALTREVCAP` |

If any of these fields is empty, the correction is skipped. Correction logs are saved to `Experimental_Platform/Logs/UDC_Corrections/`.

### 4. Result Aggregation

Results are consolidated into unified datasets across two stages:

- **`rdm_experiment` stage:** generates `OSEMOSYS_{Region}_Energy_Input.csv` immediately after all futures are solved
- **`postprocess` stage:** generates `OSEMOSYS_{Region}_Energy_Output.csv`

```
src/Results/
├── OSEMOSYS_{Region}_Energy_Input.csv    # Generated in rdm_experiment
├── OSEMOSYS_{Region}_Energy_Output.csv   # Generated in postprocess
└── *.parquet                             # Efficient intermediate storage
```

## Configuring Uncertainties

### Uncertainty Table Structure

Define uncertainties in `Interface_RDM.xlsx` → `Uncertainty_Table` sheet:

```{list-table}
:header-rows: 1

* - Column
  - Description
  - Example
* - X_Num
  - Unique ID
  - 1, 2, 3...
* - X_Category
  - Grouping category
  - "Fuel Costs"
* - Min_Value
  - Lower bound
  - 0.8
* - Max_Value
  - Upper bound
  - 1.2
* - X_Mathematical_Type
  - Variation method
  - "Time_Series"
```

### Mathematical Types

#### Time_Series

Interpolates from current value to a modified final value:

```
Current trajectory: 2025: 100 → 2050: 200
With multiplier 1.2: 2025: 100 → 2050: 240
```

#### Constant

Maintains constant value from the uncertainty start year:

```
Original: 2025: 100 → 2030: 100 → 2050: 100
With initial year 2030: 2025: 100 → 2030: 100 → 2050: 100
```

#### Linear

Linear interpolation to final value:

```
Original: 2025: 100 → 2050: 200
Modified: 2025: 100 → 2050: 240 (linear path)
```

#### Logistic

S-curve (sigmoid) trajectory for technology adoption:

```
Slow adoption at start, accelerating in middle, saturating at end
```

#### Timeslices_Curve

Switches between predefined demand curves:

```
Selects a different curve from shape_of_demand.xlsx based on uncertainty sampling
```

```{important}
For `Timeslices_Curve` type:
- Curves must be predefined in the file `shape_of_demand.xlsx`
- The parameter `Explored_Parameter_of_X` must be set to `Change_Curve`
- This allows exploring different demand profile shapes across futures
```

### Example: Fuel Cost Uncertainty
```yaml
X_Num: 1
X_Category: "Fuel Costs"
X_Plain_English_Description: "Natural gas price uncertainty"
X_Mathematical_Type: "Time_Series"
Explored_Parameter_of_X: "Final_Value"
Min_Value: 0.7
Max_Value: 1.5
Involved_Scenarios: "Scenario1"
Involved_First_Sets_in_Osemosys: "NATGAS"
Exact_Parameters_Involved_in_Osemosys: "VariableCost"
Initial_Year_of_Uncertainty: 2025
```

### Example: Technology Capacity Uncertainty

```yaml
X_Num: 2
X_Category: "Technology Limits"
X_Plain_English_Description: "Solar PV maximum capacity"
X_Mathematical_Type: "Time_Series"
Explored_Parameter_of_X: "Final_Value"
Min_Value: 0.5
Max_Value: 2.0
Involved_Scenarios: "Scenario1"
Involved_First_Sets_in_Osemosys: "PWRSOL001 ; PWRSOL002"
Exact_Parameters_Involved_in_Osemosys: "TotalAnnualMaxCapacity"
Initial_Year_of_Uncertainty: 2025
```

## Running the RDM Pipeline

### Quick Start

```bash
python run.py rdm
```

### Monitoring Progress

The pipeline provides progress updates:

```
======================================================================
🔬 RDM Pipeline (Robust Decision Making)
======================================================================
Stages: base_future → rdm_experiment → postprocess
======================================================================

🔄 Executing RDM Pipeline...
----------------------------------------------------------------------
Step 1 finished
Step 2 finished
...
# This is future: 1 and scenario Scenario1
# This is future: 2 and scenario Scenario1
...
----------------------------------------------------------------------
✅ RDM Pipeline completed in 15m 32s!
```

## Parallel Execution

RDM experiments are parallelized for efficiency. Each future launches its own system process (solver), so the number of futures you can run simultaneously depends on your CPU threads, RAM, and the solver you are using.
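As a rough illustration of the idea (not the project's actual scheduler), the sketch below caps the number of futures in flight with a worker pool. The names `run_futures`, `solve_one`, and `fake_solve` are hypothetical; a real worker would launch the solver process for its future and wait for it to finish.

```python
from concurrent.futures import ThreadPoolExecutor


def run_futures(future_ids, solve_one, parallel_use):
    """Run solve_one(future_id) for every future, at most parallel_use at a time."""
    ids = list(future_ids)
    # Each worker thread is expected to block while an external solver
    # process runs, so threads are enough to bound the concurrency.
    with ThreadPoolExecutor(max_workers=parallel_use) as pool:
        statuses = list(pool.map(solve_one, ids))
    return dict(zip(ids, statuses))


def fake_solve(future_id):
    # Hypothetical stand-in: a real worker would start the solver here
    # (for example with subprocess.run) and report its final status.
    return "optimal"


results = run_futures(range(1, 6), fake_solve, parallel_use=2)
print(results)
```

With `parallel_use=2`, no more than two futures are processed at once, regardless of how many are queued.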
### Configuration

```
# In Interface_RDM.xlsx, Setup sheet:
Parallel_Use: 10          # Futures processed simultaneously
Threads_CPLEX_Gurobi: 4   # Threads per solve (CPLEX/Gurobi only)
Time_CBC: 3600            # Max solve time in seconds (CBC only)
```

### Calculating `Parallel_Use` Based on Your Machine

The key resource is the number of **logical CPU threads** available. You can check this with:

- **Windows:** Task Manager → Performance → CPU → "Logical processors"
- **Linux:** `nproc` or `lscpu`
- **macOS:** `sysctl -n hw.logicalcpu`

You must always **reserve threads for the operating system and background processes** (typically 2–4 threads). The formula depends on whether your solver uses single or multiple threads:

#### Single-thread solvers (CBC, GLPK)

These solvers use exactly **1 thread per future**:

```
Parallel_Use = Total_CPU_Threads - Reserved_Threads
```

**Example:** A machine with 16 threads, reserving 4:

```
Parallel_Use = 16 - 4 = 12
```

#### Multi-thread solvers (CPLEX, Gurobi)

These solvers use **multiple threads per future** (configured via `Threads_CPLEX_Gurobi`). Round the result down to a whole number:

```
Parallel_Use = (Total_CPU_Threads - Reserved_Threads) / Threads_CPLEX_Gurobi
```

**Example:** A machine with 16 threads, reserving 4, with `Threads_CPLEX_Gurobi = 4`:

```
Parallel_Use = (16 - 4) / 4 = 3
```

#### Summary Table

| Machine Threads | Reserved | Solver | Threads per Solve | Max `Parallel_Use` |
|:-:|:-:|--------|:-:|:-:|
| 8 | 2 | CBC/GLPK | 1 | 6 |
| 8 | 2 | CPLEX/Gurobi | 2 | 3 |
| 16 | 4 | CBC/GLPK | 1 | 12 |
| 16 | 4 | CPLEX/Gurobi | 4 | 3 |
| 32 | 4 | CBC/GLPK | 1 | 28 |
| 32 | 4 | CPLEX/Gurobi | 4 | 7 |
| 64 | 4 | CBC/GLPK | 1 | 60 |
| 64 | 4 | CPLEX/Gurobi | 8 | 7 |

```{warning}
**Do not exceed your machine's capacity.** If `Parallel_Use × Threads_per_Solve` exceeds the available CPU threads, the operating system will over-subscribe the CPU.
This causes heavy context switching, degrades solver performance significantly (each individual solve takes much longer), and can make the machine unresponsive, potentially requiring a forced restart. Always leave threads free for the OS and other processes.
```

### Memory Considerations

Besides CPU threads, each parallel future also consumes RAM. A rough guide:

| Parallel_Use | RAM Needed | Speed |
|:-:|------------|-------|
| 1 | ~4 GB | Slowest |
| 5 | ~8 GB | Moderate |
| 10 | ~16 GB | Fast |
| 20 | ~32 GB | Fastest |

```{note}
RAM usage depends heavily on model size (number of technologies, time slices, years). For large models, monitor memory usage during the first batch and adjust `Parallel_Use` accordingly.
```

## Output Structure

### Per-Future Outputs

Each future generates:

```
Experimental_Platform/Futures/Scenario1/Scenario1_1/
├── Scenario1_1.txt              # Modified scenario file
├── Scenario1_1_Input.parquet    # Input parameters
├── Scenario1_1_Output.parquet   # Solution outputs
└── Scenario1_1_Output.sol       # Raw solver output
```

### Aggregated Outputs

After pipeline completion:

```
src/Results/
├── OSEMOSYS_{Region}_Energy_Input.csv (from rdm_experiment stage)
│   └── Columns: Strategy, Future.ID, YEAR, Parameter, Value, ...
├── OSEMOSYS_{Region}_Energy_Output.csv (from postprocess stage)
│   └── Columns: Strategy, Future.ID, YEAR, TECHNOLOGY, Value, ...
└── *.parquet (efficient intermediate storage)
```

## Solution Status Report

After all futures are solved, the pipeline automatically generates a **solution status report** at `Results/solution_status.txt`. This file summarizes whether each future reached an optimal solution, was infeasible, or ended with another status.

### Report Format

```
Solution Status Summary
========================================
Total futures: 101
Optimal: 98
Infeasible: 3
----------------------------------------
Scenario1_0: optimal
Scenario1_1: optimal
Scenario1_2: infeasible
...
```

The report covers both the base future (`Scenario*_0`) and all RDM futures. Supported solver output formats are CPLEX (XML), Gurobi, CBC, and GLPK.

### What Happens Internally

1. After all futures finish solving, `check_sol_status.py` scans the `.sol` files in each future's directory.
2. It extracts the solution status from solver-specific output formats.
3. It writes the summary to `Results/solution_status.txt`.
4. It deletes the `.sol` and `.lp` files to free disk space (these files can be very large for big models).

```{note}
If a future is infeasible, its output Parquet files will not contain valid results. Check the solution status report to identify and exclude infeasible futures from downstream analysis.
```

## Best Practices

### 1. Start Small

```
# Begin with 10-20 futures to test configuration
Number_of_Runs: 20
```

### 2. Validate Base Future

Before running many futures:

- Check base future results manually
- Verify model solves correctly
- Confirm outputs make sense

### 3. Use Appropriate Ranges

```
# Too narrow: misses important outcomes
Min_Value: 0.99, Max_Value: 1.01  # ❌

# Too wide: includes implausible futures
Min_Value: 0.1, Max_Value: 10.0   # ❌

# Reasonable range
Min_Value: 0.7, Max_Value: 1.3    # ✅
```

### 4. Group Related Uncertainties

Use `X_Category` to organize:

- Fuel Costs
- Technology Costs
- Demand Growth
- Policy Constraints

### 5. Document Your Choices

Keep notes on:

- Why specific ranges were chosen
- Data sources for uncertainty bounds
- Assumptions made
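
When documenting a run, it can also help to record which futures were excluded from analysis. The following is a minimal sketch, assuming the per-future lines of the solution status report follow the `name: status` layout shown in the Report Format section and appear after the dashed separator. The function name `non_optimal_futures` is hypothetical and not part of the pipeline.

```python
def non_optimal_futures(report_text):
    """Return {future_name: status} for every future whose status is not 'optimal'."""
    excluded = {}
    in_details = False
    for raw in report_text.splitlines():
        line = raw.strip()
        # Assumption: per-future lines start after the dashed separator.
        if line.startswith("---"):
            in_details = True
            continue
        if not in_details or ":" not in line:
            continue
        name, status = (part.strip() for part in line.rsplit(":", 1))
        if status.lower() != "optimal":
            excluded[name] = status
    return excluded


sample_report = """Solution Status Summary
========================================
Total futures: 101
Optimal: 98
Infeasible: 3
----------------------------------------
Scenario1_0: optimal
Scenario1_1: optimal
Scenario1_2: infeasible
...
"""

excluded = non_optimal_futures(sample_report)
print(excluded)
```

The resulting mapping can be kept alongside the notes on ranges and assumptions so that downstream analysis filters out the same futures every time.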