Action & Observation Models
The environment uses Pydantic models for type-safe, validated data exchange between agent and environment. Both MethanolAPCAction and MethanolAPCObservation inherit from OpenEnv's base Action and Observation classes.
from methanol_apc_env.models import MethanolAPCAction, MethanolAPCObservation
MethanolAPCAction
The agent sends an action every step. It contains 13 continuous control variables organized into 5 subsystems:
Feed Controls (required — no defaults)
| Field | Type | Range | Unit | What It Controls |
|---|---|---|---|---|
| feed_rate_h2 | float | 0–10 | mol/s | Hydrogen feed to synthesis reactor. Higher = more reactant available. |
| feed_rate_co | float | 0–5 | mol/s | Carbon monoxide feed. Ideal H₂/CO ratio is 2.0 for methanol synthesis. |
| cooling_water_flow | float | 0–100 | L/min | Shell-side cooling water rate. The primary tool for temperature control. |
| compressor_power | float | 0–100 | kW | Syngas compressor. Higher power → higher reactor pressure → faster reaction. |
These 4 fields are required — the agent must always specify them.
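As a minimal sketch of the H₂/CO guidance in the table, the helper below derives a CO feed from a chosen H₂ feed so that the ratio stays at the ideal 2.0. The function name `co_feed_for_ratio` is hypothetical and not part of the environment; the CO bounds are copied from the table.

```python
# Hypothetical helper, not part of methanol_apc_env.
CO_RANGE = (0.0, 5.0)  # mol/s, copied from the Feed Controls table


def co_feed_for_ratio(feed_rate_h2: float, target_ratio: float = 2.0) -> float:
    """Return the CO feed (mol/s) giving the target H2/CO ratio, clipped to the CO range."""
    if target_ratio <= 0:
        raise ValueError("target_ratio must be positive")
    co = feed_rate_h2 / target_ratio
    return min(max(co, CO_RANGE[0]), CO_RANGE[1])


print(co_feed_for_ratio(5.0))  # 2.5
```

If the requested ratio would need more CO than the 5 mol/s limit allows, the result is clipped, so the achieved ratio can end up above the target.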
Synthesis Loop Controls (optional — have safe defaults)
| Field | Type | Range | Default | Unit | What It Controls |
|---|---|---|---|---|---|
| purge_valve_position | float | 0–100 | 2.0 | % | Removes inert gases (N₂, CH₄, Ar) from recycle loop. Too high = wasted reactant. Too low = inert buildup reduces conversion. |
| recycle_ratio | float | 0–8 | 3.5 | — | Ratio of recycled gas to fresh feed. Industrial plants use 3–5. Higher = better per-pass utilization but more compressor load. |
| feed_preheat_temp | float | 0–300 | 200.0 | °C | Preheats feed gas before reactor entry. Must be high enough for catalyst activation (~200°C) but not so high it causes runaway. |
Reformer Controls (optional)
| Field | Type | Range | Default | Unit | What It Controls |
|---|---|---|---|---|---|
| reformer_fuel_gas | float | 0–20 | 5.0 | mol/s | Fuel to the SMR burner tubes. Controls how much syngas is produced. More fuel = hotter reformer = more H₂ + CO. |
| reformer_steam_flow | float | 0–50 | 15.0 | mol/s | Steam injection. Target steam-to-carbon ratio is ~3.0. Too low → carbon deposition (coking). Too high → wasted energy. |
Distillation Controls (optional)
| Field | Type | Range | Default | Unit | What It Controls |
|---|---|---|---|---|---|
| distillation_reflux | float | 0–10 | 3.0 | — | Column reflux ratio. Higher = purer methanol product but more reboiler energy needed. Industrial: 2–5. |
| reboiler_duty | float | 0–200 | 50.0 | kW | Heat input to distillation column bottom. Drives methanol-water separation. |
Safety Controls (optional)
| Field | Type | Range | Default | Unit | What It Controls |
|---|---|---|---|---|---|
| flare_valve | float | 0–100 | 0.0 | % | Emergency pressure relief. Opens to flare stack. Should be 0% in normal operation; any opening wastes product and emits CO₂. |
Safe Default Behavior
The action model includes a Pydantic model_validator that replaces a value of 0 in an optional field with that field's safe operating default. This prevents agents that only know about the 4 core variables from accidentally zeroing the reformer or distillation controls:
# Agent sends only the 4 core variables; optional fields get safe defaults
action = MethanolAPCAction(
    feed_rate_h2=5.0,
    feed_rate_co=2.5,
    cooling_water_flow=40.0,
    compressor_power=65.0,
    # purge_valve_position → 2.0 (not 0)
    # recycle_ratio → 3.5 (not 0)
    # reformer_fuel_gas → 5.0 (not 0)
    # etc.
)
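The zero-replacement rule can be sketched in plain Python, without Pydantic, as shown below. This is an illustration of the described behavior, not the package's validator: `SAFE_DEFAULTS` and `apply_safe_defaults` are hypothetical names, the defaults are copied from the tables above, and the real validator may cover a different set of fields.

```python
# Illustration only: a plain-Python stand-in for the zero-to-default rule.
# Defaults are copied from the optional-field tables; flare_valve is left out
# because its documented default is already 0.0.
SAFE_DEFAULTS = {
    "purge_valve_position": 2.0,
    "recycle_ratio": 3.5,
    "feed_preheat_temp": 200.0,
    "reformer_fuel_gas": 5.0,
    "reformer_steam_flow": 15.0,
    "distillation_reflux": 3.0,
    "reboiler_duty": 50.0,
}


def apply_safe_defaults(raw: dict) -> dict:
    """Return a copy of raw where missing or zero optional fields take their default."""
    filled = dict(raw)
    for name, default in SAFE_DEFAULTS.items():
        if filled.get(name, 0.0) == 0.0:
            filled[name] = default
    return filled


filled = apply_safe_defaults({"feed_rate_h2": 5.0, "recycle_ratio": 0.0})
print(filled["recycle_ratio"])  # 3.5
```

Under this reading, an explicit 0 is indistinguishable from an omitted field, so an agent cannot fully close, say, the purge valve by sending 0.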
Validation
All fields are bounded. Pydantic raises ValidationError if any value is outside its range:
# This raises ValidationError: feed_rate_h2 must be <= 10
action = MethanolAPCAction(feed_rate_h2=15.0, feed_rate_co=2.5,
                           cooling_water_flow=40.0, compressor_power=65.0)
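Because out-of-range values raise an error rather than being clipped, an agent that emits unbounded numbers (raw policy outputs, for example) may want to clip them before constructing the action. The sketch below assumes the bounds listed in the tables above; `ACTION_BOUNDS` and `clip_action` are hypothetical names and are not provided by the package.

```python
# Hypothetical pre-processing step; bounds copied from the action tables.
ACTION_BOUNDS = {
    "feed_rate_h2": (0.0, 10.0),
    "feed_rate_co": (0.0, 5.0),
    "cooling_water_flow": (0.0, 100.0),
    "compressor_power": (0.0, 100.0),
    "purge_valve_position": (0.0, 100.0),
    "recycle_ratio": (0.0, 8.0),
    "feed_preheat_temp": (0.0, 300.0),
    "reformer_fuel_gas": (0.0, 20.0),
    "reformer_steam_flow": (0.0, 50.0),
    "distillation_reflux": (0.0, 10.0),
    "reboiler_duty": (0.0, 200.0),
    "flare_valve": (0.0, 100.0),
}


def clip_action(raw: dict) -> dict:
    """Clip each value to its documented range; unknown field names raise KeyError."""
    clipped = {}
    for name, value in raw.items():
        low, high = ACTION_BOUNDS[name]
        clipped[name] = min(max(float(value), low), high)
    return clipped


print(clip_action({"feed_rate_h2": 15.0, "feed_rate_co": 2.5}))
# {'feed_rate_h2': 10.0, 'feed_rate_co': 2.5}
```

The clipped dictionary could then be passed as keyword arguments, e.g. `MethanolAPCAction(**clip_action(raw))`.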
MethanolAPCObservation
Returned by env.step() and env.reset(). Contains 30+ fields organized into categories:
Core Reactor Telemetry
| Field | Type | Unit | Description | Typical Range |
|---|---|---|---|---|
| temperature | float | °C | Reactor bulk temperature. Most important variable. | 220–270 normal, >300 = shutdown |
| pressure | float | bar | Reactor pressure. Controlled by compressor. | 50–100 |
| feed_rate_h2 | float | mol/s | Current H₂ feed (may differ from setpoint due to rate limits) | 0–10 |
| feed_rate_co | float | mol/s | Current CO feed | 0–5 |
| h2_co_ratio | float | — | H₂/CO molar ratio. Ideal = 2.0 for methanol stoichiometry | 1.5–3.0 |
| cooling_water_flow | float | L/min | Current cooling water rate | 0–100 |
| cooling_water_temp | float | °C | Cooling water inlet temperature. Varies with ambient conditions + disturbances. | 20–45 |
| catalyst_health | float | 0–1 | Catalyst relative activity. 1.0 = fresh, 0.0 = destroyed. Degrades irreversibly above 280°C. | 0.3–1.0 |
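The temperature thresholds in this table (220–270°C normal, irreversible catalyst degradation above 280°C, shutdown above 300°C) can be turned into a simple zone check. This is a minimal sketch: `temperature_zone` and its zone labels are hypothetical, and the environment's own safety logic may draw the boundaries differently.

```python
def temperature_zone(temperature_c: float) -> str:
    """Map a reactor temperature (°C) to a coarse zone label (labels are hypothetical)."""
    if temperature_c > 300.0:
        return "shutdown"          # above the documented shutdown limit
    if temperature_c > 280.0:
        return "catalyst_damage"   # catalyst degrades irreversibly here
    if temperature_c > 270.0:
        return "elevated"          # above the normal band, below the damage limit
    if temperature_c >= 220.0:
        return "normal"            # documented normal operating band
    return "cold"                  # below the normal band


print(temperature_zone(285.0))  # catalyst_damage
```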
Production & Economics
| Field | Type | Unit | Description |
|---|---|---|---|
| methanol_produced | float | kg | Cumulative methanol output this episode |
| reaction_rate | float | mol/s | Current methanol formation rate. Zero if no feed or shutdown. |
| profit_this_step | float | $ | Revenue minus costs for this timestep |
| cumulative_profit | float | $ | Total profit this episode |
| stoichiometric_number | float | — | SN = (H₂−CO₂)/(CO+CO₂). Target 2.0–2.05. Measures feed quality. |
| carbon_efficiency | float | 0–1 | Fraction of carbon in feed converted to methanol product |
| selectivity | float | 0–1 | Methanol selectivity (rest is DME + methyl formate byproducts) |
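To make the stoichiometric number concrete, the sketch below evaluates SN = (H₂−CO₂)/(CO+CO₂) for one set of molar flows. The function name and the example flows are illustrative assumptions; the environment computes `stoichiometric_number` itself and reports it in the observation.

```python
def stoichiometric_number(h2: float, co: float, co2: float) -> float:
    """Compute SN = (H2 - CO2) / (CO + CO2) from molar flows in consistent units."""
    carbon_oxides = co + co2
    if carbon_oxides <= 0:
        raise ValueError("CO + CO2 must be positive")
    return (h2 - co2) / carbon_oxides


# Illustrative flows in mol/s (not taken from the environment)
sn = stoichiometric_number(h2=5.0, co=2.0, co2=0.3)
print(f"SN = {sn:.3f}")  # SN = 2.043
```

With no CO₂ in the feed, SN reduces to the plain H₂/CO ratio, which is why both quantities share the target of about 2.0.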
Synthesis Loop
| Field | Type | Unit | Description |
|---|---|---|---|
| purge_rate | float | mol/s | Gas being purged from recycle loop |
| inert_fraction | float | 0–1 | Inert gas buildup in recycle. High = reduced partial pressures = lower conversion |
| recycle_ratio | float | — | Current actual recycle ratio |
Reformer
| Field | Type | Unit | Description |
|---|---|---|---|
| reformer_outlet_temp | float | °C | SMR tube outlet temperature. 700–900°C range. |
| steam_to_carbon | float | — | Current S/C molar ratio. Target ~3.0. |
| syngas_flow | float | mol/s | Total syngas produced by reformer |
Distillation
| Field | Type | Unit | Description |
|---|---|---|---|
| product_purity | float | 0–1 | Methanol mass fraction in product. Grade AA = 0.9985. |
| distillation_duty | float | kW | Total energy consumed by distillation column |
Safety & Utility
| Field | Type | Unit | Description |
|---|---|---|---|
| flare_flow | float | mol/s | Gas currently being flared. Should be ~0 in normal operation. |
| total_co2_emissions | float | kg | Cumulative CO₂ emitted (from combustion + flaring + reformer) |
| safety_warning | str or null | — | Natural-language safety alert. See Fault Detection for levels. |
| temperature_trend | float | °C/step | Rate of temperature change. Positive = heating up. |
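One way an agent might use `temperature_trend` is to estimate how many steps remain before the reactor reaches a limit, such as the 280°C catalyst degradation threshold. The sketch below assumes the trend stays constant (a linear extrapolation, which real reactor dynamics will not follow exactly); `steps_until_limit` is a hypothetical helper.

```python
import math
from typing import Optional


def steps_until_limit(temperature: float, temperature_trend: float,
                      limit: float = 280.0) -> Optional[int]:
    """Estimate steps until temperature reaches limit, assuming a constant trend.

    Returns 0 if already at or above the limit, and None if the temperature
    is steady or falling (the limit is never reached under this assumption).
    """
    if temperature >= limit:
        return 0
    if temperature_trend <= 0:
        return None
    return math.ceil((limit - temperature) / temperature_trend)


print(steps_until_limit(260.0, 2.5))  # 8
```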
Episode State
| Field | Type | Description |
|---|---|---|
| step_number | int | Current step in the episode (0-indexed) |
| max_steps | int | Total steps for this task (50–500 depending on task) |
| task_name | str | Name of the active task (e.g., "optimization", "startup") |
| done | bool | True if episode is over (inherited from OpenEnv Observation) |
| reward | float | Dense per-step reward in [0.01, 0.99] (inherited from OpenEnv Observation) |
| rubric_reward | float or null | RFC 004 rubric score. Dense during episode, trajectory score at terminal step. |
Using Observations
# Run inside an async function: step the environment and read the observation
obs = await env.step(action)

# Core monitoring
print(f"T={obs.temperature:.1f}°C P={obs.pressure:.1f} bar")
print(f"Catalyst: {obs.catalyst_health:.1%}")
print(f"Rate: {obs.reaction_rate:.4f} mol/s")

# Economics
print(f"Step profit: ${obs.profit_this_step:.2f}")
print(f"Total profit: ${obs.cumulative_profit:.2f}")

# Safety: safety_warning is None when there is no active alert
if obs.safety_warning:
    print(f"⚠ {obs.safety_warning}")

# Is the episode over?
if obs.done:
    print(f"Episode ended at step {obs.step_number}")
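Observations like these can feed a simple control rule. The sketch below adjusts cooling water in proportion to the temperature error and clips the result to the 0–100 L/min action range. It is an illustrative baseline rather than a recommended policy: the function name, the 250°C setpoint (chosen inside the documented 220–270°C normal band), and the gain are all assumptions.

```python
def next_cooling_water_flow(temperature: float, current_flow: float,
                            setpoint: float = 250.0, gain: float = 2.0) -> float:
    """Raise cooling when the reactor is above setpoint, lower it when below.

    The result is clipped to the documented cooling_water_flow range (0-100 L/min).
    """
    error = temperature - setpoint          # positive when the reactor is too hot
    flow = current_flow + gain * error      # incremental proportional adjustment
    return min(max(flow, 0.0), 100.0)


# Reactor 10 °C above setpoint with 40 L/min of cooling: increase to 60 L/min
print(next_cooling_water_flow(260.0, 40.0))  # 60.0
```

In a loop, the returned value would be passed as `cooling_water_flow` in the next `MethanolAPCAction`, using `obs.temperature` and `obs.cooling_water_flow` as inputs.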