Action & Observation Models

The environment uses Pydantic models for type-safe, validated data exchange between the agent and the environment. `MethanolAPCAction` and `MethanolAPCObservation` inherit from OpenEnv's base `Action` and `Observation` classes, respectively.

```python
from methanol_apc_env.models import MethanolAPCAction, MethanolAPCObservation
```

MethanolAPCAction

The agent sends one action per step. Each action contains 12 continuous control variables, grouped into 5 subsystems:

Feed Controls (required — no defaults)

| Field | Type | Range | Unit | What It Controls |
|---|---|---|---|---|
| `feed_rate_h2` | `float` | 0–10 | mol/s | Hydrogen feed to the synthesis reactor. Higher = more reactant available. |
| `feed_rate_co` | `float` | 0–5 | mol/s | Carbon monoxide feed. The ideal H₂/CO ratio for methanol synthesis is 2.0. |
| `cooling_water_flow` | `float` | 0–100 | L/min | Shell-side cooling water flow rate. The primary tool for temperature control. |
| `compressor_power` | `float` | 0–100 | kW | Syngas compressor power. Higher power → higher reactor pressure → faster reaction. |

These 4 fields are required — the agent must always specify them.
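The ideal H₂/CO ratio is 2.0, and each feed has its own cap. A policy can therefore derive `feed_rate_co` from `feed_rate_h2` instead of choosing both independently. The sketch below assumes you want to hold the ratio exactly while staying inside the documented ranges. `balanced_feed` is a hypothetical helper, not part of `methanol_apc_env`:

```python
def balanced_feed(h2: float, target_ratio: float = 2.0) -> tuple[float, float]:
    """Return (feed_rate_h2, feed_rate_co) at the target H2/CO ratio.

    Hypothetical helper: clips H2 to 0-10 mol/s, then derives CO.
    If CO would exceed its 5 mol/s cap, H2 is lowered so the ratio holds.
    """
    if target_ratio <= 0:
        raise ValueError("target_ratio must be positive")
    h2 = min(max(h2, 0.0), 10.0)
    co = h2 / target_ratio
    if co > 5.0:
        co = 5.0
        h2 = co * target_ratio  # reduce H2 to preserve the ratio
    return h2, co


print(balanced_feed(5.0))        # (5.0, 2.5)
print(balanced_feed(10.0, 1.5))  # CO cap binds: (7.5, 5.0)
```

At the default ratio of 2.0 the CO cap never binds, because 10 mol/s of H₂ needs exactly 5 mol/s of CO.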

Synthesis Loop Controls (optional — have safe defaults)

| Field | Type | Range | Default | Unit | What It Controls |
|---|---|---|---|---|---|
| `purge_valve_position` | `float` | 0–100 | 2.0 | % | Removes inert gases (N₂, CH₄, Ar) from the recycle loop. Too high wastes reactant; too low lets inerts build up and reduces conversion. |
| `recycle_ratio` | `float` | 0–8 | 3.5 | dimensionless | Ratio of recycled gas to fresh feed. Industrial plants use 3–5. Higher = better overall reactant utilization but more compressor load. |
| `feed_preheat_temp` | `float` | 0–300 | 200.0 | °C | Preheats the feed gas before it enters the reactor. Must be high enough for catalyst activation (~200 °C) but not so high that it causes a thermal runaway. |

Reformer Controls (optional)

| Field | Type | Range | Default | Unit | What It Controls |
|---|---|---|---|---|---|
| `reformer_fuel_gas` | `float` | 0–20 | 5.0 | mol/s | Fuel to the SMR furnace burners, which heat the reformer tubes. Controls how much syngas is produced: more fuel = hotter reformer = more H₂ + CO. |
| `reformer_steam_flow` | `float` | 0–50 | 15.0 | mol/s | Steam injection. Target steam-to-carbon (S/C) ratio is ~3.0. Too low → carbon deposition (coking); too high → wasted energy. |
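The observation reports the resulting `steam_to_carbon` ratio. One way to steer toward the ~3.0 target is to rescale the current steam flow. The sketch below assumes S/C scales linearly with `reformer_steam_flow` while the process carbon flow stays fixed. That assumption is not stated in this section. `steam_for_target_sc` is hypothetical:

```python
def steam_for_target_sc(current_steam: float, observed_sc: float,
                        target_sc: float = 3.0) -> float:
    """Suggest a reformer_steam_flow (mol/s) that hits the target S/C ratio.

    Assumes S/C = steam / carbon with the carbon flow held constant, so
    the implied carbon flow is current_steam / observed_sc.
    The result is clipped to the documented 0-50 mol/s range.
    """
    if observed_sc <= 0:
        raise ValueError("observed_sc must be positive")
    implied_carbon = current_steam / observed_sc
    return min(max(target_sc * implied_carbon, 0.0), 50.0)


# 15 mol/s of steam giving S/C = 2.5 implies 6 mol/s of carbon
print(steam_for_target_sc(15.0, 2.5))  # 18.0
```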

Distillation Controls (optional)

| Field | Type | Range | Default | Unit | What It Controls |
|---|---|---|---|---|---|
| `distillation_reflux` | `float` | 0–10 | 3.0 | dimensionless | Column reflux ratio. Higher = purer methanol product but more reboiler energy needed. Industrial: 2–5. |
| `reboiler_duty` | `float` | 0–200 | 50.0 | kW | Heat input at the bottom of the distillation column. Drives the methanol-water separation. |

Safety Controls (optional)

| Field | Type | Range | Default | Unit | What It Controls |
|---|---|---|---|---|---|
| `flare_valve` | `float` | 0–100 | 0.0 | % | Emergency pressure relief valve that opens to the flare stack. Should be 0% in normal operation; any opening wastes product and emits CO₂. |

Safe Default Behavior

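The action model includes a Pydantic `model_validator` that replaces a value of 0 in any optional field with that field's safe operating default. Omitted fields already receive their defaults. The validator also covers agents that send an explicit 0, such as an agent that fills unknown fields with zeros. This keeps agents that only know about the 4 core variables from accidentally shutting down the reformer or distillation column. As a side effect, an optional field cannot be set to exactly 0 (irrelevant for `flare_valve`, whose default is already 0). An agent that sends only the core variables still gets safe values for everything else: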

```python
from methanol_apc_env.models import MethanolAPCAction

# Agent only sends the 4 core variables; optional fields get safe defaults
action = MethanolAPCAction(
    feed_rate_h2=5.0,
    feed_rate_co=2.5,
    cooling_water_flow=40.0,
    compressor_power=65.0,
    # purge_valve_position → 2.0 (not 0)
    # recycle_ratio → 3.5 (not 0)
    # reformer_fuel_gas → 5.0 (not 0)
    # etc.
)
```
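The zero-to-default rule can be illustrated without Pydantic. The sketch below is a plain-Python stand-in that uses a dataclass `__post_init__` hook in place of Pydantic's `model_validator`. It is not the environment's actual code. `SketchAction` is hypothetical, it performs no range checks, and its defaults are copied from the tables above:

```python
from dataclasses import MISSING, dataclass, fields


@dataclass
class SketchAction:
    """Hypothetical stand-in for MethanolAPCAction (no range checks)."""
    # Required core fields: no defaults, never rewritten
    feed_rate_h2: float
    feed_rate_co: float
    cooling_water_flow: float
    compressor_power: float
    # Optional fields with the safe defaults from the tables above
    purge_valve_position: float = 2.0
    recycle_ratio: float = 3.5
    feed_preheat_temp: float = 200.0
    reformer_fuel_gas: float = 5.0
    reformer_steam_flow: float = 15.0
    distillation_reflux: float = 3.0
    reboiler_duty: float = 50.0
    flare_valve: float = 0.0

    def __post_init__(self) -> None:
        # Mimic the validator: an explicit 0 in any field that has a
        # default is replaced by that default (a no-op for flare_valve).
        for f in fields(self):
            if f.default is not MISSING and getattr(self, f.name) == 0:
                setattr(self, f.name, f.default)


action = SketchAction(5.0, 2.5, 40.0, 65.0, reformer_fuel_gas=0.0)
print(action.reformer_fuel_gas)  # 5.0
```

The sketch also shows the trade-off. Required fields keep an explicit 0, but an optional field can only hold 0 if its default is 0.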

Validation

All fields are bounded. Pydantic raises `ValidationError` if any value is outside its range:

```python
from methanol_apc_env.models import MethanolAPCAction

# This raises ValidationError: feed_rate_h2 must be <= 10
action = MethanolAPCAction(feed_rate_h2=15.0, feed_rate_co=2.5,
                           cooling_water_flow=40.0, compressor_power=65.0)
```
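Out-of-range values are rejected rather than clipped. A policy with unbounded raw outputs (for example, a Gaussian RL policy) may therefore want to clip them before building the action. The sketch below copies its bounds from the tables above. `ACTION_BOUNDS` and `clip_action` are hypothetical names, and the returned dict is meant to be unpacked as `MethanolAPCAction(**clipped)`:

```python
# Documented (low, high) range for every action field
ACTION_BOUNDS = {
    "feed_rate_h2": (0.0, 10.0),
    "feed_rate_co": (0.0, 5.0),
    "cooling_water_flow": (0.0, 100.0),
    "compressor_power": (0.0, 100.0),
    "purge_valve_position": (0.0, 100.0),
    "recycle_ratio": (0.0, 8.0),
    "feed_preheat_temp": (0.0, 300.0),
    "reformer_fuel_gas": (0.0, 20.0),
    "reformer_steam_flow": (0.0, 50.0),
    "distillation_reflux": (0.0, 10.0),
    "reboiler_duty": (0.0, 200.0),
    "flare_valve": (0.0, 100.0),
}


def clip_action(raw: dict[str, float]) -> dict[str, float]:
    """Clip each field into its documented range; reject unknown keys."""
    clipped = {}
    for name, value in raw.items():
        if name not in ACTION_BOUNDS:
            raise KeyError(f"unknown action field: {name}")
        low, high = ACTION_BOUNDS[name]
        clipped[name] = min(max(float(value), low), high)
    return clipped


print(clip_action({"feed_rate_h2": 15.0, "feed_rate_co": -1.0}))
# {'feed_rate_h2': 10.0, 'feed_rate_co': 0.0}
```

Clipping an optional field down to 0 still triggers the zero-to-default replacement described above, so the environment would see that field's default instead.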

MethanolAPCObservation

Returned by env.step() and env.reset(). Contains 30+ fields organized into categories:

Core Reactor Telemetry

| Field | Type | Unit | Description | Typical Range |
|---|---|---|---|---|
| `temperature` | `float` | °C | Reactor bulk temperature; the most important variable. | 220–270 normal; >300 triggers shutdown |
| `pressure` | `float` | bar | Reactor pressure, set by the compressor. | 50–100 |
| `feed_rate_h2` | `float` | mol/s | Current H₂ feed (may differ from the setpoint because of rate limits). | 0–10 |
| `feed_rate_co` | `float` | mol/s | Current CO feed. | 0–5 |
| `h2_co_ratio` | `float` | dimensionless | H₂/CO molar ratio. Ideal = 2.0 for methanol stoichiometry. | 1.5–3.0 |
| `cooling_water_flow` | `float` | L/min | Current cooling water flow rate. | 0–100 |
| `cooling_water_temp` | `float` | °C | Cooling water inlet temperature. Varies with ambient conditions and disturbances. | 20–45 |
| `catalyst_health` | `float` | fraction (0–1) | Relative catalyst activity: 1.0 = fresh, 0.0 = destroyed. Degrades irreversibly above 280 °C. | 0.3–1.0 |
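The thresholds in this table translate directly into a coarse status check. The sketch below uses only the documented numbers: 220–270 °C is normal, catalyst damage starts above 280 °C, and shutdown happens above 300 °C. The band names and `temperature_status` itself are hypothetical:

```python
def temperature_status(temp_c: float) -> str:
    """Classify reactor temperature against the documented thresholds."""
    if temp_c > 300.0:
        return "shutdown"          # >300 °C: shutdown per the table
    if temp_c > 280.0:
        return "catalyst_damage"   # irreversible catalyst degradation
    if 220.0 <= temp_c <= 270.0:
        return "normal"            # documented normal band
    if temp_c < 220.0:
        return "cold"              # below the normal band
    return "warm"                  # 270-280 °C: above normal, below damage


for t in (210.0, 245.0, 275.0, 290.0, 305.0):
    print(t, temperature_status(t))
```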

Production & Economics

| Field | Type | Unit | Description |
|---|---|---|---|
| `methanol_produced` | `float` | kg | Cumulative methanol output this episode. |
| `reaction_rate` | `float` | mol/s | Current methanol formation rate. Zero if there is no feed or the reactor has shut down. |
| `profit_this_step` | `float` | $ | Revenue minus costs for this timestep. |
| `cumulative_profit` | `float` | $ | Total profit this episode. |
| `stoichiometric_number` | `float` | dimensionless | SN = (H₂ − CO₂)/(CO + CO₂). Target 2.0–2.05. Measures feed quality. |
| `carbon_efficiency` | `float` | fraction (0–1) | Fraction of the carbon in the feed converted to methanol product. |
| `selectivity` | `float` | fraction (0–1) | Methanol selectivity; the remainder goes to DME and methyl formate byproducts. |
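The stoichiometric number extends the H₂/CO ratio to syngas that contains CO₂. The sketch below implements the formula from the table. With no CO₂, SN reduces to the plain H₂/CO ratio, which is why the 2.0–2.05 SN target lines up with the ideal H₂/CO ratio of 2.0. The function names are hypothetical:

```python
def stoichiometric_number(h2: float, co: float, co2: float = 0.0) -> float:
    """SN = (H2 - CO2) / (CO + CO2), all flows in the same molar units."""
    denominator = co + co2
    if denominator <= 0:
        raise ValueError("CO + CO2 must be positive")
    return (h2 - co2) / denominator


def sn_in_target(sn: float, low: float = 2.0, high: float = 2.05) -> bool:
    """True if SN lies in the documented 2.0-2.05 target band."""
    return low <= sn <= high


print(stoichiometric_number(5.0, 2.5))        # 2.0 (no CO2: SN == H2/CO)
print(stoichiometric_number(10.0, 2.0, 1.0))  # (10 - 1) / (2 + 1) = 3.0
```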

Synthesis Loop

| Field | Type | Unit | Description |
|---|---|---|---|
| `purge_rate` | `float` | mol/s | Gas being purged from the recycle loop. |
| `inert_fraction` | `float` | fraction (0–1) | Inert gas fraction in the recycle loop. A high value lowers reactant partial pressures and therefore conversion. |
| `recycle_ratio` | `float` | dimensionless | Current actual recycle ratio. |

Reformer

| Field | Type | Unit | Description |
|---|---|---|---|
| `reformer_outlet_temp` | `float` | °C | SMR tube outlet temperature, typically 700–900 °C. |
| `steam_to_carbon` | `float` | dimensionless | Current S/C molar ratio. Target ~3.0. |
| `syngas_flow` | `float` | mol/s | Total syngas produced by the reformer. |

Distillation

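| Field | Type | Unit | Description |
|---|---|---|---|
| `product_purity` | `float` | mass fraction (0–1) | Methanol mass fraction in the product. Grade AA requires at least 0.9985. |
| `distillation_duty` | `float` | kW | Total heat duty of the distillation column (energy consumed per unit time). |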

Safety & Utility

| Field | Type | Unit | Description |
|---|---|---|---|
| `flare_flow` | `float` | mol/s | Gas currently being flared. Should be ~0 in normal operation. |
| `total_co2_emissions` | `float` | kg | Cumulative CO₂ emitted (from combustion, flaring, and the reformer). |
| `safety_warning` | `str` or `null` | n/a | Natural-language safety alert. See Fault Detection for levels. |
| `temperature_trend` | `float` | °C/step | Rate of temperature change. Positive = heating up. |
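`temperature_trend` lets an agent estimate how much margin remains before a limit. The sketch below assumes the trend stays constant, which is a linear extrapolation that real reactor dynamics will not follow exactly. `steps_until` is hypothetical, and its 300 °C default comes from the core telemetry table:

```python
import math


def steps_until(temp_c: float, trend_c_per_step: float,
                limit_c: float = 300.0) -> float:
    """Steps until temp_c reaches limit_c if the current trend holds.

    Returns 0.0 if already at or above the limit, and math.inf if
    the temperature is not rising.
    """
    if temp_c >= limit_c:
        return 0.0
    if trend_c_per_step <= 0:
        return math.inf
    return (limit_c - temp_c) / trend_c_per_step


print(steps_until(280.0, 4.0))   # 5.0
print(steps_until(250.0, -1.0))  # inf
```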

Episode State

| Field | Type | Description |
|---|---|---|
| `step_number` | `int` | Current step in the episode (0-indexed). |
| `max_steps` | `int` | Total steps for this task (50–500, depending on the task). |
| `task_name` | `str` | Name of the active task (e.g., `"optimization"`, `"startup"`). |
| `done` | `bool` | `True` if the episode is over (inherited from OpenEnv `Observation`). |
| `reward` | `float` | Dense per-step reward in [0.01, 0.99] (inherited from OpenEnv `Observation`). |
| `rubric_reward` | `float` or `null` | RFC 004 rubric score: a dense per-step score during the episode and a whole-trajectory score at the terminal step. |

Using Observations

```python
async def report_step(env, action):
    """Step the environment once and print key observation fields."""
    obs = await env.step(action)

    # Core monitoring
    print(f"T={obs.temperature:.1f}°C  P={obs.pressure:.1f} bar")
    print(f"Catalyst: {obs.catalyst_health:.1%}")
    print(f"Rate: {obs.reaction_rate:.4f} mol/s")

    # Economics
    print(f"Step profit: ${obs.profit_this_step:.2f}")
    print(f"Total profit: ${obs.cumulative_profit:.2f}")

    # Safety
    if obs.safety_warning:
        print(f"⚠ {obs.safety_warning}")

    # Is the episode over?
    if obs.done:
        print(f"Episode ended at step {obs.step_number}")
    return obs
```
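The same observation fields can drive a simple feedback rule. Below is a sketch of a proportional controller that sets `cooling_water_flow` from the reactor temperature. Several values are illustrative assumptions, not tuned settings:

- The 245 °C setpoint is the middle of the documented 220–270 °C normal band.
- The 40 L/min base flow comes from the example action above.
- The gain is arbitrary.

`cooling_command` is hypothetical:

```python
def cooling_command(temperature: float, base_flow: float = 40.0,
                    setpoint: float = 245.0, gain: float = 2.0) -> float:
    """Proportional cooling-water command in L/min (hypothetical, untuned).

    Each °C above the setpoint adds `gain` L/min to the base flow; each
    °C below removes it. The output is clipped to the 0-100 L/min range.
    """
    flow = base_flow + gain * (temperature - setpoint)
    return min(max(flow, 0.0), 100.0)


print(cooling_command(250.0))  # 50.0: 5 °C hot, so more cooling
print(cooling_command(240.0))  # 30.0: 5 °C cold, so less cooling
```

Inside an episode loop, this would supply `cooling_water_flow=cooling_command(obs.temperature)` when building the next `MethanolAPCAction`. A pure proportional rule generally leaves a steady-state offset from the setpoint. Adding an integral term, or using a learned policy, is the usual next step.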