Environment API

The MethanolAPCEnvironment class is the core of the simulation. It implements OpenEnv's Environment interface, managing the reactor simulation, task selection, grading, MCP tools, domain randomization, and multi-agent views.

from methanol_apc_env.server.methanol_environment import MethanolAPCEnvironment
from methanol_apc_env.models import MethanolAPCAction

Lifecycle

Every episode follows this sequence:

reset(task_name, seed)  →  step(action)  →  step(action)  →  ...  →  done=True  →  get_final_score()
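The sequence above can be sketched as a plain loop. This is a minimal sketch, not the package's own runner: `StubEnv` and `StubObservation` are hypothetical stand-ins so the snippet runs without the package installed, and the `done` attribute on the observation is an assumption about where the end-of-episode flag lives.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class StubObservation:
    """Hypothetical stand-in for MethanolAPCObservation; `done` is an assumed field."""
    temperature: float
    done: bool


class StubEnv:
    """Hypothetical stand-in that only mimics the reset/step/get_final_score sequence."""

    def __init__(self, max_steps: int = 5) -> None:
        self.max_steps = max_steps
        self.steps_taken = 0

    def reset(self, task_name: str, seed: Optional[int] = None) -> StubObservation:
        self.steps_taken = 0
        return StubObservation(temperature=250.0, done=False)

    def step(self, action: Any) -> StubObservation:
        self.steps_taken += 1
        return StubObservation(250.0, done=self.steps_taken >= self.max_steps)

    def get_final_score(self) -> float:
        return 0.5  # placeholder value, not a real grade


def run_episode(env: Any, policy: Callable[[Any], Any],
                task_name: str, seed: Optional[int] = None) -> float:
    """Reset once, step until the episode ends, then read the final score."""
    obs = env.reset(task_name=task_name, seed=seed)
    while not obs.done:
        obs = env.step(policy(obs))
    return env.get_final_score()


env = StubEnv(max_steps=5)
score = run_episode(env, policy=lambda obs: None, task_name="optimization", seed=42)
print(env.steps_taken, score)
```

With the real environment, `policy` would return a MethanolAPCAction built from the latest observation.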

`reset(task_name: str, seed: Optional[int] = None) → MethanolAPCObservation`

Starts a new episode. Resets the reactor to the initial conditions defined by the chosen task.

Parameters:

Parameter	Type	Required	Description
`task_name`	`str`	Yes	One of 12 task names (see Tasks below)
`seed`	`int`	No	Random seed for domain randomization. Same seed = same initial conditions.

What happens on reset:

Task configuration loaded (initial temperature, pressure, catalyst health, disturbances)
ReactorState initialized with task-specific values
Domain randomization applied (if seed given):
- Catalyst health: ±3% Gaussian noise
- Temperature: ±2°C
- Cooling water temp: ±1.5°C
- Pressure: ±1.5 bar
- H₂ feed: ±0.15 mol/s
- CO feed: ±0.08 mol/s
Plant stages initialized (desulfurization, reformer, distillation)
Trajectory cleared
Initial observation returned

env = MethanolAPCEnvironment()
obs = env.reset(task_name="optimization", seed=42)
# obs.temperature ≈ 250°C, obs.catalyst_health ≈ 1.0

`step(action: MethanolAPCAction) → MethanolAPCObservation`

Executes one control step. This is where the physics simulation runs.

What happens per step:

Rate limiting: Action values are clamped and rate-limited (can't change feed rate by more than ~20% per step)
Plant stages: Desulfurization → Reformer → feed composition updated
Recycle loop: Fresh feed mixed with recycled gas, inerts accumulated, purge applied
Partial pressures: Species partial pressures calculated at reactor conditions
Fugacity corrections: SRK cubic EOS applies fugacity coefficients to H₂, CO, CO₂, CH₃OH, H₂O
Kinetic model: Selected model (LHHW/Graaf/VBF/Seyfert/Nestler) calculates reaction rates for 3 simultaneous reactions
RK4 integration: 4th-order Runge-Kutta with 4 sub-steps integrates the energy balance ODE
Multi-bed quench: Temperature profile across 4 catalyst beds with cold-shot injection
Catalyst deactivation: 3-zone model updates catalyst health (irreversible above 280°C)
Condensation: 96% crude methanol recovery
Byproducts: DME + methyl formate formation based on selectivity model
Economics: Revenue (methanol × spot price) minus costs (feed + energy + cooling)
Process noise: ±1°C temperature, ±5% rate, ±0.3 bar pressure
Disturbances: Task-specific events (e.g., cooling failure at step 25)
Safety check: Emergency shutdown if T ≥ 300°C
Reward: 6-component dense reward computed and mapped into (0.01, 0.99) (see Reward Function)
Observation: All sensor readings packaged into MethanolAPCObservation
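The rate-limiting step at the top of this list can be illustrated with a small helper. This is a sketch under stated assumptions: the bounds, the `min_step` floor (which keeps a value of zero from getting stuck), and the exact 20% fraction are hypothetical, since the text above only says "~20% per step".

```python
def rate_limit(requested: float, previous: float, lo: float, hi: float,
               max_frac: float = 0.20, min_step: float = 0.1) -> float:
    """Clamp a requested value to [lo, hi], then cap its change per step.

    max_frac and min_step are illustrative assumptions, not values
    taken from the environment's source.
    """
    # 1. Clamp the request into the allowed actuator range.
    clamped = min(max(requested, lo), hi)
    # 2. Largest move allowed this step, relative to the previous value.
    max_delta = max(max_frac * abs(previous), min_step)
    # 3. Cap the move in both directions.
    delta = min(max(clamped - previous, -max_delta), max_delta)
    # 4. Clamp once more in case `previous` was itself out of range.
    return min(max(previous + delta, lo), hi)


# Hypothetical H2 feed bounds of 0 to 10 mol/s.
print(rate_limit(8.0, previous=5.0, lo=0.0, hi=10.0))   # move capped at +20% of 5.0
print(rate_limit(5.5, previous=5.0, lo=0.0, hi=10.0))   # small move passes through
print(rate_limit(-3.0, previous=5.0, lo=0.0, hi=10.0))  # clamped to 0, then capped
```

One practical consequence for agents: a large corrective action takes several steps to reach the plant, so it pays to react early.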

action = MethanolAPCAction(
    feed_rate_h2=5.0, feed_rate_co=2.5,
    cooling_water_flow=40.0, compressor_power=65.0
)
obs = env.step(action)
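To make the RK4 integration step concrete, here is a minimal sketch of classical 4th-order Runge-Kutta with 4 sub-steps per control step. The energy balance below is a toy model with hypothetical parameters (constant heat release, linear cooling), chosen so the result can be checked against a closed-form solution. It is not the environment's actual ODE.

```python
import math
from typing import Callable


def rk4_step(f: Callable[[float, float], float], t: float, y: float, h: float) -> float:
    """One classical RK4 step of size h for dy/dt = f(t, y)."""
    k1 = f(t, y)
    k2 = f(t + h / 2, y + h / 2 * k1)
    k3 = f(t + h / 2, y + h / 2 * k2)
    k4 = f(t + h, y + h * k3)
    return y + (h / 6) * (k1 + 2 * k2 + 2 * k3 + k4)


def integrate(f: Callable[[float, float], float], t0: float, y0: float,
              dt: float, n_sub: int = 4) -> float:
    """Advance y over one control step dt using n_sub equal RK4 sub-steps."""
    h = dt / n_sub
    t, y = t0, y0
    for _ in range(n_sub):
        y = rk4_step(f, t, y, h)
        t += h
    return y


def energy_balance(t: float, temp_c: float) -> float:
    """Toy dT/dt in K/s; every constant here is a hypothetical placeholder."""
    heat_release_kw = 440.0      # reaction heat, held constant for simplicity
    ua_kw_per_k = 2.0            # heat-transfer coefficient times area
    coolant_temp_c = 30.0
    heat_capacity_kj_per_k = 100.0
    return (heat_release_kw - ua_kw_per_k * (temp_c - coolant_temp_c)) / heat_capacity_kj_per_k


T_next = integrate(energy_balance, t0=0.0, y0=260.0, dt=1.0)
# This toy model reduces to dT/dt = -0.02 * (T - 250), which has an exact solution.
T_exact = 250.0 + 10.0 * math.exp(-0.02)
print(T_next, T_exact)
```

Sub-stepping matters more in the real model, where reaction rates depend strongly on temperature and a single coarse step could overshoot.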

`get_final_score() → float`

Returns the trajectory-based score for the completed episode. Call it only after the episode has ended (done=True).

Score computation:

The task-specific grader evaluates the full trajectory (list of all ReactorState objects from the episode)
Raw score (0.0–1.0) is passed through a centered sigmoid: $\text{score} = 0.01 + 0.98 \cdot \sigma(10 \cdot (x - 0.5))$
For raw scores in [0, 1] the result lies between roughly 0.017 and 0.983, so it is strictly inside (0.01, 0.99) and never exactly 0 or 1

# After episode ends
score = env.get_final_score()
# 0.01 = terrible, 0.50 = mediocre, 0.99 = near-perfect
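The score mapping described above can be reproduced in a few lines. This sketch implements the stated formula directly. Clamping the raw score to [0, 1] before the sigmoid is an assumption, based on the statement that raw scores fall in 0.0–1.0.

```python
import math


def final_score(raw: float) -> float:
    """Centered sigmoid: 0.01 + 0.98 * sigma(10 * (x - 0.5))."""
    x = min(max(raw, 0.0), 1.0)  # assumed clamp; raw scores are documented as 0.0-1.0
    sigma = 1.0 / (1.0 + math.exp(-10.0 * (x - 0.5)))
    return 0.01 + 0.98 * sigma


for raw in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f"raw={raw:.2f} -> score={final_score(raw):.4f}")
```

The steep slope around 0.5 means that small grader improvements in the middle of the range move the final score much more than the same improvement near 0 or 1.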

`get_metrics() → dict`

Returns three KPIs for the episode:

Metric	Type	Description	Formula
`economic_regret`	`float`	Profit left on the table vs theoretical max	$\text{max\_possible} - \text{actual\_profit}$
`constraint_violations`	`int`	Number of steps where T > 280°C or T < 180°C	Count of unsafe timesteps
`adaptability_score`	`float`	Temperature stability (0–1, higher = more stable)	$\frac{1}{1 + \text{variance}/100}$

metrics = env.get_metrics()
print(f"Regret: ${metrics['economic_regret']:.2f}")
print(f"Violations: {metrics['constraint_violations']}")
print(f"Adaptability: {metrics['adaptability_score']:.3f}")
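For readers who want to check the three formulas by hand, the following sketch recomputes them from a list of per-step temperatures and profits. The function name, its inputs, and the use of population variance are assumptions for illustration. The environment may aggregate its trajectory differently.

```python
from statistics import pvariance
from typing import Dict, List


def compute_metrics(temps_c: List[float], step_profits: List[float],
                    max_possible_profit: float) -> Dict[str, float]:
    """Recompute the three KPIs from raw per-step data (illustrative only)."""
    actual_profit = sum(step_profits)
    # A step counts as a violation when T > 280 C or T < 180 C.
    violations = sum(1 for t in temps_c if t > 280.0 or t < 180.0)
    # Population variance is an assumption; sample variance would differ slightly.
    variance = pvariance(temps_c) if temps_c else 0.0
    return {
        "economic_regret": max_possible_profit - actual_profit,
        "constraint_violations": violations,
        "adaptability_score": 1.0 / (1.0 + variance / 100.0),
    }


metrics = compute_metrics(
    temps_c=[250.0, 252.0, 285.0, 248.0, 175.0],
    step_profits=[10.0, 12.0, 8.0, 11.0, 9.0],
    max_possible_profit=60.0,
)
print(metrics)
```

Note that a perfectly flat temperature trace gives an adaptability score of exactly 1.0, and the score falls toward 0 as the variance grows.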

`get_shift_context() → dict`

Returns game-theoretic context for day/night pricing scenarios:

ctx = env.get_shift_context()
# {
#     "shift": "day",
#     "gas_price": 0.002,
#     "electricity_price": 0.08,
#     "nash_equilibrium_note": "Optimal: day 90% capacity, night 70%",
#     "recommended_strategy": "Push production during cheap electricity"
# }

Tasks

12 scenarios with increasing difficulty:

Task Name	Steps	Difficulty	Initial T	Initial Catalyst	Key Challenge
`startup`	50	Easy	150°C	1.0	Ramp from cold to 250°C
`optimization`	100	Medium	250°C	1.0	Maximize profit at steady state
`cost_minimization`	100	Medium	250°C	1.0	Minimize OPEX while maintaining output
`maximum_yield`	200	Medium	250°C	1.0	Push methanol output to maximum
`disturbance_rejection`	100	Medium	250°C	1.0	Cooling water temp jumps to 45°C at step 25
`emergency_recovery`	80	Hard	290°C	1.0	Start near shutdown, cool without crashing
`feed_composition_upset`	100	Hard	250°C	1.0	H₂/CO ratio shifts unexpectedly
`pressure_loss`	100	Hard	250°C	1.0	Compressor degrades mid-episode
`aged_catalyst`	100	Hard	250°C	0.4	Operate profitably with catalyst health at 40%
`day_night_cycle`	150	Hard	250°C	1.0	Cooling water oscillates 25→35→25→35°C
`long_horizon_production`	500	Hard	250°C	1.0	Extended run with catalyst aging
`multi_disturbance`	150	Expert	250°C	1.0	Cascading failures: cooling failure at step 25, worsening at step 50

MCP Tools

The environment exposes 4 Model Context Protocol tools via env.mcp_server. These allow LLM agents to query external context before making control decisions.

`get_energy_pricing() → str`

Returns current natural gas and electricity spot prices. Prices vary with regional configuration and time-of-day.

"Natural Gas: $3.42/MMBtu | Electricity: $0.11/kWh"

Agent use: Throttle production during price spikes. If gas is expensive, reduce feed rates; if electricity is cheap, push compressor harder.
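Because the tool returns a string, an agent has to parse it before it can compare prices. The sketch below assumes the output always follows the sample format shown above. The function name and dictionary keys are hypothetical.

```python
import re
from typing import Dict

# Pattern written against the sample output; the real format may vary.
PRICE_PATTERN = re.compile(
    r"Natural Gas: \$(?P<gas>[\d.]+)/MMBtu \| Electricity: \$(?P<power>[\d.]+)/kWh"
)


def parse_energy_pricing(text: str) -> Dict[str, float]:
    """Extract the two spot prices as floats, or raise if the format is unexpected."""
    match = PRICE_PATTERN.search(text)
    if match is None:
        raise ValueError(f"Unrecognized pricing string: {text!r}")
    return {
        "gas_usd_per_mmbtu": float(match["gas"]),
        "electricity_usd_per_kwh": float(match["power"]),
    }


prices = parse_energy_pricing("Natural Gas: $3.42/MMBtu | Electricity: $0.11/kWh")
print(prices)
```

Raising on an unexpected format is a deliberate choice here: a silent fallback price could lead the agent to throttle or push production for the wrong reason.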

`get_catalyst_status(temperature: float, hours_online: float) → str`

Predicts catalyst health degradation based on current temperature and runtime.

"Catalyst health: 0.92 | Predicted life: 4200 hours at 252°C | Risk: LOW"

Agent use: If health is dropping fast, reduce temperature to extend catalyst life. If health is already low, accept lower conversion and avoid high temperatures.
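A generic way to consume this kind of pipe-delimited status string is to split it into labelled fields. The sketch below assumes the `label: value | label: value` layout of the sample output. The helper names and the 0.5 health threshold are hypothetical, not part of the environment.

```python
from typing import Dict


def parse_fields(text: str) -> Dict[str, str]:
    """Split 'A: x | B: y' into {'A': 'x', 'B': 'y'} without interpreting values."""
    fields = {}
    for part in text.split(" | "):
        key, _, value = part.partition(": ")
        fields[key.strip()] = value.strip()
    return fields


def should_lower_temperature(health: float, risk: str, health_floor: float = 0.5) -> bool:
    """Illustrative rule: back off when risk is elevated or health is already low."""
    return risk != "LOW" or health < health_floor


status = parse_fields(
    "Catalyst health: 0.92 | Predicted life: 4200 hours at 252°C | Risk: LOW"
)
health = float(status["Catalyst health"])
risk = status["Risk"]
print(health, risk, should_lower_temperature(health, risk))
```

The same `parse_fields` helper would also work on strings shaped like the maintenance and carbon-footprint samples, as long as they keep the same delimiter layout.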

`get_maintenance_schedule() → str`

Returns equipment maintenance status and upcoming windows.

"Compressor: OK (next service in 720h) | Heat exchanger: FOULED (efficiency -8%) | Catalyst: 2100h to replacement"

Agent use: If maintenance is imminent, pre-emptively reduce load. If heat exchanger is fouled, compensate with more cooling water.

`calculate_carbon_footprint(methanol_kg: float, fuel_mol: float) → str`

Calculates CO₂ emissions intensity for current production.

"Total CO2: 142.3 kg | Intensity: 1.42 kg CO2/kg MeOH | EU ETS cost: $8.54"

Agent use: If emissions approach regulatory limits, reduce reformer fuel or increase methanol yield per unit of CO₂ generated.

Reward Function

The dense per-step reward has 6 components:

Component	Range	Weight	What It Measures
Profit	-0.2 to +0.4	High	Step revenue minus costs
Safety	-0.3 to +0.2	High	Distance from 300°C shutdown limit
Stability	0 to +0.1	Low	Low temperature variance between steps
Catalyst	0 to +0.1	Low	Catalyst health preservation
Progress	varies	Medium	Task-specific: approaching target, maintaining production, etc.
Shutdown	-1.0	Critical	Emergency shutdown penalty (dominates all other terms)

The raw reward is passed through a sigmoid followed by an affine transform, so the final per-step reward lies strictly within (0.01, 0.99).
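One way to picture how the components combine is sketched below. The component ranges come from the table, but the plain sum, the unit sigmoid gain, and the function names are assumptions. The actual weighting inside the environment is not specified here.

```python
import math
from typing import Dict

# Ranges copied from the table; "Progress" varies by task, so it is not clipped.
COMPONENT_RANGES = {
    "profit": (-0.2, 0.4),
    "safety": (-0.3, 0.2),
    "stability": (0.0, 0.1),
    "catalyst": (0.0, 0.1),
}


def squash(raw: float, gain: float = 1.0) -> float:
    """Sigmoid plus affine map into the open interval (0.01, 0.99); gain is assumed."""
    return 0.01 + 0.98 / (1.0 + math.exp(-gain * raw))


def step_reward(components: Dict[str, float], progress: float = 0.0,
                shutdown: bool = False) -> float:
    """Clip each component to its range, sum them, apply the shutdown penalty, squash."""
    total = progress
    for name, (lo, hi) in COMPONENT_RANGES.items():
        total += min(max(components.get(name, 0.0), lo), hi)
    if shutdown:
        total -= 1.0  # larger than the combined upside of the clipped components
    return squash(total)


normal = step_reward({"profit": 0.3, "safety": 0.2, "stability": 0.05, "catalyst": 0.1})
tripped = step_reward({"profit": 0.3, "safety": -0.3}, shutdown=True)
print(normal, tripped)
```

Under these assumptions the four clipped components can add at most 0.8, so a shutdown penalty of 1.0 always pushes the raw total below zero unless the progress term is large.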

Domain Randomization

On each reset(), Gaussian noise is added to the initial conditions so that agents learn to be robust to plant-to-plant variation:

Parameter	Noise	Purpose
Catalyst health	±3%	Different catalyst ages
Temperature	±2°C	Sensor calibration error
Cooling water temp	±1.5°C	Ambient temperature variation
Pressure	±1.5 bar	Compressor wear
H₂ feed	±0.15 mol/s	Flow meter accuracy
CO feed	±0.08 mol/s	Flow meter accuracy
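The table above can be read as a seeded perturbation of a nominal starting state. In this sketch the ± figures are treated as Gaussian standard deviations, which is an assumption. The nominal pressure and cooling water temperature are hypothetical placeholders, and the function name is illustrative.

```python
import random
from typing import Dict, Optional

# Noise levels from the table, interpreted here as standard deviations (assumption).
NOISE_STD = {
    "catalyst_health": 0.03,     # relative (3%)
    "temperature": 2.0,          # degrees C
    "cooling_water_temp": 1.5,   # degrees C
    "pressure": 1.5,             # bar
    "feed_h2": 0.15,             # mol/s
    "feed_co": 0.08,             # mol/s
}

# Nominal values: pressure and cooling water temperature are hypothetical.
NOMINAL = {
    "catalyst_health": 1.0,
    "temperature": 250.0,
    "cooling_water_temp": 25.0,
    "pressure": 75.0,
    "feed_h2": 5.0,
    "feed_co": 2.5,
}


def randomize(nominal: Dict[str, float], seed: Optional[int] = None) -> Dict[str, float]:
    """Return a perturbed copy of the nominal state; the same seed gives the same result."""
    rng = random.Random(seed)  # private generator, so global random state is untouched
    perturbed = {}
    for key, value in nominal.items():
        noise = rng.gauss(0.0, NOISE_STD[key])
        if key == "catalyst_health":
            # Relative noise, kept inside the physically meaningful range [0, 1].
            perturbed[key] = min(max(value * (1.0 + noise), 0.0), 1.0)
        else:
            perturbed[key] = value + noise
    return perturbed


print(randomize(NOMINAL, seed=42))
print(randomize(NOMINAL, seed=42) == randomize(NOMINAL, seed=42))
```

Using a dedicated `random.Random` instance per reset is what makes a seed reproducible even when other code also draws random numbers.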

Per-step noise is also applied:

Parameter	Noise	Purpose
Temperature	±1°C	Sensor noise
Reaction rate	±5%	Catalyst surface heterogeneity
Pressure	±0.3 bar	Compressor pulsation
Cooling water temp	Brownian drift	Weather/cooling tower variation
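The difference between the independent noise terms and the Brownian drift is easiest to see in code: sensor noise is redrawn around the true value every step, while the drift accumulates from one step to the next. The sketch below treats the ± figures as standard deviations. `drift_std`, the nominal values, and the function name are hypothetical.

```python
import random
from typing import Dict


def apply_step_noise(temperature: float, rate: float, pressure: float,
                     cw_temp: float, rng: random.Random,
                     drift_std: float = 0.1) -> Dict[str, float]:
    """Add per-step noise; drift_std is an assumed step size for the random walk."""
    return {
        # Independent noise: each draw is centered on the current true value.
        "temperature": temperature + rng.gauss(0.0, 1.0),
        "reaction_rate": rate * (1.0 + rng.gauss(0.0, 0.05)),
        "pressure": pressure + rng.gauss(0.0, 0.3),
        # Brownian drift: the increment is added to the previous drifted value.
        "cooling_water_temp": cw_temp + rng.gauss(0.0, drift_std),
    }


rng = random.Random(0)
cw = 25.0
for step in range(100):
    reading = apply_step_noise(250.0, 1.0, 75.0, cw, rng)
    cw = reading["cooling_water_temp"]  # carry the drifted value into the next step

print(f"cooling water temperature after 100 steps: {cw:.2f}")
```

Because the drift accumulates, its spread grows with the square root of the number of steps, so long tasks see a wider range of cooling water temperatures than short ones.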