Multi-Agent Architecture
The environment supports decomposing plant control into specialized sub-agents that mirror how real methanol plants are organized. Each agent manages a subsystem with its own local observations and action space.
from methanol_apc_env.agents import (
    ReformerAgent, SynthesisAgent,
    PurificationAgent, SupervisoryAgent,
)
Why Multi-Agent?
A real methanol plant has different operators for different sections:
| Real Plant Role | Agent Class | Scope |
|---|---|---|
| Reformer operator | ReformerAgent | Natural gas → syngas conversion |
| Board operator (reactor) | SynthesisAgent | Synthesis loop: reactor + recycle + purge |
| Distillation operator | PurificationAgent | Crude MeOH → Grade AA product |
| Shift supervisor | SupervisoryAgent | Plant-wide coordination, conflict resolution |
Using multiple agents is optional. A single agent controlling all 13 variables works fine. The multi-agent decomposition is intended for:
- Training specialized policies that are easier to learn
- Studying coordination between plant sections
- Modeling real organizational structure for sim-to-real transfer
Agent Classes
ReformerAgent
Controls the Steam Methane Reformer (SMR) — the upstream section that converts natural gas and steam into syngas.
Controls: reformer_fuel_gas, reformer_steam_flow
Observes:
| Field | What It Means |
|---|---|
| reformer_outlet_temp | Tube outlet temperature (target: 800–900°C) |
| steam_to_carbon | S/C ratio (target: ~3.0, below 2.5 → coking risk) |
| syngas_flow | Total syngas produced (should match reactor demand) |
| temperature | Reactor temperature (shared; helps anticipate demand) |
| catalyst_health | If catalyst is degraded, reduce syngas aggressively |
Rule-based strategy: Maintains S/C ≈ 3.0 and adjusts fuel gas to match downstream reactor temperature. If reactor is too hot, reduces syngas production. If catalyst health < 0.5, reduces output to extend life.
env = MethanolAPCEnvironment()
reformer = ReformerAgent()
obs = env.reset(task_name="optimization")
r_action = reformer.rule_based_action(obs)
# Example result: r_action = {"reformer_fuel_gas": 5.2, "reformer_steam_flow": 15.6}
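The rule-based strategy described above could be sketched as a standalone function. This is a minimal illustration, not the package's implementation: the function name, the gain, the 250°C reactor setpoint, the nominal fuel-gas and carbon-flow values, the 20% catalyst back-off, and the plain-dict observation are all assumptions made for the example.

```python
def reformer_rule_sketch(obs, sc_target=3.0, temp_setpoint=250.0,
                         nominal_fuel=5.0, carbon_flow=5.0, gain=0.02):
    """Hypothetical rule-based reformer policy (not the library's code).

    obs is a plain dict holding the observation fields listed above.
    Every numeric default is an illustrative assumption.
    """
    # Steam flow follows the carbon feed so that S/C stays near the target.
    steam_flow = sc_target * carbon_flow

    # Cut fuel gas when the downstream reactor runs hot, raise it when cold.
    temp_error = obs["temperature"] - temp_setpoint
    fuel_gas = nominal_fuel * (1.0 - gain * temp_error)

    # Back off further when the catalyst is badly degraded.
    if obs["catalyst_health"] < 0.5:
        fuel_gas *= 0.8

    # Never command a negative flow.
    fuel_gas = max(fuel_gas, 0.0)
    return {"reformer_fuel_gas": fuel_gas, "reformer_steam_flow": steam_flow}
```

Calling it with `{"temperature": 270.0, "catalyst_health": 0.9}` yields less fuel gas than at the setpoint, which is the "reactor too hot → less syngas" behavior the strategy describes.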
SynthesisAgent
Controls the synthesis reactor, recycle loop, and purge system — the core of the plant.
Controls: feed_rate_h2, feed_rate_co, cooling_water_flow, compressor_power, purge_valve_position, recycle_ratio, feed_preheat_temp
Observes:
| Field | What It Means |
|---|---|
| temperature | THE critical variable. Must stay 220–270°C, never exceed 300°C. |
| pressure | Controls reaction rate. Higher P → faster reaction but more compressor cost. |
| h2_co_ratio | Feed quality. Ideal = 2.0. |
| reaction_rate | Current methanol formation. Should be maximized within safety limits. |
| catalyst_health | Degradation feedback. High T → fast degradation. |
| inert_fraction | Recycle loop quality. High inerts → increase purge. |
| cooling_water_temp | Disturbance variable. If this rises (cooling failure), agent must compensate. |
Rule-based strategy: PI controller on temperature → cooling water flow, with feed rate adjusted to maintain H₂/CO ≈ 2.0. Purge valve opens proportionally to inert fraction.
synthesis = SynthesisAgent()
s_action = synthesis.rule_based_action(obs)
# s_action = {"feed_rate_h2": 5.0, "feed_rate_co": 2.5, "cooling_water_flow": 42.0, ...}
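For readers unfamiliar with PI control, the strategy above can be illustrated with a textbook discrete PI loop. This is a sketch under stated assumptions, not the agent's actual code: `PIController`, `synthesis_rule_sketch`, the gains, the 250°C setpoint, the fixed CO feed, and a purge valve position expressed on a 0 to 1 scale are all hypothetical.

```python
class PIController:
    """Textbook discrete PI controller with output clamping (illustrative)."""

    def __init__(self, kp, ki, bias, out_min, out_max):
        self.kp, self.ki, self.bias = kp, ki, bias
        self.out_min, self.out_max = out_min, out_max
        self.integral = 0.0

    def update(self, error, dt=1.0):
        candidate = self.integral + error * dt
        raw = self.bias + self.kp * error + self.ki * candidate
        clamped = min(max(raw, self.out_min), self.out_max)
        # Simple anti-windup: only accumulate while the output is unsaturated.
        if raw == clamped:
            self.integral = candidate
        return clamped


def synthesis_rule_sketch(obs, cooling_pi, temp_setpoint=250.0,
                          co_feed=2.5, purge_gain=2.0):
    """Hypothetical synthesis policy; obs is a plain dict of observed fields."""
    # Positive error means the reactor is too hot, so cooling flow goes up.
    cooling = cooling_pi.update(obs["temperature"] - temp_setpoint)
    return {
        "feed_rate_co": co_feed,
        "feed_rate_h2": 2.0 * co_feed,  # hold H2/CO at 2.0
        "cooling_water_flow": cooling,
        # Assumed valve range [0, 1], opening in proportion to inerts.
        "purge_valve_position": min(1.0, purge_gain * obs["inert_fraction"]),
    }
```

The controller object carries the integral term between steps, so in this sketch one `PIController` instance would be created per episode and passed in on every call.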
PurificationAgent
Controls the distillation column that separates crude methanol (80% MeOH + 20% water) into Grade AA product (99.85%).
Controls: distillation_reflux, reboiler_duty
Observes:
| Field | What It Means |
|---|---|
| product_purity | Current methanol mass fraction (target: 0.9985) |
| distillation_duty | Energy consumption. Minimize while maintaining purity. |
| methanol_produced | Throughput signal. More crude = need more reboiler. |
| temperature | If reactor is producing more, distillation load increases. |
Rule-based strategy: Maintains reflux ratio at 3.0 and adjusts reboiler duty proportionally to crude methanol flow rate.
purification = PurificationAgent()
p_action = purification.rule_based_action(obs)
# p_action = {"distillation_reflux": 3.0, "reboiler_duty": 52.0}
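The proportional rule above is simple enough to write out directly. The following is only a sketch: the function name, the duty-per-unit-crude coefficient, and the upper duty limit are assumptions chosen for illustration, and the observation is treated as a plain dict.

```python
def purification_rule_sketch(obs, reflux_ratio=3.0,
                             duty_per_unit_crude=0.5, duty_max=100.0):
    """Hypothetical distillation policy (not the library's implementation)."""
    # Reboiler duty scales linearly with the crude methanol throughput.
    duty = duty_per_unit_crude * obs["methanol_produced"]
    # Keep the command inside an assumed physical range.
    duty = min(max(duty, 0.0), duty_max)
    return {"distillation_reflux": reflux_ratio, "reboiler_duty": duty}
```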
SupervisoryAgent
The coordinator. Does not have its own controls — instead, it takes the actions from the 3 sub-agents and merges them into a single MethanolAPCAction, resolving any conflicts.
Key method: merge_actions(reformer_action, synthesis_action, purification_action) → MethanolAPCAction
Conflict resolution rules:
- If synthesis agent wants high feed but reformer is reducing syngas → reformer wins (can't feed what isn't produced)
- If temperature is critical (>285°C) → override all agents with emergency cooling
- Safety always takes priority over economics
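One way these rules could be expressed is shown below. This is a hypothetical standalone function, not the real `merge_actions`: it works on plain dicts instead of returning a MethanolAPCAction, and it takes `temperature` and `syngas_available` as extra arguments, which the documented signature does not have. The maximum cooling value is also an assumption.

```python
def merge_actions_sketch(reformer_action, synthesis_action,
                         purification_action, temperature, syngas_available,
                         critical_temp=285.0, max_cooling=100.0):
    """Illustrative merge of three sub-agent action dicts into one."""
    # The sub-agents control disjoint fields, so a plain union is enough.
    merged = {**reformer_action, **synthesis_action, **purification_action}

    # Reformer wins: the reactor cannot be fed more syngas than is produced.
    requested = merged["feed_rate_h2"] + merged["feed_rate_co"]
    if requested > syngas_available >= 0.0:
        scale = syngas_available / requested
        # Scaling both feeds equally preserves the H2/CO ratio.
        merged["feed_rate_h2"] *= scale
        merged["feed_rate_co"] *= scale

    # Safety override is applied last so nothing can undo it.
    if temperature > critical_temp:
        merged["cooling_water_flow"] = max_cooling
    return merged
```

Applying the safety check after the feed arbitration reflects the last rule in the list: whatever the economic rules decided, the emergency cooling command is what reaches the plant.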
# Full multi-agent episode
env = MethanolAPCEnvironment()
obs = env.reset(task_name="optimization")
reformer = ReformerAgent()
synthesis = SynthesisAgent()
purification = PurificationAgent()
for step in range(100):
    r = reformer.rule_based_action(obs)
    s = synthesis.rule_based_action(obs)
    p = purification.rule_based_action(obs)
    # Supervisory merges and resolves conflicts
    action = SupervisoryAgent.merge_actions(r, s, p)
    obs = env.step(action)
    if obs.done:
        break
score = env.get_final_score()
Agent Observation Views
Each agent gets a filtered view of the full observation, containing only the fields relevant to its subsystem. This prevents information leakage and models realistic plant communication:
reformer = ReformerAgent()
view = reformer.observe(obs)
# view.local_state = {"reformer_outlet_temp": 850.0, "steam_to_carbon": 3.0, ...}
# view.shared_state = {"temperature": 252.0, "catalyst_health": 0.95}
# view.controls = ["reformer_fuel_gas", "reformer_steam_flow"]
The AgentObservation dataclass contains:
| Attribute | Description |
|---|---|
| local_state | Variables specific to this agent's subsystem |
| shared_state | Plant-wide variables visible to all agents |
| controls | List of action field names this agent can set |
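To make the shape of such a view concrete, here is a minimal sketch of a comparable dataclass and a filter that builds a reformer view from a full observation. The class and function names are hypothetical, the full observation is treated as a plain dict, and the split between local and shared fields is inferred from the example above rather than taken from the package source.

```python
from dataclasses import dataclass, field


@dataclass
class AgentObservationSketch:
    """Stand-in with the three attributes described in the table above."""
    local_state: dict = field(default_factory=dict)
    shared_state: dict = field(default_factory=dict)
    controls: list = field(default_factory=list)


# Assumed field assignment for the reformer subsystem.
REFORMER_LOCAL = ("reformer_outlet_temp", "steam_to_carbon", "syngas_flow")
SHARED = ("temperature", "catalyst_health")
REFORMER_CONTROLS = ("reformer_fuel_gas", "reformer_steam_flow")


def reformer_view_sketch(full_obs):
    """Keep only the fields the reformer is allowed to see."""
    return AgentObservationSketch(
        local_state={k: full_obs[k] for k in REFORMER_LOCAL},
        shared_state={k: full_obs[k] for k in SHARED},
        controls=list(REFORMER_CONTROLS),
    )
```

Fields outside the two tuples, such as `pressure`, never reach the view, which is the filtering behavior this section describes.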
Training Multi-Agent Systems
For RL training, each agent can be a separate policy network:
# Pseudocode for multi-agent RL (PPO stands in for any policy class)
reformer_policy = PPO("reformer", obs_dim=5, act_dim=2)
synthesis_policy = PPO("synthesis", obs_dim=10, act_dim=7)
purification_policy = PPO("purification", obs_dim=4, act_dim=2)
# Create the agents once instead of on every step
reformer = ReformerAgent()
synthesis = SynthesisAgent()
purification = PurificationAgent()
for episode in range(1000):
    obs = env.reset(task_name="optimization")
    for step in range(100):
        # Each policy only sees its own filtered view
        r_view = reformer.observe(obs)
        s_view = synthesis.observe(obs)
        p_view = purification.observe(obs)
        r_act = reformer_policy.act(r_view.local_state)
        s_act = synthesis_policy.act(s_view.local_state)
        p_act = purification_policy.act(p_view.local_state)
        action = SupervisoryAgent.merge_actions(r_act, s_act, p_act)
        obs = env.step(action)
        if obs.done:
            break
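Policy networks usually expect fixed-length numeric vectors rather than dicts, so some glue code is typically needed between an agent view and a policy. The helpers below are a hypothetical sketch of that glue. Their names are not part of the package, and they assume `local_state` and `shared_state` are flat dicts of numbers and that the policy emits one value per control.

```python
def flatten_view(local_state, shared_state):
    """Concatenate local and shared fields into a fixed-order float list."""
    combined = {**shared_state, **local_state}
    # Sorting the keys keeps the vector layout stable across steps.
    return [float(combined[key]) for key in sorted(combined)]


def to_action_dict(controls, action_vector):
    """Pair each control name with the matching policy output."""
    if len(controls) != len(action_vector):
        raise ValueError("policy output size does not match the controls")
    return {name: float(value) for name, value in zip(controls, action_vector)}
```

With the reformer view from the earlier example (three local fields plus two shared fields), `flatten_view` produces a vector of length 5, which matches the `obs_dim=5` used for the reformer policy in the pseudocode.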