Warehouse Logistics RL Conveyors • Packing • Docks • AGV

Problem. A small warehouse receives totes at inbound docks, routes some of them along conveyor belts, packs orders at stations, and sends shipments to outbound docks. An AGV (autonomous cart) moves on a grid to pick up and drop totes. Your goal is to learn a policy that maximizes throughput and on‑time deliveries while minimizing congestion and travel time.

Grid size ×

State. \(s=(x,y,c,\phi,t)\) — AGV at cell \((x,y)\), carrying \(c\in\{0,1\}\), phase \(\phi\in\{\)toPick,toPack,toDock,idle\(\}\), time‑to‑deadline \(t\).
Actions. \(a\in\{\uparrow,\downarrow,\leftarrow,\rightarrow,\text{wait}\}\). Belts move carts automatically in their direction.
Reward. \(-0.04\)/step, \(+1\) pickup, \(+3\) pack complete, \(+10\) on‑time delivery (or \(+5-0.1\cdot\)lateness if late), \(-1\) into wall/rock.
Update. \(Q(s,a)\leftarrow Q(s,a) + \alpha\big[r+\gamma\max_{a'} Q(s',a') - Q(s,a)\big]\).
Auxiliary Rewards. When enabled, reward shaping adds distance-based guidance, exploration bonuses, and urgency rewards on top of the base reward to help learning.
Algorithm. Enhanced Q-learning with experience replay, Q-value normalization, adaptive ε-greedy exploration, and reduced congestion sensitivity.
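The reward terms listed under Reward can be sketched as a single function. This is a minimal illustration, not the demo's actual code: the function name, the event labels, and the choice to add the per-step cost on top of event rewards are all assumptions.

```python
def step_reward(event=None, lateness=0.0, hit_obstacle=False):
    """Reward for one step, using the values from the Reward line.

    event: None, "pickup", "pack", or "deliver" (hypothetical labels).
    lateness: steps past the deadline at delivery; 0 or less means on time.
    hit_obstacle: True if the move ran into a wall or rock.
    """
    r = -0.04  # per-step cost (assumed to apply on every step)
    if hit_obstacle:
        r -= 1.0  # penalty for moving into a wall or rock
    if event == "pickup":
        r += 1.0
    elif event == "pack":
        r += 3.0
    elif event == "deliver":
        # +10 if on time, otherwise +5 reduced by 0.1 per step of lateness
        r += 10.0 if lateness <= 0 else 5.0 - 0.1 * lateness
    return r
```

Under these assumptions, a delivery that is 10 steps late earns 5 - 0.1·10 = 4 before the step cost, so lateness beyond 50 steps would make the delivery term negative.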
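The Update rule and ε-greedy exploration can be sketched in tabular form as follows. This covers only the plain Q-learning step with a fixed ε. Experience replay, Q-value normalization, and the adaptive ε schedule are not shown, because the page does not specify them. The helper names (`make_q`, `choose_action`, `q_update`) and the action labels are hypothetical. The defaults α = 0.15, γ = 0.95, ε = 0.15 match the control values shown on the page.

```python
import random
from collections import defaultdict

# Hypothetical labels for the five actions (up, down, left, right, wait).
ACTIONS = ("up", "down", "left", "right", "wait")

def make_q():
    """Q-table mapping a state tuple (x, y, c, phase, t) to per-action values."""
    return defaultdict(lambda: {a: 0.0 for a in ACTIONS})

def choose_action(Q, s, epsilon=0.15, rng=random):
    """Epsilon-greedy: random action with probability epsilon, else greedy."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[s][a])

def q_update(Q, s, a, r, s_next, alpha=0.15, gamma=0.95, done=False):
    """One step of Q(s,a) <- Q(s,a) + alpha * [r + gamma * max Q(s',a') - Q(s,a)]."""
    # At episode end there is no next state to bootstrap from.
    target = r if done else r + gamma * max(Q[s_next].values())
    Q[s][a] += alpha * (target - Q[s][a])
    return Q[s][a]
```

A typical loop would call `choose_action`, apply the move in the environment, then pass the observed reward and next state to `q_update`.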

Stats & Plots

Demonstration Scenarios

Click a scenario button to load different warehouse configurations

Delivered

—

On-time %

—

Efficiency

Order Flow Distribution

■ Express ■ Standard ■ Bulk

Overall Efficiency

Area chart shows avg reward per step over episodes. Heatmap indicates congested areas.

Avg reward / step

—

Congestion events

Q-value range

—

Adaptive ε

0.15

🎯 Reward Shaping: ON - Agent receives auxiliary guidance rewards

Discount γ 0.95 • Learning rate α 0.15 • Epsilon ε 0.15 • Spawn rate (orders/min) • Deadline (steps) • Seed • Show policy • Show tile labels • Reward Shaping • Shaping intensity 0.5 • Congestion sensitivity 12 • Episode max steps • Episode delivery limit