Warehouse Logistics RL
Conveyors • Packing • Docks • AGV

Problem. A small warehouse receives totes at inbound docks, routes some of them across conveyor belts, packs orders at stations, and sends shipments to outbound docks. An AGV (automated guided vehicle) moves on a grid to pick up and drop off totes. Your goal is to learn a policy that maximizes throughput and on‑time deliveries while minimizing congestion and travel time.

State. \(s=(x,y,c,\phi,t)\): the AGV is at cell \((x,y)\), with carrying flag \(c\in\{0,1\}\), phase \(\phi\in\{\text{toPick},\text{toPack},\text{toDock},\text{idle}\}\), and time‑to‑deadline \(t\).
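
A minimal sketch of how this state could be encoded in Python, assuming integer grid coordinates and a time‑to‑deadline counted in steps (neither is specified above). The names `Phase` and `State` are hypothetical. Freezing the dataclass makes each state hashable, so it can key a tabular Q-function directly.

```python
from dataclasses import dataclass
from enum import Enum

class Phase(Enum):
    """The four task phases named in the state definition."""
    TO_PICK = "toPick"
    TO_PACK = "toPack"
    TO_DOCK = "toDock"
    IDLE = "idle"

@dataclass(frozen=True)
class State:
    x: int          # AGV column on the grid
    y: int          # AGV row on the grid
    c: int          # 1 if carrying a tote, else 0
    phase: Phase    # current task phase (phi)
    t: int          # time-to-deadline, assumed to be counted in steps

    def __post_init__(self):
        if self.c not in (0, 1):
            raise ValueError("c must be 0 or 1")

# frozen=True makes State hashable, so (state, action) pairs can key a Q-table dict.
s = State(x=2, y=3, c=0, phase=Phase.TO_PICK, t=40)
q_table = {(s, "wait"): 0.0}
```

Because \(t\) is part of the state, every tick produces a new key, so the table grows with the deadline horizon.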
Actions. \(a\in\{\uparrow,\downarrow,\leftarrow,\rightarrow,\text{wait}\}\). Conveyor belt cells carry the cart automatically in the belt's direction.
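
The movement and belt rules can be sketched as below. The grid encoding is an assumption for illustration: `#` marks a wall or rock, `^ v < >` mark belt cells, \(y\) grows downward, the arrow actions are spelled `"up"`, `"down"`, `"left"`, `"right"`, and a belt pushes the cart one cell per step after the agent's own move. `step_position` and `GRID` are hypothetical names.

```python
# Five actions; "wait" leaves the agent in place (belts may still move it).
MOVES = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0), "wait": (0, 0)}
# Belt glyphs and the direction each one pushes the cart (assumed encoding).
BELTS = {"^": (0, -1), "v": (0, 1), "<": (-1, 0), ">": (1, 0)}

def blocked(grid, x, y):
    """Off-grid cells and '#' (wall/rock) cells cannot be entered."""
    inside = 0 <= y < len(grid) and 0 <= x < len(grid[y])
    return not inside or grid[y][x] == "#"

def step_position(grid, x, y, action):
    """Return (new_x, new_y, bumped) after the agent's move and one belt push."""
    dx, dy = MOVES[action]
    nx, ny = x + dx, y + dy
    bumped = action != "wait" and blocked(grid, nx, ny)
    if bumped:
        nx, ny = x, y  # moving into a wall/rock leaves the cart where it was
    push = BELTS.get(grid[ny][nx])
    if push is not None:
        bx, by = nx + push[0], ny + push[1]
        if not blocked(grid, bx, by):
            nx, ny = bx, by  # the belt carries the cart one cell
    return nx, ny, bumped

GRID = [
    "....",
    ".>>.",
    ".#..",
]
print(step_position(GRID, 1, 0, "down"))  # (2, 1, False): the belt at (1, 1) pushes right
```

The `bumped` flag is returned separately so the reward function can apply the wall/rock penalty.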
Reward. \(-0.04\) per step, \(+1\) per pickup, \(+3\) per completed pack, \(+10\) per on‑time delivery (or \(+5-0.1\cdot\text{lateness}\) if late), and \(-1\) for moving into a wall or rock.
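
A sketch of the per-step reward using the constants listed above. The keyword flags are hypothetical, and lateness is assumed to be measured in steps past the deadline.

```python
STEP_COST = -0.04
PICKUP = 1.0
PACK_DONE = 3.0
ON_TIME = 10.0
LATE_BASE = 5.0
LATE_RATE = 0.1
BUMP = -1.0

def reward(picked=False, packed=False, delivered=False, lateness=0, bumped=False):
    """Sum the reward terms that apply to one transition."""
    r = STEP_COST  # every step costs a little, which favors short routes
    if picked:
        r += PICKUP
    if packed:
        r += PACK_DONE
    if delivered:
        # lateness = steps past the deadline (assumed unit); <= 0 means on time.
        r += ON_TIME if lateness <= 0 else LATE_BASE - LATE_RATE * lateness
    if bumped:
        r += BUMP
    return r
```

If lateness is counted in steps, a delivery one step late earns \(4.9\) instead of \(10\), and the late-delivery term reaches zero at 50 steps of lateness, so the agent has a strong incentive to meet the deadline rather than merely approach it.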
Update. \(Q(s,a)\leftarrow Q(s,a) + \alpha\big[r+\gamma\max_{a'} Q(s',a') - Q(s,a)\big]\), with learning rate \(\alpha\) and discount factor \(\gamma\).
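
A minimal tabular version of this update, assuming a dictionary keyed by `(state, action)` with unseen entries initialized to 0. The `alpha` and `gamma` defaults and the terminal-state handling are assumptions, not values from the text.

```python
from collections import defaultdict

ACTIONS = ("up", "down", "left", "right", "wait")

def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.95, terminal=False):
    """One Q-learning step: move Q(s, a) toward the TD target by a fraction alpha."""
    # Do not bootstrap from a terminal state (assumed episode-end handling).
    best_next = 0.0 if terminal else max(Q[(s_next, b)] for b in ACTIONS)
    td_error = r + gamma * best_next - Q[(s, a)]
    Q[(s, a)] += alpha * td_error
    return td_error

Q = defaultdict(float)  # unseen (state, action) pairs start at 0
Q[("s1", "right")] = 2.0
q_update(Q, "s0", "right", r=-0.04, s_next="s1", alpha=0.5, gamma=0.9)
```

Here the TD target is \(-0.04 + 0.9\cdot 2.0 = 1.76\), so with \(\alpha=0.5\) the entry \(Q(s_0,\rightarrow)\) moves from 0 to 0.88.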
Auxiliary Rewards. When enabled, the agent also receives distance-based guidance, exploration bonuses, and urgency rewards intended to speed up learning.
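
The text does not say how the auxiliary terms are computed. One standard option for the distance term is potential-based shaping, \(F = \gamma\Phi(s') - \Phi(s)\), here with \(\Phi\) equal to minus the Manhattan distance to the current target; shaping of this form is known to leave the optimal policy unchanged. The exploration term below is a count-based bonus \(\beta/\sqrt{N}\). Both are substitutes chosen for illustration, the helper names and `beta` are hypothetical, and the urgency term is omitted because its form is not given.

```python
import math
from collections import Counter

def manhattan(p, q):
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def shaping_bonus(pos, target, next_pos, next_target, gamma=0.95):
    """Potential-based shaping F = gamma*Phi(s') - Phi(s), Phi = -distance to target."""
    phi = -manhattan(pos, target)
    phi_next = -manhattan(next_pos, next_target)  # target may change with the phase
    return gamma * phi_next - phi

visits = Counter()  # hypothetical per-cell visit counts

def exploration_bonus(cell, beta=0.05):
    """Count-based bonus beta / sqrt(N(cell)) that fades as a cell is revisited."""
    visits[cell] += 1
    return beta / math.sqrt(visits[cell])
```

For the policy-invariance property to hold, `gamma` here should equal the \(\gamma\) used in the Q-learning update.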
Algorithm. Enhanced Q-learning with experience replay, Q-value normalization, adaptive ε-greedy exploration, and reduced congestion sensitivity.
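
A sketch of two of the listed components, experience replay and an adaptive \(\varepsilon\)-greedy rule. The adaptation rule (shrink \(\varepsilon\) while returns improve, grow it when they stall, clipped to a range) and all parameter values are hypothetical; Q-value normalization and congestion handling are not shown because the text does not define them.

```python
import random
from collections import deque, defaultdict

ACTIONS = ("up", "down", "left", "right", "wait")

class ReplayBuffer:
    """Fixed-size FIFO store of (s, a, r, s_next, done) transitions."""
    def __init__(self, capacity=10_000):
        self.buf = deque(maxlen=capacity)  # oldest transitions drop off first

    def push(self, transition):
        self.buf.append(transition)

    def sample(self, k, rng=random):
        return rng.sample(list(self.buf), min(k, len(self.buf)))

def epsilon_greedy(Q, s, eps, rng=random):
    """Explore with probability eps, otherwise act greedily (ties go to the first action)."""
    if rng.random() < eps:
        return rng.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(s, a)])

def adapt_epsilon(eps, improved, decay=0.99, bump=1.05, lo=0.02, hi=0.5):
    """Hypothetical rule: shrink eps while returns improve, grow it when they stall."""
    eps = eps * decay if improved else eps * bump
    return min(hi, max(lo, eps))

def replay_updates(Q, buffer, batch=32, alpha=0.1, gamma=0.95, rng=random):
    """Apply the Q-learning update to a random mini-batch of stored transitions."""
    for s, a, r, s_next, done in buffer.sample(batch, rng):
        best_next = 0.0 if done else max(Q[(s_next, b)] for b in ACTIONS)
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
```

Replaying stored transitions lets each environment step feed several updates, which can help when large rewards such as on‑time deliveries are rare.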

Stats & Plots

[Dashboard panels: Delivered, On-time %, Efficiency; Order Flow Distribution (Express, Standard, Bulk); Overall Efficiency; Avg reward / step; Congestion events; Q-value range; Adaptive ε.]

The area chart shows the average reward per step over episodes; the heatmap marks congested areas.
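
The two plots can be reproduced from logged episodes roughly as follows. This assumes each episode is logged as a list of per-step rewards and that congestion events are logged as the \((x, y)\) cells where they occur; what counts as a congestion event is left to the caller, since the text does not define it. The function names are hypothetical.

```python
from collections import Counter

def avg_reward_per_step(episode_rewards):
    """One point per episode for the area chart: total reward / number of steps."""
    return [sum(ep) / len(ep) if ep else 0.0 for ep in episode_rewards]

def congestion_heatmap(events, width, height):
    """Count congestion events per grid cell; rows are y, columns are x."""
    counts = Counter(events)  # events is an iterable of (x, y) cells
    return [[counts[(x, y)] for x in range(width)] for y in range(height)]
```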