| No. | 2001 |
|---|---|
| 2 | Nigel Tao, Jonathan Baxter, Lex Weaver: A Multi-Agent Policy-Gradient Approach to Network Routing. ICML 2001: 553-560 |
| 1 | Lex Weaver, Nigel Tao: The Optimal Reward Baseline for Gradient-Based Reinforcement Learning. UAI 2001: 538-545 |

| No. | Coauthor | Publications |
|---|---|---|
| 1 | Jonathan Baxter | [2] |
| 2 | Lex Weaver | [1] [2] |