2001 | ||
---|---|---|
2 | Nigel Tao, Jonathan Baxter, Lex Weaver: A Multi-Agent Policy-Gradient Approach to Network Routing. ICML 2001: 553-560 | |
1 | Lex Weaver, Nigel Tao: The Optimal Reward Baseline for Gradient-Based Reinforcement Learning. UAI 2001: 538-545 | |
1 | Jonathan Baxter | [2] |
2 | Lex Weaver | [1] [2] |