Multi-Agent Reinforcement Learning Code Roundup (PyTorch)
Algorithms
We provide three types of MARL algorithms as baselines:
Independent Learning
- IQL
- DDPG
- PG
- A2C
- TRPO
- PPO
Centralized Critic
- COMA
- MADDPG
- MAA2C
- MAPPO
- MATRPO
- HATRPO
- HAPPO
Value Decomposition
- VDN
- QMIX
- FACMAC
- VDAC
- VDPPO
The table below summarizes the characteristics of each algorithm:
| Algorithm | Supported Task Modes | Needs Central Information | Discrete Action | Continuous Action | Learning Category | On/Off Policy |
|---|---|---|---|---|---|---|
| IQL* | cooperative, collaborative, competitive, mixed | | ✔️ | | Independent Learning | Off Policy |
| PG | cooperative, collaborative, competitive, mixed | | ✔️ | ✔️ | Independent Learning | On Policy |
| A2C | cooperative, collaborative, competitive, mixed | | ✔️ | ✔️ | Independent Learning | On Policy |
| DDPG | cooperative, collaborative, competitive, mixed | | | ✔️ | Independent Learning | Off Policy |
| TRPO | cooperative, collaborative, competitive, mixed | | ✔️ | ✔️ | Independent Learning | On Policy |
| PPO | cooperative, collaborative, competitive, mixed | | ✔️ | ✔️ | Independent Learning | On Policy |
| COMA | cooperative, collaborative, competitive, mixed | ✔️ | ✔️ | | Centralized Critic | On Policy |
| MADDPG | cooperative, collaborative, competitive, mixed | ✔️ | | ✔️ | Centralized Critic | Off Policy |
| MAA2C* | cooperative, collaborative, competitive, mixed | ✔️ | ✔️ | ✔️ | Centralized Critic | On Policy |
| MATRPO* | cooperative, collaborative, competitive, mixed | ✔️ | ✔️ | ✔️ | Centralized Critic | On Policy |
| MAPPO | cooperative, collaborative, competitive, mixed | ✔️ | ✔️ | ✔️ | Centralized Critic | On Policy |
| HATRPO | cooperative | ✔️ | ✔️ | ✔️ | Centralized Critic | On Policy |
| HAPPO | cooperative | ✔️ | ✔️ | ✔️ | Centralized Critic | On Policy |
| VDN | cooperative | | ✔️ | | Value Decomposition | Off Policy |
| QMIX | cooperative | ✔️ | ✔️ | | Value Decomposition | Off Policy |
| FACMAC | cooperative | ✔️ | | ✔️ | Value Decomposition | Off Policy |
| VDAC | cooperative | ✔️ | ✔️ | ✔️ | Value Decomposition | On Policy |
| VDPPO* | cooperative | ✔️ | ✔️ | ✔️ | Value Decomposition | On Policy |
Notes on the starred algorithms: IQL is the multi-agent (independent) version of Q-learning, MAA2C and MATRPO are the centralized-critic versions of A2C and TRPO, and VDPPO is the value decomposition version of PPO.
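To make "IQL is the multi-agent version of Q-learning" concrete, here is a minimal sketch under simplifying assumptions: a hypothetical one-state, two-action cooperative game with a shared team reward, and tabular Q-values instead of neural networks. Each agent runs ordinary Q-learning on its own table indexed only by its own action, so the other agent is effectively part of the environment. All names (`team_reward`, `q_update`, `train_iql`) are illustrative and not taken from MARLlib or pymarl.

```python
import random


def team_reward(a1: int, a2: int) -> float:
    # Hypothetical cooperative game: both agents receive the same team reward,
    # and each agent adds 1 to it by choosing action 0.
    return float(a1 == 0) + float(a2 == 0)


def q_update(q, s, a, r, s_next, alpha, gamma):
    # Ordinary tabular Q-learning update on one agent's own table.
    td_target = r + gamma * max(q[s_next])
    q[s][a] += alpha * (td_target - q[s][a])


def epsilon_greedy(q_row, epsilon, rng):
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    # Greedy action; ties are broken toward the lowest index.
    return max(range(len(q_row)), key=q_row.__getitem__)


def train_iql(steps=10000, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    n_agents, n_states, n_actions = 2, 1, 2
    # One Q-table per agent, indexed by (state, own action) only: the other
    # agent's action never appears, so each learner treats it as environment noise.
    tables = [[[0.0] * n_actions for _ in range(n_states)] for _ in range(n_agents)]
    s = 0
    for _ in range(steps):
        actions = [epsilon_greedy(t[s], epsilon, rng) for t in tables]
        r = team_reward(*actions)
        s_next = 0  # single-state repeated game
        for t, a in zip(tables, actions):
            q_update(t, s, a, r, s_next, alpha, gamma)
        s = s_next
    return tables


tables = train_iql()
greedy = [max(range(2), key=t[0].__getitem__) for t in tables]
print(greedy)
```

In this toy game action 0 is best for each agent whatever its teammate does, so independent learners settle on it. In coordination games without such a dominant action, the non-stationarity IQL suffers from (each agent's "environment" changes as the others learn) can prevent convergence, which is one motivation for the centralized-critic and value-decomposition families above.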
Awesome Repos
The table below compares MARLlib with existing MARL libraries.
| Library | Task Mode | Supported Env | Algorithm | Parameter Sharing | Asynchronous Interaction | Framework |
|---|---|---|---|---|---|---|
| PyMARL | cooperative | 1 | Independent Learning(1) + Centralized Critic(1) + Value Decomposition(3) | full-sharing | | * |
| PyMARL2 | cooperative | 1 | Independent Learning(1) + Centralized Critic(1) + Value Decomposition(9) | full-sharing | | PyMARL |
| MARL-Algorithms | cooperative | 1 | CTDE(6) + Communication(1) + Graph(1) + Multi-task(1) | full-sharing | | * |
| EPyMARL | cooperative | 4 | Independent Learning(3) + Value Decomposition(4) + Centralized Critic(2) | full-sharing + non-sharing | | PyMARL |
| MAlib | self-play | 2 + PettingZoo + OpenSpiel | Population-based | full-sharing + group-sharing + non-sharing | ✔️ | * |
| MAPPO Benchmark | cooperative | 4 | MAPPO(1) | full-sharing + non-sharing | ✔️ | pytorch-a2c-ppo-acktr-gail |
| MARLlib | cooperative, collaborative, competitive, mixed | 10 + PettingZoo | Independent Learning(6) + Centralized Critic(7) + Value Decomposition(5) | full-sharing + group-sharing + non-sharing | ✔️ | Ray/RLlib |
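The three parameter-sharing options in the table differ mainly in how agents are mapped to policy networks. Below is a minimal sketch, assuming string agent IDs and a user-supplied agent-to-group mapping; the helper `assign_policies` and the policy names are hypothetical and are not the API of any library listed above (libraries such as RLlib express the same idea through their own multi-agent policy-mapping configuration).

```python
from typing import Dict, List, Optional


def assign_policies(
    agent_ids: List[str],
    mode: str,
    groups: Optional[Dict[str, str]] = None,
) -> Dict[str, str]:
    """Map each agent to the name of the policy (network) it will use.

    Hypothetical helper for illustration only.
    """
    if mode == "full-sharing":
        # Every agent uses (and trains) one shared set of parameters.
        return {aid: "shared_policy" for aid in agent_ids}
    if mode == "group-sharing":
        # Agents in the same group (e.g. same team or unit type) share parameters.
        if groups is None:
            raise ValueError("group-sharing needs an agent -> group mapping")
        return {aid: f"policy_{groups[aid]}" for aid in agent_ids}
    if mode == "non-sharing":
        # Each agent keeps its own independent parameters.
        return {aid: f"policy_{aid}" for aid in agent_ids}
    raise ValueError(f"unknown sharing mode: {mode!r}")


agents = ["red_0", "red_1", "blue_0"]
teams = {aid: aid.split("_")[0] for aid in agents}  # group agents by team prefix
for mode in ("full-sharing", "group-sharing", "non-sharing"):
    mapping = assign_policies(agents, mode, teams)
    # Number of distinct policies: 1, 2 and 3 respectively for these agents.
    print(mode, len(set(mapping.values())), mapping)
```

Full sharing pools experience from all agents into one network, while group- or non-sharing lets heterogeneous agents (for example, ones with different observation or action spaces) keep separate networks.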
Some comments
- starry-sky6688
This codebase is simple and easy to pick up, making it a good starting point for beginners. It includes IQL, QMIX, VDN, COMA, QTRAN, MAVEN, CommNet, DyMA-CL, G2ANet, and MADDPG.
- pymarl
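The codebase of the Whiteson group at the University of Oxford. The code is highly modular: very convenient once you know it, but hard to get started with. It includes QMIX, COMA, VDN, IQL, and QTRAN.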
- pymarl2 (351 stars)
An improved version of pymarl that adds a number of code-level tricks.
- epymarl (139 stars)
An extension of pymarl that adds IA2C, IPPO, MADDPG, MAA2C, and MAPPO.
- shariqiqbal2810 (307 stars)
Code written by the author of MAAC; fairly concise. It includes MADDPG and MAAC.
- marlbenchmark (509 stars / 102 stars)
A codebase from Tsinghua University that includes MAPPO, QMIX, VDN, MADDPG, and MATD3; VDN and MATD3 have not been fully tested.
- marllib (70 stars)
A codebase covering most mainstream MARL algorithms, built on Ray's RLlib. Its modularity is excellent, but it takes some time to get started with. It includes independent learning (IQL, A2C, DDPG, TRPO, PPO), centralized critic learning (COMA, MADDPG, MAPPO, HATRPO), and value decomposition (QMIX, VDN, FACMAC, VDA2C).
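The value decomposition methods mentioned above (VDN, QMIX) combine per-agent Q-values into a joint value Q_tot for centralized training. The sketch below uses plain Python lists in place of tensors: VDN simply sums the per-agent values, while a QMIX-style mixer uses state-conditioned, non-negative weights so that Q_tot never decreases when any single agent's Q-value increases. `ToyQMixer`, its sizes, and its untrained random "hypernetworks" (single linear maps here, whereas the original QMIX uses small neural networks trained end-to-end with the agents) are hypothetical simplifications, not code from pymarl or MARLlib.

```python
import itertools
import math
import random


def vdn_mix(agent_qs):
    # VDN: the joint value is simply the sum of the per-agent values.
    return sum(agent_qs)


def elu(x):
    return x if x > 0 else math.exp(x) - 1.0


class ToyQMixer:
    """Untrained, QMIX-style monotonic mixer written with plain lists.

    Simplification (hypothetical): each hypernetwork is a single fixed random
    linear map from the global state to mixing parameters. abs() keeps the
    mixing weights non-negative, so Q_tot is non-decreasing in every Q_i.
    """

    def __init__(self, n_agents, state_dim, embed_dim=4, seed=0):
        rng = random.Random(seed)

        def rand_matrix(rows):
            return [[rng.gauss(0.0, 1.0) for _ in range(state_dim)] for _ in range(rows)]

        self.n_agents, self.embed_dim = n_agents, embed_dim
        self.hyper_w1 = rand_matrix(n_agents * embed_dim)  # state -> first-layer weights
        self.hyper_b1 = rand_matrix(embed_dim)  # state -> first-layer bias
        self.hyper_w2 = rand_matrix(embed_dim)  # state -> second-layer weights
        self.hyper_b2 = rand_matrix(1)  # state -> scalar output bias

    @staticmethod
    def _linear(matrix, state):
        return [sum(w * x for w, x in zip(row, state)) for row in matrix]

    def __call__(self, agent_qs, state):
        w1 = [abs(w) for w in self._linear(self.hyper_w1, state)]  # non-negative
        b1 = self._linear(self.hyper_b1, state)  # bias: any sign
        w2 = [abs(w) for w in self._linear(self.hyper_w2, state)]  # non-negative
        b2 = self._linear(self.hyper_b2, state)[0]  # bias: any sign
        hidden = [
            elu(sum(agent_qs[i] * w1[i * self.embed_dim + j] for i in range(self.n_agents)) + b1[j])
            for j in range(self.embed_dim)
        ]
        return sum(h * w for h, w in zip(hidden, w2)) + b2


def greedy_joint_action(per_agent_q):
    # Decentralized execution: each agent picks the argmax of its own Q-values.
    return [max(range(len(q)), key=q.__getitem__) for q in per_agent_q]


mixer = ToyQMixer(n_agents=2, state_dim=3)
state = [0.5, -1.0, 2.0]
per_agent_q = [[0.1, 0.9, 0.3], [0.7, 0.2, 0.4]]
greedy = greedy_joint_action(per_agent_q)
# Brute-force search over all joint actions, for comparison with the greedy choice.
best = max(
    itertools.product(range(3), repeat=2),
    key=lambda ja: mixer([per_agent_q[i][a] for i, a in enumerate(ja)], state),
)
print(greedy, list(best))
```

Because every mixing weight is non-negative, each agent can act greedily on its own Q_i and the resulting joint action still maximizes Q_tot; this is the property that lets QMIX train with a centralized Q_tot but execute in a decentralized way.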
Environments
Most of the popular environments in MARL research are supported by MARLlib:
| Env Name | Learning Mode | Observability | Action Space | Observation Space |
|---|---|---|---|---|
| LBF | cooperative + collaborative | Both | Discrete | Discrete |
| RWARE | cooperative | Partial | Discrete | Discrete |
| MPE | cooperative + collaborative + mixed | Both | Both | Continuous |
| SMAC | cooperative | Partial | Discrete | Continuous |
| MetaDrive | collaborative | Partial | Continuous | Continuous |
| MAgent | collaborative + mixed | Partial | Discrete | Discrete |
| Pommerman | collaborative + competitive + mixed | Both | Discrete | Discrete |
| MAMuJoCo | cooperative | Partial | Continuous | Continuous |
| GRF | collaborative + mixed | Full | Discrete | Continuous |
| Hanabi | cooperative | Partial | Discrete | Discrete |
Each environment has a README that serves as the instructions for the task, covering environment settings, installation, and important notes.