A Roundup of Multi-Agent Reinforcement Learning Codebases

1. Algorithms

MARLlib provides three types of MARL algorithms as baselines:

1.1 Independent Learning

IQL
DDPG
PG
A2C
TRPO
PPO

1.2 Centralized Critic

COMA
MADDPG
MAA2C
MAPPO
MATRPO
HATRPO
HAPPO

1.3 Value Decomposition

VDN
QMIX
FACMAC
VDAC
VDPPO

The table below summarizes the characteristics of each algorithm:

Algorithm	Supported Task Modes	Needs Central Information	Discrete Action	Continuous Action	Learning Category	Type
IQL*	cooperative collaborative competitive mixed		✔️		Independent Learning	Off Policy
PG	cooperative collaborative competitive mixed		✔️	✔️	Independent Learning	On Policy
A2C	cooperative collaborative competitive mixed		✔️	✔️	Independent Learning	On Policy
DDPG	cooperative collaborative competitive mixed			✔️	Independent Learning	Off Policy
TRPO	cooperative collaborative competitive mixed		✔️	✔️	Independent Learning	On Policy
PPO	cooperative collaborative competitive mixed		✔️	✔️	Independent Learning	On Policy
COMA	cooperative collaborative competitive mixed	✔️	✔️		Centralized Critic	On Policy
MADDPG	cooperative collaborative competitive mixed	✔️		✔️	Centralized Critic	Off Policy
MAA2C*	cooperative collaborative competitive mixed	✔️	✔️	✔️	Centralized Critic	On Policy
MATRPO*	cooperative collaborative competitive mixed	✔️	✔️	✔️	Centralized Critic	On Policy
MAPPO	cooperative collaborative competitive mixed	✔️	✔️	✔️	Centralized Critic	On Policy
HATRPO	Cooperative	✔️	✔️	✔️	Centralized Critic	On Policy
HAPPO	Cooperative	✔️	✔️	✔️	Centralized Critic	On Policy
VDN	Cooperative		✔️		Value Decomposition	Off Policy
QMIX	Cooperative	✔️	✔️		Value Decomposition	Off Policy
FACMAC	Cooperative	✔️		✔️	Value Decomposition	Off Policy
VDAC	Cooperative	✔️	✔️	✔️	Value Decomposition	On Policy
VDPPO*	Cooperative	✔️	✔️	✔️	Value Decomposition	On Policy

Notes on the algorithms marked with *: IQL is the multi-agent version of Q-learning. MAA2C and MATRPO are the centralized-critic versions of A2C and TRPO. VDPPO is the value decomposition version of PPO.
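The table can also be read as a lookup structure. The following is a minimal sketch, not part of any library: the `AlgoInfo` class and the `candidates` function are hypothetical names introduced here for illustration, and only a subset of the rows is transcribed.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AlgoInfo:
    """One row of the table (hypothetical container, for illustration only)."""
    task_modes: frozenset
    central_info: bool
    discrete: bool
    continuous: bool
    category: str
    on_policy: bool

ALL_MODES = frozenset({"cooperative", "collaborative", "competitive", "mixed"})
COOP = frozenset({"cooperative"})

# A subset of rows transcribed from the table above.
ALGOS = {
    "IQL":    AlgoInfo(ALL_MODES, False, True,  False, "Independent Learning", False),
    "DDPG":   AlgoInfo(ALL_MODES, False, False, True,  "Independent Learning", False),
    "PPO":    AlgoInfo(ALL_MODES, False, True,  True,  "Independent Learning", True),
    "COMA":   AlgoInfo(ALL_MODES, True,  True,  False, "Centralized Critic", True),
    "MADDPG": AlgoInfo(ALL_MODES, True,  False, True,  "Centralized Critic", False),
    "MAPPO":  AlgoInfo(ALL_MODES, True,  True,  True,  "Centralized Critic", True),
    "HAPPO":  AlgoInfo(COOP,      True,  True,  True,  "Centralized Critic", True),
    "VDN":    AlgoInfo(COOP,      False, True,  False, "Value Decomposition", False),
    "QMIX":   AlgoInfo(COOP,      True,  True,  False, "Value Decomposition", False),
    "FACMAC": AlgoInfo(COOP,      True,  False, True,  "Value Decomposition", False),
}

def candidates(task_mode, action_space, allow_central_info=True):
    """Return the sorted names of algorithms whose row matches the requirements."""
    if action_space not in ("discrete", "continuous"):
        raise ValueError("action_space must be 'discrete' or 'continuous'")
    result = []
    for name, info in ALGOS.items():
        if task_mode not in info.task_modes:
            continue
        supported = info.discrete if action_space == "discrete" else info.continuous
        if not supported:
            continue
        if info.central_info and not allow_central_info:
            continue
        result.append(name)
    return sorted(result)

print(candidates("competitive", "continuous"))
# ['DDPG', 'MADDPG', 'MAPPO', 'PPO']
print(candidates("cooperative", "discrete", allow_central_info=False))
# ['IQL', 'PPO', 'VDN']
```

A filter like this only narrows the choice according to the table; whether a given algorithm actually runs on a given task still depends on the library and the environment.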

2. Awesome Repos

The table below compares MARLlib with existing libraries.

Library	Task Mode	Supported Envs	Algorithm	Parameter Sharing	Asynchronous Interaction	Framework
PyMARL	cooperative	1	Independent Learning(1) + Centralized Critic(1) + Value Decomposition(3)	full-sharing		*
PyMARL2	cooperative	1	Independent Learning(1) + Centralized Critic(1) + Value Decomposition(9)	full-sharing		PyMARL
MARL-Algorithms	cooperative	1	CTDE(6) + Communication(1) + Graph(1) + Multi-task(1)	full-sharing		*
EPyMARL	cooperative	4	Independent Learning(3) + Value Decomposition(4) + Centralized Critic(2)	full-sharing + non-sharing		PyMARL
MAlib	self-play	2 + PettingZoo + OpenSpiel	Population-based	full-sharing + group-sharing + non-sharing	✔️	*
MAPPO Benchmark	cooperative	4	MAPPO(1)	full-sharing + non-sharing	✔️	pytorch-a2c-ppo-acktr-gail
MARLlib	cooperative collaborative competitive mixed	10 + PettingZoo	Independent Learning(6) + Centralized Critic(7) + Value Decomposition(5)	full-sharing + group-sharing + non-sharing	✔️	Ray/Rllib
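The Parameter Sharing column distinguishes full-sharing, group-sharing, and non-sharing. A rough way to picture the difference is as a mapping from agents to policies: one policy for everyone, one policy per group, or one policy per agent. The sketch below assumes that reading; `make_policy_mapping` and the policy names are hypothetical and do not correspond to the API of any library in the table.

```python
def make_policy_mapping(agent_ids, mode, groups=None):
    """Map each agent ID to a policy name under the given sharing scheme.

    `groups` (only used for group-sharing) maps a group name to the
    list of agent IDs that belong to it.
    """
    if mode == "full-sharing":
        # Every agent uses the same set of policy parameters.
        return {agent: "shared_policy" for agent in agent_ids}
    if mode == "non-sharing":
        # Every agent trains its own policy.
        return {agent: f"policy_{agent}" for agent in agent_ids}
    if mode == "group-sharing":
        if groups is None:
            raise ValueError("group-sharing requires a `groups` dict")
        mapping = {}
        for group_name, members in groups.items():
            for agent in members:
                mapping[agent] = f"policy_{group_name}"
        missing = set(agent_ids) - set(mapping)
        if missing:
            raise ValueError(f"agents without a group: {sorted(missing)}")
        return mapping
    raise ValueError(f"unknown sharing mode: {mode}")

agents = ["red_0", "red_1", "blue_0", "blue_1"]
teams = {"red": ["red_0", "red_1"], "blue": ["blue_0", "blue_1"]}

for mode in ("full-sharing", "group-sharing", "non-sharing"):
    mapping = make_policy_mapping(agents, mode, groups=teams)
    print(mode, "->", len(set(mapping.values())), "distinct policies")
```

With four agents in two teams, the three schemes yield one, two, and four distinct policies respectively.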

3. Some comments

starry-sky6688

This codebase is simple and easy to pick up, which makes it a good starting point for beginners. It includes IQL, QMIX, VDN, COMA, QTRAN, MAVEN, CommNet, DyMA-CL, G2ANet, and MADDPG.

pymarl

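The codebase from Whiteson's group at the University of Oxford. The code is highly modular: very pleasant to use once learned, but hard to get started with. It includes QMIX, COMA, VDN, IQL, and QTRAN.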

https://github.com/oxwhirl/pymarl

pymarl2 (351 stars)

An improved version of pymarl that adds some code-level tricks.

epymarl (139 stars)

An extended version of pymarl that adds IA2C, IPPO, MADDPG, MAA2C, and MAPPO on top of it.

https://github.com/uoe-agents/epymarl

shariqiqbal2810 (307 stars)

Code written by the author of MAAC; it is quite concise. It includes MADDPG and MAAC.

marlbenchmark (509 stars / 102 stars)

The codebase from Tsinghua University. It includes MAPPO, QMIX, VDN, MADDPG, and MATD3; VDN and MATD3 have not been fully tested.

marllib (70 stars)

A codebase that covers most mainstream MARL algorithms, built on Ray's RLlib. It is exceptionally well modularized but takes some time to get started with. It includes independent learning (IQL, A2C, DDPG, TRPO, PPO), centralized critic learning (COMA, MADDPG, MAPPO, HATRPO), and value decomposition (QMIX, VDN, FACMAC, VDA2C).

https://github.com/Replicable-MARL/MARLlib

4. Environments

Most of the popular environments in MARL research are supported by MARLlib:

Env Name	Learning Mode	Observability	Action Space	Observations
LBF	cooperative + collaborative	Both	Discrete	Discrete
RWARE	cooperative	Partial	Discrete	Discrete
MPE	cooperative + collaborative + mixed	Both	Both	Continuous
SMAC	cooperative	Partial	Discrete	Continuous
MetaDrive	collaborative	Partial	Continuous	Continuous
MAgent	collaborative + mixed	Partial	Discrete	Discrete
Pommerman	collaborative + competitive + mixed	Both	Discrete	Discrete
MAMuJoCo	cooperative	Partial	Continuous	Continuous
GRF	collaborative + mixed	Full	Discrete	Continuous
Hanabi	cooperative	Partial	Discrete	Discrete

Each environment has a README file that serves as the guide for that task, covering environment settings, installation, and important notes.
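Read together with the algorithm table in Section 1, the environment table suggests a quick sanity check before picking a pairing: the algorithm and the environment should share at least one task mode and at least one action-space type. The sketch below assumes that rule of thumb; the `compatible` and `algorithms_for` helpers are hypothetical, and only a few rows of each table are transcribed.

```python
ALL_MODES = {"cooperative", "collaborative", "competitive", "mixed"}

# (task modes, action spaces), transcribed from the algorithm table in Section 1.
ALGOS = {
    "QMIX":   ({"cooperative"}, {"discrete"}),
    "FACMAC": ({"cooperative"}, {"continuous"}),
    "MADDPG": (ALL_MODES, {"continuous"}),
    "MAPPO":  (ALL_MODES, {"discrete", "continuous"}),
}

# (learning modes, action spaces), transcribed from the environment table above.
# "Both" in the Action Space column is expanded to discrete and continuous.
ENVS = {
    "SMAC":      ({"cooperative"}, {"discrete"}),
    "MPE":       ({"cooperative", "collaborative", "mixed"}, {"discrete", "continuous"}),
    "MetaDrive": ({"collaborative"}, {"continuous"}),
    "MAMuJoCo":  ({"cooperative"}, {"continuous"}),
    "Hanabi":    ({"cooperative"}, {"discrete"}),
}

def compatible(algo, env):
    """True if the two table rows share a task mode and an action-space type."""
    algo_modes, algo_actions = ALGOS[algo]
    env_modes, env_actions = ENVS[env]
    return bool(algo_modes & env_modes) and bool(algo_actions & env_actions)

def algorithms_for(env):
    """Sorted names of the transcribed algorithms that pass the check for `env`."""
    return sorted(name for name in ALGOS if compatible(name, env))

print(algorithms_for("SMAC"))       # ['MAPPO', 'QMIX']
print(algorithms_for("MetaDrive"))  # ['MADDPG', 'MAPPO']
print(algorithms_for("MAMuJoCo"))   # ['FACMAC', 'MADDPG', 'MAPPO']
```

This is a table-level check only. The README of each environment remains the place to confirm which scenarios and settings are actually supported.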