A Roundup of Multi-Agent Reinforcement Learning Code (PyTorch)

Algorithms

We provide three types of MARL algorithms as baselines:

Independent Learning

IQL
DDPG
PG
A2C
TRPO
PPO

Centralized Critic

COMA
MADDPG
MAA2C
MAPPO
MATRPO
HATRPO
HAPPO

Value Decomposition

VDN
QMIX
FACMAC
VDAC
VDPPO

Here is a chart describing the characteristics of each algorithm:

Algorithm	Supported Task Mode	Needs Central Information	Discrete Action	Continuous Action	Learning Category	Type
IQL*	cooperative collaborative competitive mixed		✔️		Independent Learning	Off Policy
PG	cooperative collaborative competitive mixed		✔️	✔️	Independent Learning	On Policy
A2C	cooperative collaborative competitive mixed		✔️	✔️	Independent Learning	On Policy
DDPG	cooperative collaborative competitive mixed			✔️	Independent Learning	Off Policy
TRPO	cooperative collaborative competitive mixed		✔️	✔️	Independent Learning	On Policy
PPO	cooperative collaborative competitive mixed		✔️	✔️	Independent Learning	On Policy
COMA	cooperative collaborative competitive mixed	✔️	✔️		Centralized Critic	On Policy
MADDPG	cooperative collaborative competitive mixed	✔️		✔️	Centralized Critic	Off Policy
MAA2C*	cooperative collaborative competitive mixed	✔️	✔️	✔️	Centralized Critic	On Policy
MATRPO*	cooperative collaborative competitive mixed	✔️	✔️	✔️	Centralized Critic	On Policy
MAPPO	cooperative collaborative competitive mixed	✔️	✔️	✔️	Centralized Critic	On Policy
HATRPO	Cooperative	✔️	✔️	✔️	Centralized Critic	On Policy
HAPPO	Cooperative	✔️	✔️	✔️	Centralized Critic	On Policy
VDN	Cooperative		✔️		Value Decomposition	Off Policy
QMIX	Cooperative	✔️	✔️		Value Decomposition	Off Policy
FACMAC	Cooperative	✔️		✔️	Value Decomposition	Off Policy
VDAC	Cooperative	✔️	✔️	✔️	Value Decomposition	On Policy
VDPPO*	Cooperative	✔️	✔️	✔️	Value Decomposition	On Policy

* IQL is the multi-agent version of Q-learning. MAA2C and MATRPO are the centralized versions of A2C and TRPO. VDPPO is the value decomposition version of PPO.
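To make the value decomposition category concrete, here is a minimal sketch of the additive decomposition used by VDN, where the joint value is the sum of per-agent utilities. The Q-values, function names, and two-agent setup are made up for illustration and are not taken from any of the libraries listed here. The sketch shows why such a decomposition is convenient: each agent can act greedily on its own utility, and the result matches the greedy joint action.

```python
from itertools import product

# Hypothetical per-agent utilities for one joint observation:
# q_values[i][a] is agent i's value for its local action a.
q_values = [
    [1.0, 3.0, 2.0],  # agent 0
    [0.5, 0.2, 4.0],  # agent 1
]


def q_tot(joint_action):
    """VDN-style joint value: the sum of the per-agent utilities."""
    return sum(q[a] for q, a in zip(q_values, joint_action))


def decentralized_greedy():
    """Each agent picks its own argmax without looking at the others."""
    return tuple(max(range(len(q)), key=q.__getitem__) for q in q_values)


def centralized_greedy():
    """Brute-force argmax of q_tot over every joint action."""
    joint_actions = product(*(range(len(q)) for q in q_values))
    return max(joint_actions, key=q_tot)


print(decentralized_greedy(), centralized_greedy(), q_tot(decentralized_greedy()))
# prints: (1, 2) (1, 2) 7.0
```

QMIX replaces the plain sum with a learned mixing function that is monotonic in each agent's utility, which preserves the same agreement between local and joint greedy actions.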

Awesome Repos

The table below compares MARLlib with existing libraries.

Library	Task Mode	Supported Env	Algorithm	Parameter Sharing	Asynchronous Interaction	Framework
PyMARL	cooperative	1	Independent Learning(1) + Centralized Critic(1) + Value Decomposition(3)	full-sharing		*
PyMARL2	cooperative	1	Independent Learning(1) + Centralized Critic(1) + Value Decomposition(9)	full-sharing		PyMARL
MARL-Algorithms	cooperative	1	CTDE(6) + Communication(1) + Graph(1) + Multi-task(1)	full-sharing		*
EPyMARL	cooperative	4	Independent Learning(3) + Value Decomposition(4) + Centralized Critic(2)	full-sharing + non-sharing		PyMARL
MALib	self-play	2 + PettingZoo + OpenSpiel	Population-based	full-sharing + group-sharing + non-sharing	✔️	*
MAPPO Benchmark	cooperative	4	MAPPO(1)	full-sharing + non-sharing	✔️	pytorch-a2c-ppo-acktr-gail
MARLlib	cooperative collaborative competitive mixed	10 + PettingZoo	Independent Learning(6) + Centralized Critic(7) + Value Decomposition(5)	full-sharing + group-sharing + non-sharing	✔️	Ray/Rllib
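The Parameter Sharing column distinguishes full-sharing, group-sharing, and non-sharing. As a rough illustration of what those modes mean, the sketch below maps each agent to the policy network it would use. The function `policy_mapping`, the agent names, and the team assignment are hypothetical and are not the API of any library in the table.

```python
def policy_mapping(agent_ids, mode, groups=None):
    """Map each agent id to the id of the policy network it uses.

    mode: "full-sharing", "group-sharing", or "non-sharing".
    groups: for group-sharing, a dict from agent id to group name
            (a hypothetical input for this sketch).
    """
    if mode == "full-sharing":
        # All agents train and act with one set of parameters.
        return {agent: "shared_policy" for agent in agent_ids}
    if mode == "group-sharing":
        if groups is None:
            raise ValueError("group-sharing needs a groups mapping")
        # Agents in the same group share parameters.
        return {agent: f"policy_{groups[agent]}" for agent in agent_ids}
    if mode == "non-sharing":
        # Every agent has its own parameters.
        return {agent: f"policy_{agent}" for agent in agent_ids}
    raise ValueError(f"unknown mode: {mode}")


agents = ["red_0", "red_1", "blue_0"]
teams = {"red_0": "red", "red_1": "red", "blue_0": "blue"}
for mode in ("full-sharing", "group-sharing", "non-sharing"):
    mapping = policy_mapping(agents, mode, teams)
    print(mode, len(set(mapping.values())))
# prints:
# full-sharing 1
# group-sharing 2
# non-sharing 3
```

With three agents in two teams, the number of distinct policies is 1, 2, and 3 respectively, which is the trade-off the column summarizes: fewer parameters and shared experience versus more room for agents to specialize.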

Some comments

starry-sky6688

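This codebase is simple and easy to pick up, which makes it suitable for beginners. It includes IQL, QMIX, VDN, COMA, QTRAN, MAVEN, CommNet, DyMA-CL, G2ANet, and MADDPG.

pymarl

A codebase from Whiteson's group at the University of Oxford. The code is highly modular: convenient once you know it, but hard to get started with. It includes QMIX, COMA, VDN, IQL, and QTRAN.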

https://github.com/oxwhirl/pymarl

pymarl2 (351 stars)

An improved version of pymarl that adds some code-level tricks.

epymarl (139 stars)

An extended version of pymarl that adds IA2C, IPPO, MADDPG, MAA2C, and MAPPO on top of the original.

https://github.com/uoe-agents/epymarl

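shariqiqbal2810 (307 stars)

Code written by the author of MAAC, and quite concise. It includes MADDPG and MAAC.

marlbenchmark (509 stars / 102 stars)

A codebase from Tsinghua University that includes MAPPO, QMIX, VDN, MADDPG, and MATD3. VDN and MATD3 have not been fully tested.

marllib (70 stars)

A codebase that covers most mainstream MARL algorithms, built on Ray's RLlib. It is especially well modularized, but takes some time to get started with. It includes independent learning (IQL, A2C, DDPG, TRPO, PPO), centralized critic learning (COMA, MADDPG, MAPPO, HATRPO), and value decomposition (QMIX, VDN, FACMAC, VDA2C).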

https://github.com/Replicable-MARL/MARLlib

Environments

Most of the popular environments in MARL research are supported by MARLlib:

Env Name	Learning Mode	Observability	Action Space	Observations
LBF	cooperative + collaborative	Both	Discrete	Discrete
RWARE	cooperative	Partial	Discrete	Discrete
MPE	cooperative + collaborative + mixed	Both	Both	Continuous
SMAC	cooperative	Partial	Discrete	Continuous
MetaDrive	collaborative	Partial	Continuous	Continuous
MAgent	collaborative + mixed	Partial	Discrete	Discrete
Pommerman	collaborative + competitive + mixed	Both	Discrete	Discrete
MAMuJoCo	cooperative	Partial	Continuous	Continuous
GRF	collaborative + mixed	Full	Discrete	Continuous
Hanabi	cooperative	Partial	Discrete	Discrete

Each environment has a README file that serves as the instructions for that task, covering environment settings, installation, and important notes.
