A Collection of Multi-Agent Reinforcement Learning Code (PyTorch)

Algorithms

We provide three types of MARL algorithms as baselines:

Independent Learning

  • IQL
  • DDPG
  • PG
  • A2C
  • TRPO
  • PPO

Centralized Critic

  • COMA
  • MADDPG
  • MAA2C
  • MAPPO
  • MATRPO
  • HATRPO
  • HAPPO

Value Decomposition

  • VDN
  • QMIX
  • FACMAC
  • VDAC
  • VDPPO

Here is a chart describing the characteristics of each algorithm:

| Algorithm | Supported Task Modes | Needs Central Information | Discrete Action | Continuous Action | Learning Category | Type |
| --- | --- | --- | --- | --- | --- | --- |
| IQL* | cooperative, collaborative, competitive, mixed | | ✔️ | | Independent Learning | Off Policy |
| PG | cooperative, collaborative, competitive, mixed | | ✔️ | ✔️ | Independent Learning | On Policy |
| A2C | cooperative, collaborative, competitive, mixed | | ✔️ | ✔️ | Independent Learning | On Policy |
| DDPG | cooperative, collaborative, competitive, mixed | | | ✔️ | Independent Learning | Off Policy |
| TRPO | cooperative, collaborative, competitive, mixed | | ✔️ | ✔️ | Independent Learning | On Policy |
| PPO | cooperative, collaborative, competitive, mixed | | ✔️ | ✔️ | Independent Learning | On Policy |
| COMA | cooperative, collaborative, competitive, mixed | ✔️ | ✔️ | | Centralized Critic | On Policy |
| MADDPG | cooperative, collaborative, competitive, mixed | ✔️ | | ✔️ | Centralized Critic | Off Policy |
| MAA2C* | cooperative, collaborative, competitive, mixed | ✔️ | ✔️ | ✔️ | Centralized Critic | On Policy |
| MATRPO* | cooperative, collaborative, competitive, mixed | ✔️ | ✔️ | ✔️ | Centralized Critic | On Policy |
| MAPPO | cooperative, collaborative, competitive, mixed | ✔️ | ✔️ | ✔️ | Centralized Critic | On Policy |
| HATRPO | cooperative | ✔️ | ✔️ | ✔️ | Centralized Critic | On Policy |
| HAPPO | cooperative | ✔️ | ✔️ | ✔️ | Centralized Critic | On Policy |
| VDN | cooperative | | ✔️ | | Value Decomposition | Off Policy |
| QMIX | cooperative | ✔️ | ✔️ | | Value Decomposition | Off Policy |
| FACMAC | cooperative | ✔️ | | ✔️ | Value Decomposition | Off Policy |
| VDAC | cooperative | ✔️ | ✔️ | ✔️ | Value Decomposition | On Policy |
| VDPPO* | cooperative | ✔️ | ✔️ | ✔️ | Value Decomposition | On Policy |

* IQL is the multi-agent version of Q-learning. MAA2C and MATRPO are the centralized-critic versions of A2C and TRPO. VDPPO is the value decomposition version of PPO.
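The difference between independent learning and value decomposition can be sketched with the temporal-difference (TD) error each approach trains on. This is a minimal illustration in plain Python, not code from any of the libraries listed here: the function names `iql_td_errors` and `vdn_td_error` are hypothetical, terminal states are ignored, and the value decomposition shown is the simplest (VDN-style) case where the joint Q-value is the sum of the per-agent Q-values.

```python
# Hypothetical helper names; a sketch of the two learning signals only.

def iql_td_errors(q_taken, rewards, next_max_q, gamma=0.99):
    """Independent Q-learning: each agent gets its own TD error,
    computed from its own reward and its own Q-values."""
    return [r + gamma * nq - q for q, r, nq in zip(q_taken, rewards, next_max_q)]


def vdn_td_error(q_taken, team_reward, next_max_q, gamma=0.99):
    """VDN-style value decomposition: the joint Q-value is the sum of the
    per-agent Q-values, and one TD error is computed from the team reward."""
    q_tot = sum(q_taken)
    next_q_tot = sum(next_max_q)
    return team_reward + gamma * next_q_tot - q_tot


if __name__ == "__main__":
    q_taken = [1.0, 2.0]     # Q-value of the action each agent took
    next_max_q = [1.0, 1.0]  # max next-state Q-value per agent
    print(iql_td_errors(q_taken, [0.5, 0.5], next_max_q, gamma=0.5))
    print(vdn_td_error(q_taken, 1.0, next_max_q, gamma=0.5))
```

In the independent case every agent is updated in isolation; in the decomposed case a single team-level error is shared by all agents, which is why the value decomposition methods in the chart are listed for cooperative tasks only.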

Awesome Repos

The table below compares MARLlib with existing work.

| Library | Task Mode | Supported Env | Algorithm | Parameter Sharing | Asynchronous Interaction | Framework |
| --- | --- | --- | --- | --- | --- | --- |
| PyMARL | cooperative | 1 | Independent Learning(1) + Centralized Critic(1) + Value Decomposition(3) | full-sharing | | * |
| PyMARL2 | cooperative | 1 | Independent Learning(1) + Centralized Critic(1) + Value Decomposition(9) | full-sharing | | PyMARL |
| MARL-Algorithms | cooperative | 1 | CTDE(6) + Communication(1) + Graph(1) + Multi-task(1) | full-sharing | | * |
| EPyMARL | cooperative | 4 | Independent Learning(3) + Value Decomposition(4) + Centralized Critic(2) | full-sharing + non-sharing | | PyMARL |
| MALib | self-play | 2 + PettingZoo + OpenSpiel | Population-based | full-sharing + group-sharing + non-sharing | ✔️ | * |
| MAPPO Benchmark | cooperative | 4 | MAPPO(1) | full-sharing + non-sharing | ✔️ | pytorch-a2c-ppo-acktr-gail |
| MARLlib | cooperative, collaborative, competitive, mixed | 10 + PettingZoo | Independent Learning(6) + Centralized Critic(7) + Value Decomposition(5) | full-sharing + group-sharing + non-sharing | ✔️ | Ray/RLlib |
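The "Parameter Sharing" column can be read as a rule for mapping agents to policies. The sketch below assumes the usual meaning of the three terms: full-sharing gives every agent the same policy, group-sharing gives one policy per group of agents, and non-sharing gives each agent its own policy. The function `build_policy_mapping`, the policy names, and the `groups` argument are all hypothetical and do not correspond to the API of any library in the table.

```python
# Hypothetical sketch: map agent IDs to policy names under each sharing mode.

def build_policy_mapping(agent_ids, mode, groups=None):
    """Return a dict {agent_id: policy_name} for the given sharing mode.

    groups: optional dict {agent_id: group_name}, required for group-sharing.
    """
    if mode == "full-sharing":
        # One set of parameters for all agents.
        return {agent: "shared_policy" for agent in agent_ids}
    if mode == "group-sharing":
        if groups is None:
            raise ValueError("group-sharing requires a groups mapping")
        # One set of parameters per group (for example, per team).
        return {agent: f"policy_{groups[agent]}" for agent in agent_ids}
    if mode == "non-sharing":
        # A separate set of parameters for every agent.
        return {agent: f"policy_{agent}" for agent in agent_ids}
    raise ValueError(f"unknown sharing mode: {mode}")


if __name__ == "__main__":
    agents = ["red_0", "red_1", "blue_0"]
    teams = {"red_0": "red", "red_1": "red", "blue_0": "blue"}
    for mode in ("full-sharing", "group-sharing", "non-sharing"):
        print(mode, build_policy_mapping(agents, mode, groups=teams))
```

Full-sharing trains the fewest parameters and suits homogeneous agents; non-sharing is the most flexible but scales the number of policies with the number of agents.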

Some comments

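  1. starry-sky6688

This codebase is simple and easy to pick up, which makes it a good fit for beginners. It includes IQL, QMIX, VDN, COMA, QTRAN, MAVEN, CommNet, DyMA-CL, G2ANet, and MADDPG.

  2. pymarl

The codebase from Whiteson's group at the University of Oxford. It is highly modular: very convenient once learned, but hard to get started with. It includes QMIX, COMA, VDN, IQL, and QTRAN.

  3. pymarl2 (351 stars)

An improved version of pymarl that adds some code-level tricks.

  4. epymarl (139 stars)

An extension of pymarl that adds IA2C, IPPO, MADDPG, MAA2C, and MAPPO on top of pymarl.

  5. shariqiqbal2810 (307 stars)

Code written by the author of MAAC; quite concise. It includes MADDPG and MAAC.

  6. marlbenchmark (509 stars / 102 stars)

The codebase from Tsinghua University. It includes MAPPO, QMIX, VDN, MADDPG, and MATD3; VDN and MATD3 have not been fully tested.

  7. marllib (70 stars)

A codebase that covers most mainstream MARL algorithms, built on Ray's RLlib. It is very well modularized but takes some time to get started with. It includes independent learning (IQL, A2C, DDPG, TRPO, PPO), centralized critic learning (COMA, MADDPG, MAPPO, HATRPO), and value decomposition (QMIX, VDN, FACMAC, VDA2C).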

Environments

Most of the popular environments in MARL research are supported by MARLlib:

| Env Name | Learning Mode | Observability | Action Space | Observations |
| --- | --- | --- | --- | --- |
| LBF | cooperative + collaborative | Both | Discrete | Discrete |
| RWARE | cooperative | Partial | Discrete | Discrete |
| MPE | cooperative + collaborative + mixed | Both | Both | Continuous |
| SMAC | cooperative | Partial | Discrete | Continuous |
| MetaDrive | collaborative | Partial | Continuous | Continuous |
| MAgent | collaborative + mixed | Partial | Discrete | Discrete |
| Pommerman | collaborative + competitive + mixed | Both | Discrete | Discrete |
| MAMuJoCo | cooperative | Partial | Continuous | Continuous |
| GRF | collaborative + mixed | Full | Discrete | Continuous |
| Hanabi | cooperative | Partial | Discrete | Discrete |

Each environment has a readme file that serves as the instructions for that task, covering environment settings, installation, and important notes.
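Read together, the algorithm chart and the environment table suggest a simple compatibility check: an algorithm is a candidate for an environment when they share at least one task mode and at least one action-space type. The sketch below encodes a few rows of each table by hand. It reflects one reading of those tables, not a MARLlib API; the names `ALGORITHMS`, `ENVIRONMENTS`, and `is_compatible` are hypothetical, and the environment readme remains the place to confirm what is actually supported.

```python
# Hypothetical sketch: a hand-copied subset of the two tables on this page.

ALL_MODES = frozenset({"cooperative", "collaborative", "competitive", "mixed"})
COOP_ONLY = frozenset({"cooperative"})

# algorithm -> (supported task modes, supported action-space types)
ALGORITHMS = {
    "IQL": (ALL_MODES, {"discrete"}),
    "DDPG": (ALL_MODES, {"continuous"}),
    "MAPPO": (ALL_MODES, {"discrete", "continuous"}),
    "VDN": (COOP_ONLY, {"discrete"}),
    "FACMAC": (COOP_ONLY, {"continuous"}),
}

# environment -> (learning modes, action-space types)
ENVIRONMENTS = {
    "SMAC": ({"cooperative"}, {"discrete"}),
    "MAMuJoCo": ({"cooperative"}, {"continuous"}),
    "MetaDrive": ({"collaborative"}, {"continuous"}),
    "MPE": ({"cooperative", "collaborative", "mixed"}, {"discrete", "continuous"}),
}


def is_compatible(algorithm, environment):
    """True if the algorithm and environment share a task mode and an action type."""
    algo_modes, algo_actions = ALGORITHMS[algorithm]
    env_modes, env_actions = ENVIRONMENTS[environment]
    return bool(algo_modes & env_modes) and bool(algo_actions & env_actions)


if __name__ == "__main__":
    for env in ENVIRONMENTS:
        candidates = sorted(a for a in ALGORITHMS if is_compatible(a, env))
        print(env, candidates)
```

For example, under this reading VDN fits SMAC (cooperative, discrete) but not MAMuJoCo (continuous actions), and FACMAC fits MAMuJoCo but not MetaDrive (collaborative rather than cooperative).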

Maintained by Robin