Cases for Applying Multi-Agent Reinforcement Learning

At Silo.AI we have a weekly research club where we look into interesting techniques and methods within the fast-moving field of AI. Last week, I gave a presentation on Multi-agent Reinforcement Learning (MARL), which I compiled into this blog post. The importance of learning in multi-agent environments is widely acknowledged in artificial intelligence. Let’s take a look at MARL and its applications.

Multiple reinforcement learning agents

MARL aims to build multiple reinforcement learning agents that act in a shared multi-agent environment. The actions of all the agents jointly affect the next state of the system, and the agents can behave cooperatively, competitively, or in a mix of the two. MARL is not a new field or concept, but due to its complex nature it did not get much attention until around 20 years ago. These days, a growing number of researchers are interested in MARL, since it has many promising real-world applications.
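To make the idea of a shared environment concrete, here is a minimal sketch of a two-agent system. Everything in it is an assumption made up for illustration: the payoff tables, the `step` function, and the three-state toy dynamics do not come from any particular MARL library. It only shows that the next state and the rewards depend on the joint action of both agents.

```python
from typing import Dict, Tuple

JointAction = Tuple[int, int]
Rewards = Tuple[int, int]

# Hypothetical payoff tables, keyed by (action of agent 0, action of agent 1).
# Each value is (reward for agent 0, reward for agent 1).
COOPERATIVE: Dict[JointAction, Rewards] = {
    (0, 0): (1, 1), (0, 1): (0, 0),
    (1, 0): (0, 0), (1, 1): (1, 1),   # both agents gain when they coordinate
}
COMPETITIVE: Dict[JointAction, Rewards] = {
    (0, 0): (1, -1), (0, 1): (-1, 1),
    (1, 0): (-1, 1), (1, 1): (1, -1),  # zero-sum: one agent's gain is the other's loss
}


def step(state: int, joint_action: JointAction,
         payoffs: Dict[JointAction, Rewards]) -> Tuple[int, Rewards]:
    """Advance the toy system by one step.

    The next state depends on the actions of *all* agents, not on any
    single agent alone. The three-state dynamics are an arbitrary choice.
    """
    next_state = (state + sum(joint_action)) % 3
    return next_state, payoffs[joint_action]


if __name__ == "__main__":
    print(step(0, (1, 1), COOPERATIVE))
    print(step(0, (0, 1), COMPETITIVE))
```

A mixed setting would simply use a payoff table in which some joint actions benefit both agents and others benefit only one.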

The main algorithms in the MARL area

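There are three major categories of algorithms in the MARL area: policy-based methods, value-based methods, and mixes of the two. Policy-based methods learn the policy directly, while value-based methods learn value estimates and derive the policy from them. One such mix is the actor-critic method, which is drawing more and more attention in academia.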
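The difference between these categories is easiest to see in the update rules. The sketch below is a single-agent, tabular illustration under simplifying assumptions, not a MARL algorithm from the literature cited here: `q_learning_update` is a standard value-based update, and `actor_critic_update` combines a learned state value (the critic) with softmax action preferences (the actor). The function names, learning rates, and discount factor are hypothetical choices.

```python
import math
from typing import List


def softmax(prefs: List[float]) -> List[float]:
    """Turn action preferences into a probability distribution."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def q_learning_update(q: List[List[float]], state: int, action: int,
                      reward: float, next_state: int,
                      alpha: float = 0.1, gamma: float = 0.9) -> None:
    """Value-based: move Q(s, a) toward the bootstrapped target."""
    target = reward + gamma * max(q[next_state])
    q[state][action] += alpha * (target - q[state][action])


def actor_critic_update(prefs: List[List[float]], values: List[float],
                        state: int, action: int, reward: float,
                        next_state: int, alpha_actor: float = 0.1,
                        alpha_critic: float = 0.1,
                        gamma: float = 0.9) -> float:
    """Actor-critic: the critic learns V(s), the actor follows the TD error."""
    td_error = reward + gamma * values[next_state] - values[state]
    probs = softmax(prefs[state])          # policy before the update
    values[state] += alpha_critic * td_error
    for a, p in enumerate(probs):
        # Gradient of log pi(action | state) w.r.t. the preference of a.
        grad = (1.0 if a == action else 0.0) - p
        prefs[state][a] += alpha_actor * td_error * grad
    return td_error


if __name__ == "__main__":
    q = [[0.0, 0.0], [0.0, 0.0]]
    q_learning_update(q, state=0, action=1, reward=1.0, next_state=1)
    print(q)

    prefs = [[0.0, 0.0], [0.0, 0.0]]
    values = [0.0, 0.0]
    actor_critic_update(prefs, values, 0, 1, 1.0, 1)
    print(prefs, values)
```

In a multi-agent setting, each agent could run such an update on its own observations, or a shared critic could evaluate the joint action; which of these works best depends on the problem.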

Applying MARL

The applications for MARL are still being researched and tested in academia, and at the moment few use cases exist in practice. MARL tries to tackle complex problems in complex systems, so thorough testing is needed before real deployment. However, in some fields, such as online distributed resource allocation and cellular network optimisation, it might be applied in the near future, as the required level of safety can be achieved more easily. Let’s take a look at some potential applications of MARL.

  1. Online Distributed Resource Allocation: applying multi-agent learning to find effective resource allocations in a network of computing resources. Zhang, Chongjie, Victor R. Lesser, and Prashant J. Shenoy. "A Multi-Agent Learning Approach to Online Distributed Resource Allocation." IJCAI. Vol. 9. 2009.
  2. Cellular Network Optimisation: applying MARL in LTE networks to guide base stations towards maximising mobile service quality. Pandey, Binda. "Adaptive Learning For Mobile Network Management." 2016.
  3. Smart Grid Optimisation: applying MARL to control power flow in an electrical power grid with optimum efficiency. Riedmiller, Martin, Andrew Moore, and Jeff Schneider. "Reinforcement learning for cooperating and communicating reactive agents in electrical power grids." Workshop on Balancing Reactivity and Social Deliberation in Multi-Agent Systems. Springer, Berlin, Heidelberg. 2000.
  4. Smart Traffic Lights: applying MARL to control traffic lights so as to minimise the wait time for each car in a city, making the lights more adaptive based on estimates of expected wait time. Wiering, M. A. "Multi-agent reinforcement learning for traffic light control." ICML, 2000.

Challenges with MARL

There are many challenges with MARL that are waiting to be tackled. First, all the agents determine the system state together, so the so-called curse of dimensionality becomes a bottleneck: the agents’ action spaces combine into a joint action space, and the complexity of the system grows exponentially with the number of agents. Second, the non-stationary nature of a multi-agent system makes the problem harder to approach. As all the agents interact with the system and keep learning, the best policy for one agent can change as the other agents’ policies change. Third, there is the exploration-exploitation trade-off. In multi-agent settings the exploration space is much larger, because changes in other agents’ behaviour introduce new states that need to be explored. Making sure the system runs stably (exploitation) while still evolving at a reasonable rate (exploration) becomes a concern, especially for real-life applications.
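The first two challenges can be illustrated with a few lines of code. This is a toy sketch under assumed numbers: the helper names `joint_actions` and `best_response` and the 2x2 coordination payoff matrix are hypothetical and chosen only for illustration. The first function enumerates the joint action space, whose size is the number of actions raised to the power of the number of agents. The second shows non-stationarity: the best action for one agent flips when the other agent's policy changes.

```python
from itertools import product
from typing import List, Sequence, Tuple


def joint_actions(n_agents: int, n_actions: int) -> List[Tuple[int, ...]]:
    """Enumerate every combination of individual actions."""
    return list(product(range(n_actions), repeat=n_agents))


# Hypothetical coordination game: reward of agent 0, indexed as
# PAYOFF[own action][other agent's action]. Matching actions pay off.
PAYOFF = [[1.0, 0.0],
          [0.0, 1.0]]


def best_response(other_policy: Sequence[float]) -> int:
    """Best action for agent 0 given the other agent's action probabilities."""
    expected = [
        sum(p * PAYOFF[own][other] for other, p in enumerate(other_policy))
        for own in range(len(PAYOFF))
    ]
    return max(range(len(expected)), key=lambda a: expected[a])


if __name__ == "__main__":
    # Joint action space size grows as n_actions ** n_agents.
    for n in (1, 2, 5, 10):
        print(n, "agents:", len(joint_actions(n, 4)), "joint actions")

    # The same agent, facing a different opponent policy, prefers a different action.
    print(best_response([0.9, 0.1]))
    print(best_response([0.2, 0.8]))
```

With four actions per agent, ten agents already produce more than a million joint actions, which is why naive tabular approaches stop scaling quickly.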


Overall, MARL has great potential because it is much closer to our multi-agent real world, where each individual makes its own decisions and optimises its actions to finish a task. MARL agents have the ability to learn from humans, cooperate with humans, and help humans achieve their goals. In Silo.AI's vision, AI is a human-in-the-loop solution: humans are augmented by AI, and AI learns from humans to augment them better, iteratively. The reinforcement learning structure naturally embeds this human-in-the-loop concept. This could be the future direction.

Sources and interesting material

General MARL research summaries -- a git repository with a good collection of MARL papers

Git repositories -- a MARL environment that’s compatible with gym -- a MARL library -- Google Dopamine, a reinforcement learning framework for single-agent settings

Zhen Li
Former AI scientist
Silo AI