Reinforcement learning in signaling game

Statistics and Modeling for Complex Data

We consider a signaling game originally introduced by Skyrms, which models how two interacting players learn to signal to each other and thereby create a common language. The first rigorous analysis was carried out by Argiento, Pemantle, Skyrms and Volkov (2009) for the case of 2 states, 2 signals and 2 acts. We study the case of $M_1$ states, $M_2$ signals and $M_1$ acts for general $M_1, M_2 \in \mathbb{N}$. We prove that the expected payoff increases on average and thus converges a.s., and that a limiting bipartite graph emerges in which no signal-state correspondence is associated with both a synonym and an informational bottleneck. Finally, we show that any graph correspondence with the above property is a limit configuration with positive probability.
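To make the setting concrete, the following is a minimal simulation sketch of urn-type reinforcement in an $M_1$-state, $M_2$-signal, $M_1$-act signaling game. It is not the authors' code, and the specific choices are our own assumptions: states are drawn uniformly, every urn starts with weight 1 on each option, and a success (act equal to state) adds 1 to both the sender's weight for the chosen signal and the receiver's weight for the chosen act. The function names `simulate` and `expected_payoff` are hypothetical. The tracked quantity is the expected payoff, i.e. the probability of success under the current weights.

```python
import numpy as np


def expected_payoff(sender, receiver):
    """Success probability under the current urn weights (uniform states assumed).

    sender[s, k]:   weight of signal k in state s   (shape M1 x M2)
    receiver[k, a]: weight of act a after signal k  (shape M2 x M1)
    """
    p_signal = sender / sender.sum(axis=1, keepdims=True)   # P(signal | state)
    p_act = receiver / receiver.sum(axis=1, keepdims=True)  # P(act | signal)
    # (p_signal @ p_act)[s, a] = P(act = a | state = s); success means a == s.
    return float(np.mean(np.diag(p_signal @ p_act)))


def simulate(m1, m2, steps, seed=0):
    """Run the assumed reinforcement dynamics and record the expected payoff."""
    rng = np.random.default_rng(seed)
    sender = np.ones((m1, m2))    # assumption: unit initial weights
    receiver = np.ones((m2, m1))
    history = []
    for _ in range(steps):
        state = rng.integers(m1)  # assumption: uniformly distributed states
        signal = rng.choice(m2, p=sender[state] / sender[state].sum())
        act = rng.choice(m1, p=receiver[signal] / receiver[signal].sum())
        if act == state:          # success: reinforce both players by 1
            sender[state, signal] += 1.0
            receiver[signal, act] += 1.0
        history.append(expected_payoff(sender, receiver))
    return sender, receiver, history


if __name__ == "__main__":
    sender, receiver, history = simulate(m1=3, m2=3, steps=5000, seed=42)
    print("initial expected payoff:", history[0])
    print("final expected payoff:  ", history[-1])
```

With unit initial weights, the expected payoff starts at $1/M_1$ (pure guessing). Inspecting which entries of `sender` and `receiver` dominate after many steps gives an empirical picture of the bipartite state-signal-act graph whose limit the abstract describes.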