Policy Similarity Measure for Two-Player Zero-Sum Games

Policy space response oracles (PSRO) is an important algorithmic framework for approximating Nash equilibria in two-player zero-sum games. Enhancing policy diversity has been shown to significantly improve the performance of PSRO in this approximation process. However, existing diversity metrics are...

Bibliographic Details
Main Authors: Hongsong Tang, Liuyu Xiang, Zhaofeng He
Format: Article
Language: English
Published: MDPI AG 2025-03-01
Series: Applied Sciences
Subjects: game theory; reinforcement learning; multi-agent systems; policy diversity
Online Access: https://www.mdpi.com/2076-3417/15/5/2815
author Hongsong Tang
Liuyu Xiang
Zhaofeng He
collection DOAJ
description Policy space response oracles (PSRO) is an important algorithmic framework for approximating Nash equilibria in two-player zero-sum games. Enhancing policy diversity has been shown to significantly improve the performance of PSRO in this approximation process. However, existing diversity metrics are often prone to redundancy, which can hinder convergence to optimal strategies. In this paper, we introduce the policy similarity measure (PSM), a novel approach that combines Gaussian and cosine similarity measures to assess policy similarity. We further incorporate the PSM into the PSRO framework as a regularization term, effectively fostering a more diverse policy population. We demonstrate the effectiveness of our method in two distinct game environments: a non-transitive mixture model and Leduc poker. The experimental results show that the PSM-augmented PSRO outperforms baseline methods in reducing exploitability by approximately 7% and exhibits greater policy diversity in visual analysis. Ablation studies further validate the benefits of combining Gaussian and cosine similarities in cultivating more diverse policy sets. This work provides a valuable method for measuring and improving policy diversity in two-player zero-sum games.
format Article
id doaj-art-489f08c9d5eb4f95bd04bf8c4806acce
institution DOAJ
issn 2076-3417
language English
publishDate 2025-03-01
publisher MDPI AG
record_format Article
series Applied Sciences
spelling doaj-art-489f08c9d5eb4f95bd04bf8c4806acce
2025-08-20T02:58:58Z
eng
MDPI AG
Applied Sciences
2076-3417
2025-03-01
15
5
2815
10.3390/app15052815
Policy Similarity Measure for Two-Player Zero-Sum Games
Hongsong Tang (0)
Liuyu Xiang (1)
Zhaofeng He (2)
(0) School of Science, Beijing University of Posts and Telecommunications, Beijing 100876, China
(1) School of Artificial Intelligence, Beijing University of Posts and Telecommunications, Beijing 100876, China
(2) School of Artificial Intelligence, Beijing University of Posts and Telecommunications, Beijing 100876, China
https://www.mdpi.com/2076-3417/15/5/2815
game theory
reinforcement learning
multi-agent systems
policy diversity
title Policy Similarity Measure for Two-Player Zero-Sum Games
topic game theory
reinforcement learning
multi-agent systems
policy diversity
url https://www.mdpi.com/2076-3417/15/5/2815