Reinforcement Learning for Guiding the E Theorem Prover

Automated Theorem Proving (ATP) systems search for a proof in a rapidly growing space of possibilities. Heuristics have a profound impact on search, and ATP systems make heavy use of heuristics. This work uses reinforcement learn- ing to learn a metaheuristic that decides which heuristic to use at e...

Full description

Saved in:
Bibliographic Details
Main Authors: Jack McKeown, Geoff Sutcliffe
Format: Article
Language:English
Published: LibraryPress@UF 2023-05-01
Series:Proceedings of the International Florida Artificial Intelligence Research Society Conference
Subjects:
Online Access:https://journals.flvc.org/FLAIRS/article/view/133334
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:Automated Theorem Proving (ATP) systems search for a proof in a rapidly growing space of possibilities. Heuristics have a profound impact on search, and ATP systems make heavy use of heuristics. This work uses reinforcement learn- ing to learn a metaheuristic that decides which heuristic to use at each step of a proof search in the E ATP system. Proximal policy optimization is used to dynamically select a heuristic from a fixed set, based on the current state of E. The approach is evaluated on its ability to reduce the number of inference steps used in successful proof searches, as an indicator of in- telligent search.
ISSN:2334-0754
2334-0762