Discovery of Exact Equations for Integer Sequences

Bibliographic Details
Main Authors: Boštjan Gec, Sašo Džeroski, Ljupčo Todorovski
Format: Article
Language: English
Published: MDPI AG 2024-11-01
Series: Mathematics
Online Access: https://www.mdpi.com/2227-7390/12/23/3745
Description
Summary: Equation discovery, also known as symbolic regression, is the field of machine learning that studies algorithms for discovering quantitative laws, expressed as closed-form equations or formulas, in collections of observed data. Such data are expected to come from measurements of physical systems and are therefore noisy, which shifts the focus of equation discovery algorithms towards discovering approximate equations. These equations only loosely match the noisy observed data, which makes them inappropriate for applications in mathematics. In this article, we introduce Diofantos, an algorithm for discovering equations in the ring of integers that exactly match the training data. Diofantos is based on a reformulation of the equation discovery task into the task of solving linear Diophantine equations. We empirically evaluate the performance of Diofantos on reconstructing known equations for more than 27,000 sequences from the On-Line Encyclopedia of Integer Sequences (OEIS). Diofantos successfully reconstructs more than 90% of these equations and clearly outperforms SINDy, a state-of-the-art method for discovering approximate equations, which achieves a reconstruction rate of less than 70%.
ISSN: 2227-7390
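
Illustration: the summary describes recasting equation discovery as the task of solving linear Diophantine equations. The sketch below shows one way such a reformulation might look for a single, simple family of equations, namely linear recurrences with constant integer coefficients. It is not the Diofantos implementation, and this record does not specify which terms the algorithm actually considers. The function names `solve_exact` and `find_recurrence` are hypothetical, and the solver is a stand-in: it uses exact rational Gauss-Jordan elimination followed by an integrality check, rather than a dedicated Diophantine solver.

```python
from fractions import Fraction


def solve_exact(rows, rhs):
    """Solve rows * x = rhs exactly over the rationals.

    Returns one particular solution (free variables set to 0),
    or None if the system is inconsistent.
    """
    n_cols = len(rows[0])
    aug = [[Fraction(v) for v in row] + [Fraction(b)]
           for row, b in zip(rows, rhs)]
    pivots = []
    r = 0
    for c in range(n_cols):
        # Find a row at or below r with a nonzero entry in column c.
        pivot = next((i for i in range(r, len(aug)) if aug[i][c] != 0), None)
        if pivot is None:
            continue
        aug[r], aug[pivot] = aug[pivot], aug[r]
        pv = aug[r][c]
        aug[r] = [v / pv for v in aug[r]]
        # Eliminate column c from every other row.
        for i in range(len(aug)):
            if i != r and aug[i][c] != 0:
                f = aug[i][c]
                aug[i] = [vi - f * vr for vi, vr in zip(aug[i], aug[r])]
        pivots.append(c)
        r += 1
    # A zero coefficient row with a nonzero right-hand side means no solution.
    if any(all(v == 0 for v in row[:-1]) and row[-1] != 0 for row in aug):
        return None
    sol = [Fraction(0)] * n_cols
    for i, c in enumerate(pivots):
        sol[c] = aug[i][-1]
    return sol


def find_recurrence(seq, order):
    """Look for integers c_1..c_k with a(n) = c_1*a(n-1) + ... + c_k*a(n-k)
    holding exactly for every available term of seq."""
    rows = [[seq[n - k] for k in range(1, order + 1)]
            for n in range(order, len(seq))]
    rhs = [seq[n] for n in range(order, len(seq))]
    sol = solve_exact(rows, rhs)
    # Accept only exact integer solutions (assumption: a rational
    # particular solution with zeroed free variables is good enough here).
    if sol is None or any(c.denominator != 1 for c in sol):
        return None
    return [int(c) for c in sol]


fibonacci = [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
squares = [0, 1, 4, 9, 16, 25, 36]
print(find_recurrence(fibonacci, 2))
print(find_recurrence(squares, 3))
```

Each term of the sequence contributes one linear equation in the unknown coefficients, so an equation that fits the data exactly corresponds to an integer solution of the whole system. Because the sketch fixes free variables to zero, it can miss integer solutions of underdetermined systems; a dedicated Diophantine solver (for example, one based on the Hermite or Smith normal form) would not have that limitation.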