Discovery of Exact Equations for Integer Sequences

Equation discovery, also known as symbolic regression, is the field of machine learning that studies algorithms for discovering quantitative laws, expressed as closed-form equations or formulas, in collections of observed data. The latter is expected to come from measurements of physical systems and...

Bibliographic Details
Main Authors: Boštjan Gec, Sašo Džeroski, Ljupčo Todorovski
Format: Article
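Language: English
Published: MDPI AG 2024-11-01
Series: Mathematics
Subjects: machine learning; equation discovery; symbolic regression; Diophantine equations; online encyclopedia of integer sequences (OEIS)
Online Access: https://www.mdpi.com/2227-7390/12/23/3745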
collection DOAJ
description Equation discovery, also known as symbolic regression, is the field of machine learning that studies algorithms for discovering quantitative laws, expressed as closed-form equations or formulas, in collections of observed data. Such data are expected to come from measurements of physical systems and are therefore noisy, which shifts the focus of equation discovery algorithms towards approximate equations. These match the noisy observed data only loosely, which makes them inappropriate for applications in mathematics. In this article, we introduce Diofantos, an algorithm for discovering equations in the ring of integers that exactly match the training data. Diofantos is based on a reformulation of the equation discovery task into the task of solving linear Diophantine equations. We empirically evaluate the performance of Diofantos on reconstructing known equations for more than 27,000 sequences from the On-Line Encyclopedia of Integer Sequences (OEIS). Diofantos successfully reconstructs more than 90% of these equations and clearly outperforms SINDy, a state-of-the-art method for discovering approximate equations, which achieves a reconstruction rate of less than 70%.
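The idea of treating exact equation discovery as a linear problem with integer unknowns can be illustrated with a small sketch. The code below is not the Diofantos algorithm and makes no claim about how the authors implement it; it is a simplified illustration, assuming the candidate equation is a linear recurrence a(n) = c_1 a(n-1) + ... + c_k a(n-k) and that the first k equations determine the coefficients uniquely. The function names `solve_exact` and `find_integer_recurrence` are hypothetical. Exact rational arithmetic is used so that a solution is accepted only when every coefficient is an integer and every term of the sequence is matched exactly.

```python
from fractions import Fraction


def solve_exact(matrix, rhs):
    """Solve a square linear system exactly; return None if it is singular."""
    n = len(matrix)
    # Augmented matrix over the rationals, so no rounding error can occur.
    aug = [[Fraction(v) for v in row] + [Fraction(b)]
           for row, b in zip(matrix, rhs)]
    for col in range(n):
        # Find a row with a non-zero entry in this column to use as pivot.
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            return None
        aug[col], aug[pivot] = aug[pivot], aug[col]
        aug[col] = [v / aug[col][col] for v in aug[col]]
        # Eliminate this column from every other row (Gauss-Jordan).
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n] for row in aug]


def find_integer_recurrence(seq, order):
    """Return integers [c_1, ..., c_order] with
    seq[n] == sum(c_i * seq[n - i]) for every n >= order, or None."""
    if len(seq) < 2 * order:
        return None
    # One linear equation per term; the unknowns are the coefficients c_i.
    rows = [[seq[n - i] for i in range(1, order + 1)]
            for n in range(order, 2 * order)]
    rhs = [seq[n] for n in range(order, 2 * order)]
    coeffs = solve_exact(rows, rhs)
    # Reject singular systems and non-integer solutions.
    if coeffs is None or any(c.denominator != 1 for c in coeffs):
        return None
    ints = [int(c) for c in coeffs]
    # Accept only if the equation matches every available term exactly.
    for n in range(order, len(seq)):
        if seq[n] != sum(c * seq[n - i] for i, c in enumerate(ints, start=1)):
            return None
    return ints


fibonacci = [0, 1, 1, 2, 3, 5, 8, 13, 21]
squares = [0, 1, 4, 9, 16, 25, 36]
print(find_integer_recurrence(fibonacci, 2))
print(find_integer_recurrence(squares, 3))
```

A general solver for linear Diophantine systems would also have to handle underdetermined and singular cases, for example through an integer normal form; this sketch simply gives up on them. An approximate method would typically fit floating-point coefficients by least squares instead, which is the contrast the abstract draws.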
format Article
id doaj-art-d4af84af4e0a4402875fb255b3133c9c
institution OA Journals
issn 2227-7390
citation Boštjan Gec, Sašo Džeroski, Ljupčo Todorovski. Discovery of Exact Equations for Integer Sequences. Mathematics, vol. 12, no. 23, article 3745, 2024-11-01. MDPI AG. ISSN 2227-7390. DOI 10.3390/math12233745
affiliation Department of Knowledge Technologies, Jožef Stefan Institute, Jamova 39, 1000 Ljubljana, Slovenia (all three authors)
topic machine learning
equation discovery
symbolic regression
Diophantine equations
online encyclopedia of integer sequences (OEIS)