Machines that halt resolve the undecidability of artificial intelligence alignment
Abstract: The inner alignment problem, which asserts whether an arbitrary artificial intelligence (AI) model satisfices a non-trivial alignment function of its outputs given its inputs, is undecidable. This is rigorously proved by Rice’s theorem, which is also equivalent to a reduction to Turing’s Ha...
| Main Authors: | Gabriel A. Melo, Marcos R. O. A. Máximo, Nei Y. Soma, Paulo A. L. Castro |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Nature Portfolio, 2025-05-01 |
| Series: | Scientific Reports |
| Online Access: | https://doi.org/10.1038/s41598-025-99060-2 |
Similar Items
- Explaining the undecidability of first-order logic
  by: Timm Lampert, et al.
  Published: (2024-12-01)
- EVALUASI RUTE DAN HALTE BUS DI KOTA BANDUNG
  by: Astri Mutia Ekasari
  Published: (2021-10-01)
- The effectiveness of the Dutch juvenile diversion program Halt: study protocol for a randomized controlled trial
  by: Benthe J. van Delft, et al.
  Published: (2025-07-01)
- Particles ‘halt’ and ‘eben’ in German: Functions, Combinatorial Compatibility, and Syntactic Properties
  by: A. V. Averina, et al.
  Published: (2025-03-01)
- Modelling Qualia with Physical Computers
  by: Zoltán Sóstai
  Published: (2024-12-01)