A Self-Checking Hardware Journal for a Fault-Tolerant Processor Architecture

We introduce a specialized self-checking hardware journal that serves as the centerpiece of our design strategy for building a processor tolerant to transient faults. Fault tolerance here relies on error detection techniques in the processor core, together with journalization and rollback executi...

Bibliographic Details
Main Authors: Mohsin Amin, Abbas Ramazani, Fabrice Monteiro, Camille Diou, Abbas Dandache
Format: Article
Language: English
Published: Wiley 2011-01-01
Series: International Journal of Reconfigurable Computing
Online Access: http://dx.doi.org/10.1155/2011/962062
_version_ 1832553965147914240
author Mohsin Amin
Abbas Ramazani
Fabrice Monteiro
Camille Diou
Abbas Dandache
collection DOAJ
description We introduce a specialized self-checking hardware journal that serves as the centerpiece of our design strategy for building a processor tolerant to transient faults. Fault tolerance here relies on error detection techniques in the processor core, together with journalization and rollback execution to recover from erroneous situations. Effective rollback recovery is made possible by using a hardware journal and by choosing a stack computing architecture for the processor core instead of the usual RISC or CISC. The main objective of journalization and of the self-checking hardware journal is to prevent data that has not yet been validated from being sent to the main memory, and to allow fast rollback of execution when a fault occurs. The main memory, assumed to be fault secure in our model, contains only valid (uncorrupted) data obtained from fault-free computations. Error control coding techniques are used both in the processor core, to detect errors, and in the hardware journal, to protect the temporarily stored data from changes induced by transient faults. Implementation results on an FPGA of the Altera Stratix-II family clearly show the relevance of the approach, in terms of both performance/area tradeoff and fault tolerance effectiveness, even at high error rates.
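The journalize, validate, and rollback cycle described in the abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the authors' hardware design: the names `Journal`, `parity`, `write`, `validate`, and `rollback` are hypothetical, a single even-parity bit stands in for the unspecified error control code, and the point at which validation happens is left to the caller.

```python
from dataclasses import dataclass, field


def parity(value: int) -> int:
    """Even-parity bit of value; a stand-in for a real error-detecting code."""
    return bin(value).count("1") % 2


@dataclass
class Journal:
    # Main memory is assumed fault secure, as in the abstract's model.
    memory: dict = field(default_factory=dict)
    # Unvalidated writes: address -> (value, parity bit stored at write time).
    pending: dict = field(default_factory=dict)

    def write(self, addr: int, value: int) -> None:
        # Unvalidated data stays in the journal and never reaches main memory.
        self.pending[addr] = (value, parity(value))

    def read(self, addr: int) -> int:
        # The newest pending value shadows the copy in main memory.
        if addr in self.pending:
            return self.pending[addr][0]
        return self.memory[addr]

    def rollback(self) -> None:
        # Discard all unvalidated data; main memory is left untouched.
        self.pending.clear()

    def validate(self) -> bool:
        # Commit only if every journal entry still matches its parity bit;
        # otherwise roll back so corrupted data is never committed.
        if any(parity(v) != p for v, p in self.pending.values()):
            self.rollback()
            return False
        self.memory.update({a: v for a, (v, _) in self.pending.items()})
        self.pending.clear()
        return True


journal = Journal(memory={0: 7})
journal.write(0, 42)
committed = journal.validate()  # no fault injected, so the write is committed
```

In this model a fault that flips one bit of a pending entry makes `validate` return `False` and leaves main memory at its last validated state, which is the property the abstract attributes to the journal. A single parity bit detects only odd numbers of bit flips per entry; a real design would use a stronger code.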
format Article
id doaj-art-b1fa2a0886b342c3882c4115b6b9da4c
institution Kabale University
issn 1687-7195
1687-7209
language English
publishDate 2011-01-01
publisher Wiley
record_format Article
series International Journal of Reconfigurable Computing
spelling doaj-art-b1fa2a0886b342c3882c4115b6b9da4c
2025-02-03T05:52:42Z
eng
Wiley
International Journal of Reconfigurable Computing
1687-7195
1687-7209
2011-01-01
2011
10.1155/2011/962062
962062
A Self-Checking Hardware Journal for a Fault-Tolerant Processor Architecture
Mohsin Amin (LICM Laboratory, University Paul Verlaine, Metz, 7 rue Marconi, 57070 Metz, France)
Abbas Ramazani (Electrical Engineering Department, Engineering Faculty Lorestan, University Khorramabad, Iran)
Fabrice Monteiro (LICM Laboratory, University Paul Verlaine, Metz, 7 rue Marconi, 57070 Metz, France)
Camille Diou (LICM Laboratory, University Paul Verlaine, Metz, 7 rue Marconi, 57070 Metz, France)
Abbas Dandache (LICM Laboratory, University Paul Verlaine, Metz, 7 rue Marconi, 57070 Metz, France)
title A Self-Checking Hardware Journal for a Fault-Tolerant Processor Architecture
url http://dx.doi.org/10.1155/2011/962062