Enhanced self-configurability and yield in multicore grids
ABSTRACT As we move deeper into the nanotechnology era, computer architecture is called upon to handle enormous numbers of devices per chip at high defect densities. These trends open new computing opportunities, but exploiting them efficiently will require a shift towards novel, highly parallel architectures. Fault-tolerance mechanisms will have to be integrated into the design to cope with the low yield of future nanofabrication processes. In this paper we consider multiprocessor grid (MPG) architectures, which ensure scalability beyond hundreds of cores per chip. We study self-diagnosis and self-configuration methods at the architectural level and propose an enhanced self-configuration methodology that makes the largest possible fraction of the available fault-free cores usable in MPGs with high defect densities. We show that, when the routers are fault-free, our approach makes all fault-free cores usable, whereas previous work was efficient for defect densities of up to 20-25% of defective cores. We also address the case of faulty routers, making almost all fault-free nodes (fault-free cores with a fault-free router) usable at very high defect densities in both the cores and the routers.
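To make the notion of a usable node concrete, the following is a minimal sketch (an illustration, not the authors' self-configuration protocol). It assumes a rectangular 2D mesh with N/S/E/W links, per-node self-test results stored in two hypothetical boolean matrices `core_ok` and `router_ok`, and configuration that spreads from a single hypothetical entry node at `(0, 0)`. A node counts as usable when its core passed self-test and its router can be reached from the entry through fault-free routers only. When all routers are fault-free, every router is reachable, so every fault-free core is usable, which matches the fault-free-router case described in the abstract.

```python
from collections import deque


def usable_nodes(core_ok, router_ok, entry=(0, 0)):
    """Return the set of (row, col) nodes whose core and router are both
    fault-free and whose router is reachable from `entry` through a path
    of fault-free routers (hypothetical model: 2D mesh, N/S/E/W links)."""
    rows, cols = len(router_ok), len(router_ok[0])
    r0, c0 = entry
    if not router_ok[r0][c0]:
        return set()  # the entry router is dead: nothing can be configured
    seen = {entry}
    queue = deque([entry])
    # Breadth-first flood of the configuration message over live routers.
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols
                    and (nr, nc) not in seen and router_ok[nr][nc]):
                seen.add((nr, nc))
                queue.append((nr, nc))
    # A reachable router only helps if its attached core also passed self-test.
    return {(r, c) for (r, c) in seen if core_ok[r][c]}


if __name__ == "__main__":
    # Hypothetical 3x3 grid with one defective core at the centre.
    core_ok = [[True] * 3 for _ in range(3)]
    core_ok[1][1] = False
    all_routers_ok = [[True] * 3 for _ in range(3)]
    print(len(usable_nodes(core_ok, all_routers_ok)))  # 8
    # Kill every router in the middle column: the right column is cut off.
    middle_column_dead = [[True, False, True] for _ in range(3)]
    print(len(usable_nodes(core_ok, middle_column_dead)))  # 3
```

The second case shows why faulty routers are harder: the three right-hand cores are fault-free and sit behind fault-free routers, yet they cannot be reached from the entry point.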
- Source available from: Mounir Benabdenbi
ABSTRACT: In this paper, we present a software approach for localizing faulty components in a 2D-mesh Network-on-Chip, targeting fault tolerance in a shared-memory MP2SoC architecture. We use a pre-existing, distributed hardware infrastructure that supports self-test and deactivation of the faulty components (routers and communication channels), which are thereby transformed into “black holes”. We detail the software method used to localize these “black holes” and to centralize the information at a single point, where a modified global routing function can be defined. This embedded software makes extensive use of a distributed fault-tolerant configuration firmware assisted by a Distributed Cooperative Configuration Infrastructure (DCCI), which is also presented. Finally, the coverage of “black hole” detection and localization is evaluated. VLSI Test Symposium (VTS), 2011 IEEE 29th; 06/2011
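The abstract does not specify the modified global routing function. As a hypothetical sketch of what a central point could compute once the black-hole locations are known, the code below builds a next-hop table for a 2D mesh by running one breadth-first search per destination over the live routers, so every packet follows a shortest path that avoids black holes. The names `next_hop_table`, `route`, and `black_holes` are illustrative assumptions, and only whole-router black holes are modelled (faulty individual channels are not).

```python
from collections import deque


def neighbours(node, rows, cols):
    """Yield the N/S/E/W neighbours of `node` inside a rows x cols mesh."""
    r, c = node
    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < rows and 0 <= nc < cols:
            yield (nr, nc)


def next_hop_table(rows, cols, black_holes):
    """Map (src, dst) -> next router on a shortest path avoiding black holes.

    One BFS per destination over live routers; links are assumed
    bidirectional, so a BFS rooted at dst gives each node's shortest way back.
    """
    live = {(r, c) for r in range(rows) for c in range(cols)} - set(black_holes)
    table = {}
    for dst in live:
        visited = {dst}
        queue = deque([dst])
        while queue:
            node = queue.popleft()
            for nb in neighbours(node, rows, cols):
                if nb in live and nb not in visited:
                    visited.add(nb)
                    table[(nb, dst)] = node  # node is one hop closer to dst than nb
                    queue.append(nb)
    return table


def route(table, src, dst):
    """Follow next hops from src to dst; return the path, or None if unreachable."""
    path = [src]
    while path[-1] != dst:
        hop = table.get((path[-1], dst))
        if hop is None:
            return None  # no live path from the current router to dst
        path.append(hop)
    return path


if __name__ == "__main__":
    # Hypothetical 3x3 mesh whose centre router has become a black hole.
    table = next_hop_table(3, 3, black_holes=[(1, 1)])
    print(route(table, (1, 0), (1, 2)))  # a 4-hop detour around the centre
```

Arbitrary shortest-path tables like this one can create cyclic channel dependencies, so a real NoC routing function would also need a deadlock-freedom guarantee (for example, turn restrictions); this sketch ignores that concern.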