Jian-Jia Chen's Lab
Institution: Karlsruhe Institute of Technology
Featured research (10)
To avoid race conditions and ensure data integrity, resource synchronization protocols have been widely studied in real-time systems for decades, providing systematic policies that bound the blocking time induced by priority inversion and avoid deadlocks. However, their realization in a real-time operating system often relies on assumed abstractions and necessary adaptations, so the theoretically proven properties of such a protocol may not be delivered, leading to potential mismatches. To prevent such mismatches, in this work we propose to contract the obligations of the involved primitives and operations and to apply deductive verification to a corresponding implementation. To this end, we present a modularized verification framework and demonstrate its applicability by verifying the official implementations of the Immediate Ceiling Priority Protocol (ICPP) and the Multiprocessor Resource Sharing Protocol (MrsP) in RTEMS, discovering long-standing mismatches in both synchronization protocols. To resolve them, we provide a possible remedy for the ICPP and an additional precondition regarding nested locking for the MrsP.
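To illustrate the kind of protocol obligation being verified, the following is a minimal sketch of the ICPP's defining rule: a task that acquires a resource immediately raises its active priority to the resource's ceiling, and restores it on release. The `Task`/`Resource` classes and field names are illustrative assumptions, not the RTEMS implementation verified in the paper.

```python
class Resource:
    def __init__(self, ceiling):
        self.ceiling = ceiling  # highest priority of any task that may use it
        self.holder = None

class Task:
    def __init__(self, base_priority):
        self.base_priority = base_priority
        self.active_priority = base_priority
        self.held = []

    def lock(self, res):
        # Under ICPP, a running task never blocks here: the ceiling rule
        # guarantees the resource is free when the task starts executing.
        assert res.holder is None
        res.holder = self
        self.held.append(res)
        # The defining ICPP step: raise the priority immediately on lock.
        self.active_priority = max(self.active_priority, res.ceiling)

    def unlock(self, res):
        res.holder = None
        self.held.remove(res)
        # Recompute from the base priority and any still-held ceilings.
        self.active_priority = max([self.base_priority] +
                                   [r.ceiling for r in self.held])

task = Task(base_priority=1)
res = Resource(ceiling=5)
task.lock(res)
assert task.active_priority == 5
task.unlock(res)
assert task.active_priority == 1
```

A mismatch of the kind the paper targets would be, for example, an implementation whose unlock path restores a priority inconsistent with this recomputation; deductive verification contracts each operation against such obligations.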
Specialized hardware accelerators beyond the von Neumann paradigm, which offer processing capability where the data resides without moving it, are becoming inevitable in data-centric computing. Emerging non-volatile memories, such as the Ferroelectric Field-Effect Transistor (FeFET), enable compact Logic-in-Memory (LiM). In this work, we investigate the probability of error (Perror) in FeFET-based XNOR LiM, demonstrating a new trade-off between speed and reliability. Using our reliability model, we show how Binarized Neural Networks (BNNs) can be proactively trained in the presence of XNOR-induced errors to obtain robust BNNs at design time. Furthermore, leveraging the trade-off between Perror and speed, we present a run-time adaptation technique that selectively trades off Perror and XNOR speed for every BNN layer. Our results demonstrate that when a small loss (e.g., 1%) in inference accuracy can be accepted, our design-time and run-time techniques provide error-resilient BNNs that exhibit 75% and 50% (FashionMNIST) and 38% and 24% (CIFAR10) XNOR speedups, respectively.
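The core BNN operation affected by Perror is the binarized dot product, computed as XNOR followed by a popcount. A minimal sketch of how XNOR-induced errors can be injected, e.g. during proactive training, is shown below; the function name and the simple per-bit flip model are illustrative assumptions, not the paper's reliability model.

```python
import random

def binarized_dot(w, x, p_error=0.0, rng=None):
    """Binarized dot product of +/-1 vectors via XNOR + popcount.

    Each XNOR result is flipped with probability p_error, modeling the
    FeFET speed/reliability trade-off (faster XNOR -> higher Perror).
    """
    rng = rng or random.Random(0)
    popcount = 0
    for wi, xi in zip(w, x):
        bit = 1 if wi == xi else 0      # XNOR of the sign bits
        if p_error > 0 and rng.random() < p_error:
            bit ^= 1                    # inject an XNOR error
        popcount += bit
    # Map the popcount back to the +/-1 dot product.
    return 2 * popcount - len(w)

# Error-free reference: (+1,-1,+1) . (+1,+1,+1) = 1
assert binarized_dot([1, -1, 1], [1, 1, 1]) == 1
```

Training with `p_error > 0` exposes the network to the same error distribution it will see at inference time, which is the intuition behind obtaining robust BNNs at design time.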
For timing-sensitive edge applications, the demand for efficient, lightweight machine learning solutions has increased recently. Tree ensembles are among the state of the art in many machine learning applications. While single decision trees are comparably small, an ensemble of trees can have a significant memory footprint, leading to cache-locality issues that are crucial to performance in terms of execution time. In this work, we analyze memory-locality issues of the two most common realizations of decision trees, i.e., native trees and if-else trees. We highlight that both realizations demand a more careful memory layout to improve caching behavior and maximize performance. We adopt a probabilistic model of decision tree inference to find the best memory layout for each tree at the application layer. Further, we present an efficient heuristic that takes architecture-dependent information into account, thereby optimizing the given ensemble for a target computer architecture. Our code-generation framework, which is freely available in an open-source repository, produces optimized code while preserving the structure and accuracy of the trees. With several real-world data sets, we evaluate the elapsed time of various tree realizations on server hardware as well as embedded systems with Intel and ARM processors. Our optimized memory layout achieves a reduction in execution time of up to 75% for server-class systems and up to 70% for embedded systems.
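The idea behind a probability-driven layout can be sketched as follows: given per-node probabilities estimated from training data, flatten the tree so that the hotter child of each split is laid out immediately after its parent, keeping likely root-to-leaf paths contiguous in memory. The `Node` fields and the greedy ordering below are illustrative assumptions, not the paper's exact heuristic.

```python
class Node:
    def __init__(self, left=None, right=None, prob=1.0):
        self.left, self.right = left, right
        self.prob = prob  # estimated probability of reaching this node

def hot_path_layout(root):
    """Flatten a tree depth-first, visiting the more probable child first,
    so frequent inference paths occupy contiguous (cache-friendly) memory."""
    order, stack = [], [root]
    while stack:
        node = stack.pop()
        order.append(node)
        if node.left is not None:  # assume inner nodes have both children
            # Push the colder child first so the hotter one is popped
            # (and therefore laid out) right after its parent.
            stack.extend(sorted([node.left, node.right],
                                key=lambda n: n.prob))
    return order

# Hot path: root -> left (0.7) -> left.left (0.5) is laid out contiguously.
tree = Node(prob=1.0,
            left=Node(prob=0.7, left=Node(prob=0.5), right=Node(prob=0.2)),
            right=Node(prob=0.3))
assert [n.prob for n in hot_path_layout(tree)] == [1.0, 0.7, 0.5, 0.2, 0.3]
```

An architecture-aware heuristic would additionally group nodes into cache-line- or page-sized blocks for the target processor; the sketch above only captures the probability-driven ordering.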