Conference Paper

Architecture Performance Prediction Using Evolutionary Artificial Neural Networks.

DOI: 10.1007/978-3-540-78761-7_18 Conference: Applications of Evolutionary Computing, EvoWorkshops 2008: EvoCOMNET, EvoFIN, EvoHOT, EvoIASP, EvoMUSART, EvoNUM, EvoSTOC, and EvoTransLog, Naples, Italy, March 26-28, 2008. Proceedings
Source: DBLP

ABSTRACT The design of computer architectures requires setting multiple parameters on which the final performance depends. The
number of possible combinations makes the search space extremely large. One way of setting such parameters is to simulate all the
architecture configurations using benchmarks. However, simulation is slow, since evaluating a single point of the
search space can take hours. In this work we propose using artificial neural networks to predict the performance of configurations
instead of simulating all of them. A prior model proposed by Ypek et al. [1] uses a multilayer perceptron (MLP) and statistical
analysis of the search space to minimize the number of training samples needed. In this paper we use an evolutionary MLP and
random sampling of the space, which reduces the need to compute the performance of parameter settings in advance. Results
show a high accuracy of the estimations and a simpler method for selecting the configurations that must be simulated
to optimize the MLP.
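The core idea, training an MLP on a randomly sampled subset of simulated configurations and optimizing it evolutionarily, can be sketched in a few lines. This is a toy stand-in, not the authors' method: the simulator function, the two-parameter design space, the network topology, and the simple (1+4) evolution strategy are all illustrative assumptions.

```python
import math
import random

random.seed(42)

# Hypothetical stand-in for a cycle-accurate simulator: maps a normalized
# architecture configuration (e.g. cache size, issue width) to a performance
# score. In the paper this value would come from hours-long benchmark runs.
def simulate(cfg):
    cache, width = cfg
    return math.sin(3 * cache) + 0.5 * width * width

def mlp_predict(weights, cfg, hidden=4):
    """Tiny fixed-topology MLP: 2 inputs -> `hidden` tanh units -> 1 output."""
    w = iter(weights)
    acts = []
    for _ in range(hidden):
        s = next(w) + next(w) * cfg[0] + next(w) * cfg[1]
        acts.append(math.tanh(s))
    out = next(w)  # output bias
    for a in acts:
        out += next(w) * a
    return out

N_WEIGHTS = 4 * 3 + 1 + 4  # hidden (bias + 2 inputs each) + output bias + output weights

def mse(weights, samples):
    return sum((mlp_predict(weights, c) - y) ** 2 for c, y in samples) / len(samples)

# Random sampling of the design space: only these points are "simulated".
samples = []
for _ in range(40):
    cfg = (random.random(), random.random())
    samples.append((cfg, simulate(cfg)))

# Elitist (1+4) evolution strategy on the MLP weights; the paper's actual
# evolutionary operators differ, this only shows the overall shape.
parent = [random.gauss(0, 0.5) for _ in range(N_WEIGHTS)]
init_err = mse(parent, samples)
for gen in range(300):
    children = [[wi + random.gauss(0, 0.1) for wi in parent] for _ in range(4)]
    parent = min(children + [parent], key=lambda w: mse(w, samples))
final_err = mse(parent, samples)

print(f"training MSE: {init_err:.4f} -> {final_err:.4f}")
```

Because selection is elitist, the training error can only decrease; the interesting question, studied in the paper, is how well such a network generalizes to configurations that were never simulated.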


Available from: Jj Merelo, Jun 16, 2015
  • ABSTRACT: This paper presents OSCAR, an optimization methodology exploiting spatial correlation of multicore design spaces. This paper builds upon the observation that power consumption and performance metrics of spatially close design configurations (or points) are statistically correlated. We propose to exploit the correlation by using a response surface model (RSM), i.e., a closed-form expression suitable for predicting the quality of nonsimulated design points. This model is useful during the design space exploration (DSE) phase to quickly converge to the Pareto set of the multiobjective problem without executing lengthy simulations. To this end, we introduce a multiobjective optimization heuristic which iteratively updates and queries the RSM to identify the design points with the highest expected improvement. The RSM makes it possible to consolidate the Pareto set while reducing the number of simulations required, thus speeding up the exploration process. We compare the proposed heuristic with state-of-the-art approaches [conventional, RSM-based, and structured design of experiments (DoEs)]. Experimental results show that OSCAR is faster than state-of-the-art techniques such as response-surface Pareto iterative refinement (ReSPIR) and the nondominated-sorting genetic algorithm (NSGA-II). In fact, OSCAR uses fewer simulations to produce a similar solution, i.e., an average of 150 simulations instead of 320 simulations (NSGA-II) and 178 simulations (ReSPIR). When the number of design points is fixed to an average of 300, OSCAR achieves an average distance of less than 0.6% from the reference solution, while NSGA-II achieves 3.4%. Reported results also show that OSCAR can significantly improve structured DoE approaches by slightly increasing the number of experiments.
    IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 05/2012; 31(5):740-753. DOI:10.1109/TCAD.2011.2177457 · 1.20 Impact Factor
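The spatial-correlation idea behind a response surface model can be illustrated with a much simpler surrogate than OSCAR's: predicting a nonsimulated design point from its nearest simulated neighbors by inverse-distance weighting. The metric function, the 2-D design space, and the IDW surrogate itself are illustrative assumptions; OSCAR's actual RSM and expected-improvement heuristic are more sophisticated.

```python
import math
import random

random.seed(1)

# Hypothetical smooth metric over a 2-D normalized design space (e.g. power as
# a function of cache size and core count); spatially close points correlate.
def measure(x, y):
    return (x - 0.3) ** 2 + (y - 0.7) ** 2

def idw_predict(point, simulated, k=5, p=2.0):
    """Predict a nonsimulated point from its k nearest simulated neighbors
    using inverse-distance weighting (a toy stand-in for a closed-form RSM)."""
    nearest = sorted(
        (math.dist(point, q), v) for q, v in simulated.items()
    )[:k]
    if nearest[0][0] == 0.0:
        return nearest[0][1]  # exact hit: return the simulated value
    wsum = vsum = 0.0
    for d, v in nearest:
        w = 1.0 / d ** p
        wsum += w
        vsum += w * v
    return vsum / wsum

# "Simulate" a random subset of the design space, then query the RSM elsewhere.
simulated = {}
for _ in range(60):
    q = (random.random(), random.random())
    simulated[q] = measure(*q)

query = (0.35, 0.65)
pred, true = idw_predict(query, simulated), measure(*query)
print(f"predicted={pred:.4f} simulated={true:.4f}")
```

An exploration loop would query such a surrogate over many candidate points and only spend real simulation time on the most promising ones.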
  • ABSTRACT: Architecture exploration for embedded systems is becoming an indispensable tool for System-on-Chip designers. This process requires the evaluation of many architectures that are generated during the exploration process. The evaluation process has a significant impact on the quality of the results and can consume a substantial amount of CPU time. Accordingly, the evaluation process should provide enough accuracy to guide the optimization process to promising points in the design space in reasonable time. In this paper an efficient approach for performance evaluation of embedded systems is proposed. Several cycle-accurate simulations are performed for the commercial embedded processors used in our study. The simulation results are used to build Artificial Neural Network (ANN) models with accuracy up to 90% compared to cycle-accurate simulations, with very significant time savings.
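A claim such as "accuracy up to 90% compared to cycle-accurate simulations" is commonly computed as one minus the mean relative prediction error. The abstract does not define its exact metric, so both the formula and the numbers below are illustrative assumptions.

```python
# Hypothetical numbers: cycle-accurate simulation results (cycles) for a few
# candidate architectures, alongside an ANN's predictions for the same points.
simulated = [1200, 950, 1430, 1010, 870]
predicted = [1150, 990, 1390, 1060, 900]

# Accuracy as 100% minus the mean relative prediction error (an assumed
# metric, not necessarily the one used in the paper).
rel_errors = [abs(p - s) / s for p, s in zip(predicted, simulated)]
accuracy = 100.0 * (1.0 - sum(rel_errors) / len(rel_errors))

print(f"mean relative error: {sum(rel_errors) / len(rel_errors):.3%}")
print(f"accuracy: {accuracy:.1f}%")
```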
  • ABSTRACT: Open Computing Language (OpenCL) is emerging as a standard for parallel programming of heterogeneous hardware accelerators. Compared with device-specific languages, OpenCL enables application portability but does not guarantee performance portability, often requiring additional tuning of the implementation to a specific platform or to unpredictable dynamic workloads. In this paper, we present a methodology to analyze the customization space of an OpenCL application in order to improve performance portability and to support dynamic adaptation. We formulate our case study by implementing an OpenCL image stereo-matching application (which computes the relative depth of objects from a pair of stereo images) customized to the STMicroelectronics Platform 2012 many-core computing fabric. In particular, we use design space exploration techniques to generate a set of operating points that represent specific configurations of the parameters, allowing different trade-offs between performance and accuracy of the algorithm itself. These points give detailed knowledge about the interaction between the application parameters, the underlying architecture and the performance of the system; they could also be used by a run-time manager software layer to meet dynamic Quality-of-Service (QoS) constraints. To analyze the customization space, we use cycle-accurate simulations for the target architecture. Since the profiling phase of each configuration takes a long simulation time, we designed our methodology to reduce the overall number of simulations by exploiting some important features of the application parameters; our analysis also enables the identification of the parameters that could be explored on a high-level simulation model to reduce the simulation time. The resulting methodology is one order of magnitude more efficient than an exhaustive exploration and, given its randomized nature, it increases the probability of avoiding sub-optimal trade-offs.
    Proceedings of the eighth IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis; 10/2012
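The operating points produced by such an exploration form a Pareto set over conflicting objectives, here performance versus accuracy of the stereo-matching algorithm. A minimal sketch of extracting that set, with hypothetical numbers (the real configurations, e.g. window size and disparity range, are omitted):

```python
# Hypothetical operating points: each tuple is (frames_per_second,
# matching_accuracy), and both objectives are to be maximized.
points = [(30.0, 0.70), (24.0, 0.82), (18.0, 0.90), (27.0, 0.75),
          (15.0, 0.88), (12.0, 0.95), (22.0, 0.80)]

def pareto_front(pts):
    """Keep the points not dominated by any other (both metrics maximized)."""
    front = []
    for p in pts:
        dominated = any(
            q != p and q[0] >= p[0] and q[1] >= p[1] for q in pts
        )
        if not dominated:
            front.append(p)
    return sorted(front)

front = pareto_front(points)
print(front)  # only the non-dominated trade-offs survive
```

A run-time manager could then pick a point from this front whenever QoS constraints change, without re-running the exploration.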