CogniServe: Heterogeneous Server Architecture for Large-Scale Recognition

IEEE Micro (Impact Factor: 1.52). 07/2011; DOI: 10.1109/MM.2011.37
Source: DBLP


As smart mobile devices become pervasive, vendors are offering rich features supported by cloud-based servers to enhance the user experience. Such servers implement large-scale computing environments in which target data is compared against a massive preloaded database. CogniServe is a highly efficient server for large-scale recognition that employs a heterogeneous architecture, providing low-power, high-throughput cores along with application-specific accelerators.
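The abstract's heterogeneous split can be pictured as a dispatcher that routes each recognition request to an application-specific accelerator when one exists for its kernel and to the low-power core pool otherwise. The sketch below is purely illustrative: the kernel names, request format, and routing policy are assumptions, not details from the paper.

```python
# Hypothetical sketch of heterogeneous dispatch: requests whose kernel has a
# dedicated accelerator go to the accelerator queue; everything else goes to
# the pool of low-power, high-throughput cores.
from collections import deque

ACCELERATORS = {"gmm_scoring", "feature_matching"}  # assumed accelerated kernels

def dispatch(requests):
    """Split a recognition request stream between accelerators and cores."""
    accel_queue, core_queue = deque(), deque()
    for req in requests:
        (accel_queue if req["kernel"] in ACCELERATORS else core_queue).append(req)
    return accel_queue, core_queue

accel, cores = dispatch([
    {"id": 1, "kernel": "gmm_scoring"},       # speech-recognition scoring
    {"id": 2, "kernel": "image_decode"},      # no accelerator: runs on a core
    {"id": 3, "kernel": "feature_matching"},  # image-recognition matching
])
print(len(accel), len(cores))  # 2 1
```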



Available from: Seung Eun Lee, Jul 22, 2014
  • ABSTRACT: An important class of features centered on recognition (i.e., the ability to recognize images, speech, gestures, etc.) is rapidly becoming available on embedded systems. Supporting recognition requires high-performance computing with low latency, low power, and high throughput. In this paper, we investigate image and speech recognition algorithms and find opportunities to share computation resources, reducing cost without losing system performance.
    Advances in Computer Science, Environment, Ecoinformatics, and Education - International Conference, CSEE 2011, Wuhan, China, August 21-22, 2011. Proceedings, Part II; 01/2011
  • ABSTRACT: High-performance SoCs and CMPs integrate multiple cores and hardware accelerators such as network interface devices and speech recognition engines. Cores use SRAM organized as a cache, while accelerators use SRAM as special-purpose storage such as FIFOs, scratchpad memory, or other private buffers. Dedicated private buffers provide benefits such as deterministic access, but are highly area-inefficient because the total available storage has low average utilization. We propose Buffer-integrated-Caching (BiC), which integrates private buffers and traditional caches into a single shared SRAM block. Much as shared caches improve SRAM utilization on CMPs, the BiC architecture generalizes this advantage to a heterogeneous mix of cores and accelerators in future SoCs and CMPs. We demonstrate the cost-effectiveness of BiC using SoC-based low-power servers and CMP-based servers with an on-chip NIC, and show that with a small area added to the baseline cache, BiC removes the need for large, dedicated SRAMs with minimal performance impact.
    Proceedings of the 25th International Conference on Supercomputing, 2011, Tucson, AZ, USA, May 31 - June 04, 2011; 01/2011
  • ABSTRACT: Video analytics introduces new levels of intelligence to automated scene understanding. Neuromorphic algorithms, such as HMAX, are proposed as robust and accurate algorithms that mimic the processing in the visual cortex of the brain. HMAX, for instance, is a versatile algorithm that can be repurposed to target several visual recognition applications. This paper presents the design and evaluation of hardware accelerators for extracting visual features for universal recognition. The recognition applications include object recognition, face identification, facial expression recognition, and action recognition. The accelerators were validated on a multi-FPGA platform and demonstrated significant gains over CMP and GPU platforms: as much as 7.6X speedup and 12.8X more power-efficient performance.
    01/2012; DOI:10.1145/2228360.2228465
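The BiC abstract above describes one shared SRAM block whose capacity can back either cache lines or an accelerator's private buffer. A minimal sketch of that idea, assuming a way-partitioned organization (the class name, way counts, and FIFO policy are illustrative assumptions, not the paper's design):

```python
# Illustrative sketch of the Buffer-integrated-Caching (BiC) idea: a single
# shared SRAM whose ways serve either as cache capacity or as an accelerator
# FIFO buffer, instead of a separate dedicated SRAM per accelerator.

class SharedSRAM:
    def __init__(self, ways=8, buffer_ways=2):
        self.cache_ways = ways - buffer_ways  # ways left for the cache
        self.buffer = []                      # ways repurposed as FIFO storage
        self.buffer_capacity = buffer_ways

    def buffer_push(self, item):
        """Accelerator side: enqueue into the carved-out buffer ways."""
        if len(self.buffer) >= self.buffer_capacity:
            raise RuntimeError("accelerator buffer full")
        self.buffer.append(item)

    def buffer_pop(self):
        """Accelerator side: dequeue in FIFO order."""
        return self.buffer.pop(0)

sram = SharedSRAM(ways=8, buffer_ways=2)
sram.buffer_push("pkt0")
sram.buffer_push("pkt1")
print(sram.cache_ways, sram.buffer_pop())  # 6 pkt0
```

The design point this illustrates is the area argument: when the accelerator's buffer occupancy is low, only a small slice of the shared array is carved out, rather than a full dedicated SRAM sitting mostly idle.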