J.M. Pierre Langlois · Polytechnique Montréal · Department of Computer and Software Engineering
PhD
123 publications · 21,772 reads · 1,387 citations
Additional affiliations
July 2005 - July 2015
Publications (123)
Programmable network data planes have extended the capabilities of packet processing in network devices by allowing custom processing pipelines and agnostic packet processing. While a variety of applications can be implemented on current programmable data planes, there are significant constraints due to hardware limitations. One way to meet these c...
Convolutional Neural Networks (CNNs) have proven to be extremely accurate for image recognition, even outperforming human recognition capability. When deployed on battery-powered mobile devices, efficient computer architectures are required to enable fast and energy-efficient computation of costly convolution operations. Despite recent advances in...
The P4 language has drastically changed the networking field as it allows to quickly describe and implement new networking applications. Although a large variety of applications can be described with the P4 language, current programmable switch architectures impose significant constraints on P4 programs. To address this shortcoming, FPGAs have been...
Oscillations in the granule cell layer (GCL) of the cerebellar cortex have been related to behavior and could facilitate communication with the cerebral cortex. These local field potential (LFP) oscillations, strong at 4-12 Hz in the rodent cerebellar cortex during awake immobility, should also be an indicator of an underlying influence on the patt...
The emergence of P4, a domain specific language, coupled to PISA, a domain specific architecture, is revolutionizing the networking field. P4 allows to describe how packets are processed by a programmable data plane, spanning ASICs and CPUs, implementing PISA. Because the processing flexibility can be limited on ASICs, while the CPUs performance f...
The emergence of P4, a domain specific language, coupled to PISA, a domain specific architecture, is revolutionizing the networking field. P4 allows to describe how packets are processed by a programmable data plane, spanning ASICs and CPUs, implementing PISA. Because the processing flexibility can be limited on ASICs, while the CPUs performance fo...
The success of neural networks in image classification has inspired various hardware implementations on embedded platforms such as Field Programmable Gate Arrays, embedded processors and Graphical Processing Units. These embedded platforms are constrained in terms of power, which is mainly consumed by the Multiply Accumulate operations and the memo...
Convolutional Neural Networks (CNNs) have shown outstanding accuracy for many vision tasks during recent years. When deploying CNNs on portable devices and embedded systems, however, the large number of parameters and computations result in long processing time and low battery life. An important factor in designing CNN hardware accelerators is to e...
Convolutional Neural Networks (CNNs) and Deep Neural Networks (DNNs) have gained significant popularity in several classification and regression applications. The massive computation and memory requirements of DNN and CNN architectures pose particular challenges for their FPGA implementation. Moreover, programming FPGAs requires hardware-specific k...
High-Level Synthesis (HLS) brings FPGAs to audiences previously unfamiliar to hardware design. However, achieving the highest Quality-of-Results (QoR) with HLS is still unattainable for most programmers. This requires detailed knowledge of FPGA architecture and hardware design in order to produce FPGA-friendly codes. Moreover, these codes are norma...
Blood vessel segmentation from high-resolution fundus images is a necessary step in several retinal pathologies detection. Automatic blood vessel segmentation is a computing-intensive task, which raises the need for acceleration with hardware architectures. In this paper, we propose two architectures for blood vessel segmentation using a matched fi...
Cloud Radio Access Network is foreseen as one of the key features of the future 5G mobile communication standard. In this context, all the baseband processing is intended to be performed on CPUs in order to keep a high level of flexibility. The challenge is then to propose efficient software implementation of baseband processing algorithms that gua...
Packet parsing is a key step in SDN-aware devices. Packet parsers in SDN networks need to be both reconfigurable and fast, to support the evolving network protocols and the increasing multi-gigabit data rates. The combination of packet processing languages with FPGAs seems to be the perfect match for these requirements. In this work, we develop an...
Bloom filters (BFs) are widely utilised to speed up string matching in crucial network applications such as real-time intrusion detection and spam filters. This study introduces a new approach to improve the efficiency of BFs for string matching functions. The approach splits each target string into two substrings and considers the second substring...
In this work, we present a simple yet effective image deblurring method to produce ringing-free deblurred images. Our work is inspired by the observation that large-scale deblurring ringing artifacts are measurable through a multiresolution pyramid of low-pass filtering of the blurred-deblurred image pair. We propose to model such a quantification...
Due to the emergence of new network applications, current IP lookup engines must support high-bandwidth, low lookup latency and the ongoing growth of IPv6 networks. However, existing solutions are not designed to address jointly those three requirements. This paper introduces SHIP, an IPv6 lookup algorithm that exploits prefix characteristics to bu...
Deep Packet Inspection systems such as Snort and Bro express complex rules with regular expressions. In Snort, the search of a regular expression is performed with a Non-deterministic Finite Automaton (NFA). Traversing an NFA sequentially with a CPU is not deterministic in time, and it can be very time consuming. The sequential traversal of an NFA...
In this paper, a novel BCD multiplier approach is proposed. The main highlight of the proposed architecture is the generation of the partial products and parallel binary operations based on 2-digit columns. 1 × 1-digit multipliers used for the partial product generation are implemented directly by 4-bit binary multipliers without any code conversio...
Implementing an accurate and fast activation function with low cost is a crucial aspect to the implementation of Deep Neural Networks (DNNs) on FPGAs. We propose a high accuracy approximation approach for the hyperbolic tangent activation function of artificial neurons in DNNs. It is based on the Discrete Cosine Transform Interpolation Filter (DCTI...
Snort and Bro are Deep Packet Inspection systems which express complex rules with regular expressions. Before performing a regular expression search, these applications apply a filter to select which regular expressions must be searched. One way to search a regular expression is through a Nondeterministic Finite Automaton (NFA). Traversing an NFA i...
This paper presents a memory efficient architecture that implements the Multi-Scale Line Detector (MSLD) algorithm for real-time retinal blood vessel detection in fundus images on a Zynq FPGA. This implementation benefits from the FPGA parallelism to drastically reduce the memory requirements of the MSLD from two images to a few values. The archite...
P4 is an emergent packet-processing language where the user can describe how the packets are to be processed in a switching element. This paper presents a way to implement advanced operations that are not directly supported in P4. In this work, two different ways to add extensions to P4 are presented: i) using new native primitives and ii) using ex...
Implementing an accurate and fast activation function with low cost is a crucial aspect to the implementation of Deep Neural Networks (DNNs) on FPGAs. We propose a high-accuracy approximation approach for the hyperbolic tangent activation function of artificial neurons in DNNs. It is based on the Discrete Cosine Transform Interpolation Filter (DCTI...
Spatial Averaging Filters (SAF) are extensively used in image processing for image smoothing and denoising. Their latest implementations have already achieved constant time computational complexity regardless of kernel size. However, all the existing O(1) algorithms require additional memory for temporary data storage. In order to minimize memory u...
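As a point of reference for the abstract above, one widely known constant-time averaging scheme uses a summed-area table. The sketch below is a generic textbook illustration, not the method proposed in the paper; the function name `box_filter_mean` and the border handling (windows clipped at the image edges) are assumptions made for this example. It shows where the extra temporary storage mentioned in the abstract comes from.

```python
def box_filter_mean(image, radius):
    """Mean filter whose per-pixel cost does not depend on the kernel radius."""
    h, w = len(image), len(image[0])
    # sat[y][x] holds the sum of all pixels in rows < y and columns < x.
    # This full-size table is the additional memory such schemes require.
    sat = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += image[y][x]
            sat[y + 1][x + 1] = sat[y][x + 1] + row_sum
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        # Clip the window to the image borders (an assumption of this sketch).
        y0, y1 = max(0, y - radius), min(h, y + radius + 1)
        for x in range(w):
            x0, x1 = max(0, x - radius), min(w, x + radius + 1)
            # Four table lookups give the window sum, whatever the radius.
            total = sat[y1][x1] - sat[y0][x1] - sat[y1][x0] + sat[y0][x0]
            out[y][x] = total / ((y1 - y0) * (x1 - x0))
    return out
```

Each output pixel needs four lookups regardless of `radius`, at the price of a table as large as the image.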
The development of an automatic telemedicine system for computer-aided screening and grading of diabetic retinopathy depends on reliable detection of retinal lesions in fundus images. In this paper, a novel method for automatic detection of both microaneurysms and hemorrhages in color fundus images is described and validated. The main contribution...
The increasing complexity of cyber-attacks necessitates the design of more efficient hardware architectures for real-time Intrusion Detection Systems (IDSs). String matching is the main performance-demanding component of an IDS. An effective technique to design high-performance string matching engines is to partition the target set of strings into...
In this paper, we analyze the performance and cost trade-off from selecting two representations of nodes when implementing the Aho-Corasick algorithm. This algorithm can be used for pattern matching in network-based intrusion detection systems such as Snort. Our analysis uses the Snort 2.9.7 rules set, which contains almost 26k patterns. Our method...
Biomedical Signal and Image Processing presents a variety of challenges in terms of the dimensionality of the data being processed, nature of the processing involved, the expected data throughput and the peculiar requirements for embedded applications. The processing involved often entails the detection of patterns of interest such as lesions in ti...
Application-specific customisation of micro-processor architectures has been widely accepted as an effective way to improve the efficiency of processor-based designs. In this work, the authors propose a new processor customisation method based on fixed-point word-length optimisation. Accuracy-aware word-length optimisation (WLO) of fixed-point circ...
This paper presents a reliable non-blind method to measure intrinsic lens blur. We first introduce an accurate camera-scene alignment framework that avoids erroneous homography estimation and camera tone curve estimation. This alignment is used to generate a sharp correspondence of a target pattern captured by the camera. Second, we introduce a Poi...
Rodent monitoring in biomedical laboratories is a time consuming and tedious task. Several automatic solutions that rely on different types of sensors have been proposed. Computer vision provides a significantly more universal and less intrusive solution. In this article we propose a new method to detect and classify three behaviors in rodents: exp...
In this paper, we present a method to measure the body temperature of an animal using a thermographic camera in hyperthermia experiments, where the heat contrast between the animal and its background is low. This work was done in the context of the study of artificially induced atypical febrile seizures. In order to measure the temperature of a mov...
In this paper, we propose a parallel systematic resampling (PSR) algorithm for particle filters, which is a new form of systematic resampling (SR). The PSR algorithm makes iterations independent, thus allowing the resampling algorithm to perform loop iterations in parallel. A fixed-point version of the PSR algorithm is also proposed, with a modific...
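For context, the conventional sequential systematic resampling (SR) step that the abstract above takes as its starting point can be sketched as follows. This is the textbook SR procedure, not the PSR algorithm from the paper; the function name and the optional `u0` argument (used here to make the result reproducible) are assumptions for illustration.

```python
import random

def systematic_resample(weights, u0=None):
    """Textbook systematic resampling: return one source index per new particle."""
    n = len(weights)
    total = sum(weights)
    if u0 is None:
        u0 = random.random() / n  # single random offset in [0, 1/n)
    indices = []
    i = 0
    cumulative = weights[0] / total
    for j in range(n):
        u = u0 + j / n  # evenly spaced thresholds
        # i and cumulative carry over from one iteration to the next,
        # so iteration j cannot start before iteration j - 1 has finished.
        while u > cumulative and i < n - 1:
            i += 1
            cumulative += weights[i] / total
        indices.append(i)
    return indices
```

The state carried in `i` and `cumulative` is the kind of loop dependency that keeps this form of SR sequential.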
The Sparse Matrix-Vector multiplication (SpMV) is an algorithm used in many fields. Since the introduction of CUDA and general purpose programming on GPUs, several efforts to optimize it have been reported. SpMV optimization is complex due to irregular memory accesses depending on the nonzero element distribution of the matrix. In this paper, we pr...
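The irregular memory accesses mentioned above can be seen in a plain CPU reference version of SpMV. The sketch assumes the common compressed sparse row (CSR) layout, which the abstract does not name, and is written in Python rather than CUDA; it is not the optimization proposed in the paper.

```python
def spmv_csr(row_ptr, col_idx, values, x):
    """Reference y = A @ x for a matrix A stored in CSR form (assumed layout)."""
    y = [0.0] * (len(row_ptr) - 1)
    for row in range(len(y)):
        acc = 0.0
        # Nonzeros of this row occupy values[row_ptr[row]:row_ptr[row + 1]].
        for k in range(row_ptr[row], row_ptr[row + 1]):
            # The read of x is indirect: its address depends on col_idx[k],
            # that is, on where the nonzeros happen to sit in the matrix.
            acc += values[k] * x[col_idx[k]]
        y[row] = acc
    return y
```

For example, the matrix with rows `[1, 0, 2]`, `[0, 0, 3]`, `[4, 5, 0]` is stored as `row_ptr = [0, 2, 3, 5]`, `col_idx = [0, 2, 2, 0, 1]`, `values = [1, 2, 3, 4, 5]`.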
This paper proposes a computationally efficient importance sampling algorithm applicable to computer vision tracking. The algorithm is based on the CONDENSATION algorithm, but it avoids expensive operations that are costly in real-time embedded systems. It also includes a method that reduces the number of particles during execution and a new resamp...
We present a new method to detect and remove ringing artifacts produced by the deconvolution process in image deblurring techniques. The method takes into account non-invertible frequency components of the blur kernel used in the deconvolution. Efficient Gabor wavelets are produced for each non-invertible frequency and applied on the deblurred imag...
This paper presents a novel approach for automatic detection of microaneurysms and haemorrhages in fundus images. First, it begins with a preprocessing stage for shade correction, contrast enhancement and denoising. Second, all regional minima with sufficient contrast are extracted and considered as candidates. Third, in an image flooding scheme, a...
Retinal image quality assessment is an important step in automated eye disease diagnosis. Diagnosis accuracy is highly dependent on the quality of retinal images, because poor image quality might prevent the observation of significant eye features and disease manifestations. A robust algorithm is therefore required in order to evaluate the quality...
Bit-width allocation has a crucial impact on hardware efficiency and accuracy of fixed-point arithmetic circuits. This paper introduces a new accuracy-guaranteed word-length optimization approach for feed-forward fixed-point designs. This method uses affine arithmetic, which is a well-known analytical technique, for both range and precision analyse...
This paper introduces a new approach for finite-precision error modeling based on affine arithmetic. The paper demonstrates that there is a common hazard in affine arithmetic-based error modeling methods described in the literature. The hazard is linked to early substitution of the signal terms that emerge in operations such as multiplication and d...
Particle filters (PFs) are computationally intensive, which prevents them from being widely used in some real-time applications with high throughput requirements. A parallel implementation is a feasible approach to enable using PFs in these applications. However, effective resampling algorithms such as the Systematic Resampling (SR) algorithm are s...
In this paper, we propose a uniform quantization likelihood evaluation (UQLE) algorithm for particle filters (PFs). This algorithm simplifies the exact likelihood evaluation (ELE) algorithm, the most computationally demanding function in PFs, by using a uniform quantization scheme to generate approximated weights. Simulation results indicate that P...
Processor customisation is an effective technique to enhance performance across an application domain. In this study, the authors present a new customised soft processor development environment called polytechnique customised soft processor (PolyCuSP), which bridges the gap between architecture description languages (ADLs) and extensible soft proce...
Proposed is a parallel array histogram architecture (PAHA) suitable for embedded implementations. The PAHA uses a register array instead of a memory block to store the histogram bins. In each step, M inputs can be processed in parallel to update the histogram bins without any additional latency. Also described is a second version of the PAHA with a...
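The idea of updating every bin from M inputs in one step can be modelled in software. The following is only a behavioural sketch under stated assumptions, not the authors' hardware design: the function names are hypothetical, and each call to `histogram_parallel_step` stands for one step in which all bins are updated together.

```python
def histogram_parallel_step(bins, inputs):
    """Update every bin from one batch of inputs, modelling a single step."""
    # Each bin counts how many inputs in the batch equal its own index.
    # In a register-array design these per-bin counts are independent,
    # so they could all be computed at the same time.
    return [count + sum(1 for v in inputs if v == b)
            for b, count in enumerate(bins)]

def histogram(data, num_bins, m):
    """Build a histogram by consuming m inputs per step."""
    bins = [0] * num_bins
    for start in range(0, len(data), m):
        bins = histogram_parallel_step(bins, data[start:start + m])
    return bins
```

Because no bin update depends on another bin, repeated values within one batch do not cause a read-modify-write conflict in this model.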
Image pyramids are multi-scale representations of images, and their calculation is computationally intensive. They can be a main bottleneck in image processing and computer vision tasks such as edge detection and feature extraction. Thus, high speed computation of image pyramids is necessary. Moreover, when these algorithms are intended for embedde...
In this study, asymmetric non-pipelined large size unsigned and signed multipliers are implemented using symmetric and asymmetric embedded multipliers, look-up tables and dedicated adders in field programmable gate arrays (FPGAs). Decompositions of the operands are performed for the efficient use of the embedded blocks. Partial products are organis...
This study proposes an enhanced version of the five-field motion compensated deinterlacing algorithm. The proposed method applies bi-directional motion estimation using two previous and two subsequent fields. It uses an array of flags to determine if the missing pixels of two previous frames are calculated from original pixels or pre-filtered data....
This paper presents a systematic approach to the design of application-specific instruction-set processors for high speed computation of local neighborhood functions and intra-field deinterlacing. The intended application is real-time processing of high definition video. The approach aims at an efficient utilization of the available memory bandwidt...
Computer vision is a non-invasive method for monitoring laboratory animals. In this article, we propose a robust tracking method that is capable of extracting a rodent from a frame under uncontrolled normal laboratory conditions. The method consists of two steps. First, a sliding window combines three features to coarsely track the animal. Then, it...
The continual measurement of the body temperature of a moving subject in a non-invasive way is a challenging task. However, doing so enables the observation of important phenomena with not much inconvenience to the subject, and can be a powerful tool for understanding physiological reactions to diseases and medications. In this paper, we present a...
This paper introduces a framework to develop and characterize digital circuits using Carbon Nanotube Field Effect Transistors (CNFET). We define a 4-step process that involves design capture, pre-processing, circuit simulation and results extraction and interpretation. The initial work leading to this framework involves the selection of appropriate...
Tone-mapping (TM) aims to adapt high dynamic range images to conventional display devices. TM algorithms are usually implemented on general purpose processors and graphics processing units. Such platforms may not meet performance, area, power and flexibility constraints imposed by the embedded system domain. This paper presents the design and imple...
Particle filters have been widely used for video tracking due to their robustness. However, most particle filter algorithm implementations are computationally expensive which makes them ill-suited for real-time embedded systems. There have been some attempts to provide hardware implementations for the particle filter, but none of them tried to simp...
This study proposes a new hybrid video deinterlacing algorithm method featuring a novel approach to qualify the reliability of motion vectors. The algorithm switches between motion-compensated and enhanced edge-based line averaging (ELA) methods based on motion vector reliability. When the motion vectors are calculated, reverse motion estimation (R...
Determining the motion pattern of laboratory animals is very important in order to monitor their reaction to various stimuli. In this paper, we propose a robust method to track animals, and consequently determine their motion pattern. The method is designed to work under uncontrolled normal laboratory conditions. It consists of two steps. The first...
This paper presents a method to track an animal in low-contrast thermographic images in order to obtain its body temperature. This work was done in the context of the study of atypical febrile seizures. To solve this tracking problem, we propose a method based on morphological operations on the area to track using regions resulting from consecutive...
This paper presents a fine-grained configurable processor model used to generate image processing Application Specific Instruction Set Processors (ASIPs). A methodology to develop a minimal instruction set ASIP with the processor model is also proposed. The methodology is based on using specialized instructions in conjunction with Instruction Set A...
Image contrast enhancement methods play a key role in many image processing and vision applications. For surveillance applications, real-time contrast improvement over the whole image is required when videos are taken in poor lighting conditions. It is also necessary to highlight details in shadowed regions without introducing artifacts. In this pa...
Unmanned aerial vehicles (UAVs) are subject to unforeseen events in harsh environments. Embedded autonomous real-time path re-planning is a possible solution to this issue. Evolutionary algorithms have been shown to be an excellent means to optimise the generation of UAV paths, but their slow iterative process prevents them from being used for real-time computat...
This paper presents two optimized design approaches of two’s complement large size squarers using embedded multipliers in FPGAs. The realization of one of the approaches is based on Baugh–Wooley’s algorithm and the other one is a new sign-extension technique. To achieve efficient implementation, a set of optimized schemes for the realization of mul...
This paper presents an embedded implementation approach of land vehicle navigation involving a Multi Sensor System (MSS) consisting of a single-axis gyroscope and an odometer integrated with GPS receiver. With the assumption that the vehicle stays mostly in the horizontal plane, the vehicle speed obtained from the odometer measurements is decompose...
We present techniques used to create a high performance application-specific instruction-set processor (ASIP) implementation of the Pattern-Based Directional Interpolation (PBDI) intra-field deinterlacing algorithm. The proposed techniques focus primarily on an efficient utilization of the available memory bandwidth. They include the use of Very Lo...
This paper presents a design methodology for the implementation of GPS/INS navigation system on Field Programmable Gate Arrays (FPGA). The method proposed in this research is examined using data from three-axis accelerometers and gyroscopes integrated with GPS for a road test experiment in a land vehicle. The designs are described in software which...
In this study, we demonstrate that gamma oscillations (30-50 Hz) recorded in the local field potentials (LFP) of the hippocampus are a marker of temporal lobe seizure propagation and that the level of LFP synchrony in the amygdalo-hippocampal network, during these oscillations, is related to the severity of seizures. Sprague-Dawley rats were given...
In epilepsy research, using a wide range of sensors can help to automatically detect the occurrence of seizures and to understand their underlying mechanisms. One such sensor is a thermographic camera that can measure the surface temperature of the body. This sensor may have an important role in investigating seizures as studies have shown that the...