Miguel Morales-Sandoval

PhD Computer Science
Postdoctoral fellow
Center for Research and Advanc... · Laboratorio de Tecnologías de Información (Tamaulipas)

Publications

  • M. Morales-Sandoval, C. Feregrino-Uribe, P. Kitsos, R. Cumplido
    ABSTRACT: Montgomery multiplication is a common and important algorithm for improving the efficiency of public key cryptographic algorithms such as RSA and Elliptic Curve Cryptography (ECC). A natural choice for implementing this time-consuming multiplication over finite fields, mainly GF(2^m), is the Field Programmable Gate Array (FPGA), because FPGAs are reconfigurable, flexible and physically secure devices. FPGAs allow this kind of algorithm to be implemented in a broad range of applications with different area–performance requirements. In this paper, we explore alternative architectures for constructing GF(2^m) digit-serial Montgomery multipliers on FPGAs based on Linear Feedback Shift Registers (LFSRs) and study their area–performance trade-offs. Different Montgomery multipliers were implemented using several digit sizes and finite fields to compare performance metrics such as area, memory, latency, clock frequency and throughput, and to identify suitable configurations for ECC implementations using NIST-recommended parameters. The results show a notable improvement over previously reported FPGA Montgomery multipliers, achieving the highest throughput and the best efficiency.
    Computers & Electrical Engineering 02/2013; 39(2):542–549. · 0.93 Impact Factor
  • Advances in Electrical and Computer Engineering 01/2013; 13(2):3-10. · 0.55 Impact Factor
  • Miguel Morales-Sandoval, Arturo Diaz-Perez
    ABSTRACT: This work describes a compact FPGA hardware architecture for computing modular multiplications over GF(p) using the Montgomery method, suitable for public key cryptography in embedded or mobile systems. The multiplier is parameterizable, so the hardware design can be evaluated for different prime fields and different radices of the form β = 2^k. The design uses only three k × k multipliers and three 2k-bit adders. The hardware organization maximizes the utilization of these multipliers by iteratively processing the multiplicand, multiplier and modulus. The parametric design allows area-performance trade-offs to be studied in order to meet system requirements such as available resources, throughput and efficiency. The proposed multiplier performs a 1024-bit modular multiplication in 15.63 μs using k = 32. Compared with the most compact FPGA implementation previously reported, the proposed design uses 79% fewer FPGA resources with better efficiency, expressed as Mbps/slice.
    Proceedings of the 23rd ACM international conference on Great lakes symposium on VLSI; 01/2013
  • Miguel Morales-Sandoval, Arturo Diaz-Perez
    ABSTRACT: This work describes novel hardware architectures for GF(2^k) multipliers using a digit-digit approach. Unlike the bit-serial and digit-serial approaches previously addressed in the literature, we partition the multiplier, multiplicand and modulus into several digits and execute a field multiplication iteratively, as in a software implementation, but exploiting the parallelism in the operations. We focus on parametric designs that allow area-performance trade-offs to be studied when the multipliers are implemented in FPGAs. This study can guide a designer in selecting the most appropriate digit sizes to meet system requirements such as available resources, throughput and efficiency. Although the proposed multiplier can be implemented for any binary field GF(2^k), we provide implementation results for GF(2^163) and GF(2^233), two finite fields recommended for elliptic curve cryptography. For specific digit sizes, the proposed digit-digit multiplier uses considerably less area than a bit-serial multiplier, at the cost of a longer computation time. Compared with a digit-serial implementation, area can be saved while still improving the timing with respect to a bit-serial implementation.
    Proceedings of the 23rd International Conference on Field Programmable Logic and Applications; 01/2013
  • Proceedings of the 8th International Workshop on Reconfigurable Communication-centric Systems-on-Chip; 01/2013
  • Miguel Morales-Sandoval, Arturo Diaz-Perez
    ABSTRACT: This work describes FPGA hardware architectures of GF(2^m) multipliers that are more compact than a bit-serial multiplier, which is considered the most compact hardware approach, while still outperforming software counterparts. For field multiplication, the multiplicand and modulus are split into digits of size d, while the multiplier is split into digits of size D. Thus, the area complexity of the multiplier is mainly determined by the digit sizes {D, d}, not by the extension degree m of the finite field, which affects only the latency and thus the throughput. This approach allows GF(2^m) multipliers to be implemented for any finite field with practically the same amount of FPGA resources. Several multiplier versions were implemented using different combinations of {D, d} in order to find the most compact designs while achieving better performance and efficiency than a bit-serial multiplier. In a hardware implementation on the xc3s1500 FPGA for the finite field GF(2^233), the most efficient multiplier is obtained with the digits {D = 6, d = 12}, requiring 70% fewer area resources than a bit-serial multiplier.
    Proceedings of the 16th Euromicro Conference on Digital System Design; 01/2013
  • 23rd International Conference on Electronics, Communications and Computers, CONIELECOMP 2013; 01/2013
  • ABSTRACT: Digital fingerprinting is a technique that consists of inserting the ID of an authorized user into the digital content that the user requests. This technique has mainly been used to trace pirate copies of multimedia content such as images, audio and video. This study proposes the use of state-of-the-art digital fingerprinting techniques in the context of restricted distribution of digital documents. In particular, the system proposed by Kuribayashi for multimedia content is investigated. Extensive simulations show the robustness of the proposed system against the average collusion attack. The perceptual transparency of the fingerprinted documents is also studied. Moreover, using an efficient Fast Fourier Transform core and standard computers, it is shown that the proposed system is suitable for real-world scenarios.
    PLoS ONE 01/2013; 8(12):e81976. · 3.73 Impact Factor
  • ABSTRACT: This work reports an efficient and compact FPGA processor for the SHA-256 algorithm. The novel processor architecture is based on a custom datapath that exploits module reuse; its main component is a 4-input Arithmetic-Logic Unit (ALU) not previously reported. This ALU results from studying the types of operations in the SHA algorithm, their execution sequence and the associated dataflow. The processor hardware architecture was modeled in VHDL and implemented in FPGAs. Results from the implementation in a Virtex-5 device demonstrate that the proposed design uses fewer resources while achieving higher performance and efficiency, outperforming previous compact designs in the literature: it saves around 60% of FPGA slices with increased throughput (Mbps) and efficiency (Mbps/slice). The proposed SHA processor is well suited for applications like Wi-Fi, TMP (Trusted Mobile Platform) and MTM (Mobile Trusted Module), where the data transfer speed is around 50 Mbps.
    Computers & Electrical Engineering 01/2013; · 0.93 Impact Factor
  • ABSTRACT: Hash function algorithms are widely used to provide the security services of integrity and authentication, SHA-2 being the latest set of hash algorithms standardized by the US Federal Government. The main computation block in SHA-2 algorithms is governed by a loop with high data dependence; this work explores several implementation strategies for it, as well as designs that map efficiently to hardware architectures. Four new hardware architectures are proposed to improve the performance of the SHA-256 algorithm, reducing the critical path by reordering some of the operations required at each iteration and computing some values in advance, as far as data dependence allows. The proposed designs were implemented and validated on the FPGA Virtex-2 XC2VP-7. The results show a significant improvement in the performance of the SHA-256 algorithm compared with similar previously proposed approaches, reaching a throughput of 909 Mbps and an improved efficiency of 0.713 Mbps/slice.
    Microprocessors and Microsystems 01/2012; · 0.55 Impact Factor
  • ABSTRACT: Designing an efficient hardware implementation of a cryptographic algorithm for a particular application often requires exploring several architectures in order to select the one that offers the appropriate trade-off between throughput and hardware resources. Field Programmable Gate Arrays (FPGAs) are a natural choice for such design space exploration because they are reconfigurable, flexible and physically secure devices. In this paper we explore several architectures for implementing the SHA-512 algorithm based on the loop unrolling technique and analyze their area-performance trade-offs. The analysis consists of unrolling, at different levels, the main loop, which is the most costly part of the SHA-512 algorithm. The resulting hardware architectures are implemented and analyzed in order to identify the critical path and make decisions on the architectural design. The results provide a practical guide to understanding the effect of different levels of unrolling (1, 2, 4, 5, 8) on throughput and hardware resources. The 4x architecture, which partially unrolls four iterations of the main loop, reports the best performance compared with related works, while the 1x architecture exhibits the best efficiency.
    ISVLSI; 01/2012
  • K Vega-Castillo, A Cortina-Reyes, M Morales-Sandoval
    Revista de Ingeniería Eléctrica, Electrónica y Computación. 01/2012; 10(1):22-29.
  • M Morales-Sandoval, M A Nuño-Maganda
    Revista Tecnointelecto. 01/2012; 9(1):1-14.
  • ABSTRACT: Since mobile devices were conceived and commercialized, their market has grown exponentially, and so have the problems related to securing the data residing in them. Elliptic curve cryptography (ECC) is an approach to public key cryptography (PKC) based on the algebraic structure of elliptic curves over finite fields. It is the most suitable choice for implementing cryptography in mobile devices, since it uses smaller key sizes than other traditional public key cryptosystems without decreasing the security level. In this work we present the design of software modules for ECC over the .NET Compact Framework (.NET CF) 3.5, well suited for mobile and embedded devices running Windows CE. The main cores are modules for finite field arithmetic and elliptic curve cryptographic schemes defined over the prime field Zp. These modules are available neither in the programming language nor in the .NET CF. We evaluated the performance of our implementations on the Personal Digital Assistant devices iPAQ 116 and Handheld HP 216, using elliptic curves over the prime field Zp with p of 192, 224 and 256 bits, which are key sizes currently recommended by NIST. Our results show that ECC can be implemented in .NET CF with a performance that most users should find tolerable and with a high degree of security.
    01/2012;
  • ABSTRACT: Digital image segmentation is one of the most important stages in the implementation of an Automatic Fingerprint Identification System. This work describes a strategy for segmenting latent fingerprint images using a suitable combination of operators, achieving better results than other approaches reported in the literature. Latent fingerprint images are low-quality images, which makes the segmentation process more difficult. The proposed segmentation strategy is based on the gradient magnitude of the image and on region detection. The strategy was implemented in Matlab and Java and was tested using fingerprint images from the Fingerprint Verification Competition databases, which are commonly used for this purpose. The results show a significant improvement compared with representative algorithms from the literature, such as those based on image variance.
    01/2012;
  • ABSTRACT: Spiking Neural Networks (SNNs) have become an important research topic due to new discoveries and advances in neurophysiology, which show that neurons exchange information via pulses or spikes. FPGAs are widely used for implementing high-performance digital hardware systems because of their flexibility and their suitability for systems with a high degree of parallelism. FPGAs provide fine-grain digital elements that are useful for efficient hardware implementation of SNNs; moreover, SNNs are less hardware-hungry, and the nature of pulsed processing is well suited to the digital processing blocks of FPGA devices. Several computer vision applications have been implemented using SNNs, and one of the most critical tasks in computer vision is image clustering. In this paper, a hardware architecture for implementing image clustering using SNNs is reported. Results and performance statistics are provided.
    VLSI (ISVLSI), 2012 IEEE Computer Society Annual Symposium on; 01/2012
  • E. Garcia Amaro, M. A. Nuno-Maganda, M. Morales-Sandoval
    ABSTRACT: Biometric identification (BI) has been one of the most explored topics in recent years, and face recognition is one of its most important techniques. Face recognition systems (FRSs) are an important field in computer vision because they offer a non-invasive BI technique. In this paper, an FRS is proposed. In the first step, a face detection algorithm extracts faces from video frames (training videos) and builds a face database. In the second step, filtering and preprocessing are applied to the face images obtained in the previous step. In the third step, a collection of machine learning algorithms is trained using the preprocessed faces as input data. Finally, the classifiers are used to classify faces obtained from video frames (test videos). The results show the suitability of this approach for analyzing large collections of videos for which no prior face labels are available.
    01/2012;
  • ABSTRACT: Cryptographic algorithms are used to enable the security services at the core of modern communication systems. In particular, hash function algorithms are widely used to provide data integrity and authentication services. These algorithms perform a number of complex operations on the input data, so it is important to have novel designs that can be efficiently mapped to hardware architectures. Hash functions perform their internal operations iteratively, which opens the possibility of exploring several implementation strategies. In this paper, two different schemes to improve the performance of hardware implementations of the SHA-2 family of algorithms are proposed. The main focus of the proposed schemes is to reduce the critical path by reordering the operations required at each iteration of the algorithm. Implementation results on an FPGA device show an improvement in the performance of the SHA-256 algorithm compared with similar previously proposed approaches.
    Digital System Design (DSD), 2011 14th Euromicro Conference on; 10/2011
  • M.A. Nuno-Maganda, M. Morales-Sandoval, C. Torres-Huitzil
    ABSTRACT: In this work, a high-performance hardware coprocessor for Cellular Neural Networks (CNNs) and its interaction with the OpenCV library are reported. Edge detection algorithms reduce the amount of image data to be processed, because only essential information is preserved. There are several approaches to edge detection; one of them is based on CNNs. The parallel nature of CNNs makes them suitable for implementation on a reconfigurable device such as a Field Programmable Gate Array (FPGA). An FPGA implementation of CNNs achieves high performance and flexibility thanks to the fine-grain parallelism of FPGA-based implementations. CNNs can perform both linear and nonlinear image processing tasks, such as filtering, thresholding, various mathematical morphology operations, edge detection and corner detection, but this paper addresses only the edge detection problem. Hardware resource usage and a performance comparison are reported.
    Image and Graphics (ICIG), 2011 Sixth International Conference on; 09/2011
  • M. Morales-Sandoval, C. Feregrino-Uribe, P. Kitsos
    ABSTRACT: This work presents novel multipliers for Montgomery multiplication defined on binary fields GF(2^m). Unlike state-of-the-art Montgomery multipliers, this work uses a linear feedback shift register (LFSR) as the main building block. The authors studied different architectures for bit-serial and digit-serial Montgomery multipliers using the LFSR and the Montgomery factors x^m and x^(m-1). The proposed multipliers target different classes of irreducible polynomials: general polynomials, all-one polynomials, pentanomials and trinomials. The results show that using LFSRs simplifies the design of the multiplier architecture, reducing area resources while retaining high performance compared with related works.
    IET Computers & Digital Techniques 04/2011; · 0.28 Impact Factor
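
Several abstracts above concern Montgomery multiplication over the binary field GF(2^m), which returns a·b·x^(-m) mod f(x) instead of a·b mod f(x), so that reduction needs only shifts and conditional additions of f. The Python sketch below is a textbook bit-serial version, assuming polynomials are stored as integers (bit i is the coefficient of x^i). It is an illustration, not the LFSR-based or digit-serial architectures described in the papers; the function names and the use of the NIST B-163 polynomial as an example field are illustrative choices.

```python
def clmul(a: int, b: int) -> int:
    """Carry-less (polynomial) product over GF(2); bit i holds the coefficient of x^i."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def poly_mod(a: int, f: int) -> int:
    """Reduce polynomial a modulo f over GF(2) by cancelling the leading term."""
    m = f.bit_length() - 1
    while a.bit_length() - 1 >= m:
        a ^= f << (a.bit_length() - 1 - m)
    return a


def montgomery_mul_gf2m(a: int, b: int, f: int) -> int:
    """Bit-serial Montgomery product a*b*x^(-m) mod f over GF(2^m).

    Assumes deg(a), deg(b) < m and f irreducible of degree m, so its
    constant term is 1 and adding f always clears bit 0.
    """
    m = f.bit_length() - 1
    c = 0
    for i in range(m):
        if (a >> i) & 1:   # add b when the i-th coefficient of a is 1
            c ^= b
        if c & 1:          # add f so that c becomes divisible by x
            c ^= f
        c >>= 1            # divide by x; deg(c) stays below m
    return c


# Example field: NIST B-163 reduction polynomial x^163 + x^7 + x^6 + x^3 + 1.
F163 = (1 << 163) | (1 << 7) | (1 << 6) | (1 << 3) | 1
```

To chain several multiplications, an operand a can first be mapped to a·x^m mod f, i.e. `poly_mod(a << m, f)`; the Montgomery product of two mapped operands is again a mapped operand, and a final Montgomery multiplication by 1 maps the result back.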

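The GF(p) abstract above describes a parameterizable Montgomery multiplier with radix β = 2^k. As a minimal software sketch (not the paper's three-multiplier datapath), the Python function below scans the multiplier one k-bit digit at a time and returns a·b·R^(-1) mod p, where R = 2^(k·n) and n is the number of k-bit digits of p. The function name and the assumptions that p is odd and 0 ≤ a, b < p are illustrative.

```python
def montgomery_mul_gfp(a: int, b: int, p: int, k: int) -> int:
    """Digit-serial Montgomery product a*b*R^(-1) mod p with radix beta = 2^k.

    R = 2^(k*n), where n is the number of k-bit digits of p.
    Assumes p is odd and 0 <= a, b < p.
    """
    beta = 1 << k
    mask = beta - 1
    n = (p.bit_length() + k - 1) // k        # number of k-bit digits of p
    p_prime = (-pow(p, -1, beta)) % beta     # -p^(-1) mod 2^k (needs Python 3.8+)
    t = 0
    for i in range(n):
        a_i = (a >> (k * i)) & mask          # i-th k-bit digit of a
        t += a_i * b
        q = ((t & mask) * p_prime) & mask    # makes t + q*p divisible by 2^k
        t = (t + q * p) >> k                 # exact division by the radix
    return t - p if t >= p else t            # t < 2p here, so one subtraction suffices
```

With k = 32 and a 1024-bit modulus, the loop runs n = 32 times. The full-width products a_i·b and q·p, which Python evaluates directly on big integers, are the parts a hardware datapath would break down further into k × k digit multiplications.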

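The SHA-256 abstracts above refer to the algorithm's main loop, whose 64 rounds each depend on the working variables produced by the previous round. As a software reference only (not any of the hardware designs), the following Python sketch implements SHA-256 as specified in FIPS 180-4. To avoid hand-copying tables, it derives the round constants and initial hash values from the cube and square roots of the first primes. The function names are illustrative.

```python
import math
import struct

MASK32 = 0xFFFFFFFF


def _first_primes(count):
    """First `count` primes by trial division (cheap for count = 64)."""
    primes = []
    candidate = 2
    while len(primes) < count:
        if all(candidate % p for p in primes if p * p <= candidate):
            primes.append(candidate)
        candidate += 1
    return primes


def _icbrt(n):
    """Largest integer r with r**3 <= n (binary search)."""
    lo, hi = 0, 1 << (n.bit_length() // 3 + 1)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** 3 <= n:
            lo = mid
        else:
            hi = mid - 1
    return lo


_PRIMES = _first_primes(64)
# Round constants: first 32 fractional bits of the cube roots of the first 64 primes.
K = [_icbrt(p << 96) & MASK32 for p in _PRIMES]
# Initial hash value: first 32 fractional bits of the square roots of the first 8 primes.
H0 = [math.isqrt(p << 64) & MASK32 for p in _PRIMES[:8]]


def _rotr(x, n):
    """32-bit right rotation."""
    return ((x >> n) | (x << (32 - n))) & MASK32


def sha256(message: bytes) -> bytes:
    # Padding: 0x80, zero bytes up to 56 mod 64, then the 64-bit big-endian bit length.
    padded = message + b"\x80"
    padded += b"\x00" * ((56 - len(padded) % 64) % 64)
    padded += struct.pack(">Q", len(message) * 8)

    h = list(H0)
    for off in range(0, len(padded), 64):
        # Message schedule: 16 input words expanded to 64.
        w = list(struct.unpack(">16I", padded[off:off + 64]))
        for t in range(16, 64):
            s0 = _rotr(w[t - 15], 7) ^ _rotr(w[t - 15], 18) ^ (w[t - 15] >> 3)
            s1 = _rotr(w[t - 2], 17) ^ _rotr(w[t - 2], 19) ^ (w[t - 2] >> 10)
            w.append((w[t - 16] + s0 + w[t - 7] + s1) & MASK32)

        # Main loop: every round needs the new a and e from the round before it.
        a, b, c, d, e, f, g, hh = h
        for t in range(64):
            # hh + K[t] + w[t] does not use this round's a or e, so a hardware
            # pipeline could precompute it one round early (one common reordering).
            t1 = (hh + (_rotr(e, 6) ^ _rotr(e, 11) ^ _rotr(e, 25))
                  + ((e & f) ^ (~e & g)) + K[t] + w[t]) & MASK32
            t2 = ((_rotr(a, 2) ^ _rotr(a, 13) ^ _rotr(a, 22))
                  + ((a & b) ^ (a & c) ^ (b & c))) & MASK32
            hh, g, f, e = g, f, e, (d + t1) & MASK32
            d, c, b, a = c, b, a, (t1 + t2) & MASK32

        h = [(x + y) & MASK32 for x, y in zip(h, (a, b, c, d, e, f, g, hh))]
    return struct.pack(">8I", *h)
```

The inner `for t in range(64)` loop is the structure that the unrolling and reordering strategies in these abstracts operate on. Each iteration must wait for the new `a` and `e` from the previous round, which is where the critical path arises.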