Table 1 - uploaded by Atheer Akram Abdulrazzaq
Source publication
Exact string matching algorithms have been very significant in many applications over the last two decades, owing to advances in technology that produce large volumes of data. The main factors in string matching algorithms are the number of attempts, the number of character comparisons, and the running time. These factors are influenced by...
Context in source publication
Context 1
... have been done according to the performance of each approach group and its algorithms, measured by character comparisons, attempts, type of data, running time, pattern length, and size of data (Abdulrozaq, 2009). The limitations of the algorithms were obtained from previous algorithms, as shown in Table 1. A random example was used to determine the best number of character comparisons and attempts; the results reported in the table were obtained by hand tracing, as shown below: ...
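The two counts that the excerpt above obtains by hand tracing can also be produced mechanically. The following is a minimal sketch, assuming a plain brute-force matcher as the baseline; the function name `naive_search` and the sample strings are illustrative and do not come from the source publication.

```python
def naive_search(text, pattern):
    """Brute-force exact matching that also reports attempts and comparisons.

    An "attempt" is one alignment of the pattern against the text;
    a "character comparison" is one text-character vs pattern-character test.
    """
    n, m = len(text), len(pattern)
    matches, attempts, comparisons = [], 0, 0
    for start in range(n - m + 1):
        attempts += 1
        j = 0
        while j < m:
            comparisons += 1
            if text[start + j] != pattern[j]:
                break  # mismatch: abandon this alignment
            j += 1
        if j == m:
            matches.append(start)  # all m characters matched
    return matches, attempts, comparisons


if __name__ == "__main__":
    # Hypothetical example input, chosen only for illustration.
    print(naive_search("abracadabra", "abra"))  # ([0, 7], 8, 16)
```

Running other algorithms on the same input and comparing their counters against this baseline is one way to reproduce a comparison table of the kind the excerpt describes.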
Similar publications
Bending-active tensile hybrid structures refer to the coupling of bending- and tension-form-active components within a unique and stiffer construction. The variety of possible shapes applicable to the design of these structures demands new ideas around real-time physics-based simulations for the development of more flexible and interactive design s...
Citations
... It evaluates the accuracy of predicted simplified sentences by comparing them to reference and source sentences and explicitly measures the effectiveness of added, deleted, and retained words by the system [39]. An exact string match (ESM) comparison method is used to compare named entities between the ground truth and system-generated summaries, as illustrated in Fig. 2. ESM is a simple technique that determines whether two string or string array values match exactly [55]. ...
This research addresses the limitations of pretrained models (PTMs) in generating accurate and comprehensive abstractive summaries for scientific articles, with a specific focus on the challenges posed by medical research. The proposed solution named medical text simplification and summarization (MedTSS) introduces a dedicated module designed to enrich source text for PTMs. MedTSS addresses issues related to token limits, reinforces multiple concepts, and mitigates entity hallucination problems without necessitating additional training. Furthermore, the module conducts linguistic analysis to simplify generated summaries, particularly tailored for the complex nature of medical research articles. The results demonstrate a significant enhancement, with MedTSS improving the Rouge-1 score from 16.46 to 35.17 without requiring additional training. By emphasizing knowledge-driven components, this framework offers a distinct perspective, challenging the common narrative of ’more data’ or ’more parameters.’ This alternative approach, especially applicable in health-related domains, signifies a broader contribution to the field of NLP. MedTSS serves as an innovative model that not only addresses the intricacies of medical research summarization but also presents a paradigm shift with implications for diverse domains beyond its initial scope.
... In hashing-based strategies, characters are represented by hash values rather than being compared individually, significantly reducing computational overhead through the comparison of integer values instead of characters [64]. For instance, the Karp-Rabin algorithm [24] employs this method to address string matching challenges, conducting comparisons from left to right. ...
... The first one relies on the principles of parallel computing, reducing the number of operations within the algorithm to match the number of bits in a computer word [65]. This algorithm demonstrates speed and efficiency, particularly when the length of the provided pattern p is shorter than the word length [64]. The second one combines the advantages of different algorithms and performs better than the individual algorithms [66]. ...
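The excerpt above does not name a specific bit-parallel algorithm. As an illustration, the sketch below uses the classic Shift-And technique, chosen here as an assumed representative of that family; the function name `shift_and_search` is hypothetical. Each bit of `state` records whether a prefix of the pattern currently matches, so one shift and one AND update all prefixes at once.

```python
def shift_and_search(text, pattern):
    """Shift-And exact matching: bit i of `state` is set when
    pattern[0..i] matches the text ending at the current position."""
    m = len(pattern)
    if m == 0:
        return []
    # masks[ch] has bit i set wherever pattern[i] == ch.
    masks = {}
    for i, ch in enumerate(pattern):
        masks[ch] = masks.get(ch, 0) | (1 << i)
    accept = 1 << (m - 1)  # bit that signals a full match
    state = 0
    matches = []
    for pos, ch in enumerate(text):
        # Extend every live prefix by one character, start a new one (| 1),
        # then keep only the prefixes consistent with the current character.
        state = ((state << 1) | 1) & masks.get(ch, 0)
        if state & accept:
            matches.append(pos - m + 1)
    return matches
```

Python integers have arbitrary precision, so this sketch does not hit the word-length limit mentioned in the excerpt. In a language with fixed-width integers, `state` fits in a single machine word only while the pattern is no longer than the word size.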
Pattern matching algorithms have been studied on numerous occasions, mainly focusing on performance because of the large amount of data used in a matching process. However, a strong focus on performance can entail particular issues like the lack of flexibility to match patterns. As a consequence, programming developers need to tweak matching algorithms in contortive ways or create new specialized ones altogether if their specific needs are not supported. Inspired by the self-replication behavior of cells in biology, we explore and evaluate the design and implementation of an algorithm to flexibly match patterns, named Matcher Cells. Through the composition of simple rules applied to cells, developers can adjust the matching semantics of this algorithm to different needs. We describe this algorithm using a pure functional language as a recipe for any Turing-complete programming language and then offer an object-oriented architecture for languages like Java. To show the flexibility of our proposal, we use a concrete implementation in TypeScript to describe two applications, from different domains, that use pattern matching in a stream of tokens. Additionally, we carry out performance and developer experience empirical evaluations with undergraduate students using Matcher Cells. Finally, we discuss the pros and cons of using a biological-based algorithm, exploiting the compositions of rules, to match patterns.
... Such algorithms are based on the hashing concept: they produce hash values and compare those rather than performing a direct character comparison. The main benefit of this approach is a considerable improvement in calculation time [13]; yet, as with most hashing algorithms, they suffer from the hash collision problem. Typical examples of such algorithms are the Karp-Rabin algorithm, which uses modular arithmetic to perform hashing [14], and the Lecroq algorithm, which first splits the sequence into subsequences and then performs pattern matching on each subsequence [15]. ...
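The hashing idea and the collision problem described above can be sketched with a rolling hash in the style of Karp-Rabin. This is a minimal illustration, not the implementation from any cited paper: the function name and the default `base` and `modulus` values are assumptions chosen for the example. Because two different strings can share a hash value, every hash hit is confirmed with a direct comparison.

```python
def karp_rabin_search(text, pattern, base=256, modulus=1_000_003):
    """Rolling-hash exact matching with verification of every hash hit."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    high = pow(base, m - 1, modulus)  # weight of the window's leading character
    p_hash = t_hash = 0
    for i in range(m):
        p_hash = (p_hash * base + ord(pattern[i])) % modulus
        t_hash = (t_hash * base + ord(text[i])) % modulus
    matches = []
    for start in range(n - m + 1):
        # Compare integers first; compare characters only on a hash hit,
        # which guards against collisions.
        if p_hash == t_hash and text[start:start + m] == pattern:
            matches.append(start)
        if start < n - m:
            # Slide the window: drop text[start], append text[start + m].
            t_hash = ((t_hash - ord(text[start]) * high) * base
                      + ord(text[start + m])) % modulus
    return matches
```

Calling it with `modulus=1` makes every window collide with the pattern hash, yet the verification step still returns only the true occurrences. The cost in that degenerate case falls back to that of brute-force comparison.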
Pattern detection and string matching are fundamental problems in computer science, and the accelerated expansion of bioinformatics and computational biology has made them a core topic for both disciplines. Computational tools for genomic analyses, such as sequence alignment, are very important, although in most cases the resources and computational power required are enormous. The presented Multiple Genome Analytics Framework combines data structures and algorithms, specifically built for text mining and (repeated) pattern detection, that can help to efficiently address several computational biology and bioinformatics problems, concurrently, with minimal resources. A single execution of advanced algorithms, with space and time complexity O(n log n), is enough to acquire knowledge on all repeated patterns that exist in multiple genome sequences, and this information can be used as input by meta-algorithms for further meta-analyses. As a proof of concept of the proposed Framework's scalability, agility and efficiency, a publicly available dataset of more than 300,000 SARS-CoV-2 genome sequences from the National Center for Biotechnology Information has been used for the detection of all repeated patterns. These results have been used by newly introduced algorithms to provide answers to questions such as common patterns among all variants, sequence alignment, palindromes and tandem repeats detection, different organism genome comparisons, polymerase chain reaction primers detection, etc.
... This family of algorithms is based on calculating hash values for the characters to match rather than comparing the characters directly, as the abovementioned algorithms do. This technique can significantly improve calculation time since it uses integer values for comparison instead of characters [24]. Yet, as is common with hashing, this method suffers from the hash collision problem, in which two dissimilar strings are mapped to the same hash value [15]. ...
Exact string matching has been a fundamental problem in computer science for decades because of many practical applications. Some are related to common procedures, such as searching in files and text editors, or, more recently, to more advanced problems such as pattern detection in Artificial Intelligence and Bioinformatics. Tens of algorithms and methodologies have been developed for pattern matching and several programming languages, packages, applications and online systems exist that can perform exact string matching in biological sequences. These techniques, however, are limited to searching for specific and predefined strings in a sequence. In this paper a novel methodology (called Ex2SM) is presented, which is a pipeline of execution of advanced data structures and algorithms, explicitly designed for text mining, that can detect every possible repeated string in multivariate biological sequences. In contrast to known algorithms in literature, the methodology presented here is string agnostic, i.e., it does not require an input string to search for it, rather it can detect every string that exists at least twice, regardless of its attributes such as length, frequency, alphabet, overlapping etc. The complexity of the problem solved and the potential of the proposed methodology is demonstrated with the experimental analysis performed on the entire human genome. More specifically, all repeated strings with a length of up to 50 characters have been detected, an achievement which is practically impossible using other algorithms due to the exponential number of possible permutations of such long strings.
... String matching is a universal technique for solving problems of different fields, such as text mining, natural language processing, image processing, speech processing, computer vision, and pattern recognition [1]. Natural language processing is an integral part of multimedia information retrieval. ...
String matching has been an extensively studied research domain in the past two decades due to its various applications in the fields of text, image, signal, and speech processing. As a result, choosing an appropriate string matching algorithm for current applications and addressing challenges is difficult. Understanding different string matching approaches (such as exact and approximate string matching algorithms), integrating several algorithms, and modifying algorithms to address related issues are also difficult. This article presents a survey on single-pattern exact string matching algorithms. The main purpose of this survey is to propose a new classification, identify new directions, and highlight the possible challenges, current trends, and future works in the area of string matching algorithms, with a core focus on exact string matching algorithms.
... All the above-mentioned algorithms that come under the Boyer-Moore family [16] enhanced the speed of the algorithm either by increasing the number of shifts, thereby reducing the number of character comparisons, or by changing the direction of scanning and other related criteria. Other approaches that solved the same problem include hashing, proposed by Karp and Rabin [17]. ...
The most popular algorithms used in searching texts are exact matching algorithms. These algorithms are used in the areas of natural language processing, networking, and bioinformatics. The Boyer-Moore algorithm is one of the most widely used in searching texts. However, the performance of the Boyer-Moore algorithm and existing exact matching algorithms degrades for Arabic texts, especially for very short patterns. The reasons for the poor performance are the presence of diacritics in Arabic texts such as the Digital Quran and the shifting phases of exact matching algorithms. To reduce the time complexity of searching Arabic diacritical texts, we present a simple enhanced version of the brute-force approach. The proposed algorithm divides the given pattern p into two equal halves and looks for the second half only during the search process. Initial experimental results on natural language texts for very short patterns showed that the proposed algorithm offers a significant improvement in time complexity over existing approaches and is highly competitive.
... All these applications involve a large amount of data because of the advancement in technology; moreover, all these applications involve different types of alphabets. Therefore, researchers continue to reiterate the need for significant string matching algorithms that can address different types of alphabets and large amounts of data [7]. ...
... However, the Skip Search algorithm consumes much time when a short DNA pattern and protein database are employed [7]. By contrast, the Tuned Boyer-Moore algorithm consumes much time when a long pattern of DNA alphabet is employed [7]. This algorithm has two disadvantages. ...
The string matching problem is considered one of the most interesting research areas in computer science because it can be applied in many essential applications, such as intrusion detection, search analysis, editors, internet search engines, information retrieval, and computational biology. During the matching process, two main factors are used to evaluate the performance of a string matching algorithm: the total number of character comparisons and the total number of attempts. This study aims to produce an efficient hybrid exact string matching algorithm, called the Sinan Sameer Tuned Boyer Moore-Quick Skip Search (SSTBMQS) algorithm, by blending the best features extracted from the two selected original algorithms, Tuned Boyer-Moore and Quick-Skip Search. The SSTBMQS hybrid algorithm was tested on different benchmark datasets with different sizes and different pattern lengths. The sequential version of the proposed hybrid algorithm produces better results than its original algorithms (TBM and Quick Skip Search) and than the Maximum-Shift hybrid algorithm, which is considered one of the most recent hybrid algorithms. The proposed hybrid algorithm requires fewer attempts and fewer character comparisons.