Book

# The Art of Computer Programming

Authors:
... Clearly, any sorting algorithm requires at least $\log(n!) = n\log n - n\log e + \Theta(\log n)$ queries$^{1}$, since there are $n!$ permutations for a set of $n$ elements and each query reveals at most one bit of information. It is well known that using, for example, the merge sort algorithm, $n\log n + n + O(\log n)$ queries are sufficient to sort $n$ elements [1]. Therefore, to sort a set of $n$ elements, it is both necessary and sufficient to submit $n\log n\,(1 + o(1))$ queries. ...
... noise. We therefore are interested in characterizing the exact constant $R = \frac{n\log n}{m}$, [Footnote 1: Throughout the paper, $\log(\cdot) = \log_2(\cdot)$. Also, we use the standard order notation: $f(n) = O(g(n))$ if $\lim_{n\to\infty} \frac{|f(n)|}{g(n)} < \infty$; $f(n) = \Omega(g(n))$ if $\lim_{n\to\infty} \frac{f(n)}{g(n)} > 0$; $f(n) = \Theta(g(n))$ if $f(n) = O(g(n))$ and $f(n) = \Omega(g(n))$; and $f(n) = o(g(n))$ if $\lim_{n\to\infty} \frac{f(n)}{g(n)} = 0$.] ...
... Hence, a natural question to ask is whether any nonvanishing rate is possible for noisy sorting. In this paper, we propose a noisy sorting algorithm that incorporates insertion sort [1] with the Burnashev-Zigangirov algorithm for coding over channels with feedback [8]. We show that the proposed algorithm has a strictly positive rate and a vanishing error probability asymptotically (Section III). ...
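The noiseless query bounds quoted in these excerpts can be checked numerically. The following is a minimal sketch, not the cited paper's algorithm: it computes the information-theoretic lower bound $\lceil \log_2(n!) \rceil$ and counts the comparisons made by a plain top-down merge sort. The function names and the choice of $n = 1024$ are illustrative assumptions.

```python
import math
import random


def information_lower_bound(n):
    """Fewest yes/no queries that can distinguish all n! orderings (worst case)."""
    return math.ceil(math.log2(math.factorial(n)))


def merge_sort_count(items):
    """Return (sorted_list, comparisons) for a top-down merge sort."""
    if len(items) <= 1:
        return list(items), 0
    mid = len(items) // 2
    left, c_left = merge_sort_count(items[:mid])
    right, c_right = merge_sort_count(items[mid:])
    merged = []
    comparisons = c_left + c_right
    i = j = 0
    while i < len(left) and j < len(right):
        comparisons += 1  # one noiseless pairwise query
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged, comparisons


n = 1024  # illustrative size
data = random.Random(0).sample(range(n), n)
result, used = merge_sort_count(data)
print("lower bound:", information_lower_bound(n))
print("merge sort comparisons on this input:", used)
print("n * ceil(log2 n):", n * math.ceil(math.log2(n)))
```

The lower bound applies to the worst case over all inputs, so a particular input (for example an already sorted one) may need fewer comparisons than `information_lower_bound(n)`.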
Preprint
Sorting is the task of ordering $n$ elements using pairwise comparisons. It is well known that $m=\Theta(n\log n)$ comparisons are both necessary and sufficient when the outcomes of the comparisons are observed with no noise. In this paper, we study the sorting problem in the presence of noise. Unlike the common approach in the literature which aims to minimize the number of pairwise comparisons $m$ to achieve a given desired error probability, our goal is to characterize the maximal ratio $\frac{n\log n}{m}$ such that the ordering of the elements can be estimated with a vanishing error probability asymptotically. The maximal ratio is referred to as the noisy sorting capacity. In this work, we derive upper and lower bounds on the noisy sorting capacity. The algorithm that attains the lower bound is based on the well-known Burnashev--Zigangirov algorithm for coding over channels with feedback. By comparing with existing algorithms in the literature under the proposed framework, we show that our algorithm can achieve a strictly larger ratio asymptotically.
... $c(\alpha)\sqrt{\frac{m+n}{m \times n}}$ (9) is true, then the null hypothesis can be rejected [33]. Here, $m$ and $n$ refer to the sizes of the first and second samples. The null hypothesis, $H_0$, is that the two samples are drawn from the same distribution. ...
... This corresponds to a $c(\alpha)$ value of 1.358 according to the KS test statistical significance calculation [33]. For each of the tests, the two samples compared contained either 499 or 500 measurements, making the right-hand side of Inequality (9) equal to 0.085. ...
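The arithmetic behind the right-hand side of Inequality (9) can be reproduced in a few lines. This is a sketch under stated assumptions: the helper names are hypothetical, and the closed form $c(\alpha) = \sqrt{-\tfrac{1}{2}\ln(\alpha/2)}$ is the commonly tabulated large-sample approximation, which gives about 1.358 at $\alpha = 0.05$ (the excerpt itself does not state $\alpha$).

```python
import math


def ks_c_alpha(alpha):
    """Large-sample coefficient c(alpha) for the two-sample KS test (assumed form)."""
    return math.sqrt(-0.5 * math.log(alpha / 2.0))


def ks_threshold(m, n, c_alpha):
    """Right-hand side of Inequality (9): c(alpha) * sqrt((m + n) / (m * n))."""
    return c_alpha * math.sqrt((m + n) / (m * n))


print(ks_c_alpha(0.05))
print(ks_threshold(499, 500, 1.358))
print(ks_threshold(500, 500, 1.358))
```

With samples of 499 or 500 measurements, the threshold evaluates to roughly 0.086, in line with the value of about 0.085 quoted in the excerpt.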
Article
The dynamic stress–strain response of a material can be described by a number of different models of varying fidelity. However, an individual model’s ability to replicate the dynamic stress–strain response of a material can be hindered by experimental variability. Reification, an approach to fusing models and experimental data with inherent scatter, is presented. It is then used to determine the optimum parameters of the Johnson–Cook (JC) and Zerilli–Armstrong (ZA) models using a fusion of Split-Hopkinson Pressure Bar (SHPB) data and the JC and ZA models fit to the SHPB data using a traditional approach. The output of the fused model is a dataset that represents a “best-guess” sampling of the possible stress–strain response of a high strength steel. In the present work, the dynamic response of a newly developed steel, AF9628, is evaluated. Under the reification framework, the experimental variability and limitations of the mathematical model expressions are addressed by the optimized sampling of data and combined fitting process. The JC and ZA models are then re-fit to partitions of the fused dataset, which bound the responses of the traditionally fit JC and ZA models. The behavior of the re-fit models and the traditionally fit models are compared via a simulated Taylor anvil test.
... For more information about the Jacobsthal sequence, Pell sequence, sequence, Tetranacci sequences and some generalizations of these sequences and applications of these sequences, we refer to [2][3][4][5], [7][8][9][10][11][12]. The Jacobsthal-Lucas sequence { } is defined by ...
... Lemma (3.1). [7] Let ≥ 0 be an integer. Then we have ...
Preprint
In this paper, a particular number sequence, namely the Jacobsthal-Lucas-Leonardo sequence, is introduced. Then we define its binomial transform. We investigate the Binet formulas, generating functions and other important identities of the Jacobsthal-Lucas-Leonardo sequence and its binomial transform. We find some summation formulas of the Jacobsthal-Lucas-Leonardo sequence. Also, we derive expressions for several binomial sums involving the Jacobsthal-Lucas-Leonardo sequence. In order to highlight the results, we give some examples related to these identities and summation formulas. Also, Python code to generate the first terms of the Jacobsthal-Lucas-Leonardo sequence is presented in this paper.
... Line 3 takes time and space O(dk). Line 4 takes time O(dk log(k)) and space O(dk); since each row of U is already decreasing, we can use k-way merging [21] instead of naive sorting. All remaining lines require O(dk) time and space. ...
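The k-way merging step mentioned in the excerpt can be sketched with Python's standard library. This is an illustration only, not the cited paper's code: the matrix `U` below is made-up data, and `merge_decreasing_rows` is a hypothetical helper name. It relies on each row already being in decreasing order, as the excerpt states.

```python
import heapq


def merge_decreasing_rows(rows):
    """Merge rows that are each sorted in decreasing order into one decreasing list.

    heapq.merge keeps one candidate per row in a heap, so merging k rows with
    N items in total costs O(N log k) instead of the O(N log N) of a full sort.
    """
    return list(heapq.merge(*rows, reverse=True))


# Illustrative data: every row is already decreasing.
U = [
    [9, 5, 1],
    [8, 4],
    [7, 6, 2],
]
print(merge_decreasing_rows(U))
```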
Preprint
We present a differentially private algorithm for releasing the sequence of $k$ elements with the highest counts from a data domain of $d$ elements. The algorithm is a "joint" instance of the exponential mechanism, and its output space consists of all $O(d^k)$ length-$k$ sequences. Our main contribution is a method to sample this exponential mechanism in time $O(dk\log(k) + d\log(d))$ and space $O(dk)$. Experiments show that this approach outperforms existing pure differential privacy methods and improves upon even approximate differential privacy methods for moderate $k$.
... In fact, Knuth shows many implementation ideas for solving real puzzles in [75] as the applications of dancing links including Sudoku, Polyominoes, Polycubes, and many others. The second one is called binary decision diagrams (BDD) [73]. Roughly, a BDD is represented by an acyclic directed graph, which is obtained from a directed tree by merging common subtrees. ...
Article
Since the 1930s, mathematicians and computer scientists have been interested in computation. While mathematicians investigate recursion theory, computer scientists investigate computational complexity based on the Turing machine model to understand what a computation is. Besides these, there is another approach to research on computation: the investigation of puzzles and games. Once we regard the rules used in puzzles and games as the set of basic operations of computation, we can perform some computation by solving puzzles and playing games. In fact, research on puzzles and games from the viewpoint of theoretical computer science has continued without interruption throughout the history of theoretical computer science. Sometimes research on computational complexity classes has advanced through the study of large numbers of puzzles. The wide collection of complete problems for a specific computational complexity class shares a common property, which gives us a deep understanding of the class. In this survey paper, we give a brief history of research on the computational complexity of puzzles and games, with related results and trends in theoretical computer science.
... This demands the input seed to be random in each run of the PRNG. The standard approach is to use sequences of uniform distribution from which other distributions can be generated using various transformations [10,94]. Usually, PRNGs are based on the principles of number theory. ...
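As an illustration of generating other distributions from uniform sequences, the sketch below uses inverse transform sampling to turn uniform variates into exponential ones. The choice of the exponential distribution, the rate, and the function name are assumptions made for the example; the excerpt does not name a specific transformation.

```python
import math
import random


def exponential_from_uniform(u, rate):
    """Map a uniform variate u in [0, 1) to an Exponential(rate) variate.

    Inverts the CDF F(x) = 1 - exp(-rate * x), giving x = -ln(1 - u) / rate.
    """
    return -math.log(1.0 - u) / rate


rng = random.Random(12345)  # seeded PRNG supplying the uniform sequence
rate = 2.0                  # illustrative parameter
samples = [exponential_from_uniform(rng.random(), rate) for _ in range(20000)]
print(sum(samples) / len(samples))  # should be close to 1 / rate
```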
Preprint
Random numbers are central to cryptography and various other tasks. The intrinsic probabilistic nature of quantum mechanics has allowed us to construct a large number of quantum random number generators (QRNGs) that are distinct from the traditional true random number generators. This article provides a review of the existing QRNGs with a focus on various possible features of QRNGs (e.g., self-testing, device independence, semi-device independence) that are not achievable in the classical world. It also discusses the origin, applicability, and other facets of randomness. Specifically, the origin of randomness is explored from the perspective of a set of hierarchical axioms for quantum mechanics, implying that succeeding axioms can be regarded as a superstructure constructed on top of a structure built by the preceding axioms. The axioms considered are: (Q1) incompatibility and uncertainty; (Q2) contextuality; (Q3) entanglement; (Q4) nonlocality and (Q5) indistinguishability of identical particles. Relevant toy generalized probability theories (GPTs) are introduced, and it is shown that the origin of random numbers in different types of QRNGs known today is associated with different layers of nonclassical theories and all of them do not require all the features of quantum mechanics. Further, the available QRNGs are classified and the technological challenges associated with each class are critically analyzed. Commercially available QRNGs are also compared.
... As graphical models became popular, message passing provided an exciting new approach to solving graph colouring and (the closely related) constraint satisfaction problems [33,54]. For constraint satisfaction, the survey propagation message passing technique seems to be particularly effective [55,56,57,58]. These techniques are primarily based on the factor graph PGM topology. ...
Preprint
Probabilistic graphical models (PGMs) are tools for solving complex probabilistic relationships. However, suboptimal PGM structures are primarily used in practice. This dissertation presents three contributions to the PGM literature. The first is a comparison between factor graphs and cluster graphs on graph colouring problems such as Sudokus, indicating a significant advantage for cluster graphs. The second is an application of cluster graphs to a practical problem in cartography: land cover classification boosting. The third is a PGM formulation for constraint satisfaction problems and an algorithm called purge-and-merge to solve such problems that are too complex for traditional PGMs.
... Hash maps are a solution to the first issue (see [23,22] for instance). They avoid a blow-up in memory when Y is huge. ...
Article
This paper studies the algorithms for the minimisation of weighted automata. It starts with the definition of morphisms (which generalises and unifies the notion of bisimulation to the whole class of weighted automata) and the uniqueness of a minimal quotient for every automaton, obtained by partition refinement. From a general scheme for the refinement of partitions, two strategies are considered for the computation of the minimal quotient: the Domain Split and the Predecessor Class Split algorithms. They correspond respectively to the classical Moore and Hopcroft algorithms for the computation of the minimal quotient of deterministic Boolean automata. We show that these two strategies yield algorithms with the same quadratic complexity and we study the cases when the second one can be improved in order to achieve a complexity similar to that of Hopcroft's algorithm.
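For the deterministic Boolean case mentioned in the abstract, partition refinement in the style of Moore's algorithm can be sketched as follows. This is a textbook-style illustration, not the paper's Domain Split algorithm for weighted automata; the function name and the encoding of the automaton as a dictionary are assumptions.

```python
def moore_partition(states, alphabet, delta, accepting):
    """Return a dict mapping each state to the id of its equivalence class.

    delta maps (state, letter) -> state and must be total (complete DFA).
    Starts from the accepting / non-accepting split and refines until stable.
    """
    block = {q: int(q in accepting) for q in states}
    while True:
        ids = {}
        refined = {}
        for q in states:
            # A state's signature: its current class plus its successors' classes.
            signature = (block[q],) + tuple(block[delta[(q, a)]] for a in alphabet)
            refined[q] = ids.setdefault(signature, len(ids))
        if len(ids) == len(set(block.values())):
            return refined  # no class was split: the partition is stable
        block = refined


# Illustrative automaton over the alphabet {"a"}: states 1 and 2 are equivalent.
states = [0, 1, 2]
alphabet = ["a"]
delta = {(0, "a"): 1, (1, "a"): 2, (2, "a"): 1}
classes = moore_partition(states, alphabet, delta, accepting={1, 2})
print(classes)
```

Each round recomputes every signature, which is what leads to the quadratic behaviour of this strategy; Hopcroft-style refinement avoids that by processing only the smaller half of each split class.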
... 1.3.3, page 176 of The Art of Computer Programming, volume 1 (Knuth, 1997). ...
Article
Deductive program verification greatly improves software quality, but proving formal specifications is difficult, and this activity can only be partially automated. It is therefore relevant to supplement deductive verification tools, such as Why3, with the ability to test the properties to be verified. We present a methodological study and a prototype for the random and enumerative testing of properties written either in the Why3 input language WhyML or in the OCaml programming language used by Why3 to run programs written in WhyML. An originality is that we propose enumerative testing based on data generators themselves written in WhyML and formally verified with Why3. Another specificity is that the development effort is reduced by exploiting Why3’s extraction mechanism to OCaml and an existing random testing tool for OCaml. These design choices are applied in a prototypal implementation of a tool, called AutoCheck. The prototype and the paper are designed with simplicity and usability in mind, in order to make them accessible to the widest audience. Starting from the most elementary cases, a tutorial illustrates the implemented features with many examples presented in increasing complexity order.
... The theory of multisets is an important generalization of classical set theory which has emerged by violating a basic property of classical sets, namely that an element can belong to a set only once. The term multiset (mset for short), as (Knuth, 1981) noted, was first suggested by (De Bruijn, 1983) in a private communication. Owing to its aptness, it has replaced a variety of terms viz. ...
Thesis
In this dissertation, some fundamentals of multisets, soft multisets, multigroups, soft multigroups and some of their basic properties were presented. The dissertation also developed the concepts of normal submultigroups and soft normal multigroups. It then established that intersection, union, sum, of any two normal submultigroups is also normal submultigroup of a given multigroup. The inverse of any normal submultigroup of a multigroup is a normal submultigroup and for any normal submultigroup, the root (support) set is a normal subgroup of the underlying group. It also showed that under the isomorphism function between any two groups, the image of a normal submultigroup under the isomorphism is a normal submultigroup and the inverse image of a normal submultigroup under the isomorphism is a normal submultigroup. Finally, the dissertation defined operations on soft normal multigroups such as intersection, union, addition, AND, OR operations and discovered that such operations were closed under soft normal multigroups.
... Grammars for unary words are closely related to addition chains [27], and the smallest (not necessarily SLP) grammar for $(a)^k$ is non-trivial for general $k$ that is not a power of two. Also, in such a case, RePair does not provide the smallest grammar for $(a)^k$ [19]. ...
Preprint
Grammar-based compression is a loss-less data compression scheme that represents a given string $w$ by a context-free grammar that generates only $w$. While computing the smallest grammar which generates a given string $w$ is NP-hard in general, a number of polynomial-time grammar-based compressors which work well in practice have been proposed. RePair, proposed by Larsson and Moffat in 1999, is a grammar-based compressor which recursively replaces all possible occurrences of a most frequently occurring bigrams in the string. Since there can be multiple choices of the most frequent bigrams to replace, different implementations of RePair can result in different grammars. In this paper, we show that the smallest grammars generating the Fibonacci words $F_k$ can be completely characterized by RePair, where $F_k$ denotes the $k$-th Fibonacci word. Namely, all grammars for $F_k$ generated by any implementation of RePair are the smallest grammars for $F_k$, and no other grammars can be the smallest for $F_k$. To the best of our knowledge, Fibonacci words are the first non-trivial infinite family of strings for which RePair is optimal.
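The bigram-replacement idea behind RePair can be sketched naively as follows. This is a simplified illustration, not Larsson and Moffat's implementation: it recounts bigrams from scratch in every round, counts overlapping occurrences (as in `aaa`), and breaks ties arbitrarily, so it need not match any particular RePair implementation. The names `repair` and `expand` and the tuple encoding of nonterminals are assumptions.

```python
from collections import Counter


def repair(text):
    """Return (final_sequence, rules); rules maps a nonterminal to a symbol pair."""
    seq = list(text)
    rules = {}
    while len(seq) >= 2:
        counts = Counter(zip(seq, seq[1:]))
        pair, freq = counts.most_common(1)[0]
        if freq < 2:
            break  # no bigram repeats, nothing left to compress
        nonterminal = ("R", len(rules))
        rules[nonterminal] = pair
        out, i = [], 0
        while i < len(seq):  # greedy left-to-right replacement
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                out.append(nonterminal)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
    return seq, rules


def expand(seq, rules):
    """Expand a sequence of terminals and nonterminals back into a string."""
    parts = []
    for symbol in seq:
        if symbol in rules:
            parts.append(expand(rules[symbol], rules))
        else:
            parts.append(symbol)
    return "".join(parts)


word = "abaababaabaab"  # illustrative input string
start, rules = repair(word)
print(start, rules)
print(expand(start, rules) == word)
```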
... For this study, which concerns mathematics in upper secondary school, we start from the hypothesis (Laval, 2018) that implementing algorithms in specific domains and coding them in digital environments in order to test them will allow students to acquire algorithmic thinking (Knuth, 1968, 2011), that of computer science, as a complement to mathematical thinking. ...
... • Mother RNG, available in Marsaglia's website (MOT, Marsaglia, 1994); • Multiple with carry RNG (MWC, Marsaglia, 1994); • Combo RNG (COM, Marsaglia, 1994); • Lehmer RNG (LEH, Payne et al., 1969); • Fractional Brownian motion (fBm) and fractional Gaussian noise (fGn); refer to Bardet et al. (2003); • Coloured noise with power spectrum $f^{-k}$ with $k \geq 0$ (Larrondo, 2012); • Linear congruential generator (LCG, Knuth, 1997). ...
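The last generator in the list above, the linear congruential generator, follows the recurrence $x_{k+1} = (a x_k + c) \bmod m$. A minimal sketch is given below; the constants are a commonly quoted 32-bit parameter set chosen purely for illustration, and the excerpt does not say which parameters the cited article used.

```python
from itertools import islice


def lcg(seed, a=1664525, c=1013904223, m=2**32):
    """Yield successive states of x -> (a * x + c) mod m, starting after seed."""
    state = seed % m
    while True:
        state = (a * state + c) % m
        yield state


def lcg_uniform(seed, **params):
    """Scale LCG states to floats in [0, 1)."""
    m = params.get("m", 2**32)
    for state in lcg(seed, **params):
        yield state / m


print(list(islice(lcg(0), 3)))
print(list(islice(lcg_uniform(0), 3)))
```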
Article
This article serves two purposes. Firstly, it surveys the Bandt and Pompe methodology for the statistical community, stressing topics that are open for research. Secondly, it contributes towards a better understanding of the statistical properties of that approach for time series analysis. The Bandt and Pompe methodology consists of computing information theory descriptors from the histogram of ordinal patterns. Such descriptors lie in a 2D manifold: the entropy–complexity plane. This article provides the first proposal of a test in the entropy–complexity plane for the white noise hypothesis. Our test is based on true white noise sequences obtained from physical devices. The proposed methodology provides consistent results: It assesses sequences of true random samples as random (adequate test size), rejects correlated and contaminated sequences (sound test power) and captures the randomness of generators previously analysed in the literature.
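The first step of the Bandt and Pompe methodology, building the histogram of ordinal patterns, can be sketched as below, together with the normalized Shannon entropy of that histogram. The embedding dimension, the tie-breaking rule (ties resolved by position), and the function names are assumptions; the statistical-complexity coordinate of the entropy–complexity plane is not computed here.

```python
import math
from collections import Counter


def ordinal_patterns(series, d):
    """Histogram of ordinal patterns of length d over all sliding windows."""
    counts = Counter()
    for start in range(len(series) - d + 1):
        window = series[start:start + d]
        # The pattern is the order of positions that sorts the window.
        pattern = tuple(sorted(range(d), key=lambda i: window[i]))
        counts[pattern] += 1
    return counts


def permutation_entropy(series, d):
    """Shannon entropy of the pattern histogram, normalized by log(d!)."""
    counts = ordinal_patterns(series, d)
    total = sum(counts.values())
    entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
    return entropy / math.log(math.factorial(d))


print(ordinal_patterns([4, 7, 9, 10, 6, 11, 3], 3))
print(permutation_entropy([4, 7, 9, 10, 6, 11, 3], 3))
```

A strictly monotone series produces a single pattern and therefore zero entropy, while white noise is expected to spread its mass nearly evenly over all $d!$ patterns.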
... In combinatorics, lattice paths are widely studied. They have many applications in various domains such as computer science, biology and physics [21], and they have very tight links with other combinatorial objects such as directed animals, pattern avoiding permutations, bargraphs, RNA structures and so on [4,11,21]. A classical problem in combinatorics is the enumeration of these paths with respect to their length and other statistics [1,2,3,7,14,15,17,18,19]. ...
Preprint
We introduce and study the new combinatorial class of Dyck paths with air pockets. We exhibit a bijection with the peakless Motzkin paths which transports several pattern statistics and give bivariate generating functions for the distribution of patterns as peaks, returns and pyramids. Then, we deduce the popularities of these patterns and point out a link between the popularity of pyramids and a special kind of closed smooth self-overlapping curves, a subset of Fibonacci meanders. A similar study is conducted for the subclass of non-decreasing Dyck paths with air pockets.
... Let $G$ be the $\lambda$-defect of $H_D$ and observe that $|E(G)| = \lambda\binom{m}{2} - d - e$ using the definition of $n_{k+1}$. Now $\gcd(F) = 1$, $|E(G)| = b|E(F)|$ by (14), and $|N^{\lambda}_{G}(x)| \geq m - 1 - k = m - O(1)$ for each $x \in V$. Thus by Lemma 3.4 there is a decomposition of $G$ into $b$ copies of $F$ and hence a decomposition of $G$ into $b$ copies of $K_h$, $b$ copies of $K_k$ and $b$ copies of $K_{k+1}$. ...
Preprint
For positive integers $s$, $t$, $m$ and $n$, the Zarankiewicz number $Z_{s,t}(m,n)$ is defined to be the maximum number of edges in a bipartite graph with parts of sizes $m$ and $n$ that has no complete bipartite subgraph containing $s$ vertices in the part of size $m$ and $t$ vertices in the part of size $n$. A simple argument shows that, for each $t \geq 2$, $Z_{2,t}(m,n)=(t-1)\binom{m}{2}+n$ when $n \geq (t-1)\binom{m}{2}$. Here, for large $m$, we determine the exact value of $Z_{2,t}(m,n)$ in almost all of the remaining cases where $n=\Theta(tm^2)$. We establish a new family of upper bounds on $Z_{2,t}(m,n)$ which complement a family already obtained by Roman. We then prove that the floor of the best of these bounds is almost always achieved. We also show that there are cases in which this floor cannot be achieved and others in which determining whether it is achieved is likely a very hard problem. Our results are proved by viewing the problem through the lens of linear hypergraphs and our constructions make use of existing results on edge decompositions of dense graphs.
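The closed form stated in the abstract for the easy range $n \geq (t-1)\binom{m}{2}$ is simple to evaluate. The sketch below only covers that range and raises an error elsewhere, since the remaining cases are exactly what the paper studies; the function name and argument order are assumptions.

```python
from math import comb


def zarankiewicz_2t_large_n(t, m, n):
    """Z_{2,t}(m, n) = (t - 1) * C(m, 2) + n, valid when n >= (t - 1) * C(m, 2)."""
    if t < 2:
        raise ValueError("the formula is stated for t >= 2")
    threshold = (t - 1) * comb(m, 2)
    if n < threshold:
        raise ValueError("n is below (t - 1) * C(m, 2); the closed form does not apply")
    return threshold + n


print(zarankiewicz_2t_large_n(2, 4, 6))
print(zarankiewicz_2t_large_n(3, 5, 25))
```

One way to read the formula: $(t-1)\binom{m}{2}$ vertices on the size-$n$ side can each have two neighbours, with each pair on the size-$m$ side used at most $t-1$ times, and every remaining vertex contributes a single edge.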
... One still speaks of the $B_1(10)$ law, but it is then unclear whether this refers to frequencies of occurrence or to probabilities. One example of such cases is that of physical constants (Knuth (1973) or Burke and Kincanon (1991)), whose first significant digits would fit the $B_1(10)$ law fairly well according to these authors. In these articles, the size of the data sets is not sufficient to determine whether the deviations between the frequencies (or probabilities) of occurrence and the $B_1(10)$ law are real or not. ...
... Hierarchies are represented as directed acyclic graphs where, in general, nodes can have any (finite) number of parents. Since the approach presented in this work assumes that every node has at most one parent, a topological traversal algorithm for directed graphs (see, e.g., Knuth 1997) is used to transform H into a tree, when required. In this way, the resulting model can take as input any hierarchy. ...
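A topological traversal of a hierarchy given as a directed acyclic graph can be sketched with Kahn's algorithm. The dictionary encoding (each node mapped to its list of parents), the function names, and especially the rule used in `to_tree` (keep the first listed parent) are assumptions for illustration; the excerpt does not say how the cited work selects the single parent.

```python
from collections import deque


def topological_order(parents):
    """Order nodes so that every parent appears before its children.

    parents maps each node to a list of its parents; every parent must be a key.
    """
    children = {node: [] for node in parents}
    pending = {node: len(ps) for node, ps in parents.items()}
    for node, ps in parents.items():
        for p in ps:
            children[p].append(node)
    queue = deque(node for node, count in pending.items() if count == 0)
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for child in children[node]:
            pending[child] -= 1
            if pending[child] == 0:
                queue.append(child)
    if len(order) != len(parents):
        raise ValueError("the hierarchy contains a cycle")
    return order


def to_tree(parents):
    """Keep at most one parent per node (here: the first listed, an arbitrary rule)."""
    return {node: (parents[node][0] if parents[node] else None)
            for node in topological_order(parents)}


hierarchy = {"root": [], "a": ["root"], "b": ["root"], "c": ["a", "b"]}
print(topological_order(hierarchy))
print(to_tree(hierarchy))
```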
Article
Node classification is the task of inferring or predicting missing node attributes from information available for other nodes in a network. This paper presents a general prediction model for hierarchical multi-label classification, where the attributes to be inferred can be specified as a strict poset. It is based on a top-down classification approach that addresses hierarchical multi-label classification with supervised learning by building a local classifier per class. The proposed model is showcased with a case study on the prediction of gene functions for Oryza sativa Japonica, a variety of rice. It is compared to the Hierarchical Binomial-Neighborhood, a probabilistic model, by evaluating both approaches in terms of prediction performance and computational cost. The results in this work support the working hypothesis that the proposed model can achieve good levels of prediction efficiency, while scaling up in relation to the state of the art.
... The Fibonacci sequence $(F_n)_{n \geq 0}$ is defined by $F_0 = 0$, $F_1 = 1$, and $F_n = F_{n-1} + F_{n-2}$ for $n \geq 2$; it is sequence A000045 in the On-Line Encyclopedia of Integer Sequences (OEIS) [11]. Fibonacci numbers have been extensively studied [5,6]. Numerous fascinating properties are known. ...
Article
The arithmetic mean of the first n Fibonacci numbers is not an integer for all n. However, for some values of n, it is. In this paper we consider the sequence of integers n for which the average of the first n Fibonacci numbers is an integer. We prove some interesting properties and present two related conjectures.
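The sequence studied in this abstract can be explored with a short search. The sketch assumes that "the first $n$ Fibonacci numbers" means $F_1, \dots, F_n$ (starting from $F_1 = 1$); the paper may use a different indexing convention, and the function name is hypothetical.

```python
def integer_average_indices(limit):
    """Return all n <= limit for which (F_1 + ... + F_n) / n is an integer."""
    hits = []
    current, following = 1, 1  # F_1 and F_2
    total = 0
    for n in range(1, limit + 1):
        total += current  # total is now F_1 + ... + F_n
        if total % n == 0:
            hits.append(n)
        current, following = following, current + following
    return hits


print(integer_average_indices(200))
```

Under this convention the running sum equals $F_{n+2} - 1$, so the test is whether $n$ divides $F_{n+2} - 1$; for example $n = 24$ qualifies because $F_{26} - 1 = 121392 = 24 \times 5058$.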
... Even uniformly-at-random sampling procedures have sketching-like algorithms. Rather than load all items in just to randomly sample, one can stream them in and occasionally choose a random update in an online fashion as done by the famous reservoir sampling algorithm [Knu14]. Sketching is also useful for a variety of problems in numerical linear algebra [Lib13;Woo+14] including sketches for multiplying together very large matrices. ...
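The reservoir sampling idea described in the excerpt can be sketched as follows, using the classic one-pass scheme often called Algorithm R. The function name and the use of an injectable `random.Random` instance are assumptions made for the example.

```python
import random


def reservoir_sample(stream, k, rng=None):
    """Return k items drawn uniformly at random from a stream of unknown length.

    Only k items are held in memory; item number i (0-based, i >= k) replaces a
    random reservoir slot with probability k / (i + 1).
    """
    rng = rng or random.Random()
    reservoir = []
    for index, item in enumerate(stream):
        if index < k:
            reservoir.append(item)       # fill the reservoir first
        else:
            j = rng.randint(0, index)    # inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir


sample = reservoir_sample(iter(range(1000)), 5, rng=random.Random(42))
print(sample)
```

If the stream holds fewer than `k` items, the function simply returns all of them.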
Preprint
In this manuscript, we offer a gentle review of submodularity and supermodularity and their properties. We offer a plethora of submodular definitions; a full description of a number of example submodular functions and their generalizations; example discrete constraints; a discussion of basic algorithms for maximization, minimization, and other operations; a brief overview of continuous submodular extensions; and some historical applications. We then turn to how submodularity is useful in machine learning and artificial intelligence. This includes summarization, and we offer a complete account of the differences between and commonalities amongst sketching, coresets, extractive and abstractive summarization in NLP, data distillation and condensation, and data subset selection and feature selection. We discuss a variety of ways to produce a submodular function useful for machine learning, including heuristic hand-crafting, learning or approximately learning a submodular function or aspects thereof, and some advantages of the use of a submodular function as a coreset producer. We discuss submodular combinatorial information functions, and how submodularity is useful for clustering, data partitioning, parallel machine learning, active and semi-supervised learning, probabilistic modeling, and structured norms and loss functions.
... Another such measure is total displacement, defined by Knuth [8] as $\operatorname{td}(w) = \sum_{i=1}^{n} |w(i) - i|$ and first studied by Diaconis and Graham [6] under the name Spearman's disarray. Diaconis and Graham showed that $\ell(w) + \ell_T(w) \leq \operatorname{td}(w)$ for all permutations $w$ and asked for a characterization of those permutations for which equality holds. ...
Preprint
Diaconis and Graham studied a measure of distance from the identity in the symmetric group called total displacement and showed that it is bounded below by the sum of length and reflection length. They asked for a characterization of the permutations where this bound is an equality; we call these the shallow permutations. Cornwell and McNew recently interpreted the cycle diagram of a permutation as a knot diagram and studied the set of permutations for which the corresponding link is an unlink. We show the shallow permutations are precisely the unlinked permutations. As Cornwell and McNew give a generating function counting unlinked permutations, this gives a generating function counting shallow permutations.
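The three statistics in the Diaconis and Graham bound are easy to compute for small permutations, which makes the shallow permutations easy to list by brute force. The sketch assumes permutations are given in one-line notation as tuples of the values $1, \dots, n$, takes length to be the number of inversions and reflection length to be $n$ minus the number of cycles; the function names are hypothetical.

```python
from itertools import permutations


def total_displacement(w):
    """Sum of |w(i) - i| over positions i = 1..n."""
    return sum(abs(value - i) for i, value in enumerate(w, start=1))


def length(w):
    """Coxeter length: the number of inversions of w."""
    n = len(w)
    return sum(1 for i in range(n) for j in range(i + 1, n) if w[i] > w[j])


def reflection_length(w):
    """n minus the number of cycles of w."""
    n = len(w)
    seen = [False] * n
    cycles = 0
    for start in range(n):
        if not seen[start]:
            cycles += 1
            position = start
            while not seen[position]:
                seen[position] = True
                position = w[position] - 1
    return n - cycles


def is_shallow(w):
    """True when length + reflection length equals total displacement."""
    return length(w) + reflection_length(w) == total_displacement(w)


shallow = [w for w in permutations(range(1, 5)) if is_shallow(w)]
print(len(shallow), "of 24 permutations of size 4 are shallow")
```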
... -Donald E. Knuth [145] ACKNOWLEDGEMENTS These three years have flown by, with memorable moments that will stay with me, notably thanks to this manuscript. This is my opportunity to thank all those who, in one way or another, contributed to this result. ...
Thesis
Despite the rapid emergence of automated systems and the development of affective computing, very few studies consider mental workload in the design of VR training scenarios. This thesis aims to contribute to the development of adaptive VR systems based on users' mental workload. We propose three research axes: induction, recognition, and exploitation of mental workload in VR, as well as a definition of "Affective and Cognitive Virtual Reality". First, we will study the impact of wearing a VR headset on users' mental effort. In addition, the potential influence of walking and of the accommodation effect in VR will be analyzed. Then, we will propose a methodological approach for introducing mental workload assessment into the design of VR training scenarios. In particular, this methodology will make it possible to modulate users' mental workload level over time. User studies will be conducted in a VR flight simulator to evaluate this approach. Finally, we will propose an all-in-one solution for estimating users' mental workload in real time using sensors integrated into VR headsets. This configuration will be compared with more widely available commercial systems in terms of prediction performance. The influence of measurement types, sensors, and signal normalization methods will also be analyzed.
... For an interval vertex v ∈ T , the subtree rooted at v is the tree formed by all vertices u with u ≤ v, and this subtree then has v as its root. Using subtrees, we can represent trees using the nested lists notation from Section 2.3.2 of [13], where each set of parentheses represents a subtree. Unless otherwise stated, we will index leaves with [n] = {1, 2, . . . ...
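The nested-parentheses idea, where each pair of parentheses stands for one subtree, can be illustrated with nested Python tuples. This is only a sketch: representing internal vertices as tuples and leaves as integers, the space-separated output format, and the function names are all assumptions and need not match the cited paper's exact notation.

```python
def nested_notation(tree):
    """Render a tree as nested parentheses; each tuple is the subtree of one vertex."""
    if isinstance(tree, int):
        return str(tree)  # a leaf, identified by its index
    return "(" + " ".join(nested_notation(child) for child in tree) + ")"


def leaves(tree):
    """List the leaf indices of a tree in left-to-right order."""
    if isinstance(tree, int):
        return [tree]
    result = []
    for child in tree:
        result.extend(leaves(child))
    return result


# A rooted binary tree with leaves indexed by [4] = {1, 2, 3, 4}.
T = ((1, 2), (3, 4))
print(nested_notation(T))
print(leaves(T[0]))  # leaves of the subtree rooted at the left child
```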
Preprint
Tanglegrams are formed by taking two rooted binary trees $T$ and $S$ with the same number of leaves and uniquely matching each leaf in $T$ with a leaf in $S$. They are usually represented using layouts, which embed the trees and the matching of the leaves into the plane as in Figure 1. Given the numerous ways to construct a layout, one problem of interest is the Tanglegram Layout Problem, which is to efficiently find a layout that minimizes the number of crossings. This parallels a similar problem involving drawings of graphs, where a common approach is to insert edges into a planar subgraph. In this paper, we will explore inserting edges into a planar tanglegram. Previous results on planar tanglegrams include a Kuratowski Theorem, enumeration, and an algorithm for drawing a planar layout. We start by building on these results and characterizing all planar layouts of a planar tanglegram. We then apply this characterization to construct a quadratic-time algorithm that inserts a single edge optimally. Finally, we generalize some results to multiple edge insertion.
Chapter
The fundamental notion of an algorithm is presented here, the focus being on its traditional, “symbol-based” conception. A carefully selected set of formal models of an algorithm and universal computer is then presented in a non-traditional and novel manner. These and other formal models are the theoretical foundation for the discipline of computer science, which was developed by mathematical logicians during the 1930s, before the advent of the electronic, digital computer in the mid-1940s. During the early days of the ensuing computer revolution, numerical computation was paramount, and its practical foundation was the finite-precision, floating-point model. This model was developed by numerical analysts, who played a leading role in the creation of the computer science discipline, and it is described in detail. The basic concept of a symbol-based algorithm led to the much broader conception of algorithmic systems for computation, for example, neural, quantum, and natural, as is briefly itemized in the concluding section. The metaphorical phrase, “under the rubric of algorithm,” refers to the overarching umbrella of modern computer science.
Chapter
We study the following combinatorial problem. Given a set of n y-monotone curves, which we call wires, a tangle determines the order of the wires on a number of horizontal layers such that any two consecutive layers differ only in swaps of neighboring wires. Given a multiset L of swaps (that is, unordered pairs of wires) and an initial order of the wires, a tangle realizes L if each pair of wires changes its order exactly as many times as specified by L. Deciding whether a given multiset of swaps admits a realizing tangle is known to be NP-hard [Yamanaka et al., CCCG 2018]. We prove that this problem remains NP-hard if every pair of wires swaps only a constant number of times. On the positive side, we improve the runtime of a previous exponential-time algorithm. We also show that the problem is in NP and fixed-parameter tractable with respect to the number of wires. Keywords: Tangle, NP-hard, Exponential-time algorithm, FPT
Chapter
Algorithms for generating graphs that belong to a particular class are useful for providing test cases and counter-examples to refute conjectures about these graphs. This is true, in particular, for weakly chordal graphs. A graph G is weakly chordal if neither G nor its complement contains a chordless cycle of size greater than four. In an earlier paper, we proposed a separator-based scheme for generating weakly chordal graphs. In this paper, we propose a scheme to solve this open problem: generate a weakly chordal graph from a randomly generated input graph, G, adding as few edges as possible [2], unless the graph is already weakly chordal.
Article
The dichotomy created by the advent of computers and brought up by the title of this column was a major quandary in the early days of the computer revolution, causing major controversy in both the academic and commercial communities involved in the development of modern computer architectures. Even though the controversy was eventually decided (in favor of binary representation; all commercially available computers use a binary internal architecture), echoes of that controversy still affect computer usage today by creating errors when data is transferred between computers, especially in the chemometric world. A close examination of the consequences reveals a previously unexpected error source.
Conference Paper
The algorithms for generating all subsets of a given set, like many other generating algorithms, are of two main types: for generating in lexicographic order or in Gray code order. Many of them use binary representations of integers (i.e., binary vectors) as characteristic vectors of the subsets. Here we consider the set U_n of n elements that are ordered according to a given total order relation. We propose the ordering of characteristic vectors and their serial numbers (i.e., their corresponding integers) such that the characteristic vectors define a lexicographic ordering of subsets of U_n. To get this, we define and study the properties of three sequences: (1) p_n -- of all subsets in lexicographic order, (2) c_n -- of characteristic vectors corresponding to p_n, and (3) s_n -- of integers representing the vectors of c_n. We then propose a simple, straightforward, and fast algorithm that, for a given n, $1\leq n\leq 64$, generates the sequence s_n. This algorithm only performs integer additions. Its time and space complexity is of the type $\Theta(2^n)$ -- exponential with respect to the size of the input n but linear with respect to the size of the output 2^n. The algorithm was used in the creation of sequence A356120 in the OEIS (http://oeis.org). Finally, the general case where n is a positive natural number and the generation of the subsets themselves, i.e., the sequence p_n, are discussed.
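As a small illustration of the three sequences described above, the following Python sketch enumerates the subsets of {1, ..., n} in lexicographic order and maps each one to an integer through its characteristic vector. It is a plain recursive enumeration, not the addition-only algorithm of the paper. The function names and the convention that element 1 corresponds to the most significant bit are assumptions made for this example.

```python
def lex_subsets(n):
    """Yield the subsets of {1, ..., n} in lexicographic order as tuples."""
    def extend(prefix, start):
        # A prefix always precedes every one of its extensions.
        yield tuple(prefix)
        for x in range(start, n + 1):
            prefix.append(x)
            yield from extend(prefix, x + 1)
            prefix.pop()
    yield from extend([], 1)


def serial_number(subset, n):
    """Integer whose n-bit binary form is the characteristic vector.

    Assumption for this sketch: element 1 is the most significant bit.
    """
    return sum(1 << (n - i) for i in subset)


if __name__ == "__main__":
    n = 3
    for s in lex_subsets(n):
        print(s, format(serial_number(s, n), f"0{n}b"), serial_number(s, n))
```

For n = 3 the subsets appear in the order {}, {1}, {1,2}, {1,2,3}, {1,3}, {2}, {2,3}, {3}, and under the stated bit convention the corresponding integers are 0, 4, 6, 7, 5, 2, 3, 1.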
Article
Public sector organizations at all levels of government increasingly rely on Big Data Algorithmic Systems (BDAS) to support decision-making along the entire policy cycle. But while our knowledge on the use of big data continues to grow for government agencies implementing and delivering public services, empirical research on applications for anticipatory policy design is still in its infancy. Based on the concept of policy analytical capacity (PAC), this case study examines the application of BDAS for early crisis detection within the German Federal Government—that is, the German Federal Foreign Office (FFO) and the Federal Ministry of Defence (FMoD). It uses the nested model of PAC to reflect on systemic, organizational, and individual capacity-building from a neoinstitutional perspective and allow for the consideration of embedded institutional contexts. Results from semi-structured interviews indicate that governments seeking to exploit BDAS in policymaking depend on their institutional environment (e.g., through research and data governance infrastructure). However, specific capacity-building strategies may differ according to the departments' institutional framework, with the FMoD relying heavily on subordinate agencies and the FFO creating network-like structures with external researchers. Government capacity-building at the individual and organizational level is similarly affected by long-established institutional structures, roles, and practices within the organization and beyond, making it important to analyze these three levels simultaneously instead of separately.
Article
Digital circuit design technologies based on Quantum-Dot Cellular Automata (QCA) have many advantages over CMOS, such as higher intrinsic switching speed up to Terahertz, lower power consumption, smaller circuit footprint, and higher throughput due to compatibility of the inherent signal propagation scheme with pipelining. Hence, QCA is a perfect candidate to provide a circuit design framework for applications such as Artificial Intelligence (AI) accelerators, where real-time energy-efficient performance needs to be delivered at low cost. A novel QCA design approach based on optimal mix of Majority and NAND-NOR-INVERTER (NNI) gates with USE (Universal, Scalable, Efficient) clocking scheme, has been investigated in this work for latency and energy consumption improvements to fundamental building blocks in AI-accelerators, including multipliers, adders, accumulators and SRAMs. The common $4\times 4$ Vedic multiplier has been redesigned using the proposed approach, and simulated to yield 62.8% reduction in cell count, 82.2% reduction in area, and 71.2% reduction in latency. 83% reduction in cell count, 94.5% reduction in area, and 94.6% reduction in latency was simulated for the proposed 8-bit PIPO register. The proposed SRAM cell design is estimated to have similar improvement figures to those achieved by the sub-blocks, such as the D-Latch, which has been simulated to exhibit 44.4% reduction in cell count, 50% reduction in both area and latency, and 73% reduction in energy dissipation. The contributions from this work can be directly applied to low cost, high throughput, energy efficient AI-accelerators that can potentially deliver orders of magnitude better energy-delay characteristics than their CMOS counterparts, and significantly better energy-delay characteristics than state-of-the-art QCA implementations.
Conference Paper
Some well-known correspondences between sets of linearly independent rows and columns of matrices over fields carry over to matrices over non-commutative rings without nontrivial zero divisors.
Article
Assuming a widely believed hypothesis concerning the least prime in an arithmetic progression, we show that polynomials of degree less than $n$ over a finite field $\mathbb{F}_q$ with $q$ elements can be multiplied in time $O(n \log q \log(n \log q))$, uniformly in $q$. Under the same hypothesis, we show how to multiply two $n$-bit integers in time $O(n \log n)$; this algorithm is somewhat simpler than the unconditional algorithm from the companion paper [22]. Our results hold in the Turing machine model with a finite number of tapes.
Article
Among the National Institute for Standards and Technology (NIST) postquantum cryptography (PQC) standardization Round 3 finalists (announced in 2020 and anticipated to conclude in 2022–2024), SABER and Falcon are an efficient key encapsulation mechanism (KEM) and a compact signature scheme, respectively. SABER is a simple and flexible cryptographic scheme, highly suitable for thwarting potential attacks in the postquantum era. SABER can be implemented solely in hardware (HW) or on HW/software coprocessors. On the other hand, the compact key size, efficient design, and strong reliability proof in the quantum random oracle model (QROM) make Falcon a highly suitable signature algorithm for PQC. Although Falcon is crucial as a PQC signature scheme, its use of the Gaussian sampler makes it vulnerable to malicious attacks, e.g., fault attacks. This is the first work to present error detection schemes embedded efficiently in SABER as well as Falcon’s sampler architectures, which can detect both transient and permanent faults. Moreover, we implement HW design for the ModFalcon signature algorithm as well as the Gaussian sampler. These schemes are implemented on a formerly Xilinx field-programmable gate array (FPGA) family, for both SABER and Falcon variants, where we assess the error coverage and the performance. The proposed schemes incur low overhead (the area, delay, and power overheads being 22.59%, 19.77%, and 10.67%, respectively, in the worst case) while providing a high fault detection rate (99.9975% in the worst case scenario), making them suitable for high efficiency and compact HW implementations of constrained applications.
Article
Matchings between objects from two datasets, domains, or ontologies have to be computed in various application scenarios. One often used meta-approach, which we call bipartite data matching, is to leverage domain knowledge for defining costs between the objects that should be matched, and to then use the classical Hungarian algorithm to compute a minimum cost bipartite matching. In this paper, we introduce and study the problem of enumerating K dissimilar minimum cost bipartite matchings. We formalize this problem, prove that it is NP-hard, and present heuristics based on greedy dynamic programming. The presented enumeration techniques are not only interesting in themselves, but also mitigate an often overlooked shortcoming of bipartite data matching, namely, that it is sensitive w.r.t. the storage order of the input data. Extensive experiments show that our enumeration heuristics clearly outperform existing algorithms in terms of dissimilarity of the obtained matchings, that they are effective at rendering bipartite data matching approaches more robust w.r.t. random storage order, and that they significantly improve the upper bounds of state-of-the-art algorithms for graph edit distance computation that are based on bipartite data matching.
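The sensitivity to storage order mentioned above arises when several matchings share the minimum cost, so which one a solver returns can depend on how the rows and columns happen to be ordered. The sketch below makes such ties visible on a tiny instance. It uses brute-force enumeration over all permutations in place of the Hungarian algorithm, so it is only practical for very small inputs. The function name and the cost matrix are hypothetical and chosen purely for illustration.

```python
from itertools import permutations


def min_cost_matchings(cost):
    """Return (minimum cost, list of all optimal matchings) for a square cost matrix.

    A matching is a tuple perm where row i is matched to column perm[i].
    Brute force over all n! permutations; a stand-in for the Hungarian
    algorithm that is only meant for tiny illustrative instances.
    """
    n = len(cost)
    best, optimal = None, []
    for perm in permutations(range(n)):
        total = sum(cost[i][perm[i]] for i in range(n))
        if best is None or total < best:
            best, optimal = total, [perm]
        elif total == best:
            optimal.append(perm)
    return best, optimal


if __name__ == "__main__":
    # Hypothetical costs: rows 0 and 1 are interchangeable, so two optima exist.
    cost = [[1, 1, 5],
            [1, 1, 5],
            [5, 5, 1]]
    print(min_cost_matchings(cost))  # (3, [(0, 1, 2), (1, 0, 2)])
```

Both returned matchings cost 3, and a solver that reports only one of them gives no indication that the other exists.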
Chapter
This chapter examines how platforms have long incentivised, and been structured by, “engagement”, that is, the maximisation of user attention and interaction metrics. It is argued that social news is guided by an institutional logic of engagement, and achieved early success precisely because the outlets which produce this genre optimised their content for social media metrics. But rather than simply being “clickbait”, social news outlets have instead often pursued engagement in multiple creative ways, and have even, at times, produced journalism with engaging qualities in the “civic” sense.
Keywords: Engagement, Metrics, Algorithms, Platforms, Clickbait, Civic journalism
Chapter
The growth of computing capacity makes it possible to obtain more and more precise simulation results. These results are often calculated in binary64 on the assumption that round-off errors are not significant. However, exascale is pushing back the known limits, and the problem of accumulating round-off errors could return and require a further increase in precision. But working with extended precision, regardless of the method used, has a significant cost in memory, computation time and energy, and would not allow full use of the performance of HPC computers. It is therefore important to measure the robustness of binary64 by anticipating future computing resources in order to ensure its durability in numerical simulations. For this purpose, numerical experiments have been performed and are presented in this article. They were performed with weak floats, which were specifically designed to conduct an empirical study of round-off errors in hydrodynamic simulations and to build an error model that extracts the part of the error in the results that is due to round-off. This model confirms that errors remain dominated by the scheme errors in our numerical experiments.
Keywords: Floating-point, Round-off error, Hydrodynamics, HPC, Exascale
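To make the notion of accumulated round-off in binary64 concrete, here is a minimal Python sketch, unrelated to the weak floats or the hydrodynamic code of the article. It adds the value 0.1 ten times with a plain loop and compares the result with `math.fsum`, which returns the correctly rounded sum. The helper name `naive_sum` is an assumption made for this example.

```python
import math


def naive_sum(values):
    """Left-to-right binary64 accumulation; each addition rounds once."""
    total = 0.0
    for v in values:
        total += v
    return total


if __name__ == "__main__":
    values = [0.1] * 10
    naive = naive_sum(values)
    reference = math.fsum(values)  # correctly rounded sum of the inputs
    print(naive == 1.0)      # False: rounding errors have accumulated
    print(reference == 1.0)  # True
    print(abs(naive - reference))
```

The discrepancy here is a single unit in the last place, but it grows with the number of operations, which is why long-running simulations raise the question the abstract addresses.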
Preprint
Combinatorial Exploration is a new domain-agnostic algorithmic framework to automatically and rigorously study the structure of combinatorial objects and derive their counting sequences and generating functions. We describe how it works and provide an open-source Python implementation. As a prerequisite, we build up a new theoretical foundation for combinatorial decomposition strategies and combinatorial specifications. We then apply Combinatorial Exploration to the domain of permutation patterns, to great effect. We rederive hundreds of results in the literature in a uniform manner and prove many new ones. These results can be found in a new public database, the Permutation Pattern Avoidance Library (PermPAL) at https://permpal.com. Finally, we give three additional proofs-of-concept, showing examples of how Combinatorial Exploration can prove results in the domains of alternating sign matrices, polyominoes, and set partitions.
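For readers unfamiliar with permutation patterns, the sketch below shows what a counting sequence looks like in that domain. It counts, by exhaustive search, the permutations of length n that avoid a given pattern. This is a brute-force check only and is not the Combinatorial Exploration framework or its Python implementation. The function names are hypothetical.

```python
from itertools import combinations, permutations


def contains(perm, pattern):
    """True if some subsequence of perm is order-isomorphic to pattern."""
    k = len(pattern)
    for idx in combinations(range(len(perm)), k):
        vals = [perm[i] for i in idx]
        # Order-isomorphic: every pair compares the same way as in the pattern.
        if all((vals[a] < vals[b]) == (pattern[a] < pattern[b])
               for a in range(k) for b in range(k)):
            return True
    return False


def count_avoiders(n, pattern):
    """Number of permutations of 1..n that avoid the pattern (O(n!) search)."""
    return sum(1 for p in permutations(range(1, n + 1))
               if not contains(p, pattern))


if __name__ == "__main__":
    print([count_avoiders(n, (1, 2, 3)) for n in range(1, 6)])
    # [1, 2, 5, 14, 42]
```

The output matches the Catalan numbers, the classical counting sequence for permutations avoiding any single pattern of length three.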
Article
Huang and Wong (Acta Inform 21(1):113–123, 1984) proposed a polynomial-time dynamic-programming algorithm for computing optimal generalized binary split trees. We show that their algorithm is incorrect. Thus, it remains open whether such trees can be computed in polynomial time. Spuler (Optimal search trees using two-way key comparisons, PhD thesis, 1994) proposed modifying Huang and Wong’s algorithm to obtain an algorithm for a different problem: computing optimal two-way comparison search trees. We show that the dynamic program underlying Spuler’s algorithm is not valid, in that it does not satisfy the necessary optimal-substructure property and its proposed recurrence relation is incorrect. It remains unknown whether the algorithm is guaranteed to compute a correct overall solution.
Article
We give polynomial time algorithms for the seminal results of Kahn, who showed that the Goldberg–Seymour and list‐coloring conjectures for (list‐)edge coloring multigraphs hold asymptotically. Kahn's arguments are based on the probabilistic method and are non‐constructive. Our key insight is that we can combine sophisticated techniques due to Achlioptas, Iliopoulos, and Kolmogorov for the analysis of local search algorithms with correlation decay properties of the probability spaces on matchings used by Kahn in order to construct efficient edge‐coloring algorithms.
Thesis
The purpose of this project is to improve the current bipartite matching with one-sided preferences model used for the student-project allocation of the final year mathematics students in the University of Glasgow. The main improvements are non-bipartite extensions of the model that take into consideration both project capacities and lecturers' workloads, and greater flexibility to introduce additional constraints and features of optimal solutions. The new SPA model will be built on the minimum cost network flow model, whose solution relies upon graph algorithms. First, it will be proved that the minimum cost flow algorithm can replicate the generous algorithm for the bipartite matching with one-sided preferences problem. Second, it will be shown how the model can be extended to include lecturers and their capacities. Third, the advantages of extensions allowing alternative supervisors and lower bounds for the number of students supervised by each lecturer will be discussed. Finally, to illustrate the flexibility of the model, some controversial suggestions for differentiating among students will be mentioned. Having discussed all these improvements, it will be concluded that a network flow model allows greater flexibility and hence might be more useful in practice. However, the choice and the complexity of the specific graph algorithms are beyond the scope of this project and need to be analyzed both theoretically and empirically.
Article
The performance of traditional video-based traffic surveillance systems is susceptible to illumination variation and perspective distortion. This has been a significant motivation in recent years for research into Light Detection and Ranging (LiDAR)-based traffic surveillance systems, as LiDAR is insensitive to both factors. The first step in LiDAR data processing involves effective extraction of moving foreground objects from a referenced background. However, existing methods only detect a static background based on LiDAR point density or relative distance. In this research, we develop a novel dense background representation model (DBRM) for stationary roadside LiDAR sensors to detect both static and dynamic backgrounds, for freeway traffic surveillance purposes. Background objects tend to be stationary in space and time. DBRM utilizes this property to detect two types of background: static and dynamic. While the static background is represented by fixed structures, the dynamic background, which may be characterized by quasi-static objects such as tree foliage, is modeled by mixtures of Gaussian probability distributions. Experiments were carried out in two different scenarios to compare the proposed model with two other state-of-the-art models. The results demonstrate the effectiveness, robustness, and detail-preserving advantages of the proposed model.
Article
We introduce CESRBDDs, a form of binary decision diagrams (BDDs) that can exploit reduction opportunities beyond those allowed by reduced ordered BDDs (elimination of redundant nodes), zero-suppressed BDDs (elimination of “high-zero” nodes), and recent proposals merging the two (chained or tagged BDDs). CESRBDDs also incorporate complemented edges, and thus never store both the encoding of a function and that of its complement. We prove that CESRBDDs are canonical and show how their storage requirements and computational efficiency compare very favorably against previous alternatives, both theoretically and experimentally, using an extensive set of benchmarks. Another advantage of CESRBDDs over chained or tagged BDDs is that their nodes only require one byte to store reduction and complement information.
Preprint
We prove a formula for the evaluation of expectations containing a scalar function of a Gaussian random vector multiplied by a product of the random vector components, each one raised to a non-negative integer power. Some powers could be of zeroth order, and, for expectations containing only one vector component to the first power, the formula reduces to Stein's lemma for the multivariate normal distribution. On the other hand, by setting the function inside the expectation equal to one, we easily derive Isserlis' theorem and its generalizations, regarding higher order moments of a Gaussian random vector. We provide two proofs of the formula, with the first being a rigorous proof via mathematical induction. The second is a formal, constructive derivation based on treating the expectation not as an integral, but as the consecutive actions of pseudodifferential operators defined via the moment-generating function of the Gaussian random vector.
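For orientation, the two classical special cases named above can be written out as follows. The notation is an assumption made for this note and is not taken from the preprint: $X \sim \mathcal{N}(\mu, C)$ is an $N$-dimensional Gaussian vector with mean $\mu$ and covariance matrix $C$, $g$ is a sufficiently smooth scalar function with suitably bounded growth, and $\tilde{X}_i = X_i - \mu_i$ denotes a centered component.

```latex
% Stein's lemma for the multivariate normal distribution
\mathbb{E}\big[\tilde{X}_i \, g(X)\big]
  = \sum_{j=1}^{N} C_{ij} \, \mathbb{E}\!\left[\frac{\partial g}{\partial X_j}(X)\right]

% Isserlis' theorem for four centered components (case g = 1)
\mathbb{E}\big[\tilde{X}_1 \tilde{X}_2 \tilde{X}_3 \tilde{X}_4\big]
  = C_{12} C_{34} + C_{13} C_{24} + C_{14} C_{23}
```

The second identity sums the products of covariances over the three ways of pairing four indices, which is the pattern that the higher-order generalizations extend.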