## About

71 Publications

22,967 Reads


678 Citations (since 2017)

Introduction

I am an assistant professor at Jagiellonian University (Poland), with degrees in computer science (PhD), theoretical physics (PhD), and mathematics (MSc). I am also very interested in biology-related topics. My best-known contribution is ANS coding ( https://en.wikipedia.org/wiki/Asymmetric_numeral_systems ). My webpage: http://th.if.uj.edu.pl/~dudaj/


## Publications

Publications (71)

Standard one-way quantum computers (1WQC) combine time-symmetric unitary evolution with asymmetric treatment of boundaries: state preparation allows enforcing a chosen initial state; however, for the final value, measurement chooses a random value instead. As e.g. stimulated emission-absorption are CPT analogs, and one can be used for state preparat...

Electroencephalography (EEG) signals are resultants of extremely complex brain activity. Some details of this hidden dynamics might be accessible through e.g. joint distributions $\rho_{\Delta t}$ of signals of pairs of electrodes shifted by various time delays (lag $\Delta t$). A standard approach is monitoring a single evaluation of such joint di...

Source coding has a rich and long history. However, a recent explosion of multimedia Internet applications (such as teleconferencing and video streaming, for instance) renews interest in fast compression that also squeezes out as much redundancy as possible. In 2009 Jarek Duda invented his asymmetric numeral system (ANS). Apart from having a beauti...

Real-life time series are usually nonstationary, bringing the difficult question of model adaptation. Classical approaches like GARCH assume an arbitrary type of dependence. To prevent such bias, we will focus on the recently proposed agnostic philosophy of the moving estimator: at time $t$, finding parameters optimizing e.g. $F_t=\sum_{\tau<t} (1-\eta)^{t-\...
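The moving-estimator idea above can be sketched in a few lines: for a Gaussian model, maximizing the exponentially weighted log-likelihood reduces to exponential moving averages of $x$ and $x^2$. This is a minimal sketch of my own, not code from the paper, and the function and variable names are mine:

```python
# Sketch: exponential moving maximum-likelihood estimation of Gaussian
# parameters, where at each time t we maximize the exponentially weighted
# likelihood F_t = sum_{tau<t} (1-eta)^(t-tau) log rho_theta(x_tau).
# For a Gaussian this reduces to moving averages of x and x^2.

def moving_gaussian(xs, eta=0.1):
    """Return list of (mu_t, var_t), each estimated only from values before t."""
    m1 = m2 = 0.0   # moving averages of x and x^2
    w = 0.0         # moving total weight (normalizes the start-up bias)
    out = []
    for x in xs:
        if w > 0:
            mu = m1 / w
            var = max(m2 / w - mu * mu, 1e-12)
            out.append((mu, var))
        m1 = (1 - eta) * m1 + eta * x
        m2 = (1 - eta) * m2 + eta * x * x
        w = (1 - eta) * w + eta
    return out

# nonstationary example: the estimate tracks the level change
est = moving_gaussian([1.0] * 200 + [5.0] * 200, eta=0.1)
```

The adaptation rate `eta` plays the role of $\eta$ above: larger values forget the past faster, trading variance of the estimate for responsiveness.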

Various in silico approaches to predicting the activity and properties of chemical compounds nowadays constitute the basis of computer-aided drug design. While there is a general focus on the prediction of values, mathematically more appropriate is the prognosis of probability distributions, which offers additional possibilities, such as the evaluation o...

Large variability between cell lines brings a difficult optimization problem of drug selection for cancer therapy. Standard approaches use prediction of a value for this purpose, corresponding e.g. to the expected value of the distribution. This article shows the superiority of working on and predicting entire probability distributions - proposing basic too...

Compression, also known as entropy coding, has a rich and long history. However, a recent explosion of multimedia Internet applications (such as teleconferencing and video streaming, for instance) renews interest in fast compression that also squeezes out as much redundancy as possible. In 2009 Jarek Duda invented his asymmetric numeral system (ANS...

While there is a general focus on predictions of values, mathematically more appropriate is the prediction of probability distributions, with additional possibilities like prediction of uncertainty, higher moments and quantiles. For the purposes of the computer-aided drug design field, this article applies the Hierarchical Correlation Reconstruction approac...

The bulk of Internet interactions is highly redundant and also security sensitive. To reduce communication bandwidth and provide a desired level of security, a data stream is first compressed to squeeze out redundant bits and then encrypted using authenticated encryption. This generic solution is very flexible and works well for any pair of (compre...

While there is a general focus on prediction of values, real data often only allows predicting conditional probability distributions, with capabilities bounded by the conditional entropy $H(Y|X)$. If additionally estimating uncertainty, we can treat a predicted value as the center of a Gaussian or Laplace distribution - an idealization which can be far from...
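As a rough illustration of predicting distributions rather than single values, here is a minimal sketch of the density-estimation step underlying hierarchical correlation reconstruction, under the simplifying assumption that the variable was already normalized to near-uniform on [0,1]. The basis truncation and all names are my own, not from the article:

```python
# Sketch of the HCR density step: model the density of a normalized variable as
# rho(x) = 1 + sum_j a_j f_j(x), with f_j orthonormal (rescaled Legendre)
# polynomials on [0,1]; the MSE-optimal coefficients are just sample averages.
import math

def f1(x):  # first orthonormal polynomial on [0,1]
    return math.sqrt(3) * (2 * x - 1)

def f2(x):  # second orthonormal polynomial on [0,1]
    return math.sqrt(5) * (6 * x * x - 6 * x + 1)

def hcr_density(sample):
    n = len(sample)
    a1 = sum(f1(x) for x in sample) / n   # coefficient = mean of basis function
    a2 = sum(f2(x) for x in sample) / n
    return lambda x: 1 + a1 * f1(x) + a2 * f2(x)

# a near-uniform sample gives coefficients near zero, hence density near 1
rho = hcr_density([i / 1000 for i in range(1000)])
```

For conditional prediction, the same coefficients become functions of the predictor variables, which is where the "hierarchical correlation" part enters.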

SVD (singular value decomposition) is one of the basic tools of machine learning, allowing one to optimize a basis for a given matrix. However, sometimes we have a set of matrices $\{A_k\}_k$ instead, and would like to optimize a single common basis for them: find orthogonal matrices $U$, $V$ such that the set of matrices $\{U^T A_k V\}$ is somehow simpler....

The rapid growth of genetic databases means huge savings from improvements in their data compression, which requires better inexpensive statistical models. This article proposes automated optimizations of Markov-like models, especially context binning and model clustering. The former allows merging similar contexts to reduce model size, e.g. allowing...

Unwanted data encryption, such as ransomware attacks, continues to be a significant cybersecurity threat. Ransomware is a preferred weapon of cybercriminals who target small to large organizations' computer systems and data centres. It is malicious software that infects a victim's computer system and encrypts all its valuable data files. The victim...


While semiconductor electronics is at the heart of the modern world, now using 5 nm or smaller processes of single atoms, models of actual electron currents at these scales seem to be missing - models which could help with a more conscious design of future electronics. This article proposes such a practical methodology, allowing to model approximated electron...

Variable γ-ray emission from blazars, one of the most powerful classes of astronomical sources featuring relativistic jets, is a widely discussed topic. In this work, we present the results of a variability study of a sample of 20 blazars using γ-ray (0.1–300 GeV) observations from Fermi/LAT telescope. Using maximum likelihood estimation (MLE) meth...

A uniaxial nematic liquid crystal of ellipsoid-like molecules can be represented using a director field n(x) of unitary vectors. It has topological charge quantization: integrating field curvature over a closed surface S, we get the 3D winding number of S → S^2, which has to be an integer - getting a Gauss law with built-in charge quantization. In recent...

Compression is widely used in Internet applications to save communication time, bandwidth and storage. The asymmetric numeral system (ANS), recently invented by Jarek Duda, offers improved efficiency and close to optimal compression. The ANS algorithm has been deployed by major IT companies such as Facebook, Google and Apple. Compression by itself d...

Financial contagion refers to the spread of market turmoils, for example from one country or index to another country or another index. It is standardly assessed by modelling the evolution of the correlation matrix, for example of returns, usually after removing univariate dynamics with the GARCH model. However, significant events like crises visib...

Many data compressors regularly encode probability distributions for entropy coding - requiring minimal description length type of optimizations. Canonical prefix/Huffman coding usually just writes lengths of bit sequences, this way approximating probabilities with powers-of-2. Operating on more accurate probabilities usually allows for better comp...
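The powers-of-2 approximation can be made concrete with a small sketch of my own: Huffman code lengths $l_s$ correspond to implicit probabilities $2^{-l_s}$, so the mean code length exceeds the Shannon entropy by the cost of replacing the true probabilities with that approximation:

```python
# Sketch: compute Huffman code lengths and compare the mean code length with
# Shannon entropy; the gap is the cost of the powers-of-2 approximation.
import heapq, itertools, math

def huffman_lengths(probs):
    """Return the code length per symbol (counter gives deterministic ties)."""
    cnt = itertools.count()
    heap = [(p, next(cnt), [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    while len(heap) > 1:
        p1, _, s1 = heapq.heappop(heap)   # merge the two least probable groups;
        p2, _, s2 = heapq.heappop(heap)   # every member gets one bit deeper
        for i in s1 + s2:
            lengths[i] += 1
        heapq.heappush(heap, (p1 + p2, next(cnt), s1 + s2))
    return lengths

p = [0.45, 0.3, 0.15, 0.1]
L = huffman_lengths(p)
mean_len = sum(pi * li for pi, li in zip(p, L))
entropy = -sum(pi * math.log2(pi) for pi in p)
# mean_len >= entropy; the gap is what exact-probability coders like ANS recover
```

The lengths also satisfy the Kraft equality here, i.e. the implicit probabilities $2^{-l_s}$ sum to 1, which is exactly what "writing lengths" encodes.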

While we would like to predict exact values, the information available, being incomplete, is rarely sufficient - usually allowing only conditional probability distributions to be predicted. This article discusses hierarchical correlation reconstruction (HCR) methodology for such a prediction using the example of bid-ask spreads (usually unavailable...

While it is common knowledge that AC coefficients of Fourier-related transforms, like the DCT-II of JPEG image compression, are from a Laplace distribution, here the more general EPD (exponential power distribution) family $\rho\sim \exp(-|x|^{\kappa})$ was tested, leading to a maximum likelihood estimated (MLE) $\kappa\approx 0.5$ instead of the Laplace distri...

In this work, we study the probability density functions that best describe the γ-ray flux distribution of decade-long light curves of a sample of blazars. For the averaged behavior over this period, a log-stable distribution was maximum likelihood estimated; for most sources this leads to the standard log-normal distribution (α = 2); however, other sourc...


Image compression with upsampling encodes information to successively increase image resolution, for example by encoding differences in FUIF and JPEG XL. It is useful for progressive decoding and can often improve the compression ratio. However, the currently used solutions generally do not exploit context dependence for encoding of such upscaling infor...

While standard estimation assumes that all datapoints are from a probability distribution of the same fixed parameters $\theta$, we will focus on maximum likelihood (ML) adaptive estimation for nonstationary time series: separately estimating parameters $\theta_T$ for each time $T$ based on the earlier values $(x_t)_{t<T}$ using the (exponential) moving ML estimator $\theta_T = \arg...

While one-dimensional Markov processes are well understood, going to higher dimensions there are only a few analytically solved Ising-like models, in practice requiring the use of costly, uncontrollable and inaccurate Monte-Carlo methods. We discuss an analytical approach for an approximated problem exploiting the Hammersley-Clifford theorem, which allow...

While we would like to predict exact values, the available incomplete information is rarely sufficient - usually allowing only to predict conditional probability distributions. This article discusses hierarchical correlation reconstruction (HCR) methodology for such prediction on the example of usually unavailable bid-ask spreads, predicted from more accessi...


In stochastic gradient descent, especially for neural network training, first order methods currently dominate: they do not model the local distance to the minimum. This information, required for the optimal step size, is provided by second order methods; however, they have many difficulties, starting with the full Hessian having the square of the dimension number of...

Data compression often subtracts a predictor and encodes the difference (residue) assuming a Laplace distribution, for example for images, videos, audio, or numerical data. Its performance is strongly dependent on the proper choice of the width (scale parameter) of this parametric distribution, and can be improved by optimizing it based on the local situation, like con...


Enforcing distributions of latent variables in neural networks is an active subject. It is vital in all kinds of generative models, where we want to be able to interpolate between points in the latent space, or sample from it. Modern generative AutoEncoders (AE) like WAE, SWAE, CWAE add a regularizer to the standard (deterministic) AE, which allows...

Deep neural networks are usually trained with stochastic gradient descent (SGD), which optimizes $\theta\in\mathbb{R}^D$ parameters to minimize an objective function using very rough approximations of the gradient, averaging to the real gradient. To improve its convergence, some state representing the current situation is used, like momentum being l...

In situations like tax declarations or analyses of household budgets, we would like to evaluate the credibility of an exogenous variable (declared income) based on some available (endogenous) variables - we want to build a model and train it on a provided data sample to predict the (conditional) probability distribution of the exogenous variable based on values of e...

Generative AutoEncoders require a chosen probability distribution for latent variables, usually multivariate Gaussian. The original Variational AutoEncoder (VAE) only tested KL divergence for separate points - directly not ensuring their uniform coverage of the probability density. It was later improved by adding some pairwise repulsion in method...

The US yield curve has recently collapsed to its most flattened level since the subprime crisis and is close to inversion. This fact has gathered the attention of investors around the world and revived the discussion of proper modeling and forecasting of the yield curve, since changes in the interest rate structure are believed to represent investors' expectations abo...


While we are usually focused on predicting future values of time series, it is often valuable to additionally predict their entire probability distributions, for example to evaluate risk or for Monte Carlo simulations. On the example of a time series of $\approx$ 30000 Dow Jones Industrial Averages, we show an application of hierarchical correlation...

Machine learning often needs to estimate density from a multidimensional data sample, where we would also like to model correlations between coordinates. Additionally, we often have the missing data case: data points have only partial information and can miss information about some coordinates. This paper adapts rapid parametric density estimation t...

One of the basic difficulties of machine learning is handling unknown rotations of objects, for example in image recognition. A related problem is the evaluation of similarity of shapes, for example of two chemical molecules, for which the direct approach requires costly pairwise rotation alignment and comparison. Rotation invariants are useful tools for such...

Pyramid Vector Quantizer (PVQ) is a promising technique especially for multimedia data compression, already used in Opus audio codec and considered for AV1 video codec. It quantizes vectors from Euclidean unit sphere by first projecting them to $L^1$ norm unit sphere, then quantizing and encoding there. This paper shows that the used standard radia...
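The projection-and-quantization step described above can be sketched roughly as follows. This is my own simplified illustration (real PVQ implementations search for the L2-optimal pulse allocation; here a greedy largest-remainder rounding stands in for it):

```python
# Sketch of the PVQ step: scale a vector onto the L1 sphere of radius K, then
# quantize it to an integer vector y with sum |y_i| = K.

def pvq_quantize(x, K):
    s = sum(abs(v) for v in x)
    scaled = [K * v / s for v in x]   # now on the L1 sphere of radius K
    y = [int(v) for v in scaled]      # int() truncates toward zero
    # distribute the remaining pulses to the largest leftover magnitudes
    left = K - sum(abs(v) for v in y)
    order = sorted(range(len(x)),
                   key=lambda i: abs(scaled[i]) - abs(y[i]), reverse=True)
    for i in order[:left]:
        y[i] += 1 if scaled[i] >= 0 else -1
    return y

y = pvq_quantize([0.6, -0.3, 0.1], K=10)
# the result keeps the input's signs and satisfies sum |y_i| = K
```

The integer points with a fixed L1 norm K can then be enumerated and entropy coded, which is the part the paper's radial analysis concerns.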

While the P vs NP problem is mainly being attacked from the point of view of discrete mathematics, this paper proposes two reformulations into the fields of abstract algebra and of continuous global optimization - whose advanced tools might bring new perspectives and approaches to attack this problem. The first one is equivalence of satisfying the 3-...

Parametric density estimation, for example as a Gaussian distribution, is the base of the field of statistics. Machine learning requires inexpensive estimation of much more complex densities, and the basic approach is the relatively costly maximum likelihood estimation (MLE). We discuss inexpensive density estimators, for example literally f...

Data compression combined with effective encryption is a common requirement of data storage and transmission. Low cost of these operations is often a high priority in order to increase transmission speed and reduce power usage. This requirement is crucial for battery-powered devices with limited resources, such as autonomous remote sensors or impla...

Tree rotations (left and right) are basic local deformations allowing transformation between two unlabeled binary trees of the same size. Hence, there is a natural problem of practically finding such a transformation path with a low number of rotations; the optimal minimal number is called the rotation distance. Such a distance could be used for instance t...

One of the main goals of 5G wireless telecommunication technology is improving energy efficiency, especially of remote sensors, which should be able for example to transmit on average 1 bit/s for 10 years from a single AAA battery. We discuss using modulation with a nonuniform probability distribution of symbols for improving energy effici...

The method of types is one of the most popular techniques in information theory and combinatorics. However, thus far the method has been mostly applied to 1-D Markov processes, and it has not been thoroughly studied for general Markov fields. Markov fields over a finite alphabet of size m ≥ 2 can be viewed as models for multidimensional systems wit...

One of the basic tasks in bioinformatics is localizing a short subsequence $S$, read while sequencing, in a long reference sequence $R$, like the human genome. A natural rapid approach would be finding a hash value for $S$ and comparing it with a prepared database of hash values for each of the length-$|S|$ subsequences of $R$. The problem with such app...

In this paper, we propose a novel method for generating visually appealing two-dimensional (2D) barcodes that resemble meaningful images to human observers. The technology of 2D barcodes, currently dominated by quick response codes, is widely adopted in many applications, including product tracking, document management, and general marketing. Such...

Nanopore sequencers are emerging as promising new platforms for high-throughput sequencing. As with other technologies, sequencer errors pose a major challenge for their effective use. In this paper, we present a novel information theoretic analysis of the impact of insertion-deletion (indel) errors in nanopore sequencers. In particular, we conside...

Physics experiments produce enormous amounts of raw data, counted in petabytes per day. Hence, there is a large effort to reduce this amount, mainly by using some filters. The situation can be improved by additionally applying some data compression techniques: removing redundancy and optimally encoding the actual information. Preferably, both filterin...

There is a common need to search molecular databases for compounds resembling some shape, which suggests having similar biological activity while searching for new drugs. The large size of the databases requires fast methods for such initial screening, for example based on feature vectors constructed to fulfill the requirement that similar molecu...

Fountain codes like LT or Raptor codes, also known as rateless erasure codes, allow encoding a message as some number of packets, such that any large enough subset of these packets is sufficient to fully reconstruct the message. Besides the packet loss scenario, the transmitted packets are usually damaged. Hence, an additional error correction sche...

Many two-dimensional (2D) barcodes, such as quick response (QR) codes, lack user-friendly appearance. Our goal in this paper is to generate 2D barcodes that 'look' like recognizable images or logos. Standard steganographic methods hide a message (payload) in an image usually by modifying bits in a specific way using predetermined pixels of the imag...

The method of types is one of the most popular techniques in information theory and combinatorics. However, it was never thoroughly studied for Markov fields. Markov fields can be viewed as models for systems involving a large number of variables with local dependencies and interactions. These local dependencies can be captured by a shape of interac...

Imagine a basic situation: we have a source of symbols of known probability distribution and we would like to design an entropy coder transforming it into a bit sequence, which would be simple and very close to the capacity (Shannon entropy). Prefix codes are the basic method, defining "symbol -> bit sequence" sets of rules, usually found using Huff...

Modern data compression is mainly based on two approaches to entropy coding: Huffman (HC) and arithmetic/range coding (AC). The former is much faster, but approximates probabilities with powers of 2, usually leading to relatively low compression rates. The latter uses nearly exact probabilities - easily approaching the theoretical compression rate...

Barcodes like QR codes have made encoded messages enter our everyday life, which suggests attaching to them a second layer of information, directly available to a human receiver for informational or marketing purposes. We will discuss the general problem of using codes with chosen statistical constraints, for example reproducing a given grayscale...

There is a common problem of operating on hash values of elements of some database. In this paper the informational content of such a general task will be analyzed, along with how to practically approach the found lower bounds. A minimal prefix tree which distinguishes the elements turns out to require asymptotically only about 2.77544 bits per element, while...

The rapidly improving performance of modern hardware renders convolutional codes obsolete, and allows for the practical implementation of more sophisticated correction codes such as low density parity check (LDPC) and turbo codes (TC). Both are decoded by iterative algorithms, which require a disproportional computational effort for low channel noi...

Surprisingly, the natural-looking random walk leading to Brownian motion often turns out to be biased in a very subtle way: it usually refers to only approximate fulfillment of thermodynamical principles like maximizing uncertainty. Recently, a new philosophy of stochastic modeling was introduced, which, being mathematically similar to euclidean path in...

We review various features of the statistics of random paths on graphs. The relationship between path statistics and Quantum Mechanics (QM) leads to two canonical ways of defining random walk on a graph, which have different statistics and hence different entropies. Generic random walk (GRW) is in correspondence with the field-theoretical formalism...

In this paper I will try to convince the reader that quantum mechanics does not have to lead to indeterminism, but is just a natural consequence of the four-dimensional nature of our world - that for example particles shouldn't be imagined as 'moving points' in space, but as their trajectories in spacetime, like in the action-optimizing formulation of Lagrangian m...

We define a new class of random walk processes which maximize entropy. This maximal entropy random walk is equivalent to generic random walk if it takes place on a regular lattice, but it is not if the underlying lattice is irregular. In particular, we consider a lattice with weak dilution. We show that the stationary probability of finding a parti...
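The stationary distribution of this maximal entropy random walk can be computed directly from the dominant eigenvector ψ of the adjacency matrix, with stationary probability of node i proportional to ψ_i². Below is a small pure-Python sketch of my own (power iteration with an identity shift, which I add so the iteration also converges on bipartite graphs, where ±λ eigenvalue pairs would otherwise make it oscillate):

```python
# Sketch: MERW stationary distribution pi_i = psi_i^2, where psi is the
# normalized dominant eigenvector of the adjacency matrix.

def merw_stationary(adj, iters=1000):
    n = len(adj)
    psi = [1.0] * n
    for _ in range(iters):
        # apply (A + I): the identity shift breaks +/- eigenvalue symmetry
        nxt = [psi[i] + sum(adj[i][j] * psi[j] for j in range(n))
               for i in range(n)]
        norm = sum(v * v for v in nxt) ** 0.5
        psi = [v / norm for v in nxt]
    return [v * v for v in psi]

# path graph 0-1-2: unlike the generic random walk, MERW concentrates
# stationary probability on the middle node (pi = [1/4, 1/2, 1/4])
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
pi = merw_stationary(adj)
```

On a regular lattice all ψ_i are equal and this reduces to the generic random walk, matching the statement in the abstract; irregularities (like the weak dilution studied here) make the two differ.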

This paper presents a new approach to entropy coding: a family of generalizations of standard numeral systems, which are optimal for encoding sequences of equiprobable symbols, into asymmetric numeral systems - optimal for freely chosen probability distributions of symbols. It has some similarities to range coding, but instead of encoding symb...
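The ANS coding step can be illustrated in a few lines. This is my own simplified sketch using a single big-integer state, without the stream renormalization used in practical implementations; the frequency table and message are arbitrary examples:

```python
# Sketch of the ANS (rANS-style) coding step on a big-integer state x:
# appending symbol s with frequency f_s out of M uses
#   C(s, x) = (x // f_s) * M + B_s + (x mod f_s),
# and decoding inverts it exactly, popping symbols in reverse (LIFO) order.

freqs = [3, 3, 2]        # f_s; probabilities are approximated as f_s / M
M = sum(freqs)           # here M = 8
B = [0, 3, 6]            # cumulative frequencies B_s

def encode(symbols, x=1):
    for s in symbols:
        x = (x // freqs[s]) * M + B[s] + (x % freqs[s])
    return x

def decode(x, n):
    out = []
    for _ in range(n):
        slot = x % M                     # slot in [B_s, B_s + f_s) identifies s
        s = max(i for i in range(len(freqs)) if B[i] <= slot)
        out.append(s)
        x = freqs[s] * (x // M) + slot - B[s]
    return x, list(reversed(out))

msg = [0, 2, 1, 1, 0, 2, 0]
x = encode(msg)
x0, back = decode(x, len(msg))   # recovers msg and the initial state x0 = 1
```

The state x grows by roughly $\log_2(M/f_s)$ bits per symbol, i.e. close to the Shannon cost of each symbol; practical coders keep x in a fixed range by streaming out low bits instead of letting it grow.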

The presented approach calculates in polynomial time a large number of invariants for each vertex, which do not change under graph isomorphism and should fully determine the graph. For example, the numbers of closed paths of length k for a given starting vertex, which can be thought of as the diagonal terms of the k-th power of the adjacency matrix. For k=2 we would get d...

This paper introduces a large, probably complete, family of complex base systems which are 'proper': for each point of the space there is a representation which is unique for all but some zero-measure set. The condition defining this family is periodicity - we get a periodic covering of the plane by fractals in a hexagonal-type structure,...

This paper shows how to almost optimally encode information in valuations of a discrete lattice with some translationally invariant constraints. The method is based on finding a statistical description of such valuations and turning it into a statistical algorithm, which allows constructing deterministically a valuation with given statistics. Optimal...

In this paper we introduce a methodology for analysis of the convex hull of the attractors of iterated function systems (IFS) - compact fixed sets of self-similarity mappings. The method is based on a function which, for a direction, gives the width in that direction. We can write the self-similarity equation in terms of this function, solve it and...

## Questions

Questions (5)

Stimulated emission and absorption are CPT analogs; one allows for state preparation in photonic quantum computers - couldn't we use the second as a CPT analogue of state preparation?

If so, e.g. by placing such a photonic chip inside the flux of a ring laser, we could both push photons into it and simultaneously pull them through it for better control - in theory being able to attack NP problems.

Article:

The Stern-Gerlach experiment is often seen as an idealization of measurement. Using a strong magnetic field, it makes magnetic dipoles (of e.g. atoms) align in a parallel or anti-parallel way. Additionally, the gradient of the magnetic field bends trajectories depending on this choice.

Magnetic dipoles in a magnetic field undergo e.g. Larmor precession ( https://en.wikipedia.org/wiki/Larmor_precession ) due to the torque τ = μ × B, unless μ × B = 0, which means parallel or anti-parallel alignment. Precession means the magnetic dipole becomes a kind of antenna, which should radiate this additional kinetic energy. Thanks to the duality between electric and magnetic fields ( https://en.wikipedia.org/wiki/Duality_(electricity_and_magnetism) ), we can use the attached formula for a precessing electric dipole, e.g. from http://www.phys.boun.edu.tr/%7Esevgena/p202/docs/Electric%20dipole%20radiation.pdf .

Using it, I get a power like 10^−3 W, suggesting radiation of atomic-scale energies (∼10^−18 J) in e.g. femtoseconds (until μ × B = 0, parallel or anti-parallel). So can we see spin alignment in Stern-Gerlach as a result of EM radiation of the precessing magnetic dipole?

Beside photons, can we interpret other spin measurement experiments this way?

I am working on a 2nd order optimizer with a Hessian estimator from online MLE linear regression of gradients, mostly updating 4 exponential moving averages: of (theta, g, theta*g, theta^2). Attached is a simple 2D Beale function example; after 30 steps it gets ~50x smaller values than momentum.
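In the 1D case, the estimator described above can be sketched as follows. This is my own reading of the setup, not code from the linked derivation: linear regression of gradients g against parameters theta gives slope cov(theta, g)/var(theta), an online estimate of the Hessian d g / d theta:

```python
# Sketch: online Hessian estimate (1D) from exponential moving averages of
# theta, g, theta*g and theta^2; the regression slope cov(theta, g)/var(theta)
# estimates d g / d theta, i.e. the local Hessian.

def ogr_hessian_1d(pairs, eta=0.1):
    mt = mg = mtg = mt2 = 0.0
    w = 0.0   # moving total weight, to normalize the averages
    for theta, g in pairs:
        mt = (1 - eta) * mt + eta * theta
        mg = (1 - eta) * mg + eta * g
        mtg = (1 - eta) * mtg + eta * theta * g
        mt2 = (1 - eta) * mt2 + eta * theta * theta
        w = (1 - eta) * w + eta
    mt, mg, mtg, mt2 = mt / w, mg / w, mtg / w, mt2 / w
    return (mtg - mt * mg) / (mt2 - mt * mt)   # slope = cov / var

# quadratic f(theta) = 2*theta^2 has gradient g = 4*theta, i.e. Hessian 4
pairs = [(t / 10, 4 * (t / 10)) for t in range(1, 50)]
H = ogr_hessian_1d(pairs)
```

In higher dimensions the same averages become vectors/matrices and the division becomes a (regularized) solve against the covariance of theta, which is where the eigenvalue-handling question below comes in.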

I wanted to propose a discussion about various 2nd order approaches using only gradients - I am aware of: conjugate gradients, quasi-Newton (especially L-BFGS), Gauss-Newton.

Any others? Which one seems the most practical to expand for NN training?

How to transform them to high dimension? I thought about building a 2nd order model on an updated, e.g. 10-dimensional, locally interesting subspace, e.g. from online PCA of gradients, and in the remaining directions still using e.g. momentum.

How to optimally use such an estimated Hessian - especially, how to handle very low and negative eigenvalues? (abs, div & cut above)

Slides with various approaches gathered (any interesting ones missing?): https://www.dropbox.com/s/54v8cwqyp7uvddk/SGD.pdf

Derivation of this OGR Hessian estimator: https://arxiv.org/pdf/1901.11457