Automated Quality Assessment for Compressed Vibrotactile Signals Using Multi-Method Assessment Fusion

Abstract

Design and optimization of vibrotactile codecs require precise measurements of the compressed signals' perceptual quality. In this paper, we present two computational approaches for estimating vibrotactile signal quality. First, we propose a novel full-reference vibrotactile quality metric called Spectral Perceptual Quality Index (SPQI), which computes a similarity score from a perceptually weighted error measure. Second, we use the concept of Multi-Method Assessment Fusion (MAF) to predict the subjective quality. MAF uses a Support Vector Machine regressor to fuse multiple elementary metrics into a final quality score, which preserves the strengths of the individual metrics. We evaluate both proposed quality assessment methods on an extended subjective dataset, which we introduce as part of this work. For two of three tested vibrotactile codecs, the SPQI reduces the MSE to the subjective ratings by 64% and 92%, respectively, compared to the state of the art. With our MAF approach, we obtain the only currently available metric that accurately predicts the results of human user experiments for all three tested codecs. The MAF estimations reduce the average MSE to the subjective ratings over all three tested codecs by 59% compared to the best-performing elementary metric.
Andreas Noll¹,³, Markus Hofbauer¹, Evelyn Muschter²,³, Shu-Chen Li²,³, and Eckehard Steinbach¹,³
© 2022 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media,
including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers
or lists, or reuse of any copyrighted component of this work in other works.
I. INTRODUCTION
Quality assessment methods are widely used in audio and
video compression algorithm development. In the context
of haptic codec development, quality assessment methods
are equally important to guarantee high fidelity signals [1].
Concerning the tactile domain, the ongoing standardization
efforts for Tactile Internet and haptic codecs [2] have come
to fruition in vibrotactile codec proposals such as [3]–[5],
which allow for compressing vibrotactile signals in a similar
way as acoustic and visual signals. A major goal of these
efforts is to deliver tactile experiences over communication
networks with the best possible perceptual quality. This
requires accurately and effectively measuring the perceptual
quality. Thus, considering the human perceptual limitations
in the process is essential [6], [7].
*Funded by the German Research Foundation (DFG, Deutsche
Forschungsgemeinschaft) as part of Germany’s Excellence Strategy – EXC
2050/1 – Project ID 390696704 – Cluster of Excellence “Centre for
Tactile Internet with Human-in-the-Loop” (CeTI) of Technische Universität
Dresden.
¹Department of Electrical and Computer Engineering and
Munich Institute of Robotics and Machine Intelligence (MIRMI),
Technical University of Munich, 80333 Munich, Germany, e-mail:
{andreas.noll,markus.hofbauer,eckehard.steinbach}@tum.de.
²Chair of Lifespan Developmental Neuroscience, Technische Universität
Dresden, 01062 Dresden, Germany, e-mail:
{evelyn.muschter,shu-chen.li}@tu-dresden.de
³Centre for Tactile Internet with Human-in-the-Loop (CeTI), Technische
Universität Dresden, 01062 Dresden, Germany.
A. Noll and M. Hofbauer contributed equally to this work.
So far, perceptual quality in the tactile domain has mainly
been measured with human experimental procedures [8].
However, these experiments are time-consuming and require
recruiting many participants. Ultimately, the goal is to avoid
human assessment studies by using computable perceptual
quality metrics. Similar to other domains, the use of different
compression techniques results in different types of coding
artifacts. This in turn leads to different performances of
elementary quality metrics with their own strengths and
weaknesses. The current perceptual metrics are unable to
reflect the human experimental results consistently.
In this paper, we address this problem by proposing
two new methods. First, we propose a novel full-reference
vibrotactile quality metric called Spectral Perceptual Quality
Index (SPQI). SPQI uses the absolute threshold of vibro-
tactile perception to compute a perceptual error measure.
This error measure is then mapped to a score strictly be-
tween 0 and 1 to reflect a similarity rating closely related
to human user studies [8]. Second, we use Multi-Method
Assessment Fusion (MAF) to combine multiple quality met-
rics. Our MAF approach, called Vibrotactile Multi-Method
Assessment Fusion (VibroMAF), is inspired by Video Multi-
Method Assessment Fusion (VMAF) [9] and assigns weights
to each elementary metric using a Support Vector Machine
(SVM) to fuse the weighted quality measures into a final
score. This allows for better preserving the strengths of the
individual metrics. Additionally, we introduce a new dataset
with subjective quality ratings for training and evaluating the
proposed metrics.
In summary, we have the following main contributions:
• We introduce a new dataset with subjective quality
  ratings for the vibrotactile codec from [3].
• We propose a novel perceptual vibrotactile quality metric
  called Spectral Perceptual Quality Index (SPQI).
• We adopt Multi-Method Assessment Fusion (MAF) to
  predict the subjective quality by a weighted combination
  of multiple individual quality metrics.
II. RELATED WORK
In the tactile domain, the three vibrotactile codecs described
in [3]–[5], [11] represent the state of the art (Table I).
The first codec, named Vibrotactile Perceptual Codec using
DWT and SPIHT (VPC-DS), was introduced in [5]. It uses
a DWT, quantization and the SPIHT algorithm together
with a psychohaptic model, which steers the quantization
to compress signals with minimal perceptual impairments.
Therefore, impairments are mostly introduced in frequency
ranges where they are not perceivable. The second codec,
named Perceptual Vibrotactile Codec with Sparse Linear
Prediction (PVC-SLP) [4], employs linear prediction with
a sparsity constraint to decompose the input signal into
coefficients and a residual. These coefficients and the residual
are then quantized independently. A so-called acceleration
sensitivity function, derived from the absolute threshold of
perception, quantizes the residual. The third codec, named
Vibrotactile Codec with Perceptual Wavelet Quantization
(VC-PWQ), was introduced in [3], improving on the previous
codec from [5]. Specifically, the psychohaptic model was
refined, the quantization model was changed to preserve
more information and an added arithmetic coding stage
served to achieve higher compression.

TABLE I: Overview of the acronyms for the relevant codecs,
perceptual metrics and performance criteria.

| Acronym  | Meaning                                                      | Ref. |
|----------|--------------------------------------------------------------|------|
| VC-PWQ   | Vibrotactile Codec with Perceptual Wavelet Quantization      | [3]  |
| PVC-SLP  | Perceptual Vibrotactile Codec with Sparse Linear Prediction  | [4]  |
| VPC-DS   | Vibrotactile Perceptual Codec using DWT and SPIHT            | [5]  |
| ST-SIM   | Spectral-Temporal SIMilarity                                 | [10] |
| SPQI     | Spectral Perceptual Quality Index                            |      |
| VibroMAF | Vibrotactile Multi-Method Assessment Fusion                  |      |
| PC       | Pearson Correlation                                          |      |
| MSE      | Mean Squared Error                                           |      |
So far, one relevant perceptual metric for vibrotactile
signals has been introduced, named the Spectral-Temporal
SIMilarity (ST-SIM) [10]. It uses spectral and temporal
properties of compressed signals to compute a score between
0 and 1. First, the absolute threshold is subtracted from
the spectra of original and compressed signals and the
results are then mapped to a range between 0 and 1 using
a sigmoid function. Then, the overlap between the two
resulting functions is calculated. This operation examines
only the overlap of perceivable frequencies and not the exact
difference. In the time domain, the similarity score between two
signals is calculated through a formula similar to Pearson
Correlation (PC). The time component of this metric is
therefore not perceptual. Finally, spectral and temporal scores
are combined through weighted multiplication. An empirically
selected parameter determines how strongly the spectral and
temporal scores are considered. As in previous work, in this
paper we weight the temporal score twice as strongly as the
spectral score, which is supported by [12].
In [8], we presented a subjective assessment method
based on perceptual similarity comparisons. The subjective
assessment is performed by displaying a pair of original and
compressed signals two times consecutively in different
orders (i.e., original signal, then compressed signal, and
the reverse) to human assessors. After that, the assessors
are asked to rate the similarity of the compressed signal
to the original signal on a scale from 0 to 10, with 10 as
the highest similarity. High similarity implies high signal
quality after compression here. The experiment includes
a hidden reference and two anchor signals. The hidden
reference resembles a catch trial in which the supposedly
compressed signal is actually the original signal. This hidden
reference should receive high ratings and assessors who rate
this very low can be excluded in the post-screening step. The
anchor signals are signals containing controlled perceptual
impairments. They serve to assess whether the rating scale
is appropriate. Using this method, we recorded ratings for the
PVC-SLP and VPC-DS codecs with 19 participants, which
are also presented in [8].
The area of image and video processing already provides
a large number of codecs and quality metrics with various
configuration options. Depending on the video content, in-
dividual quality metrics perform differently with respective
strengths and weaknesses. To achieve a meaningful per-
formance for a wide range of video contents and codecs,
fusion-based quality assessment methods such as Fusion-
based Video Quality Assessment index [13] and Ensemble-
learning-based Video Quality Assessment index [14] allow
for compensating the weaknesses of individual metrics. To-
gether with Netflix, the authors of [13], [14] proposed Video
Multi-Method Assessment Fusion (VMAF) [9] to estimate
the subjective quality. VMAF combines the strengths of
multiple elementary video quality metrics such as Peak
Signal-to-Noise Ratio, Structural Similarity Index, etc. by
fusing the individual metric scores using an SVM regressor.
This approach achieves more accurate results than traditional
methods and has become the de facto standard for video
quality assessment [15]. In this work, we follow the idea
of VMAF by fusing multiple elementary vibrotactile quality
metrics.
III. SUBJECTIVE EVALUATION DATA
In this section, we introduce a new dataset with subjective
quality ratings of vibrotactile signals compressed by the
VC-PWQ codec [3]. Using the assessment method from [8],
which is largely based on MUSHRA from the audio do-
main [16], we measured ratings with ten human participants
between 20 and 65 years old. All participants reported as
healthy with normal tactile perception capabilities.
For the assessment, we selected the same 8 signals (4
materials, recorded with the 3x3 tooltip, fast and slower
speeds) as in [8] from the database of [17]. As described in
[8], these materials, aluminium grid (Signal IDs: 117, 120),
cork (Signal IDs: 133, 136), polyester pad (Signal IDs: 146,
149) and rubber (Signal IDs: 150, 153), are chosen since
they cover a broad range of vibrotactile signal characteristics
from the 280-signal database. The ratings are measured at
nine different compression ratios (CRs), namely 5, 10, 15,
20, 25, 30, 35, 40 and 45.
Fig. 1: Average quality score of the normalized and interpolated
subjective quality ratings for the three vibrotactile
codecs VC-PWQ, PVC-SLP, and VPC-DS (x-axis: compression
ratio; y-axis: score).

To enable a comparison to the previously published data
in [8], we normalize all ratings using the hidden reference.
In theory, the hidden reference should receive a rating of 10
since it is exactly equal to the original signal. In reality, the
mean rating for the hidden reference signal is around 8.5 to
9. In our data, we also observe that almost no mean
rating lies above that of the hidden reference. Thus, we normalize
all ratings for each signal individually by dividing by
the mean rating of the corresponding hidden reference. As a result,
the range of these normalized ratings is now between 0 and
1. We can justify this normalization with the fact that if a
signal receives a similar rating as the hidden reference, it can
be regarded as perceptually equal to the original signal and
should therefore receive the highest possible rating.
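The normalization by the hidden reference can be sketched in a few lines of Python. This is a minimal illustration, not the procedure's reference implementation: the function name `normalize_ratings` and all rating values are hypothetical and chosen only to show the arithmetic.

```python
from statistics import mean


def normalize_ratings(ratings_by_cr, hidden_reference_ratings):
    """Divide the mean rating at each CR by the mean hidden-reference rating."""
    ref_mean = mean(hidden_reference_ratings)
    return {cr: mean(r) / ref_mean for cr, r in ratings_by_cr.items()}


# Hypothetical ratings of ONE signal on the 0-10 scale (five assessors).
hidden_reference = [9, 8, 9, 10, 8]  # mean 8.8, i.e. below the ideal 10
ratings = {
    5: [9, 8, 9, 9, 9],   # CR 5
    25: [6, 7, 5, 6, 6],  # CR 25
    45: [3, 2, 4, 3, 3],  # CR 45
}

normalized = normalize_ratings(ratings, hidden_reference)
for cr, score in normalized.items():
    print(f"CR {cr}: {score:.3f}")
```

In this toy example, the signal at CR 5 is rated like the hidden reference and therefore maps to the highest score of 1.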
In [8], the ratings for all signals were measured at 6 different CRs
for the PVC-SLP and VPC-DS. For the VC-PWQ,
we measure ratings at 9 different CRs. Originally, 17 different
CR levels were available for all codecs [3]–[5]. Thus,
to allow a direct comparison of computed metrics and ratings,
rating data needs to be available for the 17 original
CRs. The 6 or 9 CRs of each signal are a subset
of these original 17 CRs. This means that we can obtain the
rating data for all CRs by interpolating the measurements
from the 6 or 9 CRs onto the 17 CRs. For this interpolation,
we use the interp1 function in MATLAB and interpolate
the rating measurements for each signal individually. The
specific interpolation method used is makima, which is
similar to spline but produces significantly fewer over- and
undershoots at the edges. We verified by visual inspection
that the interpolation introduces no visible distortion of the rating curves.
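For readers working in Python rather than MATLAB, the interpolation step might look roughly as follows. This is a sketch under stated assumptions: SciPy's `Akima1DInterpolator` is used as a stand-in for `interp1(..., 'makima')` (the `method="makima"` option only exists in newer SciPy releases, so the code falls back to plain Akima interpolation otherwise), and both the rating values and the 17-point target grid are hypothetical, since the original CR levels are not listed here.

```python
import numpy as np
from scipy.interpolate import Akima1DInterpolator


def interpolate_ratings(measured_crs, measured_ratings, target_crs):
    """Interpolate the ratings of one signal onto a denser CR grid."""
    x = np.asarray(measured_crs, dtype=float)
    y = np.asarray(measured_ratings, dtype=float)
    try:
        # Modified Akima ("makima"), available in newer SciPy versions.
        interpolator = Akima1DInterpolator(x, y, method="makima")
    except TypeError:
        # Older SciPy: plain Akima interpolation as an approximation.
        interpolator = Akima1DInterpolator(x, y)
    return interpolator(np.asarray(target_crs, dtype=float))


# Nine measured CRs as in the VC-PWQ experiment; rating values are made up.
measured_crs = [5, 10, 15, 20, 25, 30, 35, 40, 45]
measured_ratings = [1.00, 0.97, 0.93, 0.90, 0.85, 0.80, 0.72, 0.66, 0.60]

# Hypothetical target grid of 17 CRs inside the measured range.
target_crs = np.linspace(5.0, 45.0, 17)
interpolated = interpolate_ratings(measured_crs, measured_ratings, target_crs)
```

The target grid is kept inside the measured range on purpose, because the interpolator does not extrapolate by default.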
Fig. 1 visualizes the normalized and interpolated sub-
jective quality ratings for the three vibrotactile codecs
VC-PWQ, PVC-SLP, and VPC-DS. We show the mean
curves over all eight signals and the standard deviation
interval. We observe that the VC-PWQ now matches the
performance of PVC-SLP for lower CRs, while for higher
CRs the VC-PWQ is the best choice. The new measurements
show slightly more variation over the CR, which we attribute
to the fact that we have fewer human subjects for these
ratings.
IV. SPECTRAL PERCEPTUAL QUALITY INDEX
We propose a novel full-reference perceptual metric called
Spectral Perceptual Quality Index (SPQI) that is able to
reflect real-life experimental results accurately. Our goal is
to avoid some of the shortcomings of existing metrics and
produce more accurate results.
Fig. 2 depicts the process of computing the SPQI. To
compute the SPQI of a compressed signal $c[n]$ with respect
to its original signal $s[n]$, we first divide the signals into
blocks $c_i[n]$ and $s_i[n]$. The blocks are transformed with a
DCT to obtain $C_i[m]$ and $S_i[m]$, respectively.
We then subtract the absolute threshold of perception $T[m]$
from $S_i[m]$ and $C_i[m]$. This threshold is calculated as a
function of frequency as described in [3]. This subtraction
in dB is a filtering operation that amplifies the most perceivable
parts of each block. We obtain the perceptually
weighted spectra $S_{w,i}[m]$ and $C_{w,i}[m]$, which are
transformed from dB to power. These two weighted spectra are
then subtracted from each other, which gives a perceptually
weighted difference spectrum. Then, we calculate the sum
of all absolute values of the resulting difference spectrum
and normalize it by the sum of the absolute values of $S_{w,i}[m]$.
Since we have a power spectrum, the summation of all values
corresponds to the calculation of the energy of the difference
spectrum. After that, we transform back to the dB domain.
The resulting value $e_{p,i}$ can be regarded as a measure of
the perceptual error, as it resembles a perceptually weighted,
normalized Mean Squared Error (MSE). In contrast to an
objective error measure, it contains information on the
perceptual relevance of the error that was introduced into the
compressed signal.
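The per-block error computation described above can be sketched as follows. This is an illustration under explicit assumptions rather than the authors' implementation: the dB convention (20·log10 of the DCT magnitude), the small constant `EPS`, the function names, and in particular the flat threshold used in the example are all hypothetical; the actual threshold $T[m]$ follows the model in [3].

```python
import numpy as np
from scipy.fft import dct

EPS = 1e-12  # avoids log10(0) and division by zero


def to_db(spectrum):
    """Convert DCT amplitudes to dB (assumed convention: 20*log10|X|)."""
    return 20.0 * np.log10(np.abs(spectrum) + EPS)


def perceptual_error_db(s_block, c_block, threshold_db):
    """Perceptual error e_p of one block, following the steps in the text."""
    s_db = to_db(dct(s_block, type=2, norm="ortho"))
    c_db = to_db(dct(c_block, type=2, norm="ortho"))
    # Subtract the threshold in dB, then convert from dB to power.
    s_w = 10.0 ** ((s_db - threshold_db) / 10.0)
    c_w = 10.0 ** ((c_db - threshold_db) / 10.0)
    # Sum of absolute differences, normalized by the weighted original.
    ratio = np.sum(np.abs(s_w - c_w)) / (np.sum(np.abs(s_w)) + EPS)
    return float(10.0 * np.log10(ratio + EPS))


rng = np.random.default_rng(0)
n = np.arange(512)  # one block of 512 samples
s_block = np.sin(2.0 * np.pi * 20.0 * n / 512.0)
noise = rng.standard_normal(512)
threshold_db = np.zeros(512)  # hypothetical flat threshold, NOT the model of [3]

e_identical = perceptual_error_db(s_block, s_block, threshold_db)
e_small = perceptual_error_db(s_block, s_block + 0.01 * noise, threshold_db)
e_large = perceptual_error_db(s_block, s_block + 0.1 * noise, threshold_db)
```

As expected, the error in dB grows with the amount of distortion, and an undistorted block yields a very low value that is bounded only by `EPS`.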
We assume that humans tolerate a certain amount of this
perceptual error up to some threshold before it becomes
noticeable. This means that for a low value of $e_{p,i}$ the SPQI
should be close to 1 since perceptual signal quality is high.
Conversely, for a high value of $e_{p,i}$, the SPQI should be close
to 0, since a high perceptual error means low signal quality.
In between, the SPQI value should decline around the
threshold value $\tau$. A mapping from $e_{p,i}$ to an SPQI score
that has the described properties is

$$\mathrm{SPQI}_i = \frac{1}{2}\left(1 - \tanh\left(\eta\,(e_{p,i} - \tau)\right)\right) =: \Xi(e_{p,i}). \tag{1}$$

Here, the parameter $\eta$ controls the slope with which the
SPQI declines around $\tau$. To the best of our knowledge, no
studies exist that would allow us to derive optimal values
for $\tau$ and $\eta$. Thus, we aim to find close-to-optimal values
for these two parameters by using the available data. We
describe the derivation of these parameters in Section VI-B.
Finally, we obtain the SPQI value $\mathrm{SPQI}_i$ for the $i$-th
block. The SPQI of $c[n]$ is then calculated as the mean of
$\mathrm{SPQI}_i$ over all $i$. This enables us to accurately measure the
vibrotactile signal quality reflecting human perception.
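The mapping of Eq. (1) and the averaging over blocks can be written compactly in Python. The sketch below is illustrative only: the function names are hypothetical, the per-block errors are made-up numbers, and the defaults τ = −2.0 dB and η = 0.3 mirror the values selected in Section VI-B, with the negative sign of τ treated as an assumption about the dB convention.

```python
import math


def spqi_block(e_p_db, tau=-2.0, eta=0.3):
    """Eq. (1): map a perceptual error in dB to a score between 0 and 1."""
    return 0.5 * (1.0 - math.tanh(eta * (e_p_db - tau)))


def spqi(block_errors_db, tau=-2.0, eta=0.3):
    """SPQI of a signal: mean of the per-block scores."""
    scores = [spqi_block(e, tau, eta) for e in block_errors_db]
    return sum(scores) / len(scores)


# Hypothetical per-block perceptual errors in dB.
low_error_signal = [-40.0, -35.0, -38.0]   # barely distorted
high_error_signal = [15.0, 20.0, 18.0]     # heavily distorted

print(spqi(low_error_signal))    # close to 1
print(spqi(high_error_signal))   # close to 0
print(spqi_block(-2.0))          # exactly 0.5 at the threshold tau
```

The score equals 0.5 when the perceptual error sits exactly at τ, and η determines how quickly the score drops on either side of that point.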
V. MULTI-METHOD ASSESSMENT FUSION
Next, we propose a Multi-Method Assessment Fusion
(MAF) approach for predicting the subjective quality called
VibroMAF. VibroMAF is inspired by VMAF [9], which is
the de facto standard for video quality assessment [15]. We
follow the idea of VMAF by fusing multiple elementary
vibrotactile quality metrics.
Every individual metric has its own strengths and weak-
nesses, depending on the source signal, the employed codec,
or the degree of distortion. Fusing the individual metrics into
a final score combines the strengths of all input metrics.
Fig. 2: SPQI computation process. The threshold T[m] is subtracted
from S_i[m] and C_i[m] (in dB), the results are converted from dB to
power to give S_{w,i}[m] and C_{w,i}[m], the summed absolute difference
is normalized by the sum of |S_{w,i}[m]| and converted back to dB to give
e_{p,i}, which Ξ(·) maps to SPQI_i.

Fig. 3: Workflow of the proposed VibroMAF. The SVM
regressor determines the weight for each individual metric
score (SPQI, ST-SIM, NSNR) calculated from the compressed
signal c[n] and original signal s[n].
Similar to VMAF, we train a Support Vector Machine (SVM)
regressor to assign weights to the elementary metrics and
fuse them into a single quality score.
Fig. 3 visualizes the workflow of the proposed SVM-based
metric fusion. VibroMAF considers the Normalized Signal-
to-Noise Ratio (NSNR), the ST-SIM, and the proposed
SPQI as input for the SVM. The NSNR is calculated by
normalizing all SNR values by 75 dB and restricting the
output to the range of 0 to 1. The SVM combines the
weighted metric scores into a final output. This results in
an accurate and comparable estimation of the signal quality
and allows for extending MAF with future elementary quality
metrics to further improve the estimation.
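As an illustration of the NSNR input, the following sketch divides the SNR in dB by 75 dB and clips the result to the range from 0 to 1. The SNR definition used here (signal energy over the energy of the difference signal) and the handling of the degenerate cases are assumptions made for this example.

```python
import numpy as np


def nsnr(original, compressed, max_snr_db=75.0):
    """SNR in dB, normalized by max_snr_db and restricted to [0, 1]."""
    s = np.asarray(original, dtype=float)
    c = np.asarray(compressed, dtype=float)
    signal_energy = float(np.sum(s ** 2))
    noise_energy = float(np.sum((s - c) ** 2))
    if noise_energy == 0.0:
        return 1.0  # identical signals: infinite SNR, clipped to 1
    if signal_energy == 0.0:
        return 0.0  # silent original with distortion: lowest score
    snr_db = 10.0 * np.log10(signal_energy / noise_energy)
    return float(np.clip(snr_db / max_snr_db, 0.0, 1.0))


t = np.arange(1000) / 1000.0
s = np.sin(2.0 * np.pi * 50.0 * t)

score_identical = nsnr(s, s)
score_zeroed = nsnr(s, np.zeros_like(s))           # SNR of 0 dB
k = 10.0 ** (-37.5 / 20.0)                         # error amplitude for 37.5 dB SNR
score_half = nsnr(s, s * (1.0 - k))                # 37.5 dB / 75 dB = 0.5
```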
VI. EVALUATION AND RESULTS
In this section, we define the performance evaluation
criteria, discuss the experiments conducted to evaluate the
proposed approaches and present the results.
A. Performance Criteria
We first define the premise on which we evaluate the
suitability of perceptual metrics. We measure the MSE and
PC of the estimated quality scores compared to the scores of
the subjective experiments. We compute the two measures
using the ratings for all eight signals rather than just the
overall mean rating.
The PC indicates how well the computed metric
values correlate with the real ratings. This is important
because quality assessment often relies on comparison.
We are therefore interested in whether the
metric can discern differences in perceptual quality between
codecs and between different CRs.
TABLE II: Performance comparison of SPQI and ST-SIM for
the vibrotactile codecs VC-PWQ, PVC-SLP, and VPC-DS.

| Metric          | VC-PWQ [3] | PVC-SLP [4] | VPC-DS [5] |
|-----------------|------------|-------------|------------|
| min MSE SPQI    | 0.006      | 0.028       | 0.005      |
| MSE ST-SIM [10] | 0.017      | 0.009       | 0.064      |
| max PC SPQI     | 0.843      | 0.876       | 0.960      |
| PC ST-SIM [10]  | 0.837      | 0.964       | 0.921      |
The MSE determines how close the metric values are to
the ratings. Intuitively, the best metric is the one that provides
scores that are equal to the experimental ratings. Thus, we
seek to achieve the smallest possible MSE.
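Both criteria are straightforward to compute. The following sketch shows one way to do so with NumPy; the metric scores and ratings are made-up values used only to demonstrate that a metric can correlate perfectly with the ratings and still have a non-zero MSE when it is offset from them.

```python
import numpy as np


def mean_squared_error(metric_scores, ratings):
    """MSE between metric scores and normalized subjective ratings."""
    m = np.asarray(metric_scores, dtype=float)
    r = np.asarray(ratings, dtype=float)
    return float(np.mean((m - r) ** 2))


def pearson_correlation(metric_scores, ratings):
    """Pearson correlation between metric scores and ratings."""
    return float(np.corrcoef(metric_scores, ratings)[0, 1])


# Hypothetical values: the metric is consistently 0.1 below the ratings.
ratings = [1.0, 0.8, 0.6, 0.4]
metric_scores = [0.9, 0.7, 0.5, 0.3]

mse_value = mean_squared_error(metric_scores, ratings)   # about 0.01
pc_value = pearson_correlation(metric_scores, ratings)   # about 1.0
```

This is why both measures are reported: the PC captures whether a metric ranks conditions correctly, while the MSE captures how close its absolute values are to the ratings.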
B. SPQI
We evaluate the SPQI on the dataset of 8 signals described
in Section III by computing the PC and MSE as described in
Section VI-A. We vary the parameter $\tau$ between
−5 dB and 0 dB in steps of 0.1 dB and $\eta$ between 0 and
1 in steps of 0.05. The block length is 512 samples. Fig. 4
shows the resulting MSEs and PCs for all three codecs.
First, we see that the results for VC-PWQ and VPC-DS
are similar, whereas PVC-SLP leads to a different outcome.
To assess whether the SPQI can outperform the
ST-SIM, we list the minimum MSE and maximum PC
values of the SPQI over $\eta$ and $\tau$ for all three codecs in
Table II. Compared to the values of ST-SIM, we can see that
for the PVC-SLP the SPQI never reaches the performance
of ST-SIM. For the other two codecs, however, the SPQI
can achieve substantially better results.
To find the close-to-optimal set of parameters, we optimize
jointly for the VC-PWQ and VPC-DS. This is justified
since the plots in Fig. 4 for these two codecs are highly
similar. Maximizing the PC gives us $\tau = -3.1$ dB and
$\eta = 0.4$. However, for these values the MSE is poor with
0.018 and 0.014 for the VC-PWQ and VPC-DS, respectively.
Minimizing the MSE results in $\tau = -2.0$ dB and $\eta = 0.3$.
Using those parameters reduces the MSE of VC-PWQ and
VPC-DS to 0.007 and 0.006, respectively. We achieve a PC
of 0.839 for VC-PWQ and 0.960 for VPC-DS. Thus, we see
that the PC is almost at its maximum value. Fig. 4 marks
the chosen values of $\tau$ and $\eta$ with red points.
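The parameter sweep can be pictured as a plain grid search. The sketch below is a simplified illustration and not the evaluation code used for the reported numbers: each data point is represented by a single perceptual error in dB instead of a block-wise average, the data are synthetic (generated with τ = −2.0 dB and η = 0.3 so that the search has a known answer), and the function names are hypothetical.

```python
import numpy as np


def spqi_map(e_db, tau, eta):
    """Mapping from perceptual error in dB to a score between 0 and 1."""
    return 0.5 * (1.0 - np.tanh(eta * (e_db - tau)))


def grid_search_min_mse(errors_db, ratings, taus, etas):
    """Return (tau, eta, mse) with the smallest MSE on the grid."""
    best = (None, None, np.inf)
    for tau in taus:
        for eta in etas:
            mse = float(np.mean((spqi_map(errors_db, tau, eta) - ratings) ** 2))
            if mse < best[2]:
                best = (float(tau), float(eta), mse)
    return best


taus = np.linspace(-5.0, 0.0, 51)  # -5 dB to 0 dB in 0.1 dB steps
etas = np.linspace(0.0, 1.0, 21)   # 0 to 1 in steps of 0.05

# Synthetic stand-in data: hypothetical errors and matching "ratings".
errors_db = np.linspace(-12.0, 6.0, 40)
ratings = spqi_map(errors_db, -2.0, 0.3)

best_tau, best_eta, best_mse = grid_search_min_mse(errors_db, ratings, taus, etas)
print(best_tau, best_eta, best_mse)
```

For a joint optimization over two codecs, the same loop could sum the MSE values of both codecs before comparing grid points; maximizing the PC instead only requires swapping the objective.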
In Fig. 5, we show the resulting mean curves of the ratings
from Fig. 1, the SPQI, and ST-SIM over all eight signals. For
the PVC-SLP, the ST-SIM matches the ratings more closely, while
for the VC-PWQ and VPC-DS our new SPQI is superior.

Fig. 4: MSE (first row) and PC (second row) as a function of τ and η for the three examined vibrotactile codecs averaged
over all test signals. The red dot highlights the chosen values of η = 0.3 and τ = −2.0. Panels: (a) VC-PWQ, (b) PVC-SLP, (c) VPC-DS.

Fig. 5: Comparison of the SPQI (dashed) and ST-SIM
(dash-dotted) to the subjective ratings (solid) for the three
vibrotactile codecs VC-PWQ, PVC-SLP, and VPC-DS
(x-axis: compression ratio; y-axis: score).
C. VibroMAF
We train the SVM using the dataset with subjective
ratings introduced in Section III. We select six of the eight
signals for training and two remain for testing. The two
test signals selected are aluminium grid - fast (120) and
polyester pad - slower (149) as representative test signals of
different material classes and recording speeds. We configure
the SVM regressor with a Radial Basis Function kernel, a
regularization parameter of 3000, and an epsilon of 0.1. All
configurations we used were determined empirically.
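A training setup of this kind could be sketched with scikit-learn as shown below. This is an assumption-laden illustration, not the released VibroMAF code: it assumes that the regularization parameter and epsilon map to the `C` and `epsilon` arguments of `sklearn.svm.SVR`, and the feature matrix and ratings are synthetic placeholders standing in for the per-signal SPQI, ST-SIM and NSNR scores and the normalized subjective ratings.

```python
import numpy as np
from sklearn.svm import SVR

SIGNAL_IDS = [117, 120, 133, 136, 146, 149, 150, 153]
TEST_IDS = [120, 149]  # aluminium grid (fast), polyester pad (slower)

rng = np.random.default_rng(0)
rows_per_signal = 17 * 3  # 17 CRs x 3 codecs
signal_of_row = np.repeat(SIGNAL_IDS, rows_per_signal)

# Synthetic placeholder features in [0, 1]: columns SPQI, ST-SIM, NSNR.
features = rng.uniform(0.0, 1.0, size=(signal_of_row.size, 3))
# Synthetic placeholder ratings, NOT real subjective data.
ratings = np.clip(
    features @ np.array([0.6, 0.3, 0.1])
    + rng.normal(0.0, 0.02, signal_of_row.size),
    0.0,
    1.0,
)

# Six signals for training, two held out for testing.
test_mask = np.isin(signal_of_row, TEST_IDS)

model = SVR(kernel="rbf", C=3000, epsilon=0.1)
model.fit(features[~test_mask], ratings[~test_mask])

predictions = model.predict(features[test_mask])
test_mse = float(np.mean((predictions - ratings[test_mask]) ** 2))
print(f"MSE on held-out signals: {test_mse:.4f}")
```

Splitting by signal ID rather than by individual rows keeps all compression ratios of a test signal out of the training set, which matches the idea of evaluating on unseen signals.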
We evaluate the performance of VibroMAF on the two
test signals with subjective ratings. We encode the signals at
different quality levels with the vibrotactile codecs VC-PWQ,
PVC-SLP, and VPC-DS. We measure the MSE and PC
of VibroMAF and the elementary quality metrics SPQI,
ST-SIM, and NSNR. Table III summarizes the resulting
quality scores for the individual codecs and the average of
all codecs.

TABLE III: Performance comparison of VibroMAF with the
elementary quality metrics SPQI, ST-SIM, and NSNR for the
vibrotactile codecs VC-PWQ, PVC-SLP, and VPC-DS.

| Metric          | All Codecs | VC-PWQ [3] | PVC-SLP [4] | VPC-DS [5] |
|-----------------|------------|------------|-------------|------------|
| MSE VibroMAF    | 0.011      | 0.007      | 0.019       | 0.006      |
| MSE SPQI        | 0.027      | 0.009      | 0.067       | 0.006      |
| MSE ST-SIM [10] | 0.037      | 0.019      | 0.012       | 0.080      |
| MSE NSNR        | 0.440      | 0.452      | 0.526       | 0.341      |
| PC VibroMAF     | 0.918      | 0.854      | 0.901       | 0.957      |
| PC SPQI         | 0.800      | 0.807      | 0.741       | 0.982      |
| PC ST-SIM [10]  | 0.775      | 0.831      | 0.945       | 0.918      |
| PC NSNR         | 0.453      | 0.433      | 0.739       | 0.536      |
On average, VibroMAF performs best with an MSE
of 0.011 and a PC of 0.915. For the individual codecs,
VibroMAF shows the best performance on VC-PWQ for both
MSE and PC. For the VPC-DS codec, the proposed elementary
metric SPQI achieves the lowest MSE together with VibroMAF
and also the highest PC. For the PVC-SLP
codec, the ST-SIM achieves the best performance in terms of
MSE and PC. Notably, the MSE of the NSNR is significantly
higher than that of the other metrics.
While VibroMAF does not outperform ST-SIM for the
PVC-SLP codec, it benefits from the strengths of the other
metrics and hence shows the best performance on average.
Fig. 6: VibroMAF score compared to subjective ratings for
the vibrotactile codecs VC-PWQ, PVC-SLP, and VPC-DS
(x-axis: compression ratio; y-axis: score).
Fig. 6 visualizes the estimated VibroMAF scores and the
subjective ratings for the individual codecs, highlighting the
close match for all codecs. In contrast, the performance
of the elementary metrics depends strongly on the selected
codec. A possible explanation for the strong performance of
the metric-codec pairs (VC-PWQ, SPQI) and (PVC-SLP,
ST-SIM) is that the metrics represent the strengths of the
respective codec. This demonstrates the ability of VibroMAF
to combine the strengths of all individual metrics. Further,
it enables better comparability of the signal qualities
of different codecs. Additionally, VibroMAF can be further
improved with future quality metrics and vibrotactile codecs.
VII. CONCLUSION
In this paper, we proposed the novel metric SPQI to
accurately estimate the perceptual quality of compressed
vibrotactile signals. The SPQI computes the perceptually
weighted error measure and maps this error measure to a
similarity score between 0 and 1. Inspired by VMAF, we
developed the fusion method VibroMAF which combines
multiple elementary vibrotactile quality metrics. This allows
for combining the strengths of multiple elementary metrics.
We used an SVM to fuse the individual scores of SPQI,
ST-SIM, and NSNR into a final quality estimation. Addi-
tionally, we introduced a new dataset with subjective ratings
of vibrotactile signals which was used for training and the
evaluation of the proposed metrics.
With the SPQI, we reduced the MSE between the subjective
ratings and the computed metric by 64 % and 92 %
compared to the state of the art for two of the three latest
vibrotactile codecs, while for the third codec ST-SIM is
superior. With VibroMAF, we reduced the average MSE to the
subjective ratings over all three codecs by 59 % compared to
the best-performing elementary metric.
While VibroMAF does not outperform all elementary
metrics for all vibrotactile codecs, the results demonstrate
that fusing individual metric scores allows for compensating
weaknesses of the elementary metrics for certain codecs.
Further, VibroMAF is well suited for generating accurate and
comparable quality ratings across all three codecs. For future
work, VibroMAF is extendable with new vibrotactile quality
metrics to further increase the resulting accuracy. We provide
VibroMAF as well as the elementary metrics as an open-source
Python implementation on GitHub¹ for simple
usage and extension with future metrics.
REFERENCES
[1] E. Steinbach, S. Hirche, M. Ernst, F. Brandi, R. Chaudhari, J. Kam-
merl, and I. Vittorias, “Haptic communications,Proceedings of the
IEEE, vol. 100, no. 4, pp. 937–956, 2012.
[2] E. Steinbach, M. Strese, M. Eid, X. Liu, A. Bhardwaj, Q. Liu, M. Al-
Ja’afreh, T. Mahmoodi, R. Hassen, A. El Saddik et al., “Haptic codecs
for the tactile internet,” Proceedings of the IEEE, vol. 107, no. 2, pp.
447–470, 2018.
[3] A. Noll, L. Nockenberg, B. G¨
ulecy¨
uz, and E. Steinbach, “Vc-pwq:
Vibrotactile signal compression based on perceptual wavelet quantiza-
tion,” in 2021 IEEE World Haptics Conference (WHC). IEEE, 2021,
pp. 427–432.
[4] R. Hassen, B. Guelecyuez, and E. G. Steinbach, “Pvc-slp: Perceptual
vibrotactile-signal compression based-on sparse linear prediction,”
IEEE Transactions on Multimedia, 2020.
[5] A. Noll, B. G¨
ulecy¨
uz, A. Hofmann, and E. Steinbach, “A rate-scalable
perceptual wavelet-based vibrotactile codec,” in 2020 IEEE Haptics
Symposium (HAPTICS). IEEE, 2020, pp. 854–859.
[6] T. L. Senkow, N. D. Theis, J. C. Quindlen-Hotek, and V. H. Barocas,
“Computational and psychophysical experiments on the pacinian cor-
puscle’s ability to discriminate complex stimuli,IEEE Transactions
on Haptics, vol. 12, no. 4, pp. 635–644, 2019.
[7] S.-C. Li, E. Muschter, J. Limanowski, and A. Hatzipanayioti, “Chapter
9 - human perception and neurocognitive development across the
lifespan,” in Tactile Internet, F. H. Fitzek, S.-C. Li, S. Speidel,
T. Strufe, M. Simsek, and M. Reisslein, Eds. Academic Press, 2021,
pp. 199–221.
[8] E. Muschter, A. Noll, J. Zhao, R. Hassen, M. Strese, B. Guele-
cyuez, S.-C. Li, and E. Steinbach, “Perceptual quality assessment of
compressed vibrotactile signals through comparative judgment,IEEE
Transactions on Haptics, 2021.
[9] N. T. Blog, “Toward A Practical Perceptual Video Quality
Metric,” Jun. 2016. [Online]. Available: https://netflixtechblog.com/
toward-a-practical- perceptual- video-quality- metric- 653f208b9652
[10] R. Hassen and E. Steinbach, “Subjective evaluation of the spectral tem-
poral similarity (st-sim) measure for vibrotactile quality assessment,”
IEEE Transactions on Haptics, vol. 13, no. 1, pp. 25–31, 2020.
[11] E. Steinbach, S.-C. Li, B. G¨
ulec¸y ¨
uz, R. Hassen, T. Hulin, L. Jo-
hannsmeier, E. Muschter, A. Noll, M. Panzirsch, H. Singh, and
X. Xu, “Chapter 5 - haptic codecs for the tactile internet,” in Tactile
Internet, F. H. Fitzek, S.-C. Li, S. Speidel, T. Strufe, M. Simsek, and
M. Reisslein, Eds. Academic Press, 2021, pp. 103–129.
[12] L. A. Jones and N. B. Sarter, “Tactile displays: Guidance for their
design and application,” Human factors, vol. 50, no. 1, pp. 90–111,
2008.
[13] J. Y. Lin, T.-J. Liu, E. C.-H. Wu, and C.-C. J. Kuo, “A fusion-based
video quality assessment (fvqa) index,” in Signal and Information
Processing Association Annual Summit and Conference (APSIPA),
2014 Asia-Pacific, 2014, pp. 1–5.
[14] J. Y. Lin, C.-H. Wu, I. Katsavounidis, Z. Li, A. Aaron, and C.-C. J.
Kuo, “Evqa: An ensemble-learning-based video quality assessment
index,” in 2015 IEEE International Conference on Multimedia Expo
Workshops (ICMEW), 2015, pp. 1–6.
[15] Netflix Technology Blog, “VMAF: The Journey Continues,”
Oct. 2018. [Online]. Available:
https://netflixtechblog.com/vmaf-the-journey-continues-44b51ee9ed12
[16] B. Series, “Method for the subjective assessment of intermediate quality
level of audio systems,” International Telecommunication Union
Radiocommunication Assembly, 2014.
[17] J. Kirsch, A. Noll, M. Strese, Q. Liu, and E. Steinbach, “A low-cost
acquisition, display, and evaluation setup for tactile codec development,”
in 2018 IEEE International Symposium on Haptic, Audio and
Visual Environments and Games (HAVE). IEEE, 2018, pp. 1–6.
1 https://github.com/hofbi/vibromaf
We present a comprehensive scheme for the quality assessment of compressed vibrotactile signals with human assessors. Inspired by the multiple stimulus test with hidden reference and anchors (MUSHRA) from the audio domain, we designed a method in which each compressed signal is compared to its original signal and rated on a numerical scale. For each signal tested, the hidden reference and two anchor signals are used to validate the results and provide assessor screening criteria. Differing from previous approaches, our method is hierarchically structured and strictly timed in a sequential manner to avoid experimental confounds and provide precise psychophysical assessments. We validated our method in an experiment with 20 human participants in which we compared two state-of-the-art lossy codecs. The results show that, with our approach, the performance of different codecs can be compared effectively. Furthermore, the method also provides a measure of subjective quality at different data compression rates. The proposed procedure can be easily adapted to evaluate other vibrotactile codecs.
For a fully immersive virtual reality experience, humans have to be presented with high quality haptic stimuli in addition to audio and video. However, delivering haptic stimuli with a high level of realism is still challenging. An important component of haptic stimulation is based on vibrotactile signals. They are emitted when sliding a tooltip or a finger over a textured surface and carry a large amount of information about the surface material properties. Vibrotactile signals have received considerable attention so far; however, because the number of interaction points to be displayed will soon increase, it is vital that data rates are kept low. This calls for an efficient codec that is able to compress these signals while maintaining perceptual transparency. The IEEE P1918.1.1 standardization group has issued a call for contributions for such a codec. In this work, we present our contribution to this standardization effort. We have developed a highly efficient codec which employs a discrete wavelet transform, human tactile perceptual modeling, quantization, and lossless coding to achieve high compression, while maintaining perceptual signal quality. The proposed vibrotactile codec compresses the signals at least by a factor of 10 with practically no perceptual impairment for most signals. Thus, our approach significantly outperforms the current state-of-the-art.
This chapter discusses the state of the art and current investigations by the authors in the field of perceptual haptic coding. The discussion covers both kinesthetic and tactile codecs, which take different types of input and target different objectives. Kinesthetic codecs are designed to reduce the number of packets to be exchanged bidirectionally during network-based physical interaction. Bilateral teleoperation of a robotic system with force feedback is an example for this. A special requirement in this context is to ensure stability and reduce data traffic despite the negative impact of delay in the bidirectional exchange of kinesthetic information. For this purpose, we marry kinesthetic data reduction schemes with stabilizing control approaches and thereby improve the trade-off between stability, transparency, and network resource usage. Tactile codecs are designed to minimize the required transmission rate during unidirectional exchange of surface interaction information. Compared to the kinesthetic codecs they are more delay-tolerant. Both types of haptic codecs share the need to incorporate mathematical models of human perception. The development of such models is a current research challenge. To this end, we describe the most widely used models of human kinesthetic and vibrotactile perception and how they can be leveraged in perceptual coding schemes. Additionally, haptic codecs need to support multiple points of interaction. This requires a hierarchical design, where spatial redundancy (e.g., on a finger, among fingers, across the hand, arm, etc.) is exploited. Finally, haptic codecs need to be learning-oriented, which means that they need to support remote learning scenarios, such as learning from (remote) demonstrations. We also describe and analyze the performance of the kinesthetic and tactile codecs under consideration within the IEEE standardization activity P1918.1.1. 
We present both objective and subjective evaluation results and complement the chapter with a discussion of the available objective quality measures that have been found to accurately predict human judgments of compressed haptic signals. The development of haptic codecs requires interdisciplinary expertise from psychology, signal processing, applied information theory, control, and sensor/actuator development. Haptic codecs are a key enabler for a wide range of applications, e.g., in industry or medicine.
Humans interact with their environments through their senses. Since Helmholtz's classical concept, it is well known in psychology and cognitive neuroscience that human perception and action are influenced by an individual's prior sensory and learning experiences, as well as by other factors, such as task-specific goals or contexts. Focusing on human perception and action as one of the target primary research fields for the new transdisciplinary research of Tactile Internet with Human-in-the-Loop (TaHiL), this chapter reviews neurocognitive processes for multisensory perception, as well as how these processes are affected by age-related effects across the lifespan and individual differences. Neural information processing in brain circuitries that underlie the different senses not only operates very fast (on the order of <20 ms), but also at very different speeds. Whereas hearing and seeing operate, respectively, in the range of 3 ms and 15 ms, the tactile sense operates in the 1 ms time range. Furthermore, neuronal gain control of neural information processing and uncertainty reduction are key mechanisms of human multisensory perception. Empirical data from lifespan developmental psychology and cognitive neuroscience show that the one-size-fits-all assumption, which is commonly adopted in the engineering fields, cannot by default be applied to user populations covering broad age ranges. Empirical evidence indicates that mechanisms of brain development and aging can substantially impact neuronal gain control with consequences for the speed and robustness of human perception and action. Furthermore, prominent age-related differences pertaining to the development and aging of the frontal-parietal brain network affect multiple cognitive functions, such as attention, executive control, and valuation. These effects on cognition would further influence goal-directed multisensory perception and action across the lifespan.
Thus the Human-in-the-Loop approach emphasized by the research of TaHiL requires systematic investigations of the effects of development, aging, and skill acquisition on the dynamic interplay between multisensory perception, goal anticipation, and action. Basic experimental research with population-based samples is indispensable for understanding age- and expertise-sensitive psychophysical and neurocognitive factors. Such knowledge can guide theoretical developments of computational models as well as engineering innovations of key technologies (e.g., sensors and actuators, data coding and compression methods, communication networks and human-inspired machine learning) for the advancement of next generation quasi-real-time human–machine interactions in the Tactile Internet.
Developing a signal compression technique that is able to achieve a low bit rate while maintaining high perceptual signal quality is a classical signal processing problem vigorously studied for audio, speech, image, and video signals. Yet, until recently, there has been limited effort directed toward the compression of vibrotactile signals, which represent a crucial element of rich touch (haptic) information. A vibrotactile signal, produced when stroking a textured surface with a tool-tip or bare finger, contains, like other signals, a great deal of redundant and imperceptible information that can be exploited for efficient compression. This paper presents PVC-SLP, a vibrotactile perceptual coding approach. PVC-SLP employs a model of tactile sensitivity, called ASF (Acceleration Sensitivity Function), for perceptual coding. The ASF is inspired by the four-channel model that mediates the perception of vibrotactile stimuli in the glabrous skin. The compression algorithm introduces sparsity constraints in a linear prediction scheme both on the residual and the predictor coefficients. The perceptual quantization of the residual is developed through the use of ASF. The quantization parameters of the residual and the predictor coefficients were jointly optimized, by means of both squared error and perceptual quality measures, to find the sweet spot of the rate-distortion curve. PVC-SLP coding performance is evaluated using two publicly available databases that collectively comprise 1281 vibrotactile signals covering 193 material classes. Furthermore, we compare PVC-SLP with a recent vibrotactile compression method and show that PVC-SLP perceptually outperforms the existing method by a sizable margin. Most recently, PVC-SLP has been selected to become part of the haptic codec standard currently under preparation by IEEE P1918.1.1, aka Haptic Codecs for the Tactile Internet.
Recent standardization efforts for Tactile Internet (TI) and haptic codecs have paved the way for delivering tactile experiences in synchrony with audio and visual interaction components. Since humans are the ultimate consumers of tactile interactions, it is of utmost importance to develop objective quality assessment measures that are in close agreement with human perception. We present the results of a large-scale subjective study of a recently proposed objective quality assessment approach for vibrotactile signals called ST-SIM (Spectral Temporal SIMilarity). ST-SIM encompasses two components: perceptual spectral and temporal similarity measures. Two subjective experiments were conducted to validate ST-SIM, and the elicited subjective ratings are used to create a VibroTactile Quality Assessment (VTQA) database. The VTQA database together with ST-SIM provides a viable means for the development of vibrotactile compression and transmission applications. Our experimental results show that ST-SIM highly correlates with human opinions in both experiments and significantly outperforms commonly used measures. The VTQA database is made publicly available at https://www.raniahassen.com/RESEARCH/.
Recognizing and discriminating vibrotactile stimuli is an essential function of the Pacinian corpuscle. This function has been studied at length in both a computational and an experimental setting, but the two approaches have rarely been compared, especially when the computational model has a high level of structural detail. In this work, we explored whether a multiscale, multiphysical computational model of the Pacinian corpuscle can predict the outcome of a corresponding psychophysical experiment. The discrimination test involved either (1) two simple stimuli with frequency in the 160-500 Hz range, or (2) two complex stimuli formed by combining the waveforms for a 100 Hz stimulus with a second stimulus in the 160-500 Hz range. The subjects' ability to distinguish between the simple stimuli increased as the frequency increased, a result consistent with the model predictions for the same stimuli. The model also predicted correctly that subjects would find the complex stimuli more difficult to distinguish than the simple ones and also that the discriminability of the complex stimuli would show no trend with frequency difference.
The Tactile Internet will enable users to physically explore remote environments and to make their skills available across distances. An important technological aspect in this context is the acquisition, compression, transmission, and display of haptic information. In this paper, we present the fundamentals and state of the art in haptic codec design for the Tactile Internet. The discussion covers both kinesthetic data reduction and tactile signal compression approaches. We put a special focus on how limitations of the human haptic perception system can be exploited for efficient perceptual coding of kinesthetic and tactile information. Further aspects addressed in this paper are the multiplexing of audio and video with haptic information and the quality evaluation of haptic communication solutions. Finally, we describe the current status of the ongoing IEEE standardization activity P1918.1.1 which has the ambition to standardize the first set of codecs for kinesthetic and tactile information exchange across communication networks.