Take Me One More: Efficient Clustering Compression using
Inter-Frame Encoding
Ignacio Brasca
February 20, 2024
Abstract

This paper introduces a novel data encoding compression algorithm aimed at significantly reducing storage requirements for quick-capture devices, such as CCTV cameras and smartphones. By leveraging Huffman coding alongside Discrete Cosine Transform (DCT) techniques, the proposed algorithm offers an efficient solution for minimizing data file sizes. Unlike traditional image compression methods that treat each frame independently, our approach utilizes inter-frame encoding to exploit temporal redundancies between consecutive frames, achieving higher compression ratios while maintaining image quality. The algorithm's workflow, implementation details, and its application in dynamic cluster formation and memory optimization are discussed, alongside case studies demonstrating its effectiveness in various real-world scenarios.
Keywords: intra-frame data encoding, compression algorithm, Huffman coding, Discrete Cosine Transform (DCT), inter-frame encoding, dynamic cluster formation, memory optimization, data compression
1 Introduction

In recent times, there has been a surge in the need for efficient data compression techniques, driven by the rapid expansion of quick-capture devices such as CCTV cameras, smartphones, and other portable storage-constrained devices. In response to this challenge, this essay formulates a concept for an intra-frame data encoding compression algorithm that reduces storage requirements. The proposed algorithm leverages Huffman coding and Discrete Cosine Transform (DCT) techniques, which are highly effective in minimizing the size of data files.
Huffman coding is an entropy encoding technique that replaces a sequence of symbols with shorter codes based on their probability distribution, allowing us to save storage by exploiting redundancy [7]. DCT, in contrast, is a mathematical process [6] that decomposes an image into a series of frequency components, which also helps reduce redundancy, both in still pictures and in video frames (pictures in a time series). By combining these two powerful techniques, the proposed algorithm can offer significant storage savings for quick-capture devices.
The method proposed here allows us to treat not just one picture as a whole, but an ecosystem of similar pictures [5] organized by timestamp. This enables categorization based on a difference threshold computed against the inception frame. The technique has been used in the past for video compression [8], but not for single images. This essay provides an overview of the inter-frame encoding algorithm, its workflow, implementation details, and case studies demonstrating its effectiveness.
2 Background

Traditional image compression techniques, such as JPEG and PNG, compress individual frames independently. While these methods are effective for individual images, they may not be optimal for sequences of similar images captured by real-time devices or smartphones. Inter-frame encoding overcomes this limitation by exploiting the temporal redundancy between consecutive frames in a sequence. By storing only the differences between frames, inter-frame encoding can achieve higher compression ratios while preserving image quality.
2.1 Formal Definitions

1. Inception Frame (I): The first frame in a sequence of images.

2. Beam (beam_{n,m}): A data point in a frame that can be modified using a set of operations.

3. Difference Threshold (N): The maximum difference allowed between consecutive frames for them to be considered similar.

4. Cluster Frames (K): A group of similar images within the difference threshold.
2.2 Sequence Compression Against a Set of Frames

To introduce the underlying techniques, we define K as a set of frames that are part of a sequence of images. Ideally, a set of images computed with this technique will be limited to N frames, where N is the number of frames we want to compute the difference against within a timeframe t.

For a set of frames

K = {X_1, X_2, X_3}

once we cluster them, we can define a set of expressions to perform at the frame level, such as

X_k = X_n - X_m

and we can also combine these expressions to produce a new frame X_k based on the statement generated after their application.

Operations can be computed at runtime: whenever a new piece of information appears in the set of K frames, it is included in the set of new data points K.

A data point belonging to K is called a beam and is part of a set beams_{n,m}, where n, m are the dimension restrictions of the matrix containing all the beams available in the source of information presented as a frame in K.
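As a concrete illustration, the frame-level difference X_k = X_n - X_m can be sketched in Python over small grayscale frames represented as nested lists. This is a minimal sketch; the function name and the sample values are illustrative, not part of the original specification:

```python
def frame_diff(x_n, x_m):
    """Compute X_k = X_n - X_m beam by beam (element-wise over the matrix)."""
    return [[a - b for a, b in zip(row_n, row_m)]
            for row_n, row_m in zip(x_n, x_m)]

# Two similar 2x2 grayscale frames: the difference is sparse, hence cheap to store.
x_n = [[10, 12], [11, 13]]
x_m = [[10, 11], [11, 12]]
x_k = frame_diff(x_n, x_m)  # [[0, 1], [0, 1]]
```

Because most beams of similar frames coincide, the resulting difference matrix is dominated by zeros, which is exactly what the later entropy-coding stage exploits.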
2.3 Restrictions

There are a few restrictions to take into account when computing the difference between frames in a sequence of images:

1. We always need to compute the same amount of information over a finite combination of RGB values; this means each frame should contain at least max(X_n) pixels, where X_n is the frame with the highest color depth (information per pixel).

2. len(K) should always be greater than 1, where K is the set of frames we want to compute the difference against.

3. X_n - X_m should always output a valid matrix as a result.

Operations available at the beam level are applied before any pixel-level transformation technique, always using the inter-frame perspective of the information.
In case multiple beams x_{i,j} are intertwined, we can define a new set of operations to establish which parts of the frames we want to modify and for how long.
2.4 Inception frame

Currently, compression against a set of frames is computed against a linear relationship where K frames are only compressed after N iterations during a time span t. Presented here is the utilization of a global difference matrix called K_global, where we track a set of values on the same matrix as if it were a memory registry.

Starting from frame n, we can compute and reuse the same set of beams_{n,m} already stored from the initial inception frame.

From an inception frame, we should always compute the difference against the same number of frames, where n is the frame we want to compute the difference against, and then evaluate whether the difference is small enough for the frame to be considered part of the same cluster.

Operations can be computed at runtime against a set of beams_{n,m}, where n, m are the dimension restrictions of the matrix containing all the beams available in the source of information presented as a frame in K. This information is beneficial not only for calculating the difference between frames but also for recomputing the inception frame to obtain the same set of frames in a different time span t.
2.5 Reverting frames

To revert an operation, we simply apply the inverse operation (operation^{-1}) to the frame X_k and obtain the original frame X_n or X_m; for the difference X_k = X_n - X_m, this means X_n = X_k + X_m. This is only possible if the original frame X_n or X_m is available in the set of frames K.
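Under the difference interpretation above, reverting is element-wise addition. A minimal sketch (names and sample values are illustrative):

```python
def frame_revert(x_k, x_m):
    """Invert X_k = X_n - X_m: recover X_n = X_k + X_m element-wise."""
    return [[d + b for d, b in zip(row_k, row_m)]
            for row_k, row_m in zip(x_k, x_m)]

# Recover the original frame from the stored difference and the reference frame.
x_k = [[0, 1], [0, 1]]
x_m = [[10, 11], [11, 12]]
x_n = frame_revert(x_k, x_m)  # [[10, 12], [11, 13]]
```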
3 Architecture

The proposed two-step architecture for processing frames in a cluster of pictures is designed to efficiently analyze a series of images while minimizing redundancy and memory usage. The first part involves dynamically defining image clusters, where each cluster consists of images with similar content up to a specified threshold. Images exceeding the threshold are considered distinct and initiate a new cluster. The second part focuses on optimizing memory and efficiently matching differences between frames to avoid processing redundant information.

The two steps in the proposed architecture are:

1. Dynamic Cluster Formation (DCF)

2. Memory Optimization and Inter-frame Matching (MOIM)
3.1 Dynamic Cluster Formation

Dynamic Cluster Formation, from now on DCF, is a step that continuously checks for changes in the input stream of images based on an initial timestamp. When a difference greater than a defined threshold is detected, the current cluster is considered complete and a new cluster begins. This step ensures that similar images are grouped together, allowing for efficient inter-frame encoding.

The DCF step can be implemented using a variety of techniques, such as threshold-based clustering, k-means clustering, or hierarchical clustering. The choice depends on the specific requirements of the application and the nature of the input data. For images, we recommend threshold-based clustering, where the difference between consecutive frames is compared against a user-defined threshold during a timeframe t.
Figure 1: Dynamic Cluster Formation (DCF) Step Diagram. The step reads the input stream and checks whether the difference exceeds the threshold: if yes, the current cluster is completed and a new cluster is started; if no, reading continues.
The DCF step is essential for efficiently grouping similar images into clusters, which can then be processed using the technique described in the next section.
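A minimal sketch of the recommended threshold-based variant follows, assuming grayscale frames as nested lists and a summed-absolute-difference metric against the cluster's inception frame (both the metric and the function names are assumptions; the paper leaves them open):

```python
def form_clusters(frames, threshold):
    """Group consecutive frames into clusters (DCF sketch).

    A new cluster starts whenever a frame's summed absolute difference
    to the current cluster's inception frame exceeds the threshold.
    """
    clusters = []
    inception, current = None, []
    for frame in frames:
        if inception is None:
            inception, current = frame, [frame]
            continue
        diff = sum(abs(a - b)
                   for row_f, row_i in zip(frame, inception)
                   for a, b in zip(row_f, row_i))
        if diff > threshold:
            clusters.append(current)             # complete the current cluster
            inception, current = frame, [frame]  # start a new one
        else:
            current.append(frame)
    if current:
        clusters.append(current)
    return clusters

# Four 1x2 frames: two near-identical pairs yield two clusters.
frames = [[[10, 10]], [[10, 11]], [[50, 50]], [[50, 51]]]
clusters = form_clusters(frames, threshold=5)  # two clusters of two frames each
```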
3.2 Memory Optimization and Inter-frame Matching

The MOIM step aims to minimize redundant processing and memory usage. It matches differences between frames and avoids reprocessing frames with similar information. This optimization step is crucial for achieving high compression ratios and efficient processing (see Figure 2).

Proposed here is the utilization of a global difference matrix called K_global, where we track a set of values on the same matrix as if it were a memory registry. This matrix is used to compute the difference between frames and then evaluate whether the difference is small enough for the frame to be considered part of the same cluster.

Huffman coding and Discrete Cosine Transform (DCT) techniques can be applied to the differences between frames to achieve high compression ratios. This step also involves managing memory efficiently to minimize the computational overhead of processing differences (see Annex A for more information). In collaboration with the first step of this pipeline, MOIM ensures that the encoding algorithm can achieve high compression ratios while maintaining image quality.
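One possible reading of the K_global memory registry is a lookup that records each distinct difference once and lets identical differences be reused instead of reprocessed. The sketch below uses a Python dict keyed by the difference contents; this is an interpretation for illustration, not the paper's prescribed data structure:

```python
def register_difference(k_global, frame_id, diff):
    """Track a frame difference in the K_global registry (MOIM sketch).

    If an identical difference is already tracked, return the id of the
    frame that owns it so callers can skip reprocessing; otherwise
    register the new difference and return frame_id itself.
    """
    key = tuple(tuple(row) for row in diff)  # hashable view of the matrix
    if key in k_global:
        return k_global[key]                 # reuse: no reprocessing needed
    k_global[key] = frame_id
    return frame_id

k_global = {}
register_difference(k_global, 1, [[0, 1], [0, 0]])  # newly tracked
register_difference(k_global, 2, [[0, 1], [0, 0]])  # identical: reuses frame 1
```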
Figure 2: Memory Optimization and Inter-frame Matching (MOIM) Step Diagram. Input frames → compute frame difference → K_global matrix update → apply Huffman coding → compressed storage.
4 Clustering Encoding

The inter-frame encoding algorithm works by computing the difference between consecutive frames inside a cluster in a time sequence t. Let K_t represent the t-th frame in the sequence and K_{t-1} the previous frame. The difference between K_t and K_{t-1} is computed as follows:

ΔK_t = K_t - K_{t-1}

The difference ΔK_t is then stored using the techniques described in the previous section, along with any additional information required for reconstruction.

The inter-frame encoding algorithm can be applied to a wide range of applications, including video compression [1] [2] [4], medical imaging, and remote sensing. By exploiting temporal redundancy between consecutive frames, inter-frame encoding can achieve high compression ratios while maintaining image quality. As shown in Figure 3, this process is repeated for the entire sequence of K frames.
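This process can be sketched end to end for one cluster: keep the inception frame, store only ΔK_t for each subsequent frame, and reconstruct by accumulation (the names and list-based frames are illustrative):

```python
def encode_cluster(frames):
    """Return the inception frame plus the differences dK_t = K_t - K_{t-1}."""
    diffs = [[[c - p for c, p in zip(row_c, row_p)]
              for row_c, row_p in zip(cur, prev)]
             for prev, cur in zip(frames, frames[1:])]
    return frames[0], diffs

def decode_cluster(inception, diffs):
    """Rebuild every frame by accumulating differences onto the inception frame."""
    frames = [inception]
    for diff in diffs:
        prev = frames[-1]
        frames.append([[p + d for p, d in zip(row_p, row_d)]
                       for row_p, row_d in zip(prev, diff)])
    return frames

cluster = [[[10, 10]], [[10, 12]], [[11, 12]]]
inception, diffs = encode_cluster(cluster)
assert decode_cluster(inception, diffs) == cluster  # lossless round trip
```

Since only the inception frame and the (typically sparse) differences are stored, the cluster representation shrinks while remaining exactly reversible.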
Figure 3: Inter-frame Encoding Diagram. The inception frame is followed by frames K_n, K_{n-1}, K_{n-2}, each stored as a difference against its predecessor.
5 Performance Metrics

Performance relies on the reduction of redundancy and the minimization of memory usage. The following metrics can be used to evaluate the performance of the encoding algorithm:

1. Compression Ratio: The ratio of the original file size to the compressed file size.

2. Memory Usage: The amount of memory required to store the compressed metadata and reconstructed frames.

3. Processing Time: The time required to encode and decode the differences.

4. Image Quality: The visual quality of the reconstructed frames compared to the original frames.

5. Number of Clusters: The number of clusters formed during the dynamic cluster formation step.
6 Proposed Workflow

The workflow for inter-frame encoding involves several steps:

1. Read the base image (I_0) from disk, starting from the cluster defined in DCF.

2. Iterate over the sequence of images until the end of the cluster.

3. Compute the difference between each pair of consecutive frames using DCF.

4. Store the differences in a compressed metadata format using MOIM.

5. Repeat the process for the entire sequence of images inside the storage utilized.
The following diagram illustrates the proposed workflow for inter-frame encoding:

Figure 4: Proposed Workflow for Inter-frame Encoding. Read base image I_0 from disk → iterate over the image sequence → compute frame differences with DCF → store differences with MOIM → repeat for the image sequence.
7 Implementation

This section outlines the implementation of the proposed algorithm, focusing on the pseudo-algorithm for the theoretical approach.

7.1 Pseudo-Algorithm

The pseudo-algorithm describes the process of compressing and decompressing image frames using inter-frame encoding based on temporal redundancy, leveraging Huffman coding and DCT techniques to improve per-image compression ratios, and relying on the DCF and MOIM techniques described above, which work at the storage level. The process is presented in mathematical terms to illustrate the relationship with the theory described in this document.
1. Initialization:

Set the inception frame I_0 as the reference frame.

Initialize an empty set S for storing compressed frame differences.

2. Forming K using DCF (Dynamic Cluster Formation):

(a) Continuously form K clusters from a set of frames based on the difference threshold.

(b) Store the generated clusters as part of a set K.

(c) Set the first frame K_0 as the reference frame (also known as the inception frame I_0).

(d) Implement the MOIM step to minimize redundant processing and memory usage (see Annex A).

3. For each frame K_n inside K:

(a) Compute the difference ΔK_n = K_n - K_{n-1}.

(b) Apply DCT to ΔK_n to obtain frequency components F_n.

(c) Encode F_n using Huffman coding to produce a compressed representation C_n.

(d) Add C_n to the set S.

4. Decompression:

(a) Retrieve the differences stored after MOIM.

(b) Apply Huffman decoding to the retrieved data to obtain F_n.

(c) Apply the inverse DCT to F_n to obtain ΔK_n.

(d) Reconstruct the frame K_n = ΔK_n + K_{n-1}.

This section provides a framework for the practical application of the proposed algorithm, detailing the steps required to read, extract, and apply differences between frames based on metadata, thus illustrating the algorithm's use in a real-world scenario.
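To make the DCT step and its inverse concrete, here is a minimal orthonormal 2-D DCT sketch in pure Python. This is illustrative only: a practical implementation would use an optimized FFT-based DCT, and the quantization a lossy codec would insert between the DCT and Huffman coding is omitted; the matrix construction is the standard DCT-II basis:

```python
import math

def dct_matrix(n):
    """Orthonormal DCT-II basis matrix C; rows are the cosine basis vectors."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * j + 1) * k / (2 * n))
                     for j in range(n)])
    return rows

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(m):
    return [list(col) for col in zip(*m)]

def dct2(x):
    """Forward 2-D DCT on a square matrix: F = C X C^T."""
    c = dct_matrix(len(x))
    return matmul(matmul(c, x), transpose(c))

def idct2(f):
    """Inverse 2-D DCT: X = C^T F C (C is orthogonal, so this is exact)."""
    c = dct_matrix(len(f))
    return matmul(matmul(transpose(c), f), c)

delta = [[0.0, 1.0], [0.0, 1.0]]  # a frame difference, dK_n
freq = dct2(delta)                # frequency components F_n
restored = idct2(freq)            # recovers dK_n up to floating-point rounding
```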
7.2 Implementation Details

Implementing inter-frame encoding requires careful consideration of several factors, including file formats, compression techniques, and computational complexity. Various file formats, such as JSON or binary formats, can be used to store image data and metadata efficiently. Additionally, compression techniques such as run-length encoding or delta encoding can further reduce storage requirements. It is essential to balance compression ratios with computational overhead to achieve optimal performance. Implementation details like metadata formats and compression techniques can be tailored to the specific requirements of the application, although here we recommend a single JSON file format to store the differences between frames and the metadata required for reconstruction.
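As an illustration of the recommended single-JSON-file layout, a cluster record might look like the following. Every field name here is hypothetical; the paper does not fix a schema:

```python
import json

# Hypothetical schema: one record per cluster, holding the inception frame
# and the per-frame differences needed for reconstruction.
cluster_record = {
    "cluster_id": 0,
    "threshold": 4,
    "inception_frame": [[10, 12], [11, 13]],
    "diffs": [
        {"t": 1, "delta": [[0, 1], [0, 0]]},
        {"t": 2, "delta": [[1, 0], [0, 1]]},
    ],
}

blob = json.dumps(cluster_record)  # serialize for storage
restored = json.loads(blob)        # metadata survives the round trip
```

A binary container would be more compact for large frames, but JSON keeps the metadata human-readable and trivially portable.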
8 Case Studies

The encoding presented here has been widely used in video compression standards such as MPEG and H.264 [9], although we recommend applying the same technique to a cluster of images. These standards leverage inter-frame encoding to achieve significant reductions in storage size while maintaining high image quality. Additionally, this encoding could have applications in medical imaging, surveillance systems, and remote sensing, where storage efficiency is critical and images contain significant temporal redundancy.
9 Conclusion

In conclusion, inter-frame coding offers a powerful solution for efficiently compressing similar image sequences, although since its first appearance this technique has been applied almost exclusively to video coding. By exploiting the temporal redundancy between consecutive frames, the proposed algorithm can achieve high compression ratios while maintaining image quality for each individual frame. The dynamic cluster formation step ensures that similar images are grouped together, enabling efficient inter-frame coding. The memory optimization and inter-frame matching step minimizes redundant processing and memory usage, further improving the efficiency of the compression algorithm. The proposed algorithm has the potential to significantly reduce the storage requirements of quick-capture devices, making it a valuable tool for a wide range of applications that rely on the uniformity of similar images presented in a time series.
References

[1] Belyaev, E. An efficient compressive sensed video codec with inter-frame decoding and low-complexity intra-frame encoding. Sensors 23, 3 (2023), 1368.

[2] Girod, B., Aaron, A. M., Rane, S., and Rebollo-Monedero, D. Distributed video coding. Proceedings of the IEEE 93, 1 (2005), 71–83.

[3] Huffman, D. A. A method for the construction of minimum-redundancy codes. Proceedings of the IRE 40, 9 (1952), 1098–1101.

[4] Koga, T., Iijima, Y., Iinuma, K., and Ishiguro, T. Statistical performance analysis of an interframe encoder for broadcast television signals. IEEE Transactions on Communications 29, 12 (1981), 1868–1876.

[5] Lee, J.-D., Wan, S.-Y., Ma, C.-M., and Wu, R.-F. Compressing sets of similar images using hybrid compression model. In Proceedings. IEEE International Conference on Multimedia and Expo (2002), vol. 1, IEEE, pp. 617–620.

[6] Narasimha, M., and Peterson, A. On the computation of the discrete cosine transform. IEEE Transactions on Communications 26, 6 (1978), 934–936.

[7] Van Leeuwen, J. On the construction of Huffman trees. In ICALP (1976), pp. 382–410.

[8] Wang, Z., Chanda, D., Simon, S., and Richter, T. Memory efficient lossless compression of image sequences with JPEG-LS and temporal prediction. In 2012 Picture Coding Symposium (2012), IEEE, pp. 305–308.

[9] Wiegand, T., Sullivan, G. J., Bjontegaard, G., and Luthra, A. Overview of the H.264/AVC video coding standard. IEEE Transactions on Circuits and Systems for Video Technology 13, 7 (2003), 560–576.
A Annex A: Utilizing Huffman Coding in MOIM

A.1 Introduction

Huffman coding [3], a popular method for lossless data compression, can significantly enhance the MOIM step by reducing the amount of data required to represent frame differences. This section explains the process and provides a practical example of its application.

A.2 Huffman Coding in MOIM

The MOIM step, crucial for minimizing redundant data and optimizing memory usage, can benefit from Huffman coding by encoding the frame differences more compactly. Huffman coding achieves this by assigning variable-length codes to input symbols, with shorter codes for more frequent symbols.
A.2.1 Example

Consider a simplified scenario where we have computed the differences between consecutive frames, resulting in the series of values:

4 1 2 4 4 2 3 1 4

Frequency of each difference value:

Value  Frequency
1      2
2      2
3      1
4      4

Applying Huffman coding to these differences:

Value  Huffman Code
4      0
1      10
2      110
3      111

Thus, the encoded sequence using Huffman codes would be:

0 10 110 0 0 110 111 10 0

This encoded sequence (17 bits) is significantly shorter than a fixed-length representation of the original values, demonstrating how Huffman coding can compress data efficiently.
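The table above can be reproduced programmatically. The sketch below builds a Huffman code with the standard two-lowest-frequency merge; when frequencies tie (here, values 1 and 2), the assigned codes may differ from the table, but any valid Huffman code yields the same 17-bit total for this sequence:

```python
import heapq
from collections import Counter

def huffman_codes(symbols):
    """Build a Huffman code table by repeatedly merging the two rarest trees."""
    freq = Counter(symbols)
    # Heap entries: (frequency, tie-breaker, partial code table).
    heap = [(f, i, {s: ""}) for i, (s, f) in enumerate(sorted(freq.items()))]
    heapq.heapify(heap)
    order = len(heap)
    while len(heap) > 1:
        f1, _, t1 = heapq.heappop(heap)
        f2, _, t2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in t1.items()}   # left subtree gets '0'
        merged.update({s: "1" + c for s, c in t2.items()})  # right gets '1'
        heapq.heappush(heap, (f1 + f2, order, merged))
        order += 1
    return heap[0][2]

diffs = [4, 1, 2, 4, 4, 2, 3, 1, 4]
codes = huffman_codes(diffs)
encoded = "".join(codes[d] for d in diffs)
assert len(encoded) == 17  # matches the worked example above
```

Note that the most frequent value, 4, always receives the single-bit code, which is what drives the savings over a fixed-length encoding.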
A.3 Conclusion

Incorporating Huffman coding into the MOIM step allows for a significant reduction in data size by efficiently encoding frame differences. This process not only saves storage space but also accelerates data processing and retrieval, making it a valuable technique for optimizing inter-frame matching and memory usage in video and image compression systems.