Take Me One More: Efficient Clustering Compression using
Inter-Frame Encoding
Ignacio Brasca
February 20, 2024
Abstract
This paper introduces a novel data encoding compression algorithm aimed at significantly reducing storage requirements for quick-capture devices, such as CCTV cameras and smartphones. By leveraging Huffman coding alongside Discrete Cosine Transform (DCT) techniques, the proposed algorithm offers an efficient solution for minimizing data file sizes. Unlike traditional image compression methods that treat each frame independently, our approach utilizes inter-frame encoding to exploit temporal redundancies between consecutive frames, achieving higher compression ratios while maintaining image quality. The algorithm's workflow, implementation details, and its application in dynamic cluster formation and memory optimization are discussed, alongside case studies demonstrating its effectiveness in various real-world scenarios.
Keywords: intraframe data encoding, compression algorithm, Huffman coding, Discrete Cosine Transform (DCT), inter-frame encoding, dynamic cluster formation, memory optimization, data compression
1 Introduction
In recent times, there has been a surge in the need for efficient data compression techniques due to the rapid expansion of quick-capture devices like CCTV cameras, smartphones, and other portable storage-constrained devices. In response to this challenge, this essay formulates a concept for an intraframe data encoding compression algorithm that reduces storage requirements. The proposed algorithm leverages the power of Huffman coding and Discrete Cosine Transform (DCT) techniques, which are highly effective in minimizing the size of data files.
Huffman coding is an entropy encoding technique that replaces a sequence of symbols with shorter codes based on their probability distribution, allowing us to save storage by exploiting redundancy [7]. DCT, in contrast, is a mathematical process [6] that decomposes an image into a series of frequency components; it likewise helps reduce redundancy, both in still pictures and in video frames (pictures in a time series). By combining these two powerful techniques, the proposed algorithm can offer significant storage savings for quick-capture devices.
The method proposed here allows us to treat not just one picture as a whole but an ecosystem of similar pictures [5] based on a timestamp technique. This enables categorization based on a difference threshold computed against the inception frame. This technique has been used in the past for video compression [8], but not for single images. This essay provides an overview of the inter-frame encoding algorithm, its workflow, implementation details, and case studies that demonstrate its effectiveness.
2 Background
Traditional image compression techniques, such as JPEG and PNG, rely on compressing individual frames independently. While these methods are effective for individual images, they may not be optimal for sequences of similar images captured by real-time devices or smartphones. Inter-frame encoding helps us overcome this limitation by exploiting the temporal redundancy between consecutive frames in a sequence. By storing only the differences between frames, inter-frame encoding can achieve higher compression ratios while preserving image quality.
2.1 Formal Definitions
1. Inception Frame (I): The first frame in a sequence of images.
2. Beam (beam_{n,m}): A data point in a frame that can be modified using a set of operations.
3. Difference Threshold (N): The maximum difference allowed between consecutive frames to be considered similar.
4. Cluster Frames (K): A group of similar images within the difference threshold.
2.2 Sequence Compression Against a Set of Frames
To introduce the underlying techniques, we define K as a set of frames that are part of a sequence of images. Ideally, a set of images computed with this technique will be limited to N frames, where N is the number of frames we want to compute the difference against within a timeframe t.

For a set of frames

K = {X_1, X_2, X_3}

once we cluster them, we can define a set of expressions that operate at the frame level as

X_k = X_n ◦ X_m

and we can also combine these to produce a new X_k frame based on the statement generated after the application of expressions.

Operations ◦ can be computed at runtime: whenever a new piece of information appears in the set of K frames, it is included in the set of new data points K.

A data point that is part of K is called a beam and belongs to a set beams_{n,m}, where n, m are the dimension restrictions of the matrix containing all the beams available in the source of information presented as a frame in K.
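To make this notation concrete, the frame-level operation X_k = X_n ◦ X_m can be sketched with frames modelled as numpy matrices of beams. This is an illustrative sketch only: the choice of subtraction for ◦ and the helper name `combine_frames` are our own, not defined by the paper.

```python
import numpy as np

def combine_frames(x_n, x_m, op=np.subtract):
    """Apply a frame-level operation (the paper's '◦') to two frames.

    Frames are 2-D arrays of beams; `op` stands in for whatever
    elementwise operation the cluster defines.
    """
    if x_n.shape != x_m.shape:
        raise ValueError("frames must share the beam dimensions n, m")
    # Widen to a signed type so the operation cannot wrap around.
    return op(x_n.astype(np.int16), x_m.astype(np.int16))

# Two frames from a set K = {X1, X2, X3}
x1 = np.zeros((4, 4), dtype=np.uint8)
x2 = np.full((4, 4), 3, dtype=np.uint8)
x_k = combine_frames(x2, x1)  # X_k = X2 ◦ X1 with ◦ chosen as subtraction
```

Any invertible elementwise operation could play the role of ◦ here; subtraction is the natural choice for the difference-based encoding described later.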
2.3 Restrictions
There are a few restrictions we need to take into account when computing the difference between frames in a sequence of images:

1. We always need to compute the same amount of information in a finite combination of RGB values; this means each frame should contain at least max(X_n) pixels, where X_n is the frame with the highest color depth (information per pixel).

2. len(K) should always be greater than 1, where K is the set of frames we want to compute the difference against.

3. X_n ◦ X_m should always output a valid matrix as a result.
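The three restrictions above can be checked before any differences are computed. The following is a minimal sketch under our own interpretation of the rules (e.g. reading "valid matrix" as "all frames share one shape"); the paper states them only informally.

```python
import numpy as np

def validate_cluster(frames):
    """Check the three restrictions on a candidate set of frames K."""
    # Restriction 2: len(K) must be greater than 1.
    if len(frames) <= 1:
        return False
    # Restriction 1: every frame must carry at least as much
    # information as the richest frame max(X_n).
    max_pixels = max(f.size for f in frames)
    if any(f.size < max_pixels for f in frames):
        return False
    # Restriction 3: X_n ◦ X_m must yield a valid matrix, which we
    # take to mean all frames share the same shape.
    first = frames[0].shape
    return all(f.shape == first for f in frames)
```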
Operations available at the beam level take precedence over any pixel-level transformation technique, and they are always applied from the inter-frame perspective of the information.
In case multiple beams x_{i,j} are intertwined, we can define a new set of operations to establish which part of the frames we want to modify and for how long.
2.4 Inception frame
Currently, compression against a set of frames is computed against a linear relationship where K frames are only compressed after N times during a span t. Presented here is the utilization of a global difference matrix called K_global, in which we track a set of values on the same matrix as if it were a memory registry.

Starting from frame n, we can compute and utilize the same amount of beams_{n,m} already stored from the initial inception frame.

From an inception frame, we should always compute the difference against the same number of frames, where n is the frame we want to compute the difference against, and then evaluate whether the difference threshold is small enough for the frame to be considered part of the same cluster.

Operations ◦ can be computed at runtime against a set of beams_{n,m}, where n, m are the dimension restrictions of the matrix containing all the beams available in the source of information presented as a frame in K. This information is beneficial not only for calculating the difference between frames but also for recomputing the inception frame to obtain the same set of frames in a different time span t.
2.5 Reverting frames
In order to revert an operation ◦, we can simply apply the inverse operation ◦^{-1} to the frame X_k and obtain the original frame X_n or X_m. This operation is only possible if the original frame X_n or X_m is available in the set of frames K.
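With ◦ chosen as subtraction, the inverse operation ◦^{-1} is addition, and reverting is a one-line round trip. A minimal sketch (the concrete values are illustrative only):

```python
import numpy as np

# Forward: X_k = X_n ◦ X_m with ◦ = subtraction.
# Revert:  X_n = X_k ◦^{-1} X_m with ◦^{-1} = addition,
# possible only because X_m is still available in K.
x_m = np.array([[10, 20], [30, 40]], dtype=np.int16)
x_n = np.array([[12, 19], [30, 45]], dtype=np.int16)

x_k = x_n - x_m        # apply ◦
recovered = x_k + x_m  # apply ◦^{-1}

assert np.array_equal(recovered, x_n)
```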
3 Architecture
The proposed two-step architecture for processing frames in a cluster of pictures is designed to efficiently analyze a series of images while minimizing redundancy and memory usage. The first part involves dynamically defining image clusters, where each cluster consists of images with similar content up to a specified threshold. Images exceeding the threshold are considered distinct and initiate a new cluster. The second part focuses on optimizing memory and efficiently matching differences between frames to avoid processing redundant information.
The two steps in the proposed architecture can be
named as follows:
1. Dynamic Cluster Formation (DCF)
2. Memory Optimization and Inter-frame Matching
(MOIM)
3.1 Dynamic Cluster Formation
The technique described here, Dynamic Cluster Formation (from now on, DCF), is a step that continuously checks for changes in the input stream of images based on an initial timestamp. When a difference greater than a defined threshold is detected, the current cluster is considered complete and a new cluster begins. This step ensures that similar images are grouped together, allowing for efficient inter-frame encoding.
The DCF step can be implemented using a variety of techniques, such as threshold-based clustering, k-means clustering, or hierarchical clustering. The choice of clustering technique depends on the specific requirements of the application and the nature of the input data. For images, we recommend the usage of a threshold-based clustering technique, where the difference between consecutive frames is compared against a user-defined threshold during a timeframe t.
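The recommended threshold-based variant of DCF can be sketched as follows. The mean absolute difference against the cluster's inception frame is our own choice of metric; the paper leaves the distance measure to the implementer.

```python
import numpy as np

def dcf(frames, threshold):
    """Dynamic Cluster Formation (DCF), threshold-based sketch.

    Consecutive frames stay in the current cluster while their mean
    absolute difference from the cluster's inception frame is within
    `threshold`; a larger difference closes the cluster and starts a
    new one with the offending frame as the new inception frame.
    """
    clusters, current, inception = [], [], None
    for frame in frames:
        if inception is None:
            inception, current = frame, [frame]
            continue
        diff = np.abs(frame.astype(np.int16)
                      - inception.astype(np.int16)).mean()
        if diff > threshold:
            clusters.append(current)          # complete the cluster
            inception, current = frame, [frame]  # start a new one
        else:
            current.append(frame)             # similar enough
    if current:
        clusters.append(current)
    return clusters
```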
[Figure 1: Dynamic Cluster Formation (DCF) Step Diagram — flowchart: read input → is the difference > threshold? → yes: complete the current cluster and start a new one; no: continue reading input.]
The DCF step is essential for efficiently grouping similar images into clusters, which can then be processed using the technique described in the next section.
3.2 Memory Optimization and Inter-frame Matching
The MOIM step aims to minimize redundant processing and memory usage. It matches differences between frames and avoids reprocessing frames with similar information. This optimization step is crucial for achieving high compression ratios and efficient processing (see Figure 2).
Proposed here is the utilization of a global difference matrix called K_global, in which we track a set of values on the same matrix as if it were a memory registry. This matrix is used to compute the difference between frames and then evaluate whether the difference threshold is small enough for the frames to be considered part of the same cluster.
Huffman coding and Discrete Cosine Transform (DCT) techniques can be applied to the differences between frames to achieve high compression ratios. This step also involves managing memory efficiently to minimize the computational overhead of processing differences (see Annex A for more information). In collaboration with the first step of this pipeline, MOIM ensures that the encoding algorithm can achieve high compression ratios while maintaining image quality.
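The K_global registry at the heart of MOIM can be sketched as below. How K_global is updated is not fully specified in the text, so this sketch assumes it accumulates the latest per-beam differences, letting unchanged beams be skipped on the next frame; the class and method names are our own.

```python
import numpy as np

class KGlobal:
    """Global difference matrix used as a memory registry (sketch)."""

    def __init__(self, shape):
        self.registry = np.zeros(shape, dtype=np.int16)

    def update(self, frame, previous):
        """Record which beams changed between two frames.

        Returns the signed difference and a boolean mask of the beams
        that actually moved, so redundant beams need no reprocessing.
        """
        diff = frame.astype(np.int16) - previous.astype(np.int16)
        changed = diff != 0
        self.registry[changed] = diff[changed]
        return diff, changed
```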
[Figure 2: Memory Optimization and Inter-frame Matching (MOIM) Step Diagram — pipeline: input frames → compute frame difference → K_global matrix update → apply Huffman coding → compressed storage.]
4 Clustering Encoding
The inter-frame encoding algorithm works by computing the difference between consecutive frames inside a cluster in a time sequence t. Let K_t represent the t-th frame in the sequence, and K_{t-1} the previous frame. The difference between K_t and K_{t-1} is computed as follows:

ΔK_t = K_t − K_{t−1}

The difference ΔK_t is then stored using the techniques described in the previous section, along with any additional information required for reconstruction.
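One practical pitfall when computing ΔK_t = K_t − K_{t−1} directly on pixel data: unsigned 8-bit values wrap around on subtraction, so frames should be widened to a signed type first. A minimal sketch with illustrative values:

```python
import numpy as np

k_prev = np.array([[200, 10]], dtype=np.uint8)
k_t    = np.array([[190, 15]], dtype=np.uint8)

# Widen to int16 so negative differences survive.
delta = k_t.astype(np.int16) - k_prev.astype(np.int16)   # ΔK_t

# Reconstruction inverts the difference: K_t = ΔK_t + K_{t−1}.
restored = (delta + k_prev.astype(np.int16)).astype(np.uint8)
assert np.array_equal(restored, k_t)
```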
The inter-frame encoding algorithm can be applied to a wide range of applications, including video compression [1] [2] [4], medical imaging, and remote sensing. By exploiting temporal redundancy between consecutive frames, inter-frame encoding can achieve high compression ratios while maintaining image quality. As shown in Figure 3, this process is repeated for the entire sequence of K frames.
[Figure 3: Inter-frame Encoding Diagram — the inception frame followed by the chain of frames K_n, K_{n−1}, K_{n−2}, linked by their differences.]
5 Performance Metrics
Performance relies on the reduction of redundancy and the minimization of memory usage. The following metrics can be used to evaluate the performance of the encoding algorithm:
1. Compression Ratio: The ratio of the original file size to the compressed file size.

2. Memory Usage: The amount of memory required to store the compressed metadata and reconstructed frames.

3. Processing Time: The time required to encode and decode the differences.

4. Image Quality: The visual quality of the reconstructed frames compared to the original frames.

5. Number of Clusters: The number of clusters formed during the dynamic cluster formation step.
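The first and fourth metrics above have standard closed forms; a minimal sketch, using PSNR as a common proxy for the visual-quality metric (the paper does not prescribe a specific quality measure):

```python
import numpy as np

def compression_ratio(original_bytes, compressed_bytes):
    """Metric 1: original size over compressed size (higher is better)."""
    return original_bytes / compressed_bytes

def psnr(original, reconstructed, max_val=255.0):
    """Metric 4 proxy: peak signal-to-noise ratio in dB.

    PSNR = 10 * log10(MAX^2 / MSE); infinite for a perfect
    reconstruction (zero mean squared error).
    """
    mse = np.mean((original.astype(np.float64)
                   - reconstructed.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(max_val ** 2 / mse)
```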
6 Proposed Workflow
The workflow for inter-frame encoding involves several steps:

1. Read base image (I_0) from disk starting from the cluster defined in DCF.

2. Iterate over the sequence of images until the end of the cluster.

3. Compute the difference between each consecutive frame using DCF.

4. Store the differences in a compressed metadata format using MOIM.

5. Repeat the process for the entire sequence of images inside the storage utilized.
The following diagram illustrates the proposed
workflow for inter-frame encoding:
[Figure 4: Proposed Workflow for Inter-frame Encoding — read base image I_0 from disk → iterate over the image sequence → compute frame differences with DCF → store differences with MOIM → repeat for the whole sequence.]
7 Implementation
This section outlines the implementation of the proposed algorithm, focusing on the pseudo-algorithm for the theoretical approach.
7.1 Pseudo-Algorithm
The pseudo-algorithm describes the process of compressing and decompressing image frames using inter-frame encoding based on temporal redundancy, leveraging Huffman coding and DCT techniques to improve per-image compression ratios, and relying on the DCF and MOIM techniques described above, which work at the storage level. The process is presented in mathematical terms to illustrate the relationship with the theory described in this document.
1. Initialization:

   Set the inception frame I_0 as the reference frame.

   Initialize an empty set S for storing compressed frame differences.

2. Forming K using DCF (Dynamic Cluster Formation):

   (a) Continuously form K clusters based on the difference threshold from a set of frames.

   (b) Store the generated clusters as part of a set K.

   (c) Set the first frame K_0 as the reference frame (also known as the inception frame I_0).

   (d) Implement the MOIM step to minimize redundant processing and memory usage (see Annex A).

3. For each frame K_n inside K:

   (a) Compute the difference ΔK_n = K_n − K_{n−1}.

   (b) Apply the DCT to ΔK_n to obtain frequency components F_n.

   (c) Encode F_n using Huffman coding to produce a compressed representation C_n.

   (d) Add C_n to the set S.
4. Decompression:

   (a) Apply the differences stored after MOIM to obtain the difference between frames.

   (b) From the frames obtained after MOIM, apply Huffman decoding to obtain F_n.

   (c) Apply the inverse DCT to F_n to obtain ΔK_n.

   (d) Reconstruct each frame as K_n = ΔK_n + K_{n−1}.
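The difference/DCT stages of the pseudo-algorithm above can be sketched end to end. This is our own illustration under two simplifying assumptions: frames are square (so one orthonormal DCT-II matrix serves both axes), and the Huffman stage is left as a stand-in, since without quantization the DCT coefficients are already fully invertible.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix; its transpose is the inverse DCT."""
    k = np.arange(n)[:, None]
    x = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * x + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)
    return c

def compress_cluster(frames):
    """Steps 3(a)-(d): difference each frame, transform with the DCT.

    Real code would quantize F_n and feed it to a Huffman coder; here
    S simply collects the coefficient matrices.
    """
    c = dct_matrix(frames[0].shape[0])
    s, prev = [], frames[0].astype(np.float64)
    for frame in frames[1:]:
        delta = frame.astype(np.float64) - prev   # ΔK_n = K_n − K_{n−1}
        s.append(c @ delta @ c.T)                 # F_n = DCT(ΔK_n)
        prev = frame.astype(np.float64)
    return s

def decompress(inception, s):
    """Step 4: invert the DCT and re-accumulate the differences."""
    c = dct_matrix(inception.shape[0])
    frames = [inception.astype(np.float64)]
    for f_n in s:
        delta = c.T @ f_n @ c                     # ΔK_n = IDCT(F_n)
        frames.append(frames[-1] + delta)         # K_n = ΔK_n + K_{n−1}
    return frames
```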
This section provides a framework for the practical application of the proposed algorithm, detailing the steps required to read, extract, and apply differences between frames based on metadata, thus illustrating the algorithm's use in a real-world scenario.
7.2 Implementation Details
Implementing inter-frame encoding requires careful consideration of several factors, including file formats, compression techniques, and computational complexity. Various file formats, such as JSON or binary formats, can be used to store image data and metadata efficiently. Additionally, compression techniques such as run-length encoding or delta encoding can further reduce storage requirements. It is essential to balance compression ratios with computational overhead to achieve optimal performance. Implementation details such as metadata formats and compression techniques can be tailored to the specific requirements of the application, although here we recommend a single JSON file format to store the differences between frames and the metadata required for reconstruction.
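The single-JSON-file layout recommended above could look like the sketch below. Every field name, path, and value here is our own assumption for illustration; the paper does not define a schema.

```python
import json

# Hypothetical cluster record: an inception-frame reference plus one
# entry per subsequent frame carrying its encoded difference.
cluster_record = {
    "inception_frame": "cluster_042/I0.raw",   # illustrative path
    "threshold": 12,                            # DCF difference threshold
    "frames": [
        {
            "t": 1,
            "diff": "BASE64_HUFFMAN_BITS",      # placeholder payload
            "huffman_table": {"0": 4, "10": 1}, # code -> difference value
        },
    ],
}

serialized = json.dumps(cluster_record)
restored = json.loads(serialized)
```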
8 Case Studies
The encoding presented here has been widely used in video compression standards such as MPEG and H.264 [9] in the past, although we recommend applying the same technique to a cluster. These standards leverage inter-frame encoding to achieve significant reductions in storage size while maintaining high-quality images. Additionally, the encoding could have applications in medical imaging, surveillance systems, and remote sensing, where storage efficiency is critical and images contain significant temporal redundancy.
9 Conclusion
In conclusion, inter-frame coding offers a powerful solution for compressing similar image sequences efficiently; since its first appearance, however, this technique has been applied almost exclusively to video coding. By exploiting the temporal redundancy between consecutive frames, the proposed algorithm can achieve high compression ratios while maintaining the quality of each individual image. The dynamic cluster formation step ensures that similar images are grouped together, which enables efficient inter-frame coding. The memory optimization and inter-frame matching step minimizes redundant processing and memory usage, further improving the efficiency of the compression algorithm. The proposed algorithm has the potential to significantly reduce the storage requirements of quick-capture devices, making it a valuable tool for a wide range of applications that rely on the uniformity of similar images presented in a time series.
References
[1] Belyaev, E. An efficient compressive sensed video codec with inter-frame decoding and low-complexity intra-frame encoding. Sensors 23, 3 (2023), 1368.

[2] Girod, B., Aaron, A. M., Rane, S., and Rebollo-Monedero, D. Distributed video coding. Proceedings of the IEEE 93, 1 (2005), 71–83.

[3] Huffman, D. A. A method for the construction of minimum-redundancy codes. Proceedings of the IRE 40, 9 (1952), 1098–1101.

[4] Koga, T., Iijima, Y., Iinuma, K., and Ishiguro, T. Statistical performance analysis of an interframe encoder for broadcast television signals. IEEE Transactions on Communications 29, 12 (1981), 1868–1876.

[5] Lee, J.-D., Wan, S.-Y., Ma, C.-M., and Wu, R.-F. Compressing sets of similar images using hybrid compression model. In Proceedings. IEEE International Conference on Multimedia and Expo (2002), vol. 1, IEEE, pp. 617–620.

[6] Narasimha, M., and Peterson, A. On the computation of the discrete cosine transform. IEEE Transactions on Communications 26, 6 (1978), 934–936.

[7] Van Leeuwen, J. On the construction of Huffman trees. In ICALP (1976), pp. 382–410.

[8] Wang, Z., Chanda, D., Simon, S., and Richter, T. Memory efficient lossless compression of image sequences with JPEG-LS and temporal prediction. In 2012 Picture Coding Symposium (2012), IEEE, pp. 305–308.

[9] Wiegand, T., Sullivan, G. J., Bjontegaard, G., and Luthra, A. Overview of the H.264/AVC video coding standard. IEEE Transactions on Circuits and Systems for Video Technology 13, 7 (2003), 560–576.
A Annex A: Utilizing Huffman Coding in MOIM
A.1 Introduction
Huffman coding [3], a popular method for lossless data compression, can significantly enhance the MOIM step by reducing the amount of data required to represent frame differences. This section explains the process and provides a practical example of its application.
A.2 Huffman Coding in MOIM
The MOIM step, crucial for minimizing redundant
data and optimizing memory usage, can benefit from
Huffman coding by encoding the frame differences
more compactly. Huffman coding achieves this by
assigning variable-length codes to input characters,
with shorter codes for more frequent characters.
A.2.1 Example
Consider a simplified scenario where we have computed the differences between consecutive frames, resulting in a series of values. Given the frame differences:

412442314
Frequency of each difference value:
Value Frequency
1 2
2 2
3 1
4 4
Applying Huffman coding to these differences:
Value Huffman Code
4 0
1 10
2 110
3 111
Thus, the encoded sequence using Huffman codes would be:

0 10 110 0 0 110 111 10 0

This encoded sequence is significantly shorter than the original representation, demonstrating how Huffman coding can compress data efficiently.
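The code table above can be derived programmatically with a standard heap-based Huffman construction. Exact codewords depend on how ties are broken (this sketch assigns a different but equally optimal table than the one shown), yet the total encoded length, 17 bits for this example, and the rule that the most frequent value gets the shortest code are invariant.

```python
import heapq
from collections import Counter

def huffman_codes(symbols):
    """Build a Huffman code table for a sequence of difference values."""
    freq = Counter(symbols)
    # Heap entries carry a tie-breaking counter so dicts never compare.
    heap = [(f, i, {s: ""}) for i, (s, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)       # two least frequent trees
        f2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (f1 + f2, count, merged))
        count += 1
    return heap[0][2]

codes = huffman_codes("412442314")
encoded = "".join(codes[s] for s in "412442314")
```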
A.3 Conclusion
Incorporating Huffman coding into the MOIM step allows for a significant reduction in data size by efficiently encoding frame differences. This process not only saves storage space but also accelerates data processing and retrieval, making it a valuable technique for optimizing inter-frame matching and memory usage in video and image compression systems.