COMPUTATIONAL INTELLIGENCE FOR MOVIE EDITING
Dharmasiri J.M.S.R 1, Herath K.M.G.K. 2
1Cardiff School of Technologies, Cardiff Metropolitan University, UK
2Department of Information Technology, Faculty of Information Technology and Sciences,
International College of Business and Technology, Sri Lanka
shenalgtr@gmail.com
Abstract
Filmmaking is a multi-billion dollar industry in the modern world, and the movie-making
process can be subdivided into pre-production, production, and post-production. Post-production,
popularly known as editing, is a tedious process that demands both technical and artistic
capability. The techniques still in use are decades old and involve both machines and manpower.
In this study, the authors have attempted to study the applicability of Artificial Intelligence
to film editing. The study was conducted by identifying different methods of film editing and
selecting editing based on shot size and emotion classification. A formula was derived using
the ratio of character size to frame size, and a dataset was developed based on it. The
application developed using a CNN is able to identify character emotions and shot types with
high accuracy; therefore, the outcome of the study suggests that AI can be used successfully
in film editing.
Key Words: CNN, Shot type, Film editing
Introduction
Filmmaking is a multi-billion dollar industry in the modern world. Since its humble
beginnings in the late 19th century with the "kinetoscope" (a box-like structure that showed
moving pictures), it has evolved into a powerful medium of communication, propaganda, and
entertainment (silentcinemasociety, 2015).
The post-production stage in moviemaking takes up more time than even the production stage of
the movie. The creation of a typical mainstream Hollywood movie generally takes 1-2 years
(Follows, 2018), of which nearly one year is allocated to the post-production stage. Filmmakers
have an arsenal of tools at their disposal to portray a story. With these, they have the ability
to tell a story in an interesting fashion without having to portray it directly for the whole
duration of the show, as in a stage drama. They achieve this through the use of different shots
arranged in a careful pattern (Pruitt, 2017).
As a Hollywood movie uses several thousand shots during shooting (Follows, 2018), cutting and
joining them can take a very long time, since several factors need to be considered when joining
two shots: actor continuity, the dialogue, the type of scene (whether emotional or action, etc.),
the average shot length (ASL), and a plethora of other variables (Mcgregor, 2017).
Artificial intelligence has been used successfully in many domains, and its footprint is now
visible in movie editing as well. IBM Watson (Smith, 2016) was able to create a movie trailer
from a full movie with the use of AI, and (IntelligentHQ, 2019; Brown, 2019) suggest that AI
features are now embedded in modern industry editing tools, especially for automatic image
correction and adding effects. The main aim of this study is to investigate the possibility of
using AI to assist the editor in selecting suitable shot types according to the script. The
study focuses on identifying shots and categorizing them, so that editors will be able to select
from a category and use it according to the movie script. Shots can mainly be categorized
according to shot size and camera angle. In this study only shot size was selected, owing to
the complexity of identifying the exact camera angle from an image. By shot size, a shot can be
categorized as Extreme Long Shot (ELS), Long Shot / Wide Shot (LS/WS), Full Shot (FS), Medium
Long Shot / Medium Wide Shot (MLS/MWS), Cowboy Shot (CS), Medium Shot (MS), Medium Close Up
(MCU), Close Up (CU), or Extreme Close Up (ECU) (studiobinder, 2019); shots can also be
categorized based on facial emotions.
Methodology
In the movie editing process, each editor has their own personal traits and beliefs. Therefore,
when selecting the sample movie shots, the authors requested six well-known movie editors in
Sri Lanka to provide samples of each type of shot, so as to avoid the personal bias of any single
editor. The dataset consists of 653 images of the selected shot types; 600 images were used for
training and the rest were used for cross-validation tests. After analyzing the sample shots, it
was evident that the frame height also varies depending on the movie type. Based on these
findings, the algorithm mentioned below is proposed for shot-type classification. Instead of
calculating the actual face size, the height at which the face is displayed in the frame is
calculated in pixels.
Character Face Height = CH
Frame Height = FH
CH to FH Ratio = CH / FH
Using the developed algorithm, the calculation is performed to find the ratio values for each
shot type. Below is a sample ratio calculation for the Cowboy shot type.
Face co-ordinates (px): X1 = 285, Y1 = 103; X2 = 263, Y2 = 252
Frame Height (FH) = 1080px
Character Face Height (CH) = Y2 - Y1 = 252 - 103 = 149px
CH to FH Ratio = 149 / 768 = 0.194

Figure 1: Cowboy shot (frame to character ratio)
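For illustration, the sample calculation above can be expressed in a few lines of Python. This is only a sketch: the function name is ours, and the 768 px denominator is the frame height that reproduces the sample ratio of 0.194.

```python
def ch_fh_ratio(y1: int, y2: int, frame_height: int) -> float:
    """Character face height (CH) to frame height (FH) ratio."""
    character_face_height = abs(y2 - y1)        # CH in pixels
    return character_face_height / frame_height

# Bounding-box values from the Cowboy-shot example above.
print(round(ch_fh_ratio(103, 252, 768), 3))     # 149 / 768 -> 0.194
```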
The following is a summary of the obtained values. By performing calculations on images of the
same shot type, a minimum and maximum range were identified for each type. The obtained ranges
were then corrected in accordance with the ranges of the neighboring shot types.
Table 1: Shot type range results

Shot Type            Obtained Range
Extreme Long Shot    0.004 < R < 0.021
Long Shot            0.055 < R < 0.638
Full Shot            0.12 < R < 0.128
Medium Long Shot     0.114 < R < 0.132
Cowboy Shot          0.1398 < R < 0.194
Medium Shot          0.291 < R < 0.376
Medium Close Up      0.400 < R < 0.562
Close Up             0.598 < R < 0.634
Extreme Close Up     0.82 < R < 0.869
As shown in Table 1, it was not possible to identify accurate range values for the full shot and
the medium long shot. The underlying reason for the unclear ranges is that a shot some would call
a "full shot" others would call a "medium long shot"; therefore, both shots can be classified
into one group.
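To make the range-based classification concrete, the sketch below maps a CH/FH ratio to candidate shot-type labels. The boundaries are copied from Table 1; the function name and the sample ratio are illustrative only, not part of the authors' implementation.

```python
# Rule-based lookup using the ranges reported in Table 1.
SHOT_RANGES = [
    ("Extreme Long Shot", 0.004,  0.021),
    ("Long Shot",         0.055,  0.638),
    ("Full Shot",         0.120,  0.128),
    ("Medium Long Shot",  0.114,  0.132),
    ("Cowboy Shot",       0.1398, 0.194),
    ("Medium Shot",       0.291,  0.376),
    ("Medium Close Up",   0.400,  0.562),
    ("Close Up",          0.598,  0.634),
    ("Extreme Close Up",  0.820,  0.869),
]

def shot_types_from_ratio(ratio):
    """Return every shot type whose Table 1 range contains the given CH/FH ratio.

    Overlapping ranges (e.g. Full Shot vs. Medium Long Shot) yield more than
    one label, reflecting the ambiguity discussed above.
    """
    return [name for name, lo, hi in SHOT_RANGES if lo < ratio < hi]

print(shot_types_from_ratio(0.125))  # ['Long Shot', 'Full Shot', 'Medium Long Shot']
```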
As the majority of the project deals with image processing, the software was created in Python
3.6, owing to the plethora of image processing libraries available for Python.
OpenCV 2 was used to detect faces from the web camera input using the Haar cascade classifier
(Viola & Jones, 2001). In the cascade classifier, not all face features are considered at once;
instead, the features are grouped into different stages of classifiers that are applied one by
one. Several dependencies were also used in the process: Keras 2.0, TensorFlow 1.1.0, Pandas
0.19.1, NumPy 1.12.1, H5py 2.7.0, statistics, and opencv-python 3.2.0. From the input image
frame, the application identifies the character's face. Identification of the face is done using
a Convolutional Neural Network (CNN) developed based on the model "Real-time Convolutional
Neural Networks for Emotion and Gender Classification" (Arriaga et al., 2013); the CNN was
trained using the FER-2013 dataset, which contains over 35,000 images, and the IMDB gender
database, which contains 460,723 images. The detected face is then used to calculate the height
of the face and the frame height in pixels. These values are passed to the ratio algorithm to
calculate the ratio, and the ratio is then passed to the neural network for the classification
of shot types.
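The detection-and-ratio step can be sketched as follows. This is a minimal illustration, assuming the stock frontal-face Haar cascade bundled with opencv-python (in very old builds cv2.data.haarcascades may not exist, in which case the XML path has to be given explicitly); the CNN classification stage is omitted.

```python
import cv2

# Stock frontal-face Haar cascade shipped with opencv-python.
face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def frame_ratio(frame):
    """Return the CH/FH ratio for the tallest detected face, or None."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return None
    x, y, w, h = max(faces, key=lambda f: f[3])  # tallest detection -> CH
    return h / frame.shape[0]                    # frame.shape[0] -> FH

cap = cv2.VideoCapture(0)   # web camera input, as described above
ok, frame = cap.read()
cap.release()
if ok:
    print("CH/FH ratio:", frame_ratio(frame))
```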
Results
For the evaluation of the application, a confusion matrix was used; this was done with both the
training dataset and the cross-validation set. Shown below is the shot-type identification test,
with 50 attempts for each shot type.
Table 2: Confusion matrix for shot type identification

        ELS   LS    FS    MLS   CS    MS    MCU   CU    ECU
ELS     82%   8%    10%   0     0     0     0     0     0
LS      4%    86%   6%    4%    0     0     0     0     0
FS      0     0     78%   22%   0     0     0     0     0
MLS     0     0     22%   70%   8%    0     0     0     0
CS      0     0     0     0     96%   4%    0     0     0
MS      0     0     0     0     0     96%   2%    2%    0
MCU     0     0     0     0     0     2%    96%   2%    0
CU      0     0     0     0     0     0     2%    98%   0
ECU     0     0     0     0     0     0     0     4%    96%
After concluding the initial testing, a classification accuracy test was conducted with 50 test
cases that were not used in the initial training; it yielded a mean accuracy of 90% for correct
classification.
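For illustration, this evaluation step can be reproduced with scikit-learn as sketched below. scikit-learn is not among the dependencies listed in the methodology, and the short label lists are placeholders standing in for the real held-out test cases.

```python
from sklearn.metrics import confusion_matrix, accuracy_score

LABELS = ["ELS", "LS", "FS", "MLS", "CS", "MS", "MCU", "CU", "ECU"]

# Placeholder ground truth and classifier outputs for a handful of test cases.
y_true = ["FS", "MLS", "CS", "CU", "ECU"]
y_pred = ["FS", "FS",  "CS", "CU", "ECU"]

print(confusion_matrix(y_true, y_pred, labels=LABELS))
print("accuracy:", accuracy_score(y_true, y_pred))
```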
Discussion
Based on the findings, it is evident that the application was very successful for the medium and
close-up shot classifications, but the results for the longer shots were not as strong by AI
accuracy standards. The reason for this might be the lower clarity of the character's face in a
larger frame. The accuracy results for the full shot and the medium long shot are also low, but
as discussed in the earlier segment these two groups carry some ambiguity in their
differentiation. We can therefore suggest that the two segments be considered as one group, in
which case the identification rate would be above 97%.
Conclusions and Recommendation
This research project dealt with understanding the creative task of video editing and how it can
be automated using Artificial Intelligence. The algorithm derived for classification of the shot
type showed a high level of effectiveness. Based on the results, we argue that full shots and
medium long shots should be categorized as one group unless other distinguishing characteristics
can be identified. The application developed can, with very minor modifications, be used for
live TV show editing, since live TV relies on shot selection rather than other editing
techniques. Even though this software is not close to being a fully functional system that can
virtually replace the responsibilities of an editor, it provides insight into how AI can be used
in film editing in the future. The algorithm can be further tested with a larger dataset to
improve accuracy, and considering other parameters, such as camera angle, can advance it further.
References
FMF Resources, 2015. http://www.filmmakersfans.com. [Online]
Available at: http://www.filmmakersfans.com/film-editing-tips-jump-cut-cut-in-and-cutaways/
[Accessed 14 March 2019].
Arriaga, O., Ploger, P. G. & Valdenegro, M., 2013. Real-time Convolutional Neural
Networks for Emotion and Gender Classification, Germany: s.n.
silentcinemasociety, 2015. silentcinemasociety.org. [Online]
Available at: http://www.silentcinemasociety.org/category/a-trip-to-the-moon/
[Accessed 14 March 2019].
studiobinder, 2019. studiobinder.com. [Online]
Available at: https://www.studiobinder.com/blog/kuleshov-effect-examples/
[Accessed 16 March 2019].
Brown, L., 2019. 6 Best AI Video Editing Software and Service (Black Magic). [Online]
Available at: https://filmora.wondershare.com/business/ai-video-editing.html
[Accessed 1 June 2019].
Follows, S., 2018. https://stephenfollows.com. [Online]
Available at: https://stephenfollows.com/how-long-the-average-hollywood-movie-take-to-make/
[Accessed 19 March 2019].
IntelligentHQ, 2019. How Artificial Intelligence is Transforming Video Editing. [Online]
Available at: https://www.intelligenthq.com/artificial-intelligence-transforming-video-editing/
[Accessed 15 March 2019].
Mcgregor, E., 2017. https://www.premiumbeat.com/. [Online]
Available at: https://www.premiumbeat.com/blog/filmmakers-guide-establishing-shot/
[Accessed 20 March 2019].
Viola, P. & Jones, M., 2001. Rapid Object Detection using a Boosted Cascade of Simple Features.
In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
Pruitt, S., 2017. https://www.history.com. [Online]
Available at: https://www.history.com/news/the-lumiere-brothers-pioneers-of-cinema
[Accessed 12 March 2019].
Smith, J. R., 2016. IBM Research Takes Watson to Hollywood with the First "Cognitive Movie
Trailer". [Online]
Available at: https://www.ibm.com/blogs/think/2016/08/cognitive-movie-trailer/
[Accessed 26 March 2019].