Figure 2: Examples of eight different shot sizes: Extreme Close-up, Close-up, Medium Close-up, Medium shot, Medium Full shot, Full shot, Long shot, Extreme Long shot.

Source publication
Thesis
The wide availability of high-resolution 3D models and the facility to create new geometrical and animated content, using low-cost input devices, opens to many the possibility of becoming digital 3D storytellers. To date there is however a clear lack of accessible tools to easily create the cinematography (positioning and moving the cameras to crea...

Citations

... There are few studies that specifically address this topic in video games, especially ones that work with virtual scenes in real time. An example is the work conducted by Galvane [32], who proposes a system based on the Reynolds steering behavior model to control and coordinate a collection of autonomous camera agents that move in dynamic 3D environments with the objective of filming events at multiple scales. In addition, Galvane proposes an approach motivated by the importance of cinematic replays in games, taking advantage of narrative and geometric information to automatically compute camera trajectories and edits at interactive rates. ...
Article
In this paper, we present a novel approach for optimal camera selection in video games. The approach explores the use of information-theoretic metrics, f-divergences, to measure the correlation between the objects as viewed in the camera frustum and the ideal or target view. The f-divergences considered are the Kullback–Leibler divergence (or relative entropy), the total variation, and the χ2 divergence. Shannon entropy is also used for comparison purposes. Visibility is measured using the differential form factors from the camera to the objects and is computed by casting rays with importance-sampling Monte Carlo. Our method allows a very fast dynamic selection of the best viewpoints, which can take into account changes in the scene, in the ideal or target view, and in the objectives of the game. Our prototype is implemented in the Unity engine, and our results show an efficient selection of the camera and an improved visual quality. The most discriminating results are obtained with the Kullback–Leibler divergence.
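As a rough, non-authoritative illustration of this kind of viewpoint scoring (not the authors' implementation), the Python sketch below ranks candidate cameras by the Kullback–Leibler divergence between an estimated per-object visibility distribution and a target distribution; the camera names, visibility values, and target weights are invented for the example.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """D_KL(p || q) for discrete distributions over the same set of objects."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p /= p.sum()
    q /= q.sum()
    return float(np.sum(p * np.log(p / q)))

def best_viewpoint(visibility_by_camera, target):
    """visibility_by_camera: {camera_id: per-object visibility estimates
    (e.g. fraction of sampled rays hitting each object)}.
    target: the ideal per-object distribution for the current game objective.
    Returns the camera whose view diverges least from the target, plus all scores."""
    scores = {cam: kl_divergence(vis, target)
              for cam, vis in visibility_by_camera.items()}
    return min(scores, key=scores.get), scores

# Hypothetical example: three candidate cameras, three scene objects.
cameras = {
    "cam_A": [0.70, 0.20, 0.10],
    "cam_B": [0.30, 0.40, 0.30],
    "cam_C": [0.05, 0.05, 0.90],
}
target = [0.34, 0.33, 0.33]             # e.g. all objects equally important
print(best_viewpoint(cameras, target))  # cam_B has the lowest divergence
```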
... Automated camera management systems were already being proposed at the beginning of the current century; see Rui, He, Gupta, and Liu (2001). Galvane's PhD thesis offers a comprehensive overview of the literature on camera steering and path planning; see Galvane (2015). ...
...
• declarative representations of production grammar, for example Christianson et al. (1996),
• several particular machine learning approaches, for example Pardo, Caba, Alcázar, Thabet, and Ghanem (2021), which looks at cuts, a cascaded deep-learning architecture presented in H. Jiang et al. (2020), deep reinforcement learning applied in Gschwindt et al. (2019), and a neural network in Okuda et al. (2009),
• Random Forests for classification and prediction (J. Chen et al., 2018; Fujisawa et al., 2016),
• Hidden Markov Models (HMMs) (Leake et al., 2017; Merabti et al., 2014, 2015),
• Semi-Markov models (Galvane, 2015),
• Bayesian Networks (Lavee, Rivlin, & Rudzsky, 2009),
• support vector machines (SVMs) (de Lima et al., 2009a),
• graphical models,
• ontology-based approaches (Lefevre et al., 2021; Ramírez, 2005),
• (hierarchically organised) finite state machines,
• recurrent decision trees,
• constraint satisfaction techniques (Janzen, Horsch, & Neufeld, 2011),
• and many more. ...
... This planner is only one part of a complex decision making pipeline, as they also developed a "narrative planner that, together with a bank of autonomous character directors, creates cinematic goals for a constraint-based realtime 3D virtual cinematography planner" (Bares, Grégoire, & Lester, 1998). Galvane (2015) also explores the space of automatic cinematography in virtual environments. He experiments with narrative-driven camera control and Semi-Markov models for editing. ...
Thesis
For recorded video content, researchers have proposed advanced concepts and approaches that enable the automatic composition and personalised presentation of coherent videos. This is typically achieved by selecting from a repository of individual video clips and concatenating a new sequence of clips based on some kind of model. However, there is a lack of generic concepts dedicated to enabling such video-mixing functionality for scenarios based on live video streams. This thesis aims to address this gap and explores how a live vision-mixing process could be automated in the context of live television production and, consequently, also extended to other application scenarios. This approach is coined the 'Virtual Director' concept. The name of the concept is inspired by the decision-making processes that human broadcast TV directors conduct when vision-mixing live video streams stemming from multiple cameras. Understanding what is currently happening in the scene, they decide which camera view to show, at what point in time to switch to a different perspective, and how to adhere to cinematographic and cinematic paradigms while doing so. While the automation of vision mixing is the focus of this thesis, it is not the ultimate goal of the underlying vision. Automating decisions for many viewers in parallel, in a scalable manner, allows decisions to be taken for each viewer or group of viewers individually. Doing so successfully allows moving away from a broadcast model in which every viewer gets to see the same output. Particular content adaptation and personalisation features may provide added value for users. Preferences can be expressed dynamically, enabling interactive media experiences.
In the course of this thesis, Virtual Director research prototypes are developed for three distinct application domains. Firstly, for distributed theatre performance, a script-based approach and a set of software tools are designed. A basic approach for the decision-making process and a pattern for decoupling it into two core components are proposed. A trial validates the technology, which does not implement full automation, yet successfully enables a theatre play. The second application scenario is live event 'narrowcast', a term used to denote the personalised equivalent of a 'broadcast'. In the context of this scenario, several computational approaches are considered for the implementation of an automatic Virtual Director, concluding with a recommendation to use a combination of (complex) event processing engines and event-condition-action (ECA) rules to model the decision-making behaviour. Several content genres are subject to experimentation. Evaluation interviews provide detailed feedback on the specific research prototypes as well as on the Virtual Director concept in general. In the third application scenario, group video communication, the most mature decision-making behaviour is achieved. This behaviour needs to be defined in what can be a challenging process and is formalised in a model referred to as the 'production grammar'. The aforementioned pattern is realised such that a 'Semantic Lifting' process processes low-level cue information to derive, in more abstract, higher-level terms, what is currently happening in the scene. The output of the Semantic Lifting process informs and triggers the second process, the 'Director' decision making, which eventually takes decisions on how to present the available content on screens.
Overall, the exploratory research on the Virtual Director concept resulted in its successful application in the three domains, validated by stakeholder feedback and a range of informal and formal evaluation efforts. As a synthesis of the research in the three application scenarios, the thesis includes a detailed description of the Virtual Director concept. This description is contextualised by many detailed learnings that are considered relevant for both scholars and practitioners regarding the development of such technology.
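Purely as a hedged illustration of the Semantic Lifting plus ECA-rule pattern described in this abstract (and not code from the thesis), the Python sketch below lifts a low-level audio cue into a higher-level event and applies a single invented rule that cuts to a close-up of the person who started speaking.

```python
from dataclasses import dataclass, field

@dataclass
class Rule:
    """Event-condition-action rule: on `event`, if `condition(state, event)`, run `action(state, event)`."""
    event: str
    condition: callable
    action: callable

@dataclass
class VirtualDirector:
    rules: list = field(default_factory=list)
    state: dict = field(default_factory=lambda: {"on_air": "wide", "speaker": None})

    def lift(self, cue):
        """Toy 'Semantic Lifting': map a low-level cue to a higher-level event."""
        if cue.get("type") == "audio_level" and cue.get("level", 0) > 0.6:
            return {"name": "PersonStartedSpeaking", "person": cue["source"]}
        return None

    def on_cue(self, cue):
        event = self.lift(cue)
        if event is None:
            return
        for rule in self.rules:
            if rule.event == event["name"] and rule.condition(self.state, event):
                rule.action(self.state, event)

# Hypothetical rule: cut to a close-up of whoever starts speaking.
def not_already_on_air(state, event):
    return state["speaker"] != event["person"]

def cut_to_speaker(state, event):
    state["speaker"] = event["person"]
    state["on_air"] = f"closeup_{event['person']}"
    print("CUT to", state["on_air"])

director = VirtualDirector(rules=[Rule("PersonStartedSpeaking", not_already_on_air, cut_to_speaker)])
director.on_cue({"type": "audio_level", "level": 0.8, "source": "anna"})  # prints: CUT to closeup_anna
```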
... Automatic camera control has been studied extensively in computer graphics to provide users with an effective view of a virtual environment (see [1] and [2] for surveys). Many of these existing methods rely on a fully unconstrained camera and a complete understanding of the virtual scene. ...
... The drone position is captured using an OptiTrack motion capture system, and a proportional-derivative (PD) controller is used to command the drone velocity through a Python wrapper based on the official Tello SDK. The different components of the system communicate using ROS [36]. The environment is mapped from the Kinect's point cloud using RTAB-Map [37]; the map is acquired by running an automated routine that controls the robot-mounted camera. ...
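For readers unfamiliar with this kind of control loop, a minimal sketch of a proportional-derivative velocity command follows; the gains and the 30 Hz update rate are illustrative placeholders, not values from the cited system.

```python
import numpy as np

class PDController:
    """Proportional-derivative controller producing a velocity command
    from the drone's current and desired positions (e.g. from motion capture)."""
    def __init__(self, kp=1.2, kd=0.4):   # hypothetical gains
        self.kp, self.kd = kp, kd
        self.prev_error = None

    def command(self, current_pos, target_pos, dt):
        error = np.asarray(target_pos, dtype=float) - np.asarray(current_pos, dtype=float)
        d_error = np.zeros_like(error) if self.prev_error is None else (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.kd * d_error   # velocity setpoint sent to the drone

# Usage: velocity = PDController().command(mocap_pose, desired_pose, dt=1/30)
```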
Preprint
Drones can provide a minimally-constrained adapting camera view to support robot telemanipulation. Furthermore, the drone view can be automated to reduce the burden on the operator during teleoperation. However, existing approaches do not focus on two important aspects of using a drone as an automated view provider. The first is how the drone should select from a range of quality viewpoints within the workspace (e.g., opposite sides of an object). The second is how to compensate for unavoidable drone pose uncertainty in determining the viewpoint. In this paper, we provide a nonlinear optimization method that yields effective and adaptive drone viewpoints for telemanipulation with an articulated manipulator. Our first key idea is to use sparse human-in-the-loop input to toggle between multiple automatically-generated drone viewpoints. Our second key idea is to introduce optimization objectives that maintain a view of the manipulator while considering drone uncertainty and the impact on viewpoint occlusion and environment collisions. We provide an instantiation of our drone viewpoint method within a drone-manipulator remote teleoperation system. Finally, we provide an initial validation of our method in tasks where we complete common household and industrial manipulations.
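As a loose, assumption-laden sketch of how viewpoint quality and pose uncertainty can enter a single optimization objective (the actual objectives in this preprint differ), the code below penalizes deviation from a preferred standoff distance and intrusion into obstacle regions inflated by an assumed pose uncertainty sigma, then minimizes with an off-the-shelf optimizer.

```python
import numpy as np
from scipy.optimize import minimize

def viewpoint_cost(drone_pos, target, obstacles, sigma=0.15, w_view=1.0, w_clear=2.0):
    """Toy cost for a candidate drone position (weights and terms are illustrative):
    - stay near a preferred viewing distance from the manipulation target,
    - keep clear of obstacles inflated by 3*sigma to absorb drone pose uncertainty."""
    drone_pos = np.asarray(drone_pos)
    view_term = (np.linalg.norm(drone_pos - target) - 1.0) ** 2   # prefer ~1 m standoff
    clear_term = 0.0
    for obs_center, obs_radius in obstacles:
        dist = np.linalg.norm(drone_pos - obs_center) - (obs_radius + 3 * sigma)
        clear_term += max(0.0, -dist) ** 2                        # penalize entering the inflated region
    return w_view * view_term + w_clear * clear_term

target = np.array([0.5, 0.0, 0.4])
obstacles = [(np.array([0.5, 0.3, 0.4]), 0.2)]
result = minimize(viewpoint_cost, x0=np.array([1.5, -0.5, 0.8]), args=(target, obstacles))
print(result.x)   # candidate viewpoint balancing standoff distance and clearance
```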
... A related aspect is that most of the above studies use cameras either mounted on the remote robot and/or fixed in the scene for a third-person view, which can be limited due to occlusions. To overcome this, Rakita et al. [44] propose providing multiple camera viewpoints in teleoperation using multiple cameras, drawing inspiration from the computer graphics domain, where moving the user viewpoint in a virtual scene is very useful in applications such as cinematic replays and people-tracking in crowd simulations [11,17]. Our approach in this article is to integrate the native teleporting feature of VR to offer multiple viewpoints to the operator during telemanipulation. ...
Article
Intuitive interaction is the cornerstone of accurate and effective performance in remote robotic teleoperation. It requires high fidelity in control actions as well as in perception (vision, haptic, and other sensory feedback) of the remote environment. This paper presents Vicarios, a Virtual Reality (VR) based interface with the aim of facilitating intuitive real-time remote teleoperation, while utilizing the inherent benefits of VR, including immersive visualization, freedom of user viewpoint selection, and fluidity of interaction through natural action interfaces. Vicarios aims to enhance situational awareness using the concept of viewpoint-independent mapping between the operator and the remote scene, thereby giving the operator better control in the perception-action loop. The article describes the overall system of Vicarios, with its software, hardware, and communication framework. A comparative user study quantifies the impact of the interface and its features, including immersion and instantaneous user viewpoint changes, termed “teleporting”, on users' performance. The results show that users' performance with the VR-based interface was either similar to or better than the baseline condition of traditional stereo video feedback, confirming the realistic nature of the Vicarios interface. Furthermore, including the teleporting feature in VR significantly improved participants' performance and their appreciation for it, which was evident in the post-questionnaire results. Vicarios capitalizes on the intuitiveness and flexibility of VR to improve accuracy in remote teleoperation.
... In data visualization, many works have considered how to choose viewpoints to best enable viewers to see a data set [e.g., 22,37,38], but again, these methods require complete geometry. Galvane [16] reviews many approaches that automatically move a camera around in a virtual scene to achieve various goals. Our work draws on this work on automatic camera control, as we dynamically move a camera in our environment to improve the visibility of remote telemanipulation. ...
... Our visual-occlusion-avoidance method differs from such approaches, as these methods can leverage full geometric understanding of the environment and can produce free-flying camera motions in the virtual scene. Galvane [13] reviews many approaches that automatically move a camera around in a virtual scene for applications such as cinematic replays and tracking people in crowd simulations. Our method draws on this work, as we dynamically move a camera in the workspace to improve the visibility of remote teleoperation. ...
Conference Paper
In this paper, we present a method that improves the ability of remote users to teleoperate a manipulation robot arm by continuously providing them with an effective viewpoint using a second camera-in-hand robot arm. The user controls the manipulation robot using any teleoperation interface, and the camera-in-hand robot automatically servos to provide a view of the remote environment that is estimated to best support effective manipulations. Our method avoids occlusions with the manipulation arm to improve visibility, provides context and detailed views of the environment by varying the camera-target distance, utilizes motion prediction to cover the space of the user's next manipulation actions, and actively corrects views to avoid disorienting the user as the camera moves. Through two user studies, we show that our method improves teleoperation performance over alternative methods of providing visual support for teleoperation. We discuss the implications of our findings for real-world teleoperation and for future research.
... We then align the base and the robot model with the vertical thirds of the image plane to make a more aesthetically pleasing composition. This adheres to the well-known "rule of thirds" from photography and cinematography [16]. ...
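As a small, hypothetical illustration of what "aligning with the vertical thirds" means computationally (not the cited system's code), the helper below computes the horizontal shift that would place a subject's projected position on the nearest vertical third line.

```python
def nearest_third_offset(subject_x_px, image_width_px):
    """Horizontal pixel offset that would move a subject from its current
    projected x position to the nearest vertical third line (rule of thirds)."""
    thirds = (image_width_px / 3.0, 2.0 * image_width_px / 3.0)
    nearest = min(thirds, key=lambda t: abs(t - subject_x_px))
    return nearest - subject_x_px

# A subject at x=1100 in a 1920-px-wide frame is closest to the right third line (x=1280),
# so the framing should shift it +180 px (e.g. by panning or repositioning the camera).
print(nearest_third_offset(1100, 1920))   # 180.0
```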
Chapter
In order to support common annotation tasks in visual media production and archiving, we propose two datasets which cover the annotation of the bustle of a scene (i.e., populated to unpopulated), the cinematographic type of a shot as well as the time of day and season of a shot. The dataset for bustle and shot type, called People@Places, adds annotations to the Places365 dataset, and the ToDY (time of day/year) dataset adds annotations to the SkyFinder dataset. For both datasets, we provide a toolchain to create automatic annotations, which have been manually verified and corrected for parts of the two datasets. We provide baseline results for these tasks using the EfficientNet-B3 model, pretrained on the Places365 dataset.
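A minimal sketch of the kind of baseline the chapter describes, under stated assumptions: the shot-type classes below are invented stand-ins for the actual People@Places label set, and ImageNet-pretrained weights from the timm library are used in place of the Places365 pretraining reported in the paper.

```python
import timm
import torch

# Illustrative shot-type classes; the actual People@Places label set may differ.
SHOT_CLASSES = ["close-up", "medium", "full", "long"]

# EfficientNet-B3 backbone with a new classification head (ImageNet weights as a stand-in).
model = timm.create_model("efficientnet_b3", pretrained=True, num_classes=len(SHOT_CLASSES))
model.eval()

with torch.no_grad():
    dummy_frame = torch.randn(1, 3, 300, 300)       # one video frame, resized to 300x300
    logits = model(dummy_frame)
    print(SHOT_CLASSES[int(logits.argmax(dim=1))])  # predicted shot type
```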
Article
With the rapid growth of IoT multimedia devices, more and more content is delivered in multimedia forms, which generally require more effort and resources from users to create and can be challenging to streamline. In this article, we propose Text2Animation (T2A), a framework that helps generate complex multimedia content, namely animation, from simple textual script input. By leveraging recent advances in computational cinematography and video understanding, the proposed workflow dramatically reduces the knowledge requirements associated with cinematography. By incorporating fidelity and aesthetic models, T2A jointly considers the comprehensiveness of the visual presentation of the input script and the compliance of the generated video with given cinematography specifications. Virtual camera placement in a 3-D environment is mapped to an optimization problem that can be solved using dynamic programming to achieve the lowest computational complexity. Experimental results show that T2A can reduce the manual animation production process by around 74%, and the new optimization framework can improve the perceptual quality of the output video by up to 35%. More video footage can be found at https://www.youtube.com/watch?v=MMTJbmWL3gs .
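The dynamic-programming formulation mentioned above could look roughly like the Viterbi-style sketch below; the per-shot and transition costs are placeholders rather than T2A's actual fidelity and aesthetic terms.

```python
import numpy as np

def select_cameras(shot_cost, transition_cost):
    """shot_cost[t][c]: cost of using camera c for shot t (lower = better fidelity/aesthetics).
    transition_cost[c1][c2]: penalty for cutting from camera c1 to camera c2.
    Returns the minimum-cost camera sequence via dynamic programming."""
    T, C = shot_cost.shape
    dp = np.full((T, C), np.inf)
    back = np.zeros((T, C), dtype=int)
    dp[0] = shot_cost[0]
    for t in range(1, T):
        for c in range(C):
            candidates = dp[t - 1] + transition_cost[:, c]
            back[t, c] = int(np.argmin(candidates))
            dp[t, c] = candidates[back[t, c]] + shot_cost[t, c]
    seq = [int(np.argmin(dp[-1]))]            # best camera for the final shot
    for t in range(T - 1, 0, -1):             # backtrack through the stored choices
        seq.append(int(back[t, seq[-1]]))
    return seq[::-1]

# Hypothetical 4 shots, 3 candidate cameras.
shot_cost = np.array([[1, 3, 2], [2, 1, 3], [2, 1, 3], [3, 2, 1]], dtype=float)
transition_cost = np.full((3, 3), 0.5) - 0.5 * np.eye(3)   # each cut costs 0.5, holding costs 0
print(select_cameras(shot_cost, transition_cost))           # [0, 1, 1, 2]
```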