Article

Tennis Real Play


Abstract

Tennis Real Play (TRP) is an interactive tennis game system built from models extracted from videos of real matches. The key techniques proposed for TRP are player modeling and video-based player/court rendering. For player model creation, we propose a database normalization process and a behavioral transition model of tennis players, which may serve as a good alternative to motion capture in conventional video games. For player/court rendering, we propose a framework that renders vivid game characters in real time; image-based rendering yields more interactive and realistic results. Experiments show that video games with vivid viewing effects and characteristic players can be generated from match videos with little user intervention. Because the player model adequately records a player's real-world ability and condition, it can also be used to roughly predict the outcomes of upcoming real tennis matches. A user study reveals that subjects appreciate the increased interaction, immersion, and enjoyment of playing TRP.
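The abstract does not spell out the behavioral transition model, but its role can be illustrated as a first-order Markov chain over shot types. The sketch below is purely hypothetical: the states, probabilities, and function names are invented for illustration, not taken from TRP.

```python
import numpy as np

# Hypothetical illustration of a behavioral transition model; the states
# and probabilities below are invented, not extracted from match video.
STATES = ["serve", "forehand", "backhand", "volley", "miss"]

# T[i][j] = P(next action = j | current action = i); each row sums to 1.
T = np.array([
    [0.00, 0.45, 0.35, 0.10, 0.10],  # after a serve
    [0.00, 0.40, 0.35, 0.10, 0.15],  # after a forehand
    [0.00, 0.35, 0.40, 0.10, 0.15],  # after a backhand
    [0.00, 0.30, 0.30, 0.20, 0.20],  # after a volley
    [1.00, 0.00, 0.00, 0.00, 0.00],  # a miss restarts with a serve
])

def simulate_rally(rng, max_shots=20):
    """Sample a sequence of actions until a miss or the shot limit."""
    state = 0  # every rally starts with a serve
    actions = [STATES[state]]
    for _ in range(max_shots):
        state = rng.choice(len(STATES), p=T[state])
        actions.append(STATES[state])
        if STATES[state] == "miss":
            break
    return actions

print(simulate_rally(np.random.default_rng(0)))
```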

... service or net playing) [17]. Another theme of research in sports video analysis has been team activity analysis in group sports such as handball [10] or basketball [6,26,38,55]. Lastly, many researchers working on sports video analysis have focused on automatic highlight detection and summarization [37,41,51,52]. ...
Article
Full-text available
Most existing software packages for sports video analysis require manual annotation of important events in the video. Despite being the most popular sport in the United States, most American football game analysis is still done manually. Line of scrimmage and offensive team formation recognition are two statistics that must be tagged by American football coaches when watching and evaluating past play video clips, a process which takes many man-hours per week. These two statistics are the building blocks for more high-level analysis such as play strategy inference and automatic statistic generation. In this chapter, we propose a novel framework where, given an American football play clip, we automatically identify the video frame in which the offensive team lines up in formation (the formation frame), the line of scrimmage for that play, and the type of formation the offensive team takes on. The proposed framework achieves 95% accuracy in detecting the formation frame, 98% accuracy in detecting the line of scrimmage, and up to 67% accuracy in classifying the offensive team's formation. To validate our framework, we compiled a large dataset comprising more than 800 play clips of standard- and high-definition resolution from real-world football games. This dataset will be made publicly available for future comparison.
Article
We present a system that converts annotated broadcast video of tennis matches into interactively controllable video sprites that behave and appear like professional tennis players. Our approach is based on controllable video textures and utilizes domain knowledge of the cyclic structure of tennis rallies to place clip transitions and accept control inputs at key decision-making moments of point play. Most importantly, we use points from the video collection to model a player's court positioning and shot selection decisions during points. We use these behavioral models to select video clips that reflect actions the real-life player is likely to take in a given match-play situation, yielding sprites that behave realistically at the macro level of full points, not just individual tennis motions. Our system can generate novel points between professional tennis players that resemble Wimbledon broadcasts, enabling new experiences, such as the creation of matchups between players that have not competed in real life or interactive control of players in the Wimbledon final. According to expert tennis players, the rallies generated using our approach are significantly more realistic in terms of player behavior than video sprite methods that only consider the quality of motion transitions during video synthesis. The supplementary material and video are available at our project website: https://cs.stanford.edu/~haotianz/research/vid2player/
Article
Full-text available
We introduce a new optimization algorithm for video sprites to animate realistic-looking characters. Video sprites are animations created by rearranging recorded video frames of a moving object. Our new technique to find good frame arrangements is based on repeated partial replacements of the sequence. It allows the user to specify animations using a flexible cost function. We also show a fast technique to compute video sprite transitions and a simple algorithm to correct for perspective effects of the input footage. We use our techniques to create character animations of animals, which are difficult both to train in the real world and to animate as 3D models.
Article
Full-text available
Watching sports, such as tennis and soccer, remains popular with a broad class of consumers. However, audiences might not be able to enjoy their favorite games on a television or PC when they are traveling, so mobile devices are increasingly used for watching sports video. Unfortunately, watching sports on a mobile device is not as simple as watching sports on TV. Bandwidth limitations on wireless networks prevent high bit-rate video transmission. In addition, small displays lose visual details of the sports event. Bandwidth limitations occur primarily when multiple users want to stream video on the same wireless link; these bottlenecks are likely because of the popularity of an event. This article describes a camera-modeling-based mixed-reality system concept. The idea is to build a 3D model for sports video, where all parameters of this model can be obtained by analyzing the broadcast video. Instead of sending original images to the mobile device, the system only sends the parameters of the 3D model and information about the players and balls, which significantly saves transmission bandwidth. Additionally, because we have full-quality information about the camera model and also the players and balls, the mobile client is able to recover the important information without loss of visual detail. In addition, we can generate virtual scenes for less important areas, such as the playing field and the commercial billboards, without changing the major story of the sports game.
Conference Paper
Full-text available
Recognition of player actions in broadcast sports video is a challenging task due to the low resolution of the players in video frames. In this paper, we present a novel method to recognize the basic player actions in broadcast tennis video. Different from the existing appearance-based approaches, our method is based on motion analysis and considers the relationship between the movements of different body parts and the regions in the image plane. A novel motion descriptor is proposed and supervised learning is employed to train the action classifier. We also propose a novel framework that combines player action recognition with other multimodal features for semantic and tactic analysis of broadcast tennis video. Incorporating action recognition into the framework not only improves the semantic indexing and retrieval performance of the video content, but also supports highlight ranking and tactics analysis in tennis matches, which, to our knowledge, is the first such solution for tennis games. The experimental results demonstrate that our player action recognition method outperforms existing appearance-based approaches and that the multimodal framework is effective for broadcast tennis video analysis.
Conference Paper
Full-text available
While most current approaches for sports video analysis are based on broadcast video, in this paper, we present a novel approach for highlight detection and automatic replay generation for soccer videos taken by the main camera. This research is important as current soccer highlight detection and replay generation from a live game is a labor-intensive process. A robust multi-level, multi-model event detection framework is proposed to detect the event and event boundaries from the video taken by the main camera. This framework explores the possible analysis cues, using a mid-level representation to bridge the gap between low-level features and high-level events. The event detection results and mid-level representation are used to generate replays which are automatically inserted into the video. Experimental results are promising and found to be comparable with those generated by broadcast professionals.
Conference Paper
Full-text available
A sprite is an image constructed from video clips and is a useful medium for multimedia applications. This paper proposes automatic sprite generation with foreground removal and super-resolution. To remove the foreground objects, each pixel value on the sprite is iteratively updated with the value of maximum appearance probability over the temporal and spatial distribution. By storing half-pixel samples, the super-resolution sprite suffers less blurring than the source video. As a result, the generated sprite preserves the complete background scene with higher image quality; it can be used to increase the visual quality of current sprite applications and to facilitate video segmentation.
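As a rough illustration of foreground removal by maximum appearance probability, the sketch below estimates each sprite pixel from the temporal mode of aligned frames. It is a simplification of the paper's iterative temporal-and-spatial update; the function name and the 32-bin quantization are assumptions.

```python
import numpy as np

def background_from_frames(frames, bins=32):
    """Estimate a foreground-free background from aligned frames.

    A simplification of the paper's iterative maximum-appearance-
    probability update: per pixel, find the most frequent quantized value
    over time (the temporal mode) and average the samples in that bin,
    which suppresses transient foreground objects.
    `frames` is a (T, H, W) uint8 stack of globally aligned gray frames.
    """
    T, H, W = frames.shape
    quantized = frames // (256 // bins)            # values in [0, bins)
    background = np.empty((H, W), dtype=np.uint8)
    for y in range(H):
        for x in range(W):
            hist = np.bincount(quantized[:, y, x], minlength=bins)
            mode_bin = hist.argmax()
            samples = frames[:, y, x][quantized[:, y, x] == mode_bin]
            background[y, x] = np.uint8(samples.mean())
    return background
```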
Conference Paper
Full-text available
A new method, called TIP (Tour Into the Picture), is presented for easily making animations from one 2D picture or photograph of a scene. In TIP, animation is created from the viewpoint of a camera which can be three-dimensionally "walked or flown through" the 2D picture or photograph. To make such animation, conventional computer vision techniques cannot be applied in the 3D modeling process for the scene, using only a single 2D image. Instead a spidery mesh is employed in our method to obtain a simple scene model from the 2D image of the scene using a graphical user interface. Animation is thus easily generated without the need of multiple 2D images. Unlike existing methods, our method is not intended to construct a precise 3D scene model. The scene model is rather simple, and not fully 3D-structured. The modeling process starts by specifying the vanishing point in the 2D image. The background in the scene model then consists of at most five rectangles, whereas hierarchical polygons are used as a model for each foreground object. Furthermore a virtual camera is moved around the 3D scene model, with the viewing angle being freely controlled. This process is easily and effectively performed using the spidery mesh interface. We have obtained a wide variety of animated scenes which demonstrate the efficiency of TIP. CR Categories and Subject Descriptors: I.3.3 [Computer Graphics]: Picture/Image Generation - viewing algorithms; I.3.7 [Computer Graphics]: Three-Dimensional Graphics and Realism - Animation. Additional Keywords: graphical user interface, image-based
Article
Full-text available
In this paper we propose a novel approach for detecting interest points invariant to scale and affine transformations. Our scale and affine invariant detectors are based on the following recent results: (1) Interest points extracted with the Harris detector can be adapted to affine transformations and give repeatable results (geometrically stable). (2) The characteristic scale of a local structure is indicated by a local extremum over scale of normalized derivatives (the Laplacian). (3) The affine shape of a point neighborhood is estimated based on the second moment matrix.
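Result (2), characteristic scale selection, can be sketched directly: search for the scale at which the scale-normalized Laplacian response at a point is extremal. A minimal illustration using SciPy; the sigma range is an assumption, and computing a full LoG image per scale is wasteful but keeps the sketch short.

```python
import numpy as np
from scipy.ndimage import gaussian_laplace

def characteristic_scale(image, y, x, sigmas=tuple(np.geomspace(1.0, 16.0, 17))):
    """Characteristic scale at pixel (y, x): the sigma at which the
    scale-normalized Laplacian |sigma^2 * LoG| is maximal (result (2)
    above)."""
    responses = [abs((s ** 2) * gaussian_laplace(image.astype(float), s)[y, x])
                 for s in sigmas]
    return sigmas[int(np.argmax(responses))]
```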
Article
Full-text available
In many applications today user interaction is moving away from mouse and pen and is becoming pervasive and much more physical and tangible. New emerging interaction technologies allow developing and experimenting with new interaction methods on the long way to providing intuitive human computer interaction. In this paper, we aim at recognizing gestures to interact with an application and present the design and evaluation of our sensor-based gesture recognition. As input device we employ the Wii controller (Wiimote), which recently gained much attention worldwide. We use the Wiimote's acceleration sensor, independent of the gaming console, for gesture recognition. The system allows users to train arbitrary gestures which can then be recalled for interacting with systems like photo browsing on a home TV. The developed library exploits Wii sensor data and employs a hidden Markov model for training and recognizing user-chosen gestures. Our evaluation shows that we can already recognize gestures with a small number of training samples. In addition to the gesture recognition we also present our experiences with the Wii controller and the implementation of the gesture recognition. The system forms the basis for our ongoing work on multimodal intuitive media browsing and is available to other researchers in the field.
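A minimal sketch of this pipeline, using the third-party hmmlearn package as a stand-in for the authors' own HMM library; the state count, iteration budget, and function names are assumptions, and raw accelerometer readings would normally be quantized or otherwise preprocessed first.

```python
import numpy as np
from hmmlearn import hmm  # third-party stand-in for the authors' HMM library

def train_gesture_model(samples, n_states=8):
    """Fit one HMM to accelerometer sequences of a single gesture.

    `samples` is a list of (T_i, 3) arrays of raw (ax, ay, az) readings;
    the state count and iteration budget are guesses, not paper values.
    """
    X = np.vstack(samples)
    lengths = [len(s) for s in samples]
    model = hmm.GaussianHMM(n_components=n_states, covariance_type="diag",
                            n_iter=50)
    model.fit(X, lengths)
    return model

def classify(models, sequence):
    """Pick the gesture whose HMM assigns the highest log-likelihood.
    `models` maps gesture names to trained HMMs."""
    return max(models, key=lambda name: models[name].score(sequence))
```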
Article
Full-text available
Colorization is a computer-assisted process of adding color to a monochrome image or movie. The process typically involves segmenting images into regions and tracking these regions across image sequences. Neither of these tasks can be performed reliably in practice; consequently, colorization requires considerable user intervention and remains a tedious, time-consuming, and expensive task. In this paper we present a simple colorization method that requires neither precise image segmentation nor accurate region tracking. Our method is based on a simple premise: neighboring pixels in space-time that have similar intensities should have similar colors. We formalize this premise using a quadratic cost function and obtain an optimization problem that can be solved efficiently using standard techniques. In our approach an artist only needs to annotate the image with a few color scribbles, and the indicated colors are automatically propagated in both space and time to produce a fully colorized image or sequence. We demonstrate that high quality colorizations of stills and movie clips may be obtained from a relatively modest amount of user input.
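The premise maps to a sparse linear system. The sketch below is a simplified variant: instead of minimizing the full least-squares objective, it directly enforces that each unconstrained pixel equals the intensity-weighted average of its four neighbors, with scribbled pixels as hard constraints; the weight bandwidth sigma and the 4-neighborhood are assumptions.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import spsolve

def colorize_channel(Y, scribble_vals, scribble_mask, sigma=0.05):
    """Propagate one chrominance channel from sparse color scribbles.

    Y: (H, W) luminance in [0, 1]; scribble_mask marks annotated pixels,
    whose channel values are in scribble_vals. Each free pixel is forced
    to equal the intensity-weighted average of its 4-neighbors, with
    w_rs = exp(-(Y_r - Y_s)^2 / (2 sigma^2)).
    """
    H, W = Y.shape
    n = H * W
    idx = np.arange(n).reshape(H, W)
    rows, cols, vals = [], [], []
    b = np.zeros(n)
    for y in range(H):
        for x in range(W):
            r = idx[y, x]
            rows.append(r); cols.append(r); vals.append(1.0)
            if scribble_mask[y, x]:            # hard constraint: U_r = scribble
                b[r] = scribble_vals[y, x]
                continue
            nbrs = [(y + dy, x + dx)
                    for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1))
                    if 0 <= y + dy < H and 0 <= x + dx < W]
            w = np.array([np.exp(-(Y[y, x] - Y[ny, nx]) ** 2 / (2 * sigma ** 2))
                          for ny, nx in nbrs])
            w /= w.sum()                       # U_r - sum_s w_rs U_s = 0
            for (ny, nx), wi in zip(nbrs, w):
                rows.append(r); cols.append(idx[ny, nx]); vals.append(-wi)
    A = sp.csc_matrix((vals, (rows, cols)), shape=(n, n))
    return spsolve(A, b).reshape(H, W)
```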
Article
Full-text available
This paper introduces a new type of medium, called a video texture, which has qualities somewhere between those of a photograph and a video. A video texture provides a continuous, infinitely varying stream of images. While the individual frames of a video texture may be repeated from time to time, the video sequence as a whole is never repeated exactly. Video textures can be used in place of digital photos to infuse a static image with dynamic qualities and explicit action. We present techniques for analyzing a video clip to extract its structure, and for synthesizing a new, similar looking video of arbitrary length. We combine video textures with view morphing techniques to obtain 3D video textures. We also introduce video-based animation, in which the synthesis of video textures can be guided by a user through high-level interactive controls. Applications of video textures and their extensions include the display of dynamic scenes on web pages, the creation of dynamic backdrops for sp...
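The core analysis step can be sketched in a few lines: measure pairwise frame distances, then make a jump from frame i to frame j likely when frame j resembles frame i+1. A minimal illustration; the L2 frame distance and the sigma heuristic are assumptions, and the paper's dynamics-preserving filtering and future-cost pruning are omitted.

```python
import numpy as np

def transition_probabilities(frames, sigma_scale=0.05):
    """Transition probabilities for a video texture from a short clip.

    A jump from frame i to frame j looks seamless when frame j resembles
    frame i+1, so P[i, j] ~ exp(-D[i+1, j] / sigma).
    `frames` is a (T, H, W) float array (keep T small: D is T x T).
    """
    T = len(frames)
    flat = frames.reshape(T, -1)
    # Pairwise L2 distances between all frames.
    D = np.linalg.norm(flat[:, None, :] - flat[None, :, :], axis=2)
    sigma = sigma_scale * D[D > 0].mean()
    P = np.exp(-D[1:, :] / sigma)     # row i: current frame i, for i < T-1
    P /= P.sum(axis=1, keepdims=True)
    return P                          # P[i, j] = Prob(next frame = j)
```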
Article
Full-text available
Image morphing techniques can generate compelling 2D transitions between images. However, differences in object pose or viewpoint often cause unnatural distortions in image morphs that are difficult to correct manually. Using basic principles of projective geometry, this paper introduces a simple extension to image morphing that correctly handles 3D projective camera and scene transformations. The technique, called view morphing, works by prewarping two images prior to computing a morph and then postwarping the interpolated images. Because no knowledge of 3D shape is required, the technique may be applied to photographs and drawings, as well as rendered scenes. The ability to synthesize changes both in viewpoint and image structure affords a wide variety of interesting 3D effects via simple image transformations. CR Categories and Subject Descriptors: I.3.3 [Computer Graphics]: Picture/Image Generation - viewing algorithms; I.3.7 [Computer Graphics]: Three-Dimensional Graphics and Re...
Article
This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene. The features are invariant to image scale and rotation, and are shown to provide robust matching across a substantial range of affine distortion, change in 3D viewpoint, addition of noise, and change in illumination. The features are highly distinctive, in the sense that a single feature can be correctly matched with high probability against a large database of features from many images. This paper also describes an approach to using these features for object recognition. The recognition proceeds by matching individual features to a database of features from known objects using a fast nearest-neighbor algorithm, followed by a Hough transform to identify clusters belonging to a single object, and finally performing verification through least-squares solution for consistent pose parameters. This approach to recognition can robustly identify objects among clutter and occlusion while achieving near real-time performance.
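A common realization of this matching scheme is OpenCV's SIFT implementation combined with a ratio test against the second-nearest neighbor; a minimal sketch, where the 0.75 ratio is a conventional choice rather than a value from this listing.

```python
import cv2

def match_sift(img1, img2, ratio=0.75):
    """Match SIFT keypoints between two grayscale images with the
    nearest-neighbor strategy plus a ratio test."""
    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(img1, None)
    kp2, des2 = sift.detectAndCompute(img2, None)
    matcher = cv2.BFMatcher(cv2.NORM_L2)        # brute-force L2 search
    candidates = matcher.knnMatch(des1, des2, k=2)
    # Keep a match only if it clearly beats the second-best candidate.
    good = [pair[0] for pair in candidates
            if len(pair) == 2 and pair[0].distance < ratio * pair[1].distance]
    return kp1, kp2, good
```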
Article
In this paper, several psychophysical experiments are conducted to find the perceptual limitations in terms of duration, retinal velocity, and direction of motion. Moreover, jerkiness artifacts are considered together with motion blur artifacts. The analysis results can provide important information for reducing the computation of frame-rate up-conversion.
Article
LIBSVM is a library for support vector machines (SVM). Its goal is to help users to easily use SVM as a tool. In this document, we present all its implementation details. For the use of LIBSVM, the README file included in the package and the LIBSVM FAQ provide the information.
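For reference, scikit-learn's SVC is built on top of LIBSVM, so its defaults mirror the C-SVC formulation described here; a minimal usage sketch on a toy dataset, with illustrative hyperparameters.

```python
from sklearn import svm                      # sklearn's SVC wraps LIBSVM
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Minimal usage sketch: an RBF-kernel C-SVC, the default LIBSVM formulation.
# The dataset and hyperparameters are illustrative only.
X_train, X_test, y_train, y_test = train_test_split(
    *load_iris(return_X_y=True), test_size=0.3, random_state=0)

clf = svm.SVC(C=1.0, kernel="rbf", gamma="scale")
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
```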
Article
Tour into the picture (TIP) proposed by Horry et al. [13] is a method for generating a sequence of walk-through images from a single reference picture (or image). By navigating a 3D scene model constructed from the picture, TIP produces convincing 3D effects. Assuming that the picture has one vanishing point, they proposed the scene modeling scheme called spidery mesh. However, this scheme has to go through major modification when the picture contains multiple vanishing points or does not have any well-defined vanishing point. Moreover, the spidery mesh is hard to generalize for other types of images such as panoramic images. In this paper, we propose a new scheme for TIP which is based on a single vanishing line instead of a vanishing point. Based on projective geometry, our scheme is simple and yet general enough to address the problems faced with the previous method. We also show that our scheme can be naturally extended to a panoramic image.
Article
The classical filtering and prediction problem is re-examined using the Bode-Shannon representation of random processes and the "state-transition" method of analysis of dynamic systems. New results are: (1) The formulation and methods of solution of the problem apply without modification to stationary and nonstationary statistics and to growing-memory and infinite-memory filters. (2) A nonlinear difference (or differential) equation is derived for the covariance matrix of the optimal estimation error. From the solution of this equation the coefficients of the difference (or differential) equation of the optimal linear filter are obtained without further calculations. (3) The filtering problem is shown to be the dual of the noise-free regulator problem. The new method developed here is applied to two well-known problems, confirming and extending earlier results. The discussion is largely self-contained and proceeds from first principles; basic concepts of the theory of random processes are reviewed in the Appendix.
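In its modern discrete-time form, the filter alternates a predict step with an update step whose covariance recursion is the difference (Riccati) equation mentioned above. A minimal sketch; the notation follows common convention rather than the paper's original symbols.

```python
import numpy as np

def kalman_step(x, P, z, F, H, Q, R):
    """One predict/update cycle of the discrete-time Kalman filter.

    x, P : prior state estimate and its covariance
    z    : new measurement
    F, H : state-transition and observation matrices
    Q, R : process and measurement noise covariances
    """
    # Predict.
    x_pred = F @ x
    P_pred = F @ P @ F.T + Q
    # Update. The covariance recursion below is the discrete form of the
    # nonlinear difference (Riccati) equation derived in the paper.
    S = H @ P_pred @ H.T + R                # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)     # optimal gain
    x_new = x_pred + K @ (z - H @ x_pred)
    P_new = (np.eye(len(x)) - K @ H) @ P_pred
    return x_new, P_new
```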
Article
This paper addresses the automatic analysis of court-net sports video content. We extract information about the players and the playing field in a bottom-up way until we reach scene-level semantic concepts. Each part of our framework is general, so the system is applicable to several kinds of sports. A central point in our framework is a camera calibration module that relates a priori information about the geometric layout, in the form of a court model, to the input image. Exploiting this information, several novel algorithms are proposed, including playing-frame detection and player segmentation and tracking. To address the player-occlusion problem, we model the contour map of the player silhouettes using a nonlinear regression algorithm, which enables us to locate the players during occlusions caused by players of the same team. Additionally, a Bayesian-based classifier helps to recognize predefined key events, where the input is a number of real-world visual features. We illustrate the performance and efficiency of the proposed system by evaluating it on a variety of sports videos containing badminton, tennis, and volleyball, and we show that our algorithm can operate with more than 91% feature detection accuracy and 90% event detection accuracy.
Conference Paper
In this paper we present the combination of two well-known techniques, video synthesis and flocking behavior, to introduce a new form of video animation: real-video-based animation. This system can be used in the elaboration of games and special effects for movies. We have developed a system that makes it possible to manipulate objects in the frames of a video while maintaining their natural appearance and complexity, and that allows us to multiply an object in the frame or control the pattern of its movement. The system accepts as input a video in AVI format and automatically renders another with new patterns derived from the original. Video synthesis and flocking behavior are well-known independent techniques, but their combination had not yet been researched.
Conference Paper
Tennis Real Play (TRP) is an interactive tennis game system built from models extracted from videos of real matches. The key techniques proposed for TRP are player modeling and video-based player/court rendering. For player model creation, we propose a database normalization process and a behavioral transition model of tennis players, which may serve as a good alternative to motion capture in conventional video games. For player/court rendering, we propose a framework that renders vivid game characters in real time; image-based rendering yields more interactive and realistic results. Experiments show that video games with vivid viewing effects and characteristic players can be generated from match videos with little user intervention. Because the player model adequately records a player's real-world ability and condition, it can also be used to roughly predict the outcomes of upcoming real tennis matches. A user study reveals that subjects appreciate the increased interaction, immersion, and enjoyment of playing TRP.
Article
This paper presents an original algorithm to automatically acquire accurate camera calibration from broadcast tennis video (BTV), and demonstrates two of its many applications. Accurate camera calibration from BTV is challenging because the frame data of BTV is often heavily distorted and full of errors, resulting in wildly fluctuating camera parameters. To meet this challenge, we propose a frame grouping technique based on the observation that many frames in BTV share the same camera viewpoint. Leveraging this fact, our algorithm groups frames according to camera viewpoint. We then perform a group-wise data analysis to obtain a more stable estimate of the camera parameters. Recognizing that some of these parameters vary somewhat even across frames with similar camera viewpoints, we further employ a Hough-like search to tune such parameters, minimizing the reprojection disparity. This two-tiered process gains stability in the estimates of the camera parameters, yet ensures a good match between the model and the reprojected camera view via the tuning step. To demonstrate the utility of such stable calibration, we apply the acquired camera matrix to two applications: (a) 3D virtual content insertion; and (b) tennis-ball detection and tracking. The experimental results show that our algorithm is able to acquire an accurate camera matrix and that both applications perform very well.
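The building block underlying such calibration is the plane-to-plane homography between the image and the court model; a minimal OpenCV sketch under the assumption that court-line intersections have already been detected and matched (the paper's frame grouping and Hough-like tuning are not reproduced, and the function name is illustrative).

```python
import cv2
import numpy as np

def court_homography(image_pts, court_model_pts):
    """Estimate the image-to-court homography from N >= 4 point matches,
    e.g. detected court-line intersections paired with the court model."""
    H, inliers = cv2.findHomography(
        np.asarray(image_pts, dtype=np.float32),
        np.asarray(court_model_pts, dtype=np.float32),
        cv2.RANSAC, 3.0)  # RANSAC guards against mis-detected lines
    return H, inliers
```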
Article
Effectively and unintrusively delivering an advertising message by spatially replacing regions of sports video with advertisements during a period of exposure is known to be a challenging problem. The size, placement locations, and representation of the advertisement are critical factors that have significant impact on both recognition effectiveness and perceived intrusiveness. In this paper, we take advertising theory, psychology, and computational aesthetics into account to develop a novel virtual advertising mechanism, called virtual spotlighted advertising (ViSA), for tennis videos. We utilize the spare visual acuity of viewers while they watch the attractive object, such that they are not much disturbed from the progress of the game, while at the same time the inserted advertisement can effectively deliver its message to them. We propose a framework and realize an exemplary system to serve ViSA. The system automatically detects candidate insertion points in both temporal and spatial domains and estimates the most effective region for visual communication. Then, harmonically re-colored advertisements with foveation-model-based non-uniform transparency are projected onto the court. The evaluation results demonstrate the effectiveness of the proposed ViSA in terms of recall and recognition. Moreover, the induced visual intrusiveness is limited by the proposed innovative representation style.
Conference Paper
Image-based rendering (IBR) is a technique to render video from images, offering users more interaction and an immersive experience when watching a video. In this paper, we integrate the computation of several IBR applications, analyze the bandwidth of memory access, and design an architecture to process the computation of IBR. Experimental results show that the proposed IBR Engine is able to render video at a resolution of 720×480 and 30 frames per second, which is 12.7 times faster than a Core 2 Duo 2.83 GHz CPU. As an extension, the IBR Engine can be embedded in a television system, letting viewers enjoy the functions of IBR.
Conference Paper
Our goal is to recognize human action at a distance, at resolutions where a whole person may be, say, 30 pixels tall. We introduce a novel motion descriptor based on optical flow measurements in a spatiotemporal volume for each stabilized human figure, and an associated similarity measure to be used in a nearest-neighbor framework. Making use of noisy optical flow measurements is the key challenge, which is addressed by treating optical flow not as precise pixel displacements, but rather as a spatial pattern of noisy measurements which are carefully smoothed and aggregated to form our spatiotemporal motion descriptor. To classify the action being performed by a human figure in a query sequence, we retrieve nearest neighbor(s) from a database of stored, annotated video sequences. We can also use these retrieved exemplars to transfer 2D/3D skeletons onto the figures in the query sequence, as well as two forms of data-based action synthesis: "do as I do" and "do as I say". Results are demonstrated on ballet, tennis, and football datasets.
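The descriptor itself is easy to sketch: split the flow into four half-wave-rectified channels, blur each into a smooth motion pattern, and compare descriptors by normalized correlation. A minimal illustration; the blur sigma is an assumption, and the paper's temporal aggregation over a window of frames is omitted.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def motion_descriptor(flow, blur_sigma=2.0):
    """Four-channel motion descriptor from one optical-flow field.

    `flow` is an (H, W, 2) flow field for a stabilized, person-centered
    window. Each component is half-wave rectified into its positive and
    negative parts and blurred, turning noisy flow vectors into a smooth
    spatial pattern of motion measurements.
    """
    fx, fy = flow[..., 0], flow[..., 1]
    channels = [np.maximum(fx, 0.0), np.maximum(-fx, 0.0),
                np.maximum(fy, 0.0), np.maximum(-fy, 0.0)]
    return np.stack([gaussian_filter(c, blur_sigma) for c in channels])

def similarity(desc_a, desc_b, eps=1e-8):
    """Normalized correlation between two descriptors, usable as the
    nearest-neighbor similarity measure."""
    a, b = desc_a.ravel(), desc_b.ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + eps))
```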
Conference Paper
This paper presents a new framework for arbitrary view synthesis and presentation of sporting events for mixed-reality entertainment. In accordance with the viewpoint position of an observer, a virtual view image of the sporting scene is generated by view interpolation among multiple videos captured at a real stadium. The synthesized sporting scene is then overlaid onto a desktop stadium model in the real world via an HMD, making it possible to watch the event as if it were taking place in front of the observer. Projective geometry between cameras is used for virtual view generation of the dynamic scene and for geometric registration between the real world and the virtual view image of the sporting scene. The proposed method does not need to calibrate the multiple video cameras capturing the event or the HMD camera. Therefore it can be applied even to dynamic events in a large space and enables observation with an immersive impression. The proposed approach leads to a new type of mixed-reality entertainment for sporting events.
Conference Paper
This paper presents the results of a project generalizing the 'video textures' technique as described in (Schodl et al., 2000). The algorithms used were generalized such that either video or audio input clips could be used to create novel and infinitely varying video or audio 'textures'. The algorithms were tested and refined to produce acceptable results for both forms of media, and a simple interface was developed allowing the application to be easily extended to other forms of media commonly used in computer graphics such as motion capture.
Conference Paper
This paper describes a data-driven approach for generating photorealistic animations of human motion. Each animation sequence follows a user-choreographed path and plays continuously by seamlessly transitioning between different segments of the captured data. To produce these animations, we capitalize on the complementary characteristics of motion capture data and video. We customize our capture system to record motion capture data that are synchronized with our video source. Candidate transition points in video clips are identified using a new similarity metric based on 3-D marker trajectories and their 2-D projections into video. Once the transitions have been identified, a video-based motion graph is constructed. We further exploit hybrid motion and video data to ensure that the transitions are seamless when generating animations. Motion capture marker projections serve as control points for segmentation of layers and nonrigid transformation of regions. This allows warping and blending to generate seamless in-between frames for animation. We show a series of choreographed animations of walks and martial arts scenes as validation of our approach.
Article
Human gesture recognition plays an important role in automating the analysis of video material at a high level. Especially in sports videos, determining the player's gestures is a key task. In many sports views, the camera covers a large part of the sports arena, resulting in low resolution of the player's region. Moreover, the camera is not static, but moves dynamically around its optical center, i.e. a pan/tilt/zoom camera. These factors make determining the player's gestures a challenging task. To overcome these problems, we propose a posture descriptor that is robust to shape corruption of the player's silhouette, and a gesture spotting method that is robust to noisy sequences of data and needs only a small amount of training data. The proposed posture descriptor extracts the feature points of a shape based on the curvature scale space (CSS) method. The use of CSS makes the method robust to local noise, and it is also robust to significant shape corruption of the player's silhouette. The proposed spotting method provides probabilistic similarity and is robust to noisy sequences of data. It needs only a small number of training data sets, which is very useful when it is difficult to obtain enough data for model training. We conducted experiments spotting serve gestures in broadcast tennis video: over 63 tennis-play shots, some of which include a serve gesture while others do not, the method achieved a 97.5% precision rate and an 86.7% recall rate.
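The CSS idea can be sketched as follows: smooth the silhouette contour at increasing scales and track where the curvature changes sign. A minimal illustration; the contour parameterization details and the scale set are assumptions.

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

def css_zero_crossings(contour, sigmas):
    """Curvature-scale-space sketch for a closed silhouette contour.

    `contour` is an (N, 2) array of points along the silhouette. For each
    smoothing scale we compute the curvature and record where it changes
    sign; the evolution of these zero crossings across scales is the basis
    of a CSS shape descriptor.
    """
    result = []
    for s in sigmas:
        x = gaussian_filter1d(contour[:, 0].astype(float), s, mode="wrap")
        y = gaussian_filter1d(contour[:, 1].astype(float), s, mode="wrap")
        dx, dy = np.gradient(x), np.gradient(y)
        ddx, ddy = np.gradient(dx), np.gradient(dy)
        kappa = (dx * ddy - dy * ddx) / np.maximum((dx**2 + dy**2) ** 1.5, 1e-8)
        signs = np.signbit(kappa)
        zc = np.where(signs[:-1] != signs[1:])[0]   # curvature zero crossings
        result.append((s, zc))
    return result
```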
Article
This paper proposes a new method for presenting sports videos. Tennis videos are used as an example in the implementation of a viewing program called Tennis Video 2.0. For video analysis, we propose background generation that considers the temporal and spatial distribution of pixels, and foreground segmentation that combines automatic trimap generation with a matting model. To provide more functions for watching videos, a rendering flow for video contents and semantic scalability are proposed. With the new analysis and rendering tools, the presentation of sports videos has three properties: structure, interactivity, and scalability. In the experiments, several broadcast game videos are employed to evaluate the robustness and performance of the proposed system. In a user study, 20 evaluators strongly agreed that Tennis Video 2.0 is a new presentation of sports videos that gives people a better viewing experience.
Article
We describe a method for plausible interpolation of images, with a wide range of applications like temporal up-sampling for smooth playback of lower frame rate video, smooth view interpolation, and animation of still images. The method is based on the intuitive idea that a given pixel in the interpolated frames traces out a path in the source images. Therefore, we simply move and copy pixel gradients from the input images along this path. A key innovation is to allow arbitrary (asymmetric) transition points, where the path moves from one image to the other. This flexible transition preserves the frequency content of the originals without ghosting or blurring, and maintains temporal coherence. Perhaps most importantly, our framework makes occlusion handling particularly simple. The transition points allow for matches away from the occluded regions, at any suitable point along the path. Indeed, occlusions do not need to be handled explicitly at all in our initial graph-cut optimization. Moreover, a simple comparison of computed path lengths after the optimization allows us to robustly identify occluded regions and compute the most plausible interpolation in those areas. Finally, we show that significant improvements are obtained by moving gradients and using Poisson reconstruction.
Article
We present an interactive system that lets a user move and deform a two-dimensional shape without manually establishing a skeleton or freeform deformation (FFD) domain beforehand. The shape is represented by a triangle mesh and the user moves several vertices of the mesh as constrained handles. The system then computes the positions of the remaining free vertices by minimizing the distortion of each triangle. While physically based simulation or iterative refinement can also be used for this purpose, they tend to be slow. We present a two-step closed-form algorithm that achieves real-time interaction. The first step finds an appropriate rotation for each triangle and the second step adjusts its scale. The key idea is to use quadratic error metrics so that each minimization problem becomes a system of linear equations. After solving the simultaneous equations at the beginning of interaction, we can quickly find the positions of free vertices during interactive manipulation. Our approach successfully conveys a sense of rigidity of the shape, which is difficult in space-warp approaches. With a multiple-point input device, even beginners can easily move, rotate, and deform shapes at will.
Article
Using generic interpolation machinery based on solving Poisson equations, a variety of novel tools are introduced for seamless editing of image regions. The first set of tools permits the seamless importation of both opaque and transparent source image regions into a destination region. The second set is based on similar mathematical ideas and allows the user to modify the appearance of the image seamlessly, within a selected region. These changes can be arranged to affect the texture, the illumination, and the color of objects lying in the region, or to make tileable a rectangular selection.
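A minimal sketch of the first tool, seamless importation, restricted to a rectangular region: solve the discrete Poisson equation whose right-hand side carries the source gradients while the region boundary is clamped to the destination. Real implementations handle arbitrary regions and mixed gradients; the dense loop here favors clarity over speed.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import spsolve

def poisson_paste(dst, src, top, left):
    """Seamlessly import `src` into `dst` at (top, left).

    Solves the discrete Poisson equation on a rectangular region: inside,
    the result's Laplacian matches the source's; on the boundary the
    values are clamped to the destination. The region must lie strictly
    inside `dst`; both images are 2-D float arrays (one channel).
    """
    h, w = src.shape
    n = h * w
    idx = np.arange(n).reshape(h, w)
    A = sp.lil_matrix((n, n))
    b = np.zeros(n)
    for y in range(h):
        for x in range(w):
            r = idx[y, x]
            A[r, r] = 4.0
            b[r] = 4.0 * src[y, x]                 # guidance: 4*g_p - sum g_q
            for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ny, nx = y + dy, x + dx
                b[r] -= src[min(max(ny, 0), h - 1), min(max(nx, 0), w - 1)]
                if 0 <= ny < h and 0 <= nx < w:
                    A[r, idx[ny, nx]] = -1.0       # unknown neighbor
                else:
                    b[r] += dst[top + ny, left + nx]  # Dirichlet boundary
    out = dst.copy()
    out[top:top + h, left:left + w] = spsolve(A.tocsc(), b).reshape(h, w)
    return out
```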
Article
We present an algorithm for morphing two images, often with little or no user interaction. For two similar images (such as different faces against a neutral background), the algorithm generally can create a pleasing morph completely automatically. The algorithm seeks to minimize the work needed to deform one image into the other. Work is defined as a function of the amount of warping and recoloration. We invoke a hierarchical method for finding a minimal work solution. Anchor point constraints are satisfied by penalties imposed on deformations that disobey these constraints. Good results can be obtained in less than 10 s for 256x256 images.
Conference Paper
We present a novel approach for interactive multimedia content creation that establishes an interactive environment in cyberspace in which users interact with autonomous agents generated from video images of real-world creatures. Each agent has autonomy, personality traits, and behaviors that reflect the results of various interactions determined by an emotional model with fuzzy logic. After an agent's behavior is determined, a sequence of video images that best matches the determined behavior is retrieved from a database in which a variety of video image sequences of the real creature's behaviors are stored. The retrieved images are successively displayed in the cyberspace to make it responsive. Thus the autonomous agent behaves continuously. In addition, an explicit sketch-based method directly initiates the reactive behavior of the agent without involving the emotional process. This paper describes the algorithm that establishes such an interactive system. First, an image processing algorithm to gene
Conference Paper
This paper attempts to classify tennis games into 58 winning patterns for training purposes. It is based on tracking ball movement in broadcast tennis video. Trajectory and landing position are used as the basic features for classification. We use improved Bayesian networks to classify the landing positions of different patterns. Intelligent agents are used to combine trajectories and landing positions, as the two features lie in different dimensions. Semantic labels are assigned after classification. The aim of the analysis is to provide a browsing tool for coaches or other personnel to retrieve tennis video clips.
Article
The Hausdorff distance measures the extent to which each point of a model set lies near some point of an image set and vice versa. Thus, this distance can be used to determine the degree of resemblance between two objects that are superimposed on one another. Efficient algorithms for computing the Hausdorff distance between all possible relative positions of a binary image and a model are presented. The focus is primarily on the case in which the model is only allowed to translate with respect to the image. The techniques are extended to rigid motion. The Hausdorff distance computation differs from many other shape comparison methods in that no correspondence between the model and the image is derived. The method is quite tolerant of small position errors such as those that occur with edge detectors and other feature extraction methods. It is shown that the method extends naturally to the problem of comparing a portion of a model against an image.
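SciPy ships a directed Hausdorff routine, so the symmetric distance for one relative position takes only a few lines; the paper's efficient search over all translations is not reproduced here, and the example point sets are hypothetical.

```python
import numpy as np
from scipy.spatial.distance import directed_hausdorff

def hausdorff(model_pts, image_pts):
    """Symmetric Hausdorff distance between two 2-D point sets, evaluated
    at a single relative position (no search over translations)."""
    d_forward = directed_hausdorff(model_pts, image_pts)[0]
    d_backward = directed_hausdorff(image_pts, model_pts)[0]
    return max(d_forward, d_backward)

# Example: edge points of a model and a perturbed copy (hypothetical data).
model = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
image = model + np.random.default_rng(0).normal(scale=0.05, size=model.shape)
print(hausdorff(model, image))
```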