Erika Lu's research while affiliated with University of Oxford and other places

Publications (11)

Preprint
Computer vision is increasingly effective at segmenting objects in images and videos; however, scene effects related to the objects (shadows, reflections, generated smoke, etc.) are typically overlooked. Identifying such scene effects and associating them with the objects producing them is important for improving our fundamental understanding of v...
Preprint
Animals have evolved highly functional visual systems to understand motion, assisting perception even in complex environments. In this paper, we work towards developing a computer vision system able to segment objects by exploiting motion cues, i.e., motion segmentation. We make the following contributions: First, we introduce a simple variant of...
Article
We present a method for retiming people in an ordinary, natural video: manipulating and editing the time in which different motions of individuals in the video occur. We can temporally align different motions, change the speed of certain actions (speeding up/slowing down, or entirely "freezing" people), or "erase" selected people from the video...
Preprint
We present a method for retiming people in an ordinary, natural video: manipulating and editing the time in which different motions of individuals in the video occur. We can temporally align different motions, change the speed of certain actions (speeding up/slowing down, or entirely "freezing" people), or "erase" selected people from the video al...
Preprint
Recent interest in self-supervised dense tracking has yielded rapid progress, but performance still lags far behind that of supervised methods. We propose a dense tracking model trained on videos without any annotations that surpasses previous self-supervised methods on existing benchmarks by a significant margin (+15%), and achieves performance comparabl...
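Self-supervised dense trackers are often described in terms of label propagation: each pixel in a target frame takes a weighted average of reference-frame labels, with weights given by a softmax over feature affinities. The sketch below assumes that generic formulation only; the feature vectors, the temperature `tau`, and the name `propagate_labels` are hypothetical and are not taken from the paper above.

```python
import math
from typing import List

Vec = List[float]

def softmax(xs: Vec) -> Vec:
    # Subtract the max before exponentiating for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def propagate_labels(ref_feats: List[Vec], ref_labels: List[Vec],
                     tgt_feats: List[Vec], tau: float = 0.1) -> List[Vec]:
    """Hypothetical affinity-based label propagation (illustration only).

    ref_feats:  one feature vector per reference-frame pixel
    ref_labels: one (soft) label vector per reference-frame pixel
    tgt_feats:  one feature vector per target-frame pixel
    Returns one soft label vector per target-frame pixel.
    """
    n_classes = len(ref_labels[0])
    out = []
    for q in tgt_feats:
        # Dot-product affinity to every reference pixel, sharpened by tau.
        scores = [sum(a * b for a, b in zip(q, k)) / tau for k in ref_feats]
        weights = softmax(scores)
        # Weighted average of the reference labels.
        out.append([sum(w * lab[c] for w, lab in zip(weights, ref_labels))
                    for c in range(n_classes)])
    return out

ref_feats = [[1.0, 0.0], [0.0, 1.0]]
ref_labels = [[1.0, 0.0], [0.0, 1.0]]  # pixel 0 is class 0, pixel 1 is class 1
tgt_feats = [[0.9, 0.1], [0.2, 0.8]]
print(propagate_labels(ref_feats, ref_labels, tgt_feats))
```

Each target pixel ends up almost entirely assigned to the class of the reference pixel whose features it most resembles; a lower `tau` makes that assignment harder.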
Chapter
Nearly all existing counting methods are designed for a specific object class. Our work, however, aims to create a counting model able to count any class of object. To achieve this goal, we formulate counting as a matching problem, enabling us to exploit the image self-similarity property that naturally exists in object counting problems.
Preprint
Nearly all existing counting methods are designed for a specific object class. Our work, however, aims to create a counting model able to count any class of object. To achieve this goal, we formulate counting as a matching problem, enabling us to exploit the image self-similarity property that naturally exists in object counting problems. We make t...
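One simple reading of "counting as matching" is to compare an exemplar patch against every image location and count the strong, locally maximal responses. The sketch below uses plain normalized cross-correlation (NCC) on a tiny grayscale grid; the choice of NCC instead of learned features, the 0.9 threshold, and the names `ncc` and `count_by_matching` are assumptions for illustration, not the model described in the abstract.

```python
import math
from typing import List

Grid = List[List[float]]

def ncc(image: Grid, patch: Grid, r: int, c: int) -> float:
    # Normalized cross-correlation between the patch and the image window whose top-left is (r, c).
    ph, pw = len(patch), len(patch[0])
    win = [image[r + i][c + j] for i in range(ph) for j in range(pw)]
    tpl = [patch[i][j] for i in range(ph) for j in range(pw)]
    mw, mt = sum(win) / len(win), sum(tpl) / len(tpl)
    num = sum((a - mw) * (b - mt) for a, b in zip(win, tpl))
    den = math.sqrt(sum((a - mw) ** 2 for a in win) * sum((b - mt) ** 2 for b in tpl))
    return num / den if den > 0 else 0.0  # flat windows carry no match evidence

def count_by_matching(image: Grid, patch: Grid, threshold: float = 0.9) -> int:
    ph, pw = len(patch), len(patch[0])
    rows, cols = len(image) - ph + 1, len(image[0]) - pw + 1
    score = [[ncc(image, patch, r, c) for c in range(cols)] for r in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            s = score[r][c]
            if s < threshold:
                continue
            # Simple non-maximum suppression over the 3x3 neighbourhood of the score map.
            neighbours = [score[rr][cc]
                          for rr in range(max(0, r - 1), min(rows, r + 2))
                          for cc in range(max(0, c - 1), min(cols, c + 2))]
            if s >= max(neighbours):
                count += 1
    return count

image = [[0.0] * 7 for _ in range(7)]
image[1][1] = 1.0  # first object
image[4][4] = 1.0  # second object
patch = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]  # exemplar
print(count_by_matching(image, patch))
```

Because the exemplar is matched against the image itself, the same procedure applies to any object class for which an exemplar is given, which is the class-agnostic property the abstract emphasises.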
Article
We introduce a paradigm for understanding physical scenes without human annotations. At the core of our system is a physical world representation that is first recovered by a perception module and then utilized by physics and graphics engines. During training, the perception module and the generative models learn by visual de-animation - interpreti...

Citations

... Layered video decomposition, introduced in [79], has been applied to optical flow estimation [67,86], motion segmentation [59,89], and video editing [40,51,90]. It is connected to object-centric representation learning [5,9,20,29,43,49], where the compositional structure of scenes is also essential. ...
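Layered video decompositions usually reconstruct each frame by alpha-compositing per-layer colour and opacity estimates from back to front. The sketch below assumes the standard "over" operator with non-premultiplied colours; it is not necessarily the exact formulation of the cited works, and `composite_over` and the layer values are illustrative.

```python
from typing import List, Tuple

RGBA = Tuple[float, float, float, float]

def composite_over(layers: List[RGBA]) -> Tuple[float, float, float]:
    """Composite RGBA layers ordered back (index 0) to front with the 'over' operator.

    Colours are non-premultiplied in [0, 1]; the back layer is assumed opaque.
    """
    r, g, b = 0.0, 0.0, 0.0
    for lr, lg, lb, a in layers:
        # The nearer layer covers a fraction `a` of everything behind it.
        r = a * lr + (1 - a) * r
        g = a * lg + (1 - a) * g
        b = a * lb + (1 - a) * b
    return (r, g, b)

background = (0.0, 0.0, 1.0, 1.0)  # opaque blue
shadow = (0.0, 0.0, 0.0, 0.5)      # half-transparent black, e.g. a shadow layer
person = (1.0, 0.0, 0.0, 0.0)      # red layer, transparent at this pixel
print(composite_over([background, shadow, person]))  # (0.0, 0.0, 0.5)
```

Keeping an effect such as a shadow in its own layer, with its own opacity, is what lets it be edited or removed together with the object that casts it.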
... Yoo et al. [33] introduced Time-dependent DIP for MRI, which encodes the temporal variations of images with a DIP network that generates MRI images. Lu et al. [18] utilize DIP for video editing and manipulation. These works clearly show that DIP cannot be naively extended from images to video and that it requires additional regularization and loss functions. ...
... Specifically, a physics engine simulates observable kinematics given certain causes, following governing dynamic rules. Physics engines have been used to simulate the motion of rigid-body objects under specific forces [36], [61], [62], [105]-[107]. Most of these simulators are non-differentiable, which prevents them from being employed in an end-to-end deep learning framework. ...
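The core job the excerpt assigns to a physics engine, stepping object states forward under given forces, can be illustrated with a semi-implicit (symplectic) Euler integrator for a 1-D point mass. This is a generic textbook scheme chosen for illustration, not the engine of any cited work; `Body` and `step` are hypothetical names. The update itself is smooth; non-differentiability in practical engines typically comes from discrete steps such as collision handling, or from implementations outside an autodiff framework, neither of which this sketch models.

```python
from dataclasses import dataclass

@dataclass
class Body:
    mass: float  # kg
    x: float     # position (m), 1-D for brevity
    v: float     # velocity (m/s)

def step(body: Body, force: float, dt: float) -> Body:
    # Semi-implicit Euler: update velocity from the force first,
    # then advance position using the *new* velocity.
    a = force / body.mass
    v_new = body.v + a * dt
    x_new = body.x + v_new * dt
    return Body(body.mass, x_new, v_new)

b = Body(mass=2.0, x=0.0, v=0.0)
for _ in range(10):
    b = step(b, force=4.0, dt=0.1)  # constant 4 N push gives 2 m/s^2
print(round(b.v, 6), round(b.x, 6))  # 2.0 1.1
```

After 1 s the velocity matches the analytic 2 m/s, while the position (1.1 m versus the exact 1.0 m) shows the discretisation error of this first-order scheme.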
... Motion segmentation via hybrid IBR. In our volumetric IBR framework, we observed that without any initialization, scene factorization tends to be dominated by either the time-invariant or the time-varying representation, a phenomenon also observed in recent methods [28,41]. To facilitate factorization, Gao et al. [19] initialize the system by adopting masks from semantic segmentation, relying on the assumptions that all moving objects are captured by a set of candidate semantic segmentation labels and that the segmentation masks are temporally accurate. ...
... Extensions of [308] include [309], [310]. Lai et al. [311] presented a memory-augmented self-supervised method that enables generalizable and accurate pixel-level tracking. Zhang et al. [312] used the spatial-temporal consistency of depth maps to mitigate forgetting during the learning process. ...
... 6. Significantly, our framework adopting a scalar representation achieves comparable counting performance. Our only aim is to demonstrate that the framework is reasonable for actively learning similar matters. Nevertheless, we notice that our MSE is poor, which is caused by outliers, given the sensitivity of MSE to them. (Compared methods: [34], FSOD [35], GMN [36], MAML [37], FameNet [22].) ...
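The excerpt attributes the poor MSE to outliers. A worked example with hypothetical per-image counts makes the mechanism concrete: two prediction sets with the same mean absolute error (MAE) can differ sharply in a squared-error metric when the error is concentrated in one image. Counting papers commonly report "MSE" as the root of the mean squared error; the sketch follows that convention, which is an assumption about the cited work.

```python
import math
from typing import List

def mae(pred: List[float], gt: List[float]) -> float:
    # Mean absolute error: every unit of error counts equally.
    return sum(abs(p - g) for p, g in zip(pred, gt)) / len(gt)

def rmse(pred: List[float], gt: List[float]) -> float:
    # Root mean squared error: large errors are amplified by squaring.
    return math.sqrt(sum((p - g) ** 2 for p, g in zip(pred, gt)) / len(gt))

gt = [10, 20, 30, 40]            # hypothetical ground-truth counts
spread = [20, 30, 40, 50]        # every image off by 10
concentrated = [10, 20, 30, 80]  # three perfect images, one off by 40 (an outlier)

print(mae(spread, gt), rmse(spread, gt))              # 10.0 10.0
print(mae(concentrated, gt), rmse(concentrated, gt))  # 10.0 20.0
```

Both sets have the same MAE, but the single outlier doubles the root-mean-squared figure, which is why a model can look competitive on MAE yet poor on "MSE".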