James Tompkin

Associate Professor

Visual Computing

 BlueSky @brownvc.bsky.social
 Github @brownvc

Brown student researcher?
Group Onboarding Process

Contact


 BlueSky @jamestompkin.bsky.social
Google Scholar

Office hours: Weds 1300 EST
Book appointment

Brown folks: Save an email,
use GCal 'Find a Time'
and include an agenda. Instructions

Center for Information Technology
Room 547
115 Waterman Street
Providence, RI, 02912


Acknowledgements

My intrepid collaborators and co-authors.

Funding:

  • US: NSF, DARPA, NASA
  • UK: EPSRC, BBC
  • Industry: Activision, Adobe, Amazon, Cognex, Google, Intel, Meta, Snap, AI Foundation

The open source Web community: HTML5 Boilerplate, Ryan Johnston, Joshua N. Hibbert, PracticalTypography.com, EB Garamond.

Hosted on GitHub Pages using Jekyll — basic theme by orderedlist.

Editing Video by Recovering Scene Structure

How do we let users edit captured video meaningfully? By first recovering the scene structure (moving objects, lighting vs. reflectance, cross-frame consistency) that makes plausible modifications possible.

Editing video is harder than editing a photograph: changes to one frame must propagate consistently to every other, and many edits (removing a person, separating lighting from material, stabilising flicker) require understanding the underlying scene rather than just manipulating pixels. The papers in this thread approach editing as inverse reconstruction: decompose video into scene structure first, then edit.

This is a postdoc-era thread spanning UCL, MPI-Inf, Harvard, and LIRIS-CNRS. The earliest piece (2011, UCL) is the cinemagraphs authoring tool, which isolates a moving region within a stabilised clip. Miguel Granados led the video-inpainting work at MPI-Inf (2012): removing dynamic objects from crowded scenes, and the harder case of background recovery under a free-moving camera. Nicolas Bonneel led the consistency and decomposition line (2014–2017): interactive intrinsic decomposition, blind temporal consistency that stabilises any per-frame filter, and the spatio-temporal extension to camera arrays. The 2016 multicut paper takes a different angle on the same theme: cut the video into the right regions before editing.

Authors

Bjoern Andres · Nicolas Bonneel · Miguel Granados · Oliver Grau · Jan Kautz · Kwang In Kim · Steffen Kirchhoff · Evgeny Levinkov · Sylvain Paris · Fabrizio Pece · Hanspeter Pfister · Kartic Subr · Kalyan Sunkavalli · Deqing Sun · Christian Theobalt · Oliver Wang

Papers in this thread

European Conference on Visual Media Production (CVMP), 2011
An authoring tool that pipelines stabilisation, segmentation, motion selection, and loop detection to produce cinemagraphs — short looping clips where only a chosen region moves.
Computer Graphics Forum (Eurographics), 2012
Object removal from crowded scenes by filling the spatio-temporal hole from other regions of the video where the occluded background was visible, posed as a graph-cut optimisation. Pitched at occlusions harder than previous work had attempted.
European Conference on Computer Vision (ECCV), 2012
Inpaints background revealed by removing dynamic objects from a free-moving-camera video by aligning candidate frames with piecewise planar homographies — sidestepping the full per-frame depth and pose recovery that earlier free-camera methods required.
ACM Transactions on Graphics (SIGGRAPH Asia), 2014
Decomposes video into reflectance and illumination via a hybrid L2-Lp gradient split, two orders of magnitude faster than prior tools and so fast enough to support interactive refinement and lighting-aware compositing.
ACM Transactions on Graphics (SIGGRAPH Asia), 2015
A gradient-domain post-process that stabilises any per-frame filter against flicker by borrowing temporal regularity from the unprocessed video — agnostic to what the filter actually is. Demonstrated across stylisation, intrinsic decomposition, and depth.
Pacific Graphics (Short Paper), 2016
Interactive multi-label video segmentation from multi-coloured scribbles, posed as a multicut on a supervoxel graph and solved fast enough to feel responsive. Multiple objects cut at once with consistent spatio-temporal boundaries, rather than chained binary segmentations.
Computer Graphics Forum (Eurographics), 2017
Extends the blind-consistency idea from time to time-and-space across stereo, light field, and wide-baseline rigs, and adds a filter-transfer scheme that runs the expensive filter on a small subset of frames and propagates the effect — an order-of-magnitude saving for camera-array data.
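
As a small illustration of the cinemagraph idea in the first entry above (only a chosen region moves), the final compositing step can be sketched as a masked blend between each stabilised frame and one still frame. This is a minimal sketch under stated assumptions, not the paper's pipeline: the function name `composite_cinemagraph`, the array layout, and the `still_index` parameter are hypothetical, and stabilisation, segmentation, and loop detection are assumed to have been done already.

```python
import numpy as np

def composite_cinemagraph(frames, mask, still_index=0):
    """Blend moving and frozen regions of an already-stabilised clip.

    frames: array of shape (T, H, W, C), assumed already stabilised.
    mask:   array of shape (H, W) with values in [0, 1];
            1 keeps the motion, 0 freezes the pixel.
    still_index: which frame supplies the frozen background (assumed choice).
    """
    frames = np.asarray(frames, dtype=np.float64)
    # Reshape the mask so it broadcasts over time and colour channels.
    m = np.asarray(mask, dtype=np.float64)[None, :, :, None]
    still = frames[still_index][None]  # shape (1, H, W, C)
    # Inside the mask show the live frame; outside it show the still frame.
    return m * frames + (1.0 - m) * still
```

A soft-edged mask (values between 0 and 1 near the boundary) would feather the transition between the moving region and the frozen background.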