Stuart James’s research while affiliated with Istituto Italiano di Tecnologia and other places


Publications (42)


Interactive Digital Storytelling Navigating the Inherent Currents of the Diasporic Mind
  • Chapter

December 2024 · 2 Reads · 1 Citation · Miguel Pessoa · [...]
·



Figure 2. The MfM pipeline. a) We extract 2D maps representing the spatial arrangement of detected objects from the image's point of view. b) The maps are encoded as a graph, with a node for each detection and edges connecting detections from the same image (SameMap) or with the same class label (Same-class). c) A GNN predicts the location of all objects and the cameras in one reference frame.
MfM Dataset Evaluation (Section 4.3): average camera (µc) and object (µo) error, their standard deviations (σc and σo), and the failure percentage (Fail) of MfM using perfect inputs (GT Local Maps), compared against standard COLMAP.
Maps from Motion (MfM): Generating 2D Semantic Maps from Sparse Multi-view Images
  • Preprint

November 2024 · 19 Reads

World-wide detailed 2D maps require enormous collective efforts. OpenStreetMap is the result of 11 million registered users manually annotating the GPS location of over 1.75 billion entries, including distinctive landmarks and common urban objects. At the same time, manual annotations can include errors and are slow to update, limiting the map's accuracy. Maps from Motion (MfM) is a step towards automating this time-consuming map-making procedure by computing 2D maps of semantic objects directly from a collection of uncalibrated multi-view images. From each image, we extract a set of object detections and estimate their spatial arrangement in a top-down local map centered in the reference frame of the camera that captured the image. Aligning these local maps is not a trivial problem, since they provide incomplete, noisy fragments of the scene, and matching detections across them is unreliable because of repeated patterns and the limited appearance variability of urban objects. We address this with a novel graph-based framework that encodes the spatial and semantic distribution of the objects detected in each image and learns how to combine them to predict the objects' poses in a global reference system, while taking into account all possible detection matches and preserving the topology observed in each image. Despite the complexity of the problem, our best model achieves global 2D registration with an average accuracy within 4 meters (i.e., below GPS accuracy) even on sparse sequences with strong viewpoint changes, on which COLMAP has an 80% failure rate. We provide an extensive evaluation on synthetic and real-world data, showing that the method obtains a solution even in scenarios where standard optimization techniques fail.
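The graph encoding described in this abstract and in the Figure 2 caption (one node per detection, same-map edges within an image, same-class edges as candidate matches) can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' code: the `Detection` record, the `build_mfm_graph` helper and the use of raw local-map coordinates as node features are hypothetical, and we assume same-class edges only link detections from different images, since they stand for possible cross-image matches. The GNN that regresses the global object and camera poses is not shown.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Detection:
    image_id: int  # which image (and hence which local map) the detection came from
    label: str     # semantic class, e.g. "tree" or "lamp"
    x: float       # position in the camera-centred top-down local map (hypothetical units)
    y: float


def build_mfm_graph(detections):
    """Hypothetical helper: return (node_features, same_map_edges, same_class_edges).

    Nodes are indices into `detections`; edges are undirected (i, j) pairs with i < j.
    """
    node_features = [(d.x, d.y) for d in detections]
    same_map, same_class = [], []
    for i, j in combinations(range(len(detections)), 2):
        a, b = detections[i], detections[j]
        if a.image_id == b.image_id:
            # Same local map: this edge lets the network preserve each map's topology.
            same_map.append((i, j))
        elif a.label == b.label:
            # Assumption: same-class edges only join different images (candidate matches).
            same_class.append((i, j))
    return node_features, same_map, same_class


if __name__ == "__main__":
    dets = [
        Detection(0, "tree", 1.0, 2.0),
        Detection(0, "lamp", 3.0, 1.0),
        Detection(1, "tree", 0.0, 2.0),
        Detection(1, "lamp", 2.0, 2.0),
        Detection(2, "tree", 5.0, 5.0),
    ]
    feats, same_map, same_class = build_mfm_graph(dets)
    print(same_map)    # [(0, 1), (2, 3)]
    print(same_class)  # [(0, 2), (0, 4), (1, 3), (2, 4)]
```

Rather than committing to one detection match, every same-class pair stays in the graph, which mirrors the abstract's point that the model takes "all possible detection matches" into account instead of matching explicitly.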



Re-assembling the past: The RePAIR dataset and benchmark for real world 2D and 3D puzzle solving

October 2024 · 43 Reads

This paper proposes the RePAIR dataset, a challenging benchmark for testing modern computational and data-driven methods on puzzle-solving and reassembly tasks. Our dataset has unique properties that are uncommon in current benchmarks for 2D and 3D puzzle solving. The fragments and fractures are realistic, caused by the collapse of a fresco during a World War II bombing at the Pompeii archaeological park. The fragments are also eroded and have missing pieces with irregular shapes and different dimensions, further challenging the reassembly algorithms. The dataset is multi-modal, providing high-resolution images with characteristic pictorial elements, detailed 3D scans of the fragments, and metadata annotated by the archaeologists. Ground truth has been generated through several years of unceasing fieldwork, including the excavation and cleaning of each fragment, followed by manual puzzle solving by archaeologists of a subset of approx. 1000 pieces among the 16000 available. After digitizing all the fragments in 3D, a benchmark was prepared to challenge current reassembly and puzzle-solving methods, which often address simpler synthetic scenarios. The tested baselines show that there is clearly a gap to fill in solving this computationally complex problem.




6DGS: 6D Pose Estimation from a Single Image and a 3D Gaussian Splatting Model

July 2024 · 28 Reads

We propose 6DGS to estimate the camera pose of a target RGB image given a 3D Gaussian Splatting (3DGS) model representing the scene. 6DGS avoids the iterative process typical of analysis-by-synthesis methods (e.g. iNeRF), which also require an initialization of the camera pose in order to converge. Instead, our method estimates a 6DoF pose by inverting the 3DGS rendering process. Starting from the object surface, we define a radiant Ellicell that uniformly generates rays departing from each ellipsoid that parameterizes the 3DGS model. Each Ellicell ray is associated with the rendering parameters of its ellipsoid, which are in turn used to obtain the best bindings between the target image pixels and the cast rays. These pixel-ray bindings are then ranked to select the best-scoring bundle of rays, whose intersection provides the camera center and, in turn, the camera rotation. The proposed solution obviates the need for an "a priori" pose for initialization, and it solves 6DoF pose estimation in closed form, without the need for iterations. Moreover, compared to the existing Novel View Synthesis (NVS) baselines for pose estimation, 6DGS can improve the overall average rotational accuracy by 12% and translation accuracy by 22% on real scenes, despite not requiring any initialization pose. At the same time, our method operates in near real-time, reaching 15 fps on consumer hardware.
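Recovering the camera center from the selected bundle of rays amounts to finding the point closest to a set of 3D lines. Below is a minimal sketch of the standard least-squares version, offered as an illustration rather than the 6DGS implementation: Ellicell ray generation, pixel-ray binding and ranking, and the subsequent rotation estimate are all omitted, and `intersect_rays` is a hypothetical helper. For rays with origins o_i and unit directions d_i, it solves (Σ P_i) c = Σ P_i o_i with P_i = I − d_i d_iᵀ, which minimizes the sum of squared distances from c to the rays.

```python
import math


def _normalize(v):
    # Scale a 3-vector to unit length (a zero vector would raise ZeroDivisionError).
    n = math.sqrt(sum(c * c for c in v))
    return [c / n for c in v]


def _det3(m):
    # Determinant of a 3x3 matrix by cofactor expansion along the first row.
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def intersect_rays(origins, directions):
    """Hypothetical helper: least-squares point closest to all rays (treated as lines)."""
    A = [[0.0] * 3 for _ in range(3)]
    b = [0.0] * 3
    for o, d in zip(origins, directions):
        d = _normalize(d)
        for r in range(3):
            for c in range(3):
                # P = I - d d^T projects onto the plane orthogonal to the ray direction.
                p = (1.0 if r == c else 0.0) - d[r] * d[c]
                A[r][c] += p
                b[r] += p * o[c]
    det = _det3(A)
    if abs(det) < 1e-12:
        raise ValueError("rays are (nearly) parallel; the center is not unique")
    # Solve A c = b with Cramer's rule (fine for a 3x3 system).
    center = []
    for k in range(3):
        Ak = [row[:] for row in A]
        for r in range(3):
            Ak[r][k] = b[r]
        center.append(_det3(Ak) / det)
    return center


if __name__ == "__main__":
    # Three rays that all pass through (1, 2, 3).
    print(intersect_rays([(0, 0, 0), (1, 0, 0), (0, 2, 0)],
                         [(1, 2, 3), (0, 2, 3), (1, 0, 3)]))
```

With noisy rays the same linear system returns the best compromise point instead of an exact intersection, which is why ranking and selecting a well-agreeing bundle first matters.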




Citations (23)


... The tool and its supportive documentation allow the creation of videos made from a combination of images, text and audio. A more robust approach in terms of human-computer interaction, still in the prototype version, undertaken by the Portuguese Center for Refugees is the design of a custom-made interactive digital storytelling authoring tool, tested with a real refugee and migrant audience (Nisi et al., 2025). As for image generation with AI and refugee education, little can be said at the moment, as the field is nascent and emergent. ...

Reference:

‘A Year in Greece’: Adolescent Refugees Create a Digital Photographic Exhibition.
Interactive Digital Storytelling Navigating the Inherent Currents of the Diasporic Mind
  • Citing Chapter
  • December 2024

... It successfully demonstrates the effectiveness of 3D Gaussian splatting for optimizing human reconstruction based on SMPL representations. Also, works like GS-pose [4] and 6D-GS [3] leverage 3D Gaussian splatting to optimize the 6D pose of objects. We reasonably infer that this approach can also be effectively applied to reconstruct human-object interactions from a single viewpoint, potentially replacing traditional silhouette-based optimizers [54]. ...

6DGS: 6D Pose Estimation from a Single Image and a 3D Gaussian Splatting Model
  • Citing Chapter
  • November 2024

... The Jigsaw method fosters motivation, comprehension, and confidence in collaborative learning environments (Lin et al., 2025) while enhancing group competence through visual thinking and social interaction (Maldonado López et al., 2023). Its core principle aligns with progressive concept-building and systematic information integration, much like its application in artificial intelligence (Talon et al., 2025) and strategic segmentation for optimized learning (Apáti et al., 2025); though unrelated to learning, efficiency and optimization reflect the Jigsaw method, where structured collaboration enhances comprehension. As a peer-teaching approach, Jigsaw improves understanding and active engagement (Chng et al., 2024), though its effectiveness depends on students' readiness for collaborative thinking and structured implementation (Riant et al., 2024). ...

GANzzle++: Generative approaches for jigsaw puzzle solving as local to global assignment in latent spatial representations
  • Citing Article
  • November 2024

Pattern Recognition Letters

... At the same time, using sparse images and focusing on common urban objects, which typically have a plain and standardized appearance, makes establishing object matches from the input images unreliable. Given the success of Graph Neural Networks (GNNs) in addressing geometrical reasoning problems [17,22,49], we frame MfM as a graph problem, assigning a node to each detection and attempting to regress its location in the global map. In this formulation, we use same-map edges to force the network to preserve the topology of each local map, and same-class edges to account for all possible detection matches, instead of explicitly matching the input detections. ...

Positional diffusion: Graph-based diffusion models for set ordering
  • Citing Article
  • October 2024

Pattern Recognition Letters

... CROSSFIRE [17] incorporates learned local features to mitigate local minima but still relies on accurate initial pose priors. IFFNeRF [18] proposes NeRF model inversion to rerender images matching a target view but overlooks unique 3DGS characteristics, such as ellipsoid elongation, rotation, and non-uniform spatial distribution, which our approach effectively addresses. [13] pioneers LiDAR-camera fused 3DGS mapping using KD-trees and 2D voxel grids, employing NCC for coarse alignment and PnP for pose refinement. ...

IFFNeRF: Initialisation Free and Fast 6DoF pose estimation from a single image and a NeRF model
  • Citing Conference Paper
  • May 2024

... Moreover, with the strong support of large language models, AI technology's capacity to process and analyze semantic information plays a vital role in transforming static texts and images or demo media into dynamic narratives that convey semantic emotions. MEMEX project explored inclusive digital storytelling [59], which integrates AI and AR technologies to amplify marginalized narratives, fostering social inclusion and shared cultural heritage. This semantic processing enables AI to imbue digital recreations with depth of thought, thereby enhancing the user's emotional connection to cultural heritage. ...

Inclusive Digital Storytelling: Artificial Intelligence and Augmented Reality to Re-centre Stories from the Margins

Lecture Notes in Computer Science

... In cities such as Barcelona, Paris, and Lisbon, digital storytelling applications have been used to promote social inclusion and cohesion by engaging marginalized communities in cultural heritage activities. These tools help bridge cultural gaps and foster a sense of belonging (Nisi et al., 2023). Building resilient cities involves addressing the needs of those most at risk of exclusion, such as during urban flooding. ...

"Connected to the people": Social Inclusion & Cohesion in Action through a Cultural Heritage Digital Tool

Proceedings of the ACM on Human-Computer Interaction

... A graph structure provides the realization of both kinds of connectedness with content provided by WikiData [18]. We follow the approach of [43] for constructing a Knowledge Graph (KG) to provide content suggestions in the form of textual information and images [44]. ...

Geolocation of Cultural Heritage Using Multi-view Knowledge Graph Embedding
  • Citing Chapter
  • August 2023

Lecture Notes in Computer Science

... KGE is an extensively researched area [36,37] that aims to map entities and relations into a continuous vector space while preserving their semantic information. It is widely used in tasks such as link prediction [38] and question answering [39,40] by using a score function to predict the validity of triplets. Depending on the type of score function, existing embedding models can be broadly categorized into three types, namely translation distance-based methods, semantic matching-based methods and neural network-based methods. ...

Locality-aware subgraphs for inductive link prediction in knowledge graphs
  • Citing Article
  • February 2023

Pattern Recognition Letters

... For example, most of the commercial tools and literature focus specifically on planning (e.g. [6,16]), generating (e.g. [24,73,123]) or revising (e.g. ...

Writing with (Digital) Scissors: Designing a Text Editing Tool for Assisted Storytelling Using Crowd-Generated Content

Lecture Notes in Computer Science