Jo Vermeulen’s research while affiliated with Autodesk and other places


Publications (84)


AQuA: Automated Question-Answering in Software Tutorial Videos with Visual Anchors
  • Conference Paper

May 2024 · 4 Citations

Saelyne Yang · Jo Vermeulen · George Fitzmaurice




Figure 3: System design showing the architectures involved in 3DALL-E, which incorporates three large AI models into the workbench of an industry standard CAD software. In the top left panel, we show how text AI outputs are displayed in the UI. In the bottom left panel, we show how users could pass in image prompts and retrieve DALL-E generations within the plugin.
Figure 8: Pattern of generation activity for T_edit, when participants edited an existing model.
Figure 9: Pattern of generation activity for T_create, when participants created a model from scratch.
Figure 13: Snapshots of the 3D design process of one participant (P18) who 3D modelled using a DALL-E generation as a reference image during T_create.
Table of participant details, with discipline and Fusion 360 usage frequency. We list labels for the model they designed during T_create and labels for the model they brought in (T_edit).
3DALL-E: Integrating Text-to-Image AI in 3D Design Workflows
  • Preprint
  • File available

October 2022 · 691 Reads · 3 Citations

Text-to-image AI systems are capable of generating novel images for inspiration, but their applications for 3D design workflows, and how designers can build 3D models using AI-provided inspiration, are less understood. To investigate this, we integrated DALL-E, GPT-3, and CLIP within CAD software in 3DALL-E, a plugin that allows users to construct text and image prompts based on what they are modelling. In a study with 13 designers, we found that designers saw great potential to incorporate 3DALL-E into their workflows and to use text-to-image AI for reference images, renders, materials, and design considerations. Additionally, we elaborate on prompting patterns and provide measures of prompt complexity observed across participants. We conclude with a discussion of how 3DALL-E can merge with existing generative design workflows and propose prompt bibliographies as a form of human-AI design history.
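
The plugin described above constructs text prompts from what the user is modelling and forwards them to DALL-E. As a rough illustration only, here is a minimal sketch of that kind of prompt construction using the public OpenAI Python client; the function name, prompt format, and parameters are assumptions for illustration, not 3DALL-E's actual implementation.

```python
# Hypothetical sketch: build a text prompt from CAD context and fetch reference images.
# Requires the openai package (>= 1.0) and OPENAI_API_KEY set in the environment.
from openai import OpenAI

client = OpenAI()

def generate_references(subject: str, keywords: list[str], n: int = 3) -> list[str]:
    """Return URLs of generated reference images for a 3D modelling subject."""
    # e.g. "desk lamp, industrial design, studio render"
    prompt = f"{subject}, {', '.join(keywords)}"
    response = client.images.generate(model="dall-e-2", prompt=prompt, n=n, size="1024x1024")
    return [image.url for image in response.data]

# Example: generate_references("desk lamp", ["industrial design", "studio render"])
```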


Fig. 1. SimCURL learns user representations from a large corpus of unlabeled command sequences. These learned representations are then transferred to multiple downstream tasks that have only limited labels available.
Fig. 3. An overview of the SimCURL method (left) and the user-session network architecture (right). The user's command sequence u_i is first divided into sessions {s_i,j}, from which two augmented views are generated via session dropout. The views are passed through the main network to obtain the representation vectors r_i and r_i′, and then through the projection head to produce z_i and z_i′, on which the contrastive loss is applied. Solid and dashed lines denote positive and negative pairs, respectively.
SimCURL: Simple Contrastive User Representation Learning from Command Sequences

July 2022 · 55 Reads

User modeling is crucial to understanding user behavior and essential for improving user experience and personalized recommendations. When users interact with software, vast amounts of command sequences are generated through logging and analytics systems. These command sequences contain clues to the users' goals and intents. However, these data are highly unstructured and unlabeled, making them difficult for standard predictive systems to learn from. We propose SimCURL, a simple yet effective contrastive self-supervised deep learning framework that learns user representations from unlabeled command sequences. Our method introduces a user-session network architecture, as well as session dropout as a novel form of data augmentation. We train and evaluate our method on a real-world command sequence dataset of more than half a billion commands. Our method shows significant improvement over existing methods when the learned representation is transferred to downstream tasks such as experience and expertise classification.
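
As an illustration of the two ingredients named in the abstract, session dropout and a contrastive objective, the following is a minimal PyTorch-style sketch. It is an assumption-level reconstruction from the abstract, not the authors' code: a user's command history (a list of sessions) is augmented by randomly dropping sessions, and a standard SimCLR-style NT-Xent loss pulls the two augmented views of the same user together.

```python
# Minimal sketch (assumed, not the paper's implementation) of session-dropout
# augmentation and an NT-Xent contrastive loss over paired user views.
import random
import torch
import torch.nn.functional as F


def session_dropout(sessions, drop_prob=0.3):
    """Drop whole sessions from a user's command history to form one augmented view."""
    kept = [s for s in sessions if random.random() > drop_prob]
    return kept if kept else [random.choice(sessions)]  # keep at least one session


def nt_xent_loss(z1, z2, temperature=0.5):
    """Contrastive loss over two batches of projected embeddings (two views per user)."""
    n = z1.size(0)
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)    # (2N, d), unit-norm rows
    sim = z @ z.t() / temperature                          # pairwise similarities
    mask = torch.eye(2 * n, dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(mask, float("-inf"))             # ignore self-similarity
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)]).to(z.device)
    return F.cross_entropy(sim, targets)                   # positives are the paired views
```

In this sketch, the paper's "main network" would encode each augmented session list into a vector (z1 and z2 for the two views) before the loss is applied; the encoder itself is omitted here.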






What's the Situation with Situated Visualization? A Survey and Perspectives on Situatedness

September 2021 · 68 Reads · 101 Citations

IEEE Transactions on Visualization and Computer Graphics

Situated visualization is an emerging concept within visualization, in which data is visualized in situ, where it is relevant to people. The concept has gained interest from multiple research communities, including visualization, human-computer interaction (HCI), and augmented reality. This has led to a range of explorations and applications of the concept; however, this early work has focused on the operational aspect of situatedness, leading to inconsistent adoption of the concept and terminology. First, we contribute a literature survey in which we analyze 44 papers that explicitly use the term “situated visualization” to provide an overview of the research area: how it defines situated visualization, common application areas and technology used, as well as the types of data and visualization involved. Our survey shows that research on situated visualization has focused on technology-centric approaches that foreground a spatial understanding of situatedness. Second, we contribute five perspectives on situatedness (space, time, place, activity, and community) that together expand on the prevalent notion of situatedness in the corpus. We draw from six case studies and prior theoretical developments in HCI. Each perspective develops a generative way of looking at and working with situatedness in design and research. We outline future directions, including considering technology, material and aesthetics, leveraging the perspectives for design, and methods for stronger engagement with target audiences. We conclude with opportunities to consolidate situated visualization research.


Citations (68)


... The Arizona Water Chatbot [60] employs RAG to retrieve water-related information from reputable sources, improving decision-making. Ren et al. [83] explore memory retrieval and generation refinement using RAG for enhanced conversational AI, and Yang et al. [81] develop AQuA that combines software UI elements associated with questions as the query and generates answers using RAG-powered GPT-4 from official documentation and tutorial resources. While RAG has proven effective in handling large knowledge repositories, its application in integrating dietary data has not yet been comprehensively explored. ...

Reference:

DietGlance: Dietary Monitoring and Personalized Analysis at a Glance with Knowledge-Empowered AI Assistant
AQuA: Automated Question-Answering in Software Tutorial Videos with Visual Anchors
  • Citing Conference Paper
  • May 2024
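
The retrieval-augmented pipeline quoted above (UI-element context plus the question as the query, with the answer generated from retrieved documentation and tutorial text) can be illustrated with a minimal, generic sketch. The scoring below is a deliberately simplified token-overlap ranker and all names are hypothetical; this is not AQuA's implementation, which the excerpt describes as RAG with GPT-4.

```python
# Generic, assumption-level RAG sketch: rank documentation passages against the
# question plus UI-element context, then build a prompt for a language model.
from collections import Counter


def overlap_score(query_tokens, passage):
    counts = Counter(passage.lower().split())
    return sum(counts[t] for t in query_tokens)


def retrieve(question, ui_context, passages, k=3):
    query_tokens = (question + " " + ui_context).lower().split()
    return sorted(passages, key=lambda p: overlap_score(query_tokens, p), reverse=True)[:k]


def build_prompt(question, ui_context, passages, k=3):
    context = "\n\n".join(retrieve(question, ui_context, passages, k))
    return ("Answer the question about the software using only the context below.\n\n"
            f"Context:\n{context}\n\nUI element: {ui_context}\nQuestion: {question}")

# The resulting prompt would then be sent to a large language model (e.g. GPT-4).
```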

... These algorithms utilize diffusion, transforming the input into a pure-noise image that is then gradually denoised to produce a new image. Currently, DALL-E [15], Stable Diffusion [16], [17], and Midjourney [18] provide competitive algorithms that can be used for different image synthesis applications. ...

3DALL-E: Integrating Text-to-Image AI in 3D Design Workflows
  • Citing Conference Paper
  • July 2023

... The system has the advantages of facilitating collaboration, providing inspiration, and fully exploring the design process. Moreover, the user-friendly interface, similar to search tools such as Google Images, provided a faster and more efficient research process [41]. ...

3DALL-E: Integrating Text-to-Image AI in 3D Design Workflows

... Smart glasses or watches can deliver in-situ visual or audio cues [11] during meals, helping users make informed choices in a more seamless way. Real-time audio haptic feedback [73], for example, could combine sound cues (such as a pleasant tone indicating a correct portion or a warning beep for overeating) with tactile sensations (like vibrations to alert users when a meal exceeds recommended nutrient levels). ...

Data Every Day: Designing and Living with Personal Situated Visualizations
  • Citing Conference Paper
  • April 2022

... The visual would be considered beneficial if it integrated fitness and menstrual cycle data in a way that illustrates the expected pattern [11]. Mobile-friendly was one of the criteria since devices are ingrained in everyday life and users prefer to have their data available in-hand at all times [31]. ...

Challenges in Everyday Use of Mobile Visualizations
  • Citing Chapter
  • November 2021

... This process results in a physical artefact [30] that allows cognitive understanding of data [29], making data interpretable as the physical representation engages multiple senses [23] and is experienced with the entire human body [24]. This approach has already been used to collect and modify data on activities [44], habits [15] or preferences [51], thus we believe it will allow office workers to create meaning from personal experiences [28]; to align these meanings with others [20]; and to enrich these meanings with details about the social and physical context where they are situated [5]. ...

What's the Situation with Situated Visualization? A Survey and Perspectives on Situatedness
  • Citing Article
  • September 2021

IEEE Transactions on Visualization and Computer Graphics

... Purposes expressed by the researchers were experiential (63 or 95.5%), targeting usability (30 or 45.5%) and/or embodiment (33 or 50%), attitudinal (45 or 68.2%), communal (41 or 62.1%), including specific populations (30 or 45.5%) and social factors (11 or 16.7%), and vocal, or reasons deemed of particular relevance, if not exclusive, to voice UX (17 or 25.8%). This last category, which may be especially pertinent here, included: conversational styles [128], response styles [177], verbal behaviour [125], communicative signals [136], voice input/output [166], disclosures [93], long-turns in conversation [33], living noise [70], source orientation [51], conversation expectations [25], emotion in conversation [83], features of dialogue [121], verbal prompts [1], synthetic versus real voices [81], conversational exchange and repair [39], and "rich" communication [3]. ...

Machine Body Language: Expressing a Smart Speaker’s Activity with Intelligible Physical Motion
  • Citing Conference Paper
  • June 2021

... Given the increasing role of AI in work and daily life, it has become critical to address this gap through targeted training initiatives. Without proper AI-related knowledge, individuals risk becoming vulnerable to misinformation, biased algorithms, and unethical AI practices (Avdic & Vermeulen, 2020;Cave & Dihal, 2019;Porcheron et al., 2018). In order to mitigate these risks, teaching the fundamentals of AI is of paramount importance. ...

Intelligibility Issues Faced by Smart Speaker Enthusiasts in Understanding What Their Devices Do and Why
  • Citing Conference Paper
  • December 2020

... Such discrepancies can conceal important qualities of the data, such as how, when, and where they might be contingent, incorrect, inconsistent, inaccurate, incomplete, uncertain, relative, contextual, or situated (e.g., D'Ignazio & Klein, 2020; Dörk et al., 2013;Drucker, 2017;Kennedy et al., 2016;Kitchin, 2014;Kosminsky et al., 2019;Loukissas, 2019). These issues have been explored by previous studies, including those looking at sociological perspectives on visualization practices (e.g., Ricker et al., 2020;Simpson, 2020;Van Geenen & Wieringa, 2020), uncertainty visualization and the visualization of missingness (e.g., Fernstad, 2019;Kay et al., 2016;Kinkeldey et al., 2017;McCurdy et al., 2019;McNutt et al., 2020;Skeels et al., 2010;Song & Szafir, 2018), and by work that integrates local and material contexts in visualization practice (e.g., D'Ignazio, 2017;Loukissas, 2016;Offenhuber, 2019). ...

Belief at first sight: Data visualization and the rationalization of seeing

Information Design Journal