Riku Arakawa's research while affiliated with Carnegie Mellon University and other places

Publications (28)

Preprint
CatAlyst uses generative models to help workers progress by influencing their task engagement instead of directly contributing to their task outputs. It prompts distracted workers to resume their tasks by generating a continuation of their work and presenting it as an intervention that is more context-aware than conventional (predetermined) feedba...
Article
A user often needs training and guidance while performing several daily life procedures, e.g., cooking, setting up a new appliance, or doing a COVID test. Watch-based human activity recognition (HAR) can track users' actions during these procedures. However, out of the box, state-of-the-art HAR struggles with noisy data and less-expressive actions...
Article
Conventional motion tutorials rely mainly on a predefined motion and vision-based feedback that normally limits the application scenario and requires professional devices. In this paper, we propose VoLearn, a cross-modal system that provides operability for user-defined motion learning. The system supports the ability to import a desired motion fro...
Preprint
In this paper, we discuss the potential of applying unsupervised anomaly detection in constructing AI-based interactive systems that deal with highly contextual situations, i.e., human-human communication, in collaboration with domain experts. We arrived at this approach of utilizing unsupervised anomaly detection through our experience of developing...
Preprint
Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is v...
Conference Paper
We demonstrate that recent natural language processing (NLP) techniques introduce a new paradigm of vocabulary learning that benefits from both micro and usage-based learning by generating and presenting the usages of foreign words based on the learner’s context. Then, without allocating dedicated time for studying, the user can become familiarized...
Preprint
We present our case study that aims to help professional assessors make decisions in human assessment, in which they conduct interviews with assessees and evaluate their suitability for certain job roles. Our workshop with two industrial assessors revealed that a computational system that can extract nonverbal cues of assessees from interview videos...
Preprint
We propose a system displaying audience eye gaze and nod reactions for enhancing synchronous remote communication. Recently, we have had increasing opportunities to speak to others remotely. In contrast to offline situations, however, speakers often have difficulty observing audience reactions at once in remote communication, which makes them feel...
Conference Paper
Transcribing speech from audio files to text is an important task not only for exploring the audio content in text form but also for utilizing the transcribed data as a source to train speech models, such as automated speech recognition (ASR) models. A post-correction approach has been frequently employed to reduce the time cost of transcription wh...
Conference Paper
Explicitly alerting users is not always an optimal intervention, especially when they are not motivated to obey. For example, in video-based learning, learners who are distracted from the video would not follow an alert asking them to pay attention. Inspired by the concept of Mindless Computing, we propose a novel intervention approach, Mindless At...
Article
A growing number of people are using catch-up TV services rather than watching simultaneously with other audience members at the time of broadcast. However, computational support for such catching-up users has not been well explored. In particular, we are observing an emerging phenomenon in online media consumption experiences in which speculation...
Conference Paper
Humans are known to have a better subconscious impression of other humans when their movements are imitated in social interactions. Despite this influential phenomenon, its application in human-computer interaction is currently limited to specific areas, such as an agent mimicking the head movements of a user in virtual reality, because capturing u...
Conference Paper
Video-Reflection is a common approach to reflection in executive coaching for professional development: a video recording of the coaching session is presented to the coachee to prompt self-reflection. However, it requires a great deal of time to watch the full length of the video and is highl...
Preprint
We demonstrate the first reinforcement-learning application for robots equipped with an event camera. Because of the considerably lower latency of the event camera, it is possible to achieve much faster control of robots compared with the existing vision-based reinforcement-learning applications using standard cameras. To handle a stream of events...
Conference Paper
Despite promising initial studies, a speaker's original voice can cause problems when real-time voice conversion (data-driven speaker conversion) technology is applied in our daily lives, specifically in near-field communication, because the overlapping speech degrades the sense of immersion in the converted speech. We prese...
Conference Paper
Executive coaching has been drawing more and more attention for developing corporate managers. While conversing with managers, coach practitioners are also required to understand internal states of coachees through objective observations. In this paper, we present REsCUE, an automated system to aid coach practitioners in detecting unconscious behav...
Preprint
Exploration has been one of the greatest challenges in reinforcement learning (RL), which is a large obstacle in the application of RL to robotics. Even with state-of-the-art RL algorithms, building a well-learned agent often requires too many trials, mainly due to the difficulty of matching its actions with rewards in the distant future. A remedy...

Citations

... At the same time, we have observed that several efforts are dedicated towards constructing linguistic benchmarks questioning reasoning capabilities in LMs, including mathematics [59], symbolic reasoning [58], implicit reasoning based on strategies [137], commonsense understanding [138], temporal, causal, linguistic understanding and others [139]. We argue that such attempts could assist the creation of appropriate VL datasets, which would incorporate visual and linguistic challenges, so that knowledge contribution would be more concrete. ...
... An example of the former is found in [63], where a robotic hand autonomously determines when to let go of a ball while being swung by a human arm. An example of the latter is found in [90], where a robotic arm, finger, and tentacle were controlled through shoulder motions to not interfere with manipulation tasks. Both work towards achieving free and at-will control of an augmented body with additional degrees of freedom. ...
... By contrast, several studies in HCI also design interventions that affect people's behavior less explicitly. Arakawa and Yakura [5] proposed mindless intervention to draw students' attention to online classes based on the design of Mindless Computing [1]. Since such mindless interventions are designed for situations that do not demand users' conscious awareness, they could be effective in our case to draw workers' attention to the task (e.g., writing). ...
... This is consistent with prior studies that have proposed systems of sharing visual cues with remote users to improve co-presence [35]. Co-presence has been one of the main topics in distributed communication [73], and many works have proposed various methods of forming co-presence [3,33,35,55]. These existing methods use additional devices (e.g., HMD) or require users' intentional actions. ...
... While worn systems have employed depth (e.g., WatchSense [52]) and thermal cameras (e.g., Fingertrak [23], Pyro [18], Yamato et al. [67]), by far the most common camera variety used are those operating in the visible or near infrared light range. The latter systems include Digits [29], CyclopsRing [7], Hand with Sensing Sphere [2], Back-Hand-Pose [63], and Opisthenar [68]. Range-finding sensors (optical or acoustic) are also fairly common, and utilized in systems such as ThumbTrak [55], RotoWrist [46] and WristWhirl [17]. ...
... Besides the commonly used single-point input with pens, enhanced interaction techniques have been explored. Examples include using touch input on the non-dominant hand, supporting pen input in bimanual interaction [26,50], unimodal surface-based pen-postures [7], bending [14] or using sensors in or around the pen [24,40] for gestures and postures, and examining pen-grips [61]. Our work was inspired by tilting [66] and hovering [17] the pen above interactive surfaces, which we use in a VR context. ...
... Multiple systems that intend to facilitate reflection have been designed in HCI for various application contexts [3,4,12,41]. For instance, the web-based application MoodAdaptor prompts participants to reflect on positive and negative memories depending on one's current mood [49]. ...
... A large amount of audio data, often unlabelled and of private nature, is generated everyday from cellphones, tablets, personal assistants and other IoT devices [1,2,3,4]. Being able to utilise this data to solve various speech related tasks has been of great interest to researchers for over a decade [5,6,7,8,9]. Self-supervised learning (SSL) allows the learning of representations from unlabelled data, which can later be used to solve specific downstream tasks, e.g., automatic speech recognition (ASR), speech translation, keyword spotting (KWS), and others. ...
... RCLens [LGG*18] is an active learning system that uses visualization approaches to support the discovery of rare instances. EnsembleLens [XXM*19] is a hybrid visual system that utilizes a modified Gaussian mixture model [AY19] to identify problematic patterns in human behaviors. RISSAD [DB21] is an interactive approach that not only assists users in detecting abnormalities but also automatically defines them using descriptive rules. ...