Tuesday 17 Oct
PS1: 3DUIs: Manipulation
11:00 AEDT (UTC+11)
Leighton Hall
Session Chair: K. Kim
AMP-IT and WISDOM: Improving 3D Manipulation for High-Precision Tasks in Virtual Reality
Francielly Rodrigues, National Laboratory for Scientific Computing – LNCC;
Alexander Giovannelli, Virginia Tech;
Leonardo Pavanatto, Virginia Tech;
Haichao Miao, Lawrence Livermore National Laboratory;
Jauvane C. de Oliveira, National Laboratory for Scientific Computing;
Doug Bowman, Virginia Tech
Precise 3D manipulation in virtual reality (VR) is essential for effectively aligning virtual objects. However, state-of-the-art VR manipulation techniques have limitations when high levels of precision are required, including the unnaturalness caused by scaled rotations and the increase in time due to degree-of-freedom (DoF) separation in complex tasks. We designed two novel techniques to address these issues: AMP-IT, which offers direct manipulation with an adaptive scaled mapping for implicit DoF separation, and WISDOM, which offers a combination of Simple Virtual Hand and scaled indirect manipulation with explicit DoF separation. We compared these two techniques against baseline and state-of-the-art manipulation techniques in a controlled experiment. Results indicate that WISDOM and AMP-IT have significant advantages over best-practice techniques regarding task performance, usability, and user preference.
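To give readers unfamiliar with scaled mappings a concrete picture of the general idea, here is a minimal Python sketch of a velocity-dependent control-display gain; the thresholds and gain values are invented for illustration, and this is not the AMP-IT or WISDOM algorithm described above.

```python
# Hedged sketch of a generic velocity-dependent scaled mapping (control-display
# gain): slow hand motion is damped for precision, fast motion maps one-to-one.
# Illustration of the broad idea only; NOT the AMP-IT or WISDOM algorithm.
import numpy as np

def scaled_offset(hand_delta_m, dt_s, precise_speed=0.02, natural_speed=0.20, min_gain=0.2):
    """Map a real hand displacement (metres per frame) to a virtual displacement."""
    speed = np.linalg.norm(hand_delta_m) / dt_s          # hand speed in m/s
    if speed <= precise_speed:                           # very slow: strongest damping
        gain = min_gain
    elif speed >= natural_speed:                         # fast: one-to-one mapping
        gain = 1.0
    else:                                                # interpolate in between
        t = (speed - precise_speed) / (natural_speed - precise_speed)
        gain = min_gain + t * (1.0 - min_gain)
    return gain * np.asarray(hand_delta_m)

# Example: a 1 mm hand movement during one 90 Hz frame is scaled down for precision.
print(scaled_offset([0.001, 0.0, 0.0], dt_s=1 / 90))
```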
Comparative Analysis of Artefact Interaction and Manipulation Techniques in VR Museums: A Study of Performance and User Experience
Yifan Wang, School of Advanced Technology;
Yue Li, Xi’an Jiaotong-Liverpool University;
Hai-Ning Liang, Xi’an Jiaotong-Liverpool University
For museums in Virtual Reality (VR), various interaction and manipulation techniques could be employed for users to engage with artefact interactions. This study examined four combinations of interaction (controller-based and hand-tracking) and manipulation (direct and indirect) techniques, assessing user performance and experience with these interaction techniques in a virtual museum environment. We conducted a within-subjects experiment and asked participants to perform a series of transform manipulation tasks using the four techniques. Participants’ task completion time was measured. They also provided feedback on acceptance, learnability, presence, sickness, and fatigue, and gave an overall ranking through post-experiment questionnaires and interviews. The results revealed that controller-based direct manipulation outperformed the other techniques in terms of task performance and user experience, with hand-tracking indirect manipulation being the least efficient and the least preferred option. The study offers insights for future research and development in refining interaction and manipulation techniques and designing more user-friendly VR museum experiences.
AR Guidance Design for Line Tracing Speed Control
Jeroen Ceyssens, Hasselt University – tUL – Flanders Make;
Bram van Deurzen, Expertise Centre for Digital Media, Hasselt University – tUL – Flanders Make;
Gustavo Alberto Rovelo Ruiz, Hasselt University;
Kris Luyten, Hasselt University – tUL – Flanders Make;
Fabian Di Fiore, Hasselt University
In many jobs, workers execute precise line tracing tasks, such as welding, spray painting, or chiseling. Training and support for such tasks can be done using VR and AR. However, to enable workers to achieve the required precision in movement and timing, the effect of visual guidance on continuous movement needs to be explored. In VR environments, we want to ensure people are trained so that the obtained skill is transferable to a real-world context, whereas, in AR, we want to ensure an ongoing task can be completed successfully when adding visual guidance. To simulate these various contexts, we employ a VR environment to investigate the effectiveness of different visualizations for motion-based guidance in a line tracing task. We tested five different visualizations, including faster and slower arrows on the pen, the same arrows on the line, a dynamic graph on the pen or line, and a ghost object to follow. Each visualization was tested with the same set of five lines of different target speeds (2 cm/s to 10 cm/s in steps of 2 cm/s) with a training line of 5 cm/s. Our results show that the example ghost on the line turns out to be the most efficient visualization for allowing users to achieve a specific speed. Users also perceived this visualization as the most engaging and easy to use. These findings have significant implications for the development of AR-based guidance systems, specifically in the realm of speed control, across diverse domains such as industrial applications, training, and entertainment.
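As an illustration of how speed adherence in such a line tracing task might be quantified, the following minimal sketch (hypothetical; not taken from the paper) estimates instantaneous tracing speed from sampled pen positions and reports its mean deviation from a target speed.

```python
# Hypothetical metric sketch: estimate tracing speed from sampled pen positions
# and report the mean absolute deviation from a target speed (e.g., 6 cm/s).
# Illustrative only; the paper's actual measures may differ.
import numpy as np

def speed_error(positions_cm, timestamps_s, target_speed_cm_s):
    """Mean absolute deviation of instantaneous speed from the target speed."""
    positions = np.asarray(positions_cm, dtype=float)
    times = np.asarray(timestamps_s, dtype=float)
    step_lengths = np.linalg.norm(np.diff(positions, axis=0), axis=1)  # cm per sample
    step_durations = np.diff(times)                                    # s per sample
    speeds = step_lengths / step_durations                             # cm/s
    return float(np.mean(np.abs(speeds - target_speed_cm_s)))

# Example: a pen sampled at 10 Hz moving about 5.5 cm/s against a 6 cm/s target line.
t = np.arange(0, 1.0, 0.1)
path = np.column_stack([0.55 * np.arange(len(t)), np.zeros(len(t))])
print(speed_error(path, t, target_speed_cm_s=6.0))
```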
Merging Camera and Object Haptic Motion Effects for Improved 4D Experiences
Jaejun Park, Pohang University of Science and Technology;
Sangyoon Han, Pohang University of Science and Technology (POSTECH);
Seungmoon Choi, Pohang University of Science and Technology (POSTECH)
(Haptic) motion effects refer to the vestibular stimuli generated by a motion platform and delivered to the whole body of a user sitting on the platform. Motion effects are an essential tool for creating vivid sensory experiences in various extended reality (XR) applications, ranging from training simulators to recent 4D rides, films, and games for entertainment. For the latter purpose, motion effects emphasize audiovisual events occurring in the scene, such as camera motion, the movement of an object of interest, and special sounds. Recent research developed several algorithms to produce motion effects from the audiovisual stream automatically. However, these algorithms are designed for a single class of motion effects, and extension to multiple motion effect classes remains unexplored. In this paper, we propose an algorithmic framework that merges camera and object motion effects into one motion effect while preserving the perceptual consequences of the two effects. We validate the framework’s perceptual performance through a user study. To our knowledge, this work is one of the first successful reports of merging different kinds of motion effects for improved XR experiences.
Exploring Horizontally Flipped Interaction in Virtual Reality for Improving Spatial Ability
Lal “Lila” Bozgeyikli, University of Arizona;
Evren Bozgeyikli, University of Arizona;
Christopher Schnell, University of Arizona;
Jack A Clark, University of Arizona
Virtual reality (VR) is a high-fidelity medium that can offer experiences that are close to real life. Spatial ability plays an important role in human life, including academic achievement and advancement in work settings. Spatial ability is known to be improved by practicing relevant tasks. Mental rotation and spatial perception are among such tasks that improve spatial skills. In this research, we investigated a “mirror-reversed” interaction technique in a cup stacking task in VR and looked into its effects on spatial ability, brain activity regarding spatial processing and attention (measured with EEG), performance, and user experience in male participants. Participants stacked cups according to given patterns using direct manipulation with horizontally flipped controls, similar to looking in a mirror while performing object manipulation in real life. In a between-subjects user study, we compared this novel interaction with a baseline where the participants completed the same task with regular controls. Although there was no significant main effect of group on the mental rotation and perspective taking/spatial orientation test scores, within-group analysis indicated a trend toward an improvement in the mirror-reversed group in spatial orientation, while both groups showed a trend toward improvement in mental rotation. Participants in both groups got better at the task over time (their task completion durations decreased). EEG data revealed a significant theta band power increase in the mirror-reversed group, whereas there was no difference in the alpha band power between the two groups. Our results are encouraging for exploring spatially challenging interactions in VR for spatial skills training. We share the implementation and user study results, and discuss the implications.
Identifying Virtual Reality Users Across Domain-Specific Tasks: A Systematic Investigation of Tracked Features for Assembly
Alec G Moore, University of Central Florida;
Tiffany D. Do, University of Central Florida;
Nicholas Ruozzi, University of Texas at Dallas;
Ryan P. McMahan, University of Central Florida
Recently, there has been much interest in using virtual reality (VR) tracking data to authenticate or identify users. Most prior research has relied on task-specific characteristics but newer studies have begun investigating task-agnostic, domain-specific approaches. In this paper, we present one of the first systematic investigations of how different combinations of VR tracked devices (i.e., the headset, dominant hand controller, and non-dominant hand controller) and their spatial representations (i.e., position and/or rotation as Euler angles, quaternions, or 6D) affect identification accuracy for domain-specific approaches. We conducted a user study (n = 45) involving participants learning how to assemble two distinct full-scale constructions. Our results indicate that more tracked devices improve identification accuracies for the same assembly task, but only headset features afford the best accuracies across the domain-specific tasks. Our results also indicate that spatial features involving position and any rotation yield better accuracies than either alone.
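For readers unfamiliar with the rotation encodings compared above (Euler angles, quaternions, and the 6D representation), the hypothetical sketch below shows one way a single tracked-device sample could be expanded into a feature vector; it is a generic illustration, not the authors' identification pipeline. The 6D encoding keeps the first two columns of the rotation matrix, a continuous representation that avoids the wrap-around discontinuities of Euler angles.

```python
# Hypothetical sketch: expanding one tracked-device sample (position + orientation)
# into a feature vector using Euler, quaternion, or 6D rotation encodings.
# Generic illustration only; not the authors' identification pipeline.
import numpy as np
from scipy.spatial.transform import Rotation as R

def spatial_features(position, quat_xyzw, rotation_format="6d"):
    """Concatenate a device position with the chosen rotation encoding."""
    rot = R.from_quat(quat_xyzw)                     # scipy expects (x, y, z, w)
    if rotation_format == "euler":
        rot_feat = rot.as_euler("xyz")               # 3 values, discontinuous at wrap-around
    elif rotation_format == "quaternion":
        rot_feat = rot.as_quat()                     # 4 values
    elif rotation_format == "6d":
        rot_feat = rot.as_matrix()[:, :2].reshape(-1, order="F")  # first two matrix columns
    else:
        raise ValueError(f"unknown rotation format: {rotation_format}")
    return np.concatenate([np.asarray(position, dtype=float), rot_feat])

# Example: a made-up headset sample (position in metres, identity orientation).
print(spatial_features([0.10, 1.60, -0.30], [0.0, 0.0, 0.0, 1.0], "6d"))
```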
Leap to the eye: Implicit Gaze-based Interaction to Reveal Invisible Objects for Virtual Environment Exploration
Yang-Sheng Chen, National Taiwan University;
Chiao-En Hsieh, National Taiwan University;
Miguel Ying Jie Then, National Taiwan University;
Ping-Hsuan Han, National Taipei University of Technology;
Yi-Ping Hung, National Taiwan University
Cinematic virtual reality (CVR) brings viewers a novel and immersive movie-watching experience. However, they may miss story events and scene transitions that the director has designed as key points hidden in the VR scene. In this paper, we introduce implicit gaze-based interaction for enhancing the exploration experience in CVR. In contrast to most research on gaze-based selection of objects or explicit guidance of attention using visual cues, we focus on implicit interaction that utilizes the user’s natural gaze and attention to explore the scene. We design and implement different gaze trigger methods for implicit interaction, making the interaction more intuitive and natural when users reveal the invisible objects. We implemented the adaptive collider technique, which offers users a better sense of exploration compared to raycasting and spotlight techniques. We have also conducted user studies to compare animation sequences for visual feedback, with each animation sequence offering different storytelling techniques. One of the sequences is better suited for describing spaces in the virtual world, while the other sequence offers users the feeling of constructing a world through their gaze.
PS2: Perception 1
11:00 AEDT (UTC+11)
Ritchie Theatre
Session Chair: V. Interrante
An Exploration of the Effect of Head-Centric Rest Frames On Egocentric Distance Judgments in VR
Yahya Hmaiti, University of Central Florida;
Mykola Maslych, University of Central Florida;
Eugene Matthew Taranta II, University of Central Florida;
Joseph LaViola, University of Central Florida
Users tend to underestimate distances in virtual reality (VR), and several efforts have been directed toward finding the causes and developing tools that mitigate this phenomenon. One hypothesis that stands out in the field of spatial perception is the rest frame hypothesis (RFH), which states that visual frames of reference (RFs), defined as fixed reference points of view in a virtual environment (VE), contribute to minimizing sensory mismatch. RFs have been shown to promote better eye-gaze stability and focus, reduce VR sickness, and improve visual search, along with other benefits. However, their effect on distance perception in VEs has not been evaluated. In this paper, we use a blind walking task to explore the effect of three head-centric RFs (mesh mask, nose, and hat) on egocentric distance estimation. We found that at near and mid-field distances, certain RFs can improve the user’s distance estimation accuracy and reduce distance underestimation. These findings mean that the addition of head-centric RFs, a simple avatar augmentation method, can lead to meaningful improvements in distance judgments, user experience, and task performance in VR.
The Impact of Occlusion on Depth Perception at Arm’s Length
Marc J Fischer, Stanford University;
Jarrett Rosenberg, Stanford University;
Christoph Leuze, Stanford University;
Brian Hargreaves, Stanford University;
Bruce Daniel, Stanford University
This paper investigates the accuracy of Augmented Reality (AR) technologies, particularly commercially available optical see-through displays, in depicting virtual content inside the human body for surgical planning. Their inherent limitations result in inaccuracies in perceived object positioning. We examine how occlusion, specifically with opaque surfaces, affects perceived depth of virtual objects at arm’s length working distances.
A custom apparatus with a half-silvered mirror was developed, providing accurate depth cues excluding occlusion, differing from commercial displays. We carried out a study, contrasting our apparatus with a HoloLens 2, involving a depth estimation task under varied surface complexities and illuminations. In addition, we explored the effects of creating a virtual “hole” in the surface. Subjects’ depth estimation accuracy and confidence were assessed.
Results showed more depth estimation variation with HoloLens and significant depth error beneath complex occluding surfaces. However, creating a virtual hole significantly reduced depth errors and increased subjects’ confidence, irrespective of accuracy enhancement. These findings have important implications for the design and use of mixed-reality technologies in surgical applications, and industrial applications such as using virtual content to guide maintenance or repair of components hidden beneath the opaque outer surface of equipment.
HEADSET: Human Emotion Awareness under Partial Occlusions Multimodal DataSET
Fatemeh Ghorbani Lohesara, TU Berlin;
Davi Rabbouni Freitas, INRIA;
Christine Guillemot, INRIA;
Karen Eguiazarian, Tampere University;
Sebastian Knorr, Ernst-Abbe University of Applied Sciences
The volumetric representation of human interactions is one of the fundamental domains in the development of immersive media productions and telecommunication applications. Particularly in the context of the rapid advancement of Extended Reality (XR) applications, this volumetric data has proven to be an essential technology for future XR elaboration. In this work, we present a new multimodal database to help advance the development of immersive technologies. Our proposed database provides ethically compliant and diverse volumetric data, in particular 27 participants displaying posed facial expressions and subtle body movements while speaking, plus 11 participants wearing head-mounted displays (HMDs). The recording system consists of a volumetric capture (VoCap) studio, including 31 synchronized modules with 62 RGB cameras and 31 depth cameras. In addition to textured meshes, point clouds, and multi-view RGB-D data, we use one Lytro Illum camera for providing light field (LF) data simultaneously. Finally, we also provide an evaluation of our dataset employment with regard to the tasks of facial expression classification, HMD removal, and point cloud reconstruction. The dataset can be helpful in the evaluation and performance testing of various XR algorithms, including but not limited to facial expression recognition and reconstruction, facial reenactment, and volumetric video. HEADSET and all its associated raw data and license agreement will be publicly available for research purposes.
A Systematic Evaluation of Incongruencies and Their Influence on Plausibility in Virtual Reality
Larissa Brübach, University of Würzburg;
Franziska Westermeier, Human-Computer Interaction Group, University of Würzburg;
Carolin Wienrich, University of Würzburg;
Marc Erich Latoschik, University of Würzburg
Currently, there is an ongoing debate about the influencing factors of one’s extended reality (XR) experience. Plausibility, congruence, and their role have recently gained more and more attention. One of the latest models to describe XR experiences, the Congruence and Plausibility model (CaP), puts plausibility and congruence right in the center. However, it is unclear what influence they have on the overall XR experience and what influences our perceived plausibility rating. In this paper, we implemented four different incongruencies within a virtual reality scene using breaks in plausibility as an analogy to breaks in presence. These manipulations were either located on the cognitive or perceptual layer of the CaP model. They were also either connected to the task at hand or not. We tested these manipulations in a virtual bowling environment to see which influence they had. Our results show that manipulations connected to the task caused a lower perceived plausibility. Additionally, cognitive manipulations seem to have a larger influence than perceptual manipulations. We were able to cause a break in plausibility with one of our incongruencies. These results show a first direction on how the influence of plausibility in XR can be systematically investigated in the future.
Comparative Analysis of Change Blindness in Virtual Reality and Augmented Reality Environments
DongHoon Kim, Utah State University;
Dongyun Han, Utah State University;
Isaac Cho, Utah State University
Change blindness is a phenomenon where an individual fails to notice alterations in a visual scene when a change occurs during a brief interruption or distraction. Understanding this phenomenon is especially important for technologies that rely on visual stimuli, such as Virtual Reality (VR) or Augmented Reality (AR). Previous research has primarily focused on 2D environments or conducted limited controlled experiments in 3D immersive environments.
In this paper, we design and conduct two formal user experiments to investigate the effects of different visual attention-disrupting conditions (Flickering and Head-Turning) and object alternative conditions (Removal, Color Alteration, and Size Alteration) on change blindness detection in VR and AR environments.
Our results reveal that participants detected changes more quickly and had a higher detection rate with Flickering compared to Head-Turning. Furthermore, they spent less time detecting changes when an object disappeared compared to changes in color or size. Additionally, we provide a comparison of the results between VR and AR environments.
Enhancing Perception and Immersion in Pre-Captured Environments through Learning-Based Eye Height Adaptation
Qi Feng, Waseda University;
Hubert P. H. Shum, Durham University;
Shigeo Morishima, Waseda Research Institute for Science and Engineering
Pre-captured immersive environments using omnidirectional cameras provide a wide range of virtual reality applications. Previous research has shown that manipulating the eye height in egocentric virtual environments can significantly affect distance perception and immersion. However, the influence of eye height in pre-captured real environments has received less attention due to the difficulty of altering the perspective after finishing the capture process. To explore this influence, we first propose a pilot study that captures real environments with multiple eye heights and asks participants to judge the egocentric distances and immersion. If a significant influence is confirmed, an effective image-based approach to adapt pre-captured real-world environments to the user’s eye height would be desirable. Motivated by the study, we propose a learning-based approach for synthesizing novel views for omnidirectional images with altered eye heights. This approach employs a multitask architecture that learns depth and semantic segmentation in two formats, and generates high-quality depth and semantic segmentation to facilitate the inpainting stage. With the improved omnidirectional-aware layered depth image, our approach synthesizes natural and realistic visuals for eye height adaptation. Quantitative and qualitative evaluation shows favorable results against state-of-the-art methods, and an extensive user study verifies improved perception and immersion for pre-captured real-world environments.
DeepMetricEye: Metric Depth Estimation in Periocular VR Imagery
Yitong Sun, Royal College of Art;
Zijian Zhou, School of Informatics;
Cyriel Diels, Royal College of Art;
Ali Asadipour, Royal College of Art
Despite the enhanced realism and immersion provided by VR headsets, users frequently encounter adverse effects such as digital eye strain (DES), dry eye, and potential long-term visual impairment due to excessive eye stimulation from VR displays and pressure from the mask. Recent VR headsets are increasingly equipped with eye-oriented monocular cameras to segment ocular feature maps. Yet, to compute the incident light stimulus and observe periocular condition alterations, it is imperative to transform these relative measurements into metric dimensions. To bridge this gap, we propose a lightweight framework derived from the U-Net 3+ deep learning backbone that we re-optimised, to estimate measurable periocular depth maps. Compatible with any VR headset equipped with an eye-oriented monocular camera, our method reconstructs three-dimensional periocular regions, providing a metric basis for related light stimulus calculation protocols and medical guidelines. Navigating the complexities of data collection, we introduce a Dynamic Periocular Data Generation (DPDG) environment based on UE MetaHuman, which synthesises thousands of training images from a small quantity of human facial scan data. Evaluated on a sample of 36 participants, our method exhibited notable efficacy in the periocular global precision evaluation experiment, and the pupil diameter measurement.
PS3: Interactions with Virtual Agents
11:00 AEDT (UTC+11)
CivEng 109
Session Chair: R. Skarbez
Supporting Co-Presence in Populated Virtual Environments by Actor Takeover of Animated Characters
Jingyi Zhang, University College London;
Klara Brandstätter, University College London;
Anthony Steed, University College London
Online social virtual worlds are now becoming widely available on consumer devices including virtual reality headsets. One goal of a virtual world could be to give a user an experience of a crowded environment with many virtual humans. However, gathering enough personnel to control the necessary number of avatars for creating a realistic scene is usually difficult. Additionally, current technology is not capable of fully simulating avatars with behaviours, especially when interaction with users is required. In this paper, we develop a system that enables an actor to take over control of one of a set of avatars. We built an immersive interface that allows an actor to select an avatar to take over and then segue into the currently playing animation. By allowing one person to take control of multiple avatars, we can enhance the plausibility of environments inhabited by simulated characters. In an experiment, we show that in a cafe scenario, one actor can take over the roles of a barista and two customers. Experiment participants reported experiencing the scene as if it were populated by more than one actor. This system and experiment demonstrate the feasibility of one actor controlling multiple avatars sequentially, thus enhancing users’ feelings of being in a populated environment.
Free-form Conversation with Human and Symbolic Avatars in Mixed Reality
Jiarui Zhu, University of California, Santa Barbara;
Radha Kumaran, University of California, Santa Barbara;
Chengyuan Xu, University of California, Santa Barbara;
Tobias Höllerer, University of California, Santa Barbara
The integration of large language models and mixed reality technologies has enabled users to engage in free-form conversations with virtual agents across different “realities”. However, if and how the agent’s visual representation, especially when combined with mixed reality environments, will affect the conversation content or user experience is not yet fully understood. In this work, we design and conduct a user study involving two types of visual representations (a human avatar and a symbolic avatar) and two mixed reality environments (virtual reality and augmented reality), facilitating a free-form conversation experience with GPT-3 powered agents. We found evidence that the use of virtual or augmented realities can influence conversation content. Users chatting with avatars in virtual reality made significantly more references to the location or the space, suggesting they tended to perceive conversations as occurring in the agent’s space, whereas the physical AR environment was perhaps more perceived as the user’s space. Conversations with the human avatar improve user recall of the conversation, even though there is no evidence of increased information extracted during the conversation. These observations and our analysis of post-study questionnaires suggest that human avatars can positively impact user memory and experience. We hope our findings and the open-source implementation will help facilitate future research on free-form conversational agents in mixed reality.
Effects of Interaction with Virtual Pets on Self-Disclosure in Mixed Reality
Seoyeon Lim, Sookmyung Women’s University;
Suh-Yeon Dong, Sookmyung Women’s University
Self-disclosure involves revealing information about oneself to others and is critical in relationship formation to develop trust and understanding, leading to emotional intimacy. In psychotherapy, inducing self-disclosure is critical for a therapist to clearly understand clients and suggest relevant solutions. Companion animals have been known to increase human self-disclosure; hence, we hypothesized that virtual animals could have the same effect, which can be strengthened through interaction. To verify this hypothesis, we implemented a mixed-reality-based interaction between humans and virtual cats through Unity. Participants could interact with the virtual cat using hand gestures and voice commands. Psychological states related to self-disclosure were evaluated using questionnaires after the interaction. Furthermore, participants’ responses to the virtual cats were compared with their responses to non-interactive virtual contents. Participants exhibited higher willingness for self-disclosure with virtual cats compared with virtual humans. Interacting with virtual cats was also found to encourage more self-disclosure than interactions with non-interactive content. Therefore, virtual animals can induce self-disclosure and can be used in psychotherapy. Our findings demonstrate that virtual animals can be used to provide solutions to mental health problems and can be widely applied in the field of psychotherapy.
Effects of Reward Schedule and Avatar Visibility on Joint Agency during VR Collaboration
Seung Un Lee, KAIST;
Jinwook Kim, KAIST;
Jeongmi Lee, KAIST
Joint agency, a group-level sense of agency, has been studied as an essential social cognitive element while engaging in collaborative tasks. The joint agency has been actively investigated in diverse contexts (e.g., performance, reward schedules, and predictability), yet the studies were mostly conducted in traditional 2D computer environments. Since virtual reality (VR) is an emerging technology for remote collaboration, we aimed to probe the effects of traditional reward schedule factors along with novel VR features (i.e., avatar visibility) on joint agency during remote collaboration. In this study, we implemented an experiment based on a card-matching game to test the effects of the reward schedule (fair or equal) and the counterpart’s avatar hand visibility (absent or present) on the sense of joint agency. The results showed that participants felt a higher sense of joint agency when the reward was distributed equally regardless of the individual performance and when the counterpart’s avatar hand was present. Moreover, the effects of reward schedule and avatar hand visibility interacted, with a bigger amount of deficit for the absent avatar hand when the reward was distributed differentially according to performance. Interestingly, the sense of joint agency was strongly correlated to the level of collaborative performance, as well as to perceptions of other social cognitive factors, including cooperativeness, reward fairness, and social presence. These results contribute to the understanding of joint agency perceptions during VR collaboration and provide design guidelines for remote collaborative tasks and environments for users’ optimal social experience and performance.
Effects of Speed of a Collocated Virtual Walker and Proximity Toward a Static Virtual Character on Avoidance Movement Behavior
Michael Nelson, Purdue University;
Alexandros Fampio Koilias, University of Peloponnese;
Dominic Kao, Purdue University;
Christos Mousas, Purdue University
We explored the avoidance movement behaviors of study participants immersed in a virtual reality environment. We placed a static virtual character at the midpoint between the start and target spot for the avoidance task, and a virtual walker character in front of the starting spot and scripted it to reach the target spot. Participants were placed behind the virtual walker in order to measure its influence on participants’ behavior. We developed nine experimental conditions assigned to the virtual walker character by following a 3 (speed: slow vs. normal vs. fast walking speed) x 3 (proximity: close vs. middle vs. far proximity to the static virtual character) study design. For this within-group study, we collected data from 22 study participants to explore how speed and proximity walking patterns assigned to a virtual walker character could impact participants’ avoidance movement behaviors and decisions. Our data revealed that 1) the speed factor impacted the participants’ avoidance movement behavior; 2) the proximity factor did not significantly impact the participants’ avoidance movement behavior; 3) the virtual walker character did not significantly impact participants’ avoidance decisions regarding the static virtual character; 4) in all examined conditions, the side-by-side distances between the participants and the static virtual character were inside the social space according to the proxemics model; and 5) in conditions in which a slow virtual walker character was present or in the condition of normal speed and far proximity, we observed an increased number of participants pass the virtual walker character.
Perception and Proxemics with Virtual Humans on Transparent Display Installations in Augmented Reality
Juanita Benjamin, University of Central Florida;
Gerd Bruder, University of Central Florida;
Carsten Neumann, University of Central Florida;
Dirk Reiners, University of Arkansas at Little Rock;
Carolina Cruz-Neira, University of Central Florida;
Greg Welch, University of Central Florida
It is not uncommon for science fiction movies to portray futuristic user interfaces that can only be realized decades later with state-of-the-art technology. In this work, we present a prototypical augmented reality (AR) installation that was inspired by the movie The Time Machine (2002). It consists of a transparent screen that acts as a window through which users can see the stereoscopic projection of a three-dimensional virtual human (VH). However, there are some key differences between the vision of this technology and the way VHs on these displays are actually perceived. In particular, the additive light model of these displays causes darker VHs to appear more transparent, while light in the physical environment further increases transparency, which may affect the way VHs are perceived, to what degree they are trusted, and the distances one maintains from them in a spatial setting. In this paper, we present a user study in which we investigate how transparency in the scope of transparent AR screens affects the perception of a VH’s appearance, social presence with the VH, and the social space around users as defined by proxemics theory. Our results indicate that appearances are comparatively robust to transparency, while social presence improves in darker physical environments, and proxemic distances to the VH largely depend on one’s distance from the screen but are not noticeably affected by transparency. Overall, our results suggest that such transparent AR screens can be an effective technology for facilitating social interactions between users and VHs in a shared physical space.
Now I Wanna Be a Dog: Exploring the Impact of Audio and Tactile Feedback on Animal Embodiment
Mauricio Flores Vargas, Trinity College Dublin;
Rebecca Fribourg, Nantes Université, ENSA Nantes, École Centrale Nantes, CNRS, AAU-CRENAU;
Enda Bates, Trinity College Dublin;
Rachel McDonnell, Trinity College Dublin
Embodying a virtual creature or animal in Virtual Reality (VR) is becoming common, and can have numerous beneficial impacts. For instance, it can help actors improve their performance of a computer-generated creature, or it can endow the user with empathy towards threatened animal species. However, users must feel a sense of embodiment towards their virtual representation, commonly achieved by providing congruent sensory feedback. Providing effective visuo-motor feedback in dysmorphic bodies can be challenging due to human-animal morphology differences. Thus, the purpose of this study was to experiment with the inclusion of audio and audio-tactile feedback to begin unveiling their influence towards animal avatar embodiment. Two experiments were conducted to examine the effects of different sensory feedback on participants’ embodiment in a dog avatar in an Immersive Virtual Environment (IVE). The first experiment (n=24) included audio, tactile, audio-tactile, and baseline conditions. The second experiment (n=34) involved audio and baseline conditions only.
PS4: Medical Applications
13:45 AEDT (UTC+11)
Leighton Hall
Session Chair: S. Razzaque
Investigating the Effects of Selective Information Presentation in Intensive Care Units Using Virtual Reality
Luisa Theelke, Friedrich-Alexander-Universität Erlangen-Nürnberg;
Fynn-Lennardt Metzler, Friedrich-Alexander-Universität Erlangen-Nürnberg;
Julian Kreimeier, Friedrich-Alexander-Universität Erlangen-Nürnberg;
Christopher Hauer, Friedrich-Alexander-Universität Erlangen-Nürnberg;
Johannes Binder, Universitätsklinikum Erlangen;
Daniel Roth, Technical University of Munich
Medical personnel working in intensive care units (ICUs) are continuously exposed to a multitude of alarms emanating from various monitoring devices, such as cardiac monitors, ventilators, or infusion pumps. The sheer volume of alarms, coupled with high false positive rates, can lead to alarm fatigue. This phenomenon compromises patient safety and places an additional burden on nurses who must diligently prioritize and respond to alarms in the highly dynamic environment. While the testing of stress-reducing strategies in a real ICU is challenging, virtual reality (VR) represents a powerful tool and methodology to simulate an ICU environment and test optimization scenarios for alarm display strategies. For example, redistributing alarms to responsible individuals (personalized information presentation) has been proposed as a solution, but testing in real ICU environments is not applicable due to critical patient safety. In this paper, we present a VR simulation of an ICU to simulate comparable stress situations, as well as to assess the impact of a selective and personalized alarm representation strategy in an evaluation study in two conditions. A stress condition mirrors the current ubiquitous audible alarm distribution in most ICUs, where alarms are heard non-patient-specific throughout the ward. In an experimental condition, alarms are filtered patient-specific to reduce information overload and noise pollution. Our user study with medical personnel and novices shows that stress levels can be simulated with our system as indicated by physiological responses. Further, we show that the perceived task load can be reduced with selective information presentation. We discuss the potential benefits of ICU simulations as a methodology and personalized alarm distribution as a first potential strategy for future technologies in ICUs.
Is this the vReal Life? Manipulating Visual Fidelity of Immersive Environments for Medical Task Simulation
Danny Schott, Otto-von-Guericke University;
Florian Heinrich, Julius-Maximilians-Universität Würzburg;
Lara Stallmeister, University of Magdeburg;
Julia Moritz, USE-Ing. GmbH;
Bennet Hensen, Hanover Medical School;
Christian Hansen, Faculty of Computer Science
Recent developments and research advances contribute to an ever-increasing trend towards quality levels close to what we experience in reality. In this work, we investigate how different degrees of these quality characteristics affect user performance, qualia of user experience (UX), and sense of presence in an example medical task. To this end, a two-way within-subjects design user study was conducted, in which three different levels of visual fidelity were compared. In addition, two different interaction modalities were considered: (1) the use of conventional VR controllers and (2) natural hand interaction using 3D-printed, spatially-registered replicas of medical devices, to interact with their virtual representations. Consistent results indicate that higher degrees of visual fidelity evoke a higher sense of presence and UX. However, user performance was less affected. Moreover, no differences were detected between both interaction modalities for the examined task. Future work should investigate the discovered interaction effects between quality levels and interaction modalities in more detail and examine whether these results can be reproduced in tasks that require more precision. This work provides insights into the implications to consider when studying interactions in VR and paves the way for investigations into early phases of medical product development and workflow analysis.
Mixed Reality 3D Teleconsultation for Emergency Decompressive Craniotomy: An Evaluation with Medical Residents
Kevin Yu, Technical University of Munich;
Daniel Roth, Friedrich-Alexander Universität Erlangen-Nürnberg;
Robin Strak, m3i GmbH;
Frieder Pankratz, LMU;
Julia Schrader-Reichling, Klinikum der Universität München;
Clemens Kraetsch, Friedrich-Alexander-University;
Simon Weidert, LMU Munich;
Marc Lazarovici, Institut für Notfallmedizin;
Nassir Navab, Technische Universität München;
Ulrich Eck, Technische Universität München
Enabling collaborative telepresence in healthcare, especially surgical procedures, presents a critical challenge. The decompressive craniotomy procedure stands out as particularly complex and time-sensitive. The current teleconsultation approach relies on 2D color cameras, often offering only a fixed view and limited visual capabilities between experts and surgeons. However, teleconsultation can be addressed with Mixed Reality and immersive technology to potentially enable a better consultation of the procedure. We conducted an extensive user study focusing on decompressive craniotomy to investigate the advantages and challenges of our 3D teleconsultation system compared to a 2D video-based consultation system. Our 3D teleconsultation system leverages real-time 3D reconstruction of the patient and environment to empower experts to provide guidance and create virtual 3D annotations. The study utilized 3D-printed head models to perform a lifelike surgical intervention. It involved 14 medical residents and demonstrated an in-vitro 17% improvement in accurately describing the incision size on the patient’s head, contributing to potentially improved patient outcomes.
AR-Based Educational Software for Nonspeaking Autistic People – A Feasibility Study
Ali Shahidi, University of Calgary;
Lorans Alabood, University of Calgary;
Kate M. Kaufman, University of Virginia;
Vikram K. Jaswal, University of Virginia;
Diwakar Krishnamurthy, University of Calgary;
Mea Wang, University of Calgary
Approximately one-third of individuals with autism are nonspeaking: They cannot communicate effectively using speech. Some traditional accounts suggest that these individuals cannot talk because they lack the symbolic capacity for language. And yet, recent studies have shown that these individuals’ cognitive abilities are vastly underestimated by standardized tests, and that difficulties with motor skills and movement contribute to their difficulty with speech. One consequence of the traditional accounts of nonspeaking autism is that life skills (rather than academic content) tend to be emphasized in schooling. Without access to meaningful academic content, their educational and vocational opportunities are significantly limited. Recent studies have proposed the use of head-mounted Augmented Reality (AR) applications as a means of providing engaging, customizable, and age-appropriate content to this population. Specifically, such applications can address the unique sensory and motor needs of nonspeaking autistic students, e.g., allow them to move freely around the room as they interact with lessons in the application. This paper describes the design and evaluation of the first AR application aimed to facilitate tailored educational experiences for nonspeaking autistic students. After extensive consultations with nonspeaking people, parents, and professionals, we developed our application to run on HoloLens 2 offering lessons and multiple-choice comprehension and spelling questions. We conducted a study involving five nonspeaking autistic participants and two specialized educators. Through a design critique process and an iterative design refinement approach, we show that most of our participants successfully interacted with the application and completed different types of lesson tasks. Based on quantitative data from the study sessions and qualitative feedback from participants and educators, we provide recommendations for UI and UX design that will promote the development and use of such software for this under-served and under-researched population.
LiVRSono – Virtual Reality Training with Haptics for Intraoperative Ultrasound
Mareen Allgaier, Otto-von-Guericke University;
Florentine Huettl, University Medicine of the Johannes Gutenberg-University;
Laura Isabel Hanke, University Medicine of the Johannes Gutenberg-University;
Hauke Lang, University Medicine of the Johannes Gutenberg-University;
Tobias Huber, University Medicine of the Johannes Gutenberg-University;
Bernhard Preim, Otto-von-Guericke University;
Sylvia Saalfeld, Otto von Guericke University;
Christian Hansen, Faculty of Computer Science
One of the biggest challenges in using ultrasound (US) is learning to create a spatial mental model of the interior of the scanned object based on the US image and the probe position. As intraoperative ultrasound (IOUS) cannot be easily trained on patients, we present LiVRSono, an immersive VR application to train this skill. The immersive environment, including an US simulation with patient-specific data as well as haptics to support hand-eye coordination, provides a realistic setting. Four clinically relevant training scenarios were identified based on the described learning goal and the workflow of IOUS for liver. The realism of the setting and the training scenarios were evaluated with eleven physicians, of which six participants are experts in IOUS for liver and five participants are potential users of the training system. The setting, handling of the US probe, and US image were considered realistic enough for the learning goal. Regarding the haptic feedback, a limitation is the restricted workspace of the input device. Three of the four training scenarios were rated as meaningful and effective. A pilot study regarding learning outcome shows positive results, especially with respect to confidence and perceived competence. Besides the drawbacks of the input device, our training system provides a realistic learning environment with meaningful scenarios to train the creation of a mental 3D model when performing IOUS. We also identified important improvements to the training scenarios to further enhance the training experience.
Multi-Focus Querying of the Human Genome Information on Desktop and in Virtual Reality: an Evaluation
Gunnar William Reiske, Virginia Polytechnic Institute and State University;
Sungwon In, Virginia Polytechnic Institute and State University;
Yalong Yang, Georgia Institute of Technology
The human genome is incredibly information-rich, consisting of approximately 25,000 protein-coding genes spread out over 3.2 billion nucleotide base pairs contained within 24 unique chromosomes. The genome is critically important in maintaining spatial context, which assists in understanding gene interactions and relationships. However, existing methods of genome visualization that utilize spatial awareness are inefficient and prone to limitations in presenting gene information and spatial context. This study proposed an innovative approach to genome visualization and exploration utilizing virtual reality. To determine the optimal placement of gene information and evaluate its essentiality in a VR environment, we implemented and conducted a user study with three different interaction methods. Two interaction methods were developed in virtual reality to determine if gene information is better suited to be embedded within the chromosome ideogram or separate from the ideogram. The final ideogram interaction method was performed on a desktop and served as a benchmark to evaluate the potential benefits associated with the use of VR. Our study findings reveal a preference for VR, despite longer task completion times. In addition, the placement of gene information within the visualization had a notable impact on the ability of a user to complete tasks. Specifically, gene information embedded within the chromosome ideogram was better suited for single target identification and summarization tasks, while separating gene information from the ideogram better supported region comparison tasks.
PS5: Multisensory and Multimodal Interaction 1
13:45 AEDT (UTC+11)
Ritchie Theatre
Session Chair: S. Choi
Auditory, Vibrotactile, or Visual? Investigating the Effective Feedback Modalities to Improve Standing Balance in Immersive Virtual Reality for People with Balance Impairments Due to Type 2 Diabetes
M. Rasel Mahmud, The University of Texas at San Antonio;
Alberto Cordova, University of Texas – San Antonio;
John Quarles, University of Texas at San Antonio
Immersive Virtual Reality (VR) users often experience difficulties with maintaining their balance. This issue poses a significant challenge to the widespread usability and accessibility of VR, particularly for individuals with balance impairments. Previous studies have confirmed the existence of balance problems in VR, but little attention has been given to addressing them. To investigate the impact of different feedback modalities (auditory, vibrotactile, and visual) on balance in immersive VR, we conducted a study with 50 participants, consisting of 25 individuals with balance impairments due to type 2 diabetes and 25 without balance impairments. Participants were asked to perform standing reach and grasp tasks. Our findings indicated that auditory and vibrotactile techniques improved balance significantly (p < .001) in immersive VR for participants with and without balance impairments, while visual techniques only improved balance significantly for participants with balance impairments. Also, auditory and vibrotactile feedback techniques improved balance significantly more than visual techniques. Spatial auditory feedback outperformed other conditions significantly for all people. This study presents implementations and comparisons of potential strategies that can be implemented in future VR environments to enhance standing balance and promote the broader adoption of VR.
Fabric Thermal Display using Ultrasonic Waves
Haokun Wang, University of Texas at Dallas;
Yatharth Singhal, University of Texas at Dallas;
Jin Ryong Kim, University of Texas at Dallas
This paper presents a fabric-based thermal display of a polyester fabric material combined with thermally-conductive materials using an ultrasound haptic display. We first empirically test the thermal generation process in five fabric materials by applying 40 kHz ultrasonic waves to the fabric materials. We also examine their thermal characteristics by applying different frequencies and amplitudes of ultrasonic cues. We show that polyester demonstrates the best thermal performance. We then combine it with thermally-conductive materials, including copper and aluminum, and compare them with the fabric-only condition. Two user studies show that our approach of combining a fabric material with copper and aluminum outperforms fabric-only conditions in thermal perception and thermal level identification. We integrate polyester with aluminum into a glove to explore the use cases in VR and share our findings, insights, limitations, and future works.
SmartSpring: a Low-Cost Wearable Haptic VR Display with Controllable Passive Feedback
Hongkun Zhang, School of Instrument Science and Engineering;
Kehong Zhou, Southeast University;
Ke Shi, Southeast University;
Yunhai Wang, Shandong University;
Aiguo Song, Southeast University;
Lifeng Zhu, Southeast University
With the development of virtual reality, the practical requirements of the wearable haptic interface have been greatly emphasized. While passive haptic devices are commonly used in virtual reality, they lack generality and are difficult to precisely generate continuous force feedback to users. In this work, we present SmartSpring, a new solution for passive haptics, which is inexpensive, lightweight and capable of providing controllable force feedback in virtual reality. We propose a hybrid spring-linkage structure as the proxy and flexibly control the mechanism for adjustable system stiffness. By analyzing the structure and force model, we enable a smart transform of the structure for producing continuous force signals. We quantitatively examine the real-world performance of SmartSpring to verify our model. By asymmetrically moving or actively pressing the end-effector, we show that our design can further support rendering torque and stiffness. Finally, we demonstrate the SmartSpring in a series of scenarios with user studies. Experimental results show the potential of the developed haptic display in virtual reality.
MultiVibes: What if your VR Controller had 10 Times more Vibrotactile Actuators?
Grégoire Richard, Univ. Lille, Inria, CNRS, Centrale Lille, UMR 9189 CRIStAL;
Thomas Pietrzak, University of Lille;
Ferran Argelaguet Sanz, Inria;
Anatole Lécuyer, Inria;
Géry Casiez, Univ. Lille, CNRS, Inria, Centrale Lille, UMR 9189 CRIStAL
Consumer-grade virtual reality (VR) controllers are typically equipped with one vibrotactile actuator, allowing them to create simple, non-spatialized tactile sensations through the vibration of the entire controller. Leveraging the funneling effect, an illusion in which multiple vibrations are perceived as a single one, we propose MultiVibes, a VR controller capable of rendering spatialized sensations at different locations on the user’s hand and fingers. The designed prototype includes ten vibrotactile actuators, directly in contact with the skin of the hand, limiting the propagation of vibrations through the controller. We evaluated MultiVibes through two controlled experiments. The first one focused on the ability of users to recognize spatio-temporal patterns, while the second one focused on the impact of MultiVibes on the users’ haptic experience when interacting with virtual objects they can feel. Taken together, the results show that MultiVibes is capable of providing accurate spatialized feedback and that users prefer MultiVibes over traditional VR controllers.
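To make the funneling effect mentioned above more concrete, here is a minimal sketch of amplitude panning between two adjacent actuators, which places a phantom vibration at an intermediate point; it illustrates the general principle only and is not the MultiVibes rendering code. Real devices additionally tune the pan law and account for actuator layout and skin contact.

```python
# Hedged sketch of the funneling effect via amplitude panning: a phantom vibration
# between two adjacent actuators is steered by splitting the drive amplitude.
# Illustrates the general principle only; not the MultiVibes rendering code.
import numpy as np

def funneling_amplitudes(actuator_positions, target, total_amplitude=1.0):
    """Split amplitude between the two actuators bracketing `target` on a 0..1 axis."""
    positions = np.asarray(actuator_positions, dtype=float)
    right = int(np.clip(np.searchsorted(positions, target), 1, len(positions) - 1))
    left = right - 1
    # Linear pan law: the closer actuator receives more of the total energy.
    w = (target - positions[left]) / (positions[right] - positions[left])
    amplitudes = np.zeros(len(positions))
    amplitudes[left] = (1.0 - w) * total_amplitude
    amplitudes[right] = w * total_amplitude
    return amplitudes

# Example: ten actuators spread along one axis of the hand, phantom point at 0.37.
print(funneling_amplitudes(np.linspace(0.0, 1.0, 10), target=0.37))
```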
To Stick or Not to Stick? Studying the Impact of Offset Recovery Techniques During Mid-Air Interactions
Maé Mavromatis, Inria, Univ Rennes, CNRS, IRISA;
Ludovic Hoyet, Inria, Univ Rennes, CNRS, IRISA;
Anatole Lécuyer, Inria, Univ Rennes, CNRS, IRISA;
Diane Dewez, Inria, Univ Rennes, CNRS, IRISA;
Ferran Argelaguet, Inria, Univ Rennes, CNRS, IRISA
During mid-air interactions, common approaches (such as the god-object method) typically rely on visually constraining the user’s avatar to avoid visual interpenetrations with the virtual environment in the absence of kinesthetic feedback. This paper explores two methods which influence how the position mismatch (positional offset) between users’ real and virtual hands is recovered when releasing the contact with virtual objects. The first method (sticky) constrains the user’s virtual hand until the mismatch is recovered, while the second method (unsticky) employs an adaptive offset recovery method. In the first study, we explored the effect of positional offset and of motion alteration on users’ behavioral adjustments and users’ perception. In a second study, we evaluated variations in the sense of embodiment and the preference between the two control laws. Overall, both methods presented similar results in terms of performance and accuracy, yet, positional offsets strongly impacted motion profiles and users’ performance. Both methods also resulted in comparable levels of embodiment. Finally, participants usually expressed strong preferences toward one of the two methods, but these choices were individual-specific and did not appear to be correlated solely with characteristics external to the individuals. Taken together, these results highlight the relevance of exploring the customization of motion control algorithms for avatars.
Smell of Fire Increases Behaviour Realism in Virtual Reality: A Case Study on a Recreated MGM Grand Hotel Fire
Humayun Khan, University of Canterbury;
Daniel Nilsson, University of Canterbury
Virtual reality allows creating highly immersive visual and auditory experiences, making users feel physically present in the environment. This makes it an ideal platform to simulate dangerous scenarios, including fire evacuation, and study human behaviour without exposing users to harmful elements. However, human perception of the surrounding environment is based on the integration of multiple sensory cues (visual, auditory, tactile, and/or olfactory) present in the environment. When some of the sensory stimuli are missing in the virtual experience, it can break the illusion of being there in the environment and could lead to actions that deviate from normal behaviour. In this work, we added an olfactory cue in a well-documented historic hotel fire scenario that was recreated in VR, and examined the effects of the olfactory cue on human behaviour. We conducted a between-subjects study on 40 naive participants. Our results show that the addition of the olfactory cue increases behavioural realism. We found that 80% of the studied actions for the VR with olfactory cue condition matched the ones performed by the survivors. In comparison, only 40% of the participants’ actions for the VR-only condition were similar to the survivors.
Effects of visual presentation near the mouth on cross-modal effects of multisensory flavor perception and ease of eating
Kizashi Nakano, The University of Tokyo;
Monica Perusquia-Hernandez, Nara Institute of Science and Technology;
Naoya Isoyama, Otsuma Women’s University;
Hideaki Uchiyama, Nara Institute of Science and Technology;
Kiyoshi Kiyokawa, Nara Institute of Science and Technology
Toggle Abstract
Various studies have suggested that altering the appearance of food can impact multisensory flavor perception. The cross-modal effect of such visual changes on gustation may allow for the presentation of food tastes that are difficult to express with simple combinations of taste stimuli, and it holds potential for applications in gustatory displays. However, a current limitation of existing Head-Mounted Displays (HMDs) is their restricted vertical Field of View (FoV), which prevents the display of images near the mouth while eating. This limitation may impede the cross-modal effect of visual changes on multisensory flavor perception. Additionally, the lack of visibility around the mouth area makes eating more difficult. To address these issues, we designed a Video See-Through (VST) HMD with an expanded vertical FoV (approximately 100°). Using this HMD, we investigated how presenting visual information near the mouth affects the cross-modal effects of flavor perception and the ease of eating. In our experiment, machine learning techniques were used to alter the appearance of food. However, the results showed no significant differences in the magnitude of the cross-modal effects or in the ease of eating between the groups with and without visual information near the mouth. A possible explanation is that participants may not direct their visual attention to the food at the moment they put it in their mouths. The experiment also examined whether visual changes alter the taste as well as the smell and texture of the food. The findings demonstrated that visual changes could convey the smell and texture of the food corresponding to the modified appearance, and this result held irrespective of the visibility near the mouth.
PS6: Pose Tracking and Localization
13:45 AEDT (UTC+11)
CivEng 109
Session Chair: Y. Li
Toggle Papers
MagLoc-AR: Magnetic-based Localization for Visual-free Augmented Reality in Large-scale Indoor Environments
Haomin Liu, Peking University;
Hua Xue, SenseTime;
Linsheng Zhao, SenseTime;
Danpeng Chen, Computer Science College;
Zhen Peng, SenseTime;
Guofeng Zhang, Computer Science College
Toggle Abstract
Accurate localization of a display device is essential for AR in large-scale environments. Visual-based localization is the most commonly used solution, but it poses privacy risks, suffers from robustness issues, and consumes significant power. Wireless signal-based localization is a potential visual-free solution, but its accuracy is insufficient for AR. In this paper, we present MagLoc-AR, a novel visual-free localization solution that achieves sufficient accuracy for some AR applications (e.g., AR navigation) in large-scale indoor environments. We exploit the location-dependent magnetic field interference that is ubiquitous indoors as a localization signal. Our method requires only a consumer-grade 9-axis IMU, with the gyroscope and acceleration measurements used to recover the motion trajectory and the magnetic measurements used to register the trajectory to the global map. To meet the accuracy requirement of AR, we propose a mapping method to reconstruct a globally consistent magnetic field of the environment, and a localization method fusing the biased magnetic measurements with the network-predicted motion to improve localization accuracy. In addition, we provide the first dataset for both visual-based and geomagnetic-based localization in large-scale indoor environments. Evaluations on the dataset demonstrate that our proposed method is sufficiently accurate for AR navigation and has advantages over visual-based methods in terms of power consumption and robustness. Project page: https://github.com/zju3dv/MagLoc-AR/
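As a rough illustration of the registration step described above (aligning an IMU-derived trajectory to a pre-built magnetic map), the sketch below brute-force searches over a 2D translation and yaw and scores candidates by the residual between measured and mapped field magnitudes. It is a simplified stand-in that assumes a grid map of field magnitudes; the paper’s actual method reconstructs a globally consistent field and fuses magnetic measurements with network-predicted motion.

```python
import numpy as np

def register_trajectory(traj_xy, mag_magnitude, mag_map, resolution=0.5,
                        yaw_steps=72, xy_range=20.0):
    """Brute-force registration of a relative IMU trajectory to a magnetic map.

    traj_xy       : (N, 2) positions from IMU odometry, in a local frame.
    mag_magnitude : (N,) measured magnetic field magnitudes along the path
                    (magnitude is used so the search is rotation-invariant).
    mag_map       : dict mapping a grid cell (i, j) -> expected field magnitude.
    Returns the (x, y, yaw) offset with the smallest mean squared residual.
    Illustrative only: real systems would use a far more efficient search.
    """
    best, best_err = None, np.inf
    for yaw in np.linspace(0, 2 * np.pi, yaw_steps, endpoint=False):
        c, s = np.cos(yaw), np.sin(yaw)
        rotated = traj_xy @ np.array([[c, -s], [s, c]]).T
        for tx in np.arange(-xy_range, xy_range, resolution):
            for ty in np.arange(-xy_range, xy_range, resolution):
                cells = np.floor((rotated + [tx, ty]) / resolution).astype(int)
                pred = np.array([mag_map.get(tuple(ij), np.nan) for ij in cells])
                if np.isnan(pred).any():
                    continue          # candidate pose leaves the mapped area
                err = np.mean((pred - mag_magnitude) ** 2)
                if err < best_err:
                    best, best_err = (tx, ty, yaw), err
    return best, best_err
```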
Bag of World Anchors for Instant Large-Scale Localization
Fernando Reyes-Aviles, ICG;
Philipp Fleck, Graz University of Technology;
Dieter Schmalstieg, Graz University of Technology;
Clemens Arth, Graz University of Technology
Toggle Abstract
In this work, we present a novel scene description to perform large-scale localization using only geometric constraints. Our work extends compact world anchors with a search data structure to efficiently perform localization and pose estimation of mobile augmented reality devices across multiple platforms (e.g., HoloLens 2, iPad). The algorithm uses a bag-of-words approach to characterize distinct scenes (e.g., rooms). Since the individual scene representations rely on compact geometric (rather than appearance-based) features, the resulting search structure is very lightweight and fast, lending itself to deployment on mobile devices. We present a set of experiments demonstrating the accuracy, performance and scalability of our novel localization method. In addition, we describe several use cases demonstrating how efficient cross-platform localization facilitates sharing of augmented reality experiences.
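The bag-of-words lookup over compact geometric features can be sketched as a quantize-and-match step: each scene is summarized as a histogram of quantized feature descriptors, and a query is assigned to the scene with the most similar histogram. The snippet below is an illustrative NumPy sketch with toy data; the codebook size, descriptor dimensionality, and cosine scoring are assumptions, not the paper’s implementation.

```python
import numpy as np

def bow_histogram(descriptors, codebook):
    """Quantize geometric feature descriptors against a codebook of 'words'
    and return an L2-normalized histogram."""
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    words = d2.argmin(axis=1)                       # nearest codeword per feature
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / (np.linalg.norm(hist) + 1e-9)

def localize(query_descriptors, codebook, scene_histograms):
    """Return the scene whose stored histogram is most similar (cosine) to the
    query's histogram -- the coarse 'which room am I in?' step."""
    q = bow_histogram(query_descriptors, codebook)
    scores = {scene: float(q @ h) for scene, h in scene_histograms.items()}
    return max(scores, key=scores.get), scores

# Usage with toy data: random 3D geometric descriptors and two rooms.
rng = np.random.default_rng(0)
codebook = rng.normal(size=(32, 3))
rooms = {"office": bow_histogram(rng.normal(size=(200, 3)), codebook),
         "lab": bow_histogram(rng.normal(1.0, 1.0, size=(200, 3)), codebook)}
print(localize(rng.normal(1.0, 1.0, size=(50, 3)), codebook, rooms)[0])
```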
SiTAR: Situated Trajectory Analysis for In-the-Wild Pose Error Estimation
Tim Scargill, Duke University;
Ying Chen, Duke University;
Tianyi Hu, Duke University;
Maria Gorlatova, Duke University
Toggle Abstract
Virtual content instability caused by device pose tracking error remains a prevalent issue in markerless augmented reality (AR), especially on smartphones and tablets. However, when examining environments which will host AR experiences, it is challenging to determine where those instability artifacts will occur; we rarely have access to ground truth pose to measure pose error, and even if pose error is available, traditional visualizations do not connect that data with the real environment, limiting their usefulness. To address these issues we present SiTAR (Situated Trajectory Analysis for Augmented Reality), the first situated trajectory analysis system for AR that incorporates estimates of pose tracking error. We start by developing the first uncertainty-based pose error estimation method for visual-inertial simultaneous localization and mapping (VI-SLAM), which allows us to obtain pose error estimates without ground truth; we achieve an average accuracy of up to 96.1% and an average F1 score of up to 0.77 in our evaluations on four VI-SLAM datasets. Next, we present our SiTAR system, implemented for ARCore devices, combining a backend that supplies uncertainty-based pose error estimates with a frontend that generates situated trajectory visualizations. Finally, we evaluate the efficacy of SiTAR in realistic conditions by testing three visualization techniques in an in-the-wild study with 15 users and 13 diverse environments; this study reveals the impact both environment scale and the properties of surfaces present can have on user experience and task performance.
Scene-independent Localization by Learning Residual Coordinate Map with Cascaded Localizers
Junyi Wang, Beihang University;
Yue Qi, BUAA
Toggle Abstract
Visual localization plays an essential role in a variety of fields. Indirect learning-based methods achieve excellent performance, but they require a training process in the target scene before localization. To achieve deep scene-independent localization, we start by proposing a representation called the residual coordinate map between a pair of images. Based on this representation, we put forward a network called SILocNet that outputs the proposed residual coordinate map. The network consists of feature extraction, multi-level feature fusion, and a transformer-based coordinate decoder. Moreover, to handle dynamic scenes, we introduce an additional segmentation branch that distinguishes fixed and dynamic parts to improve the network’s scene perception. With SILocNet in place, a cascaded localizer design is presented to reduce the accumulated error, and the simple mathematical analysis behind the cascaded localizers is also provided. To verify how well our algorithm performs, we conduct experiments on the static 7-Scenes and ScanNet datasets and the dynamic TUM RGB-D dataset. In particular, we train the network on ScanNet and test it on 7-Scenes and TUM RGB-D to demonstrate its generalization performance. All experiments demonstrate superior performance compared to existing methods. Additionally, the effects of the cascaded localizer design, feature fusion, the transformer-based coordinate decoder, and the segmentation loss are also discussed.
Minilag Filter for Jitter Elimination of Pose Trajectory in AR Environment
Xiuqiang Song, Shandong University;
Weijian Xie, Computer Science College, Zhejiang University;
Jiachen Li, Zhejiang University;
Nan Wang, SenseTime Research and Tetras.AI;
Fan Zhong, Shandong University;
Guofeng Zhang, Computer Science College;
Xueying Qin, Shandong University
Toggle Abstract
In AR applications, the jitter of virtual objects can weaken the sense of integration with the real environment. This jitter is often caused by noise in the pose obtained by 3D tracking or localization methods, especially in monocular vision systems without IMU support. Filtering the pose is an effective way to eliminate jitter; however, it can also cause significant lag in the filtered pose, seriously degrading the AR experience. Existing filters struggle to reduce jitter while maintaining low lag. In this paper, we propose a novel Minilag filter, which achieves strong pose smoothing while significantly reducing lag through backtracking update and compensation strategies, and which offers excellent real-time performance. We represent the rotation component of the pose in the Lie algebra and filter it in a locally Euclidean space, ensuring that the filtering of rotations is consistent with that of vectors. We also analyze the noise distribution and characteristics of the tracked pose, providing a theoretical basis for setting the filter parameters. We evaluated the proposed filter using both objective mathematical metrics and a user study, and the experimental results demonstrate that our method achieves state-of-the-art performance.
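Filtering rotations "in the Lie algebra" as described above can be illustrated with a simple exponential smoother: the rotational residual is mapped to so(3) with the logarithm map, low-passed like an ordinary vector, and mapped back with the exponential map. The sketch below shows only that principle, assuming a basic first-order smoother; it does not implement the Minilag filter’s backtracking update or lag compensation.

```python
import numpy as np

def skew(w):
    return np.array([[0, -w[2], w[1]], [w[2], 0, -w[0]], [-w[1], w[0], 0]])

def so3_exp(w):
    """Rodrigues formula: axis-angle vector -> rotation matrix."""
    theta = np.linalg.norm(w)
    if theta < 1e-9:
        return np.eye(3)
    K = skew(w / theta)
    return np.eye(3) + np.sin(theta) * K + (1 - np.cos(theta)) * (K @ K)

def so3_log(R):
    """Rotation matrix -> axis-angle vector in so(3)."""
    cos_t = np.clip((np.trace(R) - 1) / 2, -1.0, 1.0)
    theta = np.arccos(cos_t)
    if theta < 1e-9:
        return np.zeros(3)
    return theta / (2 * np.sin(theta)) * np.array(
        [R[2, 1] - R[1, 2], R[0, 2] - R[2, 0], R[1, 0] - R[0, 1]])

class PoseSmoother:
    """Exponentially smooth a pose stream: rotation increments are mapped to
    so(3), low-passed there like an ordinary vector, and mapped back with the
    exponential map, so rotation and translation are filtered consistently."""
    def __init__(self, alpha=0.3):
        self.alpha = alpha          # 0 < alpha <= 1; smaller = smoother
        self.R, self.t = None, None

    def update(self, R_meas, t_meas):
        if self.R is None:
            self.R, self.t = R_meas, t_meas
        else:
            # rotation: filter the residual R_prev^T * R_meas in so(3)
            delta = so3_log(self.R.T @ R_meas)
            self.R = self.R @ so3_exp(self.alpha * delta)
            self.t = (1 - self.alpha) * self.t + self.alpha * t_meas
        return self.R, self.t
```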
Real-time Retargeting of Deictic Motion to Virtual Avatars for Augmented Reality Telepresence
Jiho Kang, KAIST Daejeon;
Dongseok Yang, KAIST Daejeon;
Taehei Kim, Graduate School of Culture Technology, KAIST Daejeon;
Yewon Lee, Graduate School of Culture Technology, KAIST Daejeon;
Sung-hee Lee, Graduate School of Culture Technology, KAIST Daejeon
Toggle Abstract
Avatar-mediated augmented reality telepresence aims to enable distant users to collaborate remotely through avatars. When two spaces involved in telepresence are dissimilar, with different object sizes and arrangements, the avatar movement must be adjusted to convey the user’s intention rather than directly following their motion, which poses a significant challenge. In this paper, we propose a novel neural network-based framework for real-time retargeting of users’ deictic motions (pointing at and touching objects) to virtual avatars in dissimilar environments. Our framework translates the user’s deictic motion, acquired from a sparse set of tracking signals, to the virtual avatar’s deictic motion for a corresponding remote object in real-time. One of the main features of our framework is that a single trained network can generate natural deictic motions for various sizes of users. To this end, our network includes two sub-networks: AngleNet and MotionNet. AngleNet maps the angular state of the user’s motion into a latent representation, which is subsequently converted by MotionNet into the avatar’s pose, considering the user’s scale. We validate the effectiveness of our method in terms of deictic intention preservation and movement naturalness through quantitative comparison with alternative approaches. Additionally, we demonstrate the utility of our approach through several AR telepresence scenarios.
PS7: Learning and Training
16:15 AEDT (UTC+11)
Leighton Hall
Session Chair: S. Lukosch
Toggle Papers
guitARhero: Interactive Augmented Reality Guitar Tutorials
Lucchas Ribeiro Skreinig, Graz University of Technology;
Denis Kalkofen, Flinders University;
Ana Stanescu, Graz University of Technology;
Peter Mohr, Graz University of Technology;
Frank Heyen, University of Stuttgart;
Shohei Mori, Graz University of Technology;
Michael Sedlmair, University of Stuttgart;
Dieter Schmalstieg, Graz University of Technology;
Alexander Plopski, TU Graz
Toggle Abstract
This paper presents guitARhero, an Augmented Reality application for interactively teaching guitar playing to beginners through responsive visualizations overlaid on the guitar neck. We support two types of visual guidance, a highlighting of the frets that need to be pressed and a 3D hand overlay, as well as two display scenarios, one using a desktop magic mirror and one using a video see-through head-mounted display. We conducted a user study with 20 participants to evaluate how well users could follow instructions presented with different guidance and display combinations and compare these to a baseline where users had to follow video instructions. Our study highlights the trade-off between the provided information and visual clarity affecting the user’s ability to interpret and follow instructions for fine-grained tasks. We show that the perceived usefulness of instruction integration into an HMD view highly depends on the hardware capabilities and instruction details.
PianoSyncAR: Enhancing Piano Learning through Visualizing Synchronized Hand Pose Discrepancies in Augmented Reality
Ruofan Liu, Tokyo Institute of Technology;
Erwin Wu, Huawei Tokyo Research Center;
Chen-Chieh Liao, Tokyo Institute of Technology;
Hayato Nishioka, Sony Computer Science Laboratories Inc.;
Shinichi Furuya, Sony Computer Science Laboratories Inc.;
Hideki Koike, Tokyo Institute of Technology
Toggle Abstract
Motor skill acquisition involves learning from spatiotemporal discrepancies between target and self-generated motions. However, in dexterous skills with numerous degrees of freedom, understanding and correcting these motor errors is challenging. This issue becomes crucial for experienced individuals who seek to master and refine their skills, where even subtle errors need to be minimized. To enable efficient optimization of body posture in piano learning, we present PianoSyncAR, an augmented reality system that superimposes the time-varying, complex hand postures of a teacher over the hand of a learner. Through a user study with 12 pianists, we demonstrate several advantages of the proposed system over a conventional tablet screen, indicating the potential of AR training as a complementary tool for video-based skill learning in piano playing.
User Experience of Collaborative Co-located Mixed Reality: a User Study in Teaching Veterinary Radiation Safety Rules
Xuanhui Xu, University College Dublin;
Antonella Puggioni, University College Dublin;
David Kilroy, University College Dublin;
Abraham G. Campbell, University College Dublin
Toggle Abstract
As part of the clinical training during their degree course, veterinary students learn how to safely obtain radiographs in horses. However, this can sometimes be challenging due to ethical considerations related to the use of live animals and the fact that it is strictly dependent on the caseload of the veterinary teaching hospital. To address this, we developed a collaborative, co-located mixed reality (MR) setup for teaching equine radiographic techniques. This networked setup allows the lecturer to guide students in equine radiographic techniques without requiring actual horses or an X-ray machine. A study involving veterinary students showed promising results regarding the effectiveness of using MR to teach radiation safety while performing radiographic techniques on horses. In addition to performance metrics, we employed questionnaires, including MREQ, VRSQ, and UEQ, to collect demographic data and participant feedback. Participants praised the system’s pedagogical effectiveness and overall user experience. The immersive MR experience created a sense of presence and co-presence, underscoring the potential for broader applications of co-located MR in radiology and other areas.
Detecting Teacher Expertise in an Immersive VR Classroom: Leveraging Fused Sensor Data with Explainable Machine Learning Models
Hong Gao, Technical University of Munich;
Efe Bozkir, University of Tübingen;
Philipp Stark, University of Tübingen;
Patricia Goldberg, University of Tübingen;
Gerrit Meixner, UniTyLab;
Enkelejda Kasneci, Technical University of Munich;
Richard Göllner, University of Tübingen
Toggle Abstract
Currently, VR technology is increasingly being used to enable immersive yet controlled research settings. One such area of research is expertise assessment, where novel technological approaches to collecting process data, specifically eye tracking, in combination with explainable models, can provide insights into assessing and training novices, as well as fostering expertise development. We present a machine learning approach to predict teacher expertise by leveraging data from an off-the-shelf VR device collected in the VirATec study. By fusing eye-tracking and controller-tracking data, teachers’ recognition and handling of disruptive events in the classroom are taken into account. Three classification models (SVM, Random Forest, and LightGBM) were compared, with Random Forest achieving the best ROC-AUC score of 0.768 in predicting teacher expertise. The SHAP approach to model interpretation revealed informative features (e.g., fixations on identified disruptive students) for distinguishing teacher expertise. Our study serves as a pioneering effort in assessing teacher expertise using eye tracking within an interactive virtual setting, paving the way for future research and advancements in the field.
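A minimal sketch of the modelling pipeline described above (a Random Forest on fused eye/controller features, evaluated with ROC-AUC and interpreted with SHAP) is shown below using scikit-learn and the shap library. The synthetic features and labels are placeholders; the actual study used fused eye-tracking and controller-tracking features from the VirATec data.

```python
import numpy as np
import shap  # pip install shap
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for fused eye-tracking + controller features
# (e.g., fixation counts on disruptive students, saccade rate, controller path length).
rng = np.random.default_rng(42)
X = rng.normal(size=(120, 6))
y = (X[:, 0] + 0.5 * X[:, 3] + rng.normal(scale=0.8, size=120) > 0).astype(int)  # 1 = expert

model = RandomForestClassifier(n_estimators=300, random_state=0)
print("ROC-AUC:", cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean())

# SHAP values highlight which fused features drive the expert/novice prediction.
model.fit(X, y)
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)
```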
Comparing Visualizations to Help a Teacher Effectively Monitor Students in a VR Classroom
Yitoshee Rahman, University of Louisiana at Lafayette;
Arun K Kulshreshth, University of Louisiana at Lafayette;
Christoph W Borst, University of Louisiana at Lafayette
Toggle Abstract
Educational virtual reality (VR) applications are among the most recent additions to modern learning management tools. Due to health concerns, financial concerns, and convenience, people are looking for alternative ways to teach and learn. An efficient VR-based teaching interface could enhance student engagement, learning outcomes, and the overall educational experience. Typically, teachers in a VR classroom have no way of knowing what students are doing, since the students themselves are not visible. An efficient teaching interface should include a mechanism for the teacher to monitor students and alert the teacher if a student is trying to catch their attention. An ideal interface would be one that helps a teacher effectively monitor students while teaching without increasing the teacher’s cognitive load. In this paper, we present a comparative study of two such student monitoring interfaces. In the first interface, student activity-related information is shown using icons near the student avatar (representing a student in the VR environment). In the second interface, a set of centrally arranged, emoticon-like visual indicators is shown in addition to the student avatars, and the activity-related information is displayed near the corresponding student emoticon. We present a detailed user experiment comparing the two interfaces in terms of teaching management, student monitoring capability, cognitive load, and user preference. Participants preferred, and performed better with, the indicator-located interface over the avatar-located interface.
Towards a Framework for Validating XR Prototyping for Performance Evaluations of Simulated User Experiences
Jan Hendrik Plümer, Salzburg University of Applied Sciences;
Markus Tatzgern, Salzburg University of Applied Sciences
Toggle Abstract
Extended Reality (XR) technology has matured in recent years, leading to increased use of XR simulations for prototyping novel human-centered interfaces, approximating advanced display hardware, or exploring future user experiences before realising them in real-world scenarios. However, the validity of utilizing XR prototyping (XRP) as a method for gathering performance data on novel user experiences is still underexplored, i.e., it is not clear whether results gathered in simulations can be transferred to a real experience. To address this gap, we propose a validation framework that supports establishing the equivalence of performance measures gathered with real and simulated products and thus improves the ecological validity of XRP. To demonstrate the utility of the framework, we conduct an exemplary validation study using a Varjo XR-3, a state-of-the-art XR head-mounted display (HMD). The study focuses on steering a small drone and comparing it to interactions with its real-world counterpart. We identify functional fidelity, i.e., the functional similarity between the real and simulated product, as well as the simulation overhead from wearing an HMD, as major confounding factors for XRP.
The Effect of an Exergame on the Shadow Play Skill Based on Muscle Memory for Young Female Participants: The Case of Forehand Drive in Table Tennis
Forouzan Farzinnejad, Coburg University of Applied Sciences and Arts;
Javad Rasti, University of Isfahan;
Navid Khezrian, Coburg University of Applied Sciences and Arts;
Jens Grubert, Coburg University of Applied Sciences and Arts
Toggle Abstract
Learning and practicing table tennis with traditional methods is a long, tedious process and may even lead to the internalization of incorrect techniques if not supervised by a coach. To overcome these issues, the presented study proposes an exergame with the aim of enhancing young female novice players’ performance by boosting muscle memory, making practice more interesting, and decreasing the probability of faulty training. Specifically, we propose an exergame based on skeleton tracking and a virtual avatar to support correct shadow practice to learn forehand drive technique without the presence of a coach. We recruited 44 schoolgirls aged between 8 and 12 years without a background in playing table tennis and divided them into control and experimental groups. We examined their stroke skills (via the Mott-Lockhart test) and the error coefficient of their forehand drives (using a ball machine) in the pretest, post-test, and follow-up tests (10 days after the post-test). Our results showed that the experimental group had progress in the short and long term, while the control group had an improvement only in the short term. Further, the scale of improvement in the experimental group was significantly higher than in the control group.
Given that the early stages of learning, particularly for young girls, are important for the internalization of individual skills in would-be athletes, this method could help promote correct training for young females.
Training for Open-Ended Drilling through a Virtual Reality Simulation
Hing Lie, Wellesley College;
Kachina Studer, Massachusetts Institute of Technology;
Zhen Zhao, Massachusetts Institute of Technology;
Ben Thomson, Massachusetts Institute of Technology;
Dishita G Turakhia, Massachusetts Institute of Technology;
John Liu, Massachusetts Institute of Technology
Toggle Abstract
Virtual Reality (VR) can support effective and scalable training of psychomotor skills in manufacturing. However, many industry training modules offer experiences that are close-ended and do not allow for human error. We aim to address this gap in VR training tools for psychomotor skills training by exploring an open-ended approach to the system design. We designed a VR training simulation prototype for open-ended practice of drilling using a 3-axis milling machine. The simulation employs near “end-to-end” instruction through a safety module, a setup and drilling tutorial, open-ended practice complete with warnings of mistakes and failures, and a function to assess the geometries and locations of drilled holes against an engineering drawing. We developed and conducted a user study within an undergraduate-level introductory fabrication course to investigate the impact of open-ended VR practice on learning outcomes. Study results reveal positive trends, with the VR group successfully completing the machining task of drilling at a higher rate (75% vs 64%), with fewer mistakes (scores of 1.75 vs 2.14), and in less time (17.67 mins vs 21.57 mins) compared to the control group. We discuss our findings, limitations, and implications for the design of open-ended VR training systems for learning psychomotor skills.
PS8: Emerging Display & Projection Technologies
16:15 AEDT (UTC+11)
Ritchie Theatre
Session Chair: Y. Itoh
Toggle Papers
Neural Projection Mapping Using Reflectance Fields
Yotam Erel, Tel Aviv University;
Daisuke Iwai, Osaka University;
Amit Bermano, Tel-Aviv University
Toggle Abstract
We introduce a high-resolution, spatially adaptive light source (i.e., a projector) into a neural reflectance field, which allows both calibrating the projector and performing photorealistic light editing. The projected texture is fully differentiable with respect to all scene parameters, and can be optimized to yield a desired appearance suitable for applications in augmented reality and projection mapping. Our neural field consists of three neural networks, estimating geometry, material, and transmittance. Using an analytical BRDF model and carefully selected projection patterns, our acquisition process is simple and intuitive, featuring a fixed uncalibrated projector and a handheld camera with a co-located light source. As we demonstrate, the virtual projector incorporated into the pipeline improves scene understanding and enables various projection mapping applications, alleviating the need for time-consuming calibration steps traditionally performed per view or projector location. In addition to enabling novel viewpoint synthesis, we demonstrate state-of-the-art projector compensation for novel viewpoints, improvements over the baselines in material and scene reconstruction, and three easily implemented scenarios in which projection image optimization is performed, including the use of a 2D generative model to consistently dictate scene appearance from multiple viewpoints. We believe that neural projection mapping opens the door to novel and exciting downstream tasks through the joint optimization of the scene and projection images.
High-frame-rate projection with thousands of frames per second based on the multi-bit superimposition method
Soran Nakagawa, Tokyo Institute of Technology;
Yoshihiro Watanabe, Tokyo Institute of Technology
Toggle Abstract
The need for high-frame-rate projectors in the fields of dynamic projection mapping (DPM) and three-dimensional (3D) displays has grown rapidly. Conventional methods based on digital light processing (DLP) technology allow the frame rate to reach as much as 2,841 frames per second (fps) for 8-bit image projection when the minimum digital micromirror device (DMD) control time is 44 µs. However, this rate needs to be further increased to suit specific applications. In this study, we developed a novel high-frame-rate projection method that divides the bit depth of an image among multiple projectors and projects the parts simultaneously in synchronization. The simultaneously projected bit images are superimposed such that a high-bit-depth image is generated within a reduced single-frame duration. Additionally, we devised an optimization process to determine the system parameters necessary for attaining maximum brightness. We constructed a prototype system utilizing two high-frame-rate projectors and validated the feasibility of using our system to project 8-bit images at a rate of 5,600 fps. Furthermore, a quality assessment of our projected images showed superior performance in comparison to a dithered image.
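The core idea of dividing an image’s bit depth between synchronized projectors can be sketched digitally: each projector drives a subset of the bit planes, and if the projectors’ optical gains are matched to the bit weights, the superimposed light reconstructs the full 8-bit image. The snippet below is a toy NumPy illustration of that decomposition, not the authors’ optical system or brightness optimization.

```python
import numpy as np

def split_bit_planes(img8, high_bits=4):
    """Split an 8-bit image into the bit groups driven by each projector.
    Projector A gets the top `high_bits` bits, projector B the remainder;
    if A's optical gain is 2**(8 - high_bits) times B's, the superimposed
    light reconstructs the original 8-bit intensities."""
    low_bits = 8 - high_bits
    high = img8 >> low_bits                  # values 0 .. 2**high_bits - 1
    low = img8 & ((1 << low_bits) - 1)       # values 0 .. 2**low_bits - 1
    return high, low

def optical_superposition(high, low, high_bits=4):
    """Simulate the light integrated on the screen (relative units)."""
    low_bits = 8 - high_bits
    return (high.astype(np.uint16) << low_bits) + low

# Sanity check on a random 8-bit patch: the superposition matches the input.
img = np.random.default_rng(1).integers(0, 256, size=(4, 4), dtype=np.uint8)
h, l = split_bit_planes(img)
assert np.array_equal(optical_superposition(h, l), img.astype(np.uint16))
```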
Studying User Perceptible Misalignment in Simulated Dynamic Facial Projection Mapping
Hao-Lun Peng, Tokyo Institute of Technology;
Shin’ya Nishida, Kyoto University;
Yoshihiro Watanabe, Tokyo Institute of Technology
Toggle Abstract
High-speed dynamic facial projection mapping (DFPM) is an advanced technology that aims to create perceptual changes in facial appearance by overlapping images based on facial position and shape. Compared to traditional monitor-based augmented reality systems, DFPM offers a higher level of immersion because users can directly observe digital content on their faces. However, DFPM suffers from misalignment issues owing to a slight temporal delay from sensing to projection, which reduces the level of immersion. To the best of our knowledge, no previous study has established the latency requirements necessary to avoid perceptible misalignment and achieve an immersive experience. Furthermore, conventional DFPM systems have followed latency requirements that were not derived for the DFPM scenario. Therefore, this study measured the latency that produced a just-noticeable difference (JND) in DFPM under different facial motion conditions, using the weighted up-down two-alternative forced-choice method. The results showed that user-perceptible misalignment was influenced by facial motion types and their velocities. Additionally, it was found that an average latency of 3.87 ms was necessary to avoid perceptible misalignment in the DFPM system when the translation speed was 0.5 m/s, which contradicts the commonly held belief regarding the required latency threshold.
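The weighted up-down two-alternative forced-choice procedure mentioned above is an adaptive staircase in which the tested latency shrinks after detected trials and grows after undetected ones, with asymmetric step sizes selecting the targeted point on the psychometric function. The class below is a generic sketch of such a staircase with illustrative starting latency and step sizes; it is not the authors’ exact experimental protocol.

```python
class WeightedUpDownStaircase:
    """Adaptive staircase: the tested latency decreases after each trial in
    which the participant detects the misalignment and increases otherwise.
    The up/down step ratio determines the convergence point on the
    psychometric function (generic sketch, not the paper's protocol)."""
    def __init__(self, start_ms=30.0, down_step=1.0, up_step=3.0, floor=0.0):
        self.latency = start_ms
        self.down_step, self.up_step, self.floor = down_step, up_step, floor
        self.reversals = []
        self._last_direction = None

    def next_trial(self, detected: bool) -> float:
        direction = -1 if detected else +1
        if self._last_direction is not None and direction != self._last_direction:
            self.reversals.append(self.latency)   # record a reversal point
        self._last_direction = direction
        step = self.down_step if detected else self.up_step
        self.latency = max(self.floor, self.latency + direction * step)
        return self.latency

    def threshold(self, last_n=6):
        """Estimate the JND latency as the mean of the last reversal points."""
        tail = self.reversals[-last_n:]
        return sum(tail) / max(1, len(tail))
```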
Self-Calibrating Dynamic Projection Mapping System for Dynamic, Deformable Surfaces with Jitter Correction and Occlusion Handling
Muhammad Twaha Ibrahim, UC Irvine;
Gopi Meenakshisundaram, University of California, Irvine;
Aditi Majumder, UCI
Toggle Abstract
Dynamic projection mapping (DPM) is becoming increasingly popular, enabling viewers to visualize information on moving and deformable surfaces. Examples include large data visualization on the moving walls of tents deployed in austere remote locations during emergency management or defense operations. A DPM system typically comprises an RGB-D camera and a projector. In this paper, we present the first fully functional DPM system that auto-calibrates (without any physical props like a planar checkerboard or rigid 3D objects) and creates a comprehensible display in the presence of large and fast movements by managing jitter and occlusion by passing objects.
Prior DPM systems need specific calibration props and manual inputs in order to deliver sub-pixel calibration accuracy. Recalibration in the face of movement or a change in system setup becomes a time-consuming process in which the calibration prop needs to be brought back. When rendering content using DPM, errors in calibration are exacerbated, and noise in the depth camera leads to jitter, making the projection unreadable or incomprehensible. Occlusion may disrupt operations completely by jumbling up even the unoccluded parts of the display.
In this paper we propose key hardware-agnostic methods for DPM calibration and rendering to make DPM systems easily deployable, stable and legible. First, we present a novel projector-camera calibration that does not need synchronization of the devices and leverages the moving surface itself, a counter-intuitive proposition. We project ArUco markers on the moving surface and use corresponding detected features of these markers in the RGB and depth camera over multiple frames to accurately estimate the intrinsics and extrinsics of both the projector and the RGB-D camera. Second, we present a DPM rendering method that uses Kalman filtering models to reduce jitter and predict the surface shape in the presence of short-term occlusions by other static objects. This results in, to the best of our knowledge, the first DPM system that can auto-calibrate in minutes and can render high-resolution content such as text or images comprehensibly even in the presence of fast movements, deformations and occlusions. We compare and evaluate the accuracy with prior methods and analyze the effect of surface movement on the calibration accuracy.
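The Kalman-filter-based rendering step described above can be illustrated per surface point with a constant-velocity model: measurement updates smooth depth-camera jitter, while prediction-only steps extrapolate the point during short occlusions. The sketch below, in plain NumPy, shows that generic filter; the state model, noise parameters and frame rate are assumptions rather than the paper’s tuned values.

```python
import numpy as np

class ConstantVelocityKF:
    """Per-vertex constant-velocity Kalman filter: `update` smooths noisy
    depth-camera measurements (jitter), and `predict` alone extrapolates the
    surface point while it is briefly occluded (generic sketch, not the
    paper's full surface model)."""
    def __init__(self, x0, dt=1/60, q=1e-3, r=5e-3):
        self.x = np.hstack([x0, np.zeros(3)])          # state: [position, velocity]
        self.P = np.eye(6)
        self.F = np.eye(6); self.F[:3, 3:] = dt * np.eye(3)   # constant-velocity model
        self.H = np.hstack([np.eye(3), np.zeros((3, 3))])     # we observe position only
        self.Q = q * np.eye(6)                                 # process noise
        self.R = r * np.eye(3)                                 # measurement noise

    def predict(self):
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:3]

    def update(self, z):
        self.predict()
        y = z - self.H @ self.x                   # innovation
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)  # Kalman gain
        self.x = self.x + K @ y
        self.P = (np.eye(6) - K @ self.H) @ self.P
        return self.x[:3]

# Usage: call update(measured_xyz) each frame; call predict() alone when occluded.
```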
Exploring the Effects of Virtually-Augmented Display Sizes on Users’ Spatial Memory in Smartwatches
Marium-E-Jannat, University of British Columbia;
Khalad Hasan, University of British Columbia
Toggle Abstract
The small display size of smartwatches makes it difficult to present large amounts of information on the device. Prior work explored leveraging a second device (e.g., a head-mounted display) to extend the available space, letting users access a large information space through virtual displays anchored on their wrists. Though researchers showed that having an additional virtual screen increases information bandwidth, little is known about the effect of virtual display sizes on users’ performance. In this paper, we examined the impact of display sizes on spatial memory, workload, and user experience to better understand the prospects of virtually-augmented displays for smartwatches. Results from a user study revealed that a 4.8-inch display size can be the “sweet spot” for virtually-augmented displays, ensuring improved spatial memory performance and a better user experience with less workload. Finally, we provide a set of design guidelines focusing on display size, spatial memory, user experience, and workload for designing virtually augmented user interfaces for smartwatches.
Perceptual Tolerance of Split-Up Effect for Near-Eye Light Field Display
Ting-Hsun Chi, Electrical Engineering Department, National Taiwan University;
Wen Perng, Electrical Engineering Department, National Taiwan University;
Homer Chen, National Taiwan University
Toggle Abstract
When the light field of a scene is generated with a finite number of subviews, the defocused regions appear to be split up if a camera is used to capture the light field. Yet the split-up effect is unnoticeable when the light field is viewed directly by the human eye. In this paper, we attribute the unobservability of the split-up effect to the decrease in visual acuity as a function of retinal eccentricity and to the low-pass filtering property of visual attention. Theoretical and experimental results are provided to support our claim. Furthermore, we set an observability criterion for the split-up effect and discuss design strategies for improving the performance of light field displays.
Spaces to Think: A Comparison of Small, Large, and Immersive Displays for the Sensemaking Process
Lee Lisle, Virginia Tech;
Kylie Davidson, Virginia Tech;
Leonardo Pavanatto, Virginia Tech;
Ibrahim Asadullah Tahmid, Virginia Tech;
Chris North, Virginia Tech;
Doug Bowman, Virginia Tech
Toggle Abstract
Analysts need to process large amounts of data in order to extract concepts, themes, and plans of action based upon their findings. Different display technologies offer varying amounts of space and interaction methods that change the way users can process data. In a comparative study, we investigated how the use of a single traditional monitor, a large high-resolution two-dimensional monitor, and an immersive three-dimensional space using the Immersive Space to Think approach impacts the sensemaking process. We found that user satisfaction grows and frustration decreases as the available space increases. We observed specific strategies users employ in the various conditions to assist with the processing of datasets. We also found increased usage of spatial memory as space increased, which improves performance in artifact position recall tasks. For future systems supporting sensemaking, we recommend display technologies that provide users with large amounts of space to organize information and analysis artifacts.
Low-Latency Beaming Display: Implementation of Wearable, 133 µs Motion-to-Photon Latency Near-eye Display
Yuichi Hiroi, The University of Tokyo;
Akira Watanabe, Tokyo Institute of Technology;
Yuri Mikawa, The University of Tokyo;
Yuta Itoh, The University of Tokyo
Toggle Abstract
This paper presents a low-latency Beaming Display system with a 133 µs motion-to-photon (M2P) latency, the delay from head motion to the corresponding image motion. The Beaming Display represents a recent near-eye display paradigm that involves a steerable remote projector and a passive wearable headset. This system aims to overcome typical trade-offs of Optical See-Through Head-Mounted Displays (OST-HMDs), such as weight and computational resources. However, since the Beaming Display projects a small image onto a moving, distant viewpoint, M2P latency significantly affects displacement. To reduce M2P latency, we propose a low-latency Beaming Display system that can be modularized without relying on expensive high-speed devices. In our system, a 2D position sensor, which is placed coaxially on the projector, detects the light from the IR-LED on the headset and generates a differential signal for tracking. An analog closed-loop control of the steering mirror based on this signal continuously projects images onto the headset. We have implemented a proof-of-concept prototype, evaluated the latency and the augmented reality experience through a user-perspective camera, and discussed the limitations and potential improvements of the prototype.
PS9: 3DUIs: Selection
16:15 AEDT (UTC+11)
CivEng 109
Session Chair: F. Argelaguet
Toggle Papers
Multi-Level Precues for Guiding Tasks Within and Between Workspaces in Spatial Augmented Reality
Benjamin Volmer, University of South Australia;
Jen-Shuo Liu, Columbia University;
Brandon J Matthews, University of South Australia;
Ina Bornkessel-Schlesewsky, University of South Australia;
Steven Feiner, Columbia University;
Bruce H Thomas, University of South Australia
Toggle Abstract
We explore Spatial Augmented Reality (SAR) precues (predictive cues) for procedural tasks within and between workspaces and for visualizing multiple upcoming steps in advance. We designed precues based on several factors: cue type, color transparency, and multi-level (number of precues). Precues were evaluated in a procedural task requiring the user to press buttons in three surrounding workspaces. Participants performed fastest in conditions where tasks were linked with line cues with different levels of color transparency. Precue performance was also affected by whether the next task was in the same workspace or a different one.
3D Selection in Mixed Reality: Designing a Two-Phase Technique To Reduce Fatigue
Adrien Chaffangeon Caillet, Laboratoire d’Informatique de Grenoble;
Alix Goguey, Université Grenoble Alpes;
Laurence Nigay, Université Grenoble Alpes
Toggle Abstract
Mid-air pointing is widely used for 3D selection in Mixed Reality but leads to arm fatigue. In a first exploratory experiment we study a two-phase design and compare modalities for each phase: mid-air gestures, eye-gaze and microgestures. Results suggest that eye-gaze and microgestures are good candidates to reduce fatigue and improve interaction speed. We therefore propose two 3D selection techniques: Look&MidAir and Look&Micro. Both techniques include a first phase during which users control a cone directed along their eye-gaze. Using the flexion of their non-dominant hand index finger, users pre-select the objects intersecting this cone. If several objects are pre-selected, a disambiguation phase is performed using direct mid-air touch for Look&MidAir or thumb to finger microgestures for Look&Micro. In a second study, we compare both techniques to the standard raycasting technique. Results show that Look&MidAir and Look&Micro perform similarly. However they are 55% faster, perceived easier to use and are less tiring than the baseline. We discuss how the two techniques could be combined for greater flexibility and for object manipulation after selection.
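The gaze-directed cone pre-selection in the first phase can be sketched as a simple angular test: an object is pre-selected if the angle between the gaze ray and the vector to the object is below the cone’s half-angle. The function below is an illustrative NumPy sketch; the half-angle and input conventions are assumptions, and the subsequent disambiguation phase is not shown.

```python
import numpy as np

def cone_preselect(eye_origin, gaze_dir, object_positions, half_angle_deg=5.0):
    """Return indices of objects whose centres fall inside a selection cone
    along the eye-gaze ray (the pre-selection phase; disambiguation between
    several hits would follow, e.g. by mid-air touch or microgestures)."""
    g = gaze_dir / np.linalg.norm(gaze_dir)
    v = object_positions - eye_origin              # vectors eye -> object
    d = np.linalg.norm(v, axis=1)
    cos_angle = (v @ g) / np.maximum(d, 1e-9)
    inside = cos_angle >= np.cos(np.radians(half_angle_deg))
    return np.nonzero(inside & (d > 0))[0]

# Example: three objects, gaze looking along +Z; only the first is inside the cone.
objs = np.array([[0.02, 0.0, 1.0], [0.5, 0.0, 1.0], [0.0, 0.0, -1.0]])
print(cone_preselect(np.zeros(3), np.array([0.0, 0.0, 1.0]), objs))
```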
Point & Portal: A New Action at a Distance Technique For Virtual Reality
Daniel L Ablett, University of South Australia;
Andrew Cunningham, University of South Australia;
Gun A. Lee, University of South Australia;
Bruce H Thomas, University of South Australia
Toggle Abstract
This paper introduces Point & Portal, a novel Virtual Reality (VR) interaction technique inspired by Point & Teleport. This new technique enables users to configure portals using pointing actions, and supports seamless action at a distance and navigation without requiring line of sight. By supporting multiple portals, Point & Portal enables users to create dynamic portal configurations to manage multiple remote tasks. Additionally, this paper introduces Portal Relative Positioning for reliable portal interactions, and the concept of maintaining Level Portals. In a comparative user study, Point & Portal demonstrated significant advantages over the traditional Point & Teleport technique for bringing interaction devices within arm’s reach. In the presence of obstacles, Point & Portal exhibited faster speed and lower cognitive load, and was preferred by participants. Overall, participants required less physical movement and fewer pointing actions, and reported higher involvement and “good” usability.
PinchLens: Applying Spatial Magnification and Adaptive Control Display Gain for Precise Selection in Virtual Reality
Fengyuan Zhu, University of Toronto;
Ludwig Sidenmark, University of Toronto;
Mauricio Sousa, University of Toronto;
Tovi Grossman, University of Toronto
Toggle Abstract
We present PinchLens, a new free-hand target selection technique for acquiring small and dense targets in Virtual Reality. Traditional pinch-based selection does not allow people to manipulate small and dense objects precisely due to tracking and perceptual inaccuracies. Our approach combines spatial magnification, an adaptive control-display gain, and visual feedback to improve selection accuracy. When a user starts the pinching selection process, a magnifying bubble expands the scale of nearby targets, an adaptive control-to-display ratio is applied to the user’s hand for precision, and a cursor is displayed at the estimated pinch point for enhanced visual feedback. We performed a user study comparing our technique to traditional pinch selection and several variations to isolate the impact of each of the technique’s features. The results showed that PinchLens significantly outperformed traditional pinch selection, reducing error rates from 18.9% to 1.9%. Furthermore, we found that magnification was the dominant feature producing this improvement, while the adaptive control-display gain and the visual pinch cursor were also helpful in several conditions.
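An adaptive control-display gain of the kind described above can be sketched as a speed-dependent mapping: slow hand motion is damped for precision while fast motion stays roughly 1:1. The function below is a minimal illustration with assumed speed thresholds and gain bounds; it is not the gain function used in PinchLens.

```python
def adaptive_cd_gain(hand_speed_mps, slow=0.02, fast=0.25,
                     min_gain=0.3, max_gain=1.0):
    """Adaptive control-display gain: slow hand motion (fine adjustment) is
    damped so the virtual cursor moves less than the hand, while fast motion
    keeps a roughly 1:1 mapping. Thresholds and bounds are illustrative, not
    taken from the paper."""
    if hand_speed_mps <= slow:
        return min_gain
    if hand_speed_mps >= fast:
        return max_gain
    t = (hand_speed_mps - slow) / (fast - slow)   # linear ramp in between
    return min_gain + t * (max_gain - min_gain)

# Per frame: cursor_pos += adaptive_cd_gain(hand_speed) * hand_displacement
print(adaptive_cd_gain(0.01), adaptive_cd_gain(0.1), adaptive_cd_gain(0.5))
```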
TouchRay: Towards Low-effort Object Selection at Any Distance in DeskVR
João Monteiro, Faculdade de Engenharia, Universidade do Porto;
Daniel Mendes, INESC TEC;
Rui Rodrigues, FEUP/INESC TEC
Toggle Abstract
DeskVR allows users to experience Virtual Reality (VR) while sitting at a desk without requiring extensive movements. This makes it better suited for professional work environments where productivity over extended periods is essential. However, tasks that typically resort to mid-air gestures might not be suitable for DeskVR. In this paper, we focus on the fundamental task of object selection. We present TouchRay, an object selection technique conceived specifically for DeskVR that enables users to select objects at any distance while resting their hands on the desk. It also allows selecting objects’ sub-components by traversing their corresponding hierarchical trees. We conducted a user evaluation comparing TouchRay against state-of-the-art techniques targeted at traditional VR. Results revealed that participants could successfully select objects in different settings, with consistent times and on par with the baseline techniques in complex tasks, without requiring mid-air gestures.
Effect of Grip Style on Peripersonal Target Pointing in VR Head Mounted Displays
Anil Ufuk Batmaz, Concordia University;
Rumeysa Turkmen, Kadir Has University;
Mine Sarac, Kadir Has University;
Mayra Donaji Barrera Machuca, Dalhousie University;
Wolfgang Stuerzlinger, Simon Fraser University
Toggle Abstract
When working in Virtual Reality (VR), the user’s performance is affected by how the user holds the input device (e.g., controller), typically using either a precision or a power grip. Previous work examined these grip styles for 3D pointing at targets at different depths in peripersonal space and found that participants had a lower error rate with the precision grip but identified no difference in movement speed, throughput, or interaction with target depth. Yet, this previous experiment was potentially affected by tracking differences between devices. This paper reports an experiment that partially replicates and extends the previous study by evaluating the effect of grip style on the 3D selection of nearby targets with the same device. Furthermore, our experiment re-investigates the effect of the vergence-accommodation conflict (VAC) present in current stereo displays on 3D pointing in peripersonal space. Our results show that grip style significantly affects user performance. We hope that our results are useful for researchers and designers when creating virtual environments.
Exploring Users Pointing Performance on Large Displays with Different Curvatures in Virtual Reality
A K M Amanat Ullah, University of British Columbia, Okanagan;
William Delamare, ESTIA;
Khalad Hasan, University of British Columbia
Toggle Abstract
Large curved displays inside Virtual Reality environments are becoming popular for visualizing high-resolution content during analytical tasks, gaming or entertainment. Prior research showed that such displays provide a wide field of view and offer users a high level of immersion. However, little is known about users’ performance (e.g., pointing speed and accuracy) on them. We explore users’ pointing performance on large virtual curved displays. We investigate standard pointing factors (e.g., target width and amplitude) in combination with relevant curve-related factors, namely display curvature and both linear and angular measures. Our results show that the less curved the display, the higher the performance, i.e., faster movement time. This result holds for pointing tasks controlled via their visual properties (linear widths and amplitudes) or their motor properties (angular widths and amplitudes). Additionally, display curvatures significantly affect the error rate for both linear and angular conditions. Furthermore, we observe that curved displays perform better or similar to flat displays based on throughput analysis. Finally, we discuss our results and provide suggestions regarding pointing tasks on large curved displays in VR.
HotGestures: Complementing Command Selection and Use with Delimiter-Free Gesture-Based Shortcuts in Virtual Reality
Zhaomou Song, University of Cambridge;
John J Dudley, University of Cambridge;
Per Ola Kristensson, University of Cambridge
Toggle Abstract
Conventional desktop applications provide users with hotkeys as shortcuts for triggering different functionality. In this paper we consider what constitutes an effective parallel to hotkeys in a 3D interaction space where the input modality is no longer limited to the use of a keyboard. We propose HotGestures: a gesture-based interaction system for rapid tool selection and usage. Hand gestures are frequently used during human communication to convey information and provide natural associations with meaning. HotGestures provide shortcuts for users to seamlessly activate and use virtual tools by performing hand gestures. This approach naturally complements conventional menu interactions. We evaluate the potential of HotGestures in a set of two user studies and observe that our gesture-based technique provides fast and effective shortcuts for tool selection and usage. Participants found HotGestures to be distinctive, fast, and easy to use while also complementing conventional menu-based interaction.
Wednesday 18 Oct
PS10: Remote Collaboration, Telepresence and Teleoperation
10:30 AEDT (UTC+11)
Leighton Hall
Session Chair: K. Kiyokawa
Toggle Papers
Visualizing Hand Force with Wearable Muscle Sensing for Enhanced Mixed Reality Remote Collaboration
Hyung-il Kim, KAIST;
Boram Yoon, UVR Lab, KAIST;
Seo Young Oh, KAIST;
Woontack Woo, KAIST
Toggle Abstract
In this paper, we present a prototype system for sharing a user’s hand force in mixed reality (MR) remote collaboration on physical tasks, where hand force is estimated using a wearable surface electromyography (sEMG) sensor. In a remote collaboration between a worker and an expert, hand activity plays a crucial role; however, the force exerted by the worker’s hand has not been extensively investigated. Our sEMG-based system reliably captures the worker’s hand force during physical tasks and conveys this information to the expert through hand force visualization, overlaid on the worker’s view or on the worker’s avatar. A user study was conducted to evaluate the impact of visualizing a worker’s hand force on collaboration, employing three distinct visualization methods across two view modes. Our findings demonstrate that sensing and sharing hand force in MR remote collaboration improves the expert’s awareness of the worker’s task, significantly enhances the expert’s perception of the collaborator’s hand force and the weight of the interacting object, and promotes a heightened sense of social presence for the expert. Based on these findings, we provide design implications for future mixed reality remote collaboration systems that incorporate hand force sensing and visualization.
Virtual Reality Telepresence: 360-Degree Video Streaming with Edge-Compute Assisted Static Foveated Compression
Xincheng Huang, University of British Columbia;
James Riddell, University of British Columbia;
Robert Xiao, University of British Columbia
Toggle Abstract
Real-time communication with immersive 360-degree video can enable users to be telepresent within a remotely streamed environment. Increasingly, users are shifting to mobile devices and connecting to the Internet via mobile-cellular networks. As the ideal media for 360-degree videos, some VR headsets now also come with cellular capacity, giving them potential for mobile applications. However, streaming high-quality 360-degree live video poses challenges for network bandwidth, particularly on cellular connections. To reduce bandwidth requirements, videos can be compressed using viewport-adaptive streaming or foveated rendering techniques. Such approaches require very low latency in order to be effective, which has previously limited their applications on traditional cellular networks. In this work, we demonstrate an end-to-end virtual reality telepresence system that streams ~6K 360-degree video over 5G millimeter-wave (mmW) radio. Our use of 5G technologies, in conjunction with mobile edge compute nodes, substantially reduces latency when compared with existing 4G networks, enabling high-efficiency foveated compression over modern cellular networks on par with WiFi. We performed a technical evaluation of our system’s visual quality post-compression with peak signal-to-noise ratio (PSNR) and FOVVideoVDP. We also conducted a user study to evaluate users’ sensitivity to compressed video. Our findings demonstrate that our system achieves visually indistinguishable video streams while using up to 80% less data when compared with un-foveated video. We demonstrate our video compression system in the context of an immersive, telepresent video calling application.
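Static foveated compression of a 360-degree frame can be approximated by assigning each tile a quality level that falls off with angular distance from the viewport centre. The sketch below builds such a tile quality map in NumPy; the tile counts, foveal radius, and quality range are illustrative assumptions, and a real encoder would map these levels to per-tile QP or bitrate.

```python
import numpy as np

def static_foveation_map(tiles_x=16, tiles_y=8, fovea_deg=25.0,
                         q_max=40, q_min=22):
    """Assign a quality level (higher = better) to each tile of an
    equirectangular frame based on approximate angular distance from the
    viewport centre, assumed here to be at (0 deg yaw, 0 deg pitch)."""
    yaw = (np.arange(tiles_x) + 0.5) / tiles_x * 360.0 - 180.0
    pitch = (np.arange(tiles_y) + 0.5) / tiles_y * 180.0 - 90.0
    yaw_g, pitch_g = np.meshgrid(yaw, pitch)
    eccentricity = np.hypot(yaw_g, pitch_g)       # rough angular distance (deg)
    # full quality inside the foveal region, linear falloff outside it
    falloff = np.clip((eccentricity - fovea_deg) / (180.0 - fovea_deg), 0, 1)
    return np.round(q_max - falloff * (q_max - q_min)).astype(int)

print(static_foveation_map())
```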
Remote Monitoring and Teleoperation of Autonomous Vehicles — Is Virtual Reality an Option?
Snehanjali Kalamkar, Coburg University of Applied Sciences and Arts;
Verena Biener, Coburg University of Applied Sciences and Arts;
Fabian Beck, University of Bamberg;
Jens Grubert, Coburg University of Applied Sciences and Arts
Toggle Abstract
While the promise of autonomous vehicles has led to significant scientific and industrial progress, fully automated, SAE Level 5-conformant cars will likely not see mass adoption anytime soon. Instead, in many applications, human supervision, such as remote monitoring and teleoperation, will be required for the foreseeable future. While Virtual Reality (VR) has been proposed as one potential interface for teleoperation, its benefits and drawbacks compared to physical monitoring and teleoperation solutions have not been thoroughly investigated. To this end, we contribute three user studies, comparing and quantifying the performance of, and subjective feedback for, a VR-based system and an existing monitoring and teleoperation system that is in industrial use today. Through these three user studies, we contribute to a better understanding of future virtual monitoring and teleoperation solutions for autonomous vehicles. The results of our first user study (n=16) indicate that a VR interface replicating the physical interface does not outperform the physical interface. They also quantify the negative effects that combined monitoring and teleoperation tasks have on users, irrespective of the interface being used. The results of the second user study (n=24) indicate that the perceptual and ergonomic issues caused by VR outweigh its benefits, such as better concentration through isolation. The third, follow-up user study (n=24) specifically targeted the perceptual and ergonomic issues of VR; the subjective feedback of this study indicates that newer-generation VR headsets have the potential to catch up with current physical displays.
“Can You Move It?”: The Design and Evaluation of Moving VR Shots in Sport Broadcast
Xiuqi Zhu, Tsinghua University;
Chenyi Wang, Tsinghua University;
Zichun Guo, Tsinghua University;
Yifan Zhao, Tsinghua University;
Yang Jiao, Tsinghua University
Toggle Abstract
Virtual Reality (VR) broadcasting has seen widespread adoption in major sports events, attributed to its ability to generate a sense of presence, curiosity, and excitement among viewers. However, we have noticed that still shots reveal a limitation in the movement of VR cameras and hinder the VR viewing experience in current VR sports broadcasts. This paper aims to bridge this gap by engaging in a quantitative user analysis to explore the design and impact of dynamic VR shots on viewing experiences. We conducted two user studies in a digital hockey game twin environment and asked participants to evaluate their viewing experience through two questionnaires. Our findings suggested that the viewing experiences demonstrated no notable disparity between still and moving shots for single clips. However, when considering entire events, moving shots improved the viewer’s immersive experience, with no notable increase in sickness compared to still shots. We further discuss the benefits of integrating moving shots into VR sports broadcasts and present a set of design considerations and potential improvements for future VR sports broadcasting.
Human Behavior Analysis in Human-Robot Cooperation with AR Glasses
Koichi Owaki, Osaka University;
Hideyuki Shimonishi, Osaka University;
Nattaon Techasarntikul, Osaka University
Toggle Abstract
To achieve efficient human-robot cooperation, it is necessary to work in close proximity while ensuring safety. However, in conventional robot control, maintaining a certain distance between humans and robots is required for safety, owing to control uncertainties and unexpected human actions, which can limit the efficiency of robot operations. Therefore, this study aims to establish a human-robot cooperation aiding system that addresses both safety and efficiency in close-proximity situations. We propose two Augmented Reality (AR) interfaces that display robot information via AR glasses, allowing workers to see the robot information while focusing on their task and avoiding collisions with the robot. AR glasses provide the hands-free communication required in work environments such as warehouses or convenience store back rooms, and multiple information levels, simple or informative, to balance accuracy and ease of human recognition. We conducted a comparative evaluation experiment with 24 participants and found that both safety and efficiency were improved using the proposed user interfaces (UIs). We also collected position, head motion, and eye-tracking data from the AR glasses to gain insight into human behavior during the tasks for each UI. Consequently, we clarified the behavior of the participants under each condition and how it contributed to safety and efficiency.
AR-supported Human-Robot Collaboration: Facilitating Workspace Awareness and Parallelised Assembly Tasks
Rasmus Skovhus Lunding, Aarhus University;
Mathias N. Lystbæk, Aarhus University;
Tiare Feuchtner, University of Konstanz;
Kaj Grønbæk, Aarhus University
Toggle Abstract
While technologies for human-robot collaboration are rapidly advancing, many aspects still need further investigation, such as how to ensure workspace awareness, how to enable the operator to reschedule tasks on the fly, and how users prefer to coordinate and collaborate with robots. To address these, we propose an Augmented Reality interface that supports human-robot collaboration in an assembly task by (1) enabling the inspection of planned and ongoing robot processes through dynamic task lists and a path visualization, (2) allowing the operator to delegate tasks to the robot, and (3) presenting step-by-step assembly instructions. We evaluate our AR interface against a state-of-the-art tablet interface in a user study in which participants collaborated with a robot arm in a shared workspace to complete an assembly task. Our findings confirm the feasibility and potential of AR-assisted human-robot collaboration while pointing to some central challenges that require further work.
DualStream: Spatially Sharing Selves and Surroundings using Mobile Devices and Augmented Reality
Rishi Vanukuru, University of Colorado Boulder;
Suibi Che-Chuan Weng, University of Colorado Boulder;
Krithik Ranjan, University of Colorado Boulder;
Torin Hopkins, University of Colorado Boulder;
Amy Banic, University of Wyoming;
Mark D Gross, University of Colorado Boulder;
Ellen Yi-Luen Do, University of Colorado Boulder
Toggle Abstract
In-person human interaction relies on our spatial perception of each other and our surroundings. Current remote communication tools partially address each of these aspects. Video calls convey real user representations but without spatial interactions. Augmented and Virtual Reality (AR/VR) experiences are immersive and spatial but often use virtual environments and characters instead of real-life representations. Bridging these gaps, we introduce DualStream, a system for synchronous mobile AR remote communication that captures, streams, and displays spatial representations of users and their surroundings. DualStream supports transitions between user and environment representations with different levels of visuospatial fidelity, as well as the creation of persistent shared spaces using environment snapshots. We demonstrate how DualStream can enable spatial communication in real-world contexts, and support the creation of blended spaces for collaboration. A formative evaluation of DualStream revealed that users valued the ability to interact spatially and move between representations, and could see DualStream fitting into their own remote communication practices in the near future. Drawing from these findings, we discuss new opportunities for designing more widely accessible spatial communication tools, centered around the mobile phone.
PS11: Graphics
10:30 AEDT (UTC+11)
Ritchie Theatre
Session Chair: D. Iwai
Toggle Papers
Local-to-Global Panorama Inpainting for Locale-Aware Indoor Lighting Prediction
Jiayang Bai, Computer Science;
Zhen He, Nanjing University;
Shan Yang, Nanjing University;
Jie Guo, Nanjing University;
Zhenyu Chen, Nanjing University;
Yan Zhang, Nanjing University;
Yanwen Guo, Nanjing University
Toggle Abstract
Predicting panoramic indoor lighting from a single perspective image is a fundamental but highly ill-posed problem in computer vision and graphics. To achieve locale-aware and robust prediction, this problem can be decomposed into three sub-tasks: depth-based image warping, panorama inpainting and high-dynamic-range (HDR) reconstruction, among which the success of panorama inpainting plays a key role. Recent methods mostly rely on convolutional neural networks (CNNs) to fill the missing contents in the warped panorama. However, they usually achieve suboptimal performance since the missing contents occupy a very large portion in the panoramic space while CNNs are plagued by limited receptive fields. The spatially-varying distortion in the spherical signals further increases the difficulty for conventional CNNs. To address these issues, we propose a local-to-global strategy for large-scale panorama inpainting. In our method, a depth-guided local inpainting is first applied on the warped panorama to fill small but dense holes. Then, a transformer-based network, dubbed PanoTransformer, is designed to hallucinate reasonable global structures in the large holes. To avoid distortion, we further employ cubemap projection in our design of PanoTransformer. The high-quality panorama recovered at any locale helps us to capture spatially-varying indoor illumination with physically-plausible global structures and fine details.
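To make the three-stage decomposition above (depth-based warping, panorama inpainting, HDR reconstruction) concrete, the following minimal Python sketch mirrors that pipeline on a single-channel equirectangular map. It is an illustration written for this abstract, not the authors' code: the learned components, in particular PanoTransformer, are replaced by trivial stand-ins, and all function names are hypothetical.

```python
# Minimal pipeline sketch: depth-based warping is assumed to have produced a
# single-channel equirectangular panorama in [0, 1] with NaNs marking the
# unobserved (disoccluded) pixels. Learned parts are replaced by stand-ins.
import numpy as np

def local_inpaint(pano: np.ndarray, max_iters: int = 8) -> np.ndarray:
    """Fill small, dense holes by repeatedly copying values from valid neighbours."""
    out = pano.copy()
    for _ in range(max_iters):
        if not np.isnan(out).any():
            break
        for shift, axis in ((1, 0), (-1, 0), (1, 1), (-1, 1)):
            cand = np.roll(out, shift, axis=axis)   # horizontal roll also wraps the panorama
            fill = np.isnan(out) & ~np.isnan(cand)
            out[fill] = cand[fill]
    return out

def global_inpaint(pano: np.ndarray) -> np.ndarray:
    """Stand-in for PanoTransformer: a real model would hallucinate global structure
    in the large remaining holes; here we simply use the mean of observed pixels."""
    out = pano.copy()
    out[np.isnan(out)] = np.nanmean(pano)
    return out

def ldr_to_hdr(pano: np.ndarray, gamma: float = 2.2) -> np.ndarray:
    """Toy HDR reconstruction via inverse gamma (a learned network in the paper)."""
    return np.clip(pano, 0.0, 1.0) ** gamma

def predict_indoor_lighting(warped_pano: np.ndarray) -> np.ndarray:
    """Local inpainting -> global inpainting -> HDR reconstruction."""
    return ldr_to_hdr(global_inpaint(local_inpaint(warped_pano)))
```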
Exemplar-Based Inpainting for 6DOF Virtual Reality Photos
Shohei Mori, Graz University of Technology;
Dieter Schmalstieg, Graz University of Technology;
Denis Kalkofen, Flinders University
Toggle Abstract
Multi-layer images are currently the most prominent scene representation for viewing natural scenes under full-motion parallax in virtual reality. Layers ordered in diopter space contain color and transparency so that a complete image is formed when the layers are composited in a view-dependent manner. Once baked, the same limitations apply to multi-layer images as to conventional single-layer photography, making it challenging to remove obstructive objects or otherwise edit the content. Object removal before baking can benefit from filling disoccluded layers with pixels from background layers. However, if no such background pixels have been observed, an inpainting algorithm must fill the empty spots with fitting synthetic content. We present and study a multi-layer inpainting approach that addresses this problem in two stages: First, a volumetric area of interest specified by the user is classified with respect to whether the background pixels have been observed or not. Second, the unobserved pixels are filled with multi-layer inpainting. We report on experiments using multiple variants of multi-layer inpainting and compare our solution to conventional inpainting methods that consider each layer individually.
Deep scene synthesis of Atlanta-world interiors from a single omnidirectional image
Giovanni Pintore, CRS4;
Fabio Bettio, CRS4;
Marco Agus, Hamad Bin Khalifa University;
Enrico Gobbetti, CRS4
Toggle Abstract
We present a new data-driven approach for extracting geometric and structural information from a single spherical panorama of an interior scene, and for using this information to render the scene from novel points of view, enhancing 3D immersion in VR applications. The approach copes with the inherent ambiguities of single-image geometry estimation and novel view synthesis by focusing on the very common case of Atlanta-world interiors, bounded by horizontal floors and ceilings and vertical walls. Based on this prior, we introduce a novel end-to-end deep learning approach to jointly estimate the depth and the underlying room structure of the scene. The prior guides the design of the network and of novel domain-specific loss functions, shifting the major computational load to a training phase that exploits available large-scale synthetic panoramic imagery. An extremely lightweight network uses geometric and structural information to infer novel panoramic views from translated positions at interactive rates, from which perspective views matching head rotations are produced and upsampled to the display size. As a result, our method automatically produces new poses around the original camera at interactive rates, within a working area suitable for producing depth cues for VR applications, especially when using head-mounted displays connected to graphics servers. The extracted floor plan and 3D wall structure can also be used to support room exploration. The experimental results demonstrate that our method provides low-latency performance and improves over current state-of-the-art solutions in prediction accuracy on available commonly used indoor panoramic benchmarks.
VRS-NeRF: Accelerating Neural Radiance Field Rendering with Variable Rate Shading
Tim Rolff, Universität Hamburg;
Susanne Schmidt, Universität Hamburg;
Ke Li, Deutsches Elektronen-Synchrotron (DESY);
Frank Steinicke, Universität Hamburg;
Simone Frintrop, Universität Hamburg
Toggle Abstract
Recent advancements in Neural Radiance Fields (NeRF) provide enormous potential for a wide range of Mixed Reality (MR) applications. However, the applicability of NeRF to real-time MR systems is still largely limited by the rendering performance of NeRF. In this paper, we present a novel approach for Variable Rate Shading for Neural Radiance Fields (VRS-NeRF). In contrast to previous techniques, our approach does not require training multiple neural networks or re-training of already existing ones, but instead utilizes the raytracing properties of NeRF. This is achieved by merging rays depending on a variable shading rate, which reduces the overall number of queries to the neural network. We demonstrate the generalizability of our approach by implementing three alternative functions for the determination of the shading rate. The first method uses the gaze of users to effectively implement a foveated rendering technique in NeRF. For the other two techniques, we utilize shading rates based on edges and saliency. Based on a psychophysical experiment and multiple image-based metrics, we suggest a set of parameters for each technique, yielding an optimal tradeoff between rendering performance gain and perceived visual quality.
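The core idea, merging rays within blocks whose shading rate is coarse so that fewer queries reach the network, can be sketched as follows. This is an illustrative Python sketch under assumed interfaces: the `query_nerf` callable and the square-block merging scheme are not taken from the paper.

```python
# Sketch of rate-dependent ray merging: one NeRF query per block, with the
# result broadcast to all pixels of that block. `query_nerf` is an assumed
# callable mapping (N, 6) ray origins+directions to (N, 3) colours.
import numpy as np

def render_with_variable_rate(query_nerf, rays: np.ndarray, rate_map: np.ndarray) -> np.ndarray:
    """rays: (H, W, 6) per-pixel rays; rate_map: (H, W) block size per pixel
    (1 = full shading rate, 2 = merge a 2x2 block, 4 = merge a 4x4 block, ...)."""
    H, W, _ = rays.shape
    out = np.zeros((H, W, 3))
    done = np.zeros((H, W), dtype=bool)
    for y in range(H):
        for x in range(W):
            if done[y, x]:
                continue
            b = int(rate_map[y, x])                     # block size at this pixel
            colour = query_nerf(rays[y, x][None])[0]    # a single query for the whole block
            out[y:y + b, x:x + b] = colour              # broadcast to the merged pixels
            done[y:y + b, x:x + b] = True
    return out

# A foveated rate_map would assign 1 near the tracked gaze point and larger block
# sizes toward the periphery; edge- or saliency-based maps work the same way.
```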
Is Foveated Rendering perception affected by users’ motion?
Thállys Lisboa Simões, Universidade Federal Fluminense;
Horácio Brescia Macêdo Henriques, Universidade Federal Fluminense;
Thiago Porcino, Pontifical Catholic University of Rio de Janeiro;
Eder de Oliveira, Universidade Federal Fluminense;
Daniela Trevisan, Universidade Federal Fluminense;
Esteban Clua, Universidade Federal Fluminense
Toggle Abstract
Virtual reality (VR) is gaining increasing popularity across various domains, but the current state of technology imposes limitations on the level of realism and complexity achievable in computer graphics when displayed through VR head-mounted devices (HMDs). To improve the user experience in HMDs, optimization techniques are needed to enhance performance without sacrificing quality. One such technique is Foveated Rendering (FR), which leverages the human visual system to optimize resource usage. FR degrades the image quality in the periphery of human vision, where visual acuity is lower, to save resources. This paper investigates whether the perception of the peripheral area is affected when users are moving in a VR environment. Our findings show a significant correlation between movement speed and foveated rendering parameters in both scenarios. The least amount of degradation was observed in the idle state and the most in the high-speed state, indicating that users perceive less degradation at higher speeds. These results are particularly relevant for path-tracing-based algorithms, due to the possibility of reducing the number of rays required for rendering whenever there is movement.
RenderFusion: Balancing Local and Remote Rendering for Interactive 3D Scenes
Edward Lu, Carnegie Mellon University;
Sagar Bharadwaj Kalasibail Seetharam, Carnegie Mellon University;
Mallesham Dasari, Carnegie Mellon University;
Connor Smith, NVIDIA;
Srinivasan Seshan, Carnegie Mellon University;
Anthony Rowe, Carnegie Mellon University
Toggle Abstract
Many modern-day XR devices (e.g. mobile headsets, phones, etc.) lack the computing resources required to render complex 3D scenes in real-time. Typically, to render a high-resolution scene on a lightweight XR device, 3D designers arduously decimate and fine-tune the objects. As an alternative, remote rendering systems can utilize powerful nearby servers to stream rendering results to a client. While this is a promising solution, it can introduce a variety of latency and reliability issues, especially under variable network conditions. In this paper, we present a distributed rendering system that combines both remote rendering and on-device, “local” rendering to add robustness to network fluctuations and device workloads. To maximize user QoE, our approach dynamically swaps an object’s rendering medium, adjusting for client workload, low frame rates, and several perceptual characteristics. To model these characteristics, we perform a study under simulated conditions to measure how users perceive latency and complexity differences between objects in a scene. Using the results of the study, we then provide an algorithm for choosing the optimal object rendering medium, based on rendering complexity as well as network and latency models, ensuring that a target frame rate will be met. Finally, we evaluate this algorithm on a prototype implementation that can provide cross-platform split rendering using web technologies.
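As a rough illustration of the kind of per-object decision such a system makes, the sketch below greedily keeps cheap objects on the device until a frame-time budget is spent and streams the rest, subject to a latency cap. The cost model, thresholds, and greedy rule are assumptions made for illustration, not the paper's algorithm.

```python
# Hypothetical per-object decision: keep the cheapest objects local until the
# per-frame budget is spent, stream the rest if latency stays acceptable.
from dataclasses import dataclass

@dataclass
class SceneObject:
    name: str
    local_render_ms: float      # estimated on-device render cost per frame
    remote_latency_ms: float    # estimated added motion-to-photon delay if streamed

def assign_rendering_medium(objects, target_frame_ms=1000 / 72, max_remote_latency_ms=80.0):
    budget = target_frame_ms
    plan = {}
    for obj in sorted(objects, key=lambda o: o.local_render_ms):
        if obj.local_render_ms <= budget:
            plan[obj.name] = "local"
            budget -= obj.local_render_ms
        elif obj.remote_latency_ms <= max_remote_latency_ms:
            plan[obj.name] = "remote"
        else:
            plan[obj.name] = "local"   # fall back to local even if the frame rate drops
            budget -= obj.local_render_ms
    return plan

# Example for a 72 Hz headset:
print(assign_rendering_medium([
    SceneObject("statue", 6.0, 45.0),
    SceneObject("city_block", 18.0, 60.0),
    SceneObject("ui_panel", 1.0, 40.0),
]))  # {'ui_panel': 'local', 'statue': 'local', 'city_block': 'remote'}
```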
DEAMP: Dominant-Eye-Aware Foveated Rendering with Multi-Parameter Optimization
Zhimin Wang, Beihang University;
Xiangyuan Gu, Beihang University;
Feng Lu, Beihang University
Toggle Abstract
The increasing use of high-resolution displays and the demand for interactive frame rates present a major challenge to the widespread adoption of virtual reality. Foveated rendering addresses this issue by lowering the pixel sampling rate at the periphery of the display. However, existing techniques do not fully exploit a key feature of human binocular vision, namely the dominant eye. In this paper, we propose a Dominant-Eye-Aware foveated rendering method optimized with Multi-Parameter foveation (DEAMP). Specifically, we control the level of foveation for both eyes with two distinct sets of foveation parameters. To achieve this, each eye’s visual field is divided into three nested layers based on eccentricity, and multiple parameters govern the level of foveation of each layer. We conduct user studies to evaluate our method. Experimental results demonstrate that DEAMP is superior in terms of rendering time and reduces the disparity between the pixel sampling rate and the visual acuity fall-off model while maintaining perceptual quality.
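A minimal sketch of the layered scheme described above: each eye's visual field is split into three nested layers by eccentricity, with a separate parameter set per eye. The radii and sampling rates below are illustrative assumptions, not the values used in the paper.

```python
# Per-eye, per-layer foveation parameters; the radii (degrees of eccentricity
# from the gaze point) and sampling rates below are illustrative assumptions.
DOMINANT_EYE = {"radii_deg": (10.0, 25.0), "rates": (1.0, 0.5, 0.25)}
NON_DOMINANT_EYE = {"radii_deg": (8.0, 20.0), "rates": (1.0, 0.35, 0.15)}  # foveated more aggressively

def sampling_rate(eccentricity_deg: float, params: dict) -> float:
    """Return the pixel sampling rate of the nested layer containing this eccentricity."""
    inner, middle = params["radii_deg"]
    full, mid, outer = params["rates"]
    if eccentricity_deg <= inner:
        return full
    if eccentricity_deg <= middle:
        return mid
    return outer

print(sampling_rate(15.0, DOMINANT_EYE))      # 0.5  (middle layer)
print(sampling_rate(15.0, NON_DOMINANT_EYE))  # 0.35 (middle layer, coarser)
```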
PS12: Navigation and Locomotion
10:30 AEDT (UTC+11)
CivEng 109
Session Chair: B. Riecke
Toggle Papers
TENETvr: Comprehensible Temporal Teleportation in Time-Varying Virtual Environments
Daniel Rupp, RWTH Aachen University;
Torsten Wolfgang Kuhlen, RWTH Aachen University;
Tim Weissker, RWTH Aachen University
Toggle Abstract
The iterative design process of virtual environments commonly generates a history of revisions that each represent the state of the scene at a different point in time. Browsing through these discrete time points by common temporal navigation interfaces like time sliders, however, can be inaccurate and lead to an uncomfortably high number of visual changes in a short time. In this paper, we therefore present a novel technique called TENETvr (Temporal Exploration and Navigation in virtual Environments via Teleportation) that allows for efficient teleportation-based travel to time points in which a particular object of interest changed. Unlike previous systems, we suggest that changes affecting other objects in the same time span should also be mediated before the teleport to improve predictability. We therefore propose visualizations for nine different types of additions, property changes, and deletions. In a formal user study with 20 participants, we confirmed that this addition leads to significantly more efficient change detection, lower task loads, and higher usability ratings, therefore reducing temporal disorientation.
Reality Distortion Room: A Study of User Locomotion Responses to Spatial Augmented Reality Effects
You-Jin Kim, University of California, Santa Barbara;
Andrew D Wilson, Microsoft Research;
Jennifer Jacobs, University of California Santa Barbara;
Tobias Höllerer, University of California, Santa Barbara
Toggle Abstract
Reality Distortion Room (RDR) is a proof-of-concept augmented reality system using projection mapping and unencumbered interaction with the Microsoft RoomAlive system to study a user’s locomotive response to visual effects that seemingly transform the physical room the user is in. This study presents five effects that augment the appearance of a physical room to subtly encourage user motion. Our experiment demonstrates users’ reactions to the different distortion and augmentation effects in a standard living room, with the distortion effects projected as wall grids, furniture holograms, and small particles in the air. The augmented living room can give the impression of becoming elongated, wrapped, shifted, elevated, and enlarged. The study results support the implementation of AR experiences in limited physical spaces by providing an initial understanding of how users can be subtly encouraged to move throughout a room.
LeanOn: Simulating Balance Vehicle Locomotion in Virtual Reality
Ziyue Zhao, Xi’an Jiaotong-Liverpool University;
Yue Li, Xi’an Jiaotong-Liverpool University;
Hai-Ning Liang, Xi’an Jiaotong-Liverpool University
Toggle Abstract
Locomotion plays a critical role in user experience in Virtual Reality (VR). This work presents a novel locomotion device, LeanOn, which aims to enhance immersion and feedback experience in VR. Inspired by balance vehicles, LeanOn is a leaning-based locomotion device that allows users to control their location by tilting a board on two balance wheels, with rotation enabled by two buttons near users’ feet. To create a more realistic riding experience, LeanOn is equipped with a terrain vibration system that generates varying levels of vibration based on the roughness of the terrain. We conducted a within-subjects experiment (N=24) and compared the use of LeanOn and joystick steering in four aspects: cybersickness, spatial presence, feedback experience, and task performance. Participants used LeanOn with and without the vibration system to investigate the necessity of tactile feedback. The results showed that LeanOn significantly improved users’ feedback experience, including autotelic, expressivity, harmony, and immersion, and maintained similar levels of cybersickness and spatial presence, compared to joystick steering. Our work contributes to the field of VR locomotion by validating a leaning-based steering prototype and showing its positive effect on improving users’ feedback experience in VR. We also showed that tactile feedback in locomotion is necessary to further enhance immersion in VR.
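For readers unfamiliar with leaning-based steering, the sketch below shows one plausible mapping from board tilt to virtual velocity and from terrain roughness to vibration intensity. The dead zone, saturation angle, and mapping functions are assumptions made for illustration; they are not LeanOn's implementation.

```python
# Hypothetical tilt-to-velocity and roughness-to-vibration mappings.
def tilt_to_velocity(pitch_deg: float, dead_zone_deg: float = 3.0, max_speed: float = 3.0) -> float:
    """Forward/backward speed (m/s) from board pitch; small tilts are ignored."""
    if abs(pitch_deg) < dead_zone_deg:
        return 0.0
    sign = 1.0 if pitch_deg > 0 else -1.0
    magnitude = min((abs(pitch_deg) - dead_zone_deg) / 15.0, 1.0)  # saturates around 18 degrees
    return sign * magnitude * max_speed

def terrain_vibration(roughness: float) -> float:
    """Vibration amplitude in [0, 1] from a terrain roughness value in [0, 1]."""
    return max(0.0, min(roughness, 1.0))

print(tilt_to_velocity(10.0))   # ~1.4 m/s forward
print(terrain_vibration(0.6))   # 0.6
```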
Novel Design and Evaluation of Redirection Controllers using Optimized Alignment and Artificial Potential Field
Xue Liang Wu, National Yang Ming Chiao Tung University;
Huan-Chang Hung, National Yang Ming Chiao Tung University;
Sabarish V. Babu, Clemson University;
Jung-Hong Chuang, National Chiao Tung University
Toggle Abstract
Redirected walking allows users to naturally locomote within virtual environments that are larger than or different in layout from the physically tracked space. In this paper, we propose novel optimization-driven alignment-based and Artificial Potential Field (APF) redirected walking controllers, as well as an integrated version of the two. The first two controllers employ objective functions of one variable, the included angle between the user’s heading vector and the target vector originating from the user’s physical position. The optimized angle represents the physical cell that is best aligned with the virtual cell, or the target vector on which the designated point has the minimum APF value. The derived optimized angle is used to finely set RDW gains. The two objective functions can be optimized simultaneously, leading to an integrated controller that is potentially able to take advantage of both the alignment-based and APF-based controllers. Through extensive simulation-based studies, we found that the proposed alignment-based and integrated controllers significantly outperform the state-of-the-art controllers and the proposed APF-based controller in terms of the number of resets. Furthermore, the proposed alignment and integrated controllers provide a more uniform likelihood distribution across distances between resets, as compared to the other controllers.
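The single optimization variable described above, the included angle between the user's physical heading and a target vector, can be computed as in the sketch below, which then maps it to a rotation gain. The gain schedule is a placeholder for illustration; the controllers' actual gain-setting rules are defined in the paper.

```python
# Included angle between heading and target in 2D, and a placeholder gain schedule.
import numpy as np

def included_angle(heading: np.ndarray, target: np.ndarray) -> float:
    """Signed angle (radians) from the user's heading vector to the target vector."""
    h = heading / np.linalg.norm(heading)
    t = target / np.linalg.norm(target)
    return float(np.arctan2(h[0] * t[1] - h[1] * t[0], np.dot(h, t)))

def rotation_gain(angle_rad: float, g_min: float = 0.85, g_max: float = 1.3) -> float:
    """Map the optimized angle to a rotation gain within detection-threshold bounds:
    steer harder the further the target lies from the current heading."""
    alpha = min(abs(angle_rad) / np.pi, 1.0)
    return g_min + alpha * (g_max - g_min)

theta = included_angle(np.array([1.0, 0.0]), np.array([0.5, 0.5]))
print(theta, rotation_gain(theta))  # ~0.785 rad, gain ~0.96
```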
Enhancing Seamless Walking in Virtual Reality: Application of Bone-Conduction Vibration in Redirected Walking
Seokhyun Hwang, Gwangju Institute of Science and Technology;
YoungIn Kim, Gwangju Institute of Science and Technology;
Youngseok Seo, Gwangju Institute of Science and Technology;
SeungJun Kim, Gwangju Institute of Science and Technology
Toggle Abstract
This study explored bone-conduction vibration (BCV) in redirected walking (RDW), a technology for seamless walking in large virtual spaces within confined physical areas, enhancing obstacle avoidance performance using nonelectrical vestibular stimulation without the side effects caused by electrical stimulation. We proposed four different BCV stimulation methods and evaluated their detection threshold (DT) extension performance and user experience in virtual reality (VR) conditions. The DT was successfully extended by at least 23% and up to 45% under all BCV conditions while preserving immersion and presence. Notably, user comfort increased when content sound was used for vestibular stimulation. Under the extended DT condition, a simulation study demonstrated that all BCV stimulation methods facilitated uninterrupted walking over extended distances when applying RDW to users with random movements. Thus, this research established the viability of using BCV in RDW applications and the potential for incorporating content sound into BCV stimulation techniques.
Edge-Centric Space Rescaling with Redirected Walking for Dissimilar Physical-Virtual Space Registration
Dooyoung Kim, KAIST;
Woontack Woo, KAIST
Toggle Abstract
We propose a novel space-rescaling technique for registering dissimilar physical-virtual spaces by utilizing the effects of adjusting physical space with redirected walking. Achieving a seamless and immersive Virtual Reality (VR) experience requires overcoming the spatial heterogeneities between the physical and virtual spaces and accurately aligning the VR environment with the user’s tracked physical space. However, existing space-matching algorithms that rely on one-to-one scale mapping are inadequate when dealing with highly dissimilar physical and virtual spaces, and redirected walking controllers could not utilize basic geometric information from physical space in the virtual space due to coordinate distortion. To address these issues, we apply relative translation gains to partitioned space grids based on the main interactable object’s edge, which enables space-adaptive modification effects of physical space without coordinate distortion. Our evaluation results demonstrate the effectiveness of our algorithm in aligning the main object’s edge, surface, and wall, as well as securing the largest registered area compared to alternative methods under all conditions. These findings can be used to create an immersive play area for VR content where users can receive passive feedback from the plane and edge in their physical environment.
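The following sketch illustrates the general idea of relative translation gains applied per space-grid cell along one axis: virtual displacement is obtained by integrating a different gain in each cell, so specific regions of the physical room are stretched or compressed. The grid layout, gain values, and one-dimensional treatment are simplifying assumptions, not the paper's algorithm.

```python
# One axis of the physical room, partitioned at the main object's edge into
# three cells, each with its own relative translation gain (assumed values).
import numpy as np

GRID_EDGES = np.array([0.0, 1.5, 3.0, 4.0])   # cell boundaries in metres
CELL_GAINS = np.array([1.0, 1.25, 0.8])       # >1 stretches, <1 compresses virtual motion

def virtual_displacement(physical_x: float) -> float:
    """Map a physical position to a virtual one by integrating the per-cell gains."""
    v = 0.0
    for left, right, gain in zip(GRID_EDGES[:-1], GRID_EDGES[1:], CELL_GAINS):
        if physical_x <= left:
            break
        v += gain * (min(physical_x, right) - left)
    return v

print(virtual_displacement(2.0))  # 1.5 * 1.0 + 0.5 * 1.25 = 2.125
```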
Exploring Visual-Auditory Redirected Walking using Auditory Cues in Reality
Kumpei Ogawa, Tohoku University;
Kazuyuki Fujita, Tohoku University;
Shuichi Sakamoto, Tohoku University;
Kazuki Takashima, Tohoku University;
Yoshifumi Kitamura, Tohoku University;
Toggle Abstract
We examine the effect of auditory cues occurring in reality on redirection. Specifically, we set two hypotheses: the auditory cues emanating from fixed positions in reality (Fixed sound, FS) increase the noticeability of redirection, while the auditory cues whose positions are manipulated consistently with the visual manipulation (Redirected sound, RDS) decrease the noticeability of redirection. To verify these hypotheses, we implemented an experimental environment that virtually reproduced the FS and RDS conditions using binaural recording, and then we conducted a user study (N=18) to investigate the detection thresholds (DTs) for rotational manipulation and the sound localization accuracy of the auditory cues under FS and RDS, as well as a baseline condition without auditory cues (No sound, NS). The results show that, contrary to the hypotheses, FS gave a wider range of DTs than NS, while RDS gave a similar range of DTs to NS. Combining these results with those of sound localization accuracy reveals that, rather than the auditory cues affecting the participants’ spatial perception in VR, the visual manipulation made their sound localization less accurate, which would be a reason for the increased range of DTs under FS. Furthermore, we conducted a follow-up user study (N=11) to measure the sound localization accuracy of FS where the auditory cues were actually placed in a real setting, and we found that the accuracy tended to be similar to that of the virtually reproduced FS, suggesting the validity of the auditory cues used in this study. Given these findings, we also discuss potential applications.
PS13: Handwriting, Controller and Menus
13:15 AEDT (UTC+11)
Leighton Hall
Session Chair: J. Grubert
Toggle Papers
Handwriting for efficient text entry in industrial VR applications: influence of board orientation and sensory feedback on performance.
Nicolas Fourrier, Segula Technologies, Naval and Energy Engineering Research and Innovation Unit;
Guillaume Moreau, IMT Atlantique;
Mustapha Benaouicha, Segula Technologies, Naval and Energy Engineering Research and Innovation Unit;
Jean-Marie Normand, Ecole Centrale de Nantes
Toggle Abstract
Text entry in Virtual Reality (VR) is becoming an increasingly important task as the availability of hardware increases and the range of VR applications widens. This is especially true for VR industrial applications where users need to input data frequently.
Large-scale industrial adoption of VR is still hampered by the productivity gap between entering data via a physical keyboard and VR data entry methods. Data entry needs to be efficient, easy to use and to learn, and not frustrating. In this paper, we present a new data entry method based on handwriting recognition (HWR). Users can input text by simply writing on a virtual surface. We conduct a user study to determine the best writing conditions when it comes to surface orientation and sensory feedback. This feedback consists of visual, haptic, and auditory cues. We find that using a slanted board with sensory feedback is best to maximize writing speeds and minimize physical demand. We also evaluate the performance of our method in terms of text entry speed, error rate, usability and workload. The results show that handwriting in VR offers high entry speed and usability with little training compared to other controller-based virtual text entry techniques. The system could be further improved by reducing high error rates through the use of more efficient handwriting recognition tools. In fact, the total error rate is 9.28% in the best condition. After 40 phrases of training, participants reach an average of 14.5 WPM, while a group with high VR familiarity reaches 16.16 WPM after the same training. The highest observed textual data entry speed is 21.11 WPM.
Evaluating the Performance of Hand-Based Probabilistic Text Input Methods on a Mid-Air Virtual Qwerty Keyboard
John J Dudley, University of Cambridge;
Jingyao Zheng, University of Cambridge;
Aakar Gupta, Meta Inc.;
Hrvoje Benko, Meta Inc.;
Matt Longest, Meta Inc;
Robert Wang, Meta Inc;
Per Ola Kristensson, University of Cambridge
Toggle Abstract
Integrated hand-tracking on modern virtual reality (VR) headsets can be readily exploited to deliver mid-air virtual input surfaces for text entry. These virtual input surfaces can closely replicate the experience of typing on a Qwerty keyboard on a physical touchscreen, thereby allowing users to leverage their pre-existing typing skills. However, the lack of passive haptic feedback, unconstrained user motion, and potential tracking inaccuracies or observability issues encountered in this interaction setting typically degrades the accuracy of user articulations. We present a comprehensive exploration of error-tolerant probabilistic hand-based input methods to support effective text input on a mid-air virtual Qwerty keyboard. Over three user studies we examine the performance potential of hand-based text input under both gesture and touch typing paradigms. We demonstrate typical entry rates in the range of 20 to 30 wpm and average peak entry rates of 40 to 45 wpm.
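A generic flavour of such error-tolerant decoding can be sketched as a Bayesian keyboard decoder: each mid-air touch is treated as a noisy observation around a key centre, and candidate words are scored by a Gaussian touch likelihood combined with a language-model prior. The layout grid, noise parameter, and unigram prior below are illustrative assumptions and do not reproduce the system evaluated here.

```python
# Generic probabilistic decoder sketch (not the evaluated system): score each
# candidate word by a Gaussian likelihood of the observed touch points around
# the key centres, plus a log prior from a unigram language model.
import numpy as np

# Simplified grid of the 26 letters (not an exact Qwerty geometry).
KEY_POS = {c: np.array([i % 10, -(i // 10)], dtype=float)
           for i, c in enumerate("qwertyuiopasdfghjklzxcvbnm")}
SIGMA = 0.6  # assumed touch noise, in key widths

def log_likelihood(word: str, touches: np.ndarray) -> float:
    if len(word) != len(touches):
        return -np.inf
    d2 = sum(np.sum((KEY_POS[c] - t) ** 2) for c, t in zip(word, touches))
    return -d2 / (2 * SIGMA ** 2)

def decode(touches: np.ndarray, lexicon: dict) -> str:
    """lexicon maps candidate words to prior probabilities."""
    return max(lexicon, key=lambda w: log_likelihood(w, touches) + np.log(lexicon[w]))

# A sloppy tap sequence aiming for "the": the middle tap lands on "g", next to "h".
touches = np.array([KEY_POS[c] for c in "tge"])
print(decode(touches, {"the": 0.05, "tie": 0.002, "toe": 0.001}))  # -> "the"
```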
Controllers or Bare Hands? A Controlled Evaluation of Input Techniques on Interaction Performance and Exertion in Virtual Reality
Tiffany Luong, ETH Zürich;
Yi Fei Cheng, ETH Zürich;
Max Möbus, ETH Zürich;
Andreas Rene Fender, ETH Zürich;
Christian Holz, ETH Zürich
Toggle Abstract
Virtual Reality (VR) systems have traditionally required users to operate the user interface with controllers in mid-air. More recent VR systems, however, integrate cameras to track the headset’s position inside the environment as well as the user’s hands when possible. This allows users to directly interact with virtual content in mid-air just by reaching out, thus discarding the need for hand-held physical controllers. However, it is unclear which of these two modalities—controller-based or free-hand interaction—is more suitable for efficient input, accurate interaction, and long-term use under reliable tracking conditions. While interacting with hand-held controllers introduces weight, it also requires less finger movement to invoke actions (e.g., pressing a button) and allows users to hold on to a physical object during virtual interaction.
In this paper, we investigate the effect of VR input modality (controller vs. free-hand interaction) on physical exertion, agency, task performance, and motor behavior across two mid-air interaction techniques (touch, raycast) and tasks (selection, trajectory-tracing). Participants reported less physical exertion, felt more in control, and were faster and more accurate when using VR controllers compared to free-hand interaction in the raycast setting. Regarding personal preference, participants chose VR controllers for raycast but free-hand interaction for mid-air touch. Our correlation analysis revealed that participants’ physical exertion increased with selection speed, quantity of arm motion, variation in motion speed, and bad postures, following ergonomics metrics such as consumed endurance and rapid upper limb assessment. We also found a negative correlation between physical exertion and the participant’s sense of agency, and between physical exertion and task accuracy.
Comparing Gaze, Head and Controller Selection of Dynamically Revealed Targets in Head-mounted Displays
Ludwig Sidenmark, University of Toronto;
Franziska Prummer, Lancaster University;
Joshua Newn, Lancaster University;
Hans Gellersen, Lancaster University
Toggle Abstract
This paper presents a head-mounted virtual reality study that compared gaze, head, and controller pointing for selection of dynamically revealed targets. Existing studies on head-mounted 3D interaction have focused on pointing and selection tasks where all targets are visible to the user. Our study compared the effects of screen width (field of view), target amplitude and width, and prior knowledge of target location on modality performance. Results show that gaze and controller pointing are significantly faster than head pointing and that increased screen width only positively impacts performance up to a certain point. We further investigated the applicability of existing pointing models. Our analysis confirmed the suitability of previously proposed two-component models for all modalities while uncovering differences for gaze at known and unknown target positions. Our findings provide new empirical evidence for understanding input with gaze, head, and controller and are significant for applications that extend around the user.
Compass+Ring: A Multimodal Menu to Improve Interaction Performance and Comfortability in One-handed Scenarios
Xin Chen, Yanshan University;
Dongliang Guo, Yanshan University;
Li Feng, Yanshan University;
Bo Chen, Anhui Agricultural University;
Wei Liu, University of Technology Sydney
Toggle Abstract
In numerous applications, an excellent interface design should allow users to perform secondary tasks as naturally as possible without affecting the main task. Multimodal handheld menus are regularly the preferred user interface that meets the natural switching of primary and secondary tasks. However, existing multimodal handheld menus have some limitations under single-handed conditions, or the comfort needs improvement. To address these issues, this paper proposes a novel multimodal handheld menu: Compass+Ring. The “compass” integrates gesture, gaze, and speech into a pie menu, whereas the “ring” serves as a shortcut menu. The Compass menu improves interaction performance and comfortability in one-handed scenarios, and the Ring menu alleviates eye fatigue when both hands are free. We evaluated five handheld menus: Touch, Gaze+Pinch, Speech+Pinch, Bangles, and Compass+Ring. We first analyze the usability of these menus in three different scenarios, and then conduct a user study about these menus in geometry matching and line drawing tasks. The results show that the Bangles menu and the Compass+Ring menu are more suitable for one-handed scenarios than the other three menus, and the Compass+Ring menu is superior to the Bangles menu in terms of efficiency and hand fatigue. In addition, participants indicate that the Ring menu can reduce eye strain for the Compass menu in two-handed scenarios and increase haptic perception.
A Comparative Evaluation of Tabs and Linked Panels for Program Understanding in Augmented Reality
Lucas Kreber, Trier University;
Stephan Diehl, Trier University
Toggle Abstract
Integrated development environments (IDEs) commonly employ a tab-based interface for displaying source code, which often poses challenges in efficient code navigation and retrieval. Previous research has proposed several novel approaches that have in common that they place code fragments on a 2D canvas and draw visual connections between them. In this paper, we investigate the extension of such interfaces to augmented reality (AR) environments. As AR allows information to be displayed in three dimensions, the restriction to a 2D canvas for placing code fragments is no longer justified, and we lift it by allowing users to place code panels freely in 3D space. We call the resulting interface linked panels.
We present the results of a quantitative user study conducted with 24 participants, aiming to explore whether the benefits observed for the canvas-based approach in traditional 2D screen environments can be replicated with linked panels in augmented reality. The participants were given tasks to identify and resolve two bugs in two different software projects using the traditional tab-based and the panel-based approaches in AR. To find possible explanations of our quantitative results we also conducted a qualitative analysis evaluating participants’ comments and different placement strategies of panels in the panel-based approach.
Our results indicate that participants found more bugs with the tabs version, but were equally fast with both tools. We also found that less skilled participants were faster with the tabs, while more skilled ones were faster with the panels. Although participants experienced problems with the cluttered spatial arrangement of the panels, they preferred the panels version over the tabs version as it made better use of AR.
Leveraging Motion Tracking for Intuitive Interactions in a Tablet-Based 3D Scene Annotation System
Tianyu Song, Technical University of Munich;
Ulrich Eck, Technische Universitaet Muenchen;
Nassir Navab, Technische Universität München
Toggle Abstract
In the rapidly evolving field of computer vision, efficient and accurate annotation of 3D scenes plays a crucial role. While automation has streamlined this process, manual intervention is still essential for obtaining precise annotations. Existing annotation tools often lack intuitive interactions and efficient interfaces, particularly when it comes to annotating complex elements such as 3D bounding boxes, 6D human poses, and semantic relationships in a 3D scene. Therefore, it is often time-consuming and error-prone. Emerging technologies such as augmented reality (AR) and virtual reality (VR) have shown potential to provide an immersive and interactive environment for annotators to label objects and their relationships. However, the cost and accessibility of these technologies can be a barrier to their widespread adoption. This work introduces a novel tablet-based system that utilizes built-in motion tracking to facilitate an efficient and intuitive 3D scene annotation process. The system supports a variety of annotation tasks and leverages the tracking and mobility features of the tablet to enhance user interactions. Through a thorough user study investigating three distinct tasks – creating bounding boxes, adjusting human poses, and annotating scene relationships – we evaluate the effectiveness and usability of two interaction methods: touch-based interactions and hybrid interactions that utilize both touch and device motion tracking. Our results suggest that leveraging the tablet’s motion tracking feature could lead to more intuitive and efficient annotation processes. This work contributes to the understanding of tablet-based interaction and the potential it holds for annotating complex 3D scenes.
Fast and Robust Mid-Air Gesture Typing for AR Headsets using 3D Trajectory Decoding
Junxiao Shen, University of Cambridge;
John J Dudley, University of Cambridge;
Per Ola Kristensson, University of Cambridge
Toggle Abstract
We present a fast mid-air gesture keyboard for head-mounted optical see-through augmented reality (OST AR) that supports users in articulating word patterns by merely moving their own physical index finger in relation to a virtual keyboard plane without a need to indirectly control a visual 2D cursor on a keyboard plane. To realize this, we introduce a novel decoding method that directly translates users’ three-dimensional fingertip gestural trajectories into their intended text.
We evaluate the efficacy of the system in three studies that investigate various design aspects, such as immediate efficacy, accelerated learning, and whether it is possible to maintain performance without providing visual feedback. We find that the new 3D trajectory decoding design results in significant improvements in entry rates while maintaining low error rates. In addition, we demonstrate that users can maintain their performance even without fingertip and gesture trace visualization.
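To give a concrete sense of trajectory-to-text decoding, the toy sketch below matches a resampled 3D fingertip trace against per-word templates built from key centres on a virtual keyboard plane. The paper uses a learned decoder on raw 3D trajectories; this nearest-template matcher, its simplified key grid, and its vocabulary are illustrative assumptions only.

```python
# Toy trajectory-to-word decoder: resample the 3D fingertip trace and each
# word's key-centre template to a fixed length, then pick the nearest template.
import numpy as np

KEYS = {c: np.array([i % 10, -(i // 10), 0.0])
        for i, c in enumerate("qwertyuiopasdfghjklzxcvbnm")}

def resample(points: np.ndarray, n: int = 32) -> np.ndarray:
    """Resample a polyline to n points, uniformly by arc length."""
    d = np.r_[0.0, np.cumsum(np.linalg.norm(np.diff(points, axis=0), axis=1))]
    t = np.linspace(0.0, d[-1] if d[-1] > 0 else 1.0, n)
    return np.column_stack([np.interp(t, d, points[:, k]) for k in range(points.shape[1])])

def word_template(word: str) -> np.ndarray:
    return resample(np.array([KEYS[c] for c in word]))

def decode(trace_3d: np.ndarray, vocabulary: list) -> str:
    """Return the word whose template is closest to the resampled fingertip trace."""
    trace = resample(trace_3d)
    return min(vocabulary, key=lambda w: np.linalg.norm(trace - word_template(w)))

# A slightly noisy trace of "hello" against a small vocabulary.
rng = np.random.default_rng(0)
trace = word_template("hello") + rng.normal(scale=0.05, size=(32, 3))
print(decode(trace, ["hello", "help", "world", "yellow"]))  # -> "hello"
```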
PS14: Embodiment
13:15 AEDT (UTC+11)
Ritchie Theatre
Session Chair: D. Roth
Toggle Papers
I am a Genius! Influence of Virtually Embodying Leonardo da Vinci on Creative Performance
Geoffrey Gorisse, Arts et Métiers Institute of Technology;
Simon Wellenreiter, Arts et Métiers Institute of Technology;
Sylvain Fleury, Arts et Métiers Institute of Technology;
Anatole Lécuyer, Inria;
Simon Richir, Arts et Métiers Institute of Technology;
Olivier Christmann, Arts et Métiers Institute of Technology
Toggle Abstract
Virtual reality (VR) provides users with the ability to substitute their physical appearance by embodying virtual characters (avatars) using head-mounted displays and motion-capture technologies. Previous research demonstrated that the sense of embodiment toward an avatar can impact user behavior and cognition. In this paper, we present an experiment designed to investigate whether embodying a well-known creative genius could enhance participants’ creative performance. Following a preliminary online survey (N = 157) to select a famous character suited to the purpose of this study, we developed a VR application allowing participants to embody Leonardo da Vinci or a self-avatar. Self-avatars were approximately matched with participants in terms of skin tone and morphology. 40 participants took part in three tasks seamlessly integrated in a virtual workshop. The first task was based on a Guilford’s Alternate Uses test (GAU) to assess participants’ divergent abilities in terms of fluency and originality. The second task was based on a Remote Associates Test (RAT) to evaluate convergent abilities. Lastly, the third task consisted in designing potential alternative uses of an object displayed in the virtual environment using a 3D sketching tool. Participants embodying Leonardo da Vinci demonstrated significantly higher divergent thinking abilities, with a substantial difference in fluency between the groups. Conversely, participants embodying a self-avatar performed significantly better in the convergent thinking task. Taken together, these results promote the use of our virtual embodiment approach, especially in applications where divergent creativity plays an important role, such as design and innovation.
Beyond my Real Body: Characterization, Impacts, Applications and Perspectives of “Dissimilar” Avatars in Virtual Reality
Antonin Cheymol, Univ Rennes, INSA Rennes, Inria, CNRS, IRISA;
Rebecca Fribourg, Ecole Centrale de Nantes / Laboratoire AAU – équipe CRENAU;
Ferran Argelaguet Sanz, Inria;
Jean-Marie Normand, Ecole Centrale de Nantes;
Anatole Lécuyer, Inria
Toggle Abstract
In virtual reality, the avatar – the user’s digital representation – is an important element which can drastically influence the immersive experience. In this paper, we especially focus on the use of “dissimilar” avatars, i.e., avatars diverging from the real appearance of the user, whether they preserve an anthropomorphic aspect or not. Various studies reported that dissimilar avatars can have diverse positive impacts on the user experience, in terms for example of interaction, perception and behaviour. However, given the sparsity and multi-disciplinary character of research related to dissimilar avatars, it tends to lack common understanding and methodology, hampering the establishment of novel knowledge on this topic. In this paper, we propose to address these limitations by discussing: (i) a methodology for the characterization of dissimilar avatars, (ii) their impacts on the user experience, (iii) their different fields of application, and finally, (iv) future research directions on this topic. Taken together, we believe that this paper can support future research related to dissimilar avatars, and help designers of VR applications to leverage dissimilar avatars appropriately.
“To Be or Not to Be Me?”: Exploration of Self-Similar Effects of Avatars on Social Virtual Reality Experiences
Hayeon Kim, Yonsei University;
Jinhyung Park, Yonsei University;
In-Kwon Lee, Yonsei University
Toggle Abstract
Growing interest in the self-similarity effect of avatars in virtual reality (VR) has spurred the creation of realistic avatars that closely mirror their users. However, despite extensive research on the self-similarity effect in single-user VR environments, our understanding of its impact in social VR settings remains underdeveloped. This shortfall exists despite the unique socio-psychological phenomena arising from the illusion of embodiment that could potentially alter these effects. To fill this gap, this paper provides an in-depth empirical investigation of how avatars’ self-similarity influences social VR experiences. Our research uncovers several notable findings: 1) A high level of avatar self-similarity boosts users’ sense of embodiment and social presence but has minimal effects on the overall presence and even slightly hinders immersion. These results are driven by increased self-awareness. 2) Among various factors that contribute to the self-similarity of avatars, voice stands out as a significant influencer of social VR experiences, surpassing other representational factors. 3) The impact of avatar self-similarity shows negligible differences between male and female users. Based on these findings, we discuss the pros and cons of incorporating self-similarity into social VR avatars. Our study serves as a foundation for further research in this field.
Is that my Heartbeat? Measuring and Understanding Modality-dependent Cardiac Interoception in Virtual Reality
Abdallah El Ali, Centrum Wiskunde & Informatica (CWI);
Rayna Ney, University of Amsterdam;
Zeph M.C. van Berlo, University of Amsterdam;
Pablo Cesar, Centrum Wiskunde & Informatica (CWI)
Toggle Abstract
Measuring interoception (‘perceiving internal bodily states’) has diagnostic and wellbeing implications. Since heartbeats are distinct and frequent, various methods aim at measuring cardiac interoceptive accuracy (CIAcc). However, the role of exteroceptive modalities for representing heart rate (HR) across screen-based and Virtual Reality (VR) environments remains unclear. Using a PolarH10 HR monitor, we develop a modality-dependent cardiac recognition task that modifies displayed HR. In a mixed-factorial design (N=50), we investigate how task environment (Screen, VR), modality (Audio, Visual, Audio-Visual), and real-time HR modifications (±15%, ±30%, None) influence CIAcc, interoceptive awareness, mind-body measures, VR presence, and post-experience responses. Findings showed that participants confused their HR with underestimates up to 30%; environment did not affect CIAcc but influenced mind-related measures; modality did not influence CIAcc, however including audio increased interoceptive awareness; and VR presence inversely correlated with CIAcc. We contribute a lightweight and extensible cardiac interoception measurement method, and implications for biofeedback displays.
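The experimental manipulation, presenting heartbeat feedback at a rate modified by ±15% or ±30%, can be pictured with the small sketch below, in which a placeholder reader supplies the current heart rate and the cue interval is scaled accordingly. The function names and feedback cue are assumptions for illustration, not the study's implementation.

```python
# Placeholder HR reader and cue; only the scaling of the beat interval matters here.
import time

MODIFICATIONS = {"none": 1.0, "-15%": 0.85, "+15%": 1.15, "-30%": 0.70, "+30%": 1.30}

def present_modified_heartbeat(read_hr_bpm, condition: str, duration_s: float = 10.0):
    """Emit one feedback cue per (modified) heartbeat for `duration_s` seconds."""
    factor = MODIFICATIONS[condition]
    end = time.time() + duration_s
    while time.time() < end:
        bpm = read_hr_bpm() * factor   # e.g. 70 bpm becomes 80.5 bpm at +15%
        print("beat")                  # stand-in for the audio and/or visual cue
        time.sleep(60.0 / bpm)         # wait one modified beat interval

# Example with a constant simulated heart rate of 70 bpm:
# present_modified_heartbeat(lambda: 70.0, "+15%", duration_s=3.0)
```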
Would You Go to a Virtual Doctor? A Systematic Literature Review on User Preferences for Embodied Virtual Agents in Healthcare
Lucie Kruse, Universität Hamburg;
Julia Hertel, University of Hamburg;
Fariba Mostajeran, Universität Hamburg;
Susanne Schmidt, Universität Hamburg;
Frank Steinicke, Universität Hamburg
Toggle Abstract
Medical virtual agents (VAs) hold great potential to support patients in achieving their health goals, especially at times or in regions where the demand for physiological and psychological therapy exceeds the capacity of medical services. To create an accepted complement to on-site diagnosis, treatment, and counseling, it is critical to understand the impact of factors such as the agent’s visual representation, behavior, and responsibilities on creating a trustworthy human-agent relationship.
To gain insights into these factors, we conducted a systematic literature review including 59 papers on embodied VAs in the medical domain. Our review focused on the application fields and the role of VAs in medicine, as well as the technology used to display them. Using thematic analysis, we discuss our findings in terms of user preferences, as well as potentials and barriers faced in the interaction with medical VAs. Concerning the visual representation, users expressed a wish for customization in terms of appearance and communication modalities. It was also important that the agents’ information is based on trustworthy sources, and that agents are motivating and adapt to the users’ knowledge. Finally, our results identify research gaps, in particular regarding the technological implementation and the use of artificial intelligence.
A Comparative Evaluation of AR Embodiments vs. Videos and Figures for Learning Bead Weaving
Peter Haltner, Dalhousie University;
Rowland Ndamzi Goddy-Worlu, Dalhousie University;
Claire Nicholas, The University of Oklahoma;
James Forren, Dalhousie University;
Derek Reilly, Dalhousie University
Toggle Abstract
The most common learning materials for handcraft today are videos and figures, which are limited in their ability to express embodied knowledge as an in-person tutor could. We developed WeavAR, an application for headworn augmented reality (AR) displays designed to teach basic bead weaving patterns. WeavAR combines virtual 3D hands showing weaving sequences recorded from an experienced bead weaver and a dynamic 3D bead model showing how the work progresses. Using a mixed within/between-subjects user study (n=30), we compared learning materials (AR to videos and figures) and learning material placement (in the area of work or to the side). Results show that the AR learning materials had comparable effectiveness to video and figures. Hand visualizations were found to lack crucial context, however, making them less useful than the 3D bead model. Extra measures to prevent obstruction are required when placing learning materials at the area of work.
“If It’s Not Me It Doesn’t Make a Difference” – The Impact of Avatar Personalization on User Experience and Body Awareness in Virtual Reality
Nina Döllinger, University of Würzburg;
Matthias Beck, University of Würzburg;
Erik Wolf, University of Würzburg;
David Mal, University of Würzburg;
Mario Botsch, TU Dortmund University;
Marc Erich Latoschik, University of Würzburg;
Carolin Wienrich, University of Würzburg
Toggle Abstract
Body awareness is relevant for the efficacy of psychotherapy. However, previous work on virtual reality (VR) and avatar-assisted therapy has often overlooked it. We investigated the effect of avatar individualization on body awareness in the context of VR-specific user experience, including sense of embodiment (SoE), plausibility, and sense of presence (SoP). In a between-subjects design, 86 participants embodied one of three avatar types and engaged in VR movement exercises. The avatars were (1) generic and gender-matched, (2) customized from a set of pre-existing options, or (3) personalized photorealistic scans.
Compared to the other conditions, participants with personalized avatars reported increased SoE, yet higher eeriness and reduced body awareness. Further, SoE and SoP positively correlated with body awareness across conditions.
Our results indicate that VR user experience and body awareness do not always dovetail and do not necessarily predict each other. Future research should work towards a balance between body awareness and SoE.
Sensory Attenuation with a Virtual Robotic Arm Controlled Using Facial Movements
Masaaki Fukuoka, Faculty of Science and Technology, Keio University;
Fumihiko Nakamura, Faculty of Science and Technology, Keio University;
Adrien Verhulst, Sony Computer Science Laboratories and the Faculty of Science and Technology, Keio University;
Masahiko Inami, Department of Advanced Interdisciplinary Studies, The University of Tokyo;
Michiteru Kitazaki, Department of Computer Science and Engineering, Toyohashi University of Technology;
Toggle Abstract
When humans generate stimuli voluntarily, they perceive the stimuli more weakly than those produced by others, which is called sensory attenuation (SA). SA has been investigated in various body parts, but it is unclear whether an extended body induces SA. This study investigated the SA of audio stimuli generated by an extended body. SA was assessed using a sound comparison task in a virtual environment. We prepared robotic arms as extended bodies, and the robotic arms were controlled by facial movements. To evaluate the SA of robotic arms, we conducted two experiments. Experiment 1 investigated the SA of the robotic arms under four conditions. The results showed that robotic arms manipulated by voluntary actions attenuated audio stimuli. Experiment 2 investigated the SA of the robotic arm and innate body under five conditions. The results indicated that the innate body and robotic arm induced SA, while there were differences in the sense of agency between the innate body and robotic arm. Analysis of the results indicated three findings regarding the SA of the extended body. First, controlling the robotic arm with voluntary actions in a virtual environment attenuates the audio stimuli. Second, there were differences in the sense of agency related to SA between extended and innate bodies. Third, the SA of the robotic arm was correlated with the sense of body ownership.
PS15: Data Visualization and Immersive Analytics
13:15 AEDT (UTC+11)
CivEng 109
Session Chair: N. Elmqvist
Toggle Papers
Exploring Trajectory Data in Augmented Reality: A Comparative Study of Interaction Modalities
Lucas Joos, University of Konstanz;
Karsten Klein, University of Konstanz;
Maximilian T. Fischer, University of Konstanz;
Frederik L. Dennig, University of Konstanz;
Daniel Keim, University of Konstanz;
Michael Krone, University of Tübingen
Toggle Abstract
The visual exploration of trajectory data is crucial in domains such as animal behavior, molecular dynamics, and transportation. With the emergence of immersive technology, trajectory data, which is often inherently three-dimensional, can be analyzed in stereoscopic 3D, providing new opportunities for perception, engagement, and understanding. However, the interaction with the presented data remains a key challenge. While most applications depend on hand tracking, we see eye tracking as a promising yet under-explored interaction modality, while challenges such as imprecision or inadvertently triggered actions need to be addressed. In this work, we explore the potential of eye gaze interaction for the visual exploration of trajectory data within an AR environment. We integrate hand- and eye-based interaction techniques specifically designed for three common use cases and address known eye tracking challenges. We refine our techniques and setup based on a pilot user study (n=6) and find in a follow-up study (n=20) that gaze interaction can compete with hand-tracked interaction regarding effectiveness, efficiency, and task load for selection and cluster exploration tasks. However, time step analysis comes with higher answer times and task load. In general, we find the results and preferences to be user-dependent. Our work contributes to the field of immersive data exploration, underscoring the need for continued research on eye tracking interaction.
Evaluating the Feasibility of Predicting Information Relevance During Sensemaking with Eye Gaze Data
Ibrahim Asadullah Tahmid, Virginia Tech;
Lee Lisle, Virginia Tech;
Kylie Davidson, Virginia Tech;
Kirsten Whitley, Department of Defense;
Chris North, Virginia Tech;
Doug Bowman, Virginia Tech
Toggle Abstract
Eye gaze patterns vary based on reading purpose and complexity, and can provide insights into a reader’s perception of the content. We hypothesize that during a complex sensemaking task with many text-based documents, we will be able to use eye-tracking data to predict the importance of documents and words, which could be the basis for intelligent suggestions made by the system to an analyst. We introduce a novel eye-gaze metric called ‘GazeScore’ that predicts an analyst’s perception of the relevance of each document and word when they perform a sensemaking task. We conducted a user study to assess the effectiveness of this metric and found strong evidence that documents and words with high GazeScores are perceived as more relevant, while those with low GazeScores were considered less relevant. We explore potential real-time applications of this metric to facilitate immersive sensemaking tasks by offering relevant suggestions.
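The abstract does not give the GazeScore formula, so the sketch below is a hypothetical stand-in: it aggregates fixation durations per document (or word) from eye-tracking logs and normalizes them, so that heavily re-read items score higher. It only illustrates the kind of gaze-derived relevance signal the paper studies.

```python
# Hypothetical dwell-based relevance score per item (document or word).
from collections import defaultdict

def gaze_scores(fixations):
    """fixations: iterable of (item_id, duration_ms) pairs from an eye tracker.
    Returns a dict mapping item_id to a score in [0, 1]."""
    totals = defaultdict(float)
    for item_id, duration_ms in fixations:
        totals[item_id] += duration_ms
    peak = max(totals.values(), default=1.0)
    return {item: dwell / peak for item, dwell in totals.items()}

print(gaze_scores([("doc_a", 1200), ("doc_b", 300), ("doc_a", 900), ("doc_c", 150)]))
# {'doc_a': 1.0, 'doc_b': 0.14..., 'doc_c': 0.07...}
```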
Who Did What When? Discovering Complex Historical Interrelations in Immersive Virtual Reality
Melanie Derksen, TU Dortmund University;
Julia Becker, Bielefeld University;
Mohammad Fazleh Elahi, Universität Bielefeld;
Angelika Maier, Bielefeld University;
Marius Maile, Bielefeld University;
Ingo Oliver Pätzold, Bielefeld University;
Jonas Penningroth, Bielefeld University;
Bettina Reglin, Bielefeld University;
Markus Rothgänger, Bielefeld University;
Philipp Cimiano, Bielefeld University;
Erich Schubert, TU Dortmund University;
Silke Schwandt, Bielefeld University;
Torsten Wolfgang Kuhlen, RWTH Aachen University;
Mario Botsch, TU Dortmund University;
Tim Weissker, RWTH Aachen University
Toggle Abstract
Traditional digital tools for exploring historical data mostly rely on conventional 2D visualizations, which often cannot reveal all relevant interrelationships between historical fragments (e.g., persons or events). In this paper, we present a novel interactive exploration tool for historical data in VR, which represents fragments as spheres in a 3D environment and arranges them around the user based on their temporal, geo, categorical and semantic similarity. Quantitative and qualitative results from a user study with 29 participants revealed that most participants considered the virtual space and the abstract fragment representation well-suited to explore historical data and to discover complex interrelationships. These results were particularly underlined by high usability scores in terms of attractiveness, stimulation, and novelty, while researching historical facts with our system did not impose unexpectedly high task loads. Additionally, the insights from our post-study interviews provided valuable suggestions for future developments to further expand the possibilities of our system.
A Closer Look at Dynamic Medical Visualization Techniques
Alejandro Martin-Gomez, Johns Hopkins University;
Felix Merkl, University Hospital, LMU Munich;
Alexander Winkler, Technical University of Munich;
Ulrich Eck, Technische Universitaet Muenchen;
Christian Heiliger, Ludwig-Maximilians-University Hospital;
Konrad Karcz, Ludwig-Maximilians-University Hospital;
Nassir Navab, Technische Universität München
Toggle Abstract
In navigated surgery, physicians perform complex tasks assisted by virtual representations of anatomical structures and surgical tools. Integrating Augmented Reality (AR) in these scenarios enriches the information presented to the surgeon through a range of visualization techniques. Their selection is a crucial task as they represent the primary interface between the system and the surgeon.
In this work, we present a novel approach to conveying augmented content using dynamic visualization techniques, allowing users to gather depth and shape information from both pictorial and kinetic cues. We conducted user studies comparing two novel dynamic methods – Object Flow and Wave Propagation – and three state-of-the-art static visualization techniques among medical experts. Our studies provide a detailed comparison of the visualization techniques’ efficacy in conveying shape and depth information from medical data, as well as task load and usability reported by the participants and post hoc analyses. We found that kinetic cues can assist users in understanding complex anatomical structures in medical AR.
Exploring Effective Immersive Approaches to Visualizing WiFi
Alexander R Rowden, University of Maryland College Park;
Eric Krokos, US Government;
Kirsten Whitley, US Government;
Amitabh Varshney, University of Maryland
Toggle Abstract
WiFi networks are essential to our daily lives, but their signals are not visible to us. Therefore, it is challenging to evaluate the health of a network or make changes to ensure an optimal configuration. Traditional visualization approaches, such as contour lines, are not intuitive and lead to challenges in the analysis and comprehension of networks. In this paper, we introduce two novel visualizations: Wavelines and Stacked Bars. We then compared these visualizations to the state-of-the-art visualization technique of contour lines. We carried out a user study with 32 participants to validate that our novel visualizations can improve user confidence, accuracy, and completion time for the tasks of router localization, signal strength ranking, channel interference identification, and router coverage assessment. We selected these tasks after extensive discussions with domain experts. We believe that our findings will assist network analysts in visually understanding our increasingly rich signal environments.
Performance Impact of Immersion and Collaboration in Visual Data Analysis
Daniel Garrido, Faculty of Engineering – University of Porto;
João Tiago Jacob, FEUP;
Daniel Castro Silva, FEUP
Toggle Abstract
Immersive Analytics is a recent field of study that focuses on utilizing emerging extended reality technologies to bring visual data analysis from the 2D screen to the real/virtual world. The effectiveness of Immersive Analytics, when compared to traditional systems, has been widely studied in this field’s corpus, usually concluding that the immersive solution is superior. However, when it comes to comparing collaborative to single-user immersive analytics, the literature is lacking in user studies. As such, we developed a comprehensive experimental study with the objective of quantifying and analysing the impact that both immersion and collaboration have on the visual data analysis process. A two-variable (immersion: desktop/virtual reality; number of users: solo/pair) full factorial study was conceived with a mixed design (within-subjects for immersion and between-subjects for number of users). Each of the 24 solo participants and 24 pairs of participants solved five visual data analysis tasks in both a head-mounted display-based virtual world and a desktop computer environment. The results show that, in terms of task time to completion, there were no significant differences between desktop and virtual reality, or between the solo and pair conditions. However, it was possible to conclude that collaboration is more beneficial the more complex the task is in both desktop and virtual reality, and that for less complex tasks, collaboration can be a hindrance. System Usability Scale scores were significantly better in the virtual reality condition than the desktop one, especially when working in pairs. As for user preference, the virtual reality system was significantly more favoured both as a visual data analysis platform and a collaborative data analysis platform over the desktop system. All supplemental materials are available at https://osf.io/k94u5/.
Evaluating 3D User Interaction Techniques on Spatial Working Memory for 3D Scatter Plot Exploration in Immersive Analytics
Dongyun Han, Utah State University;
Isaac Cho, Utah State University
Toggle Abstract
This work evaluates three 3D user interaction techniques to investigate their visuo-spatial working memory support for users’ data exploration in immersive analytics. Two are common VR locomotion techniques, Walking and Teleportation, while the third is Grab, an object manipulation technique. We present two formal user studies in VR and AR. Our study is designed based on the Corsi block-tapping task, a psychological test for assessing visuo-spatial working memory. Our study results show that Walking supports spatial memory best, with Grab following. Though Teleportation is found to support it the least, participants rated Teleportation as the easiest way to move in the VR study. We also compare the Walking and Grab results in the VR and AR studies and discuss differences. Finally, we discuss our limitations and future work.
Uncovering Best Practices in Immersive Space to Think
Kylie Davidson, Virginia Tech;
Lee Lisle, Virginia Tech;
Ibrahim Asadullah Tahmid, Virginia Tech;
Kirsten Whitley, Department of Defense;
Chris North, Virginia Tech;
Doug Bowman, Virginia Tech
Toggle Abstract
As immersive analytics research becomes more popular, user studies have been aimed at evaluating the strategies and layouts of users’ sensemaking during a single focused analysis task. However, approaches to sensemaking strategies and layouts are likely to change as users become more familiar/proficient with the immersive analytics tool. In our work, we build upon an existing immersive analytics approach–Immersive Space to Think–to understand how schemas and strategies for sensemaking change across multiple analysis tasks. We conducted a user study with 14 participants who completed three different sensemaking tasks during three separate sessions. We found significant differences in the use of space and strategies for sensemaking across these sessions and correlations between participants’ strategies and the quality of their sensemaking. Using these findings, we propose guidelines for effective analysis approaches within immersive analytics systems for document-based sensemaking.
PS16: Visual Feedback
16:00 AEDT (UTC+11)
CivEng 109
Session Chair: A. Dey
Toggle Papers
Task-dependent Visual Behavior in Immersive Environments: A Comparative Study of Free Exploration, Memory and Visual Search
Sandra Malpica, Universidad de Zaragoza, I3A;
Daniel Martin, Universidad de Zaragoza;
Diego Gutierrez, Universidad de Zaragoza;
Ana Serrano, Universidad de Zaragoza;
Belen Masia, Universidad de Zaragoza
Toggle Abstract
Visual behavior depends on both bottom-up mechanisms, where gaze is driven by the visual conspicuity of the stimuli, and top-down mechanisms, guiding attention towards relevant areas based on the task or goal of the viewer. While this is well-known, visual attention models often focus on bottom-up mechanisms. Existing works have analyzed the effect of high-level cognitive tasks like memory or visual search on visual behavior; however, they have often done so with different stimuli, methodology, metrics and participants, which makes drawing conclusions and comparisons between tasks particularly difficult. In this work we present a systematic study of how different cognitive tasks affect visual behavior in a novel within-subjects design scheme. Participants performed free exploration, memory and visual search tasks in three different scenes while their eye and head movements were being recorded. We found significant, consistent differences between tasks in the distributions of fixations, saccades and head movements. Our findings can provide insights for practitioners and content creators designing task-oriented immersive applications.
More Arrows in the Quiver: Investigating the Use of Auxiliary Models to Localize In-view Components with Augmented Reality
Sara Romano, Polytechnic University of Bari;
Enricoandrea Laviola, Polytechnic University of Bari;
Michele Gattullo, Polytechnic University of Bari;
Michele Fiorentino, Polytechnic University of Bari;
Antonio E. Uva, Polytechnic University of Bari
Toggle Abstract
The creation and management of content are among the main open issues for the spread of Augmented Reality. In Augmented Reality interfaces for procedural tasks, a key authoring strategy is chunking instructions and using optimized visual cues, i.e., tailored to the specific information to convey. Nevertheless, research works rarely present rationales behind their choice. This work aims to provide design guidelines for the localization of in-view and not occluded components, which is a recurrent type of information in technical documentation. Previous studies revealed that the most suited visual cues to convey this information are auxiliary models, i.e., abstract shapes that highlight the space region where the component is located. Among them, 3D arrows are widely used, but they may produce ambiguity of information. Furthermore, from the literature, it is unclear how to design auxiliary model shapes and if they are affected by the component shapes. To fill this gap, we conducted two user studies. In the first study, we collected the preferences of 45 users regarding the shape, color, and animation of auxiliary models for the localization of various component shapes. According to the results of this study, we defined guidelines for designing optimized auxiliary models based on the component shapes. In the second user study, we validated these guidelines by evaluating the performance (localization time and recognition accuracy) and user experience of 24 users. The results of this study allowed us to confirm that designing auxiliary models following our guidelines leads to higher recognition accuracy and a better user experience than using 3D arrows.
Visual Cues for a Steadier You: Visual Feedback Methods Improved Standing Balance in Virtual Reality for People With Balance Impairments
M. Rasel Mahmud, The University of Texas At San Antonio;
Alberto Cordova, University of Texas – San Antonio;
John Quarles, University of Texas at San Antonio
Toggle Abstract
Virtual reality (VR) users frequently have balance problems while utilizing Head-Mounted Displays (HMDs) because HMDs obstruct their ability to see the real world. This has a greater impact on people with balance impairments since many rely more heavily on their visual cues to keep their balance. This is a significant obstacle to the universal usability and accessibility of VR. Although previous studies have verified the imbalance issue, not much work has been done to diminish it. In this study, we investigated how to improve balance in VR by utilizing additional visual cues. To examine how different visual approaches (static, rhythmic, spatial, and center of pressure (CoP) based feedback) affect balance in VR, we recruited 100 people (50 with balance impairments due to multiple sclerosis and 50 without balance impairments) across two different geographic locations (United States and Bangladesh). All participants completed both standing visual exploration and standing reach-and-grasp tasks. Results demonstrated that static, rhythmic, and CoP visual feedback approaches enhanced balance significantly (p < .05) in VR for people with balance impairments. The methods described in this study could be applied to design more accessible virtual environments for people with balance impairments.
Understanding Effects of Visual Feedback Delay in AR on Fine Motor Surgical Tasks
Talha Khan, University of Pittsburgh;
Toby Shen Zhu, University of Pittsburgh Medical Center;
Thomas Downes, University of Pittsburgh;
Lucille Cheng, University of Pittsburgh Medical Center;
Nicolás Matheo Kass, University of Pittsburgh Medical Center;
Jacob Biehl, University of Pittsburgh;
Edward Andrews, University of Pittsburgh Medical Center
Toggle Abstract
Latency is a pervasive issue in various systems that can significantly impact motor performance and user perception. In medical settings, latency can hinder surgeons’ ability to quickly correct movements, resulting in an experience that does not align with user expectations and standards of care. Despite numerous studies reporting on the negative effects of latency, there is still a gap in understanding how it impacts the use of augmented reality (AR) in medical settings. This study aims to address this gap by examining how latency impacts motor task performance and subjective perceptions, such as cognitive load, on two display types: a monitor display, traditionally used inside an operating room (OR), and a Microsoft HoloLens 2 display. Our findings indicate that both the level of latency and the display type impact motor performance, and higher latencies on the HoloLens result in relatively poor performance. However, cognitive load was found to be unrelated to display type or latency, but was dependent on the surgeon’s training level. Surgeons did not compromise accuracy to gain more speed and were generally well aware of the latency in the system irrespective of their performance on the task. Our study provides valuable insights into acceptable thresholds of latency for AR displays and proposes design implications for the successful implementation and use of AR in surgical settings.
Exploring the Impact of User and System Factors on Human-AI Interactions in Head-Worn Displays
Feiyu Lu, Meta;
Yan Xu, Meta;
Xuhai Xu, Meta Platform;
Brennan Jones, Meta;
Laird M Malamed, Meta Platforms
Toggle Abstract
Empowered by rich sensory capabilities and advancements in artificial intelligence (AI), head-worn displays (HWDs) could understand the user’s context and provide just-in-time assistance with users’ tasks to augment their everyday lives. However, there has been limited understanding of how users perceive interacting with AI services, and how different factors impact the user experience in HWD applications. In this research, we investigated broadly what user and system factors play important roles in human-AI experiences during an AI-assisted spatial task. We conducted a user study to simulate an everyday scenario where augmented reality (AR) glasses could provide suggestions/assistance. We researched three AI system factors (performance, initiation, transparency) with multiple user factors (personality traits, trust propensity, and prior trust with AI). We not only identified the impact of user traits such as the levels of conscientiousness and prior trust with the AI, but also found interesting interactions between them and system factors such as AI’s performance and initiation strategy. Based on the findings, we suggest that future AI assistance on HWDs needs to take users’ individual characteristics into account and customize the system design accordingly.
Cueing Sequential 6DoF Rigid-Body Transformations in Augmented Reality
Jen-Shuo Liu, Columbia University;
Barbara Tversky, Columbia Teachers College;
Steven Feiner, Columbia University
Toggle Abstract
Augmented reality (AR) has been used to guide users in multi-step tasks, providing information about the current step (cueing) or future steps (precueing). However, existing work exploring cueing and precueing a series of rigid-body transformations requiring rotation has only examined one-degree-of-freedom (DoF) rotations alone or in conjunction with 3DoF translations. In contrast, we address sequential tasks involving 3DoF rotations and 3DoF translations. We built a testbed to compare two types of visualizations for cueing and precueing steps. In each step, a user picks up an object, rotates it in 3D while translating it in 3D, and deposits it in a target 6DoF pose. Action-based visualizations show the actions needed to carry out a step and goal-based visualizations show the desired end state of a step. We conducted a user study to evaluate these visualizations and the efficacy of precueing. Participants performed better with goal-based visualizations than with action-based visualizations, and most effectively with goal-based visualizations aligned with the Euler axis. However, only a few of our participants benefited from precues, most likely because of the cognitive load of 3D rotations.
Interaction between AR Cue Types and Environmental Conditions in Autonomous Vehicles
Somin Kim, Hanyang University;
Myeongul Jung, Hanyang University;
Jiwoong Heo, Hanyang University;
Kwanguk Kim, Hanyang University
Toggle Abstract
Conditionally autonomous vehicles are expected to become popular in the near future. Such vehicles can send a take-over request (TOR) to a driver, and if the driver is immersed in a non-driving-related task (NDRT), they will struggle to accommodate this request. Previous studies have shown that providing augmented reality (AR) information on traffic situations (status cues) or driver actions (command cues) can improve TOR performance. However, we are not aware of any studies comparing the types of AR cues (status versus command cues) and their interactions with environmental factors. Therefore, the current study investigated this and evaluated the TOR performance of 42 drivers. We used a 2 (environments: day and night) × 4 (AR cue types: without, status, command, and combined cues) mixed-subjects experimental design, and dependent measures included driving, cognitive, and NDRT performances. The results suggest that overall driving and cognitive performance were significantly improved by the command AR cue. In contrast, the status AR cue improved the TOR performance in nighttime environments. The performance of AR cues can vary depending on environmental factors, and AR cue designs for autonomous vehicles should consider this interaction for successful collaboration between drivers and vehicles.
Thursday 19 Oct
PS17: Scene Representation and Reconstruction
11:00 AEDT (UTC+11)
Leighton Hall
Session Chair: K. Ponto
Toggle Papers
Multi-layer Scene Representation from Composed Focal Stacks
Reina Ishikawa, Keio University;
Hideo Saito, Keio University;
Denis Kalkofen, Flinders University;
Shohei Mori, Graz University of Technology
Toggle Abstract
Multi-layer images are a powerful scene representation for high-performance rendering in virtual/augmented reality (VR/AR). The major approach to generate such images is to use a deep neural network trained to encode colors and alpha values of depth certainty on each layer using registered multi-view images. A typical network is designed to use a limited number of nearest views. Therefore, local noise in input images from a user-navigated camera deteriorates the final rendering quality and interferes with coherency over view transitions. We propose to use a focal stack composed of multi-view inputs to diminish such noise. We also provide a theoretical analysis of ideal focal stacks for generating multi-layer images. Our results demonstrate the advantages of using focal stacks in coherent rendering, memory footprint, and AR-supported data capturing. We also show three imaging applications for VR.
Visual ScanPath Transformer: Guiding Computers to See the World
Mengyu Qiu, Nanjing University of Aeronautics and Astronautics;
Rong Quan, Nanjing University of Aeronautics and Astronautics;
Dong Liang, NUAA;
Huawei Tu, La Trobe University
Toggle Abstract
We propose to exploit scanpath prediction technology to simulate the human visual system and automatically generate gaze scanpaths for VR/AR applications, alleviating the equipment and computational cost of foveated rendering. Specifically, we propose a novel deep learning-based scanpath prediction model called Visual ScanPath Transformer (VSPT) to predict human gaze scanpaths in both free-viewing and task-driven viewing situations, based on which VR/AR systems can execute foveated rendering rapidly and cheaply. The proposed VSPT first extracts highly task-related image features from the visual scene, and then explores the global dependency relationships among all the image regions to generate a global feature for each image region. Next, VSPT simulates human visual working memory to consider the influence of all previous fixations when predicting each fixation. Experimental findings confirm that our model exhibits adherence to classical visual principles during saccadic decision-making, surpassing the current state-of-the-art performance in free-viewing and task-driven (goal-driven and question-driven) visual scenarios.
Adaptive Color Structured Light for Calibration and Shape Reconstruction
Xin Dong, SouthWest University;
Haibin Ling, Stony Brook University;
Bingyao Huang, Southwest University
Toggle Abstract
Color structured light (SL) plays an important role in spatial augmented reality and shape reconstruction. Compared to traditional non-color multi-shot SL, it has the advantage of fewer projections, and can even achieve single-shot. However, distortions caused by ambient light and imaging devices limit color SL’s applicability and accuracy. A common solution is to apply color adaptation techniques to cancel the disturbances. Previous studies focus on either robust fixed color patterns or adaptation approaches that may require preliminary geometric calibrations. In this paper, we propose an approach that can efficiently adapt color SL to arbitrary ambient light and imaging devices’ color responses, without device response function calibration or geometric calibration. First, we design a novel algorithm to quickly find the most distinct colors that are easily separable under a new environment and device setup. Then, we design a maximum a posteriori (MAP)-based color detection algorithm that can utilize ambient light and device priors to robustly detect the SL colors. In experiments, our adaptive color SL outperforms previous methods in both calibration and shape reconstruction tasks across a variety of setups.
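The abstract describes a maximum a posteriori (MAP) color classifier that combines the captured pixel with ambient-light and device priors. A generic, heavily simplified MAP decision over a small set of projected colors might look like the sketch below; the Gaussian likelihood model and all parameters are illustrative assumptions, not the paper's actual formulation.

```python
import numpy as np

def map_color_detect(pixel_rgb, color_means, color_covs, priors):
    """Classify a captured pixel into one of the projected SL colors via MAP.

    pixel_rgb:   observed camera color, shape (3,)
    color_means: per-class mean captured color, shape (K, 3)
    color_covs:  per-class covariance of captured color, shape (K, 3, 3)
    priors:      per-class prior probability (e.g. from ambient/device priors), shape (K,)
    Returns the index of the most probable projected color.
    """
    log_post = np.empty(len(priors))
    for k, (mu, cov, prior) in enumerate(zip(color_means, color_covs, priors)):
        diff = pixel_rgb - mu
        inv = np.linalg.inv(cov)
        log_lik = -0.5 * (diff @ inv @ diff + np.log(np.linalg.det(cov)))
        log_post[k] = log_lik + np.log(prior)   # posterior is likelihood x prior
    return int(np.argmax(log_post))

# Toy example with two well-separated projected colors:
means = np.array([[200.0, 40.0, 40.0], [40.0, 40.0, 200.0]])
covs = np.stack([np.eye(3) * 100.0] * 2)
print(map_color_detect(np.array([190.0, 50.0, 45.0]), means, covs, np.array([0.5, 0.5])))
```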
Vanishing Point Aided Hash-Frequency Encoding for Neural Radiance Fields (NeRF) from Sparse 360° Input
Kai Gu, INRIA;
Thomas Maugey, INRIA;
Sebastian Knorr, Ernst-Abbe University of Applied Sciences;
Christine Guillemot, INRIA
Toggle Abstract
Neural Radiance Fields (NeRF) enable novel view synthesis of 3D scenes when trained with a set of 2D images. One of the key components of NeRF is the input encoding, i.e., mapping the coordinates to higher dimensions to learn high-frequency details, which has been proven to increase quality. Among various input mappings, hash encoding is gaining increasing attention for its efficiency. However, its performance on sparse inputs is limited. To address this limitation, we propose a new input encoding scheme that improves hash-based NeRF for sparse inputs, i.e., few and distant cameras, specifically for 360° view synthesis. In this paper, we combine frequency encoding and hash encoding and show that this combination can dramatically increase the quality of hash-based NeRF for sparse inputs. Additionally, we explore scene geometry by estimating vanishing points in omnidirectional images (ODIs) of indoor and city scenes in order to align frequency encoding with scene structures. We demonstrate that our vanishing point-aided scene alignment further improves deterministic and non-deterministic encodings on image regression and NeRF tasks, where sharper textures and more accurate geometry of scene structures can be reconstructed.
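A minimal sketch of concatenating a classical NeRF frequency (positional) encoding with a hash-grid encoding is shown below, assuming a separately implemented `hash_encode` (e.g. a multi-resolution hash grid). The concrete combination, the vanishing-point alignment, and all parameters here are illustrative assumptions rather than the paper's architecture.

```python
import numpy as np

def frequency_encode(x, num_freqs=6):
    """Classical NeRF positional encoding: [x, sin(2^k pi x), cos(2^k pi x)] per axis."""
    feats = [x]
    for k in range(num_freqs):
        feats.append(np.sin((2.0 ** k) * np.pi * x))
        feats.append(np.cos((2.0 ** k) * np.pi * x))
    return np.concatenate(feats, axis=-1)

def combined_encode(x, hash_encode, rotation=np.eye(3), num_freqs=6):
    """Concatenate frequency and hash features for a batch of 3D points x (N, 3).

    rotation: an (assumed) scene-alignment rotation, e.g. estimated from
    vanishing points, applied before encoding so axes follow scene structure.
    hash_encode: caller-supplied function mapping (N, 3) points to hash features.
    """
    x_aligned = x @ rotation.T
    return np.concatenate([frequency_encode(x_aligned, num_freqs),
                           hash_encode(x_aligned)], axis=-1)

# Toy usage with a stand-in hash encoder:
pts = np.random.rand(4, 3)
fake_hash = lambda p: np.zeros((p.shape[0], 32))   # placeholder for a real hash grid
print(combined_encode(pts, fake_hash).shape)       # (4, 3 + 2*6*3 + 32) = (4, 71)
```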
SimpleMapping: Real-Time Visual-Inertial Dense Mapping with Deep Multi-View Stereo
Yingye Xin, Technical University of Munich;
Xingxing Zuo, Technical University of Munich;
Dongyue Lu, Technical University of Munich;
Stefan Leutenegger, Technical University of Munich
Toggle Abstract
We present a real-time visual-inertial dense mapping method capable of performing incremental 3D mesh reconstruction with high quality using only sequential monocular images and inertial measurement unit (IMU) readings. 6-DoF camera poses are estimated by a robust feature-based visual-inertial odometry (VIO), which also generates noisy sparse 3D map points as a by-product. We propose a sparse point aided multi-view stereo neural network (SPA-MVSNet) that can effectively leverage the informative but noisy sparse points from the VIO system. The sparse depth from VIO is first completed by a single-view depth completion network. This dense depth map, although naturally limited in accuracy, is then used as a prior to guide our MVS network in the cost volume generation and regularization for accurate dense depth prediction. Predicted depth maps of keyframe images by the MVS network are incrementally fused into a global map using TSDF-Fusion. We extensively evaluate both the proposed SPA-MVSNet and the entire dense mapping system on several public datasets as well as our own dataset, demonstrating the system’s impressive generalization capabilities and its ability to deliver high-quality 3D reconstruction online. Our proposed dense mapping system achieves a 39.7% improvement in F-score over existing systems when evaluated on the challenging scenarios of the EuRoC dataset.
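At a high level, the described pipeline chains VIO, single-view depth completion, MVS, and TSDF fusion. The sketch below only illustrates that data flow; every component and method name is a hypothetical placeholder supplied by the caller, not the authors' API.

```python
def dense_mapping_loop(frames, imu_stream, vio, depth_completion, mvs_net, tsdf):
    """Illustrative data flow of a VIO + MVS dense mapping system.

    vio, depth_completion, mvs_net, and tsdf are assumed components supplied by
    the caller; frames is a stream of monocular images, imu_stream the IMU data.
    """
    keyframes = []
    for image, imu in zip(frames, imu_stream):
        # 1) Feature-based visual-inertial odometry: pose + noisy sparse points.
        pose, sparse_points = vio.track(image, imu)
        if not vio.is_keyframe(image):
            continue

        # 2) Complete the sparse VIO depth into a dense (but rough) prior.
        depth_prior = depth_completion(image, sparse_points, pose)

        # 3) Use the prior to guide multi-view stereo over recent keyframes.
        keyframes.append((image, pose))
        depth = mvs_net(keyframes[-8:], depth_prior)

        # 4) Fuse the predicted keyframe depth into a global TSDF volume.
        tsdf.integrate(depth, pose)

    return tsdf.extract_mesh()
```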
EV-LFV: Synthesizing Light Field Event Streams from an Event Camera and Multiple RGB Cameras
Zhicheng Lu, Beijing Technology and Business University;
Xiaoming Chen, Beijing Technology and Business University;
Yuk Ying Chung, The University of Sydney;
Weidong Cai, The University of Sydney;
Yiran Shen, Shandong University
Toggle Abstract
Light field videos captured in RGB frames (RGB-LFV) can provide users with a 6 degree-of-freedom immersive video experience by capturing dense multi-subview video. Despite its potential benefits, the processing of dense multi-subview video is extremely resource-intensive, which currently limits the frame rate of RGB-LFV (i.e., lower than 30 fps) and results in blurred frames when capturing fast motion. To address this issue, we propose leveraging event cameras, which provide high temporal resolution for capturing fast motion. However, the cost of current event camera models makes it prohibitive to use multiple event cameras for RGB-LFV platforms. Therefore, we propose EV-LFV, an event synthesis framework that generates full multi-subview event-based RGB-LFV with only one event camera and multiple traditional RGB cameras. EV-LFV utilizes spatial-angular convolution, ConvLSTM, and Transformer to model RGB-LFV’s angular features, temporal features, and long-range dependency, respectively, to effectively synthesize event streams for RGB-LFV. To train EV-LFV, we construct the first event-to-LFV dataset consisting of 200 RGB-LFV sequences with ground-truth event streams. Experimental results demonstrate that EV-LFV outperforms state-of-the-art event synthesis methods for generating event-based RGB-LFV, effectively alleviating motion blur in the reconstructed RGB-LFV.
Be Real in Scale: Swing for True Scale in Dual Camera Mode
Rui Yu, The Pennsylvania State University;
Jian Wang, Snap Inc.;
Sizhuo Ma, Snap Research;
Sharon X. Huang, The Pennsylvania State University;
Gurunandan Krishnan, Snap Research;
Yicheng Wu, Snap Inc.
Toggle Abstract
Many mobile AR apps that use the front-facing camera can benefit significantly from knowing the metric scale of the user’s face. However, the true scale of the face is hard to measure because monocular vision suffers from a fundamental ambiguity in scale. The methods based on prior knowledge about the scene either have a large error or are not easily accessible. In this paper, we propose a new method to measure the face scale by a simple user interaction: the user only needs to swing the phone to capture two selfies while using the recently popular Dual Camera mode. This mode allows simultaneous streaming of the front camera and the rear cameras and has become a key feature in many social apps. A computer vision method is applied to first estimate the absolute motion of the phone from the images captured by two rear cameras, and then calculate the point cloud of the face by triangulation. We develop a prototype mobile app to validate the proposed method. Our user study shows that the proposed method is favored compared to existing methods because of its high accuracy and ease of use. Our method can be built into Dual Camera mode and can enhance a wide range of applications (e.g., virtual try-on for online shopping, true-scale 3D face modeling, gaze tracking, and face anti-spoofing) by introducing true scale to smartphone-based XR. The code is available at https://github.com/ruiyu0/Swing-for-True-Scale.
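A core step of the described approach is recovering the phone's metric motion between the two selfies and then triangulating front-camera face landmarks. The OpenCV-based sketch below shows only the triangulation step, under the simplifying assumption that the metric front-camera poses at the two capture instants have already been recovered (e.g. from the rear cameras); it is not the paper's implementation, and the eye-distance helper is a hypothetical example.

```python
import numpy as np
import cv2

def triangulate_face_points(K_front, pose1, pose2, pts1, pts2):
    """Triangulate face landmarks seen by the front camera at two instants.

    K_front:      3x3 front-camera intrinsics.
    pose1, pose2: 3x4 metric camera poses [R | t] (world-to-camera) at the two
                  selfie captures, assumed recovered from the rear cameras.
    pts1, pts2:   corresponding 2D landmarks, shape (N, 2), in pixels.
    Returns an (N, 3) metric point cloud of the face.
    """
    P1 = K_front @ pose1
    P2 = K_front @ pose2
    hom = cv2.triangulatePoints(P1, P2, pts1.T.astype(np.float64),
                                pts2.T.astype(np.float64))   # 4xN homogeneous
    return (hom[:3] / hom[3]).T

def interpupillary_distance(points3d, left_eye_idx, right_eye_idx):
    """Example measurement: metric distance between the two eye landmarks."""
    return float(np.linalg.norm(points3d[left_eye_idx] - points3d[right_eye_idx]))
```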
PS18: Avatars
11:00 AEDT (UTC+11)
Ritchie Theatre
Session Chair: F. Steinicke
Toggle Papers
Injured Avatars: The Impact of Embodied Anatomies and Virtual Injuries on Well-being and Performance
Constantin Kleinbeck, Friedrich-Alexander Universität Erlangen-Nürnberg;
Hannah Schieber, Friedrich-Alexander University;
Julian Kreimeier, Friedrich-Alexander University Erlangen-Nürnberg;
Alejandro Martin-Gomez, Johns Hopkins University;
Mathias Unberath, Johns Hopkins University;
Daniel Roth, Klinikum rechts der Isar of Technical University of Munich
Toggle Abstract
Human cognition relies on embodiment as a fundamental mechanism. Virtual avatars allow users to experience the adaptation, control, and perceptual illusion of alternative bodies. Although virtual bodies have medical applications in motor rehabilitation and therapeutic interventions, their potential for learning anatomy and medical communication remains underexplored.
For learners and patients, anatomy, procedures, and medical imaging can be abstract and difficult to grasp. Experiencing anatomies, injuries, and treatments virtually through one’s own body could be a valuable tool for fostering understanding. This work investigates the impact of avatars displaying anatomy and injuries suitable for such medical simulations. We ran a user study utilizing a skeleton avatar and virtual injuries, comparing them to a healthy human avatar as a baseline. We evaluate the influence on embodiment, well-being, and presence with self-report questionnaires, as well as motor performance via an arm movement task.
Our results show that while both anatomical representation and injuries increase feelings of eeriness, there are no negative effects on embodiment, well-being, presence, or motor performance. These findings suggest that virtual representations of anatomy and injuries are suitable for medical visualizations targeting learning or communication without significantly affecting users’ mental state or physical control within the simulation.
Visual Indicators Representing Avatars’ Authenticity in Social Virtual Reality and Their Impacts on Perceived Trustworthiness
Jinghuai Lin, University of Würzburg;
Johrine Cronjé, University of Würzburg;
Carolin Wienrich, University of Würzburg;
Paul Pauli, University of Würzburg;
Marc Erich Latoschik, University of Würzburg
Toggle Abstract
Photorealistic avatars show great potential in social VR and VR collaboration. However, identity and privacy issues are threatening avatars’ authenticity in social VR. In addition to the necessary authentication and protection, effective solutions are needed to convey avatars’ authenticity status to users and thereby enhance the overall trustworthiness. We designed several visual indicators (VIs) using static or dynamic visual effects on photorealistic avatars and evaluated their effectiveness in visualizing avatars’ authenticity status. In this study we explored suitable attributes and designs for conveying the authenticity of photorealistic avatars and influencing their perceived trustworthiness. Furthermore, we investigated how different interactivity levels influence their effectiveness (the avatar was either presented in a static image, an animated video clip, or an immersive virtual environment). Our findings showed that using a full name can increase trust, while most other VIs could decrease users’ trust. We also found that interactivity levels significantly impacted users’ trust and the effectiveness of VIs. Based on our results, we developed design guidelines for visual indicators as effective tools to convey authenticity, as a first step towards the improvement of trustworthiness in social VR with identity management.
RC-SMPL : Real-time Cumulative SMPL-based Avatar Body Generation
Hail Song, Korea Advanced Institute of Science and Technology;
Boram Yoon, UVR Lab, KAIST;
Woojin Cho, UVR Lab, KAIST;
Woontack Woo, KAIST
Toggle Abstract
We present a novel method for avatar body generation that cumulatively updates the texture and normal map in real-time. Multiple images or videos have been broadly adopted to create detailed 3D human models that capture more realistic user identities in both Augmented Reality (AR) and Virtual Reality (VR) environments. However, this approach has a higher spatiotemporal cost because it requires a complex camera setup and extensive computational resources. For lightweight reconstruction of personalized avatar bodies, we design a system that progressively captures the texture and normal values using a single RGBD camera to generate the widely-accepted 3D parametric body model, SMPL-X. Quantitatively, our system maintains real-time performance while delivering reconstruction quality comparable to the state-of-the-art method. Moreover, user studies reveal the benefits of real-time avatar creation and its applicability in various collaborative scenarios. By enabling the production of high-fidelity avatars at a lower cost, our method provides a more general way to create personalized avatars in AR/VR applications, thereby fostering more expressive self-representation in the metaverse.
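A cumulative texture (or normal-map) update can be written as a running, confidence-weighted blend of newly observed texels into the accumulated map. The sketch below is a generic illustration of that idea only; the weighting choices are assumptions, not the paper's real-time system.

```python
import numpy as np

def cumulative_update(texture, confidence, new_texels, new_confidence):
    """Blend newly captured texels into an accumulated texture map.

    texture:        (H, W, 3) running texture (or normal) map.
    confidence:     (H, W) accumulated per-texel confidence.
    new_texels:     (H, W, 3) texels unprojected from the current RGBD frame;
                    unobserved texels can be NaN.
    new_confidence: (H, W) per-texel confidence of the current frame
                    (e.g. from viewing angle and depth noise).
    """
    observed = ~np.isnan(new_texels[..., 0]) & (new_confidence > 0)
    total = confidence + np.where(observed, new_confidence, 0.0)
    w_new = np.where(observed, new_confidence / np.maximum(total, 1e-6), 0.0)

    # Weighted running average: old texels keep weight proportional to their
    # accumulated confidence, new observations contribute the remainder.
    texture = texture * (1.0 - w_new[..., None]) + \
        np.nan_to_num(new_texels) * w_new[..., None]
    return texture, total
```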
Who’s Watching Me?: Exploring the Impact of Audience Familiarity on Player Performance, Experience, and Exertion in Virtual Reality Exergames
Zixuan Guo, Academy of Film and Creative Technology;
Wenge Xu, Birmingham City University;
Jialin Zhang, Xi’an Jiaotong-Liverpool University;
Hongyu Wang, Xi’an Jiaotong-Liverpool University;
Cheng-Hung Lo, Xi’an Jiaotong-Liverpool University;
Hai-Ning Liang, Xi’an Jiaotong-Liverpool University
Toggle Abstract
Familiarity with audiences plays a significant role in shaping individual performance and experience across various activities in everyday life. This study delves into the impact of familiarity with non-playable character (NPC) audiences on player performance and experience in virtual reality (VR) exergames. By manipulating NPC appearance (face and body shape) and voice familiarity, we explored their effects on game performance, experience, and exertion. The findings reveal that familiar NPC audiences have a positive impact on performance, creating a more enjoyable gaming experience and leading players to perceive less exertion. Moreover, individuals with higher levels of self-consciousness exhibit heightened sensitivity to the familiarity of NPC audiences. Our results shed light on the role of familiar NPC audiences in enhancing player experiences and provide insights for designing more engaging and personalized VR exergame environments.
Using Identification with AR Face Filters to Predict Explicit & Implicit Gender Bias
Marie A. Jarrell, IMT Atlantique, Lab STICC;
Etienne Peillard, Lab-STICC
Toggle Abstract
Augmented Reality (AR) filters, such as those used by social media platforms like Snapchat and Instagram, are perhaps the most commonly used AR technology. As with fully immersive Virtual Reality (VR) systems, individuals can use AR to embody different people. This experience in VR has been able to influence real-world biases such as sexism. However, there is little to no comparative research on AR embodiment’s impact on societal biases. This study aims to set groundwork by examining possible connections between using gender-changing Snapchat AR face filters and a person’s predicted implicit and explicit gender biases. We discovered that participants who experienced identification with gender-manipulated versions of themselves showed both greater and lesser amounts of bias against men and women. These results depended on the user’s gender, the filter applied, and the level of identification users reported with their AR-manipulated selves. The results were similar to past VR findings but offered unique AR observations that could be useful for future bias intervention efforts.
Effects of Remote Avatar Transparency on Social Presence in Task-centric Mixed Reality Remote Collaboration
Boram Yoon, UVR Lab, KAIST;
Jae-eun Shin, KI-ITC ARRC, KAIST;
Hyung-il Kim, KAIST;
Seo Young Oh, UVR Lab, KAIST;
Dooyoung Kim, UVR Lab, KAIST;
Woontack Woo, KAIST
Toggle Abstract
Despite the importance of avatar representation on user experience for Mixed Reality (MR) remote collaboration involving various device environments and large amounts of task-related information, studies on how controlling visual parameters for avatars can benefit users in such situations have been scarce. Thus, we conducted a user study comparing the effects of three avatars with different transparency levels (Nontransparent, Semi-transparent, and Near-transparent) on social presence for users in Augmented Reality (AR) and Virtual Reality (VR) during task-centric MR remote collaboration. Results show that avatars with a strong visual presence are not required in situations where accomplishing the collaborative task is prioritized over social interaction. However, AR users preferred more vivid avatars than VR users. Based on our findings, we suggest guidelines on how different levels of avatar transparency should be applied based on the context of the task and device type for MR remote collaboration.
The Work Avatar Face-Off: Knowledge Worker Preferences for Realism in Meetings
Vrushank Phadnis, Google;
Kristin Moore, Google;
Mar Gonzalez-Franco, Google
Toggle Abstract
While avatars have grown in popularity in social settings, their use in the workplace is still debatable. We conducted a large-scale survey to evaluate knowledge worker sentiment towards avatars, particularly the effects of realism on their acceptability for work meetings. In our survey, 2509 knowledge workers from multiple countries rated five avatar styles for use by managers, known colleagues, and unknown colleagues.
In all scenarios, participants favored higher realism, but fully realistic avatars were sometimes perceived as uncanny. Less realistic avatars were rated worse when interacting with an unknown colleague or manager, as compared to a known colleague. Avatar acceptability varied by country, with participants from the United States and South Korea rating avatars more favorably. We supplemented our quantitative findings with a thematic analysis of open-ended responses to provide a comprehensive understanding of factors influencing work avatar choices.
In conclusion, our results show that realism had a significant positive correlation with acceptability. Non-realistic avatars were seen as fun and playful, but only suitable for occasional use.
PS19: Perception 2
11:00 AEDT (UTC+11)
CivEng 109
Session Chair: J. Stefanucci
Toggle Papers
EEG-Based Error Detection Can Challenge Human Reaction Time in a VR Navigation Task
Michael Wimmer, Know-Center GmbH;
Nicole Weidinger, Know-Center GmbH;
Neven ElSayed, Know-Center GmbH;
Gernot R. Müller-Putz, Graz University of Technology;
Eduardo Veas, Graz University of Technology
Toggle Abstract
Error perception is known to elicit distinct brain patterns, which can be used to improve the usability of systems facilitating human-computer interactions, such as brain-computer interfaces. This requires a high-accuracy detection of erroneous events, e.g., misinterpretations of the user’s intention by the interface, to allow for suitable reactions of the system. In this work, we concentrate on steering-based navigation tasks. We present a combined electroencephalography-virtual reality (VR) study investigating different approaches for error detection while simultaneously exploring the corrective human behavior in response to erroneous events in a VR flight simulation. We could classify different errors, allowing us to analyze neural signatures of unexpected changes in the VR. Moreover, the presented models could detect errors faster than participants naturally responded to them. This work could contribute to developing adaptive VR applications that exclusively rely on the user’s physiological information.
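Error-related EEG responses are commonly classified from epoched data with a shrinkage-regularized linear discriminant analysis; the scikit-learn sketch below shows such a generic baseline. The window length, sampling rate, and classifier choice are assumptions for illustration, not the study's models.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

def epoch_eeg(eeg, events, sfreq, tmin=0.0, tmax=0.8):
    """Cut fixed windows around event markers.

    eeg:    (n_channels, n_samples) continuous EEG.
    events: sample indices of candidate error / correct events.
    Returns an (n_events, n_channels * n_window) feature matrix.
    """
    start, stop = int(tmin * sfreq), int(tmax * sfreq)
    epochs = [eeg[:, e + start:e + stop].ravel() for e in events
              if e + stop <= eeg.shape[1]]
    return np.array(epochs)

def evaluate_error_detector(eeg, events, labels, sfreq=250):
    """Cross-validated accuracy of an LDA error-vs-correct classifier."""
    X = epoch_eeg(eeg, events, sfreq)
    clf = LinearDiscriminantAnalysis(solver="lsqr", shrinkage="auto")
    return cross_val_score(clf, X, labels[:len(X)], cv=5).mean()
```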
QAVA-DPC: Eye-Tracking Based Quality Assessment and Visual Attention Dataset for Dynamic Point Cloud in 6 DoF
Xuemei Zhou, Delft University of Technology (TU Delft) (Technische Universiteit Delft);
Irene Viola, CWI;
Evangelos Alexiou, TNO;
Jack Jansen, CWI;
Pablo Cesar, Delft University of Technology (TU Delft) (Technische Universiteit Delft)
Toggle Abstract
Perceptual quality assessment of Dynamic Point Cloud (DPC) contents plays an important role in various Virtual Reality (VR) applications that involve human beings as the end users, and understanding and modeling perceptual quality assessment is greatly enriched by insights from visual attention. However, incorporating aspects of visual attention in DPC quality models is largely unexplored, as ground-truth visual attention data is scarcely available. This paper presents a dataset containing subjective opinion scores and visual attention maps of DPCs, collected in a VR environment using eye-tracking technology. The data was collected during a subjective quality assessment experiment in which subjects were instructed to watch and rate DPCs at various degradation levels under 6 degrees-of-freedom inspection, using a head-mounted display. The dataset comprises 5 reference DPC contents, with each reference encoded at 3 distortion levels using 3 different codecs, amounting to 9 degraded versions per reference. Moreover, it includes 1,000 gaze trials from 40 participants, resulting in 15,000 visual attention maps in total. The curated dataset can serve as authentic benchmark data for assessing the performance of objective DPC quality metrics. Additionally, it establishes a link between quality assessment and visual attention within the context of DPC. This work deepens our understanding of DPC quality and visual attention, driving progress in the realm of VR experiences and perception.
Deep Learning-based Simulator Sickness Estimation from 3D Motion
Junhong Zhao, Victoria University of Wellington;
Kien T. P. Tran, University of Auckland;
Andrew Chalmers, Victoria University of Wellington;
Weng Khuan Hoh, Victoria University of Wellington;
Richard Yao, Facebook;
Arindam Dey, Meta;
James Wilmott, Facebook;
James Lin, Oculus;
Mark Billinghurst, The University of Auckland;
Robert W. Lindeman, University of Canterbury;
Taehyun James Rhee, Victoria University of Wellington
Toggle Abstract
This paper presents a novel solution for estimating simulator sickness in HMDs using machine learning and 3D motion data, informed by user-labeled simulator sickness data and user analysis. We conducted a novel VR user study, which decomposed motion data and used an instant dial-based sickness scoring mechanism. We were able to emulate typical VR usage and collect user simulator sickness scores. Our user analysis shows that translation and rotation differently impact user simulator sickness in HMDs. In addition, users’ demographic information and self-assessed simulator sickness susceptibility data are collected and show some indication of potential simulator sickness. Guided by the findings from the user study, we developed a novel deep learning-based solution to better estimate simulator sickness with decomposed 3D motion features and user profile information. The model was trained and tested using the 3D motion dataset with user-labeled simulator sickness and profiles collected from the user study. The results show higher estimation accuracy when using the 3D motion data compared with methods based on optical flow extracted from the recorded video, as well as improved accuracy when decomposing the motion data and incorporating user profile information.
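The study decomposes HMD motion into translational and rotational components before feeding it to the model. A simple way to compute such decomposed features from a pose trajectory is sketched below; the window handling and feature choices are illustrative assumptions, not the paper's feature set.

```python
import numpy as np

def decomposed_motion_features(positions, yaw_pitch_roll, dt):
    """Split a 6-DoF head trajectory into translation and rotation features.

    positions:       (T, 3) head positions in meters.
    yaw_pitch_roll:  (T, 3) head orientation in radians.
    dt:              sampling interval in seconds.
    Returns a feature vector of translational and rotational speed statistics.
    """
    lin_vel = np.linalg.norm(np.diff(positions, axis=0), axis=1) / dt      # m/s
    ang = np.unwrap(yaw_pitch_roll, axis=0)
    ang_vel = np.linalg.norm(np.diff(ang, axis=0), axis=1) / dt            # rad/s

    def stats(v):
        return [v.mean(), v.std(), v.max()]

    # Translation and rotation are kept separate so a downstream model can
    # weigh their (different) contributions to simulator sickness.
    return np.array(stats(lin_vel) + stats(ang_vel))
```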
A Deep Cybersickness Predictor through Kinematic Data with Encoded Physiological Representation
Ruichen Li, The Hong Kong University of Science and Technology(Guangzhou);
Yuyang Wang, Hong Kong University of Science and Technology;
Handi Yin, The Hong Kong University of Science and Technology(Guangzhou);
Jean-Rémy Chardonnet, Arts et Métiers, Institut Image;
Pan Hui, The Hong Kong University of Science and Technology
Toggle Abstract
Users experience individually different sickness symptoms during or after navigating through an immersive virtual environment, a phenomenon generally known as cybersickness. Previous studies have predicted the severity of cybersickness based on physiological and/or kinematic data. However, compared with kinematic data, physiological data rely heavily on biosensors during collection, which is inconvenient and limited to a few affordable VR devices. In this work, we propose a deep neural network to predict cybersickness from kinematic data. We introduce an encoded physiological representation to characterize individual susceptibility; therefore, the predictor can predict cybersickness based only on a user’s kinematic data, without counting on biosensors. Fifty-three participants were recruited for the user study to collect multimodal data, including kinematic data (navigation speed, head tracking), physiological signals (e.g., electrodermal activity, heart rate), and Simulator Sickness Questionnaire (SSQ) responses. The predictor achieved an accuracy of 97.8% for cybersickness prediction by involving the pre-computed physiological representation to characterize individual differences, providing much convenience for current cybersickness measurement.
Virtual Reality Sickness Reduces Attention During Immersive Experiences
Katherine J. Mimnaugh, University of Oulu;
Evan G Center, University of Oulu;
Markku Suomalainen, VTT Technical Research Centre of Finland;
Israel Becerra, CIMAT;
Eliezer Lozano, CIMAT;
Rafael Murrieta-Cid, CIMAT;
Timo Ojala, University of Oulu;
Steven LaValle, University of Oulu;
Kara D. Federmeier, University of Illinois
Toggle Abstract
In this paper, we show that Virtual Reality (VR) sickness is associated with a reduction in attention, which was detected with the P3b Event-Related Potential (ERP) component from electroencephalography (EEG) measurements collected in a dual-task paradigm. We hypothesized that sickness symptoms such as nausea, eyestrain, and fatigue would reduce the users’ capacity to pay attention to tasks completed in a virtual environment, and that this reduction in attention would be dynamically reflected in a decrease of the P3b amplitude while VR sickness was experienced. In a user study, participants were taken on a tour through a museum in VR along paths with varying amounts of rotation, shown previously to cause different levels of VR sickness. While paying attention to the virtual museum (the primary task), participants were asked to silently count tones of a different frequency (the secondary task). Control measurements for comparison against the VR sickness conditions were taken when the users were not wearing the Head-Mounted Display (HMD) and while they were immersed in VR but not moving through the environment. This exploratory study shows, across multiple analyses, that the mean amplitude of the P3b collected during the task is associated with both sickness severity measured after the task with a questionnaire (SSQ) and with the number of counting errors on the secondary task. Thus, VR sickness may impair attention and task performance, and these changes in attention can be tracked with ERP measures as they happen, without asking participants to assess their sickness symptoms in the moment.
Event Related Brain Responses Reveal the Impact of Spatial Augmented Reality Predictive Cues on Mental Effort
Benjamin Volmer, Australian Research Centre for Interactive and Virtual Environments, University of South Australia;
James Baumeister, Australian Research Centre for Interactive and Virtual Environments, University of South Australia;
Stewart Von Itzstein, Australian Research Centre for Interactive and Virtual Environments, University of South Australia;
Matthias Schlesewsky, Australian Research Centre for Interactive and Virtual Environments, University of South Australia;
Ina Bornkessel-Schlesewsky, Australian Research Centre for Interactive and Virtual Environments, University of South Australia;
Bruce H. Thomas, Australian Research Centre for Interactive and Virtual Environments, University of South Australia
Toggle Abstract
This paper presents the results from a Spatial Augmented Reality (SAR) study which evaluated the cognitive cost of several predictive cues. Participants performed a validated procedural button-pressing task, where the predictive cue annotations guided them to the upcoming task. While existing research has evaluated predictive cues based on their performance and self-rated mental effort, actual cognitive cost has yet to be investigated. To measure the user’s brain activity, this study utilized electroencephalogram (EEG) recordings. Cognitive load was evaluated by measuring brain responses for a secondary auditory oddball task, with reduced brain responses to oddball tones expected when cognitive load in the primary task is highest. A simple monitor n-back task and a procedural task comparing monitor vs SAR were conducted, followed by a version of the procedural task comparing the SAR predictive cues. Results from the brain responses were able to distinguish between performance-enhancing cues with high and low cognitive load. Electrical brain responses also revealed that having an arc or arrow guide towards the upcoming task required the least amount of mental effort.
User self-motion modulates the perceptibility of jitter in world-locked augmented reality
Hope Lutwak, Meta;
T. Scott Murdison, Meta;
Kevin W. Rio, Meta
Toggle Abstract
A key feature of augmented reality (AR) is the ability to display virtual content that appears stationary as users move throughout the physical world (‘world-locked rendering’). Imperfect world-locked rendering gives rise to perceptual artifacts that can negatively impact user experience. One example is random variation in the position of virtual objects that are intended to be stationary (‘jitter’). The human visual system is highly attuned to detect moving objects, and moreover it can disambiguate between the retinal velocities that arise from object motion and self-motion, respectively. In this study, we investigated how the perceptibility of AR object jitter varies as a function of user self-motion. Using a commercially available AR HMD to display a 3D textured cube, we measured sensitivity to added jitter versus a no-jitter reference using a two-interval forced choice task. Three user motion conditions (stationary, head rotation, and walking) and three object placement conditions (floating in free space, on a desk, and against a wall) were tested in a full factorial design. We hypothesized that (1) as users move their head and eyes during self-motion, their sensitivity to jitter will decrease, due to added retinal velocity; and (2) rendering virtual objects near physical surfaces will increase sensitivity to jitter, by providing proximal veridical visual cues. Psychometric thresholds indicated that users were significantly less sensitive to jitter during self-motion than when they were stationary, consistent with hypothesis (1). Users were also more sensitive to jitter in one of the two object placement conditions, providing partial support for hypothesis (2). To generalize beyond distinct user motion and object placement conditions, we also analyzed eye tracking data. The amount of retinal slip (i.e. how much gaze drifted across the virtual object) predicted jitter thresholds better than recorded head movements alone, suggesting a retinally-driven decrease in jitter sensitivity during self-motion. These results can be used to inform requirements for AR world-locked rendering systems, as well as how these may be updated dynamically using online measurement of user head and eye movements.
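Jitter detection thresholds from a two-interval forced-choice (2IFC) task are commonly obtained by fitting a psychometric function to proportion-correct data; the SciPy sketch below fits a cumulative-Gaussian psychometric function with a 50% guess rate and reads off a 75%-correct threshold. It is a generic illustration of that analysis under assumed parameters, not the authors' pipeline.

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import norm

def psychometric_2ifc(x, mu, sigma):
    """Proportion correct in 2IFC: 50% guess rate plus a cumulative Gaussian."""
    return 0.5 + 0.5 * norm.cdf(x, loc=mu, scale=sigma)

def fit_jitter_threshold(jitter_levels, prop_correct, criterion=0.75):
    """Fit the psychometric function and return the jitter level at `criterion`.

    jitter_levels: array of added jitter magnitudes (e.g. in arcmin).
    prop_correct:  proportion of correct interval choices at each level.
    """
    p0 = [np.median(jitter_levels), np.std(jitter_levels) + 1e-3]
    (mu, sigma), _ = curve_fit(psychometric_2ifc, jitter_levels, prop_correct,
                               p0=p0, maxfev=10000)
    # Invert 0.5 + 0.5 * Phi((x - mu) / sigma) = criterion for x.
    return mu + sigma * norm.ppf((criterion - 0.5) / 0.5)

# Toy data: sensitivity increases with added jitter magnitude.
levels = np.array([0.1, 0.2, 0.4, 0.8, 1.6])
correct = np.array([0.52, 0.60, 0.78, 0.93, 0.99])
print(fit_jitter_threshold(levels, correct))
```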
PS20: Multisensory and Multimodal Interaction 2
13:45 AEDT (UTC+11)
Leighton Hall
Session Chair: R. Lindeman
Toggle Papers
Co-immersion in Audio Augmented Virtuality: the Case Study of a Static and Approximated Late Reverberation Algorithm
Davide Fantini, University of Milan;
Giorgio Presti, Università degli Studi di Milano;
Michele Geronazzo, University of Padua;
Riccardo Bona, Università degli Studi di Milano;
Alessandro Giuseppe Privitera, University of Udine;
Federico Avanzini, University of Milano
Toggle Abstract
In immersive Audio Augmented Reality, a virtual sound source should be indistinguishable from the existing real ones. This property can be evaluated with the co-immersion criterion, which encompasses scenes constituted by arbitrary configurations of real and virtual objects. Thus, we introduce the term Audio Augmented Virtuality (AAV) to describe a fully virtual environment consisting of auditory content captured from the real world, augmented by synthetic sound generation. We propose an experimental design in AAV investigating how simplified late reverberation (LR) affects the co-immersion of a sound source. Participants listened to simultaneous virtual speakers dynamically rendered through spatial Room Impulse Responses and were asked to detect the presence of an impostor, i.e., a speaker rendered with one of two simplified LR conditions. Detection rates were found to be close to chance level, especially for one condition, suggesting a limited influence of the simplified LR on co-immersion in the evaluated AAV scenes. This methodology can be straightforwardly extended and applied to different acoustic scenes, complexities (i.e., the number of simultaneous speakers), and rendering parameters in order to further investigate the requirements for immersive audio technologies in AAR and AAV applications.
Visual Facial Enhancements Can Significantly Improve Speech Perception in the Presence of Noise
Zubin Choudhary, University of Central Florida;
Gerd Bruder, University of Central Florida;
Greg Welch, University of Central Florida
Toggle Abstract
Human speech perception is generally optimal in quiet environments; however, it becomes more difficult and error-prone in the presence of noise, such as other humans speaking nearby or ambient noise. In such situations, human speech perception is improved by speech reading, i.e., watching the movements of a speaker’s mouth and face, either consciously as done by people with hearing loss or subconsciously by other humans. While previous work focused largely on speech perception of two-dimensional videos of faces, there is a gap in the research field focusing on facial features as seen in head-mounted displays, including the impacts of display resolution, and the effectiveness of visually enhancing a virtual human face on speech perception in the presence of noise. In this paper, we present a comparative user study (N = 21) in which we investigated an audio-only condition compared to two levels of head-mounted display resolution (1832×1920 or 916×960 pixels per eye) and two levels of the native or visually enhanced appearance of a virtual human, the latter consisting of an up-scaled facial representation and simulated lipstick (lip coloring) added to increase contrast. To understand effects on speech perception in noise, we measured participants’ speech reception thresholds (SRTs) for each audio-visual stimulus condition. These thresholds indicate the decibel levels of the speech signal that are necessary for a listener to receive the speech correctly 50% of the time. First, we show that the display resolution significantly affected participants’ ability to perceive the speech signal in noise, which has practical implications for the field, especially in social virtual environments. Second, we show that our visual enhancement method was able to compensate for limited display resolution and was generally preferred by participants. Specifically, our participants indicated that they benefited from the head scaling more than the added facial contrast from the simulated lipstick. We discuss relationships, implications, and guidelines for applications that aim to leverage such enhancements.
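Speech reception thresholds targeting 50% intelligibility are often measured with a simple 1-up/1-down adaptive procedure that raises the speech level after an incorrect response and lowers it after a correct one. The sketch below shows such a staircase in generic form; the step size, trial count, and `present_trial` callback are assumptions, not the study's exact protocol.

```python
def measure_srt(present_trial, start_snr_db=0.0, step_db=2.0, num_trials=20):
    """Estimate a speech reception threshold (SRT) with a 1-up/1-down staircase.

    present_trial: callable taking an SNR in dB and returning True if the
                   listener repeated the sentence correctly (supplied by the
                   experiment software; assumed here).
    A 1-up/1-down rule converges on the 50%-correct point, i.e. the SRT.
    """
    snr = start_snr_db
    reversal_snrs = []
    last_correct = None
    for _ in range(num_trials):
        correct = present_trial(snr)
        if last_correct is not None and correct != last_correct:
            reversal_snrs.append(snr)             # direction change = reversal
        last_correct = correct
        snr += -step_db if correct else step_db   # harder if correct, easier if not

    # Average the SNRs at reversals (ignoring the first few) as the SRT estimate.
    usable = reversal_snrs[2:] if len(reversal_snrs) > 4 else reversal_snrs
    return sum(usable) / max(len(usable), 1)
```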
The Effect of Visual and Auditory Modality Mismatching between Distraction and Warning on Pedestrian Street Crossing Behavior
Renjie Wu, University of Adelaide;
Hsiang-Ting Chen, University of Adelaide
Toggle Abstract
Augmented reality (AR) headsets could provide useful information to users, but they may also be a source of distraction. Previous work has explored using AR to enhance pedestrian safety by providing real-time warnings, but there has been little research on the impact of modality matching between distractions and warnings on pedestrian street crossing behaviour. We conducted a VR experiment using a within-subjects 2-by-2 design (N = 24) with four conditions: (auditory distraction, visual distraction) x (auditory warning, visual warning). When experiencing conditions with mismatched modalities, participants exhibited more cautious street-crossing behaviours, such as reduced walking speed and an increased scan range after receiving the warning, as well as significantly faster reaction times to the incoming vehicle. The participants also expressed a preference for warnings to be presented in a modality different from the distraction. Our findings suggest that in the context of utilizing AR for pedestrian road safety, future AR interfaces should incorporate a warning modality that differs from the one causing distraction.
See or Hear? Exploring the Effect of Visual/Audio Hints and Gaze-assisted Instant Post-task Feedback for Visual Search Tasks in AR
Yuchong Zhang, Chalmers University of Technology;
Adam Nowak, Lodz University of Technology;
Yueming Xuan, Chalmers University of Technology;
Andrzej Romanowski, Lodz University of Technology;
Morten Fjeld, Chalmers University of Technology
Toggle Abstract
Augmented reality (AR) is emerging in visual search tasks for increasingly immersive interactions with virtual objects. We propose an AR approach providing visual and audio hints along with gaze-assisted instant post-task feedback for search tasks on a mobile head-mounted display (HMD). The target case was a book-searching task, in which we aimed to explore the effect of the hints together with the task feedback with two hypotheses. H1: since visual and audio hints can each positively affect AR search tasks, their combination outperforms either hint used individually. H2: the gaze-assisted instant post-task feedback can positively affect AR search tasks. The proof-of-concept was demonstrated by an AR app on the HMD and a comprehensive user study (n=96) consisting of two sub-studies, Study I (n=48) without task feedback and Study II (n=48) with task feedback. Following quantitative and qualitative analysis, our results partially verified H1 and completely verified H2, enabling us to conclude that the synthesis of visual and audio hints conditionally improves AR visual search task efficiency when coupled with task feedback.
Empirical Evaluation of the Effects of Visuo-Auditory Perceptual Information on Head Oriented Tracking of Dynamic Objects in VR
Mark Tolchinsky, Clemson University;
Rohith Venkatakrishnan, Clemson University;
Roshan Venkatakrishnan, Clemson University;
Christopher Pagano, Clemson University;
Sabarish V. Babu, Clemson University
Toggle Abstract
As virtual reality (VR) technology sees more use in various fields, there is a greater need to understand how to effectively design dynamic virtual environments. As of now, there is still uncertainty in how well users of a VR system are capable of tracking moving targets in a virtual space. In this work, we examined the influence of sensory modality and visual feedback on the accuracy of head-gaze moving target tracking. To this end, a between-subjects study was conducted in which participants received targets that were visual, auditory, or audiovisual. Each participant performed two blocks of experimental trials, with a calibration block in between. Results indicate that audiovisual targets promoted greater improvement in tracking performance over single-modality targets, and that audio-only targets are more difficult to track than those of other modalities.
D-SAV360: A Dataset of Gaze Scanpaths on 360º Ambisonic Videos
Edurne Bernal-Berdun, Universidad de Zaragoza – I3A;
Daniel Martin, Universidad de Zaragoza;
Sandra Malpica, Universidad de Zaragoza, I3A;
Pedro J Perez, Universidad de Zaragoza;
Diego Gutierrez, Universidad de Zaragoza;
Belen Masia, Universidad de Zaragoza;
Ana Serrano, Universidad de Zaragoza
Toggle Abstract
Understanding human visual behavior within virtual reality environments is crucial to fully leverage their potential. While previous research has provided rich visual data from human observers, existing gaze datasets often suffer from the absence of multimodal stimuli. Moreover, no dataset has yet gathered eye gaze trajectories (i.e., scanpaths) for dynamic content with directional ambisonic sound, which is a critical aspect of sound perception by humans. To address this gap, we introduce D-SAV360, a dataset of 4,609 head and eye scanpaths for 360º videos with first-order ambisonics. This dataset enables a more comprehensive study of multimodal interaction on visual behavior in virtual reality environments. We analyze our collected scanpaths from a total of 87 participants viewing 85 different videos and show that various factors such as viewing mode, content type, and gender significantly impact eye movement statistics. We demonstrate the potential of D-SAV360 as a benchmarking resource for state-of-the-art attention prediction models and discuss its possible applications in further research. By providing a comprehensive dataset of eye movement data for dynamic, multimodal virtual environments, our work can facilitate future investigations of visual behavior and attention in virtual reality.
Hype D-Live: XR Live Music System to Entertain Passengers for Anxiety Reduction in Autonomous Vehicles
Takuto Akiyoshi, Nara Institute of Science and Technology;
Yuki Shimizu, Nara Institute of Science and Technology;
Yusaku Takahama, Nara Institute of Science and Technology;
Koki Nagata, Nara Institute of Science and Technology;
Taishi Sawabe, Nara Institute of Science and Technology
Toggle Abstract
Passengers in autonomous vehicles enjoy the comfort of being free from driving tasks, but they inevitably experience anxiety caused by autonomous vehicle stress (AVS). AVS encompasses vehicle behavior stress due to unpredictable acceleration, and external environmental stress due to potential collisions. Past research has explored approaches to improve passengers’ comfort through behavior control and information presentation. However, methods that utilize stressful vehicle behavior in Extended Reality (XR) entertainment to distract from AVS-related anxiety are limited. Hence, the goal of this study was to maximize passenger comfort in automated vehicles. To achieve this goal, we implemented an XR entertainment system that utilizes vehicle behavior and evaluated its effect on reducing anxiety. In this study, we proposed “Hype D-Live”, an XR live music system designed to reduce anxiety by providing multimodal visual, auditory, force, and vestibular stimuli using a hemispherical display and motion platform mounted on a vehicle. We developed system functions to adjust the force and vestibular senses according to the excitement level of the music and the direction of stressful acceleration and to reproduce moshing, a characteristic behavior at live music venues. However, we hypothesized that passengers might not fully enjoy the entertainment and could experience anxiety if the video content makes them aware of the external environment. Therefore, we conducted an experiment with a within-participant design, involving 24 participants (14 males and 10 females), comparing 3 types of video content for XR entertainment inside the autonomous vehicle: a real external environment, a virtual simulation of the external environment, and a virtual live music venue. The Wilcoxon signed rank test with the Bonferroni correction after the Friedman test revealed that, without the moshing function, the virtual live music venue video significantly enhanced enjoyment and reduced anxiety, compared to the real one.
PS21: Applications
13:45 AEDT (UTC+11)
Ritchie Theatre
Session Chair: B. Thomas
Toggle Papers
Shopping in between Realities – Using an Augmented Virtuality Smartphone in a Virtual Supermarket
Christian Eichhorn, Technical University of Munich;
David A. Plecher, Technical University of Munich;
Tobias Mesmer, Technical University of Munich;
Lucas Leder, TUM;
Tim Simecek, Technical University of Munich;
Nassim Boukadida, Technical University of Munich;
Gudrun Klinker, Technical University of Munich
Toggle Abstract
In this project, the full spectrum provided by Milgram’s Reality-Virtuality Continuum is utilized to enhance presence, usability, and interactions in an authentic Virtual Reality (VR) supermarket simulation used as a standardized evaluation platform for mHealth apps. We introduce a unique Unity-based modeling platform for supermarket environments and the option to design high-quality customized products. To achieve this, we demonstrate solutions focused on a recognizable replica of a discount supermarket and on utilizing Augmented Virtuality (AV) to include a physical smartphone in the virtual simulation. The user is able to manipulate the simulation from within the smartphone app through this versatile, highly usability-centered controller. To achieve reliable tracking of the smartphone screen, we propose a hybrid approach, which combines fiducial marker tracking with data acquired through a WiFi connection between the VR system and the smartphone. Furthermore, the AV concept is utilized to build scenarios for Mixed Reality (MR) use cases such as simulated AR to navigate to a chosen product in the market. After an initial pre-study that provided important insights to strengthen the platform, a broad user study involving the physical smartphone in a simulated AR scenario was conducted. With 30 participants, the goals were to evaluate spatial presence, involvement, and experienced realism (IPQ), as well as the usability of the system (SUS). Results showed “Good” usability on the SUS, and both the recorded data and the participants’ feedback provided important insights. We plan to release this unique VR supermarket platform to contribute to the science community and mHealth industry.
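The hybrid tracking idea (fusing optical fiducial-marker poses with data streamed from the phone over WiFi) can be pictured with a minimal complementary blend, sketched below. Everything in it is an assumption for illustration: the message format, port, and blend weight are invented, and it is not the authors' implementation.

```python
# Minimal illustrative sketch of hybrid smartphone tracking: position would come
# from fiducial-marker detection, while the yaw estimate blends the marker value
# with an orientation the phone reports over WiFi. All names, message formats and
# the blend weight are assumptions for illustration only.
import json
import socket

ALPHA = 0.8  # hypothetical blend weight: trust the marker estimate 80%

def blend_yaw(marker_yaw_deg: float, phone_yaw_deg: float, alpha: float = ALPHA) -> float:
    """Complementary blend of two yaw estimates in degrees (ignores wrap-around)."""
    return alpha * marker_yaw_deg + (1.0 - alpha) * phone_yaw_deg

def read_phone_yaw(sock: socket.socket) -> float:
    """Receive one hypothetical JSON datagram such as {"yaw": 42.0} from the phone."""
    data, _ = sock.recvfrom(1024)
    return float(json.loads(data.decode("utf-8"))["yaw"])

if __name__ == "__main__":
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("0.0.0.0", 9000))        # hypothetical port for the phone's stream
    marker_yaw = 40.0                    # would come from the fiducial marker tracker
    phone_yaw = read_phone_yaw(sock)     # blocks until the phone sends a datagram
    print("Fused yaw:", blend_yaw(marker_yaw, phone_yaw))
```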
Welcome AboARd! Evaluating Augmented Reality as a Skipper’s Navigator
Julia Hertel, University of Hamburg;
Susanne Schmidt, Universität Hamburg;
Marc Briede, Hamburg Port Authority;
Oliver Anders, Hamburg Port Authority;
Thomas Thies, Hamburg Port Authority;
Frank Steinicke, Universität Hamburg
Toggle Abstract
Augmented Reality (AR) technology has been widely investigated to support various navigation tasks, including initial approaches that suggest its potential use on ships. For maritime navigation, skippers use a variety of information displayed on a ship’s bridge. However, the constant shift of focus between this information and the outside view of the ship might pose cognitive as well as safety challenges. Here, AR could facilitate the navigation of ships by overlaying the real-world view with spatially anchored visual navigation aids. Despite this potential, previous work mainly presents conceptual approaches, technical tests, or user studies performed in ship simulators only.
In this paper, we evaluate an AR-based assistance system in the actual real-world water environment, where technical issues and varying physical conditions could influence the system’s usability. In collaboration with hydrographic experts following a user-centered design approach, a functional AR system was developed that virtually displays navigation aids on the water surface. In a field study, ten skippers used the system to navigate a ship along a path through a port area. We assessed the accuracy, perceived workload, and user experience of participants. In addition, qualitative feedback was thematically analyzed to retrieve insights about the skippers’ attitude regarding using AR on actual ships. We report lessons learned about aspects such as ergonomics, perceived safety challenges, as well as envisioned further use cases and extended data integration.
MR.Sketch: Immediate 3D Sketching via Mixed Reality Drawing Canvases
Bálint István Kovács, TU Wien;
Ingrid Erb, TU Wien;
Hannes Kaufmann, TU Wien;
Peter Ferschin, TU Wien
Toggle Abstract
Sketching is a fundamental technique for early design and form finding. Digital 3D sketching can enhance the early design phase by improving spatial understanding and enriching the design with additional information. However, the tools used for sketching should not hinder the expression and thought process inherent in form finding. Methods already exist for using 2D pen-on-tablet input for 3D sketching via stroke projection onto 3D drawing canvases. However, positioning the canvas and sketching lines are separate work steps, which breaks the flow of the designer’s thought process.
We propose a novel technique for mixed reality 3D sketching that involves the use of viewport-attached drawing canvases, spatial meshing and intersection canvas visualisation. By combining the inside-out tracking capabilities of current portable consumer devices with stylus-on-tablet freehand drawing input, we transform 2D to 3D projective sketching into a more seamless experience. Results of a pilot user study with 16 participants show significant user preference for our technique, as well as increased sketching speed and immediacy.
Specifying Volumes of Interest for Industrial Use Cases
Daniel Dyrda, Technical University of Munich;
Jack Sterling Klusmann, Technical University of Munich;
Linda Rudolph, Technical University of Munich;
Felix Stieglbauer, Technical University of Munich;
Maximilian Amougou, Technical University of Munich;
Dorothea Pantförder, Technical University of Munich;
Birgit Vogel-Heuser, Technical University of Munich;
Gudrun Klinker, Technical University of Munich
Toggle Abstract
Creating digital representations of industrial facilities presents substantial opportunities for the industry. The development of digital twins opens up many prospective applications for industrial plants, such as facilitating virtual training environments, streamlining change management, and providing real-time understanding of remote scenes. However, creating and maintaining these digital twins involves complex tasks, including scanning industrial plants and annotating their subsystems. Many use cases necessitate specifying a Volume of Interest (VoI) for further processing while presenting severe environmental challenges for segmentation quality and worker safety. Besides that, widespread adoption also depends on creating methods that are easy to use, even for workers with limited 3D interaction knowledge. Currently, no commonly implemented method fulfills all these diverse requirements.
This paper introduces an approach for specifying a VoI in industrial scenarios. Our approach defines a volume by intersecting the projections of regions hand-drawn around the target on a small number of pictures of the target volume, utilizing Augmented Reality and Voxel Carving. It can successfully handle various sizes of target volumes and delivers an appropriately detailed result. Through qualitative discussions and a quantitative evaluation, we ensure our application meets all requirements posed by the scenarios. This simple interaction metaphor, tailored to specific use cases, can serve as a versatile pattern for various digital twin scenarios. It offers a valuable alternative to 3D primitive-based segmentation methods.
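As a rough illustration of the voxel-carving idea mentioned above (keeping only voxels whose image projections fall inside every hand-drawn region), consider the sketch below. The simplified camera model, mask handling, and example values are assumptions for illustration, not the authors' pipeline.

```python
# Illustrative voxel-carving sketch (not the authors' implementation): a voxel is
# kept only if its projection falls inside the hand-drawn 2D region of every
# captured picture. Cameras are reduced to simple 3x4 projection matrices.
import numpy as np

def project(P, xyz):
    """Project a 3D point with a 3x4 projection matrix; return pixel (u, v)."""
    x = P @ np.append(xyz, 1.0)
    return int(round(x[0] / x[2])), int(round(x[1] / x[2]))

def carve(voxels, projections, masks):
    """Keep voxels whose projection lies inside the drawn mask in *all* views."""
    kept = []
    for voxel in voxels:
        inside_all = True
        for P, mask in zip(projections, masks):
            u, v = project(P, voxel)
            h, w = mask.shape
            if not (0 <= v < h and 0 <= u < w and mask[v, u]):
                inside_all = False
                break
        if inside_all:
            kept.append(voxel)
    return np.array(kept)

# Tiny usage example with one affine view and an all-true hand-drawn mask.
P = np.array([[100.0, 0, 0, 160], [0, 100.0, 0, 120], [0, 0, 0, 1.0]])
mask = np.ones((240, 320), dtype=bool)
voxels = np.array([[0.5, 0.5, 0.0], [5.0, 5.0, 0.0]])
print(carve(voxels, [P], [mask]))   # only the first voxel survives
```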
Investigating the Impact of Augmented Reality and BIM on Retrofitting Training for Non-experts
John Sermarini, University of Central Florida;
Robert A. Michlowitz, University of Central Florida;
Joseph LaViola, University of Central Florida;
Lori C. Walters, University of Central Florida;
Roger Azevedo, University of Central Florida;
Joseph T. Kider Jr., University of Central Florida
Toggle Abstract
Augmented Reality (AR) tools have shown significant potential in providing on-site visualization of Building Information Modeling (BIM) data and models for supporting construction evaluation, inspection, and guidance. Retrofitting existing buildings, however, remains a challenging task requiring more innovative solutions to successfully integrate AR and BIM. This study aims to investigate the impact of AR+BIM technology on the retrofitting training process and assess the potential for future on-site usage. We conducted a study with 64 non-expert participants, who were asked to perform a common retrofitting procedure of an electrical outlet installation using either an AR+BIM system or a standard printed blueprint documentation set. Our findings indicate that AR+BIM reduced task time significantly and improved performance consistency across participants, while also decreasing the physical and cognitive demands of the training. This study provides a foundation for augmenting future retrofitting construction research that can extend the use of AR+BIM technology, thus facilitating more efficient retrofitting of existing buildings. A video presentation of this article and all supplemental materials are available at https://github.com/DesignLabUCF/SENSEable_RetrofittingTraining.
ARCHIE²: An Augmented Reality Interface with Plant Detection for Future Planetary Surface Greenhouses
Conrad Zeidler, German Aerospace Center (DLR);
Matthias Klug, Universität Bremen;
Gerrit Woeckner, Nature Robots GmbH;
Urte Clausen, Carl von Ossietzky University of Oldenburg;
Johannes Schöning, University of St. Gallen
Toggle Abstract
More than 50 years after the last human set foot on the Moon during the Apollo 17 mission, humans aim to return to the Moon in this decade. This time, humanity plans to establish lunar habitats for a sustainable longer presence. An integrated part of these lunar habitats will be planetary surface greenhouses. These greenhouses will produce food, process air, recycle water, and improve the psychological well-being of humans. Past research has shown that a large amount of crew time, a scarce and valuable resource in spaceflight, is needed for maintenance and repairs in a planetary surface greenhouse, leaving less time for crop cultivation and science activities. In this paper, we present the concept of an augmented reality interface named ARCHIE² to reduce crew time and the workload of astronauts and remote support teams on Earth to operate a planetary surface greenhouse. ARCHIE² allows users to visualize status information on plants, technical systems, and environmental parameters in the greenhouse or other features supporting the greenhouse operations using an augmented reality headset. In particular, we report on the implementation and performance of the ARCHIE² plant detection module that runs locally on the augmented reality headset. Using images with a resolution of 320×192 pixels, arugula selvatica plants were detected using an artificial neural network (based on a YOLOv5s model) from a horizontal distance up to 50 cm with an average inference time of 602 ms and an average of 48 FPS. Based on that, the plants were augmented with labels to visualize relevant plant-specific information supporting astronauts in the maintenance of the plants.
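To make the plant-labelling step more concrete, the sketch below runs an off-the-shelf YOLOv5s detector on a single frame and derives an anchor point for an AR label from each detection. It is an editorial illustration only: the pretrained COCO weights, image path, and confidence threshold are placeholders, not the custom arugula model or parameters reported in the paper.

```python
# Illustrative sketch only: running a YOLOv5s detector on a low-resolution frame
# and turning detections into label anchors, loosely mirroring the plant-labelling
# idea described above. Pretrained weights and the image path are placeholders.
import torch

model = torch.hub.load("ultralytics/yolov5", "yolov5s", pretrained=True)
model.conf = 0.5                                     # hypothetical confidence threshold

results = model("greenhouse_frame.jpg", size=320)    # placeholder image, 320 px inference
for *xyxy, conf, cls in results.xyxy[0].tolist():
    x1, y1, x2, y2 = xyxy
    label_anchor = ((x1 + x2) / 2, y1)               # top-centre of the box for an AR label
    print(model.names[int(cls)], round(conf, 2), label_anchor)
```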
Wearable Augmented Reality: Research Trends and Future Directions from Three Major Venues
Tram Thi Minh Tran, School of Architecture, Design and Planning, The University of Sydney;
Shane Brown, Contxtual;
Oliver Weidlich, Contxtual;
Mark Billinghurst, University of South Australia;
Callum Parker, University of Sydney
Toggle Abstract
Wearable Augmented Reality (AR) has attracted considerable attention in recent years, as evidenced by the growing number of research publications and industry investments. With swift advancements and a multitude of interdisciplinary research areas within wearable AR, a comprehensive review is crucial for integrating the current state of the field. In this paper, we present a review of 389 research papers on wearable AR, published between 2018 and 2022 in three major venues: ISMAR, TVCG, and CHI. Drawing inspiration from previous works by Zhou et al. and Kim et al., which summarized AR research at ISMAR over the past two decades (1998–2017), we categorize the papers into different topics and identify prevailing trends. One notable finding is that wearable AR research is increasingly geared towards enabling broader consumer adoption. From our analysis, we highlight key observations related to potential future research areas essential for capitalizing on this trend and achieving widespread adoption. These include addressing challenges in Display, Tracking, Interaction, and Applications, and exploring emerging frontiers in Ethics, Accessibility, Avatar and Embodiment, and Intelligent Virtual Agents.
PS22: Fingers and Hands
13:45 AEDT (UTC+11)
CivEng 109
Session Chair: D. Bowman
Toggle Papers
Giant Finger: A Novel Visuo-Somatosensory Approach to Simulating Lower Body Movements in Virtual Reality
Seongjun Kang, Gwangju Institute of Science and Technology;
Gwangbin Kim, Gwangju Institute of Science and Technology;
SeungJun Kim, Gwangju Institute of Science and Technology
Toggle Abstract
A surreal experience in virtual reality (VR) occurs when the visual experience is accompanied by congruent somatosensation. Thus, VR content that requires physical actions is often bound by our physical capabilities in order to maintain somatosensory consistency. Alternatively, users often choose less immersive but safer interfaces that offer wider action variability. In either case, this situation compromises the potential for a hyper-realistic experience. To address this, we introduce “Giant Finger,” a concept that replicates human lower body movements through two enlarged virtual fingers in VR. Through a user study, we affirmed Giant Finger’s ownership using proprioceptive drift and questionnaire responses. We also compared Giant Finger’s capability to perform a variety of tasks with existing methods. Despite its minimalistic approach, Giant Finger demonstrated a high level of efficacy in supporting lower body movements, with ownership and presence comparable to those of the body-leaning method with whole-body motion. Giant Finger can replace the sensations of real legs or support locomotion in confined spaces by providing proprioceptive illusions to the virtual lower body. The applications showcased in this paper suggest that Giant Finger can enable new forms of movement with high action variability and immersion in various fields such as gaming, industry, and accessibility.
PhyVR: Physics-based Multi-material and Free-hand Interaction in VR
Hanchen Deng, Beihang University;
Jin Li, Beihang University;
Yang Gao, Beihang University;
Xiaohui Liang, Beihang University;
Hongyu Wu, Beihang University;
Aimin Hao, Beihang University
Toggle Abstract
Realistic interaction with physical phenomena is a crucial aspect of human-computer interaction (HCI) in virtual reality (VR). However, the real-time performance of physical simulation, interactive computation, and rendering is the bottleneck of physics-based VR HCI. To address these challenges, we propose a novel physics-oriented framework for multi-material objects and free-hand interaction, termed PhyVR. This framework enables users to interact with diverse virtual phenomena dynamically. At the algorithm level, we develop a unified particle system that, for efficiency, describes both the virtual multi-materials and the user’s avatar, optimize collision detection, and accelerate the HCI algorithms with a variable fine-coarse particle sampling scheme. At the rendering level, we introduce a hybrid particle-grid anisotropic algorithm for surface reconstruction, enabling real-time and visually convincing fluid rendering. Comprehensive experiments and user studies demonstrate that our framework effectively captures various physical interaction phenomena, providing an enhanced user experience and paving the way for expanding VR-related HCI applications.
XR Input Error Mediation for Hand-Based Input: Task and Context Influences a User’s Preference
Tica Lin, Meta;
Ben Lafreniere, Meta;
Yan Xu, Meta;
Tovi Grossman, University of Toronto;
Daniel Wigdor, Meta;
Michael Glueck, Meta
Toggle Abstract
Many XR devices use bare-hand gestures to reduce the need for handheld controllers. Such gestures, however, lead to false positive and false negative recognition errors, which detract from the user experience. While mediation techniques enable users to overcome recognition errors by clarifying their intentions via UI elements, little research has explored how mediation techniques should be designed in XR and how a user’s task and context may impact their design preferences.
This research presents empirical studies about the impact of user perceived error costs on users’ preferences for three mediation technique designs, under different simulated scenarios that were inspired by real-life tasks. Based on a large-scale crowd-sourced survey and an immersive VR-based user study, our results suggest that the varying contexts within each task type can impact users’ perceived error costs, leading to different preferred mediation techniques. We further discuss the study implications of these results on future XR interaction design.
FingerButton: Enabling Controller-Free Transitions between Real and Virtual Environments
Satabdi Das, The University of British Columbia;
Arshad Nasser, The University of British Columbia;
Khalad Hasan, University of British Columbia
Toggle Abstract
With recent Virtual Reality (VR) Head-Mounted Displays (HMDs), users can seamlessly transition between the virtual and real worlds with techniques such as passthrough. These techniques leverage on-device cameras to capture the real world and show users a view of their physical surroundings while wearing the HMDs. However, they often require users to hold a controller or frequently tap on the HMD, limiting the potential for hands-free interaction and thereby hindering a truly immersive and natural VR experience. To address this limitation, we designed FingerButton, a finger-worn push button device that enables seamless transitions between real and virtual environments. We conducted two studies: the first explored a set of hand gestures, commonly used for “mode switching” within realities, for transitioning between the two environments. In the second study, we compared FingerButton with the best two-hand gesture identified in the first study and other commercially available solutions (e.g., double tap) for a between-reality selection task. The results show that the physical finger button is faster than, and preferred over, the other techniques for the transition tasks. Overall, this research contributes to understanding and improving the interaction techniques for fluid switching between the real and virtual worlds, thereby enhancing VR user experiences.
Effects of Opaque, Transparent and Invisible Hand Visualization Styles on Motor Dexterity in a Virtual Reality Based Purdue Pegboard Test
Laurent Voisard, Concordia University;
Amal Hatira, Kadir Has University;
Mine Sarac, Kadir Has University;
Marta Kersten-Oertel, Concordia University;
Anil Ufuk Batmaz, Concordia University
Toggle Abstract
The virtual hand interaction technique is one of the most common interaction techniques used in virtual reality (VR) systems. A VR application can be designed with different hand visualization styles, which might impact motor dexterity. In this paper, we aim to investigate the effects of three different hand visualization styles — transparent, opaque, and invisible — on participants’ performance through a VR-based Purdue Pegboard Test (PPT). A total of 24 participants were recruited and instructed to place pegs on the board as quickly and accurately as possible. The results indicated that using the invisible hand visualization significantly increased the number of task repetitions completed compared to the opaque hand visualization. However, no significant difference was observed in participants’ preference for the hand visualization styles. These findings suggest that an invisible hand visualization may enhance performance in the VR-based PPT, potentially indicating the advantages of a less obstructive hand visualization style. We hope our results can guide developers, researchers, and practitioners when designing novel virtual hand interaction techniques.
A Mixed Reality Training System for Hand-Object Interaction in Simulated Microgravity Environments
Kanglei Zhou, Beihang University;
Chen Chen, Beihang University;
Yue Ma, Beihang University;
Zhiying Leng, Beihang University;
Hubert P. H. Shum, Durham University;
Frederick W. B. Li, University of Durham;
Xiaohui Liang, Beihang University
Toggle Abstract
As human exploration of space continues to progress, the use of Mixed Reality (MR) for simulating microgravity environments and facilitating training in hand-object interaction holds immense practical significance. However, hand-object interaction in microgravity presents distinct challenges compared to terrestrial environments due to the absence of gravity. This results in heightened agility and inherent unpredictability of movements that traditional methods struggle to simulate accurately. To this end, we propose a novel MR-based hand-object interaction system in simulated microgravity environments, leveraging physics-based simulations to enhance the interaction between the user’s real hand and virtual objects. Specifically, we introduce a physics-based hand-object interaction model that combines impulse-based simulation with penetration contact dynamics. This accurately captures the intricacies of hand-object interaction in microgravity. By considering forces and impulses during contact, our model ensures realistic collision responses and enables effective object manipulation in the absence of gravity. The proposed system presents a cost-effective solution for users to simulate object manipulation in microgravity. It also holds promise for training space travelers, equipping them with greater immersion to better adapt to space missions. The system reliability and fidelity test verifies the superior effectiveness of our system compared to the state-of-the-art CLAP system.
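The impulse-based part of such a simulation can be pictured with the textbook collision-response formula below, which computes an impulse magnitude from the relative normal velocity and a restitution coefficient. This is a generic sketch, not the paper's model, and it omits the penetration contact dynamics the abstract also mentions.

```python
# Generic impulse-based collision response sketch (not the paper's model):
# j = -(1 + e) * (v_rel . n) / (1/m_a + 1/m_b), applied along the contact normal.
import numpy as np

def collision_impulse(v_a, v_b, normal, m_a, m_b, restitution=0.2):
    """Return updated velocities of two bodies after an impulse along `normal`."""
    v_a, v_b, n = map(np.asarray, (v_a, v_b, normal))
    v_rel = np.dot(v_a - v_b, n)
    if v_rel > 0:                       # already separating, no impulse needed
        return v_a, v_b
    j = -(1 + restitution) * v_rel / (1 / m_a + 1 / m_b)
    return v_a + (j / m_a) * n, v_b - (j / m_b) * n

# Example: a hand proxy striking a free-floating object in microgravity.
print(collision_impulse([0.0, 0.0, -1.0], [0.0, 0.0, 0.0], [0.0, 0.0, 1.0], 0.5, 1.0))
```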
Expansion of Detection Thresholds for Hand Redirection using Noisy Tendon Electrical Stimulation
Maki Ogawa, The University of Tokyo;
Keigo Matsumoto, The University of Tokyo;
Kazuma Aoyama, Faculty of Informatics, Gunma University;
Takuji Narumi, The University of Tokyo
Toggle Abstract
To increase the flexibility of haptic feedback in virtual reality (VR), hand redirection (HR) has been proposed to shift the hand’s virtual position from its actual position. To expand the range of HR applications, a method to broaden the detection threshold (DT), which is the maximum amount of shift that can be applied without the user noticing, is required. Multisensory integration studies have revealed that the reliability of senses affects the weight of integration. To expand the DTs of HR, we propose a method to increase visual dominance in the integration of vision and proprioception by introducing noise to the latter, thereby decreasing its reliability through weak Gaussian white noise electrical stimulation (σ = 0.5 mA). The results of a user study comprising 22 participants (11 women and 11 men) confirm that noisy electrical stimulation significantly expands the DTs of HR: the mean range of DTs (RDT) was 20.48° (SD = 7.90) with electrical stimulation and 19.15° (SD = 7.11) without. Interestingly, this effect was only observed in women. The average RDT for men was 15.36° (SD = 6.13) and 15.18° (SD = 5.58), whereas that for women was 25.61° (SD = 5.89) and 23.12° (SD = 6.21), with and without electrical stimulation, respectively. Electrical stimulation was mostly tolerable for the participants and did not affect embodiment or presence ratings. These results suggest that expansion of the DT without disturbing the user’s VR experience is feasible.
PS23: Computer Vision and Machine Learning for XR
15:45 AEDT (UTC+11)
Leigthton Hall
Session Chair: A. Hinkenjann
Toggle Papers
FineStyle: Semantic-Aware Fine-Grained Motion Style Transfer with Dual Interactive-Flow Fusion
Wenfeng Song, Beijing Information Science and Technology University;
Xingliang Jin, Beijing Information Science and Technology University;
Shuai Li, Zhongguancun Laboratory;
Chenglizhao Chen, College of Computer Science and Technology, China University of Petroleum;
Aimin Hao, State Key Laboratory of Virtual Reality Technology and Systems, Beihang University;
Xia Hou, Beijing Information Science and Technology University
Toggle Abstract
We present FineStyle, a novel framework for motion style transfer that generates expressive human animations with specific styles for virtual reality and vision fields. It incorporates semantic awareness, which improves motion representation and allows for precise and stylish animation generation. Existing methods for motion style transfer have all failed to consider the semantic meaning behind the motion, resulting in limited controls over the generated human animations. To improve, FineStyle introduces a new cross-modality fusion module called Dual Interactive-Flow Fusion (DIFF). As the first attempt, DIFF integrates motion style features and semantic flows, producing semantic-aware style codes for fine-grained motion style transfer. FineStyle uses an innovative two-stage semantic guidance approach that leverages semantic clues to enhance the discriminative power of both semantic and style features. At an early stage, a semantic-guided encoder introduces distinct semantic clues into the style flow. Then, at a fine stage, both flows are further fused interactively, selecting the matched and critical clues from both flows. Extensive experiments demonstrate that FineStyle outperforms state-of-the-art methods in visual quality and controllability. By considering the semantic meaning behind motion style patterns, FineStyle allows for more precise control over motion styles. Source code and model are available on https://github.com/XingliangJin/Fine-Style.git.
Towards Eyeglasses Refraction in Appearance-based Gaze Estimation
Junfeng Lyu, Tsinghua University;
Feng Xu, Tsinghua University
Toggle Abstract
For subjects with myopia or hyperopia, eyeglasses change the apparent position of objects in their view, leading to different eyeball rotations for the same gaze target. Existing appearance-based gaze estimation methods ignore this effect; this paper investigates it and proposes an effective method to account for it in gaze estimation, achieving noticeable improvements. Specifically, we discover that the appearance-gaze mapping differs for spectacled and unspectacled conditions, and the deviations are nearly consistent with the physical laws of the ideal lens. Based on this discovery, we propose a novel multi-task training strategy that encourages networks to regress gaze and classify the wearing conditions simultaneously. We apply the proposed strategy to several popular methods, including supervised and unsupervised ones, and evaluate them on different datasets with various backbones. The results show that the multi-task training strategy can be applied to existing methods to improve the performance of gaze estimation. To the best of our knowledge, we are the first to clearly reveal and explicitly consider eyeglasses refraction in appearance-based gaze estimation. Data and code are available at https://github.com/StoryMY/RefractionGaze.
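The multi-task idea (regress gaze while classifying whether glasses are worn) can be sketched as a shared backbone with two heads and a combined loss. The sketch below is an assumption-laden illustration: the backbone, head sizes, and loss weighting are invented, not taken from the released code linked above.

```python
# Minimal PyTorch sketch of the multi-task idea (shared features, one gaze
# regression head, one eyeglasses classification head). Architecture and loss
# weighting are illustrative assumptions, not the authors' released code.
import torch
import torch.nn as nn

class MultiTaskGazeNet(nn.Module):
    def __init__(self, feat_dim: int = 128):
        super().__init__()
        self.backbone = nn.Sequential(                # placeholder feature extractor
            nn.Flatten(), nn.Linear(36 * 60, feat_dim), nn.ReLU())
        self.gaze_head = nn.Linear(feat_dim, 2)       # pitch and yaw
        self.glasses_head = nn.Linear(feat_dim, 2)    # spectacled / unspectacled

    def forward(self, x):
        feats = self.backbone(x)
        return self.gaze_head(feats), self.glasses_head(feats)

model = MultiTaskGazeNet()
images = torch.randn(8, 1, 36, 60)                    # dummy grayscale eye patches
gaze_gt = torch.randn(8, 2)
glasses_gt = torch.randint(0, 2, (8,))

gaze_pred, glasses_logits = model(images)
loss = nn.functional.l1_loss(gaze_pred, gaze_gt) \
     + 0.1 * nn.functional.cross_entropy(glasses_logits, glasses_gt)  # 0.1 is a guess
loss.backward()
```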
State-Aware Configuration Detection for Augmented Reality Step-by-Step Tutorials
Ana Stanescu, Graz University of Technology;
Peter Mohr, Graz University of Technology;
Mateusz Kozinski, Graz University of Technology;
Shohei Mori, Graz University of Technology;
Dieter Schmalstieg, Graz University of Technology;
Denis Kalkofen, Graz University of Technology
Toggle Abstract
Presenting tutorials in augmented reality is a compelling application area, but previous attempts have been limited to objects with only a small number of parts. Scaling augmented reality tutorials to complex assemblies of a large number of parts is difficult, because it requires automatically discriminating many similar-looking object configurations, which poses a challenge for current object detection techniques. In this paper, we seek to lift this limitation. Our approach is inspired by the observation that, even though the number of assembly steps may be large, their order is typically highly restricted: Some actions can only be performed after others. To leverage this observation, we enhance a state-of-the-art object detector to predict the current assembly state by conditioning on the previous one, and to learn the constraints on consecutive states. This learned ‘consecutive state prior’ helps the detector disambiguate configurations that are otherwise too similar in terms of visual appearance to be reliably discriminated. Via the state prior, the detector is also able to improve the estimated probabilities that a state detection is correct. We experimentally demonstrate that our technique enhances the detection accuracy for assembly sequences with a large number of steps and on a variety of use cases, including furniture, Lego and origami. Additionally, we demonstrate the use of our algorithm in an interactive augmented reality application.
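The effect of a consecutive-state prior can be illustrated with a simple re-scoring step: detector probabilities over assembly states are weighted by how plausible each state is given the previously confirmed state, then renormalised. This is a generic illustration with made-up numbers, not the paper's exact formulation.

```python
# Illustrative sketch of the 'consecutive state prior' idea: detector scores over
# assembly states are re-weighted by a transition prior conditioned on the
# previously confirmed state, then renormalised. All numbers are made up.
import numpy as np

def rescore(detector_probs, prev_state, transition_prior):
    """Combine per-state detector probabilities with P(state | previous state)."""
    combined = detector_probs * transition_prior[prev_state]
    return combined / combined.sum()

# Four hypothetical assembly states; state 1 can only be followed by 1 or 2.
transition_prior = np.array([
    [0.2, 0.8, 0.0, 0.0],
    [0.0, 0.2, 0.8, 0.0],
    [0.0, 0.0, 0.2, 0.8],
    [0.0, 0.0, 0.0, 1.0],
])
detector_probs = np.array([0.05, 0.40, 0.45, 0.10])  # visually ambiguous states 1 and 2
print(rescore(detector_probs, prev_state=1, transition_prior=transition_prior))
```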
MonoVAN: Visual Attention for Self-Supervised Monocular Depth Estimation
Ilia Indyk, HSE University;
Ilya Makarov, AIRI
Toggle Abstract
Depth estimation is crucial in various computer vision applications, including autonomous driving, robotics, and virtual and augmented reality. An accurate scene depth map is beneficial for localization, spatial registration, and tracking. It converts 2D images into precise 3D coordinates for accurate positioning, seamlessly aligns virtual and real objects in applications like AR, and enhances object tracking by distinguishing distances. The self-supervised monocular approach is particularly promising as it eliminates the need for complex and expensive data acquisition setups, relying solely on a standard RGB camera. Recently, transformer-based architectures have become popular for this problem, but while they achieve high quality, they suffer from high computational cost and poor perception of small details because they focus more on global information. In this paper, we propose a novel fully convolutional network for monocular depth estimation, called MonoVAN, which incorporates a visual attention mechanism and applies super-resolution techniques in the decoder to better capture fine-grained details in depth maps. To the best of our knowledge, this work pioneers the use of convolutional visual attention in the context of depth estimation.
Our experiments on outdoor KITTI benchmark and the indoor NYUv2 dataset show that our approach outperforms the most advanced self-supervised methods, including such state-of-the-art models as transformer-based VTDepth from ISMAR’22 and hybrid convolutional-transformer MonoFormer from AAAI’23, while having a comparable or even fewer number of parameters in our model than competitors. We also validate the impact of each proposed improvement in isolation, providing evidence of its significant contribution. Code and weights are available at https://github.com/IlyaInd/MonoVAN.
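The "convolutional visual attention" referred to above is in the spirit of large-kernel attention blocks; the sketch below shows one common formulation (a depthwise convolution, a dilated depthwise convolution, and a pointwise convolution producing an attention map). It is a generic illustration, not claimed to be MonoVAN's exact block.

```python
# Generic large-kernel convolutional attention block in PyTorch, shown only to
# illustrate the kind of 'visual attention' module the abstract mentions; it is
# not claimed to be MonoVAN's exact architecture.
import torch
import torch.nn as nn

class LargeKernelAttention(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.dw = nn.Conv2d(channels, channels, 5, padding=2, groups=channels)
        self.dw_dilated = nn.Conv2d(channels, channels, 7, padding=9,
                                    dilation=3, groups=channels)
        self.pw = nn.Conv2d(channels, channels, 1)

    def forward(self, x):
        attn = self.pw(self.dw_dilated(self.dw(x)))
        return x * attn                        # modulate features by the attention map

x = torch.randn(1, 64, 48, 160)                # dummy feature map
print(LargeKernelAttention(64)(x).shape)       # torch.Size([1, 64, 48, 160])
```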
Meta360: Exploring User-Specific and Robust Viewport Prediction in 360-Degree Videos through Bi-Directional LSTM and Meta-Adaptation
Junjie Li, School of Artificial Intelligence;
Yumei Wang, School of Artificial Intelligence;
Yu Liu, School of Artificial Intelligence
Toggle Abstract
Viewport prediction is a critical aspect of virtual reality (VR) video streaming, directly impacting user experience in adaptive streaming. However, most existing algorithms treat users as homogeneous entities and overlook the variations in user behaviors and video content. Additionally, they often struggle with long-term predictions and intense movement. Our research sheds light on the importance of considering user behavior variations and leveraging advanced techniques to optimize robust viewport prediction in VR video streaming. First, we address these limitations by conducting a comprehensive feature analysis on existing datasets to uncover distinctive user behaviors. Building upon these findings, we propose a novel approach that utilizes the power of Bidirectional Long Short-Term Memory (BiLSTM) networks and meta-learning. The BiLSTM architecture effectively captures long-term dependencies, which can strengthen the robustness of viewport prediction especially in long-term prediction and intense movement. Additionally, meta-learning enables personalized adaptation to individual users’ viewing behaviors. Through extensive evaluations on diverse datasets, our algorithm Meta360 demonstrates superior performance in terms of accuracy and robustness compared to state-of-the-art methods.
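A bidirectional LSTM that maps past head-orientation samples to a future viewport sequence could look roughly like the sketch below. All dimensions, the sampling rate, and the prediction horizon are arbitrary assumptions for illustration, and the meta-adaptation component described in the abstract is omitted.

```python
# Rough PyTorch sketch of a BiLSTM viewport predictor: past yaw/pitch samples in,
# a future yaw/pitch sequence out. Dimensions and horizon are arbitrary
# assumptions for illustration, not the Meta360 configuration.
import torch
import torch.nn as nn

class BiLSTMViewportPredictor(nn.Module):
    def __init__(self, hidden: int = 64, horizon: int = 30):
        super().__init__()
        self.horizon = horizon
        self.lstm = nn.LSTM(input_size=2, hidden_size=hidden,
                            num_layers=2, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, horizon * 2)

    def forward(self, past):                   # past: (batch, T, 2) yaw/pitch
        out, _ = self.lstm(past)
        pred = self.head(out[:, -1])           # use the last time step's features
        return pred.view(-1, self.horizon, 2)

history = torch.randn(4, 90, 2)                # e.g. 3 s of samples at 30 Hz
print(BiLSTMViewportPredictor()(history).shape)   # torch.Size([4, 30, 2])
```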
“Can You Handle the Truth?”: Investigating the Effects of AR-Based Visualization of the Uncertainty of Deep Learning Models on Users of Autonomous Vehicles
Achref Doula, TU Darmstadt;
Lennart Joshua Schmidt, Telecooperation Lab (TK), TU Darmstadt;
Max Mühlhäuser, TU Darmstadt;
Alejandro Sanchez Guinea, TU Darmstadt
Toggle Abstract
The recent advances in deep learning have paved the way for autonomous vehicles (AVs) to take charge of more complex tasks in the navigation process. However, predictions of deep learning models are subject to different types of uncertainty that may put the user and the surrounding environment in danger. In this paper, we investigate the effects that AR-based visualizations of 3 types of uncertainties in deep learning modules for path planning in AVs may have on drivers. The uncertainty types of the deep learning models that we consider are: the waypoint uncertainty, the situation uncertainty, and the path uncertainty. We propose 3 concepts to visualize the 3 uncertainty types on a Windshield display. We evaluate our AR-based concepts with a user study (N=20) using a VR-based immersive environment, to ensure the security of the participants. The results of our evaluation reveal that the absence of uncertainty visualization leads to lower driver engagement. More importantly, the combination of situation uncertainty and path uncertainty visualizations leads to higher driver engagement, and higher trust in the automated vehicle, while inducing an acceptable mental load for the drive.
Multi-modal Classification of Cognitive Load in a VR-based Training System
Srikrishna Subrahmanya Bhat, The University of Queensland;
Chelsea Dobbins, The University of Queensland;
Arindam Dey, University of Queensland;
Ojaswa Sharma, Indraprastha Institute of Information Technology Delhi (IIIT-Delhi)
Toggle Abstract
Training systems are used in many industries, ranging from surgery to space missions to rehabilitation. Virtual Reality (VR) is a technology that has been incorporated as an effective tool in such training systems to simulate the environment, especially in situations where the training cannot take place in the actual environment. For a training environment and task to be effective, it must sufficiently challenge the trainee. One parameter that can be used to measure this is cognitive load (CL), which is defined as the amount of working memory used while performing a learning task. This parameter needs to be sufficiently high to maximize learning but not so high as to overload the trainee. However, the challenge is to detect this state using objective physiological measures, which can be collected during the entire task. This paper presents a study to classify CL using a combination of Electroencephalogram (EEG) and Electrodermal Activity (EDA) signals during a procedural VR training task. Thirty participants undertook a study where they built a designated model within a given time over multiple levels that were constructed to induce low to high CL. Features generated from the data were subject to feature selection (FS), which was undertaken using the Mutual Information (MI) technique. Binary classification models were developed using Support Vector Machines (SVM), Random Forest (RF), k-Nearest Neighbors (kNN), Extreme Gradient Boosting (Xgboost) and Multi-Layer Perceptrons (MLP). Results illustrated that the Xgboost classifier performed the best with an F1-score of 0.831 ± 0.030 and accuracy of 0.805 ± 0.033. SHAP analysis of the features illustrated greater contributions from the frontal and occipital regions of the brain and frequency-domain features from tonic skin conductance.
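The kind of pipeline described above (mutual-information feature selection followed by a gradient-boosted classifier, scored with F1) can be sketched as below. The synthetic features stand in for the EEG/EDA features; the value of k, the model settings, and the cross-validation scheme are arbitrary assumptions, not the study's configuration.

```python
# Illustrative sketch of an MI-based feature selection + XGBoost pipeline,
# evaluated with F1. All data are synthetic and all settings are assumptions.
import numpy as np
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from xgboost import XGBClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 40))        # 120 windows x 40 EEG/EDA features (synthetic)
y = rng.integers(0, 2, size=120)      # binary low/high cognitive load labels

pipeline = make_pipeline(
    SelectKBest(mutual_info_classif, k=15),
    XGBClassifier(n_estimators=200, max_depth=3),
)
scores = cross_val_score(pipeline, X, y, cv=5, scoring="f1")
print(f"F1: {scores.mean():.3f} ± {scores.std():.3f}")
```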
PS24: Collaboration
15:45 AEDT (UTC+11)
Ritchie Theatre
Session Chair: I. Podkosova
Toggle Papers
Gestures vs. Emojis: Comparing Non-Verbal Reaction Visualizations for Immersive Collaboration
Alexander Giovannelli, Virginia Tech;
Jerald Thomas, Virginia Tech;
Logan Lane, Virginia Tech;
Francielly Rodrigues, Virginia Tech;
Doug Bowman, Virginia Tech
Toggle Abstract
Collaborative virtual environments afford new capabilities in telepresence applications, allowing participants to co-inhabit an environment to interact while being embodied via avatars. However, shared content within these environments often takes away the attention of collaborators from observing the non-verbal cues conveyed by their peers, resulting in less effective communication. Exaggerated gestures, abstract visuals, as well as a combination of the two, have the potential to improve the effectiveness of communication within these environments in comparison to familiar, natural non-verbal visualizations. We designed and conducted a user study where we evaluated the impact of these different non-verbal visualizations on users’ identification time, understanding, and perception. We found that exaggerated gestures generally perform better than non-exaggerated gestures, abstract visuals are an effective means to convey intentional reactions, and the combination of gestures with abstract visuals provides some benefits compared to their standalone counterparts.
ARPuzzle: Evaluating the Effectiveness of Collaborative Augmented Reality
Guillaume Bataille, Orange;
Abdelhadi Lammini, Orange;
Jean-Rémy Chardonnet, Arts et Metiers Institute of Technology
Toggle Abstract
Collaborative Augmented Reality (CAR) offers disruptive ways for people to collaborate. However, this emerging technology must improve its acceptance, efficiency, and usability to scale up and, for example, support augmented operations executed by technicians. This paper presents our CAR system and its evaluation in a cooperative puzzle-solving task. Our system provides collaborators with a shared virtual space that allows verbal and non-verbal interpersonal communication, and intuitive interactions with shared virtual replicas of real objects. Our system also integrates avatars embodied by remote users. We conducted a dual-user study comparing colocated and remote solving of a virtual replica of a puzzle with solving the real puzzle. We evaluated task performance, collaboration, mutual awareness, spatial presence, copresence, usability, and preference. We found that, although the real condition is preferred and more efficient than our CAR system, CAR reaches favorable usability levels. We also found that remote augmented reality including full-body avatars offers similar results to colocated augmented reality. This preliminary work paves the way for future research aiming to support and enhance the design and development of Collaborative Augmented Reality systems dedicated to augmented operations.
MRMAC: Mixed Reality Multi-user Asymmetric Collaboration
Faisal Zaman, Victoria University of Wellington;
Craig Anslow, Victoria University of Wellington;
Andrew Chalmers, Victoria University of Wellington;
Taehyun James Rhee, Victoria University of Wellington
Toggle Abstract
We present MRMAC, a Mixed Reality Multi-user Asymmetric Collaboration system that allows remote users to teleport virtually into a real-world collaboration space to communicate and collaborate with local users. Our system enables telepresence for remote users by live-streaming the physical environment of local users using a 360-degree camera while blending 3D virtual assets into the mixed-reality collaboration space. Our novel client-server architecture enables asymmetric collaboration for multiple AR and VR users and incorporates avatars, view controls, as well as synchronized low-latency audio, video, and asset streaming. We evaluated our implementation with two baseline conditions: conventional 2D and standard 360-degree videoconferencing. Results show that MRMAC outperformed both baselines in inducing a sense of presence, improving task performance, usability, and overall user preference, demonstrating its potential for immersive multi-user telecollaboration.
How Visualising Emotions Affects Interpersonal Trust and Task Collaboration in a Shared Virtual Space
Allison Jing, Meta;
Michael Frederick, Meta;
Monica Sewell, Meta;
Amy Karlson, Meta Inc.;
Brian Simpson, Meta;
Missie Smith, Reality Labs Research
Toggle Abstract
Emotion is dynamic. Changes in emotion can be hard to process during face-to-face interaction, and transferring them into a shared virtual space is even more challenging. This research first explores nine visual representations to amplify emotions in a virtual space, leading to a bi-directional emotion-sharing system (FeelMoji i/o). The second study investigates the effect of explicit emotion-sharing on interpersonal trust and task collaboration through three conditions – verbal only, verbal+positive visual, and verbal+honest visual using FeelMoji – framed by four factors (usability, integrity, behaviour, and collaboration). The results indicate that FeelMoji yields frequent emotion consensus as task milestones and positive interdependent behaviours between collaborators, which help develop conversations, affirm decision-making, and build familiarity and trust between strangers. Moreover, we discuss how our study can inspire future investigations of human-AI agent behaviours and large-scale multi-user virtual environments.
What And How Together: A Taxonomy On 30 Years Of Collaborative Human-Centered XR Tasks
Ryan Ghamandi, University of Central Florida;
Yahya Hmaiti, University of Central Florida;
Tam Nguyen, University of Central Florida;
Amirpouya Ghasemaghaei, University of Central Florida;
Ravi Kiran Kattoju, University of Central Florida;
Eugene Matthew Taranta II, University of Central Florida;
Joseph LaViola, University of Central Florida
Toggle Abstract
We present a taxonomy of human-centered collaborative XR tasks. XR technologies have extended into the realm of collaboration, improving the quality and accessibility of teamwork. However, after a comprehensive assessment of the literature on the interaction between XR technologies and collaboration, no comprehensive method that emphasizes task actions and properties exists to classify collaborative tasks. Thus, our suggested taxonomy represents a classification system for collaborative tasks. After conducting a thorough literature review across different research venues, we conducted several exhaustive classification and review cycles for over 800 papers collected, which resulted in 148 papers retained to create the taxonomy. We dissected the actions and properties that the collaborative endeavors and tasks of these papers encompass as well as the types of categorizations and relations these papers illustrate. We expand on the design choices and usage of our taxonomy, followed by its limitations and future work. We built this taxonomy in order to reduce ambiguities and confusion regarding the design and comprehension of human-based collaborative tasks that use XR technology, which could prove useful in aiding the development and understanding of these tasks. Our taxonomy reveals a framework for understanding how collaborative tasks are designed and a systematic way of classifying different methods by which people can collaborate and interact in environments that involve XR, while still promoting efficient communication, teamwork, goal achievement and productivity.
Investigating Psychological Ownership in a Shared AR Space: Effects of Human and Object Reality and Object Controllability
Dongyun Han, Utah State University;
DongHoon Kim, Utah State University;
Kangsoo Kim, University of Calgary;
Isaac Cho, Utah State University
Toggle Abstract
Augmented reality (AR) provides users with a unique social space where virtual objects are natural parts of the real world. The users can interact with 3D virtual objects and virtual humans projected onto the physical environment. This work examines perceived ownership based on the reality of objects and partners, as well as object controllability in a shared AR setting. Our formal user study with 28 participants shows a sense of possession, control, separation, and partner presence affect their perceived ownership of a shared object. Finally, we discuss the findings and present a conclusion.
Exploring the Evolution of Sensemaking Strategies in Immersive Space to Think
Kylie Davidson, Virginia Polytechnic Institute and State University;
Lee Lisle, Virginia Polytechnic Institute and State University;
Kirsten Whitley, US Department of Defense, USA;
Doug A. Bowman, Virginia Polytechnic Institute and State University;
Chris North, Virginia Polytechnic Institute and State University
Toggle Abstract
Existing research on immersive analytics to support the sensemaking process focuses on single-session sensemaking tasks. However, in the wild, sensemaking can take days or months to complete. In order to understand the full benefits of immersive analytic systems, we need to understand how immersive analytic systems provide flexibility for the dynamic nature of the sensemaking process. In our work, we build upon an existing immersive analytic system – Immersive Space to Think – to evaluate how immersive analytic systems can support sensemaking tasks over time. We conducted a user study with eight participants with three separate analysis sessions each. We found significant differences in analysis strategies between sessions one, two, and three, which suggests that Immersive Space to Think can benefit analysts during multiple stages of the sensemaking process.
PS25: Well-being and Health
15:45 AEDT (UTC+11)
CivEng 109
Session Chair: M. Billinghurst
Toggle Papers
A Systematic Review of Immersive Technologies for Physical Training in Fitness and Sports
Thuong Hoang, Deakin University;
Deepti Aggarwal, Deakin University;
Guy Wood-Bradley, Deakin University;
Tsz-Kwan Lee, Deakin University;
Rui Wang, CSIRO;
Hasan Ferdous, The University of Melbourne;
Alexander Baldwin, Suncorp
Toggle Abstract
The increased availability of immersive headsets in the games industry has promoted wide adoption of immersive technology amongst consumers. The benefits of spatial freedom and agency for body movements in virtual reality pave the way for everyday fitness and physical training applications. We conducted a systematic review through the ACM Digital Library, IEEE Xplore, and Scopus to investigate how immersive technologies have been applied for physical training in fitness and sports. Our review included all peer-reviewed papers, published and written in English, discussing all forms of hardware for VR, AR, and MR technologies, targeted towards healthy populations. We excluded applications for clinical populations and treatment of specific diseases, and all non-peer-reviewed articles such as position papers, workshops, tutorials, and magazine columns. From the 144 shortlisted papers, we summarize the development trends of immersive technologies and highlight the challenges of designing immersive technologies for everyday fitness. We also provide recommendations for future work and highlight the need to support better collaboration with industry partners, trainer-trainee experiences, and social dynamics of sports for designing better experiences.
Well-being in Isolation: Exploring Artistic Immersive Virtual Environments in a Simulated Lunar Habitat to Alleviate Asthenia Symptoms
Grzegorz Pochwatko, Institute of Psychology Polish Academy of Sciences;
Wieslaw Kopec, Polish-Japanese Academy of Information Technology;
Justyna Swidrak, Institute of Psychology Polish Academy of Sciences;
Justyna Swidrak, Fundació de Recerca Clínic Barcelona- IDIBAPS;
Anna Jaskulska, Kobo Association;
Kinga H. Skorupska, Polish-Japanese Academy of Information Technology;
Barbara Karpowicz, Polish-Japanese Academy of Information Technology;
Rafał Masłyk, Polish-Japanese Academy of Information Technology;
Maciej Grzeszczuk, Polish-Japanese Academy of Information Technology;
Steven Barnes, Uniwersytet SWPS;
Paulina Borkiewicz, Lodz Film School;
Paweł Kobyliński, National Information Processing Institute;
Michał Pabiś-Orzeszyna, University of Lodz;
Robert Balas, Institute of Psychology, Polish Academy of Sciences;
Jagoda Lazarek, Polish-Japanese Academy of Information Technology;
Florian Dufresne, European Space Agency;
Leonie Bensch, German Aerospace Center (DLR);
Tommy Nilsson, European Space Agency (ESA)
Toggle Abstract
Revived interest in lunar and planetary exploration is heralding a new era for human spaceflight, characterized by frequent strain on astronauts’ mental well-being stemming from increased exposure to isolated, confined, and extreme (ICE) conditions. Whilst Immersive Virtual Reality (IVR) has been employed to facilitate self-help interventions that mitigate the challenges of isolated environments in several domains, its applicability in support of future space expeditions remains largely unexplored. To address this limitation, we administered distinct IVR environments to crew members (n=5) partaking in a simulated lunar habitat study. Utilizing a Bayesian approach to analyze the small-group data, we discovered a significant relationship between IVR usage and a reduction in perceived stress-related symptoms, particularly those associated with asthenia (a syndrome often linked to chronic fatigue and weakness, characterized by feelings of energy depletion or exhaustion that can be amplified in ICE conditions). The reductions were most prominent with the use of interactive virtual environments. The ‘Aesthetic Realities’, virtual environments conceived as art exhibits, received exceptional praise from our participants. These environments mark a fascinating convergence of art and science, holding promise to mitigate effects related to isolation in spaceflight training and beyond.
Active Engagement with Virtual Reality Reduces Stress and Increases Positive Emotions
Irene Kim, Johns Hopkins University;
Ehsan Azimi, Johns Hopkins University;
Peter Kazanzides, Johns Hopkins University;
Chien-Ming Huang, Johns Hopkins University
Toggle Abstract
Stress, anxiety, and depression negatively affect productivity and the global economy, with an estimated annual cost of US$1 trillion according to the World Health Organization. Moreover, prolonged daily stress, even if minor, can lead to severe health consequences, including cancer and various mental disorders. Virtual reality (VR) has been shown to be a promising tool for relieving daily stressors given its accessibility and projected availability compared to visits with mental health professionals. Prior work in this area has mostly focused on the restorative effects of nature simulations, demonstrating that passively experiencing immersive nature scenes improves positive affect. However, aside from providing opportunities for exercise, little is known about how active VR engagement can improve one’s mental health. To address this research gap, this paper presents a new, active form of VR therapy and assesses its effectiveness compared to passive VR experiences. We developed VR Drawing (inspired by art therapy, which promotes positive emotions through artistic creation) and VR Throwing (inspired by “rage rooms”, which allow people to release negative emotions via intentional destruction). In a between-participants study (n=64), we found that both VR Drawing and VR Throwing significantly reduced participants’ stress levels and increased positive affect compared to passively watching nature scenes in VR. Linear regression models suggest that the total number of user interactions positively affects improvement in positive emotions for VR Drawing, but has a negative impact on positive emotions for VR Throwing. This study provides empirical evidence of how active VR experiences may reduce stress and offers guidelines for creating future VR applications to promote psychological well-being.
Augmented Reality Rehabilitative and Exercise Games (ARREGs): A Systematic Review and Future Considerations
Cassidy R. Nelson, Virginia Tech;
Joseph L Gabbard, Virginia Tech
Toggle Abstract
Augmented Reality (AR) and exergames have been trending areas of interest in healthcare spaces for rehabilitation and exercise. This work reviews 25 papers featuring AR rehabilitative/exercise games and paints a picture of the literature landscape. The included results span twelve years, with the oldest paper published in 2010 and the most recent work in 2022. More specifically, this work contributes a bank of representative ARREGs and a synthesis of measurement strategies for player perceptions of Augmented Reality Rehabilitative and Exercise Game (ARREG) experiences, the elements that comprise the exergame experience, the intended use cases of ARREGs, whether participants are actually representative users, the utilized devices and AR modalities, the measures used to capture rehabilitative success, and the measures used to capture participant perceptions. Informed by the literature body, our most significant contribution is nine considerations for future ARREG development.
Exploring the Effects of VR Activities on Stress Relief: A Comparison of Sitting-in-Silence, VR Meditation, and VR Smash Room
Dongyun Han, Utah State University;
DongHoon Kim, Utah State University;
Kangsoo Kim, University of Calgary;
Isaac Cho, Utah State University
Toggle Abstract
In our lives, we encounter various stressors that may cause negative mental and bodily reactions, making us feel frustrated, angry, or irritated. Effective methods to manage or reduce stress and anxiety are essential for a healthy life, and several stress-management approaches have been found useful for stress relief, such as meditation, taking a rest, walking in nature, or even breaking things in a smash room. Previous research has revealed that certain experiences in virtual reality (VR) are as effective for reducing stress as traditional real-world methods. However, it is still unclear how stress-relief effects are associated with other factors, such as individual user profiles, across different treatment activities. In this paper, we report findings from a formal user study that investigates the effects of two virtual activities, (1) VR Meditation and (2) a VR Smash Room experience, compared with a traditional Sitting-in-Silence method. Our results show that VR Meditation has a greater stress-relief effect than VR Smash Room and Sitting-in-Silence, and that the effects of the treatments are correlated with participants’ personalities. We discuss the findings and implications, addressing the potential benefits and impacts of different stress-relief activities in VR.
Beyond Well-Intentioned: HCI Students’ Ethical Assessment of Their Own XR Designs
Veronika Krauß, Bonn-Rhein-Sieg University of Applied Sciences;
Jenny Berkholz, University of Siegen;
Lena Recki, Bonn-Rhein-Sieg University of Applied Sciences;
Alexander Boden, Bonn-Rhein-Sieg University of Applied Sciences
Toggle Abstract
Foreseeing the impact of augmented and virtual reality applications on users and society is challenging. Thus, efforts to establish an ethical mindset and include technology assessment techniques in HCI education are increasing. However, XR educational courses fostering students’ reasoning and perceived responsibility in designing ethical applications are lacking. We therefore developed the explorative design method Reality Composer to investigate and foster students’ assessment of the ethical impact of, and their responsibilities in, XR application design. We conducted a workshop in which 40 international HCI master’s students applied this method and analyzed the resulting application concepts with regard to the students’ ethical assessment. Our findings show that they critically discussed their concepts’ impact and identified potential countermeasures for negative social implications. However, they overestimated the users’ responsibility to use XR applications securely, as well as the impact of a positive design intention. We contribute our findings and the developed method to help understand students’ potential and to derive implications for future course design.
Evaluation of an Immersive COVID-19 Data Visualization
Furkan Kaya, Kadir Has University;
Elif Celik, Kadir Has University;
Anil Ufuk Batmaz, Concordia University;
Aunnoy K. Mutasim, Simon Fraser University;
Wolfgang Stuerzlinger, Simon Fraser University
Toggle Abstract
COVID-19 restrictions have detrimental social and economic effects on the population. However, these restrictions are necessary, as they help reduce the spread of the virus. For the public to comply, easily comprehensible communication between decision makers and the public is crucial. To address this, we propose a novel 3-D visualization of COVID-19 data, which could increase awareness of COVID-19 trends in the general population. We conducted a user study comparing a conventional 2-D visualization with the proposed method in an immersive environment. Results showed that our 3-D visualization approach facilitated understanding of the complexity of COVID-19. A majority of participants preferred to see the COVID-19 data with the 3-D method. Moreover, individual results revealed that our method increases user engagement with the data. We hope that our method will help governments improve their communication with the public in the future.