Year
2024
Season
Fall
Paper Type
Master's Thesis
College
College of Computing, Engineering & Construction
Degree Name
Master of Science in Computer and Information Sciences (MS)
Department
Computing
NACO controlled Corporate Body
University of North Florida. School of Computing
First Advisor
Dr. Kevin Pfeil
Second Advisor
Dr. Karthikeyan Umapathy
Third Advisor
Dr. Corey Pittman
Department Chair
Dr. Zornitza Prodanoff
College Dean
Dr. William F. Klostermeyer
Abstract
Virtual Reality (VR) is increasingly popular, but many barriers exist for individuals with little experience in coding, 3D modeling, or creating their own virtual experiences. Current content-creation tools are often viewed as complex or frustrating, and they exhibit a steep learning curve. This problem presents an opportunity to develop tools incorporating Natural User Interfaces that better support end users. One such tool is the Large Language Model (LLM), which can extract a user's intention from speech or text. We posit that LLMs can better support novice and expert developers alike, and that a human-in-the-loop approach can foster a Human-AI co-creative process. Toward that goal, we created a multimodal virtual reality tool that incorporates a large language model alongside direct manipulation, menus, eye gaze, and speech to facilitate a more natural VR authoring experience. We created a template in Unity3D that can be customized for various tasks, including the construction of a 3D environment and the creation of commands.
In this thesis, we describe a summative research study with 22 participants conducted to assess the usability and future direction of our tool. Participants were tasked with authoring a predefined environment, and the tool received a System Usability Scale (SUS) score of 57, which falls between "OK" and "Good"; this was expected given the flexibility and high degree of freedom within the system. All users indicated some degree of ease of use, but most also found the interaction mechanisms difficult, underscoring the tool's learning curve.
Our results indicate that our multimodal approach, combining a large language model with other 3D user interface modalities, can provide a more intuitive and accessible interface for users. Future research and development will focus on fine-tuning these interactions and expanding the system's capabilities to better support users.
Suggested Citation
Sayed, Ahmed A., "Large language models for multimodal user interaction in a virtual environment" (2024). UNF Graduate Theses and Dissertations. 1300.
https://digitalcommons.unf.edu/etd/1300