Comparison between Full Body Motion Recognition Camera Interaction and Hand Controllers Interaction used in Virtual Reality Exposure Therapy for Acrophobia

Virtual Reality has already been proven as a useful supplementary treatment tool for anxiety disorders. However, no specific technological importance has been given so far on how to apply Virtual Reality with a way that properly stimulates the phobic stimulus and provide the necessary means for lifelike experience. Thanks to technological advancements, there is now a variety of hardware that can help enhance stronger emotions generated by Virtual Reality systems. This study aims to evaluate the feeling of presence during different hardware setups of Virtual Reality Exposure Therapy, and, particularly how the user’s interaction with those setups can affects their sense of presence during the virtual simulation. An acrophobic virtual scenario is used as a case study by 20 phobic individuals and the Witmer–Singer presence questionnaire was used for presence evaluation by the users of the system. Statistical analysis on their answers revealed that the proposed full body Motion Recognition Cameras system generates a better feeling of presence compared to the Hand Controllers system. This is thanks to the Motion Recognition Cameras, which track and allow display of the user’s entire body within the virtual environment. Thus, the users are enabled to interact and confront the anxiety-provoking stimulus as in real world. Further studies are recommended, in which the proposed system could be used in Virtual Reality Exposure Therapy trials with acrophobic patients and other anxiety disorders as well, since the proposed system can provide natural interaction in various simulated environments.


Introduction
Anxiety disorders are common mental health disorders that 19.1% of adults and 31.9% of adolescents in the US suffer from, with 22.8% of them reported as serious cases [1]. As a result of its physical manifestations, anxiety disorders cost an estimated $42 billion per year in the US, approximately one third of the entire US health bill [2]. There are many different types of anxiety studies [27][28][29][30]. In simulations without interaction they can only identify the phobic situation but not directly confront and face the phobic stimuli from an egocentric or an allocentric point of view. In some VRET simulations basic interaction is provided: user input could be strictly gaze-derived [31], while motion feedback is provided via standard interfaces (i.e., mouse, keyboard, joystick [32], gamepad [8]). This way, the user at least controls the events of the simulation and does not remain a passive viewer of anxiety-inducing situations. However, recently it has been underlined that modern VR systems used in VRET should involve the user actively during the simulations by leveraging on motion tracking technologies [33][34][35]. Companies that produce VR device packages, such as Oculus and HTC, use trackable hand controllers to provide in-system interaction and presence to the user. So, by holding the controllers while the user moves their hands, they can see their virtual hands move in the virtual environment in real time [36,37]. There have even been efforts to include wearable motion tracking in VR systems, such as a wearable glove in [38]. Based on theories, any improvement in VRET depends on its ability to adequately initiate fear responses in fearful participants [8]. Key factors of the ability of VR to stimulate fear are the concepts of presence and immersion [17]. On the one hand, presence is a psychological state or subjective perception in which even though part or all of an individual's current experience is generated by and/or filtered through human-made technology, part or all of the individual's perception fails to accurately acknowledge the role of the technology in the experience [39], thus it is a subjective response to a given virtual environment. Immersion, on the other hand, is "a psychological state characterized by perceiving oneself to be enveloped by, included in, and interacting with an environment that provides a continuous stream of stimuli and experiences" [40]. In general, it is assumed that an increase in immersion -and therefore the improvement of the included technologiesincreases the experienced presence, and that an increase in presence leads to stronger fear responses during VRET. Lately, various implementations of VR technology have shown promising improvement regarding immersion and therefore presence of the users in the system [41].

Research Question
In order to improve VRET, we present and propose a more immersive VR system, by adding a motion recognition system that consists of a body recognition camera, a hand recognition camera and a VR headset. By using motion recognition sensors, we manage to avoid interfaces that require unnatural confrontation (i.e., hand controllers, joysticks, etc.) in order to simulate movement and actions, thus allowing the user to feel that they are moving and interacting with their whole body within the virtual environment, in real time, as they would in the real world. In more technologically oriented studies, the use of various tracking tools that can help improve the human experience in the virtual augmented or mixed reality world have been thoroughly studied [42][43][44][45][46]. Nevertheless, there is still much room for research on how to apply these technologies in the fields related to mental disorders and, more specifically, what tool can be used in diverse types of treatment.
In our original work we used only a body recognition camera [47], a demonstration video is available here: youtube.com/watch?v=yEeUvd1-2qQ. In our current work we use body and hand recognition cameras, a demonstration video is available here: youtube.com/watch?v=a0tjB6PMNjg. Therefore, we allow the user to communicate with the system through a most natural user interface, turning their physical actions into data that serve the motion tracking purposes, without adding more wearable equipment that would complicate their movement.
In this study, we compare the aforementioned proposed system with commercially available VR systems which include hand controllers to provide interaction with the virtual environment-in terms of the feeling of presence the user experiences. Our goal is to conclude whether the proposed system enhances immersion and presence and therefore could be of use in future VRET that will involve patients diagnosed with an anxiety disorder, to determine whether it can improve related treatments. In our study, we use a case of acrophobia treatment.

System Description (Hardware and Software)
In both systems, we used the same Desktop Computer as basic equipment that provides the infrastructure to run both systems. Its specifications are: Graphics Card: NVIDIA GeForce GTX 1070, CPU: AMD Ryzen 7 2700X, RAM: 16 GB G.Skill TridentZ DDR4, Video Output: HDMI 1.3, USB Ports: 3 × USB 3.0 and 1 × USB 2.0. In addition, in both systems, an Oculus Rift Virtual Reality Headset provides the virtual environment and two Oculus Rift Sensors track constellations of IR LEDs to translate the movement of the headset within the virtual environment. Peripheral Hardware of the Virtual Reality-Hand Controllers (VR-HC) system: (Figure 1) The system includes two Oculus Rift Touch Controllers, one for each hand, which provide intuitive hand presence and interaction within the virtual environment. Again, the Oculus Rift Sensors track constellations of IR LEDs to translate the movement of the hand controllers within the virtual environment.
treatments. In our study, we use a case of acrophobia treatment.

System Description (Hardware and Software)
In both systems, we used the same Desktop Computer as basic equipment that provides the infrastructure to run both systems. Its specifications are: Graphics Card: NVIDIA GeForce GTX 1070, CPU: AMD Ryzen 7 2700X, RAM: 16 GB G.Skill TridentZ DDR4, Video Output: HDMI 1.3, USB Ports: 3 × USB 3.0 and 1 × USB 2.0. In addition, in both systems, an Oculus Rift Virtual Reality Headset provides the virtual environment and two Oculus Rift Sensors track constellations of IR LEDs to translate the movement of the headset within the virtual environment. Peripheral Hardware of the Virtual Reality-Hand Controllers (VR-HC) system: (Figure 1) The system includes two Oculus Rift Touch Controllers, one for each hand, which provide intuitive hand presence and interaction within the virtual environment. Again, the Oculus Rift Sensors track constellations of IR LEDs to translate the movement of the hand controllers within the virtual environment.
Peripheral Hardware of the Virtual Reality-Motion Camera (VR-MC) system: ( Figure 2) In this system, we use the Astra S model designed by Orbbec, which is a Body Motion Recognition Camera (BMRC). The BMRC is connected to the computer with a USB 2.0 cable. The motion tracking accuracy has an error range of ±1-3 mm from a 1 m distance, whereas, at a 3 m distance, it is estimated at approximately ±12.7 mm. The optimized maximum range is about 6m. In detail, Astra S consists of 2 cameras: a Depth Camera, with image size 640 × 480 (VGA) @ 30FPS and an RGB Camera, with image size 1280 × 720 @ 30FPS (UVC Support), which have 60° horiz × 49.5° vert. (73° diagonal) field of view. As a result of its tracking range, the BMRC is not designed to track movements of small body parts, such as the fingers. For that purpose, we also used the Leap Motion 3D Controller designed by Leap Motion Inc, which is a Hand Motion Recognition Camera (HMRC). The HMRC has 135° field of view and up to 120 fps and 8 cubic feet interactive 3D space. The HMRC is attached to the front VR headset.  The same software tools are used in both systems. Specifically: (a) Operating System: Windows 10, with the drivers for the Astra S, Leap Motion, and Oculus Rift installed; (b) Unity 3D, which is used as the basic program for creating the virtual environment. All the hardware pieces (i.e., sensors, controllers, trackers, headset, camera, etc.) feed Unity 3D with data, which affect the virtual environment that will be presented through the VR headset; (c) Blender 3D Computer Graphics Software is used for creating 3D objects, animated visual effects, UV Mapping, and materials integrated in Unity 3D; (d) Adobe Photoshop is used for creating images for the materials which are inserted in the Blender program; (e) OVRPlugin is used for the operation of the Oculus Rift In this system, we use the Astra S model designed by Orbbec, which is a Body Motion Recognition Camera (BMRC). The BMRC is connected to the computer with a USB 2.0 cable. The motion tracking accuracy has an error range of ±1-3 mm from a 1 m distance, whereas, at a 3 m distance, it is estimated at approximately ±12.7 mm. The optimized maximum range is about 6m. In detail, Astra S consists of 2 cameras: a Depth Camera, with image size 640 × 480 (VGA) @ 30FPS and an RGB Camera, with image size 1280 × 720 @ 30FPS (UVC Support), which have 60 • horiz × 49.5 • vert. (73 • diagonal) field of view. As a result of its tracking range, the BMRC is not designed to track movements of small body parts, such as the fingers. For that purpose, we also used the Leap Motion 3D Controller designed by Leap Motion Inc, which is a Hand Motion Recognition Camera (HMRC). The HMRC has 135 • field of view and up to 120 fps and 8 cubic feet interactive 3D space. The HMRC is attached to the front VR headset. involve patients diagnosed with an anxiety disorder, to determine whether it can improve related treatments. In our study, we use a case of acrophobia treatment.

System Description (Hardware and Software)
In both systems, we used the same Desktop Computer as basic equipment that provides the infrastructure to run both systems. In this system, we use the Astra S model designed by Orbbec, which is a Body Motion Recognition Camera (BMRC). The BMRC is connected to the computer with a USB 2.0 cable. The motion tracking accuracy has an error range of ±1-3 mm from a 1 m distance, whereas, at a 3 m distance, it is estimated at approximately ±12.7 mm. The optimized maximum range is about 6m. In detail, Astra S consists of 2 cameras: a Depth Camera, with image size 640 × 480 (VGA) @ 30FPS and an RGB Camera, with image size 1280 × 720 @ 30FPS (UVC Support), which have 60° horiz × 49.5° vert. (73° diagonal) field of view. As a result of its tracking range, the BMRC is not designed to track movements of small body parts, such as the fingers. For that purpose, we also used the Leap Motion 3D Controller designed by Leap Motion Inc, which is a Hand Motion Recognition Camera (HMRC). The HMRC has 135° field of view and up to 120 fps and 8 cubic feet interactive 3D space. The HMRC is attached to the front VR headset.  The same software tools are used in both systems. Specifically: (a) Operating System: Windows 10, with the drivers for the Astra S, Leap Motion, and Oculus Rift installed; (b) Unity 3D, which is used as the basic program for creating the virtual environment. All the hardware pieces (i.e., sensors, controllers, trackers, headset, camera, etc.) feed Unity 3D with data, which affect the virtual environment that will be presented through the VR headset; (c) Blender 3D Computer Graphics Software is used for creating 3D objects, animated visual effects, UV Mapping, and materials integrated in Unity 3D; (d) Adobe Photoshop is used for creating images for the materials which are inserted in the Blender program; (e) OVRPlugin is used for the operation of the Oculus Rift The same software tools are used in both systems. Specifically: (a) Operating System: Windows 10, with the drivers for the Astra S, Leap Motion, and Oculus Rift installed; (b) Unity 3D, which is used as the basic program for creating the virtual environment. All the hardware pieces (i.e., sensors, controllers, trackers, headset, camera, etc.) feed Unity 3D with data, which affect the virtual environment that will be presented through the VR headset; (c) Blender 3D Computer Graphics Software is used for creating 3D objects, animated visual effects, UV Mapping, and materials integrated in Unity 3D; (d) Adobe Photoshop is used for creating images for the materials which are inserted in the Blender program; (e) OVRPlugin is used for the operation of the Oculus Rift equipment in Unity 3D; (f) Leap Motion Core Assets is used for the operation of the Leap Motion Unity 3D; (g) Nuitrack SDK as the motion recognition middleware between the developed software and the BMRC. Although we use Unity 3D in order to recreate simulations in both systems, for each system there is a different source file, because of the different way the user interacts within each system. Otherwise, both virtual rooms look the same Sensors 2020, 20, 1244 5 of 18 and include the same objects ( Figure 3). We want to keep the minimum amount of differences between both systems, because we aim to compare the two different interaction technologies, so, the rest of the system, both from the hardware and the software point of view, has to be identical. In this study, the user experiences and sees the same virtual room in both systems, apart from the way they interact within each environment.
Sensors 2020, 20, x FOR PEER REVIEW 5 of 19 equipment in Unity 3D; (f) Leap Motion Core Assets is used for the operation of the Leap Motion Unity 3D; (g) Nuitrack SDK as the motion recognition middleware between the developed software and the BMRC. Although we use Unity 3D in order to recreate simulations in both systems, for each system there is a different source file, because of the different way the user interacts within each system. Otherwise, both virtual rooms look the same and include the same objects ( Figure 3). We want to keep the minimum amount of differences between both systems, because we aim to compare the two different interaction technologies, so, the rest of the system, both from the hardware and the software point of view, has to be identical. In this study, the user experiences and sees the same virtual room in both systems, apart from the way they interact within each environment.

Body Recognition
In order to track the user's position during the simulations, the user's body is mapped onto a virtual skeleton that consists of spheres and lines: the spheres represent the joints and the lines represent the bones between joints. It is important to note that the anatomy of the virtual skeleton is not identical to that of the human body, but it serves its purpose to track the user's movements. Each joint is essentially a point in 3D space represented by 3 coordinates: x, y, and z. The y-axis is the height axis and the x-z plane is the floor where the user stands and moves ( Figure 4). The BMRC is placed across the user at the beginning of the axes. Therefore, the user can walk on the x-z plane, and, as long as they remain within the tracking field of the BMRC, and face it, they can see their virtual body appear and move in accordance to their physical movements in the virtual environment. The Astra S BMRC is designed for skeleton tracking and recognizes the following 17 joints of the human body: HEAD, NECK, TORSO, LEFT_COLLAR, LEFT_SHOULDER, RIGHT_SHOULDER, WAIST, LEFT_HIP, RIGHT_HIP, LEFT_ELBOW, RIGHT_ELBOW, LEFT_WRIST, RIGHT_WRIST, LEFT_KNEE, RIGHT_KNEE, LEFT_ANKLE, RIGHT_ANKLE ( Figure 5). By design, it cannot track movements of smaller parts of the body, such as fingers, or the rotation of the hands.
Considering that the realism of the user's experience is considered to affect their feeling of presence, in order to present a more complete representation of the virtual skeleton of the user as

Body Recognition
In order to track the user's position during the simulations, the user's body is mapped onto a virtual skeleton that consists of spheres and lines: the spheres represent the joints and the lines represent the bones between joints. It is important to note that the anatomy of the virtual skeleton is not identical to that of the human body, but it serves its purpose to track the user's movements. Each joint is essentially a point in 3D space represented by 3 coordinates: x, y, and z. The y-axis is the height axis and the x-z plane is the floor where the user stands and moves ( Figure 4). The BMRC is placed across the user at the beginning of the axes. Therefore, the user can walk on the x-z plane, and, as long as they remain within the tracking field of the BMRC, and face it, they can see their virtual body appear and move in accordance to their physical movements in the virtual environment. The Astra S BMRC is designed for skeleton tracking and recognizes the following 17 joints of the human body: HEAD, NECK, TORSO, LEFT_COLLAR, LEFT_SHOULDER, RIGHT_SHOULDER, WAIST, LEFT_HIP, RIGHT_HIP, LEFT_ELBOW, RIGHT_ELBOW, LEFT_WRIST, RIGHT_WRIST, LEFT_KNEE, RIGHT_KNEE, LEFT_ANKLE, RIGHT_ANKLE ( Figure 5). By design, it cannot track movements of smaller parts of the body, such as fingers, or the rotation of the hands.      Both the Body Motion Recognition Camera (BMRC) and the Hand Motion Recognition Camera (HMRC) track the position of the wrist. In order to avoid any conflicts regarding the representation of the position of the wrists, and, therefore, the rest of the hand, in the virtual environment, we have set a higher priority for the position tracked by the Leap Motion HMRC. That means that if both the HMRC and the BMRC receive information about the position of the wrist, the system uses the one received by the HMRC. The position information of the BMRC is used only if the system receives no data from the HMRC, which can occur in case the user's hands are not in the field of the HMRC.

System Setup
The outlined green area is about 5m 2 and will be referred to as the "VR-HC User Action Area". Within this area, the user wears the Oculus Rift Virtual Reality Headset and holds the Oculus Rift Touch Controllers, one in each hand, with which they interact within the virtual environment. In front of the VR-HC User Action Area, two Oculus Rift Sensors have been placed, approximately 3 Considering that the realism of the user's experience is considered to affect their feeling of presence, in order to present a more complete representation of the virtual skeleton of the user as they move within the virtual environment, it was critical to include the Leap Motion HMRC to the system, which can recognize the following 25 joints of the human hand and wrist: (a) 2 joints for the wrist-one for each edge; (b) 1 joint for the palm; (c) 4 joints to mark the edges of the distal, intermediate, and proximal phalanges of each finger; (d) 2 joints to mark the edge of the metacarpal phalanges ( Figure 6).
Sensors 2020, 20, x FOR PEER REVIEW 6 of 19 they move within the virtual environment, it was critical to include the Leap Motion HMRC to the system, which can recognize the following 25 joints of the human hand and wrist: (a) 2 joints for the wrist-one for each edge; (b) 1 joint for the palm; (c) 4 joints to mark the edges of the distal, intermediate, and proximal phalanges of each finger; (d) 2 joints to mark the edge of the metacarpal phalanges ( Figure 6).   Both the Body Motion Recognition Camera (BMRC) and the Hand Motion Recognition Camera (HMRC) track the position of the wrist. In order to avoid any conflicts regarding the representation of the position of the wrists, and, therefore, the rest of the hand, in the virtual environment, we have set a higher priority for the position tracked by the Leap Motion HMRC. That means that if both the HMRC and the BMRC receive information about the position of the wrist, the system uses the one received by the HMRC. The position information of the BMRC is used only if the system receives no data from the HMRC, which can occur in case the user's hands are not in the field of the HMRC.

System Setup
The outlined green area is about 5m 2 and will be referred to as the "VR-HC User Action Area". Within this area, the user wears the Oculus Rift Virtual Reality Headset and holds the Oculus Rift Touch Controllers, one in each hand, with which they interact within the virtual environment. In front of the VR-HC User Action Area, two Oculus Rift Sensors have been placed, approximately 3 Both the Body Motion Recognition Camera (BMRC) and the Hand Motion Recognition Camera (HMRC) track the position of the wrist. In order to avoid any conflicts regarding the representation of the position of the wrists, and, therefore, the rest of the hand, in the virtual environment, we have set a higher priority for the position tracked by the Leap Motion HMRC. That means that if both the HMRC and the BMRC receive information about the position of the wrist, the system uses the one received by the HMRC. The position information of the BMRC is used only if the system receives no data from the HMRC, which can occur in case the user's hands are not in the field of the HMRC.

System Setup
The outlined green area is about 5m 2 and will be referred to as the "VR-HC User Action Area". Within this area, the user wears the Oculus Rift Virtual Reality Headset and holds the Oculus Rift Touch Controllers, one in each hand, with which they interact within the virtual environment. In front of the VR-HC User Action Area, two Oculus Rift Sensors have been placed, approximately 3 meters apart from each other and rotated appropriately in order to face the VR-HC User Action Area. Last, the Desktop Computer is placed on a table in front of the VR-HC User Action Area, and all cables from the peripheral devices end up there, for the operation of the system (Figure 7). The outlined area is about 5 m 2 and will be referred to as the "VR-MC User Action Area". Within this area, the user wears the Oculus Rift Virtual Reality Headset, on which we have attached the HMRC (Figure 7). The user, unlike in the VR-HC system, does not need to hold controllers or have any piece of hardware attached to their body in order to interact within the virtual environment, apart from the VR headset. The user can move and interact freely within the virtual room, while the BMRC and the HMRC track their movements. The Motion Recognition Camera is placed 1 m in front of the VR-MC User Action Area. Last, the Desktop Computer is placed on a table in front of the VR-MC User Action Area, and all cables from the peripheral devices end up there, for the operation of the system (Figure 8). area, the user wears the Oculus Rift Virtual Reality Headset, on which we have attached the HMRC (Figure 7). The user, unlike in the VR-HC system, does not need to hold controllers or have any piece of hardware attached to their body in order to interact within the virtual environment, apart from the VR headset. The user can move and interact freely within the virtual room, while the BMRC and the HMRC track their movements. The Motion Recognition Camera is placed 1 m in front of the VR-MC User Action Area. Last, the Desktop Computer is placed on a table in front of the VR-MC User Action Area, and all cables from the peripheral devices end up there, for the operation of the system (Figure 8).

Method and Procedure
We investigated the differences in presence between the VR-HC and VR-MC systems in conjunction with two experiments, one for each system, in a 45-minute session. A total of 20 people (12 men and 8 women, ages 18-30, avg. age 24) served as participants. They were recruited in the National Technical University of Athens campus in Zografou, Athens, Greece and diagnosed with acrophobia symptoms by Psychiatrists, Professor George Alevizopoulos MD. The study was outlined area is about 5 m 2 and will be referred to as the "VR-MC User Action Area". Within this area, the user wears the Oculus Rift Virtual Reality Headset, on which we have attached the HMRC (Figure 7). The user, unlike in the VR-HC system, does not need to hold controllers or have any piece of hardware attached to their body in order to interact within the virtual environment, apart from the VR headset. The user can move and interact freely within the virtual room, while the BMRC and the HMRC track their movements. The Motion Recognition Camera is placed 1 m in front of the VR-MC User Action Area. Last, the Desktop Computer is placed on a table in front of the VR-MC User Action Area, and all cables from the peripheral devices end up there, for the operation of the system (Figure 8).

Method and Procedure
We investigated the differences in presence between the VR-HC and VR-MC systems in conjunction with two experiments, one for each system, in a 45-minute session. A total of 20 people (12 men and 8 women, ages 18-30, avg. age 24) served as participants. They were recruited in the National Technical University of Athens campus in Zografou, Athens, Greece and diagnosed with acrophobia symptoms by Psychiatrists, Professor George Alevizopoulos MD. The study was

Method and Procedure
We investigated the differences in presence between the VR-HC and VR-MC systems in conjunction with two experiments, one for each system, in a 45-minute session. A total of 20 people (12 men and 8 women, ages 18-30, avg. age 24) served as participants. They were recruited in the National Technical University of Athens campus in Zografou, Athens, Greece and diagnosed with acrophobia symptoms by Psychiatrists, Professor George Alevizopoulos MD. The study was approved by the Ethics Committee of the National Technical University with protocol number #51105.
All 20 participants used both the systems, successively. In order to avoid the possibility that the order of using the systems could make any difference, the procedure was randomized: half of the participants first used the VR-HC and then the VR-MC system, and the other half in the reverse order. This assignment was performed randomly upon recruiting each participant. Both experiments required participants to perform the same simple tasks of traversing a room, going out on a balcony and touching an object that was hanging from the ceiling at a considerable distance from the balcony, by using the moving mechanisms of each of the two systems, i.e., the Hand Controllers and the Motion Recognition System. The purpose of the study is to compare how different interaction methods affect the feeling of presence of the user. Therefore, the virtual environment was designed simply and in the same way in both systems, since it is not the issue under study and we did not want the participants to focus on it.
The study took place at the Biomedical Engineering Laboratory of the National Technical University of Athens, Greece. All of the 20 participants performed the same procedure, during a single 45-minute session. The aim of the session was to use both the VR-HC and VR-MC systems and then compare their experiences by answering a Witmer-Signer presence questionnaire for each system. The session consisted of 5 phases: • Phase 1: The participant was informed about the study and its procedure (10 mins). • Phase 2: The simulation of the first system took place (5 mins). The participant, in this simulation, wears the Oculus Rift Virtual Reality Headset, holds the Hand Controllers and goes to the center of the VR-HC User Action Area, where they stand until the simulation starts ( Figure 9). Then, the participant will find themselves in a virtual environment; an apartment room, located on a high floor, with an open balcony door. The participant can see the controls that they are holding in the real world materialize in the virtual room and move in real-time, according to the actual movements of their hands. With them, they can interact within the virtual room. The participant has three consecutive tasks to execute. They have to walk across the room, towards the open balcony door and get out, on the balcony, touch the railing of the balcony with their two controllers and stay in this state for a few minutes. Then, they have to catch the object hanging before them, at a significant distance ahead of the balcony railing. When they do so, they can turn back, inside the room, and the task is considered completed. Last, the participant removes the Headset and the Controllers and returns to the real world.  The participant, in this simulation, wears the Oculus Rift Virtual Reality Headset and goes to the center of the VR-MC User Action Area, where they stand until the simulation starts ( Figure 10). Then, the participant will find themselves in a virtual environment; an apartment room, located on a high floor, with an open balcony door. The participant can see a visualization of their body's main skeleton materialize in the virtual room and move in real-time, according to their actual movements. They can interact within the virtual environment with their entire body. Again, the participant has the same three consecutive tasks to execute. They have to walk across the room, towards the open balcony door and get out, on the balcony, touch the railing of the balcony with their two controllers, and stay in this state for a few minutes. Then, they have to catch the object hanging before them, at a significant distance ahead of the balcony railing. When they do so, they can turn back, inside the room, and the task is considered completed. Last, the participant removes the Headset and returns to the real world. A demonstration video is available here: youtube.com/watch?v=9Jqy6GGwg4o.

Measures
The system used in each experiment was evaluated by the five factors derived by the Presence Questionnaire that was introduced by Witmer and Singer in 1998 [40]. The questionnaire has been revised so as to not include questions irrelevant to the study; in particular, questions about sound feedback have been removed. Each of the questions are relevant to one of the five factors that reflect the level of presence: Control, Sensory, Involvement, Realism, and Distraction factors. More information about the included questions and their significance to the study can be found in the Appendix. Following, are the five Factors we used to determine the presence each system offers, according to the Witmer-Singer Questionnaire: • The Sensory Factor depends on sensory modality, environmental richness, multimodal presentation, consistency of multimodal information, degree of movement perception, and active search. Those criteria relate mostly to the virtual environment itself, and, since the virtual environment used in both system simulations is the same, we do not expect this factor to differentiate the systems. • The Realism Factor depends on the realism of the scenes in the virtual environment, the consistency of information with the between the physical and the virtual world, as well as the meaningfulness of the experience for the user and the disorientation after they return to the physical world, at the end of the simulation. • The Control Factor reflects the degree of control the participant felt they had during the simulation with each system, depending on immediacy of control, anticipation, mode of control, and modifiability of the physical environment. • The Involvement and Distraction Factors are the ones of highest priority for the differentiation between the two systems, and completely aligned with the focus of the study. More specifically, involvement depends on the degree the user feels they interact naturally within the virtual environment, as well as feeling part of it. The degree of distraction depends on isolation of the user from their actual, physical environment, the user's selective attention toward the simulation and not their physical world, as well as interface awareness, which depends on the interface devices.

Results
The same Questionnaire was used for evaluation of both the VR-HC and VR-MC system. A questionnaire was handed out to each participant right after they completed the session with the respective system. In this section, we present quantitative and qualitative results derived from the data collected from the Witmer-Singer Questionnaire of each of the VR-HC and VR-MC systems. For each participant, we have derived the value of the aforementioned five factors, by calculating the mean of specific questions for each factor. The questions relevant to each factor, as suggested by Witmer-Singer, go as follows: 1. Control Factor: the mean of the questions 1, 2, 3,5,7,8,9,13,15,16, and 17.

Measures
The system used in each experiment was evaluated by the five factors derived by the Presence Questionnaire that was introduced by Witmer and Singer in 1998 [40]. The questionnaire has been revised so as to not include questions irrelevant to the study; in particular, questions about sound feedback have been removed. Each of the questions are relevant to one of the five factors that reflect the level of presence: Control, Sensory, Involvement, Realism, and Distraction factors. More information about the included questions and their significance to the study can be found in the Appendix A. Following, are the five Factors we used to determine the presence each system offers, according to the Witmer-Singer Questionnaire: • The Sensory Factor depends on sensory modality, environmental richness, multimodal presentation, consistency of multimodal information, degree of movement perception, and active search. Those criteria relate mostly to the virtual environment itself, and, since the virtual environment used in both system simulations is the same, we do not expect this factor to differentiate the systems.

•
The Realism Factor depends on the realism of the scenes in the virtual environment, the consistency of information with the between the physical and the virtual world, as well as the meaningfulness of the experience for the user and the disorientation after they return to the physical world, at the end of the simulation.

•
The Control Factor reflects the degree of control the participant felt they had during the simulation with each system, depending on immediacy of control, anticipation, mode of control, and modifiability of the physical environment.

•
The Involvement and Distraction Factors are the ones of highest priority for the differentiation between the two systems, and completely aligned with the focus of the study. More specifically, involvement depends on the degree the user feels they interact naturally within the virtual environment, as well as feeling part of it. The degree of distraction depends on isolation of the user from their actual, physical environment, the user's selective attention toward the simulation and not their physical world, as well as interface awareness, which depends on the interface devices.

Results
The same Questionnaire was used for evaluation of both the VR-HC and VR-MC system. A questionnaire was handed out to each participant right after they completed the session with the Sensors 2020, 20, 1244 10 of 18 respective system. In this section, we present quantitative and qualitative results derived from the data collected from the Witmer-Singer Questionnaire of each of the VR-HC and VR-MC systems. For each participant, we have derived the value of the aforementioned five factors, by calculating the mean of specific questions for each factor. The questions relevant to each factor, as suggested by Witmer-Singer, go as follows: 1.

4.
Realism Factor: the mean of the questions 7 and 9. 5.
The way in which the values of these factors are interpreted is as follows: the values of each factor range between 0 and 7, using the same seven-point scale format suggested by Witmer and Singer. If the value of a factor is high, then the system is considered to provide a significant sense of presence, regarding this specific factor. Respectively, if the value of a factor is low, then the system is considered to provide a low feeling of presence, based on that factor. First, we calculated the Mean and Standard Deviation values for each Factor of each participant (Table 1). Then a paired samples t-test was conducted between the VR-HC and VR-MC systems for each of the factors. The significance level (alpha value) is a = 0.05 and the degrees of freedom are df = 19. So, from the Table of (Table 2). That was expected, since questions regarding the Realism and Sensory Factors regard the quality of the virtual environment which was equal in both systems. However, even in those two factors, a significant statistical difference can be observed, because of questions 4, 10, and 21 at the Sensory factor and question 7 at the Realism factor. Those questions regard movement and interaction within the virtual environment, as well as similarities with the real world.
By calculating the mean of each factor for each participant, according to Table 3 and Figure 11, we can observe from an external viewpoint the difference between the VR-MC and VR-HC System. The factors of VR-MC received higher values and, consequently, the VR-MC system generates a significant sense of presence. Observing the results of our statistical analysis, as well as the significant statistical difference between the means of each of the five factors taken into consideration, we can conclude that the proposed system indeed accentuates the feeling of presence of the users.    In this experiment, we examined the impact of presence and interaction during a VRET simulation for acrophobia by comparing a standard VR hand-controller interactive system with a VR full-body motion-capture interactive system. Cross-examining both the information gathered during the relevant literature review and the findings from the study, it can be concluded that the use of the Body and Hand Motion Recognition Cameras in the VR-MC system overall is preferred by users, in comparison to the VR-HC system. By examining the significant statistical differences of the results, Figure 11. Comparison of the mean of the Control, Sensory, Involvement, Realism, and Distraction Factors between the VR-HC and VR-MC systems.
In this experiment, we examined the impact of presence and interaction during a VRET simulation for acrophobia by comparing a standard VR hand-controller interactive system with a VR full-body motion-capture interactive system. Cross-examining both the information gathered during the relevant literature review and the findings from the study, it can be concluded that the use of the Body and Hand Motion Recognition Cameras in the VR-MC system overall is preferred by users, in comparison to the VR-HC system. By examining the significant statistical differences of the results, especially for the Involvement and Distraction factors, it is safe to conclude that the fact that the motion tracking in the VR-MC system is performed remotely, improves the quality of the simulation, by making the user feel more present and immersed to the virtual environment. Therefore, VR systems can include movement, as well as full body presence without requiring any additional effort on the user's end, in contrast with current systems, which make movement tracking the user's responsibility, by adding wearable sensors. In conclusion, the addition of motion recognition cameras was proven of vital importance for the user's feeling of presence.

Limitations
Participants: The sample of the study consisted of only young people (ages 18-30, avg. age 24) which could possibly affect the results since young people are usually more familiar with VR technologies and applications. This means that they may become accustomed to the virtual environment and the experience of moving within it, which of course affects their responses to the questionnaire. In addition, the sample was relatively small, consisting only of 20 people, meaning that results could differ at larger scales.
Procedure: The experiment procedure was short in duration and included only two simple tasks for each participant: moving within the virtual environment and interacting with an object within it. Adding more tasks, as well as including more of the participants' senses in the tasks could increase their feeling of presence within the systems and help them develop a more holistic opinion about each one of them. In addition, by giving them the chance to use each system in more than one sessions could help them learn how to navigate within each system better. However, the point of this study is, among other things, to highlight the importance of ease when it comes to motion control on behalf of the user, even from their first interaction with the system, since presence may be diminished until the motion controllers are well learnt [48].
System Limitations: Indeed, as with every technological device used for interaction in VR systems, there is no guarantee that all the users will find it equally convenient and immersive. This can be due to their personal experience with technology, which can help them get accustomed to each system more easily or not, as well as the device's limitations, which are presented below.
The Oculus Hand Controllers were designed to simulate the virtual hands in a very realistic manner, by tracking the position of the controllers, as they move along with the user's hand movements [49]. Even though the user cannot see their hands in the virtual room, they can see the Hand Controllers moving in real time in it, according to their physical movements. Oculus also provides the option to show actual hands instead of the hand controllers. However, it is important to point out that the hand controllers do not really recognize the hands, even though hands are visible in the virtual environment. In reality, the hand visualization is in accordance with the buttons of the Hand Controllers, which are touch-sensitive. Once the user touches a button in the virtual room, it seems that their virtual finger is moving, but if they move their fingers without touching the buttons, no finger movement appears in the virtual room.
As mentioned in the System Description, the Astra S Model has a range of 0.6-8.0m. So, the Camera Sensor can only recognize the body and limbs of the users, while smaller movements of the hands and fingers are not distinguishable; the hands are visualized as spheres. At the same time, the position tracking accuracy of each point is reported by the camera's Confidence Factor; if it is above 80%, then it can be considered accurate. If the user stays within 1-3 m from the camera, the tracking is accurate more than 80% of the time. That does not mean that 20% of the time the virtual body is completely out-of-sync or lost in the virtual environment; it means that the virtual movements are not actually not 100% synchronized for a few milliseconds with the user's actual movements. The difference in position accuracy could be a few millimeters above or below the actual position. The cases of measurements with low confidence factor that can be noticed by the user occur when the user is "hiding" their limbs from the camera and the camera cannot track their joints (i.e., if they place hand behind their back). In other words, the camera works as a mirror; it reflects whatever it sees on a first layer. Of course, the tasks assigned to the participants in this study do not require such movements; all actions that take place can be recognized by the camera. A solution to this issue could be to add another BMRC to the motion tracking system, so that movements that take place away from the first one can still be tracked.
The Leap Motion HMRC allows us to include hands and fingers in the user's virtual skeleton. Attaching the Leap Motion HMRC to the front of the VR headset is suggested by the manufacturer of the controller, Leap Motion, and a special case for the controller that can be stuck to the front of the VR headset is provided. That particular placement of the controller allows the user to view their virtual hands when they are placed in front of them, and the user is looking toward them. That manner of tracking covers most cases of movement when the hands of the user could be in their visual field, even in real life. A scenario in which their hand could not be visible in the virtual environment could occur if the user looked at a different direction than the one their hand is and moves. In real life, thanks to peripheral vision, the user could see their hand work, but in the virtual environment that will not be possible, since the hand will not be in the tracking field of the Leap Motion HMRC.
Additionally, it is important to point out that tracking the movements of a user accurately in order for them to interact with objects in the virtual environment is not enough to increase presence. Touch is a significant sense of the human body and creating a user interface that could apply touch is crucial to improve virtual reality presence. For our current research, Leap Motion HMRC was the middle step, in order to find the value of life-like virtual interaction but does not provide a life-like touch sense of interaction. Current technologies such as Gloves, Suits, and Hand Controllers provide mechanical feedback which is relatively uncomfortable and falls under the original assertion of this study. Although forthcoming technologies such as ultrahaptic feedback [50] could provide the seance of touch without out wearable equipment.

Discussion
After examining the results of our study, we concluded that transferring the means of interaction away from the user and making it the system's responsibility to provide interaction resulted in increased presence. Additional and wearable equipment adds an extra burden on the user, affecting not only the interaction but also the user's behavior during the simulation, and, consequently the way they confront the stimuli. Overall, if properly tracked and presented, the hands are the most natural user interface a system can use to allow the user to interact within it, while the presence of the whole body inside the virtual world gives the user a sense of their virtual existence.
Furthermore, the purpose of our system is to be used by psychologists and psychiatrists as an additional tool to help their patients learn useful skills in order to confront their source of anxiety; in this case, acrophobia. A person that already finds themselves in an anxious environment should be able to at least interact within it in a natural manner and not face the extra anxiety of using the simulation system correctly. This is another reason why it is preferable to include non-wearable tracking methods in such systems.
By analyzing the results of our study, we observed that even though the feeling of presence provided by the VR-MC system was considerably better than that of the VR-HC system, there are still several issues to solve and obstacles to overcome until we come up with a completely immersive VR system for the treatment of phobias and mental illnesses in general. The fact that we are using a single BMRC means that in case the user would have to engage in more complex tasks that require moving in a wider virtual environment, the range of the camera would not be enough. Hence, we propose for further study the addition of more than one BMRCs to provide motion tracking from different angles and allow the user to move completely freely in the enclosing space. Last but not least, reviews [35] highlight that using wireless VR headsets could ensure more comfortable navigation for users in the physical environment during the simulations, so, we also suggest that wireless or mobile VR headsets shall be used in future versions of such systems to examine this matter.
Overall, considering the technical issues of Standard ET such as affordability, the time needed to applicate it, as well as the fact that it is impossible for clinicians to acquire every phobic stimulus possible to apply it, VRET seems to be at least as helpful as Standard ET. Acquisition costs of VR systems have dropped significantly, making it possible for VRET to be applied in a larger scale in clinicians' offices. By continuing researching the potential of systems like the one under study, we could come up with better combination of sensors to actually enhance the treatment methods and help more patients battling mental illness in the long run.

Appendix A
The questionnaires for both the VR-HC and VR-MC experiments had the exact same questions, which are as follows ( Figure A1). The only difference were the titles: (i) Presence Questionnaire for the VR-HC System Simulation and (ii) Presence Questionnaire for the VR-MC System Simulation. The answer to each question was given in the same seven-point scale format that Witmer and Singer suggested. Questions regarding the auditory aspects of the environment were removed from the questionnaire, since there were no sounds involved in the virtual environments. These questions were: "How much did the auditory aspects of the environment involve you?", "How well could you identify sounds?", and "How well could you localize sounds?". Questions marked with an asterisk (*) regard the quality of the visual aspects of the virtual environment rather than the means of interaction within it. The virtual environment was of the same quality and content in both the VR-HC and VR-MC systems, so it is expected that the users' answers to those questions will not differ significantly between the two systems. As explained in the results section, our expectation was correct. The answers of questions marked with a (C) (6, 10, 18, and 19) were taken into account based on their complements in the scale of 1 to 7 (1 = 7, 2 = 6, 3 = 5, 4 = 4, 5 = 3, 6 = 2, 7 = 1) because the answer scale of said questions had the opposite orientation, with the positive meaning being on the left side instead of the right-i.e., it is positive for a control device to not interfere at all, and for the user to answer the question with 1. you?", "How well could you identify sounds?", and "How well could you localize sounds?". Questions marked with an asterisk (*) regard the quality of the visual aspects of the virtual environment rather than the means of interaction within it. The virtual environment was of the same quality and content in both the VR-HC and VR-MC systems, so it is expected that the users' answers to those questions will not differ significantly between the two systems. As explained in the results section, our expectation was correct. The answers of questions marked with a (C) (6, 10, 18, and 19) were taken into account based on their complements in the scale of 1 to 7 (1 = 7, 2 = 6, 3 = 5, 4 = 4, 5 = 3, 6 = 2, 7 = 1) because the answer scale of said questions had the opposite orientation, with the positive meaning being on the left side instead of the right-i.e., it is positive for a control device to not interfere at all, and for the user to answer the question with 1.
Sensors 2020, 20, x FOR PEER REVIEW 16 of 19 Figure A1. The we questioner participants answered after both VR-MC and VR-HC experiments.