Mind the Depth: Visual Perception of Shapes Is Better in Peripersonal Space

Closer objects are invariably perceived as bigger than farther ones and are therefore easier to detect and discriminate. This is so deeply grounded in our daily experience that no question has been raised as to whether the advantage for near objects depends on other features (e.g., depth itself). In a series of five experiments (N = 114), we exploited immersive virtual environments and visual illusions (i.e., Ponzo) to probe humans’ perceptual abilities in depth and, specifically, in the space closely surrounding our body, termed peripersonal space. We reversed the natural distance scaling of size in favor of the farther object, which thus appeared bigger, to demonstrate a persistent shape-discrimination advantage for close objects. Psychophysical modeling further suggested a sigmoidal trend for this benefit, mirroring that found for multisensory estimates of peripersonal space. We argue that depth is a fundamental, yet overlooked, dimension of human perception and that future studies in vision and perception should be depth aware.

, possibly coordinating automatic defensive behavior whenever necessary (Graziano & Cooke, 2006). Furthermore, objects lying in proximity to the body might more often be candidates for manipulation, and thus, the enhanced PPS processing might reflect an attempt to maximize prehension efficiency or any voluntary action toward these objects (Brozzoli, Ehrsson, & Farnè, 2014; Brozzoli, Gentile, & Ehrsson, 2012). The functional linkage between PPS and actions, supported by neurophysiological and anatomical evidence from primate work (for a review, see Makin, Holmes, Brozzoli, & Farnè, 2012), prompted the idea that visual processing in PPS would mainly rely on the dorsal visual stream, optimized for action, whereas visual processing beyond it, in extrapersonal space (EPS), would mainly rely on the ventral stream, optimized for perception (Milner & Goodale, 2008; Previc, 1990).
This presupposed division of labor implies that object detection would be more efficient for stimuli appearing close to the body, in light of the recruitment of parietal networks tapping into magnocellular processing (Milner & Goodale, 2008). This has been generally confirmed (de Gonzaga Gawryszewski et al., 1987; Plewan & Rinkenauer, 2017). In contrast, object discrimination would be more efficient for stimuli appearing far from the body, in light of the enhanced reliance on a ventral, parvocellular pathway (Goodale & Milner, 1992). Because retinal size scales with physical distance, it appears sound to ascribe perceptual processing in EPS to a subset of neurons that present higher spatial resolution (Goodale & Milner, 1992). However, to be appropriate, automatic defensive reactions to objects in the PPS require the brain to quickly discern whether objects are indeed harmful (e.g., bees) or not (e.g., ladybugs). Similarly, voluntary appetitive actions on objects in the PPS would require discriminating between the shapes of the objects. We therefore hypothesized that object discrimination may also benefit from PPS processing. To date, whether object-discrimination abilities are superior in PPS or EPS remains an open question.
Here, we capitalized on immersive virtual environments that, compared with 2-D settings, provide clear depth percepts. We presented geometric shapes either close (50 cm, within PPS) or far (300 cm, in EPS) from healthy volunteers engaged in a shape-discrimination task (depth was thus irrelevant and orthogonal to the task at hand). Our aims were (a) to compare discrimination abilities in PPS and EPS when retinal-size scaling is artificially teased apart, (b) to explore the determinants of any depth-related difference (i.e., perspective vs. binocular cues), and (c) to model the spatial distribution of discrimination abilities in depth.
In the first experiment, we found that discrimination abilities are superior for stimuli presented in PPS compared with stimuli presented outside PPS, despite far stimuli having the same retinal size (thus looking bigger). In Experiment 2, we found that this advantage persists in a 2-D setting exploiting perspective cues (i.e., in the context of the Ponzo illusion), thus showing that binocular depth cues are not necessary for the PPS advantage to emerge. Experiment 3 further replicated results from the first experiment, ruling out a potential confound related to upper/lower visual field covariance with depth; that is, stimuli were presented at the same height (at fixation). In Experiment 4, retinal size was naturally scaled as a function of distance, allowing us to estimate the typical strength of the PPS advantage in more ecological settings. Finally, in Experiment 5, we presented shapes at six different distances and found that performance benefits follow a sigmoidal trend, closely mirroring that found in studies using multisensory integration to probe PPS boundaries (Canzoneri et al., 2012; Ferri et al., 2015; Teneggi et al., 2013).

Participants
Participants were healthy volunteers who were enrolled in the study after we obtained informed written consent. They were all students of the University Claude Bernard of Lyon, were recruited through web advertising, and were paid for their participation. None of the participants had a history of neurologic or psychiatric disorders, and the vision of all participants was normal or corrected to normal.
We had no prior beliefs or pilot data to estimate a realistic effect size. We recruited 20 participants for Experiment 1 because this number reflects the average sample size in similar PPS studies. Once results were obtained, a power analysis (paired-samples t test, Cohen's d = 0.6, α = .05, one-tailed) indicated a minimum of 19 participants to reach a power of .8 (the effect size from Experiment 1 was computed as if reflecting a between-participants design, and thus, this power analysis revealed itself to be conservative). About 20 participants were thus enrolled for each of the following experiments, except for Experiment 2, which was performed concurrently with a parallel experiment that required a larger sample size. The recruitment was made independently for each of the five experiments, but recruitments for Experiments 3 and 4 were made in parallel, and a few participants completed both experiments; those participants always performed Experiment 3 before Experiment 4. In no case were optional stopping procedures applied; the experiments ended either because the prespecified number of participants was reached or (in Experiment 2) because other experiments running in parallel stopped as well. Thus, the significance of the results was never considered as a criterion to stop or continue data collection. A summary of demographic information for each experiment is reported in
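The sample-size figure quoted above can be approximated with a short calculation. The sketch below is not the authors' procedure (the software they used is not stated here); it uses a normal approximation to the paired-samples t test plus a common small-sample correction term, and the function name `approx_n_paired_t` is hypothetical.

```python
import math
from statistics import NormalDist


def approx_n_paired_t(effect_size, alpha=0.05, power=0.80, one_tailed=True):
    """Approximate sample size for a paired-samples (one-sample) t test.

    Assumption: normal approximation with a small-sample correction,
    not an exact noncentral-t computation.
    """
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha) if one_tailed else z(1 - alpha / 2)
    z_beta = z(power)
    # Sample size under the normal approximation
    n_normal = ((z_alpha + z_beta) / effect_size) ** 2
    # Correction term compensating for the heavier tails of the t distribution
    return math.ceil(n_normal + z_alpha ** 2 / 2)


# Values taken from the text: d = 0.6, alpha = .05 (one-tailed), power = .8
n_required = approx_n_paired_t(0.6, alpha=0.05, power=0.80, one_tailed=True)
print(n_required)
```

With the parameters reported in the text, this approximation agrees with the minimum of 19 participants; an exact power routine may differ by a participant or so for other inputs.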

Materials and apparatus
We adapted the task designed by O'Connor, Meade, Carter, Rossiter, and Hester (2014), which was originally employed to test spatial sensitivity to reward, reported to be reduced in far relative to near space. In Experiments 1, 3, 4, and 5, participants wore a virtual-reality headset (Oculus Rift; https://www.oculus.com). The experiments were implemented within Unity (Version 5.1.2; Unity Technologies, San Francisco, CA) and Oculus Runtime (Version 0.6; Facebook Technologies Ireland, Dublin, Ireland) software, which were used to create the virtual environment, display experimental stimuli on the head-mounted display, and record participants' responses. The experiments were run on a computer with an Intel Core i7 processor, AMD Fire Pro M6000 graphics card, and Windows 7 operating system. The scene was rendered in Oculus Rift DK2 software, with a resolution of 960 × 1,080 per eye, a frequency of 75 Hz, and a field of view equal to 106°.
In Experiment 2, participants faced a 15-in. screen at a distance of approximately 57 cm. The open-source software OpenSesame (http://osdoc.cogsci.nl/) was used to display experimental stimuli and record participants' responses. Stimuli were obtained with professional designing software (SolidWorks; Dassault Systèmes, Waltham, MA). The rendering of an empty room was designed to introduce depth cues by exploiting a Ponzo-like illusion. A very similar empty room was also created and presented as a virtual environment in Experiments 1, 3, 4, and 5.
Across all experiments, we obtained different distance conditions by presenting red, green, or blue shapes (cubes or spheres) at different positions. Shapes were presented close to (50 cm) or far from (300 cm) the observer in the virtual environment. Note that only close shapes were within reachable distance. In Experiment 2, shapes were presented in either the bottom or upper part of the grid, providing 2-D perspective cues; thus, shapes presented in the bottom of the grid were illusorily perceived to be closer to participants. Finally, in Experiment 5, shapes were presented at six equidistant points, ranging from 50 to 300 cm.
The retinal size of the shapes (≈14° of visual angle in the 3-D experiments, ≈2.2° in the 2-D experiment) was kept constant across distances and shapes, resulting in the more distant shapes being larger (Experiments 1 and 3) or appearing illusorily larger because of the perspective (Experiment 2). In Experiments 4 and 5, retinal size was naturally scaled: Farther shapes had the same real dimensions as closer ones, and thus retinal size was smaller.
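The relation between retinal size, physical size, and distance described above can be sketched with basic trigonometry. The example assumes that "size" refers to the linear extent of the shape facing the observer and that the visual angle is exactly 14°; the function names are hypothetical, and the virtual-environment units are treated as centimeters for illustration only.

```python
import math


def physical_size_cm(visual_angle_deg, distance_cm):
    """Linear size that subtends a given visual angle at a given distance."""
    return 2.0 * distance_cm * math.tan(math.radians(visual_angle_deg) / 2.0)


def visual_angle_deg(size_cm, distance_cm):
    """Visual angle subtended by an object of a given size and distance."""
    return math.degrees(2.0 * math.atan(size_cm / (2.0 * distance_cm)))


# Constant retinal size (as in Experiments 1 and 3): the far shape must be larger
near_size = physical_size_cm(14.0, 50.0)
far_size = physical_size_cm(14.0, 300.0)

# Natural scaling (as in Experiments 4 and 5): same physical size, smaller angle
natural_far_angle = visual_angle_deg(near_size, 300.0)

print(round(near_size, 2), round(far_size, 2), round(natural_far_angle, 2))
```

Under these assumptions, holding the visual angle constant makes the far shape six times larger than the near one, whereas natural scaling shrinks the far shape to a little over 2° of visual angle.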
In Experiment 1, closer shapes appeared in the bottom part of the visual field (below the fixation cross), and farther ones appeared in the upper visual field. In Experiment 2, the Ponzo-like illusion display imposed the same up-down arrangement by design (to allow a proper depth illusion). We ruled out this potential confound in Experiments 3, 4, and 5, in which all shapes were presented at the same height as the fixation cross. For all experiments, a further rendering included a cross, which was used as a fixation point across all trials. The position of the cross was midway between close and distant shapes (175 cm). Participants provided responses to object shape by means of keyboard presses (B and N keys on a standard QWERTY keyboard) using the index and middle fingers of their dominant hand. Figure 1 depicts the main features manipulated in each experiment.

Procedure
Participants sat in a dark, quiet room, with their head restrained by a chin rest. Each trial began with a fixation phase (500 ms), followed by the presentation of a stimulus randomly chosen among the combinations of shape (cube or sphere), color (red, green, or blue), and distance (close or far). Stimuli were presented for a maximum of 750 ms and were replaced by feedback (text presented for 1,000 ms) as soon as a response was provided. Participants were told that responses slower than 500 ms or faster than 100 ms would be discarded, to discourage anticipations and slow responses; they were asked to respond as quickly and accurately as possible, using their index finger to indicate a cube and their middle finger to indicate a sphere if their dominant hand was the right one (the opposite finger assignment was given to left-handers). In our design, distance was therefore irrelevant to the task and orthogonal to the response. All participants underwent a brief 24-trial practice block before starting the experiment, which consisted of four further blocks of trials. In Experiments 2 to 4, each block comprised 60 trials (240 trials overall). In Experiment 5, each of the four blocks comprised 108 trials (432 overall). In Experiment 1, the whole procedure was repeated twice (i.e., four blocks of 60 trials each × 2), with a postural manipulation defining the two sessions: We asked participants to place their unseen nondominant hand in one of two positions, namely, close to the chin rest (about 10 cm from their body) or farther away (roughly 50 cm from their body and therefore close to where the near virtual shape was presented). The order of hand positions was counterbalanced across participants. Because this factor had no effect on performance, we dropped it in the subsequent experiments, in which the hand was kept at about 10 cm from the body.
About halfway through and after each experiment exploiting virtual reality (i.e., Experiments 1, 3, 4, and 5), we asked participants whether they had perceived two different distances and then to provide an approximate estimation for each of them. We used estimated distances given after the experiment (to allow adjustments after the initial response) to check for the presence of an effective depth perception. Several authors (for a review, see Renner, Velichkovsky, & Helmert, 2013) have found that explicit distance judgments are often underestimated by up to about 75% of the intended depths. Although here we probed the effect of distance implicitly, as it was task irrelevant, we use the labels "close" and "far" throughout the text and refrain from linearly mapping units of the virtual environment to real distances.

Analyses
The raw data, the full analysis pipeline, and additional graphical depictions for all experiments can be found in the Supplemental Material available online. Data, excluding practice trials, were analyzed with the open-source software R (R Core Team, 2008). Accuracy and response times (the latter for responses that were both accurate and given within the window of 100-500 ms) were analyzed through mixed-effects multiple regression models (Baayen, Davidson, & Bates, 2008). A great advantage of mixed-effects models is that they are based on single-trial data (rather than on averaged data), they do not assume independence among observations, and the model-fitting procedure takes into account the covariance structure of the data, including random effects (i.e., individual variability). Models had a logistic link function, appropriate for binary variables, when assessing accuracy. As a first step, we defined a model containing the random effects. Linear mixed models generalize best when one includes the most complex random structure that does not prevent model convergence (Matuschek, Kliegl, Vasishth, Baayen, & Bates, 2017). Random effects were introduced sequentially, and their effect on model fit was assessed using likelihood tests (i.e., we compared the residuals of each model and chose the one with significantly lower deviance as assessed by a chi-square test). A random intercept for participant was included in all models. We then tested the contribution of random slopes for distance, hand position (Experiment 1 only), color of the presented shape, and shape. The latter variable (i.e., the presented shape, cube, or sphere) also indicates the response effector (i.e., index or middle finger), as contingencies were blocked for each participant, and thus indexes differences in discrimination performance of cubes compared with spheres and of responses with one effector over another. Finally, we also tested n-way interactions of random slopes that were previously retained in the models.
Fig. 1. The main features of each experiment. Experiment 1 exploited a 3-D virtual-reality setting. Shapes were presented close to (50 cm) or far away from (300 cm) participants, below the fixation cross; this resulted in close shapes always being perceived to be lower than farther ones. Retinal size was kept constant. The proprioceptive input coming from the position of the hand was manipulated to be close to or far from the closer shape. Experiment 2 exploited a Ponzo illusion in a 2-D display. Shapes were presented in the lower (close) or upper (far) visual field. Retinal size was kept constant. Experiment 3 exploited a 3-D virtual-reality setting. Unlike in Experiment 1, shapes were presented at the fixation level, and their position on the transverse axis and retinal size were kept constant. Experiment 4 exploited a 3-D virtual-reality setting. Shapes were presented at the fixation level, and their position on the transverse axis was kept constant, but retinal size varied, being naturally scaled as a function of distance. Experiment 5 exploited a 3-D virtual-reality setting. Shapes were presented at the fixation level and at six different distances (50, 100, 150, 200, 250, and 300 cm, labeled D1 to D6 here). Retinal size was scaled as a function of distance.
The models with the final random-effects structure were then used to evaluate the role of fixed effects. We used a stepwise Type 2 approach and likelihood tests to assess whether the improvements in model fit were statistically significant. Parametric bootstrapping was used to obtain 95% confidence intervals (CIs) for the beta coefficients and thus to evaluate the distribution of estimated mean differences between the levels of a factor. Additional analyses (e.g., analyses of variance, t tests) were also performed and are reported in the Supplemental Material in the Robustness Checks sections. All the robustness checks fully confirmed the results from the main inferential approach.
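The likelihood tests used for model comparison can be illustrated with a minimal sketch. It assumes two nested models that differ by a single parameter, so that the deviance difference is referred to a chi-square distribution with 1 degree of freedom; the function names are hypothetical, and the authors' actual analyses were run in R rather than with this code.

```python
import math


def deviance_difference(loglik_reduced, loglik_full):
    """Likelihood-ratio statistic for two nested models (full vs. reduced)."""
    return 2.0 * (loglik_full - loglik_reduced)


def lrt_p_value_1df(chi_square):
    """Upper-tail p value of a chi-square statistic with 1 degree of freedom.

    For 1 df, P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(chi_square / 2.0))


# Chi-square values reported for the fixed effect of distance on accuracy
p_exp2 = lrt_p_value_1df(1.52)  # Experiment 2
p_exp4 = lrt_p_value_1df(3.38)  # Experiment 4

print(round(p_exp2, 3), round(p_exp4, 3))
```

The resulting p values closely match those reported for the accuracy analyses; small discrepancies in the third decimal are expected because the reported chi-square values are rounded.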
In Experiment 5, we explicitly required models to have only a random slope and fixed effect for distance. This yields, for each participant, estimates of performance that are weighted by the random effects themselves and by the participant-specific and group-specific variances (e.g., noise; Baayen et al., 2008). We used such random slopes as dependent variables and evaluated which curve (among linear, logarithmic, exponential, and sigmoidal) best described their relationship with depth (the independent variable). The models' formulas are reported in Table 2. Nonlinear least-squares estimations were obtained using the nls() function in R, and goodness of fit was evaluated by means of both root-mean-square error (RMSE) and the Akaike information criterion (AIC). The former is a measure of dispersion of residuals, whereas the latter is best used for model comparison and accounts for both goodness of fit and complexity of the models. Because the fourth model (sigmoidal) included two more parameters, the AIC introduced a more severe penalization aimed at decreasing the chances of overfitting noise.
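The two goodness-of-fit indices can be sketched as follows. This is an illustration only: the data points are invented, the competing models are a straight line and an intercept-only model rather than the four curves of Table 2, and the AIC is computed from a Gaussian likelihood of the least-squares residuals, with the residual variance counted as one additional parameter (an assumption about how the criterion is defined for least-squares fits).

```python
import math


def rmse(residuals):
    """Root-mean-square error: dispersion of the residuals."""
    return math.sqrt(sum(r * r for r in residuals) / len(residuals))


def gaussian_aic(residuals, n_params):
    """AIC of a least-squares fit under Gaussian errors.

    n_params counts the curve parameters; the residual variance adds one more.
    """
    n = len(residuals)
    rss = sum(r * r for r in residuals)
    log_lik = -0.5 * n * (math.log(2 * math.pi) + math.log(rss / n) + 1)
    return -2 * log_lik + 2 * (n_params + 1)


def fit_line(x, y):
    """Closed-form ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum(
        (xi - mx) ** 2 for xi in x
    )
    return my - slope * mx, slope


distances = [50, 100, 150, 200, 250, 300]      # the six depths (cm)
toy_values = [-9.0, -6.5, -2.0, 2.5, 6.0, 9.5]  # invented, for illustration

intercept, slope = fit_line(distances, toy_values)
res_linear = [y - (intercept + slope * x) for x, y in zip(distances, toy_values)]
mean_y = sum(toy_values) / len(toy_values)
res_const = [y - mean_y for y in toy_values]

print(rmse(res_linear), gaussian_aic(res_linear, 2))
print(rmse(res_const), gaussian_aic(res_const, 1))
```

A lower AIC indicates the preferred model. Each extra parameter adds 2 to the criterion, which is why a curve with more parameters must reduce the residuals appreciably to be favored.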

Experiment 1
Preliminary selection of random effects. The null models included random slopes for hand position and shape when accuracy was assessed. The best matrix of random effects for response times was more complex because it included a further random slope for distance and the Distance × Shape interaction. We used these specifications to test the contribution of fixed effects through a chi-square test for goodness of fit.
Discussion. In this experiment, visual stimuli were presented in an immersive 3-D setting using a virtual-reality headset. Despite the retinal size of different shapes being kept constant, and the farther ones being, and appearing, much bigger, we observed a response advantage to objects presented in PPS, even if they looked smaller (see Fig. 2). Whether participants placed their unseen nondominant hand close to, or far from, the more proximal virtual shape had no role in modulating the distance effect. This suggests that when only proprioception is available, the shape-discrimination advantage in PPS is not hand centered.

Experiment 2
Preliminary selection of random effects. No random slope improved model fit when accuracy was assessed; random slopes for distance and shape, together with their interaction, were selected when the role of fixed effects over response times was assessed.
Accuracy. Accuracy was high for both the close condition (M = 93.8%, SD = 5.3%) and far condition (M = 94.4%, SD = 4%). Distance, χ²(1, N = 32) = 1.52, p = .217, did not improve model fit. Thus, it had no effect on the odds of producing an accurate response.
Discussion. The perception of depth allowed by the virtual-reality headset is due to both binocular cues (ocular disparity) and related ocular vergence, as well as to perspective cues. To isolate the role played by perspective cues in this experiment, we presented stimuli on a 2-D screen, using the rendering of an empty room as a background (Ponzo illusion). We still observed the advantage for shapes that appeared, illusorily, closer to participants, indicating that perspective cues alone are sufficient for the PPS advantage to emerge (see Fig. 2).

Experiment 3
Preliminary selection of random effects. No random slope improved model fit when accuracy was assessed; a random slope for shape was instead introduced when the role of fixed effects over response times was assessed.
Accuracy. Accuracy was high for both the close position (M = 92.9%, SD = 4.4%) and far position (M = 93%, SD = 4.6%). Distance, χ²(1, N = 21) = 0.01, p = .911, did not improve model fit. Thus, it had no effect on the odds of producing an accurate response.
Discussion. In both Experiments 1 and 2, depth covaried with the height of stimuli in the visual field, such as in ecological situations in which closer objects usually appear in the lower hemifield (Previc, 1990). Nevertheless, even when shapes were presented along the same gaze line in Experiment 3 (and hence, such a potential confound was ruled out), the advantage for stimuli in PPS was confirmed (see Fig. 2).

Experiment 4
Preliminary selection of random effects. The random slope for distance improved model fit when accuracy was assessed; a further random slope for shape, together with its interaction term with distance, was included when response times were assessed. We used these specifications to test the contribution of fixed effects through a chi-square test for goodness of fit.
Accuracy. Accuracy was high for the close position (M = 89.2%, SD = 5.6%) and slightly, but not significantly, lower for the far position (M = 86.6%, SD = 8%). Distance, χ²(1, N = 21) = 3.38, p = .066, did not improve model fit.
Discussion. This experiment, performed in ecologically veridical conditions in which farther objects appeared smaller than closer ones, demonstrates that the natural distance scaling of size substantially enhances the PPS advantage (see Fig. 2). As in the previous experiments, results cannot be ascribed to speed/accuracy trade-offs.

Experiment 5
Psychophysical modeling. Random slopes (for both accuracy and response times) were fitted for each participant and for the group average to four different equations (see Table 2). At the group level, a sigmoidal trend emerged when we assessed both accuracy (sigmoidal AIC = −8.5; exponential AIC = −7.94) and response times (sigmoidal AIC = 36.44; exponential AIC = 36.95; linear AIC = 37.37). At the individual participant level, the sigmoidal trend obtained the best performance for all participants and for both response times and accuracy when using the RMSE as an index of goodness of fit. The AIC was less conclusive. When response times were assessed, the AIC still favored the sigmoidal trend for 11 participants out of 20, but for the remaining participants, the exponential curve was preferred. The results when fitting accuracy were similar, but the exponential curve was favored for 11 participants; of the remaining participants, 8 showed a sigmoidal trend, and only 1 showed a logarithmic trend. Results are summarized in Table 3.

Discussion. To model the spatial tuning of shape discrimination as a function of depth in Experiment 5, we presented shapes, not corrected for retinal size, at six different, equidistant points ranging from 50 cm to 300 cm. The fit to empirical data for several theoretical curves (sigmoidal, linear, logarithmic, and exponential) was then contrasted. A sigmoidal trend emerged at the group level when we assessed both accuracy and response times (see Fig. 3). Thus, the PPS advantage follows a sigmoidal trend, similar to what is commonly observed in studies using multisensory integration paradigms to assess the PPS boundary (e.g., Canzoneri et al., 2012; Ferri et al., 2015; Teneggi et al., 2013), except that here, only the visual modality was involved.

General Discussion
Throughout the same discrimination task, the features of different visual shapes were progressively stripped of important depth cues: (a) retinal-size differences in Experiments 1 and 3, (b) binocular cues as well as convergent and divergent eye movements in Experiment 2, and (c) upper/lower visual field covariance with depth in Experiments 3 and 4. Despite such drastic reductions, which ultimately left the mere illusion of depth perception, participants remained faster in discriminating close shapes in the absence of speed/accuracy trade-offs. This firmly indicates that close space is, per se, special and benefits from enhanced perceptual processing, even under extremely disadvantageous conditions (i.e., closer shapes being clearly smaller). It would be tempting to ascribe the PPS advantage in one of the most fundamental perceptual properties of objects such as shape discrimination to a specialized neural system. However, the ventral/dorsal dichotomy alone, although extensively supported by physiological and neuropsychological studies, cannot readily account for the PPS-dependent advantage in visual shape discrimination.
Fig. 3. Results from Experiment 5, in which we presented shapes at six different depths. Group-wise predicted sigmoidal curves are shown for mean accuracy (left panel) and mean response time (RT) advantage (right panel) as a function of distance (labeled here from D1, close, to D6, far). Error bars show standard errors of the mean. The y-axes refer to the odds of providing a correct response (accuracy) and the relative RT advantage observed with respect to participant-specific mean performance.
It is beyond dispute that this dichotomy is not so strict (Milner & Goodale, 2008; Zachariou et al., 2015), and the dorsal pathway contains object representations that are, to some extent, independent from ventral ones (Freud, Culham, Plaut, & Behrmann, 2017; Freud, Ganel, et al., 2017; Freud, Plaut, & Behrmann, 2016; Quinlan & Culham, 2007; Wang, Li, Zhang, & Chen, 2016) and might contribute to perception. Additional candidate regions are a set of inferior parietal and premotor areas (Brozzoli et al., 2011; Fogassi et al., 1996; Graziano & Cooke, 2006; Rizzolatti et al., 1983) that are known to preferentially respond to stimuli presented in PPS. The latter neural network, which also includes the putamen (Graziano & Gross, 1993), contains a majority of neurons with bimodal (i.e., visual and tactile) receptive fields coding for PPS (Brozzoli et al., 2011; Fogassi et al., 1996), together with unimodal (visual) neurons. This network thus seems ideally suited to subserve the advantage in discriminating close versus far objects reported here. Whereas future studies may tease apart the contribution of unisensory versus multisensory neurons in driving this advantage for PPS, here we show that depth per se, even when completely irrelevant for the situation at hand, helps to determine people's visual perception of shapes, independently of physical size. In addition, we found that the sigmoidal performance curve, considered the fingerprint of the multisensory-defined boundary of PPS, can also be found for merely unimodal visual stimuli. The visual modality alone, therefore, can capture functional features of PPS that were previously thought to be exquisitely multisensory.
Although we cannot state, at present, the extent to which visual and multisensory PPSs overlap, these findings open up new considerations in the ever-growing field of multisensory research: The convergence of multiple senses might not be a necessary feature to explain behavioral advantages in close space or even to probe PPS. We thus urge researchers conducting future studies to be depth aware, to better frame human visual abilities that are not homogeneously distributed in the three dimensions of the space around us.

Action Editor
Philippe G. Schyns served as action editor for this article.

Author Contributions
F. Hadj-Bouziane and A. Farnè contributed equally to this study. A. Farnè, F. Hadj-Bouziane, and E. Blini designed the experiments. C. Desoche, A. Kabil, and R. Salemme programmed the tasks. E. Blini collected the data, performed statistical analyses, and wrote the first draft of the manuscript.
All the authors discussed and reviewed the manuscript and approved the final version for submission.