2024-11-05

stereoscopy and virtual reality

collection of notes on virtual reality (vr) and stereoscopic image recording and viewing.

stereoscopy involves using two distinct images, one for each eye, to replicate the separate perspectives that each eye would naturally perceive. this technique primarily enhances depth perception. animals employ various methods to perceive depth, such as analyzing light distribution, projection patterns, and parallax effects. a particularly significant method is binocular disparity: the relative offset of objects as projected onto two eyes that are slightly apart. it is important to understand that eyes do not capture three-dimensional information; they receive flat, two-dimensional images. this limitation is why stereoscopy is sometimes referred to as "2.5d" rather than true "3d", as it does not provide full three-dimensional perception.

photographs and videos that use one image per eye capture only the position and orientation of the camera at the time of recording. consequently, moving the head while viewing the result will not reveal more of the objects' sides. this limitation is another reason why the term "stereoscopic" is often preferred over the potentially misleading "3d". however, if the images are dynamically rendered from three-dimensional data, as in computer-generated graphics, and the display device supports head tracking, stereoscopy can simulate a viewing experience that realistically responds to head movements.

links

recording hardware

consumer recording devices are still quite rare in 2024.

single camera

  • beam splitter

    • an attachment to the lens, or a dedicated dual lens

    • two horizontally offset openings that direct the light onto halves of the image sensor

dual camera

  • shutter release, frame processing, and camera settings have to be synchronized
  • when recording videos, the cameras may not start or capture frames at exactly the same time, leading to visible latencies and drifts. "i can tell that if the time difference between two videos is more than half of the frame, the video will be unbearable to watch". genlock is used for synchronization
  • the recordings should ideally be identical: same camera model, same lenses, same settings
  • also called 3d rig
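
the "half of the frame" rule of thumb quoted above can be turned into a quick calculation. this is a minimal sketch; the half-frame threshold is an assumption taken directly from that quote, not a formal standard.

```python
# sketch of the "half a frame" synchronization rule of thumb quoted above;
# the half-frame threshold is an assumption taken from that quote.

def max_sync_offset_ms(fps: float) -> float:
    """maximum tolerable time offset between the two cameras,
    taken as half of one frame duration."""
    frame_duration_ms = 1000.0 / fps
    return frame_duration_ms / 2.0

# at 30 fps a frame lasts ~33.3 ms, so an offset beyond ~16.7 ms
# would make the stereo video unpleasant to watch
print(round(max_sync_offset_ms(30), 1))  # 16.7
print(round(max_sync_offset_ms(60), 1))  # 8.3
```

this also shows why higher frame rates tighten the synchronization demands: at 60 fps the tolerable offset halves.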

mounting options

  • side-by-side: two cameras mounted next to each other

    • the cameras need to be small enough so that the distance between the lens centers is in the desired range
    • option

      • two handheld cameras mounted in portrait position using right-angle brackets, long-side vertical
      • image format changes. 4:3 becomes 3:4 in portrait position. cropping needed if 4:3 is wanted
  • half-mirror: two cameras mounted horizontally offset and also at a vertical right angle (atop or below) with an added mirror to redirect the frontal image to the second camera. a half-mirror lets light pass on one side and reflects it on the other.

    • for cameras that are too large to be mounted side by side

    • mirrors necessarily decrease image quality, at least by a small amount, and will require some extra care to be kept clean

reproduction

  • displays with one image per eye
  • parallax barrier, used for example by the nintendo 3ds
  • polarization filtering
  • color filtering. anaglyph 3d. usually cyan/red. used to be blue/red because cyan filters were hard to produce
  • active-shutter glasses
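
the anaglyph method above can be sketched in a few lines: the red channel of the combined image comes from the left view and the green/blue (cyan) channels from the right view, so the colored glasses separate the two views again. images are plain nested lists of (r, g, b) tuples here purely for illustration.

```python
# minimal sketch of red/cyan anaglyph composition: red channel from the
# left image, green and blue channels from the right image.

def anaglyph(left, right):
    return [
        [(l[0], r[1], r[2]) for l, r in zip(left_row, right_row)]
        for left_row, right_row in zip(left, right)
    ]

left_img = [[(255, 0, 0), (10, 20, 30)]]
right_img = [[(0, 255, 255), (40, 50, 60)]]
print(anaglyph(left_img, right_img))
# [[(255, 255, 255), (10, 50, 60)]]
```

the same channel split is what a red-filtered left lens and a cyan-filtered right lens undo on the viewer's side.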

pupillary distance

the ratio between the viewer's interpupillary distance and the separation of the stereo images, both as recorded and as displayed, has a major influence on realistic depth perception.

if the separation of the stereo images does not align with the viewer's interpupillary distance (ipd), prominent objects may appear as if viewed with crossed eyes.

  • ipd remains constant, but the angle at which the eyes converge (vergence) changes when focusing on closer objects
  • the distance from the bridge of the nose to the pupil can differ between the left and right eye
  • eye height may also vary between the left and right eye
  • viewing devices can have either fixed or adjustable image distance settings. when adjustable, the distance is typically changed by moving the two displays closer together or further apart within a specified range
  • the term "hyperstereo" refers to images recorded with a separation greater than the normal human ipd, typically over 70 mm. because the two light streams enter at a wider angle, a greater potential side view of objects is captured, and more depth may be perceived. this effect may appear exaggerated when viewed from close distances and can add depth to otherwise flat-looking distant objects. binoculars sometimes employ a hyperstereo configuration. hyperstereo can make large objects appear as if they were miniatures
  • synonyms: interocular distance, intra-axial distance, stereo baseline
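
the geometry behind these points can be sketched with a simplified pinhole model (an assumption for illustration): with eyes separated by the ipd and the screen at a given distance, a virtual object at some depth needs an on-screen separation between its left and right projections of ipd · (1 − screen distance / object distance).

```python
# sketch of stereo display disparity under an assumed pinhole model:
# an object at screen depth needs zero separation, and very distant
# objects approach a separation equal to the ipd.

def screen_disparity_mm(ipd_mm, d_screen_mm, d_object_mm):
    return ipd_mm * (1.0 - d_screen_mm / d_object_mm)

print(screen_disparity_mm(65, 2000, 2000))                  # 0.0
print(round(screen_disparity_mm(65, 2000, 10_000_000), 2))  # 64.99

# recording with a baseline wider than the ipd (hyperstereo) scales all
# disparities up, which is one way to see why large objects can look
# like miniatures (assumed simplification)
```

this is also why mismatched ipd settings distort depth: every disparity in the image is interpreted against the viewer's own eye separation.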

image formats

commonly used are two images taken side-by-side with a horizontal distance close to the interpupillary distance of the viewer. 65mm is a common compromise.

when overlaid, the combined image retains the frame dimensions of each individual image: a rectangular frame with added depth. images can also be taken with a greater field of view using so-called fisheye lenses. this can be used to simulate standing in front of, or inside, a space that wraps around, allowing the viewer to move the head and look around. it requires a resolution high enough that the image portion actually viewed at any given moment is still detailed enough. a common format for videos with 180 or 360 degrees of view is side-by-side with barrel projection.

the same image compression and file formats as for monoscopic videos are used. the difference from monoscopic video is that the two images are encoded side by side as one larger frame.
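
decoding such a frame is a plain split of each row: the left half of the frame is the left-eye image and the right half the right-eye image. a minimal sketch, with the frame as a nested list of pixels purely for illustration; real players do the same split on the decoded video frame.

```python
# minimal sketch of decoding a side-by-side stereo frame into the
# left-eye and right-eye images.

def split_side_by_side(frame):
    width = len(frame[0])
    half = width // 2
    left = [row[:half] for row in frame]
    right = [row[half:] for row in frame]
    return left, right

frame = [["L0", "L1", "R0", "R1"],
         ["L2", "L3", "R2", "R3"]]
left, right = split_side_by_side(frame)
print(left)   # [['L0', 'L1'], ['L2', 'L3']]
print(right)  # [['R0', 'R1'], ['R2', 'R3']]
```

a top-to-bottom format would split the rows instead of the columns in the same way.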

common formats:

  • by image placement

    • side-by-side: two images next to each other, left and right. most common
    • top-to-bottom: the stereo images are stacked. less common
    • overlayed
  • by projection

    • barrel: 180 or 360 degrees of view projected on bent rectangular surface. see distortion

    • fisheye: 180, 190, or 360 degrees projected onto circles or (half-) spheres

virtual reality headsets

more generally called head-mounted displays.

software standards

tracking

tracking in virtual reality refers to the continuous determination of the spatial position of objects in three-dimensional space. this information is used to simulate the visual changes perceived when moving through a three-dimensional virtual environment.

two primary tracking systems are commonly employed:

  • outside-in tracking

    • lighthouse tracking: in this system, base stations emit invisible, high-frequency infrared light flashes into the surrounding area. devices equipped with multiple sensors calculate their position by measuring the delay between the received signals. using the known, pre-configured positions of the base stations, the device can accurately determine its location in the room, allowing the virtual image to be adjusted accordingly. video explaining how base stations work with steam vr
    • scalability: multiple base stations can be added to provide seamless tracking from various directions across larger spaces
    • multi-device support: several devices can utilize the same base stations simultaneously
    • 360-degree tracking: two base stations are required for 360-degree tracking, as each covers approximately 180 degrees. with only one base station, turning around or moving too far to one side can obscure the signal, leading to tracking interruptions
    • smooth tracking: lighthouse tracking is known for its smooth and uninterrupted performance
  • inside-out tracking

    • headset sensors: cameras or other sensors located within the headset detect light or other signals from the controllers to calculate their positions
    • lighting requirements: this system typically requires a well-lit room for optimal performance
    • controller visibility: controllers generally need to remain within the field of view of the headset's sensors or cameras to maintain accurate tracking
    • room mapping: the headset builds an internal model of the room to calculate its position within the environment
  • full-body tracking

    • full-body tracking refers to the simultaneous tracking of the head, arms, and legs, enabling more immersive and accurate interactions in virtual environments
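
the timing-to-angle step of the lighthouse tracking described above can be sketched briefly: after a sync flash, a laser plane sweeps the room at a fixed rotation rate, and the delay until a sensor sees the laser encodes that sensor's angle relative to the base station. the 60 rotations per second figure here is an assumption for illustration.

```python
# sketch of lighthouse sweep timing: the delay between the sync flash
# and the laser hitting a sensor maps linearly to an angle. the rotation
# rate is an assumed illustrative value.

ROTATIONS_PER_SECOND = 60.0

def sweep_angle_degrees(delay_seconds):
    return delay_seconds * ROTATIONS_PER_SECOND * 360.0

# a sensor hit 1/240 s after the sync flash sits a quarter turn
# into the sweep
print(round(sweep_angle_degrees(1.0 / 240.0), 3))  # 90.0
```

with several sensors at known positions on the device, these angles from two base stations constrain the device's pose in the room.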

processing

  • pcvr

    • a separate personal computer does the image calculations and sends the results to the head-mounted displays
    • this offers the most options for high processing power and realistic images
    • a personal computer, and a cable or wireless connection to it, is required
  • standalone and hybrid

    • very limited processing power, comparable to smartphones

hardware

it is important to know that most software assumes and requires two controllers and active tracking.

fourth generation

  • bigscreen beyond

    • effectively directed at apple users, because ordering requires a 200+ euro iphone

    • steamvr support

    • 4320x2160

    • oled

    • extremely compact and lightweight. ~143x52x49mm, 127g

    • exact dimensions of the parts are public. 3d printed modifications are possible

third generation

after comparing the vive pro with vive controllers, the hp reverb g2, and valve index, i chose the vive pro.

  • htc vive pro

    • 2880x1600
    • oled
    • wireless support. the htc wireless set comes with a pci card that is installed in the pc, and a sender/receiver. another sender/receiver is attached to the headset and has a short cable that replaces the long cable that connects to the link box. note however that wireless support also complicates the setup given the battery and connectivity demands
    • shows a gray or blue screen when not tracking. it is not usable without base stations
    • has an integrated camera that could theoretically be used for augmented reality applications
  • a pair of valve index controllers
  • two base stations for gaming. one base station works but is not enough for gaming, since the player practically cannot look to the side or behind without losing tracking

alternatives

  • valve index

    • 2880x1600
    • lcd
    • integrated camera
    • remarkable field of view
  • hp reverb g2

    • 4320x2160
    • price is relatively low
    • inside-out tracking, which is not on par with outside-in tracking

      • does not work in the dark
      • tracking errors outside the headset cameras view
    • the vive pro's tracking and oled outweigh the higher resolution
  • vive pro 2

    • 4896x2448
    • lcd not oled
    • more glare than hp reverb g2
    • wireless support at lower resolution
  • quest 2

    • standalone

    • inside-out tracking

limitations and issues

  • resolution

    • 2160p oled per eye is about the minimum for generally enjoyable vr
    • low resolution typically leads to the distances between pixels being visible. this is called screen-door effect, and it is comparable to looking through a mosquito net
  • refresh rate

    • while gaming at 90 frames per second can be enjoyable, 120 frames per second is about the minimum that vr needs to approach convincing realism
    • in reality, the perceived fluidity when moving the head is much greater than what can be simulated with 90 frames per second. with too few frames, the image appears obviously artificial, interrupted and not smooth
  • weight and size

    • while not apparent in the beginning, over time, weight and size may lead to significantly less movement of the user in the virtual world, and thereby less immersion. they pose an obstacle to usage
  • glare

    • the lenses may re-reflect light so that it appears as additional glow, often visible as arcs or rings
  • field of view

    • a small field of view is comparable to looking through a small window or a pair of swimming goggles
    • a typical field of view of the human eyes is 150 degrees vertically and 210 degrees horizontally
  • cables

    • since cables can usually be led to run behind the head, this is not the biggest issue. however, cables pose problems with rotational movements: the player's position in the real room can slowly rotate over time while playing, and the cable gets in the way or is dragged around. the cable also adds some weight
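
the refresh-rate point above comes down to simple arithmetic: the frame duration shrinks from about 11.1 ms at 90 fps to about 8.3 ms at 120 fps, which is the time budget the renderer has to finish each stereo frame.

```python
# quick arithmetic behind the refresh-rate limitation: frame duration
# in milliseconds for a given frame rate.

def frame_time_ms(fps):
    return 1000.0 / fps

print(round(frame_time_ms(90), 1))   # 11.1
print(round(frame_time_ms(120), 1))  # 8.3
```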

lenses

  • bend light from a square screen so that it wraps around the eye in a rough approximation of a wider, rounded field of view
  • change the way light enters the eye so that it is possible to focus on an object that is very close to the eyes, which is usually not possible
  • images are usually displayed with infinity focus

near- or far-sightedness

near- or farsightedness remains relevant even when viewing stereoscopic images. corrective lenses, such as glasses, must be used to ensure that the images do not appear blurry.

this occurs because, in vr headsets, the screens displaying the images are positioned at a fixed distance from the eyes. the lenses within the headset magnify these images, simulating a farther distance. however, the eyes' ability to focus on these images depends on the individual's visual acuity. if someone is near- or farsighted, their eyes cannot properly focus on the magnified images without corrective lenses, resulting in blurred vision.

eye-tracking

stereoscopy on the web

  • webxr, experimental and only implemented in chromium-based browsers. an interface to vr head-mounted displays
  • youtube can render videos automatically as anaglyph 3d
  • software integrated websites like deovr

composing with depth

applications where added depth may work well: waves, spraying water, underwater environments, caves, interior design, dense urban areas, viewing products online, art installations and sculptures, anything where the volume of objects is of interest, 3d animated movies

applications where added depth may not work as well: fast-moving objects towards the viewer, unsightly views displayed too close, empty rooms and areas, foreground obstructions, excessive parallax and change of distance focus