stereoscopy and virtual reality

overview of virtual reality (vr) and stereoscopic image recording and viewing.


recording hardware

recording devices are still rare in 2023.

single camera

  • beam splitter

    • an attachment to the lens, or a dual-lens design

    • two horizontally offset openings that direct the light onto halves of the image sensor

dual camera

  • shutter release, frame processing, and camera settings have to be synchronized
  • when recording videos, the cameras may not start and capture frames at exactly the same time, leading to visible latency and drift. "i can tell that if the time difference between two videos is more than half of the frame, the video will be unbearable to watch". genlock is used for synchronization
  • the recordings should ideally be identical: use the same camera model, the same lenses, and the same settings
  • a dual-camera setup is also called a 3d rig

mounting options

  • side-by-side: two cameras mounted next to each other

    • the cameras need to be small enough so that the distance between the lens centers is in the desired range
    • option

      • two handheld cameras mounted in portrait position using right-angle brackets, long-side vertical
      • the aspect ratio changes: 4:3 becomes 3:4 in portrait position. cropping is needed if 4:3 is wanted
  • half-mirror: two cameras mounted horizontally offset and also at a vertical right angle (atop or below) with an added mirror to redirect the frontal image to the second camera. a half-mirror lets light pass on one side and reflects it on the other.

    • for cameras that are too large to be mounted side by side

    • mirrors necessarily decrease image quality, at least by a small amount, and will require some extra care to be kept clean


viewing techniques

  • displays with one image per eye
  • parallax barrier, used for example by the nintendo 3ds
  • polarization filtering
  • color filtering (anaglyph 3d), usually cyan/red. blue/red was used earlier because cyan filters were hard to produce
  • active-shutter glasses
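
the color-filtering approach can be sketched in code. a minimal sketch, assuming 8-bit rgb tuples and the common red/cyan scheme (the helper name is hypothetical): the red channel is taken from the left view, green and blue from the right.

```python
def anaglyph_pixel(left_rgb, right_rgb):
    """Combine one pixel from each view into a red/cyan anaglyph pixel:
    the red channel comes from the left image, green and blue from the right."""
    return (left_rgb[0], right_rgb[1], right_rgb[2])

# a reddish left pixel and a cyan-ish right pixel merge into one frame
print(anaglyph_pixel((200, 10, 10), (10, 150, 150)))  # (200, 150, 150)
```

red/cyan glasses then separate the channels again, one per eye.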

pupillary distance

the interpupillary distance (ipd), and its ratio to the distance between the stereo images as recorded and displayed, have a major influence on realistic depth perception.

if the distance between the stereo images is not correctly aligned to the viewer's pupillary distance, prominent objects may appear as they do when looking cross-eyed.

  • ipd in humans varies when looking at closer objects
  • the distance from bridge of the nose to the pupil can vary between the left and right eye
  • eye height can vary between the left and right eye
  • viewing devices can come with a fixed or variable image distance. if the ipd is variable, then it is usually modified within a range by moving two displays closer or further apart
  • the term hyperstereo refers to images recorded with the cameras set further apart than the normal human ipd, roughly over 70 mm. because the two light paths arrive at a wider angle, more of the sides of objects is captured and more depth may be perceived. this may seem exaggerated at close distances and also adds depth to otherwise flat-looking objects in the distance. binoculars sometimes use a hyperstereo configuration. hyperstereo can make large objects appear as if they were miniatures.
  • synonyms: interocular distance, intra-axial distance, stereo baseline
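
the effect of a wider baseline can be illustrated with simple pinhole geometry: the angle between the two sight lines converging on a point straight ahead grows with the baseline and shrinks with distance. a sketch only; the function name and sample values are illustrative.

```python
import math

def vergence_angle_deg(baseline_m: float, distance_m: float) -> float:
    """Angle between the two sight lines converging on a point straight
    ahead at the given distance (simple pinhole model)."""
    return math.degrees(2 * math.atan((baseline_m / 2) / distance_m))

# normal ipd (~65 mm) vs. a hyperstereo baseline (~300 mm)
for baseline in (0.065, 0.300):
    for distance in (1.0, 10.0, 100.0):
        print(f"baseline {baseline * 1000:.0f} mm at {distance:>5.0f} m: "
              f"{vergence_angle_deg(baseline, distance):.3f} deg")
```

at 100 m the normal-ipd angle is nearly zero, which is why distant scenes look flat; the wider baseline restores a measurable angle, and with it perceived depth.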

near- or far-sightedness

near- or far-sightedness persists even with stereoscopic images. glasses or other types of correcting lenses have to be used when viewing so that images do not appear blurry.

image formats

basic human depth perception requires two images taken side by side with a horizontal offset close to the interpupillary distance of the viewer. 65 mm is a common compromise.

the combined stereo view retains the frame dimensions of each individual image, so a rectangular image with depth is possible. images can also be taken with a greater field of view using so-called fisheye lenses. this can be used to simulate standing in front of, or inside, a space that wraps around, allowing the viewer to move the head and look around. it requires a resolution high enough that the image portion actually viewed at any given moment is still detailed enough. a common format for videos with 180 or 360 degrees of view is side-by-side with barrel projection.

the same image compression and file formats as for monoscopic videos are used. the difference from monoscopic videos is that two images are encoded side by side.

common formats:

  • by image placement

    • side-by-side: two images next to each other, left and right
    • top-to-bottom: the stereo images are stacked
    • overlaid: the two images are superimposed in one frame (e.g. for anaglyph viewing)
  • by projection

    • barrel: 180 or 360 degrees of view projected onto a bent rectangular surface. see distortion

    • fisheye: 180, 190, or 360 degrees projected onto circles or (half-) spheres
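
splitting a combined frame back into its two views is simple arithmetic over the frame dimensions. a minimal sketch, assuming the two views share the frame equally (the function name is hypothetical):

```python
def split_stereo_frame(width, height, layout="side-by-side"):
    """Return (left, right) crop boxes as (x, y, w, h) tuples
    for a combined stereo frame."""
    if layout == "side-by-side":
        half = width // 2
        return (0, 0, half, height), (half, 0, half, height)
    if layout == "top-to-bottom":
        half = height // 2
        return (0, 0, width, half), (0, half, width, half)
    raise ValueError(f"unknown layout: {layout}")

# a 3840x1080 side-by-side video carries two 1920x1080 views
left, right = split_stereo_frame(3840, 1080)
print(left, right)  # (0, 0, 1920, 1080) (1920, 0, 1920, 1080)
```

players apply exactly such crops before presenting one half to each eye.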

it has to be noted that the full information of the three-dimensional world is not captured by recording two two-dimensional images from separate viewpoints. the recording can only reproduce the position and viewing direction of the camera at the time of recording. this means, for example, that when looking at an image, moving the head will not reveal more of the sides of an object. this is also the reason why "stereoscopic" photography is often the preferred term, as calling it a "3d" recording would be ambiguous.

if the content is actually generated from three-dimensional information, then stereoscopy makes it possible to simulate realistic three-dimensional visual experiences. for example, monoscopic 3d video games calculate the three-dimensional shapes of objects and project them onto a two-dimensional display relative to the player's in-game position and viewing direction, ideally also considering the distance-dependent ipd.
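
the per-eye rendering described above can be sketched as two pinhole cameras offset horizontally by half the ipd each. a simplified model, ignoring lens distortion and vergence; names and defaults are illustrative:

```python
def project_stereo(point, ipd=0.065, focal=1.0):
    """Project a 3d point (x, y, z in metres, z pointing forward) onto two
    virtual image planes whose pinhole cameras are offset by +/- ipd/2."""
    x, y, z = point
    left = (focal * (x + ipd / 2) / z, focal * y / z)
    right = (focal * (x - ipd / 2) / z, focal * y / z)
    return left, right

# the horizontal offset between the views (disparity) shrinks with distance
for z in (0.5, 2.0, 20.0):
    (lx, _), (rx, _) = project_stereo((0.0, 0.0, z))
    print(f"z = {z:>4} m: disparity = {lx - rx:.4f}")
```

the disparity falls off as ipd/z, which is the signal the brain turns into depth.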

virtual reality headsets

more generally called head-mounted displays.

software standards

linux support: the author failed to get any vr applications to function under linux.


tracking

tracking refers to the continual determination of the spatial position of three-dimensional objects. this information can be used to simulate the visual changes perceived when moving through a three-dimensional environment.

two tracking systems have been established:

  • outside-in

    • with lighthouse tracking, so-called base stations send invisible, high-frequency, infrared light flashes into an area. devices calculate their position based on delays of the received signal between multiple sensors on the device. combined with the known, pre-configured position of base stations, the device can be localized in the room and the virtual image modified correspondingly. video explaining how base stations work with steam vr
    • multiple base stations can be added to track, without obstruction, seamlessly from multiple directions and across larger spaces
    • multiple devices can use the same base stations
    • two base stations are the minimum required for 360-degree tracking, as each covers about 180 degrees at the front and back. with only one base station, turning around or moving too far to one side easily obscures the line of sight to it
    • lighthouse tracking is smooth and without interruptions
  • inside-out

    • cameras or other types of sensors inside the headset receive the light or other signals from the controllers to calculate their position

    • usually requires a well lit room

    • controllers usually have to be in view of the headset sensors/cameras

    • the headset builds an internal model of the room to calculate the headset position in the room

full-body tracking refers to tracking the position of the head, arms, and legs at once.
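
the lighthouse principle can be illustrated with a simplified model: if a rotor sweeps a laser across the room at a constant, known rate, the delay between the sync flash and the moment the beam hits a sensor encodes the sensor's angle as seen from the base station. a sketch only, assuming an idealized 60 hz rotor; real base stations combine several such angles across multiple sensors with the known station positions to solve for the device pose.

```python
ROTATION_PERIOD_S = 1 / 60  # assumed 60 hz rotor, as in lighthouse v1

def sweep_angle_deg(t_sync: float, t_hit: float,
                    period: float = ROTATION_PERIOD_S) -> float:
    """Angle of a sensor as seen from the base station: the rotor turns at a
    constant rate, so the delay between the sync flash and the laser sweep
    hitting the sensor is proportional to the angle (simplified model)."""
    return 360.0 * (t_hit - t_sync) / period

# a sensor hit a quarter rotation after the sync flash sits at 90 degrees
print(sweep_angle_deg(0.0, ROTATION_PERIOD_S / 4))  # 90.0
```
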


processing

  • pcvr

    • a separate personal computer does the image calculations and sends the results to the head-mounted display
    • this offers the most options for high processing power and realistic images
    • a personal computer, and a cable or wireless connection to it, is required
  • standalone and hybrid

    • very limited processing power, comparable to smartphones

2024 hardware

  • promising upcoming device: bigscreen beyond

    • directed at apple users, because ordering requires a 200+ euro iphone

    • steamvr support

    • 4320x2160

    • oled

    • extremely compact and lightweight. ~143x52x49mm, 127g

    • exact dimensions of the parts are public. 3d printed modifications are possible

2023 hardware

note that most software assumes and requires two controllers and active tracking.


after comparing the vive pro (with vive controllers), the hp reverb g2, and the valve index, the setup settled on was:

  • htc vive pro 1

    • 2880x1600
    • oled
    • wireless support. the htc wireless set comes with a pci card that is installed in the pc and a sender/receiver. another sender/receiver is attached to the headset with a short cable that replaces the long cable connecting to the link box
    • shows a gray or blue screen when not tracking. it is not usable without base stations
    • has an integrated camera that could theoretically be used for augmented reality applications
  • a pair of valve index controllers
  • two base stations for gaming


  • valve index

    • 2880x1600
    • lcd
    • integrated camera
    • remarkable field of view
  • hp reverb g2

    • 4320x2160
    • price is relatively low
    • inside-out tracking, which is not on par with outside-in tracking

      • does not work in the dark
      • tracking errors outside the headset cameras view
    • the vive pro's better tracking and oled display outweigh the higher resolution
  • vive pro 2

    • 4896x2448
    • lcd not oled
    • more glare than hp reverb g2
    • wireless support at lower resolution
  • quest 2

    • standalone

    • inside-out tracking

limitations and issues

  • resolution

    • 2160p oled per eye is about the minimum for generally enjoyable vr
    • low resolution typically leads to the distances between pixels being visible. this is called screen-door effect, and it is comparable to looking through a mosquito net
  • refresh rate

    • while gaming at 90 frames per second can be enjoyable, 120 frames per second is about the minimum that vr needs to approach convincing realism
    • in reality, the perceived fluidity when moving the head is much greater than what can be simulated with 90 frames per second. with too few frames, the image appears obviously artificial, interrupted, and not smooth
  • weight and size

    • while not as apparent in the beginning, over time weight and size may significantly reduce the user's movement, and thereby immersion, in the virtual world. they pose an obstacle to usage
  • glare

    • the lenses may re-reflect light so that it appears as additional glow, often visible as arcs or rings
  • field of view

    • a small field of view is comparable to looking through a small window or a pair of swimming goggles
    • a typical field of view of the human eyes is 150 degrees vertically and 210 degrees horizontally
  • cables

    • since cables can usually be led behind the head, this is not the biggest issue. however, rotational movements are significantly limited. the player's position in the real room can also rotate slowly over time while playing, and the cable gets in the way or is dragged around. the cable also adds weight


lenses

  • bend light from a square screen so that it wraps around the eye in a rough approximation of a wider, rounded field of view
  • change the way light enters the eye so that it is possible to focus on an object that is very close to the eyes, which is usually not possible
  • images are usually displayed with infinity focus


stereoscopy on the web

  • webxr: an interface to vr head-mounted displays. experimental and only implemented in chrome browsers
  • youtube can render videos automatically as anaglyph 3d
  • websites with integrated player software, such as deovr

composing with depth

applications where added depth works well: landscapes, waves, caves, interior design, viewing products online, anything where the volume of objects is of interest, 3d animated movies

applications where added depth does not work as well: fast moving objects towards the viewer, unsightly views displayed too close, empty rooms