2025-08-09

stereoscopy and virtual reality

collection of notes on virtual reality (vr) and stereoscopic image recording and viewing.

core principles

stereoscopy presents two distinct images, one for each eye, to replicate the slightly different perspectives each eye would naturally perceive; the technique primarily enhances depth perception. animals use various cues to perceive depth, such as light distribution, projection patterns, and parallax effects. a particularly significant cue is the relative displacement of objects as projected onto two slightly separated eyes.

eyes do not capture three-dimensional information; they receive flat, two-dimensional images. this is why stereoscopy is sometimes called "2.5d" rather than true "3d": it does not provide full volumetric perception. photographs and videos with one image per eye capture only the position and orientation of the camera at recording time, so moving the head during playback will not reveal more of an object's sides. this is another reason "stereoscopic" is often preferred over the potentially misleading "3d". if, however, the images are rendered dynamically from three-dimensional data (e.g. computer graphics) and the display device supports head tracking, stereoscopy can simulate a viewing experience that responds realistically to head movement.
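the parallax cue described above can be quantified with simple geometry. a minimal sketch (the helper function is illustrative, not from these notes): for an object straight ahead, the horizontal separation of its left- and right-eye projections on a screen plane is zero at screen depth and approaches the ipd for objects at infinity.

```python
def screen_disparity_mm(ipd_mm: float, screen_mm: float, object_mm: float) -> float:
    """On-screen horizontal separation of an object's left/right projections.

    zero for objects at screen depth, approaching ipd_mm at infinity;
    negative values mean the object appears in front of the screen.
    """
    return ipd_mm * (1.0 - screen_mm / object_mm)

# object twice as far away as the screen: disparity is half the ipd
print(screen_disparity_mm(65.0, 600.0, 1200.0))  # 32.5
```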

recording

single-camera systems

  • beam splitter

    • attachment to lenses or dual lens

    • two horizontally offset openings that direct light onto halves of the image sensor

dual-camera systems

  • shutter release, frame processing, and camera settings must be synchronized
  • in video, multiple cameras may start recording and capture frames at slightly different times, causing latency/drift; a time difference of more than half a frame makes the footage unwatchable
  • genlock used for synchronization
  • cameras should be identical model, lenses, and settings
  • also called 3d rig
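the half-frame rule of thumb above translates into a concrete timing budget per frame rate. a small illustrative calculation (function name is hypothetical):

```python
def max_sync_offset_ms(fps: float) -> float:
    """Largest tolerable shutter offset between the two cameras:
    half a frame duration, per the half-frame rule of thumb."""
    return 1000.0 / fps / 2.0

print(round(max_sync_offset_ms(30.0), 1))  # 16.7 ms at 30 fps
print(round(max_sync_offset_ms(60.0), 1))  # 8.3 ms at 60 fps
```

the budget shrinks linearly with frame rate, which is why free-running cameras drift out of tolerance quickly and genlock becomes necessary.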

mounting options

  • side-by-side

    • cameras small enough to keep desired lens-center spacing
    • example: two handheld cameras mounted in portrait position using right-angle brackets

      • image format changes (4:3 becomes 3:4), cropping needed if 4:3 desired
  • half-mirror

    • cameras offset horizontally and vertically with mirror to redirect image to second camera

    • for cameras too large for side-by-side

    • mirrors reduce image quality slightly and require careful cleaning

hardware examples

consumer recording devices are still rare in 2024.

display / reproduction

viewing methods

  • displays with one image per eye
  • parallax barrier (e.g. nintendo 3ds)
  • polarization filtering
  • color filtering: anaglyph 3d (cyan/red common; blue/red historically due to filter limitations)
  • active-shutter glasses

image formats

two images are placed side by side, their centers separated by roughly the viewer's interpupillary distance (ipd), commonly ~65 mm. stereo content uses the same compression and file formats as monoscopic content, but encodes both images in one larger frame. by placement:

  • side-by-side: most common
  • top-to-bottom: stacked
  • overlaid by projection:

    • barrel: 180°/360° projected onto a curved rectangle (introduces distortion)
    • fisheye: 180°, 190°, or 360° projected onto a circle or sphere

high-fov images (fisheye, barrel) require high resolution to allow head movement without pixelation.
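side-by-side packing is just horizontal concatenation of two equally sized frames. a minimal sketch using plain lists of pixel rows (a real pipeline would use an image library instead):

```python
def pack_side_by_side(left, right):
    """Pack two equally sized frames (lists of pixel rows) into one
    double-width side-by-side frame."""
    if len(left) != len(right):
        raise ValueError("frames must have the same height")
    return [l_row + r_row for l_row, r_row in zip(left, right)]

left = [[1, 2], [3, 4]]    # 2x2 "image", one number per pixel
right = [[5, 6], [7, 8]]
print(pack_side_by_side(left, right))  # [[1, 2, 5, 6], [3, 4, 7, 8]]
```

top-to-bottom packing is the same idea with vertical concatenation (`left + right` on the row lists).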

interpupillary distance (ipd)

interpupillary distance and the separation of the stereo images strongly affect realism. if the image separation does not match the viewer's ipd, the eyes are forced into unnatural vergence and viewing can feel cross-eyed.

  • ipd constant; vergence varies with focus distance
  • bridge-to-pupil distance can differ between eyes
  • eye height may differ between eyes
  • viewing devices: fixed or variable ipd adjustment
  • hyperstereo: separation >70 mm for exaggerated depth, better distant-object perception, miniature effect for large objects
  • synonyms: interocular distance, intra-axial distance, stereo baseline
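the miniature effect of hyperstereo can be approximated with a simple ratio: the scene appears scaled by roughly ipd / baseline. a rule-of-thumb sketch, not an exact perceptual model:

```python
def apparent_scale(ipd_mm: float, baseline_mm: float) -> float:
    """Rough apparent size factor of the scene: doubling the camera
    baseline relative to the viewer's ipd roughly halves the perceived
    scale (the miniature effect noted above)."""
    return ipd_mm / baseline_mm

print(apparent_scale(65.0, 130.0))  # 0.5 — hyperstereo, miniature look
print(apparent_scale(65.0, 65.0))   # 1.0 — natural scale
```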

virtual reality

vr headset fundamentals

also called head-mounted displays. most software assumes two controllers and active tracking.

tracking

tracking continuously determines the device's 3d position (and orientation) so the view of the virtual environment can be adjusted accordingly.

  • outside-in tracking

    • lighthouse: base stations emit ir sync flashes followed by rotating laser sweeps; photodiodes on the device compute position from the timing between them
    • multiple base stations for scalability
    • multi-device support
    • 360° tracking requires two base stations
    • known for smooth performance
  • inside-out tracking

    • headset cameras/sensors detect controller signals
    • needs well-lit environment
    • controllers must stay in headset's field of view
    • builds internal room map
  • full-body tracking: simultaneous head, arm, leg tracking
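the lighthouse timing principle can be sketched as follows: a sensor converts the delay between the base station's sync flash and the rotating laser sweep hitting it into an angle. the 60 Hz rotor rate is an assumption typical of steamvr base stations; the function name is illustrative.

```python
def sweep_angle_deg(t_sync_s: float, t_hit_s: float, rotation_hz: float = 60.0) -> float:
    """Angle of a sensor relative to a base station, derived from the
    time between the sync flash and the laser sweep hitting the sensor.
    rotation_hz is the rotor spin rate (60 Hz assumed here)."""
    period = 1.0 / rotation_hz
    return 360.0 * ((t_hit_s - t_sync_s) / period)

# a hit a quarter rotation after the sync flash -> 90 degrees
print(sweep_angle_deg(0.0, (1.0 / 60.0) / 4.0))  # 90.0
```

with angles from two sweeps (horizontal and vertical) per base station and multiple sensors at known positions on the device, the full pose can be solved.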

processing

  • pcvr

    • pc calculates and sends images to headset
    • highest processing power and realism
    • requires pc + cable or wireless link
  • standalone/hybrid: processing power similar to smartphones

hardware generations

fourth generation

  • bigscreen beyond

    • steamvr support

    • 4320x2160p, oled

    • compact/lightweight (~143×52×49 mm, 127 g)

    • exact dimensions published, useful for 3d-printing accessories

    • requires an iphone (€200+) for the face scan used to fit the custom facial interface

third generation

after comparing the vive pro (with controllers), hp reverb g2, and valve index, the vive pro was chosen.

  • htc vive pro

    • 2880x1600, oled
    • wireless option (pci card, tx/rx modules)
    • requires base stations; gray/blue screen otherwise
    • integrated camera
  • valve index controllers
  • two base stations needed for gaming (one is insufficient)

alternatives:

  • valve index: 2880x1600, lcd, wide fov
  • hp reverb g2: 4320x2160, low price, inside-out tracking (weaker than outside-in), fails in the dark, tracking blind spots
  • vive pro 2: 4896x2448, lcd (more glare than the hp g2), wireless only at lower resolution
  • quest 2: standalone, inside-out tracking

limitations and issues

  • resolution: ~2160p oled per eye is the minimum for enjoyable vr; lower resolution causes a screen-door effect
  • refresh rate: 120 fps needed for near-realistic fluidity; 90 fps is visibly less smooth
  • weight/size: reduces movement, immersion over time
  • glare: lens re-reflections cause arcs/rings
  • field of view: small fov feels like looking through goggles; human fov ~150° vertical, ~210° horizontal
  • cables: hinder rotation; add weight

lenses

  • bend light to widen fov
  • allow focus on very close objects
  • images usually at infinity focus

near- or farsightedness

corrective lenses still required in vr to avoid blur, as headset displays are fixed-distance and magnified by optics.

eye-tracking

software & web integration

applications

  • works well: waves, spraying water, underwater scenes, caves, interiors, dense urban areas, product viewing, art installations, sculptures, volumetric objects, 3d animation
  • works poorly: fast-moving objects approaching the viewer, unappealing close-up views, empty rooms, foreground obstructions, excessive parallax or focus changes

references & links