2025-05-26

implementation of digital music creation software utilities

basic input/output

typically the output is a series of sample values that represent sound pressure over time. samples are usually processed in blocks (arrays) rather than one by one or all at once. processing each sample individually adds significant overhead due to repeated call and parameter setup, while holding the whole signal in memory at once can be costly. block-based processing amortizes the setup cost and keeps memory use bounded.
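
a minimal python sketch of block-based generation, assuming a plain sine source. the names sine_blocks and block_size are made up for illustration:

```python
import math

def sine_blocks(frequency_hz, sample_rate, total_samples, block_size=256):
    """yield lists of sine samples, at most block_size samples at a time."""
    # the phase increment is computed once per call, not once per sample
    step = 2.0 * math.pi * frequency_hz / sample_rate
    start = 0
    while start < total_samples:
        count = min(block_size, total_samples - start)
        yield [math.sin(step * (start + i)) for i in range(count)]
        start += count

# each block can be filtered, mixed or written out before the next one is made
blocks = list(sine_blocks(440.0, 48000, 1000, block_size=256))
```

only one block needs to be in memory at a time when the consumer handles blocks as they arrive.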

what sample formats and rates to use

  • possible sample formats include floating point, fixed point, or integer
  • many library functions use 64 bit ieee floating point (for example sin and fft). larger formats reduce rounding errors
  • floating point supports a wide dynamic range but can be slower than integer or fixed point, mainly on hardware without a floating point unit. summing many floats accumulates rounding error unless compensated summation is used
  • integer formats may still require division and rounding
  • the most precise approach is to keep samples as integers and avoid division when possible
  • sampling rate has a larger performance impact than sample size. modern processors handle 64 bit data well, but higher sampling rates multiply the number of iterations for every operation
  • higher sampling rates let you represent higher frequencies more accurately. for example, 96000 samples per second gives eight samples per cycle of a 12000 hz sine wave
  • sample values are typically centered at zero and range from minus one to one
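
the remark about compensation when summing floats can be illustrated with kahan summation. this is a sketch in python, not a recommendation of one method; the standard library function math.fsum is used only as an accurate reference:

```python
import math

def naive_sum(values):
    """plain left-to-right summation; rounding error grows with the count."""
    total = 0.0
    for value in values:
        total += value
    return total

def kahan_sum(values):
    """compensated summation: carry the rounding error of each addition."""
    total = 0.0
    compensation = 0.0
    for value in values:
        corrected = value - compensation
        new_total = total + corrected
        # (new_total - total) is what was actually added; the rest was lost
        compensation = (new_total - total) - corrected
        total = new_total
    return total

values = [0.1] * 100000
reference = math.fsum(values)
naive_error = abs(naive_sum(values) - reference)
kahan_error = abs(kahan_sum(values) - reference)
```

the compensated version costs a few extra additions per sample.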

how to handle multiple channels

multiple channels (for stereo or surround sound) can be stored either non interleaved (separate arrays per channel) or interleaved (channel samples alternating in one array). non interleaved storage is easier to process, while interleaved storage is more robust against playback interruptions since any input lag affects all channels equally. channel samples can be generated independently or generated once and then copied with modifications; independent generation allows more dynamic differences. sounds that are not centered often use different amplitudes, slight frequency shifts, or small delays per channel to simulate panning. common mixing controls attenuate one channel when panning left or right and leave both equal when centered.
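
a small python sketch of both layouts and of a balance-style pan control as described above. the function names and the pan range of -1 to 1 are assumptions made for this example:

```python
def interleave(channels):
    """[[l0, l1, ...], [r0, r1, ...]] -> [l0, r0, l1, r1, ...]"""
    return [sample for frame in zip(*channels) for sample in frame]

def deinterleave(samples, channel_count):
    """[l0, r0, l1, r1, ...] -> [[l0, l1, ...], [r0, r1, ...]]"""
    return [samples[i::channel_count] for i in range(channel_count)]

def pan_gains(pan):
    """pan in [-1, 1]: -1 is full left, 0 is center, 1 is full right.
    the opposite channel is attenuated; both gains are 1.0 at center."""
    left = 1.0 - max(0.0, pan)
    right = 1.0 + min(0.0, pan)
    return left, right

def pan_mono(samples, pan):
    """copy one generated channel into two channels with different amplitudes."""
    left, right = pan_gains(pan)
    return [[s * left for s in samples], [s * right for s in samples]]

stereo = pan_mono([0.5, -0.5, 0.25], 0.5)
packed = interleave(stereo)
```

per-channel delays or frequency shifts would need independent generation per channel, which this sketch does not cover.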

file output

to record generated samples, write them to audio files. without real-time playback this process is similar to rendering in 3d graphics. wav files with 32 bit float samples are the most widely supported high quality format. flac uses integer samples, which requires rounding when converting from float. au is a very simple format that supports multiple channels, sample rates, and formats, has minimal overhead, and is easy to implement, but it is less widely supported than wav.
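
to show how little an au file needs, here is a python sketch that builds one with 32 bit float samples. it assumes the common header layout of six big-endian 32 bit words (magic, data offset, data size, encoding, sample rate, channel count) with no annotation field; au_bytes and write_au are hypothetical names:

```python
import struct

AU_MAGIC = 0x2E736E64        # the ascii bytes ".snd"
AU_HEADER_SIZE = 24          # six 32 bit words, no annotation
AU_ENCODING_FLOAT32 = 6      # 32 bit ieee floating point

def au_bytes(interleaved_samples, sample_rate, channel_count):
    """return the complete file content for interleaved float samples."""
    data = struct.pack(">%df" % len(interleaved_samples), *interleaved_samples)
    header = struct.pack(">6I", AU_MAGIC, AU_HEADER_SIZE, len(data),
                         AU_ENCODING_FLOAT32, sample_rate, channel_count)
    return header + data

def write_au(path, interleaved_samples, sample_rate, channel_count):
    with open(path, "wb") as file:
        file.write(au_bytes(interleaved_samples, sample_rate, channel_count))

content = au_bytes([0.0, 0.5, -0.5, 1.0], 48000, 2)
```

samples are expected in interleaved order, and everything including the sample data is stored big-endian.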

how to create transitions and easings

amplitude and frequency transitions (envelopes) can be represented by paths. a path is defined by a few control points and an interpolation method. paths can also be stored as discrete values in arrays; array lookup is faster than interpolation but uses more memory. you can combine and modify paths with operations such as addition, composition, reversal, stretching, scaling, and randomization of interpolation segments.
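
a python sketch of a path with linear interpolation, rendered into an array. the point format (position in samples, value) and the name sample_path are assumptions for this example; other interpolation methods would replace the linear step:

```python
def sample_path(points, length):
    """points: at least two (position, value) pairs sorted by position.
    returns length values; values outside the points are held constant."""
    result = []
    segment = 0
    for i in range(length):
        # advance to the segment that contains sample i
        while segment < len(points) - 2 and i >= points[segment + 1][0]:
            segment += 1
        (x0, y0), (x1, y1) = points[segment], points[segment + 1]
        if i <= x0:
            result.append(y0)
        elif i >= x1:
            result.append(y1)
        else:
            t = (i - x0) / (x1 - x0)
            result.append(y0 + t * (y1 - y0))
    return result

# a short attack and release envelope, applied by pointwise multiplication
envelope = sample_path([(0, 0.0), (4, 1.0), (8, 0.0)], 9)
signal = [1.0] * 9
shaped = [s * a for s, a in zip(signal, envelope)]
```

rendering the path once and reusing the array is the lookup variant mentioned above.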

how to create noise

noise describes sound with a random distribution of many frequencies, for example percussive bursts, hissing, or wind. it is typically generated by sampling a uniform random number generator and then filtering to shape the frequency content. summing many sine waves can produce noise but is computationally expensive and requires handling phase interference effects.
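
a python sketch of the usual approach: uniform random samples followed by a filter. the one-pole low-pass used here is just the simplest filter to show the idea and is not taken from these notes; the function names are made up:

```python
import random

def white_noise(count, seed=0):
    """uniform random samples in [-1, 1]; the seed makes the result repeatable."""
    generator = random.Random(seed)
    return [generator.uniform(-1.0, 1.0) for _ in range(count)]

def one_pole_lowpass(samples, coefficient):
    """y[n] = y[n-1] + coefficient * (x[n] - y[n-1]), coefficient in (0, 1].
    smaller coefficients remove more high frequency content."""
    output = []
    previous = 0.0
    for sample in samples:
        previous += coefficient * (sample - previous)
        output.append(previous)
    return output

noise = white_noise(1000, seed=1)
darker_noise = one_pole_lowpass(noise, 0.1)
```

the filtered result is quieter than the input, so it usually needs rescaling afterwards.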

random numbers

random numbers in software come from a pseudorandom generator that produces real or integer values in specified ranges. a uniform distribution assigns equal probability across the range and corresponds to white noise. other distributions are possible, including discrete distributions defined by custom arrays. see the distributions supported by the gnu scientific library for examples. generators use seed values so that the same sequence can be reproduced.

here is an example implementation of a random number generator for custom discrete probabilities in coffeescript:

cusum = (a, b...) ->
  # calculate cumulative sums from the given numbers.
  # (a b c ...) -> (a (+ a b) (+ a b c) ...)
  if b.length is 0 then [a]
  else [a].concat cusum(a + b[0], b.slice(1)...)

random_discrete_f = (probabilities, state = null) ->
  ###
  (real ...) [random-state] -> procedure:{-> integer}
  return a function that when called returns an index of a value in probabilities.
  each index will be returned with a probability given by the value at the index.
  each value is a fraction of the sum of probabilities.
  for example, if the values sum to 100, each entry in probabilities is a percentage.
  this allows for custom discrete random probability distributions.
  state is optional: any object with a random() method that returns a real in [0, 1).
  without it, Math.random is used.
  example usage:
  random_fn = random_discrete_f [10, 70, 20]
  samples = [random_fn(), random_fn()]
  ###

  cu_prob = cusum(probabilities[0], probabilities.slice(1)...)
  total = cu_prob[cu_prob.length - 1]
  ->
    deviate = if state? and typeof state.random is 'function'
      state.random() * total
    else Math.random() * total
    for idx in [0...cu_prob.length]
      if deviate < cu_prob[idx]
        return idx
    # fall back to the last index if rounding left no bucket matched
    cu_prob.length - 1

filtering

two particularly useful digital filter types are:

  • windowed sinc filter offers the best frequency separation, supports custom frequency responses, but has high processing cost
  • state variable filter has low processing cost and provides multiple outputs
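
a python sketch of a state variable filter in the widely used chamberlin form, to show the low cost and the multiple outputs. the parameter names are assumptions, damping stands for 1/q, and this form is only stable for cutoff frequencies well below the sample rate:

```python
import math

def state_variable_filter(samples, cutoff_hz, sample_rate, damping=1.0):
    """return (low, band, high) output lists for the input samples."""
    f = 2.0 * math.sin(math.pi * cutoff_hz / sample_rate)
    low = 0.0
    band = 0.0
    lows, bands, highs = [], [], []
    for sample in samples:
        # three multiply-adds per sample give all three outputs
        low += f * band
        high = sample - low - damping * band
        band += f * high
        lows.append(low)
        bands.append(band)
        highs.append(high)
    return lows, bands, highs

# a constant input should end up entirely in the low-pass output
lows, bands, highs = state_variable_filter([1.0] * 5000, 1000.0, 48000)
```

a windowed sinc filter would instead convolve the input with a precomputed kernel, which is where its higher cost comes from.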

statistics

samples can be analysed with statistical measures such as:

  • arithmetic mean: the average value
  • variance: the expected squared deviation of samples from their mean; measures how spread out values are around the average
  • kurtosis: how sharply peaked a distribution is relative to its width; normalized so a gaussian distribution has zero kurtosis
  • skewness: the asymmetry of distribution tails
  • correlation between bands: similarity of envelopes across different frequency bands
  • autocorrelation and cross correlation: autocorrelation measures the linear similarity of a signal with a delayed copy of itself, cross correlation that of two different sample sets; useful for finding patterns in noise or testing for white noise

note that kurtosis and skewness are not limited to the range 0 to 1. with the normalization above both can be negative, and kurtosis has no upper bound.
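
a python sketch of some of these measures, using population (divide by n) moments. the function names are made up, and the autocorrelation is normalized so that lag 0 gives 1:

```python
def mean(xs):
    return sum(xs) / len(xs)

def central_moment(xs, order):
    m = mean(xs)
    return sum((x - m) ** order for x in xs) / len(xs)

def variance(xs):
    return central_moment(xs, 2)

def skewness(xs):
    return central_moment(xs, 3) / variance(xs) ** 1.5

def excess_kurtosis(xs):
    # subtracting 3 makes a gaussian distribution come out as zero
    return central_moment(xs, 4) / variance(xs) ** 2 - 3.0

def autocorrelation(xs, lag):
    """similarity of xs with itself delayed by lag samples, lag >= 0."""
    m = mean(xs)
    numerator = sum((xs[i] - m) * (xs[i + lag] - m) for i in range(len(xs) - lag))
    return numerator / sum((x - m) ** 2 for x in xs)

data = [1.0, 2.0, 3.0, 4.0, 5.0]
summary = (mean(data), variance(data), skewness(data), excess_kurtosis(data))
```

for white noise the autocorrelation should be close to zero at every lag other than zero.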

guided transforms

transform signals to match specific statistical properties using techniques such as:

  • targeted adjustment algorithms
  • general randomized algorithms like the monte carlo method or simulated annealing
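
as one very simple case of a targeted adjustment, the python sketch below shifts and scales a signal so that it has a chosen mean and variance. this is only an illustration of the idea under that narrow goal; matching higher-order statistics generally needs iterative or randomized methods like those listed above:

```python
import math

def match_mean_variance(samples, target_mean, target_variance):
    """return a copy of samples with the given mean and population variance."""
    count = len(samples)
    current_mean = sum(samples) / count
    current_variance = sum((s - current_mean) ** 2 for s in samples) / count
    # a constant signal has no spread to scale, so it is only shifted
    scale = math.sqrt(target_variance / current_variance) if current_variance > 0 else 0.0
    return [target_mean + (s - current_mean) * scale for s in samples]

adjusted = match_mean_variance([0.0, 1.0, 4.0, -2.0], 0.5, 0.25)
```

shifting and scaling leave skewness and kurtosis unchanged.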

what units to use

for frequencies, sample offsets and related calculations, you can use different units

time

  • measured in seconds or sample counts
  • a length given as a sample count stays the same number of samples when the sample rate changes; its duration in seconds shrinks or stretches accordingly
  • seconds represent a fixed duration regardless of sample rate
  • seconds often map to fractional sample positions that need rounding to align with the sample grid, because the sample period is rarely a round number (for example, one sample at 44100 hz lasts about 0.00002267573696 s; at 48000 hz about 0.00002083333 s)
  • integer sample counts are always exact
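
the conversion between the two time units can be sketched in python as follows. the function names are made up, and rounding to the nearest sample is one possible choice:

```python
def seconds_to_samples(seconds, sample_rate):
    """nearest sample index for a time in seconds.
    python's round() resolves exact .5 ties to the even neighbor."""
    return round(seconds * sample_rate)

def samples_to_seconds(sample_count, sample_rate):
    return sample_count / sample_rate

# 0.001 s is 44.1 samples at 44100 hz, so the grid position is rounded
offset = seconds_to_samples(0.001, 44100)
rounding_error = 0.001 - samples_to_seconds(offset, 44100)
```

keeping offsets as sample counts internally avoids repeating this rounding at every step.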

frequency

  • can be expressed in radians, hertz or samples per cycle
  • two pi radians equal one full cycle regardless of sample rate
  • a one hertz cycle spans a number of samples equal to the sample rate; hertz is based on seconds
  • if the sample rate is even, the maximum representable frequency (half the sample rate) is an integer in hertz; expressed in radians it is a multiple of pi and therefore not an integer
  • expressing wavelength in sample counts always aligns exactly with the sample grid
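
the relations between the frequency units can be written out in python. this sketch assumes that radians means radians per sample; the function names are made up:

```python
import math

def hz_to_radians_per_sample(frequency_hz, sample_rate):
    """phase advance per sample; two pi radians are one full cycle."""
    return 2.0 * math.pi * frequency_hz / sample_rate

def hz_to_samples_per_cycle(frequency_hz, sample_rate):
    return sample_rate / frequency_hz

def samples_per_cycle_to_hz(samples_per_cycle, sample_rate):
    return sample_rate / samples_per_cycle

# the example from the sample rate section: 12000 hz at 96000 samples per second
cycle_length = hz_to_samples_per_cycle(12000, 96000)
# the maximum representable frequency is half the sample rate, pi radians per sample
nyquist_radians = hz_to_radians_per_sample(24000, 48000)
```

a wavelength that is a whole number of samples repeats exactly on the sample grid, while a frequency in hertz often gives a fractional wavelength.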