implementation of digital music creation software utilities

related to digital signal processing

example implementations, for several of the things mentioned, can be found as c code in sph-sp

basic input/output

typically what is created is a series of sample values that can be interpreted as pressure over time. the samples are usually processed in arrays that are segments of the total output, which limits the memory used at any given time. processing sample by sample likely incurs a large overhead from setting up the parameters required for generating each sample, so individual operations are usually applied to blocks of samples
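
the block-wise processing described above could be sketched like this in python. this is only an illustration, not how sph-sp does it; the name generate_blocks, its parameters and the choice of a sine generator are assumptions made for the example.

```python
import math

def generate_blocks(frequency, sample_rate, total_samples, block_size):
    """yield lists of sine samples, each at most block_size long.
    only one block needs to be held in memory at a time."""
    # parameters are set up once here, not once per sample
    phase_step = 2 * math.pi * frequency / sample_rate
    start = 0
    while start < total_samples:
        count = min(block_size, total_samples - start)
        # using the absolute sample index keeps the phase continuous across blocks
        yield [math.sin(phase_step * (start + i)) for i in range(count)]
        start += count
```

each yielded block could then be filtered, mixed or written to a file before the next one is generated.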

what sample formats and rates to use

  • possible sample formats are for example floating point, fixed point or integer
  • library functions usually use 64 bit ieee floating point (for example sin and fft functions). a larger sample format reduces rounding errors
  • floating point can handle a very large range of values, but its precision is limited and varies with magnitude, and it can be slower than integer or fixed point on hardware without a floating point unit. calculations with floats are not trivial: for example, summing floats can quickly accumulate large errors, especially if no error compensation is used for the summation
  • when integers are used they might still have to be divided and become fractions that have to be rounded
  • the most precise approach would be to make sure that all samples are created as integers and never divided
  • the sampling rate tends to have a much higher performance impact than the sample size. most current processors work with 64 bit values anyway, whereas the sampling rate multiplies the necessary iterations for all operations
  • a high sampling rate represents high frequency sounds with more samples per cycle. 96000 samples per second still approximates a 12000 hz sine with 8 samples per cycle
  • sample values are usually centered at zero and go from -1 to 1
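
the error compensation for float summation mentioned in the list can be illustrated with kahan summation. this is a minimal sketch of one well-known compensation technique, not necessarily the one the author had in mind; the function names are made up for the example.

```python
import math

def naive_sum(values):
    total = 0.0
    for value in values:
        total += value
    return total

def kahan_sum(values):
    """compensated summation: carry the rounding error of each addition forward."""
    total = 0.0
    compensation = 0.0  # running estimate of the low-order bits lost so far
    for value in values:
        corrected = value - compensation
        new_total = total + corrected
        # (new_total - total) is what was actually added; the difference was lost
        compensation = (new_total - total) - corrected
        total = new_total
    return total
```

comparing both results against math.fsum, which python provides for accurate float summation, shows how far the naive loop drifts for long series.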

how to handle multiple channels

multiple sample series are created for stereo sound and for sound from more directions. samples for each channel are typically either stored non-interleaved in separate arrays, or interleaved with the channels alternating like (1 2 3 1 2 3 ...). non-interleaved tends to be easier to process, and interleaved can be more robust against interruptions on playback, as for example an input lag would affect all channels at the same time. the samples for multiple channels can be generated completely independently, or they can be generated once, copied and modified. the former allows for more dynamic changes. a sound that is not fully centered has differing amplitudes on different channels. slight frequency modifications and delays might also be applied to a sound between channels, especially when the peak volume transitions between channels. common sound software has panning knobs that attenuate the volume of one channel when turned to either side and attenuate nothing when set to the center. this is mostly for two channels and of limited use, as it usually affects a whole group of sounds equally
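
the two storage layouts and the panning knob behaviour described above could look like this. it is a minimal sketch; the function names and the -1..1 range for the pan position are assumptions for illustration.

```python
def interleave(channels):
    """[[a1, a2, ...], [b1, b2, ...]] -> [a1, b1, a2, b2, ...]"""
    return [sample for frame in zip(*channels) for sample in frame]

def deinterleave(samples, channel_count):
    """[a1, b1, a2, b2, ...] -> [[a1, a2, ...], [b1, b2, ...]]"""
    return [samples[i::channel_count] for i in range(channel_count)]

def pan(samples, position):
    """copy a mono series to a left and a right channel.
    position is in -1..1: 0 attenuates nothing, 1 silences the left channel,
    -1 silences the right channel."""
    left_gain = 1.0 - max(0.0, position)
    right_gain = 1.0 + min(0.0, position)
    return [[s * left_gain for s in samples], [s * right_gain for s in samples]]
```

for example, interleave(pan(samples, 0.5)) would give an interleaved stereo series with the left channel at half amplitude.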

file output

to record generated samples they can be written to files. without real-time playback, this is similar to rendering in 3d modeling software. wav with 32 bit float samples is probably the most widely supported high-quality format. flac uses integer samples, so converting from floating point samples involves some loss. au is maybe the simplest format. it supports enough channels, sample rates and sample formats, has extremely small overhead and, importantly, is extremely easy to implement. unfortunately, it is not as widely supported as wav
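
to show how little an au writer needs, here is a minimal sketch for interleaved 32 bit float samples. the header layout (six big-endian 32 bit fields, encoding number 6 for 32 bit float) is written from the commonly documented format and should be checked against a format description before relying on it; the function names are made up for the example.

```python
import struct

AU_MAGIC = 0x2E736E64       # the ascii characters ".snd"
AU_ENCODING_FLOAT32 = 6     # 32 bit ieee floating point
AU_HEADER_SIZE = 24         # six 32 bit fields, no annotation text

def au_bytes(interleaved_samples, sample_rate, channel_count):
    """return the content of an au file for interleaved float samples.
    all header fields and samples are stored big-endian."""
    data = struct.pack(">%df" % len(interleaved_samples), *interleaved_samples)
    header = struct.pack(">6I", AU_MAGIC, AU_HEADER_SIZE, len(data),
                         AU_ENCODING_FLOAT32, sample_rate, channel_count)
    return header + data

def write_au(path, interleaved_samples, sample_rate, channel_count):
    with open(path, "wb") as file:
        file.write(au_bytes(interleaved_samples, sample_rate, channel_count))
```

a call like write_au("out.au", samples, 48000, 2) would then write a stereo file, where "out.au" is just a placeholder path.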

how to create transitions and easings

amplitude, frequency envelopes and any other kind of transition can be represented by paths. paths can be created from a few intermediate points and interpolation for values between the points. paths can also be stored as discrete values in arrays. array access is much faster than interpolation but arrays tend to use more memory. paths can be combined with custom pointwise operations, for example addition. composition, reverse, stretch, scale and interpolation segment randomisation are some interesting operations
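
a path given as a few points can be turned into an array like this. the sketch assumes linear interpolation and evenly spaced output positions; path_to_array and path_add are hypothetical names, and other interpolation methods would work the same way.

```python
def path_to_array(points, size):
    """points is a list of (x, y) pairs with strictly increasing x.
    return size values sampled at evenly spaced x from the first to the last point,
    linearly interpolated between neighbouring points."""
    result = []
    first_x, last_x = points[0][0], points[-1][0]
    segment = 0
    for i in range(size):
        x = (first_x + (last_x - first_x) * i / (size - 1)) if size > 1 else first_x
        # advance to the segment that contains x
        while segment < len(points) - 2 and x > points[segment + 1][0]:
            segment += 1
        (x0, y0), (x1, y1) = points[segment], points[segment + 1]
        result.append(y0 + (y1 - y0) * (x - x0) / (x1 - x0))
    return result

def path_add(a, b):
    """pointwise addition of two paths stored as arrays"""
    return [x + y for x, y in zip(a, b)]
```

for example, path_to_array([(0, 0), (1, 1), (3, 0)], 7) describes a short attack followed by a longer decay.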

how to create noise

the term noise as used here describes sound with a random distribution of many frequencies. noisy sounds are initial bursts of percussive instruments, hissing, wind - practically anything that does not consist of clearly separable frequencies and harmonics, and the world has a lot of that. noise is typically created with samples of a uniform random number generator that are then optionally filtered to attenuate undesired frequency ranges. there do not seem to be many other effective methods for creating noise, especially noise with custom frequency content. summing many sines is possible but computationally intensive, and the amplification and cancellation effects of summing phase shifted sines have to be taken into account
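
the usual recipe, uniform random samples followed by a filter, could be sketched as follows. the moving average is only a stand-in for a real filter and is an assumption of this example, as are the function names.

```python
import random

def white_noise(count, seed):
    """uniformly distributed samples in -1..1.
    the same seed repeats the same series."""
    generator = random.Random(seed)
    return [generator.uniform(-1.0, 1.0) for _ in range(count)]

def moving_average(samples, width):
    """a crude low-pass filter: each output is the mean of the last width inputs
    (fewer at the start of the series)."""
    result = []
    total = 0.0
    for i, sample in enumerate(samples):
        total += sample
        if i >= width:
            total -= samples[i - width]
        result.append(total / min(i + 1, width))
    return result
```

moving_average(white_noise(48000, 1), 8) would give one second of duller sounding noise at 48000 samples per second.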

random numbers

random numbers in software are typically taken from a random number generator, which returns real or integer numbers in specified ranges. values might occur with equal probability across the whole range, which is the common uniform distribution and corresponds to white noise. but other probability distributions are possible, as well as discrete distributions defined by custom arrays. see the distributions supported by the gnu scientific library for example. random number generators usually start with seed values that determine all future numbers - that means by using seed values, the random series is repeatable

here is an example implementation of a random number generator for custom discrete probabilities in scheme:

(use-modules (srfi srfi-1)) ; for last

(define (cusum a . b)
  "calculate cumulative sums from the given numbers.
   (a b c ...) -> (a (+ a b) (+ a b c) ...)"
  (cons a (if (null? b) '() (apply cusum (+ a (car b)) (cdr b)))))

(define* (random-discrete-f probabilities #:optional (state *random-state*))
  "(real ...) [random-state] -> procedure:{-> integer}
   return a function that when called returns an index of a value in probabilities.
   each index will be returned with a probability given by the value at the index.
   each value is a fraction of the sum of probabilities.
   for example, if the values sum to 100, each entry in probabilities is a percentage.
   this allows for custom discrete random probability distributions.
   example usage:
   (define random* (random-discrete-f (list 10 70 20)))
   (define samples (list (random*) (random*)))"
  (let* ((cuprob (apply cusum probabilities)) (sum (last cuprob)))
    (lambda ()
      ; deviate is uniformly distributed in 0..sum, excluding sum
      (let ((deviate (random sum state)))
        ; return the index of the first cumulative sum that is greater than deviate
        (let loop ((a 0) (b cuprob))
          (if (null? b) a (if (< deviate (car b)) a (loop (+ 1 a) (cdr b)))))))))


two particularly useful digital filter types are:

  • windowed sinc: best frequency separation, custom frequency responses possible, high processing cost
  • state-variable: low processing cost, multiple filter outputs

see filtering
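
as an illustration of the state-variable type, here is a minimal sketch of the chamberlin form, which produces low-pass, high-pass and band-pass output in one pass. this is one common variant and not necessarily the one used in sph-sp; the parameter names are assumptions, and this form is commonly said to become unstable as the cutoff approaches about one sixth of the sample rate.

```python
import math

def state_variable_filter(samples, cutoff, sample_rate, damping=1.0):
    """chamberlin state-variable filter.
    returns three lists: low-pass, high-pass and band-pass output.
    cutoff is in hertz. damping is 1 / q; smaller values give more resonance."""
    f = 2.0 * math.sin(math.pi * cutoff / sample_rate)
    low = band = 0.0
    lows, highs, bands = [], [], []
    for sample in samples:
        low += f * band
        high = sample - low - damping * band
        band += f * high
        lows.append(low)
        highs.append(high)
        bands.append(band)
    return lows, highs, bands
```

only two state values and a handful of multiplications are needed per sample, which is where the low processing cost comes from.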


samples can be analysed with statistics. some statistical values that could be calculated:

  • arithmetic mean: the average value
  • variance: the expectation of the squared deviation of a random variable from its mean. informally, it measures how far a set of numbers is spread out from its average value
  • kurtosis: measures how sharply peaked a probability distribution is, relative to its width. the kurtosis is normalized to zero for a gaussian distribution
  • skewness: measures the asymmetry of the tails of a probability distribution
  • correlations between bands: how similar the envelopes of different frequency bands are
  • autocorrelation and cross-correlation: measure how closely two sets of samples tend to fall onto a line when plotted against each other. for autocorrelation, the second set is a delayed copy of the first. can be used to find specific sounds among noise or to test for white noise

note: typical values for kurtosis and skewness are unfortunately not straightforward to define, for example because they are not bounded to a range like 0..1
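
the statistics listed above could be computed as follows. the sketch assumes population moments (dividing by the number of samples); libraries such as the gnu scientific library may normalise differently, so results can differ slightly.

```python
import math

def mean(samples):
    return sum(samples) / len(samples)

def central_moment(samples, order):
    m = mean(samples)
    return sum((s - m) ** order for s in samples) / len(samples)

def variance(samples):
    return central_moment(samples, 2)

def skewness(samples):
    return central_moment(samples, 3) / variance(samples) ** 1.5

def kurtosis(samples):
    """excess kurtosis: zero for a gaussian distribution"""
    return central_moment(samples, 4) / variance(samples) ** 2 - 3.0

def correlation(a, b):
    """pearson correlation in -1..1: how closely paired values fall onto a line"""
    ma, mb = mean(a), mean(b)
    covariance = sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)
    return covariance / math.sqrt(variance(a) * variance(b))

def autocorrelation(samples, lag):
    """correlation of a series with itself delayed by lag samples"""
    return correlation(samples[:len(samples) - lag], samples[lag:])
```

for white noise, autocorrelation at any lag other than zero should be close to zero.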

guided transforms

transform signals to match a set of specific statistical properties.

for example, using

what units to use

for frequencies, sample offsets and related calculations, multiple alternative units are possible


  • time can be expressed for example in seconds or as a number of samples
  • times given as sample counts keep the same number when the sample rate changes, so the durations they represent become shorter or longer. times in seconds represent the same duration regardless of sample rate
  • but durations in seconds are likely to map to fractional sample positions that can not be represented exactly on the sample grid and have to be rounded. for example, one sample lasts 1s / 44100 = 0.000022675736961451248s at 44100 hz and 1s / 48000 = 0.000020833333333333333s at 48000 hz
  • integer sample counts are always sample-exact
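
the rounding that happens when converting between the two time units can be seen in a small sketch. the function names are made up for the example.

```python
def seconds_to_samples(seconds, sample_rate):
    """round to the nearest position on the sample grid
    (python rounds exact ties to the even neighbour)"""
    return round(seconds * sample_rate)

def samples_to_seconds(sample_count, sample_rate):
    return sample_count / sample_rate
```

one millisecond is exactly 48 samples at 48000 hz, but at 44100 hz it would be 44.1 samples and is rounded to 44, which is slightly shorter than one millisecond.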


  • frequency can be expressed for example in radians, hertz or as a wavelength in number of samples
  • two pi radians are one full sine cycle regardless of sample rate
  • a full cycle of a one hertz sine has sample-rate number of samples. hertz is defined on the basis of seconds. if the sample rate is even then the maximum representable frequency in hertz, which is half the sample rate, is an integer. the same is not true for radians, where the maximum is pi radians per sample
  • as for time, frequencies expressed as the sample count of the wavelength always map exactly to the sample grid
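
the relations between the three frequency units can be written down as simple conversions. this is a sketch with hypothetical function names; radians here means radians of phase advanced per sample.

```python
import math

def hertz_to_radians(hertz, sample_rate):
    """radians of phase advanced per sample"""
    return 2 * math.pi * hertz / sample_rate

def radians_to_hertz(radians, sample_rate):
    return radians * sample_rate / (2 * math.pi)

def hertz_to_wavelength(hertz, sample_rate):
    """samples per full cycle; not necessarily an integer"""
    return sample_rate / hertz

def wavelength_to_hertz(sample_count, sample_rate):
    return sample_rate / sample_count
```

for example, a wavelength of 8 samples at 96000 samples per second corresponds to 12000 hz, and half the sample rate always corresponds to pi radians per sample.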