about sound programming

facets of individual knowledge on this topic


related topics

analysis/synthesis of sound


model: amplitude, frequency


  • multiple fft of lengths that correspond to the frequency they measure (multiresolution stft)
  • apply a blackman window to remove edge discontinuities from the fft analysis
  • append zeros to the end of the fft input to increase its size and interpolate to smaller frequency spacings in the fft result to be able to select a more correct peak frequency
  • apply the fft with overlap and hops (0..10, 2..12, 4..14, and so on). each step corresponds to a measurement hop-size apart
  • extract peak frequencies of results
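the analysis steps above (window, zero padding, transform, peak selection) can be sketched as follows. this is a minimal illustration, not a definitive implementation: it uses a naive dft instead of a real fft, and the names blackman, dft_magnitudes and peak_frequency are hypothetical helpers, not from any library.

```python
import cmath
import math

def blackman(n):
    """Blackman window of n points (n > 1)."""
    return [0.42 - 0.5 * math.cos(2 * math.pi * i / (n - 1))
            + 0.08 * math.cos(4 * math.pi * i / (n - 1)) for i in range(n)]

def dft_magnitudes(x):
    """Magnitudes of bins 0..len(x)//2 of a naive O(n^2) DFT."""
    n = len(x)
    return [abs(sum(v * cmath.exp(-2j * math.pi * k * i / n)
                    for i, v in enumerate(x)))
            for k in range(n // 2 + 1)]

def peak_frequency(block, sample_rate, padded_size):
    """Window, zero-pad, transform, and return the frequency of the largest bin."""
    windowed = [v * w for v, w in zip(block, blackman(len(block)))]
    padded = windowed + [0.0] * (padded_size - len(windowed))
    mags = dft_magnitudes(padded)
    peak_bin = max(range(len(mags)), key=mags.__getitem__)
    return peak_bin * sample_rate / padded_size

sample_rate = 1000
block = [math.sin(2 * math.pi * 102.5 * i / sample_rate) for i in range(100)]
coarse = peak_frequency(block, sample_rate, 100)  # bins are 10 hz apart
fine = peak_frequency(block, sample_rate, 400)    # bins are 2.5 hz apart
```

with zero padding the peak can be read on a finer frequency grid, although the underlying measurement is still based on the same 100 samples.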


  • amplitude and frequency interpolation
  • sine generators
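a sine generator with amplitude and frequency interpolation could look like the sketch below. it assumes linear interpolation and a running phase so that consecutive segments connect without jumps; sine_segment is a hypothetical name.

```python
import math

def sine_segment(f0, f1, a0, a1, count, sample_rate, phase=0.0):
    """Return (samples, end_phase) for a sine whose frequency and amplitude
    move linearly from (f0, a0) towards (f1, a1) over count samples."""
    out = []
    for i in range(count):
        t = i / count
        frequency = f0 + t * (f1 - f0)
        amplitude = a0 + t * (a1 - a0)
        out.append(amplitude * math.sin(phase))
        # advance the phase by the current frequency, not by absolute time,
        # so that frequency changes do not cause discontinuities
        phase += 2 * math.pi * frequency / sample_rate
    return out, phase % (2 * math.pi)
```

the returned end phase can be passed as the phase argument of the next segment.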


  • loud short (max about 50 ms) duration sounds
  • model: start and end index, phase, amplitude


  • detection with center of gravity of energy in signal windows and threshold. if the center is not in the middle then a transient is likely
  • center of gravity:
  • end is when the signal has fallen below half of the transient peak value
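the notes do not give a formula for the center of gravity. the sketch below assumes the usual energy-weighted mean position, normalised to 0..1 inside the window; the function names and the threshold value are hypothetical.

```python
def energy_centroid(window):
    """Center of gravity of the energy as a position in 0..1 within the window."""
    energy = [v * v for v in window]
    total = sum(energy)
    if total == 0 or len(window) < 2:
        return 0.5  # silence: treat as balanced
    weighted = sum(i * e for i, e in enumerate(energy))
    return weighted / (total * (len(window) - 1))

def is_transient_candidate(window, threshold=0.15):
    """True if the energy is concentrated away from the middle of the window."""
    return abs(energy_centroid(window) - 0.5) > threshold
```

a window with evenly spread energy gives 0.5, and a window where a loud part starts near the end gives a value close to 1.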




  • model: bands, band envelope: amplitude, mean, variance, skew, kurtosis


  • autocorrelation
  • cross-correlation
  • filter bank
  • envelope detection, optionally with smoothing by a low-pass filter
  • calculate statistics on envelope

    • arithmetic mean: the average value
    • variance: variance is the expectation of the squared deviation of a random variable from its mean. informally, it measures how far a set of (random) numbers are spread out from their average value
    • kurtosis: measures how sharply peaked a probability distribution is, relative to its width. the excess kurtosis (kurtosis minus 3) is zero for a gaussian distribution
    • skewness: measures the asymmetry of the tails of a probability distribution
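the four statistics can be calculated from an envelope as in the sketch below. it assumes population (not sample) moments and reports excess kurtosis; envelope_statistics is a hypothetical name.

```python
def envelope_statistics(values):
    """Return (mean, variance, skewness, excess kurtosis) of a list of numbers."""
    n = len(values)
    mean = sum(values) / n
    deviations = [v - mean for v in values]
    variance = sum(d ** 2 for d in deviations) / n
    if variance == 0:
        return mean, 0.0, 0.0, 0.0  # constant input: higher moments undefined
    skewness = sum(d ** 3 for d in deviations) / (n * variance ** 1.5)
    kurtosis = sum(d ** 4 for d in deviations) / (n * variance ** 2) - 3.0
    return mean, variance, skewness, kurtosis
```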


  • random number generator samples
  • filter bank
  • band envelopes applied to bands

simple wave shapes


  • doesn't change angular direction abruptly
  • one full cycle in (2 * pi) radians
  • typically generated by

    • common sin() function using taylor series with optimisations

    • a lookup table where sine values are pre-calculated for a number of samples. benefit: faster than calling sin(), which makes it a common way to get sine values. all frequencies whose amplitude values are available as table elements can be generated precisely, others need interpolation, which won't be as precise. new lookup tables of different size can be created as needed

    • a fast, imprecise sine approximation function
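a lookup table with linear interpolation between neighbouring entries might be sketched like this. the table size of 1024 and the names are arbitrary choices for illustration.

```python
import math

TABLE_SIZE = 1024
SINE_TABLE = [math.sin(2 * math.pi * i / TABLE_SIZE) for i in range(TABLE_SIZE)]

def table_sine(phase):
    """Approximate sin(phase) by linear interpolation between table entries."""
    position = (phase / (2 * math.pi)) % 1.0 * TABLE_SIZE
    whole = int(position)
    fraction = position - whole
    index = whole % TABLE_SIZE
    following = SINE_TABLE[(index + 1) % TABLE_SIZE]
    return SINE_TABLE[index] + fraction * (following - SINE_TABLE[index])
```

a larger table reduces the interpolation error at the cost of memory.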


  • triangle: linear change with fast change of direction
  • rectangle: change between values immediately
  • square: rectangle with equal durations at both values
  • trapezoid: clipped triangle
  • saw: triangle with a rectangular angle


  • typically generated by samples from a random number generator, optionally applying filters to remove frequencies
  • there don't seem to be many other methods to create noise. possibly summing many random sines
  • no phase, no repeating event, no single frequency. instead bandwidth can be used to describe it
  • white noise can be identified by its autocorrelation, which is close to an impulse at lag zero
  • a filter bank can be used to select frequency bands
  • it is less computationally expensive to filter noise to get a noise frequency distribution than to add something to get there. conversely, it is easier to add something to get a non-noise distribution than to extract it from noise
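the autocorrelation property can be checked with a small sketch. it assumes uniformly distributed samples from python's random module with a fixed seed; autocorrelation is a hypothetical helper.

```python
import random

def autocorrelation(x, max_lag):
    """Autocorrelation of x for lags 0..max_lag, divided by the signal length."""
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag)) / n
            for lag in range(max_lag + 1)]

rng = random.Random(1)
noise = [rng.uniform(-1.0, 1.0) for _ in range(4000)]
correlation = autocorrelation(noise, 8)
# correlation[0] is the signal power, all other lags should be close to zero
```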

effects and operations

  • multiplication: shapes the amplitude of signals
  • division: shapes the amplitude of signals. dividing a signal by itself flattens the signal
  • subtraction: remove signals from each other
  • addition: adds signals to each other

frequency filtering


  • windowed-sinc

    • best for precise frequency removal
    • high computational effort because it typically uses convolution
    • use a sinc function that decays in both directions (windowed) as the convolution impulse response kernel
    • the longer the impulse response kernel, the smaller the transition bandwidth
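    • addition of kernels: combines filters in parallel. the result passes what either filter passes, for example a high-pass added to the stop-band of a low-pass gives a band-reject
    • convolution of kernels: combines filters in series. the result passes only what both filters pass, for example a high-pass applied within the pass-band of a low-pass gives a band-pass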
    • take values of the sinc function, window it with a blackman window which performs well in this context, and use the result as an impulse response of convolution with an input signal
    • when parameters change the kernel has to be adjusted
  • moving average

    • best for preserving time domain properties because of no shift between input and output signal
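a windowed-sinc low-pass as described above can be sketched as follows. it assumes the cutoff is given as a fraction of the sampling rate, an odd kernel length, and direct (slow) convolution; the function names are hypothetical.

```python
import math

def windowed_sinc_lowpass(cutoff, length):
    """Blackman-windowed sinc kernel. cutoff is a fraction of the sampling
    rate (0..0.5), length should be odd so the kernel has a center sample."""
    middle = (length - 1) / 2
    kernel = []
    for i in range(length):
        x = i - middle
        if x == 0:
            sinc = 2 * cutoff
        else:
            sinc = math.sin(2 * math.pi * cutoff * x) / (math.pi * x)
        window = (0.42 - 0.5 * math.cos(2 * math.pi * i / (length - 1))
                  + 0.08 * math.cos(4 * math.pi * i / (length - 1)))
        kernel.append(sinc * window)
    total = sum(kernel)
    return [k / total for k in kernel]  # normalise for unity gain at dc

def convolve(signal, kernel):
    """Direct convolution, output length len(signal) + len(kernel) - 1."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k
    return out
```

the output is delayed by half the kernel length relative to the input.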


  • windowed-sinc low-pass with spectrally inverted or spectrally reversed impulse response
  • other options

    • subtract from current values the result of applying a moving average on the values

    • subtract from each value the preceding value. two values that follow each other with similar intensity will tend to cancel each other out. variant: set the current output value to the previous output value plus the current input value minus the previous input value, usually scaled by a coefficient slightly below one
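the simpler high-pass options can be sketched as below. the edge handling of the moving average (averaging over the available samples), the coefficient default and all function names are assumptions for illustration.

```python
def moving_average(x, width):
    """Centered moving average, averaging over fewer samples at the edges."""
    half = width // 2
    out = []
    for i in range(len(x)):
        part = x[max(0, i - half):i + half + 1]
        out.append(sum(part) / len(part))
    return out

def highpass_by_subtraction(x, width):
    """Input minus its moving average."""
    return [v - m for v, m in zip(x, moving_average(x, width))]

def difference(x):
    """Each value minus the preceding value (the first has no predecessor)."""
    return [v - (x[i - 1] if i > 0 else 0.0) for i, v in enumerate(x)]

def one_pole_highpass(x, coefficient=0.95):
    """Variant with feedback: y[n] = coefficient * (y[n-1] + x[n] - x[n-1])."""
    out = []
    previous_in = 0.0
    previous_out = 0.0
    for v in x:
        previous_out = coefficient * (previous_out + v - previous_in)
        previous_in = v
        out.append(previous_out)
    return out
```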

band-pass and band-reject

  • can be created by a combination of windowed-sinc low-pass and high-pass
  • band-pass: convolution of impulse response kernels
  • band-reject: addition of impulse response kernels
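a sketch of how low-pass and high-pass kernels might be combined. it assumes blackman-windowed sinc kernels of odd length, cutoffs as fractions of the sampling rate, and a high-pass made by spectral inversion; all names are hypothetical, and the gain helper is only there to inspect the result.

```python
import cmath
import math

def lowpass_kernel(cutoff, length):
    """Blackman-windowed sinc low-pass, normalised to unity gain at dc."""
    middle = (length - 1) / 2
    kernel = []
    for i in range(length):
        x = i - middle
        sinc = 2 * cutoff if x == 0 else math.sin(2 * math.pi * cutoff * x) / (math.pi * x)
        window = (0.42 - 0.5 * math.cos(2 * math.pi * i / (length - 1))
                  + 0.08 * math.cos(4 * math.pi * i / (length - 1)))
        kernel.append(sinc * window)
    total = sum(kernel)
    return [k / total for k in kernel]

def highpass_kernel(cutoff, length):
    """Spectral inversion: negate the low-pass and add one at the center."""
    kernel = [-k for k in lowpass_kernel(cutoff, length)]
    kernel[(length - 1) // 2] += 1.0
    return kernel

def convolve(a, b):
    out = [0.0] * (len(a) + len(b) - 1)
    for i, u in enumerate(a):
        for j, v in enumerate(b):
            out[i + j] += u * v
    return out

def band_pass_kernel(low, high, length):
    """Series combination: passes only what both filters pass."""
    return convolve(highpass_kernel(low, length), lowpass_kernel(high, length))

def band_reject_kernel(low, high, length):
    """Parallel combination: passes what either filter passes."""
    return [l + h for l, h in zip(lowpass_kernel(low, length),
                                  highpass_kernel(high, length))]

def gain(kernel, frequency):
    """Magnitude response at a frequency given as a fraction of the sampling rate."""
    return abs(sum(k * cmath.exp(-2j * math.pi * frequency * i)
                   for i, k in enumerate(kernel)))
```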

filter bank

  • an application of multiple band-pass filters. low/high-pass might be used at the edges

parametric equaliser

  • filter banks split a signal into bands and can be used to create a parametric equaliser where the bands are mixed after gain adjustment
  • center and width of bands as control parameters
  • if a control is unchanged, don't apply the associated filter

discrete fourier transform

  • to analyse the frequency spectrum of a signal, typically a discrete fourier transform (dft) algorithm is used, particularly the fast fourier transform (fft)

  • converts from time domain to frequency domain

  • one result block describes the frequencies in the analysed block

  • the inverse fft (ifft) takes complex numbers like the ones the fft returns and recreates a signal corresponding to the frequencies, with the same length as the sampled block

  • input

    • array of real or complex numbers
  • output

    • array of complex numbers
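    • index 0: dc, the sum of the input values (the average after division by input_size)
    • index > 0: frequency bins, spaced (sampling_rate / input_size) hertz apart
    • the output is unscaled: each magnitude value is multiplied by input_size. scale by 1/input_size to normalise
    • values are complex numbers from which amplitude and phase of a sinusoid of a specific frequency can be derived
    • indices correspond to the frequency of (index * sampling_rate / input_size) hertz, which is (index * 1 hertz) when input_size equals the sampling rate
    • for a sampling rate of 1000 (1000 samples per second) and an fft result over 1000 samples, the relevant output length is 501 samples (indices 0 to 500, dc up to half the sampling rate)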
    • fast fourier transform: converts from time domain to frequency domain and the inverse. only works on blocks of samples and a continuous ifft usage for example is only possible with drawbacks. comes in different versions taking/returning real or complex numbers. the complex numbers contain phase information and are needed to recreate sounds with ifft
  • windowing: apply a blackman window to remove edge discontinuities from the fft input. input taken from a longer signal may start or end abruptly. this would introduce additional frequencies. to reduce these extra frequencies, windowing is used to reduce the amplitude of the signal at the sides. a window function goes from a low value (usually zero) at the beginning of a given width to a higher value in the middle and then back to the low value at the end of the block. the blackman window tends to perform well in this context too

  • zero padding: appending zeros to the end or both sides of an fft input block increases the number of result bins. with a higher number of bins, bins represent smaller frequency steps. with the larger fft input size the frequencies in the input are not measured more exactly, but the results are interpolated over more bins. zero padding can also be used to reach an input size required by fft algorithms

  • overlap or hop: amplitude and frequency transitions might not be sufficiently discovered as an fft returns the spectrum for a whole range. processing input in blocks spaced closer together than the fft input is wide leads to parts being included again in the following fft input and increases time resolution. overlap is usually done like this: 0..200 100..300 200..400 ... . overlapping results can be recombined by fading out the volume of a previous block as the next block is faded in. each overlapped block is hop-size samples apart.

  • amplitude = 2 * complex_magnitude(fft_bin_value) / input_size; phase = complex_angle(fft_bin_value); frequency = value_index * sampling_rate / input_size

  • if no specific complex number functions for magnitude and phase are available, sqrt(real * real + imaginary * imaginary) can be used for magnitude, and atan(imaginary, real) for the angle. the two-argument version of atan to use is often called atan2

  • if fft input is short, the frequency resolution is coarse and the lowest recognisable frequency is high, but brief spectrum changes can be recognised

  • for fft and subsequent ifft, values are often scaled by 1/sqrt(input_size) on the fft output and 1/sqrt(input_size) on the ifft output

  • fft with a blackman window can be used with an overlap factor of 0.661. taken from http://edoc.mpg.de/395068

  • sample-rate / fft-input-size = bin-hz

  • a wide window gives better frequency resolution but poor time resolution. a narrower window gives good time resolution but poor frequency resolution (bins represent larger frequency ranges)
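the relation between bins and amplitude, phase and frequency can be shown with a small sketch. it assumes an unscaled, naive dft (an fft library would give the same bin values) and a test tone that falls exactly on a bin; the names are hypothetical.

```python
import cmath
import math

def dft(x):
    """Naive, unscaled O(n^2) discrete fourier transform."""
    n = len(x)
    return [sum(v * cmath.exp(-2j * math.pi * k * i / n) for i, v in enumerate(x))
            for k in range(n)]

sample_rate = 1000
size = 200
# 50 hz cosine with amplitude 0.5 and phase offset 0.25 radians
signal = [0.5 * math.cos(2 * math.pi * 50 * i / sample_rate + 0.25)
          for i in range(size)]
bins = dft(signal)

bin_hz = sample_rate / size                     # 5 hz per bin
peak = max(range(1, size // 2), key=lambda k: abs(bins[k]))
frequency = peak * bin_hz
amplitude = 2 * abs(bins[peak]) / size          # undo the input_size scaling
phase = math.atan2(bins[peak].imag, bins[peak].real)  # relative to a cosine
dc = bins[0].real / size                        # average of the input values
```

if the tone does not fall exactly on a bin, its energy spreads over neighbouring bins and the values become approximations.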

grain control

  • a signal is split into small chunks which are transformed
  • example operations: randomise, repeat, reduce, effects on selected grains, swap
  • example parameters: grain-size, repetition
  • output can be shorter or longer than the input. this might lead to a growing length difference between the unprocessed and processed signal and require buffering. one option is to buffer samples and continue when longer grains have been played. processing could also skip or repeat input samples to stay in sync with the input signal
  • can be used to model tempo changing effects without change of pitch. repetition of parts is an audibly interesting effect
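a minimal sketch of one grain operation, repetition, which lengthens the signal without changing pitch. grains are cut without overlap or fading here, so clicks at grain borders are likely; repeat_grains is a hypothetical name.

```python
def repeat_grains(signal, grain_size, repetitions):
    """Split the signal into grains and play each grain several times."""
    out = []
    for start in range(0, len(signal), grain_size):
        grain = signal[start:start + grain_size]
        out.extend(grain * repetitions)
    return out

stretched = repeat_grains([1, 2, 3, 4, 5], grain_size=2, repetitions=2)
```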

converging signals

two signals and the output shall fall in between

b - a = 100%
a + 0.5 * (b - a)
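applied per sample, the formula is a linear interpolation between both signals. in the sketch below, converge and the amount parameter are hypothetical names; amount 0 gives a, 1 gives b.

```python
def converge(a, b, amount):
    """Per-sample linear interpolation between two equally long signals."""
    return [x + amount * (y - x) for x, y in zip(a, b)]

halfway = converge([0.0, 1.0], [1.0, 3.0], 0.5)
```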


  • input samples are put out at a later time
  • input samples can be copied to the delay and at the same time played, possibly together with other delay output
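a delay can be sketched with a ring buffer that returns the oldest stored sample while storing the current one. the class name and interface are hypothetical.

```python
class Delay:
    """Fixed-length delay line based on a ring buffer."""

    def __init__(self, length):
        self.buffer = [0.0] * length
        self.index = 0

    def process(self, sample):
        """Store the sample and return the one stored length samples ago."""
        delayed = self.buffer[self.index]
        self.buffer[self.index] = sample
        self.index = (self.index + 1) % len(self.buffer)
        return delayed

delay = Delay(3)
delayed = [delay.process(v) for v in [1.0, 2.0, 3.0, 4.0, 5.0]]
```

mixing the delayed output with the input gives an echo.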


can be created with many short delays with decaying volume and frequency characteristics

envelope detector

  • the time series samples almost are the envelope
  • envelopes are positively valued. either shift a -1..1 range signal to the positive by adding 1 or take only the positive part
  • a low pass filter can be used to close gaps and create a smoother envelope
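a sketch of an envelope detector. it assumes taking the absolute value of each sample (a variation of taking only the positive part) and a simple one-pole low-pass for smoothing; the name and the smoothing default are hypothetical.

```python
import math

def envelope(signal, smoothing=0.99):
    """Rectify the signal and smooth it with a one-pole low-pass."""
    out = []
    level = 0.0
    for v in signal:
        level = smoothing * level + (1.0 - smoothing) * abs(v)
        out.append(level)
    return out

tone = [math.sin(2 * math.pi * 0.05 * i) for i in range(2000)]
tone_envelope = envelope(tone)
```

for a sine the smoothed value settles near the mean of the rectified signal, not at the peak amplitude.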


sine+noise+transient models

sounds are analysed and vectorised into sine (frequency, amplitude, phase), noise (frequency bands, amplitude) and transient (like sines but higher resolution) information, and after optional modification, resynthesised using sine and noise generators or inverse fourier transform


  • additive synthesis: every signal can be approximated by summing sinusoids of appropriate frequencies. when adding different sines together, they can cancel each other out if they have similar amplitudes and correlated phase shifts. to avoid this, each sine can be added with decreasing amplitude
  • frequency modulation: frequency changes corresponding to the values of another signal, which creates new harmonics. "in the context of audio coding, fm synthesis can be considered a "lossy compression method" for additive synthesis"
  • subtractive synthesis: remove features from a signal with more frequencies or other features than desired. filters are usually used to shape the sounds
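frequency modulation with one carrier and one modulator might be sketched as below. as is common for fm synthesis, it is implemented here as modulation of the carrier phase; fm_samples and the index parameter (modulation depth) are hypothetical names.

```python
import math

def fm_samples(carrier_hz, modulator_hz, index, count, sample_rate):
    """Sine carrier whose phase is offset by a sine modulator."""
    out = []
    for i in range(count):
        t = i / sample_rate
        modulator = math.sin(2 * math.pi * modulator_hz * t)
        out.append(math.sin(2 * math.pi * carrier_hz * t + index * modulator))
    return out
```

with index 0 the result is a plain sine, and larger index values add more harmonics.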

digital sound representation


  • pulse-code modulation / amplitude
  • spectrum: for example frequency in time range or multiple stacked lines of analysis results with colors used to indicate intensity

sample formats

  • float, fixed point, integer
  • floating-point arithmetic can be slower than fixed-point arithmetic and, at the same bit size, less precise within a fixed range, but it can handle a larger range of numbers easily
  • float can have many values added and then be scaled back into the desired range
  • the larger the underlying bit size of float values, the smaller the rounding errors
  • calculation with floats is not trivial, for example summing of floats can lead to quickly accumulating large errors if no error compensation is used for the summing
  • integers might still have to be divided and become fractions
  • digital to analog converters at some point convert to exact integers
  • if care is taken that all samples are created as integers and never divided, processing is most precise but not suitable for general purpose signal processing
  • the larger the size, the more different values a sample can have, and the more different values can be in the signal
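error compensation for float summing can be illustrated with kahan summation, one well-known compensation method. the sketch compares it with a plain loop and uses math.fsum from the standard library as the reference; kahan_sum and naive_sum are hypothetical names.

```python
import math

def naive_sum(values):
    """Plain running sum, rounding errors accumulate."""
    total = 0.0
    for v in values:
        total += v
    return total

def kahan_sum(values):
    """Running sum that carries the rounding error of each addition forward."""
    total = 0.0
    compensation = 0.0
    for v in values:
        corrected = v - compensation
        new_total = total + corrected
        # (new_total - total) is what was actually added, the rest was lost
        compensation = (new_total - total) - corrected
        total = new_total
    return total

values = [0.1] * 100000
reference = math.fsum(values)
naive_error = abs(naive_sum(values) - reference)
kahan_error = abs(kahan_sum(values) - reference)
```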

sampling rate

  • how many samples represent one second of sound
  • the higher the sampling rate, the more frequencies can be represented in the signal
  • the maximum representable frequency is half the sampling_rate, as any higher frequency would be spaced in smaller duration than the sample values represent


multiple separate sound channels like stereo channels are typically stored in one of two ways:

  • non-interleaved: each channel is stored in a separate sample array. for example for three channels 1 1 1, 2 2 2, 3 3 3
  • interleaved: the samples for one index in all channels are stored together in packets. for example for three channels: 1 2 3 1 2 3 ...
  • non-interleaved can be easier to process and interleaved can be more robust to playback interruptions as for example a lag tends to affect all channels at the same time
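converting between both layouts can be sketched as follows, assuming all channels have the same length. the function names are hypothetical.

```python
def interleave(channels):
    """[[1, 1, 1], [2, 2, 2], [3, 3, 3]] becomes [1, 2, 3, 1, 2, 3, 1, 2, 3]."""
    return [sample for frame in zip(*channels) for sample in frame]

def deinterleave(samples, channel_count):
    """Inverse of interleave: one list per channel."""
    return [samples[c::channel_count] for c in range(channel_count)]

packed = interleave([[1, 1, 1], [2, 2, 2], [3, 3, 3]])
unpacked = deinterleave(packed, 3)
```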


radians and hertz

two pi radians are one full sine cycle. one hertz is one full cycle per second. at one hertz, one cycle spans sampling-rate samples. the maximum representable frequency in hertz is (sampling-rate / 2), an integer if sampling-rate is even.

samples or seconds

sample count is an exact integer measure for a progressing time value, durations or signal widths for example. the use of seconds can be problematic because the length of one sample might be represented with inexact numbers, for example 1 / 44100 = 0.000022675736961451248 and 1 / 48000 = 0.000020833333333333333. on the other hand, values in seconds do not depend on the sampling rate.


  • stateless: need only parameter values that need not be kept
  • stateful: for example with carryover values that fall outside the currently processed range
  • sample to sample: can affect multiple samples only with state and delay after accumulating samples
  • segment to segment: processors might depend on preceding or following values
  • one to many: output creates a longer signal
  • many to one: a longer signal becomes a shorter one

digital music making tools

  • vst, lv2 plugins or recording environment tools as effects and synthesisers
  • analog hardware devices for synthesizers with keyboards or sequencing
  • software sequencers like fruity loops where instruments, synthesiser sounds, recordings and effects can be laid out and played back
  • taking sound samples and preparing their ordered playback in a program like ardour or cubase
  • using puredata to generate sounds and connect hardware devices via note and trigger values that midi transmits and with graphical interfaces that puredata can create

basic toolset for digital signal processing

  • fftr to extract frequency information
  • a way to plot frequency and time domain information
  • features to create arrays with samples
  • file input output (dac/adc, digital to analog / analog to digital, where the analog signal often goes to speakers or comes from microphones)
  • path drawing with lines, bezier curves and arcs for envelopes
  • sine function
  • windowed sinc and moving average filters
  • filter bank
  • uniformly distributed random number generator for noise


  • frequency: number of occurrences of a repeating event per unit time
  • amplitude: the amplitude of a periodic variable is a measure of its change over a single period
  • phase: relative shift or progression of a repeating event
  • envelope: for example a path of loudness transition
  • sequencer: specifies the starting points and durations of sounds
  • fundamental dimensions of sound: pressure, time, direction
  • physically sound is air pressure changes in longitudinal waves. physical sound is often created by energy dissipation over molecular bonds. excited solids that oscillate as energy is distributed. the resulting sound is influenced by the mixture of materials, shape, surface tension and more
  • a sound program is a potential for sounds to occur. elements can appear and disappear like entering and exiting a dimension
  • there is always some kind of seed, for example the literal arguments in program code
  • playback equipment is limited in what it can produce. for example a square wave can not be perfectly reproduced because there are inertial effects in playback equipment
  • when sequencing sounds, starting points can be specified relative to a shared starting point like the beginning of a song or relative to the previous note
  • incidence intervals and durations are one way to look at sequencing
  • quantisation: with a many-to-few mapping of values, for example when converting analog signals to digital, multiple input values map to the same output value and have to be rounded, which reduces precision and introduces quantisation error as multiple values converge into one
  • clipping: limiting values to a maximum
  • wave < instrument < composition
  • low frequency: less change between samples
  • "a real discrete-time signal is defined as any time-ordered sequence of real numbers. similarly, a complex discrete-time signal is any time-ordered sequence of complex numbers"
  • a vocoder splits into frequency bands and removes bands
  • every linear time-invariant system can be represented by a convolution
  • "a time series is a series of data points indexed (or listed or graphed) in time order. most commonly, a time series is a sequence taken at successive equally spaced points in time." (wikipedia)