2019-02-18

this page describes an algorithm for enriching the time and frequency information of short time fourier transform series with frequency information from longer time fourier transform series.

take fft series for whole input range

- multiple series for durations
`2**min..2**max`

inclusively - apply a hann window then zero pad each fft input of each duration to equal length
- window dependent hop-size
- start with zero padded input at hop-size minus one
- delete fft values below a threshold to denoise
- calculate magnitude change quantifier between each two fft results of same duration

- multiple series for durations
combine fft results

- for each number of fft results of duration that fits into the time period of the next double duration fft result
- replace each range of bins with the corresponding bins of longer duration results after modifying the values
- linearly interpolate bin magnitude and imaginary values from lower to higher resolution where weight decreases with fft duration
- combined this way, the result is one fft series with a number of elements as for the starting duration

- select peak frequencies for sines
- subtract peak frequencies for residual noise

change quantifier

- to give higher priority to durations with large volume or frequency changes
- treat fft bins as point vectors (index, magnitude) and sort by magnitude to get two sorted tuples. ((complex ...) (complex ...)) -> (((index magnitude) ...) ((index magnitude) ...))
- calculate relative change between elements of both tuples: ((change(index-a, index-b) + change(magnitude-a, magnitude-b)) / 2) -> (change ...)
- sum the change values and normalise by bin count to get a single change quantifier. (/ sum(change ...) bin-count)

choice of durations

- each doubled duration has double the frequency resolution and half the time resolution and adds details of two bins for what would be one bin in the half duration
- start with at least 8 because it is a power of two, 4 would after windowing leave only effective resolution of one bin. an fft on two samples can measure nothing but change because any difference could be a changing part of any frequency. a four sample fft from 44100 sample rate data could still say if a signal contains frequencies below or above 11025 hz. powers of two double with each step, that is why they are used for simplicity for analysing the relations between duration results

duration and frequency

- there are frequencies whose period would not fit into the frame.
`frequency_cycle_sample_width = sample_rate / frequency_hz`

- if a long duration contains a very low frequency and many smaller fft are in this range, all smaller fft get this lower frequency added as low frequencies need long durations to even exist
- change is going to be high in short durations by frequency amplitudes themselves

- there are frequencies whose period would not fit into the frame.
combination

- because duration series can be weighted when they are combined, the algorithm can be seen as duration-adaptive
- the change quantifier is not yet actually used. it could be used to give less weight to higher resolution ffts with much change, so to prefer lower resolutions
- it might be useful to weigh by if full frequency cycle would fit into duration and completely prefer from duration where it was included

result

- the exact starting point of transients is only approximated depending on the shortest duration fft
- the fft durations used are fixed to power of two increments

- assumption: duration and frequency resolution are linearly proportional, duration and time resolution are linearly anti-proportional, time resolution and frequency resolution are linearly anti-proportional
- assumption: an fft gives information about all contained frequencies even for frequencies whose full cycle does not fit into duration, and the change in sample values is considered part of change of any frequency in the lower range
- the hann window is used because it should work well with 1/2 overlap and give a compromise between rectangular window functions and functions that widen peaks and reduce sidelobe noise the most
- short ffts are a low frequency resolution image
- fft histogram: fft result bins without dc

- sample-rate: 96000
- input-size: sample-rate
- durations: 8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096, 8192, 16384, 32768, 65536, 96000
- median duration: 512 or 1024 samples, 2 ** 9 or 2 ** 10, (5 + 1/3) or (10 + 2/3) miliseconds
- average duration: 15138 samples, nearest 16384 which is 2 ** 14 and about 170 miliseconds
- shortest duration: 8 samples, 0.083_ miliseconds

- align ffts more freely to areas where statistics show potential for interesting content
- quantify desired time resolution and start end time of fft areas
- use low resolution ffts only for transient periods
- predict continuation of frequency development
- use comparison with input time/magnitude data to tune parameters

about splitting the signal first into bands

- as a way to use ffts of different duration ranges shorter than low frequencies that have been filtered
- filtering adds stopband noise, but so does the fft perhaps. but only frequency peaks needed
- only a limited number of bandpasses until many transition bands overlap
- may have to convert to fft format anyway to make it easier to calculate correlations