2020-12-30

frequency resolution enhancement of short duration fast fourier transforms with information from long duration transforms

this page describes an experimental algorithm for enriching the time and frequency information of short time fourier transform series with frequency information from longer time fourier transform series.

algorithm

take fft series for whole input range
- multiple series for durations 2**min..2**max inclusively
- apply a hann window then zero pad each fft input of each duration to equal length
- window dependent hop-size
- start with zero padded input at hop-size minus one
- delete fft values below a threshold to denoise
- calculate magnitude change quantifier between each two fft results of same duration
combine fft results
- for each number of fft results of duration that fits into the time period of the next double duration fft result
- replace each range of bins with the corresponding bins of longer duration results after modifying the values
- the core point: linearly interpolate bin magnitude and imaginary values from lower to higher resolution where weight decreases with fft duration
- combined this way, the result is one fft series with a number of elements as for the starting duration
select peak frequencies for sines
subtract peak frequencies for residual noise

assumptions, properties and notes

change quantifier
- to give higher priority to durations with large volume or frequency changes
- treat fft bins as point vectors (index, magnitude) and sort by magnitude to get two sorted tuples. ((complex ...) (complex ...)) -> (((index magnitude) ...) ((index magnitude) ...))
- calculate relative change between elements of both tuples: ((change(index-a, index-b) + change(magnitude-a, magnitude-b)) / 2) -> (change ...)
- sum the change values and normalise by bin count to get a single change quantifier. (/ sum(change ...) bin-count)
choice of durations
- each doubled duration has double the frequency resolution and half the time resolution and adds details of two bins for what would be one bin in the half duration
- start with at least 8 because it is a power of two, 4 would after windowing leave only effective resolution of one bin. an fft on two samples can measure nothing but change because any difference could be a changing part of any frequency. a four sample fft from 44100 sample rate data could still say if a signal contains frequencies below or above 11025 hz. powers of two double with each step, that is why they are used for simplicity for analysing the relations between duration results
duration and frequency
- there are frequencies whose period would not fit into the frame. frequency_cycle_sample_width = sample_rate / frequency_hz
- if a long duration contains a very low frequency and many smaller fft are in this range, all smaller fft get this lower frequency added as low frequencies need long durations to even exist
- change is going to be high in short durations by frequency amplitudes themselves
combination
- because duration series can be weighted when they are combined, the algorithm can be seen as duration-adaptive
- the change quantifier is not yet actually used. it could be used to give less weight to higher resolution ffts with much change, so to prefer lower resolutions
- it might be useful to weigh by if full frequency cycle would fit into duration and completely prefer from duration where it was included
result
- the exact starting point of transients is only approximated depending on the shortest duration fft
- the fft durations used are fixed to power of two increments
assumption: duration and frequency resolution are linearly proportional, duration and time resolution are linearly anti-proportional, time resolution and frequency resolution are linearly anti-proportional
assumption: an fft gives information about all contained frequencies even for frequencies whose full cycle does not fit into duration, and the change in sample values is considered part of change of any frequency in the lower range
the hann window is used because it should work well with 1/2 overlap and give a compromise between rectangular window functions and functions that widen peaks and reduce sidelobe noise the most
short ffts are a low frequency resolution image
fft histogram: fft result bins without dc

example variables

sample-rate: 96000
input-size: sample-rate
durations: 8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096, 8192, 16384, 32768, 65536, 96000
median duration: 512 or 1024 samples, 2 ** 9 or 2 ** 10, (5 + 1/3) or (10 + 2/3) miliseconds
average duration: 15138 samples, nearest 16384 which is 2 ** 14 and about 170 miliseconds
shortest duration: 8 samples, 0.083_ miliseconds

development ideas

align ffts more freely to areas where statistics show potential for interesting content
quantify desired time resolution and start/end time of fft areas
use low resolution ffts only for transient periods
predict continuation of frequency development
use comparison with input data to tune parameters

consideration

about splitting the signal first into bands

as a way to use ffts of different duration ranges shorter than low frequencies that have been filtered
filtering adds stopband noise, but so does the fft perhaps. but only frequency peaks needed
only a limited number of bandpasses until many transition bands overlap
may have to convert to fft format anyway to make it easier to calculate correlations