I’ve been working on an old project of mine (first mention, last update), a realtime fast Fourier transform on audio input. A spectrograph, in other words. You can look at some of the ancient screenshots (1, 2), which were in fact made shortly after that post, well over a year ago, to see what it used to look like, but I’ve made lots of improvements. I might post new screenshots tomorrow.
When I last posted, in August of 2007, RealtimeFFT had been recoded in C++ using RtAudio, FFTW, and Allegro. When I began working on it again a few days ago, speeds were around 30 fps (but would sometimes jump to 100 fps due to the use of an apparently-buggy Allegro preview version). I’ve since improved its speed in this configuration by doing several things.
One of these improvements is multithreading. rtfft now uses multiple threads to calculate its FFTs. This allows the four separate FFTs to be performed simultaneously on a four-core processor like the Q9450.
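A minimal sketch of the one-thread-per-FFT idea, using only the standard library. Everything named here is an assumption for illustration: `magnitude_spectrum` is a naive DFT standing in for the real FFTW call, and `transform_all` is a hypothetical helper, not rtfft's actual code.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <thread>
#include <vector>

// Naive O(n^2) DFT magnitude spectrum (bins 0..n/2).
// Hypothetical stand-in for a real FFT library call.
std::vector<double> magnitude_spectrum(const std::vector<double>& samples) {
    const std::size_t n = samples.size();
    assert(n > 0);
    const double pi = std::acos(-1.0);
    std::vector<double> mags(n / 2 + 1);
    for (std::size_t k = 0; k < mags.size(); ++k) {
        std::complex<double> sum(0.0, 0.0);
        for (std::size_t t = 0; t < n; ++t) {
            const double angle =
                -2.0 * pi * static_cast<double>(k * t) / static_cast<double>(n);
            sum += samples[t] * std::complex<double>(std::cos(angle), std::sin(angle));
        }
        mags[k] = std::abs(sum);
    }
    return mags;
}

// Run one transform per input buffer, each on its own thread.
std::vector<std::vector<double>> transform_all(
        const std::vector<std::vector<double>>& inputs) {
    std::vector<std::vector<double>> outputs(inputs.size());
    std::vector<std::thread> workers;
    workers.reserve(inputs.size());
    for (std::size_t i = 0; i < inputs.size(); ++i) {
        // Each thread writes only to its own output slot, so no locking is needed.
        workers.emplace_back([&inputs, &outputs, i] {
            outputs[i] = magnitude_spectrum(inputs[i]);
        });
    }
    for (std::thread& w : workers) {
        w.join();  // wait for every transform before the frame is drawn
    }
    return outputs;
}
```

Passing four buffers to `transform_all` gives four spectra computed concurrently. A real-time program would more likely keep long-lived worker threads than spawn new ones every frame, since thread creation has a cost of its own.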
I also fixed a major bug in the way audio was read from the stream. Previously, a chunk of audio was sometimes skipped or, even worse, zeroed. I don’t know quite why this was happening; I gave up trying to debug the horribly ugly code I’d written and basically just rewrote the audio input side of the program.
The color scheme has gotten a major, major update. It looks pretty now, unlike the red-orange colors you see linked above. In addition, the output looks much more solid and less fuzzy.
I’m now using four FFTs–two for each stereo channel–and multiplying and adding the results from them to give a much more accurate graphical representation. Previously I wasn’t even using stereo data.
I finally realized that having Allegro blit a backbuffer to the screen was exceedingly slow in some cases. I switched versions of Allegro from the apparently-bugged preview release I had been using (which, interestingly, resulted in a speed increase of almost 50 percent by itself), and then began using AllegroGL. Frame rates now reach about 225 fps even with all the added FFT calculations. Removing some of these added calculations (by performing only one channel of the stereo FFTs, for instance) raises the speed to 300 fps. I’ve seen it hit 450 when not scrolling the screen.
I’m now in the process of analyzing the code to see what else can be done to improve speeds. Using an extremely fast set of timer functions that I wrote today (something less than 0.0000012 milliseconds per call, but as that’s too small to matter I stopped trying to measure it), I’ve determined that most of the time is spent blitting the image to create the scroll–something that I assume can be done much, much faster using OpenGL acceleration and by making the backbuffers be video bitmaps instead of system-RAM bitmaps. Speaking of OpenGL acceleration, I have yet to test whether it is faster for OpenGL to draw the bitmap as a texture, or to have AllegroGL blit the bitmap to screen.
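The timer functions themselves aren't shown here, so the following is only a sketch of the general approach: wrap each section of the frame loop in start/stop calls and compare the accumulated totals. `SectionTimer` is a hypothetical name, and it uses `std::chrono::steady_clock` from the standard library rather than whatever the program really calls.

```cpp
#include <cassert>
#include <chrono>

// Accumulates the wall-clock time spent in one section of the frame loop.
class SectionTimer {
    using Clock = std::chrono::steady_clock;

public:
    void start() {
        begin_ = Clock::now();
        running_ = true;
    }

    void stop() {
        assert(running_);  // stop() must follow a start()
        total_ += Clock::now() - begin_;
        ++calls_;
        running_ = false;
    }

    long calls() const { return calls_; }

    double total_ms() const {
        return std::chrono::duration<double, std::milli>(total_).count();
    }

    double mean_ms() const {
        return calls_ > 0 ? total_ms() / static_cast<double>(calls_) : 0.0;
    }

private:
    Clock::time_point begin_{};
    Clock::duration total_{0};
    long calls_ = 0;
    bool running_ = false;
};
```

With one timer around the FFT work and another around the scrolling blit, comparing `total_ms()` after a few hundred frames shows which section dominates.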
In any case, as far as I can tell I have only reached approximately half of the data-limiting rate. 44,100 samples per second * 2 channels * 2 bytes per sample is 176,400 bytes per second, and I’m reading them off the buffer in chunks of 256. Dividing 176,400 by 256 gives about 689, so that’s supposedly a data rate of roughly 690 frames per second(!).
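The arithmetic above, written out as a small sketch. It assumes the 256 is a byte count, since that is what makes the numbers work out to about 690. `StreamFormat` and both function names are made up for this example.

```cpp
#include <cassert>

// Hypothetical description of the input stream format.
struct StreamFormat {
    int sample_rate_hz;    // e.g. 44100
    int channels;          // e.g. 2 for stereo
    int bytes_per_sample;  // e.g. 2 for 16-bit audio
};

// Raw byte rate delivered by the audio stream.
inline int bytes_per_second(const StreamFormat& f) {
    return f.sample_rate_hz * f.channels * f.bytes_per_sample;
}

// How many chunks of `chunk_bytes` arrive per second; this caps the frame
// rate if exactly one frame is drawn per chunk.
inline double chunks_per_second(const StreamFormat& f, int chunk_bytes) {
    assert(chunk_bytes > 0);
    return static_cast<double>(bytes_per_second(f)) / chunk_bytes;
}
```

For `StreamFormat{44100, 2, 2}`, `bytes_per_second` returns 176400 and `chunks_per_second(fmt, 256)` returns 689.0625. If the chunk size were instead 256 stereo sample frames (1,024 bytes), the same call with 1024 gives about 172, so the limit depends heavily on which unit the buffer size is measured in.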
Also, the latency has decreased. Previously, in the 30 fps version, a sound emitted through the speakers would show up on the display anywhere between 60 and 200 milliseconds later, depending on when the frame happened to be drawn. Now the delays are much smaller–it looks to be around 20-50 milliseconds but I haven’t measured it.
Tomorrow I’m taking on some more challenges: I’m going to switch to 32-bit samples, make better use of hardware OpenGL acceleration and threads, and possibly switch to PortAudio, which may or may not be faster. There’s always the possibility of using DirectSound directly, but I’d like to keep as much cross-platform compatibility as possible. Also on tomorrow’s list is some testing to make sure it’s displaying frequencies absolutely correctly. I’m a bit suspicious of its algorithm for deciding where a particular pixel will be drawn. FFT results corresponding to frequencies between pixels may not be drawn at all, now that I think about it….
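One possible way to avoid dropping the in-between results, sketched under assumptions: a linear frequency axis, and a hypothetical helper `bins_to_pixels` that is not the program's current algorithm. Each pixel row takes the maximum of every FFT bin that falls inside its range, instead of sampling a single bin per pixel, so no bin is skipped when there are more bins than pixels.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Collapse FFT magnitudes onto `pixels` rows along a linear frequency axis.
// Each row shows the strongest bin in its range.
std::vector<double> bins_to_pixels(const std::vector<double>& bins,
                                   std::size_t pixels) {
    assert(pixels > 0 && !bins.empty());
    const std::size_t n = bins.size();
    std::vector<double> rows(pixels, 0.0);
    for (std::size_t p = 0; p < pixels; ++p) {
        // Half-open bin range [first, last) covered by pixel p.
        const std::size_t first = p * n / pixels;
        std::size_t last = (p + 1) * n / pixels;
        if (last <= first) {
            last = first + 1;  // fewer bins than pixels: reuse the nearest bin
        }
        rows[p] = *std::max_element(bins.begin() + first, bins.begin() + last);
    }
    return rows;
}
```

When there are at least as many bins as pixels, the ranges tile the whole spectrum, so a narrow peak always lands on some row. Taking the maximum keeps thin peaks visible. Averaging the range instead would give a smoother but dimmer picture.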