The rodent whisker system is a prominent experimental subject for the study of sensorimotor integration and active sensing. As a result of improved video-recording technology and progressively better neurophysiological methods, there is now the prospect of precisely analyzing the intact vibrissal sensori motor system. The vibrissae and snout analyzer (ViSA), also noted as BWTT, a widely used algorithm based on computer vision and image processing, has been proven successful for tracking and quantifying rodent sensorimotor behavior, but at a great cost in processing time.
Unfortunately, the ViSA processing rate lags far behind the data-generation rate of modern cameras. The acceleration of the whisker-tracking algorithm could speed up behavioral and neurophysiological research considerably. It could also become the cornerstone for supporting online whisker tracking, which shall not only eliminate the need for maintaining large storage to keep raw videos, but shall also allow novel experimental paradigms based upon real-time behavior.
In order to accelerate this offline algorithm and eventually employ it for online whisker tracking (less than 1 ms/frame latency), we have explored various optimizations and acceleration platforms, including OpenMP multithreading, NVidia GPUs and Maxeler Dataflow Engines.
Our experimental results indicate that the optimal solution for an offline implementation of ViSA is currently the OpenMP-based CPU execution. By using 16 CPU threads, we achieve more than 4,500x speedup compared to the original Matlab serial version, resulting in an average processing latency of 1.2 ms/frame, which is a solid step towards real-time (and online) tracking. Analysis shows that running the algorithm on a 32-thread-enabled machine can reduce this number to 0.72 ms/frame, thereby enabling real-time performance. This will allow direct interaction with the whisker system during behavioral experiments.
In conclusion, our approach shows that a combination of software optimizations and the careful selection of hardware platform yields the best performance increase.
For detailed information please refer you our respective publication: LINK
There is strong experimental interest in online tracking of live subjects. For the online mode, the recording device will create batches of 1K-frame images each second and stream them into our processing system. The initial thought was to only use DFEs to make use of its advantage in stream processing. However, evaluation on the C-DFE accelerated version indicates that DFE is not able to provide sufficient hardware resources to satisfy the online processing goal, i.e. the FPGA runs out of resources. On the other hand, the OMP-accelerated version has processing speed that is adequate for real-time processing. Therefore, a strategy to instigate a powerful multi-core CPU will be enough for the online-processing requirement. Its potential deployment shown above. The recording facility will generate batches of images and transfer them to the work node through Ethernet cables. Then, the OMP-accelerated version will process the input batch of frame images and generate output that will be sent out through Ethernet.
This project is still ongoing!