Py-faster-rcnn, Caffe: power of vgg16 and GTX970

It feels great to announce that my research group's lab, AGV, now has its own ML server: an octa-core Intel Xeon at 3.4 GHz, 24 GB RAM, a 256 GB SSD, a 1 TB HDD and, finally, the beast, a GTX 970. This server has motivated me enough that I am now going after deep learning with even more desperation. Within a day or two of its arrival, I installed CUDA, cuDNN and Caffe and, to test the setup, cloned the py-faster-rcnn repository. py-faster-rcnn was my first choice for testing the machine because I have to work with it for two of my projects:
1. Traffic sign recognition
2. Depth regression

I trained a VGG16 network on the PASCAL VOC dataset and modified demo.py accordingly. I clearly remember that training took 8 hours, and I am still awestruck by the output. Have a look:

 

[Screenshots, 2016-10-14: nine captures of the demo output, including the terminal]

 

It's clearly observable (check out the terminal screenshot) that it can run in real time, i.e. on board a robot. With some code changes and preprocessing of a custom dataset with different classes, this architecture could classify almost anything. Kudos to the object proposal pipeline, which is still the state of the art in object detection.
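To make the "some code changes" part a bit more concrete, here is a minimal NumPy-only sketch of the post-processing that turns raw network output into labelled boxes for a custom class list. It assumes the output layout that demo.py works with (one score column per class, four box columns per class, background at index 0). The class names, the function names and the threshold values are hypothetical and only for illustration; this is not the repository's own code.

```python
import numpy as np

# Hypothetical class list for a custom dataset; index 0 is the background.
CLASSES = ('__background__', 'stop_sign', 'traffic_light', 'car', 'person')


def nms(dets, thresh):
    """Greedy non-maximum suppression.

    dets: (N, 5) array of [x1, y1, x2, y2, score].
    Returns the indices of the boxes that are kept.
    """
    x1, y1, x2, y2, scores = dets.T
    areas = (x2 - x1 + 1) * (y2 - y1 + 1)
    order = scores.argsort()[::-1]  # highest score first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        # Intersection of the best box with every remaining box
        xx1 = np.maximum(x1[i], x1[rest])
        yy1 = np.maximum(y1[i], y1[rest])
        xx2 = np.minimum(x2[i], x2[rest])
        yy2 = np.minimum(y2[i], y2[rest])
        w = np.maximum(0.0, xx2 - xx1 + 1)
        h = np.maximum(0.0, yy2 - yy1 + 1)
        inter = w * h
        iou = inter / (areas[i] + areas[rest] - inter)
        # Drop boxes that overlap the best box too much
        order = rest[iou <= thresh]
    return keep


def filter_detections(scores, boxes, conf_thresh=0.8, nms_thresh=0.3):
    """Return a list of (class_name, [x1, y1, x2, y2], score).

    scores: (N, K) class probabilities per proposal.
    boxes:  (N, 4*K) per-class boxes, four columns per class.
    The thresholds are illustrative defaults, not tuned values.
    """
    results = []
    for cls_ind, cls_name in enumerate(CLASSES[1:], start=1):
        cls_boxes = boxes[:, 4 * cls_ind:4 * (cls_ind + 1)]
        cls_scores = scores[:, cls_ind]
        dets = np.hstack((cls_boxes, cls_scores[:, np.newaxis])).astype(np.float32)
        for i in nms(dets, nms_thresh):
            if dets[i, 4] >= conf_thresh:
                results.append((cls_name, dets[i, :4].tolist(), float(dets[i, 4])))
    return results
```

With a new dataset, the only thing that changes here is the CLASSES tuple; the network itself still has to be retrained so that its output has one score column per class.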

Say we train it on classes such as traffic signs, signals, vehicles, people and animals: it becomes a boon for self-driving cars, since a single net like this could detect and label all of these classes. It also draws a bounding box, i.e. localizes the object within the image, which after suitable transformations can give a rough position of the object in the real world.
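As a rough idea of what one such transformation could look like, here is a small sketch based on the pinhole camera model. It rests on several assumptions that are mine and not part of the setup described here: the camera intrinsics (fx, fy, cx, cy) are known, the camera sits at a known height above a flat ground plane with its optical axis parallel to the ground, and the bottom edge of the bounding box touches the ground. The function name and parameters are hypothetical.

```python
def bbox_to_ground_position(bbox, fx, fy, cx, cy, cam_height):
    """Estimate where an object stands on a flat ground plane.

    bbox:       (x1, y1, x2, y2) in pixels, y growing downwards.
    fx, fy:     focal lengths in pixels (assumed known from calibration).
    cx, cy:     principal point in pixels.
    cam_height: camera height above the ground, in metres.

    Returns (lateral, forward) in metres relative to the camera,
    or None when the box bottom is at or above the horizon line.
    """
    x1, y1, x2, y2 = bbox
    u = 0.5 * (x1 + x2)  # horizontal centre of the box
    v = y2               # bottom edge, assumed to touch the ground

    # Downward slope of the viewing ray through pixel (u, v)
    dy = (v - cy) / fy
    if dy <= 0:
        return None  # the ray never hits the ground in front of the camera

    # Scale the ray so that it drops exactly cam_height
    forward = cam_height / dy
    lateral = forward * (u - cx) / fx
    return lateral, forward
```

For example, with fx = fy = 500, a principal point of (320, 240) and a camera 1 m above the ground, a box whose bottom edge sits 100 pixels below the image centre lands about 5 m ahead. A tilted camera or uneven ground would need a full extrinsic calibration instead.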

I will soon be writing a complete guide to training Faster R-CNN on a custom dataset. There have been a lot of issues about this on GitHub. I am still stuck at the last step of the process, but the day I solve it, I will have a really powerful architecture that can classify almost anything.
