JK Jung's blog

Quick link: jkjung-avt/hand-detection-tutorial

I came across this very nicely presented post, How to Build a Real-time Hand-Detector using Neural Networks (SSD) on Tensorflow, written by Victor Dibia, a while ago. Now that I’d like to train a TensorFlow object detector by myself, optimize it with TensorRT, and deploy it on Jetson TX2, I immediately thought about following Victor’s example and training a hand detector. However, when I started to follow Victor’s post and do the training, I quickly found that not only were quite a few steps missing, but some of the information was also outdated.

So I decided to work on the hand-detector problem by myself. And I wanted to create a tutorial which is up-to-date and easy to follow. After quite some hard work, I was finally able to put together all necessary code/scripts and create this tutorial.

Note that, unlike TensorFlow’s Quick Start documentation (which starts by describing how to train an object detection model on Google Cloud Platform), my goal was to train the model locally on my own PC/server. Let’s dive in.

Training an object detector is more demanding than training an image classifier. Ideally, you should have a decent NVIDIA GPU for this task. As stated in my jkjung-avt/hand-detection-tutorial/README.md, I used a good desktop PC with an NVIDIA GeForce GTX-1080Ti, running Ubuntu Linux 16.04, to do the training.

Make sure your training PC/server is ready and that a recent version of TensorFlow is properly installed on it. (Reference: Install TensorFlow)

Please clone my GitHub repository, jkjung-avt/hand-detection-tutorial, and refer to the README.md within. All steps required to train the hand detector are already listed there. I’d just add a few words about some of the steps here.

So far I have implemented and tested ssd_mobilenet_v1_egohands and ssd_inception_v2_egohands. The ssd_mobilenet_v1_egohands model, set to train for 20,000 steps, took a little over 2 hours to train on my desktop PC (GTX-1080Ti). Its loss was around 2.5 at the end of training, and the ‘coco_detection_metrics’ evaluation reported an IoU 0.50 mAP of 0.968, which was very good. It basically means that if we use the trained model to run inference on images coming from the same distribution, the model can detect hands with both very high precision and very high recall!

You can verify the high mAP result by using TensorBoard to check the output of the eval.sh script.
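For readers less familiar with the "IoU 0.50" part of that metric, here is a minimal sketch of how intersection over union between a predicted box and a ground-truth box could be computed, and how the 0.50 threshold decides whether a detection counts as a match. This is an illustration only, not the evaluation code used by the TensorFlow Object Detection API. The function names `iou` and `is_true_positive` are hypothetical, and the (ymin, xmin, ymax, xmax) box order is an assumption made for this example.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes.

    Boxes are assumed to be (ymin, xmin, ymax, xmax) tuples; this order
    is an assumption for this sketch, not taken from the tutorial code.
    """
    # Corners of the overlapping rectangle (if any).
    ymin = max(box_a[0], box_b[0])
    xmin = max(box_a[1], box_b[1])
    ymax = min(box_a[2], box_b[2])
    xmax = min(box_a[3], box_b[3])

    # Clamp to zero so disjoint boxes give zero intersection.
    inter = max(0.0, ymax - ymin) * max(0.0, xmax - xmin)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter

    return inter / union if union > 0 else 0.0


def is_true_positive(pred_box, gt_box, threshold=0.5):
    """A detection matches a ground-truth hand if IoU >= threshold."""
    return iou(pred_box, gt_box) >= threshold


if __name__ == "__main__":
    pred = (10, 10, 110, 110)   # hypothetical predicted hand box
    gt = (20, 20, 120, 120)     # hypothetical ground-truth hand box
    print("IoU:", iou(pred, gt))
    print("Match at IoU 0.50:", is_true_positive(pred, gt))
```

mAP at IoU 0.50 then summarizes precision over all recall levels when detections are matched to ground truth with this threshold, which is why a value close to 1.0 implies both high precision and high recall.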

  1. I shall deploy my trained hand detector (SSD) models onto Jetson TX2, and verify the accuracy and inference speed.

  2. I shall write something about how to adapt code in this tutorial to other datasets.

  3. I’d like to test other object detection models, including Faster R-CNN and Mask R-CNN, from the TensorFlow detection model zoo. Hopefully, I will be able to do that and share more soon.

I made every effort in coding and writing this tutorial, so that it could be very easy to follow. It should be straightforward to adapt this tutorial/code to other object detection models and other datasets.

After all, I really spent a lot of time reading/writing code and developing this tutorial. If you find it useful, please help to share it with more people who might be interested. Meanwhile, I do appreciate people giving me stars on GitHub. That motivates me to write more posts/sharing.

Stars on my GitHub repo
