Detecting Objects with Detectron2

5 min read · Nov 20, 2020

What is Detectron2?

Detectron2 is a framework for building state-of-the-art object detection and image segmentation models. It is developed by the Facebook Research team.
It is powered by the PyTorch deep learning framework and includes features such as panoptic segmentation, DensePose, Cascade R-CNN, rotated bounding boxes, PointRend, and DeepLab.
The Detectron2 documentation benchmarks the training speed of Mask R-CNN in Detectron2 against other popular open source Mask R-CNN implementations: https://detectron2.readthedocs.io/notes/benchmarks.html.

Installing Detectron2

I used Detectron2 while it was still at an alpha stage and installed it in Google Colab (see the notebook above). The commands below are run as notebook cells on Colab. First install the dependencies (use cu101 because Colab has CUDA 10.1):

!pip install -U torch==1.5 torchvision==0.6 -f https://download.pytorch.org/whl/cu101/torch_stable.html
!pip install cython pyyaml==5.1
!pip install -U 'git+https://github.com/cocodataset/cocoapi.git#subdirectory=PythonAPI'
import torch, torchvision
print(torch.__version__, torch.cuda.is_available())
!gcc --version

Install Detectron2:

!pip install detectron2==0.1.3 -f https://dl.fbaipublicfiles.com/detectron2/wheels/cu101/torch1.5/index.html

Face Detection Data

The dataset is hosted on Kaggle. Its description reads:

Images marked with bounding boxes. Have around 500 images with around 1100 faces manually tagged via bounding box.

I downloaded the JSON file containing the annotations and uploaded it to Google Drive, so it can be fetched with gdown:

!gdown --id 1K79wJgmPTWamqb04Op2GxW0SW9oxw8KS
# Next, read the JSON file (one JSON object per line):
import pandas as pd
faces_df = pd.read_json('face_detection.json', lines=True)

Data Preprocessing:

The preprocessing code is in the gist below:

https://gist.github.com/vannguyen3007/696af39057936d0c67c06d98c680c20f

This gives you a dataset for training the model. Load it into a DataFrame so it can be checked and reused:

df = pd.DataFrame(dataset)

print(df.file_name.unique().shape[0], df.shape[0])

That gives a total of 409 images (a lot fewer than the promised 500) and 1132 annotations. Save them to a CSV file:

df.to_csv('annotations.csv', header = True, index = None)

Exploratory Data Analysis

We use OpenCV to load an image, draw the bounding boxes, and resize it. A helper function takes care of this:

Showing some annotated images:

Next, we can use torchvision to create a grid of images:

Face Detection with Detectron2

We convert the annotation rows into one record per image, each holding a list of annotations. For every face we also build a polygon with exactly the same shape as the bounding box. This is required for the image segmentation models in Detectron2.
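The conversion described above can be sketched in plain Python. This is only an illustration, not the code from the original notebook: the column names `x_min`, `y_min`, `x_max`, `y_max`, `class_id`, `width`, and `height` are assumptions, and the constant `XYXY_ABS` stands in for `detectron2.structures.BoxMode.XYXY_ABS`, which you would use in a real pipeline.

```python
from collections import defaultdict

# Assumption: stands in for detectron2.structures.BoxMode.XYXY_ABS.
XYXY_ABS = 0


def box_to_polygon(x_min, y_min, x_max, y_max):
    """Return the four box corners as a flat [x1, y1, x2, y2, ...] polygon."""
    return [x_min, y_min, x_max, y_min, x_max, y_max, x_min, y_max]


def rows_to_records(rows):
    """Group per-face annotation rows into one record per image."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["file_name"]].append(row)

    records = []
    for image_id, (file_name, image_rows) in enumerate(grouped.items()):
        annotations = []
        for row in image_rows:
            box = [row["x_min"], row["y_min"], row["x_max"], row["y_max"]]
            annotations.append({
                "bbox": box,
                "bbox_mode": XYXY_ABS,
                # One polygon per face, identical in shape to the box.
                "segmentation": [box_to_polygon(*box)],
                "category_id": row["class_id"],
                "iscrowd": 0,
            })
        records.append({
            "file_name": file_name,
            "image_id": image_id,
            "height": image_rows[0]["height"],
            "width": image_rows[0]["width"],
            "annotations": annotations,
        })
    return records


# Hypothetical rows: two faces in a.jpg, one face in b.jpg.
rows = [
    {"file_name": "a.jpg", "width": 640, "height": 480,
     "x_min": 10, "y_min": 20, "x_max": 110, "y_max": 140, "class_id": 0},
    {"file_name": "a.jpg", "width": 640, "height": 480,
     "x_min": 200, "y_min": 50, "x_max": 260, "y_max": 120, "class_id": 0},
    {"file_name": "b.jpg", "width": 800, "height": 600,
     "x_min": 5, "y_min": 5, "x_max": 55, "y_max": 65, "class_id": 0},
]
records = rows_to_records(rows)
print(len(records), [len(r["annotations"]) for r in records])
```

With a DataFrame such as the one saved to annotations.csv, the rows could be obtained with `df.to_dict("records")`, provided the columns match the assumed names.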

We also need to prepare COCO-style evaluation (coco_eval) to evaluate the model. Fine-tuning a Detectron2 model comes down to loading a configuration file, changing a few values, and starting the training process.

We use the Mask R-CNN X101-FPN model. It is pre-trained on the COCO dataset and achieves very good performance.

The configuration looks like this:

It starts with the standard settings (batch size, maximum number of iterations, learning rate). Two others deserve a note:

WARMUP_ITERS: the learning rate starts from 0 and rises to the preset value over this number of iterations.
STEPS: the checkpoints (iteration counts) at which the learning rate is multiplied by GAMMA.
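To make the interplay of WARMUP_ITERS, STEPS, and GAMMA concrete, here is a minimal sketch of such a schedule in plain Python. It follows the description above (linear warmup from 0, then a multiplicative drop at each step); Detectron2's own scheduler may differ in details such as the starting warmup factor, and the numbers used here are hypothetical, not the values from this article's config.

```python
from bisect import bisect_right


def learning_rate(iteration, base_lr, warmup_iters, steps, gamma):
    """Linear warmup from 0, then multiply by gamma at each milestone in steps."""
    if warmup_iters > 0 and iteration < warmup_iters:
        warmup = iteration / warmup_iters
    else:
        warmup = 1.0
    # Number of milestones already reached at this iteration.
    decays = bisect_right(sorted(steps), iteration)
    return base_lr * warmup * gamma ** decays


# Hypothetical settings, for illustration only.
for it in (0, 50, 500, 1000, 1500):
    lr = learning_rate(it, base_lr=0.001, warmup_iters=100,
                       steps=(1000, 1500), gamma=0.1)
    print(it, lr)
```

The learning rate climbs during the first 100 iterations, stays at the base value until iteration 1000, and then shrinks by a factor of 10 at each of the two steps.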

We also need to specify the number of classes and the period at which we evaluate on the test set:

cfg.MODEL.ROI_HEADS.BATCH_SIZE_PER_IMAGE = 64
cfg.MODEL.ROI_HEADS.NUM_CLASSES = len(classes)
cfg.TEST.EVAL_PERIOD = 500

Defining the number of classes

Time to train, using a custom trainer:

os.makedirs(cfg.OUTPUT_DIR, exist_ok=True)
trainer = CocoTrainer(cfg)
trainer.resume_or_load(resume=False)
trainer.train()

Custom trainer

Evaluating Object Detection Models.

Intersection over union (IOU) measures the overlap between two bounding boxes: the area of their intersection divided by the area of their union. In computer vision it is used to decide whether an object was detected correctly. To understand object detection, you first need to understand object localization, which means figuring out where an object is in the picture and marking it with a rectangular box.

How IOU works

An algorithm predicts a bounding box for an object, and IOU compares that box with the ground-truth box.

A predicted bounding box is judged correct if its IOU with the ground truth is at least 0.5 (IOU ≥ 0.5). This is just a convention; you can also choose a stricter threshold such as 0.6 or more.

If the predicted bounding box and the ground-truth bounding box overlap perfectly, then IOU = 1.
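As a small illustration of the IOU computation, here is a sketch in plain Python. It assumes boxes are given as `(x_min, y_min, x_max, y_max)` tuples in pixel coordinates; the function name `iou` is chosen for this example only.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap; zero when the boxes do not intersect.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


print(iou((0, 0, 10, 10), (0, 0, 10, 10)))    # identical boxes
print(iou((0, 0, 10, 10), (5, 0, 15, 10)))    # half-overlapping boxes
print(iou((0, 0, 10, 10), (20, 20, 30, 30)))  # disjoint boxes
```

Identical boxes give the maximum value of 1, while boxes that share half their area give one third, which would fail the usual 0.5 threshold.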

A common problem in object detection is that an algorithm detects multiple bounding boxes for a single object. A technique called non-max suppression solves this.

Non-max suppression cleans up the multiple detections and ends with just one detection per object. It chooses the bounding box with the highest probability and suppresses all other bounding boxes whose IOU with it exceeds a threshold, so that in the end only the most confident bounding box remains for each object.
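The procedure can be sketched as a greedy loop in plain Python. This is a minimal illustration, not Detectron2's internal implementation; the function names and the 0.5 threshold are assumptions for the example, and boxes are `(x_min, y_min, x_max, y_max)` tuples.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Return indices of the boxes kept, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)  # most confident remaining box
        keep.append(best)
        # Drop every remaining box that overlaps the chosen one too much.
        order = [i for i in order
                 if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep


# Two overlapping detections of one face plus a separate face.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7]
keep = non_max_suppression(boxes, scores)
print(keep)
```

Here the second box overlaps the first with an IOU of about 0.68, so it is suppressed, while the third box does not overlap and is kept.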

You can download the pre-trained model with these commands:

!gdown --id 18Ev2bpdKsBaDufhVKf0cT6RmM3FjW3nL
!mv face_detector.pth output/model_final.pth

We set a minimum threshold of 85% certainty, above which we consider a prediction correct. With that in place, we can run the evaluator with the trained model.
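The effect of the 85% threshold can be illustrated with a few lines of plain Python. In Detectron2 a cutoff like this is normally set in the config (for example through `cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST`) rather than applied by hand; the prediction dictionaries and the function name below are hypothetical and only show what the cutoff does.

```python
def filter_by_score(predictions, min_score=0.85):
    """Keep only predictions whose confidence is at least min_score."""
    return [p for p in predictions if p["score"] >= min_score]


# Hypothetical detections for one image.
predictions = [
    {"bbox": (10, 20, 110, 140), "score": 0.97},
    {"bbox": (200, 50, 260, 120), "score": 0.86},
    {"bbox": (300, 300, 320, 330), "score": 0.40},
]
confident = filter_by_score(predictions)
print(len(confident), "of", len(predictions), "predictions kept")
```

A higher threshold trades recall for precision: fewer faces are reported, but those that remain are more likely to be real.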