Jump start in object detection with Detectron2 (license plate detection)

As most of you have probably experienced, AI/ML problems can be confusing, so much so that people give up on solid ideas just because they don’t know how to proceed and build something that works. Today I’m going to try to change that by showcasing a really quick way to build a proof-of-concept solution for an object detection problem. I’ll show you how to build an object detection model with the Detectron2 framework, which is arguably one of the best object detection platforms available today.

In this tutorial I will build an object detection model that recognizes vehicle license plates. Just FYI: about a year ago I wrote a similar article, where we detected license plates heuristically (using OpenCV image transformations) without any real AI. This time I’m returning to the same basic task: detecting license plates in photos. The problem is harder now, though: the images come from varied surroundings and the plates come in different formats (European white/yellow, UK yellow/black, USA, Russian, Japanese, Chinese, Australian, etc.). And this time we’re going to build a real AI model to solve it!

License plate detection is just an example. What I really want to demonstrate is how fast we can build an ML solution to an object detection problem, going quickly from a rough idea to a pretty accurate model.

Without further ado, let’s get started!

Part 1 — Preparing the dataset

First and foremost, get your images! It doesn’t matter where they come from; just get your hands on at least 30–40 images for your problem. In this case I’m using a few images from my recent screenshots, a few from Google searches, and a few from Wikimedia. You get the gist.

Dataset folder with downloaded images

Don’t rush to collect hundreds or thousands of samples (leave that until you get closer to the production stage). Just FYI, this demo project was trained on only 63 images. Of course it all depends on the complexity of your problem, but in my experience 30 images are a good start and 50–100 images will give you a decent proof-of-concept model.

With the images ready we can start annotating them. Keep in mind that there are multiple annotation formats for object detection: COCO, Pascal VOC, YOLO, CreateML. It’s totally up to you which one to use in your dataset. From personal experience, and to keep this tutorial as short as I can, we will use the PascalVOC annotation format here.
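
To make the difference between these formats concrete, here is a minimal sketch of how the same box is written in three of them. It assumes the usual conventions: Pascal VOC stores corner coordinates in pixels, COCO stores the top-left corner plus width and height in pixels, and YOLO stores the box center plus width and height divided by the image size. The function names and the sample numbers are made up for illustration.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]


def voc_to_coco(xmin: float, ymin: float, xmax: float, ymax: float) -> Box:
    """Pascal VOC corners -> COCO (x, y, width, height), all in pixels."""
    return (xmin, ymin, xmax - xmin, ymax - ymin)


def voc_to_yolo(xmin: float, ymin: float, xmax: float, ymax: float,
                image_width: int, image_height: int) -> Box:
    """Pascal VOC corners -> YOLO (x_center, y_center, width, height), relative to image size."""
    x_center = (xmin + xmax) / 2 / image_width
    y_center = (ymin + ymax) / 2 / image_height
    width = (xmax - xmin) / image_width
    height = (ymax - ymin) / image_height
    return (x_center, y_center, width, height)


# Hypothetical plate box on a 400x200 image
coco_box = voc_to_coco(100, 50, 300, 150)
yolo_box = voc_to_yolo(100, 50, 300, 150, image_width=400, image_height=200)
print(coco_box)
print(yolo_box)
```

Whichever format you annotate in, the information is the same, so converting later is cheap.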

We will use the LabelImg tool to prepare annotations for our dataset. Everything is pretty straightforward: install LabelImg, fire it up, and open the directory where you dropped your images. We will use only one label, plate, for this dataset. Make sure the PascalVOC format is selected, draw a RectBox around each number plate, and save your results.

Annotations in LabelImg

Part 2 — Adapter objects

Now to the fun part: we need a few common objects for this project. I prefer building them around pydantic models, because that way they can be easily integrated into a FastAPI web service when you decide to deploy your model. We need to represent a rectangle, a labeled region on the image, a single dataset sample, and a model’s prediction output. You can check the final version of these objects in the helper_objects.py script.


We want to represent the coordinates of our objects both in the dataset and in predictions. We’ll go with a simple x-y-w-h rectangle for that.

from pydantic import BaseModel

class Rectangle(BaseModel):
    x: float
    y: float
    w: float
    h: float

    def __init__(self, x: float, y: float, w: float, h: float) -> None:
        # Allow positional construction, e.g. Rectangle(0.1, 0.2, 0.3, 0.4)
        super().__init__(x=x, y=y, w=w, h=h)


In the dataset each object of interest will have its coordinates and a label.

class LabeledBox(BaseModel):
    label: str
    region: Rectangle  # Relative coordinates, from 0.0 to 1.0

    def __init__(self, label: str, region: Rectangle) -> None:
        super().__init__(label=label, region=region)


A single complete dataset sample consists of an image, a set of labeled boxes and a name.

from typing import List
from PIL.Image import Image

class Sample(BaseModel):
    name: str
    image: Image
    boxes: List[LabeledBox]

    class Config:
        arbitrary_types_allowed = True

    def __init__(self, name: str, image: Image, boxes: List[LabeledBox]) -> None:
        super().__init__(name=name, image=image, boxes=boxes)

P.S. class Config is needed to allow the PIL Image class to be used as a field type in a pydantic model.


Finally, a single model prediction contains a label, a box and a score.

class Prediction(BaseModel):
    label: str
    score: float
    region: Rectangle

    def __init__(self, label: str, score: float, region: Rectangle) -> None:
        super().__init__(label=label, score=score, region=region)

Part 3 — Loading the dataset

Our first task is to load our annotated samples and parse them into our adapter Sample objects. The code here is pretty boring, just XML parsing and file scanning, so I won’t comment much on it. You can check the completed functions in the utils_dataset_pascalvoc.py script.

Loading a single sample from provided image path and xml path:

def load_sample_from_png_and_pascal_voc_xml(image_file_path: str, xml_file_path: str) -> Sample:
    image_pil = open_image_pil(image_file_path)
    with open(xml_file_path, 'r') as xml_file:
        xml_text = xml_file.read()

    # The file name is taken from the <filename> tag of the annotation
    name = [line for line in xml_text.split('\n') if '<filename>' in line][0].replace('<filename>', '').replace('</filename>', '').strip()
    boxes = []

    # Everything after each <object> tag describes one annotated box
    objects = xml_text.split('<object>')
    objects = objects[1:]
    for object_text in objects:
        lines = object_text.split('\n')
        line_name = [line for line in lines if '<name>' in line][0]
        line_xmin = [line for line in lines if '<xmin>' in line][0]
        line_ymin = [line for line in lines if '<ymin>' in line][0]
        line_xmax = [line for line in lines if '<xmax>' in line][0]
        line_ymax = [line for line in lines if '<ymax>' in line][0]

        label = line_name.replace('<name>', '').replace('</name>', '').strip()
        xmin = int(line_xmin.replace('<xmin>', '').replace('</xmin>', '').strip())
        ymin = int(line_ymin.replace('<ymin>', '').replace('</ymin>', '').strip())
        xmax = int(line_xmax.replace('<xmax>', '').replace('</xmax>', '').strip())
        ymax = int(line_ymax.replace('<ymax>', '').replace('</ymax>', '').strip())

        # Convert absolute pixel corners to relative x-y-w-h
        x = xmin / image_pil.width
        y = ymin / image_pil.height
        w = (xmax - xmin) / image_pil.width
        h = (ymax - ymin) / image_pil.height

        region = Rectangle(x, y, w, h)
        box = LabeledBox(label, region)
        boxes.append(box)

    return Sample(name, image_pil, boxes)
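
If you would rather not rely on string splitting, the same information can be read with Python's standard xml.etree.ElementTree module. The following is only a sketch under a few assumptions: the annotation string is a made-up example in the Pascal VOC layout, the function name parse_pascal_voc is hypothetical, and the image size is taken from the `<size>` element of the XML rather than from the opened image as the tutorial code does.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Made-up annotation in the Pascal VOC layout (one image, one plate)
EXAMPLE_XML = """<annotation>
    <filename>car_01.png</filename>
    <size><width>400</width><height>200</height><depth>3</depth></size>
    <object>
        <name>plate</name>
        <bndbox><xmin>100</xmin><ymin>50</ymin><xmax>300</xmax><ymax>150</ymax></bndbox>
    </object>
</annotation>"""

RelativeBox = Tuple[str, float, float, float, float]  # label, x, y, w, h


def parse_pascal_voc(xml_text: str) -> Tuple[str, List[RelativeBox]]:
    root = ET.fromstring(xml_text)
    name = root.findtext("filename").strip()
    width = int(root.findtext("size/width"))
    height = int(root.findtext("size/height"))

    boxes = []
    for obj in root.iter("object"):
        label = obj.findtext("name").strip()
        xmin = int(obj.findtext("bndbox/xmin"))
        ymin = int(obj.findtext("bndbox/ymin"))
        xmax = int(obj.findtext("bndbox/xmax"))
        ymax = int(obj.findtext("bndbox/ymax"))
        # Same relative x-y-w-h convention as the Rectangle objects above
        boxes.append((label, xmin / width, ymin / height,
                      (xmax - xmin) / width, (ymax - ymin) / height))
    return name, boxes


parsed_name, parsed_boxes = parse_pascal_voc(EXAMPLE_XML)
print(parsed_name, parsed_boxes)
```

A real XML parser tolerates annotations where several tags share a line, which the line-based approach does not.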

Loading a single sample from a provided name and folder path (with support for .png, .jpeg and .jpg image formats, although very crudely implemented):

def load_sample_from_folder(image_and_xml_file_name: str, folder_path: str) -> Sample:
    # Build image file path, trying different image format options
    image_file_path = folder_path + '/' + image_and_xml_file_name + '.png'
    if not os.path.isfile(image_file_path):
        image_file_path = image_file_path.replace('.png', '.jpeg')
    if not os.path.isfile(image_file_path):
        image_file_path = image_file_path.replace('.jpeg', '.jpg')

    # Build XML file path, and show warning if no markup found
    xml_file_path = folder_path + '/' + image_and_xml_file_name + '.xml'
    if not os.path.isfile(xml_file_path):
        print('load_sample_from_folder(): Warning! XML not found, xml_file_path=' + str(xml_file_path))
        return None

    # Load sample
    return load_sample_from_png_and_pascal_voc_xml(image_file_path, xml_file_path)
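
A slightly tidier way to try several extensions is to loop over them and return the first file that exists. This is a sketch, not the tutorial's code: find_image_path and IMAGE_EXTENSIONS are hypothetical names, and the usage below creates a throwaway folder just to exercise the function.

```python
import os
import tempfile
from typing import Optional

IMAGE_EXTENSIONS = (".png", ".jpeg", ".jpg")


def find_image_path(file_name: str, folder_path: str) -> Optional[str]:
    """Return the first existing image path for file_name, or None if there is none."""
    for extension in IMAGE_EXTENSIONS:
        candidate = os.path.join(folder_path, file_name + extension)
        if os.path.isfile(candidate):
            return candidate
    return None


# Usage with a temporary folder that contains a single empty .jpg file
with tempfile.TemporaryDirectory() as folder:
    open(os.path.join(folder, "car_01.jpg"), "w").close()
    found_name = os.path.basename(find_image_path("car_01", folder))
    missing = find_image_path("car_02", folder)

print(found_name, missing)
```

Unlike chained replace() calls, this cannot accidentally rewrite a '.png' that happens to appear inside the folder name.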

And finally, loading all images from a folder, with some multithreading added to speed things up a bit (you’ll notice the performance difference beyond about 200 images):

def load_samples_from_folder(folder_path: str) -> List[Sample]:
    samples = []

    # Get all files, strip their extensions and resort
    all_files = os.listdir(folder_path)
    all_files = ['.'.join(f.split('.')[:-1]) for f in all_files]
    all_files = set(all_files)
    all_files = sorted(all_files)

    # Load samples in parallel
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=8)
    for sample in executor.map(load_sample_from_folder, all_files, repeat(folder_path)):
        if sample is not None:
            samples.append(sample)

    # Filter out None values
    samples = [s for s in samples if s is not None]

    return samples
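
The executor.map call with repeat() can look cryptic at first, so here is a small standalone sketch of the same pattern. The describe function is a stand-in for load_sample_from_folder and map_in_threads is a hypothetical name; the only change in technique is that the pool is used as a context manager so it is shut down when the work is done.

```python
import concurrent.futures
from itertools import repeat
from typing import List


def describe(file_name: str, folder_path: str) -> str:
    # Stand-in for load_sample_from_folder: it only builds a path string
    return folder_path + "/" + file_name


def map_in_threads(file_names: List[str], folder_path: str) -> List[str]:
    # The with-block shuts the pool down once all results are collected
    with concurrent.futures.ThreadPoolExecutor(max_workers=8) as executor:
        # repeat(folder_path) supplies the same second argument to every call;
        # the arguments are zipped, so the finite file_names list sets the count
        return list(executor.map(describe, file_names, repeat(folder_path)))


results = map_in_threads(["a", "b", "c"], "dataset")
print(results)
```

Results come back in the order of the input list, which is why the sorted file names stay sorted after loading.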

Part 4 — Conversion to Detectron2 dataset format

Now that we’ve parsed the PascalVOC files and loaded our Sample objects into memory, we can convert them to the format Detectron2 expects. You can check the resulting code directly in the utils_dataset_detectron.py script.

Building mappings between labels and ids:

def build_labels_maps(samples: List[Sample]) -> Tuple[Dict[str, int], Dict[int, str]]:
    # Collect every distinct label used in the dataset
    labels = []
    for sample in samples:
        for box in sample.boxes:
            if box.label not in labels:
                labels.append(box.label)
    labels = sorted(labels)

    # Assign ids in alphabetical order so the mapping is stable between runs
    labels_to_id_map = {}
    id_to_labels_map = {}
    for i in range(0, len(labels)):
        labels_to_id_map[labels[i]] = i
        id_to_labels_map[i] = labels[i]
    return labels_to_id_map, id_to_labels_map
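
As a quick illustration of what these two maps contain, here is a compact sketch that works on a plain list of label strings instead of Sample objects. The function name is hypothetical, and the second label, car, is invented for the example (the tutorial dataset only has plate).

```python
from typing import Dict, Iterable, Tuple


def build_label_maps(labels: Iterable[str]) -> Tuple[Dict[str, int], Dict[int, str]]:
    # Deduplicate, then sort so ids do not depend on the order samples were loaded in
    unique_labels = sorted(set(labels))
    label_to_id = {label: i for i, label in enumerate(unique_labels)}
    id_to_label = {i: label for label, i in label_to_id.items()}
    return label_to_id, id_to_label


label_to_id, id_to_label = build_label_maps(["plate", "car", "plate"])
print(label_to_id)
print(id_to_label)
```

Ids start at 0 and run up to the number of labels minus one, which matches the category_id range described in the conversion step.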

Convert from a Sample to Detectron2 Dict:

def convert_sample_to_detectron_dict(sample: Sample, labels_to_id_map: Dict[str, int], bbox_mode: BoxMode = BoxMode.XYWH_ABS) -> Dict:
    # Generate ID, load image and save it in temp
    id = generate_uuid()
    image_pil = sample.image
    image_path = save_image_pil_in_temp(image_pil, id)

    # Build common
    file_name = image_path  # the full path to the image file.
    height = image_pil.height  # integer. The shape of the image.
    width = image_pil.width  # integer. The shape of the image.
    image_id = id  # (str or int): a unique id that identifies this image. Required by many evaluators to identify the images, but a dataset may use it for different purposes.
    annotations = []  # (list[dict]): Required by instance detection/segmentation or keypoint detection tasks. Each dict corresponds to annotations of one instance in this image, and may contain the following keys:

    # Build boxes
    for box in sample.boxes:
        # Convert relative coordinates back to absolute pixels
        x = int(box.region.x * width)
        y = int(box.region.y * height)
        w = int(box.region.w * width)
        h = int(box.region.h * height)

        # Mask polygons: four triangles that share the box center and together cover the box
        triangle_1 = [
            x + w / 2, y + h / 2,
            x, y,
            x + w, y
        ]
        triangle_2 = [
            x + w / 2, y + h / 2,
            x + w, y,
            x + w, y + h
        ]
        triangle_3 = [
            x + w / 2, y + h / 2,
            x + w, y + h,
            x, y + h
        ]
        triangle_4 = [
            x + w / 2, y + h / 2,
            x, y + h,
            x, y
        ]

        bbox = [x, y, w, h]  # (list[float], required): list of 4 numbers representing the bounding box of the instance.
        bbox_mode = bbox_mode  # (int, required): the format of bbox. It must be a member of structures.BoxMode. Currently supports: BoxMode.XYXY_ABS, BoxMode.XYWH_ABS.
        category_id = labels_to_id_map[box.label]  # (int, required): an integer in the range [0, num_categories-1] representing the category label. The value num_categories is reserved to represent the "background" category, if applicable.
        segmentation = [triangle_1, triangle_2, triangle_3, triangle_4]

        annotation = {
            'bbox': bbox,
            'bbox_mode': bbox_mode,
            'category_id': category_id,
            'segmentation': segmentation
        }
        annotations.append(annotation)

    return {
        'file_name': file_name,
        'height': height,
        'width': width,
        'image_id': image_id,
        'annotations': annotations
    }
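
Since the mask here is just the box itself, the same region can also be described with a single four-point polygon instead of four triangles. The sketch below shows that variant; it is an alternative, not the tutorial's code, and the function name is hypothetical. It relies on Detectron2's polygon convention of a flat list [x1, y1, x2, y2, ...] per polygon.

```python
from typing import Dict


def rectangle_to_bbox_and_polygon(x_rel: float, y_rel: float, w_rel: float, h_rel: float,
                                  image_width: int, image_height: int) -> Dict:
    # Relative (0.0 to 1.0) coordinates -> absolute pixels
    x = int(x_rel * image_width)
    y = int(y_rel * image_height)
    w = int(w_rel * image_width)
    h = int(h_rel * image_height)

    # One polygon listing the four corners clockwise from the top-left
    polygon = [x, y, x + w, y, x + w, y + h, x, y + h]
    return {"bbox": [x, y, w, h], "segmentation": [polygon]}


# Hypothetical plate covering the middle of a 400x200 image
annotation_part = rectangle_to_bbox_and_polygon(0.25, 0.25, 0.5, 0.5, 400, 200)
print(annotation_part)
```

The remaining keys (bbox_mode and category_id) would be added exactly as in the function above.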

Now to convert samples in bulk:

def convert_samples_to_detectron_dicts(samples: List[Sample]) -> List[Dict]:
    labels_to_id_map, id_to_labels_map = build_labels_maps(samples)
    detectron_dicts = []
    for sample in samples:
        d = convert_sample_to_detectron_dict(sample, labels_to_id_map)
        detectron_dicts.append(d)
    return detectron_dicts

Part 5 — Detectron2 model: configuration, training, inference

With the dataset converted and all preparations completed, we can jump into some shared functions that build Detectron2 models. Again, the complete code can be found in utils_model.py. For model configuration we keep the ability to change the base model (provided you use another model from Detectron2’s model zoo) and to tune the prediction score threshold, learning rate, number of training iterations, and batch size.

Building configuration:

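import os
from detectron2 import model_zoo
from detectron2.config import CfgNode, get_cfg

def build_config(
        model_zoo_config_name: str,
        dataset_name: str, class_labels: List[str],
        trained_model_output_dir: str,
        prediction_score_threshold: float,
        base_lr: float, max_iter: int, batch_size: int
) -> CfgNode:
    trained_model_weights_path = trained_model_output_dir + "/model_final.pth"
    cfg = get_cfg()
    # Start from the chosen model zoo configuration
    cfg.merge_from_file(model_zoo.get_config_file(model_zoo_config_name))
    cfg.DATASETS.TRAIN = (dataset_name,)
    cfg.DATASETS.TEST = ()  # no evaluation dataset in this demo
    cfg.OUTPUT_DIR = trained_model_output_dir
    # Continue from our own weights if they exist, otherwise from the model zoo checkpoint
    if os.path.exists(trained_model_weights_path):
        cfg.MODEL.WEIGHTS = trained_model_weights_path
    else:
        cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url(model_zoo_config_name)
    cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = prediction_score_threshold
    cfg.SOLVER.BASE_LR = base_lr
    cfg.SOLVER.MAX_ITER = max_iter
    # Assumption: batch_size is the number of ROI samples per image, not images per iteration
    cfg.MODEL.ROI_HEADS.BATCH_SIZE_PER_IMAGE = batch_size
    cfg.MODEL.ROI_HEADS.NUM_CLASSES = len(class_labels)
    return cfg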
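
When picking the number of iterations for a small dataset, it helps to translate iterations into passes over the data. The sketch below does that arithmetic. Note the assumption: images_per_batch stands for Detectron2's cfg.SOLVER.IMS_PER_BATCH setting, and the value 2 is purely illustrative, since the tutorial does not state how many images go into each iteration.

```python
def approximate_epochs(max_iter: int, images_per_batch: int, num_images: int) -> float:
    """Roughly how many times training sees each image."""
    return max_iter * images_per_batch / num_images


# 1000 iterations over the 63-image dataset, with an assumed 2 images per iteration
epochs = approximate_epochs(max_iter=1000, images_per_batch=2, num_images=63)
print(round(epochs, 1))
```

With a dataset this small even a modest iteration count means every image is seen dozens of times, which is worth keeping in mind when judging the results.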

For training we basically just create a Trainer object with our configuration and run its train() method. You will see a lot of useful debug info in the console: besides the current training progress, it also shows your model’s architecture and the distribution of labels in your dataset.

def run_training(cfg: CfgNode):
    trainer = DefaultTrainer(cfg)
    trainer.resume_or_load(resume=False)  # load the weights set in cfg.MODEL.WEIGHTS
    trainer.train()

When running models in inference mode, we first create a Predictor object, run an image through it, and convert its outputs to our desired format:

def build_predictor(cfg: CfgNode) -> DefaultPredictor:
    return DefaultPredictor(cfg)


def convert_detectron_outputs_to_predictions(class_labels: List[str], outputs) -> List[Prediction]:
    results = []
    instances = outputs["instances"].to("cpu")
    pred_boxes = instances.pred_boxes
    scores = instances.scores
    pred_classes = instances.pred_classes
    for i in range(0, len(pred_boxes)):
        box = pred_boxes[i].tensor.numpy()[0]
        score = float(scores[i].numpy())
        label_key = int(pred_classes[i].numpy())
        label = class_labels[label_key]

        # Predicted boxes hold corner coordinates (x1, y1, x2, y2), so convert to x-y-w-h
        x = box[0]
        y = box[1]
        w = box[2] - box[0]
        h = box[3] - box[1]
        region = Rectangle(int(x), int(y), int(w), int(h))

        prediction = Prediction(label, score, region)
        results.append(prediction)

    return results


def run_prediction(cfg: CfgNode, predictor: DefaultPredictor, class_labels: List[str], pil_image: Image, debug: bool = True, save: bool = False):
    # Prep image
    cv_image = convert_pil_to_cv(pil_image)
    image_name = pil_image.filename.replace('dataset_test', '').replace('dataset', '').strip()

    # Run prediction and time it
    t1 = time.time()
    outputs = predictor(cv_image)
    t2 = time.time()
    d = t2 - t1

    # Debug predictions
    visualize_detectron_outputs(cfg, cv_image, image_name, outputs, debug=debug, save=save)
    predictions = convert_detectron_outputs_to_predictions(class_labels, outputs)
    print('run_prediction(): Testing "' + image_name + '" took ' + str(round(d, 2)) + ' seconds, and resulted in predictions ' + str(predictions))
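
The conversion step is easy to try without a trained model. The sketch below repeats the corner-to-width/height arithmetic on plain Python lists and adds an optional score filter. Everything in it is illustrative: SimplePrediction is a dataclass stand-in for the pydantic Prediction model, to_predictions is a hypothetical name, and the boxes and scores are made up.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class SimplePrediction:
    # Plain stand-in for the tutorial's Prediction model
    label: str
    score: float
    x: int
    y: int
    w: int
    h: int


def to_predictions(class_labels: Sequence[str],
                   boxes_xyxy: Sequence[Tuple[float, float, float, float]],
                   scores: Sequence[float],
                   class_ids: Sequence[int],
                   min_score: float = 0.0) -> List[SimplePrediction]:
    results = []
    for (x1, y1, x2, y2), score, class_id in zip(boxes_xyxy, scores, class_ids):
        if score < min_score:
            continue  # drop low-confidence detections
        results.append(SimplePrediction(
            class_labels[class_id], float(score),
            int(x1), int(y1), int(x2 - x1), int(y2 - y1),
        ))
    return results


predictions = to_predictions(
    ["plate"],
    boxes_xyxy=[(100.4, 50.2, 300.9, 150.7), (10.0, 10.0, 20.0, 20.0)],
    scores=[0.97, 0.42],
    class_ids=[0, 0],
    min_score=0.9,
)
print(predictions)
```

In the tutorial the score filtering is already done by Detectron2 through SCORE_THRESH_TEST, so the min_score argument is only useful if you want a stricter cut afterwards.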

Part 6 — Training script

After all this preparation work the main training script becomes surprisingly simple: we load the dataset, convert it to dicts, build the model config and run training.

# Configuration
dataset_dir = 'dataset'
model_zoo_config_name = 'COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml'
trained_model_output_dir = 'training_output'
dataset_name = 'license-plate-detection-dataset'
class_labels = ["plate"]
prediction_score_threshold = 0.9
base_lr = 0.0025
max_iter = 1000
batch_size = 64

# Build dataset - load samples and filter out the ones with empty boxes
samples = load_samples_from_folder(dataset_dir)
samples = [s for s in samples if len(s.boxes) != 0]

# Build dataset - convert to detectron format and provide its function and then register it
detectron_dicts = convert_samples_to_detectron_dicts(samples)
def dataset_function():
    return detectron_dicts
register_detectron_dataset(dataset_name, class_labels, dataset_function)

# Build detectron config & run trainer
cfg = build_config(model_zoo_config_name, dataset_name, class_labels, trained_model_output_dir, prediction_score_threshold, base_lr, max_iter, batch_size)
run_training(cfg)

When the training finishes you’ll have a trained model in the model_final.pth file inside your output directory. You can run this training script again and again, and it will keep training and improving the model: if a model is already present in the output directory, training resumes from the last state of that model.

I think it’s also worth saying a few words about the hardware used for training. I trained this model on a regular consumer-grade gaming PC with one NVIDIA 2070 SUPER (8GB) GPU, an Intel Core i5-10600K CPU and 32 GB of RAM. Generally speaking, any modern NVIDIA GPU with at least 8GB of VRAM should be perfectly suitable for training Detectron2 models. I wouldn’t advise training on a CPU, as it will be unbearably slow, although running models in inference mode on a CPU can be feasible.
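
If you do want to run inference on a machine without a GPU, Detectron2 reads the target device from the cfg.MODEL.DEVICE setting, which defaults to "cuda". Here is a small sketch of picking the value automatically; the pick_device helper is hypothetical and is not part of the tutorial's scripts.

```python
def pick_device() -> str:
    """Return "cuda" when PyTorch can see a GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed, so only CPU work is possible here
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print(device)
# In a config-building function this value could then be assigned as:
# cfg.MODEL.DEVICE = device
```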

Part 7 — Testing script

Everything is almost identical to the training script: load the dataset, build the config, build a Predictor, load the test images and run the predictor.

# Configuration
dataset_dir = 'dataset'
dataset_test_dir = 'dataset_test'
model_zoo_config_name = 'COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml'
trained_model_output_dir = 'training_output'
dataset_name = 'license-plate-detection-dataset'
class_labels = ["plate"]
prediction_score_threshold = 0.9
base_lr = 0
max_iter = 0
batch_size = 0

# Build dataset - load samples and filter out the ones with empty boxes
samples = load_samples_from_folder(dataset_dir)
samples = [s for s in samples if len(s.boxes) != 0]

# Build dataset - convert to detectron format and provide its function and then register it
detectron_dicts = convert_samples_to_detectron_dicts(samples)
def dataset_function():
    return detectron_dicts
register_detectron_dataset(dataset_name, class_labels, dataset_function)

# Build detectron config & predictor
cfg = build_config(model_zoo_config_name, dataset_name, class_labels, trained_model_output_dir, prediction_score_threshold, base_lr, max_iter, batch_size)
predictor = build_predictor(cfg)

# Test images from training dataset
image_paths = [

]

# Dataset test images
dataset_test_image_paths = os.listdir(dataset_test_dir)
dataset_test_image_paths = sorted(dataset_test_image_paths)
dataset_test_image_paths = [dataset_test_dir + '/' + p for p in dataset_test_image_paths]

# Merge test & train images
image_paths.extend(dataset_test_image_paths)

# Run predictions
for image_path in image_paths:
    image_pil = open_image_pil(image_path)
    run_prediction(cfg, predictor, class_labels, image_pil, debug=False, save=True)
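
Looking at saved images is the quickest check, but if some of your test images are annotated you can also put a number on each prediction. A common measure is intersection over union (IoU) between the predicted box and the annotated one. The sketch below is an optional extra that is not part of the tutorial's scripts; it assumes both boxes use the same x-y-w-h convention, and the sample boxes are invented.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # x, y, w, h


def intersection_over_union(box_a: Box, box_b: Box) -> float:
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # Width and height of the overlapping area (zero if the boxes do not touch)
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    intersection = inter_w * inter_h
    union = aw * ah + bw * bh - intersection
    return intersection / union if union > 0 else 0.0


# Annotated plate vs. a prediction shifted 50 pixels to the right
iou = intersection_over_union((100, 50, 200, 100), (150, 50, 200, 100))
print(iou)
```

An IoU of 1.0 means a perfect match and 0.0 means no overlap at all.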

To finish this off, here are a few more examples of predictions on images from the testing set. The results turned out better than expected, especially given the small dataset size.

That’s it for now. I hope this tutorial helps you build your own object detection models, or at least clears up some confusion about getting started in this area of ML.

GitHub repo with source code for this tutorial

In case you’d like to check my other work or contact me:

Personal website, GitHub, PyPI, DockerHub, Blog, LinkedIn (feel free to connect)


Jump start in object detection with Detectron2 (license plate detection) was originally published in Becoming Human: Artificial Intelligence Magazine on Medium, where people are continuing the conversation by highlighting and responding to this story.
