AFOD Open Source: Major Small-Object Detection Upgrade for SpireCV
Background
On September 30, 2024, Ultralytics officially released YOLOv11, another major upgrade to the YOLO series of real-time object detectors and a sign of how quickly object detection is advancing.


Small-object detection has long been one of the hard problems in object detection, because small objects have weak visual features and high noise. This is especially true in UAV applications. Because a UAV flies high, scenes often contain many small targets from which few features can be extracted. Because flight altitude fluctuates widely, the apparent scale of objects changes drastically, which sharply increases detection difficulty. Moreover, real flight views contain many complex scenes, and dense small targets are easily occluded by one another or by the background.
Principle
The AFOD algorithm (AutoFocusObjectDetector) is SpireCV's new open-source algorithm for small-object detection from the UAV perspective. Its Chinese name translates as "attention target detection". The following shows AFOD detecting distant vehicles with the GX40 pod without zooming (the targets occupy far fewer than 32×32 pixels).

The main advantage of attention target detection is that it balances small-object detection accuracy against frame rate. It runs in two stages, in sequence:
- Global target search, generally at 1280×1280 resolution
- Once a target has been found, sub-region detection, usually at 640×640 resolution
The details are shown in the figure below:

This detector takes two general-purpose object detectors as input, one for full-image search and one for sub-region search. The target categories are defined by the dataset the detectors were trained on, and the output is each target's category and pixel position (bounding rectangle).
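As a rough illustration of how sub-region results can be reported as pixel positions in the full image, the sketch below maps boxes from a cropped sub-region back to full-frame coordinates. The `Box` type, the `to_full_image` function, and the assumption that the sub-region is a square crop resized to the 640×640 detector input are all hypothetical and are not taken from SpireCV's code.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Box:
    """Hypothetical detection result: category plus a pixel rectangle."""
    category: str
    x: float  # top-left corner, pixels
    y: float
    w: float
    h: float


def to_full_image(
    boxes: List[Box],
    region: Tuple[float, float, float],
    input_size: int = 640,
) -> List[Box]:
    """Map boxes from detector-input coordinates to full-image coordinates.

    `region` is (x0, y0, side): the top-left corner and side length of the
    square sub-region in the full image. Assumption: the crop is resized to
    `input_size` x `input_size` before being fed to the sub-region detector.
    """
    x0, y0, side = region
    scale = side / input_size  # full-image pixels per detector-input pixel
    return [
        Box(b.category, x0 + b.x * scale, y0 + b.y * scale, b.w * scale, b.h * scale)
        for b in boxes
    ]
```

For example, a box at (320, 320) in the 640×640 input of a 320-pixel-wide sub-region located at (1000, 500) lands at (1160, 660) in the full image, with its width and height halved.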
The relevant configuration parameters are detailed as follows:
- lock_thres: number of consecutive frames in which the same target must be detected before entering sub-region detection. Default: 5 frames.
- unlock_thres: number of consecutive frames in which the target must be lost in the sub-region before returning to global detection. Default: 5 frames.
- lock_scale_init: controls the initial sub-region size, expressed as a multiple of the target's pixel width. Default: 12 times.
- lock_scale: controls the sub-region size once tracking in the sub-region is stable. Default: 8 times.
- categories_filter: target names to filter by. If empty, no filtering is performed. Example: ["person", "car"]
- keep_unlocked: whether to also output targets that are not locked by the automatic focus. Default: false (not output).
- use_square_region: whether to use a square region during initial detection. If so, for non-square input images the margins on both sides are not searched. Default: false (not used).
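The interaction between lock_thres, unlock_thres, lock_scale_init, and lock_scale can be sketched as a small state machine. This is a minimal illustration under stated assumptions, not SpireCV's implementation: the `AfodConfig` and `FocusState` names are hypothetical, only a single target is handled (so associating "the same target" across frames is skipped), and lock_scale is assumed to be a multiple of the target's pixel width in the same way as lock_scale_init.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class AfodConfig:
    """Hypothetical container holding the documented defaults."""
    lock_thres: int = 5
    unlock_thres: int = 5
    lock_scale_init: float = 12.0
    lock_scale: float = 8.0
    categories_filter: List[str] = field(default_factory=list)
    keep_unlocked: bool = False
    use_square_region: bool = False


class FocusState:
    """Switches between global search and sub-region detection."""

    def __init__(self, cfg: AfodConfig) -> None:
        self.cfg = cfg
        self.locked = False
        self.hits = 0      # consecutive global-search frames with the target
        self.misses = 0    # consecutive sub-region frames without the target
        self.last_width = 0.0

    def update(self, target_width: Optional[float]) -> Optional[float]:
        """Feed one frame: the target's pixel width, or None if not detected.

        Returns the sub-region side length to use for the next frame,
        or None when the next frame should use global search.
        """
        found = target_width is not None
        if not self.locked:
            self.hits = self.hits + 1 if found else 0
            if self.hits >= self.cfg.lock_thres:
                # Seen in enough consecutive frames: lock with the initial scale.
                self.locked, self.hits, self.misses = True, 0, 0
                self.last_width = target_width
                return self.cfg.lock_scale_init * target_width
            return None
        if found:
            # Target still visible in the sub-region: use the tighter scale.
            self.misses = 0
            self.last_width = target_width
            return self.cfg.lock_scale * target_width
        self.misses += 1
        if self.misses >= self.cfg.unlock_thres:
            # Lost for too long: fall back to global search.
            self.locked = False
            return None
        return self.cfg.lock_scale * self.last_width
```

With the defaults and a target 10 pixels wide, the first four frames stay in global search, the fifth locks onto a 120-pixel sub-region, and later frames use an 80-pixel sub-region until the target has been missing for five consecutive frames.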
General-purpose object detectors:
The two general-purpose object detectors used by AFOD in this release are models trained on the VisDrone2019-DET dataset with yolov11s and yolov11s6 (640×640 and 1280×1280, respectively). The following shows the detection results of yolov11s6 combined with the 10x optical zoom of the GX40 pod while the P600 UAV hovers at an altitude of 40 meters. The detector effectively identifies vehicles within 1,600 meters and pedestrians within 1,400 meters.
Usage

