In filmmaking and television post-production, you’ll often use motion-tracking techniques that rely on software and its object-tracking algorithms. Object tracking is a computer vision application that lets a system use visual information to follow an object's position, and over the years it has expanded into many other fields.
In this article, I'll discuss object tracking, starting from its core definition and expanding on its applications in filmmaking and television. You’ll learn what object tracking is, its main types, how object tracking algorithms work, and how to track single and multiple objects in a video using the Boris FX Mocha Pro object tracker.
Let’s dive in!
What Is Object Tracking?
Object tracking is a process in computer vision where an algorithm detects an object in an image or video footage and then predicts the object's future position in a sequence of images or frames in a video. That prediction allows the program to use an object-tracking algorithm to track the movement of the object detected.
Computer vision programs recognize and analyze people and objects in footage by running algorithms or AI models trained on visual data with deep learning. That training lets them recognize patterns and identify objects in visual content.
Object tracking involves both object detection and tracking algorithms. Object detection only identifies the object, which the user defines by drawing a bounding box (the tracker) around it on a reference frame. Object tracking then uses the information from the detected object to predict its movement and follow it accurately throughout the video footage.
Once the object is defined, the object tracking algorithm can follow its movement across time (frame by frame), across space (its location within the frame), or even across multiple camera angles in the footage. It can follow a single object or several at once.
How Object Tracking Works
Start by giving the program its input, such as video footage or a still image. Some programs can even track footage from a real-time camera.
Next, tell the program which objects to analyze for tracking. Select an object from the footage and create a bounding box. It will serve as a tracker later.
The object tracking algorithm will analyze and classify the object, assigning it a unique ID or label. It can label several detected objects at once.
The final step is to monitor the identified object as it moves across several frames while collecting the needed information (tracking data) associated with it for later use.
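The steps above can be sketched in a few lines of Python. This is only a toy illustration, not how any particular product works: it assumes grayscale frames stored as nested lists, and it swaps in simple template matching (sum of absolute differences) for the tracking algorithm. The names `track`, `crop`, `difference`, and `make_frame` are hypothetical helpers made up for this example.

```python
from typing import List, Tuple

Frame = List[List[int]]          # grayscale image as rows of pixel values
Box = Tuple[int, int, int, int]  # (x, y, width, height), top-left origin


def crop(frame: Frame, box: Box) -> Frame:
    """Copy the pixels inside the bounding box."""
    x, y, w, h = box
    return [row[x:x + w] for row in frame[y:y + h]]


def difference(frame: Frame, template: Frame, x: int, y: int) -> int:
    """Sum of absolute differences between the template and the frame at (x, y)."""
    return sum(
        abs(frame[y + j][x + i] - value)
        for j, row in enumerate(template)
        for i, value in enumerate(row)
    )


def track(frames: List[Frame], box: Box, search_radius: int = 2) -> List[Box]:
    """Follow the region defined on frames[0] through the remaining frames."""
    x, y, w, h = box
    template = crop(frames[0], box)  # what the object looks like on the reference frame
    tracking_data = [box]
    for frame in frames[1:]:
        height, width = len(frame), len(frame[0])
        best = None
        # Only search near the previous position, clamped to the frame edges.
        for ny in range(max(0, y - search_radius), min(height - h, y + search_radius) + 1):
            for nx in range(max(0, x - search_radius), min(width - w, x + search_radius) + 1):
                score = difference(frame, template, nx, ny)
                if best is None or score < best[0]:
                    best = (score, nx, ny)
        _, x, y = best
        tracking_data.append((x, y, w, h))
    return tracking_data


def make_frame(size: int, left: int, top: int) -> Frame:
    """Synthetic frame: dark background with a bright 2x2 square."""
    frame = [[0] * size for _ in range(size)]
    for j in range(2):
        for i in range(2):
            frame[top + j][left + i] = 255
    return frame


frames = [make_frame(8, 1 + t, 2) for t in range(4)]  # the square moves right
print(track(frames, (1, 2, 2, 2)))
# [(1, 2, 2, 2), (2, 2, 2, 2), (3, 2, 2, 2), (4, 2, 2, 2)]
```

The returned list is the tracking data: one bounding box per frame, ready to be reused later.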
Types of Object Tracking
Two main types of object tracking algorithms exist depending on the purpose and the type of input images they are trained on. Then, each object tracking type can be divided into two other types based on the number of objects they track.
Let’s start by understanding the difference between the two main object tracking types: image and video tracking.
An image-based object tracking type identifies and tracks an object in a 2D environment, defining the object and using the bounding box coordinates to predict and track its position in the image. It works best with input images with high contrast, few patterns, and discernible variations between the detected object and the rest of the items in the frame.
Image tracking is widely used in augmented reality to place a target object in a virtual setting, so users can see how the object will look in another scenario. For example, the IKEA Place app lets customers virtually place furniture in their home and move the camera to see its surroundings without losing track of the object.
The most popular type of object tracking is video tracking. In video tracking, the object tracking algorithm identifies objects in video footage and uses their bounding box coordinates to track them in time (frame by frame) and space (location in the frame). It uses the object's previous and current locations to predict where it will move next.
Video tracking is sometimes called real-time object tracking because it can track live footage as well as recorded video files.
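To make the prediction step concrete, here is a minimal sketch that assumes the object keeps moving at a constant rate between frames. Real trackers use more robust motion models; `predict_next` is a hypothetical name used only for this illustration.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x, y, width, height)


def predict_next(previous: Box, current: Box) -> Box:
    """Extrapolate the next box, assuming the change between frames stays constant."""
    px, py, pw, ph = previous
    cx, cy, cw, ch = current
    return (cx + (cx - px), cy + (cy - py), cw + (cw - pw), ch + (ch - ph))


# The box moved 10 px right and 2 px down, so the guess repeats that step.
print(predict_next((100, 50, 40, 40), (110, 52, 40, 40)))
# (120, 54, 40, 40)
```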
Video object tracking is used in video editing, traffic monitoring, self-driving cars, and security due to its ability to process real-time footage.
Based on how many items the object tracking algorithm follows, video and image tracking can each be divided into two types: single object tracking and multiple object tracking.
Single Object Tracking
Single object tracking, or visual object tracking, focuses on only one target object in an image or video at a time. The object tracking algorithm identifies the object defined by the bounding box and keeps track of that single object in the following frames, regardless of other similar objects present in the image.
Single object tracking is used for basic tracking systems where you need to get tracking data from one element, such as a person, a vehicle, a building, etc.
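One common way to keep following a single target while ignoring look-alikes is to pick the candidate box that overlaps most with the target's previous box. The sketch below assumes a detector has already produced candidate boxes for the new frame. `iou`, `follow_single`, and the 0.3 threshold are illustrative choices, not part of any specific tool.

```python
def iou(a, b):
    """Intersection over union of two (x, y, width, height) boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    overlap_w = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    overlap_h = max(0, min(ay + ah, by + bh) - max(ay, by))
    intersection = overlap_w * overlap_h
    union = aw * ah + bw * bh - intersection
    return intersection / union if union else 0.0


def follow_single(previous, candidates, min_iou=0.3):
    """Return the candidate that best overlaps the previous box, or None if none is close."""
    best = max(candidates, key=lambda box: iou(previous, box), default=None)
    if best is None or iou(previous, best) < min_iou:
        return None  # target lost in this frame
    return best


target = (10, 10, 20, 20)
candidates = [(60, 10, 20, 20), (14, 12, 20, 20)]  # a similar object and the real target
print(follow_single(target, candidates))
# (14, 12, 20, 20)
```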
Multiple Object Tracking
Multiple object tracking is a more intricate process than single object tracking. This type of object tracking identifies and tracks multiple objects in video footage or an image.
First, you need to define each object of interest in the frame with bounding boxes to allow the algorithm to label each object separately. Then, the algorithm estimates each object's future position and keeps track of all of them simultaneously, frame by frame, until no more frames are left or the object exits the frame.
The target objects don't need to be of the same class to be tracked by the object tracking algorithm. Multiple object tracking can follow vehicles in a car race, runners, or swimmers, and it helps autonomous vehicles keep track of the objects around them.
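Here is a rough sketch of how several labeled objects can be kept apart from frame to frame. It assumes each new frame comes with a list of detected boxes and matches every labeled track to its nearest detection, closest pairs first. `update_tracks`, `center`, and the `max_distance` value are hypothetical; production trackers use more careful matching.

```python
from math import hypot
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x, y, width, height)


def center(box: Box) -> Tuple[float, float]:
    x, y, w, h = box
    return (x + w / 2, y + h / 2)


def update_tracks(tracks: Dict[str, Box], detections: List[Box],
                  max_distance: float = 50.0) -> Dict[str, Box]:
    """Greedily match each labeled track to its nearest unused detection."""
    pairs = []
    for label, box in tracks.items():
        tx, ty = center(box)
        for index, detection in enumerate(detections):
            dx, dy = center(detection)
            pairs.append((hypot(tx - dx, ty - dy), label, index))
    pairs.sort()  # closest track/detection pairs first

    updated: Dict[str, Box] = {}
    used = set()
    for distance, label, index in pairs:
        if distance > max_distance:
            break  # every remaining pair is even farther apart
        if label in updated or index in used:
            continue
        updated[label] = detections[index]
        used.add(index)
    return updated  # unmatched tracks (e.g. the object left the frame) are dropped


tracks = {"car_a": (10, 10, 20, 10), "car_b": (10, 40, 20, 10)}
detections = [(18, 41, 20, 10), (16, 10, 20, 10)]  # arrive in no particular order
print(update_tracks(tracks, detections))
# {'car_a': (16, 10, 20, 10), 'car_b': (18, 41, 20, 10)}
```

Each object keeps its own label even though the detections arrive unordered, and a track simply ends when nothing nearby matches it.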
Object Tracking Issues
The ideal object-tracking algorithm requires object detection only once on the reference frame, detects and follows the object quickly even in challenging scenarios, and recovers the object if it is lost. Although object tracking has improved over the years, some common issues remain when using object tracking models.
These are the most common object-tracking issues in your video footage or images.
Crowded backgrounds make it harder for the object tracker to detect, identify, and follow the object around the image, especially for small objects that are the same color as the background, such as an item of clothing. One solution is to shoot against a single-color background or blur the background with other tools so the object tracker can detect and track your desired object accurately.
Occlusion is probably the most common issue when tracking objects. It occurs when the object of interest is lost because it overlaps with other elements in the image, either because a similar object enters the frame or because another object obstructs the target. Occlusion can be solved by re-defining the object on a later frame and tracking the footage again from there.
Variations in an image's lighting can make the object look different, especially in real-life footage. If the object tracking application can't adapt automatically, you may need to help it by re-defining the object after the lighting changes.
Avoid using low-resolution footage, as it makes it harder to identify objects for tracking. Since tracking software relies on pixel information, having fewer pixels makes the tracking inconsistent and inaccurate.
Changes in the object's scale can cause the tracker to lose the object. However, many object tracking systems are designed to detect these variations and adjust the bounding box around the identified object.
The object's shape can also change across the footage and affect the object-tracking process. Many tracking systems can overcome this with advanced skew, rotation, and perspective tracking options.
Fast-moving footage can confuse the object tracker significantly, especially if it adds motion blur. Adjusting the tracking options will be essential to achieve good object tracking.
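For the occlusion case in particular, one simple way to bridge the gap is to interpolate between the last good box before the object disappears and the box you re-define once it is visible again. The sketch below assumes straight-line motion during the gap, which is only an approximation, and `fill_gap` is a hypothetical helper.

```python
def fill_gap(before, after, missing_frames):
    """Linearly interpolate boxes for the frames between two known boxes."""
    steps = missing_frames + 1
    return [
        tuple(b + (a - b) * k / steps for b, a in zip(before, after))
        for k in range(1, steps)
    ]


# The object is hidden for two frames between these two known boxes.
last_seen = (100, 50, 40, 40)
redefined = (130, 62, 40, 40)
print(fill_gap(last_seen, redefined, missing_frames=2))
# [(110.0, 54.0, 40.0, 40.0), (120.0, 58.0, 40.0, 40.0)]
```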
Object Tracking Uses
Object tracking has many applications. Here are some real-life uses where object tracking is applied.
Surveillance: because video object tracking can work in real time, it is well suited to surveillance systems that monitor the subjects and activities captured by the camera.
Traffic monitoring: identify and track cars and people.
Retail stores: track customers and items they pick up.
Self-driving vehicles: use object tracking to identify traffic, obstacles, and people, estimate the best route, and avoid car accidents.
Other sectors that have embraced object tracking are healthcare, construction, and agriculture.
Object tracking is employed in the video editing industry to:
Remove unwanted objects like wires, backgrounds, tattoos, etc.
Add graphics such as logos or labels. For example, to add each participant's name and nationality in an international competition.
Change the color of the tracked object.
Track special effects.
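To show how tracking data drives an added graphic such as a name label, here is a small sketch. It assumes the tracking data is a list of `(x, y, width, height)` boxes, one per frame, and moves the graphic by the same offset the tracked box has moved since the reference frame. `match_move` is a hypothetical helper, not a function from any editing package.

```python
def match_move(tracking_data, graphic_position):
    """Shift a graphic by the offset the tracked box has moved since the first frame."""
    ref_x, ref_y = tracking_data[0][:2]
    gx, gy = graphic_position
    return [(gx + x - ref_x, gy + y - ref_y) for x, y, _w, _h in tracking_data]


# A runner's bounding box over three frames, with a label placed just above it.
tracking_data = [(100, 200, 50, 80), (104, 198, 50, 80), (109, 195, 50, 80)]
print(match_move(tracking_data, graphic_position=(100, 170)))
# [(100, 170), (104, 168), (109, 165)]
```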
Video Object Tracking with Mocha Pro
In video editing, object tracking is essential for motion tracking, rotoscoping, and VFX. Most video editing software includes an object tracker for motion tracking, including Final Cut Pro, After Effects, Fusion, and more.
In this tutorial, I'll show you how to use Boris FX Mocha Pro's object tracker for video object tracking.
Step 1: Add Mocha Pro to the Video Clip
Apply Mocha Pro as an effect in your video editing software and click Mocha Pro UI to open Mocha Pro’s workspace.
Step 2: Define the Object
In Mocha Pro's interface, define the object to track. Choose a frame in the video clip where you can see the object clearly and create a box around it. You can use any of the X-Spline tools and shapes to create your bounding box.
Step 3: Tracking Options
In the Track motion options, choose which kinds of motion you want to track for the object: translation, scale, rotation, skew (shear), and perspective.
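If it helps to picture what these options mean, each one can be thought of as a piece of a 3x3 transformation matrix applied to the tracked shape. The sketch below is a generic geometry illustration, not Mocha Pro's implementation: `motion_matrix` and `apply` are hypothetical names, and writing the perspective terms straight into the bottom row is a simplification.

```python
from math import cos, sin, radians


def matmul(a, b):
    """Multiply two 3x3 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def motion_matrix(tx=0.0, ty=0.0, scale=1.0, rotation_deg=0.0, shear=0.0,
                  perspective=(0.0, 0.0)):
    """Compose translation, rotation, scale, and shear, then set the perspective terms."""
    r = radians(rotation_deg)
    translate = [[1, 0, tx], [0, 1, ty], [0, 0, 1]]
    rotate = [[cos(r), -sin(r), 0], [sin(r), cos(r), 0], [0, 0, 1]]
    scale_m = [[scale, 0, 0], [0, scale, 0], [0, 0, 1]]
    shear_m = [[1, shear, 0], [0, 1, 0], [0, 0, 1]]
    # Points are sheared first, then scaled, rotated, and finally translated.
    matrix = matmul(translate, matmul(rotate, matmul(scale_m, shear_m)))
    matrix[2][0], matrix[2][1] = perspective  # simplified perspective terms
    return matrix


def apply(matrix, point):
    """Transform an (x, y) point, dividing by w to handle perspective."""
    x, y = point
    u = matrix[0][0] * x + matrix[0][1] * y + matrix[0][2]
    v = matrix[1][0] * x + matrix[1][1] * y + matrix[1][2]
    w = matrix[2][0] * x + matrix[2][1] * y + matrix[2][2]
    return (u / w, v / w)


# Translation only: the point simply shifts.
print(apply(motion_matrix(tx=5, ty=-3), (10, 10)))
# (15.0, 7.0)

# Translation plus scale: the point is scaled about the origin, then shifted.
print(apply(motion_matrix(tx=5, scale=2), (10, 10)))
# (25.0, 20.0)
```

Enabling more options gives the tracker more ways to explain how the object changes between frames, which is why rotation, skew, and perspective matter for objects that turn or change shape on screen.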
Step 4: Track Video
Press the track button (backward or forward) and wait until Mocha Pro automatically identifies the object in the following frames and tracks it for the rest of the clip.
Step 5: Export the Video Object Tracking Data
To use the tracking data in your video editing software, close Mocha Pro and click Save. Once back in your video editing software's workspace, go to Mocha Pro’s settings to access the tracking data.
Single and Multi-Object Tracking
Mocha Pro's object tracker can track several items in a video clip. In some scenarios, you'll want to track more than one object: to isolate two cars in a car race scene for an action movie, to change the color of different elements in the footage, or to label several items in the scene.
Step 1: The process for single and multi-object tracking with Mocha Pro is very similar. Start by deciding how many objects you need to track in the scene.
Step 2: Choose a frame and create a shape to add to the object tracker. Occasionally, an object must be identified in a different frame than the others. When this happens, make sure you turn off the tracking for the other layers that have been tracked by clicking on the gear icon. Then, proceed to track the new object. Keep in mind your computer's CPU and GPU limitations when tracking multiple objects, especially for longer video clips.
Step 3: Remember to label each layer to keep all your tracked objects organized. The tracking data, along with its label, will appear in Mocha Pro's settings in your video editing software.
Object tracking is a technique that serves multiple purposes in different fields. In video editing, it helps us follow one or more objects in video footage and predict their movement so we can add special effects, remove or add objects, or use color correction tools. Despite being used in different sectors, object tracking algorithms follow the same basic process: defining a target (object detection), estimating its motion, and tracking it.
With Boris FX Mocha Pro, you can access an award-winning object tracking tool that comes as a standalone application, as a plug-in, or inside your Continuum and Sapphire effects to create outstanding special effects.
What is the difference between object tracking and object detection?
Object detection focuses on identifying, locating, and classifying objects in images or video frames, and the object needs to be visible to be detected. Object tracking uses the information from detection to estimate where those objects will be in the following frames. Predicting a location is possible even if the object is not visible in the subsequent frames.