Object detection and segmentation are among the most important and challenging fundamental tasks in computer vision. They are critical parts of many applications such as image search and scene understanding. However, they remain open problems due to the variety and complexity of object classes and backgrounds.
The easiest way to detect and segment an object from an image is with color based methods: the object and the background should have a significant color difference for these methods to segment the object successfully.
Simple Example of Detecting a Red Object
In this example, I am going to process a video containing a red object and create a binary video by thresholding on the red color. (The red area of each frame is assigned '1' and every other area is assigned '0' in the binary image, so you will see a white patch wherever the red object is in the original video.)
////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
#include <iostream>
#include "opencv2/highgui/highgui.hpp"
#include "opencv2/imgproc/imgproc.hpp"
using namespace cv;
using namespace std;
int main( int argc, char** argv )
{
VideoCapture cap(0); //capture the video from web cam
if ( !cap.isOpened() ) // if not success, exit program
{
cout << "Cannot open the web cam" << endl;
return -1;
}
namedWindow("Control", WINDOW_AUTOSIZE); //create a window called "Control"
int iLowH = 0;
int iHighH = 179;
int iLowS = 0;
int iHighS = 255;
int iLowV = 0;
int iHighV = 255;
//Create trackbars in "Control" window
createTrackbar("LowH", "Control", &iLowH, 179); //Hue (0 - 179)
createTrackbar("HighH", "Control", &iHighH, 179);
createTrackbar("LowS", "Control", &iLowS, 255); //Saturation (0 - 255)
createTrackbar("HighS", "Control", &iHighS, 255);
createTrackbar("LowV", "Control", &iLowV, 255); //Value (0 - 255)
createTrackbar("HighV", "Control", &iHighV, 255);
while (true)
{
Mat imgOriginal;
bool bSuccess = cap.read(imgOriginal); // read a new frame from video
if (!bSuccess) //if not success, break loop
{
cout << "Cannot read a frame from video stream" << endl;
break;
}
Mat imgHSV;
cvtColor(imgOriginal, imgHSV, COLOR_BGR2HSV); //Convert the captured frame from BGR to HSV
Mat imgThresholded;
inRange(imgHSV, Scalar(iLowH, iLowS, iLowV), Scalar(iHighH, iHighS, iHighV), imgThresholded); //Threshold the image
//morphological opening (remove small objects from the foreground)
erode(imgThresholded, imgThresholded, getStructuringElement(MORPH_ELLIPSE, Size(5, 5)) );
dilate( imgThresholded, imgThresholded, getStructuringElement(MORPH_ELLIPSE, Size(5, 5)) );
//morphological closing (fill small holes in the foreground)
dilate( imgThresholded, imgThresholded, getStructuringElement(MORPH_ELLIPSE, Size(5, 5)) );
erode(imgThresholded, imgThresholded, getStructuringElement(MORPH_ELLIPSE, Size(5, 5)) );
imshow("Thresholded Image", imgThresholded); //show the thresholded image
imshow("Original", imgOriginal); //show the original image
if (waitKey(30) == 27) //wait for 'esc' key press for 30ms. If 'esc' key is pressed, break loop
{
cout << "esc key is pressed by user" << endl;
break;
}
}
return 0;
}
////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
You can download this OpenCV Visual C++ project from here.
Explanation
The following image shows how a color image is represented using 3 matrices.
How a BGR image is formed
Usually, one might think that the BGR color space is more suitable for color based segmentation. But the HSV color space is actually the most suitable, because it separates the color information (hue) from the lighting information (saturation and value). So, in the above application, I have converted each captured frame of the video from BGR to HSV.
The HSV color space also consists of 3 matrices: HUE, SATURATION and VALUE. In OpenCV, the value ranges for HUE, SATURATION and VALUE are 0-179, 0-255 and 0-255 respectively. HUE represents the color, SATURATION represents the amount to which that color is mixed with white, and VALUE represents the amount to which that color is mixed with black.
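To make the BGR-to-HSV conversion concrete, here is a sketch of the per-pixel math in plain C++, following OpenCV's 8-bit conventions (hue is halved into 0-179 so it fits in a byte). This is illustrative, not the library's actual implementation; the function name is mine.

```cpp
#include <algorithm>
#include <cmath>

// Convert one 8-bit BGR pixel to HSV using OpenCV's conventions:
// H in [0, 179], S and V in [0, 255].
void bgrToHsv(unsigned char b, unsigned char g, unsigned char r,
              unsigned char& h, unsigned char& s, unsigned char& v)
{
    float bf = b / 255.0f, gf = g / 255.0f, rf = r / 255.0f;
    float vmax  = std::max({bf, gf, rf});
    float vmin  = std::min({bf, gf, rf});
    float delta = vmax - vmin;

    float hue = 0.0f;                       // grey pixels have undefined hue; use 0
    if (delta > 0.0f) {
        if (vmax == rf)      hue = 60.0f * (gf - bf) / delta;
        else if (vmax == gf) hue = 120.0f + 60.0f * (bf - rf) / delta;
        else                 hue = 240.0f + 60.0f * (rf - gf) / delta;
        if (hue < 0.0f) hue += 360.0f;      // wrap negative hues around
    }
    h = static_cast<unsigned char>(hue / 2.0f + 0.5f);                        // 0-179
    s = static_cast<unsigned char>((vmax > 0 ? delta / vmax : 0) * 255.0f + 0.5f);
    v = static_cast<unsigned char>(vmax * 255.0f + 0.5f);
}
```

For example, a pure red BGR pixel (0, 0, 255) maps to H = 0, S = 255, V = 255, and a pure blue pixel (255, 0, 0) maps to H = 120.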
In the above application, I have assumed that the red object has HUE, SATURATION and VALUE in the ranges 170-179, 150-255 and 60-255 respectively. The HUE is fairly unique for the specific color distribution of that object, but the SATURATION and VALUE may vary according to the lighting conditions of the environment.
Hue values of basic colors
- Orange 0-22
- Yellow 22- 38
- Green 38-75
- Blue 75-130
- Violet 130-160
- Red 160-179
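The table above can be turned into a small lookup function. A sketch (band edges taken from the list; the function name is mine, for illustration):

```cpp
#include <string>

// Classify an OpenCV hue value (0-179) into the basic color bands
// listed above. Upper edges are treated as exclusive so the bands tile.
std::string hueToColor(int hue)
{
    if (hue < 22)  return "orange";
    if (hue < 38)  return "yellow";
    if (hue < 75)  return "green";
    if (hue < 130) return "blue";
    if (hue < 160) return "violet";
    return "red";   // 160-179
}
```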
How to find the exact range of HUE, SATURATION and VALUE for an object is discussed later in this post.
After thresholding the image, you will see small isolated white patches here and there, caused either by noise in the image or by small objects that happen to share the color of the main object. These unwanted white patches can be eliminated by applying morphological opening. Morphological opening is an erosion followed by a dilation with the same structuring element.
The thresholded image may also have small holes inside the main object, usually caused by noise. These unwanted holes can be eliminated by applying morphological closing. Morphological closing is a dilation followed by an erosion with the same structuring element.
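Opening can be illustrated on a tiny binary array in plain C++, without OpenCV. This sketch uses a 3x3 square structuring element and treats out-of-range pixels as 0 (the code above uses a 5x5 ellipse via getStructuringElement, but the principle is the same):

```cpp
#include <vector>

using Img = std::vector<std::vector<int>>;

// Erode or dilate a binary image with a 3x3 square structuring element.
// Erosion keeps a pixel only if all 9 neighbours are 1; dilation turns
// a pixel on if any neighbour is 1.
Img morph(const Img& src, bool isErosion)
{
    int rows = (int)src.size(), cols = (int)src[0].size();
    Img dst(rows, std::vector<int>(cols, 0));
    for (int y = 0; y < rows; ++y)
        for (int x = 0; x < cols; ++x) {
            int hit = isErosion ? 1 : 0;
            for (int dy = -1; dy <= 1; ++dy)
                for (int dx = -1; dx <= 1; ++dx) {
                    int ny = y + dy, nx = x + dx;
                    int v = (ny >= 0 && ny < rows && nx >= 0 && nx < cols)
                            ? src[ny][nx] : 0;   // outside the image counts as 0
                    if (isErosion) hit = hit && v;
                    else           hit = hit || v;
                }
            dst[y][x] = hit;
        }
    return dst;
}

// Opening = erosion then dilation; removes specks smaller than the kernel
// while restoring the shape of larger blobs.
Img open3x3(const Img& src) { return morph(morph(src, true), false); }
```

Running the opening on an image containing a solid 3x3 block plus one stray pixel removes the stray pixel and leaves the block intact, which is exactly why it is applied to the thresholded image above.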
Now let's discuss new OpenCV methods in the above application.
- void inRange(InputArray src, InputArray lowerb, InputArray upperb, OutputArray dst);
Checks whether each element of 'src' lies between 'lowerb' and 'upperb'. If so, the respective location of 'dst' is assigned '255'; otherwise '0'. (Pixels with value 255 are shown as white, whereas pixels with value 0 are shown as black.)
Arguments -
- InputArray src - Source image
- InputArray lowerb - Inclusive lower boundary (if lowerb = Scalar(x, y, z), pixels whose HUE, SATURATION or VALUE is lower than x, y or z respectively are set to black in the dst image)
- InputArray upperb - Inclusive upper boundary (if upperb = Scalar(x, y, z), pixels whose HUE, SATURATION or VALUE is greater than x, y or z respectively are set to black in the dst image)
- OutputArray dst - Destination image (should have the same size as the src image and should be 8-bit unsigned integer, CV_8U)
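Under the hood, inRange applies a simple per-pixel test. Here is a minimal sketch of that logic for one 3-channel HSV pixel (the function name is mine, for illustration; OpenCV applies the same test to every pixel of the image):

```cpp
// Per-pixel logic of inRange() for a 3-channel HSV pixel: the output is
// 255 only when every channel lies inside its [low, high] band (both
// bounds inclusive), otherwise 0.
unsigned char inRangePixel(int h, int s, int v,
                           int lowH, int highH,
                           int lowS, int highS,
                           int lowV, int highV)
{
    bool inside = (h >= lowH && h <= highH) &&
                  (s >= lowS && s <= highS) &&
                  (v >= lowV && v <= highV);
    return inside ? 255 : 0;
}
```

With the red-object bands from this post (170-179, 150-255, 60-255), a pixel such as (175, 200, 100) passes, while a green pixel such as (60, 200, 100) is rejected because its hue falls outside the band.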
- void erode( InputArray src, OutputArray dst, InputArray kernel, Point anchor=Point(-1,-1), int iterations=1, int borderType=BORDER_CONSTANT, const Scalar& borderValue=morphologyDefaultBorderValue() )
This function erodes the source image and stores the result in the destination image. In-place processing is supported (you can use the same variable for the source and destination images). If the source image is multi-channel, all channels are processed independently and the result is stored in the destination image as separate channels.
Arguments -
- InputArray src - Source image
- OutputArray dst - Destination image (should have the same size and type as the source image)
- InputArray kernel - Structuring element which is used to erode the source image
- Point anchor - Position of the anchor within the kernel. If it is Point(-1, -1), the center of the kernel is taken as the position of anchor
- int iterations - Number of times erosion is applied
- int borderType - Pixel extrapolation method in a boundary condition
- const Scalar& borderValue - Value of the pixels in a boundary condition if borderType = BORDER_CONSTANT
- void dilate( InputArray src, OutputArray dst, InputArray kernel, Point anchor=Point(-1,-1), int iterations=1, int borderType=BORDER_CONSTANT, const Scalar& borderValue=morphologyDefaultBorderValue() )
This function dilates the source image and stores the result in the destination image. In-place processing is supported.
Arguments -
- InputArray src - Source image
- OutputArray dst - Destination image (should have the same size and the type as the source image)
- InputArray kernel - Structuring element which is used to dilate the source image
- Point anchor - Position of the anchor within the kernel. If it is Point(-1, -1), the center of the kernel is taken as the position of anchor
- int iterations - Number of times dilation is applied
- int borderType - Pixel extrapolation method in a boundary condition
- const Scalar& borderValue - Value of the pixels in a boundary condition if borderType = BORDER_CONSTANT
- void cvtColor( InputArray src, OutputArray dst, int code, int dstCn=0 )
This function converts a source image from one color space to another. In-place processing is supported (you can use the same variable for the source and destination images).
- InputArray src - Source image
- OutputArray dst - Destination image (should have the same size and the depth as the source image)
- int code - Color space conversion code (e.g - COLOR_BGR2HSV, COLOR_RGB2HSV, COLOR_BGR2GRAY, COLOR_BGR2YCrCb, COLOR_BGR2BGRA, etc)
- int dstCn - Number of channels in the destination image. If it is 0, number of channels is derived automatically from the source image and the color conversion code.
All other OpenCV methods in the above application have been discussed in earlier OpenCV tutorials.
Simple Example of Tracking Red objects
In the previous example, I showed you how to detect a color object. In the following example, I will show you how to track it. There are 3 steps involved in achieving this task.
- Detect the object
- Find the exact position (x, y coordinates) of the object
- Draw a line along the trajectory of the object
Here is how it is done with OpenCV / C++.
#include <iostream>
#include "opencv2/highgui/highgui.hpp"
#include "opencv2/imgproc/imgproc.hpp"
using namespace cv;
using namespace std;
int main( int argc, char** argv )
{
VideoCapture cap(0); //capture the video from webcam
if ( !cap.isOpened() ) // if not success, exit program
{
cout << "Cannot open the web cam" << endl;
return -1;
}
namedWindow("Control", WINDOW_AUTOSIZE); //create a window called "Control"
int iLowH = 170;
int iHighH = 179;
int iLowS = 150;
int iHighS = 255;
int iLowV = 60;
int iHighV = 255;
//Create trackbars in "Control" window
createTrackbar("LowH", "Control", &iLowH, 179); //Hue (0 - 179)
createTrackbar("HighH", "Control", &iHighH, 179);
createTrackbar("LowS", "Control", &iLowS, 255); //Saturation (0 - 255)
createTrackbar("HighS", "Control", &iHighS, 255);
createTrackbar("LowV", "Control", &iLowV, 255);//Value (0 - 255)
createTrackbar("HighV", "Control", &iHighV, 255);
int iLastX = -1;
int iLastY = -1;
//Capture a temporary image from the camera
Mat imgTmp;
cap.read(imgTmp);
//Create a black image with the size as the camera output
Mat imgLines = Mat::zeros( imgTmp.size(), CV_8UC3 );
while (true)
{
Mat imgOriginal;
bool bSuccess = cap.read(imgOriginal); // read a new frame from video
if (!bSuccess) //if not success, break loop
{
cout << "Cannot read a frame from video stream" << endl;
break;
}
Mat imgHSV;
cvtColor(imgOriginal, imgHSV, COLOR_BGR2HSV); //Convert the captured frame from BGR to HSV
Mat imgThresholded;
inRange(imgHSV, Scalar(iLowH, iLowS, iLowV), Scalar(iHighH, iHighS, iHighV), imgThresholded); //Threshold the image
//morphological opening (removes small objects from the foreground)
erode(imgThresholded, imgThresholded, getStructuringElement(MORPH_ELLIPSE, Size(5, 5)) );
dilate( imgThresholded, imgThresholded, getStructuringElement(MORPH_ELLIPSE, Size(5, 5)) );
//morphological closing (removes small holes from the foreground)
dilate( imgThresholded, imgThresholded, getStructuringElement(MORPH_ELLIPSE, Size(5, 5)) );
erode(imgThresholded, imgThresholded, getStructuringElement(MORPH_ELLIPSE, Size(5, 5)) );
//Calculate the moments of the thresholded image
Moments oMoments = moments(imgThresholded);
double dM01 = oMoments.m01;
double dM10 = oMoments.m10;
double dArea = oMoments.m00;
// if the area is <= 10000, assume there is no object in the image (the area is non-zero only because of noise)
if (dArea > 10000)
{
//calculate the position of the ball
int posX = dM10 / dArea;
int posY = dM01 / dArea;
if (iLastX >= 0 && iLastY >= 0 && posX >= 0 && posY >= 0)
{
//Draw a red line from the previous point to the current point
line(imgLines, Point(posX, posY), Point(iLastX, iLastY), Scalar(0,0,255), 2);
}
iLastX = posX;
iLastY = posY;
}
imshow("Thresholded Image", imgThresholded); //show the thresholded image
imgOriginal = imgOriginal + imgLines;
imshow("Original", imgOriginal); //show the original image
if (waitKey(30) == 27) //wait for 'esc' key press for 30ms. If 'esc' key is pressed, break loop
{
cout << "esc key is pressed by user" << endl;
break;
}
}
return 0;
}
////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
You can download this OpenCV Visual C++ project from here.
Object Tracking
Explanation
In this application, I use image moments to calculate the position of the center of the object. We have to calculate the 1st order spatial moments along the x and y axes (m10 and m01) and the 0th order moment (m00) of the binary image.
The 0th order moment of a binary image equals its white area in pixels.
- X coordinate of the center of the object = m10 / m00
- Y coordinate of the center of the object = m01 / m00
In the above application, I assume that if the white area of the binary image is less than or equal to 10000 pixels, there is no object in the image, because my object is expected to cover more than 10000 pixels.
Now, let's discuss new OpenCV methods that can be found in the above application.
- Moments moments( InputArray array, bool binaryImage=false )
- InputArray array - Single channel image
- bool binaryImage - If this is true, all non-zero pixels are treated as ones when calculating the moments.
- void line(Mat& img, Point pt1, Point pt2, const Scalar& color, int thickness=1, int lineType=8, int shift=0)
This function draws a line between two points on a given image
- Mat& img - Image on which to draw the line
- Point pt1 - First point of the line segment
- Point pt2 - Other point of the line segment
- const Scalar& color - Color of the line (values of Blue, Green and Red colors respectively)
- int thickness - Thickness of the line in pixels
- static MatExpr zeros(Size size, int type)
This function returns a black image (all pixels zero) of a given size and type.
- Size size - Size of the required image ( Size(No of columns, No of rows) )
- int type - Type of the image (e.g - CV_8UC1, CV_32FC4, CV_8UC3, etc)
How to Find Exact Range for 'Hue', 'Saturation' and 'Value' for a Given Object
- Place the trackbars in a separate window so that the ranges for HUE, SATURATION and VALUE can be adjusted, and set the initial ranges to 0-179, 0-255 and 0-255 respectively. With these full ranges, the thresholded-image window shows a completely white image.
- Adjust the 'LowH' and 'HighH' trackbars so that the gap between 'LowH' and 'HighH' is minimized. Be careful that the white area in the thresholded-image window that represents the object is not affected while you minimize the gap.
- Repeat the previous step for the 'LowS' and 'HighS' trackbars.
- Repeat the previous step for the 'LowV' and 'HighV' trackbars.
Now you have the optimum HUE, SATURATION and VALUE ranges for the object. In my case they are 163-179, 126-217 and 68-127, as you can see in the picture below.
Next Tutorial : Object Detection & Shape Recognition using Contours
Previous Tutorial : Rotate Image & Video