circuitprofessor.com

path to learn electronics

Introduction to computer vision

By Mithila kumanjana

Jun 10, 2023
computer vision

Imagine, with your eyes closed, a world in which computers can see and understand their surroundings the way people do. Sounds fun, no? That’s what we’re going to look into together.


Have you ever wondered how your mobile phone recognizes your face? Or how Facebook suggests people to tag when you upload a photo? Or how a robot understands its environment?

This is where computer vision comes into play.

Before diving into computer vision, we need to know the basics of images. Then we can move on to coding. We use the Python language and the OpenCV library, and we will cover most of the basic topics of computer vision in this article.

Topic list

  1. image formats
  2. colours
    1. RGB
    2. HSV
  3. resolution
  4. spyder and miniconda
  5. Reading images
  6. imread flags
  7. processing images 1.0
  8. imwrite()
  9. Changing colourspace
  10. Finding colours
  11. Real-time colour detection

image formats

First, let’s look at what an image is. In computer vision, we deal with digital images. JPEG, PNG, GIF, TIFF and RAW are the basic image formats. A video is actually a series of pictures, so we are going to process videos too.

colours

We work with ones and zeros because the world is digital; that is, data is stored as binary numbers. The primary colours used in the digital world are red, green, and blue. By combining these primary colours, we can get secondary and other colours.

In this article, we mainly talk about two colour models.

RGB colour model

To better comprehend the RGB colour scheme, let’s open Paint.


We can make new colours by changing the values of the red, green, and blue components. Each component has a minimum value of 0 and a maximum value of 255 (decimal) because we use an 8-bit colour scheme: 8 bits for the red value, another 8 bits for the green value, and another 8 bits for the blue value.

Colour    Red value (8 bit)    Green value (8 bit)    Blue value (8 bit)
Red       255                  0                      0
Green     0                    255                    0
Blue      0                    0                      255
White     255                  255                    255
Black     0                    0                      0
Orange    255                  165                    0
Purple    128                  0                      128
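As a small illustration of the 8-bit scheme, the three channel values can be packed into one 24-bit number and written as the familiar hexadecimal colour code. This is only a sketch; the helper name `rgb_to_hex` is hypothetical and not part of any library.

```python
def rgb_to_hex(red, green, blue):
    """Pack three 8-bit channel values into a 24-bit hex colour code."""
    for value in (red, green, blue):
        if not 0 <= value <= 255:
            raise ValueError("each channel must be between 0 and 255")
    # Each channel occupies its own 8 bits: red, then green, then blue.
    packed = (red << 16) | (green << 8) | blue
    return f"#{packed:06X}"


print(rgb_to_hex(255, 165, 0))  # orange from the table
print(rgb_to_hex(128, 0, 128))  # purple from the table
```

Values outside 0 to 255 are rejected because they do not fit into 8 bits.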


HSV colour model

H stands for hue, S stands for saturation, and V stands for value.


Hue describes the colour itself, like red, blue, or green. Think of a colour wheel with all the hues arranged in a circle. The hue indicates where a colour sits on this wheel. Consider a rainbow as an illustration: each colour is a different hue.

Saturation is a term used to describe a colour’s intensity or purity. Consider it to be the intensity or vibrancy of a hue. Fully saturated colours appear rich and pure. However, as the saturation falls, the colour begins to appear more washed out or faded.

Value describes a colour’s brightness or darkness. It controls how light or dark a colour appears to our eyes. If we consider shades of gray, for instance, the value defines whether it is a light gray or a dark gray. The value in the HSV colour model runs from black at its minimum to the brightest version of the colour at its maximum, and changing it affects the colour’s overall brightness.
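To see how hue, saturation, and value relate to RGB numbers, here is a minimal sketch using Python's standard `colorsys` module (not OpenCV). The helper name `describe_hsv` is hypothetical. It reports hue in degrees around the colour wheel and saturation and value on a 0 to 1 scale.

```python
import colorsys


def describe_hsv(red, green, blue):
    """Convert 8-bit RGB values to (hue in degrees, saturation, value)."""
    h, s, v = colorsys.rgb_to_hsv(red / 255, green / 255, blue / 255)
    return round(h * 360), round(s, 2), round(v, 2)


print(describe_hsv(255, 0, 0))      # pure red
print(describe_hsv(255, 128, 128))  # washed-out red: same hue, lower saturation
print(describe_hsv(128, 0, 0))      # dark red: same hue, lower value
```

All three colours share the same hue; only the saturation or the value changes.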

In the next parts, we’ll talk more about this HSV model.


The resolution

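Now let’s talk about resolution.

Think of a square-ruled notebook. If you zoom in on a photograph, you can see little squares, and each square is filled with one colour. These squares are pixels, and the resolution is the number of pixel columns and rows in the image. In computer vision, we identify each pixel by its column and row:

[column, row]

Note that when the image is stored as a NumPy array, as it is with OpenCV, the index order is reversed: [row, column].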
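The row and column addressing can be tried on a tiny blank array. This is a sketch with a made-up 4 x 3 image rather than a real photo; it assumes only that NumPy is installed.

```python
import numpy as np

# A tiny blank "image": 3 rows, 4 columns, 3 colour channels (all black).
image = np.zeros((3, 4, 3), dtype=np.uint8)

rows, columns, channels = image.shape
print(columns, "x", rows)  # resolution is usually quoted as width x height

# NumPy indexes as [row, column]; set the pixel in row 1, column 2 to white.
image[1, 2] = (255, 255, 255)
print(image[1, 2])
```

Swapping the two indices addresses a different pixel, which is a common source of bugs when moving between (x, y) coordinates and array indices.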


Spyder and Miniconda

We use the Spyder IDE and Miniconda to run our Python programs.

Run these commands in your Anaconda terminal:

conda create --name spyder5_1env --clone base 
conda activate spyder5_1env
conda install spyder-kernels=2.3
conda install numpy
pip install opencv-python

Then you have to set the Python interpreter in Spyder to this environment.
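Once the environment is active, a quick check like the following can confirm that the packages used in this article are visible to the interpreter. It is a minimal sketch using only the standard library; the helper name `installed` is hypothetical, and note that the `opencv-python` package is imported under the name `cv2`.

```python
import importlib.util


def installed(packages=("numpy", "cv2", "matplotlib")):
    """Report which of the given import names can be found in this environment."""
    return {name: importlib.util.find_spec(name) is not None for name in packages}


for name, found in installed().items():
    print(f"{name}: {'found' if found else 'missing'}")
```

If a package is reported as missing, the interpreter is probably not the one from the environment created above.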


Reading Images

Let’s write our first Python program in computer vision. First, create your own image using Paint or any other graphics editor of your choice, and save it as test1.jpg. (It is better to use a low-resolution image.)

import cv2  # Import the OpenCV library for computer vision tasks
import matplotlib.pyplot as plt  # Import the matplotlib library for plotting images
import numpy as np  # Import the numpy library for numerical operations

BGR_im = cv2.imread("test1.jpg")  # Read "test1.jpg"; OpenCV stores the channels in BGR (Blue-Green-Red) order

cv2.namedWindow("BGR image", cv2.WINDOW_NORMAL)  # Create a resizable window
cv2.imshow("BGR image", BGR_im)  # Display the image in that window

print(type(BGR_im))  # Type of the image object
print(BGR_im.shape)  # (rows, columns, channels)

cv2.waitKey(0) & 0xFF  # Wait indefinitely for a key press; & 0xFF keeps only the low 8 bits of the key code
cv2.destroyAllWindows()  # Close all windows created by imshow()

Note: it is better to use a low-resolution image (for example, a 200 × 100 JPG) for our first computer vision practical. OpenCV loads the image as an array of numbers.

We name the variable BGR_im because OpenCV stores the colour channels of an image in Blue-Green-Red (BGR) order instead of the more commonly used Red-Green-Blue (RGB) order. The name reminds us of the channel order, which matters whenever we pass the image to a library that expects RGB.

The program shows the image in a window and prints the type and shape of BGR_im.

The variable BGR_im is a 3-dimensional NumPy array. It represents a three-layer (three-channel) image with 100 rows and 200 columns in each channel. The first layer is the blue channel, the second is the green channel, and the third is the red channel.


We can also use the matplotlib.pyplot library to display our image. Unlike OpenCV, matplotlib expects the colour channels in Red-Green-Blue (RGB) order, so a BGR image must be converted to RGB first; otherwise red and blue appear swapped.
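The channel reordering can be sketched with plain NumPy slicing on a two-pixel stand-in image (a made-up array, not a real photo). With a real image, `cv2.cvtColor(BGR_im, cv2.COLOR_BGR2RGB)` produces the same ordering, and the result can then be passed to `plt.imshow()`.

```python
import numpy as np

# A 1x2 BGR image: one pure-blue pixel and one pure-red pixel.
bgr = np.array([[[255, 0, 0], [0, 0, 255]]], dtype=np.uint8)

# Reverse the last axis to go from B, G, R to R, G, B.
rgb = bgr[:, :, ::-1]

print(rgb[0, 0])  # the blue pixel, now written in RGB order
print(rgb[0, 1])  # the red pixel, now written in RGB order
```

Only the order of the three numbers in each pixel changes; the picture itself is the same.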

Let’s move on to the next practical in our computer vision journey.


cv2.imread(filename[,flags])

We can read images in three different modes, selected by the flags argument.

Integer flag    Flag                     Description
1               cv2.IMREAD_COLOR         Loads a colour image. Default
0               cv2.IMREAD_GRAYSCALE     Loads the image in grayscale mode
-1              cv2.IMREAD_UNCHANGED     Loads the image as is, including the alpha channel if present

In some computer vision tasks we need one of the non-default modes, for example grayscale when colour is not needed.


Processing images practical 1.0

Let’s split the BGR layers and display each one as our second computer vision practical.

import cv2  # Import the OpenCV library
import matplotlib.pyplot as plt  # Import the matplotlib library
import numpy as np  # Import the numpy library

bgr_im = cv2.imread("test1.jpg")  # Read the image file "test1.jpg" and store it in the variable bgr_im
# The image is read in BGR (Blue-Green-Red) format by default

[B, G, R] = cv2.split(bgr_im)  # Split the BGR image into its individual color channels

# Create named windows to display the images
cv2.namedWindow("BGR image", cv2.WINDOW_NORMAL)
cv2.namedWindow("B_channel", cv2.WINDOW_NORMAL)
cv2.namedWindow("G_channel", cv2.WINDOW_NORMAL)
cv2.namedWindow("R_channel", cv2.WINDOW_NORMAL)

# Display the images in their respective windows
cv2.imshow("BGR image", bgr_im)
cv2.imshow("B_channel", B)
cv2.imshow("G_channel", G)
cv2.imshow("R_channel", R)

cv2.waitKey(0) & 0xFF  # Wait indefinitely for a key press; & 0xFF keeps only the low 8 bits of the key code

cv2.destroyAllWindows()  # Close all windows created by imshow()

Four windows open: the original image and one grayscale image for each of the B, G, and R channels.

You may notice borders remaining within the image, which can be attributed to the image format used. In upcoming sections on computer vision, we will look at how to remove borders from images with code.

Now let’s look at the data type and shape of the B, G, and R variables.

print(type(B))
print(B.shape)
print(type(G))
print(G.shape)
print(type(R))
print(R.shape)

The output will look like this:

<class 'numpy.ndarray'>
(100, 200)
<class 'numpy.ndarray'>
(100, 200)
<class 'numpy.ndarray'>
(100, 200)

This shows that each of these variables is a two-dimensional array. Previously, in the three-channel case, we said that black = (0, 0, 0) and white = (255, 255, 255). Here, each pixel holds only one value.

The white area must therefore be 255 in every channel. In the B array, the area of the blue rectangle is also 255, so it is indistinguishable from the white background. The same reasoning applies to the G and R arrays.

These layers appear grayscale because each layer represents the brightness or intensity values of the respective colour channel. Grayscale images use shades of grey, ranging from black (lowest intensity) to white (highest intensity), to show the intensity values. When printed independently, the R, G, and B layers are shades of grey since they primarily contain intensity information rather than colour information.

That is why these layers appear grey!
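The reasoning about the white background and the blue rectangle can be checked on a synthetic image. This sketch assumes a small made-up test image (white with a blue block) and uses NumPy slicing, which yields the same channel values as `cv2.split()`.

```python
import numpy as np

# Synthetic 4x6 BGR test image: white background with a blue block.
image = np.full((4, 6, 3), 255, dtype=np.uint8)
image[1:3, 1:4] = (255, 0, 0)  # blue in BGR order

# Take each channel as a 2-D array (same values cv2.split() would return).
B, G, R = image[:, :, 0], image[:, :, 1], image[:, :, 2]

print(B.shape)           # two dimensions only
print(B.min())           # blue channel is 255 in the white area and the blue block alike
print(G[1, 1], R[1, 1])  # green and red are 0 inside the blue block
```

In the B channel the block disappears into the background, while in the G and R channels it shows up as a black rectangle.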


Let’s explore some basic OpenCV functions.

imwrite

You can save an image to disk using imwrite().

import cv2
import matplotlib.pyplot as plt
import numpy as np

BGR_im = cv2.imread("test1.jpg", 1)  # Read image in color mode
b, g, r = cv2.split(BGR_im)  # Split BGR image
im2 = cv2.merge([r, g, b])  # Merge channels in R, G, B order

cv2.namedWindow("BGR image", cv2.WINDOW_NORMAL)  # Create resizable window
cv2.imshow("BGR image", BGR_im)  # Show BGR image

cv2.imwrite("test2.png", im2)  # Save as PNG; imwrite() expects BGR order, so red and blue are swapped in the saved file

cv2.waitKey(0) & 0xFF  # Wait for key press to exit
cv2.destroyAllWindows()  # Close all windows

Changing colourspace

Changing an image’s colour space is sometimes necessary in computer vision. Two colour spaces (RGB and HSV) were covered in earlier sections. Let’s now look at techniques for converting between colour spaces.

There are several common techniques for converting an RGB image to a grayscale image, including the average method and the weighted method.

The Average Method

Grayscale = (1/3) R+(1/3) G+(1/3) B


In this method, we sum up the values of R (red), G (green), and B (blue) in their respective layers, and then divide the sum by 3. This averaging process ensures an equal contribution from each colour channel to obtain the final result.

import cv2
import matplotlib.pyplot as plt
import numpy as np

BGR_im = cv2.imread("van.jpg",1) #READING IMAGE
b,g,r = cv2.split(BGR_im)


cv2.namedWindow("BGR image",cv2.WINDOW_NORMAL)
cv2.imshow("BGR image",BGR_im)

gray_im = np.uint8((1 / 3 )* r + (1 / 3 )* g + (1 / 3) * b)  # Convert to grayscale

cv2.namedWindow("gray image",cv2.WINDOW_NORMAL)
cv2.imshow("gray image",gray_im)


cv2.waitKey(0) &0xFF
cv2.destroyAllWindows()

The program shows the original image and its grayscale version in separate windows.

The Weighted Method

The average method is simple, but it doesn’t work as well as expected because our eyes do not respond to red, green, and blue equally. We are most sensitive to green light and less sensitive to red and blue light. The colours should therefore contribute with different weights, which is what the weighted method does.

Grayscale = 0.299R + 0.587G + 0.114B

This equation is a commonly used standard in various computer vision applications.
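To make the difference between the two formulas concrete, here is a small sketch that applies both to single pixels in plain Python. The function names `gray_average` and `gray_weighted` are hypothetical; `int()` simply drops the fractional part.

```python
def gray_average(r, g, b):
    """Average method: each channel contributes equally."""
    return int((r + g + b) / 3)


def gray_weighted(r, g, b):
    """Weighted method: 0.299 R + 0.587 G + 0.114 B."""
    return int(0.299 * r + 0.587 * g + 0.114 * b)


pixels = {"red": (255, 0, 0), "green": (0, 255, 0), "blue": (0, 0, 255)}
for name, rgb in pixels.items():
    print(name, gray_average(*rgb), gray_weighted(*rgb))
```

The average method maps pure red, green, and blue to the same gray level, whereas the weighted method renders green brightest and blue darkest.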

import cv2
import matplotlib.pyplot as plt
import numpy as np

BGR_im = cv2.imread("cat.jpg",1) #READING IMAGE
b,g,r = cv2.split(BGR_im)


cv2.namedWindow("BGR image",cv2.WINDOW_NORMAL)
cv2.imshow("BGR image",BGR_im)

gray_im1 = np.uint8((1 / 3 )* r + (1 / 3 )* g + (1 / 3) * b)  # Convert to grayscale

cv2.namedWindow("gray image 1",cv2.WINDOW_NORMAL)
cv2.imshow("gray image 1",gray_im1)

gray_im2 = np.uint8(0.299*r + 0.587*g + 0.114*b)  # Convert to grayscale

cv2.namedWindow("gray image 2",cv2.WINDOW_NORMAL)
cv2.imshow("gray image 2",gray_im2 )


cv2.waitKey(0) &0xFF
cv2.destroyAllWindows()

The program shows the original image and both grayscale versions. You can see some small differences in “gray image 2” compared with “gray image 1”.


cv2.cvtColor

This is a simple function that converts a BGR image to grayscale.

import cv2
import matplotlib.pyplot as plt
import numpy as np

BGR_im = cv2.imread("woman.jpg",1) #READING IMAGE
b,g,r = cv2.split(BGR_im)


cv2.namedWindow("BGR image",cv2.WINDOW_NORMAL)
cv2.imshow("BGR image",BGR_im)

gray_im1 = np.uint8((1 / 3 )* r + (1 / 3 )* g + (1 / 3) * b)  # Convert to grayscale


cv2.namedWindow("gray image 1",cv2.WINDOW_NORMAL)
cv2.imshow("gray image 1",gray_im1)

gray_im2 = np.uint8(0.299*r + 0.587*g + 0.114*b)  # Convert to grayscale

cv2.namedWindow("gray image 2",cv2.WINDOW_NORMAL)
cv2.imshow("gray image 2",gray_im2 )

gray_im3 = cv2.cvtColor(BGR_im,cv2.COLOR_BGR2GRAY)  # Convert to grayscale

cv2.namedWindow("gray image 3",cv2.WINDOW_NORMAL)
cv2.imshow("gray image 3",gray_im3  )


cv2.waitKey(0) &0xFF
cv2.destroyAllWindows()

The program now shows three grayscale versions. You can see some small differences in “gray image 3”; it is closest to “gray image 2” because cv2.cvtColor uses the same weighted formula.


Finding colours in the image

In this section, we move on to a few more advanced concepts in computer vision: we are going to find colours in an image.

To do this, we work in the HSV colour space, so we need to understand how its values are handled in code.

In this colour space we provide three values, as in the RGB colour space, but with different ranges. Conventionally, saturation and value range from 0 to 1, while hue ranges from 0 to 359 degrees.

In OpenCV, however, for 8-bit images the hue range is [0, 179], the saturation range is [0, 255], and the value range is [0, 255].

It’s important to note that different software may utilize different scales for these ranges. Therefore, when comparing OpenCV values with values from other software, it may be necessary to normalize or adjust these ranges accordingly.
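As an illustration of such an adjustment, the sketch below converts HSV values from the conventional scale (hue in degrees, saturation and value from 0 to 1) to OpenCV's 8-bit scale. The helper name `to_opencv_hsv` is hypothetical, and it assumes the other software uses exactly that conventional scale.

```python
def to_opencv_hsv(hue_degrees, saturation, value):
    """Map H in [0, 360), S and V in [0, 1] to OpenCV's 8-bit HSV scale."""
    h = int(hue_degrees / 2) % 180  # 0-359 degrees -> 0-179
    s = round(saturation * 255)     # 0-1 -> 0-255
    v = round(value * 255)          # 0-1 -> 0-255
    return h, s, v


print(to_opencv_hsv(240, 1.0, 1.0))  # pure blue on the conventional scale
```

Pure blue lands on a hue of 120, which sits in the middle of the blue thresholds used in the code that follows.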

import cv2
import matplotlib.pyplot as plt
import numpy as np

BGR_im = cv2.imread("sky2.jpg", 1)  # Read BGR image
HSV_im = cv2.cvtColor(BGR_im, cv2.COLOR_BGR2HSV)  # Convert image to HSV color space

Blue_ub = np.array([130, 255, 255])  # Upper threshold for blue color in HSV
Blue_ln = np.array([110, 50, 50])  # Lower threshold for blue color in HSV

mask = cv2.inRange(HSV_im, Blue_ln, Blue_ub)  # Create a mask for blue color pixels

# Create named windows for visualization
cv2.namedWindow("BGR_im", cv2.WINDOW_NORMAL)
cv2.namedWindow("mask", cv2.WINDOW_NORMAL)
cv2.namedWindow("mask_im", cv2.WINDOW_NORMAL)

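cv2.imshow("BGR_im", BGR_im)  # Display the original BGR image
cv2.imshow("mask", mask)  # Display the binary mask
mask_im = cv2.bitwise_and(BGR_im, BGR_im, mask=mask)  # Keep only the pixels selected by the mask
cv2.imshow("mask_im", mask_im)  # Display the masked image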

cv2.waitKey(0) & 0xFF  # Wait for a key press to exit
cv2.destroyAllWindows()  # Close all windows

We focus on the colour blue. The mask is white where a pixel falls inside the blue range and black everywhere else.
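To show what the range test does to each pixel, here is a NumPy sketch of the same idea on two hand-made HSV pixels. The function `in_range` is a hypothetical stand-in written for illustration, not OpenCV's implementation.

```python
import numpy as np


def in_range(hsv, lower, upper):
    """Return a mask that is 255 where every channel lies within [lower, upper], else 0."""
    inside = np.all((hsv >= lower) & (hsv <= upper), axis=-1)
    return inside.astype(np.uint8) * 255


# Two HSV pixels on OpenCV's scale: a blue one and an orange one.
hsv = np.array([[[120, 200, 200], [15, 200, 200]]], dtype=np.uint8)
mask = in_range(hsv, np.array([110, 50, 50]), np.array([130, 255, 255]))
print(mask)
```

A pixel is kept only if its hue, saturation, and value all fall inside the thresholds, so the orange pixel is rejected on hue alone.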

This basic concept can be developed in more advanced ways within computer vision. Many techniques and algorithms exist to improve the accuracy, efficiency, and robustness of image processing and analysis.

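Here is a helpful dictionary of HSV ranges for common colours. Each entry holds the upper threshold followed by the lower threshold. Red appears twice (red1 and red2) because its hue sits at both ends of the hue scale.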

color_dict_HSV = {
    'black': [[180, 255, 30], [0, 0, 0]],
    'white': [[180, 18, 255], [0, 0, 231]],
    'red1': [[180, 255, 255], [159, 50, 70]],
    'red2': [[9, 255, 255], [0, 50, 70]],
    'green': [[89, 255, 255], [36, 50, 70]],
    'blue': [[128, 255, 255], [90, 50, 70]],
    'yellow': [[35, 255, 255], [25, 50, 70]],
    'purple': [[158, 255, 255], [129, 50, 70]],
    'orange': [[24, 255, 255], [10, 50, 70]],
    'gray': [[180, 18, 230], [0, 0, 40]]
}

You can edit the previous code to use this dictionary:

import cv2
import matplotlib.pyplot as plt
import numpy as np

color_dict_HSV = {
    'blue': [[130, 255, 255], [110, 50, 50]],
    'green': [[89, 255, 255], [36, 50, 70]],
    'red1': [[180, 255, 255], [159, 50, 70]],
    'red2': [[9, 255, 255], [0, 50, 70]],
    'yellow': [[35, 255, 255], [25, 50, 70]],
    'purple': [[158, 255, 255], [129, 50, 70]],
    'orange': [[24, 255, 255], [10, 50, 70]],
    'black': [[180, 255, 30], [0, 0, 0]],
    'white': [[180, 18, 255], [0, 0, 231]],
    'gray': [[180, 18, 230], [0, 0, 40]]
}

BGR_im = cv2.imread("sky2.jpg", 1)  # Read BGR image
HSV_im = cv2.cvtColor(BGR_im, cv2.COLOR_BGR2HSV)  # Convert image to HSV color space

color_name = 'blue'  # Select the desired color from the dictionary
color_range = color_dict_HSV[color_name]  # Get the HSV range for the selected color

color_ub = np.array(color_range[0])  # Upper threshold for selected color in HSV
color_lb = np.array(color_range[1])  # Lower threshold for selected color in HSV

mask = cv2.inRange(HSV_im, color_lb, color_ub)  # Create a mask for the selected color

# Create named windows for visualization
cv2.namedWindow("BGR_im", cv2.WINDOW_NORMAL)
cv2.namedWindow("mask", cv2.WINDOW_NORMAL)
cv2.namedWindow("mask_im", cv2.WINDOW_NORMAL)

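cv2.imshow("BGR_im", BGR_im)  # Display the original BGR image
cv2.imshow("mask", mask)  # Display the binary mask
mask_im = cv2.bitwise_and(BGR_im, BGR_im, mask=mask)  # Keep only the pixels selected by the mask
cv2.imshow("mask_im", mask_im)  # Display the masked image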

cv2.waitKey(0) & 0xFF  # Wait for a key press to exit
cv2.destroyAllWindows()  # Close all windows
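The dictionary lists red as two entries, red1 and red2, so detecting red means accepting a pixel that falls in either range. The sketch below shows that logic for a single pixel in plain Python; `RED_RANGES` and `is_red` are hypothetical names. With whole images, one option is to build one mask per range and combine them, for example with `cv2.bitwise_or`.

```python
RED_RANGES = [
    ((159, 50, 70), (180, 255, 255)),  # 'red1' from the dictionary as (lower, upper)
    ((0, 50, 70), (9, 255, 255)),      # 'red2' from the dictionary as (lower, upper)
]


def is_red(pixel_hsv):
    """True if an OpenCV-scale HSV pixel falls inside either red range."""
    return any(
        all(lo <= channel <= hi for channel, lo, hi in zip(pixel_hsv, lower, upper))
        for lower, upper in RED_RANGES
    )


print(is_red((175, 200, 200)))  # hue near the top of the scale
print(is_red((5, 200, 200)))    # hue near the bottom of the scale
print(is_red((120, 200, 200)))  # blue hue
```

Using only one of the two ranges would miss roughly half of the red hues.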


Real-time colour detection

This is another simple computer vision exercise where you can detect colours in real-time using your webcam.

import cv2

# Create a VideoCapture object to access the webcam
cap = cv2.VideoCapture(0)

while True:
    # Read frames from the webcam
    ret, frame = cap.read()

    if not ret:
        break

    # Convert the frame to the HSV color space
    hsv_frame = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)

    # Define the color ranges for detection
    lower_blue = (100, 50, 50)
    upper_blue = (130, 255, 255)

    # Create a mask for the specified color range
    mask = cv2.inRange(hsv_frame, lower_blue, upper_blue)

    # Apply the mask to the frame
    result = cv2.bitwise_and(frame, frame, mask=mask)

    # Display the original frame and the resulting masked frame
    cv2.imshow('Original', frame)
    cv2.imshow('Masked', result)

    # Break the loop if 'q' is pressed
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

# Release the VideoCapture object and close the windows
cap.release()
cv2.destroyAllWindows()

The “Masked” window shows only the blue regions of the webcam frame; everything else is black.

This article provides an introduction to computer vision for beginners. Through straightforward code examples, we went over fundamental ideas including colour models, resolution, and colour detection. There are other Python libraries for computer vision, but in this article we focused on OpenCV. You should now be able to detect colours in a video stream. We’ll cover more advanced computer vision topics and methods in upcoming articles.
