Final Approach: Python OpenCV
After experimenting with three different approaches, I realized that fully implementing any of them would be too risky. Each required a deep technical understanding of complex systems that I don't yet possess. While the concepts were promising, I ultimately decided to take a more practical route that aligned with my current skills and ensured the project would be functional and deliverable within my timeline. Luckily, when I broadened my research and looked into more practical solutions, I came across a fascinating YouTube tutorial on building a hand sign detection system for American Sign Language (ASL) alphabets that offered a more systematic approach.
Learning From Mr. Murtaza
The tutorial I discovered, from the YouTube channel "Murtaza's Workshop - Robotics and AI", presented a comprehensive step-by-step guide for building a robust hand sign detection system using the ASL alphabets A, B, and C. What caught my attention was its two-step approach:
First detecting the hand's position (object detection)
Then classifying the specific hand sign (classification)
This approach immediately struck me as more sophisticated than my previous attempts. While it does require third-party software, it also opens up an opportunity for me to explore a new programming language, Python. That adds an exciting learning aspect to this project!
Key Components of the Approach
The tutorial covered several techniques that I hadn't considered:
Image preprocessing: Cropping hand images and placing them on a standardized 300×300 white square background for consistent classification input
Aspect ratio handling: Properly managing the different shapes formed by hands making different signs
Structured data collection: Saving training samples only when manually triggered by pressing 's', to ensure quality control
Leveraging existing libraries: Using OpenCV for webcam handling, MediaPipe for hand tracking, and cvzone for simplified detection
Google's Teachable Machine
One of the most interesting aspects was the use of Google's Teachable Machine for training the classification model. This low-code solution allows for quick model training through a drag-and-drop interface. I didn't know that you can easily train a model by just uploading images! This saves me so much time!
Let's Start Implementing!
Previously, I had experimented with Python-based approaches and found them promising, but I was hesitant to fully commit to third-party tools. However, after watching this tutorial and realizing that achieving real-time hand detection may require going beyond Unity's capabilities, I now feel more confident in giving this approach a try. Here's a detailed walkthrough of the process as I followed along with the tutorial:
Step 1: Setting Up the Environment
The initial setup required installing a few key packages (the install command I used follows this list):
MediaPipe for hand detection
OpenCV-python for camera access and image processing
CVzone for simplified hand tracking
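In practice this came down to a single pip command; these are the PyPI package names, and the versions are simply whatever was current when I installed them:

    pip install opencv-python mediapipe cvzone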
Step 2: Creating the Initial Script for Webcam Access
When I started writing a basic script to access the webcam and display the video feed, I immediately ran into a small hurdle. The tutorial used cv2.VideoCapture(0) to access the default webcam, but when I ran this code, it surprisingly opened the camera on my iPhone instead of my laptop's webcam.
After investigating, I realized this was happening because of Apple's Continuity Camera feature, which allows Macs to use an iPhone as a webcam. In OpenCV, cv2.VideoCapture(0) tries to access the first available camera device, which in my case was my iPhone since it was connected through Continuity Camera and registered as the primary camera.
The simple fix was to use:
cap = cv2.VideoCapture(1) # Use index 1 instead of 0
This small change selected my laptop's built-in webcam (the second camera in the device list) instead of my iPhone. It was a good reminder that code examples often need small adjustments for different hardware setups, especially when working with Apple's ecosystem which has these seamless device integrations.
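For reference, a bare-bones webcam test along the lines of what the tutorial starts with looks roughly like this (I'm using index 1 here only because of the Continuity Camera quirk; on most machines index 0 is the right choice):

    import cv2

    cap = cv2.VideoCapture(1)  # index 1 skips the iPhone that Continuity Camera registers first

    while True:
        success, img = cap.read()       # grab one frame from the webcam
        if not success:
            break
        cv2.imshow("Image", img)        # show the live feed in a window
        if cv2.waitKey(1) == ord('q'):  # press 'q' to quit
            break

    cap.release()
    cv2.destroyAllWindows()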
Step 3: Implementing Hand Detection
Instead of using MediaPipe directly, the tutorial uses the cvzone library, which provides a simplified interface for hand tracking. This made the implementation much cleaner.
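As a rough sketch (not the tutorial's exact code), the bare detection loop with cvzone looks something like this; it just draws the hand landmarks and prints the bounding box, before any cropping:

    import cv2
    from cvzone.HandTrackingModule import HandDetector

    cap = cv2.VideoCapture(1)
    detector = HandDetector(maxHands=1)  # track at most one hand

    while True:
        success, img = cap.read()
        # findHands returns the detected hands plus the frame with the landmarks drawn on it
        hands, img = detector.findHands(img)
        if hands:
            x, y, w, h = hands[0]['bbox']  # bounding box of the detected hand
            print(x, y, w, h)
        cv2.imshow("Image", img)
        cv2.waitKey(1)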
Step 4: Setting Up Data Collection
For data collection, the tutorial again used the cvzone library, which provides a simplified HandTrackingModule built on top of MediaPipe. Here's the actual code I implemented:
    import cv2
    from cvzone.HandTrackingModule import HandDetector
    import numpy as np
    import math
    import time

    cap = cv2.VideoCapture(1)
    detector = HandDetector(maxHands=1)

    offset = 20
    imgSize = 244
    folder = "Data/A"
    counter = 0

    while True:
        success, img = cap.read()
        hands, img = detector.findHands(img)
        if hands:
            hand = hands[0]
            x, y, w, h = hand['bbox']

            # Crop the hand from the original image, with padding clamped to the frame
            imgCrop = img[max(0, y - offset): min(y + h + offset, img.shape[0]),
                          max(0, x - offset): min(x + w + offset, img.shape[1])]

            # Display the cropped hand image
            cv2.imshow("ImageCrop", imgCrop)

            # Resize the cropped image to fit within imgSize, preserving aspect ratio
            aspectRatio = h / w
            if aspectRatio > 1:
                # Hand is taller than wide: fix the height, scale the width
                k = imgSize / h
                wCal = math.ceil(k * w)
                imgResize = cv2.resize(imgCrop, (wCal, imgSize))
                imgWhite = np.ones((imgSize, imgSize, 3), np.uint8) * 255
                wGap = math.ceil((imgSize - wCal) / 2)
                imgWhite[:, wGap:wCal + wGap] = imgResize
                cv2.imshow("ImageWhite", imgWhite)
            else:
                # Hand is wider than tall: fix the width, scale the height
                k = imgSize / w
                hCal = math.ceil(k * h)
                imgResize = cv2.resize(imgCrop, (imgSize, hCal))
                imgWhite = np.ones((imgSize, imgSize, 3), np.uint8) * 255
                hGap = math.ceil((imgSize - hCal) / 2)
                imgWhite[hGap:hCal + hGap, :] = imgResize
                cv2.imshow("ImageWhite", imgWhite)

        # Show the original image from the camera
        cv2.imshow("Image", img)

        key = cv2.waitKey(1)
        if key == ord('s'):
            counter += 1
            cv2.imwrite(f'{folder}/Image_{time.time()}.jpg', imgWhite)
            print(counter)
This script does several important things:
Uses the HandDetector from cvzone - This simplified the hand detection process compared to using MediaPipe directly.
Creates a bounding box around the detected hand - The detector automatically provides the bounding box coordinates.
Crops the hand image with padding - I added an offset of 20 pixels around the hand to ensure no parts were cut off.
Handles different aspect ratios - The code checks if the hand is taller than it is wide (aspectRatio > 1) and processes it accordingly.
Places the hand on a white background - Creates a square white image and centers the hand on it, preserving the aspect ratio.
Saves images with timestamps - When I press 's', it saves the processed image with a unique timestamp filename.
I ran this script several times, changing the folder variable each time to collect data for different hand signs. For each sign, I collected about 300 images from different angles and with slight variations in hand position to build a robust dataset.
Step 5: Training with Google's Teachable Machine
Created a new image project
Uploaded my collected images to the appropriate classes
Trained the model (which took only a few minutes)
Exported the model as a TensorFlow Lite model
The video tutorial I was following recommended exporting the model as a standard Keras/TensorFlow model. However, I quickly discovered that this wasn't going to work with my setup. When attempting to use these formats, I encountered compatibility issues with the TensorFlow version installed in my PyCharm environment.
After several unsuccessful attempts, ChatGPT suggested I export the model in TensorFlow Lite format instead. It worked! This proved to be the right solution for my particular setup, although it would require some code adjustments during implementation. The TensorFlow Lite model worked seamlessly with my environment, likely because it's designed to be more portable across different TensorFlow versions and platforms.
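If you hit the same compatibility problem, a quick sanity check is to load the exported .tflite file with the TensorFlow Lite interpreter and print the input shape it expects (the file path here is just an assumption about where you saved the export):

    import tensorflow as tf

    # Load the exported .tflite model and inspect the input tensor it expects
    interpreter = tf.lite.Interpreter(model_path="Model/model_unquant.tflite")
    interpreter.allocate_tensors()
    print(interpreter.get_input_details()[0]['shape'])  # e.g. [1 224 224 3]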
Step 6: Implementing the Classification
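The classification step is essentially the data collection script with the saving replaced by a call to the model: crop the hand, centre it on a white square, then feed it to the TensorFlow Lite interpreter and show the predicted letter. A rough sketch of that loop is below (not my exact code); the file names Model/model_unquant.tflite and Model/labels.txt are just what a typical Teachable Machine TensorFlow Lite export contains, so adjust them to match your own export:

    import cv2
    import math
    import numpy as np
    import tensorflow as tf
    from cvzone.HandTrackingModule import HandDetector

    cap = cv2.VideoCapture(1)
    detector = HandDetector(maxHands=1)
    offset = 20
    imgSize = 224  # Teachable Machine image models typically expect 224x224 input

    # Load the TensorFlow Lite model and class labels exported from Teachable Machine
    interpreter = tf.lite.Interpreter(model_path="Model/model_unquant.tflite")
    interpreter.allocate_tensors()
    inputDetails = interpreter.get_input_details()
    outputDetails = interpreter.get_output_details()
    # Each line of labels.txt usually looks like "0 A" - keep only the class name
    labels = [line.strip().split(' ', 1)[-1] for line in open("Model/labels.txt")]

    while True:
        success, img = cap.read()
        hands, img = detector.findHands(img)
        if hands:
            x, y, w, h = hands[0]['bbox']
            imgCrop = img[max(0, y - offset): y + h + offset,
                          max(0, x - offset): x + w + offset]

            # Same white-square centring as in the data collection script
            imgWhite = np.ones((imgSize, imgSize, 3), np.uint8) * 255
            if h > w:
                wCal = math.ceil(imgSize / h * w)
                imgResize = cv2.resize(imgCrop, (wCal, imgSize))
                wGap = (imgSize - wCal) // 2
                imgWhite[:, wGap:wGap + wCal] = imgResize
            else:
                hCal = math.ceil(imgSize / w * h)
                imgResize = cv2.resize(imgCrop, (imgSize, hCal))
                hGap = (imgSize - hCal) // 2
                imgWhite[hGap:hGap + hCal, :] = imgResize

            # Convert BGR to RGB and scale to [-1, 1], which float Teachable Machine models typically expect
            imgRGB = cv2.cvtColor(imgWhite, cv2.COLOR_BGR2RGB)
            inputData = np.expand_dims(imgRGB.astype(np.float32) / 127.5 - 1, axis=0)

            interpreter.set_tensor(inputDetails[0]['index'], inputData)
            interpreter.invoke()
            prediction = interpreter.get_tensor(outputDetails[0]['index'])[0]
            label = labels[int(np.argmax(prediction))]

            # Draw the predicted letter above the hand
            cv2.putText(img, label, (x, y - 30),
                        cv2.FONT_HERSHEY_SIMPLEX, 1.5, (255, 0, 255), 2)

        cv2.imshow("Image", img)
        cv2.waitKey(1)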
I am glad it worked out perfectly! The classification script uses the TensorFlow Lite model exported from Teachable Machine and successfully detected my hand signs, classifying them as A, B, or C in real time!
These detailed steps demonstrate the process of implementing a hand sign detection and classification system using existing libraries and tools. It's amazing how powerful these pre-built components are when combined properly.
Moving Forward: Connecting to Unity
Now that I have a functioning hand sign detection and classification system running in Python, the next step is to connect this to Unity for developing a more interactive sign language learning game.



