Final Approach: Python OpenCV

After experimenting with three different approaches, I realized that fully implementing any of them would be too risky. Each required deep technical understanding and advanced knowledge of complex systems that I don't yet possess. While the concepts were promising, I ultimately decided to take a more practical route that aligned with my current skills and ensured the project would be functional and deliverable within my timeline. Luckily, when I broadened my research to look into more practical solutions, I came across a fascinating YouTube tutorial on building a hand sign detection system for the American Sign Language (ASL) alphabet that offered a more systematic approach.

Learning From Mr. Murtaza

The tutorial I discovered, from the YouTube channel "Murtaza's Workshop - Robotics and AI", presented a comprehensive step-by-step guide for building a robust hand sign detection system using the ASL letters A, B, and C. What caught my attention was its two-step approach:

  1. First detecting the hand's position (object detection)

  2. Then classifying the specific hand sign (classification)

This approach immediately struck me as more sophisticated than my previous attempts. While it does require the use of third-party software, it also opens up an opportunity for me to explore a new programming language, Python, which adds an exciting learning aspect to this project!

Key Components of the Approach

The tutorial covered several techniques that I hadn't considered:

  • Image preprocessing: Cropping hand images and placing them on a standardized 300×300 white square background for consistent classification input

  • Aspect ratio handling: Properly managing the different shapes formed by hands making different signs

  • Structured data collection: Saving manually triggered image samples for training by pressing 's' to ensure quality control

  • Leveraging existing libraries: Using OpenCV for webcam handling, MediaPipe for hand tracking, and cvzone for simplified detection

Google's Teachable Machine

One of the most interesting aspects was the use of Google's Teachable Machine for training the classification model. This low-code solution allows for quick model training through a drag-and-drop interface. I didn't know that you can easily train a model by just uploading images! This saves me so much time!

Let's Start Implementing!

Previously, I had experimented with Python-based approaches and found them promising, but I was hesitant to fully commit to third-party tools. However, after watching this tutorial and realizing that achieving real-time hand detection may require going beyond Unity's capabilities, I now feel more confident in giving this approach a try. Here's a detailed walkthrough of the process as I followed along with the tutorial:


Step 1: Setting Up the Environment

The initial setup required installing a few key packages:

  • MediaPipe for hand detection

  • OpenCV-python for camera access and image processing

  • CVzone for simplified hand tracking

Unfortunately, the process wasn't as straightforward as I expected. The tutorial video didn't mention which software was used, so I asked ChatGPT to help me identify it, and I found out it was PyCharm. From there, I ran into a series of installation issues. Some packages were incompatible with the version of PyCharm I had installed, and when I downgraded or changed the version, different packages would stop working. It took a lot of trial and error, uninstalling and reinstalling packages, before I finally found a combination of versions that worked with all the packages.
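
For future reference, a quick way to record which versions ended up working together is a tiny script like this (this isn't from the tutorial, just a small helper I put together):

from importlib.metadata import version

# Print the installed version of each package so a working combination can be recorded
for pkg in ("opencv-python", "mediapipe", "cvzone"):
    print(pkg, version(pkg))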

Step 2: Creating the Initial Script for Webcam Access

When I started writing a basic script to access the webcam and display the camera view, I immediately ran into a small hurdle. The tutorial used cv2.VideoCapture(0) to access the default webcam, but when I ran this code, it surprisingly opened the camera on my iPhone instead of my laptop's webcam.

After investigating, I realized this was happening because of Apple's Continuity Camera feature, which allows Macs to use an iPhone as a webcam. In OpenCV, cv2.VideoCapture(0) tries to access the first available camera device, which in my case was my iPhone since it was connected through Continuity Camera and registered as the primary camera.

The simple fix was to use:

cap = cv2.VideoCapture(1)  # Use index 1 instead of 0

This small change selected my laptop's built-in webcam (the second camera in the device list) instead of my iPhone. It was a good reminder that code examples often need small adjustments for different hardware setups, especially when working with Apple's ecosystem which has these seamless device integrations.
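
A minimal version of that first test script looks something like this (camera index 1 assumed for my setup, as explained above):

import cv2

cap = cv2.VideoCapture(1)  # index 1 = built-in webcam here; index 0 opened the iPhone via Continuity Camera

while True:
    success, img = cap.read()
    if not success:
        break
    cv2.imshow("Image", img)
    if cv2.waitKey(1) & 0xFF == ord('q'):  # press 'q' to close the window
        break

cap.release()
cv2.destroyAllWindows()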

Step 3: Implementing Hand Detection

Instead of using MediaPipe directly, the tutorial uses the cvzone library, which provides a simplified interface for hand tracking. This made the implementation much cleaner:
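
A minimal sketch of that detection loop, using cvzone's HandDetector on top of the webcam code from the previous step:

import cv2
from cvzone.HandTrackingModule import HandDetector

cap = cv2.VideoCapture(1)
detector = HandDetector(maxHands=1)  # track at most one hand

while True:
    success, img = cap.read()
    # findHands returns the list of detected hands and the image with the landmarks drawn on it
    hands, img = detector.findHands(img)
    cv2.imshow("Image", img)
    cv2.waitKey(1)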

I discovered that the detector.findHands() function returns both the detected hands and the image with landmarks drawn on it, which saves me from having to write additional code to visualize the hand tracking. Each detected hand is returned as a dictionary containing the bounding box coordinates, landmarks, center point, and other useful information.
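
For example, the returned dictionary can be read like this (key names as I understand cvzone's HandTrackingModule; worth double-checking against the installed version):

if hands:
    hand = hands[0]
    x, y, w, h = hand['bbox']   # bounding box around the hand
    lmList = hand['lmList']     # list of 21 landmark points
    center = hand['center']     # centre point of the hand
    handType = hand['type']     # "Left" or "Right"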

Step 4: Setting Up Data Collection

For data collection, the tutorial used the cvzone library again, which provides a simplified HandTrackingModule built on top of MediaPipe. Here's the actual code I implemented:

import cv2
from cvzone.HandTrackingModule import HandDetector
import numpy as np
import math
import time

cap = cv2.VideoCapture(1)
detector = HandDetector(maxHands=1)

offset = 20
imgSize = 244

folder = "Data/A"
counter = 0

while True:
    success, img = cap.read()
    hands, img = detector.findHands(img)

    if hands:
        hand = hands[0]
        x, y, w, h = hand['bbox']

        # Crop the hand from the original image (clamped to the frame boundaries)
        imgCrop = img[max(0, y - offset): min(y + h + offset, img.shape[0]),
                      max(0, x - offset): min(x + w + offset, img.shape[1])]

        # Display the cropped hand image
        cv2.imshow("ImageCrop", imgCrop)

        # Resize the cropped image to fit within imgSize, preserving the aspect ratio,
        # and centre it on a square white background
        aspectRatio = h / w
        if aspectRatio > 1:
            # Hand is taller than wide: scale the height to imgSize and centre horizontally
            k = imgSize / h
            wCal = math.ceil(k * w)
            imgResize = cv2.resize(imgCrop, (wCal, imgSize))
            imgWhite = np.ones((imgSize, imgSize, 3), np.uint8) * 255
            wGap = math.ceil((imgSize - wCal) / 2)
            imgWhite[:, wGap:wCal + wGap] = imgResize
            cv2.imshow("ImageWhite", imgWhite)
        else:
            # Hand is wider than tall: scale the width to imgSize and centre vertically
            k = imgSize / w
            hCal = math.ceil(k * h)
            imgResize = cv2.resize(imgCrop, (imgSize, hCal))
            imgWhite = np.ones((imgSize, imgSize, 3), np.uint8) * 255
            hGap = math.ceil((imgSize - hCal) / 2)
            imgWhite[hGap:hCal + hGap, :] = imgResize
            cv2.imshow("ImageWhite", imgWhite)

    # Show the original image from the camera
    cv2.imshow("Image", img)
    key = cv2.waitKey(1)
    if key == ord('s') and hands:  # only save when a hand is currently detected
        # Save the processed image with a unique timestamp filename
        counter += 1
        cv2.imwrite(f'{folder}/Image_{time.time()}.jpg', imgWhite)
        print(counter)

This script does several important things:

  1. Uses the HandDetector from cvzone - This simplified the hand detection process compared to using MediaPipe directly.

  2. Creates a bounding box around the detected hand - The detector automatically provides the bounding box coordinates.

  3. Crops the hand image with padding - I added an offset of 20 pixels around the hand to ensure no parts were cut off.

  4. Handles different aspect ratios - The code checks if the hand is taller than it is wide (aspectRatio > 1) and processes it accordingly.

  5. Places the hand on a white background - Creates a square white image and centers the hand on it, preserving the aspect ratio.

  6. Saves images with timestamps - When I press 's', it saves the processed image with a unique timestamp filename.

I ran this script several times, changing the folder variable each time to collect data for different hand signs. For each sign, I collected about 300 images from different angles and with slight variations in hand position to build a robust dataset.

Step 5: Training with Google's Teachable Machine

After collecting around 300 images for each class (A, B, C), I went to Google's Teachable Machine website:
  1. Created a new image project

  2. Uploaded my collected images to the appropriate classes

  3. Trained the model (which took only a few minutes)

  4. Exported the model as a TensorFlow Lite model

The video tutorial I was following recommended exporting the model as a standard Keras/TensorFlow model. However, I quickly discovered that this wasn't going to work with my setup. When attempting to use these formats, I encountered compatibility issues with the TensorFlow version installed in my PyCharm environment.

After several unsuccessful attempts, ChatGPT suggested I export the model in TensorFlow Lite format instead. It worked! This proved to be the right solution for my particular setup, although it would require some code adjustments during implementation. The TensorFlow Lite model worked seamlessly with my environment, likely because it's designed to be more portable across different TensorFlow versions and platforms.
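
As a quick sanity check, the exported model can be loaded with TensorFlow's built-in TFLite interpreter. The file name below is an assumption on my part, so adjust it to match the actual export:

import tensorflow as tf

# Load the TensorFlow Lite model exported from Teachable Machine
# (the path is an assumption; use the name from your export)
interpreter = tf.lite.Interpreter(model_path="Model/model_unquant.tflite")
interpreter.allocate_tensors()

# Inspect the expected input and output shapes
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()
print("Input shape:", input_details[0]['shape'])    # e.g. [1, 224, 224, 3] for an image model
print("Output shape:", output_details[0]['shape'])  # e.g. [1, 3] for the classes A, B, C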

Step 6: Implementing the Classification

I discovered that the classification code handles the prediction results by:

  1. Verifying the prediction index is valid

  2. Printing the prediction and confidence score to the console for debugging

  3. Displaying the predicted sign label and confidence percentage above the hand

  4. Drawing a rectangle around the detected hand for visual feedback
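
The preprocessing mirrors the data collection script, so here is a minimal sketch of how the classification part can be wired in with the TensorFlow Lite interpreter. The file path, the label list, and the [0, 1] input scaling are assumptions on my part and may need adjusting to match the actual export:

import cv2
from cvzone.HandTrackingModule import HandDetector
import numpy as np
import math
import tensorflow as tf

cap = cv2.VideoCapture(1)
detector = HandDetector(maxHands=1)

# Load the TensorFlow Lite model exported from Teachable Machine
# (path and label list are assumptions; adjust to match the export)
interpreter = tf.lite.Interpreter(model_path="Model/model_unquant.tflite")
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

labels = ["A", "B", "C"]
offset = 20
imgSize = 244

while True:
    success, img = cap.read()
    hands, img = detector.findHands(img)

    if hands:
        hand = hands[0]
        x, y, w, h = hand['bbox']

        # Same crop + white-square preprocessing as in the data collection script
        imgCrop = img[max(0, y - offset): min(y + h + offset, img.shape[0]),
                      max(0, x - offset): min(x + w + offset, img.shape[1])]
        imgWhite = np.ones((imgSize, imgSize, 3), np.uint8) * 255
        aspectRatio = h / w
        if aspectRatio > 1:
            k = imgSize / h
            wCal = math.ceil(k * w)
            imgResize = cv2.resize(imgCrop, (wCal, imgSize))
            wGap = math.ceil((imgSize - wCal) / 2)
            imgWhite[:, wGap:wCal + wGap] = imgResize
        else:
            k = imgSize / w
            hCal = math.ceil(k * h)
            imgResize = cv2.resize(imgCrop, (imgSize, hCal))
            hGap = math.ceil((imgSize - hCal) / 2)
            imgWhite[hGap:hCal + hGap, :] = imgResize

        # Resize to the model's expected input size and scale to [0, 1]
        # (the exported model may expect a different scaling, e.g. [-1, 1])
        inputShape = input_details[0]['shape']  # e.g. [1, 224, 224, 3]
        imgInput = cv2.resize(imgWhite, (int(inputShape[2]), int(inputShape[1])))
        imgInput = np.expand_dims(imgInput.astype(np.float32) / 255.0, axis=0)

        # Run the prediction
        interpreter.set_tensor(input_details[0]['index'], imgInput)
        interpreter.invoke()
        prediction = interpreter.get_tensor(output_details[0]['index'])[0]
        index = int(np.argmax(prediction))

        # Check the index is valid, print for debugging, then draw the visual feedback
        if 0 <= index < len(labels):
            print(labels[index], prediction[index])
            cv2.putText(img, f'{labels[index]} {prediction[index] * 100:.0f}%',
                        (x, y - offset - 10), cv2.FONT_HERSHEY_SIMPLEX, 1, (255, 0, 255), 2)
            cv2.rectangle(img, (x - offset, y - offset),
                          (x + w + offset, y + h + offset), (255, 0, 255), 2)

    cv2.imshow("Image", img)
    cv2.waitKey(1)

If the predictions look off, the input scaling is the first thing I would double-check against the export's own sample code.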


Moving Forward: Connecting to Unity

I am glad it worked out! The script performs hand sign classification in real time using the TensorFlow Lite model exported from Teachable Machine, and it successfully detected my hand signs and classified them as A, B, or C as I made them in front of the camera!

These detailed steps demonstrate the process of implementing a hand sign detection and classification system using existing libraries and tools. It's amazing how powerful these pre-built components are when combined properly.

Now that I have a functioning hand sign detection and classification system running in Python, the next step is to connect this to Unity for developing a more interactive sign language learning game.

Reference

Murtaza’s Workshop - Robotics and AI. (2022b, July 4). Easy Hand sign Detection | American Sign Language ASL | Computer Vision [Video]. YouTube. https://www.youtube.com/watch?v=wa2ARoUUdU8
