Approach #3: Image Capture and Skin Color Detection

My third approach was to capture an image and analyze it. I discovered skin color detection, which I quickly found fascinating, and I was tempted to research further into this area.

The Research Behind Skin Detection

Skin color detection is fundamentally about separating skin pixels from non-skin pixels in an image - a critical first step in applications ranging from face detection to hand gesture recognition. What makes this particularly challenging is the dramatic variation in skin tones across different populations. Europeans, Africans, and Asians all have distinct skin color characteristics that any robust detection system needs to accommodate.

The research revealed two primary approaches:

  1. Pixel-based detection - Each pixel is classified individually based on color values

  2. Region-based detection - Spatial relationships between pixels are considered, incorporating texture and intensity data

For my implementation, I focused on pixel-based detection using color thresholds, as it's computationally less expensive - critical for a real-time application.

The literature highlighted several key factors affecting skin detection:

  • Camera hardware variations (different devices capture colors differently)

  • Illumination inconsistencies (natural vs artificial light)

  • Individual variation in skin tones

  • Shadows and highlights creating color distortions

  • The choice of color space (RGB, HSV, YCbCr, etc.)
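As an aside on that last point, the choice of color space changes what a "skin" threshold even looks like. Below is a minimal Python sketch of the widely used YCbCr chroma-box test (not the method used in this project): luma (Y) is ignored so lighting matters less, and the Cb/Cr bounds shown are commonly cited defaults that would still need tuning per camera.

```python
def rgb_to_ycbcr(r, g, b):
    # ITU-R BT.601 full-range conversion from 0-255 RGB
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr

def is_skin_ycbcr(r, g, b):
    # Classic chroma box for skin: only the Cb/Cr channels are tested,
    # so brightness changes (Y) have less influence on the result
    _, cb, cr = rgb_to_ycbcr(r, g, b)
    return 77 <= cb <= 127 and 133 <= cr <= 173
```

Because illumination mostly lives in the Y channel, this kind of test tends to be more stable under lighting changes than raw RGB thresholds.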

Skin Detection Function Explained

The core of my approach was a function called IsHandPixel() that examines each pixel to determine if it looks like skin:
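The actual implementation was in Unity C#; the following is a minimal Python sketch of the same per-pixel checks, using the thresholds listed below. The exact cutoff for the red-green difference (0.05 here) is an assumption, since the text only calls it "noticeable".

```python
def is_hand_pixel(r, g, b):
    # Normalise 0-255 channel values to the 0.0-1.0 range
    r, g, b = r / 255.0, g / 255.0, b / 255.0
    return (
        r > 0.4              # the pixel needs enough red
        and g > 0.28         # it needs some green too
        and b > 0.15         # it needs a little blue
        and (r - g) > 0.05   # noticeable red-green gap (assumed cutoff)
        and r > g and r > b  # red dominates both other channels
    )
```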

This function is straightforward:

1. It converts each RGB value to the 0.0 - 1.0 range by dividing by 255, which makes the threshold comparisons easier to express.

2. It checks:

  • The pixel needs enough red (r > 0.4)
  • It needs some green too (g > 0.28)
  • It needs a little blue (b > 0.15)
  • There should be a noticeable difference between red and green
  • Red should be greater than both green and blue

This was based on research suggesting that skin pixels typically have higher red channel values than green or blue, regardless of ethnicity, and that the difference between red and green components is particularly significant in distinguishing skin from common backgrounds.

Limitations of this Approach

One of the most significant limitations I discovered during my experiments was the critical need for a plain, uniform background. When I tested the algorithm against a white wall in my apartment, it worked well. But as soon as I moved to my classroom with its various decorative items, the detection became unstable.

The problem was clear: many everyday objects contain colors that fall within the same range as human skin tones. In particularly bad cases, the system would detect more false positives from the background than actual skin pixels from my hand. Pure color-based skin detection methods almost universally require controlled environments with non-skin-colored backgrounds to achieve acceptable accuracy.

Solution

After realizing the limitations of pure color detection, particularly its dependency on plain backgrounds, I needed a more robust approach. To improve accuracy, I explored a Region of Interest (ROI) system.

Learning About ROI Techniques

ROI is a specific area within an image or video that contains the important information to be analyzed. After studying various approaches in the literature, I realized I had several options:

  1. User-defined ROI with visual guides

  2. Automatic ROI using object detection

  3. Background subtraction techniques

  4. Edge detection and segmentation

By combining skin color detection with this spatial constraint, the algorithm significantly reduces false positives. It only considers pixels that are both likely to be skin and located inside the guide border, increasing the accuracy of hand detection.

Developing the Implementation


Step 1: Convert the Guide Border Coordinates

Before searching for the hand, the function converts the guide border's screen coordinates to the webcam image's coordinate space. This ensures the detection is limited to the correct area.
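Assuming a simple proportional mapping between the two coordinate spaces (the real Unity version may also need to handle mirroring or aspect-ratio differences), the conversion can be sketched as:

```python
def screen_to_image(x_screen, y_screen, screen_w, screen_h, img_w, img_h):
    # Scale screen-space coordinates into webcam-image pixel coordinates
    # by the ratio of the two resolutions.
    return int(x_screen * img_w / screen_w), int(y_screen * img_h / screen_h)
```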

Step 2: Loop Through Each Pixel in the Webcam Image

The function examines every pixel in the webcam image, checking if it looks like a hand pixel.

Step 3: Limit Detection to the Guide Border Area

Only pixels inside the guide border are considered. This spatial filter reduces false positives.


Step 4: Track the Bounding Box of Detected Pixels

The function tracks the smallest rectangle that contains all the detected pixels.


Step 5: Determine If a Hand Is Detected

If enough skin-colored pixels are found inside the guide border, the function considers a hand detected.
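The steps above can be pulled together into one sketch. This is a hedged Python reconstruction, not the original Unity code: the flat `pixels` list, the ROI tuple (already in image coordinates), and the `min_pixels` threshold of 200 are all assumed names and values.

```python
def detect_hand(pixels, img_w, img_h, roi, min_pixels=200):
    """Scan a frame for skin pixels inside a guide-border ROI.

    pixels: flat row-major list of (r, g, b) tuples, length img_w * img_h.
    roi: (x_min, y_min, x_max, y_max) in image coordinates.
    Returns (detected, bounding_box); bounding_box is None if no hand.
    """
    rx0, ry0, rx1, ry1 = roi
    count = 0
    bx0, by0, bx1, by1 = img_w, img_h, -1, -1
    for y in range(img_h):            # loop over every pixel in the frame
        for x in range(img_w):
            # Spatial filter: only pixels inside the guide border count
            if not (rx0 <= x <= rx1 and ry0 <= y <= ry1):
                continue
            r, g, b = pixels[y * img_w + x]
            rf, gf, bf = r / 255.0, g / 255.0, b / 255.0
            # Per-pixel skin check (same thresholds as the earlier rules)
            if rf > 0.4 and gf > 0.28 and bf > 0.15 and rf - gf > 0.05 and rf > bf:
                count += 1
                # Grow the smallest rectangle containing all detected pixels
                bx0, by0 = min(bx0, x), min(by0, y)
                bx1, by1 = max(bx1, x), max(by1, y)
    # Enough skin-colored pixels inside the border => hand detected
    if count >= min_pixels:
        return True, (bx0, by0, bx1, by1)
    return False, None
```

A nested Python loop like this is far too slow for real-time use on large frames; it is only meant to make the control flow of the steps explicit.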



Insights

I think this was a fun little experiment and exploration. Although it successfully detected my hand, as shown in the image, the ROI (Region of Interest) box wasn't placed in the correct location. Still, it managed to detect the intended area. One of the main challenges with this approach was the lack of accessible resources for integrating it directly within Unity. The method also required a much deeper understanding of mathematical programming, which made it quite complex. In this attempt, I combined Unity's logic with the pixel-based skin detection method. Honestly, it felt like a guessing game and a bit of a risk to my software, so I decided to discontinue this approach.

Reference


https://arxiv.org/pdf/1708.02694

Mosesdaudu. (2024, February 14). Getting ROI for computer vision projects. Medium. https://medium.com/@mosesdaudu001/getting-roi-for-computer-vision-projects-ecd92cb5947d

Noel. (2023, December 21). Region of interest in Computer Vision. Scaler Topics. https://www.scaler.com/topics/region-of-interest-opencv/




















