American Sign Language Recognition using Deep Learning
Chandhini Grandhi, Sean Liu, Divyank Rahoria


Background
● Problem
  ○ Sign language is a form of communication for individuals with speech impairment.
  ○ People who do not know sign language find it challenging to communicate with sign language users.
● Solution
  ○ Develop a model that translates sign language gestures to text using deep learning.

Literature Survey
● Reference [1] - Uses Inception, a Convolutional Neural Network (CNN), to recognize spatial features and a Recurrent Neural Network (RNN) to train on temporal features.
● Reference [2] - Uses a Convolutional Neural Network (CNN) to predict sign language and achieves 95% accuracy.
● Reference [3] - Uses a Convolutional Neural Network (CNN) and saves the weights and model for real-time prediction.
● Reference [4] - Discusses the relevant features of the model and feature extraction, and uses an Artificial Neural Network (ANN) to classify signs.

How can ML solve this problem?
● Machine learning lets the user feed a computer algorithm a large amount of data and have the computer analyze it and make data-driven recommendations and decisions based only on the input data.
● Machine learning can help us design a sign language recognition model that minimizes wrong translations caused by human error.

How can ML solve this problem?
● Machine learning and deep learning models can work with large datasets of real sign language images.
● Several classes of deep neural networks, such as CNNs, can be used for this type of image classification problem.
● We could find an appropriate dataset to use for this kind of problem.

Dataset Used
● Images of hand gestures available here
● 29 classes - 26 letters of the alphabet and 3 special-case signs (space, del, nothing)
● 3000 images per class
  ○ Used 200 images per class
  ○ 160 for training
  ○ 40 for testing
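The subsampling described on this slide can be sketched as below. This is only an illustration: the slides do not say how the 200 images per class were chosen, so random sampling, the helper name `split_per_class`, and the stand-in file names are all assumptions.

```python
import random


def split_per_class(paths_by_class, per_class=200, n_train=160, seed=0):
    """Sample `per_class` images from every class and split them train/test."""
    rng = random.Random(seed)
    train, test = [], []
    for label, paths in sorted(paths_by_class.items()):
        chosen = rng.sample(paths, per_class)  # sampling without replacement
        train += [(p, label) for p in chosen[:n_train]]
        test += [(p, label) for p in chosen[n_train:]]
    return train, test


# Hypothetical stand-in for the real dataset: 29 classes x 3000 file names.
labels = [chr(ord("A") + i) for i in range(26)] + ["space", "del", "nothing"]
fake_dataset = {c: [f"{c}/{i}.jpg" for i in range(3000)] for c in labels}

train_set, test_set = split_per_class(fake_dataset)
print(len(train_set), len(test_set))  # 4640 1160
```

With 29 classes this gives 29 x 160 = 4640 training images and 29 x 40 = 1160 test images.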

Models used: CNN
● A CNN is a deep learning model that assigns importance to various features of an image.
● The CNN learns the filters/features of the image.
● Our architecture:
  ○ 4 Conv2D layers; 4 max pooling layers
  ○ 2 dense layers with dropout of 40%
  ○ Batch normalization layers after dropout
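To get a feel for what four Conv2D + max pooling blocks do to an image, the sketch below tracks the spatial size through the stack. The slides do not give the input size, kernel size, padding, or filter counts, so the 64 x 64 input, 'same' padding, 2 x 2 pooling, and 256 final filters are assumptions chosen only for illustration.

```python
def feature_map_sizes(input_size, num_blocks=4, pool=2):
    """Spatial size after each Conv2D ('same' padding) + max pooling block."""
    sizes = []
    size = input_size
    for _ in range(num_blocks):
        # A 'same'-padded convolution keeps the size; pooling divides it.
        size = size // pool
        sizes.append(size)
    return sizes


sizes = feature_map_sizes(64)          # assumed 64 x 64 input
print(sizes)                           # [32, 16, 8, 4]

last_filters = 256                     # assumed filter count of the 4th block
flattened = sizes[-1] ** 2 * last_filters
print(flattened)                       # 4096 values feed the first dense layer
```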

Models Used: VGG
● VGG-16 (2014): Replaced large filters with stacked 3 x 3 filters, increasing depth to 16 layers
● Learns more features at low computational cost
● High memory consumption
● Slow to train
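The trade-off behind stacked 3 x 3 filters can be checked with a little arithmetic. The sketch below compares one 7 x 7 layer with three stacked 3 x 3 layers; the channel count of 64 is an assumption picked for illustration, and biases are ignored.

```python
def conv_weights(kernel, channels):
    """Weights in one kernel x kernel conv layer with `channels` in and out."""
    return kernel * kernel * channels * channels


def stacked_receptive_field(kernel, layers):
    """Receptive field of `layers` stacked stride-1 convolutions."""
    return layers * (kernel - 1) + 1


C = 64                                    # assumed channel count
single_7x7 = conv_weights(7, C)
stacked_3x3 = 3 * conv_weights(3, C)

print(stacked_receptive_field(3, 3))      # 7, same coverage as one 7 x 7 filter
print(single_7x7, stacked_3x3)            # 200704 110592
```

Three 3 x 3 layers see the same 7 x 7 region with roughly 45% fewer weights and two extra non-linearities in between.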

Models Used: InceptionNet
● Splits the convolution operation into multiple small filters
● Introduces 1 x 1 bottleneck layers to limit the number of feature maps
● Computational cost increases
● Requires reducing image size to 50 x 50
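What a 1 x 1 bottleneck buys within a single branch can be seen by counting multiplications. The sketch below compares a 5 x 5 convolution applied directly with the same convolution preceded by a 1 x 1 reduction; the feature map size and channel counts are illustrative assumptions, not values from the slides.

```python
def conv_multiplies(height, width, kernel, c_in, c_out):
    """Multiplications for a stride-1, 'same'-padded convolution."""
    return height * width * kernel * kernel * c_in * c_out


H = W = 28          # assumed feature map size
C_IN = 192          # assumed input channels
C_OUT = 32          # assumed output channels of the 5 x 5 branch
BOTTLENECK = 16     # assumed channels after the 1 x 1 reduction

direct = conv_multiplies(H, W, 5, C_IN, C_OUT)
with_bottleneck = (
    conv_multiplies(H, W, 1, C_IN, BOTTLENECK)      # 1 x 1 reduction
    + conv_multiplies(H, W, 5, BOTTLENECK, C_OUT)   # 5 x 5 on fewer channels
)

print(direct, with_bottleneck)  # 120422400 12443648
```

Under these assumptions the bottlenecked branch needs roughly a tenth of the multiplications of the direct 5 x 5 convolution.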

Models Used: ResNet-50
● ResNet-50 is a convolutional neural network that is 50 layers deep.
● The model consists of 5 stages, each with a convolution block and an identity block. Each convolution block has 3 convolution layers, and each identity block also has 3 convolution layers.
Figures: Deeper network; Residual block of the ResNet-50 architecture
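The "50 layers" figure can be reproduced by counting weighted layers. The sketch below assumes the standard ResNet-50 layout, which the slides do not spell out: a single stem convolution, four stages with 3, 4, 6, and 3 blocks (one convolution block followed by identity blocks), and one final fully connected layer.

```python
# Blocks per stage in the assumed standard layout: each stage starts with
# one convolution block and continues with identity blocks.
BLOCKS_PER_STAGE = {2: 3, 3: 4, 4: 6, 5: 3}
CONVS_PER_BLOCK = 3    # every convolution or identity block has 3 conv layers

stem_conv = 1          # initial convolution of stage 1
final_dense = 1        # fully connected classification layer

block_convs = sum(BLOCKS_PER_STAGE.values()) * CONVS_PER_BLOCK
total_layers = stem_conv + block_convs + final_dense

print(block_convs, total_layers)  # 48 50
```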

Results: CNN
● Training from scratch: ~2 hours
● Testing accuracy = 78.79%
● Larger batch size: 100
● Inference:
  ○ Takes a long time to train
  ○ Good results because it learns temporal and spatial dependencies of the image.

Results: VGG
● Training from scratch: ~6 hours
● Testing accuracy = 75.38%
● Inference:
  ○ Training from scratch needs more iterations to converge.
  ○ Can fine-tune the architecture relevant to the task.

Results: Pretrained VGG-16
● Using the pretrained VGG model, accuracy converges at around 5 epochs
● Testing accuracy = 99.05%
● Inference:
  ○ The pretrained model converges faster.
  ○ Produces the best accuracy because of the weights learned on a bigger dataset.

Results: InceptionNet
● Training from scratch: ~6 minutes
● Testing accuracy = 79.48%
● Reduced image size to 50 x 50
● Larger batch size: 128
● Inference:
  ○ Improved performance over the CNN because of its multilevel feature extraction.

Results: ResNet-50
● Training accuracy: 99.42% (100 epochs); training loss: 0.0213
● Testing accuracy: 96.98%; testing loss: 0.1169
● Inference:
  ○ Fast training
  ○ Good accuracy, as ResNet finds an optimised number of layers and so counters the vanishing gradient problem better than plain layered architectures.

Results: Compilation of all Models
Model                            Test Accuracy
CNN                              78.79%
VGG-16 - Trained from scratch    75.38%
VGG-16 - Pretrained              99.05%
InceptionNet                     79.48%
ResNet-50                        96.93%

Further items to be completed
● Combine the classification methods with an RNN to handle sequences of images for efficient sign language translation

References
1. K. Bantupalli and Y. Xie, "American Sign Language Recognition using Deep Learning and Computer Vision," 2018 IEEE International Conference on Big Data (Big Data), Seattle, WA, USA, 2018, pp. 4896-4899, doi: 10.1109/BigData.2018.8622141.
2. L. Y. Bin, G. Y. Huann and L. K. Yun, "Study of Convolutional Neural Network in Recognizing Static American Sign Language," 2019 IEEE International Conference on Signal and Image Processing Applications (ICSIPA), Kuala Lumpur, Malaysia, 2019.
3. M. Taskiran, M. Killioglu and N. Kahraman, "A Real-Time System for Recognition of American Sign Language by using Deep Learning," 2018 41st International Conference on Telecommunications and Signal Processing (TSP), Athens, 2018.
4. A. Thongtawee, O. Pinsanoh and Y. Kitjaidure, "A Novel Feature Extraction for American Sign Language Recognition Using Webcam," 2018 11th Biomedical Engineering International Conference (BMEiCON), Chiang Mai, 2018, pp. 1-5, doi: 10.1109/BMEiCON.2018.8609933.