Abstract
This paper describes two common image-based approaches to speech recognition: a convolutional neural network (CNN), and the two-dimensional discrete cosine transform (2D-DCT) with zigzag scanning combined with a machine learning classifier (KNN, SVM, or RF). These algorithms are evaluated for their applicability to the Uzbek language and compared in terms of accuracy and recognition rate. Command words of the Uzbek language were chosen for the experiments. The results show that both methods achieve high recognition accuracy: 92% (CNN) and 90% (2D-DCT+Zigzag+SVM). The combinations 2D-DCT+Zigzag+KNN and 2D-DCT+Zigzag+RF, with average recognition accuracies of 86% and 85%, respectively, are also considered.
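As a rough illustration of the 2D-DCT+Zigzag feature step named in the abstract, the sketch below computes an orthonormal 2D DCT-II of an image-like array and reads the coefficients in JPEG-style zigzag order, keeping the first few low-frequency values as a feature vector. The abstract does not give the spectrogram size, the DCT normalization, or the number of coefficients retained, so all of those, together with the function names, are assumptions made for illustration only.

```python
import numpy as np


def dct_matrix(n):
    """Orthonormal DCT-II basis matrix of size n x n."""
    k = np.arange(n).reshape(-1, 1)  # frequency index (rows)
    i = np.arange(n).reshape(1, -1)  # sample index (columns)
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)       # DC row uses a different scale factor
    return c


def dct2(image):
    """Separable 2D DCT-II applied along both axes of a 2D array."""
    rows, cols = image.shape
    return dct_matrix(rows) @ image @ dct_matrix(cols).T


def zigzag_indices(rows, cols):
    """(row, col) pairs in JPEG-style zigzag order, low frequencies first."""
    order = []
    for s in range(rows + cols - 1):
        # All cells on the anti-diagonal r + c == s, listed with r increasing.
        diag = [(r, s - r) for r in range(rows) if 0 <= s - r < cols]
        # Even diagonals are walked up and to the right, odd ones down and left.
        order.extend(diag[::-1] if s % 2 == 0 else diag)
    return order


def dct_zigzag_features(image, n_coeffs):
    """First n_coeffs 2D-DCT coefficients in zigzag order (assumed feature layout)."""
    coeffs = dct2(np.asarray(image, dtype=float))
    idx = zigzag_indices(*coeffs.shape)[:n_coeffs]
    return np.array([coeffs[r, c] for r, c in idx])


if __name__ == "__main__":
    # A random array stands in for a spectrogram image; sizes are hypothetical.
    rng = np.random.default_rng(0)
    fake_spectrogram = rng.random((64, 64))
    features = dct_zigzag_features(fake_spectrogram, n_coeffs=32)
    print(features.shape)
```

A feature vector of this kind would then be passed to a classifier such as KNN, SVM, or RF; which implementation and hyperparameters the authors used is not stated in the abstract.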
First Page
55
Last Page
61
Recommended Citation
Musayev, M M.; Khujayorov, I Sh; Abdullaeva, M I.; and Ochilov, M M. (2022) "UZBEK COMMANDS RECOGNITION BY PROCESSING THE SPECTROGRAM IMAGE," Technical science and innovation: Vol. 2022: Iss. 2, Article 5.
DOI: https://doi.org/10.51346/tstu-01.22.2-77-0174
Available at: https://btstu.researchcommons.org/journal/vol2022/iss2/5
Included in
Civil and Environmental Engineering Commons, Electrical and Computer Engineering Commons