Pengelasan Sebutan Huruf Hijaiyah menggunakan Teknik Pembelajaran Mesin (Classification of Hijaiyah Letters Pronunciation using Machine Learning Techniques)
Abstract
Fitur Mel-frequency cepstral coefficients (MFCC) dan teknik pengelasan berdasarkan pembelajaran mesin sering digunakan dalam mengelaskan sebutan huruf-huruf hijaiyah. Walaupun begitu, berdasarkan kajian-kajian lepas, prestasi ketepatan pengelasan sebutan huruf hijaiyah masih lagi rendah walaupun dengan penggunaan algoritma pembelajaran mesin dan fitur MFCC. Oleh itu, kajian khas untuk menganalisis fitur dan teknik pembelajaran mesin yang sesuai akan dibincangkan dalam kertas kajian ini. Selain itu, bilangan huruf hijaiyah juga ditingkatkan kepada 30 huruf mengikut resam uthmani. Kajian ini mahu membuktikan bahawa penggunaan fitur dan teknik pengelasan yang sesuai mampu mengelaskan sebutan huruf hijaiyah dan memberikan prestasi ketepatan yang tinggi walaupun dengan jumlah huruf yang banyak. Kajian ini dilakukan berdasarkan kepada enam fasa utama dalam metodologi kajian ini termasuklah pemprosesan isyarat, penyarian fitur, pemprosesan dan pemilihan fitur, pengelasan dan akhir sekali pengujian, penilaian dan analisis. Kadar persampelan yang digunakan bagi kesemua modul pemprosesan isyarat pertuturan dalam kajian ini adalah 44.1 kHz. Dapatan kajian menunjukkan fitur MFCC merupakan fitur paling sesuai bagi mengelaskan sebutan huruf hijaiyah berbanding fitur-fitur lain yang telah diekstrak berdasarkan kepada ‘rank’ dalam hasil pemilihan fitur. Perbandingan prestasi ketepatan menunjukkan teknik pengelasan Random Forest (RF) mencapai ketepatan yang tinggi dengan menggunakan fitur MFCC, iaitu purata sebanyak 97~99% bagi setiap huruf hijaiyah berbanding teknik pengelasan lain yang telah diuji dalam kajian ini. Kesimpulannya, penggunaan fitur MFCC dan teknik pengelasan RF mampu memberikan prestasi ketepatan pengelasan sebutan huruf hijaiyah yang tinggi sekaligus meningkatkan prestasi pengelasan sebutan huruf hijaiyah kajian lepas, sehingga 98.29% secara purata untuk 30 huruf.
Kata kunci: Sebutan huruf hijaiyah; Pengelasan pertuturan; MFCC; Pembelajaran mesin; Pengecaman pertuturan
ABSTRACT
Mel-frequency cepstral coefficient (MFCC) features and machine-learning classification techniques are often used to classify the pronunciation of hijaiyah letters. However, previous studies show that classification accuracy for hijaiyah letter pronunciation remains low even with machine learning algorithms and MFCC features. This paper therefore presents a dedicated study analysing which features and machine learning techniques are suitable for the task. In addition, the number of hijaiyah letters is increased to 30, following the Uthmani resm. The study aims to show that suitable features and a suitable classification technique can classify hijaiyah letter pronunciation with high accuracy even with a large number of letters. The study follows six main phases of the research methodology, including signal processing, feature extraction, feature processing and selection, classification and, lastly, testing, evaluation and analysis. The sampling rate used for all speech signal processing modules in this study is 44.1 kHz. The findings show that MFCC is the most suitable feature for classifying hijaiyah letter pronunciation compared with the other extracted features, based on its rank in the feature selection results. A comparison of accuracy shows that the Random Forest (RF) classification technique with MFCC features achieves high accuracy, averaging 97–99% for each hijaiyah letter, compared with the other classification techniques tested in this study. In conclusion, MFCC features combined with the RF classification technique provide high classification accuracy for hijaiyah letter pronunciation, improving on previous studies and reaching 98.29% on average across 30 letters.
Keywords: Hijaiyah letters pronunciation; Speech classification; MFCC; Machine learning; Speech recognition
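The pipeline summarised in the abstract (signal processing at 44.1 kHz, MFCC extraction, feature ranking, Random Forest classification) can be illustrated with a minimal sketch. Everything below other than the 44.1 kHz sampling rate is an assumption made for illustration: the synthetic tones stand in for recorded letters, the frame size, filter count, and coefficient count are common defaults rather than the study's settings, and the ranking uses Random Forest feature importances rather than the feature selection method used in the study.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

SR = 44_100  # sampling rate stated in the abstract

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    """Triangular filters spaced evenly on the mel scale."""
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, centre, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, centre):
            fb[i - 1, k] = (k - left) / max(centre - left, 1)
        for k in range(centre, right):
            fb[i - 1, k] = (right - k) / max(right - centre, 1)
    return fb

def mfcc(signal, sr=SR, n_fft=1024, hop=512, n_filters=26, n_coeffs=13):
    """Return an (n_frames, n_coeffs) MFCC matrix. Parameters are assumed defaults."""
    # Pre-emphasis, framing and Hamming windowing
    emphasised = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])
    n_frames = 1 + (len(emphasised) - n_fft) // hop
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = emphasised[idx] * np.hamming(n_fft)
    # Power spectrum -> log mel filterbank energies
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    energies = np.log(power @ mel_filterbank(n_filters, n_fft, sr).T + 1e-10)
    # DCT-II (unnormalised) keeps the first n_coeffs cepstral coefficients
    basis = np.cos(np.pi * np.outer(np.arange(n_coeffs),
                                    2 * np.arange(n_filters) + 1) / (2 * n_filters))
    return energies @ basis.T

def synth_utterance(rng, base_hz, dur=0.2):
    """Hypothetical stand-in for a recorded letter: a noisy two-harmonic tone."""
    t = np.arange(int(SR * dur)) / SR
    f0 = base_hz * rng.uniform(0.95, 1.05)
    tone = np.sin(2 * np.pi * f0 * t) + 0.5 * np.sin(2 * np.pi * 2 * f0 * t)
    return tone + 0.05 * rng.standard_normal(t.size)

rng = np.random.default_rng(0)
class_freqs = {0: 300.0, 1: 900.0, 2: 2700.0}  # three toy "letters", not real data
X, y = [], []
for label, hz in class_freqs.items():
    for _ in range(40):
        # One feature vector per utterance: MFCCs averaged over frames
        X.append(mfcc(synth_utterance(rng, hz)).mean(axis=0))
        y.append(label)
X, y = np.array(X), np.array(y)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)

# Rank coefficients by RF importance (a substitute for the study's feature ranking)
ranking = np.argsort(clf.feature_importances_)[::-1]
print(f"test accuracy: {accuracy:.2f}")
print("MFCC coefficients ranked by importance:", ranking.tolist())
```

With real recordings, `synth_utterance` would be replaced by loading each audio file at 44.1 kHz, and the three toy classes by the 30 letter labels. The accuracy obtained on this toy data says nothing about the figures reported in the study.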
DOI: http://dx.doi.org/10.17576/gema-2023-2301-15
eISSN : 2550-2131
ISSN : 1675-8021