SOLVING THE PROBLEMS OF NORMALIZATION OF NON-STANDARD WORDS IN THE TEXT OF THE UZBEK LANGUAGE
Keywords:
non-standard words, taxonomy, text normalization, state machine, classification, maximum entropy classifiers
Abstract
Text normalization is an important component of a text-to-speech (TTS) system, and its main difficulty lies in distinguishing between non-standard words (NSWs). In this paper, we develop a taxonomy of NSWs based on Uzbek speech and propose a two-stage strategy for identifying them. The proposed strategy achieves an accuracy of 98.53% in the open test. Experiments show that the NSW taxonomy provides high initial performance.