State-of-the-Art NER Models for the Uzbek Language

Authors

  • Madina Samatboyeva, Department of Computational Linguistics and Digital Technology, Tashkent State University of Uzbek Language and Literature, Tashkent, Uzbekistan

DOI:

https://doi.org/10.51699/cajlpc.v7i2.1478

Keywords:

Named Entity Recognition, Uzbek language, machine learning, deep learning, BERT, XLM-RoBERTa, Bidirectional LSTM, Natural Language Processing, corpus linguistics

Abstract

Named Entity Recognition (NER) is one of the fundamental tasks in Natural Language Processing and plays an important role in many applications such as information extraction, machine translation, and question answering systems. In recent years, machine learning and deep learning approaches have significantly improved the performance of NER systems. However, developing accurate NER models for low-resource languages such as Uzbek remains a challenging task due to the limited availability of annotated corpora and linguistic resources. This paper reviews state-of-the-art NER models used in machine learning for the Uzbek language, including traditional statistical methods and modern neural network architectures. In particular, models based on Conditional Random Fields, Bidirectional LSTM, and transformer-based architectures such as BERT and XLM-RoBERTa are analyzed. The study discusses their effectiveness, advantages, and limitations in the context of Uzbek language processing. The findings highlight that transformer-based multilingual models demonstrate the best performance for Uzbek NER tasks and provide promising directions for future research.
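The annotated corpora and models discussed in the abstract typically represent entities with the BIO tagging scheme, where each token is labeled B- (beginning of an entity), I- (inside an entity), or O (outside any entity). As a minimal sketch of how such per-token labels are decoded into entity spans (the function, the Uzbek example sentence, and its labels are illustrative and not taken from the paper):

```python
def bio_to_spans(tokens, tags):
    """Collect (entity_type, token_list) pairs from a BIO-tagged sequence.

    A span opens at a B- tag, continues through matching I- tags,
    and closes at O, at a new B- tag, or at the end of the sequence.
    """
    spans = []
    start, etype = None, None
    for i, tag in enumerate(tags):
        if tag.startswith("B-"):
            if start is not None:            # close the previous span
                spans.append((etype, tokens[start:i]))
            start, etype = i, tag[2:]        # open a new span
        elif tag.startswith("I-") and start is not None and etype == tag[2:]:
            continue                         # extend the current span
        else:                                # O tag (or stray I-) closes any span
            if start is not None:
                spans.append((etype, tokens[start:i]))
            start, etype = None, None
    if start is not None:                    # span running to the end of the sentence
        spans.append((etype, tokens[start:]))
    return spans


# Hypothetical Uzbek sentence: "Madina Samatboyeva Toshkentda ishlaydi"
tokens = ["Madina", "Samatboyeva", "Toshkentda", "ishlaydi"]
tags = ["B-PER", "I-PER", "B-LOC", "O"]
print(bio_to_spans(tokens, tags))
# → [('PER', ['Madina', 'Samatboyeva']), ('LOC', ['Toshkentda'])]
```

The same decoding step applies regardless of which model produced the tags, whether a CRF, a BiLSTM tagger, or a fine-tuned transformer such as BERT or XLM-RoBERTa.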

References

S. Wang, X. Sun, X. Li, R. Ouyang, F. Wu, T. Zhang, J. Li, G. Wang, and C. Guo, “GPT-NER: Named entity recognition via large language models,” in Findings of the Association for Computational Linguistics: NAACL 2025, 2025. [Online]. Available: https://aclanthology.org/2025.findings-naacl.239

H. Zhang, W. Li, D. Huang, Y. Tang, Y. Chen, P. Payne, and F. Li, “KoGNER: A novel framework for knowledge graph integration on biomedical named entity recognition,” 2025. [Online]. Available: https://arxiv.org/abs/2503.15737

W. Zhou, S. Zhang, Y. Gu, M. Chen, and H. Poon, “UniversalNER: Targeted distillation from large language models for open named entity recognition,” 2023. [Online]. Available: https://arxiv.org/abs/2308.03279

S. Mayhew et al., “Universal NER: A gold-standard multilingual named entity recognition benchmark,” in Proc. NAACL, 2024. [Online]. Available: https://aclanthology.org/2024.naacl-long.243

Y. Zhang, M. Liu, H. Tang, S. Lu, and L. Xue, “Simple named entity recognition (NER) system with RoBERTa for ancient Chinese,” in Proc. Second Workshop on Ancient Language Processing, 2025. [Online]. Available: https://aclanthology.org/2025.alp-1.27

B. B. Elov and M. T. Samatboyeva, “Process of automatic NER identification in the Uzbek language corpus,” in Proc. Int. Conf. Computational Linguistics and NLP, 2024.

J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, “BERT: Pre-training of deep bidirectional transformers for language understanding,” in Proc. NAACL-HLT, 2019, pp. 4171–4186. doi: 10.18653/v1/N19-1423

“BERT (language model),” Wikipedia. [Online]. Available: https://en.wikipedia.org/wiki/BERT_(language_model)

“BERT language model definition,” TechTarget. [Online]. Available: https://www.techtarget.com/searchenterpriseai/definition/BERT-language-model

“Bidirectional encoder representations from transformers (BERT),” Emergent Mind. [Online]. Available: https://www.emergentmind.com/topics/bidirectional-encoder-representations-from-transformers-bert

J. Yu, B. Bohnet, and M. Poesio, “Named entity recognition as dependency parsing,” in Proc. ACL, 2020, pp. 6470–6480. doi: 10.18653/v1/2020.acl-main.577

X. Li, H. Yan, X. Qiu, and X. Huang, “FLAT: Chinese NER using flat-lattice transformer,” in Proc. ACL, 2020, pp. 6836–6842. doi: 10.18653/v1/2020.acl-main.611

G. Lample, M. Ballesteros, S. Subramanian, K. Kawakami, and C. Dyer, “Neural architectures for named entity recognition,” in Proc. NAACL-HLT, 2016, pp. 260–270. doi: 10.18653/v1/N16-1030

G. Luo, X. Huang, C.-Y. Lin, and Z. Nie, “Joint entity recognition and disambiguation,” in Proc. EMNLP, 2015, pp. 879–888. doi: 10.18653/v1/D15-1109

A. Radford et al., “Language models are unsupervised multitask learners,” OpenAI, Tech. Rep., 2019. [Online]. Available: https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf

Published

2026-03-26

How to Cite

Samatboyeva, M. (2026). State-of-the-Art NER Models for the Uzbek Language. Central Asian Journal of Literature, Philosophy and Culture, 7(2), 147–157. https://doi.org/10.51699/cajlpc.v7i2.1478

Issue

Section

Articles