Open-source NLP toolkit for 24 Turkic languages
From Turkish to Sakha, Kazakh to Uyghur — tokenization, morphology, POS tagging, dependency parsing, NER, transliteration, embeddings, and machine translation in one pip install.
Why TurkicNLP? Over 200 million people speak a Turkic language, yet most lack basic NLP tools. TurkicNLP bridges this gap with a unified Python library covering 6 language branches — Oghuz, Kipchak, Karluk, Siberian, Oghur, and Arghu — plus historical languages like Ottoman Turkish and Old Turkic runic inscriptions.
pip install "turkicnlp[all]"
import turkicnlp
turkicnlp.download("kaz")
nlp = turkicnlp.Pipeline("kaz", processors=["tokenize", "pos", "lemma", "depparse"])
doc = nlp("Мен мектепке бардым")
| Library | turkic-nlp/turkicnlp |
| Code samples | turkic-nlp/turkic-nlp-code-samples |
| Paper | arXiv:2602.19174 |
| Website | turkic-nlp.github.io |
| Community & communication | TurkicNLP Discord |
| Datasets & models | 🤗 HuggingFace |
Turkic languages span from Turkey to Siberia, China to the Balkans
Maintained by Sherzod Hakimov · Contributions welcome