TurkicNLP

TurkicNLP

Open-source NLP toolkit for 24 Turkic languages
From Turkish to Sakha, Kazakh to Uyghur — tokenization, morphology, POS tagging, dependency parsing, NER, transliteration, embeddings, and machine translation in one pip install.

arXiv License 24 Languages 4 Script Families


Why TurkicNLP? Over 200 million people speak a Turkic language, yet most lack basic NLP tools. TurkicNLP bridges this gap with a unified Python library covering 6 language branches — Oghuz, Kipchak, Karluk, Siberian, Oghur, and Arghu — plus historical languages like Ottoman Turkish and Old Turkic runic inscriptions.

Highlights

Quick start

pip install "turkicnlp[all]"
import turkicnlp
turkicnlp.download("kaz")
nlp = turkicnlp.Pipeline("kaz", processors=["tokenize", "pos", "lemma", "depparse"])
doc = nlp("Мен мектепке бардым")

Explore

Library turkic-nlp/turkicnlp
Code samples turkic-nlp/turkic-nlp-code-samples
Paper arXiv:2602.19174
Website turkic-nlp.github.io
Community & communication TurkicNLP Discord
Datasets & models 🤗 HuggingFace

Turkic languages map
Turkic languages span from Turkey to Siberia, China to the Balkans


Maintained by Sherzod Hakimov · Contributions welcome