🇹🇷 Mizan Turkish Evaluation Leaderboard
Performance comparison for Turkish embedding models
Tokenizer Quality Visualizations
Interactive bubble plots show tokenizer quality metrics versus model performance; bubble size and color represent Turkish Token Count. Hover for details, and zoom or pan to explore.
How to Use:
- Search: Use the search box to find specific models
- Color Coding: Scores are color-coded from red (low) to green (high)
- Sorting: Click on column headers to sort
- Rankings: Models ranked by MTEB Score
- Toggle Columns: Use the checkboxes above to show/hide additional metrics
- Filter by Model Type: Use the radio buttons to filter models by their type
Submit Model for Evaluation
Submit your Turkish embedding model for evaluation on the Mizan benchmark. Authentication with Hugging Face is required to submit evaluations.
Evaluation Process:
- Sign In: First, sign in with your Hugging Face account using the button above
- Submit Request: Fill out the form with your model details and email
- Admin Review: Your request will be reviewed by administrators
- Evaluation: If approved, your model will be evaluated on Mizan benchmark
- Results: You'll receive email notifications and results will appear on the leaderboard
Important Notes:
- Authentication Required: You must be logged in with Hugging Face to submit evaluations
- You'll receive email updates about your request status
- Make sure your model is publicly available on Hugging Face
- A valid email address is required to receive results
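Before submitting, you can verify that your model repository is publicly visible. The sketch below queries the public Hugging Face Hub API with the standard library; the repo id in the commented example is hypothetical, and this is only a convenience check, not part of the official submission flow.

```python
import json
import urllib.error
import urllib.request

def check_model_public(repo_id: str) -> bool:
    """Return True if the model repo is publicly visible on the Hugging Face Hub."""
    url = f"https://huggingface.co/api/models/{repo_id}"
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            info = json.load(resp)
        # Gated repos are listed but require accepting terms before download.
        return not info.get("gated", False)
    except urllib.error.HTTPError:
        # 401/404 responses mean the repo is private or does not exist.
        return False

# Example (hypothetical repo id):
# check_model_public("your-org/your-turkish-embedding-model")
```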
MTEB Turkish + Turkish Legal Dataset Overview
MTEB Turkish Task Details
| Task Type | Description | Domain | Size |
|---|---|---|---|
| Retrieval | Turkish FAQ retrieval task | FAQ/QA | ~145K |
| Retrieval | Turkish question answering retrieval | QA | ~1.19K |
| Retrieval | Historical Turkish document retrieval | Historical | ~1.33K |
| Retrieval | Multilingual knowledge QA retrieval | Knowledge QA | ~10K |
| Classification | Intent classification for Turkish | Intent | ~5K |
| Classification | Scenario classification for Turkish | Scenario | ~5K |
| Classification | Multilingual sentiment classification | Sentiment | 211 |
| Classification | SIB200 language identification | Language ID | ~899 |
| Classification | Turkish movie review sentiment | Movies | ~2.64K |
| Classification | Turkish product review sentiment | Products | 800 |
| Clustering | SIB200 clustering task | Language ID | 99 |
| PairClassification | Turkish natural language inference | NLI | ~7.5K |
| PairClassification | Enhanced Turkish NLI task | NLI | ~5.01K |
| STS | Turkish semantic textual similarity | STS | ~208 |
Turkish Legal Tasks
Turkish Legal Task Details
| Category | Description | Domain | Size |
|---|---|---|---|
| Contracts | Turkish legal question answering retrieval | Contracts | 272 |
| Regulation | Turkish legal tax rulings retrieval | Regulation | ~120K |
| Case Law | Turkish Court of Cassation caselaw retrieval | Caselaw | ~1.39K |
Task Distribution:
Turkish Tasks (14):
- Classification: 6 tasks (sentiment, intent, scenario, language identification)
- Retrieval: 4 tasks (FAQ, QA, historical documents, knowledge QA)
- Pair Classification: 2 tasks (natural language inference)
- Clustering: 1 task (language clustering)
- STS: 1 task (semantic textual similarity)
Turkish Legal Tasks (3):
- Contracts: 1 task (Turkish legal QA retrieval)
- Regulation: 1 task (Turkish tax rulings retrieval)
- Caselaw: 1 task (Turkish Court of Cassation case law retrieval)
Total: 17 tasks across 8 categories
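The distribution above can be sanity-checked in a few lines of Python; the category names simply follow the leaderboard's own grouping:

```python
# Task counts as listed in the distribution above.
turkish_tasks = {"Classification": 6, "Retrieval": 4, "PairClassification": 2,
                 "Clustering": 1, "STS": 1}
legal_tasks = {"Contracts": 1, "Regulation": 1, "Caselaw": 1}

total = sum(turkish_tasks.values()) + sum(legal_tasks.values())
categories = len(turkish_tasks) + len(legal_tasks)
print(total, categories)  # 17 tasks across 8 categories
```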
Dataset Statistics Summary
| Statistic | Value | Notes |
|---|---|---|
| Total Tasks | 17 tasks | Comprehensive evaluation: Turkish NLP + Legal |
| Turkish Tasks | 14 tasks | Classification, Retrieval, STS, NLI, Clustering |
| Legal Tasks | 3 tasks | Contracts, Regulation, Caselaw |
| Task Categories | 8 categories | Turkish: 5 types, Legal: 3 types |
| Languages | Turkish | Turkish-focused |
| Avg. Tokens per Sample | ~150 tokens | Varies by task type and domain |
Metrics Explanation:
Task Categories:
- MTEB Score: Average performance across task categories (i.e., Mean (TaskType))
- Mean (Task): Average performance across all individual tasks
- Classification: Performance on Turkish classification tasks
- Clustering: Performance on Turkish clustering tasks
- Pair Classification: Performance on pair classification tasks (like NLI)
- Retrieval: Performance on Turkish information retrieval tasks
- STS: Performance on Semantic Textual Similarity tasks
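The difference between Mean (Task) and the MTEB Score (Mean (TaskType)) matters when task types have unequal numbers of tasks. A minimal sketch with hypothetical task names and scores:

```python
from collections import defaultdict
from statistics import mean

# Hypothetical per-task scores, keyed by (task type, task name).
scores = {
    ("Classification", "IntentCls"): 70.0,
    ("Classification", "MovieSentiment"): 40.0,
    ("Retrieval", "TurkishFAQ"): 50.0,
    ("STS", "TurkishSTS"): 80.0,
}

# Mean (Task): simple average over all individual tasks.
mean_task = mean(scores.values())

# MTEB Score (Mean (TaskType)): average each task type first, then average the type means.
by_type = defaultdict(list)
for (task_type, _), score in scores.items():
    by_type[task_type].append(score)
mean_task_type = mean(mean(v) for v in by_type.values())

print(round(mean_task, 2), round(mean_task_type, 2))  # the two means differ
```

Because Classification contributes two tasks here, Mean (Task) weights it twice, while Mean (TaskType) weights every category equally.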
Turkish Legal Categories:
- Contracts: Performance on Turkish legal contract analysis tasks
- Regulation: Performance on Turkish legal regulation analysis tasks
- Caselaw: Performance on Turkish Court of Cassation case law retrieval tasks
Tokenizer Quality Metrics:
- Unique Token Count: Number of unique tokens generated by the tokenizer on Turkish MMLU dataset
- Turkish Token Count: How many unique tokens are valid Turkish words/morphemes
- Turkish Token %: Percentage of unique tokens that are linguistically valid Turkish
- Pure Token Count: How many unique tokens are morphologically pure (root words)
- Pure Token %: Percentage of unique tokens that are root words without suffixes
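These five metrics all derive from the set of unique tokens a tokenizer emits. The toy sketch below hard-codes the two validity predicates as small word sets for illustration; the real pipeline tokenizes the Turkish MMLU dataset and validates tokens with ITU NLP tools (see the linked tokenizer_benchmark repository).

```python
# Toy stand-ins for linguistic validation; the real pipeline uses ITU NLP tools.
TURKISH_WORDS = {"kitap", "kitaplar", "ev", "evde", "göz"}  # valid Turkish words/morphemes
ROOT_WORDS = {"kitap", "ev", "göz"}                         # suffix-free roots

def tokenizer_quality(tokens):
    """Compute the leaderboard's tokenizer quality metrics from a token list."""
    unique = set(tokens)
    turkish = {t for t in unique if t in TURKISH_WORDS}
    pure = {t for t in unique if t in ROOT_WORDS}
    return {
        "unique_token_count": len(unique),
        "turkish_token_count": len(turkish),
        "turkish_token_pct": 100 * len(turkish) / len(unique),
        "pure_token_count": len(pure),
        "pure_token_pct": 100 * len(pure) / len(unique),
    }

stats = tokenizer_quality(["kitap", "kitaplar", "ev", "evde", "##de", "kitap"])
print(stats)
```

Here `##de` counts toward Unique Token Count but neither Turkish nor Pure counts, and `kitaplar`/`evde` are valid Turkish but not roots.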
Model Information:
- Parameters: Number of model parameters
- Embed Dim: Embedding dimension size
- Max Seq Length: Maximum sequence length the model can process
- Vocab Size: Size of the model's vocabulary
- Model Architecture: The underlying model architecture
- Tokenizer Type: The tokenizer implementation used
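For BERT-style models, several of these columns map directly onto standard fields of a Hugging Face `config.json`. A sketch with an inline config dict (the values are hypothetical, and field names vary by architecture):

```python
# Hypothetical config.json fields for a BERT-style model; field names
# differ for other architectures (e.g. T5ForConditionalGeneration).
config = {
    "hidden_size": 768,               # -> Embed Dim
    "max_position_embeddings": 512,   # -> Max Seq Length
    "vocab_size": 128000,             # -> Vocab Size
    "architectures": ["BertModel"],   # -> Model Architecture
}

model_info = {
    "Embed Dim": config["hidden_size"],
    "Max Seq Length": config["max_position_embeddings"],
    "Vocab Size": config["vocab_size"],
    "Model Architecture": config["architectures"][0],
}
print(model_info)
```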
About Mizan:
This leaderboard presents results from the Mizan benchmark, which evaluates embedding models on Turkish language tasks across multiple domains including:
- Text classification and sentiment analysis
- Information retrieval and search
- Semantic textual similarity
- Text clustering and pair classification
- Turkish Legal: Contract analysis, regulation, and case law retrieval
Submit Your Model:
Use the Submit tab to submit your Turkish embedding model for evaluation. Your request will be reviewed by administrators, and you'll receive email notifications about its progress.
Contact:
For any questions or feedback, please contact info@newmind.ai
Links:
- GitHub: embeddings-benchmark/mteb v1.38.51 - Mizan is currently based on MTEB v1.38.51 (MTEB v2.0.0 support coming soon)
- GitHub: malibayram/tokenizer_benchmark - Tokenizer evaluation uses code from this repository, developed by Mehmet Ali Bayram, which relies on ITU NLP tools for Turkish linguistic analysis.