🇹🇷 Mizan Turkish Evaluation Leaderboard
Performance comparison for Turkish embedding models
Tokenizer Quality Visualizations
Interactive bubble plots show tokenizer quality metrics versus model performance; bubble size and color represent Turkish Token Count. Hover for details, and zoom or pan to explore.
How to Use:
- Search: Use the search box to find specific models
- Color Coding: Scores are color-coded from red (low) to green (high)
- Sorting: Click on column headers to sort
- Rankings: Models ranked by MTEB Score
- Toggle Columns: Use the checkboxes above to show/hide additional metrics
- Filter by Model Type: Use the radio buttons to filter models by their type
Submit Model for Evaluation
Submit your Turkish embedding model for evaluation on the Mizan benchmark. Authentication with Hugging Face is required to submit evaluations.
Evaluation Process:
- Sign In: First, sign in with your Hugging Face account using the button above
- Submit Request: Fill out the form with your model details and email
- Admin Review: Your request will be reviewed by administrators
- Evaluation: If approved, your model will be evaluated on Mizan benchmark
- Results: You'll receive email notifications and results will appear on the leaderboard
Important Notes:
- Authentication Required: You must be logged in with Hugging Face to submit evaluations
- You'll receive email updates about your request status
- Make sure your model is publicly available on Hugging Face
- A valid email address is required to receive results
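Before submitting, you can verify that your model repository is publicly visible. The sketch below queries the public Hugging Face Hub API with the standard library; the repo id in the commented example is hypothetical, and this is only a convenience check, not part of the official submission flow.

```python
import json
import urllib.error
import urllib.request

def check_model_public(repo_id: str) -> bool:
    """Return True if the model repo is publicly visible on the Hugging Face Hub."""
    url = f"https://huggingface.co/api/models/{repo_id}"
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            info = json.load(resp)
        # Gated repos are listed but require accepting terms before download.
        return not info.get("gated", False)
    except urllib.error.HTTPError:
        # 401/404 responses mean the repo is private or does not exist.
        return False

# Example (hypothetical repo id):
# check_model_public("your-org/your-turkish-embedding-model")
```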
MTEB Turkish + Turkish Legal Dataset Overview
MTEB Turkish Task Details
| Task Type | Description | Domain | Size |
|---|---|---|---|
| Retrieval | Turkish FAQ retrieval task | FAQ/QA | ~145K |
| Retrieval | Turkish question answering retrieval | QA | ~1.19K |
| Retrieval | Historical Turkish document retrieval | Historical | ~1.33K |
| Retrieval | Multilingual knowledge QA retrieval | Knowledge QA | ~10K |
| Classification | Intent classification for Turkish | Intent | ~5K |
| Classification | Scenario classification for Turkish | Scenario | ~5K |
| Classification | Multilingual sentiment classification | Sentiment | 211 |
| Classification | SIB200 language identification | Language ID | ~899 |
| Classification | Turkish movie review sentiment | Movies | ~2.64K |
| Classification | Turkish product review sentiment | Products | 800 |
| Clustering | SIB200 clustering task | Language ID | 99 |
| PairClassification | Turkish natural language inference | NLI | ~7.5K |
| PairClassification | Enhanced Turkish NLI task | NLI | ~5.01K |
| STS | Turkish semantic textual similarity | STS | ~208 |
Turkish Legal Tasks
Turkish Legal Task Details
| Category | Description | Domain | Size |
|---|---|---|---|
| Contracts | Turkish legal question answering retrieval | Contracts | 272 |
| Regulation | Turkish legal tax rulings retrieval | Regulation | ~120K |
| Case Law | Turkish Court of Cassation caselaw retrieval | Caselaw | ~1.39K |
Task Distribution:
Turkish Tasks (14):
- Classification: 6 tasks (sentiment, intent, scenario, language identification)
- Retrieval: 4 tasks (FAQ, QA, historical documents, knowledge QA)
- Pair Classification: 2 tasks (natural language inference)
- Clustering: 1 task (language clustering)
- STS: 1 task (semantic textual similarity)
Turkish Legal Tasks (3):
- Contracts: 1 task (Turkish legal QA retrieval)
- Regulation: 1 task (Turkish tax rulings retrieval)
- Caselaw: 1 task (Turkish Court of Cassation case law retrieval)
Total: 17 tasks across 8 categories
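The distribution above can be sanity-checked in a few lines of Python; the category names simply follow the leaderboard's own grouping:

```python
# Task counts as listed in the distribution above.
turkish_tasks = {"Classification": 6, "Retrieval": 4, "PairClassification": 2,
                 "Clustering": 1, "STS": 1}
legal_tasks = {"Contracts": 1, "Regulation": 1, "Caselaw": 1}

total = sum(turkish_tasks.values()) + sum(legal_tasks.values())
categories = len(turkish_tasks) + len(legal_tasks)
print(total, categories)  # 17 tasks across 8 categories
```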
Dataset Statistics Summary
| Statistic | Value | Notes |
|---|---|---|
| Total Tasks | 17 tasks | Comprehensive evaluation: Turkish NLP + Legal |
| Turkish Tasks | 14 tasks | Classification, Retrieval, STS, NLI, Clustering |
| Legal Tasks | 3 tasks | Contracts, Regulation, Caselaw |
| Task Categories | 8 categories | Turkish: 5 types, Legal: 3 types |
| Languages | Turkish | Turkish-focused |
| Avg. Tokens per Sample | ~150 tokens | Varies by task type and domain |
Metrics Explanation:
Task Categories:
- MTEB Score: Average performance across task categories (i.e., Mean (TaskType))
- Mean (Task): Average performance across all individual tasks
- Classification: Performance on Turkish classification tasks
- Clustering: Performance on Turkish clustering tasks
- Pair Classification: Performance on pair classification tasks (like NLI)
- Retrieval: Performance on Turkish information retrieval tasks
- STS: Performance on Semantic Textual Similarity tasks
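The difference between Mean (Task) and the MTEB Score (Mean (TaskType)) matters when task types have unequal numbers of tasks. A minimal sketch with hypothetical task names and scores:

```python
from collections import defaultdict
from statistics import mean

# Hypothetical per-task scores, keyed by (task type, task name).
scores = {
    ("Classification", "IntentCls"): 70.0,
    ("Classification", "MovieSentiment"): 40.0,
    ("Retrieval", "TurkishFAQ"): 50.0,
    ("STS", "TurkishSTS"): 80.0,
}

# Mean (Task): simple average over all individual tasks.
mean_task = mean(scores.values())

# MTEB Score (Mean (TaskType)): average each task type first, then average the type means.
by_type = defaultdict(list)
for (task_type, _), score in scores.items():
    by_type[task_type].append(score)
mean_task_type = mean(mean(v) for v in by_type.values())

print(round(mean_task, 2), round(mean_task_type, 2))  # the two means differ
```

Because Classification contributes two tasks here, Mean (Task) weights it twice, while Mean (TaskType) weights every category equally.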
Turkish Legal Categories:
- Contracts: Performance on Turkish legal contract analysis tasks
- Regulation: Performance on Turkish legal regulation analysis tasks
- Caselaw: Performance on Turkish Court of Cassation case law retrieval tasks
Tokenizer Quality Metrics:
- Unique Token Count: Number of unique tokens generated by the tokenizer on Turkish MMLU dataset
- Turkish Token Count: How many unique tokens are valid Turkish words/morphemes
- Turkish Token %: Percentage of unique tokens that are linguistically valid Turkish
- Pure Token Count: How many unique tokens are morphologically pure (root words)
- Pure Token %: Percentage of unique tokens that are root words without suffixes
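These five metrics all derive from the set of unique tokens a tokenizer emits. The toy sketch below hard-codes the two validity predicates as small word sets for illustration; the real pipeline tokenizes the Turkish MMLU dataset and validates tokens with ITU NLP tools (see the linked tokenizer_benchmark repository).

```python
# Toy stand-ins for linguistic validation; the real pipeline uses ITU NLP tools.
TURKISH_WORDS = {"kitap", "kitaplar", "ev", "evde", "göz"}  # valid Turkish words/morphemes
ROOT_WORDS = {"kitap", "ev", "göz"}                         # suffix-free roots

def tokenizer_quality(tokens):
    """Compute the leaderboard's tokenizer quality metrics from a token list."""
    unique = set(tokens)
    turkish = {t for t in unique if t in TURKISH_WORDS}
    pure = {t for t in unique if t in ROOT_WORDS}
    return {
        "unique_token_count": len(unique),
        "turkish_token_count": len(turkish),
        "turkish_token_pct": 100 * len(turkish) / len(unique),
        "pure_token_count": len(pure),
        "pure_token_pct": 100 * len(pure) / len(unique),
    }

stats = tokenizer_quality(["kitap", "kitaplar", "ev", "evde", "##de", "kitap"])
print(stats)
```

Here `##de` counts toward Unique Token Count but neither Turkish nor Pure counts, and `kitaplar`/`evde` are valid Turkish but not roots.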
Model Information:
- Parameters: Number of model parameters
- Embed Dim: Embedding dimension size
- Max Seq Length: Maximum sequence length the model can process
- Vocab Size: Size of the model's vocabulary
- Model Architecture: The underlying model architecture
- Tokenizer Type: The tokenizer implementation used
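For BERT-style models, several of these columns map directly onto standard fields of a Hugging Face `config.json`. A sketch with an inline config dict (the values are hypothetical, and field names vary by architecture):

```python
# Hypothetical config.json fields for a BERT-style model; field names
# differ for other architectures (e.g. T5ForConditionalGeneration).
config = {
    "hidden_size": 768,               # -> Embed Dim
    "max_position_embeddings": 512,   # -> Max Seq Length
    "vocab_size": 128000,             # -> Vocab Size
    "architectures": ["BertModel"],   # -> Model Architecture
}

model_info = {
    "Embed Dim": config["hidden_size"],
    "Max Seq Length": config["max_position_embeddings"],
    "Vocab Size": config["vocab_size"],
    "Model Architecture": config["architectures"][0],
}
print(model_info)
```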
About Mizan:
This leaderboard presents results from the Mizan benchmark, which evaluates embedding models on Turkish language tasks across multiple domains including:
- Text classification and sentiment analysis
- Information retrieval and search
- Semantic textual similarity
- Text clustering and pair classification
- Turkish Legal: Contract analysis, regulation, and case law retrieval
Submit Your Model:
Use the Submit tab to submit your Turkish embedding model for evaluation. Your request will be reviewed by administrators, and you'll receive email notifications about its progress.
Contact:
For any questions or feedback, please contact info@newmind.ai
Links:
- GitHub: embeddings-benchmark/mteb v1.38.51 - Mizan is currently based on MTEB v1.38.51 (MTEB v2.0.0 support coming soon)
- GitHub: malibayram/tokenizer_benchmark - Tokenizer evaluation uses code from this repository, developed by Mehmet Ali Bayram, which relies on ITU NLP tools for Turkish linguistic analysis.