Toxic Comment Classification using Deep Learning
Multilingual toxic comment classifier fine-tuned from XLM-RoBERTa, covering 7 languages and 360k+ comments. Custom language-aware attention, 6 toxicity categories, AUC above 0.92 on the main categories. Deployed to Hugging Face with Streamlit and FastAPI interfaces.
Overview
A multilingual toxic comment classification system that identifies toxic content in 7 languages (English, Russian, Turkish, Spanish, French, Italian, Portuguese). It builds on a language-aware transformer (XLM-RoBERTa base) with custom attention mechanisms and classifies each comment into 6 toxicity categories.
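As a rough illustration of the final classification step, the sketch below turns six raw model outputs (logits) into per-category probabilities and flags. It assumes a multi-label setup in which each category is scored independently with a sigmoid. The category names and the 0.5 threshold are placeholders chosen for the example, not the project's confirmed configuration.

```python
import math

# Hypothetical category names, modeled on common toxicity datasets.
# The project's actual six labels are not listed in this write-up.
CATEGORIES = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]


def sigmoid(x: float) -> float:
    """Map a raw logit to a probability, avoiding overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def decode_logits(logits, threshold=0.5):
    """Turn one comment's logits into per-category probabilities and flags.

    Each category is scored independently, so a comment can carry
    several labels at once or none at all.
    """
    if len(logits) != len(CATEGORIES):
        raise ValueError(f"expected {len(CATEGORIES)} logits, got {len(logits)}")
    probs = {name: sigmoid(z) for name, z in zip(CATEGORIES, logits)}
    flagged = [name for name, p in probs.items() if p >= threshold]
    return probs, flagged


if __name__ == "__main__":
    # Made-up logits for a single comment, for illustration only.
    probs, flagged = decode_logits([2.1, -3.0, 0.4, -4.2, 1.3, -2.5])
    print(flagged)
```

In practice the threshold would likely be tuned per category on validation data, since rare categories tend to need a different cut-off than common ones.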
Key Features
- Language-aware transformer model with XLM-RoBERTa base
- Support for 7 languages, covering 360k+ comments
- Classification across 6 toxicity categories
- High performance with AUC scores above 0.92 for main categories
- Efficient data processing with caching system
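The caching idea from the list above can be sketched as a small content-addressed disk cache: a preprocessing result is stored under a hash of its inputs and reused on the next request. This is a minimal sketch under stated assumptions. The function names (`cache_key`, `cached_preprocess`) and the `version` tag are hypothetical, and it stores JSON files to stay within the standard library, whereas the project itself lists Parquet as its storage format.

```python
import hashlib
import json
from pathlib import Path


def cache_key(text: str, lang: str, version: str = "v1") -> str:
    """Derive a stable key from the input text, language, and pipeline version."""
    payload = json.dumps([version, lang, text], ensure_ascii=False).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def cached_preprocess(text, lang, preprocess, cache_dir, version="v1"):
    """Return preprocess(text, lang), reusing a result stored on disk if present.

    The result must be JSON-serializable. Changing `version` invalidates
    old entries, because it is part of the cache key.
    """
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    path = cache_dir / f"{cache_key(text, lang, version)}.json"
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    result = preprocess(text, lang)
    path.write_text(json.dumps(result, ensure_ascii=False), encoding="utf-8")
    return result
```

Keying on the language as well as the text matters in a multilingual setting, because the same string may be tokenized or normalized differently depending on the language it is tagged with.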
Stack
- Frontend: Streamlit, Gradio
- Backend: Python, FastAPI
- Database: Parquet
- Deployment: College GPU Server
- ML: PyTorch, XLM-RoBERTa, TensorFlow, ONNX
Challenges & Solutions
- Handling multilingual text processing efficiently
- Implementing language-aware attention mechanisms
- Optimizing model performance across different languages
- Balancing memory usage with model complexity
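One way to track performance across languages is to break the AUC metric down per language instead of reporting a single pooled score. The sketch below is an illustration under assumptions, not the project's evaluation code: the helper names are hypothetical, the rows are toy data rather than project results, and the pairwise AUC is quadratic in the sample size, so a library routine would be a better fit for a full evaluation set.

```python
from collections import defaultdict


def roc_auc(labels, scores):
    """ROC AUC as the chance that a random positive outranks a random negative.

    Ties count as half a win. Returns None when only one class is present,
    since AUC is undefined in that case.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        return None
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def auc_by_language(rows):
    """Group (lang, label, score) rows by language and compute AUC for each."""
    grouped = defaultdict(lambda: ([], []))
    for lang, label, score in rows:
        grouped[lang][0].append(label)
        grouped[lang][1].append(score)
    return {lang: roc_auc(ys, ss) for lang, (ys, ss) in grouped.items()}


if __name__ == "__main__":
    # Toy rows for one toxicity category: (language, true label, model score).
    rows = [
        ("en", 1, 0.9), ("en", 0, 0.2), ("en", 1, 0.6), ("en", 0, 0.7),
        ("tr", 1, 0.8), ("tr", 0, 0.3), ("tr", 0, 0.1),
    ]
    print(auc_by_language(rows))
```

A per-language breakdown like this makes it easier to spot a language that lags behind the pooled score, which a single aggregate number would hide.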