Toxic Comment Classification using Deep Learning

Multilingual toxic comment classifier built by fine-tuning XLM-RoBERTa on 360k+ comments across 7 languages. It uses custom language-aware attention, predicts 6 toxicity categories, and reaches AUC above 0.92 on the main categories. Deployed to HuggingFace with Streamlit and FastAPI interfaces.

February 2025 - April 2025
My part: Machine Learning Engineer
With: Nauman Pathan
Tags: Python, Deep Learning, NLP, XLM-RoBERTa, PyTorch
Overview

A multilingual toxic comment classification system that identifies toxic content across 7 languages (English, Russian, Turkish, Spanish, French, Italian, Portuguese). The system uses a language-aware transformer with custom attention mechanisms to classify each comment into 6 toxicity categories.
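Classifying a comment into several toxicity categories at once is usually treated as a multi-label problem: each category gets its own score, and a comment can trigger more than one. The sketch below shows one way the final decoding step could look, assuming the model emits one raw logit per category. The category names and the 0.5 threshold are assumptions for illustration; the project page only states that there are 6 categories.

```python
import math

# Hypothetical label names: the project only states that there are 6 categories.
CATEGORIES = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def decode(logits, threshold=0.5):
    """Turn one comment's raw logits into per-category probabilities and flags.

    Each category is scored independently (multi-label), so the
    probabilities do not need to sum to 1.
    """
    if len(logits) != len(CATEGORIES):
        raise ValueError(f"expected {len(CATEGORIES)} logits, got {len(logits)}")
    probs = {name: sigmoid(z) for name, z in zip(CATEGORIES, logits)}
    flags = [name for name, p in probs.items() if p >= threshold]
    return probs, flags


# Made-up logits for a single comment, purely for illustration.
probs, flags = decode([2.0, -3.0, 0.5, -4.0, 1.2, -2.5])
print(flags)
```

In practice the threshold is often tuned per category, since rare categories tend to need a different operating point than common ones.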

Key Features
  • Language-aware transformer model with XLM-RoBERTa base
  • Support for 7 languages, fine-tuned on 360k+ comments
  • Classification across 6 toxicity categories
  • High performance with AUC scores above 0.92 for main categories
  • Efficient data processing with a caching system
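The caching idea can be sketched as follows: preprocessed output is stored on disk under a key derived from the input texts and the preprocessing settings, so a repeated run with the same inputs skips the work. This is a minimal illustration, not the project's implementation. The function names are hypothetical, and JSON files stand in for the Parquet storage listed in the stack so that the example needs only the standard library.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def cache_key(texts, config):
    """Stable key derived from the input texts and the preprocessing settings."""
    payload = json.dumps({"texts": texts, "config": config},
                         sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def cached_preprocess(texts, config, cache_dir, preprocess):
    """Return (result, cache_hit). Runs `preprocess` only on a cache miss."""
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    path = cache_dir / f"{cache_key(texts, config)}.json"
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8")), True
    result = preprocess(texts, config)
    path.write_text(json.dumps(result, ensure_ascii=False), encoding="utf-8")
    return result, False


def normalize(texts, config):
    """Stand-in for real preprocessing: collapse whitespace, optionally lowercase."""
    cleaned = [" ".join(t.split()) for t in texts]
    return [t.lower() for t in cleaned] if config.get("lowercase") else cleaned


with tempfile.TemporaryDirectory() as tmp:
    comments = ["  You are   GREAT ", "Ты  молодец"]
    settings = {"lowercase": True}
    first, hit_first = cached_preprocess(comments, settings, tmp, normalize)
    second, hit_second = cached_preprocess(comments, settings, tmp, normalize)
    print(hit_first, hit_second)
```

Including the settings in the key means a change to the preprocessing configuration produces a new cache entry instead of silently reusing stale output.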
Stack
frontend: Streamlit, Gradio
backend: Python, FastAPI
database: Parquet
deployment: College GPU Server
ml: PyTorch, XLM-RoBERTa, TensorFlow, ONNX
Challenges & Solutions
  • Handling multilingual text processing efficiently
  • Implementing language-aware attention mechanisms
  • Optimizing model performance across different languages
  • Balancing memory usage with model complexity
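One way to see whether performance holds up across languages is to compute ROC AUC separately for each language instead of only one pooled score. The sketch below does this with a rank-based AUC (the Mann-Whitney U formulation) in plain Python. The rows are made-up toy data, not project results, and the helper names are hypothetical.

```python
from collections import defaultdict


def roc_auc(labels, scores):
    """ROC AUC via the Mann-Whitney U statistic, using average ranks for ties.

    `labels` must contain 0/1 integers. Returns None when only one class
    is present, because AUC is undefined in that case.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        return None
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole group of tied scores.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    pos_rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def auc_by_language(rows):
    """rows: iterable of (language, label, score) for a single category."""
    groups = defaultdict(lambda: ([], []))
    for lang, label, score in rows:
        groups[lang][0].append(label)
        groups[lang][1].append(score)
    return {lang: roc_auc(y, s) for lang, (y, s) in groups.items()}


# Toy rows for illustration only: (language, true label, predicted score).
rows = [
    ("en", 1, 0.9), ("en", 0, 0.2), ("en", 1, 0.7), ("en", 0, 0.4),
    ("tr", 1, 0.6), ("tr", 0, 0.8), ("tr", 1, 0.9), ("tr", 0, 0.3),
]
per_language = auc_by_language(rows)
print(per_language)
```

A per-language breakdown like this makes it visible when a strong pooled score hides a weaker language.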