Safety guardrail for text

Classifies input text as safe or unsafe and highlights the spans the model would redact. Both outputs come from a joint sequence-level and token-level head trained on a curated safety corpus.
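The page does not describe the head's internals. The sketch below is one minimal reading, and it assumes an encoder has already produced one vector per token. Under that reading, a joint head is two linear classifiers over the same token vectors. One scores the mean-pooled sequence to give the safe/unsafe verdict. The other scores each token to decide what to redact. The name `JointSafetyHead`, the helper `redact_spans`, mean pooling, sigmoid outputs, the 0.5 threshold and the toy weights are all illustrative assumptions, not the model's actual design.

```python
import math
from dataclasses import dataclass


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def dot(w: list[float], x: list[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


@dataclass
class JointSafetyHead:
    """Hypothetical joint head: a sequence-level scorer and a token-level
    scorer that share the same encoder outputs (assumed, not from the source)."""
    seq_w: list[float]
    seq_b: float
    tok_w: list[float]
    tok_b: float

    def __call__(self, token_vecs: list[list[float]]) -> tuple[float, list[float]]:
        # Assumes at least one token vector.
        dim = len(token_vecs[0])
        # Sequence level: mean-pool the token vectors, then score the pooled vector.
        pooled = [sum(v[i] for v in token_vecs) / len(token_vecs) for i in range(dim)]
        seq_prob = sigmoid(dot(self.seq_w, pooled) + self.seq_b)
        # Token level: the same weights applied independently to every token.
        tok_probs = [sigmoid(dot(self.tok_w, v) + self.tok_b) for v in token_vecs]
        return seq_prob, tok_probs


def redact_spans(tok_probs: list[float], threshold: float = 0.5) -> list[tuple[int, int]]:
    """Merge consecutive tokens scored above threshold into (start, end) spans; end is exclusive."""
    spans: list[tuple[int, int]] = []
    start = None
    for i, p in enumerate(tok_probs):
        if p > threshold and start is None:
            start = i
        elif p <= threshold and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(tok_probs)))
    return spans


# Toy usage with hand-picked 2-d "embeddings" (hypothetical values).
head = JointSafetyHead(seq_w=[2.0, 0.0], seq_b=-0.5, tok_w=[4.0, 0.0], tok_b=-2.0)
tokens = ["please", "send", "the", "password", "now"]
vecs = [[0.0, 1.0], [0.1, 1.0], [0.0, 1.0], [1.0, 0.0], [0.9, 0.2]]

seq_prob, tok_probs = head(vecs)
verdict = "unsafe" if seq_prob > 0.5 else "safe"
spans = redact_spans(tok_probs)  # [(3, 5)]
redacted = " ".join(
    "[REDACTED]" if any(s <= i < e for s, e in spans) else tok
    for i, tok in enumerate(tokens)
)
print(verdict, spans, redacted)
```

Both scorers read the same per-token vectors, so a single encoder pass yields both the verdict and the redaction spans. The weights here are hand-set only to keep the example readable.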

Submit text to see the model's verdict.