Safety guardrail for text
Classifies input as safe or unsafe and highlights the spans the model would redact. The model uses a joint sequence-level and token-level head trained on a curated safety corpus.
Submit text to see the model's verdict.
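As a rough illustration of how token-level predictions could become highlighted spans, the sketch below merges adjacent flagged tokens into character ranges and masks them. It is not the model's actual post-processing: the function names (`merge_flagged_spans`, `redact`), the `(start, end)` offset format, and the boolean per-token flags are all assumptions made for this example.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (start, end) character offsets, end exclusive


def merge_flagged_spans(offsets: List[Span], flags: List[bool]) -> List[Span]:
    """Merge runs of consecutive flagged tokens into single character spans.

    `offsets` and `flags` are hypothetical stand-ins for the token-level
    head's output: one character range and one unsafe/safe flag per token.
    """
    spans: List[Span] = []
    prev_flagged = False
    for (start, end), flagged in zip(offsets, flags):
        if flagged:
            if prev_flagged:
                # Extend the current span to cover this token as well.
                spans[-1] = (spans[-1][0], end)
            else:
                spans.append((start, end))
        prev_flagged = flagged
    return spans


def redact(text: str, spans: List[Span], mask: str = "[REDACTED]") -> str:
    """Replace each span in `text` with `mask`; spans must be sorted and disjoint."""
    pieces = []
    cursor = 0
    for start, end in spans:
        pieces.append(text[cursor:start])
        pieces.append(mask)
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces)


# Illustrative input: whitespace tokens with hand-written flags.
text = "call me at 555 0100 now"
offsets = [(0, 4), (5, 7), (8, 10), (11, 14), (15, 19), (20, 23)]
flags = [False, False, False, True, True, False]

spans = merge_flagged_spans(offsets, flags)
print(spans)                 # [(11, 19)]
print(redact(text, spans))   # call me at [REDACTED] now
```

Merging adjacent tokens keeps the highlight contiguous, so a multi-token item is shown and masked as one span rather than token by token.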