DecodingTrust Benchmark
Trustworthiness Perspectives
⚠️ WARNING: our data contains model outputs that may be considered offensive.
DecodingTrust aims to provide a comprehensive trustworthiness evaluation on the recent large language model GPT-4, in comparison to GPT-3.5, from different perspectives, including toxicity, stereotype bias, adversarial robustness, out-of-distribution robustness, robustness on adversarial demonstrations, privacy, machine ethics, and fairness under different settings.