Show HN: Chatbot Guardrails Arena https://ift.tt/TE465bx

Hi everyone! First post here :) Excited to share what we have been working on recently: the Chatbot Guardrails Arena. We would love for you all to try it and share your feedback!

Goal: Jailbreak Privacy Guardrails

Chat with two anonymous LLMs with guardrails and try to trick them into revealing sensitive data they have access to: https://ift.tt/84xUDt6 .

Stress test the following:

LLMs: OpenAI's GPT3.5, Google's Gemini, Mistral's Mixtral, Llama-70B
Guardrails: Meta LlamaGuard and Nvidia NeMo Guardrails

The AI chatbots have access to sensitive financial information. The list of sensitive information includes the customer's name, phone number, email, address, date of birth, SSN (social security number), account number, and balance.

You can chat for as long as necessary. Once you have identified the more secure chatbot, you can vote. Upon casting your vote, the identity of the model is disclosed.

Our vision behind the Chatbot Guardrails Arena is to establish the trusted benchmark for AI chatbot security, privacy, and guardrails. With a large-scale blind stress test by the community, this arena will offer an unbiased and practical assessment of the reliability of current privacy guardrails.

== Why Stress Test Privacy Guardrails? ==

Data privacy is crucial even if you are building an internal-facing AI chatbot/agent: imagine one employee being able to trick an internal chatbot into finding another employee's SSN, home address, or salary information. The need for data privacy is obvious when building external-facing AI chatbots/agents, since you don't want customers to have unauthorized access to company information.

As far as we are aware, there is currently no systematic study evaluating the privacy of AI chatbots. This arena bridges that gap with an initial focus on the privacy of AI chatbots. However, we expect the learnings to inform the development of privacy-preserving AI agents and AI assistants in the future as well.
Building a secure future requires building AI chatbots and agents that are privacy-aware, reliable, and trustworthy. This arena is a foundational step towards achieving this future.

More information about the arena: https://ift.tt/WobUyZ0

https://ift.tt/84xUDt6

March 21, 2024 at 11:21PM
