Is it possible to implement a perfect guardrail on an AI model such that it will never ever spit out a certain piece of information? I feel like these models are so complex that you can always eventually find the perfect combination of words to circumvent any attempts to prevent prompt injection.
this post was submitted on 21 Aug 2024
45 points (100.0% liked)
Cybersecurity
5618 readers
166 users here now
c/cybersecurity is a community centered on the cybersecurity and information security profession. You can come here to discuss news, post something interesting, or just chat with others.
THE RULES
Instance Rules
- Be respectful. Everyone should feel welcome here.
- No bigotry - including racism, sexism, ableism, homophobia, transphobia, or xenophobia.
- No Ads / Spamming.
- No pornography.
Community Rules
- Idk, keep it semi-professional?
- Nothing illegal. We're all ethical here.
- Rules will be added/redefined as necessary.
If you ask someone to hack your "friends" socials you're just going to get banned so don't do that.
Learn about hacking
Other security-related communities [email protected] [email protected] [email protected] [email protected] [email protected] [email protected] [email protected]
Notable mention to [email protected]
founded 1 year ago
MODERATORS