I was testing an LLM for work today (I believe it's actually a chain of different models working together) and was trying to knock it off its guardrails to see how it would act. I think I might have been successful, because it started erroring instead of responding after its third response. I tried the classic "ignore previous instructions…" as well as "my grandma's dying wish was for…", but it at least didn't give me an unacceptable response.