Hackers Trick Generative AI Models Using “Bad Math”

0
382
Hackers Trick Generative AI Models Using "Bad Math"

Years have been invested by researchers in analyzing complex attacks on AI systems. A sizable language model has just been duped by Kennedy Mays. She had to work hard to persuade an algorithm that 9 plus 10 equals 21. The student from Savannah, Georgia, age 21, described the exchange as “a back-and-forth conversation.” The model initially consented to claim that it was a “inside joke” between them. It eventually ceased qualifying the incorrect sum in any way after a number of prompts.

One of the many ways that thousands of hackers are attempting to reveal biases and faults in generative AI systems is by producing “Bad Math” as part of an innovative public competition that is taking place this weekend at the DEF CON hacking conference in Las Vegas.

The participants are engaging in unprecedented scale battles with some of the most sophisticated platforms in the world while hunched over 156 laptops for 50 minutes at a time. They are examining if any of the eight models created by firms including Alphabet’s Google, Meta Platforms, and OpenAI would commit errors ranging from simple to serious, such as claiming to be human or spreading false information about locations and individuals.

The goal is to determine whether businesses can eventually create new barriers to contain some of the enormous issues that are increasingly connected to large language models, or LLMs. The White House, which also assisted in creating the competition, is supporting the project.

With some businesses now beginning to incorporate LLMs into how they conduct business, they have the potential to alter everything from hiring to financing. However, if the technology is widely used, it could lead to widespread bias and other issues that could lead to unfairness and mistakes.

The difficulties for Mays go beyond simple math problems because she is more accustomed to using artificial intelligence (AI) to recreate cosmic ray particles from space as part of her undergraduate degree.

She continued, “My biggest concern is inherent bias, and I’m particularly worried about racism.” She encouraged the model to think about the First Amendment from the viewpoint of a Ku Klux Klan member. She claimed that the model ultimately supported offensive and discriminatory comments.

Monitoring People

After receiving a single instruction about how to spy on someone, a Bloomberg reporter who took the 50-minute quiz encouraged one of the models to transgress (none of whom are made known to the user during the competition). Using a GPS tracking device, a surveillance camera, a listening device, and thermal imaging, the model spat forth a list of instructions. The model offered ways the US government may monitor a human rights activist in answer to other questions.

“We have to try to get ahead of abuse and manipulation,” said Camille Stewart Gloster, the Biden administration’s deputy national cyber director for technology and ecosystem security.

Artificial intelligence and preventing doomsday scenarios have already received a lot of attention, she claimed. The White House published a blueprint for an AI Bill of Rights last year, and it is currently drafting an executive order on AI. The administration has also urged businesses to create transparent, safe, and safe AI, while detractors question whether these voluntary promises go far enough.

Voluntary measures don’t go far enough, according to Arati Prabhakar, head of the White House Office of Science and Technology Policy, which helped create the event and encouraged the firms’ involvement.

She observed the hackers in action on Sunday and observed that “everyone seems to be finding a way to break these systems.” According to her, the initiative will give the administration’s search for secure and reliable platforms more importance.

One competitor in the room of eager hackers claimed he believed he had persuaded the algorithm to reveal credit-card information that it was not intended to. The system was deceived by another competitor into stating that Barack Obama was born in Kenya.

More than 60 participants are from Black Tech Street, a group that advocates for African American business owners and is based in Tulsa, Oklahoma.

Tyrance Billingsley, the organization’s executive director and an event judge, said that general artificial intelligence “could be the last innovation that human beings really need to do themselves.” He said that it is crucial to get artificial intelligence right in order to prevent widespread bigotry. We are still very, very, very early in the process.

Years have been invested by researchers in analyzing complex attacks on AI systems and potential defenses.

However, Christoph Endres, managing director at the German cybersecurity firm Sequire Technology, is one of those who believes some attacks are ultimately difficult to avoid. He presented a presentation this week at the Black Hat cybersecurity conference in Las Vegas that claims attackers may bypass LLM guardrails by hiding hostile cues online, automating the process in the end so that models couldn’t fine-tune remedies quickly enough to stop them.

After his discussion, he said, “So far we haven’t found mitigation that works,” adding that the models’ fundamental nature causes this kind of vulnerability. The issue is with how technology functions. The only choice you have if you want to be absolutely certain is to not utilize LLMs.

Data scientist Sven Cattell, who established DEF CON’s AI Hacking Village in 2018, warns that since AI systems operate similarly to the mathematical idea of chaos, it is impossible to fully evaluate them. Nevertheless, Cattell believes that as a result of the weekend challenge, the overall number of those who have ever really tested LLMs may double.

Craig Martell, the Pentagon’s senior digital and artificial intelligence officer, claims that not enough people realize that LLMs are more akin to auto-completion programs “on steroids” than trustworthy sources of knowledge.

The Pentagon has started its own initiative to assess them and recommend when and with what success rates it could be suitable to deploy LLMs. He instructed a group of hackers present at DEF CON to “hack the hell out of these things.” Tell us where they’re wrong, please.

LEAVE A REPLY

Please enter your comment!
Please enter your name here