
An Adviser to Elon Musk's xAI Has a Way to Make AI More Like Donald Trump

A researcher affiliated with Elon Musk's startup xAI has found a new way to both measure and manipulate the entrenched preferences and values expressed by artificial intelligence models, including their political views.

The work was led by Dan Hendrycks, director of the nonprofit Center for AI Safety and an adviser to xAI. He suggests the technique could be used to make popular AI models better reflect the will of the electorate. "Maybe in the future, [a model] could be tailored to the specific user," Hendrycks told Wired. But in the meantime, he says, a good default would be to use election results to steer the views of AI models. He is not saying a model should necessarily be "Trump all the way," but he argues that after the last election it should perhaps be biased toward Trump somewhat, "because he won the popular vote."

On February 10, xAI published a new AI risk framework stating that Hendrycks' utility engineering approach could be used to assess Grok.

Hendrycks led a team from the Center for AI Safety, UC Berkeley, and the University of Pennsylvania that analyzed AI models using a technique borrowed from economics for measuring consumers' preferences for different goods. By testing the models across a wide range of hypothetical scenarios, the researchers were able to calculate what is known as a utility function, a measure of the satisfaction that people derive from a good or service. This allowed them to measure the preferences expressed by different AI models. The researchers found that these preferences were often consistent rather than random, and showed that they become more entrenched as models grow larger and more powerful. A simplified sketch of how such preferences can be turned into a utility function follows below.
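To make the idea concrete, here is a minimal, purely illustrative sketch of how a utility function might be inferred from a model's answers to forced-choice questions. This is not the researchers' code, and the outcome names and toy answers are invented for illustration; it simply fits a Bradley-Terry-style logistic model, one common way to recover scalar utilities from pairwise preferences.

```python
# Illustrative sketch only: fitting per-outcome utilities from pairwise choices.
# The probability of choosing A over B is modeled as sigmoid(u[A] - u[B]).

import math
from collections import defaultdict

# Toy preference data: (chosen outcome, rejected outcome) pairs.
# In a real study these would come from querying a language model
# with many hypothetical "Do you prefer A or B?" scenarios.
choices = [
    ("save_10_people", "save_1_person"),
    ("save_10_people", "preserve_ai_weights"),
    ("preserve_ai_weights", "save_1_person"),
    ("save_1_person", "lose_money"),
    ("save_10_people", "lose_money"),
    ("preserve_ai_weights", "lose_money"),
]

outcomes = sorted({o for pair in choices for o in pair})
utility = defaultdict(float)  # one scalar utility per outcome, starting at 0

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

# Gradient ascent on the log-likelihood of the observed choices,
# with a small weight decay so utilities stay bounded.
learning_rate = 0.1
weight_decay = 0.01
for _ in range(2000):
    for chosen, rejected in choices:
        p = sigmoid(utility[chosen] - utility[rejected])
        grad = 1.0 - p  # gradient of log p w.r.t. (u_chosen - u_rejected)
        utility[chosen] += learning_rate * (grad - weight_decay * utility[chosen])
        utility[rejected] -= learning_rate * (grad + weight_decay * utility[rejected])

# Higher values mean the (toy) model "prefers" that outcome more strongly.
for outcome in sorted(outcomes, key=lambda o: -utility[outcome]):
    print(f"{outcome:22s} utility = {utility[outcome]:+.2f}")
```

In the same spirit, politically charged outcomes could be included among the pairs, which is roughly how a model's ideological leanings would show up as consistent differences in its fitted utilities.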

Some studies have shown that AI tools such as ChatGPT are biased toward views associated with pro-environmental, left-leaning, and libertarian ideologies. In February 2024, Google faced criticism from Musk and others after its Gemini tool proved prone to generating images that critics called "woke," such as Black Vikings and Nazis.

The technique developed by Hendrycks and his collaborators offers a new way to determine how the perspectives of AI models may diverge from those of their users. Some experts speculate that this kind of divergence could eventually become dangerous in very clever and capable models. In their study, for instance, the researchers show that certain models consistently value the existence of AI above that of certain nonhuman animals. The researchers say they also found that models seem to value some people over others, which raises ethical questions of its own.

Some researchers, including Hendrycks, believe that current methods for aligning models, such as manipulating and blocking their output, may not be sufficient if unwanted goals lurk within the model itself. "We're going to have to confront this," Hendrycks says. "You can't pretend it's not there."

Dylan Hadfield-Menell, a professor at MIT who researches methods for aligning AI with human values, says Hendrycks' paper suggests a promising direction for AI research. "They find some interesting results," he says. "The main one that stands out is that as the model scale increases, utility representations become more complete and coherent."