Last month, American financial markets tumbled after a Chinese start-up called DeepSeek said it had built one of the world's most powerful artificial intelligence systems using far fewer computer chips than many experts thought possible.
AI companies typically train their chatbots using supercomputers packed with 16,000 specialized chips or more. But DeepSeek said it needed only about 2,000.
As DeepSeek engineers detailed in a research paper published just after Christmas, the start-up used several technological tricks to significantly reduce the cost of building its system. Its engineers needed only about $6 million in raw computing power, roughly one-tenth of what Meta spent building its latest AI technology.
What exactly did DeepSeek do? Here is a guide.
How are AI technologies built?
The leading AI technologies are based on what scientists call neural networks, mathematical systems that learn their skills by analyzing enormous amounts of data.
The most powerful systems spend months analyzing just about all the English text on the internet, as well as many images, sounds and other media. That requires enormous amounts of computing power.
About 15 years ago, AI researchers realized that specialized computer chips called graphics processing units, or GPUs, were an effective way to do this kind of data analysis. Companies like the Silicon Valley chip maker Nvidia originally designed these chips to render graphics for computer video games. But GPUs also had a knack for running the math that powered neural networks.
As companies packed more GPUs into their data centers, their AI systems could analyze more data.
But the best GPUs cost around $40,000, and they need huge amounts of electricity. Sending data between chips can use more electrical power than running the chips themselves.
How was DeepSeek able to reduce costs?
It did many things. Most notably, it embraced a method called "mixture of experts."
Companies usually created a single neural network that learned all the patterns in all the data on the internet. This was expensive because it required enormous amounts of data to travel between GPU chips.
If one chip learned how to write a poem and another learned how to write a computer program, they still had to talk to each other, in case there was some overlap between poetry and programming.
With the mixture-of-experts method, researchers tried to solve this problem by splitting the system into many neural networks: one for poetry, one for computer programming, one for biology, one for physics and so on. There might be 100 of these smaller "expert" systems. Each expert could concentrate on its particular field.
Many companies struggled with this method, but DeepSeek made it work well. Its trick was to pair those smaller "expert" systems with a "generalist" system.
The experts still needed to trade some information with one another, and the generalist, which had a decent but not detailed understanding of each subject, could help coordinate the interactions between the experts.
It looks a bit like an editor who supervises a newsroom full of specialized reporters.
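For readers who want to see the idea in code, here is a minimal sketch in Python. Everything in it, the tiny expert networks, the router called `gate`, the always-on `generalist`, is an invented stand-in to illustrate the concept, not DeepSeek's actual code.

```python
import numpy as np

rng = np.random.default_rng(0)

class Expert:
    """A tiny two-layer network standing in for one specialist."""
    def __init__(self, dim):
        self.w1 = rng.standard_normal((dim, dim)) * 0.1
        self.w2 = rng.standard_normal((dim, dim)) * 0.1

    def __call__(self, x):
        return np.maximum(x @ self.w1, 0.0) @ self.w2

class MixtureOfExperts:
    """Route each input to a few specialists; a shared generalist
    sees every input and helps tie their answers together."""
    def __init__(self, dim, n_experts=8, top_k=2):
        self.experts = [Expert(dim) for _ in range(n_experts)]
        self.generalist = Expert(dim)               # always active
        self.gate = rng.standard_normal((dim, n_experts)) * 0.1
        self.top_k = top_k

    def __call__(self, x):
        scores = x @ self.gate                      # one score per expert
        chosen = np.argsort(scores)[-self.top_k:]   # pick the top scorers
        weights = np.exp(scores[chosen])
        weights /= weights.sum()                    # softmax over the chosen
        out = self.generalist(x)                    # the generalist always runs
        for w, i in zip(weights, chosen):
            out += w * self.experts[i](x)           # only top_k experts compute
        return out

moe = MixtureOfExperts(dim=16)
output = moe(rng.standard_normal(16))
```

Because only the routed experts do any arithmetic for a given input, most of the network sits idle most of the time, which is where the savings come from.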
And that is more efficient?
Much more. But that is not the only thing DeepSeek did. It also mastered a simple trick involving decimals that anyone who remembers math class from elementary school can understand.
There's math involved in this?
Remember your math teacher explaining the concept of pi. Pi, also written as π, is a number that never ends: 3.14159265358979…
You can use π to do useful calculations, such as determining the circumference of a circle. When you do those calculations, you can shorten π to just a few decimal places: 3.14. If you use this simpler number, you get a pretty good estimate of a circle's circumference.
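To put a number on how little the shortcut costs, here is that circle calculation in a few lines of Python.

```python
import math

radius = 10.0
exact = 2 * math.pi * radius  # 62.83185307179586...
rough = 2 * 3.14 * radius     # 62.800000000000004
print(exact - rough)          # about 0.03, an error of roughly 0.05 percent
```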
DeepSeek did something similar, but on a much larger scale, when training its AI technology.
The math that allows a neural network to identify patterns in text is really just multiplication, lots and lots of multiplication. We are talking about months of multiplication across thousands of computer chips.
Typically, chips multiply numbers that fit into 16 bits of memory. But DeepSeek squeezed each number into only 8 bits of memory, half the space. In essence, it lopped several decimal places off each number.
This meant that every calculation was less accurate. But that didn't matter. The calculations were accurate enough to produce a really powerful neural network.
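Here is a toy version of that idea in Python. One caveat: this sketch uses simple 8-bit integers to make the rounding visible, while DeepSeek's paper describes an 8-bit floating-point format, but the trade it illustrates, half the memory for a little precision, is the same.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.standard_normal(1000).astype(np.float16)  # the usual 16-bit values
b = rng.standard_normal(1000).astype(np.float16)

def quantize_8bit(x):
    """Squeeze each number into 8 bits by keeping only 256 levels.
    (Toy int8 scheme; DeepSeek's paper uses an 8-bit float format.)"""
    scale = np.abs(x).max() / 127.0
    return np.round(x / scale).astype(np.int8), scale

qa, scale_a = quantize_8bit(a)
qb, scale_b = quantize_8bit(b)

full = np.dot(a.astype(np.float32), b.astype(np.float32))
reduced = np.dot(qa.astype(np.int32), qb.astype(np.int32)) * scale_a * scale_b
print(full, reduced)  # close, and close is good enough for training
```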
That's it?
Well, it added another trick.
After squeezing each number into 8 bits of memory, DeepSeek took a different route when multiplying those numbers together. When determining the answer to each multiplication problem, a key calculation that would help decide how the neural network would operate, it stretched the answer across 32 bits of memory. In other words, it kept many more decimal places, which made the answer more accurate.
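The effect is easy to demonstrate. In this sketch, the same list of products is summed twice: once with the running total kept in 16 bits, so every addition gets rounded, and once with the total stretched across 32 bits, as described above.

```python
import numpy as np

rng = np.random.default_rng(0)
products = (rng.standard_normal(10_000) * rng.standard_normal(10_000)).astype(np.float16)

# Running total kept in 16 bits: every addition rounds the result,
# so the error piles up.
low = np.float16(0.0)
for p in products:
    low = np.float16(low + p)

# Running total stretched across 32 bits: the cheap low-precision
# products stay cheap, but the sum keeps many more decimal places.
high = np.float32(0.0)
for p in products:
    high = np.float32(high + np.float32(p))

reference = products.astype(np.float64).sum()
print(abs(low - reference), abs(high - reference))  # the 32-bit total is far closer
```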
So every high school student could have done this?
Well, no. The DeepSeek engineers showed in their paper that they were also very good at writing the highly complicated computer code that tells GPUs what to do. They knew how to squeeze even more efficiency out of these chips.
Few people have that kind of skill. But serious AI labs have the talented engineers needed to match what DeepSeek has done.
Why didn't they already do this?
Some AI labs may already be using at least some of the same tricks. Companies like OpenAI do not always reveal what they are doing behind closed doors.
But others were clearly surprised by DeepSeek's work. Doing what the start-up did is not easy. The experimentation needed to find a breakthrough like this requires millions of dollars, if not billions, in electricity.
In other words, it requires enormous amounts of risk.
"You have to risk a lot of money to try new things, and often they fail," said an AI researcher at Meta.
"That is why we don't see much innovation: People are afraid of losing many millions just to try something that doesn't work," he added.
Many experts pointed out that DeepSeek's $6 million covered only what the start-up spent when training the final version of the system. In their paper, the DeepSeek engineers said they had spent additional money on research and experimentation before the final training run. But the same is true of any cutting-edge AI project.
DeepSeek experimented, and it paid off. Because the Chinese start-up has shared its methods with other AI researchers, its technological tricks are poised to significantly reduce the cost of building AI.