How Chinese AI Startup DeepSeek Made a Model That Rivals OpenAI

    Today, DeepSeek is one of the few AI companies in China that does not depend on tech giants such as Baidu, Alibaba, or ByteDance.

    A young group of geniuses who want to prove themselves

    According to Liang, when he put together DeepSeek's research team, he was not looking for experienced engineers to build a consumer-facing product. Instead, he focused on PhD students from China's top universities, including Peking University and Tsinghua University, who were eager to prove themselves. Many had published in top journals and won awards at international academic conferences but had no industry experience, according to the Chinese tech publication QbitAI.

    “Our core technical positions are mostly filled by people who graduated this year or in the past one or two years,” Liang told 36Kr in 2023. The recruitment strategy helped foster unorthodox research projects. It is a starkly different way of operating from established internet companies in China, where teams often compete for resources. (A recent example: ByteDance accused a former intern, a prestigious academic award winner no less, of sabotaging his colleagues' work in order to hoard more computing resources for his team.)

    Liang said that students are better suited to high-investment, low-profit research. “Most people, when they are young, can fully devote themselves to a mission without utilitarian considerations,” he explained. His pitch to prospective recruits is that DeepSeek was created to “solve the most difficult questions in the world.”

    The fact that these young researchers were almost entirely educated in China adds to their drive, experts say. “This younger generation also embodies a sense of patriotism, particularly as they navigate US restrictions and choke points in critical hardware and software technologies,” explains Zhang. “Their determination to overcome these barriers reflects not only personal ambition but also a broader commitment to advancing China's position as a global innovation leader.”

    Innovation born of a crisis

    In October 2022, the US government began putting together export controls that severely restricted Chinese AI companies' access to advanced chips such as Nvidia's H100. The move was a problem for DeepSeek. The company had started with a stockpile of 10,000 A100s, but it needed more to compete with companies such as OpenAI and Meta. “The problem we are facing has never been funding, but the export control on advanced chips,” Liang told 36Kr in a second interview in 2024.

    DeepSeek had to come up with more efficient methods to train its models. “They have optimized their model architecture using a series of technical tricks: communication schemes between chips, reducing the size of fields to save memory, and innovative use of the mix-of-models approach,” says Wendy Chang, a software engineer turned policy analyst at the Mercator Institute for China Studies. “Many of these approaches are not new ideas, but combining them successfully to produce an advanced model is a remarkable achievement.”

    DeepSeek has also made significant progress on multi-head latent attention (MLA) and mixture of experts, two technical designs that make DeepSeek models more cost-effective by requiring fewer computing resources to train. DeepSeek's newest model is so efficient that it required one-tenth of the computing power needed to train Meta's comparable Llama 3.1 model, according to the research institution Epoch AI.
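
    For readers curious why a mixture-of-experts design can reduce computation, the toy sketch below shows the general idea of top-k routing: a gate scores every expert for a given input, but only the few highest-scoring experts are actually run. This is a generic illustration and not DeepSeek's implementation; the function names, the eight scalar "experts," and the gate scores are all hypothetical values chosen for the example.

```python
import math

def softmax(scores):
    """Convert raw gate scores into probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def route_top_k(gate_scores, k):
    """Return (expert_index, weight) for the k highest-scoring experts.

    Weights are renormalized so the selected experts' weights sum to 1.
    """
    probs = softmax(gate_scores)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]

def moe_forward(x, experts, gate_scores, k=2):
    """Run only the selected experts and mix their outputs by gate weight."""
    return sum(w * experts[i](x) for i, w in route_top_k(gate_scores, k))

# Hypothetical toy setup: eight "experts," each a simple scaling function.
NUM_EXPERTS = 8
experts = [lambda x, scale=i + 1: scale * x for i in range(NUM_EXPERTS)]

# Hypothetical gate scores for one input; a real model would compute these
# with a learned layer.
gate_scores = [0.1, 2.0, -1.0, 0.5, 3.0, 0.0, -0.5, 1.0]

selected = route_top_k(gate_scores, k=2)
output = moe_forward(1.0, experts, gate_scores, k=2)
active_fraction = len(selected) / NUM_EXPERTS  # share of experts actually run

print("experts used:", [i for i, _ in selected])
print("fraction of experts run:", active_fraction)
```

    In this toy case only two of the eight experts run for the input, so most of the model's parameters sit idle for any single token. That is the broad reason such designs can need less computation than a dense model of the same total size.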

    DeepSeek's willingness to share these innovations with the public has earned it considerable goodwill within the global AI research community. For many Chinese AI companies, developing open source models is the only way to catch up with their Western counterparts, because it attracts more users and contributors, which in turn helps the models grow. “They have now shown that advanced models can be built with less, although still a lot of, money, and that the current norms of model building leave plenty of room for optimization,” says Chang. “We will certainly see many more attempts in this direction in the future.”

    The news could cause problems for the current US export controls, which focus on creating bottlenecks in computing resources. “Existing estimates of how much AI computing power China has, and what they can achieve with it, could be upended,” says Chang.