DeepSeek's new AI model sparks shock, awe, and questions from American competitors

    However, the actual cost of developing the new DeepSeek models remains unknown, because one figure in a single research paper cannot capture the full picture of the costs. “I don't believe it's $6 million, but even if it's $60 million, it's a game changer,” says Umesh Padval, managing director of Thomvest Ventures, a firm that has invested in Cohere and other AI companies. “It will put pressure on the profitability of companies that are focused on consumer AI.”

    Shortly after DeepSeek revealed the details of its latest model, Ghodsi of Databricks said that customers had begun asking whether they could use it, as well as DeepSeek's underlying techniques, to cut costs within their own organizations. He adds that one approach used by DeepSeek's engineers, known as distillation, in which the output of one large language model is used to train another model, is relatively cheap and straightforward.
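The distillation idea Ghodsi describes, training one model on another model's outputs, can be sketched in miniature. The example below is a toy illustration under invented assumptions, not DeepSeek's actual method: a fixed scoring function stands in for a large “teacher” model's output probabilities, and a tiny logistic “student” is fitted to match them. All names, parameters, and data are hypothetical.

```python
import math
import random

def teacher(x):
    # Stand-in for a large model: returns a soft probability for class 1.
    return 1.0 / (1.0 + math.exp(-(3.0 * x - 1.0)))

def train_student(samples, lr=0.5, epochs=300):
    # Fit a one-parameter-per-weight logistic student to the teacher's
    # soft labels by stochastic gradient descent on cross-entropy.
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x in samples:
            target = teacher(x)                       # soft label, not a hard 0/1
            pred = 1.0 / (1.0 + math.exp(-(w * x + b)))
            grad = pred - target                      # d(cross-entropy)/d(logit)
            w -= lr * grad * x
            b -= lr * grad
    return w, b

random.seed(0)
data = [random.uniform(-2, 2) for _ in range(100)]
w, b = train_student(data)

def student(x):
    # After training, the student's probabilities closely track the teacher's.
    return 1.0 / (1.0 + math.exp(-(w * x + b)))
```

The point of the sketch is that the student never sees ground-truth labels, only the teacher's outputs, which is what makes distillation comparatively cheap: generating soft labels from an existing model costs far less than collecting and curating new training data.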

    Padval says that the existence of models like DeepSeek's will ultimately benefit companies that want to spend less on AI, but he adds that many firms may have reservations about relying on a Chinese model for sensitive tasks. So far, at least one prominent AI company, Perplexity, has publicly announced that it is using DeepSeek's R1 model, though it says the model is hosted “completely independent of China.”

    Amjad Masad, the CEO of Replit, a startup that offers AI coding tools, told WIRED that he finds DeepSeek's latest models impressive. Although he still believes Anthropic's Sonnet model is better at many computer engineering tasks, he has found that R1 is especially good at turning text instructions into code that can be run on a computer. “We are exploring it, especially for agent reasoning,” he adds.

    DeepSeek's two newest offerings, DeepSeek-R1 and DeepSeek-R1-Zero, are capable of the same kind of simulated reasoning as the most advanced systems from OpenAI and Google. They all work by breaking problems into component parts in order to tackle them more effectively, a process that requires a considerable amount of extra training to ensure that the AI reliably reaches the correct answer.

    A paper posted by DeepSeek researchers last week outlines the approach the company used to create its R1 models, which it claims perform on some benchmarks about as well as OpenAI's groundbreaking reasoning model, known as o1. The tactics DeepSeek used include a more automated method for learning to solve problems correctly, as well as a strategy for transferring skills from larger models to smaller ones.

    One of the hottest topics of speculation about DeepSeek is the hardware it may have used. The question is especially notable because the US government has introduced a series of export controls and other trade restrictions in recent years aimed at limiting China's ability to acquire and manufacture the advanced chips needed to build cutting-edge AI.

    In a research paper from August 2024, DeepSeek indicated that it has access to a cluster of 10,000 Nvidia A100 chips, which were placed under US restrictions announced in October 2022. In a separate paper from June of that year, DeepSeek stated that an earlier model it created, called DeepSeek-V2, was developed using clusters of Nvidia H800 computer chips, a less capable component developed by Nvidia to comply with US export controls.

    A source at one AI company that trains large AI models, who asked to be anonymous to protect their professional relationships, estimates that DeepSeek likely used around 50,000 Nvidia chips to build its technology.

    Nvidia declined to comment directly on which of its chips DeepSeek may have relied on. “DeepSeek is an excellent AI advancement,” a spokesperson for Nvidia said in a statement, adding that the startup's reasoning approach “requires significant numbers of Nvidia GPUs and high-performance networking.”

    However the DeepSeek models were built, they appear to show that a less closed approach to developing AI is gaining momentum. In December, Clem Delangue, the CEO of Hugging Face, a platform that hosts artificial intelligence models, predicted that a Chinese company would take the lead in AI because of the speed of innovation happening in open source models, which China has largely embraced. “This went faster than I thought,” he says.