
How a stubborn computer scientist accidentally launched the deep learning boom

    That's why Nvidia announced the CUDA platform in 2006. CUDA allows programmers to write “kernels,” short programs designed to run on a single execution unit. Kernels allow a large computing task to be broken down into bite-sized chunks that can be processed in parallel. This allows certain types of calculations to be performed much faster than with a CPU alone.
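    To make the idea concrete, here is a minimal sketch of a CUDA program (the function name, sizes, and values are illustrative, not taken from Nvidia's documentation). Each GPU thread runs the same short kernel on a different element of the data, so a million-element addition is split into bite-sized chunks that execute in parallel:

        #include <cstdio>
        #include <cstdlib>
        #include <cuda_runtime.h>

        // The kernel: a short program that each GPU thread runs on its own
        // slice of the data. Here, each thread adds one pair of elements.
        __global__ void add(const float *a, const float *b, float *c, int n)
        {
            int i = blockIdx.x * blockDim.x + threadIdx.x;  // this thread's element
            if (i < n)
                c[i] = a[i] + b[i];
        }

        int main()
        {
            const int n = 1 << 20;              // one million elements
            size_t bytes = n * sizeof(float);

            // Allocate and fill arrays on the CPU ("host") side.
            float *ha = (float *)malloc(bytes);
            float *hb = (float *)malloc(bytes);
            float *hc = (float *)malloc(bytes);
            for (int i = 0; i < n; i++) { ha[i] = 1.0f; hb[i] = 2.0f; }

            // Allocate GPU ("device") arrays and copy the inputs over.
            float *da, *db, *dc;
            cudaMalloc(&da, bytes); cudaMalloc(&db, bytes); cudaMalloc(&dc, bytes);
            cudaMemcpy(da, ha, bytes, cudaMemcpyHostToDevice);
            cudaMemcpy(db, hb, bytes, cudaMemcpyHostToDevice);

            // Launch the kernel: the work is carved into blocks of 256 threads,
            // and the GPU runs as many of them at once as its hardware allows.
            int threads = 256;
            int blocks = (n + threads - 1) / threads;
            add<<<blocks, threads>>>(da, db, dc, n);

            cudaMemcpy(hc, dc, bytes, cudaMemcpyDeviceToHost);
            printf("c[0] = %f\n", hc[0]);       // prints 3.000000

            cudaFree(da); cudaFree(db); cudaFree(dc);
            free(ha); free(hb); free(hc);
            return 0;
        }

    The same pattern – many independent threads, each handling a small piece of a large array – is the pattern that the matrix arithmetic inside a neural network relies on.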

    But there was little interest in CUDA when it was first introduced, Stephen Witt wrote in The New Yorker last year:

    When CUDA was released in late 2006, Wall Street reacted with consternation. Huang was bringing supercomputing to the masses, but the masses had shown no indication that they wanted such a thing.

    “They were spending a fortune on this new chip architecture,” says Ben Gilbert, co-host of “Acquired,” a popular Silicon Valley podcast. “They were spending many billions on an obscure corner of academic and scientific computing, which wasn't a big market at the time – certainly less than the billions they were pouring in.”

    Huang argued that the mere existence of CUDA would expand the supercomputing sector. This view was not widely shared, and by the end of 2008, Nvidia's stock price had fallen by seventy percent…

    CUDA downloads peaked in 2009 but then declined for three years. Board members worried that Nvidia's low stock price would make it a target for corporate raiders.

    Huang wasn't thinking specifically about AI or neural networks when he created the CUDA platform. But it turned out that Hinton's backpropagation algorithm could easily be broken up into bite-sized chunks. So training neural networks became a killer app for CUDA.
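    To see why, consider a single fully connected layer during backpropagation. Every entry of the layer's weight-gradient matrix can be computed independently of every other entry, so each one can be handed to its own GPU thread. The kernel below is a simplified illustration of that idea – the name weight_grad, the memory layout, and the toy dimensions are assumptions for the sketch, not anything from Hinton's actual code:

        #include <cstdio>
        #include <cuda_runtime.h>

        // Weight gradient for a fully connected layer y = W x.
        // Given a batch of inputs x and upstream gradients delta (dL/dy),
        // the gradient for weight W[i][j] is the sum over the batch of
        // delta[i] * x[j]. Each entry is independent of the others, so each
        // GPU thread computes exactly one of them: a bite-sized chunk of backprop.
        __global__ void weight_grad(const float *x,      // batch x in_dim
                                    const float *delta,  // batch x out_dim
                                    float *dW,           // out_dim x in_dim
                                    int batch, int in_dim, int out_dim)
        {
            int i = blockIdx.y * blockDim.y + threadIdx.y;  // output index
            int j = blockIdx.x * blockDim.x + threadIdx.x;  // input index
            if (i >= out_dim || j >= in_dim) return;

            float sum = 0.0f;
            for (int b = 0; b < batch; b++)
                sum += delta[b * out_dim + i] * x[b * in_dim + j];
            dW[i * in_dim + j] = sum;
        }

        int main()
        {
            // Toy sizes, just to show the launch: a 2x3 layer and a batch of 4.
            const int batch = 4, in_dim = 3, out_dim = 2;
            float hx[batch * in_dim], hdelta[batch * out_dim], hdW[out_dim * in_dim];
            for (int i = 0; i < batch * in_dim; i++)  hx[i] = 1.0f;
            for (int i = 0; i < batch * out_dim; i++) hdelta[i] = 0.5f;

            float *dx, *ddelta, *ddW;
            cudaMalloc(&dx, sizeof(hx));
            cudaMalloc(&ddelta, sizeof(hdelta));
            cudaMalloc(&ddW, sizeof(hdW));
            cudaMemcpy(dx, hx, sizeof(hx), cudaMemcpyHostToDevice);
            cudaMemcpy(ddelta, hdelta, sizeof(hdelta), cudaMemcpyHostToDevice);

            // One thread per weight entry; a real layer would have millions.
            dim3 threads(16, 16);
            dim3 blocks((in_dim + 15) / 16, (out_dim + 15) / 16);
            weight_grad<<<blocks, threads>>>(dx, ddelta, ddW, batch, in_dim, out_dim);

            cudaMemcpy(hdW, ddW, sizeof(hdW), cudaMemcpyDeviceToHost);
            printf("dW[0][0] = %f\n", hdW[0]);  // 4 * 0.5 * 1.0 = 2.000000
            cudaFree(dx); cudaFree(ddelta); cudaFree(ddW);
            return 0;
        }

    A full training run launches many kernels like this, for every layer's forward and backward pass, which is why a chip with hundreds of execution units pays off.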

    According to Witt, Hinton quickly recognized the potential of CUDA:

    In 2009, Hinton's research group used Nvidia's CUDA platform to train a neural network to recognize human speech. He was surprised by the quality of the results, which he presented at a conference later that year. He then contacted Nvidia. “I sent an email saying, ‘Look, I just told a thousand machine learning researchers to go buy Nvidia cards. Can you send me one for free?’” Hinton told me. “They said no.”

    Despite the snub, Hinton and his students, Alex Krizhevsky and Ilya Sutskever, purchased a pair of Nvidia GTX 580 GPUs for the AlexNet project. Each GPU had 512 execution units, allowing Krizhevsky and Sutskever to train neural networks hundreds of times faster than would be possible with a CPU. That speed let them train a larger model – and train it on many more training images. And they would need all that extra computing power to handle the huge ImageNet dataset.