
Microsoft's “1-bit” AI model runs on a CPU only, while matching larger systems

Does size matter?

Memory requirements are the most obvious advantage of reducing the complexity of a model's internal weights. The b1.58 model can run on as little as 0.4GB of memory, compared to anywhere from 2 to 5GB for other open-weight models of roughly the same parameter size.
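
As a rough sanity check on that figure, the arithmetic works out if you assume a model of roughly two billion parameters, each stored at about 1.58 bits (the information content of a three-valued weight). The parameter count below is an assumption for illustration, and real deployments also need memory for activations and other overhead:

```python
# Back-of-the-envelope memory estimate (illustrative; the ~2B parameter count
# is an assumption, and activation/KV-cache overhead is ignored).
params = 2_000_000_000

bits_per_weight_ternary = 1.58   # ~log2(3), the information in a {-1, 0, +1} weight
bits_per_weight_fp16 = 16

ternary_gb = params * bits_per_weight_ternary / 8 / 1e9
fp16_gb = params * bits_per_weight_fp16 / 8 / 1e9

print(f"ternary weights: ~{ternary_gb:.2f} GB")   # ~0.40 GB
print(f"fp16 weights:    ~{fp16_gb:.2f} GB")      # ~4.00 GB
```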

But the simplified weighting system also leads to much more efficient operation at inference time, with internal operations that rely far more on simple addition instructions and far less on computationally costly multiplication instructions. Those efficiency improvements mean BitNet b1.58 uses anywhere from 85 to 96 percent less energy than comparable full-precision models, the researchers estimate.
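
Here is a minimal sketch of why that works (an illustration of the idea, not the actual BitNet kernel): when every weight is -1, 0, or +1, a matrix-vector product reduces to adding some activations, subtracting others, and skipping the rest.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.integers(-1, 2, size=(4, 8))   # ternary weight matrix, values in {-1, 0, +1}
x = rng.standard_normal(8)             # input activations

# Conventional full-precision path: multiply-accumulate.
y_mul = W @ x

# Multiplication-free path: add activations where the weight is +1,
# subtract them where it is -1, ignore the zeros.
y_add = np.array([x[row == 1].sum() - x[row == -1].sum() for row in W])

assert np.allclose(y_mul, y_add)
print(y_add)
```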

A demo of BitNet b1.58 running at speed on an Apple M2 CPU.

By using a highly optimized kernel designed specifically for the BitNet architecture, the BitNet b1.58 model can also run several times faster than comparable models running on a standard full-precision transformer. The system is efficient enough to reach “speeds comparable to human reading (5-7 tokens per second)” using a single CPU, the researchers write (you can download and run those optimized kernels on a number of ARM and x86 CPUs, or try it using this web demo).
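
If you want to check a rate like that yourself, a generic timing loop works with any token-by-token generation setup; the `generate_next_token` callable below is a hypothetical stand-in, not an API from the released kernels.

```python
import time

def tokens_per_second(generate_next_token, n_tokens=64):
    """Time a token-by-token generation loop and return tokens/second."""
    start = time.perf_counter()
    for _ in range(n_tokens):
        generate_next_token()
    return n_tokens / (time.perf_counter() - start)

# Dummy stand-in that sleeps ~170 ms per token, roughly the 5-7 tokens/second
# "human reading speed" the researchers cite.
print(f"{tokens_per_second(lambda: time.sleep(0.17)):.1f} tokens/sec")
```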

Crucially, the researchers say these improvements don't come at the expense of performance on benchmarks testing reasoning, math, and “knowledge” capabilities (although that claim has yet to be independently verified). Averaging the results on several common benchmarks, the researchers found that BitNet “achieves capabilities nearly on par with leading models in its size class while offering dramatically improved efficiency.”

Despite its smaller memory footprint, BitNet still performs comparably to “full-precision” weighted models on many benchmarks.

Despite the apparent success of this “proof of concept” BitNet model, the researchers write that they don't fully understand why the model works as well as it does with such simplified weighting. “Delving deeper into the theoretical underpinnings of why 1-bit training at scale is effective remains an open area,” they write. And more research is still needed to get these BitNet models to compete with the overall size and context window “memory” of today's largest models.

Still, this new research shows a potential alternative approach for AI models facing spiraling hardware and energy costs from running on expensive and powerful GPUs. It's possible that today's “full-precision” models are like muscle cars that waste a lot of energy and effort when the equivalent of a nice sub-compact could deliver comparable results.