Jul 16, 2023

Forbes: ChatGPT Burns Millions Every Day. Rain AI Aims to Make It One Million Times More Efficient

July 16, 2023, Repost from Rain AI Blog and Forbes based on the conversation of Rain AI CEO Gordon Wilson and Forbes contributor John Koetsier - Running ChatGPT costs millions of dollars a day, which is why OpenAI, the company behind the viral natural-language processing artificial intelligence, has started ChatGPT Plus, a $20/month subscription plan. But our brains are a million times more efficient than the GPUs, CPUs, and memory that make up ChatGPT’s cloud hardware. And neuromorphic computing researchers are working hard to make the miracles that big server farms in the clouds can do today much simpler and cheaper, bringing them down to the small devices in our hands, our homes, our hospitals, and our workplaces.

One of the keys: modeling computing hardware after the computing wetware in human brains. “We have to give up immortality,” the CEO of Rain AI, Gordon Wilson, told me in a recent TechFirst podcast. “We have to give up the idea that, you know, we can save software, we can save the memory of the system after the hardware dies.”

Wilson is quoting Geoff Hinton, a cognitive psychologist and computer scientist, author or co-author of over 200 peer-reviewed publications, and one of the “godfathers” of deep learning, who just left Google Brain. At a recent NeurIPS machine learning conference, he talked about the need for a different kind of hardware substrate to form the foundation of AI that is both smarter and more efficient. It’s analog and neuromorphic, built with artificial neurons in a very human style, and it’s co-designed with software to form a tight blend of hardware and software that is massively more efficient than current AI hardware. Achieving this is not just a nice-to-have, or a vague theoretical dream.

Building a next-generation foundation for artificial intelligence is literally a multi-billion-dollar concern in the coming age of generative AI and search. One reason is that when training large language models (LLMs) in the real world, there are two sets of costs to consider.

Training a large language model like that used by ChatGPT is expensive — likely in the tens of millions of dollars — but running it is the true expense. Running the model, responding to people’s questions and queries, uses what AI experts call “inference.”

That’s precisely what runs ChatGPT compute costs into the millions regularly. But it will cost Microsoft’s AI-enhanced Bing much more. And the costs for Google to respond to the competitive threat and duplicate this capability could be literally astronomical.

“Inference costs far exceed training costs when deploying a model at any reasonable scale,” say Dylan Patel and Afzal Ahmad in SemiAnalysis. “In fact, the costs to inference ChatGPT exceed the training costs on a weekly basis. If ChatGPT-like LLMs are deployed into search, that represents a direct transfer of $30 billion of Google’s profit into the hands of the picks and shovels of the computing industry.”

If you run the numbers like they have, the implications are staggering.

“Deploying current ChatGPT into every search done by Google would require 512,820 A100 HGX servers with a total of 4,102,568 A100 GPUs,” they write. “The total cost of these servers and networking exceeds $100 billion of Capex alone, of which Nvidia would receive a large portion.”
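To get a feel for those numbers, here is a quick back-of-the-envelope check in Python. It uses only the figures quoted above. The variable names are mine, and because the capex figure is quoted as "exceeds $100 billion," the per-unit results are lower bounds, not prices.

```python
# Figures as quoted from SemiAnalysis above; everything else is derived.
servers = 512_820
gpus = 4_102_568
capex_floor_usd = 100e9  # "exceeds $100 billion", so treat this as a lower bound

gpus_per_server = gpus / servers
capex_per_server = capex_floor_usd / servers
capex_per_gpu = capex_floor_usd / gpus

print(f"GPUs per server:  {gpus_per_server:.2f}")
print(f"Capex per server: at least ${capex_per_server:,.0f}")
print(f"Capex per GPU:    at least ${capex_per_gpu:,.0f}")
```

That works out to roughly eight GPUs per server, and a floor of about $195,000 per server. Since the quoted total includes networking, the per-GPU figure is an all-in cost and not a chip price.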

Assuming that’s not going to happen (likely a good assumption), Google has to find another way to approach similar capability. In fact, Microsoft, which has only released its new ChatGPT-enhanced Bing in very limited availability for very good reasons probably including hardware and cost, needs another way. Perhaps that other way is analogous to something we already have a lot of familiarity with.

According to Rain AI’s Wilson, we have to learn from the most efficient computing platform we currently know of: the human brain. Our brain is “a million times” more efficient than the AI technology that ChatGPT and large language models use, Wilson says. And it happens to come in a very flexible, convenient, and portable package.

“I always like to talk about scale and efficiency, right? The brain has achieved both,” Wilson says. “Typically, when we’re looking at compute platforms, we have to choose.” That means you can get the creativity that is obvious in ChatGPT or Stable Diffusion, which relies on data center compute to build AI-generated answers or art (trained, yes, on copyrighted images), or you can get something small and efficient enough to deploy and run on a mobile phone, but that doesn’t have much intelligence.

That, Wilson says, is a trade-off that we don’t want to keep having to make. Which is why, he says, an artificial brain built with memristors that can “ultimately enable 100 billion-parameter models in a chip the size of a thumbnail,” is critical.

For reference, ChatGPT’s large language model is built on 175 billion parameters, and it’s one of the largest and most powerful yet built. ChatGPT 4, which rumors say is as big a leap from ChatGPT 3 as the third version was from its predecessors, will likely be much larger. But even the current version used 10,000 Nvidia GPUs just for training, with likely more to support actual queries, and costs about a penny an answer.

Running something of roughly similar scale on your finger is going to be multiple orders of magnitude cheaper. And if we can do that, it unlocks much smarter machines that generate that intelligence in much more local ways. And that is what Rain AI is aiming for - building an artificial brain on a chip for in-memory computing, which will make generative AI one million times more efficient.

Also on Forbes, Earlier conversation in March 2022

Rain AI: 1000X More Efficient Neural Networks: Building An Artificial Brain With 86 Billion Physical (Not Biological) Neurons

What if in our attempt to build artificial intelligence we don’t simulate neurons in code and mimic neural networks in Python, but instead build actual physical neurons connected by physical synapses in ways very similar to our own biological brains? And in so doing create neural networks that are 1000X more energy efficient than existing AI frameworks?

That’s precisely what Rain Neuromorphics is trying to do: build a non-biological yet very human-style artificial brain. Which at one and the same time uses much less energy and is much faster at learning than existing AI projects. And that learns, in short, kind of like we meatspace humans do. Plus, that is built with analog chips, not digital.

“We have kind of two missions that are very complementary: One of them is to build a brain and the other one is to actually understand it,” Gordon Wilson, the soft-spoken but deep-thinking CEO of Rain Neuromorphics told me in a recent TechFirst podcast. “Ultimately, we see these as kind of like Lego pieces that due to their low-power footprint, we’ll be able to concatenate together using things like chiplet integration, advanced packaging, and ultimately scale out these systems to be brain scale (86 billion neurons, 500 trillion synapses) and low-power enough that they can exist in autonomous devices.”

Wilson seems to be in the habit of very quietly and unassumingly saying things that are essentially completely mind-blowing and world-altering. So quietly you almost miss the gargantuan scale of the scheme.

Wilson and co-founders Jack Kendall and Juan Nino started four years ago with a small seed round. Late last year the team taped out a demonstration chip that proves out at least some of their theories about building brain-analog hardware for artificial intelligence workloads via a completely analog chip. And just a month ago the team was rewarded with a $25 million funding round to finish that design, engineer it to be manufacturable, and bring it to market.

One of the investors? Artificial intelligence heavyweight and OpenAI CEO Sam Altman. (Note: Sam Altman backed Rain AI after FoundersX invested in its seed round).

Key to the project is the fact that Rain Neuromorphics is building an analog chip. This is very different than 99.9% of the computer chips on the market that reduce reality as they see it to binary: on or off, zeroes or ones. Those chips have to model the facts and relationships and verbs of computer programs with very precise digital math. Analog chips, on the other hand, represent reality in a very natural way.

“Digital chips are ... built on the very bottom on zeros and ones, on this Boolean logic of on or off, and all of the other logic is then constructed on top of that,” explains Wilson. “When you zoom down to the bottom of an analog chip, you don’t have zeros or ones, you have gradients of information. You have voltages and currents and resistances. You have physical quantities you are measuring, that represent the mathematical operations you’re performing, and you’re exploiting the relationship between those physical quantities to then perform these very complex neural operations.”

How does that work? By making physics do the work of computation for us, rather than brute-forcing it through a reality-screen of ones and zeroes. So when you’re building out a neural network and modeling it on how the human brain so incredibly efficiently learns, stores data, and executes decisions, you are more measuring conclusions than arriving at them, using the artificial neurons and synapses that you’ve built.

“In an analog chip ... we have the activations of the neurons represented by voltages,” says Wilson. “We have the weights of the synapses represented by resistances, which are held in components called memristors. And when that voltage passes across that resistance, you have a natural relationship between voltage and resistance that’s multiplicative. To receive a current, you read out a current and that’s your output. So an analog chip works by kind of first understanding these physical relationships between electrical quantities and exploiting those to do the math — to make the physics do the math for us.”
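What Wilson describes is Ohm's law plus Kirchhoff's current law: each memristor passes a current I = V / R, and currents arriving on a shared wire add up, which amounts to a multiply-and-accumulate. The Python sketch below simulates that relationship in software. The function name, array layout, and component values are illustrative assumptions, not details of Rain's chip, and real devices also have to contend with noise and nonlinearity that this ignores.

```python
# Hypothetical sketch of an idealized analog crossbar; names and values are
# illustrative only and are not taken from Rain's design.

def crossbar_currents(voltages, conductances):
    """Return the output current on each column of a memristor crossbar.

    voltages[i]        -- activation of input neuron i, in volts
    conductances[i][j] -- synaptic weight from input i to output j, in siemens
                          (conductance is 1 / resistance)
    """
    n_outputs = len(conductances[0])
    currents = [0.0] * n_outputs
    for i, v in enumerate(voltages):
        for j in range(n_outputs):
            # Ohm's law: current through one memristor is I = V * G = V / R.
            # Kirchhoff's current law: currents on a shared column wire add up.
            currents[j] += v * conductances[i][j]
    return currents


# Two input neurons feeding three output neurons.
activations = [0.5, 1.0]                   # volts
resistances = [[1000.0, 2000.0, 500.0],    # ohms
               [4000.0, 1000.0, 2000.0]]
weights = [[1.0 / r for r in row] for row in resistances]

outputs = crossbar_currents(activations, weights)
print(outputs)  # column currents in amperes
```

In software this is a nested loop. On an analog chip, the same weighted sum is simply the current you measure at the end of each wire, which is the sense in which the physics does the math.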

Which, to me, sounds both unimaginably complex and sublimely simple. Sort of like, perhaps, our brains. Building chips with analogs of biological neurons and dendrites and neural networks like our brains is also key to the massive efficiency gains Rain Neuromorphics is claiming: 1,000 times more efficient than existing digital chips from companies like Nvidia.

That 1000X improvement comes from two places: a 10X required power reduction, and a 100X improvement in speed. If achieved, together the two combine to provide similar results to digital hardware with three orders of magnitude lower energy requirements.
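The two factors multiply because energy is power times time: a chip that draws a tenth of the power and finishes in a hundredth of the time uses a thousandth of the energy for the same job. A minimal Python sketch, where the baseline wattage and runtime are placeholder assumptions and only the 10X and 100X ratios come from the claim above:

```python
def energy_joules(power_watts, time_seconds):
    # For constant power draw, energy is simply E = P * t.
    return power_watts * time_seconds

# Placeholder baseline for some fixed workload (assumed, not measured).
digital_power = 300.0   # watts
digital_time = 1.0      # seconds

analog_power = digital_power / 10    # the claimed 10X power reduction
analog_time = digital_time / 100     # the claimed 100X speedup

ratio = (energy_joules(digital_power, digital_time)
         / energy_joules(analog_power, analog_time))
print(f"Energy advantage: about {ratio:.0f}X")
```

The baseline cancels out of the ratio, so the result does not depend on the placeholder values chosen.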

Energy use matters for different reasons in different contexts. In server farms, more energy has cost implications as well as heat implications (and additional cooling requirement implications). In mobile or edge applications, energy might be scarce or challenging to deliver, making an energy-efficient application much more interesting than a power-hungry beast of a chip.

“I think that will be the first step for kind of low-power inference and devices, but we don’t want devices just to be pre-programmed and just do what they do in the world,” says Wilson. “We want devices to learn on their own. We want devices to have an adaptive brain that’s continuously learning from a changing environment and from a changing self.”

Analog chips can achieve much faster speed than digital chips because the computation, again, is done essentially for you: input to output at “wire speed,” as Wilson says. Partially analog chips achieve some of this for specific operations, but still incur speeding tickets for translations to and from digital states.

The challenge for fully analog chips has been of course that along with that blazing speed you also get extreme specificity. Where a digital chip can do “anything,” an analog chip only does what it’s designed for.

Rain Neuromorphics, in essence, is building something of a general purpose analog chip because they are building an analog of the human brain, connecting individual neurons with synapses according to a small-world network pattern that ensures that neurons have both short and long-distance connections that create extremely efficient and effective meshes of connectivity (think six degrees of Kevin Bacon).
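One standard way to generate a small-world network is the Watts-Strogatz recipe: start with a ring in which every node is wired to its nearest neighbors, then randomly rewire a fraction of those links into long-range shortcuts. The Python sketch below follows that recipe as an illustration of the idea. It is an assumption that this resembles Rain's wiring at all, and the function names and parameters are hypothetical.

```python
import random
from collections import deque

def small_world_graph(n, k, p, seed=0):
    """Watts-Strogatz-style graph: ring lattice with random rewiring.

    n -- number of nodes ("neurons")
    k -- each node starts linked to its k nearest ring neighbours (k even)
    p -- probability of rewiring each lattice edge to a random other node
    """
    rng = random.Random(seed)
    neighbours = {node: set() for node in range(n)}
    # Step 1: regular ring lattice, giving every node short-range connections.
    for node in range(n):
        for offset in range(1, k // 2 + 1):
            other = (node + offset) % n
            neighbours[node].add(other)
            neighbours[other].add(node)
    # Step 2: rewire some edges, creating a few long-range shortcuts.
    for node in range(n):
        for offset in range(1, k // 2 + 1):
            old = (node + offset) % n
            if rng.random() < p and old in neighbours[node]:
                candidates = [c for c in range(n)
                              if c != node and c not in neighbours[node]]
                if candidates:
                    new = rng.choice(candidates)
                    neighbours[node].discard(old)
                    neighbours[old].discard(node)
                    neighbours[node].add(new)
                    neighbours[new].add(node)
    return neighbours

def mean_path_length(neighbours):
    """Average hop count over all reachable ordered pairs (breadth-first search)."""
    total, pairs = 0, 0
    for source in neighbours:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            current = queue.popleft()
            for nxt in neighbours[current]:
                if nxt not in dist:
                    dist[nxt] = dist[current] + 1
                    queue.append(nxt)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs

lattice = small_world_graph(n=100, k=4, p=0.0)   # local links only
rewired = small_world_graph(n=100, k=4, p=0.1)   # a few shortcuts added
print(mean_path_length(lattice), mean_path_length(rewired))
```

With a modest rewiring probability, the average number of hops between nodes typically drops well below that of the pure ring, while most connections stay local. That is the "six degrees" effect in miniature.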

And the chips then will essentially teach themselves how to do the various jobs they become tasked with, much like we learn both as children and adults ... and often with only one or two examples of training data.

“The brain trains and learns with both very few examples,” Wilson says. “We learn with one example or two examples, one-shot learning, two-shot learning, and we can generalize extraordinarily well. So learning/training happens very, very efficiently.”

There’s plenty of work to do, and plenty of competitors. Nvidia is a major one, and Intel is also building neuromorphic chips: the Loihi chip that I wrote about early in 2021, which the company is using to build better drones and real-world navigation systems.

Rain Neuromorphics hopes to go to market in 2025. And, eventually, connect enough of its chips together — 86 billion neurons, 500 trillion synapses — to enable an artificial brain. Which might just enable us to achieve a level of artificial intelligence that approaches general AI.
