AI is the most disruptive technology of our lifetimes, and AI chips are the most disruptive infrastructure for AI. By that measure, the impact of what Graphcore is about to unleash on the world is beyond description. Here is how pushing the boundaries of Moore’s Law with IPUs works, and how it compares to today’s state of the art at the hardware and software levels. Should incumbent Nvidia worry, and users rejoice?
If luck is another word for being at the right place at the right time, you could say we got lucky. Graphcore, the hottest name in AI chips, has been on our radar for a while now, and a discussion with Graphcore’s founders was planned well before the news about it broke this week.
Graphcore, as you may have heard by now, just secured another $200 million of funding from BMW, Microsoft, and leading financial investors to deliver the world’s most advanced AI chip at scale. Names include the likes of Atomico, Merian Chrysalis Investment Company Limited, Sofina, and Sequoia. As Graphcore CEO and founder Nigel Toon shared, Graphcore had to turn down investors for this round, including, originally, the iconic Sequoia fund.
Graphcore is now officially a unicorn, with a valuation of $1.7 billion. Graphcore’s partners such as Dell, the world’s largest server producer, Bosch, the world’s largest supplier of electronics for the automotive industry, and Samsung, the world’s largest consumer electronics company, have access to its chips already. So, here’s your chance to prepare for, and understand, the revolution you’re about to see unfolding in the not-so-distant future.
Learning how the brain works is one thing, modeling chips after it is another
Graphcore is based in Bristol, UK, and was founded by semiconductor industry veterans Nigel Toon, CEO, and Simon Knowles, CTO. Toon and Knowles were previously involved in companies such as Altera, Element14, and Icera that exited for a combined value in the billions. Toon is positive that this time around they can, and will, disrupt the semiconductor industry more than ever before, breaking what he sees as the near-monopoly of Nvidia.
Nvidia is the dominant player in AI workloads, with its GPU chips, and it keeps evolving. There are more players in the domain, but Toon believes it’s only Nvidia that has a clear, coherent strategy and an effective product in the marketplace. There are also players such as Google, with its TPU, investing in AI chips, but Toon claims Graphcore has the leading edge and a fantastic opportunity to build an empire with its IPU (Intelligence Processing Unit) chip. He cites the success of ARM mobile processors versus incumbents of the time as an example.
In order to understand his confidence, and that of investors and partners, we need to understand what exactly Graphcore does and how that is different from the competition. Machine learning and AI are the most rapidly developing and disruptive technologies. Machine learning, which is at the core of what is called AI these days, is effectively very efficient pattern matching, based on a combination of appropriate algorithms (models) and data (training sets).
Some people go to the extreme of calling AI, essentially, matrix multiplication. While such reductionism is questionable, the fact remains that much of machine learning is about efficient data operations at scale. This is why GPUs are so good at machine learning workloads. Their architecture, originally developed for graphics rendering, has proven very efficient for data operations as well.
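To make the "matrix multiplication" remark concrete, here is a minimal sketch of a single fully connected layer computed as one matrix product. It uses plain Python with no libraries, and every name and number in it is made up for illustration; real frameworks run the same kind of operation on far larger matrices, which is where parallel hardware pays off.

```python
# Illustrative only: a tiny fully connected layer written as plain matrix
# multiplication. All names and numbers here are made up for the example.

def matmul(a, b):
    """Multiply matrix a (n x k) by matrix b (k x m), both lists of rows."""
    k = len(b)
    assert all(len(row) == k for row in a), "inner dimensions must match"
    return [
        [sum(row[i] * b[i][j] for i in range(k)) for j in range(len(b[0]))]
        for row in a
    ]

# Two input samples with three features each (a 2 x 3 batch).
inputs = [[1.0, 2.0, 3.0],
          [0.0, 1.0, 0.5]]

# Layer weights mapping three features to two outputs (3 x 2).
weights = [[0.5, -1.0],
           [1.0, 0.0],
           [0.0, 2.0]]

outputs = matmul(inputs, weights)
print(outputs)  # [[2.5, 5.0], [1.0, 1.0]]
```

Every output element is an independent sum of products, so all of them could in principle be computed at the same time.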
Graphcore revolutionizes hardware and software, using Graphs
What Graphcore has done, however, is to invest in a new architecture altogether. This is why Toon believes they have the edge over other options, which he sees as adding ad-hoc, incremental improvements. Toon notes that what the competition does is effectively building specialized chips (ASICs) that are very good at some specific mathematical operation on data, optimized for a specific workload. This, he argues, won’t do for tomorrow’s workloads.
So, what is so special about Graphcore’s own architecture? There has been some speculation that Graphcore is building what is called a neuromorphic AI chip: A processor built after a model of the human brain, with its neurons and synapses mirrored in its architecture. Knowles, however, dispels this misconception:
“The brain is a great exemplar for computer architects in this brave new endeavor of machine intelligence. But the strengths and weaknesses of silicon are very different to those of wetware. We have not copied nature’s pattern for flying machines, nor for surface locomotion, nor for engines, because our engineering materials are different. So, too, with computation.
For example, most neuromorphic computing projects advocate communication by electrical spikes, like the brain. But a basic analysis of energy efficiency immediately concludes that an electrical spike (two edges) is half as efficient for information transmission as a single edge, so following the brain is not automatically a good idea. I think computer architects should always strive to learn how the brain computes, but should not strive to literally copy it in silicon.”
Breaking Moore’s Law, Outperforming GPUs
Energy efficiency is indeed a limiting factor for neuromorphic architectures, but it does not only apply there. Toon, when asked to comment on the limits of Moore’s Law, noted that we’ve gone well beyond what anybody thought was possible, and we still have another 10 to 20 years of progress. But, he went on to add, we’ve reached some fundamental limits.
Toon thinks we’ve reached the lowest voltage that we can use on those chips. So, we can add more transistors, but we can’t make them go much faster: “Your laptop still runs at 2GHz, it’s just got more cores in it. But we now need thousands of cores to work with machine learning. We need a different architectural process, to design chips in different ways. The old ways of doing it don’t work.”
Toon said IPUs are a general purpose machine intelligence processor, specifically designed for machine intelligence. “One of the advantages of our architecture is that it is suitable for lots of today’s machine learning approaches like CNNs, but it’s also highly optimized for different machine learning approaches like reinforcement learning and future approaches too,” he said. “The IPU architecture enables us to outperform GPUs — it combines massive parallelism with over 1000 independent processor cores per IPU and on-chip memory so the entire model can be held on chip”.
But how does the IPU compare to Nvidia’s GPUs in practice? Recently some machine learning benchmarks were released, in which Nvidia was shown to outperform the competition. When asked for his thoughts on this, Toon said they are aware of them, but focused on optimizing customer-specific applications and workloads right now.
He has previously stated, however, that data structures for machine learning are different, as they are high dimensional and complex models. This, he said, means dealing with data in a different way. Toon noted that GPUs are very powerful, but not necessarily efficient in how they handle these data structures: “We have the opportunity to create something 10, 100 times faster for these data structures.”
Machine learning is changing the paradigm for compute, and AI chips catalyze the process. Image: Graphcore
Speed, however, is not all it takes to succeed in this game. Nvidia, for example, did not succeed just because its GPUs are powerful. A big part of its success, and a differentiator over GPU competitors such as AMD, is in the software layer. The libraries that enabled developers to abstract from hardware specifics and focus on optimizing their machine learning algorithms, parameters, and processes have been a key part of Nvidia’s success.
Nvidia keeps evolving these libraries, with the latest RAPIDS library promising a 50-fold GPU acceleration of data analytics and machine learning compared to CPUs. Where does Graphcore stand in comparison? Toon acknowledged that the software is hugely important, going on to add that, alongside building the world’s most complex silicon processor, Graphcore has also built the first software tool chain designed specifically for machine intelligence, called Poplar.
According to Toon:
“When Graphcore began there was no TensorFlow or PyTorch, but it was clear that in order to target this emerging world of knowledge models we had to rethink the traditional microprocessor software stack. The world has moved from developers defining everything in terms of vectors and scalars to one of graphs and tensors.
In this new world, traditional tool chains do not have the capabilities required to provide an easy and open platform for developers. The models and applications of Compute 2.0 are massively parallel and rely on millions of identical calculations to be performed at the same time.
These workloads dictate that for maximum efficiency models must stay resident and must allow the data to stream through them. Existing architectures that rely on streaming both application code and data through the processor to implement these models are inefficient for this purpose both in hardware construct and in the methodologies used in the tool chains that support them.”
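The idea of a model that stays resident while data streams through it can be sketched in a few lines. This is only a toy illustration under our own assumptions, not a description of how the IPU or Poplar works: the `ResidentModel` class, the `stream_samples` generator, and all the values are hypothetical.

```python
# Illustrative sketch: the "model" (weights) is loaded once and stays in place,
# while input samples stream past it one at a time.

def stream_samples():
    """Hypothetical data source yielding one feature vector at a time."""
    yield [1.0, 2.0]
    yield [3.0, 4.0]
    yield [5.0, 6.0]

class ResidentModel:
    """A toy linear model whose weights are set once and then reused."""

    def __init__(self, weights, bias):
        self.weights = weights  # stays resident for the model's lifetime
        self.bias = bias

    def predict(self, sample):
        # Weighted sum of the sample's features plus the bias term.
        return sum(w * x for w, x in zip(self.weights, sample)) + self.bias

model = ResidentModel(weights=[0.5, 0.25], bias=1.0)
predictions = [model.predict(sample) for sample in stream_samples()]
print(predictions)  # [2.0, 3.5, 5.0]
```

The weights are never reloaded between samples; only the inputs move.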
Software 2.0, Compute 2.0, and Computational Graphs
Graphcore talks about Compute 2.0, others talk about Software 2.0. These two are strongly related as the new paradigm for application development. When discussing Compute 2.0, Toon noted that for 70 years we have been telling computers what to do step by step in a program, which is the familiar algorithmic process. Now, he said, we learn from data.
Rather than programming the machine, the machine learns — hence, machine learning. This is fundamentally changing the development and behavior of applications. The processes for building software need to be adapted, and software may display non-deterministic, or at least, non-explainable behavior. This seems to be the way of the future, however, and Toon pointed out that, with enough data and compute, we can build models that outperform humans in pattern recognition tasks.
“When we talk about Poplar being designed for machine intelligence, what does that mean? What are the characteristics that such a tool chain requires? It must first and foremost use the graph as its key construct. The graph represents the knowledge model and the application which is built by the tool chain. Poplar is built around a computational graph abstraction; the intermediate representation (IR) of its graph compiler is a large directed graph,” Toon said.
Graphcore, like others, is using graphs as a fundamental metaphor upon which its approach to software and compute for machine intelligence is built. Toon noted that the graph images shared by Graphcore are the internal representation of their graph compiler: a representation of the entire knowledge model, broken down to expose the huge parallel workloads that Graphcore schedules and executes across the IPU processor.
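For readers unfamiliar with computational graphs, the following minimal sketch shows how a directed graph of operations can expose parallel work. It is a generic illustration with made-up node names and assumes the graph has no cycles; it does not reflect Poplar's actual intermediate representation, which Graphcore has not detailed here.

```python
# Illustrative sketch of a computational graph: each node lists the nodes it
# depends on. Node names are made up; this is not Poplar's representation.

graph = {
    "x": [],
    "w1": [],
    "w2": [],
    "h1": ["x", "w1"],   # h1 = x * w1
    "h2": ["x", "w2"],   # h2 = x * w2
    "y": ["h1", "h2"],   # y = h1 + h2
}

def parallel_levels(graph):
    """Group nodes into levels; nodes in one level do not depend on each other."""
    depth = {}

    def visit(node):
        # A node's depth is one more than its deepest dependency (0 if none).
        if node not in depth:
            deps = graph[node]
            depth[node] = 1 + max((visit(d) for d in deps), default=-1)
        return depth[node]

    for node in graph:
        visit(node)

    levels = [[] for _ in range(max(depth.values()) + 1)]
    for node in graph:
        levels[depth[node]].append(node)
    return levels

levels = parallel_levels(graph)
for index, nodes in enumerate(levels):
    print(index, nodes)
# 0 ['x', 'w1', 'w2']
# 1 ['h1', 'h2']
# 2 ['y']
```

Here `h1` and `h2` land in the same level, so a scheduler could run them at the same time. A graph compiler applies the same idea to vastly larger graphs.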
The IPU processor and Poplar were designed together, and Toon said this philosophy of co-developing the silicon architecture and the software programming environment reflects the culture and environment of Graphcore:
“The engineering we do is open and collaborative in how we build our technology. Poplar supports the design decisions we made in our chip, building and running highly optimized machine intelligence models in place with a highly optimized BSP (bulk synchronous parallel) execution model.
It is built to support the concepts of separating compute and communication for power efficiency and also to interface with host platforms to remove the bottlenecks that plague existing platforms that are being augmented with support for machine learning rather than being designed for it.
Along with supporting and allowing users access to our high-performance IPU platform, the Poplar tool chain has to be easy to use for developers. It has to integrate seamlessly into the backend of machine learning frameworks such as PyTorch and TensorFlow and provide a runtime for network interchange formats such as ONNX both for inference and training workloads.”
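The bulk synchronous parallel (BSP) model Toon mentions alternates between local compute, communication, and a synchronization barrier. The sketch below simulates that pattern sequentially for a simple distributed sum. The worker count, the data, and the `bsp_sum` function are assumptions made for illustration, not Graphcore code.

```python
# Illustrative, sequential simulation of bulk synchronous parallel (BSP)
# supersteps. Worker count and data are made up; this is not Graphcore code.

def bsp_sum(chunks):
    """Sum numbers split across workers using two BSP supersteps."""
    workers = len(chunks)

    # Superstep 1, compute phase: every worker touches only its local data.
    partial = [sum(chunk) for chunk in chunks]

    # Superstep 1, communication phase: each worker sends its partial sum
    # to worker 0. No computation happens during the exchange.
    inbox = [[] for _ in range(workers)]
    for value in partial:
        inbox[0].append(value)

    # Barrier: in a real system all workers would wait here until every
    # message has been delivered. In this sequential sketch it is implicit.

    # Superstep 2, compute phase: worker 0 combines what it received.
    return sum(inbox[0])

total = bsp_sum([[1, 2], [3, 4], [5, 6], [7, 8]])
print(total)  # 36
```

Keeping compute and communication in separate phases is what the quote describes as a design choice for power efficiency. The sketch shows only the phase structure, not any performance effect.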
Poplar supports TensorFlow, PyTorch, ONNX and Keras now, and will roll out support for other machine learning frameworks over the course of 2019 and as new frameworks appear. Toon said that, by using Poplar as the back end of these frameworks, users can get access to the benefits of having their machine learning models passed through an optimizing graph compiler for all required workloads, rather than just the simple pattern matching that gets used in legacy software platforms.
“It is not just about running the models and constructs of today; innovators and researchers need a platform to develop and explore the solutions of tomorrow with an easy to use and programmable platform,” he said. “The field of machine intelligence is being held back by software libraries for hardware platforms, which are not open and extensible, providing a black box to developers who want to innovate and evolve ideas.”
The revolution is ready to ship, graph-based, and open sourced
If you are like us, you may be wondering what those graphs are like beyond their remarkable imagery. What kind of structures, models and formalism does Graphcore use to represent and work with graphs? Would they go as far as to call them knowledge graphs?
“We just call them computational graphs. All machine learning models are best expressed as graphs — this is how TensorFlow works as well. It’s just that our graphs are orders of magnitude more complex because we have orders of magnitude parallelism for the graphs to exploit on our chips,” said Toon.
But if you really are curious, there is good news: you just have to wait a bit longer. Toon promised that over time Graphcore will be providing IPU developers full open-source access to its optimized graph libraries so they can see how Graphcore builds applications. We are certainly looking forward to that, and we are adding it to the list of Graphcore’s current and future plans to track.
Graphcore is already shipping production hardware to early access customers. Toon said Graphcore sells PCIe cards, which are ready to slot into server platforms, called C2 IPU-Processor cards. They contain two IPU processors each. He also noted they are working with Dell as a channel partner to deliver Dell server platforms to enterprise customers and cloud customers.
According to Toon, products will be more widely available next year. Initial focus is on data center, cloud, and a select number of edge applications that require heavy compute — like autonomous cars. Graphcore is not currently targeting consumer edge devices like mobile phones.
Graphcore delivering on its promises will be nothing short of a revolution, both on the hardware and the software layer.