
The AI chip unicorn that’s about to revolutionize everything has computational Graph at its Core

AI is the most disruptive technology of our lifetimes, and AI chips are the most disruptive infrastructure for AI. By that measure, the impact of what Graphcore is about to massively unleash in the world is beyond description. Here is how pushing the boundaries of Moore’s Law with IPUs works, and how it compares to today’s state of the art on the hardware and software level. Should incumbent Nvidia worry, and users rejoice?

If luck is another word for being at the right place at the right time, you could say we got lucky. Graphcore, the hottest name in AI chips, has been on our radar for a while now, and a discussion with Graphcore’s founders was planned well before the news about it broke out this week.

Graphcore, as you may have heard by now, just secured another $200 million of funding from BMW, Microsoft, and leading financial investors to deliver the world’s most advanced AI chip at scale. Names include the likes of Atomico, Merian Chrysalis Investment Company Limited, Sofina, and Sequoia. As Graphcore CEO and founder Nigel Toon shared, Graphcore had to turn down investors for this round, including, originally, the iconic Sequoia fund.

Graphcore is now officially a unicorn, with a valuation of $1.7 billion. Graphcore’s partners such as Dell, the world’s largest server producer, Bosch, the world’s largest supplier of electronics for the automotive industry, and Samsung, the world’s largest consumer electronics company, have access to its chips already. So, here’s your chance to prepare for, and understand, the revolution you’re about to see unfolding in the not-so-distant future.

Learning how the brain works is one thing, modeling chips after it is another

Graphcore is based in Bristol, UK, and was founded by semiconductor industry veterans Nigel Toon, CEO, and Simon Knowles, CTO. Toon and Knowles were previously involved in companies such as Altera, Element14, and Icera that exited for combined value in the billions. Toon is positive they can, and will, disrupt the semiconductor industry more than ever before this time around, breaking what he sees as the near-monopoly of Nvidia.

Nvidia is the dominant player in AI workloads, with its GPU chips, and it keeps evolving. There are more players in the domain, but Toon believes it’s only Nvidia that has a clear, coherent strategy and an effective product in the marketplace. There are also players such as Google, which is investing in AI chips with its TPU, but Toon claims Graphcore has the leading edge and a fantastic opportunity to build an empire with its IPU (Intelligent Processor Unit) chip. He cites the success of ARM mobile processors versus the incumbents of the time as an example.

In order to understand his confidence, and that of investors and partners, we need to understand what exactly Graphcore does and how that is different from the competition. Machine learning and AI are the most rapidly developing and disruptive technologies. Machine learning, which is at the core of what is called AI these days, is effectively very efficient pattern matching, based on a combination of appropriate algorithms (models) and data (training sets).

Some people go to the extreme of calling AI, essentially, matrix multiplication. While such reductionism is questionable, the fact remains that much of machine learning is about efficient data operations at scale. This is why GPUs are so good at machine learning workloads. Their architecture, originally developed for graphics rendering, has proven very efficient for data operations as well.
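To make the matrix-multiplication point concrete, here is a minimal NumPy sketch of the kind of dense, bulk arithmetic that dominates neural-network workloads; the shapes are arbitrary and purely illustrative.

```python
import numpy as np

# A single dense-layer forward pass: y = relu(x @ W + b).
# This is the sort of highly parallel, data-heavy operation that
# GPUs (and, Graphcore argues, IPUs) are built to accelerate.
rng = np.random.default_rng(0)
x = rng.standard_normal((64, 512))    # a batch of 64 input vectors
W = rng.standard_normal((512, 256))   # learned weights
b = rng.standard_normal(256)          # learned bias

y = np.maximum(x @ W + b, 0.0)        # matrix multiply, add, ReLU
print(y.shape)                        # (64, 256)
```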


Graphcore revolutionizes hardware and software, using Graphs

What Graphcore has done, however, is to invest in a new architecture altogether. This is why Toon believes they have the edge over other options, which he sees as adding ad-hoc, incremental improvements. Toon notes that what the competition does is effectively building specialized chips (ASICs) that are very good at some specific mathematical operation on data, optimized for a specific workload. This, he argues, won’t do for tomorrow’s workloads.

So, what is so special about Graphcore’s own architecture? There has been some speculation that Graphcore is building what is called a neuromorphic AI chip: A processor built after a model of the human brain, with its neurons and synapses mirrored in its architecture. Knowles, however, dispels this misconception:

“The brain is a great exemplar for computer architects in this brave new endeavor of machine intelligence. But the strengths and weaknesses of silicon are very different to those of wetware. We have not copied nature’s pattern for flying machines, nor for surface locomotion, nor for engines, because our engineering materials are different. So, too, with computation.

For example, most neuromorphic computing projects advocate communication by electrical spikes, like the brain. But a basic analysis of energy efficiency immediately concludes that an electrical spike (two edges) is half as efficient for information transmission as a single edge, so following the brain is not automatically a good idea. I think computer architects should always strive to learn how the brain computes, but should not strive to literally copy it in silicon.”

Breaking Moore’s Law, Outperforming GPUs

Energy efficiency is indeed a limiting factor for neuromorphic architectures, but it does not only apply there. Toon, when asked to comment on the limits of Moore’s Law, noted that we’ve gone well beyond what anybody thought was possible, and we still have another 10 to 20 years of progress. But, he went on to add, we’ve reached some fundamental limits.

Toon thinks we’ve reached the lowest voltage that we can use on those chips. So, we can add more transistors, but we can’t make them go much faster: “Your laptop still runs at 2GHz, it’s just got more cores in it. But we now need thousands of cores to work with machine learning. We need a different architectural process, to design chips in different ways. The old ways of doing it don’t work.”

Toon said IPUs are a general purpose machine intelligence processor, specifically designed for machine intelligence. “One of the advantages of our architecture is that it is suitable for lots of today’s machine learning approaches like CNNs, but it’s also highly optimized for different machine learning approaches like reinforcement learning and future approaches too,” he said. “The IPU architecture enables us to outperform GPUs — it combines massive parallelism with over 1000 independent processor cores per IPU and on-chip memory so the entire model can be held on chip”.
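To get a feel for the “entire model held on chip” claim, here is a back-of-the-envelope Python sketch; the memory budget and FP16 precision are illustrative assumptions on our part, not figures from Graphcore.

```python
# Rough sketch: how many parameters fit entirely in on-chip memory?
# Both numbers below are illustrative assumptions, not Graphcore specs.
ON_CHIP_MEMORY_BYTES = 300 * 1024**2   # assumed on-chip memory budget (~300 MB)
BYTES_PER_PARAM = 2                    # assumed FP16 weights

max_params = ON_CHIP_MEMORY_BYTES // BYTES_PER_PARAM
print(f"Roughly {max_params / 1e6:.0f} million parameters fit on chip")
# Anything larger would have to be split across devices or streamed in,
# which is exactly the memory traffic the on-chip approach tries to avoid.
```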

But how does the IPU compare to Nvidia’s GPUs in practice? Recently, some machine learning benchmarks were released in which Nvidia was shown to outperform the competition. When asked for his thoughts on this, Toon said Graphcore is aware of them, but is focused on optimizing customer-specific applications and workloads right now.

He has previously stated, however, that data structures for machine learning are different: they are high-dimensional, complex models. This, he said, means dealing with data in a different way. Toon noted that GPUs are very powerful, but not necessarily efficient in how they handle these data structures: “We have the opportunity to create something 10, 100 times faster for these data structures”.


Machine learning is changing the paradigm for compute, and AI chips catalyze the process. Image: Graphcore

Speed, however, is not all it takes to succeed in this game. Nvidia, for example, did not succeed just because its GPUs are powerful. A big part of its success, and a differentiator over GPU competitors such as AMD, is in the software layer. The libraries that enabled developers to abstract from hardware specifics and focus on optimizing their machine learning algorithms, parameters, and processes have been a key part of Nvidia’s success.

Nvidia keeps evolving these libraries, with the latest RAPIDS library promising a 50-fold GPU acceleration of data analytics and machine learning compared to CPUs. Where does Graphcore stand in comparison? Toon acknowledged that the software is hugely important, going on to add that, alongside building the world’s most complex silicon processor, Graphcore has also built the first software tool chain designed specifically for machine intelligence, called Poplar.

According to Toon:

“When Graphcore began there was no TensorFlow or PyTorch, but it was clear that in order to target this emerging world of knowledge models we had to rethink the traditional microprocessor software stack. The world has moved from developers defining everything in terms of vectors and scalars to one of graphs and tensors.

In this new world, traditional tool chains do not have the capabilities required to provide an easy and open platform for developers. The models and applications of Compute 2.0 are massively parallel and rely on millions of identical calculations to be performed at the same time.

These workloads dictate that for maximum efficiency models must stay resident and must allow the data to stream through them. Existing architectures that rely on streaming both application code and data through the processor to implement these models are inefficient for this purpose both in hardware construct and in the methodologies used in the tool chains that support them.”

Software 2.0, Compute 2.0, and Computational Graphs

Graphcore talks about Compute 2.0, others talk about Software 2.0. These two are strongly related as the new paradigm for application development. When discussing Compute 2.0, Toon noted that for 70 years we have been telling computers what to do step by step in a program, which is the familiar algorithmic process. Now, he said, we learn from data.

Rather than programming the machine, the machine learns — hence, machine learning. This is fundamentally changing the development and behavior of applications. The processes for building software need to be adapted, and software may display non-deterministic, or at least, non-explainable behavior. This seems to be the way of the future, however, and Toon pointed out that, with enough data and compute, we can build models that outperform humans in pattern recognition tasks.

“When we talk about Poplar being designed for machine intelligence what does that mean, what are the characteristics that such a tool chain requires. It must first and foremost use the graph as its key construct. The graph represents the knowledge model and the application which is built by the tool chain. Poplar is built around a computational graph abstraction, the intermediate representation (IR) of its graph compiler is a large directed graph,” Toon said.

Graphcore, like others, is using graphs as a fundamental metaphor upon which its approach to software and compute for machine intelligence is built. Toon noted that the graph images shared by Graphcore are the internal representation of its graph compiler: the entire knowledge model, broken down to expose the huge parallel workloads that Graphcore schedules and executes across the IPU processor.
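To illustrate the idea of a computational-graph intermediate representation in the abstract, here is a minimal, generic Python sketch; it is not Poplar’s actual IR, just operations as nodes and data dependencies as edges.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str                                    # operation, e.g. "matmul"
    inputs: list = field(default_factory=list)   # upstream nodes it depends on

# A tiny directed graph for relu(matmul(x, w) + b)
x, w, b = Node("x"), Node("w"), Node("b")
mm = Node("matmul", [x, w])
add = Node("add", [mm, b])
relu = Node("relu", [add])

def dependencies(node):
    # A compiler walks edges like these to find work that can run in parallel.
    return [n.name for n in node.inputs]

print("relu depends on:", dependencies(relu))   # ['add']
```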

The IPU processor and Poplar were designed together, and Toon said this philosophy of co-developing the silicon architecture and the software programming environment reflects Graphcore’s culture and environment:

“The engineering we do is open and collaborative in how we build our technology. Poplar supports the design decisions we made in our chip, building and running highly optimized machine intelligence models in place with a highly optimized BSP (bulk synchronous parallel) execution model.

It is built to support the concepts of separating compute and communication for power efficiency and also to interface with host platforms to remove the bottlenecks that plague existing platforms that are being augmented with support for machine learning rather than being designed for it.

Along with supporting and allowing users access to our high-performance IPU platform, the Poplar tool chain has to be easy to use for developers. It has to integrate seamlessly into the backend of machine learning frameworks such as PyTorch and TensorFlow and provide a runtime for network interchange formats such as ONNX both for inference and training workloads.”

Poplar supports TensorFlow, PyTorch, ONNX and Keras now, and will roll out support for other machine learning frameworks over the course of 2019 and as new frameworks appear. Toon said that, by using Poplar as the back end of these frameworks, users can get access to the benefits of having their machine learning models passed through an optimizing graph compiler for all required workloads, rather than just the simple pattern matching that gets used in legacy software platforms.
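As an illustration of the interchange-format point, the sketch below exports a toy PyTorch model to ONNX using PyTorch’s standard exporter. Whether and how a particular back end such as Poplar consumes the resulting file is up to that back end; treat this purely as an example of producing the format.

```python
import torch
import torch.nn as nn

# A toy model, exported to the ONNX interchange format.
model = nn.Sequential(nn.Linear(512, 256), nn.ReLU(), nn.Linear(256, 10))
model.eval()

dummy_input = torch.randn(1, 512)
torch.onnx.export(model, dummy_input, "toy_model.onnx",
                  input_names=["input"], output_names=["logits"])
# The resulting .onnx file can be handed to any runtime or graph compiler
# that understands the format.
```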

“It is not just about running the models and constructs of today, innovators and researchers need a platform to develop and explore the solutions of tomorrow with an easy to use and programmable platform,” he said. “The field of machine intelligence is being held back by software libraries for hardware platforms, which are not open and extensible providing a black box to developers who want to innovate and evolve ideas.”

The revolution is ready to ship, graph-based, and open sourced

If you are like us, you may be wondering what those graphs are like beyond their remarkable imagery. What kind of structures, models and formalism does Graphcore use to represent and work with graphs? Would they go as far as to call them knowledge graphs?

“We just call them computational graphs. All machine learning models are best expressed as graphs — this is how TensorFlow works as well. It’s just that our graphs are orders of magnitude more complex because we have orders of magnitude parallelism for the graphs to exploit on our chips,” said Toon.
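A quick way to see the graph representation Toon refers to in TensorFlow itself is to trace a small function and list the operations in the resulting graph; a minimal sketch:

```python
import tensorflow as tf

@tf.function
def dense(x, w, b):
    return tf.nn.relu(tf.matmul(x, w) + b)

# Tracing produces a concrete function backed by a dataflow graph.
concrete = dense.get_concrete_function(
    tf.TensorSpec([None, 4], tf.float32),
    tf.TensorSpec([4, 8], tf.float32),
    tf.TensorSpec([8], tf.float32),
)
for op in concrete.graph.get_operations():
    print(op.type)   # e.g. Placeholder, MatMul, ..., Relu
```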

But if you really are curious, there is good news — you just have to wait a bit longer. Toon promised that over time Graphcore will provide IPU developers full open-source access to its optimized graph libraries, so they can see how Graphcore builds applications. We are certainly looking forward to that, and will be tracking it along with Graphcore’s other current and future plans.

Graphcore is already shipping production hardware to early access customers. Toon said Graphcore sells PCIe cards, called C2 IPU-Processor cards, which are ready to slot into server platforms and contain two IPU processors each. He also noted they are working with Dell as a channel partner to deliver Dell server platforms to enterprise and cloud customers.

According to Toon, products will be more widely available next year. Initial focus is on data center, cloud, and a select number of edge applications that require heavy compute — like autonomous cars. Graphcore is not currently targeting consumer edge devices like mobile phones.

Graphcore delivering on its promises will be nothing short of a revolution, both on the hardware and the software layer.

Content retrieved from: https://www.zdnet.com/article/the-ai-chip-unicorn-that-is-about-to-revolutionize-everything-has-computational-graph-at-its-core/.


Rebooting AI: Deep learning, meet knowledge graphs

Gary Marcus, a prominent figure in AI, is on a mission to inject a breath of fresh air into a discipline he sees as in danger of stagnating. Knowledge graphs, the 20-year-old hype, may have something to offer there.

“This is what we need to do. It’s not popular right now, but this is why the stuff that is popular isn’t working.” That’s a gross oversimplification of what scientist, best-selling author, and entrepreneur Gary Marcus has been saying for a number of years now, but at least it’s one made by himself.

The “popular stuff which is not working” part refers to deep learning, and the “what we need to do” part refers to a more holistic approach to AI. Marcus is not short of ambition; he is set on nothing else but rebooting AI. He is not short of qualifications either. He has been working on figuring out the nature of intelligence, artificial or otherwise, more or less since his childhood.

Questioning deep learning may sound controversial, considering deep learning is seen as the most successful sub-domain in AI at the moment. Marcus on his part has been consistent in his critique. He has published work that highlights how deep learning fails, exemplified by language models such as GPT-2, Meena, and GPT-3.

Marcus has recently published a 60-page long paper titled “The Next Decade in AI: Four Steps Towards Robust Artificial Intelligence.” In this work, Marcus goes beyond critique, putting forward concrete proposals to move AI forward.

As a precursor to Marcus’ upcoming keynote on the future of AI in Knowledge Connexions, ZDNet engaged with him on a wide array of topics. Picking up from where we left off in the first part, today we expand on specific approaches and technologies.

Robust AI: 4 blocks versus 4 lines of code

Recently, Geoff Hinton, one of the forefathers of deep learning, claimed that deep learning is going to be able to do everything. Marcus thinks the only way to make progress is to put together building blocks that are there already, but no current AI system combines.

Building block No. 1: A connection to the world of classical AI. Marcus is not suggesting getting rid of deep learning, but using it in conjunction with some of the tools of classical AI. Classical AI is good at representing abstract knowledge, representing sentences or abstractions. The goal is to have hybrid systems that can use perceptual information.

No. 2: We need to have rich ways of specifying knowledge, and we need to have large scale knowledge. Our world is filled with lots of little pieces of knowledge. Deep learning systems mostly aren’t. They’re mostly just filled with correlations between particular things. So we need a lot of knowledge.

No. 3: We need to be able to reason about these things. Let’s say we know physical objects and their position in the world — a cup, for example. The cup contains pencils. Then AI systems need to be able to realize that if we cut a hole in the bottom of the cup, the pencils might fall out. Humans do this kind of reasoning all the time, but current AI systems don’t.

No. 4: We need cognitive models — things inside our brain or inside of computers that tell us about the relations between the entities that we see around us in the world. Marcus points to some systems that can do this some of the time, and why the inferences they can make are far more sophisticated than what deep learning alone is doing.
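As a toy illustration of how these blocks might fit together (our sketch, not Marcus’ implementation), the snippet below pairs a stubbed “perception” step with an explicit symbolic rule about containers, echoing the cup-and-pencils example.

```python
# Toy hybrid sketch: a stubbed perception step plus one symbolic rule.
# In a real hybrid system the perception step would be a learned model.

def perceive(scene):
    # Stand-in for a neural perception module: returns structured facts.
    return {
        "cup": {
            "type": "container",
            "contains": ["pencil", "pencil"],
            "has_hole_in_bottom": scene.get("cut_hole", False),
        },
    }

def reason(facts):
    # Explicit commonsense rule: contents may fall out of a holed container.
    conclusions = []
    for name, obj in facts.items():
        if obj["type"] == "container" and obj["has_hole_in_bottom"]:
            for item in obj["contains"]:
                conclusions.append(f"{item} may fall out of the {name}")
    return conclusions

print(reason(perceive({"cut_hole": True})))
```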

To us, this looks like a well-rounded proposal. But there has been some pushback, by the likes of Yoshua Bengio, no less. Bengio, Geoff Hinton, and Yann LeCun are considered the forefathers of deep learning and recently won the Turing Award for their work.


There is more to AI than Machine Learning, and there is more to Machine Learning than deep learning. Gary Marcus is arguing for a hybrid approach to AI, reconnecting it with its roots. Image: Nvidia

Bengio and Marcus have engaged in a debate, in which Bengio acknowledged some of Marcus’ arguments, while also choosing to draw a metaphorical line in the sand. Marcus mentioned he finds Bengio’s early work on deep learning to be “more on the hype side of the spectrum”:

“I think Bengio took the view that if we had enough data we would solve all the problems. And he now sees that’s not true. In fact, he softened his rhetoric quite a bit. He’s acknowledged that there was too much hype, and he acknowledged the limits of generalization that I’ve been pointing out for a long time — although he didn’t attribute this to me. So he’s recognized some of the limits.

However, on this one point, I think he and I are still pretty different. We were talking about which things you need to build in innately into a system. So there’s going to be a lot of knowledge. Not all of it’s going to be innate. A lot of it’s going to be learned, but there might be some core that is innate. And he was willing to acknowledge one particular thing because he said, well, that’s only four lines of computer code.

He didn’t quite draw a line and say nothing more than five lines. But he said it’s hard to encode all of this stuff. I think that’s silly. We have gigabytes of memory now which cost nothing. So you could easily accommodate the physical storage. It’s really a matter of building and debugging and getting the right amount of code.”

Innate knowledge, and the 20-year-old hype

Marcus went on to offer a metaphor. He said the genome is a kind of code that’s evolved over a billion years to build brains autonomously without a blueprint, adding it’s a very sophisticated system which he wrote about in a book called The Birth of the Mind. There’s plenty of room in that genome to have some basic knowledge of the world.

That’s obvious, Marcus argues, by observing what biologists call a precocial animal, such as a horse that just gets up and starts walking, or an ibex that climbs down the side of a mountain when it’s a few hours old. There has to be some innate knowledge there about what the visual world looks like and how to interpret it, how forces apply to your own limbs, and how that relates to balance, and so forth.

There’s a lot more than four lines of code in the human genome, the reasoning goes. Marcus believes most of our genome is expressed in our brain as the brain develops. So a lot of our DNA is actually about building strong starting points in our brains that allow us to then accumulate more knowledge:

“It’s not nature versus nurture. Like the more nature you have, the less nurture you have. And it’s not like there’s one winner there. It’s actually nature and nurture work together. The more that you have built in, the easier it is to learn about the world.”


Exploring intelligence, artificial and otherwise, almost inevitably gets philosophical. The innateness hypothesis refers to whether certain primitives, such as language, are built-in elements of intelligence.

Marcus’ point about having enough storage to go by resonated with us, and so did the part about adding knowledge to the mix. After all, more and more AI experts are acknowledging this. We would argue that the hard part is not so much how to store this knowledge, but how to encode, connect it, and make it usable.

Which brings us to a very interesting, and also hyped point/technology: Knowledge graphs. The term “knowledge graph” is essentially a rebranding of an older approach — the semantic web. Knowledge graphs may be hyped right now, but if anything, it’s a 20-year-old hype.

The semantic web was created by Sir Tim Berners-Lee to bring symbolic AI approaches to the web: Distributed, decentralized, and at scale. Parts of it worked well, others less so. It went through its own trough of disillusionment, and now it’s seeing its vindication, in the form of schema.org taking over the web and knowledge graphs being hyped. Most importantly, however, knowledge graphs are seeing real-world adoption. Marcus did reference knowledge graphs in his “Next Decade in AI” paper, which was a trigger for us.

Marcus acknowledges that there are real problems to be solved to pursue his approach, and a great deal of effort must go into constraining symbolic search well enough to work in real-time for complex problems. But he sees Google’s knowledge graph as at least a partial counter-example to this objection.

Deep learning, meet knowledge graphs

When asked if he thinks knowledge graphs can have a role in the hybrid approach he advocates for, Marcus was positive. One way to think about it, he said, is that there is an enormous amount of knowledge that’s represented on the Internet that’s available essentially for free, and is not being leveraged by current AI systems. However, much of that knowledge is problematic:

“Most of the world’s knowledge is imperfect in some way or another. But there’s an enormous amount of knowledge that, say, a bright 10-year-old can just pick up for free, and we should have RDF be able to do that.

Some examples are, first of all, Wikipedia, which says so much about how the world works. And if you have the kind of brain that a human does, you can read it and learn a lot from it. If you’re a deep learning system, you can’t get anything out of that at all, or hardly anything.

Wikipedia is the stuff that’s on the front of the house. On the back of the house are things like the semantic web that label web pages for other machines to use. There’s all kinds of knowledge there, too. It’s also being left on the floor by current approaches.

The kinds of computers that we are dreaming of that can help us to, for example, put together medical literature or develop new technologies are going to have to be able to read that stuff.

We’re going to have to get to AI systems that can use the collective human knowledge that’s expressed in language form and not just as a spreadsheet in order to really advance, in order to make the most sophisticated systems.”


A hybrid approach to AI, mixing and matching deep learning and knowledge representation as exemplified by knowledge graphs, may be the best way forward

Marcus went on to add that for the semantic web, it turned out to be harder than anticipated to get people to play along and be consistent about it. But that doesn’t mean there’s no value in the approach, and in making knowledge explicit. It just means we need better tools to make use of it. This is something we can subscribe to, and something many people are on to as well.

It’s become evident that we can’t really expect people to manually annotate each piece of content published with RDF vocabularies. So a lot of that is now happening automatically, or semi-automatically, by content management systems. WordPress, the popular blogging platform, is a good example. Many plugins exist that annotate content with RDF (in its developer-friendly JSON-LD form) as it is published, with minimum or no effort required, ensuring better SEO in the process.
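For concreteness, here is a minimal sketch of the kind of schema.org annotation such a plugin might emit as JSON-LD, built and serialized from Python; all property values are illustrative placeholders.

```python
import json

# A minimal schema.org "Article" annotation of the kind a CMS plugin
# might embed in a page as JSON-LD (values are placeholders).
article = {
    "@context": "https://schema.org",
    "@type": "Article",
    "headline": "Rebooting AI: Deep learning, meet knowledge graphs",
    "author": {"@type": "Person", "name": "Example Author"},
    "about": ["deep learning", "knowledge graphs"],
    "datePublished": "2020-01-01",
}

print(json.dumps(article, indent=2))
```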

Marcus thinks that machine annotations will get better as machines get more sophisticated, and there will be a kind of an upward ratcheting effect as we get to AI that is more and more sophisticated. Right now, the AI is so unsophisticated, that it’s not really helping that much, but that will change over time.

The value of hybrids

More generally, Marcus thinks people are recognizing the value of hybrids, especially in the last year or two, in a way that they did not previously:

“People fell in love with this notion of ‘I just pour in all of the data in this one magic algorithm and it’s going to get me there’. And they thought that was going to solve driverless cars and chat bots and so forth.

But there’s been a wake up — ‘Hey, that’s not really working, we need other techniques’. So I think there’s been much more hunger to try different things and try to find the best of both worlds in the last couple of years, as opposed to maybe the five years before that.”

Amen to that, and as previously noted — it seems like the state of the art of AI in the real world is close to what Marcus describes too. We’ll revisit, and wrap up, next week with more techniques for knowledge infusion and semantics at scale, and a look into the future.

Content retrieved from: https://www.zdnet.com/article/rebooting-ai-deep-learning-meet-knowledge-graphs/.


Data.world secures $26 million funding, exemplifies the use of semantics and knowledge graphs for metadata management

Data.world wants to eliminate data silos to answer business questions. Their bet to do this is to provide data catalogs powered by knowledge graphs and semantics. The choice of technology seems to hit the mark, but intangibles matter, too.

Data.world, a vendor offering a knowledge graph powered, cloud-native enterprise data catalog solution, has announced it has closed a $26 million round of venture capital funding led by Tech Pioneers Fund. This is Data.world’s fourth and largest round of funding to date. The latest infusion of capital puts the total raised by Data.world at $71.3 million.

Two years after the unveiling of its enterprise offering, Data.world is showing strong growth and keeps evolving its offering. The company wants to use the investment to accelerate its agile data governance initiatives, scale to meet increased market demand for its enterprise platform, and continue to deliver its brand of product and customer service. 

We take the opportunity to review its progress, and through it, the prospects for the sector at large.

THE IMPORTANCE OF METADATA, COUPLED WITH A KNOWLEDGE-BASED FOCUS

Most of the time when we talk about data the narrative is along the lines of “data is the new oil.” While data can power insights and applications, that’s not really possible without governance and metadata. We are past the Big Data infatuation stage: Databases and data management technologies today are capable of handling the requirements of most organizations.

The question is no longer about how to store lots of data, but rather, how to organize, keep track, and make sense of those ever-growing heaps of data. This is where metadata and data catalogs come in. And this is why the metadata management market is expected to reach a massive $9.34 billion by 2023. According to Gartner:

“Metadata supports understanding of an organization’s data assets, how those data assets are used, and their business value. Metadata management initiatives deliver business benefits such as improved compliance and corporate governance, better risk management, better shareability and reuse, and better assessments of the impact of change within an enterprise, while creating opportunities and guarding against threats.”

Data.world’s debut in Gartner’s Metadata Management Magic Quadrant Report in 2019 was an accolade for the company. We have long argued for the importance of metadata, coupled with a knowledge-based focus, which Data.world exemplifies. Its product is based on knowledge graph technology and a collaborative approach.

Data.world’s debut in Gartner’s Metadata Management Magic Quadrant Report in 2019 was an accolade for the company. Interestingly, Data.world was not the only vendor leveraging knowledge graph technology to be included.

Interestingly, Data.world was not the only vendor leveraging knowledge graph technology to be included in Gartner’s Metadata Management Magic Quadrant Report in 2019. Semantic Web Company, with whom Data.world has partnered, was also included. We see that as an affirmation of the fact that semantics-based knowledge graphs and metadata are a great match, and we expect to see more adoption of the approach in this space.

“Despite the challenges of the global pandemic, Data.world saw new enterprise bookings in the first half of its current fiscal year grow by more than 100% YoY. Additionally, the number of users within our enterprise customers has grown by over 1,900% in 2020 (YTD) over all of 2019 as Data.world’s accessible user interface has accelerated secular trends in remote work and more inclusive data cultures within their organizations,” CEO Brett Hurt told ZDNet.

As part of the investment, Scott Booth, chairman of Tech Pioneers Fund, will join Data.world’s board of directors. Booth noted that as one of the original investors in both Alibaba and Compass, he has seen how critical data is to driving company performance, and he thinks Data.world is changing the way enterprises think about and use data:

“It’s not simply for the data scientists and engineers, but for everyone in an organization. That bears out in how quickly the platform is deployed, adopted, and attached to business-critical use cases. We see this pattern again and again within Data.world’s expanding customer base, and it’s one of many reasons we’re so excited to work with the team and accelerate the market opportunity.”

INTANGIBLES AND PRODUCT PROGRESS

This is an important point, and one on which Hurt and investors seem to converge. Data.world has a strong technical foundation, but this is not enough in and of itself. In the admittedly somewhat dry domain of metadata management, intangibles play an important role.

Hurt emphasized “a best-in-class UX designed for both business and IT users” as a key part of their strategy. Likewise, he went on to add, Data.world’s B-Corporation story means a great deal to customers and investors:

“They recognize that doing business with Data.world means having a true partner with a core mission to democratize access to data within both their organizations and broader society to drive better decisions. A B-Corp designation also helps us attract and retain the top talent from around the country, which translates to better products and services and operational excellence.”

The decision to raise capital at this time was in order to accelerate growth in sales and marketing teams, as well as increase investment in product innovation to take advantage of this massive market opportunity, said Hurt. He also added that raising most of their round after COVID-19 hit is a testament to the team’s performance and ambitious mission.

Data.world, a vendor offering a knowledge graph powered, cloud-native enterprise data catalog solution, has closed a $26 million round of venture capital funding.

The company mentions strong customer reviews via Gartner Peer Insights, strategic partnerships, extended product integrations with AWS, Snowflake, Semantic Web Company, and MANTA, and more than 1,000 platform updates in the past 12 months as some key achievements leading to today’s funding round. Hurt emphasized the continuous release cycle aspect of Data.world’s SaaS platform, and we were curious to know where exactly progress was made.

We were particularly interested in Gra.fo, the visual modeling tool that Data.world onboarded with the acquisition of Capsenta in 2019, as visual modeling can greatly simplify knowledge graph development. Hurt said that users can model their knowledge graph schemas and ontologies in Gra.fo and map them to Data.world datasets, thus semantically integrating data and creating an enterprise knowledge graph.
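As a rough illustration of what semantically integrating data can look like at the RDF level, here is a tiny sketch using the generic rdflib library; the namespace, classes, and properties are hypothetical and not Gra.fo’s or Data.world’s actual model.

```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, RDFS

# Hypothetical namespace for a toy catalog model (not Data.world's schema).
EX = Namespace("http://example.org/catalog/")

g = Graph()
g.bind("ex", EX)

# A business concept and a dataset, linked so the dataset is findable
# through the concept it describes.
g.add((EX.Customer, RDF.type, RDFS.Class))
g.add((EX.Customer, RDFS.label, Literal("Customer")))
g.add((EX.crm_customers, RDF.type, EX.Dataset))
g.add((EX.crm_customers, RDFS.label, Literal("CRM customers table")))
g.add((EX.crm_customers, EX.describesConcept, EX.Customer))

print(g.serialize(format="turtle"))
```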

Other improvements include crowdsourcing and suggested edits/workflows, bulk edits, machine learning tagging, fully automated lineage, centralized access requests, enhanced usage metrics and reporting, curated data access and virtualization, and unified browse experience. That’s a handful indeed. Standing out among those:

The ability to do cross-database queries within the data catalog, including analysis and BI tool access such as Tableau, Excel, Jupyter, R, Python, and more. Auto-tagging to help organize and classify information assets, including automatically identifying which ones may be sensitive. And automated lineage to audit and understand how data connects.

MAPPING TECHNOLOGY TO MISSION

Data.world states its mission is to make it easy for everyone, not just the “data people,” to get clear, accurate, fast answers to any business question. The goal is to map siloed, distributed data to familiar and consistent business concepts, creating a unified body of knowledge anyone can find, understand, and use.

We think the technology Data.world has chosen is a good match for this goal, and attention on the intangibles seems to be paying off too. Data.world is growing, and the funding round comes both as an affirmation and an opportunity to fuel this growth. It will be interesting to see if others in this space decide to take a page from this book. 

Categories
knowledge connexions

Data.world secures $26 million funding, exemplifies the use of semantics and knowledge graphs for metadata management

Data.world wants to eliminate data silos to answer business questions. Their bet to do this is to provide data catalogs powered by knowledge graphs and semantics. The choice of technology seems to hit the mark, but intangibles matter, too.

Data.world, a vendor offering a knowledge graph powered, cloud-native enterprise data catalog solution, has announced it has closed a $26 million round of venture capital funding led by Tech Pioneers Fund. This is Data.world’s fourth and largest round of funding to date. The latest infusion of capital puts the total raised by Data.world at $71.3 million.

Two years after the unveiling of its enterprise offering, Data.world is showing strong growth and keeps evolving its offering. The company wants to use the investment to accelerate its agile data governance initiatives, scale to meet increased market demand for its enterprise platform, and continue to deliver its brand of product and customer service. 

We take the opportunity to review its progress, and through it, the prospects for the sector at large.

THE IMPORTANCE OF METADATA, COUPLED WITH A KNOWLEDGE-BASED FOCUS

Most of the time when we talk about data the narrative is along the lines of “data is the new oil.” While data can power insights and applications, that’s not really possible without governance and metadata. We are past the Big Data infatuation stage: Databases and data management technologies today are capable of handling the requirements of most organizations.

The question is no longer about how to store lots of data, but rather, how to organize, keep track, and make sense of those ever-growing heaps of data. This is where metadata and data catalogs come in. And this is why the metadata management market is expected to reach a massive $9.34 billion by 2023. According to Gartner:

“Metadata supports understanding of an organization’s data assets, how those data assets are used, and their business value. Metadata management initiatives deliver business benefits such as improved compliance and corporate governance, better risk management, better shareability and reuse, and better assessments of the impact of change within an enterprise, while creating opportunities and guarding against threats.”

Data.world’s debut in Gartner’s Metadata Management Magic Quadrant Report in 2019 was an accolade for the company. We have long argued for the importance of metadata, coupled with a knowledge-based focus, which Data.world exemplifies. Its product is based on knowledge graph technology and a collaborative approach.

metadata-management-magic-quadrant-2019.jpg
Data.world’s debut in Gartner’s Metadata Management Magic Quadrant Report in 2019 was an accolade for the company. Interestingly, Data.world was not the only vendor leveraging knowledge graph technology to be included.

Interestingly, Data.world was not the only vendor leveraging knowledge graph technology to be included in Gartner’s Metadata Management Magic Quadrant Report in 2019Semantic Web Companywith whom Data.world has partnered, was also included. We see that as an affirmation of the fact that semantics-based knowledge graphs and metadata are a great match, and we expect to see more adoption of the approach in this space.

“Despite the challenges of the global pandemic, Data.world saw new enterprise bookings in the first half of its current fiscal year grow by more than 100% YoY. Additionally, the number of users within our enterprise customers has grown by over 1,900% in 2020 (YTD) over all of 2019 as Data.world’s accessible user interface has accelerated secular trends in remote work and more inclusive data cultures within their organizations,” CEO Brett Hurt told ZDNet.

As part of the investment, Scott Booth, chairman of Tech Pioneers Fund, will join Data.world’s board of directors. Booth noted that as one of the original investors in both Alibaba and Compass, he has seen how critical data is to driving company performance, and he thinks Data.world is changing the way enterprises think about and use data:

“It’s not simply for the data scientists and engineers, but for everyone in an organization. That bears out in how quickly the platform is deployed, adopted, and attached to business-critical use cases. We see this pattern again and again within Data.world’s expanding customer base, and it’s one of many reasons we’re so excited to work with the team and accelerate the market opportunity.”

INTANGIBLES AND PRODUCT PROGRESS

This is an important point and one on which Hurt and investors seem to converge. Data.world has a strong technical foundation, but this is not enough in and by itself. In the admittedly somewhat dry domain of metadata management, intangibles play an important role.

Hurt emphasized “a best-in-class UX designed for both business and IT users” as a key part of their strategy. Likewise, he went on to add, Data.world’s B-Corporation story means a great deal to customers and investors:

“They recognize that doing business with Data.world means having a true partner with a core mission to democratize access to data within both their organizations and broader society to drive better decisions. A B-Corp designation also helps us attract and retain the top talent from around the country, which translates to better products and services and operational excellence.”

The decision to raise capital at this time was made in order to accelerate the growth of the sales and marketing teams, as well as to increase investment in product innovation and take advantage of this massive market opportunity, said Hurt. He also added that raising most of the round after COVID-19 hit is a testament to the team’s performance and ambitious mission.

data-world.jpg
Data.world, a vendor offering a knowledge graph-powered, cloud-native enterprise data catalog solution, today announced it has closed a $26 million round of venture capital.

The company mentions strong customer reviews via Gartner Peer Insights, strategic partnerships, extended product integrations with AWS, Snowflake, Semantic Web Company, and MANTA, and more than 1,000 platform updates in the past 12 months as some key achievements leading to today’s funding round. Hurt emphasized the continuous release cycle aspect of Data.world’s SaaS platform, and we were curious to know where exactly progress was made.

We were particularly interested in Gra.fo, the visual modeling tool that Data.world onboarded with the acquisition of Capsenta in 2019, as visual modeling can greatly simplify knowledge graph development. Hurt said that users can model their knowledge graph schemas and ontologies in Gra.fo and map them to Data.world datasets, thus semantically integrating data and creating an enterprise knowledge graph.

Other improvements include crowdsourcing and suggested edits/workflows, bulk edits, machine learning tagging, fully automated lineage, centralized access requests, enhanced usage metrics and reporting, curated data access and virtualization, and unified browse experience. That’s a handful indeed. Standing out among those:

The ability to run cross-database queries within the data catalog, with access from analysis and BI tools such as Tableau, Excel, Jupyter, R, Python, and more. Auto-tagging to help organize and classify information assets, including automatically identifying which ones may be sensitive. And automated lineage to audit and understand how data connects.
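To make the cross-database query capability a bit more concrete, here is a minimal sketch of querying a cataloged dataset from Python, assuming the open source datadotworld client; the dataset key, table, and column names are hypothetical and used only for illustration.

```python
# A minimal sketch of querying a data.world dataset from Python, assuming the
# open source `datadotworld` client (pip install datadotworld) with an API token
# already configured. The dataset key, table, and columns are hypothetical.
import datadotworld as dw

# Run a SQL query against a cataloged dataset and pull the result into pandas
results = dw.query(
    'acme-corp/sales-analytics',  # hypothetical owner/dataset key
    'SELECT region, SUM(revenue) AS total FROM sales GROUP BY region'
)
df = results.dataframe  # analyze further in Jupyter, a BI tool, etc.
print(df.head())
```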

MAPPING TECHNOLOGY TO MISSION

Data.world states its mission is to make it easy for everyone, not just the “data people,” to get clear, accurate, fast answers to any business question. The goal is to map siloed, distributed data to familiar and consistent business concepts, creating a unified body of knowledge anyone can find, understand, and use.

We think the technology Data.world has chosen is a good match for this goal, and attention on the intangibles seems to be paying off too. Data.world is growing, and the funding round comes both as an affirmation and an opportunity to fuel this growth. It will be interesting to see if others in this space decide to take a page from this book. 

Categories
knowledge connexions

AI and automation vs. the COVID-19 pandemic: Trading liberty for safety

Reports on the use of AI to respond to COVID-19 may have been greatly exaggerated. But does the rush to pandemic-fighting solutions like thermal scanners, face recognition and immunity passports signal the normalization of surveillance technologies?

Digital technologies have been touted as a solution to the COVID-19 outbreak since early in the pandemic. AlgorithmWatch, a non-profit research and advocacy organisation that evaluates and sheds light on algorithmic decision-making processes, just published a report on Automated Decision-Making Systems in the COVID-19 Pandemic, examining the use of technology to respond to COVID-19.

The report has a European lens, as AlgorithmWatch focuses on the use of digital technology in the EU. Its findings, however, are interesting and applicable regardless of geography, as they refer to the same underlying principles and technologies. Furthermore, the report references and compares the use of technology worldwide.

Is it AI or ADM?

The report sets the stage by introducing the distinction between Artificial Intelligence (AI) and Automated Decision-Making (ADM). AlgorithmWatch notes that AI is a vague and much-hyped term, to which it has long preferred the more rigorous locution ADM. AlgorithmWatch defines an ADM system as:

“A socio-technological framework that encompasses a decision-making model, an algorithm that translates this model into computable code, the data this code uses as an input — either to ‘learn’ from it or to analyse it by applying the model — and the entire political and economic environment surrounding its use.”

The point is that ADM systems are about more than technology. Rather, AlgorithmWatch notes, they are ways in which a certain technology is inserted within a decision-making process. And that technology may be far less sophisticated or “intelligent” than deep learning algorithms. The same technology can be used for very different purposes, depending on the rationale.

Data collected through a Bluetooth LE-based smartphone app, for example, can be voluntarily and anonymously shared either with a central server or with the smartphones of potentially infected individuals, with no consequences or sanctions whatsoever in case a citizen decides not to download the app.

Or, the same technology can be adopted within a much more rights-invasive solution, working in tandem with GPS to continuously provide a citizen’s location to the authorities, at times within mandatory schemes, and with harsh sanctions in case they are not respected.

On that premise, the report goes on to examine different ways of using technology and collecting data employed by different initiatives around the world.

Mandatory ADM and bracelets

Some regimes have resorted to invasive ADM solutions that strongly prioritize public health and safety concerns over individual rights, notes AlgorithmWatch. China seems to be leading the way. According to a New York Times report, a color-based rating system called Alipay Health Code is used.

The system uses big data “to draw automated conclusions about whether someone is a contagion risk”. Under this model of ADM, citizens have to fill out a form with their personal details, and are then presented with a QR code in one of three colors:

“A green code enables its holder to move about unrestricted. Someone with a yellow code may be asked to stay home for seven days. Red means a two-week quarantine.” A scan is necessary to visit “office buildings, shopping malls, residential compounds and metro systems,” according to a Reuters report.

AlgorithmWatch goes on to add Bahrain, India, Israel, Kuwait, Russia and South Korea to the list of countries where ADM applications are used in a way that poses threats to the rights of their citizens. Although the report notes that the EU fares better in that respect, the use of apps in Hungary, Lithuania, Norway and Poland is rife with issues too.

10-cicret-bracelet.png
Technologies such as wearables take on a different dimension if their use is mandated

AlgorithmWatch provides some graphic details on some of those cases before moving on to wearables, aka bracelets. Here it’s Liechtenstein leading the way, having launched a study in which 2,200 citizens are given a biometric bracelet to collect “vital bodily metrics including skin temperature, breathing rate and heart rate.”

That data is then sent to a Swiss laboratory for analysis. The experiment, which will ultimately involve all of the country’s citizens, is based on the premise that by analyzing physiological vital signs “a new algorithm for the sensory armband may be developed that can recognize COVID-19 at an early stage, even if no typical symptoms of the disease are present.”

Wearables are also utilized in countries such as Hong Kong, Singapore, Saudi Arabia, the UAE, and Jordan, but also at Michigan’s Albion College. The report notes that although the stated goal is to enforce quarantine orders and other COVID-19 restrictions, organizations such as the Electronic Frontier Foundation (EFF) are deeply concerned.

The EFF states that wearables, in the context of the pandemic, “remain an unproven technology that might do little to contain the virus, and should at most be a supplement to primary public health measures like widespread testing and manual contact tracing.” Also, and importantly, “everyone should have the right not to wear a tracking token, and to take it off whenever they wish.”

How do contact tracing apps work, and do they actually work?

The fundamental clash between different models of ADM is exemplified in the global debate around digital apps to complement contact tracing efforts, AlgorithmWatch notes. While some tech enthusiasts argued that privacy and other fundamental rights could be sacrificed to enable public health, not everyone is in favor of that view.

Furthermore, a heated debate on the adoption of relevant technologies ensued, resulting in two main camps: GPS tracking to collect location data, and Bluetooth Low Energy to collect proximity data. The latter camp also split in two opposing lines of thought: centralized vs decentralized. Countries like France, the UK and initially Germany tried to develop centralized Bluetooth-based solutions, while Italy, Switzerland, Denmark, Estonia (and, ultimately, Germany) opted for a decentralized solution.

GPS-based apps work by collecting location data. The rationale is that the data can help health authorities reconstruct the web of contacts that an individual who tested positive for COVID-19 had. This aids contact tracing efforts, the thinking goes, by speeding them up and making them more effective and complete, while also enabling precise geographic identification of outbreaks. GPS-based apps can also enable identification of trends and enforcement of quarantine rules.

Contact tracing applications are touted as a means to reduce coronavirus spreading. But how do they work, and do they actually work?

Getty Images/iStockphoto

Decentralized contact tracing apps work by merely signaling that two phones have been close enough to each other, for long enough, to consider the encounter at risk. If one of the owners is diagnosed with COVID-19 within 14 days, and is willing to upload encounter data through the app, the other receives a notification of potential exposure.

Exposure notification APIs developed by Google and Apple for Android and iOS, the operating systems running on the vast majority of smartphones, have been utilized with varying degrees of success, while also causing some friction. The claim was that no location data would be collected. However, it has been argued that Google still required location services to be turned on, even though location data was not collected, in order to actually be able to notify users via Bluetooth.
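To make the decentralized model more concrete, here is a heavily simplified sketch, in Python, of how on-device exposure matching might work; this is not the actual Google/Apple Exposure Notification protocol, and the identifiers, thresholds, and data layout are illustrative assumptions only.

```python
# A simplified sketch of decentralized exposure matching, loosely modeled on the
# Bluetooth-based approach described above. Not the real Google/Apple protocol;
# identifiers, thresholds, and data layout are illustrative assumptions.
import secrets
from dataclasses import dataclass

@dataclass(frozen=True)
class Encounter:
    rolling_id: str    # random identifier broadcast by a nearby phone
    duration_min: int  # how long the two phones stayed close

def new_rolling_id() -> str:
    """Each phone periodically generates a fresh random identifier to broadcast."""
    return secrets.token_hex(16)

def check_exposure(local_encounters, published_positive_ids, min_minutes=15):
    """Match locally stored encounters against identifiers voluntarily uploaded
    by users diagnosed with COVID-19; the matching happens on the device."""
    return [
        e for e in local_encounters
        if e.rolling_id in published_positive_ids and e.duration_min >= min_minutes
    ]

# Example: one stored encounter matches a published identifier
encounters = [Encounter("a" * 32, 20), Encounter("b" * 32, 5)]
positives = {"a" * 32}
print(check_exposure(encounters, positives))  # only the 20-minute encounter matches
```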

AlgorithmWatch notes that months after the first deployments, we still lack hard evidence on the effectiveness of all such ADM systems. As a systematic review of the literature concluded after analyzing 110 full-text studies, “no empirical evidence of the effectiveness of automated contact tracing (regarding contacts identified or transmission reduction) was identified.” Why?

As the American Civil Liberties Union notes, GPS technology has “a best-case theoretical accuracy of 1 meter, but more typically 5 to 20 meters under an open sky.” Also, “GPS radio signals are relatively weak; the technology does not work indoors and works poorly near large buildings, in large cities, and during thunderstorms, snowstorms, and other bad weather.”

As for Bluetooth, even its own creators have argued for caution: problems in terms of accuracy and “uncertainty in the detection range” are very real, “so, yes, there may be false negatives and false positives and those have to be accounted for.” AlgorithmWatch elaborates further, and notes that based on the above, the efficacy of such apps is questionable.

Thermal scanners, face recognition, immunity passports: should this be our new normal?

The report also notes that for some industries, the pandemic is not exactly catastrophic. Forecasts for the thermal scanning, facial recognition, face and voice biometrics technology markets look outstanding, largely thanks to the pandemic. AlgorithmWatch dubs this both unsurprising and surprising:

“Unsurprising, given that face recognition is being widely adopted and deployed, both inside and outside the EU, with little to no meaningful democratic debate and safeguards in place. But surprising also, given what we know about their scant usefulness in the battle against COVID-19.”

A National Institute of Standards and Technology study argues that “wearing face masks that adequately cover the mouth and nose causes the error rate of some of the most widely used facial recognition algorithms to spike to between 5 percent and 50 percent.” The EFF, for its part, notes that thermal cameras not only present privacy problems, but can also lead to false positives, carrying the very real risk of involuntary quarantines and/or harassment.

hybrid-cloud-scales.jpg
The balance between liberty and safety is always a controversial issue, and the effort to tackle COVID-19 with technology brings it to the fore

Some countries are experimenting with immunity passports too, from Estonia to the UK, as AlgorithmWatch documents. The rationale for their adoption, and the case for urgently doing so, is the same: when adopted as a digital “credential,” as per Privacy International, an individual becomes able to prove their health status (positive, recovered, vaccinated, etc.) whenever needed in public contexts, thus enabling governments to avoid further total lockdowns.

Privacy International goes on to add, however, that similarly to all the tools previously described, “there is currently no scientific basis for these measures, as highlighted by the WHO. The nature of what information would be held on an immunity passport is currently unknown.”

AlgorithmWatch concludes by highlighting the common theme emerging from what has been studied: a “move fast and break things” mentality, trading liberty for safety. What’s more, there does not seem to be much evidence of gains in safety, nor much democratic debate, accountability, or safeguards around the liberty being given up. There is not even agreement on how to measure “success.” The focus should not be to make these technologies better, AlgorithmWatch notes, but rather to safeguard their use:

“Rushing to novel technological solutions to as complex a social problem as a pandemic can result both in not solving the social problem at hand, and in needlessly normalizing surveillance technologies.”

Categories
knowledge connexions

Graph, machine learning, hype, and beyond: ArangoDB open source multi-model database releases version 3.7

A sui generis, multi-model open source database, designed from the ground up to be distributed. ArangoDB keeps up with the times and uses graph, and machine learning, as the entry points for its offering.

If open source is the new normal in enterprise software, then that certainly holds for databases, too. In that line of thinking, GitHub is where it all happens. So to have been starred 10,000 times on GitHub must say something about a project. Open source ArangoDB, which also offers an Enterprise version, hit that milestone recently.

On Aug. 27, ArangoDB is announcing its new release, 3.7, which comes with interesting new features around graph. We take the opportunity to discuss the database market, graph, and beyond, with CEO and co-founder Claudius Weinberger and Head of Engineering and Machine Learning Jörg Schad.

CLOUD AND MACHINE LEARNING READY

ArangoDB was founded in Cologne in 2014 by OnVista veterans Claudius Weinberger and Frank Celler. The team made the headlines in 2019 with their $10 million in Series A funding led by Bow Capital. As Weinberger noted, he and his co-founder have been working together for 20 years, and the decision to pursue their vision was not a spur of the moment idea:

“The main idea for ArangoDB, what is still valid today, is what we call the native multi-model approach. That means that we found a way that we can combine the JSON document data model, the graph model, and the key-value model in one database core with one query language.”

Today ArangoDB is a US company with a German subsidiary; it has a new chief revenue officer, Matt Ekstrom, and a new head of engineering, Schad. Schad joined ArangoDB last year but has been working with ArangoDB for the past four years. With a PhD in database systems, distributed data analytics, and large-scale container infrastructure, Schad has been switching between databases.

Two key factors made him join the ArangoDB team: Distribution in a cloud setting and machine learning (ML). ArangoDB has been an early adopter of both Apache Mesos / DC/OS and Kubernetes. Eventually, Kubernetes prevailed, and ArangoDB 3.7 comes with the general availability of its Kubernetes operator, which has been developed over the last three years.

ArangoDB’s Kubernetes operator is also the foundation for its managed service Oasis, available in AWS, Azure, and GCP. The new release includes a number of improvements for faster replacement and movement of servers, improved monitoring and cluster health analysis, an advanced inspection of pod failure causes, and overall reduced resource usage. Cluster scalability improvements for on-premise deployment apply too.

arangoml-pipeline-complete-pipeline-1024x470.jpg
ArangoDB is touted as a solution to unify metadata across machine learning pipelines

ArangoDB has been promoting ArangoML: Using ArangoDB as the infrastructure for teams using ML. The idea is that beyond training data, which is a prerequisite for training ML models, metadata is also important, and using ArangoDB is a good match for that. We have long argued for the importance of metadata. But why ArangoDB, and not any other data management system?

Schad referred to his experience building machine learning pipelines for finance and healthcare use cases. One of the biggest challenges he saw there was audit trails for CCPA or GDPR, making it necessary to have a full view of the entire pipeline. They had to figure out what happens if patients withdraw consent to use their data, for example.

Just being able to identify the different ML models deployed in production was very challenging because they had to go through a number of different metadata stores — for the ML part, the data feature transformation part, and so on. So they wanted to have a common layer with all the metadata where this would end up being one query.

Relational systems are not a good match, Schad said. Machine learning features may be derived from other features, which means ending up with a lot of joins, and especially a lot of self joins. Apart from being ugly to write and maintain, those queries don’t perform well either. So this started to look like a case for a graph database — these are the types of queries graph databases excel at.
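To illustrate the kind of query Schad is describing, here is a minimal sketch using the python-arango driver; the database, graph, and collection names (ml_meta, lineage, datasets, models) are hypothetical, and the AQL traversal is only meant to show how lineage questions map onto graph queries.

```python
# A minimal sketch of a lineage query as a graph traversal, using the
# python-arango driver. Database, graph, and collection names are hypothetical.
from arango import ArangoClient

client = ArangoClient(hosts="http://localhost:8529")
db = client.db("ml_meta", username="root", password="secret")

# "Which production models were (directly or transitively) derived from this
# dataset?" In SQL this would need repeated self-joins; in AQL it is a traversal.
aql = """
FOR v, e, p IN 1..5 OUTBOUND @dataset GRAPH 'lineage'
    FILTER IS_SAME_COLLECTION('models', v) AND v.status == 'production'
    RETURN DISTINCT v.name
"""
cursor = db.aql.execute(aql, bind_vars={"dataset": "datasets/patient-records"})
print(list(cursor))
```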

FROM GRAPH TO MULTI-MODEL AND BACK AGAIN

But still: why ArangoDB? ArangoDB is not a traditional graph database — it is a multi-model database which also supports graph. The advantage according to Schad is that this enables users to combine the flexibility of having no schema, leveraging the JSON document view of multi-model, with the structure of how things are connected as a graph:

“In the end, looking at which models have been impacted by which is being derived from just one data set, it’s just a graph traversal. So it turned out to be a really easy model, to be both flexible and very efficient in terms of formulating this query and many others as well.”

Schad went on to add that ArangoML has connectors for popular ML ecosystems like TensorFlow and PyTorch, and they are now working on a Kubeflow integration. Custom integrations can be developed using a Python API. ArangoDB supports clients in Java, JavaScript, NodeJS, Go, Python, Elixir, R, and Rust.

Not having a schema, however, is not always a plus. ArangoDB 3.7 introduces JSON schema support, giving users the option to validate all new data written to the database, as well as analyze existing data validity. To us, this looks overdue. JSON schema may not be the most powerful schema mechanism around, but for a database emphasizing JSON, it’s a natural choice.
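To give a flavor of what such a rule can express, here is a small, self-contained illustration of a JSON Schema validated locally with the jsonschema Python package; the schema and document are made up, and the exact way ArangoDB attaches a rule to a collection is not shown here.

```python
# An illustration of the kind of JSON Schema rule a collection could enforce.
# Validated locally with the `jsonschema` package; the schema and document are
# made-up examples, not ArangoDB-specific configuration.
from jsonschema import validate, ValidationError

customer_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "email": {"type": "string"},
        "signup_year": {"type": "integer", "minimum": 2014},
    },
    "required": ["name", "email"],
}

doc = {"name": "Ada", "email": "ada@example.com", "signup_year": 2020}
try:
    validate(instance=doc, schema=customer_schema)
    print("document is valid")
except ValidationError as err:
    print("rejected:", err.message)
```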

stresschaosistock-507216088a-poselenov-1.jpg
The key premise of multi-model databases is offering many views over the same data. For ArangoDB, graph is one view; document and key-value are the others. Getty Images/iStockphoto

Although ArangoDB has its own sui generis approach, we noticed that in the last year or so its messaging has shifted a bit from the multi-model aspect to emphasize graph. Its people confirmed that, mentioning they’re seeing a lot of demand for graph. Many users come with a graph use case and expand to multi-model use cases later on.

The ArangoDB team believes, however, that more data models are needed to support efficient and successful graph use cases: graph and beyond, with graph as a central use case. Up until recently, the hype was all around graph, too. But those who have been into graph before it was cool knew that hypes come and go, and were expecting the hype to subside at some point.

The first sign came last week, with Gartner’s hype cycle for emerging technology in 2020 moving “graphs and ontologies” to the trough of disillusionment. Apart from the fact that conflating graphs and ontologies does not make much sense to us, we see this as a normal phase in the evolution of new, or in this case, not so new but still hyped, technology.

Schad noted that while graph use cases are on the rise, there’s still a lot of trial and error. Although use cases become more mature, some disillusionment in terms of scalability limits does exist. For Weinberger, it’s a good sign that the overall graph story is moving on, but expecting to do everything faster than other databases should not be the main reason people look at graphs.

Categories
knowledge connexions

Explainable AI: From the peak of inflated expectations to the pitfalls of interpreting machine learning models

We have reached peak hype for explainable AI. But what does this actually mean, and what will it take to get there?

Machine learning and artificial intelligence are helping automate an ever-increasing array of tasks, with ever-increasing accuracy. They are supported by the growing volume of data used to feed them, and the growing sophistication in algorithms. 

The flip side of more complex algorithms, however, is less interpretability. In many cases, the ability to retrace and explain outcomes reached by machine learning (ML) models is crucial, as:

“Trust models based on responsible authorities are being replaced by algorithmic trust models to ensure privacy and security of data, source of assets and identity of individuals and things. Algorithmic trust helps to ensure that organizations will not be exposed to the risk and costs of losing the trust of their customers, employees and partners. Emerging technologies tied to algorithmic trust include secure access service edge, differential privacy, authenticated provenance, bring your own identity, responsible AI and explainable AI.”

FROM THE PEAK OF INFLATED EXPECTATIONS TO A DEEP DIVE IN MACHINE LEARNING INTERPRETABILITY

The above quote is taken from Gartner’s newly released 2020 Hype Cycle for Emerging Technologies. In it, explainable AI is placed at the peak of inflated expectations. In other words, we have reached peak hype for explainable AI. To put that into perspective, a recap may be useful.

As experts such as Gary Marcus point out, AI is probably not what you think it is. Many people today conflate AI with machine learning. While machine learning has made strides in recent years, it’s not the only type of AI we have. Rule-based, symbolic AI has been around for years, and it has always been explainable.

Incidentally, that kind of AI, in the form of “Ontologies and Graphs,” is also included in the same Gartner Hype Cycle, albeit in a different phase: the trough of disillusionment. Incidentally, again, that’s conflating: ontologies are part of AI, while graphs are not necessarily.

That said: If you are interested in getting a better understanding of the state of the art in explainable AI machine learning, reading Christoph Molnar’s book is a good place to start. Molnar is a data scientist and Ph.D. candidate in interpretable machine learning. Molnar has written the book Interpretable Machine Learning: A Guide for Making Black Box Models Explainable, in which he elaborates on the issue and examines methods for achieving explainability.

ethc2020.png
Gartner’s Hype Cycle for Emerging Technologies, 2020. Explainable AI, meaning interpretable machine learning, is at the peak of inflated expectations. Ontologies, a part of symbolic AI which is explainable, are in the trough of disillusionment.

Recently, Molnar and a group of researchers attempted to address ML practitioners by raising awareness of pitfalls and pointing out solutions for correct model interpretation, as well as ML researchers by discussing open issues for further research. Their work was published as a research paper, titled Pitfalls to Avoid when Interpreting Machine Learning Models, at the ICML 2020 Workshop XXAI: Extending Explainable AI Beyond Deep Models and Classifiers.

Similar to Molnar’s book, the paper is thorough. Admittedly, however, it’s also more involved. Yet, Molnar has striven to make it more approachable by means of visualization, using what he dubs “poorly drawn comics” to highlight each pitfall. As with Molnar’s book on interpretable machine learning, we summarize findings here, while encouraging readers to dive in for themselves.

The paper mainly focuses on the pitfalls of global interpretation techniques when the full functional relationship underlying the data is to be analyzed. Discussion of “local” interpretation methods, where individual predictions are to be explained, is out of scope. For a reference on global vs. local interpretations, you can refer to Molnar’s book as previously covered on ZDNet.

The authors note that ML models usually contain non-linear effects and higher-order interactions. As interpretations are based on simplifying assumptions, the associated conclusions are only valid if we have checked that the assumptions underlying our simplifications are not substantially violated.

In classical statistics this process is called “model diagnostics,” and the research claims that a similar process is necessary for interpretable ML (IML) based techniques. The research identifies and describes pitfalls to avoid when interpreting ML models, reviews (partial) solutions for practitioners, and discusses open issues that require further research.

BAD MODEL GENERALIZATION, UNNECESSARY USE OF COMPLEX MODELS

Under- or overfitting models will result in misleading interpretations regarding true feature effects and importance scores, as the model does not match the underlying data-generating process well. Evaluating ML models on their training data should be avoided due to the danger of overfitting; we have to resort to out-of-sample validation, such as cross-validation procedures.

Formally, IML methods are designed to interpret the model instead of drawing inferences about the data generating process. In practice, however, the latter is the goal of the analysis, not the former. If a model approximates the data generating process well enough, its interpretation should reveal insights into the underlying process. Interpretations can only be as good as their underlying models. It is crucial to properly evaluate models using training and test splits — ideally using a resampling scheme.

Flexible models should be part of the model selection process so that the true data-generating function is more likely to be discovered. This is important, as the Bayes error for most practical situations is unknown, and we cannot make absolute statements about whether a model already fits the data optimally.

Using opaque, complex ML models when an interpretable model would have been sufficient (i.e., one with similar performance) is considered a common mistake. The recommendation is to start with simple, interpretable models and gradually increase complexity in a controlled, step-wise manner, carefully measuring and comparing predictive performance along the way.
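A minimal sketch of that workflow, using scikit-learn on a toy dataset: fit an interpretable logistic regression first, then a more complex gradient boosting model, and only accept the added complexity if cross-validated performance justifies it. The dataset and metric are chosen purely for illustration.

```python
# "Start simple, add complexity only if it pays off", sketched with scikit-learn.
# Toy dataset and metric chosen for illustration only.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

candidates = [
    ("logistic regression", make_pipeline(StandardScaler(), LogisticRegression(max_iter=5000))),
    ("gradient boosting", GradientBoostingClassifier(random_state=0)),
]

# Out-of-sample evaluation via cross-validation, never on the training data alone
for name, model in candidates:
    scores = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
    print(f"{name}: mean ROC AUC = {scores.mean():.3f}")
```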

Measures of model complexity allow us to quantify the trade-off between complexity and performance and to automatically optimize for multiple objectives beyond performance. Some steps toward quantifying model complexity have been made. However, further research is required as there is no single perfect definition of interpretability but rather multiple, depending on the context.

IGNORING FEATURE DEPENDENCE

This pitfall is further analyzed in three sub-categories: Interpretation with extrapolation, confusing correlation with dependence, and misunderstanding conditional interpretation.

Interpretation with Extrapolation refers to producing artificial data points that are used for model predictions with perturbations. These are aggregated to produce global interpretations. But if features are dependent, perturbation approaches produce unrealistic data points. In addition, even if features are independent, using an equidistant grid can produce unrealistic values for the feature of interest. Both issues can result in misleading interpretations.

Before applying interpretation methods, practitioners should check for dependencies between features in the data (e.g., via descriptive statistics or measures of dependence). When it is unavoidable to include dependent features in the model, which is usually the case in ML scenarios, additional information regarding the strength and shape of the dependence structure should be provided.

Confusing correlation with dependence is a typical error. The Pearson correlation coefficient (PCC) is a measure used to track dependency among ML features. But features with PCC close to zero can still be dependent and cause misleading model interpretations. While independence between two features implies that the PCC is zero, the converse is generally false.
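A tiny numerical illustration of that last point: below, y is fully determined by x, yet the Pearson correlation between the two is approximately zero because the relationship is non-linear and symmetric.

```python
# Dependence without correlation: y is a deterministic function of x, yet the
# Pearson correlation coefficient is close to zero.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=10_000)
y = x ** 2

pcc = np.corrcoef(x, y)[0, 1]
print(f"Pearson correlation: {pcc:.3f}")  # close to 0, despite perfect dependence
```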

Any type of dependence between features can have a strong impact on the interpretation of the results of IML methods. Thus, knowledge about (possibly non-linear) dependencies between features is crucial. Low-dimensional data can be visualized to detect dependence. For high-dimensional data, several other measures of dependence in addition to PCC can be used.

Misunderstanding conditional interpretation. Conditional variants to estimate feature effects and importance scores require a different interpretation. While conditional variants for feature effects avoid model extrapolations, these methods answer a different question. Interpretation methods that perturb features independently of others also yield an unconditional interpretation.

Conditional variants do not replace values independently of other features, but in such a way that they conform to the conditional distribution. This changes the interpretation as the effects of all dependent features become entangled. The safest option would be to remove dependent features, but this is usually infeasible in practice.

When features are highly dependent and conditional effects and importance scores are used, the practitioner has to be aware of the distinct interpretation. Currently, no approach allows us to simultaneously avoid model extrapolations and to allow a conditional interpretation of effects and importance scores for dependent features.

MISLEADING EFFECT DUE TO INTERACTIONS, IGNORING ESTIMATION UNCERTAINTY, IGNORING MULTIPLE COMPARISONS

Global interpretation methods can produce misleading interpretations when features interact. Many interpretation methods cannot separate interactions from main effects. Most methods that identify and visualize interactions are not able to identify higher-order interactions and interactions of dependent features.

There are some methods to deal with this, but further research is still warranted. Furthermore, solutions are lacking for the automatic detection and ranking of all interactions of a model, as well as for specifying the type of interaction being modeled.

Due to the variance in the estimation process, interpretations of ML models can become misleading. When sampling techniques are used to approximate expected values, estimates vary, depending on the data used for the estimation. Furthermore, the obtained ML model is also a random variable, as it is generated on randomly sampled data and the inducing algorithm might contain stochastic components as well.

Hence, the model variance has to be taken into account. The true effect of a feature may be flat, but purely by chance, especially on smaller data, an effect might algorithmically be detected. This effect could cancel out once averaged over multiple model fits. The researchers note the uncertainty in feature effect methods has not been studied in detail.

group-of-people-on-peak-mountain.jpg
It’s a steep fall from the peak of inflated expectations to the trough of disillusionment. Getting things done for interpretable machine learning takes expertise and concerted effort.

Simultaneously testing the importance of multiple features will result in false-positive interpretations if the multiple comparisons problem (MCP) is ignored. MCP is well known in significance tests for linear models and similarly exists in testing for feature importance in ML.

For example, when simultaneously testing the importance of 50 features, even if all features are unimportant, the probability of observing that at least one feature is significantly important is ≈ 0.923. Multiple comparisons become even more problematic the higher-dimensional a dataset is. Since MCP is well known in statistics, the authors refer practitioners to existing overviews and discussions of alternative adjustment methods.
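The 0.923 figure follows from the complement rule, assuming a 5% significance level per test (the level is an assumption here, as the example does not state it explicitly):

```python
# Where the ~0.923 comes from: with 50 tests at an assumed 5% significance level,
# the chance that at least one unimportant feature looks significant is the
# complement of all 50 tests staying non-significant.
alpha, m = 0.05, 50
p_at_least_one = 1 - (1 - alpha) ** m
print(round(p_at_least_one, 3))  # 0.923
```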

UNJUSTIFIED CAUSAL INTERPRETATION

Practitioners are often interested in causal insights into the underlying data-generating mechanisms, which IML methods, in general, do not provide. Common causal questions include the identification of causes and effects, predicting the effects of interventions, and answering counterfactual questions. In the search for answers, researchers can be tempted to interpret the result of IML methods from a causal perspective.

However, a causal interpretation of predictive models is often not possible. Standard supervised ML models are not designed to model causal relationships but to merely exploit associations. A model may, therefore, rely on the causes and effects of the target variable as well as on variables that help to reconstruct unobserved influences.

Consequently, the question of whether a variable is relevant to a predictive model does not directly indicate whether a variable is a cause, an effect, or does not stand in any causal relation to the target variable.

As the researchers note, the challenge of causal discovery and inference remains an open key issue in the field of machine learning. Careful research is required to make explicit which insights about the underlying data-generating mechanism can be gained by interpreting a machine learning model, and under which assumptions.

GROUNDWORK VS. HYPE

Molnar et al. offer an involved review of the pitfalls of global model-agnostic interpretation techniques for ML. Although, as they note, their list is far from complete, they cover common pitfalls that pose a particularly high risk.

They aim to encourage a more cautious approach when interpreting ML models in practice, to point practitioners to already (partially) available solutions, and to stimulate further research.

Contrasting this highly involved and detailed groundwork to high-level hype and trends on explainable AI may be instructive. 

Categories
knowledge connexions

Open source observability marches on: New Relic and Grafana Labs partnership brings benefits to developers

The perfect observability storm with open source leading the way, and a partnership that makes sense

New Relic is one of the leaders in Application Performance Monitoring (APM), which has been on a pivot to observability. Grafana Labs, maker of the popular open source dashboarding platform Grafana, has been on a growth course for a while now.

Today, the two vendors announced an ongoing partnership they claim will drive advanced open instrumentation and visibility for all developers and software teams. The companies delivered new integrations designed to empower engineering teams to solve problems even faster, as well as a free trial of Grafana Enterprise for new and existing New Relic customers.

We’d be lying if we said we saw this coming. In retrospect, however, the partnership seems to make sense. ZDNet connected with Grafana Labs CEO Raj Dutt and New Relic Chief Product Officer Bill Staples, and discussed the specifics of the partnership as well as the broader observability landscape.

A partnership that makes sense

New Relic has been on a reinvention course for a while now. 2019 marked a pivot to observability, embracing AI and open source. The company has updated its New Relic One platform, which instruments IT environments and applications, with CEO Lew Cirne noting this is designed to make New Relic easier to consume and to address the convergence of logs, infrastructure, and APM.

Grafana’s recent release, on the other hand, brought enhancements to simplify the development of custom plugins and increase the power, speed, and flexibility of visualization. Composability, or the ability to integrate data from a variety of data sources, which Dutt refers to as a “big tent philosophy,” is key for Grafana.

crystal-ball-365.jpg
Observability is about being able to instrument everything and view all application related data in one place. The partnership between New Relic and Grafana Labs makes this a bit easier.

Staples noted that New Relic is trying to eliminate both the functional and the economic barriers to making observability ubiquitous, referring to the simplified pricing, interface, portfolio, and free tier for the New Relic One platform just unveiled. He went on to add that New Relic realized that probably the most popular and prolific visualization and dashboard platform in the world is Grafana:

“A lot of our customers use Grafana in conjunction with New Relic. We wanted to have a great story there, as developers embrace our observability platform and the great economics. We want them to have great visualization on top of that. In addition to our own dashboard, we wanted to open it up to Grafana. So we reached out, and the rest is history”.

For Dutt, the interesting thing within the partnership is that New Relic is now offering Prometheus and PromQL as a method of querying data on the New Relic Telemetry Data Platform (TDP). Prometheus is one of the most interesting, and the fastest growing, data sources and communities in the big tent ecosystem, and Grafana Labs is one of its main contributors, he went on to add.

As part of the partnership, New Relic is providing a PromQL-like interface to the metrics in its Telemetry Data Platform. Grafana speaks native PromQL, and can be used to bring New Relic data together with other data, whether that’s in open source Prometheus, in tools like Graphite, or in other commercial offerings like Datadog, Stackdriver, or Azure Monitor, said Dutt.

Built on open source Prometheus

PromQL is the query language used by Prometheus, and it was a cornerstone for the integration between Grafana and New Relic’s TDP. Prometheus users can use the Prometheus remote write capability to send metric data directly to New Relic’s TDP with a single configuration change. Grafana open source users can now add TDP as a Grafana data source using Grafana’s native Prometheus data source.

This enables teams to enjoy New Relic’s up to 13 months of retention for their Prometheus metrics while continuing to use their existing Grafana dashboards and alerts. With New Relic’s new PromQL-style syntax, Prometheus users no longer need to learn a new query language.

Grafana Enterprise customers using Grafana’s New Relic data source plugin will enjoy updates designed to support New Relic’s latest NRQL native query language capabilities. The plugin enables users to query any data stored in TDP using NRQL to build dashboards in Grafana Enterprise. Plus, new and existing New Relic customers get a free trial of Grafana Enterprise for 30 days.

newrelic-logo-bug.png
New Relic is executing on a change of direction

This means there are now 3 ways to access New Relic data via Grafana: NRQL, PromQL, and Grafana’s own data processing and transformation layer. New Relic essentially built a translation layer between NRQL and PromQL, which made the integration with Grafana easy.

You may be wondering what the difference is between storing your telemetry data in Prometheus versus New Relic, now that the option is there. Staples said that what TDP offers on top is the ability to extend retention and provide additional scale, plus private key encryption of data, in a fully managed solution.

Dutt, on his part, noted that Prometheus’ remote write functionality is something Grafana Labs developed about three years ago as part of its participation in the Prometheus community. He went on to add that building on Prometheus is a way of recognizing its growth, as it has become the de facto standard for metrics in the observability community.

The future of observability, open source, standards, and AI

Grafana Labs is clearly big on Prometheus. New Relic, after announcing its intention to get involved in OpenTelemetry, has executed on it. Staples said New Relic believes the future of instrumentation is open and based on open standards, and that the company is today the number three contributor to the OpenTelemetry project, after Microsoft and Splunk.

Staples said New Relic is committing engineers and resources to OpenTelemetry, and wants to fully support it out of the box when it’s released. TDP currently has beta support for OpenTelemetry. New Relic has also open sourced its existing agents and integrations, and shared the decade-plus of instrumentation IP it has with the community, Staples added, noting those projects are now run fully in the open and will continue to be.

Interestingly, another proprietary vendor, Sumo Logic, just announced support for OpenTelemetry too. Grafana Labs, however, is not entirely sold on OpenTelemetry. They’re watching it closely, but they believe more in open source than in open standards, said Dutt.

grafana-logo-horizontal-fullcolor-dark-2.png
Grafana Labs has an open source background

Dutt finds OpenTelemetry interesting because it has tried to combine things like tracing and metrics together, and now supports metrics, logs, and traces as first-party telemetry types. However, his preference for open source rather than “committee-driven standards” is clear.

In addition to open sourcing agents, similar to New Relic, Grafana Labs also open sources all its backends, including Cortex. Cortex is a CNCF project: a scale-out Prometheus backend that provides long-term storage. Dutt described it as a scalable Prometheus offering that runs on Grafana Cloud. So there is competition, and there are different views there.

Wrapping up, we asked Staples and Dutt to comment on the themes that emerged as key for the future of observability in a recent survey: open source, AI and machine learning, cloud and serverless. Open source is a no-brainer here, as it’s core to Grafana, and something that New Relic is consciously trying to build on.

As far as AI and machine learning go, Dutt acknowledged them, but noted there’s lots of hype, and he doesn’t believe that it’s going to be a replacement for talented SREs anytime soon. Staples emphasized New Relic’s applied intelligence product, which uses AI and ML based approaches to discover anomalies and take automated actions based on the data.

Overall, both executives seemed to be on the same page regarding the main themes for the observability space, even if how they approach them may be a bit different. New Relic’s turn to open source is going strong, while Grafana Labs continues to execute on its open source legacy path. 

Categories
knowledge connexions

Explainable AI: A guide for making black box machine learning models explainable

In the future, AI will explain itself, and interpretability could boost machine intelligence research. Getting started with the basics is a good way to get there, and Christoph Molnar’s book is a good place to start.

Machine learning is taking the world by storm, helping automate more and more tasks. As digital transformation expands, the volume and coverage of available data grows, and machine learning sets its sights on tasks of increasing complexity, and achieving better accuracy.

But machine learning (ML), which many people conflate with the broader discipline of artificial intelligence (AI), is not without its issues. ML works by feeding historical real world data to algorithms used to train models. ML models can then be fed new data and produce results of interest, based on the historical data used to train the model.

A typical example is diagnosing medical conditions. ML models can be produced using data such as X-rays and CT scans, and then be fed with new data and asked to identify whether a medical condition is present or not. In situations like these, however, getting an outcome is not enough: we need to know the explanation behind it, and this is where it gets tricky.

Explainable AI

Christoph Molnar is a data scientist and PhD candidate in interpretable machine learning. Molnar has written the book “Interpretable Machine Learning: A Guide for Making Black Box Models Explainable”, in which he elaborates on the issue and examines methods for achieving explainability.

Molnar uses the terms interpretable and explainable interchangeably. Notwithstanding the AI/ML conflation, this is a good introduction to explainable AI and how to get there. Well-researched and approachable, the book provides a good overview for experts and non-experts alike. While we summarize findings here, we encourage interested readers to dive in for themselves.

Interpretability can be defined as the degree to which a human can understand the cause of a decision, or the degree to which a human can consistently predict a ML model’s result. The higher the interpretability of a model, the easier it is to comprehend why certain decisions or predictions have been made.

xai-book.jpg
Christoph Molnar is a data scientist and PhD candidate in interpretable machine learning. In the book “Interpretable Machine Learning: A Guide for Making Black Box Models Explainable” he elaborates on the issue and examines methods for achieving explainability

There is no real consensus about what interpretability is in ML, nor is it clear how to measure it, notes Molnar. But there is some initial research on this and an attempt to formulate some approaches for evaluation. Three main levels for the evaluation of interpretability have been proposed:

Application level evaluation (real task): Put the explanation into the product and have it tested by the end user. Evaluating fracture detection software with an ML component, for example, would involve radiologists testing the software directly to evaluate the model. A good baseline for this is always how good a human would be at explaining the same decision.

Human level evaluation (simple task) is a simplified application level evaluation. The difference is that these experiments are carried out with laypersons instead of domain experts. This makes experiments cheaper, and it is easier to find more testers. An example would be to show a user different explanations and have the user choose the best one.

Function level evaluation (proxy task) does not require humans. This works best when the class of model used has already been evaluated by someone else in a human level evaluation. For example, it might be known that the end users understand decision trees. A proxy for explanation quality may be the depth of the tree: shorter trees would get a better explainability score.
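As a rough sketch of what a function-level (proxy) evaluation along those lines could look like, the snippet below fits decision trees of different depths with scikit-learn and treats shallower trees as more explainable; the scoring rule itself is an illustrative assumption, not a standard metric.

```python
# A sketch of a function-level (proxy) evaluation: tree depth as a crude
# explainability proxy, so shallower trees score higher. The scoring rule is
# an illustrative assumption, not a standard metric.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

for max_depth in (2, 5, None):
    tree = DecisionTreeClassifier(max_depth=max_depth, random_state=0).fit(X, y)
    depth = tree.get_depth()
    proxy_score = 1.0 / depth  # shallower tree -> higher explainability proxy
    print(f"max_depth={max_depth}: depth={depth}, proxy score={proxy_score:.2f}")
```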

Molnar includes an array of methods for achieving interpretability, noting however that most of them are intended for the interpretation of models for tabular data. Image and text data require different methods.

Scope of Interpretability

As ML algorithms train models that produce predictions, each step can be evaluated in terms of transparency or interpretability. Molnar distinguishes among Algorithm Transparency, Global Holistic Model Interpretability, Global Model Interpretability on a Modular Level, Local Interpretability for a Single Prediction, and Local Interpretability for a Group of Predictions.

Algorithm transparency is about how the algorithm learns a model from the data and what kind of relationships it can learn. Understanding how an algorithm works does not necessarily provide insights into a specific model the algorithm generates, or into how individual predictions are made. Algorithm transparency only requires knowledge of the algorithm, and not of the data or learned model.

Global holistic model interpretability means comprehending the entire model at once. It’s about understanding how the model makes decisions, based on a holistic view of its features and each of the learned components such as weights, other parameters, and structures. Explaining the global model output requires the trained model, knowledge of the algorithm and the data.

Interpretability is a key element for machine intelligence. Getty Images/iStockphoto

While global model interpretability is usually out of reach, there is a good chance of understanding at least some models on a modular level, Molnar notes. Not all models are interpretable at a parameter level, but we can still ask how parts of the model affect predictions. Molnar uses linear models as an example, for which weights only make sense in the context of other features in the model.

Why did the model make a certain prediction for an instance? This is the question that defines local interpretability for a single prediction. Looking at individual predictions, the behavior of otherwise complex models might be easier to explain. Locally, a prediction might only depend linearly or monotonically on some features, rather than having a complex dependence on them.

Similarly, local interpretability for a group of predictions is about answering why a model made specific predictions for a group of instances. Model predictions for multiple instances can be explained either with global model interpretation methods (on a modular level) or with explanations of individual instances.

Global methods can be applied by taking the group of instances, treating them as if the group were the complete dataset, and using the global methods with this subset. The individual explanation methods can be used on each instance and then listed or aggregated for the entire group.

Interpretable Models and Model-Agnostic Methods

The easiest way to achieve interpretability as per Molnar is to use only a subset of algorithms that create interpretable models. Linear regression, logistic regression and decision trees are commonly used interpretable models included in the book. Decision rules, RuleFit, naive Bayes and k-nearest neighbors, which is the only one not interpretable on a modular level, are also included.

Molnar summarizes interpretable model types and their properties. A model is linear if the association between features and target is modelled linearly. A model with monotonicity constraints ensures that the relationship between a feature and the target outcome always goes in the same direction over the entire range of the feature: An increase in the feature value either always leads to an increase or always to a decrease in the target outcome, making it easier to understand.
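Monotonicity constraints of this kind can be enforced in some learners; for instance, scikit-learn’s histogram-based gradient boosting accepts per-feature constraints. The sketch below uses synthetic data, and the choice of which feature to constrain is purely illustrative.

```python
# A small sketch of a monotonicity constraint: the model is forced to be
# non-decreasing in the first feature and left unconstrained in the second.
# Synthetic data; the constraint choice is purely illustrative.
import numpy as np
from sklearn.ensemble import HistGradientBoostingRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(1_000, 2))
y = 3 * X[:, 0] + np.sin(5 * X[:, 1]) + rng.normal(scale=0.1, size=1_000)

model = HistGradientBoostingRegressor(monotonic_cst=[1, 0], random_state=0).fit(X, y)

# Predictions along feature 0 (with feature 1 held fixed) can only go up
grid = np.column_stack([np.linspace(0, 1, 5), np.full(5, 0.5)])
print(model.predict(grid))
```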

Some models can automatically include interactions between features to predict the target outcome. Interactions can be included in any type of model by manually creating interaction features. Interactions can improve predictive performance, but too many or too complex interactions can hurt interpretability. Some models handle only regression, some only classification, while others can handle both.

opera-snapshot-2020-08-06-135718-christophm-github-io.png
Interpretable machine learning models and their properties. Image: Christoph Molnar

There are, however, potential disadvantages in using interpretable models exclusively: predictive performance can be lower compared to other models, and users limit themselves to one type of model. One alternative is to use model-specific interpretation methods, but that also binds users to one model type, and it may be difficult to switch to something else.

Another alternative is model-agnostic interpretation methods, i.e. separating the explanations from the ML model. Their great advantage is flexibility. Developers are free to use any model they like, and anything that builds on a model interpretation, such as a user interface, also becomes independent of the underlying ML model.

Typically many types of ML models are evaluated to solve a task. When comparing models in terms of interpretability, working with model-agnostic explanations is easier because the same method can be used for any type of model, notes Molnar. Model, explanation and representation flexibility are desirable properties of model-agnostic explanation systems.

The methods included in the book are partial dependence plots, individual conditional expectation, accumulated local effects, feature interaction, permutation feature importance, global surrogate, local surrogate, anchors, Shapley values and SHAP.
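As a minimal sketch of two of those model-agnostic methods, the snippet below computes permutation feature importance and a partial dependence curve with scikit-learn on a toy dataset; the model, dataset, and feature choice are illustrative assumptions.

```python
# Permutation feature importance and partial dependence, two model-agnostic
# methods from the list above, sketched with scikit-learn on a toy dataset.
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import partial_dependence, permutation_importance
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_train, y_train)

# Permutation importance: how much does shuffling each feature hurt test performance?
imp = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)
top = sorted(zip(X.columns, imp.importances_mean), key=lambda t: -t[1])[:3]
print("most important features:", top)

# Partial dependence: average predicted outcome as the 'bmi' feature varies
pd_result = partial_dependence(model, X_test, features=["bmi"])
print(pd_result["average"].shape)
```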

The Future of Interpretability

Molnar’s book also examines example-based explanations, which work by selecting particular instances of the dataset to explain model behavior or data distribution. These are mostly model-agnostic, as they make any model more interpretable. Example-based explanations only make sense if we can represent an instance of the data in a humanly understandable way, which works well for images.

As far as deep neural networks go, Molnar notes that using model-agnostic methods is possible. However, there are two reasons why using interpretation methods developed specifically for neural networks makes sense: First, neural networks learn features and concepts in their hidden layers, so special tools are needed to uncover them. Second, the gradient can be utilized to implement interpretation methods that are more computationally efficient than model-agnostic methods.

Molnar concludes by offering his predictions on the future of interpretability. He believes the focus will be on model-agnostic interpretability tools, as it’s much easier to automate interpretability when it is decoupled from the underlying ML model. Automation is already happening in ML, and Molnar sees this trend as continuing and expanding to include not just interpretability, but also data science work.

Molnar notes that many analytical tools are already based on data models, and a switch from analyzing assumption-based, transparent models to analyzing assumption-free black box models is imminent. Using assumption-free black box models has advantages, he notes, and adding interpretability may be the way to have the best of both worlds.

In the future, robots and programs will explain themselves, and interpretability could boost machine intelligence research. Getting started with the basics of explainable AI is a good way to get there, and Molnar’s book is a good place to start.