Salesforce Research: Knowledge graphs and machine learning to power Einstein

Explainable AI in real life could mean Einstein not just answering your questions, but also justifying its answers. Advancing the state of the art in natural language processing happens at the intersection of graphs and machine learning.

A super geeky topic, which could have super important repercussions in the real world. That description could very well fit anything from cold fusion to knowledge graphs, so a bit of unpacking is in order. (Hint: it’s about Salesforce, and Salesforce is not into cold fusion as far as we know.)

If you’re into science, chances are you know arXiv.org. arXiv is a repository of electronic publication preprints for scientific papers. In other words, it’s where cutting edge research often appears first. Some months back, a publication from researchers from Salesforce appeared in arXiv, titled “Multi-Hop Knowledge Graph Reasoning with Reward Shaping.”

The paper elaborates on a technique for using knowledge graphs with machine learning; specifically, a branch of machine learning called reinforcement learning. This is something that holds great promise as a way to get the best of both worlds: Curated, top-down knowledge representation (knowledge graphs), and emergent, bottom-up pattern recognition (machine learning).

This seemingly dry topic piqued our interest for a number of reasons, not the least of which was the prospect of seeing this being applied by Salesforce. Xi Victoria Lin, research scientist at Salesforce and the paper’s primary author, was kind enough to answer our questions.

Salesforce Research: it’s all about answering questions

To start with the obvious, the fact that this paper was published at all says a lot in and of itself. Salesforce presumably faces the same issue everyone else faces in staffing its research these days: the boom in the applicability of machine learning to real-world problems means there is an ongoing race to attract and retain researchers.

  
People in the research community have an ethos of sharing their accomplishments with the world by publishing in conferences and journals. That, presumably, has a lot to do with why we are seeing a number of those publications lately coming from places such as Salesforce.

The paper, presented by Lin at the 2018 Conference on Empirical Methods in Natural Language Processing (EMNLP), was well received. The authors have also released the source code on GitHub. But what is it all about, and what are the motivation and the novelty of their approach?

Salesforce Einstein: A virtual AI assistant embedded in Salesforce’s offering. Salesforce is looking into ways of adding explainable question answering to its capabilities.

For Salesforce Research, it’s all about question answering. This is obvious when browsing through its key topics and publications. And it makes sense, considering Salesforce’s offering: would it not be much easier and more productive to simply ask for whatever it is you want to find in your CRM, rather than having to go through an API or a user interface, no matter how well designed those may be?

Lin said:

“In the near future, we would like to enable machines to answer questions over multi-modal information, which include unstructured data such as text and images as well as structured such as knowledge graphs and web tables. This work is a step towards a building block which enables the question answering system to effectively retrieve target information from (incomplete) knowledge graph.”

She went on to add that Salesforce Research is aiming to tackle AI’s communication problem. Lin and her colleagues work on a wide range of NLP problems, spanning from advancements in text summarization to learning how to build more efficient natural language interfaces to a unified approach to language understanding:

“Deep learning is the umbrella theme of the lab, which means we also work on areas outside NLP, including core machine learning projects such as novel neural architectures and other application areas such as computer vision and speech technology.”

Not tested on real data — yet

Lin also emphasized that deep learning is not the be-all and end-all. For example, it was pointed out to her that the path-finding approach Lin’s team presented, which uses deep reinforcement learning, is related to the “relational pathfinding” technique proposed in a 1992 paper:

“The learning algorithm in that paper is not neural-based. My take-away from this is that revisiting earlier findings in inductive logic programming and possibly combining them with deep learning approaches may result in stronger algorithms.”

The obvious point of integration would be Einstein, Salesforce’s own virtual assistant. Based on Lin’s answers, it does not look like this work has been incorporated in Einstein yet, although conceptually it seems possible. Lin explained that this work is a research prototype, using benchmark datasets publicly available to academia.

An incomplete knowledge graph, where some links (edges) are not explicit.

It seems that Salesforce data and infrastructure were not used in the context of the publication. All the data Lin used could fit on a machine with 4GB of RAM. Special data structures for representation and storage to enable fast access to the graph were not really needed, said Lin:

“I stored facts of the graph in a plain .txt file and read the entire graph into memory when running experiments. This is the common practice of KG research in academia. To apply the model on industry scale knowledge graphs would require special infrastructure.”
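As a rough illustration of the setup Lin describes, here is a minimal sketch of loading a knowledge graph from a plain text file into memory. The tab-separated subject/relation/object format and the file name are assumptions for illustration, not necessarily the benchmark datasets’ actual layout.

```python
from collections import defaultdict

def load_knowledge_graph(path):
    """Read triples (one per line, tab-separated) into an in-memory adjacency index.

    Assumed line format: subject<TAB>relation<TAB>object
    """
    triples = []
    outgoing = defaultdict(list)  # entity -> [(relation, neighbor), ...]
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            subj, rel, obj = line.split("\t")
            triples.append((subj, rel, obj))
            outgoing[subj].append((rel, obj))
    return triples, outgoing

# Hypothetical usage: academic benchmark graphs are small enough
# to hold entirely in a few GB of RAM.
# triples, outgoing = load_knowledge_graph("train.txt")
```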

Multi-hop reasoning is an effective approach for query answering (QA) over incomplete knowledge graphs. However, there are some issues with this approach: false negatives and sensitivity to spurious paths. Lin’s work helps address those, largely by adding more links to incomplete knowledge graphs.

One thing we wondered was whether those links are stored, or generated on the fly. Lin explained that so far they have been generating answers on the fly for the prototype, but that in a real-world setting the two approaches would most likely be mixed:

“One would cache the links generated, manually verify them periodically and add the verified links back to the knowledge graph for reuse and generating new inference paths. We haven’t tested this hypothesis on real data.”
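A minimal sketch of the cache-and-verify workflow Lin outlines might look like the following; the data structures and the verification hook are assumptions for illustration, not part of the published prototype.

```python
# Sketch: cache model-generated links together with the paths that produced
# them, and merge only human-verified links back into the knowledge graph.
knowledge_graph = set()   # verified (subject, relation, object) triples
inferred_cache = []       # [(subject, relation, object, supporting_path), ...]

def propose_link(subj, rel, obj, path):
    """Record a link generated on the fly, keeping its inference path as provenance."""
    inferred_cache.append((subj, rel, obj, path))

def merge_verified(is_verified):
    """Move links that pass (manual) verification into the graph; keep the rest cached."""
    still_pending = []
    for subj, rel, obj, path in inferred_cache:
        if is_verified(subj, rel, obj, path):
            knowledge_graph.add((subj, rel, obj))
        else:
            still_pending.append((subj, rel, obj, path))
    inferred_cache[:] = still_pending

# Hypothetical usage:
# propose_link("A", "born_in", "US",
#              path=[("A", "born_in", "California"), ("California", "is_in", "US")])
# merge_verified(lambda s, r, o, p: True)  # stand-in for periodic manual review
```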

Graphs and machine learning for the win

Another contribution of Lin’s work concerns what is called symbolic compositionality of knowledge graph relations in embedding approaches. Embedding is a technique widely used in machine learning, including machine learning reasoning with graphs. But the embedding approach does not explicitly leverage logical composition rules.

For example, from the embeddings (A born_in California) & (California is_in US), (A born_in US) could be deduced. But logical composition steps like this one are only learned implicitly by knowledge graph embeddings. This means that this approach cannot offer such logical inference paths as supporting evidence for an answer.

Lin’s approach takes discrete graph paths as input, and hence explicitly models compositionality. This means it can offer the user an inference path, consisting of edges that exist in the knowledge graph, as supporting evidence. In other words, this can lead to so-called explainable AI, using the structure of the knowledge graph as supporting evidence for answers, at the expense of more computationally intensive algorithms.
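To make the contrast concrete, here is a minimal sketch of path-based inference that returns the supporting edges along with the answer. The toy graph and function names are illustrative assumptions; Lin’s actual model learns which paths to follow via reinforcement learning rather than enumerating them exhaustively.

```python
# Toy graph: entity -> [(relation, neighbor), ...]
graph = {
    "A": [("born_in", "California")],
    "California": [("is_in", "US")],
}

def answer_with_evidence(start, max_hops=3):
    """Enumerate multi-hop paths from `start`; each reachable entity comes
    with the chain of existing edges that supports it."""
    results = []
    frontier = [(start, [])]  # (current entity, path of (subj, rel, obj) edges)
    for _ in range(max_hops):
        next_frontier = []
        for entity, path in frontier:
            for rel, neighbor in graph.get(entity, []):
                new_path = path + [(entity, rel, neighbor)]
                results.append((neighbor, new_path))
                next_frontier.append((neighbor, new_path))
        frontier = next_frontier
    return results

for answer, evidence in answer_with_evidence("A"):
    print(answer, evidence)
# "US" is returned along with the explicit path
# [("A", "born_in", "California"), ("California", "is_in", "US")]
```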

The combination of graphs and machine learning is a promising research direction gaining more attention as a way to bridge top-down and bottom-up AI

Combining graphs and machine learning has been getting a lot of attention lately, especially since the work published by researchers from DeepMind, Google Brain, MIT, and the University of Edinburgh. We asked Lin what her opinion on this is: Are graphs an appropriate means to feed neural networks? Lin believes this is an open question, and sees a lot of research needed in this direction:

“The combination of neural networks and graphs in NLP is fairly preliminary — most neural architectures take sequences as input, which are the simplest graphs. Even our model uses relational paths instead of relational subgraphs.”

Lin mentioned work done by researchers from USC and Microsoft [PDF], which generalizes LSTMs to model graphs. She also mentioned work done by Thomas N. Kipf from the University of Amsterdam [PDF], proposing graph convolutional networks to learn hidden node representations that support node classification and other downstream tasks.

“It is definitely interesting to see more and more neural architectures that take general graphs as input being proposed. We are seeing graphs being used to represent relations between objects across multiple AI domains these days. Graph is a powerful representation in the sense that by simply varying the definitions of nodes and edges we can model a variety of data types using it.

While inference over graphs is hard in general, it offers a potential way to integrate multimodal data (text, images, tables, etc.). UC Irvine researchers presented a really interesting paper in EMNLP, which improves knowledge graph completion by leveraging multimodal relational data. Their proposed architecture, for example, takes images and free-form texts as node features.”
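For reference, the core of the graph convolutional network Kipf proposed is a single propagation rule applied layer by layer. A minimal numpy sketch of one such layer, on a made-up three-node graph, might look like this (illustrative only, not the reference implementation):

```python
import numpy as np

def gcn_layer(adjacency, features, weights):
    """One graph convolution step: H' = ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    a_hat = adjacency + np.eye(adjacency.shape[0])   # add self-loops
    degree = a_hat.sum(axis=1)
    d_inv_sqrt = np.diag(1.0 / np.sqrt(degree))      # symmetric normalization
    propagated = d_inv_sqrt @ a_hat @ d_inv_sqrt @ features
    return np.maximum(propagated @ weights, 0.0)     # ReLU non-linearity

# Illustrative 3-node chain graph with 2-dimensional node features.
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
H = np.random.randn(3, 2)
W = np.random.randn(2, 4)
node_representations = gcn_layer(A, H, W)  # shape (3, 4)
```

Stacking a few such layers lets each node’s representation absorb information from progressively larger neighborhoods, which is what makes the learned representations useful for node classification and other downstream tasks.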

The takeaway? It may be early days for graph-based machine learning reasoning, but initial results look promising. So, if one day you see your questions being answered by Einstein, along with supporting evidence, you will probably have graphs, and researchers like Lin, to thank for it.

Content retrieved from: https://www.zdnet.com/article/salesforce-research-knowledge-graphs-and-machine-learning-to-power-einstein/.

The AI chip unicorn that’s about to revolutionize everything has computational graph at its core

AI is the most disruptive technology of our lifetimes, and AI chips are the most disruptive infrastructure for AI. By that measure, the impact of what Graphcore is about to massively unleash in the world is beyond description. Here is how pushing the boundaries of Moore’s Law with IPUs works, and how it compares to today’s state of the art on the hardware and software level. Should incumbent Nvidia worry, and users rejoice?

If luck is another word for being at the right place at the right time, you could say we got lucky. Graphcore, the hottest name in AI chips, has been on our radar for a while now, and a discussion with Graphcore’s founders was planned well before the news about it broke out this week.

Graphcore, as you may have heard by now, just secured another $200 million of funding from BMW, Microsoft, and leading financial investors to deliver the world’s most advanced AI chip at scale. Names include the likes of Atomico, Merian Chrysalis Investment Company Limited, Sofina, and Sequoia. As Graphcore CEO and founder Nigel Toon shared, Graphcore had to turn down investors for this round, including, originally, the iconic Sequoia fund.

Graphcore is now officially a unicorn, with a valuation of $1.7 billion. Graphcore’s partners such as Dell, the world’s largest server producer, Bosch, the world’s largest supplier of electronics for the automotive industry, and Samsung, the world’s largest consumer electronics company, have access to its chips already. So, here’s your chance to prepare for, and understand, the revolution you’re about to see unfolding in the not-so-distant future.

Learning how the brain works is one thing, modeling chips after it is another

Graphcore is based in Bristol, UK, and was founded by semiconductor industry veterans Nigel Toon, CEO, and Simon Knowles, CTO. Toon and Knowles were previously involved in companies such as Altera, Element14, and Icera that exited for combined value in the billions. Toon is positive they can, and will, disrupt the semiconductor industry more than ever before this time around, breaking what he sees as the near-monopoly of Nvidia.

Nvidia is the dominant player in AI workloads with its GPU chips, and it keeps evolving. There are more players in the domain, but Toon believes it’s only Nvidia that has a clear, coherent strategy and an effective product in the marketplace. There are also players such as Google, with its TPU, investing in AI chips, but Toon claims Graphcore has the leading edge and a fantastic opportunity to build an empire with its IPU (Intelligence Processing Unit) chip. He cites the success of ARM mobile processors versus the incumbents of the time as an example.

In order to understand his confidence, and that of investors and partners, we need to understand what exactly Graphcore does and how that is different from the competition. Machine learning and AI are the most rapidly developing and disruptive technologies. Machine learning, which is at the core of what is called AI these days, is effectively very efficient pattern matching, based on a combination of appropriate algorithms (models) and data (training sets).

Some people go to the extreme of calling AI, essentially, matrix multiplication. While such reductionism is questionable, the fact remains that much of machine learning is about efficient data operations at scale. This is why GPUs are so good at machine learning workloads. Their architecture, originally developed for graphics rendering, has proven very efficient for data operations as well.

Graphcore revolutionizes hardware and software, using Graphs

What Graphcore has done, however, is to invest in a new architecture altogether. This is why Toon believes they have the edge over other options, which he sees as adding ad-hoc, incremental improvements. Toon notes that what the competition does is effectively building specialized chips (ASICs) that are very good at some specific mathematical operation on data, optimized for a specific workload. This, he argues, won’t do for tomorrow’s workloads.

So, what is so special about Graphcore’s own architecture? There has been some speculation that Graphcore is building what is called a neuromorphic AI chip: A processor built after a model of the human brain, with its neurons and synapses mirrored in its architecture. Knowles, however, dispels this misconception:

“The brain is a great exemplar for computer architects in this brave new endeavor of machine intelligence. But the strengths and weaknesses of silicon are very different to those of wetware. We have not copied nature’s pattern for flying machines, nor for surface locomotion, nor for engines, because our engineering materials are different. So, too, with computation.

For example, most neuromorphic computing projects advocate communication by electrical spikes, like the brain. But a basic analysis of energy efficiency immediately concludes that an electrical spike (two edges) is half as efficient for information transmission as a single edge, so following the brain is not automatically a good idea. I think computer architects should always strive to learn how the brain computes, but should not strive to literally copy it in silicon.”

Breaking Moore’s Law, Outperforming GPUs

Energy efficiency is indeed a limiting factor for neuromorphic architectures, but it does not only apply there. Toon, when asked to comment on the limits of Moore’s Law, noted that we’ve gone well beyond what anybody thought was possible, and we still have another 10 to 20 years of progress. But, he went on to add, we’ve reached some fundamental limits.

Toon thinks we’ve reached the lowest voltage that we can use on those chips. So, we can add more transistors, but we can’t make them go much faster: “Your laptop still runs at 2GHz, it’s just got more cores in it. But we now need thousands of cores to work with machine learning. We need a different architectural process, to design chips in different ways. The old ways of doing it don’t work.”

Toon said the IPU is a general-purpose processor designed specifically for machine intelligence. “One of the advantages of our architecture is that it is suitable for lots of today’s machine learning approaches like CNNs, but it’s also highly optimized for different machine learning approaches like reinforcement learning and future approaches too,” he said. “The IPU architecture enables us to outperform GPUs — it combines massive parallelism with over 1,000 independent processor cores per IPU and on-chip memory so the entire model can be held on chip.”

But how does the IPU compare to Nvidia’s GPUs in practice? Recently some machine learning benchmarks were released, in which Nvidia was shown to outperform the competition. When asked for his thoughts on this, Toon said they are aware of them, but focused on optimizing customer-specific applications and workloads right now.

He has previously stated, however, that data structures for machine learning are different, as they are high-dimensional and complex models. This, he said, means dealing with data in a different way. Toon noted that GPUs are very powerful, but not necessarily efficient in how they handle these data structures: “We have the opportunity to create something 10, 100 times faster for these data structures.”

Machine learning is changing the paradigm for compute, and AI chips catalyze the process. Image: Graphcore

Speed, however, is not all it takes to succeed in this game. Nvidia, for example, did not succeed just because its GPUs are powerful. A big part of its success, and a differentiator over GPU competitors such as AMD, is in the software layer. The libraries that enabled developers to abstract from hardware specifics and focus on optimizing their machine learning algorithms, parameters, and processes have been a key part of Nvidia’s success.

Nvidia keeps evolving these libraries, with the latest RAPIDS library promising a 50-fold GPU acceleration of data analytics and machine learning compared to CPUs. Where does Graphcore stand in comparison? Toon acknowledged that the software is hugely important, going on to add that, alongside building the world’s most complex silicon processor, Graphcore has also built the first software tool chain designed specifically for machine intelligence, called Poplar.

According to Toon:

“When Graphcore began there was no TensorFlow or PyTorch, but it was clear that in order to target this emerging world of knowledge models we had to rethink the traditional microprocessor software stack. The world has moved from developers defining everything in terms of vectors and scalars to one of graphs and tensors.

In this new world, traditional tool chains do not have the capabilities required to provide an easy and open platform for developers. The models and applications of Compute 2.0 are massively parallel and rely on millions of identical calculations to be performed at the same time.

These workloads dictate that for maximum efficiency models must stay resident and must allow the data to stream through them. Existing architectures that rely on streaming both application code and data through the processor to implement these models are inefficient for this purpose both in hardware construct and in the methodologies used in the tool chains that support them.”

Software 2.0, Compute 2.0, and Computational Graphs

Graphcore talks about Compute 2.0, others talk about Software 2.0. These two are strongly related as the new paradigm for application development. When discussing Compute 2.0, Toon noted that for 70 years we have been telling computers what to do step by step in a program, which is the familiar algorithmic process. Now, he said, we learn from data.

Rather than programming the machine, the machine learns — hence, machine learning. This is fundamentally changing the development and behavior of applications. The processes for building software need to be adapted, and software may display non-deterministic, or at least, non-explainable behavior. This seems to be the way of the future, however, and Toon pointed out that, with enough data and compute, we can build models that outperform humans in pattern recognition tasks.

“When we talk about Poplar being designed for machine intelligence, what does that mean? What are the characteristics that such a tool chain requires? It must first and foremost use the graph as its key construct. The graph represents the knowledge model and the application which is built by the tool chain. Poplar is built around a computational graph abstraction; the intermediate representation (IR) of its graph compiler is a large directed graph,” Toon said.

Graphcore, like others, is using graphs as a fundamental metaphor upon which its approach to software and compute for machine intelligence is built. Toon noted that the graph images shared by Graphcore are the internal representation of its graph compiler: a representation of the entire knowledge model, broken down to expose the huge parallel workloads that Graphcore schedules and executes across the IPU processor.

The IPU processor and Poplar were designed together, and Toon said this philosophy of developing the silicon architecture and the software programming environment in tandem reflects the culture and environment of Graphcore:

“The engineering we do is open and collaborative in how we build our technology. Poplar supports the design decisions we made in our chip, building and running highly optimized machine intelligence models in place with a highly optimized BSP (bulk synchronous parallel) execution model.

It is built to support the concepts of separating compute and communication for power efficiency and also to interface with host platforms to remove the bottlenecks that plague existing platforms that are being augmented with support for machine learning rather than being designed for it.

Along with supporting and allowing users access to our high-performance IPU platform, the Poplar tool chain has to be easy to use for developers. It has to integrate seamlessly into the backend of machine learning frameworks such as PyTorch and Tensorflow and provide a runtime for network interchange formats such as ONNX both for inference and training workloads.”
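The BSP (bulk synchronous parallel) execution model Toon mentions alternates purely local compute with a synchronization barrier and a data exchange phase. The following is a generic toy sketch of that pattern, not Poplar’s or the IPU’s actual implementation; the Worker class and its behavior are made up for illustration.

```python
class Worker:
    """Toy worker: holds a local value; each superstep it averages in what it heard."""
    def __init__(self, value):
        self.value = value

    def compute(self):
        # Compute phase: purely local work, no communication.
        return self.value

    def receive(self, messages):
        # Exchange phase: absorb the other workers' results.
        self.value = sum([self.value] + messages) / (len(messages) + 1)

def run_bsp(workers, num_supersteps):
    for _ in range(num_supersteps):
        outgoing = [w.compute() for w in workers]   # 1. compute phase
        # 2. implicit barrier: nothing is exchanged until every worker has finished
        for i, w in enumerate(workers):             # 3. exchange phase
            w.receive([m for j, m in enumerate(outgoing) if j != i])

workers = [Worker(v) for v in (1.0, 2.0, 9.0)]
run_bsp(workers, num_supersteps=3)
print([w.value for w in workers])   # values converge toward a common average
```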

Poplar supports TensorFlow, PyTorch, ONNX and Keras now, and will roll out support for other machine learning frameworks over the course of 2019 and as new frameworks appear. Toon said that, by using Poplar as the back end of these frameworks, users can get access to the benefits of having their machine learning models passed through an optimizing graph compiler for all required workloads, rather than just the simple pattern matching that gets used in legacy software platforms.

“It is not just about running the models and constructs of today, innovators and researchers need a platform to develop and explore the solutions of tomorrow with an easy to use and programmable platform,” he said. “The field of machine intelligence is being held back by software libraries for hardware platforms, which are not open and extensible providing a black box to developers who want to innovate and evolve ideas.”

The revolution is ready to ship, graph-based, and open sourced

If you are like us, you may be wondering what those graphs are like beyond their remarkable imagery. What kind of structures, models and formalism does Graphcore use to represent and work with graphs? Would they go as far as to call them knowledge graphs?

“We just call them computational graphs. All machine learning models are best expressed as graphs — this is how TensorFlow works as well. It’s just that our graphs are orders of magnitude more complex because we have orders of magnitude parallelism for the graphs to exploit on our chips,” said Toon.
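To give a flavor of what a computational graph is, here is a minimal toy sketch: operations are nodes, data dependencies are edges, and evaluation walks the graph in dependency order. This is a generic illustration, not Poplar’s IR or TensorFlow’s actual implementation.

```python
# Each node is (operation, list of input node names); executing the graph
# means visiting nodes in dependency (topological) order.
graph = {
    "x": (lambda: 3.0, []),                    # input
    "w": (lambda: 2.0, []),                    # parameter
    "mul": (lambda x, w: x * w, ["x", "w"]),   # x * w
    "out": (lambda m: m + 1.0, ["mul"]),       # x * w + 1
}

def evaluate(graph, node, cache=None):
    """Recursively evaluate a node, computing its dependencies first."""
    cache = {} if cache is None else cache
    if node not in cache:
        op, deps = graph[node]
        cache[node] = op(*(evaluate(graph, d, cache) for d in deps))
    return cache[node]

print(evaluate(graph, "out"))  # 7.0
```

Because every data dependency is explicit, a compiler can see which nodes are independent of each other and schedule them in parallel, which is the property Graphcore exploits at much larger scale.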

But if you really are curious, there is good news — you just have to wait a bit longer. Toon promised that over time Graphcore will be providing IPU developers full open-source access to its optimized graph libraries so they can see how Graphcore builds applications. We are certainly looking forward to that, adding it to Graphcore’s current and future plans to track.

Graphcore is already shipping production hardware to early access customers. Toon said Graphcore sells PCIe cards, called C2 IPU-Processor cards, which are ready to slot into server platforms and contain two IPU processors each. He also noted they are working with Dell as a channel partner to deliver Dell server platforms to enterprise and cloud customers.

According to Toon, products will be more widely available next year. Initial focus is on data center, cloud, and a select number of edge applications that require heavy compute — like autonomous cars. Graphcore is not currently targeting consumer edge devices like mobile phones.

Graphcore delivering on its promises will be nothing short of a revolution, both on the hardware and the software layer.

Content retrieved from: https://www.zdnet.com/article/the-ai-chip-unicorn-that-is-about-to-revolutionize-everything-has-computational-graph-at-its-core/.

Rebooting AI: Deep learning, meet knowledge graphs

Gary Marcus, a prominent figure in AI, is on a mission to breathe fresh air into a discipline he sees as in danger of stagnating. Knowledge graphs, the 20-year-old hype, may have something to offer there.

“This is what we need to do. It’s not popular right now, but this is why the stuff that is popular isn’t working.” That’s a gross oversimplification of what scientist, best-selling author, and entrepreneur Gary Marcus has been saying for a number of years now, but at least it’s one made by himself.

The “popular stuff which is not working” part refers to deep learning, and the “what we need to do” part refers to a more holistic approach to AI. Marcus is not short of ambition; he is set on nothing else but rebooting AI. He is not short of qualifications either. He has been working on figuring out the nature of intelligence, artificial or otherwise, more or less since his childhood.

Questioning deep learning may sound controversial, considering deep learning is seen as the most successful sub-domain in AI at the moment. Marcus on his part has been consistent in his critique. He has published work that highlights how deep learning fails, exemplified by language models such as GPT-2, Meena, and GPT-3.

Marcus has recently published a 60-page paper titled “The Next Decade in AI: Four Steps Towards Robust Artificial Intelligence.” In this work, Marcus goes beyond critique, putting forward concrete proposals to move AI forward.

As a precursor to Marcus’ upcoming keynote on the future of AI in Knowledge Connexions, ZDNet engaged with him on a wide array of topics. Picking up from where we left off in the first part, today we expand on specific approaches and technologies.

Robust AI: 4 blocks versus 4 lines of code

Recently, Geoff Hinton, one of the forefathers of deep learning, claimed that deep learning is going to be able to do everything. Marcus thinks the only way to make progress is to put together building blocks that are there already, but no current AI system combines.

Building block No. 1: A connection to the world of classical AI. Marcus is not suggesting getting rid of deep learning, but using it in conjunction with some of the tools of classical AI. Classical AI is good at representing abstract knowledge, representing sentences or abstractions. The goal is to have hybrid systems that can use perceptual information.

No. 2: We need to have rich ways of specifying knowledge, and we need to have large scale knowledge. Our world is filled with lots of little pieces of knowledge. Deep learning systems mostly aren’t. They’re mostly just filled with correlations between particular things. So we need a lot of knowledge.

No. 3: We need to be able to reason about these things. Let’s say we know about physical objects and their positions in the world — a cup, for example. The cup contains pencils. Then AI systems need to be able to realize that if we cut a hole in the bottom of the cup, the pencils might fall out. Humans do this kind of reasoning all the time, but current AI systems don’t (a toy sketch of this kind of reasoning follows the list below).

No. 4: We need cognitive models — things inside our brain or inside of computers that tell us about the relations between the entities that we see around us in the world. Marcus points to some systems that can do this some of the time, and why the inferences they can make are far more sophisticated than what deep learning alone is doing.
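As a toy illustration of the kind of reasoning described in No. 3 — not Marcus’ proposal, just a hand-rolled sketch of classical, rule-based inference over explicitly stored facts:

```python
# Facts about the world, stored explicitly as (subject, relation, object) triples.
facts = {
    ("cup", "contains", "pencils"),
    ("cup", "has_hole_in_bottom", "true"),
}

def infer(facts):
    """One hand-written rule: an open-bottomed container lets its contents fall out."""
    derived = set()
    for (container, rel, contents) in facts:
        if rel == "contains" and (container, "has_hole_in_bottom", "true") in facts:
            derived.add((contents, "might_fall_out_of", container))
    return derived

print(infer(facts))  # {('pencils', 'might_fall_out_of', 'cup')}
```

The point of the hybrid approach is that facts like these could come from perception (deep learning), while the inference step stays explicit and inspectable.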

To us, this looks like a well-rounded proposal. But there has been some pushback, by the likes of Yoshua Bengio, no less. Yoshua Bengio, Geoff Hinton, and Yann LeCun are considered the forefathers of deep learning, and recently won the Turing Award for their work.

There is more to AI than Machine Learning, and there is more to Machine Learning than deep learning. Gary Marcus is arguing for a hybrid approach to AI, reconnecting it with its roots. Image: Nvidia

Bengio and Marcus have engaged in a debate, in which Bengio acknowledged some of Marcus’ arguments, while also choosing to draw a metaphorical line in the sand. Marcus mentioned he finds Bengio’s early work on deep learning to be “more on the hype side of the spectrum”:

“I think Bengio took the view that if we had enough data we would solve all the problems. And he now sees that’s not true. In fact, he softened his rhetoric quite a bit. He’s acknowledged that there was too much hype, and he acknowledged the limits of generalization that I’ve been pointing out for a long time — although he didn’t attribute this to me. So he’s recognized some of the limits.

However, on this one point, I think he and I are still pretty different. We were talking about which things you need to build in innately into a system. So there’s going to be a lot of knowledge. Not all of it’s going to be innate. A lot of it’s going to be learned, but there might be some core that is innate. And he was willing to acknowledge one particular thing because he said, well, that’s only four lines of computer code.

He didn’t quite draw a line and say nothing more than five lines. But he said it’s hard to encode all of this stuff. I think that’s silly. We have gigabytes of memory now which cost nothing. So you could easily accommodate the physical storage. It’s really a matter of building and debugging and getting the right amount of code.”

Innate knowledge, and the 20-year-old hype

Marcus went on to offer a metaphor. He said the genome is a kind of code that’s evolved over a billion years to build brains autonomously without a blueprint, adding it’s a very sophisticated system which he wrote about in a book called The Birth of the Mind. There’s plenty of room in that genome to have some basic knowledge of the world.

That’s obvious, Marcus argues, by observing what we call a precocial animal like a horse, which just gets up and starts walking, or an ibex that climbs down the side of a mountain when it’s a few hours old. There has to be some innate knowledge there about what the visual world looks like and how to interpret it, how forces apply to your own limbs, and how that relates to balance, and so forth.

There’s a lot more than four lines of code in the human genome, the reasoning goes. Marcus believes most of our genome is expressed in our brain as the brain develops. So a lot of our DNA is actually about building strong starting points in our brains that allow us to then accumulate more knowledge:

“It’s not nature versus nurture. Like the more nature you have, the less nurture you have. And it’s not like there’s one winner there. It’s actually nature and nurture work together. The more that you have built in, the easier it is to learn about the world.”


Exploring intelligence, artificial and otherwise, almost inevitably gets philosophical. The innateness hypothesis concerns whether certain primitives, such as language, are built-in elements of intelligence.

Marcus’ point about storage being a non-issue resonated with us, and so did the part about adding knowledge to the mix. After all, more and more AI experts are acknowledging this. We would argue that the hard part is not so much how to store this knowledge, but how to encode it, connect it, and make it usable.

Which brings us to a very interesting, and also hyped point/technology: Knowledge graphs. The term “knowledge graph” is essentially a rebranding of an older approach — the semantic web. Knowledge graphs may be hyped right now, but if anything, it’s a 20-year-old hype.

The semantic web was created by Sir Tim Berners-Lee to bring symbolic AI approaches to the web: Distributed, decentralized, and at scale. Parts of it worked well, others less so. It went through its own trough of disillusionment, and now it’s seeing its vindication, in the form of schema.org taking over the web and knowledge graphs being hyped. Most importantly, however, knowledge graphs are seeing real-world adoption. Marcus did reference knowledge graphs in his “Next Decade in AI” paper, which was a trigger for us.

Marcus acknowledges that there are real problems to be solved to pursue his approach, and a great deal of effort must go into constraining symbolic search well enough to work in real-time for complex problems. But he sees Google’s knowledge graph as at least a partial counter-example to this objection.

Deep learning, meet knowledge graphs

When asked if he thinks knowledge graphs can have a role in the hybrid approach he advocates for, Marcus was positive. One way to think about it, he said, is that there is an enormous amount of knowledge that’s represented on the Internet that’s available essentially for free, and is not being leveraged by current AI systems. However, much of that knowledge is problematic:

“Most of the world’s knowledge is imperfect in some way or another. But there’s an enormous amount of knowledge that, say, a bright 10-year-old can just pick up for free, and we should have AI be able to do that.

Some examples are, first of all, Wikipedia, which says so much about how the world works. And if you have the kind of brain that a human does, you can read it and learn a lot from it. If you’re a deep learning system, you can’t get anything out of that at all, or hardly anything.

Wikipedia is the stuff that’s on the front of the house. On the back of the house are things like the semantic web that label web pages for other machines to use. There’s all kinds of knowledge there, too. It’s also being left on the floor by current approaches.

The kinds of computers that we are dreaming of that can help us to, for example, put together medical literature or develop new technologies are going to have to be able to read that stuff.

We’re going to have to get to AI systems that can use the collective human knowledge that’s expressed in language form and not just as a spreadsheet in order to really advance, in order to make the most sophisticated systems.”

deep-learning-pix.png

A hybrid approach to AI, mixing and matching deep learning and knowledge representation as exemplified by knowledge graphs, may be the best way forward

Marcus went on to add that for the semantic web, it turned out to be harder than anticipated to get people to play along and be consistent about it. But that doesn’t mean there’s no value in the approach, and in making knowledge explicit. It just means we need better tools to make use of it. This is something we can subscribe to, and something many people are on to as well.

It’s become evident that we can’t really expect people to manually annotate each piece of content published with RDF vocabularies. So a lot of that is now happening automatically, or semi-automatically, by content management systems. WordPress, the popular blogging platform, is a good example. Many plugins exist that annotate content with RDF (in its developer-friendly JSON-LD form) as it is published, with minimum or no effort required, ensuring better SEO in the process.
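To make this concrete, here is a minimal sketch, in Python, of the kind of schema.org JSON-LD annotation such a plugin might emit alongside a post. The property names come from the schema.org vocabulary; the article details and values are purely illustrative, not output from any specific WordPress plugin.

```python
import json

# Minimal schema.org "Article" annotation of the kind a CMS plugin
# might inject into a page as JSON-LD. All values are illustrative.
article_jsonld = {
    "@context": "https://schema.org",
    "@type": "Article",
    "headline": "Rebooting AI: Deep learning, meet knowledge graphs",
    "author": {"@type": "Person", "name": "Jane Doe"},  # hypothetical author
    "datePublished": "2020-08-27",
    "publisher": {"@type": "Organization", "name": "Example Blog"},
    "keywords": ["knowledge graphs", "deep learning", "hybrid AI"],
}

# A plugin would typically wrap this in a <script type="application/ld+json"> tag.
print(json.dumps(article_jsonld, indent=2))
```

The point is that the annotation rides along with the content automatically; authors never have to write a triple by hand.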

Marcus thinks that machine annotations will get better as machines get more sophisticated, and there will be a kind of an upward ratcheting effect as we get to AI that is more and more sophisticated. Right now, the AI is so unsophisticated, that it’s not really helping that much, but that will change over time.

The value of hybrids

More generally, Marcus thinks people are recognizing the value of hybrids, especially in the last year or two, in a way that they did not previously:

“People fell in love with this notion of ‘I just pour in all of the data in this one magic algorithm and it’s going to get me there’. And they thought that was going to solve driverless cars and chat bots and so forth.

But there’s been a wake up — ‘Hey, that’s not really working, we need other techniques’. So I think there’s been much more hunger to try different things and try to find the best of both worlds in the last couple of years, as opposed to maybe the five years before that.”

Amen to that, and as previously noted — it seems like the state of the art of AI in the real world is close to what Marcus describes too. We’ll revisit, and wrap up, next week with more techniques for knowledge infusion and semantics at scale, and a look into the future.

Content retrieved from: https://www.zdnet.com/article/rebooting-ai-deep-learning-meet-knowledge-graphs/.

Categories
knowledge connexions

Data.world secures $26 million funding, exemplifies the use of semantics and knowledge graphs for metadata management

Data.world wants to eliminate data silos to answer business questions. Their bet to do this is to provide data catalogs powered by knowledge graphs and semantics. The choice of technology seems to hit the mark, but intangibles matter, too.

Data.world, a vendor offering a knowledge graph powered, cloud-native enterprise data catalog solution, has announced it has closed a $26 million round of venture capital funding led by Tech Pioneers Fund. This is Data.world’s fourth and largest round of funding to date. The latest infusion of capital puts the total raised by Data.world at $71.3 million.

Two years after the unveiling of its enterprise offering, Data.world is showing strong growth and keeps evolving its offering. The company wants to use the investment to accelerate its agile data governance initiatives, scale to meet increased market demand for its enterprise platform, and continue to deliver its brand of product and customer service. 

We take the opportunity to review its progress, and through it, the prospects for the sector at large.

THE IMPORTANCE OF METADATA, COUPLED WITH A KNOWLEDGE-BASED FOCUS

Most of the time when we talk about data the narrative is along the lines of “data is the new oil.” While data can power insights and applications, that’s not really possible without governance and metadata. We are past the Big Data infatuation stage: Databases and data management technologies today are capable of handling the requirements of most organizations.

The question is no longer about how to store lots of data, but rather, how to organize, keep track of, and make sense of those ever-growing heaps of data. This is where metadata and data catalogs come in. And this is why the metadata management market is expected to reach a massive $9.34 billion by 2023. According to Gartner:

“Metadata supports understanding of an organization’s data assets, how those data assets are used, and their business value. Metadata management initiatives deliver business benefits such as improved compliance and corporate governance, better risk management, better shareability and reuse, and better assessments of the impact of change within an enterprise, while creating opportunities and guarding against threats.”

Data.world’s debut in Gartner’s Metadata Management Magic Quadrant Report in 2019 was an accolade for the company. We have long argued for the importance of metadata, coupled with a knowledge-based focus, which Data.world exemplifies. Its product is based on knowledge graph technology and a collaborative approach.

metadata-management-magic-quadrant-2019.jpg
Data.world’s debut in Gartner’s Metadata Management Magic Quadrant Report in 2019 was an accolade for the company. Interestingly, Data.world was not the only vendor leveraging knowledge graph technology to be included.

Interestingly, Data.world was not the only vendor leveraging knowledge graph technology to be included in Gartner’s Metadata Management Magic Quadrant Report in 2019. Semantic Web Company, with whom Data.world has partnered, was also included. We see that as an affirmation of the fact that semantics-based knowledge graphs and metadata are a great match, and we expect to see more adoption of the approach in this space.

“Despite the challenges of the global pandemic, Data.world saw new enterprise bookings in the first half of its current fiscal year grow by more than 100% YoY. Additionally, the number of users within our enterprise customers has grown by over 1,900% in 2020 (YTD) over all of 2019 as Data.world’s accessible user interface has accelerated secular trends in remote work and more inclusive data cultures within their organizations,” CEO Brett Hurt told ZDNet.

As part of the investment, Scott Booth, chairman of Tech Pioneers Fund, will join Data.world’s board of directors. Booth noted that as one of the original investors in both Alibaba and Compass, he has seen how critical data is to driving company performance, and he thinks Data.world is changing the way enterprises think about and use data:

“It’s not simply for the data scientists and engineers, but for everyone in an organization. That bears out in how quickly the platform is deployed, adopted, and attached to business-critical use cases. We see this pattern again and again within Data.world’s expanding customer base, and it’s one of many reasons we’re so excited to work with the team and accelerate the market opportunity.”

INTANGIBLES AND PRODUCT PROGRESS

This is an important point and one on which Hurt and investors seem to converge. Data.world has a strong technical foundation, but this is not enough in and by itself. In the admittedly somewhat dry domain of metadata management, intangibles play an important role.

Hurt emphasized “a best-in-class UX designed for both business and IT users” as a key part of their strategy. Likewise, he went on to add, Data.world’s B-Corporation story means a great deal to customers and investors:

“They recognize that doing business with Data.world means having a true partner with a core mission to democratize access to data within both their organizations and broader society to drive better decisions. A B-Corp designation also helps us attract and retain the top talent from around the country, which translates to better products and services and operational excellence.”

The decision to raise capital at this time was in order to accelerate growth in sales and marketing teams, as well as increase investment in product innovation to take advantage of this massive market opportunity, said Hurt. He also added that raising most of their round after COVID-19 hit is a testament to the team’s performance and ambitious mission.

data-world.jpg
Data.world, a vendor offering a knowledge graph powered, cloud-native enterprise data catalog solution, today announced it has closed a $26 million round of venture capital

The company mentions strong customer reviews via Gartner Peer Insights, strategic partnerships, and extended product integrations with AWS, Snowflake, Semantic Web Company, and MANTA, and more than 1,000 platform updates in the past 12 months as some key achievements leading to today’s funding round. Hurt emphasized the continuous release cycle aspect of Data.world’s SaaS platform, and we were curious to know where exactly progress was made.

We were particularly interested in Gra.fo, the visual modeling tool that Data.world onboarded with the acquisition of Capsenta in 2019, as visual modeling can greatly simplify knowledge graph development. Hurt said that users can model their knowledge graph schemas and ontologies in Gra.fo and map them to Data.world datasets, thus semantically integrating data and creating an enterprise knowledge graph.

Other improvements include crowdsourcing and suggested edits/workflows, bulk edits, machine learning tagging, fully automated lineage, centralized access requests, enhanced usage metrics and reporting, curated data access and virtualization, and unified browse experience. That’s a handful indeed. Standing out among those:

The ability to do cross-database queries within the data catalog, including analysis and BI tool access such as Tableau, Excel, Jupyter, R, Python, and more. Auto-tagging to help organize and classify information assets, including automatically identifying which ones may be sensitive. And automated lineage to audit and understand how data connects.

MAPPING TECHNOLOGY TO MISSION

Data.world states its mission is to make it easy for everyone, not just the “data people,” to get clear, accurate, fast answers to any business question. The goal is to map siloed, distributed data to familiar and consistent business concepts, creating a unified body of knowledge anyone can find, understand, and use.

We think the technology Data.world has chosen is a good match for this goal, and attention on the intangibles seems to be paying off too. Data.world is growing, and the funding round comes both as an affirmation and an opportunity to fuel this growth. It will be interesting to see if others in this space decide to take a page from this book. 

Categories
knowledge connexions

AI and automation vs. the COVID-19 pandemic: Trading liberty for safety

Reports on the use of AI to respond to COVID-19 may have been greatly exaggerated. But does the rush to pandemic-fighting solutions like thermal scanners, face recognition and immunity passports signal the normalization of surveillance technologies?

Digital technologies have been touted as a solution to the COVID-19 outbreak since early in the pandemic. AlgorithmWatch, a non-profit research and advocacy organisation that evaluates and sheds light on algorithmic decision-making processes, just published a report on Automated Decision-Making Systems in the COVID-19 Pandemic, examining the use of technology to respond to the pandemic.

The report has a European lens, as AlgorithmWatch focuses on the use of digital technology in the EU. Its findings, however, are interesting and applicable regardless of geographies, as they refer to the same underlying principles and technologies. Furthermore, there is reference and comparison to the use of technology worldwide.

Is it AI or ADM?

The report sets the stage by introducing the distinction between Artificial Intelligence (AI) and Automated Decision-Making (ADM). AlgorithmWatch notes that AI is a vague and much-hyped term, to which it has long preferred the more rigorous locution ADM. AlgorithmWatch defines an ADM system as:

“A socio-technological framework that encompasses a decision-making model, an algorithm that translates this model into computable code, the data this code uses as an input — either to ‘learn’ from it or to analyse it by applying the model — and the entire political and economic environment surrounding its use.”

The point is that ADM systems are about more than technology. Rather, AlgorithmWatch notes, they are ways in which a certain technology is inserted within a decision-making process. And that technology may be far less sophisticated or “intelligent” than deep learning algorithms. The same technology can be used for very different purposes, depending on the rationale.

Data collected through a Bluetooth Low Energy-based smartphone app, for example, can be voluntarily and anonymously shared either with a central server or with the smartphones of potentially infected individuals, with no consequences or sanctions whatsoever in case a citizen decides not to download it.

Or, the same technology can be adopted within a much more rights-invasive solution, working in tandem with GPS to continuously provide a citizen’s location to the authorities, at times within mandatory schemes, and with harsh sanctions in case they are not respected.

On that premise, the report goes on to examine different ways of using technology and collecting data employed by different initiatives around the world.

Mandatory ADM and bracelets

Some regimes have resorted to invasive ADM solutions that strongly prioritize public health and safety concerns over individual rights, notes AlgorithmWatch. China seems to be leading the way. According to a New York Times report, a color-based rating system called Alipay Health Code is used.

The system uses big data “to draw automated conclusions about whether someone is a contagion risk”. Under this model of ADM, citizens have to fill out a form with their personal details, to be then presented with a QR code in three colors:

“A green code enables its holder to move about unrestricted. Someone with a yellow code may be asked to stay home for seven days. Red means a two-week quarantine.” A scan is necessary to visit “office buildings, shopping malls, residential compounds and metro systems,” according to a Reuters report.

AlgorithmWatch goes on to add Bahrain, India, Israel, Kuwait, Russia and South Korea to the list of countries where ADM applications are used in a way that poses threats to the rights of their citizens. Although the report notes that the EU fares better in that respect, the use of apps in Hungary, Lithuania, Norway and Poland is rife with issues too.

10-cicret-bracelet.png
Technologies such as wearables take on a different dimension if their use is mandated

AlgorithmWatch provides some graphic details on some of those cases before moving on to wearables, aka bracelets. Here it’s Liechtenstein leading the way, having launched a study in which 2,200 citizens are given a biometric bracelet to collect “vital bodily metrics including skin temperature, breathing rate and heart rate.”

That data is then sent to a Swiss laboratory for analysis. The experiment, which will ultimately involve all of the country’s citizens, is based on the premise that by analyzing physiological vital signs “a new algorithm for the sensory armband may be developed that can recognize COVID-19 at an early stage, even if no typical symptoms of the disease are present.”

Wearables are also utilized in places such as Hong Kong, Singapore, Saudi Arabia, the UAE, and Jordan, but also at Michigan’s Albion College. The report notes that although the stated goal is to enforce quarantine orders and other COVID-19 restrictions, organizations such as the Electronic Frontier Foundation (EFF) are deeply concerned.

The EFF states that wearables, in the context of the pandemic, “remain an unproven technology that might do little to contain the virus, and should at most be a supplement to primary public health measures like widespread testing and manual contact tracing.” Also, and importantly, “everyone should have the right not to wear a tracking token, and to take it off whenever they wish.”

How do contact tracing apps work, and actually, do they work?

The fundamental clash between different models of ADM is exemplified in the global debate around digital apps to complement contact tracing efforts, AlgorithmWatch notes. While some tech enthusiasts argued that privacy and other fundamental rights could be sacrificed to enable public health, not everyone is in favor of that view.

Furthermore, a heated debate on the adoption of relevant technologies ensued, resulting in two main camps: GPS tracking to collect location data, and Bluetooth Low Energy to collect proximity data. The latter camp also split into two opposing lines of thought: centralized vs. decentralized. Countries like France, the UK and initially Germany tried to develop centralized Bluetooth-based solutions, while Italy, Switzerland, Denmark, Estonia (and, ultimately, Germany) opted for a decentralized solution.

GPS-based apps work by collecting location data. The rationale is that the data can help health authorities reconstruct the web of contacts an individual who tested positive for COVID-19 had. This aids contact tracing efforts, the thinking goes, by speeding them up and making them more effective and complete, while also enabling precise geographic identification of outbreaks. GPS-based apps can also enable identification of trends and enforcement of quarantine rules.

Contact tracing applications are touted as a means to reduce coronavirus spreading. But how do they work, and do they actually work?

Getty Images/iStockphoto

Decentralized contact tracing apps work by merely signaling that two phones have been close enough to each other for long enough to consider the encounter at risk. If one of the owners is diagnosed with COVID-19 within 14 days and is willing to upload encounter data through the app, the other receives a notification of potential exposure.
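To make the mechanism concrete, here is a deliberately simplified Python sketch of the decentralized idea. It is a toy illustration, not the actual Apple/Google exposure notification protocol: the identifiers, durations, and the 15-minute threshold are assumptions made for the example.

```python
from collections import defaultdict

# Toy model of decentralized proximity logging. Each phone keeps only
# the anonymous identifiers it has heard and the accumulated contact
# time; no location data is involved.
class Phone:
    def __init__(self):
        self.heard = defaultdict(int)  # identifier -> minutes of proximity

    def observe(self, identifier: str, minutes: int) -> None:
        """Record that another phone's rotating identifier was nearby."""
        self.heard[identifier] += minutes

    def check_exposure(self, published_positive_ids, threshold_minutes=15):
        """Flag exposure if any identifier later uploaded by a diagnosed
        user was observed for at least threshold_minutes in total."""
        return any(self.heard.get(i, 0) >= threshold_minutes
                   for i in published_positive_ids)

# Usage: phone A spends 20 minutes near a phone broadcasting "id-42".
phone_a = Phone()
phone_a.observe("id-42", 20)
phone_a.observe("id-99", 3)

# The health authority later publishes identifiers of diagnosed users.
positive_ids = {"id-42"}
print(phone_a.check_exposure(positive_ids))  # True -> notify the user
```

The key design choice is that matching happens on the device: only the identifiers of diagnosed users are ever published centrally.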

Exposure notification APIs developed by Google and Apple for the Android and iOS operating systems, which together comprise the vast majority of smartphones, have been utilized with varying degrees of success, while also causing some friction. The claim was that no location data would be collected. However, it has been argued that Google still required location services to be turned on, even though location data was not collected, in order to be able to notify users via Bluetooth.

AlgorithmWatch notes that months after the first deployments, we still lack hard evidence on the effectiveness of all such ADM systems. As a systematic review of the literature concluded after analyzing 110 full-text studies, “no empirical evidence of the effectiveness of automated contact tracing (regarding contacts identified or transmission reduction) was identified.” Why?

As the American Civil Liberties Union notes, GPS technology has “a best-case theoretical accuracy of 1 meter, but more typically 5 to 20 meters under an open sky.” Also, “GPS radio signals are relatively weak; the technology does not work indoors and works poorly near large buildings, in large cities, and during thunderstorms, snowstorms, and other bad weather.”

As for Bluetooth, even its own creators have argued for caution: problems in terms of accuracy and “uncertainty in the detection range” are very real, “so, yes, there may be false negatives and false positives and those have to be accounted for.” AlgorithmWatch elaborates further, and notes that based on the above, the efficacy of such apps is questionable.

Thermal scanners, face recognition, immunity passports: should this be our new normal?

The report also notes that for some industries, the pandemic is not exactly catastrophic. Forecasts for the thermal scanning, facial recognition, face and voice biometrics technology markets look outstanding, largely thanks to the pandemic. AlgorithmWatch dubs this both unsurprising and surprising:

“Unsurprising, given that face recognition is being widely adopted and deployed, both inside and outside the EU, with little to no meaningful democratic debate and safeguards in place. But surprising also, given what we know about their scant usefulness in the battle against COVID-19.”

A National Institute of Standards and Technology study argues that “wearing face masks that adequately cover the mouth and nose causes the error rate of some of the most widely used facial recognition algorithms to spike to between 5 percent and 50 percent.” The EFF, on its part, notes that thermal cameras not only present privacy problems, but can also lead to false positives, carrying the very real risk of involuntary quarantines and/or harassment.

hybrid-cloud-scales.jpg
The balance between liberty and safety is always a controversial issue, and the effort to tackle COVID-19 with technology brings it to the fore

Some countries are experimenting with immunity passports too, from Estonia to the UK, as AlgorithmWatch documents. The rationale for their adoption, and the case for urgently doing so, is the same: when adopted as a digital “credential,” as per Privacy International, an individual becomes able to prove their health status (positive, recovered, vaccinated, etc.) whenever needed in public contexts, thus enabling governments to avoid further total lockdowns.

Privacy International goes on to add, however, that similarly to all the tools previously described, “there is currently no scientific basis for these measures, as highlighted by the WHO. The nature of what information would be held on an immunity passport is currently unknown.”

AlgorithmWatch concludes by highlighting the common theme emerging from what has been studied: a “move fast and break things” mentality, trading liberty for safety. What’s more, there does not seem to be much evidence of a gain in safety, nor much in the way of democratic debate, accountability, or safeguards around giving up liberty. Nor even a way to measure “success.” The focus should not be on making these technologies better, AlgorithmWatch notes, but rather on safeguarding their use:

“Rushing to novel technological solutions to as complex a social problem as a pandemic can result both in not solving the social problem at hand, and in needlessly normalizing surveillance technologies.”

Categories
knowledge connexions

Graph, machine learning, hype, and beyond: ArangoDB open source multi-model database releases version 3.7

A sui generis, multi-model open source database, designed from the ground up to be distributed. ArangoDB keeps up with the times and uses graph, and machine learning, as the entry points for its offering.

If open source is the new normal in enterprise software, then that certainly holds for databases, too. In that line of thinking, GitHub is where it all happens. So being starred 10,000 times on GitHub must say something about a project. Open source ArangoDB, which also offers an Enterprise version, has hit that milestone recently.

On Aug. 27, ArangoDB announced its new release, 3.7, which comes with interesting new features around graph. We take the opportunity to discuss the database market, graph, and beyond, with CEO and co-founder Claudius Weinberger and Head of Engineering and Machine Learning Jörg Schad.

CLOUD AND MACHINE LEARNING READY

ArangoDB was founded in Cologne in 2014 by OnVista veterans Claudius Weinberger and Frank Celler. The team made headlines in 2019 with a $10 million Series A funding round led by Bow Capital. As Weinberger noted, he and his co-founder have been working together for 20 years, and the decision to pursue their vision was not a spur-of-the-moment idea:

“The main idea for ArangoDB, what is still valid today, is what we call the native multi-model approach. That means that we found a way that we can combine the JSON document data model, the graph model, and the key-value model in one database core with one query language.”

Today ArangoDB is a US company with a German subsidiary; it has a new chief revenue officer, Matt Ekstrom, and a new head of engineering, Schad. Schad joined ArangoDB last year, but has been working with ArangoDB for the past four years. With a PhD in database systems, distributed data analytics, and large-scale infrastructure container systems, Schad has been switching between databases throughout his career.

Two key factors made him join the ArangoDB team: Distribution in a cloud setting and machine learning (ML). ArangoDB has been an early adopter of both Apache Mesos / DC/OS and Kubernetes. Eventually, Kubernetes prevailed, and ArangoDB 3.7 comes with the general availability of its Kubernetes operator, which has been developed over the last three years.

ArangoDB’s Kubernetes operator is also the foundation for its managed service Oasis, available in AWS, Azure, and GCP. The new release includes a number of improvements for faster replacement and movement of servers, improved monitoring and cluster health analysis, an advanced inspection of pod failure causes, and overall reduced resource usage. Cluster scalability improvements for on-premise deployment apply too.

arangoml-pipeline-complete-pipeline-1024x470.jpg
ArangoDB is touted as a solution to unify metadata across machine learning pipelines

ArangoDB has been promoting ArangoML: Using ArangoDB as the infrastructure for teams using ML. The idea is that beyond training data, which is a prerequisite for training ML models, metadata is also important, and ArangoDB is a good match for managing it. We have long argued for the importance of metadata. But why ArangoDB, and not any other data management system?

Schad referred to his experience building machine learning pipelines for finance and healthcare use cases. One of the biggest challenges he saw there was audit trails for CCPA or GDPR, which make it necessary to have a full view of the entire pipeline. They had to figure out what happens if patients withdraw consent to use their data, for example.

Just being able to identify the different ML models deployed in production was very challenging because they had to go through a number of different metadata stores — for the ML part, the data feature transformation part, and so on. So they wanted to have a common layer with all the metadata where this would end up being one query.

Relational systems are not a good match, Schad said. Machine learning features may be derived from other features, which means ending up with a lot of joins, and especially a lot of self joins. Apart from being ugly to write and maintain, those queries don’t perform well either. So this started to look like a case for a graph database — these are the types of queries graph databases excel at.

FROM GRAPH TO MULTI-MODEL AND BACK AGAIN

But still: why ArangoDB? ArangoDB is not a traditional graph database — it is a multi-model database which also supports graph. The advantage according to Schad is that this enables users to combine the flexibility of having no schema, leveraging the JSON document view of multi-model, with the structure of how things are connected as a graph:

“In the end, looking at which models have been impacted by which is being derived from just one data set, it’s just a graph traversal. So it turned out to be a really easy model, to be both flexible and very efficient in terms of formulating this query and many others as well.”
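As a rough sketch of the kind of query Schad describes, and not ArangoML’s actual implementation, the following assumes a hypothetical named graph called "lineage" whose edges point from datasets to derived features to trained models, and uses the python-arango client; the database, collection, and credential names are all made up for the example.

```python
# A minimal sketch, assuming a hypothetical "lineage" graph in ArangoDB.
from arango import ArangoClient  # python-arango client

client = ArangoClient(hosts="http://localhost:8529")
db = client.db("ml_metadata", username="root", password="")  # made-up DB/credentials

# One traversal answers "which models were (directly or indirectly)
# derived from this dataset?", the kind of question that needs many
# self-joins in a relational schema.
query = """
FOR v IN 1..5 OUTBOUND @start GRAPH 'lineage'
    FILTER IS_SAME_COLLECTION('models', v)
    RETURN DISTINCT v.name
"""
for model_name in db.aql.execute(query, bind_vars={"start": "datasets/patients_2020"}):
    print(model_name)
```

A consent withdrawal then reduces to running one traversal from the affected dataset and acting on every model it reaches, rather than stitching together results from several metadata stores.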

Schad went on to add that ArangoML has connectors for popular ML ecosystems like TensorFlow and PyTorch, and they are now working on Kubeflow integration. Custom integrations can be developed using a Python API. ArangoDB supports clients in Java, JavaScript, NodeJS, Go, Python, Elixir, R, and Rust.

Not having a schema, however, is not always a plus. ArangoDB 3.7 introduces JSON Schema support, giving users the option to validate all new data written to the database, as well as analyze existing data validity. To us, this looks overdue. JSON Schema may not be the most powerful schema mechanism around, but for a database emphasizing JSON, it’s a natural choice.

stresschaosistock-507216088a-poselenov-1.jpg
The key premise of multi-model databases is offering many views over the same data. For ArangoDB, graph is one view; document and key-value are the others.
Getty Images/iStockphoto

Although ArangoDB has its own sui generis approach, we noticed that in the last year or so its messaging has shifted a bit from the multi-model aspect to emphasize graph. Its people confirmed that, mentioning they’re seeing a lot of demand for graph. Many users come in with a graph use case and expand to multi-model use cases later on.

The ArangoDB team believes, however, that more data models are needed to support efficient and successful graph use cases: graph and beyond, with graph as a central use case. Up until recently, the hype was all around graph, too. But those who were into graph before it was cool knew that hype comes and goes, and were expecting it to subside at some point.

The first sign came last week, with Gartner’s hype cycle for emerging technology in 2020 moving “graphs and ontologies” to the trough of disillusionment. Apart from the fact that conflating graphs and ontologies does not make much sense to us, we see this as a normal phase in the evolution of new, or in this case, not so new but still hyped, technology.

Schad noted that while graph use cases are on the rise, there’s still a lot of trial and error. Although use cases become more mature, some disillusionment in terms of scalability limits does exist. For Weinberger, it’s a good sign that the overall graph story is moving on, but expecting to do everything faster than other databases should not be the main reason people look at graphs.

Categories
knowledge connexions

Explainable AI: From the peak of inflated expectations to the pitfalls of interpreting machine learning models

We have reached peak hype for explainable AI. But what does this actually mean, and what will it take to get there?

Machine learning and artificial intelligence are helping automate an ever-increasing array of tasks, with ever-increasing accuracy. They are supported by the growing volume of data used to feed them, and the growing sophistication in algorithms. 

The flip side of more complex algorithms, however, is less interpretability. In many cases, the ability to retrace and explain outcomes reached by machine learning (ML) models is crucial, as:

“Trust models based on responsible authorities are being replaced by algorithmic trust models to ensure privacy and security of data, source of assets and identity of individuals and things. Algorithmic trust helps to ensure that organizations will not be exposed to the risk and costs of losing the trust of their customers, employees and partners. Emerging technologies tied to algorithmic trust include secure access service edge, differential privacy, authenticated provenance, bring your own identity, responsible AI and explainable AI.”

FROM THE PEAK OF INFLATED EXPECTATIONS TO A DEEP DIVE IN MACHINE LEARNING INTERPRETABILITY

The above quote is taken from Gartner’s newly released 2020 Hype Cycle for Emerging Technologies. In it, explainable AI is placed at the peak of inflated expectations. In other words, we have reached peak hype for explainable AI. To put that into perspective, a recap may be useful.

As experts such as Gary Marcus point out, AI is probably not what you think it is. Many people today conflate AI with machine learning. While machine learning has made strides in recent years, it’s not the only type of AI we have. Rule-based, symbolic AI has been around for years, and it has always been explainable.

Incidentally, that kind of AI, in the form of “Ontologies and Graphs,” is also included in the same Gartner Hype Cycle, albeit in a different phase: the trough of disillusionment. Incidentally, again, that's a conflation. Ontologies are part of AI, while graphs are not necessarily.

That said: if you are interested in getting a better understanding of the state of the art in explainable machine learning, Christoph Molnar’s book is a good place to start. Molnar, a data scientist and Ph.D. candidate in interpretable machine learning, has written Interpretable Machine Learning: A Guide for Making Black Box Models Explainable, in which he elaborates on the issue and examines methods for achieving explainability.

ethc2020.png
Gartner’s Hype Cycle for Emerging Technologies, 2020. Explainable AI, meaning interpretable machine learning, is at the peak of inflated expectations. Ontologies, a part of symbolic AI that is explainable, are in the trough of disillusionment

Recently, Molnar and a group of researchers set out to address ML practitioners, by raising awareness of pitfalls and pointing out solutions for correct model interpretation, as well as ML researchers, by discussing open issues for further research. Their work was published as a research paper, titled Pitfalls to Avoid when Interpreting Machine Learning Models, at the ICML 2020 Workshop XXAI: Extending Explainable AI Beyond Deep Models and Classifiers.

Similar to Molnar’s book, the paper is thorough. Admittedly, however, it’s also more involved. Yet, Molnar has striven to make it more approachable by means of visualization, using what he dubs “poorly drawn comics” to highlight each pitfall. As with Molnar’s book on interpretable machine learning, we summarize findings here, while encouraging readers to dive in for themselves.

The paper mainly focuses on the pitfalls of global interpretation techniques when the full functional relationship underlying the data is to be analyzed. Discussion of “local” interpretation methods, where individual predictions are to be explained, is out of scope. For a reference on global vs. local interpretations, you can refer to Molnar’s book as previously covered on ZDNet.

The authors note that ML models usually contain non-linear effects and higher-order interactions. As interpretations are based on simplifying assumptions, the associated conclusions are only valid if we have checked that the assumptions underlying our simplifications are not substantially violated.

In classical statistics this process is called “model diagnostics,” and the research claims that a similar process is necessary for interpretable ML (IML) based techniques. The research identifies and describes pitfalls to avoid when interpreting ML models, reviews (partial) solutions for practitioners, and discusses open issues that require further research.

BAD MODEL GENERALIZATION, UNNECESSARY USE OF COMPLEX MODELS

Under- or overfitting models will result in misleading interpretations regarding true feature effects and importance scores, as the model does not match the underlying data-generating process well. Evaluation on training data should not be used to assess ML models, due to the danger of overfitting. We have to resort to out-of-sample validation, such as cross-validation procedures.

Formally, IML methods are designed to interpret the model instead of drawing inferences about the data generating process. In practice, however, the latter is the goal of the analysis, not the former. If a model approximates the data generating process well enough, its interpretation should reveal insights into the underlying process. Interpretations can only be as good as their underlying models. It is crucial to properly evaluate models using training and test splits — ideally using a resampling scheme.
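A minimal scikit-learn sketch of the point, with an illustrative dataset and model choice: an unconstrained decision tree looks perfect when evaluated on its own training data, while cross-validation gives a more honest out-of-sample estimate.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)
tree = DecisionTreeClassifier(random_state=0)  # deliberately unconstrained

# Evaluating on the training data rewards overfitting...
train_acc = tree.fit(X, y).score(X, y)

# ...whereas 5-fold cross-validation estimates out-of-sample performance.
cv_acc = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=5).mean()

print(f"training accuracy:        {train_acc:.3f}")  # typically 1.000
print(f"cross-validated accuracy: {cv_acc:.3f}")     # noticeably lower
```

Interpreting the first model as if it described the data-generating process would overstate how much signal the features really carry.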

Flexible models should be part of the model selection process so that the true data-generating function is more likely to be discovered. This is important, as the Bayes error for most practical situations is unknown, and we cannot make absolute statements about whether a model already fits the data optimally.

Using opaque, complex ML models when an interpretable model would have been sufficient (i.e., having similar performance) is considered a common mistake. Starting with simple, interpretable models and gradually increasing complexity in a controlled, step-wise manner, where predictive performance is carefully measured and compared, is recommended.
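In practice, that recommendation can be as simple as the following sketch: compare an interpretable baseline against a more complex model under the same resampling scheme, and prefer the simpler one if the performance gap is negligible. Dataset and model choices are, again, only illustrative.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

# Interpretable baseline vs. a more complex, less interpretable model.
candidates = {
    "logistic regression": make_pipeline(StandardScaler(),
                                         LogisticRegression(max_iter=1000)),
    "random forest": RandomForestClassifier(n_estimators=200, random_state=0),
}

for name, model in candidates.items():
    score = cross_val_score(model, X, y, cv=5).mean()
    print(f"{name:>20}: {score:.3f}")
# If the scores are close, the interpretable model is the safer choice.
```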

Measures of model complexity allow us to quantify the trade-off between complexity and performance and to automatically optimize for multiple objectives beyond performance. Some steps toward quantifying model complexity have been made. However, further research is required as there is no single perfect definition of interpretability but rather multiple, depending on the context.

IGNORING FEATURE DEPENDENCE

This pitfall is further analyzed in three sub-categories: Interpretation with extrapolation, confusing correlation with dependence, and misunderstanding conditional interpretation.

Interpretation with Extrapolation refers to producing artificial data points that are used for model predictions with perturbations. These are aggregated to produce global interpretations. But if features are dependent, perturbation approaches produce unrealistic data points. In addition, even if features are independent, using an equidistant grid can produce unrealistic values for the feature of interest. Both issues can result in misleading interpretations.

Before applying interpretation methods, practitioners should check for dependencies between features in the data (e.g., via descriptive statistics or measures of dependence). When it is unavoidable to include dependent features in the model, which is usually the case in ML scenarios, additional information regarding the strength and shape of the dependence structure should be provided.

Confusing correlation with dependence is a typical error. The Pearson correlation coefficient (PCC) is a measure used to track dependency among ML features. But features with PCC close to zero can still be dependent and cause misleading model interpretations. While independence between two features implies that the PCC is zero, the converse is generally false.

Any type of dependence between features can have a strong impact on the interpretation of the results of IML methods. Thus, knowledge about (possibly non-linear) dependencies between features is crucial. Low-dimensional data can be visualized to detect dependence. For high-dimensional data, several other measures of dependence in addition to PCC can be used.
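A compact numeric illustration of this point: for y = x² with x symmetric around zero, the Pearson correlation is essentially zero even though y is fully determined by x, while a non-linear dependence measure, here mutual information as estimated by scikit-learn, does pick the relationship up.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_regression

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 10_000)
y = x ** 2  # fully determined by x, but not linearly related

pearson = np.corrcoef(x, y)[0, 1]
mi = mutual_info_regression(x.reshape(-1, 1), y, random_state=0)[0]

print(f"Pearson correlation: {pearson:+.3f}")  # close to 0
print(f"Mutual information:  {mi:.3f}")        # clearly greater than 0
```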

Misunderstanding conditional interpretation. Conditional variants to estimate feature effects and importance scores require a different interpretation. While conditional variants for feature effects avoid model extrapolations, these methods answer a different question. Interpretation methods that perturb features independently of others also yield an unconditional interpretation.

Conditional variants do not replace values independently of other features, but in such a way that they conform to the conditional distribution. This changes the interpretation as the effects of all dependent features become entangled. The safest option would be to remove dependent features, but this is usually infeasible in practice.

When features are highly dependent and conditional effects and importance scores are used, the practitioner has to be aware of the distinct interpretation. Currently, no approach allows us to simultaneously avoid model extrapolations and to allow a conditional interpretation of effects and importance scores for dependent features.

MISLEADING EFFECT DUE TO INTERACTIONS, IGNORING ESTIMATION UNCERTAINTY, IGNORING MULTIPLE COMPARISONS

Global interpretation methods can produce misleading interpretations when features interact. Many interpretation methods cannot separate interactions from main effects. Most methods that identify and visualize interactions are not able to identify higher-order interactions and interactions of dependent features.

There are some methods to deal with this, but further research is still warranted. Furthermore, solutions lack in automatic detection and ranking of all interactions of a model as well as specifying the type of modeled interaction.

Due to the variance in the estimation process, interpretations of ML models can become misleading. When sampling techniques are used to approximate expected values, estimates vary, depending on the data used for the estimation. Furthermore, the obtained ML model is also a random variable, as it is generated on randomly sampled data and the inducing algorithm might contain stochastic components as well.

Hence, the model variance has to be taken into account. The true effect of a feature may be flat, but purely by chance, especially on smaller data, an effect might algorithmically be detected. This effect could cancel out once averaged over multiple model fits. The researchers note the uncertainty in feature effect methods has not been studied in detail.
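The following sketch, on synthetic data of our own making, hints at why this matters: the permutation importance of a pure-noise feature varies across model refits on resampled data, so a single run can suggest an effect where there is none. The data and model choices are illustrative assumptions, not taken from the paper.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(0)
n = 500
X = rng.normal(size=(n, 3))
y = 2 * X[:, 0] + rng.normal(scale=0.5, size=n)  # only feature 0 carries signal

noise_importance = []
for seed in range(10):
    idx = rng.integers(0, n, n)  # bootstrap resample of the training data
    model = RandomForestRegressor(random_state=seed).fit(X[idx], y[idx])
    result = permutation_importance(model, X, y, n_repeats=5, random_state=seed)
    noise_importance.append(result.importances_mean[2])  # feature 2 is pure noise

print("importance of the noise feature over 10 refits: "
      f"{np.mean(noise_importance):.4f} +/- {np.std(noise_importance):.4f}")
```

Averaging over refits, as the authors suggest, is what reveals that the apparent effect of the noise feature is just estimation noise.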

group-of-people-on-peak-mountain.jpg
It’s a steep fall from the peak of inflated expectations to the trough of disillusionment. Getting things done for interpretable machine learning takes expertise and concerted effort.

Simultaneously testing the importance of multiple features will result in false-positive interpretations if the multiple comparisons problem (MCP) is ignored. MCP is well known in significance tests for linear models and similarly exists in testing for feature importance in ML.

For example, when simultaneously testing the importance of 50 features, even if all features are unimportant, the probability of observing that at least one feature is significantly important is ≈ 0.923. Multiple comparisons become even more problematic the higher-dimensional a dataset is. Since MCP is well known in statistics, the authors refer practitioners to existing overviews and discussions of alternative adjustment methods.
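The ≈ 0.923 figure follows directly from the family-wise error rate; a quick check, which also shows how fast the problem grows with dimensionality:

```python
# Probability of at least one false positive when testing m unimportant
# features independently at significance level alpha (family-wise error rate).
alpha = 0.05
for m in (1, 10, 50, 200):
    print(m, round(1 - (1 - alpha) ** m, 3))
# m=50 gives ~0.923, matching the figure above; by m=200 it is essentially 1.0.
```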

UNJUSTIFIED CAUSAL INTERPRETATION

Practitioners are often interested in causal insights into the underlying data-generating mechanisms, which IML methods, in general, do not provide. Common causal questions include the identification of causes and effects, predicting the effects of interventions, and answering counterfactual questions. In the search for answers, researchers can be tempted to interpret the result of IML methods from a causal perspective.

However, a causal interpretation of predictive models is often not possible. Standard supervised ML models are not designed to model causal relationships but to merely exploit associations. A model may, therefore, rely on the causes and effects of the target variable as well as on variables that help to reconstruct unobserved influences.

Consequently, the question of whether a variable is relevant to a predictive model does not directly indicate whether a variable is a cause, an effect, or does not stand in any causal relation to the target variable.

As the researchers note, the challenge of causal discovery and inference remains an open key issue in the field of machine learning. Careful research is required to make explicit under which assumptions what insight about the underlying data-generating mechanism can be gained by interpreting a machine learning model.

GROUNDWORK VS. HYPE

Molnar et al. offer an involved review of the pitfalls of global, model-agnostic interpretation techniques for ML. Although, as they note, their list is far from complete, they cover common pitfalls that pose a particularly high risk. 

They aim to encourage a more cautious approach when interpreting ML models in practice, to point practitioners to already (partially) available solutions, and to stimulate further research.

Contrasting this highly involved and detailed groundwork to high-level hype and trends on explainable AI may be instructive.