The industry as a whole is beginning to recognize the intimate connection between Artificial Intelligence and its less heralded, yet equally important, knowledge foundation. The increasing prominence of knowledge graphs in almost every form of analytics, from conventional Business Intelligence solutions to data science tools, reflects this connection, as does the growing interest in Neuro-Symbolic AI.
In most of these use cases, graphs provide the framework for reasoning about business concepts with a comprehension that exceeds what machine learning achieves on its own. However, what many organizations still don’t realize is that an equally vital movement is gaining traction around AI’s knowledge base: one that substantially improves AI’s statistical learning prowess and makes machine learning far more effective.
In these applications, graphs aren’t simply providing an alternative form of AI that naturally complements machine learning. They supply the setting (the visual capabilities, dimensionality, and topology) for increasing the value of the vectors at statistical AI’s core through a range of techniques including embedding, manifold learning, and clustering.
Using AI’s knowledge base to strengthen its statistical base, by way of graphs’ ability to increase machine learning’s aptitude, is further evidence for a point Kyndi CEO Ryan Welsh made: for AI, “if you’re not using a diversity of approaches you’re limiting yourself in the generality of your system.”
Embedding
The chief advantage knowledge graphs provide for machine learning is a relationship-aware environment that depicts the intricacies of the connections between individual data nodes. A technique known as embedding is particularly useful here. According to Katana Graph CEO Keshav Pingali, “Embeddings are a way to find relationships between entities that are not always obvious if you look at the connectivity of the graph.” In some use cases, embeddings simplify, if not eliminate, otherwise time-consuming feature engineering.
In many deployments, embeddings identify non-linear relationships between entities that inform query results and searches for the attributes used to build machine learning models. “The beauty of embeddings is if you do them right, those seemingly unrelated nodes that are far away in a graph end up close together in a three dimensional space,” Pingali observed. Such results help practitioners choose appropriate features and weights, making the most of the available training data and producing more accurate models.
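To make the idea concrete, here is a minimal sketch under stated assumptions. It represents each node by its expected visit counts over short random walks (the sum of the first few powers of the random-walk transition matrix), which is the kind of walk co-occurrence statistic that methods such as DeepWalk and node2vec learn dense vectors from. The toy graph, the node names, and the `window` parameter are hypothetical, and a real system would compress these rows into far fewer dimensions. The point it shows: two compounds with no edge between them end up with near-identical vectors because they share neighbors, while a compound and its direct neighbor do not.

```python
import math


def transition_matrix(nodes, edges):
    """Row-stochastic random-walk matrix for an undirected, unweighted graph.

    Assumes every node has at least one edge (no isolated nodes).
    """
    index = {n: i for i, n in enumerate(nodes)}
    adj = [[0.0] * len(nodes) for _ in nodes]
    for u, v in edges:
        adj[index[u]][index[v]] = 1.0
        adj[index[v]][index[u]] = 1.0
    probs = []
    for row in adj:
        degree = sum(row)
        probs.append([x / degree for x in row])
    return probs


def matmul(a, b):
    """Plain-Python product of two square matrices."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def walk_embedding(nodes, edges, window=3):
    """Vector per node = its rows of P^1 + ... + P^window (expected visits within `window` steps)."""
    p = transition_matrix(nodes, edges)
    power = p
    total = [row[:] for row in p]
    for _ in range(window - 1):
        power = matmul(power, p)
        total = [[t + x for t, x in zip(t_row, p_row)] for t_row, p_row in zip(total, power)]
    return {n: total[i] for i, n in enumerate(nodes)}


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


# Hypothetical toy graph: two compounds share both targets but are not linked to each other
nodes = ["compound_a", "compound_b", "target_1", "target_2", "pathway_x", "compound_c"]
edges = [
    ("compound_a", "target_1"), ("compound_a", "target_2"),
    ("compound_b", "target_1"), ("compound_b", "target_2"),
    ("target_2", "pathway_x"), ("pathway_x", "compound_c"),
]
vectors = walk_embedding(nodes, edges)
print(round(cosine(vectors["compound_a"], vectors["compound_b"]), 3))  # 1.0: no edge, yet identical vectors
print(round(cosine(vectors["compound_a"], vectors["target_1"]), 3))    # lower, despite the direct edge
```

Summing several walk lengths, rather than using only direct adjacency, is what lets indirect, shared-neighborhood similarity show up in the vectors.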
Mappings
Embeddings can also accelerate cognitive search, returning nuanced results to users in a fraction of the time otherwise required. Pingali mentioned there are multiple ways to perform embeddings, some of which involve Convolutional Neural Networks and web to web techniques. “What you do in a vector space model is you take every node and you map it into some point in a large dimensional space,” Pingali said. “And then the way that mapping happens is that there are related nodes, nodes that are similar in their properties, that may not be related together in the graph.”
Embeddings position those nodes near one another, so the resulting topology reveals similarities relevant to a given business problem. In the pharmaceutical space, this approach is helpful for “hypothesis generation,” Pingali explained. “It’s very expensive to try out drugs in the lab to see whether they work, so if you can narrow down the candidates by using graph and AI, then that saves [companies] a tremendous amount of money.”
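A minimal sketch of that narrowing step, assuming the simplest possible vector space model: each compound node is mapped to a point whose coordinates are binary property flags, and candidates are ranked by cosine similarity to a known active compound, skipping nodes already linked to it in the graph so that only "similar but unconnected" nodes surface. The property names, compounds, and the `rank_candidates` helper are all hypothetical; in practice a learned embedding (for instance, one produced by a graph neural network) would replace `to_vector`.

```python
import math

# Hypothetical property vocabulary: each coordinate of the vector space is one property flag
PROPERTIES = ["binds_target_1", "binds_target_2", "orally_available", "hepatotoxic"]

# Hypothetical compound nodes and their properties
compounds = {
    "known_active": {"binds_target_1", "binds_target_2", "orally_available"},
    "candidate_1": {"binds_target_1", "binds_target_2"},
    "candidate_2": {"binds_target_2", "hepatotoxic"},
    "candidate_3": {"hepatotoxic"},
}


def to_vector(props):
    """Map a node's property set to a point in a len(PROPERTIES)-dimensional space."""
    return [1.0 if p in props else 0.0 for p in PROPERTIES]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_candidates(query, nodes, graph_neighbors, top_k=2):
    """Rank nodes by similarity to `query`, skipping nodes already linked to it in the graph."""
    q = to_vector(nodes[query])
    linked = graph_neighbors.get(query, set())
    scored = [
        (cosine(q, to_vector(props)), name)
        for name, props in nodes.items()
        if name != query and name not in linked
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [name for _, name in scored[:top_k]]


print(rank_candidates("known_active", compounds, graph_neighbors={}))
# ['candidate_1', 'candidate_2']
print(rank_candidates("known_active", compounds, graph_neighbors={"known_active": {"candidate_1"}}))
# ['candidate_2', 'candidate_3']
```

Only the short list returned here would go on to expensive lab testing, which is where the savings Pingali describes would come from.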
Manifold Learning
In other use cases, firms can use embeddings to search for the most relevant pharmaceutical elements or compounds for treating a condition such as type 2 diabetes. The relationship discernment of word embeddings is also widely used in natural language technologies, including conversational AI and natural language generation, as part of “statistical machine learning that allow us to learn the relatedness between words,” Welsh said. For example, word embedding applications, which offer some cognitive search utility, can determine the connection between the terms ‘apartment complex’ and ‘building’.
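The distributional idea behind word embeddings can be sketched with plain co-occurrence counts: words that appear in similar contexts get similar vectors. This is a count-based stand-in, not word2vec or GloVe, which learn dense, lower-dimensional versions of such vectors. The toy corpus is hypothetical, and pre-joining ‘apartment complex’ into one token is an assumption made to keep tokenization trivial.

```python
import math
from collections import Counter, defaultdict


def cooccurrence_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` positions of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.split()
        for i, word in enumerate(tokens):
            start, stop = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(start, stop):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse count vectors (Counter returns 0 for missing words)."""
    dot = sum(count * v[word] for word, count in u.items())
    norm = math.sqrt(sum(c * c for c in u.values())) * math.sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0


# Hypothetical toy corpus; the multiword term is pre-joined into a single token
sentences = [
    "the apartment_complex has a new roof",
    "the building has a new roof",
    "tenants live in the apartment_complex downtown",
    "tenants rent units in the building",
    "the insulin lowers blood sugar",
]
vecs = cooccurrence_vectors(sentences)
print(round(cosine(vecs["apartment_complex"], vecs["building"]), 2))  # 0.94
print(round(cosine(vecs["apartment_complex"], vecs["insulin"]), 2))   # 0.41
```

The two terms never appear together, yet they score as closely related because they share contexts such as "has a new roof" and "in the".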
Machine learning can “relate those two words in a lower dimensional space called a manifold, which is, essentially, you’re embedding these relationships into this lower dimensional space,” Welsh noted. Manifold learning performs non-linear dimensionality reduction. It is particularly effective on high-dimensional datasets and provides value for both unlabeled and labeled data. Manifold learning is frequently used in conjunction with embeddings and graph environments, and this combination has driven some of the relatively recent gains in natural language technologies.
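Isomap is one classic manifold learning algorithm, named here purely as an illustration; the source does not say which method any vendor uses. It also shows why graphs and manifold learning pair so naturally: a minimal pure-Python sketch builds a k-nearest-neighbor graph, approximates distances *along* the manifold with shortest paths through that graph, and then applies classical multidimensional scaling to obtain a one-dimensional coordinate. The toy data (points on three quarters of a circle), `k=2`, and the power-iteration eigen-solver are assumptions chosen to keep the example small.

```python
import math
import random


def knn_graph(points, k):
    """Undirected k-nearest-neighbor graph: {i: {j: euclidean distance}}."""
    n = len(points)
    graph = {i: {} for i in range(n)}
    for i in range(n):
        nearest = sorted((math.dist(points[i], points[j]), j) for j in range(n) if j != i)
        for d, j in nearest[:k]:
            graph[i][j] = d
            graph[j][i] = d  # symmetrize so the graph is undirected
    return graph


def geodesic_distances(graph):
    """Floyd-Warshall shortest paths: distances measured along the manifold, not straight lines."""
    n = len(graph)
    dist = [[0.0 if i == j else graph[i].get(j, math.inf) for j in range(n)] for i in range(n)]
    for m in range(n):
        for i in range(n):
            for j in range(n):
                via_m = dist[i][m] + dist[m][j]
                if via_m < dist[i][j]:
                    dist[i][j] = via_m
    return dist


def isomap_1d(points, k=2, iterations=200, seed=0):
    """One-dimensional Isomap: classical MDS applied to graph geodesic distances."""
    d = geodesic_distances(knn_graph(points, k))
    n = len(d)
    sq = [[x * x for x in row] for row in d]
    row_mean = [sum(row) / n for row in sq]  # equals column means, since sq is symmetric
    grand_mean = sum(row_mean) / n
    # Double-centered Gram matrix: B = -1/2 * J D^2 J
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand_mean) for j in range(n)]
         for i in range(n)]
    # Power iteration for the leading eigenvector of B
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]
    for _ in range(iterations):
        w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(b[i][j] * v[j] for j in range(n)) for i in range(n))
    return [math.sqrt(eigenvalue) * x for x in v]


# Toy data: 20 points along three quarters of a unit circle
angles = [1.5 * math.pi * t / 19 for t in range(20)]
arc = [(math.cos(a), math.sin(a)) for a in angles]
coords = isomap_1d(arc)
print("straight-line gap between endpoints:", round(math.dist(arc[0], arc[-1]), 2))
print("gap in the 1-D embedding:", round(abs(coords[-1] - coords[0]), 2))
```

The embedded coordinates should preserve the points' order along the arc, and the endpoint gap should come out close to the arc length (roughly 4.7) rather than the straight-line distance of about 1.41: the curved structure is "unrolled" into a lower-dimensional space.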
Clustering
Clustering is a form of unsupervised learning that excels in graph settings, because graphs depict the intricacies of relationships between nodes, data types, and classifications in ways that are difficult for other systems, particularly relational ones, to match. Clustering techniques are invaluable to machine learning deployments because they don’t require large volumes of labeled training data, a requirement that often keeps enterprise-level supervised learning applications from being implemented. According to Pingali, graph environments are frequently chosen for clustering when hierarchical data (such as taxonomies) are involved.
“The graphs come in right there, because anytime you have hierarchies and clusters and so on, you need a graph representation,” Pingali said. Knowledge graph data representations, Louvain clustering, and embeddings enable the rapid data processing that fintech and other sectors require. Pingali described a fintech use case in which “we’re able to do Louvain clustering for datasets which are many billions of vertices and 30 billion edges. It uses all of the graph technologies we use for embedding knowledge graphs. We’re doing all these graph computations in a scale out way.”
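Louvain clustering greedily maximizes modularity, a score comparing the density of edges inside communities with what a random graph of the same degrees would produce. Below is a minimal single-machine sketch of its first phase only: each node moves to the neighboring community that yields the largest modularity gain, repeating until nothing moves. Full Louvain then collapses each community into a super-node and repeats, and a scale-out system like the one Pingali describes distributes that work, none of which this sketch attempts. The edge list is hypothetical toy data: two tightly knit triangles joined by a single bridging edge.

```python
from collections import defaultdict


def louvain_local_moving(edges, max_passes=10):
    """Phase one of Louvain on an unweighted, undirected edge list; returns {node: community}."""
    adj = defaultdict(dict)
    for u, v in edges:
        adj[u][v] = adj[u].get(v, 0) + 1
        adj[v][u] = adj[v].get(u, 0) + 1
    m = len(edges)  # total edge weight
    degree = {n: sum(nbrs.values()) for n, nbrs in adj.items()}
    community = {n: n for n in adj}  # every node starts in its own community
    tot = dict(degree)  # sum of member degrees per community
    for _ in range(max_passes):
        moved = False
        for node in sorted(adj):
            current = community[node]
            # Edge weight from `node` into each neighboring community
            links = defaultdict(int)
            for nbr, w in adj[node].items():
                if nbr != node:
                    links[community[nbr]] += w
            tot[current] -= degree[node]  # take the node out of its community
            # Modularity gain of joining C is proportional to k_in - tot_C * k_i / (2m)
            best = current
            best_gain = links.get(current, 0) - tot[current] * degree[node] / (2 * m)
            for comm, k_in in links.items():
                gain = k_in - tot[comm] * degree[node] / (2 * m)
                if gain > best_gain:
                    best, best_gain = comm, gain
            tot[best] += degree[node]
            community[node] = best
            moved = moved or best != current
        if not moved:
            break
    return community


edges = [("a", "b"), ("b", "c"), ("a", "c"), ("d", "e"), ("e", "f"), ("d", "f"), ("c", "d")]
communities = louvain_local_moving(edges)
print(communities)
```

No labels are needed: the two triangles separate purely from the graph's structure, which is why clustering sidesteps the labeled-data bottleneck described above.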
A Retrospective
The overall value knowledge graphs add to enterprise machine learning deployments is clearest in a retrospective of the above use cases. In fintech, graph-aware cognitive computing processes data at the extremely high velocities this industry depends on. By extension, the approach could do the same for martech, insurtech, and other emerging, data-driven service industries.
The horizontal applicability of manifold learning and word embeddings is readily apparent in natural language technologies such as conversational search and natural language understanding. This type of functionality is quietly making its way into an array of BI and search tools, where it is considerably aided by other aspects of knowledge graphs, such as semantic inferencing.
Finally, in the pharmaceutical industry (as well as several others), this cognitive computing methodology is expediting feature generation for machine learning models while supporting ad hoc queries with fast turnaround. Graph capabilities are quietly underpinning machine learning as a whole, and their impact on this branch of AI should only become more pronounced in the days to come.
About the Author
Jelani Harper is an editorial consultant serving the information technology market. He specializes in data-driven applications focused on semantic technologies, data governance and analytics.