Semantics in Medical Research
The seminal CancerGrid project was a collaboration between five UK universities: Cambridge (specializing in oncology), Oxford (software engineering), University College London (semantic modelling), Birmingham (clinical trials management), and Belfast (telemedicine). It was initially funded for three years from 2005 by the UK Medical Research Council. Oxford University and the Cancer Research UK Cambridge Research Institute continued the work when the original project ended in 2008, and Microsoft Research was involved from 2011 to 2012.
The project was initiated to “address the twin problems of interoperability between incompatible software systems and generativity in clinical trials” and to inform next-generation approaches, taking a model-driven approach to the development of trials management tools.
The CancerGrid approach addressed the two problems of data integration and tool generation, via the collection and management of metadata in the first case, and model-driven engineering in the second — improving the science through greater effectiveness, and reducing drudgery through greater efficiency.
The project concluded as follows:
“… there is growing recognition that the large-scale sharing and integration of data from dynamic, heterogeneous sources requires computable representations of the semantics of data, and it is here that a significant part of the challenge lies. Natural language or informal understanding is sufficient for such a semantics only when the concepts are straightforward, the community is small or homogeneous, and the period of time over which understanding must be maintained is short. For problems of any complexity, communities of any size, or initiatives that are intended to last for many years, a more formal approach is required. The semantics has to be amenable to automatic processing, and this processing has to be automatically linked to the processing of the data itself.
Their development will require effective collaboration with users and domain experts, as well as advances in semantics- and model-driven software engineering research, above the level of industry-based technological development. The grand challenge for information-driven health is to make semantics-driven management of data standard practice across the whole spectrum of healthcare and medical research.”
Subsequent funded projects applying the CancerGrid approach include:
1. Accelerating Cancer Research Using Semantics-Driven Technology, funded by Microsoft Research, exploring the extension from phase III to early-phase studies;
2. Evolving Health Informatics, funded by Research Councils UK, working with colleagues in the Centre for Clinical Vaccinology and Tropical Medicine at the University of Oxford to demonstrate applicability to infectious disease control;
3. Hospital of the Future, aiming to improve patient outcomes through information-driven management;
4. The Data Support Service, funded by the UK Medical Research Council (MRC), to retrospectively catalogue the data collected in some of the MRC’s valuable long-running studies; and
5. The Union of Light-Ion Centres in Europe, funded by the European Union Seventh Framework Programme, to curate experimental results in particle therapy.
In addition, collaboration with local colleagues continued on a pro bono basis. The vaccinology study discussed in the link below is one such collaboration; the Oxford team were able to reduce the time taken to produce a complete set of semantically annotated forms for the study from six months to as many weeks, not least because a set of prototype forms could be shown to clinical researchers with a turnaround of a few days, while the discussion of their design was still fresh.
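That speed-up came from generating forms directly from machine-readable metadata rather than building them by hand. A minimal sketch of what such metadata-driven form generation might look like is shown below; the field names and the common data element (CDE) identifiers are invented for illustration and are not drawn from the actual CancerGrid models.

```python
# Hypothetical sketch: generating a prototype case-report form from a
# metadata model in which every field carries a common data element (CDE)
# identifier, so that the field's semantics are machine-readable.

form_model = [
    {"name": "tumour_stage", "label": "Tumour stage", "type": "choice",
     "choices": ["I", "II", "III", "IV"], "cde": "example:CDE-0001"},
    {"name": "date_of_diagnosis", "label": "Date of diagnosis",
     "type": "date", "cde": "example:CDE-0002"},
]

def render_form(model):
    """Render a plain-text prototype form from the annotated model."""
    lines = []
    for field in model:
        annotation = f"[{field['cde']}]"
        if field["type"] == "choice":
            opts = " / ".join(field["choices"])
            lines.append(f"{field['label']} ({opts}) {annotation}")
        else:
            lines.append(f"{field['label']}: ________ {annotation}")
    return "\n".join(lines)

print(render_form(form_model))
```

Because the form is derived from the model, a change to the metadata regenerates the form immediately, which is what makes a turnaround of days, rather than months, possible.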
Although Professor Sir Tim Berners-Lee et al proposed “semantics-driven management of data” via a “Semantic Web” as early as 1998, the proposal was never fully realized, and so-called Semantic Computing suffered its own “AI Winter” until very recently.
Semantic Computing data models, “RDF” and “Ontologies”, were based on some of the most sophisticated mathematical models ever introduced into computing. These models are grounded in formal logic and graph theory, and were, sadly, judged by commercial software companies to be “too complex”.
Machine Learning techniques such as Artificial Neural Networks came to prominence around 2012 and were soon hyped as the key technology for Artificial Intelligence.
New “graph” databases targeted computer “Knowledge Engineering”. The most popular, known as Key-Value Graphs and Labelled Property Graphs, were seen as easier to use, with quicker results, than the “complex” Semantic Knowledge Graph. As is typical of the software industry, Labelled Property Graph technologies were hyped to a degree bordering on technological fantasy.
Things have changed. In the last two years there has been increasing skepticism about Machine Learning techniques. Property Graphs are now considered the “training wheels” of Knowledge Engineering, as database vendors scramble to add semantic capabilities to their products.
The reality is that the unique capabilities of Semantic Knowledge Graphs for automated and robust Artificial Intelligence are now acknowledged by Artificial Intelligence leaders. Software giants Google, Facebook, LinkedIn, Amazon, Oracle and IBM have quietly adopted Semantic Knowledge Graphs and semantic technologies in their core software platforms.
Knowledge engineers and scientists have begun to deploy semantic technologies, now understanding that they are different rather than complex, and elegant rather than verbose.
Small companies like us have developed software workbenches that hide the perceived complexity “under the covers” to speed productivity with ease of use. Deep learning algorithms for extracting useful information from digitized text are more common and more powerful. Training techniques for such algorithms are improving.
We have arrived at the cusp of elementary but genuine computer intelligence, based on
1. Computer knowledge being stored in Semantic Knowledge Graphs
2. The complementary use of advanced Machine Learning technologies to extract knowledge from digitized text (unstructured data)
3. The use of inference engines for reasoning, and open linked data for data integrity and sharing
4. The ability of computers to learn and reason, by themselves, finding hidden relationships between entities described in massive amounts of data (like 50,000 plus research papers) in hours rather than months or years
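The capabilities listed above can be sketched in miniature. In the plain-Python sketch below (all entity and predicate names are invented for illustration), knowledge is stored as subject-predicate-object triples, the core of the RDF model, and a simple traversal surfaces a relationship that no single triple states directly:

```python
# Minimal illustration of a semantic triple store plus inference by
# graph traversal. All entity and predicate names are hypothetical.

# Knowledge is stored as (subject, predicate, object) triples.
triples = {
    ("GeneA", "regulates", "ProteinB"),
    ("ProteinB", "inhibits", "ProteinC"),
    ("ProteinC", "drives", "DiseaseX"),
    ("DrugY", "targets", "ProteinB"),
}

def infer_paths(store, start, end, max_depth=4):
    """Find chains of predicates linking two entities: a 'hidden relationship'."""
    paths = []

    def walk(node, path, seen):
        if len(path) > max_depth:
            return
        if node == end and path:
            paths.append(path)
            return
        for s, p, o in store:
            if s == node and o not in seen:
                walk(o, path + [(s, p, o)], seen | {o})

    walk(start, [], {start})
    return paths

# Does DrugY bear any indirect relationship to DiseaseX?
chains = infer_paths(triples, "DrugY", "DiseaseX")
for chain in chains:
    print(" -> ".join(f"{s} {p} {o}" for s, p, o in chain))
```

No triple mentions DrugY and DiseaseX together, yet the traversal recovers the chain DrugY targets ProteinB, ProteinB inhibits ProteinC, ProteinC drives DiseaseX. Production systems express the same idea at scale with RDF stores, SPARQL queries, and ontology-driven reasoners rather than a hand-rolled search.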
What experts say:
“Most of our clients start directly with Knowledge Graphs, but we recognize that that isn’t the only path. Our contention is that a bit of strategic planning up front, outlining where this is likely to lead gives you a lot more runway. You may choose to do your first graph project using a Property Graph, but we suspect that sooner or later you will want to get beyond the first few projects and will want to adopt an RDF / Semantic Knowledge Graph based system.” [Read full article]
“People think RDF is a pain because it is complicated. The truth is even worse. RDF is painfully simplistic, but it allows you to work with real-world data and problems that are horribly complicated. While you can avoid RDF, it is harder to avoid complicated data and complicated computer problems.” I could not have come up with a better conclusion. [Read full article]
“Semantic Computing technology is used to digitally represent knowledge for AI systems to understand their world in context, using semantic “graphs”, ontologies, metadata and rules. Semantic technology can represent relationships between data and ties together different pieces of data because of like attributes (including hidden relationships).” [Forrester Research, 2017]
“When it comes to enabling the AI your company needs, think Semantic Graph.” [Alan Morrison, PwC Senior Research Fellow, Emerging Tech, “Collapsing the IT Stack”, September 2018]
Research papers as a resource
It is simply not possible for a researcher, or even a group of researchers, to efficiently analyze 50,000-plus research papers without Artificial Intelligence. While an effective treatment or cure for the coronavirus may well be hidden in one or more research papers, it is highly unlikely to be found using indexed research papers stored in traditional databases. Semantic Knowledge Graphs are essential to the Artificial Intelligence approach described in an example published this week:
“If you turned (sic) the BenevolentAI 250-person team and turned all of them into 65-year-old ex-pharmacology teachers, it would have taken probably a year to come up with this treatment. Instead, it took my three colleagues working two hours a day, and myself working full time, three days to come up with this. We’ve gone from computer to bedside, as it were, in two months.” [Read full article].
Researchers must demand these tools from their Data Science colleagues; they are now available, increasingly sophisticated, and easier than ever to use.
The ability to combine deep insights from research papers with real-time access to clinical data is now a reality. Trials testing new treatments, interventions or tests as a means to prevent, detect, treat or manage various diseases can be shared and analyzed in real time against knowledge extracted from past research and clinical outcomes.
Semantic Knowledge Graphs leave alternative technologies in their wake. The robust and automated Artificial Intelligence required for analysis of complex datasets is not possible without fully functioning Semantic Knowledge Graphs. Over thirty years of research and standards development have resulted in data models that are mathematically pure, and the most powerful, consistent and reliable in the history of computing.
Professor Jim Davies said of the seminal CancerGrid project that they sought generativity, to inform next-generation approaches. His Oxford University colleague and Professorial Fellow Tim Berners-Lee said in 1998, “I have a dream for the web (in which computers) become capable of analyzing all the data on the Web — the contents, links, and transactions between people and machines.” Both of these brilliant men have met their objectives of informing the next generations of technology, yet they must surely be frustrated by the lack of leadership and progress.
Coronavirus is an opportunity to create the greatest collaboration in human history; a collaboration between the world’s most brilliant researchers and their collective linked Semantic Knowledge systems, both of them searching for hidden relationships in lethal enemies that are roughly one-900th the width of a human hair, creating new knowledge in days, not years.
The persistence of bacterial and viral outbreaks and the increasing ferocity of the coronavirus family have become Black Swan events with severe global economic outcomes. We don’t know how long this latest outbreak will last, or if and when we will find a treatment, a vaccine, or a cure.
The need to arm Researchers with real Artificial Intelligence tools based on decades of research and facilitated by recent breakthroughs in computing scalability has become an urgent one. We need to get on with it now.
Author: Mark Bradley is the Founder and sales chief of Cognitive Software Group, the leading cognitive computing company in Australia. www.cognitivesoftware.com