Diagnosing diseases with big data

Can large volumes of data be used to diagnose diseases?

Needless to say, a real mountain of data does not consist of one person's data alone. It combines the data of many people. This is called big data, perhaps best described as "extremely large data sets". Data collected from our online activities (shopping, internet surfing, social network posts, apps, discount and credit cards) is often used to target us with ads tailored to our specific interests. Algorithms draw conclusions about our interests and preferences from our habits, which a company can turn into a recommendation and possibly a sale: "Customers who bought this item also bought ..."
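How such a recommendation might come about can be shown with a minimal sketch. It simply counts which items appear in the same purchase as a given item. The baskets and the function name `also_bought` are made up for illustration, and real shop systems are considerably more elaborate.

```python
from collections import Counter

# Hypothetical shopping baskets; each set is one customer's purchase.
baskets = [
    {"tent", "sleeping bag", "camping stove"},
    {"tent", "sleeping bag"},
    {"tent", "headlamp"},
    {"novel", "bookmark"},
]

def also_bought(baskets, item, top_n=2):
    """Return the items most often bought together with `item`."""
    counts = Counter()
    for basket in baskets:
        if item in basket:
            # Count every other item that shares a basket with `item`.
            counts.update(basket - {item})
    # Most frequent first; ties are broken alphabetically so the result is stable.
    ranked = sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))
    return [name for name, _ in ranked[:top_n]]

print(also_bought(baskets, "tent"))
```

The more baskets are available, the more reliable such counts become, which is why the approach depends on very large data sets.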

Medical science has not reached this point yet. For one thing, it is not as networked and connected as other areas of our lives. Different doctors often store our data in different records and files, some digital, others on paper. To use big data effectively and efficiently in medicine, this data must first be merged and consolidated. Big data then plays a similar role in medicine as it does in retail: it is all about characteristics and patterns that allow conclusions on how to personalize a patient's treatment or support the diagnostic process, for example: "The following characteristic shown in the MRI scan is frequently indicative of a tumor."
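What "merged and consolidated" could mean in the simplest case is sketched below: records from two sources are combined into one date-ordered history per patient. The record fields, the patient identifiers and the function name `consolidate` are assumptions for illustration only. Real hospital systems use far richer formats and have to resolve identities across sources first.

```python
from collections import defaultdict

# Hypothetical records held by different practices for the same patients.
family_doctor = [
    {"patient_id": "P1", "date": "2018-03-02", "finding": "elevated blood pressure"},
]
radiology = [
    {"patient_id": "P1", "date": "2018-01-15", "finding": "MRI: no abnormalities"},
    {"patient_id": "P2", "date": "2018-02-20", "finding": "X-ray: healed fracture"},
]

def consolidate(*sources):
    """Merge several record lists into one date-ordered history per patient."""
    history = defaultdict(list)
    for source in sources:
        for record in source:
            history[record["patient_id"]].append(record)
    for records in history.values():
        # ISO-formatted dates (YYYY-MM-DD) sort correctly as plain text.
        records.sort(key=lambda record: record["date"])
    return dict(history)

merged = consolidate(family_doctor, radiology)
for record in merged["P1"]:
    print(record["date"], record["finding"])
```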

Algorithms in medicine: Learning from data
Our data becomes really useful for medicine only when it is digitized, consolidated and structured in a way that algorithms can use.

Just like doctors, algorithms first have to learn how to draw the right conclusions. However, there are two distinct differences. First, algorithms excel where physicians tend to tire quickly, for instance when many images have to be scanned rapidly for potential findings. Second, algorithms do not require years of study to complete these types of tasks. This eases physicians' workload and frees up their time for other responsibilities that require in-depth specialist knowledge. "You do not search for methods to replicate and artificially create intelligence. Instead, you look for ways to teach computers to solve a cognitive task as quickly and efficiently as possible and do at least as well as a human being," states Dr. Markus Wenzel from the Fraunhofer Institute for Medical Image Computing MEVIS.

Humans assign algorithms a cognitive task, that is, a task that relates to perception. Human beings are very good at such tasks because our senses of sight and hearing are made to recognize faces, voices and language even when they are not clearly visible or audible, for instance in changing light conditions or with ambient noise. These tasks are extremely difficult for computers because it is difficult or impossible to describe them with logical rules, which is what computers are based on.

In radiology, the task might be: "Find all MRI scans showing a tumor". The researcher feeds the algorithm image data for learning purposes. Simply put, the algorithm then compares the image data, identifies the features detectable in these images and recognizes patterns and regularities. Based on this information, the algorithm can solve the task and independently detect images with tumors. As a rule, the more data the algorithm has to learn from, the better its results. "A computer has more opportunities when it can choose the important features by itself. This is why deep learning is often also called feature learning because the computer picks the features it deems important in the images to solve the task," says Wenzel.
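The principle of learning from labeled examples can be illustrated with a deliberately simple sketch. It is not deep learning and does not pick its own features. Instead it uses a much simpler stand-in, a nearest-centroid classifier, which averages the training images of each class and assigns a new image to the class whose average it resembles most. The tiny four-pixel "scans" and their labels are invented for illustration.

```python
# Each "scan" is a hypothetical 2x2 image flattened to four brightness values.
# Label 1 = tumor visible, label 0 = no tumor (made-up training data).
training = [
    ([0.1, 0.2, 0.1, 0.1], 0),
    ([0.2, 0.1, 0.2, 0.1], 0),
    ([0.1, 0.9, 0.8, 0.1], 1),
    ([0.2, 0.8, 0.9, 0.2], 1),
]

def learn_centroids(examples):
    """Average the images of each class into one typical image per class."""
    sums, counts = {}, {}
    for pixels, label in examples:
        total = sums.setdefault(label, [0.0] * len(pixels))
        for i, value in enumerate(pixels):
            total[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {
        label: [value / counts[label] for value in total]
        for label, total in sums.items()
    }

def classify(centroids, pixels):
    """Assign the label whose typical image is closest (squared distance)."""
    def distance(label):
        return sum((a - b) ** 2 for a, b in zip(centroids[label], pixels))
    return min(centroids, key=distance)

centroids = learn_centroids(training)
print(classify(centroids, [0.1, 0.85, 0.9, 0.15]))  # a new, unlabeled scan
```

With only four training images the averages are crude. Adding more labeled examples makes each class average more representative, which mirrors the point that more data tends to give better results.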

Different data sources depict a more accurate picture
Access to different data sources creates a clearer picture of us and of how our health develops. On the downside, processing this data becomes much more complex for humans and algorithms alike.

Before they train their algorithms, researchers and developers often face the challenge of finding or generating sufficient relevant data. This includes not just comparative data from healthy individuals but also sufficient data revealing pathological changes. Image data with corresponding findings is quite easy to export from hospital information systems. This might be one reason why algorithms have so far been studied chiefly in radiology, pathology and dermatology. This type of data is abundantly available in these fields, and image recognition via algorithms has long been researched outside the realm of medicine and implemented in numerous scenarios. Conditions in these settings are therefore favorable. Things get more difficult if the algorithm has to draw on data from different sources. In Wenzel's opinion, this is still a long way off. "It will still take some time before all information about a specific patient and point in time can be automatically collected from the information systems and processed to where computers can learn from it." Wenzel adds that algorithms could use this information to predict the further course of a patient's condition.

One project that is already researching how to link various data sets is iPrognosis. In an interview, Dr. Lisa Klingelhöfer from the Dresden University of Technology describes the main goal of the smartphone app at the center of this European Horizon 2020 project: "It aims to detect and measure the motor and non-motor symptoms of Parkinson's disease. In doing so, we hope to identify individuals who experience early symptoms of Parkinson's disease, prompting them to consult a physician at a much earlier stage." The app collects different user data and transmits it to the study center in Dresden in encrypted and anonymized form. This gives the analyzing algorithm the opportunity to recognize features and patterns in different types of data and identify changes that suggest the early stages of Parkinson's disease.

These might be changes in voice and speech patterns or in the way users operate their smartphones, for example. Other wearables, such as a smartwatch that monitors eating habits and a smart belt that screens gastrointestinal tract function, likewise collect data on possible changes that occur over time. These changes are often very subtle and go unnoticed during time-limited clinical observations. Once again, the algorithm perceives things differently from a human physician, but more accurately and more comprehensively.

This also shows the purpose of machine learning in medicine: nobody intends to replace physicians with artificial intelligence (AI). Yet AI can ease physicians' workload and, for specific concerns, analyze patients more accurately and faster than human beings can. Having said that, physicians ultimately have one crucial advantage: they accumulate years of experience in their respective field, which does not become part of big data and cannot be defined through features and patterns. Based on this experience, they are also able to draw conclusions in a creative and independent manner. That's why the final diagnosis will always require a physician as a supervisory authority, even if an algorithm is able to suggest possible diagnoses.

The article was written by Timo Roth and translated from German by Elena O'Meara.