“Machine learning” rates very high on the buzz-word scale, right up there with “nano-technology” and “blockchain”. Like most buzz it is more noise than substance. However, every now and again it looks like there might be something in the noise that bites. This episode of the Cheesecake Files1 is about testing an algorithm (another buzz word) developed through a machine learning technique for the early detection of heart attacks (strictly – myocardial infarction).
Before I begin my story in earnest, a couple of words about the buzz words. When I say “algorithm”, think “recipe”. In the context of emergency medicine this is simply a series of steps which assist the medical team in their decision making. For example, if the presenting complaint is “chest pain”, the triage nurse will connect up a device (an ECG) to measure the electrical activity of the heart, draw some blood, and send it off to the lab with specific instructions to measure the concentration of a molecule called troponin. Several years ago we introduced more detailed pathways (i.e. algorithms) in all New Zealand emergency departments. These include guidance on which other data to obtain from the patient, when to repeat blood measurements, and how all the data go together to risk stratify the patient. The principal aim is to ensure that physicians can rule out a heart attack as quickly and as safely as possible. This is important because possible heart attacks are one of the most common presentations to the emergency department (ED), so if these patients remain in the ED a long time it can affect the whole service. However, only about 10-15% (in NZ) are actually having a heart attack. Many of those who aren’t can now be reassured early that they are not. Please note: if you have sudden onset chest pain then the ED is the right place for you. Just because most who attend are not having a heart attack doesn’t mean that you aren’t.
The other buzz word is “machine learning”. This term is usually used to mean a computational technique which involves giving a computer some data and some basic instructions on how to look at it, then asking the computer to predict an outcome (in our case, whether a patient is having a heart attack or not). The prediction is compared to the actual outcomes, and information on how well the computer performed is fed back into the machine to tweak parts of the algorithm. Think of this as tasting the soup and then adding a few more spices. The process is repeated many times until the soup is as good as it can possibly be. Some recipes we know and can follow ourselves. Some happen behind closed doors as a team of chefs puts together a meal. A characteristic of machine learning algorithms is that they are often not easily understood (a “black box”), but the proof of the pudding is in the eating. This leads me to the story that is the current cheesecake.
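For readers who like to see the soup being tasted, here is a minimal sketch of that predict, compare, tweak loop. It is not how MI3 was built: the technique (a one-variable logistic regression fitted by gradient descent), the made-up data, and every name in it (`make_toy_data`, `train`, `learning_rate` and so on) are assumptions chosen purely for illustration.

```python
import math
import random


def sigmoid(z):
    """Squash any number into a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def make_toy_data(n=400, seed=0):
    """Invent n (measurement, outcome) pairs; these are NOT patient data."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.uniform(-3.0, 3.0)          # a made-up, standardised measurement
        p_true = sigmoid(2.0 * x - 0.5)     # hidden rule the computer must discover
        y = 1 if rng.random() < p_true else 0
        data.append((x, y))
    return data


def log_loss(data, w, b):
    """How bad the predictions are: lower means a better-tasting soup."""
    eps = 1e-12
    total = 0.0
    for x, y in data:
        p = sigmoid(w * x + b)
        total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return total / len(data)


def train(data, rounds=500, learning_rate=0.1):
    """Repeat: predict, compare with the actual outcomes, tweak the recipe."""
    w, b = 0.0, 0.0                         # start with a bland recipe
    history = [log_loss(data, w, b)]
    for _ in range(rounds):
        grad_w = grad_b = 0.0
        for x, y in data:
            error = sigmoid(w * x + b) - y  # prediction minus actual outcome
            grad_w += error * x
            grad_b += error
        # Feed the errors back in: nudge the recipe in the direction that helps.
        w -= learning_rate * grad_w / len(data)
        b -= learning_rate * grad_b / len(data)
        history.append(log_loss(data, w, b))
    return w, b, history


if __name__ == "__main__":
    toy = make_toy_data()
    w, b, history = train(toy)
    print(f"loss before tasting: {history[0]:.3f}, after: {history[-1]:.3f}")
```

Each pass through the loop is one taste of the soup. Real machine learning methods differ mainly in how complicated the recipe is and how the tweaks are chosen.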
Nearly three years ago we were asked to test whether an algorithm called MI3 works to risk stratify people who appear in the emergency department with symptoms suggestive of a heart attack. The algorithm had been developed by a US-based diagnostic company called Abbott Diagnostics. We were given access to the black box and could input variables from real patients and observe the predicted outcome. In this case the algorithm produced a number that very closely corresponded to the probability of a patient having a heart attack. Very few variables were required to make this prediction: sex, age, two measures of troponin, and the time between the two measures. The latter is important because how troponin concentrations change over time informs us about the possible heart attack.
A collaboration of research groups from Scotland, Switzerland, Germany, the United States of America, Australia and New Zealand came together to provide sufficient data to test MI3. This group was led by Christchurch ED physician Dr Martin Than and Scottish cardiologist Prof Nicholas Mills. I was charged with pulling together all the data and conducting the statistical analysis of the performance of MI3.
There were about 8000 patients in our testing data set, with 10.6% of them having a heart attack. Importantly, the first thing I noted was that the values output by the algorithm corresponded to the true rate of heart attacks, i.e. when the MI3 value was 5 about 5% of those with this value were having a heart attack, and when it was 90 about 90% of people were having a heart attack. In other words, the algorithm was well calibrated, which can give physicians confidence. The second thing was to see if we could find MI3 values below which we could say that almost everyone is not having a heart attack (it’s impossible to be 100% certain – we aim for about 99% or better). We were able to find such a value and show that it identified an impressive 69% of people as low-risk. The full results are available in the cardiology journal Circulation – here.
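The two checks described above can be sketched in a few lines. This is a simplified illustration, not the analysis from the paper. It assumes scores on a 0 to 100 scale and outcomes coded 1 (heart attack) or 0 (no heart attack), and it reads the “99% or better” target as the share of low-risk patients who are not having a heart attack (the negative predictive value). The function names and the score bands are hypothetical.

```python
def calibration_table(scores, outcomes, edges=(0, 5, 20, 50, 100)):
    """Compare the average score in each band with the observed event rate (%)."""
    rows = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        # Bands are [lo, hi), except the last one, which also includes its top edge.
        idx = [k for k, s in enumerate(scores)
               if lo <= s < hi or (hi == edges[-1] and s == hi)]
        if not idx:
            continue
        mean_score = sum(scores[k] for k in idx) / len(idx)
        observed = 100.0 * sum(outcomes[k] for k in idx) / len(idx)
        rows.append((lo, hi, len(idx), mean_score, observed))
    return rows


def find_rule_out_threshold(scores, outcomes, min_npv=0.99):
    """Largest score t such that patients with score <= t meet the NPV target.

    Returns (threshold, npv, fraction_low_risk), or None if no value qualifies.
    """
    pairs = sorted(zip(scores, outcomes))
    best = None
    n_low = 0       # patients at or below the candidate threshold
    n_events = 0    # heart attacks among those patients
    i = 0
    while i < len(pairs):
        t = pairs[i][0]
        # Take in every patient who shares this score before judging it.
        while i < len(pairs) and pairs[i][0] == t:
            n_low += 1
            n_events += pairs[i][1]
            i += 1
        npv = 1.0 - n_events / n_low
        if npv >= min_npv:
            best = (t, npv, n_low / len(pairs))
    return best


if __name__ == "__main__":
    # Six invented patients, purely to show the calls.
    toy_scores = [1, 2, 3, 4, 50, 60]
    toy_outcomes = [0, 0, 0, 0, 1, 1]
    for row in calibration_table(toy_scores, toy_outcomes):
        print(row)
    print(find_rule_out_threshold(toy_scores, toy_outcomes))
```

With a well calibrated score, the average score and the observed rate in each row of the table sit close together. The threshold search then shows how many patients could be reassured early at a given level of safety.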
So, how may this be used? This algorithm differs from others in three ways: (i) it does not require blood samples to be taken at specific set intervals, (ii) it does not require information about patient history or detailed signs and symptoms to be gathered and incorporated, and (iii) the output is a probability rather than simply a stratification of patients into a low, intermediate or high risk category. In other words, the inputs are simple and objective, and the output is easily interpretable. In practice, the physician may receive the MI3 value from the labs along with the troponin results. This may aid discussions with the patient through the use of icon arrays or similar (see the figure).
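An icon array simply shows a probability as so many marked icons out of 100. As a rough text-only sketch (the function name, the symbols and the 10 by 10 layout are my own assumptions, not anything produced by MI3 or the lab):

```python
def icon_array(probability_percent, total=100, per_row=10, filled="#", empty="."):
    """Draw a probability as `total` icons, with the affected ones filled in."""
    n_filled = round(probability_percent / 100 * total)
    n_filled = max(0, min(total, n_filled))   # keep within 0..total
    icons = filled * n_filled + empty * (total - n_filled)
    rows = [icons[i:i + per_row] for i in range(0, total, per_row)]
    return "\n".join(" ".join(row) for row in rows)


if __name__ == "__main__":
    # A value of 5 reads as: about 5 in every 100 people like you
    # are having a heart attack.
    print(icon_array(5))
```

A picture of 5 marked icons among 100 is often easier to talk through with a patient than the bare statement “5%”.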
1 Once upon a time, a long long time ago, I received a cheesecake for every publication. Sadly, those days are gone now. But I live in hope.
Disclaimer: I have acted as a consultant statistician for Abbott Diagnostics. I have no shares or intellectual property associated with MI3. Abbott was not involved in the testing of the algorithm.