Science of Neurosurgical Practice
Diagnostic Studies
Video Transcription
Okay, we're going to talk about diagnostic tests, and I am pleased to be doing this one. I like this talk for a couple of reasons. It helped me realize something Dr. Haynes actually mentioned to us earlier: the paradigm we work with in our heads when we think about evidence-based medicine usually has to do with therapeutic trials. We think about the randomized clinical trial and the observational studies, all of those being for therapeutic benefit, but the truth is that the application of the statistics goes far beyond the therapeutic trial. So we're going to be looking at diagnostic studies here. Michael likes to remind us that in medicine you've got all kinds of influences, for all kinds of reasons, but what we're going to do is try to get to the truth of the matter.

And speaking of that, just one more reminder that's not on the slide: we read Buddha every day in discharge planning rounds, and Buddha has a saying that I think taught me as much about research as anything. Buddha says that it's not your preferences that get you into trouble, it's your attachments to them, and that is really the key to eliminating bias. We are all prone to bias because that's who we are. We don't mean to do it, but there it is.

This talk on diagnostic studies really has two different applications. One of them is simply how you use diagnostic tests on a day-to-day basis, and that's what I think is so fascinating about this. And then we're also going to look at how you read the literature and how you decide whether the paper you're reading does a good job of telling you whether the new diagnostic test it's evaluating is going to be applicable or not. There are a lot of objectives here and a lot of pieces, and we'll go through all of those parts. So this is our overview: we're going to look at diagnostic methods, we're going to look at how we use that information, we're going to look at the trade-off we're stuck with between sensitivity and specificity, and then clinical decisions and the strength of the data.

Now, if I were to characterize neurology and neurosurgery here, from the neurosurgeon's point of view, the first method is the method of the neurologists: you just order every test known to mankind, and when you're done, you sift through the data and see if you have an answer. That's not a very good approach, is it? The second approach is how the neurologists see neurosurgeons making a diagnosis: they look across the room and go, oh, I know what this is, and they give you the answer just like that. That's also not a very good method; they shoot from the hip, if you will. All of us need to do this correctly, and correctly means that when you have some initial data on the patient and their chief complaint, you immediately start a process of differential diagnosis, which includes hypothesis generation and testing. What I'd love to get across is that every question you ask in your history or your review of systems is a test. And like any other test we will talk about, it has a sensitivity and a specificity. I would also say that the art of doing neurology and neurosurgery is learning to ask your question or perform your physical exam in such a way that you actually increase your sensitivity and your specificity at the same time.
That's actually the art of medicine, and it can be done. But we're not talking about that sort of thing; we're talking about diagnostic tests that are less under our control, where we're going to have this trade-off. You make a hypothesis, you test the hypothesis: that's what this is going to be about, and we apply it in clinical diagnosis.

Now, pre-test probability. First of all, clinicians are Bayesian statisticians, so get used to it. People think of Bayesian as a dirty word, but this really goes back to the history of statistics. Bayesian statistics is actually older than what we usually think of as statistics, which is Fisher's kind of statistics. In Fisher's framework, every test is independent of everything else. In the Bayesian framework, that's not true: you make use of the information you already started with. A lot of people think that's a bad idea, but as clinicians we do it all the time. We're always starting with a set of information, which might be prevalence data or might simply be our gestalt of what the patient has. On rounds, I will routinely require of our team that we go around the room and everybody give an estimate of their pre-test probability before we order the test. This is really important because, as you're going to see very shortly, given the sensitivity and specificity of a test you might think you're simply stuck with its value, but when your pre-test probability is low the test might be useful, and when your pre-test probability is high that same test may give you no useful information. So this gets pretty tricky, and your estimate of the pre-test probability is very, very important. You have to bring an expectation to the test. That's essential. I'm going to go over a couple of examples with you and show you how easy it is to do that. You do it all the time; you just may not be completely aware of it.

Then what you're going to do is take this pre-test probability and ask yourself: have I reached the treatment threshold, or have I reached the test threshold? In plain English, is the diagnosis so unlikely that I don't even need to test for it? Or is the diagnosis likely enough that, given the severity of the disease and the relative lack of side effects of the treatment, I'm going to treat now, because I already have enough information? Or have I not crossed either threshold, in which case I really need more information to make up my mind? That's when you do another test. So this lays out what we're talking about: a diagnosis that is very unlikely, a diagnosis that is likely enough that I want to treat, and then the big middle ground in between.

Now, from this I just want to point out that the treatment threshold is stuck in one place because this is a slide, but the reality is that the treatment threshold could be anywhere. Take an example, and I always love this one: how certain do you have to be that a patient with a new seizure and a fever in the emergency room has herpes simplex encephalitis in order to want to treat them? The answer is: not very certain at all.
It's a catastrophic disease: if you delay 24 hours, the patient is probably going to die, whereas you have a drug with nearly no complication rate. So the answer, if you think about it, is that you're going to put acyclovir into the mix. You are going to treat the patient. So your treatment threshold for herpes simplex encephalitis is probably in the 1 to 2 percent range. This is a sliding scale; don't think of it as stuck in place, because it's not.

We're doing diagnostic tests, and so it's worth pointing out that the PICO approach is useful for diagnostic testing. I've taken a somewhat trivial example, and we'll go into more complex examples later, but this is one we argue about all the time at Einstein and in Philadelphia in general: how you diagnose an aneurysm in a patient with a severe and unusual headache. So the question is: in my patient with a severe and unusual headache, does a brain CT scan, a non-contrast head CT, provide more information than the clinical criteria and increase the likelihood of diagnosing an aneurysm? An aneurysm, not a subarachnoid hemorrhage, by the way. We can break the problem down this way: here is the initial clinical data that we're starting with, the patient with a severe and unusual headache. My question is, do they have an aneurysm? I assign a pre-test probability, then I run my test, which is a brain CT scan, and then, post-test, I decide whether they do or don't have an aneurysm.

So let's estimate the pre-test probability, and how do you do that? One thing you can do is use prevalence data. I could say to Carl Chudnovsky in the emergency room, okay, how many severe headaches came into the ER in the last year, and of those, how many had an aneurysm? And I can just do a prevalence estimate of the likelihood. Or I can know, based on my own experience, the key indicators in the person's history and physical that make it more or less likely that they have an aneurysm, and use my experience.

So let's take three simple cases, and we're going to have a show of hands here, a little participation. Case A: a 21-year-old white female who comes in with her third headache in the last six months. She describes an onset over about a 15-minute period associated with nausea and vomiting, and she has a family history of migraine. Now I want a show of hands. Because we're a big group for this sort of thing, I have a simple question: who thinks there is a 50 percent or greater likelihood that this girl has an aneurysm? Raise your hand if you think it's 50 percent or more likely. And just as a check, who would say 50 percent or less? Right, okay, you've got it working. Now let me move it this way: who here thinks the likelihood is at least 20 percent? Raise your hand. Nobody. If you think it's less than 20 percent, raise your hand. Okay, see? See how this works? I keep parsing this down. Or we could just go around the room and ask everybody to give an estimate, and the estimate would probably be in the less-than-5-percent range. So we could just say, okay, this girl has a very low risk.
Now let's go on to case B. We've got a 34-year-old white male with known migraine who complains of a headache which began as usual but is much worse and has lasted three days. Same sort of thing: who thinks it's a 50 percent or greater likelihood that this guy has an aneurysm? Nobody. Who thinks it's 50 percent or less? Everybody. Who thinks it's 20 percent or greater? Nobody. 20 percent or less? Okay. Now what if I said 10 percent? For the first patient we said 5 percent; what if I said a 10 percent likelihood or greater here? See, a couple of people start going, hmm, maybe. If I gave you the chance to give real numbers, you would give a real number, and 10 percent is probably too high for case B.

Case C: a 45-year-old white female with sudden onset of the worst headache ever, with nausea, vomiting, and numbness of her right face and arm. Who here thinks it's 50 percent or greater that this person has an aneurysm? Okay, so more or less everybody. So you see what I'm saying? I'm not even parsing this out, and you have no difficulty. Is anybody here having a problem with what I'm saying? See, nobody. I could go around and say, give me a number, and believe me, you would have no difficulty. You have no difficulty estimating a pre-test probability if you simply make the effort to do it. That's my point.

So then you have to make this decision, and we've already talked about this. Have we reached the treatment threshold, the point where we really do need to treat, and we'll look at that in more detail in a minute? Or is testing a waste of time? Given this presentation, it would be dumb to order a $6,000 test, or a test that is going to expose the patient to risk, especially when, if I don't make the diagnosis right now, I'll make the diagnosis later. So just as the treatment threshold is a movable target, the testing threshold can be a movable target. I love the expression that I take from surgeons; I've learned this from you, and I think it's a great expression. You ask a surgeon about differential diagnosis, and they always say, well, you've got to think about what can turn around and bite you in the ass. I love that expression. In other words, what could lead to a sudden catastrophic outcome that's irreversible? It's like, yeah, I get it. I'm with you guys. That is the right way to think.

So do you need a diagnostic test here? The sorts of things that push you toward testing are: is the clinical presentation atypical, so I'm not really sure this is what I think it is and I'm not sure I have a good diagnosis? Do you have insufficient data to meet the diagnostic criteria? Are there other reasonable diagnoses that might also behave this way? And again, what can turn around and bite you in the ass here, and what are the consequences of missing this diagnosis?

So then we're going to apply the test. We have a pre-test probability. Presumably we have not reached the point where we are either ready to treat or ready to stop testing; we're in between, and we're trying to get to a post-test probability. How do we do that? First of all, we have to begin with a reasonable estimate of the pre-test probability. Then we need a nice, accurate test. And then we're going to combine the information somehow.
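Before we get to how the information is combined, the test-threshold and treatment-threshold logic just described can be written down directly. Here is a minimal Python sketch; the threshold values and the probabilities fed to it are illustrative assumptions, not fixed clinical numbers.

```python
def next_step(pretest_prob, test_threshold=0.02, treatment_threshold=0.80):
    """Decide what to do next given a pre-test probability.

    Below the test threshold the diagnosis is unlikely enough that testing
    adds nothing; above the treatment threshold we treat on clinical grounds
    alone; in between we need more information.  Both thresholds slide with
    disease severity and treatment risk (for suspected herpes simplex
    encephalitis the treatment threshold drops toward 1-2%).
    """
    if pretest_prob < test_threshold:
        return "do not test - diagnosis too unlikely to pursue"
    if pretest_prob >= treatment_threshold:
        return "treat now - testing is unlikely to change management"
    return "order a test - more information is needed"

# Arbitrary illustrative pre-test probabilities, one per branch
for p in (0.01, 0.30, 0.90):
    print(f"pre-test {p:.0%}:", next_step(p))
```

The two threshold arguments are deliberately exposed as parameters, since the talk's whole point is that they move with the stakes of the disease and the risk of the treatment.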
And the way we're going to combine the information is through what's called the likelihood ratio, and then we're going to put it together using what's called Fagan's nomogram. But we have a ways to go before we can appreciate what's on this slide. This is Fagan's nomogram. Don has a tool that is excellent for using it; it allows you to slide this bar around. We'll explain what this is and what it isn't.

We've already been touching on these things. These are some of the real statistical questions here: Is this test precise enough? Will I be confident of the test interpretation? Will I need more tests when I'm done? And ultimately, am I ready to treat this patient or not? You can use the PICO model here: the patient is whoever your patient is, but now the intervention is a diagnostic test instead, the comparator is the gold standard or reference test, and the outcome is our final diagnosis. Just to play that out: in the patient with severe and unusual headache, would blood or an aneurysm seen on a non-contrast head CT, compared to a conventional cerebral angiogram, provide a valuable, useful diagnosis of aneurysm? We can put that in question format: in this patient with severe and unusual headache, how accurate is the assessment of subarachnoid blood or the presence of an abnormal focal hyperdensity on the CT scan, compared with cerebral angiography, for the diagnosis of intracranial aneurysm? That's an answerable question.

So now let's look at diagnostic tests and think about them a little. First of all, if we are trying to create a new diagnostic test, it's because we have a diagnostic area where we're not happy with the current diagnosis; that's implied in the whole setting, because we don't have the perfect test. As we move toward an ideal test, what is it that makes a test ideal? It should be accurate, and we had an excellent discussion from Fred on this. It should be precise, with small random error, available, convenient, low-risk, inexpensive, and reproducible, and we're going to deal with the question of reproducibility. And to assess it, we're going to have to compare it against the reference standard. The reference standard is the gold standard: this is the truth, our currently best available procedure, which might not be perfect. In general, we should accept that it will not be a perfect test; an angiogram is not a perfect diagnostic test for an aneurysm, for instance. When we look at the study, the reference standard needs to be applied independent of the index test, as we will explain further.

Now, why wouldn't you always use the best tool? Dr. Haynes talked about this. Well, sometimes the tool just isn't available; the reference standard might be an autopsy finding, and we'd like to make the diagnosis before the patient dies. The procedure may be risky, and we're trying to get to a test that is as good or almost as good but without that kind of risk. And the test may be very expensive; in this day and age, tens of thousands of dollars for a test would not be uncommon.

So now we need to quantify all this, and this is where we're going to spend the time. Everything so far has been leading up to this: how do we quantify diagnostic tests? We're going to look mainly at the magnitude of the effect.
This is going to be our focus: how we judge the magnitude of the diagnostic information in a test. We're then also going to look, in quick format, at precision and reproducibility, and we're going to do all this basically with a two-by-two table. Now I want to take an extra minute or two on this two-by-two table. I'm going to belabor it with you because it will make the rest a lot easier. To set up the two-by-two table, where we are talking about diagnosis and not treatment, the columns are disease and no disease, and the rows are a positive test and a negative test. That's the basic data. Now we're going to apply some specific names because we're dealing with diagnostic tests. When the test is positive in the disease state, we call that a true positive. When the test is negative in a person who doesn't have the disease, we call that a true negative. We have a false positive and a false negative as well. It is really important to fix in your mind where these sit. You say, I know, I get it, it's only a two-by-two table, but we're going to be mixing these terms very quickly, and I want you to keep them straight.

So let's start by defining sensitivity. What is sensitivity? Sensitivity is positivity in disease. That is, in the column of people who have the disease, how often do you have a true positive test? That's the sensitivity. Very simple: it's the true positives over the total number with the disease. And you see this little mnemonic down here? This is very useful, because you're going to find easier and harder ways of applying these ideas; the harder ways give you more information, and the easier ways are simpler to keep in mind at the bedside. The mnemonic that we use, which you may be familiar with, is SNOUT: when you have a test with a high sensitivity, a negative test rules out the diagnosis. So if the test is usually positive in disease, a very sensitive test, and the result is negative, the patient probably doesn't have the disease. That's a good rule of thumb, and it's one of the ways we apply this; we'll come back to it later in more sophisticated ways.

Specificity is negativity in no disease. That's the true negatives over the column of all the people who do not have the disease, down here in the lower right. So specificity is the true negatives over the total without the disease. And again, you get a mnemonic aid, SPIN: in a test with high specificity, a positive test rules in the diagnosis. If the test has a high specificity, when you get a positive result you can pretty much count on the patient having the diagnosis. So SNOUT and SPIN are useful mnemonic aids; remember how they pair up, SNOUT with sensitivity and SPIN with specificity.

By the way, before we leave this slide: if the specificity is the true negatives over the total without the disease, then 1 minus the specificity, a term we're going to use a lot, is the false positive rate. I know you're saying, yeah, yeah, I get it, but we're going to be applying that later, so I just wanted to make that point.
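As a concrete illustration of these definitions, here is a minimal Python sketch that computes sensitivity, specificity, and the false positive rate from a two-by-two table. The cell counts are made up for illustration (they are chosen to line up with the worked likelihood-ratio example later in the talk).

```python
def two_by_two(tp, fp, fn, tn):
    """Summarize a diagnostic two-by-two table.

    Columns: disease / no disease.  Rows: test positive / test negative.
    Sensitivity = positivity in disease   = TP / (TP + FN)
    Specificity = negativity in no disease = TN / (TN + FP)
    1 - specificity is the false positive rate, which returns later in the
    positive likelihood ratio and on the ROC plot.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": round(sensitivity, 3),
        "specificity": round(specificity, 3),
        "false positive rate": round(1 - specificity, 3),
    }

# Hypothetical counts, just to exercise the definitions
print(two_by_two(tp=63, fp=21, fn=37, tn=79))
# {'sensitivity': 0.63, 'specificity': 0.79, 'false positive rate': 0.21}
```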
Just a quick comparison between therapy studies and diagnostic studies, and you can see what the basic differences are: instead of things like relative risks, odds ratios, and risk differences, we're dealing with sensitivity, specificity, and the likelihood ratio, and we're going to apply those values later. The diagnostic accuracy of a test ultimately comes down to the sensitivity and specificity. For everything we're going to look at, the raw data are always the sensitivity and specificity. We're going to derive different measures, but ultimately it's the sensitivity and specificity of the test that matter. The only reason we don't work directly with sensitivity and specificity is that they're not intuitive to us. There are a lot of examples where I could give you some numbers that would sound really good. Here's one: you have a test with a sensitivity of 60 percent and a specificity of 40 percent, and the test comes back positive. Does it change your mind? Trust me, if we work through the numbers, it wouldn't make any difference to you at all whether this test is positive or negative; you couldn't care less. So sensitivity and specificity are always the raw data, but they're not intuitive, and we need to get to something that is a lot more intuitive.

So we end up using this kind of graph to assess a diagnostic test, where you have sensitivity on one axis and specificity on the other, and you're going to look at this trade-off of sensitivity and specificity, which we're going to do very graphically in a minute. But I also want to point out that this is a weird graph: the y-axis runs zero to 100, and I want the x-axis to run zero to 100 too, but if I use the specificity it runs 100 to zero, and I don't want that. So we're going to transform that axis into one minus the specificity in a minute; you'll see that. The perfect test, obviously, is always correct, perfectly sensitive and perfectly specific, but except perhaps for certain genetic tests you almost never get that. A useless test is one that sits on the diagonal; if a test sits on that diagonal, it gives you no information at all. So what we want to be able to do is assess where a given test lies on this graph, and we'll come back to that very shortly.

We can take lots of examples from the literature; they're all over the place. Papers that report on any new diagnostic test will virtually always give you the sensitivity and specificity along with their confidence intervals, and from that simple data you can derive virtually any of the measures we're going to use. Again, that's the raw data; everything else is derived. Here's the hyperattenuated sinus sign on a non-contrast head CT for dural sinus thrombosis: a sensitivity of 64.6 percent and a specificity of 97.2 percent. A head CT for detection of acute subarachnoid hemorrhage, as you probably know, is time dependent: if it's done within about 45 minutes of the onset of headache, it's at least 98 percent sensitive for subarachnoid hemorrhage; if you wait six hours, it's not, and the number may be down to maybe 90 or 92 percent. You can do the same with physical diagnosis: jolt accentuation of headache in meningitis has a sensitivity of 100 percent, meaning they will all have pain. And here is an image of the hyperattenuation sign in dural sinus thrombosis. For specificity there are plenty of examples: MRI for acute ischemic stroke is 92 percent specific.
The acetylcholine receptor antibody in myasthenia is over 99 percent specific, though it's only about 70 percent sensitive; the two are independent, right? Here we're talking about specificity, about what it means when the test is positive. So let's just apply that for a second: with a test of high specificity, when the result comes back positive, the patient probably has the disease. This is a perfect example. If I have someone with weakness and I send off an acetylcholine receptor antibody titer and it comes back positive, I'm treating this person for myasthenia; I'm pretty confident they have myasthenia. MRI for acute hemorrhagic stroke is similar, 99 to 100 percent by six hours. And this is oculomasticatory myorhythmia.

Now we want to look at the trade-off that occurs. You all understand that most of the time you're going to be trading sensitivity against specificity. It's very rare that you don't. The only time you won't is if you have two populations where the test results of the people with the disease don't overlap at all with the results of the people who don't have the disease. If there's no overlap, you're not trading sensitivity and specificity; but if there's any overlap at all, you're going to be making a trade-off.

So let's look at this slide. First of all, we assume that we have a normal distribution. It doesn't matter, but we'll assume for the moment that we do. The vertical line determines where we set our cutoff: we get a number, say 62. Is that number positive or negative? We have to set a threshold for the test, and that's what the line is. The vertical line is movable; we can adjust the threshold to wherever we choose, and that's the point about the trade-off between sensitivity and specificity. The blue curve is the people with the disease, the red is the people without the disease, and the two curves overlap. Here, with the line set low, we have a large true positive group in the disease population and only a small false negative group, and we've split the healthy population between false positives and true negatives. If I move the line so it splits the difference between the two groups, and here comes the notion of trade-off, I have reduced my true positives in the disease population and increased my false negatives, but I've gotten a benefit back in my healthy population, because I now have a much smaller false positive group. Okay, so I'm liking that. And I can move it still further, or I can go back in the direction I came from.

So this is how it becomes an evaluation problem: I do my test, I get a sensitivity and a specificity, and now I need to decide what my threshold is going to be for calling the test positive or negative. What am I going to do, just keep moving the line around and staring at these curves? How do I decide where to set my threshold, and how do I decide, ultimately, whether the test is useful or not? That is the notion of the receiver operating characteristic curve. We've now set this ROC curve up with these notions of trading off based on the threshold, and that's basically what we're going to do with it.
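The threshold trade-off just described can be made concrete with a small sketch. Assuming, purely for illustration, normally distributed test values in the diseased and healthy groups, sweeping the cutoff generates exactly the (1 - specificity, sensitivity) pairs that get plotted as the ROC curve.

```python
from statistics import NormalDist

# Assumed score distributions: diseased patients score higher on average.
diseased = NormalDist(mu=70, sigma=10)
healthy = NormalDist(mu=50, sigma=10)

# A result above the cutoff is called "positive".  Each cutoff yields one
# point on the ROC curve: (false positive rate, true positive rate).
for cutoff in range(40, 85, 5):
    sensitivity = 1 - diseased.cdf(cutoff)   # true positive rate
    fpr = 1 - healthy.cdf(cutoff)            # 1 - specificity
    print(f"cutoff {cutoff}: sensitivity {sensitivity:.2f}, "
          f"1 - specificity {fpr:.2f}")
```

Raising the cutoff trades sensitivity for specificity; plotting the printed pairs traces the bowed curve the next part of the talk describes.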
And notice that we've transformed the x-axis: the y-axis is the true positive rate, and the x-axis is one minus the specificity, which, as I pointed out earlier, is the false positive rate. So you've got true positive rate versus false positive rate. Once again, the 45-degree line is useless, and anything below that line is actively misleading, so you don't want to be anywhere near it. The ideal test would run right up the y-axis and then straight across the top, but we don't get many of those. So what we see is this bowed curve: as we begin to move the threshold, we pick up more true positives and, hopefully, not many false positives; that works for a while, and then we start to pick up the false positives, and that's where the curve bends. The term receiver operating characteristic comes, historically, from the Battle of Britain, where they had this kind of accuracy curve for radar receiver operators.

We want to use this information now to get to a diagnosis. You will see a group of values that never made any sense to me, the positive and negative predictive values, and I'm not going to spend any time on them. All I want to say about them is that, statistically, the reason they are not a good way of doing things is that when you create the study for your diagnostic test, the predictive value you publish is determined by the prevalence of the disease in your study population, and that's not very useful, because your patient is probably not from that population. For instance, suppose we were working on a diagnostic test for peripheral neuropathy in people referred from outside Minnesota to the Mayo Clinic. Is your patient with peripheral neuropathy going to be well described by the test results coming out of the Mayo Clinic? No, not at all; they're dealing with a highly skewed population, and that's not your population. So that's the reason we don't use these measures: they're just very dependent on where the study was done.

Instead, we want to get away from that, and so we introduce the notion of the likelihood ratio, which is tremendous. This is just a hugely valuable way of thinking, and I definitely want to leave you with it. The likelihood ratio, as you can see, is the likelihood of the test result when the disease is present over the likelihood of that same result when the disease is absent, and it can be calculated for a positive result or a negative result. The positive likelihood ratio is the sensitivity over one minus the specificity, which is the true positive rate over the false positive rate. Remember, that's why I went over that two-by-two table: to keep emphasizing these basic pieces at the level of the two-by-two table. And you can see that the same thing works out for the negative likelihood ratio, which is one minus the sensitivity over the specificity, the false negative rate over the true negative rate.

Let's take a simple example of no particular disease or test, just to see how we do the calculation and to give you a sense of what kind of numbers we're looking at. Here, 63 percent are true positives and 37 percent are false negatives: these people have the disease, and the test is negative. You can see the rest.
With this data set, you end up with a positive likelihood ratio of 3 and a negative likelihood ratio of 0.47. That starts to give you a sense of the kind of numbers we see with likelihood ratios.

This is something I've really only learned since I've been working with Michael and the group on this. If you look at odds ratios and risk ratios, risk ratios are much more intuitive to us; you can quickly learn with some simple examples that odds ratios are in no way intuitive, and they get out of control very quickly. The risk ratio, the percent likelihood that someone has a disease, is always a very simple, intuitive number; odds are not. So why do we ever work with odds? It turns out that in statistics, odds are actually much more manageable. For instance, if you want to do logistic regression and look at multiple factors, you cannot use the risk ratio; there's no clean math for that, so you have to work with odds ratios and then calculate backwards. Mathematically, odds are simpler in almost all settings, and this is a perfect example: I can take my pretest odds, not my pretest probability but my pretest odds, and if I have a positive test result, all I need to do is multiply the pretest odds by the positive likelihood ratio, and that gives me my post-test odds. That's pretty simple. The problem is that odds themselves are not intuitive, and here's a perfect example of that in my own miscalculation on this slide.

Here we have a pretest odds, which we've made up, of 3 to 7. That is actually a 30 percent probability; the 43 percent on the slide is an error, since three-to-seven odds corresponds to a 30 percent probability. But the point of the slide stands: I can take that person with pretest odds of 3 to 7, and with a positive test result and a positive likelihood ratio of 3, all I need to do is multiply the odds of 3 to 7 by 3, get odds of 9 to 7, and calculate that this is a 56 percent probability, and that number is accurate. So a positive test with a positive likelihood ratio of 3 moved me from a 30 percent probability to a 56 percent probability. That gives you a sense of the size of the effect.

Now, let's back up for a second. Do I care? Did this do me any good? I'm dealing with a patient who, you say, has a 30 percent chance of having the disease, 3 to 7 odds, and after I run the test and get a positive result, I'm at 56 percent. Did it change my opinion or how I'm going to manage the patient? Am I ready to treat? Did going from 30 percent to 56 percent change me in any way? And I got the result of the test that I wanted, right? Did it help? Probably not. It probably didn't do anything for me at all. So what I'm saying is, as you start to choose these tests, don't choose a test just because it can give you information. The question is whether it gives you enough information to change how you're going to manage the patient. That's the real question.

So that was using odds. When we deal with probability, which, as I said, is much more intuitive to all of us, the relationship becomes nonlinear and the mathematics become more difficult, so we simplify things by using what's called Fagan's nomogram.
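The odds arithmetic above, including the correction that 3-to-7 odds is a 30 percent probability, is easy to reproduce, and it is also exactly the computation that Fagan's nomogram performs graphically. A minimal sketch:

```python
def prob_to_odds(p):
    return p / (1 - p)

def odds_to_prob(odds):
    return odds / (1 + odds)

def posttest_probability(pretest_prob, likelihood_ratio):
    """Post-test odds = pre-test odds x LR, converted back to a probability."""
    return odds_to_prob(prob_to_odds(pretest_prob) * likelihood_ratio)

# The worked example: pre-test odds 3:7 (a 30% probability), LR+ = 3
print(f"{posttest_probability(0.30, 3):.0%}")    # 56%
# A strong test: 15% pre-test probability, LR+ of about 20
print(f"{posttest_probability(0.15, 20):.0%}")   # about 78%, compare the nomogram example that follows
# An LR of 1 changes nothing
print(f"{posttest_probability(0.30, 1):.0%}")    # 30%
```

Playing with these three lines, the way the talk urges you to play with the nomogram, makes the nonlinearity obvious: the same likelihood ratio moves a mid-range pre-test probability much more than one near 0 or 1.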
You can see that this is a very nonlinear thing, and I strongly urge you to play with Fagan's nomogram in the set that Don has given you. You simply put your line on the pretest probability, adjust it so that it runs through the likelihood ratio, either positive or negative, and that gives you the post-test probability. This is very, very useful. Take this simple example: I started with a 15 percent pretest probability, I had a positive result on this wonderful test with a positive likelihood ratio of more than 20, which is really up there, and I went from a 15 percent likelihood to an 80 percent likelihood. Now that's starting to be useful. That looks useful to me, right? And this is one of those examples where you just want to go home and keep playing with this thing and ask, well, what if I started with a 50 percent pretest probability? What would move me or not move me? I strongly urge you to do that.

If you do, you will come to something like this. First of all, a likelihood ratio of one means the test gives you no information at all: with a likelihood ratio of one, the post-test probability is the same as the pretest probability. So the closer the likelihood ratio is to one, the less useful the information, and the further away it is, in either the positive or the negative direction, the more useful the test is going to be. In general, if you can get above 5, or below 0.2 for the reciprocal, you're looking at a pretty useful test, and when you get over 10, you're really starting to deal with tests that have serious impact on your clinical practice. The value of likelihood ratios is that they are not affected by the prevalence at the site where the test was studied, and that's what makes them so tremendously useful.

We're going to close with a fairly quick discussion of how we assess a paper. I really think Fred has done us a huge service in focusing us on non-random variation, that is, bias, and I've found it very interesting to think about further. No matter how many times you repeat a study, if your design is wrong, if you're bringing in the wrong patients, or if you're handling them in the wrong way, you're going to get the wrong result, and that's all there is to it. So we need to avoid these biases, and in diagnostic tests there are some very specific biases that can be introduced. One of them is called spectrum bias, which I alluded to with the Mayo Clinic. A very narrow population gets referred to the study site, say a dementia clinic in the Netherlands applying CSF tau for the diagnosis of Alzheimer's. It just isn't your population, and you have to be very careful when you look at that. There is also verification or workup bias, and you would think people would know better, but this is where you only apply the reference test when the index test is positive. In other words, the gold standard is an invasive test, you don't want to do it on everyone, so you only do it on the people whose index test is positive. That does not maintain independence between the tests, and it introduces a particular kind of bias called verification bias. And in general, the assessments need to be independent: you don't want the person assessing the index test to be the same person reading the angiogram, or whatever it is. You need to keep these people separate from one another. That's basically what we're talking about.
Spectrum bias has to do with making sure that your population is the same as the one in which the study was done. As far as random error goes, I think we have covered this very thoroughly: if you want a smaller interval, increase your sample size. Let me give you an example of that; I think this is very cute. Here are some results. Again, we don't care what the disease is or what the test is; these are just some numbers. Here is the point estimate of the value of the test, of its combined sensitivity and specificity, and you can see the uncertainty involved, which is the yellow region around it. Now I'm going to take a data set with the same proportions but only one-tenth the sample size, and that's what you get: the red dot is in the same place, but your confidence in the value of this test goes all the way below the line. For all you know, this test could be misleading. Your sample size is simply too small. With Fagan's nomogram you might say, wait a minute, Fagan's nomogram was just a line. Yes, but you can apply confidence limits to Fagan's nomogram by drawing the line through the highest plausible likelihood ratio and then through the lowest, and that gives you the range of your post-test probability for a given pretest probability. So you can do that, and Fagan's nomogram is perfectly capable of handling that kind of complexity.

The other piece, when you're testing a diagnostic test, is reproducibility. For an outcome like death you don't need a lot of arbiters; there's going to be pretty good agreement on death, although even death can become more complicated, because some of the deaths will be due to the disease in question and some will be the person who went out and got hit by a car, or died of some other disease. So even death has a certain kind of variability to it. But most tests involve ordinal scales or judgments, like whether this MRI is worse than that MRI, that kind of thing. So the question is, if the test involves any subjective element, whether multiple observers would agree, and that's what you test. There's a whole apparatus for this, because observers will agree a certain amount of the time by chance alone, so you measure a kappa statistic. I'm not going to go into the details, but one of our exercises does, and the kappa statistic gives you a sense of whether there is good agreement among the observers beyond chance.

Finally, let me close by saying that there is a standardized way of assessing a paper, called STARD, which is a checklist of, I think, 25 or so items: was the study well designed, did it include the right elements, and so on. Just as you have similar checklists for therapeutic trials, you have the same thing for diagnostic tests. STARD is available on the internet, and you can apply it to any diagnostic study that you want.
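The sample-size point above, same proportions but one-tenth the data and much wider uncertainty, can be checked with a quick interval calculation. Here is a minimal sketch using a Wilson score interval; the counts are invented, and only the ratio between the two sample sizes matters.

```python
from math import sqrt

def wilson_interval(successes, n, z=1.96):
    """Approximate 95% Wilson score interval for a proportion (e.g. sensitivity)."""
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half_width = (z / denom) * sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2))
    return centre - half_width, centre + half_width

# Same observed sensitivity (80%), first with n = 200, then one-tenth the data.
for true_pos, n_diseased in [(160, 200), (16, 20)]:
    lo, hi = wilson_interval(true_pos, n_diseased)
    print(f"{true_pos}/{n_diseased}: sensitivity 0.80, 95% CI ({lo:.2f}, {hi:.2f})")
```

The point estimate stays put, but the interval for the smaller sample comes out roughly three times as wide, which is the same effect as the red dot staying in place while the yellow region balloons.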
So just to summarize where we are: we want to emphasize that diagnosis should be achieved by a hypothetico-deductive method; that you want to formulate an appropriate diagnostic question before you choose your test; that the clinical importance of the test is determined not only by the accuracy of the test but by where you sit with your pre-test probability, and it is fundamentally important that you get that idea; that diagnostic test accuracy is most intuitively and most usefully expressed through the likelihood ratio along with its confidence limits; and that if you want to judge how good a paper is, pull out the STARD criteria and go down the checklist. References are provided. Okay. Thank you.
Video Summary
In this video, the speaker discusses the importance of diagnostic tests in medicine and explores the different applications and considerations of such tests. The speaker emphasizes that evidence-based methods are not limited to therapeutic trials and that statistics is essential to understanding the accuracy and usefulness of diagnostic tests. They discuss the concepts of sensitivity and specificity, as well as the trade-off between the two when evaluating a diagnostic test. The speaker introduces likelihood ratios and shows how they can be combined with a pre-test probability to calculate post-test probabilities, thereby aiding in diagnosis. They also highlight biases and errors that can be introduced in diagnostic studies and provide guidance on how to critically assess the quality of such studies. The speaker concludes by emphasizing the importance of formulating an appropriate diagnostic question and considering the clinical importance of a test before ordering it. Overall, the video provides a comprehensive overview of diagnostic tests and their application in clinical practice.
Asset Subtitle
Presented by George C. Newman, MD, PhD
Keywords
diagnostic tests
medicine
applications
statistics
sensitivity
specificity
likelihood ratios
clinical importance