Science of Neurosurgical Practice
Variability and Error
Video Transcription
I'm Fred Barker, and when I saw the topic that I was assigned, I was very disappointed, because, like most neurosurgeons, I'm not very comfortable with error. I don't think I make errors. I don't like to be around people who make errors, and I don't like my name to be in the chart when an error has been made. But from a more practical standpoint, I'm a tumor surgeon, and in the kind of studies that I deal with, "error" means things like standard deviation, standard error, variance; you have to remember that the standard deviation is the square root of the... and I can never remember any of that stuff. And most of the measurements that I deal with are not expressed in numbers like 29.7. If you're going to look at numbers like that, it's likely that, for instance, you're a spine surgeon who's looking at a pain scale, or you're looking at a quality-of-life scale that you can treat as a continuous number because it goes from zero to 100 and isn't constrained to 10, 20, 30, 40. Most of the numbers that I deal with are categorical: people are alive or dead, the tumor has progressed or has not progressed. So most things that I deal with are expressed as a fraction, or sometimes as a Kaplan-Meier curve, which shows the fraction of patients who are still alive at a certain time after you have started the clock on your analysis. But the truth is that the error inherent in measurements like that is actually much simpler for me to understand than the error involved in expressing, for instance, the average age of people in this room. So I tried to slant the talk toward that, because I think it's more useful and it's also easier to understand. Almost everything that you want to write a paper about or read a paper about is quantitative data. There are exceptions to this. There's a field called qualitative research, where you sit down and interview people and ask, what's important to you in the treatment of your tumor? Is it the length of life? Is it the quality of life? Is it more bothersome to be nauseated, or dizzy, or to have tinnitus? But we're not really dealing with that much in this course, so we're going to talk about things that are expressed in some kind of number. And if you pick up the Journal of Neurosurgery, you will see that, almost without exception, people give you the number, but they don't give you the range over which that number can occur in representative individuals or representative trials. They don't tell you the uncertainty of the measurement. And to me, that just cripples the description of the data. If you reduce it to a single number, only under unusual circumstances does that tell you everything about the measurement that you really want to know. If the average age of people in this room is 30, then you have people like me who, if you saw the bar graph, would be way out toward the right, and there would be almost nobody way toward the left. There are no five-year-olds here today, and there are people here who are even older than I am, for instance, Dr. Haynes. We'll talk about how to deal with distributions like that a little bit. And when you yourself come to write papers or write proposals to do research, anytime you write a number, you ought to include a description of the uncertainty of that measurement. There are almost no exceptions to that.
If you look on the internet... I tried to get something from XKCD. I thought there would be something great on XKCD about accuracy and precision. There is not. There are websites that will search XKCD for you, but there's nothing there. All you keep getting is this graphic, and you can see that most of you are probably not recreational gun owners. This is expressed as an accurate and precise measurement. This is neither accurate nor precise. Here they're using accuracy to express the idea that the difference between where the bullets penetrated the target and where they were supposed to hit is random, not systematic. And here the bullets are hitting with a systematic error. We don't really use the words accuracy and precision very much; those are more engineering terms than terms that we use. But we talk about things like random error and systematic error. So here's a set of shots that have random error, and here's a set of shots that have systematic error. More often we call the systematic error bias. And here's an example of both. So there you have the difference. I don't mind calling things invalid, by the way; I call things invalid all the time. These are valid measurements; these are invalid, or biased, measurements. These are precise measurements; these are imprecise measurements. And all measurements have both kinds of error. All we're really doing here is pretending that there's no random error here, but of course there is, and there's probably a little bit of systematic error there as well. So you always have to think about both the random component of the error, which is really the only part of the error that you can reduce by reproducing the experiment over and over, and the bias; to reduce bias, you have to think very carefully about the way that the study is designed. If you keep doing the experiment wrong over and over, you gain more and more confidence in an answer that is systematically wrong. And once you've reviewed a few meta-analyses of non-randomized studies, you will have enough examples of this to last you the rest of your life. Statistical estimates of error, like confidence intervals and p-values, under ordinary circumstances describe only the random component of the total error. So when you and your friends are arguing in the hall and you get your iPad out outside the patient room and say, look at this, this is a statistically significant result between these two different groups of patients, you are only protecting yourself against the random error that comes from not doing the measurement enough times. You are not at all protecting yourself against doing the measurement wrong over and over again. To me, this is both a much more important thing that authors of papers do wrong, and also much more hidden; it's under the surface. So as the reviewer, you can say, you're not showing a p-value for this comparison, you're not showing your confidence intervals for these measurements. But the bias lies under the surface, and you have to think much more carefully about what has been done. And that's the only thing about reviewing papers or reading papers that is actually interesting, that aspect of it. And there are so many amazingly cool forms of bias that we could have talked about if we had enough time. Things like immortal time bias, for instance, which is just a beautifully wrong thing that people do. I'll show you one example of that, but you can find it in the medical literature over and over, and it's actually kind of fun. It's like an Easter egg hunt, only you don't really want the Easter eggs.
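To make that point concrete, here is a minimal sketch with invented numbers: repeating a measurement averages away the random error, but the average converges to the biased value, not to the truth.

```python
# Toy simulation (all numbers invented): more repetitions buy precision, not validity.
import numpy as np

rng = np.random.default_rng(0)
true_value = 100.0   # what we are trying to measure
bias = 5.0           # systematic error: every shot is pulled the same way
sigma = 10.0         # random error: scatter from one measurement to the next

for n in (3, 30, 3000):
    measurements = true_value + bias + rng.normal(0.0, sigma, size=n)
    print(f"n={n:4d}  mean={measurements.mean():7.2f}  "
          f"error vs truth={measurements.mean() - true_value:+6.2f}")
```

As n grows the mean settles near 105 rather than 100: the random component shrinks, the systematic component does not.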
The first component of the measurement error that we've talked about is the imprecision of the basic measurement. And this is here more because I based these slides on Dr. Glantz's slide deck than because I think it's very important. Yes, it's true: if we all weighed ourselves this morning to determine the average weight of neurosurgeons, there would be an error. Some of us would forget to take our watch off; some of us would refuse to take off our undergarments. But if you then used that bunch of numbers to estimate the weight of neurosurgeons, it would be a dramatic underestimate, because more than half of neurosurgeons are my age or older, and age and weight in neurosurgeons are very tightly correlated. So the average estimate would be maybe 15 or 20 pounds too light, not because the scale was not calibrated properly, not because some of us had a glass of water before getting on the scale, but because of the bias of the sample of measurements. So the random variability of the measurement itself needs to be taken into account, but usually isn't very important. The second component of random error arises because we only have, say, 30 people in this room. So yes, you do get some error because of the finite sample size, but there are formulas that will allow you to estimate the magnitude of that error. By the time you get up to 30 estimates of weight, that component of the error is going to be pretty modest. If you took just three or four people, you might have a much bigger random component of error because of the finite sample size. But for the sample sizes that you see in the normal neurosurgical literature, this is, first of all, not that big a component of the error, and secondly, it's a component that follows mathematical laws and can be estimated and accounted for.
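A hedged sketch, with made-up weights, of how that finite-sample-size component behaves: the standard error of the mean falls off like one over the square root of the sample size.

```python
# Invented weights (kg); the only point is how the standard error scales with n.
import numpy as np

rng = np.random.default_rng(1)
population_sd = 14.0
for n in (4, 30, 300):
    sample = 90.0 + rng.normal(0.0, population_sd, size=n)
    se = sample.std(ddof=1) / np.sqrt(n)     # standard error of the mean
    print(f"n={n:3d}  mean={sample.mean():6.1f} kg  standard error={se:4.1f} kg")
```

Going from 4 subjects to 30 shrinks the standard error by a factor of about 2.7, which is why this component of the error is usually modest at ordinary neurosurgical sample sizes, while no amount of extra measuring removes the bias of the sample itself.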
So we've talked mostly, with our hypothetical examples, about continuous measurements like height or weight or age, but to me, much more interesting almost always are things that are expressed as a fraction, like the number of times a treatment worked. You have to define what success and failure mean for the treatment, and then you end up with a two-by-two table. And then there's survival data, or time-to-event data, where each measurement includes two components: first, the amount of time after you started the clock that you made the measurement, and second, what the result of the measurement was. So a year after you start the treatment, are they alive or dead, or have you lost track of them, something like that. We'll go over these in turn. So here's a bunch of numbers that are more or less continuous. These are real numbers: these are the individual ages of patients who receive an operation in the United States for an acoustic neuroma. The black line here is patients who do not have coded neurofibromatosis, and the red bars are patients who do have coded neurofibromatosis. As a surgeon, you say, well, this makes a lot of sense. Most of my sporadic patients are in their 40s, 50s, and 60s, with fewer older patients and fewer younger patients, too. You don't see that many sporadic patients with acoustic neuromas who are younger than 20. With NF, in contrast, a lot of the patients get their operations in their 20s. The tumors are often diagnosed in the pre-symptomatic state.
Different surgeons have different feelings about how aggressive you should be about that, but most NF patients who undergo surgery for their acoustic neuromas are in their 20s, and the sad truth is that by the time they get to be 60, most of them are dead. So this is the number we're going to look at: the age in the NF population only. One of the things about it is that it's pretty skewed, and one reason is that you can't have an age less than zero, and you can't really get an operation for an acoustic neuroma before you're about two; that would be very unusual. There's also a limit out here, in that very few people live to be older than 100, but here the more important limit is on the zero side, because the cluster is around 20. There's a pretty big spread in this data, and if this were a symmetric curve like the sporadic curve, it would extend to the left of zero, which we know can't happen. So if you wanted to tell people how old patients with NF2 are when they have an acoustic neuroma operation, how would you do it? First, you could take the mean. The mean is just: you add them up and divide by the total number, and you get a number here. You can see that this distribution is skewed. Another hint that the distribution is skewed is that these bars here, the tallest bars, which are called the mode, are to the left of the mean. If you computed a median here, you would see that the median is in the high 20s; the median is lower than the mean. Another curve that follows this pattern is the income of neurosurgeons. Very few neurosurgeons have an income less than zero, and there are some spine surgeons out here who make a ton of money, so an income distribution is almost always skewed like this. For a weight distribution of adults, on the other hand, you can't predict the skewness ahead of time, because you could be very skinny or you could be very fat, and you just have to see which is more common. The truth is that usually the tail out here is heavier, meaning that there are more patients in the right tail than in the left tail. So what's the confidence interval on that? Your mean, we said, was 31. The confidence interval for the mean is a formula that you can look up: the mean plus or minus, for the 95% confidence interval, 1.96 times the standard error, which is the standard deviation divided by the square root of the number of measurements. You can calculate the standard error here, and this is your confidence interval for the mean, 27.3 to 34.8. That's about the width of this red bar here. You can see that probably less than 5% of these patients' actual ages fall within the interval that this communicates, right? So yes, you do have a pretty good idea of the mean, but if you're trying to tell people how old the patients are, this is not a good way of telling them. And you think, well, that's pretty artificial, but it's not; it happens all the time. People are constantly reporting the confidence interval of the mean as if it described the spread of the measurements. If instead you do the type of calculation that is supposed to embrace 95% of the actual measurements, the mean plus or minus 1.96 times the standard deviation, now you're talking about a range from age 2.5 to 59.6. And you can see there are a lot of patients out here to the right of that, and there are actually no patients as young as 2.5 who had the operation.
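A sketch of the two calculations just described, run on an invented right-skewed sample rather than the real registry data: the 95% confidence interval for the mean (mean plus or minus 1.96 times the standard error) versus the interval meant to cover 95% of the individual ages (mean plus or minus 1.96 times the standard deviation).

```python
# Invented right-skewed "ages" standing in for the NF2 distribution described above.
import numpy as np

rng = np.random.default_rng(2)
ages = rng.gamma(2.5, 12.0, size=60)      # skewed sample, clustered in the 20s-30s

mean = ages.mean()
sd = ages.std(ddof=1)
se = sd / np.sqrt(len(ages))
print(f"95% CI for the mean : {mean - 1.96*se:5.1f} to {mean + 1.96*se:5.1f}")
print(f"mean +/- 1.96 * SD  : {mean - 1.96*sd:5.1f} to {mean + 1.96*sd:5.1f}")
```

On data like these the second interval can reach below zero, an impossible age, while still missing patients in the long right tail, which is exactly the problem described next.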
So because the distribution is skewed, and you are using an estimate of the range of the variable that is based on assuming a normal distribution, which is not skewed, your interval actually extends too far to the left and not far enough to the right. If you did the same thing on the sporadics, because that's a much more symmetric distribution, you would have a much more accurate and informative communication. So when things are skewed, we tend to jettison the normal distribution assumption, because the normal curve is symmetric and doesn't fit our data very well, and we go to a type of statistics called nonparametric statistics, where you're not making those basic assumptions about the data. The median is a nonparametric estimate of the central tendency of the data: half the numbers are less than 27, half the numbers are more than 27. Now, when you want to express the range, you can say, well, 25% of the numbers are smaller than 19, and 75% are smaller than 42; that's the central 50% of the data. And you can do the same thing to get the central 95% of the data: you just throw out the lowest 2.5% and the highest 2.5%, and you get a very informative number that says 95% of patients with NF2 who have an acoustic neuroma operation are between the ages of 10 and 59. And that's what you really want to communicate with this data, with this particular distribution.
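The same idea in code, again on an invented skewed sample: the median, the quartiles, and the central 95% of the observations, none of which depend on a normality assumption.

```python
# Nonparametric summary of an invented skewed sample: percentiles instead of mean +/- SD.
import numpy as np

rng = np.random.default_rng(2)
ages = rng.gamma(2.5, 12.0, size=60)      # same toy skewed "ages" as above

median = np.percentile(ages, 50)
q25, q75 = np.percentile(ages, [25, 75])
p2_5, p97_5 = np.percentile(ages, [2.5, 97.5])
print(f"median {median:.0f}, central 50% {q25:.0f} to {q75:.0f}, "
      f"central 95% {p2_5:.0f} to {p97_5:.0f}")
```

These limits can never stray outside the range of the actual observations, so they describe a skewed distribution much more honestly.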
Here's an example. You thought that I was just making it up that people take the means and express the confidence interval of the mean as the error of the measurement. I picked up the top issue of the Journal of Neurosurgery in the big stack of unread issues on my desk, and I had to look through, I think, three articles before I found this graph, which is figure one in the paper. You know how the KPS works, right? The scores are 10, 20, 30, on up through 80, 90, 100; all of the measurements are evenly divisible by 10. So here you have what is obviously a mean KPS of about 84, and the error bars go from about, say, 83 to about 87. So you know that actually none of the measurements fell within this interval, right? It's impossible for an actual measurement to have fallen within this confidence interval. The second thing is that the caption of the figure does not tell you what the error bars represent. So there are two big problems here. One is that it's not a very informative way of communicating information, and the second is that they're not even telling you what they're showing you. And if you look through article after article after article, it's actually quite unusual to see a description in the caption of what the error bars represent. So when it comes to be your turn to communicate information, don't do that. Do it wrong if you want, at least, but tell people what you've done. Much nicer are binomial data, which are fractions. Binomial data, or Bernoulli-distribution-type data, arise from doing an experiment over and over again where each trial can be either a success or a failure. So you're either alive or dead, the treatment worked or did not work. This is a lot of what we do. And it's very nice, because the mathematics of this are very simple and pure. If you have the fraction, if you have the number of trials and the number of successes, you can open up your basic statistics program and get the confidence interval on that fraction just as a matter of mathematical truth.
It is the same confidence interval no matter what the fraction represents: the number of trials and the number of successes gives you the confidence interval, and every statistics program will give it to you. So you express the results, as you've already seen, in a two-by-two table. We're going to talk in a minute about word recognition scores and the confidence intervals on those. The idea is that you make a tape recording of 50 standard disyllabic words ahead of time and then play it back for the patient, so that you reduce the variation that comes from comparing my voice versus Tony Asher's voice and how readily understood those are, or a male voice versus a female voice, which have different frequency distributions. So you standardize everything that way. It's hard to standardize out the learning effect; many people have had the same test before and have heard the words before. But it's pretty good. So we had a situation in our clinic with NF2 patients where my colleague, Scott Plotkin, was bringing me case after case when he first started doing NF. He didn't grow up thinking that he was going to be a neuro-oncologist who took care of acoustic neuromas; he was trained mostly in glioma work. And in glioma work, as Tony and everybody knows, when the tumor starts getting bigger you have to do something about it. So here Scott inherited this huge practice of people with acoustic neuromas, and the tumors were getting bigger. And he would bring them to me and say, you have to operate on this. And I'd say, no, I'm not going to operate. He'd say, why not? I'd say, well, what's the hearing? This is the only hearing ear. The patient has no new symptoms, the tumor is a little bit bigger on the scan, and you want me to do an operation that will probably make the patient deaf. He goes, okay, I see. But what am I supposed to do? How about radiation? I said, same thing will happen, plus the tumor will turn malignant in 10 years and the patient will die. He said, what am I supposed to do? I said, you have to find a drug to give these people. So we started out with erlotinib, and it was very clear after the first few patients that this drug was doing nothing. And I saw him in the hall one day and I said, how's the erlotinib going? And he said, well, you know, we have one patient who's been stable for a while, but I'm not sure what that means. And I've been giving a couple of people Avastin. And I said, Avastin? You can't do that. These are young people. They're healthy. They don't have cancer. You're giving them intravenous chemotherapy. They're all going to get hypertension and DVTs, and something's going to happen; their heart is going to stop working, or their kidneys. And he goes, well, I've been doing it. I said, OK, so how's it been going? He said, well, a patient told me that her hearing was much better on the drug. Now, you don't have to take care of acoustics for very long before you find out that that actually cannot happen. The hearing never gets better with radiation, never gets better with surgery, never gets better spontaneously. So I said, well, show me the audiograms. So he did, and this is what the data looked like. Before the treatment, her word recognition score was 8%, and here, just about six weeks into treatment, it had gone up to 40%. So this looks like a big success, right?
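Before going on, here is a sketch of what the exact confidence interval on a score like 4 correct out of 50 (8%) looks like; it assumes the statsmodels package is available, and the counts are the ones quoted in the talk.

```python
# Exact (Clopper-Pearson) 95% confidence interval on a word recognition score of 4/50.
# Assumes statsmodels is installed; scipy.stats.binomtest offers a similar interval.
from statsmodels.stats.proportion import proportion_confint

successes, trials = 4, 50
lo, hi = proportion_confint(successes, trials, alpha=0.05, method="beta")  # "beta" = exact
print(f"{successes}/{trials} = {successes/trials:.0%}, 95% CI {lo:.1%} to {hi:.1%}")
```

A later score of 40% falls well outside this interval, so it is unlikely to be explained by measurement noise alone.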
But you have to understand the confidence interval on both of those measurements before you can really decide whether an increase is real. For instance, this increase here, from 4% to 8%, doesn't look very significant; you wouldn't expect that to hold up. So these are the first 33 patients who were treated. For each patient here, the central dot is their initial measurement, and the width of these bars is drawn to show you the confidence interval on that initial measurement. So here is patient number two, who started out at 8%, and here is the confidence interval on the fraction of 4 out of 50: it goes from about 2% to about 20%. And you see that her result on treatment was 40%, which is well outside that confidence interval, so it's very unlikely that that result arose from chance. And her eventual hearing, which is where it is today after about six years on treatment, is in the high 90s, which is obviously way outside the confidence interval. For each of these bars where the patient's final measurement is outside the confidence interval, we've colored the bar either green or red to show you whether the hearing got worse on the treatment or got better on the treatment. And you see there are many more green bars than red bars. So our suspicion is that the drug is active, that the drug is actually improving hearing in patients with NF2-related acoustics who are getting Avastin treatment. One thing I'll draw your attention to is that for the patients who scored 100%, who got 50 out of 50 words, there is still a confidence interval on that measurement, and there's also a confidence interval on 0 correct out of 50 words. We had one patient who had a very good response to the drug, but these other patients who started out at 0 either stayed at 0 or only went up to, say, 6%, and so that's not a significant result. So you say, well, what does it matter to go from 0 to 40%? And this is the second important aspect of deciding whether a change in a variable is important or not. It has to be not just statistically significant (I just didn't see this in anybody else's talk, and I thought it would be a shame not to mention it), it also has to be clinically significant. If you had normal hearing in your right ear and you went from 0 to 40% in your left ear, you still wouldn't be able to use the phone with the left ear; you can't really do that below about 50%. You might have some improvement in your ability to localize sounds, but you would still be using the telephone every day with your good ear, and you would still be putting people on your good side in the restaurant or at the party where you wanted to hear them. You might not think it made much difference in quality of life. But when you have no hearing in the other ear, the difference is astonishing. So here's what she says; this is her email to us (she lives in Atlanta): in the car, I can listen to songs I know on volume 7 to 9, which is down from level 10. So all right, is that clinically significant? Is it an important difference that you can turn your car radio down two notches? Probably not. But if you can go to a movie in a non-closed-captioned theater, not use your hearing aid, and understand the movie, that's a big difference. So for anything like a pain score, a quality of life score, an activity score, or a functional activity score like the Karnofsky score, you have to have a sense of what the minimum significant difference is.
So not just a statistically significant difference, but also a clinically significant difference. And defining that is a whole field that we're not going to get into. Now, what about the confidence interval of a fraction where you've never seen something happen, or where something always happens? These fractions also have a confidence interval. And this is one of the best papers I ever read; I can still remember the library where I was sitting when I read it. It's a rule of thumb that gives you that confidence interval just in your head. It's called the rule of three, and it was published in 1983 by Dr. Hanley in JAMA; there's the reference. If you've done something 100 times and you've never seen the event happen, the upper limit of the confidence interval is about 3 over 100, or 3%. If you've done something 1,000 times, you can be pretty confident, 95% confident, that the actual rate is less than 0.3%. So if you've done an operation 1,000 times and nobody has died, the true answer is not zero; it's a confidence interval from 0 to 0.3%. If you've done something 10 times and nobody's died, you think you're doing great, but the upper confidence bound is still about 30%, so the true death rate in that population could still be pretty high. You can just carry this around in your brain; it actually works very well.
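Where the rule comes from, in a couple of lines: if the true event rate is p and you see zero events in n tries, the chance of that happening is (1 - p)^n; setting that to 0.05 and solving for p gives the upper bound, which is close to 3/n.

```python
# Rule of three: upper 95% bound on a rate after 0 events in n trials is roughly 3/n.
for n in (10, 100, 1000):
    exact = 1 - 0.05 ** (1 / n)          # solve (1 - p)**n = 0.05 for p
    print(f"0 events in {n:4d} trials: exact upper bound {exact:.2%}, rule of three {3/n:.2%}")
```

The approximation is rough at n = 10 (about 26% versus 30%) and very close by n = 100 or 1,000.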
Confidence intervals on survival measurements, I think, are going to be talked about later in the course, but I just wanted you to get a mental picture of how big these numbers are. Here's a graph that shows loss of hearing after observation or after radiation for small acoustic neuromas. This is from Denmark; they had 400 patients in this cohort. And you see there's still a pretty big confidence interval around this measurement of 50% retaining hearing at five years in the observation cohort. But when you start with only 42 patients, and five years later only 20% of them have hearing, the confidence interval is huge; it goes from less than 10% almost to 50%. And again, this is a form of confidence interval that you almost never see displayed in a figure. You always see just the Kaplan-Meier curve; sometimes there are some error bars on it. The true confidence limits around a Kaplan-Meier curve are expressed as lines, as these are. And this is sort of what you have to have as your mental picture for the confidence of this kind of measurement, because it probably won't be displayed for you. So that's all just the statistical error that's implied by the mechanism of the measurement. Bias is a lot more complicated. Back 30 years ago, David Sackett, who wrote some great papers and was one of the founding people in evidence-based medicine, identified 35 types of bias, of which he classified nine as important. You can look this paper up. He describes them with names like Neyman (incidence-prevalence) bias, and to me those are not very helpful forms of discussion. Miettinen, and in a later article Grimes, narrowed the types of bias down to three broad types: selection bias, information bias, and confounding. There's going to be a special talk about confounding, I think, tomorrow, so I'm not going to talk about it much. And just to reiterate what Dr. Haynes said, the randomized trial is the least biased design that we know about.
And so when you're trying to design other types of trials, you basically want to duplicate the randomized trial that you would do if you had the resources, or had the permission, or whatever it is that prevents you from actually doing it. Most often, when I read a paper that compares two treatments, the thing that hits you in the face about the difference between those two groups is the selection bias. And this is such an important process in the way clinical medicine is done, just like the Bell's palsy example, where you wouldn't give the patient with diabetes steroids. This is how doctors' minds work. You look at the whole patient, and then you make your decision, not just in the context of the severity of the facial weakness or the rapidity with which it happened, but also your perception of the fragility of the patient: how old they are, what their support status is like, whether they are going to be able to afford the medication, whether they could come all the way from Iowa to see you for the medication if you practice in Boston. All of these factors that you would think, well, what does it matter, what does the travel time from Iowa to Boston matter? It actually matters a lot. There's a great paper by Elizabeth Lamont on travel distance to the clinic as a predictor of survival in phase II cancer trials. Most phase II cancer trials are testing ineffective agents, so this is how long you live if you go to the hospital and get a drug that doesn't work. And it is strongly predicted by how far you have come to get to the hospital. If the hospital is at Duke, for example, and you've come from Iowa, that means, first of all, that you're wealthy enough to buy a plane ticket to Duke, and secondly, that your brain is functioning well enough for you to get onto a plane and make it to Duke. When you work through all these factors, it turns out that it's mostly age, a weak influence of male gender, educational level, income, and compliance with the treatment, like we saw with the death rates in the niacin example. So if you come all that distance to get the expensive treatment, you're likely to comply not only with the treatment but also with getting the scans and getting the medications to control your symptoms. And it turns out that travel distance to the clinic is a very important predictor of survival even when the agent is ineffective. Doctors use these same kinds of factors to drive the treatment, and you can't separate that out unless you introduce a randomization step. So, for instance, you can find (I counted) 10 meta-analyses on this question as of about a year ago: the open versus the endoscopic operation for an anterior skull base tumor, so olfactory groove meningioma, tuberculum sellae meningioma. This is a craniopharyngioma example, but they all have the same bias. So, multiple observational cohort studies comparing open versus endoscopic resections of pediatric craniopharyngioma: 61% gross total resection in the open cohort versus 72% gross total resection in the endoscopic cohort. You can find papers that say this shows that endoscopic surgery is better in this patient population. This was a nice paper by Jeff Wisoff's group in which they said, but hold on a minute, the tumors that were done open were much larger than the tumors that were done endoscopically, and much more likely to have all these other adverse factors. And so their conclusion was that directly comparing outcomes may not be valid. And this is an example of that kind of target shooting: study after study has the same bias in it, the same difference between the open and the endoscopic cohorts.
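For illustration, here are invented cohorts of 100 patients each, using the gross-total-resection percentages quoted above; the test below only addresses the random component of the error, so even a convincing p-value would say nothing about the underlying selection bias.

```python
# Hypothetical 2x2 table mirroring the 61% vs 72% gross-total-resection comparison.
# The denominators (100 per cohort) are invented for the sake of the example.
import numpy as np
from scipy.stats import chi2_contingency

table = np.array([[61, 39],    # "open" cohort: GTR, no GTR
                  [72, 28]])   # "endoscopic" cohort: GTR, no GTR
chi2, p, dof, expected = chi2_contingency(table)
print(f"chi-square p = {p:.3f}")
```

With these invented denominators the p-value comes out above 0.05; a larger series could easily push it below 0.05 without making the comparison any less biased.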
And you can do these studies until the cows come home and you're not going to learn anything more about the question. Other examples are just everywhere in neurosurgery, everywhere you look. Biopsy versus resection: the tumors that are resected tend to be smaller, they tend to be located in the frontal pole, they're not near the speech areas, and they're not in the thalamus. And a new piece of this just cropped up after years: the ones that are at the frontal pole are IDH-mutant tumors, whose patients will live longer even if you didn't do the surgery, and the ones that are in the thalamus are IDH wild type. Same thing with a treatment like radiosurgery that has a volume cutoff for eligibility: the big tumors aren't going to get radiosurgery and the small tumors are. There are nice papers about brachytherapy, back when that was a hot treatment, comparing patient survival with eligibility for brachytherapy as the way the two groups were sorted out, and it turned out that among patients who did not receive brachytherapy, the ones who were eligible for it lived much longer than the ones who were not eligible for it. Same thing with intra-arterial chemotherapy: if your tumor is fed by one major cerebral vessel rather than by several, you will live longer regardless of whether you receive the treatment. Comparisons against historical controls: if your historical controls were drawn from an earlier era, then any improvement in general treatment over time will render the comparison biased. And obviously, single center versus multicenter: results are almost always better at the single center that reports them than they are in the confirmatory multicenter trial. How about others? This is a fun one: subversion of randomization with the thin envelopes. I have subverted randomization in a trial myself. The reason was that when I was at UCSF we had two treatments, one of which required you to be in town for four days and the other for 10 days. So if the patient was from Kansas, we would go look up what they were going to get and then tell them, well, you're only going to be in town for four days even if you go into the trial, and they would enter the trial. You think that doesn't make a difference; I didn't think it made a difference, but it does. Removal or loss of some patients after treatment assignment: this is the intention to treat that Dr. Haynes was talking about. What I think is the most powerful example of why you need intention to treat is in the carotid endarterectomy trials: patients who had strokes after the angiogram and didn't get surgery. It's very tempting to take those patients out of your analysis, but you just can't do it. Non-responders in survey studies tend to differ from responders, so if you're going to send out a survey to 100 people and you have a response rate of 50%, you have to look and see whether your responders are younger, older, better educated, less well educated, all those sorts of things. Remedies for selection bias: randomization, randomization, randomization. If you're going to do a non-randomized trial, you want to mimic the randomized one, so every patient who is included in your analysis has to have been, ideally, equally eligible for either of the two treatments. You can't include patients in your open resection cohort who would not have been eligible for transnasal endoscopic resection if you want to do a valid comparison. And there are statistical methods of adjusting for these things.
The propensity score is a really good one that is used less in neurosurgery than I think it should be, and maybe we can talk about that in the small groups. Information bias: this arises when you measure things differently in the two groups that you're comparing. That's one major way it arises, and obviously having a non-blinded observer is a big way that it arises. The surgeon is willing to ascribe the complication to something else if the patient had surgery, but not if they didn't have surgery. It's nice to use blinding if you can, or a placebo, in order to avoid this, but it's difficult to do. You can misidentify a placebo effect as a treatment benefit; that's one way this happens. Recall bias: if you're doing an epidemiology study, the patient who has the acoustic neuroma or the meningioma is much more likely to have their mom say, well, you did fall off the skateboard when you were 8 and had to go to the emergency room for 8 hours. It really works that way in real life; patient after patient with meningiomas will come in and tell you this story. This is the immortal time bias that I was talking about, a mismeasurement of survival. It happens when you use information that you acquired during this time period here to sort the patients out. This is called immortal time bias. I picked this out of the literature; you see this in studies of reoperation for glioblastoma. You also see it with the NovoCure device, which we're going to talk about later in the course: an electrical hat that you put on that makes you live longer if you have a glioblastoma. In the actual analysis that was presented to the FDA, they said, well, of course, if you don't wear the hat for at least 6 weeks, then how could it possibly work? So we're going to throw out patients who didn't wear the hat for 6 weeks. And what you see is that everybody in that group lives for 6 weeks, 100%, obviously, because you're guaranteed to be immortal for 6 weeks if you have entered that cohort of the analysis. Here, this cohort is patients who had 4 glioblastoma operations. Well, that takes at least 4 days, right? You can't operate on somebody 4 times in one day, and in a more realistic sense, you probably wouldn't even do one per month. And you see that every patient in that cohort lived for a year, not surprisingly. So you can't do that. Remedies for this: obviously, if you suspect a placebo effect, you would use a placebo group. You get your outcomes adjudicated by blinded observers. You use endpoints that are not open to subjective interpretation, so you use survival rather than progression-free survival, which is graded by somebody looking at two scans and saying whether the tumor got bigger or not. And most importantly, through proper study design.
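A toy simulation of the immortal time problem, with everyone drawn from the same survival distribution and no treatment effect at all: sorting patients by something that can only be achieved by surviving (wearing the device for 6 weeks, having a fourth operation) makes the favored group look better automatically. All numbers here are invented.

```python
# Invented survival times; the grouping below uses follow-up time, which is the error.
import numpy as np

rng = np.random.default_rng(3)
survival_months = rng.exponential(scale=12.0, size=10_000)   # identical process for everyone

completed_6_weeks = survival_months >= 1.5    # "completing" 6 weeks of use requires surviving 6 weeks
print(f"median survival if 'completed 6 weeks': {np.median(survival_months[completed_6_weeks]):.1f} months")
print(f"median survival otherwise             : {np.median(survival_months[~completed_6_weeks]):.1f} months")
```

No drug or device is involved, yet the "completed" group appears to do far better, purely because its members were guaranteed to be alive during the qualifying period.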
Now, confounding: there's going to be a lecture about it, so I'm just going to include it here for the sake of completeness. You observe two things that are correlated, and you think that there's a causal effect. So, for instance, neurosurgeons who weigh more are older, so the passage of time causes the increase in weight; I think in some sense that's true. But that's how doctors think. You forgot your umbrella, so it rained; that's another example of that kind of thinking. But there are actual examples where it's kind of hard to figure out. Epidemiological studies consistently show that people who have acoustic neuromas are less likely to smoke than people who do not have acoustic neuromas.
Now, in the Parkinson's disease literature, you see the same thing: an apparent protective effect of tobacco smoking against Parkinson's disease. So how does it work for acoustic neuromas? Are you actually protected against acoustic neuroma tumorigenesis by tobacco smoke, as you would predict from the simple observation that smokers don't tend to have acoustics? The fact is that the driver is almost certainly low socioeconomic status. Smoking is associated with not having a good education and not having a high income, a statistically very strong effect, and having an acoustic neuroma diagnosed requires having health insurance that allows you to get the $2,000 test that finds the acoustic neuroma. So if you adjust away socioeconomic status, the apparent protective effect of smoking goes away. And there are ways of eliminating confounding effects that I think we're going to hear more about tomorrow. You can use restriction, so you can limit your study cohort only to patients who have a college education, or only to patients who have a certain socioeconomic status; obviously, you lose out on generalizability if you do this. You can match or stratify patients; that's a statistical way of adjusting for it. You can use propensity score methods when you're choosing between two treatments and you want to eliminate this effect, or you can just do a simple multivariate adjustment.
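A minimal sketch of that last remedy on synthetic data (the statsmodels package and every number below are assumptions, not the real epidemiology): smoking is made more common at low socioeconomic status and the diagnosis is made more likely at high socioeconomic status, so the crude model shows a "protective" effect that largely disappears once the confounder is included.

```python
# Synthetic confounding example: low "SES" raises smoking and lowers the chance of the
# diagnosis being made, so smoking looks protective until SES is adjusted for.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
n = 20_000
ses = rng.normal(size=n)                                          # latent socioeconomic status
smoker = (rng.random(n) < 1 / (1 + np.exp(2.0 * ses))).astype(float)
diagnosed = (rng.random(n) < 0.03 / (1 + np.exp(-2.0 * ses))).astype(float)

crude = sm.Logit(diagnosed, sm.add_constant(smoker)).fit(disp=0)
adjusted = sm.Logit(diagnosed, sm.add_constant(np.column_stack([smoker, ses]))).fit(disp=0)
print("crude odds ratio for smoking   :", round(float(np.exp(crude.params[1])), 2))
print("adjusted odds ratio for smoking:", round(float(np.exp(adjusted.params[1])), 2))
```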
So that's all I had to say about error today. Thank you.
Video Summary
In the video, Fred Barker, a neurosurgeon, discusses the topic of error in measurements and data analysis. He starts by expressing his discomfort with the concept of error and his belief that he doesn't make errors. He then delves into the types of measurements he typically deals with in his work, which are categorical rather than continuous. He talks about the importance of understanding the uncertainty of measurements and how many papers fail to communicate this effectively. Barker emphasizes the need to consider both random error (which can be reduced through repeated measurements) and bias (such as selection bias and information bias). He provides examples of various types of bias and discusses the importance of randomized trials in reducing bias. Barker also touches on the concept of confidence intervals and the distinction between statistical and clinical significance. He concludes by discussing the impact of confounding factors and potential remedies to address confounding in studies. Overall, Barker highlights the importance of understanding and addressing error in measurements and data analysis to ensure accurate and reliable results.
Asset Subtitle
Presented by Fred G. Barker, II, MD, FAANS
Keywords
Fred Barker
error in measurements
data analysis
categorical measurements
uncertainty
random error
bias
randomized trials