This is topic Was I being honest? in forum Books, Films, Food and Culture at Hatrack River Forum.


To visit this topic, use this URL:
http://www.hatrack.com/ubb/main/ultimatebb.php?ubb=get_topic;f=2;t=034620

Posted by King of Men (Member # 6684) on :
 
I am a little worried.

For the past half-year or so, I have been working on a particular method of data analysis, intended to produce very large signal samples of good purity in a particular decay channel. The goal was to reach 800k events of 97% purity.

Now, as our full data set is very large, I have been testing things on the portion of it that is 'off-resonance', about one-tenth of the whole. From this it is of course easy to estimate the size of the signal in the full sample.

About two months ago, I had the analysis method working reasonably well, and I wanted to estimate the amount of data there would be in the full sample. Now this turned out to be a little trickier than my comments above would indicate, because it wasn't so easy to find, in the welter of information at the BaBar site, just how much of our data was on-resonance, how much was off-resonance, and how much I had in fact run over. But in the end, I did get an estimate - a hair under 800k at 97% purity. I told my professor this, and he was well pleased.

That was two months ago; since then, I have made some improvements, some of which should reduce bias, others increase yield. I have now run one-half the full data sample, and gotten out 350k events at 97% purity. Simple math indicates that the full sample will yield 700k.
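The scaling behind that figure can be written out as a small sketch. It assumes the signal is spread evenly across the data sample, so that yield scales linearly with the fraction processed; the function name and variables are hypothetical and are not taken from the actual yield-estimation program.

```python
def extrapolate_yield(observed_events: float, fraction_processed: float) -> float:
    """Scale an observed signal yield up to the full data sample.

    Assumes (for illustration only) that signal events are distributed
    uniformly, so yield is proportional to the fraction of data processed.
    """
    if not 0 < fraction_processed <= 1:
        raise ValueError("fraction_processed must be in (0, 1]")
    return observed_events / fraction_processed


# Numbers quoted in the post: 350k events from one-half of the full sample.
half_sample_events = 350_000
full_estimate = extrapolate_yield(half_sample_events, 0.5)

# The stated goal for the analysis was 800k events at 97% purity.
goal = 800_000
shortfall = goal - full_estimate

print(f"Projected full-sample yield: {full_estimate:,.0f}")
print(f"Shortfall against goal: {shortfall:,.0f}")
```

Keeping the processed fraction as an explicit argument, rather than a multiplier hard-coded inside the program, makes it harder to forget to update it when the data sample changes.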

Now, this is not a major difference. It certainly does not make or break the analysis. And there are some explanations for it: maybe the on-resonance data is a little dirtier than the off-resonance, so that the estimate was a little skewed. Or maybe my change to eliminate bias did exactly what it was supposed to, and the estimates were a little off because the method overestimated the purity. It is even possible that I entered a certain multiplier into the program that calculated the total expected yield, and forgot to change it when the data sample changed. (And yes, we do need an actual program for that, as we need estimates under several different conditions, and it's just easier to automate it.)

But I am a little uneasy in my mind. Did I really use the best possible honesty in making those estimates? Or did I, perhaps, tinker about, looking in various places in our website until I had a number agreeing with our goals for the analysis? At this distance in time, I cannot tell. I do recall frowning, at one point, and saying "That's odd, we expected a bit more. Let me run those numbers again." Perhaps I should have run them a third time, and not stopped when I had what we expected.

It is nothing earth-shattering. At the absolute worst, I am guilty of a little carelessness; at best, I have done nothing except remove a source of bias in my calculation, which happened to be skewing my numbers upwards. But it gives me a little insight into how easy it is to fool yourself, especially if you are working with methods on the edge of statistical significance. (Not the case here, and just as well, too!)

I am a little worried. But I shall do better in the future.
 
Posted by Kwea (Member # 2199) on :
 
You didn't lie, and you said that this was a possible outcome, even based on the previous data, right? And you let your boss/prof know about it, and haven't hidden anything from him, right?

Sounds like you are honest, not that that surprises me [Big Grin] .

It is good that you are wondering about it though, and that you want to make sure you do better next time.

Kwea
 
Posted by Bob_Scopatz (Member # 1227) on :
 
Disclosure shall set you free. Seriously, if you took the new information to your boss, you've met the primary obligation of honesty in data analysis.

The fact that you've learned methods that are better for estimating the unbiased truth probably means that you could publish a paper on it all. It is something valuable.

And if, as you said, the difference isn't devastating to the project, I would imagine that your boss would welcome the news. If not that, at least knowing at the earliest possible moment would be better than finding out later on.

By the way, I have no freakin' idea what you're talking about.
 
Posted by King of Men (Member # 6684) on :
 
Well, yes, of course I told him! Not to do so would convert that little voice in the back of my head saying "Are you sure you did that right?" into a raging tempest. He was a little disappointed, naturally, but as he pointed out: "We've done this analysis once with 15k events at 95% purity. 700k or 800k, we are going to see a huge improvement." To be sure, I not only mentioned the possible causes for the discrepancy that I outlined above, I also pointed out some things I am doing to squeeze out the last drop of efficiency, which may put us over the 800k mark again.

I am certainly not in trouble over this, even at the informal level of my boss being displeased. I just wanted to get it off my chest, and see if putting it down in writing clarified my thinking.
 
Posted by aspectre (Member # 2222) on :
 
More importantly, not expressing the qualifications/uncertainties in one's own mind to those who will be reading/using the results would make one a poor scientist.

No respectable scientist expects perfection or impeccable genius in another scientist. What is expected is a willingness to work hard to solve problems, and honesty all the way to the "I may not have taken all factors into consideration" attitude of sharing any and all doubts about one's own work. The latter is an expression of confidence in other scientists' abilities.

The mutually shared trust of "Since I can't think my way around this, maybe you can. Or at least help point me in the right direction." is the second most important factor in doing science.

[ May 08, 2005, 06:44 AM: Message edited by: aspectre ]
 
Posted by Orson Scott Card (Member # 209) on :
 
This is what integrity and rigor look like in the real world. Thanks for posting.
 


Copyright © 2008 Hatrack River Enterprises Inc. All rights reserved.
Reproduction in whole or in part without permission is prohibited.

