Hatrack River Forum: computerized testing sucks....met with the Vice President of Health and Science yest

This is topic computerized testing sucks....met with the Vice President of Health and Science yest in forum Books, Films, Food and Culture at Hatrack River Forum.

To visit this topic, use this URL:
http://www.hatrack.com/ubb/main/ultimatebb.php?ubb=get_topic;f=2;t=055845

Posted by Kwea (Member # 2199) on :

So far this year I have had to take 2 tests over because the test booted my answers when I hit submit. I THOUGHT that was the worst that could happen, until today.

I had problems with the mouse, and was banging it all test long. It was an older ball-and-socket mouse, and it had a lever rather than a mouse wheel on top too.

I rechecked my answers, and as I moved down the page, some of my answers were moved right to the bottom, so that regardless of what I had picked, my answer was now D.

Unless I scrolled back up, and then it was A.

UGH!

I got a 70, and there were at LEAST 8 answers wrong because of that. Simple things, that I knew because of my own life (like tetracycline making you photosensitive, and how to test for parasitical infection.), that I KNOW beyond ANY doubt that I picked the right answer.

I had an 82 in the class, but now it's a 78, only 3 percent from failing this program.

I am PISSED! And I don't know what the teacher will do. Not much she CAN do, really.

GRRR!

All joking aside, I almost threw the computer through the window. It's one thing to mess up because you didn't study, or just don't know the material. But to know it, put the right answer down, and STILL not get credit...that's BULLSHIT.

I'm going to the Dean and insisting I take all tests with paper and pencil. I don't think the teacher's convenience should trump my grades.

[ August 27, 2009, 10:00 AM: Message edited by: Kwea ]

Posted by Xavier (Member # 405) on :

Not that it's helpful now, but I can't imagine me taking a test and not calling my instructor (or proctor) over to show them my difficulty.

I'd imagine there were circumstances preventing this from being an option?

Posted by Kwea (Member # 2199) on :

I thought it was working, it was just a little difficult. I didn't see the mouse causing the answers to change because it didn't happen until I did the test review. As I reviewed answers, I'd finish the question and use the mouse wheel to scroll down. As I did this, the answer would change, but the question would also roll up, off the screen.

My instructor was sitting right next to me, and heard me shaking it and banging it, but as I thought it was working, just stubborn, I didn't say anything until I submitted the test.

As I scrolled down on the test review, it didn't show my answers, but it did tell me what I got wrong, and what the right answer was. I thought I had just made a mistake or two, but then I missed 4 answers in a row that I remembered clear as day....and I had selected the answer the computer said was correct.

I hate this crap. I got a 70%, which is failing, on a test that I had AT LEAST an 86%, and I doubt I can prove any of it.

Keep in mind that I bombed a test earlier this semester, and the teacher asked if I had had a problem. I said nope, I just wasn't prepared.

If I screw up, fine. But this is bullshit. I'll go see the Dean tomorrow, and her boss after that if I don't get at least a retest.

Posted by AchillesHeel (Member # 11736) on :

Just in case, bring your own mouse.

Posted by Jamio (Member # 12053) on :

How are computer tests more convenient for the teacher than scantron? When I was in school, my instructors hated computerized tests for the very problems you are having.

Posted by MightyCow (Member # 9253) on :

You can clean a ball-mouse by unscrewing the bottom (it should pop open with a quarter turn) then scraping the accumulated gunk off the rollers inside with your fingernail or a pen cap. Then just replace the ball and lock it back in place and it should run much more smoothly.

Posted by scifibum (Member # 7625) on :

I've had that problem with scroll wheels and option groups as well. [Frown]

You need to click on another part of the page to ensure that scrolling won't change your selection.

Good luck getting a re-test.

Posted by Tstorm (Member # 1871) on :

Scroll wheels may not be easy to clean.

Suggestion?

You can use the "Page Up" or "Page Down" buttons to scroll down the page. This shouldn't impact your selections on any forms, in my experience. Just forget the mouse wheel.

Posted by DSH (Member # 741) on :

If you are going to be taking these kinds of tests more often, you might learn to navigate a computer screen using the keyboard only.

It's a pain, but can be done!

Posted by Boris (Member # 6935) on :

It sounds like the testing software itself is collecting input incorrectly if use of the scroll wheel/lever is causing the answers to change. At the very least, reporting that should go up the chain to the company that writes the software. It sounds like a very poor implementation to me.

All of the computerized tests I've taken have had test questions on their own page, so you have to click a button to go to the next page. If the software really requires you to scroll down to see the next question, it's very poorly written.

Posted by fugu13 (Member # 2859) on :

That's just standard web browser behavior with some sorts of selection widgets in web pages. There's nothing really about the software, other than that they might consider using less changeable widgets.

Posted by Corwin (Member # 5705) on :

Boris: Yeah, that's my thought too. Bad programming, really. I've seen this before on non-professional tests, it can certainly wreck things if you're not careful.

Posted by Corwin (Member # 5705) on :

fugu: Hmm, can't you make the radio-button selection unscrollable with javascript?

Posted by Noemon (Member # 1115) on :

Kwea, did any of your classmates run into similar trouble? You'd probably have better luck with getting a retest (and paper-based tests in the future) if a group of you petitioned for it.

Posted by Badenov (Member # 12075) on :

quote:
Originally posted by fugu13:
That's just standard web browser behavior with some sorts of selection widgets in web pages. There's nothing really about the software, other than that they might consider using less changeable widgets.

Which still means it's a poorly implemented design. If you have to scroll down to view the whole test, it's a poor design choice to utilize a widget that is scrollable. I imagine the problem wouldn't exist if they chose check boxes instead of radio buttons for answer selection.

Further, it shows a lot about the company who wrote the software if they chose a browser based system over a self-contained platform (such as the one used by Prometric, for example). This simple fact shows that they care more about cost and ease of creation than capability and integrity.

I guess this is just another example of shoddy work done by the lowest bidder.

Posted by BannaOj (Member # 3206) on :

In order to have a backup copy, could you do a print screen when your answers are in, before going to the next screen?

Posted by Kwea (Member # 2199) on :

Nope, we can't print anything, or save anything related to a test, it's all disabled. Nursing uses similar questions year to year, so there has been a problem in the past with people copying a test and giving it to a friend, or selling it to the next class.

I'm going to see the Dean tomorrow, and if she doesn't have the answer I want, then I head to the Dean of Student Services. This crap has to stop now.

If they can't design a fair test on the computer, then they need to stop using it for testing.

Posted by Noemon (Member # 1115) on :

quote:
Originally posted by Noemon:
Kwea, did any of your classmates run into similar trouble? You'd probably have better luck with getting a retest (and paper-based tests in the future) if a group of you petitioned for it.

Posted by just_me (Member # 3302) on :

I'm not trying to be a jerk, even though it seems like it, but...

As I understand it this is a test for nursing school? I'd just like to point out then that the medical industry uses A LOT of different software of varying sophistication, and some of it will likely be built on a web browser platform and exhibit similar behavior. Mistakes like the ones you talk about could happen - and the results can be worse than a bad grade.

I'm just pointing out that the argument can be made that it's your responsibility to understand the tool (the computer) you are using and make sure this type of thing doesn't happen... no one is going to be able to give you a redo if you screw it up out in the "real world" later.

That said I do absolutely think this sucks for you and I think you definitely should push for a solution to it. The argument I just mentioned above falls apart because in the "real world" you'd be able to print the info or at least review it in a different manner to make sure it's correct, and medical software usually has better error checking than crappy test taking software.

They really should have a "review mode" where you can see what you answered but can't actually change anything just to make sure something like this doesn't happen...

good luck!

Posted by King of Men (Member # 6684) on :

Clearly, the test was in two parts: One, information on nursing; two, ability to deal with real-world computer interfaces. You get an 'F' on the second part.

Posted by Orincoro (Member # 8854) on :

That's not fair KoM- the interface was obviously flawed, so perhaps an "Incomplete" in dealing with flawed computer interfaces.

Posted by King of Men (Member # 6684) on :

An interface is not flawed if there's an obvious workaround like, I don't know, not messing with the dang scroll button once you figure out the problem.

Posted by Corwin (Member # 5705) on :

Meh, sometimes you'll figure it out too late. If he were one of our clients I doubt that "you don't know how to use a badly designed tool" would really work as an excuse.

Posted by Boris (Member # 6935) on :

quote:
Originally posted by just_me:
I'm not trying to be a jerk, even though it seems like it, but...

As I understand it this is a test for nursing school? I'd just like to point out then that the medical industry uses A LOT of different software of varying sophistication, and some of it will likely be built on a web browser platform and exhibit similar behavior. Mistakes like the ones you talk about could happen - and the results can be worse than a bad grade.

Having worked with many different types of medical IT environments in my day, as well as many different types of medical software, I can tell you that the likelihood of simply using a scroll wheel resulting in severe consequences is virtually nil. Any software company that creates a software suite that could result in such a mistake will very quickly find itself sued out of existence.

All IT assets (hardware, software, etc) used in the medical industry *must* comply with HIPAA standards. What you'll find is that there are only one or two different companies that create software for use in a specific type of practice. And that these software suites are *always* extremely expensive (I've seen costs as much as 20,000 dollars for a single workstation license) and often quite well designed. This is because software must be approved by the Department of Health and Human Services before it can be legally sold for use in the medical industry. They are required to be extremely secure and, because they are used by medical professionals who cannot be guaranteed to be technically savvy, often quite user friendly. Which this test obviously wasn't.

And KoM, get your head out of your butt for once. Your "holier than thou" attitude is nothing more than annoying at this point.

Posted by King of Men (Member # 6684) on :

Well, I clearly am holier than thou. In particular, I poked my tongue so far into my cheek it came out on the other side. Ouch.

Posted by Orincoro (Member # 8854) on :

Prst strč skrz krk.

Posted by Boris (Member # 6935) on :

quote:
Originally posted by King of Men:
Well, I clearly am holier than thou. In particular, I poked my tongue so far into my cheek it came out on the other side. Ouch.

Yeah, see, people who make a habit of trying to piss people off regularly shouldn't ever try the whole tongue in cheek thing.

Posted by Kwea (Member # 2199) on :

I got that, man, and wasn't offended. I was going to post pretending to be pissed off, but tone is hard to convey.

I get a B on the test. THEY get the F, in programming. [Big Grin]

Posted by Kwea (Member # 2199) on :

I spoke to the Dean yesterday, and while I sort of doubt anything will happen, she was VERY interested, because this doesn't affect JUST the nursing program.

I made the point about "How many people failed because of this, and no one noticed?" as well as the fact that I wondered on EVERY TEST so far about some of my answers, but I thought I just didn't remember clearly what I had picked.

She looked me in the face and said that if this really happened, they shouldn't be using Angel for testing. I agreed, and offered to retest.

If I don't hear from her by Monday, I go to the Dean of the whole freaking college.

Posted by Orincoro (Member # 8854) on :

quote:
Originally posted by Boris:

quote:
Originally posted by King of Men:
Well, I clearly am holier than thou. In particular, I poked my tongue so far into my cheek it came out on the other side. Ouch.
Yeah, see, people who make a habit of trying to piss people off regularly shouldn't ever try the whole tongue in cheek thing.

Yeah... those people. Gotta... really... think... those people do... about making comments... about other people... when they themselves do the same thing... those... gosh darn people.

Posted by Orincoro (Member # 8854) on :

quote:
Originally posted by Kwea:

If I don't hear from her by Monday, I go to the Dean of the whole freaking college.

I'm glad there are people like you at your college who aren't willing to take crap like this lying down. Unfortunately this kind of thing is way too common, and the attitude toward it is way too relaxed.

Posted by Boris (Member # 6935) on :

quote:
Originally posted by Orincoro:

quote:
Originally posted by Boris:

quote:
Originally posted by King of Men:
Well, I clearly am holier than thou. In particular, I poked my tongue so far into my cheek it came out on the other side. Ouch.
Yeah, see, people who make a habit of trying to piss people off regularly shouldn't ever try the whole tongue in cheek thing.
Yeah... those people. Gotta... really... think... those people do... about making comments... about other people... when they themselves do the same thing... those... gosh darn people.

That made no sense whatsoever. Try again please.

Posted by Kwea (Member # 2199) on :

We will see if it makes any difference, as I have no power, and can't force things to change all by myself.

So far I've already been questioned about hacking/ethics, and now this, when it is system flaws and a lack of communication that is responsible for all of this crap.

I am not the one who designed the system, so I can't be responsible for the flaws, but if the teachers end up having to hand grade tests I know I'll get the flak for it.

And as our clinical grades are pass/fail, and completely based on our instructor's opinion of our performance, I could be setting myself up for trouble down the line.

Then again, I'd rather be in trouble for fighting this crap than take it lying down and thank them for it.

Posted by BandoCommando (Member # 7746) on :

Hand grade? Have all of the scantron machines broken? Have all the graduate teaching fellows come down with H1N1?

Posted by Kwea (Member # 2199) on :

We don't have Grad students at a community college, and I think this computer system replaced all the scantrons. [Smile]

Posted by dabbler (Member # 6443) on :

Kwea haven't you had like a half dozen problems with the computerized system during your coursework? It's enough to drive anyone bonkers.

Posted by Kwea (Member # 2199) on :

Yeah, I had 2 tests completely reset so that I had to retake them, and then I had the issue where the tests were released for the entire course the second week of school accidentally...I didn't log in, and I reported it ASAP.

Then the review issue, where they weren't locking it, and a TEACHER showed me how to access the review, then another teacher said that was an ethics violation.

This system sucks, and I am sick of it. I just want a test that accurately records what I think the answers are and that is consistent. I didn't realize that it was asking too much, I guess.

[ July 25, 2009, 02:46 PM: Message edited by: Kwea ]

Posted by CaySedai (Member # 6459) on :

I can sympathize: all my tests last year were web-based. Sometimes the school's server was down, or having problems, and even quit while we were either taking the test or logging in. The instructor had to reschedule. In at least one case, I was already taking the test and had to retake it.

Posted by Kwea (Member # 2199) on :

At least with the other options there is a paper trail to prove what you actually put down as the answer.

Posted by Kwea (Member # 2199) on :

I took a final today, and double checked all my answers. When I asked about an alternative form of testing, I was told I had to take it on the computer.

I asked for scrap paper, and was given a piece 6 inches long and 4 inches wide.

I submitted my answers, and saw my score....a 66%, on a test I felt good about. I scrolled down, and....

There were 14 answers not filled in, 12 of them in a row. 3 pages worth, in a freaking row. I called the teacher over, and she just shrugged and said she would talk to tech support.

I almost threw the freaking thing out of the room.

I met with the Dean, right after the test, and she said tech support had looked into the previous problem, and when he had logged on he had not been able to use the mouse button to change answers. I told the dean I tried to do that on question one today, and it wasn't moving like it had a few days ago, and I thought they had fixed it.

I asked her how the hell I had missed 3 PAGES of questions (4 to a page), IN A ROW, and she had no answer. I asked her if she had had tech support check my answers, and they had told her all of the questions I challenged had been incorrectly answered as option D, the last option allowed....as I had said.

I told her it was unacceptable, and that I was considering seeing a lawyer if it was not quickly resolved. I asked her what time frame I needed to use to get back to her, and she said tomorrow, IF she could get in touch with tech support by then.

I told her that I was bringing my OWN scrap paper, and writing down EVERY answer to the final tomorrow, and giving it to the teacher BEFORE I hit submit as a safeguard.

If I don't get full credit for all of these, or at least have the 14 questions REMOVED from the grade calculation, then I will be visiting the Dean of the whole freaking Health Services program tomorrow, and then the Dean of the school.

I almost FAILED THE FREAKING CLASS because of those 2 tests. I passed, as things stand now, with a 76.65%.

75% is FAILING, AND IF YOU FAIL YOU CAN'T PROGRESS AND FLUNK OUT OF THE WHOLE PROGRAM.

Pissed off doesn't even begin to describe my attitude.

Worst part is I failed, legitimately, the last med-surg test, and have a 77% in that class. I thought I had a problem with an earlier test, but the teacher brushed it off and I let it go because I wasn't sure then.

So now I get to try and study for a final worth 20% of my grade, and anything less than a 73% on it means I fail. The class average in this class on the last 2 tests, even after adjustments, was a 70%.

[Mad]

Posted by Kwea (Member # 2199) on :

They adjusted everyone's scores (not related to my issues) and I ended up passing with a 78%, so this cost me a B if it doesn't get fixed.

I still have Med-Surg and Peds, but I have an 82% in Peds and an extra day to study for it....it is the Med-Surg that worries me now.

Off to study!

Posted by TL (Member # 8124) on :

This is horrible. I hope it all works out for you. I really do.

Posted by Belle (Member # 2314) on :

I'm so sorry. Sounds frustrating, but at this point you are passing, right? So, hang in there, don't let the frustration affect your final exams. Try to put it out of your mind and just study and focus on doing your best.

quote:
I think this computer system replaced all the scantrons

I guarantee that this computer system did not replace all the scantron machines. Educators have a hard time throwing things away...they are there, somewhere.

Posted by TomDavidson (Member # 124) on :

I'm actually willing to bet this computer system IS actually a Scantron online exam. When you buy the Scantron administrative module that allows your faculty to share and import banks of questions between tests, they throw in this web-based testing module -- which sucks beyond all possible measure.

Posted by Kwea (Member # 2199) on :

Well, it took an 83 average and made it a 77% average. It's still passing, but how many other people have already flunked out, or are about to, because of this?

Posted by Dr Strangelove (Member # 8331) on :

Kwea, if you don't mind me asking, what program are you using to test? Is it strictly online, or specific to your college or distributed by a company? I work for a testing center and starting in August we're administering the HESI A2 (entrance exam for our nursing school) and if it's by the same company I'd love to know so that if something like this happens we can tell the students to talk to the nursing school, and tell the nursing school that there might be a technical problem. A heads up on these types of things can save a lot of headaches. I hate to admit it, but sometimes us test administrators/proctors can be... less understanding than perhaps we should be. At least where I work we get a lot of people coming in and pretty obviously trying to take advantage of us, so we can be pretty dismissive sometimes. Apologies to you on behalf of the testing profession [Smile]

Posted by rivka (Member # 4859) on :

quote:
Originally posted by TomDavidson:
which sucks beyond all possible measure.

Check.

Kwea, good luck!

Posted by Kwea (Member # 2199) on :

It's called Angel, and it is a system for online testing and distribution of grades. It covers testing, discussion boards, places for teachers to post info and powerpoints, and homework submissions. It also has communications features such as email (to teachers, classmates, classes total, classes specific).

It shows us our current testing averages, and depending how it is set up it can also let us compare our scores to the class average.

In theory it works great. In reality, not so great. Half the teachers don't know how to use it well, so sometimes the test reviews let us see what we picked as well as the correct answers, sometimes not; sometimes we can see our test grades broken down, sometimes not; sometimes the computer kicks you out of the test when you hit submit, back to the beginning with no answers filled in....and sometimes it kicks you out and gives you a 0, and the teacher has to go back into it and wipe your score, and then you take it all over again.

The incidence rate in nursing is probably only 1-2%....but we have 4-6 tests per class a semester, with 48 students taking each test, and each of us having 4-6 classes. That breaks down to an average of about 24 tests per student per semester, or about 1000 tests taken per semester.

At 2%, that means that there are at least 20 screw-ups per semester....and in this program, any grade less than 75% in any class, for any semester, means you fail out and have to repeat the entire semester....and the program only starts once a year, in January.
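
For anyone who wants to check that estimate, here is a quick sketch of the arithmetic. It uses only the figures quoted above (48 students, about 24 tests each, a 1-2% incidence rate) and assumes, purely for illustration, that a glitch is equally likely on every test. The names are made up for the example.

```python
# Rough check of the estimate above; every input is a figure quoted in the post.
students = 48
tests_per_student = 24            # "about 24 tests per student per semester"
incidence_rates = (0.01, 0.02)    # "probably only 1-2%"

# Total tests sat by the whole class in one semester.
tests_per_semester = students * tests_per_student


def expected_failures(rate: float, tests: int) -> float:
    """Expected number of affected tests, assuming the same rate on every test."""
    return rate * tests


for rate in incidence_rates:
    affected = expected_failures(rate, tests_per_semester)
    print(f"{rate:.0%} of {tests_per_semester} tests: about {affected:.0f} affected")
```

The exact product is a little over the "about 1000" used in the post, so the "at least 20" figure at 2% is, if anything, slightly conservative.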

Some mistakes are easily fixed....most of the time the student just resubmits his answers. Sometimes you don't notice answers got changed, though, or aren't sure if it is the computer's fault or your own. Sometimes it messes up just a few things, and it is easy to miss.

I have to be honest...this sounds like a class action lawsuit waiting to happen.

Gotta go finish studying for the last final.

Later!

[ August 27, 2009, 10:46 AM: Message edited by: Kwea ]

Posted by King of Men (Member # 6684) on :

Well, you know what they call the guy who was bottom in his class at med school, right?

Posted by rivka (Member # 4859) on :

Doc?

Posted by Kwea (Member # 2199) on :

Depends on if he passed. If I don't pass I am not even going to get to test for nurse. [Frown]

I am worried, but it made me take this very seriously. We will see how much of this I remember. The test is at 9:30, and I am getting up at 7 am to study a little. Just a refresher on the stuff we just got tested on.

The teacher just released a study guide at noon today. I think it was a little late....and her study guides sucked up to now (filled with a ton of crap never on the test), but I did most of it. I'll have done all of it by the test.

Wish me luck.

Posted by Dr Strangelove (Member # 8331) on :

Luck has been wished!

Posted by BlackBlade (Member # 8376) on :

I'll do one better, *Grants Luck*
[Smile]

Posted by Kwea (Member # 2199) on :

They have tech support across the nation testing it today, and they gave us a full page scratch sheet to use to write our answers down.

If I fail today it will be due to a lack of knowledge, not the computer.....which is all I asked. I wonder how many other test scores of mine are problematic....

Just what we need facing a pass or fail final....more stress. [Frown]

here we go!

Posted by Corwin (Member # 5705) on :

Hope it all goes well. [Smile]

Posted by Kwea (Member # 2199) on :

I passed. Not much else to say....the tech guy was there, and they found one problem regarding the notice that was disabled....when you have ANY answers that are not filled in, a notice box should appear when you try to submit the test, asking if you are sure.

No box asked me, that's for sure!

The Dean took my concerns (and hinted threats of a lawsuit) very seriously, and they did NATION WIDE testing of the Angel system today, with no abnormalities found. That doesn't mean there weren't some in previous tests, just that this one was working.

They also gave us a form to fill out as we went, to write our answers down. We had to turn them in BEFORE we submitted the test, which was fine.....none of the answers changed today.

I got a 78%, in part because I was so nervous I second-guessed myself. A lot of these nursing questions are not knowledge based questions, but "pick the BEST" type questions, where more than one answer is right.

I always narrowed it down to the 2 best, but picked the wrong one half the time.

IRL that is fine, because you can take a BP WHILE asking about constipation, and check the respirations while they answer. [Big Grin]

Posted by BlackBlade (Member # 8376) on :

Fantastic news Kwea, a pass is a pass! Worry about being a better nurse tomorrow instead of yesterday.

Posted by just_me (Member # 3302) on :

Kwea - Congrats, and that's great news!!!

quote:
I got a 78%, in part because I was so nervous I second-guessed myself. A lot of these nursing questions are not knowledge based questions, but "pick the BEST" type questions, where more than one answer is right.

The usual way of doing multiple choice (1 right answer for full credit, no credit for anything else) is a big problem with multiple choice tests in general, and it sounds like a bigger problem for ones in the medical profession in particular.

I took a decision analysis class once where the professor addressed this in an interesting way. All the tests were multiple choice, but you didn't just pick one answer. Instead you indicated the probability you felt each one was the right answer, and your score was based on the probability you had assigned to the right answer... As long as you gave it higher than 25% you got some amount of positive credit for the question. (under 25% you actually got negative credit...)

Seems like some method of accounting for the "not quite as right" answers would be a good idea on nursing exams too. Maybe by making the "second best" answer partial credit or something.

Posted by Kwea (Member # 2199) on :

I am already a better nurse than some of the ones I have had in the past...and that isn't bragging. They were that bad. [Big Grin]

But I have had some great ones too.

I figure since I got one of the TWO best answers, I am doing OK.

I feel bad though....I think my class lost about 10 people today, and some of them would have been great nurses.

Posted by King of Men (Member # 6684) on :

quote:
As long as you gave it higher than 25% you got some amount of positive credit for the question. (under 25% you actually got negative credit...)

If the professor were doing this in the canonical way, you would get negative infinity points if you assigned 0 probability to the right answer. Presumably you would then fail the course.

Posted by adenam (Member # 11902) on :

Congratz!

Posted by fugu13 (Member # 2859) on :

Tell me, KoM, what's the canonical way?

Posted by Kwea (Member # 2199) on :

give every option a 1$ chance then. [Big Grin]

After adjustments, I ended up getting an 86% on the final exam. LOL

Posted by King of Men (Member # 6684) on :

quote:
Originally posted by fugu13:
Tell me, KoM, what's the canonical way?

Log of the probability assigned to the right answer. Add a scaling factor if you like positive numbers.
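
A minimal sketch of that rule, for anyone following along. The only part taken from the thread is "log of the probability assigned to the right answer"; the particular scaling (log base equal to the number of options, plus an offset of 1) is an assumption, chosen so that 25% on a four-option question breaks even as just_me described. The function name is hypothetical.

```python
import math


def log_score(probabilities, correct_index):
    """Score one question as 1 + log_n(p).

    p is the probability the student put on the right answer and n is the
    number of options. The offset and base are an assumed scaling, not
    something stated in the thread.
    """
    n_options = len(probabilities)
    p = probabilities[correct_index]
    if p == 0:
        # log(0) is undefined; the rule treats it as negative infinity.
        return float("-inf")
    return 1 + math.log(p) / math.log(n_options)


# A confident correct answer, an even split, and a confident wrong answer.
print(log_score([0.85, 0.05, 0.05, 0.05], correct_index=0))
print(log_score([0.25, 0.25, 0.25, 0.25], correct_index=0))
print(log_score([0.0, 1.0, 0.0, 0.0], correct_index=0))
```

With this scaling, full confidence in the right answer earns 1 point, a uniform guess earns 0, anything under 25% goes negative, and a flat 0 on the right answer is unrecoverable.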

Posted by The Rabbit (Member # 671) on :

quote:
Originally posted by Kwea:
give every option a 1$ chance then.

After adjustments, I ended up getting an 86% on the final exam. LOL

For the method to work, there would have to be a requirement that the sum of the probabilities for all answers must = 1. To account for inadvertent mistakes in adding, you could sum the probabilities given each answer and scale the total to 1. That way giving every option a 1 would be equivalent to giving every option 0.25 (assuming 4 options).
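
The rescaling step could look something like the sketch below. It is only an illustration of the idea in this post; the function name and the choice to reject negative or all-zero entries are assumptions.

```python
def normalize(weights):
    """Rescale a student's entries for one question so they sum to 1."""
    if any(w < 0 for w in weights):
        raise ValueError("entries cannot be negative")
    total = sum(weights)
    if total == 0:
        raise ValueError("at least one option needs a positive entry")
    return [w / total for w in weights]


# Giving every option a 1 comes out the same as giving every option 0.25.
print(normalize([1, 1, 1, 1]))
# An adding mistake (entries total 1.2) is scaled back down.
print(normalize([0.6, 0.3, 0.2, 0.1]))
```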

BTW, what kind of adjustments were made? Was this based on differences between paper and electronic versions or something else?

Posted by Kwea (Member # 2199) on :

Since so many of the questions are less quantifiable than normal knowledge based questions, the teachers review each test after the class has finished with it. One of the reasons the Angel system IS good at times is that it allows the teachers to receive data by the question, and they can automatically see the class average on not just the test as a whole, but by question.

If a question is missed by 65% or more of the class, they go and look at it. Sometimes they miskeyed an answer; sometimes the info they gave us in class is different than the book, and a few times they had accidentally tested us on material we had not even covered yet, or had covered on a previous test.

We all get scrap paper, and before we leave the testing room we are allowed to write down any questions we have about the test. Any questions have to be written, and quite often the class is right....we had been given different info than the answer indicated.

So our grade is fluid, sort of, until the teacher reviews the stats. You won't be penalized, unless the answer was miskeyed, but your grade MAY rise. It depends on what you answered.

Turns out this was my highest grade of the semester for this class. [Big Grin]

Posted by fugu13 (Member # 2859) on :

Ah, so the canonical way is the one that creates substantial variations in score for very small differences in probability estimates. Too bad that's nonsensical for scoring tests.

An approach grounded in statistics could make some sense. I see a few possible approaches (fitting a bi- or trimodal to see what probabilities a student assigns and avoid penalizing students who have different levels of caution but similar levels of understanding, modeling the test and using confidence intervals in the scoring somehow, all sorts of things), but most of them would be no better than or inferior to summing the probabilities assigned (which is strictly equivalent to estimating the number they would have gotten right by using those probabilities independently) and then picking reasonable score intervals.
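
One way to read the equivalence claimed in that parenthesis, sketched under an assumption: the student answers each question by picking at random according to the stated probabilities, independently per question. Under that reading, the expected number right equals the sum of the probabilities placed on the right answers. Both function names are made up for the example, and the brute-force check is only practical for short tests.

```python
from itertools import product


def linear_score(p_correct):
    """Sum of the probabilities the student put on each right answer."""
    return sum(p_correct)


def expected_number_right(p_correct):
    """Brute-force expectation over every right/wrong pattern.

    Assumes each question is answered independently, and answered correctly
    with the probability the student assigned to the right option.
    """
    expected = 0.0
    for outcome in product((True, False), repeat=len(p_correct)):
        pattern_probability = 1.0
        for got_it, p in zip(outcome, p_correct):
            pattern_probability *= p if got_it else 1 - p
        expected += pattern_probability * sum(outcome)
    return expected


p_on_right_answers = [0.7, 0.25, 1.0, 0.4]
print(linear_score(p_on_right_answers))
print(expected_number_right(p_on_right_answers))
```

The two numbers agree up to floating-point rounding, which is just linearity of expectation.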

Posted by Derrell (Member # 6062) on :

Congratulations on passing. [Hat]

Sorry about all the stress. Hopefully, you won't have to deal with it anymore.

Posted by just_me (Member # 3302) on :

quote:
Originally posted by King of Men:

quote:
As long as you gave it higher than 25% you got some amount of positive credit for the question. (under 25% you actually got negative credit...)
If the professor were doing this in the canonical way, you would get negative infinity points if you assigned 0 probability to the right answer. Presumably you would then fail the course.

Exactly... and he meant it. He had everyone sign something on the first day of class acknowledging that we understood the grading scheme and that a 0% on the right answer would result in immediate failure of the class.

quote:
Originally posted by fugu13:
Ah, so the canonical way is the one that creates substantial variations in score for very small differences in probability estimates. Too bad that's nonsensical for scoring tests.

An approach grounded in statistics could make some sense. I see a few possible approaches (fitting a bi- or trimodal to see what probabilities a student assigns and avoid penalizing students who have different levels of caution but similar levels of understanding, modeling the test and using confidence intervals in the scoring somehow, all sorts of things), but most of them would be no better than or inferior to summing the probabilities assigned (which is strictly equivalent to estimating the number they would have gotten right by using those probabilities independently) and then picking reasonable score intervals.

I'm not going to comment on whether or not this is a universally "good" or "bad" method of scoring (I'll let KoM deal with that since he's likely more qualified to really address it) but I will say that this method makes a lot of sense in the context of a decision analysis class, especially considering decision analysis is very much about the probability you assign to different outcomes.

Posted by fugu13 (Member # 2859) on :

just_me: I was speaking specifically of the logarithmic method of scoring, which I suspect your professor did not employ.

Posted by Sterling (Member # 8096) on :

quote:
Originally posted by fugu13:
Tell me, KoM, what's the canonical way?

There's a big cannon. They ask you to put your head inside the barrel. If you do so, you fail. [Big Grin]

Congratulations, Kwea.

Posted by King of Men (Member # 6684) on :

quote:
For the method to work, there would have to be a requirement that the sum of the probabilities for all answers must = 1.

I would auto-fail any student who gave me probabilities summing to more than one. If they sum to less than one, that's ok, it's an implicit assignment of probability to "none of the above" - although generally I would not be so evil as to make all four answers be wrong, and I'd expect students to know that.

quote:
Ah, so the canonical way is the one that creates substantial variations in score for very small differences in probability estimates. Too bad that's nonsensical for scoring tests.

Um, no? You have to decrease your probability estimate by a whole order of magnitude to change the score by 2. (Roughly.) And in any case, I said nothing about how raw scores translate to grades. Suppose a low-A student assigns 95% confidence to the right answer in 90% of cases, and 10% in the rest, then his score per question is -0.28. On a hundred questions, the A boundary is at -28. If you were slightly better than the boundary, say with a 92% chance of the 95% probability, then your expected score would be -23, and you'd need to change your estimate of no less than three questions by a whole order of magnitude - go from 95% to 9.5% - to drop down to a B. Looks quite reasonable to me.
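The per-question figures above can be checked with a short sketch. It assumes the natural log is meant (which is what reproduces -0.28), and `expectedLogScore` is just an illustrative name.

```cpp
#include <cmath>

// Expected log score per question for a student who puts probability pHigh
// on the right answer a fraction hitRate of the time, and probability pLow
// on it the rest of the time. Natural log is assumed.
double expectedLogScore(double hitRate, double pHigh, double pLow) {
  return hitRate * std::log(pHigh) + (1.0 - hitRate) * std::log(pLow);
}
```

Calling `expectedLogScore(0.90, 0.95, 0.10)` gives roughly -0.28 per question and `expectedLogScore(0.92, 0.95, 0.10)` roughly -0.23, so about -28 and -23 over a hundred questions.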

Posted by The Rabbit (Member # 671) on :

quote:
I would auto-fail any student who gave me probabilities summing to more than one. If they sum to less than one, that's ok, it's an implicit assignment of probability to "none of the above" - although generally I would not be so evil as to make all four answers be wrong, and I'd expect students to know that.

That's extraordinarily harsh. I've given a lot of tests and I can tell you that the probability that some students might inadvertently assign probabilities that added to over 100% in a high-pressure exam is very nearly unity. Auto-failing a student because they incorrectly added 4 numbers on a high-pressure exam defeats the purpose of giving exams, unless you're testing people on being able to add correctly under pressure. That is the sort of thing that someone would only suggest if they'd never actually been responsible for a class.

Posted by The Rabbit (Member # 671) on :

quote:
Originally posted by King of Men:

quote:
Originally posted by fugu13:
Tell me, KoM, what's the canonical way?
Log of the probability assigned to the right answer. Add a scaling factor if you like positive numbers.

It would be interesting to try to devise an optimum test-taking strategy for such an exam. First off, assigning any answer a 0 probability would be extremely risky. If you are wrong and that answer turns out to be correct, you fail. So most students would automatically give every answer some non-zero probability. The difficulty is that the scale is non-linear, so the potential loss from giving an answer a very low probability is much greater than the potential gain from giving a correct answer a very high probability.

So presume you adopt a strategy where you give probability X to the answer you think is best and probability (1-X)/4 to all the other answers (this presumes there are 5 options). It turns out that the optimum value of X is very nearly equal to the fraction you get right, so if you are normally an 80% student, your best bet would be X=0.8, but if you are only a 60% student, your best bet would be X=0.6. But in either case, the penalty for being overconfident is greater than the penalty for being overcautious.

The curious thing is that the curve has a much sharper optimum if you get 90% of the answers correct than if you get only 60 to 70% of the answers correct. So the 90% student pays a significant penalty for being either over or under cautious, but the 60% student doesn't really pay much of a penalty unless they choose an X less than around 45% or greater than 85%.
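A rough way to reproduce this is a brute-force search over X. The sketch below assumes exactly the strategy described (X on the favoured answer, the remainder split evenly over the other options, natural-log scoring); the names `expectedScore` and `bestX` are invented for the illustration.

```cpp
#include <cmath>

// Expected log score per question when the student puts X on their favoured
// answer and (1 - X) / (options - 1) on each other answer, and the favoured
// answer is right a fraction fracRight of the time.
double expectedScore(double X, double fracRight, int options = 5) {
  return fracRight * std::log(X) +
         (1.0 - fracRight) * std::log((1.0 - X) / (options - 1));
}

// Try X = 0.001, 0.002, ..., 0.999 and keep the value with the best
// expected score.
double bestX(double fracRight, int options = 5) {
  double best = 0.001;
  for (int i = 1; i < 1000; ++i) {
    double X = i / 1000.0;
    if (expectedScore(X, fracRight, options) >
        expectedScore(best, fracRight, options)) {
      best = X;
    }
  }
  return best;
}
```

Under these assumptions `bestX(0.8)` lands on 0.8 and `bestX(0.6)` on 0.6, and for an 80% student `expectedScore(0.9, 0.8)` is lower than `expectedScore(0.7, 0.8)`, which is the overconfidence penalty being larger than the overcaution one.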

[ July 30, 2009, 06:45 AM: Message edited by: The Rabbit ]

Posted by fugu13 (Member # 2859) on :

KoM: imagine two people decided to assign a vanishingly small probability to an answer, and one happened to choose .01, and the other happened to choose .02. Despite that single percent difference being far too small for a human to follow the difference, it would result in a test score difference of .7; while that doesn't sound like much, that's the same difference in score as between a student who put .3 and one who put .15 -- a fifteen percent difference in probability estimate -- or between a student who put .9 and one who put .45!
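The point that only the ratio matters can be sketched as follows, assuming natural-log scoring; `logScoreGain` is a hypothetical helper name.

```cpp
#include <cmath>

// Change in log score when the probability placed on the right answer goes
// from pBefore to pAfter. It depends only on the ratio pAfter / pBefore.
double logScoreGain(double pBefore, double pAfter) {
  return std::log(pAfter) - std::log(pBefore);
}
```

`logScoreGain(0.01, 0.02)`, `logScoreGain(0.15, 0.30)` and `logScoreGain(0.45, 0.90)` all come out to ln 2, about 0.69, because each doubles the probability.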

Summing over the log is a nonsensical way to score a test, no matter what scale you use. Answer differences that are purely variations on perceived infinitesimals result in gigantic score variations in comparison to answer differences grounded in substantial consideration. It doesn't matter how you divide up the scale: it will frequently give a much worse score to one of two people who were within a couple percent of agreement on every question where they gave high probabilities to the right answers, just because one of them used a slightly smaller infinitesimal on questions where they were "sure" the right answer was wrong.

The only successful counter-strategy is to give almost entirely equally weighted answers, with a slight increase in the probability given for the one they think is right, and that would clearly be a lie -- but the consequences of getting a question wrong even once and assigning a small estimate to the right answer would dominate the consequences of getting several answers right.

Go ahead and name any strategy that assigns an approximately honest estimate of probability to answers and I'll give you one where a strategy that assigns a dishonest probability will dominate it at typical rates of error seen on student tests. Feel free to give estimates for the error rates in different confidence bands, too.

[ July 30, 2009, 07:36 AM: Message edited by: fugu13 ]

Posted by The Rabbit (Member # 671) on :

You are absolutely correct, fugu: the benefit gained by increasing one probability from, say, 0.99 to 0.999 will never outweigh the risk of taking another probability from 0.01 to 0.001.

Posted by just_me (Member # 3302) on :

quote:
Originally posted by fugu13:
just_me: I was speaking specifically of the logarithmic method of scoring, which I suspect your professor did not employ.

He *did* employ it.

I don't recall how he would deal with it if we ever had the total probability be >1. It never came up. As this was a graduate level course the simple math wasn't an issue (or better not have been). I *think* the spreadsheet he used for scoring just scaled it to 1, though (this would also let everyone enter the probabilities as decimals or not (.25 or 25 for 25% for example) and still get scored properly)

I never put a 0 for anything, but I would go down to .5% if I was pretty sure something was "definitely" wrong... As I recall (it's been a few years) I'd usually start with one of the following distributions and then "tweak" it for the specific problem:

1) 85%, 5%, 5%, 5% (reasonably sure of 1 answer)
2) 45%, 45%, 5%, 5% (reduced to 2 answers)
3) 97%, 1%, 1%, 1% (very sure of 1 answer)

Posted by fugu13 (Member # 2859) on :

He employed logarithmic scoring, so answering 2% for something instead of 1% got as much of a score increase as answering 90% instead of 45%?

(edit: assuming the answer was correct)

[ July 30, 2009, 09:26 AM: Message edited by: fugu13 ]

Posted by The Rabbit (Member # 671) on :

quote:
I don't recall how he would deal with it if we ever had the total probability be >1. It never came up. As this was a graduate level course the simple math wasn't an issue (or better not have been).

Simple math isn't an issue in any of my engineering classes either, but I can't tell you how many times I've seen students add 2 and 3 on an exam and get 6. Under the stress of an exam, students will very frequently make stupid simple errors that they undoubtedly know are incorrect. A reasonable exam does not unduly penalize students for that kind of error.

Posted by The Rabbit (Member # 671) on :

quote:

1) 85%, 5%, 5%, 5% (reasonably sure of 1 answer)
2) 45%, 45%, 5%, 5% (reduced to 2 answers)
3) 97%, 1%, 1%, 1% (very sure of 1 answer)

In this scheme, if you choose to go with option 3) rather than option 1) and you are right, it raises your score by 0.13. But if you are wrong, it drops your score by 1.6 (more than 10 times as much). Or to put it another way, the extra credit from answering 10 questions right using option 3 instead of option 1 would not make up for answering one question wrong with option 3. If you use option 1, you would have to get about 80% correct in order to do as well as using option 2 and getting all correct.

A canonical scoring system is strongly biased against overconfidence.
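The 0.13 and 1.6 figures can be reproduced with a small sketch, assuming natural-log scoring and the 85/5/5/5 and 97/1/1/1 spreads quoted above. The helper names are invented for the illustration.

```cpp
#include <cmath>

// Change in log score on one question when the probability left on the
// answer that turns out to be right moves from pSafe to pBold.
double logScoreChange(double pSafe, double pBold) {
  return std::log(pBold) - std::log(pSafe);
}

// Fraction of questions you must get right for the bolder spread to match
// the safer one, from gain * f = loss * (1 - f).
double breakEvenAccuracy(double gainIfRight, double lossIfWrong) {
  return lossIfWrong / (gainIfRight + lossIfWrong);
}
```

With `gain = logScoreChange(0.85, 0.97)` (about 0.13) and `loss = -logScoreChange(0.05, 0.01)` (about 1.6), `breakEvenAccuracy(gain, loss)` comes out a little above 92%: the bolder spread only pays if you are right more often than that.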

Posted by fugu13 (Member # 2859) on :

I mostly found it amusing KoM called it the "canonical" scoring system, since I see no rhyme or reason to why it should be so, other than possibly some vague appeal to the logit function (which is definitely more appropriate than the log, and might even have a reasonable argument for it, though I'd want to explore the way it would score various real-world situations).

Posted by King of Men (Member # 6684) on :

The canonical feature is failing the course if you assign a 0 probability to the right answer. The rest is details. [Smile]

Posted by fugu13 (Member # 2859) on :

KoM: then perhaps you shouldn't have responded to a question about what the canonical way was with "log of the probability assigned to the right answer" [Wink]

Logit would be worth exploring, now.

Posted by King of Men (Member # 6684) on :

Hmm? "Log of the probability assigned to the right answer" does have the canonical feature. [Confused]

Posted by King of Men (Member # 6684) on :

Anyway, a more serious answer to the objection that a wrong change from 2% to 1% is more harshly punished than a correct change from 98% to 99% is rewarded. I do not see this as a problem, for two reasons: First, this test is much more about self-knowledge than it is about subject-matter knowledge. Which for a college course is, in some sense, just what you need! If you feel quite certain that you know the right answer, and historically, when you feel this level of certainty you are right 60% of the time, then 60% is the correct assignment; in the long run you cannot form a strategy to beat this. That's the purpose of the test; it teaches you to evaluate your own performance, rather than rote knowledge.

Second, few people are actually capable of distinguishing between a 1% and a 2% belief anyway; certainly I'm not. Any student who agonises over whether to put 1% or 2% is, right there, Doin It Rong. Put 3% and be done. For this reason I'd be happy to modify the test so you could only put a few different degrees of belief - say 99%, 90%, 75%, 50%, 33%, 10%, and 1%. Rescale the total as necessary. This does away with the problem of the single-percent difference without requiring anyone to agonise over whether they really believe this answer has only a 1% chance of being right, or maybe it's 2%? But I'd also be ok with leaving the probability assignments open, and merely giving students the advice that they should use the fixed numbers I gave above. They don't have to do it the smart way if they don't want to.

Posted by fugu13 (Member # 2859) on :

A change from 2% to 1% isn't just punished more than a change from 98% to 99% is rewarded, it is punished more than a change from 45% to 89% is rewarded.

Posted by King of Men (Member # 6684) on :

I still don't think that's a problem, for the reasons given above.

Posted by fugu13 (Member # 2859) on :

Putting your best guess probability estimate dominates only if people are absolutely correct about the probabilities they will get answers correct. If there's even a 1% confidence interval, being slightly off in personal probability estimate can create large gaps in scores for tiny variations in probability given, and the confidence interval will be quite a bit larger than that for most people. If a student's probability estimates are likely to be incorrect, they can create a strategy that will dominate for questions with one very confident answer by putting slightly higher estimates for the perceived wrong answers.

Not only that, but unless there are a large number of questions on the test, despite the expected value of putting an accurate probability estimate being highest, the variance will be huge, making it quite hard to discern two people of equal ability.

And I don't see how in a four answer multiple choice test someone can put probability assessments that sum to one on all four parts of the answer when using almost any of your numbers, so that suggestion doesn't work. Any combination which allows a moderate amount of discrimination at the high end must necessarily allow quite narrow discrimination at the low end (at least three times smaller, and that's assuming no one ever has an unlikely but still second best answer).

Posted by King of Men (Member # 6684) on :

I did say to rescale as needed.

quote:
If a student's probability estimates are likely to be incorrect, they can create a strategy that will dominate for questions with one very confident answer by putting slightly higher estimates for the perceived wrong answers.

I must say I do not see the distinction. If you know that your estimate is likely too high, and correct accordingly, that's exactly equivalent to just lowering your estimate a bit. Bingo, just the effect I wanted.

Posted by fugu13 (Member # 2859) on :

It doesn't matter how you rescale if the ordering is screwed up due to a high variance. For instance, the scores of twenty people who (accurately) assess the probability of all correct answers at 90% will range between numbers like -2 and -20 quite frequently, and the scores of twenty people who accurately assess the probability of all correct answers at 80% will range between numbers like -4 and -26, a huge overlap.

Whereas if you just had people pick the right answer and scored that, the scores still overlap, but only by around half the range, instead of the vast majority.

And if you're changing your estimates to be different from what you consider the actual probability because that strategy is more successful, that means the scoring system is a bad adaptation of the goal, which is presumably to have people accurately reveal the probabilities they've assessed.

Posted by King of Men (Member # 6684) on :

quote:
It doesn't matter how you rescale if the ordering is screwed up due to a high variance.

I meant, rescale the assigned probabilities to make them total 1, in accordance with your criticism that the sum wouldn't come out right.

The variance problem is easily soluble, just take more statistics. What's funding for? [Big Grin]

Posted by The Rabbit (Member # 671) on :

quote:
Originally posted by King of Men:
The canonical feature is failing the course if you assign a 0 probability to the right answer. The rest is details.

Why do you see this as a desirable feature? What benefit is gained by awarding such a severe penalty to someone who assigns a 0 probability to the right answer? Why do you think that a student who assigned 100% to the right answer 99% of the time, and 0% 1 percent of the time, deserves not only to fail but to get a lower grade than a student who assigned 1% to the correct answer 100% of the time? What possible advantage is there to it? Not only does it yield what I would consider highly nonsensical results but it is so easy for students to avoid the penalty by never assigning a zero probability even when they are certain an answer is wrong. I.e., it encourages dishonesty.
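The comparison between those two students can be made concrete with a sketch, assuming a 100-question test and natural-log scoring (`totalLogScore` is an illustrative name):

```cpp
#include <cmath>
#include <vector>

// Total log score: sum of log(probability placed on the right answer).
// A single 0 drives the total to negative infinity.
double totalLogScore(const std::vector<double>& probOnRightAnswer) {
  double total = 0.0;
  for (double p : probOnRightAnswer) total += std::log(p);
  return total;
}
```

A student with 99 entries of 1.0 and one entry of 0.0 gets negative infinity, while a student who put 0.01 on every right answer gets a finite total of about -460, so the second student ranks higher.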

Posted by just_me (Member # 3302) on :

quote:
Originally posted by The Rabbit:
Not only does it yield what I would consider highly nonsensical results but it is so easy for students to avoid the penalty by never assigning a zero probability even when they are certain an answer is wrong. I.e., it encourages dishonesty.

Again, I'll caveat this by saying that I am confining my discussion to the context of the specific application (class) I encountered this scoring.

I don't think it encourages dishonesty... it encourages you to really think about your definition of the word "certain". If you really are certain that an answer is wrong go ahead and put 0% - nothing bad will happen if you're right after all. If it turns out you were wrong then you were "absolutely certain" about the wrong thing.

In my opinion it encourages a higher level of self honesty, actually. You have to really think about how certain you are... and I often found myself dismissing an answer and then upon thinking about it more realizing I wasn't really prepared to put 0% down... I just wasn't quite that sure after all.

The point in the context of a decision analysis class is that if you're going to be absolutely certain about something you *better* be right. Decision analysis relies extremely heavily on assigning probabilities to outcomes, and this scoring method made each question an exercise in doing just that.

I'm sure there are other systems that could be used that let you assign probabilities to determine score without having the exponential changes you get using the system this professor implemented... but I'll punt on that discussion and just say that I felt this worked out fine in the case of this specific application.

Posted by fugu13 (Member # 2859) on :

KoM: Or just using a more sensible scoring system that requires no particular contortions to avoid undesirable effects [Razz]

As for your rescaling suggestion, that doesn't get rid of any problems at all. The choice will still either be restricted too much or induce counter-factual strategies. Indeed, it just makes the connection with probability even harder for the student to grasp.

As for scoring systems, it looks like the overlap between an accurately self-assessed 90 percenter and 80 percenter is well less than half the range, and frequently not at all, using logit scoring (20 people, 20 questions). That's better than a "pick the best answer" approach.
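The thread never spells out what "logit scoring" means, so the following is only one plausible reading: score each question by the logit of the probability placed on the right answer. The name `logitScore` is an assumption made for this sketch.

```cpp
#include <cmath>

// Logit of the probability placed on the right answer: log(p / (1 - p)).
// It is 0 at p = 0.5 and symmetric around it, and p = 0 still gives
// negative infinity.
double logitScore(double p) {
  return std::log(p / (1.0 - p));
}
```

Under this reading, moving from 98% to 99% on a right answer earns as much as moving from 2% to 1% costs, which is the symmetry plain log scoring lacks.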

Posted by Kwea (Member # 2199) on :

I love thread drift. [Big Grin]

Posted by King of Men (Member # 6684) on :

The variance problem doesn't seem so bad as all that, actually. This plot:

http://www.slac.stanford.edu/~rolfa/variance100.eps

shows the scores of 10000 students on 100 questions - that's four 25-question tests over a quarter, which is reasonable. The red points are the scores of eighty-percent students on the log system; the blue points are scores of the ninety-percent students. Green and black are the same students when they get -1 points for a wrong answer, and 0 for a right one. Now, count the overlap between red and blue: It is a rough triangle of height 400 students and half-width 5 bins, making 2000 students. The overlap between green and black is a triangle of height 350 and width likewise five bins, making 1750 students. Not a huge difference, no? And, of course, under the red-blue system you get to flunk the ones so stupid as to put 0% on correct answers. The purpose of the grader is to flip out and fail people! They are totally sweet!

Here's the code:

code:

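#include <cmath>
#include <cstdio>

#include "TCanvas.h"
#include "TH1F.h"
#include "TRandom.h"

// ROOT macro. Simulates 10000 students at each of two ability levels and
// plots their scores under log scoring and under flat -1-per-miss scoring.
void makePlots (int questions = 25) {
  // Log scores: ninety-percent students in blue, eighty-percent in red.
  TH1F* ninetypc = new TH1F("ninetypc", "", 300, -35, 3);
  ninetypc->SetStats(false);
  ninetypc->SetMarkerColor(kBlue);
  ninetypc->SetMarkerStyle(8);
  ninetypc->SetMarkerSize(0.5);
  TH1F* eightypc = new TH1F("eightypc", "", 300, -35, 3);
  eightypc->SetStats(false);
  eightypc->SetMarkerColor(kRed);
  eightypc->SetMarkerStyle(8);
  eightypc->SetMarkerSize(0.5);

  // Flat scores: ninety-percent students in black, eighty-percent in green.
  TH1F* ninetyflat = new TH1F("ninetyflat", "", 300, -35, 3);
  ninetyflat->SetStats(false);
  ninetyflat->SetMarkerColor(kBlack);
  ninetyflat->SetMarkerStyle(8);
  ninetyflat->SetMarkerSize(0.5);
  TH1F* eightyflat = new TH1F("eightyflat", "", 300, -35, 3);
  eightyflat->SetStats(false);
  eightyflat->SetMarkerColor(kGreen);
  eightyflat->SetMarkerStyle(8);
  eightyflat->SetMarkerSize(0.5);


  TRandom blah;
  for (int i = 0; i < 10000; ++i) {
    double ninetyscore = 0;   // log score, ninety-percent student
    double eightyscore = 0;   // log score, eighty-percent student
    double ninetypoints = 0;  // flat score, ninety-percent student
    double eightypoints = 0;  // flat score, eighty-percent student
    for (int j = 0; j < questions; ++j) {
      // Right with probability 0.9, having put 0.9 on the right answer.
      if (blah.Uniform() < 0.9) ninetyscore += std::log(0.9);
      else {
        // Wrong: only 0.03 was left on the right answer
        // (roughly the remaining 0.1 split three ways).
        ninetyscore += std::log(0.03);
        ninetypoints--;
      }
      // Same for the eighty-percent student, with 0.06 left when wrong.
      if (blah.Uniform() < 0.8) eightyscore += std::log(0.8);
      else {
        eightyscore += std::log(0.06);
        eightypoints--;
      }
    }

    // Rescale to a 25-question equivalent so the fixed histogram range fits.
    ninetypc->Fill(ninetyscore*(25.0/questions));
    eightypc->Fill(eightyscore*(25.0/questions));
    ninetyflat->Fill(ninetypoints*(25.0/questions));
    eightyflat->Fill(eightypoints*(25.0/questions));
  }

  // Draw all four histograms as points on one canvas and save as EPS.
  TCanvas foo;
  ninetyflat->Draw("p");
  ninetypc->Draw("psame");
  eightypc->Draw("psame");
  eightyflat->Draw("psame");
  char fname[200];
  snprintf(fname, sizeof(fname), "variance%i.eps", questions);
  foo.SaveAs(fname);
}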
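
For anyone without ROOT installed, the core of the macro can be approximated with only the standard library. This is a sketch rather than a port: it uses `std::mt19937` instead of `TRandom`, skips the plotting and the 25/questions rescaling, and the function names are made up.

```cpp
#include <cmath>
#include <random>

// Log score of one simulated student over `questions` questions. The student
// is right with probability pRight and has put pRight on the right answer
// then; otherwise only pWhenWrong was left on it (0.03 and 0.06 in the macro).
double simulateLogScore(std::mt19937& rng, int questions, double pRight,
                        double pWhenWrong) {
  std::bernoulli_distribution hit(pRight);
  double score = 0.0;
  for (int j = 0; j < questions; ++j) {
    score += std::log(hit(rng) ? pRight : pWhenWrong);
  }
  return score;
}

// Mean unscaled log score over many simulated students.
double meanLogScore(unsigned seed, int students, int questions, double pRight,
                    double pWhenWrong) {
  std::mt19937 rng(seed);
  double total = 0.0;
  for (int i = 0; i < students; ++i) {
    total += simulateLogScore(rng, questions, pRight, pWhenWrong);
  }
  return total / students;
}
```

Collecting the individual `simulateLogScore` values into histograms for the 0.9/0.03 and 0.8/0.06 students would give the red and blue distributions to compare for overlap.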

Posted by fugu13 (Member # 2859) on :

Ah yes, a distinct negative movement in ability to discriminate among students over the simple, well known approach, plus requiring professors to teach students how to take a new kind of test. Sounds spectacular [Wink]

Try the logit function [Smile]

Posted by King of Men (Member # 6684) on :

Everything's a tradeoff. I've given quite a few reasons this is a good system; the variance is not much increased over the linear one. In fact, now I think about it, your ability to distinguish between bad students may be increased.

Posted by fugu13 (Member # 2859) on :

You should try running your simulation before you make assertions like that [Wink]

So far there has been one, single identified positive, and that arguable: being able to fail people who put a zero probability for something that turns out to be correct. That this will hardly ever occur once you alert students to it makes it barely even noticeable. About the only other positive over pick-the-one-right-answer I can find is for when students really can't distinguish between two answers, and that is equally well handled by the student randomizing among the equal choices, as they often do.

Having an additional 12% of students one is unable to distinguish among accurately is a rather large increase that would result in distinctly worse grading, especially if adopted in classes affecting several hundred students. The system looks even worse when there's an alternative (have you run your code using logit yet?) that makes the ability to distinguish even greater, and I'm pretty sure will preserve any "positives" (including the ability to fail people putting zero for something that turns out to be correct).

Posted by King of Men (Member # 6684) on :

Where is your 12% coming from? The additional variance affects 250 out of 20k students; that's 1.25% by my math. You're also ignoring the benefit of teaching them to think about what they mean by certainty, plus honest self-assessment.

Posted by Dr Strangelove (Member # 8331) on :

Kwea, Angel does seem to operate very much like an electronic version of scantrons. I actually proctored a distance learning exam test two days ago and the student was using Angel. I didn't get too good of a look at it, but it's definitely not a service I would choose to use. But, whatever works I suppose. Congrats on passing!

Posted by fugu13 (Member # 2859) on :

It is actually more than 12%. Number previously impossible to discriminate: ~1750; number now impossible to discriminate with your approach: ~2000. An increase of a bit over 12% (actually, nearly 15).

And your numbers are predicated on a fairly large number of questions answered in this way, plus the students being capable of answering them accurately from the beginning of the semester. If you run your simulation with even moderate student error as to their probability assessments, you'll find that variation dominates over whether or not they're actually about right.

As for the benefits listed, you can teach those things in many ways other than scoring systems, and certainly using logit scoring.

Posted by King of Men (Member # 6684) on :

I don't think that's the right way to calculate the increase; if some method had only two students' overlap (in, say, ten thousand), and another had ten but was far easier to explain, would you be impressed by the 400% increase in the effect of variance?

Posted by Kwea (Member # 2199) on :

After almost 6 weeks since meeting with the Dean of the nursing program, nothing has been done to either address my grades OR to prevent this from happening again.

I was told today that the best case possible is that I go through an official grievance process, which will take at LEAST 3 weeks, and IF I win they make me take ANOTHER TEST, 2-3 months after the class has ended.

I responded by threatening to notify all 19 students who have flunked out this year of the problems, and offering to testify for them against the school if necessary.

They are now taking me seriously, and I meet with the Dean of the entire Health and Human Services on Tuesday.

Posted by Corwin (Member # 5705) on :

Crap. It looked promising at first. I hope they do something this time. They probably figured you'd give up at some point...

Posted by Kwea (Member # 2199) on :

I met with the Vice-President of the Health and Science department yesterday, and he was very interested in what happened. He is looking into it, and I am far more confident he will actually look into it and get back to me.

I also addressed the entire system, and told him I felt the process was punitive to any student who attempted to use it. He agreed, and said that even if I retest, he would not support any system that would result in a lower grade than the original one.

And he was most interested in the lack of communication between the teachers, the teachers and the administration, and with the students in general. I'd say that bothered him almost as much as it bothered me.

I showed up on Tuesday about 10 min early, and was told my appointment with him was on Wed. I showed up Wed and was told it had been on Tuesday. Sometimes I just can't win....so I asked for the President's number and office number, and offered to walk over and wait for him.

They arranged for me to see the Vice-President within the hour. [Big Grin]