Hatrack River Forum   
Topic: Statistical independence
Paul Goldner
Member
Member # 1910

First off, this question is not for homework: I'm trying to get a handle on something for my own edification.

The sum of two gaussian distributions is itself a gaussian distribution, if the random variables being looked at, X and Y, are independent.

X and Y are independent random variables if the events X<a and Y<b are independent events for any numbers a and b, and events A and B are independent if Pr(A and B) = Pr(A) Pr(B).

Now, the two random variables I'm looking at are, let's say, D and L.

D occurs 90% of the time, and L occurs 10% of the time. In addition, the random variable measured in L is the random variable measured in D + a random variable Q.

That is, L has a slightly higher average and slightly higher variance than does D, but is related to D.

L and D both are gaussian. I want to know if L+D is gaussian, which would be the case if L and D are independent random variables.

But I don't think they are, because L occurs much less frequently than D. If I choose a and b as the averages of L and D, the probability that L<a is 1/2, and the probability that D<b is 1/2, and so the product of those probabilities is 1/4. But that is not the probability of the intersection of the events.

The mistake that I could be making in my thinking, or at least a mistake, is that when determining the probability L<a, I have to weight this by the 10% chance that L is an event that occurs and the 90% chance that D is an event that occurs. If I do have to perform this weighting in determining whether or not L and D are independent, then L and D could indeed be independent.

Blagh. So. Thoughts? Thanks in advance [Smile]

Posts: 4112 | Registered: May 2001
fugu13
Member
Member # 2859

quote:
In addition, the random variable measured in L is the random variable measured in D + a random variable Q.
This makes them not independent, as long as Q is independent of D and D isn't just a constant. The reason is, once you know D, L is just that value plus Q, so Pr(L|D) is the distribution of Q shifted over by D. To be independent, Pr(L|D) must be Pr(L) no matter what D turned out to be, and a distribution that moves around with D can't do that.

The weighting thing is irrelevant.
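
A quick simulation can make the dependence concrete. This is only a sketch under assumptions that are not in the thread: D is standard normal, Q is normal with standard deviation 0.5 and independent of D, and L = D + Q. If L and D were independent, their covariance would be 0 and Pr(L < its mean and D < its mean) would be 1/2 * 1/2 = 1/4.

```python
import random

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 100_000

# Assumed setup (hypothetical numbers): D ~ N(0, 1), Q ~ N(0, 0.5^2), independent.
d_vals = [rng.gauss(0.0, 1.0) for _ in range(n)]
q_vals = [rng.gauss(0.0, 0.5) for _ in range(n)]
l_vals = [d + q for d, q in zip(d_vals, q_vals)]  # L = D + Q

mean_d = sum(d_vals) / n
mean_l = sum(l_vals) / n

# Sample covariance of L and D; theory gives Cov(D + Q, D) = Var(D) = 1.
cov_ld = sum((l - mean_l) * (d - mean_d) for l, d in zip(l_vals, d_vals)) / (n - 1)

# Fraction of draws where both variables fall below their own mean.
joint_below = sum(
    1 for l, d in zip(l_vals, d_vals) if l < mean_l and d < mean_d
) / n

print(f"Cov(L, D) ~ {cov_ld:.3f}")
print(f"Pr(L < mean, D < mean) ~ {joint_below:.3f} (independence would give 0.250)")
```

With these assumed variances the joint probability should land well above 1/4 (roughly 0.43 in theory), which is the intersection-of-events check from the first post failing.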

Posts: 15770 | Registered: Dec 2001
Paul Goldner
Member
Member # 1910

Thanks, fugu. I thought that they might not be independent for that reason, but my handle on stats isn't great, and I didn't think being mathematically dependent would make them necessarily statistically dependent.
Posts: 4112 | Registered: May 2001
SenojRetep
Member
Member # 8614

I'll start by saying I'm a bit confused by your description. Your statement "D occurs 90% of the time, and L occurs 10% of the time" leads me to think D and L are binary random variables, not Gaussian, as you've stated.

Now, I think what you mean is D is the event that some Gaussian r.v. (call it d) is greater than some threshold (call it d_0), and L is the event that d > d_0 + q, for some q. Is that right? If so, L and D are certainly not independent, and neither is L+D Gaussian (since L and D are binary, anyway).

Alternatively, this statement "L has a slightly higher average and slightly higher variance than does D, but is related to D" makes me think you really do mean for L and D to be Gaussian (rather than binary), and that L = D+Q, where Q is also Gaussian and independent of D. In this case, L and D are still not independent, but L+D = 2D+Q will be Gaussian (since D and Q are presumed independent).
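
Spelling that second case out, assuming D ~ N(\mu_D, \sigma_D^2) and Q ~ N(\mu_Q, \sigma_Q^2) are independent (these symbols are introduced here for illustration and are not in the original posts):

```latex
\begin{aligned}
L + D &= (D + Q) + D = 2D + Q, \\
\mathbb{E}[L + D] &= 2\mu_D + \mu_Q, \\
\operatorname{Var}(L + D) &= 4\sigma_D^2 + \sigma_Q^2, \\
\operatorname{Cov}(L, D) &= \operatorname{Var}(D) = \sigma_D^2 \neq 0 .
\end{aligned}
```

So under those assumptions L+D is Gaussian with that mean and variance, even though L and D themselves are correlated.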

<edit>Can you describe the problem you're actually modeling with these random variables? That might help.</edit>

Posts: 2926 | Registered: Sep 2005
Paul Goldner
Member
Member # 1910

I was thinking of the problem in a certain way just to try to wrap my head around something.

But the actual problem is T=a+M+N+P. And I'm trying to intuit the shape of the distribution function that describes T. I think it's similar to a gaussian, but with a longer tail on one side than the other, and a steeper climb on the short side compared to the long side.

a is just a number. M is a random variable. N is a random variable. And P is a random variable that is 0 90% of the time, and then 0-x 10% of the time. (All of these random variables are 0-whatever).

Posts: 4112 | Registered: May 2001
SenojRetep
Member
Member # 8614

So the question is what is the distribution of T where T = a+M+N+P.

a is a constant (non-random) and so won't affect the shape of the distribution.

P is a binary random variable, with parameter p = 0.1 (i.e. it's 1 10% of the time), presumably independent of M and N. This will have the following effect: your final distribution will be the weighted sum of two distributions, slightly shifted relative to each other. Let p(x) be the distribution of M+N. Then the distribution of T will be 0.9*p(x-a)+0.1*p(x-a-1). So if p(x) looks like a slug (steep-rise, gentle taper), the distribution of T will look like the sum of a large slug (with peak shifted to the right by a) plus a smaller slug (with peak shifted to the right by a+1).
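
The weighted-sum formula can be written directly as code. This is a sketch only: `slug_density` is a made-up stand-in for p(x) (a triangular density on [0, 3] with a steep rise and gentle taper), and `a = 2.0` is an arbitrary constant chosen for illustration.

```python
def slug_density(x: float) -> float:
    """Hypothetical stand-in for p(x): triangular on [0, 3], peak at x = 0.5."""
    if 0.0 <= x <= 0.5:
        return (2.0 / 3.0) * (x / 0.5)        # steep rise up to the peak
    if 0.5 < x <= 3.0:
        return (2.0 / 3.0) * (3.0 - x) / 2.5  # gentle taper back to zero
    return 0.0


def density_of_t(x: float, a: float, p=slug_density, prob_one: float = 0.1) -> float:
    """Density of T = a + (M + N) + P when P is 1 with probability prob_one, else 0."""
    return (1.0 - prob_one) * p(x - a) + prob_one * p(x - a - 1.0)


a = 2.0       # arbitrary constant offset
step = 0.001
# Riemann sum over [0, 8), which covers the whole support [a, a + 4].
total = sum(density_of_t(i * step, a) for i in range(8000)) * step
print(f"area under the density of T ~ {total:.4f}")
```

The area coming out close to 1 is a sanity check that the 0.9 and 0.1 weights still give a proper density.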

So the real question is what are the distributions of M and N, and are they independent of each other. Your statement that they are 0-whatever I take to mean they are non-negative, meaning they can't be Gaussians. So do you have a sense for the shape of their distributions? And are they related somehow to P, or is P independent?

Posts: 2926 | Registered: Sep 2005
Paul Goldner
Member
Member # 1910

M and N are distributions with ranges 0-c, and 0-d. Each value in each range has equal probability of occurring. Hypothetically, we could call c=10, and d=20. Which means that M+N gives us a normal distribution around 15. P is 0 90% of the time, and 0-25 (hypothetically) 10% of the time.
Posts: 4112 | Registered: May 2001
fugu13
Member
Member # 2859

Summing two uniform distributions that look like that doesn't give a normal distribution. For one thing, a normal distribution is not bounded, and the sum of those distributions is bounded (for c=10 and d=20 it is a flat-topped trapezoid on 0-30, not a bell curve).
Posts: 15770 | Registered: Dec 2001
SenojRetep
Member
Member # 8614

So M and N are uniform over the ranges [0,c] and [0,d] respectively? This means that any value in that range is equally likely. Are they also independent? If so, the distribution of M+N will be a tepee, centered at (c+d)/2.
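
For independent uniforms the density of M+N has a simple closed form, sketched below (the function name is just for illustration). One detail worth noting: the tepee only comes to a point when c = d; when they differ, as with c=10 and d=20, the top is flat between 10 and 20, though the shape is still symmetric about (c+d)/2.

```python
def sum_of_uniforms_density(s: float, c: float, d: float) -> float:
    """Density of M + N for independent M ~ U[0, c] and N ~ U[0, d]."""
    # Rises linearly from 0, is capped at min(c, d), then falls linearly to 0 at c + d.
    height = min(s, c, d, c + d - s)
    return max(0.0, height) / (c * d)


c, d = 10.0, 20.0  # the hypothetical values from the thread
for s in (0, 5, 10, 15, 20, 25, 30):
    print(s, sum_of_uniforms_density(s, c, d))
```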

From your description of P, it seems to be the product of two random variables: one binary (0 or 1) and one uniform on some range [0,x] (where you hypothetically set x to 25).

So we have the following (neglecting a for the moment): 90% of the time T = M+N and 10% of the time T = M+N+K (where K is uniform on [0,x]).

This distribution will look like the following: take that tepee and scale it down to 90% of its original size. Then add to it another tepee, but this one scaled down to 10% of the original size, shifted to the right by x/2, widened by x, and with sides slightly bowed in and the peak lowered and flattened. That's the distribution function for T (once the whole thing's shifted to the right by a).
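
One way to check this picture is to simulate T directly. The sketch below uses the hypothetical numbers from the thread (c=10, d=20, P uniform on 0-25 when it is nonzero) and assumes a = 0 and that M, N and P are all independent; the variable names are my own.

```python
import random

rng = random.Random(1)  # fixed seed for repeatability
n = 200_000
a, c, d, x = 0.0, 10.0, 20.0, 25.0  # x is the upper end of the nonzero part of P


def draw_t() -> float:
    m = rng.uniform(0.0, c)
    n_val = rng.uniform(0.0, d)
    # P is 0 with probability 0.9, otherwise uniform on [0, x].
    p = rng.uniform(0.0, x) if rng.random() < 0.1 else 0.0
    return a + m + n_val + p


samples = [draw_t() for _ in range(n)]
mean_t = sum(samples) / n
var_t = sum((t - mean_t) ** 2 for t in samples) / n
skew_t = sum((t - mean_t) ** 3 for t in samples) / n / var_t ** 1.5

print(f"mean ~ {mean_t:.2f} (theory: a + c/2 + d/2 + 0.1 * x/2 = {a + c/2 + d/2 + 0.05 * x:.2f})")
print(f"skewness ~ {skew_t:.2f} (positive means a longer right tail)")
```

A positive skewness matches the intuition of a longer tail on one side; binning `samples` into a histogram would show the large and small humps added together.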

Posts: 2926 | Registered: Sep 2005
Paul Goldner
Member
Member # 1910

thanks senoj, that makes sense, and sounds right.

I apologize for my abuse of the word "normal."

Posts: 4112 | Registered: May 2001
fugu13
Member
Member # 2859

Yeah, you have to be careful with that one, since it has a precise meaning in statistics.
Posts: 15770 | Registered: Dec 2001
Paul Goldner
Member
Member # 1910

Something that my physics professors apparently forgot [Smile] That's where I picked up calling something like that type of function "gaussian," or "normal."
Posts: 4112 | Registered: May 2001
fugu13
Member
Member # 2859

I'd suggest taking a look through The Cartoon Guide to Statistics. It is both a fun read and an excellent first-semester course's worth of stats.
Posts: 15770 | Registered: Dec 2001
rivka
Member
Member # 4859

quote:
Originally posted by fugu13:
I'd suggest taking a look through The Cartoon Guide to Statistics. It is both a fun read and an excellent first-semester course's worth of stats.

That's by the same guy who wrote the Cartoon Guide to Physics and the Cartoon Guide to Genetics, right? He's good.
Posts: 32919 | Registered: Mar 2003
The Rabbit
Member
Member # 671

quote:
The sum of two gaussian distributions is itself a gaussian distribution, if the random variables being looked at, X and Y, are independent.
I'm not sure if you are saying what you intend here. If you pool two gaussian distributions from populations that don't have the same mean, you can end up with a bimodal distribution (if the means are far enough apart). What I think you mean is that if you add paired samples, one from distribution X and one from distribution Y, then if both X and Y are normally distributed and independent, the result will be normally distributed.
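
The two readings can be compared side by side. This sketch uses made-up populations (means -3 and 3, standard deviation 1 each, pooled 50/50) and assumes X and Y are independent; none of these numbers come from the thread.

```python
import math


def normal_pdf(v: float, mu: float, sigma: float) -> float:
    return math.exp(-0.5 * ((v - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


mu_x, mu_y, sigma = -3.0, 3.0, 1.0  # hypothetical populations


def pooled_density(v: float) -> float:
    """Pooling the two populations 50/50: a mixture of the two densities."""
    return 0.5 * normal_pdf(v, mu_x, sigma) + 0.5 * normal_pdf(v, mu_y, sigma)


def paired_sum_density(v: float) -> float:
    """Adding paired independent samples X + Y: normal with mean mu_x + mu_y, variance 2 * sigma^2."""
    return normal_pdf(v, mu_x + mu_y, math.sqrt(2.0) * sigma)


for v in (-3.0, 0.0, 3.0):
    print(v, round(pooled_density(v), 4), round(paired_sum_density(v), 4))
```

The pooled density dips in the middle and has a hump near each mean, while the density of the paired sum has a single peak at mu_x + mu_y.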
Posts: 12591 | Registered: Jan 2000
The Rabbit
Member
Member # 671

Paul, I'm going to take a wild leap here. Is this data collected using some sort of pulse counting technology? If so, chances are very good that the data has a Poisson distribution and not a normal distribution. For the measurements with higher averages, a Poisson distribution will approach a Gaussian distribution and so the presumption of normality is reasonable. However, for a population with a very low mean, most of the measurements will actually be zero and the distribution will not look remotely Gaussian.
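
To put rough numbers on that, the sketch below compares a Poisson distribution with the normal distribution of the same mean and variance, for one low mean and one high mean. The means 0.5 and 100 are arbitrary choices for illustration.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    # Work in logs so a large k does not overflow the factorial.
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))


def normal_pdf(v: float, mu: float, sigma: float) -> float:
    return math.exp(-0.5 * ((v - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def max_gap(lam: float) -> float:
    """Largest gap between the Poisson(lam) pmf and the N(lam, lam) density at integer counts."""
    upper = int(lam + 10 * math.sqrt(lam)) + 1
    return max(
        abs(poisson_pmf(k, lam) - normal_pdf(k, lam, math.sqrt(lam)))
        for k in range(upper + 1)
    )


for lam in (0.5, 100.0):
    print(f"mean {lam}: Pr(count = 0) = {poisson_pmf(0, lam):.4f}, max gap = {max_gap(lam):.4f}")
```

At the low mean more than half the counts are zero and the gap to the normal curve is large; at the high mean the two are nearly indistinguishable.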

If all your measurements are truly independent, I think the resultant data will also end up with a Gaussian distribution. I'll check that.

Posts: 12591 | Registered: Jan 2000
fugu13
Member
Member # 2859

Yeah, Gonick. And for each one he teams up with an expert in the field. I know there are still classes in the corresponding fields teaching with each of those three books.
Posts: 15770 | Registered: Dec 2001
fugu13
Member
Member # 2859

We'd probably be able to give a lot more help given some additional information about what's being modeled.
Posts: 15770 | Registered: Dec 2001
Kwea
Member
Member # 2199

My brain hurts just trying to READ this thread. [Big Grin]
Posts: 15082 | Registered: Jul 2001
   

Copyright © 2008 Hatrack River Enterprises Inc. All rights reserved.
Reproduction in whole or in part without permission is prohibited.

