The Virtual Philosophy Club: The Central Limit Theorem (CLT)

Monday, November 16, 2009

The Central Limit Theorem (CLT)

[From Stu. Images added by Ira from this source.]

OK, OK, Ira, you have guilted me into action, so I will share something that has been bothering me lately. As many of us may know, the CLT is, next to the Law of Large Numbers, the most important principle in Statistics and is used to justify many a research study. So, I am thinking it behooves us all to try to understand this CLT so we can become more discerning citizens, n'est-ce pas?

Here is the way I understand it. We are given a random variable X with mean mu and standard deviation sigma (X may or may not be normally distributed). Now we do the following m times: we draw n samples from X and compute their mean, X-Bar. That gives us m X-Bars, which will have their own particular probability distribution, PD. Finally, the CLT promises that as m and n approach infinity, PD will approach a Normal distribution with mean mu (the mean of our original random variable X) and a standard deviation of sigma divided by the square root of n (the sigma of X and the n of the n samples). Pretty amazing, actually. Please correct me if my understanding of this is incorrect, as I'm going from memory here.
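To make the recipe above concrete, here is a minimal simulation sketch in Python using only the standard library. The choice of a uniform X on [0, 1), the values n = 30 and m = 20000, and the helper name sample_means are all assumptions made for illustration. One point worth noting: it is the growth of n that pushes each X-Bar toward the Normal shape, while a large m simply supplies enough X-Bars to see that shape clearly.

```python
import random
import statistics


def sample_means(draw, n, m, seed=0):
    """Return m sample means, each computed from n draws of draw(rng).

    `draw` is a hypothetical callback that takes a random.Random instance
    and returns one observation of X.
    """
    rng = random.Random(seed)
    return [statistics.fmean(draw(rng) for _ in range(n)) for _ in range(m)]


# Assumed X: uniform on [0, 1), which is flat rather than bell-shaped.
# Its mean is 0.5 and its standard deviation is sqrt(1/12).
mu = 0.5
sigma = (1 / 12) ** 0.5
n, m = 30, 20000

xbars = sample_means(lambda rng: rng.random(), n, m)

mean_of_xbars = statistics.fmean(xbars)
sd_of_xbars = statistics.stdev(xbars)
predicted_sd = sigma / n ** 0.5  # sigma divided by the square root of n

# Fraction of X-Bars within one predicted standard deviation of mu.
# A Normal distribution puts about 68% of its mass in that band.
within_1sd = sum(abs(x - mu) <= predicted_sd for x in xbars) / m

print(f"mean of X-Bars: {mean_of_xbars:.4f}  (mu = {mu})")
print(f"sd of X-Bars:   {sd_of_xbars:.4f}  (sigma/sqrt(n) = {predicted_sd:.4f})")
print(f"within 1 sd:    {within_1sd:.1%}")
```

Swapping the lambda for another generator of X (for example, one that mixes two clusters to get a bi-modal shape) is an easy way to check that the spread of the X-Bars still tracks sigma divided by the square root of n.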

Now here's the problem that is bothering me. Say a research study is done where the researcher does not know the actual probability distribution. He or she can use the CLT to draw inferences about the population, but precisely how? From what I understand, researchers want to use as large an n as possible but surely do not use a large m (repeated sampling). And while I understand that once you have a normal/Gaussian probability distribution it's easy to compute deviations from the mean and confidence intervals, just exactly what is the procedure used? Can anyone give me a useful, easy-to-understand example?

Bewitched, bothered and bewildered,
Stu

5 comments:

Ira Glickstein said...: Thanks Stu for posting a new Topic that is quite a departure from the normal fare here.

For those who are unfamiliar with the Central Limit Theorem (CLT), I added a couple of illustrative example animations to Stu's posting. The first starts with random samples taken from a PARABOLIC distribution. The second starts with a UNIFORM distribution.

In both cases, repeated sampling ends up with the same NORMAL bell shape! That is the whole point of the CLT. (The source for these animations also shows initial TRIANGULAR and INVERSE distributions that, when sampled, also result in a similar NORMAL distribution, but with a different Mean.)

I'll think about this and try to come up with a practical example that makes sense from a philosophical view.

HOWARD !!! - ANY IDEAS ???

Ira Glickstein; November 16, 2009 at 9:32 PM

Howard Pattee said...: I'm not good at abstract statistical thinking. I find examples more helpful, such as those in Learning by Simulations.

Howard; November 16, 2009 at 10:16 PM

Ira Glickstein said...: Howard's link shows that even a BI-MODAL distribution has a NORMAL one at its heart.

Why do YOU care?

Here is a simple explanation for readers not familiar with statistical distributions, and why it matters to you!

When you measure things in Nature, or do "social science" studies of people, you (almost) always get what is called a NORMAL (bell-shaped) distribution. The illustrations at the head of this Topic show how the familiar bell-shape emerges.

So, what is a BI-MODAL result, and how can you get it? OK, say you want to know the average height and variability in height of young men in the US. You could go to a college campus and measure the heights of the first 100 or 500 or more men who happen to pass by. As shown in Lies, Damned Lies, and Statistics on this Blog in 2007, you will find that virtually all the young men are between 62" and 77", with a peak around 69" to 70". If you graph the distribution, you will get a nice, normal bell-shape.

But, what if, by bad luck, you happen to do the experiment outside the athletic building just as a championship basketball tournament is letting out and half your sample happens to be basketball players? Competitive players tend to be over 72" tall. Instead of getting a nice symmetrical bell-shape, you will get one with a bump on one side, like the one in Howard's link.

Misuse of Anecdotal Math

It seems to be a Law of Nature that all things that can be measured tend to follow a normal curve. So, when reviewing results of experiments and surveys and so on, expect to find that bell-shape. Also, when interpreting results, the statistics of the normal curve can bring out truths that are not necessarily apparent at first glance.

For example, as I showed in Lies, Damned Lies, and Statistics, anecdotal math can be used to falsify the truth and truthify falsehood.

Much has been made of the apparent bias against women and minorities in sports and the professions. Some of it is, unfortunately, all too true, and must be corrected of course. However, some of the disparity is valid and not due to bias. The best-selling 1994 book The Bell Curve showed why in a "politically incorrect" way.

Here is a specific example. Young women, on average, are only about 5" shorter than young men, which is less than 10%. Therefore, you would expect to find only a 10% difference in the number of women and men in athletics. Right? WRONG!

It turns out that in sports where height (and weight, etc.) are critical, that 10% average height difference should result in a 100 to 1 ratio of men and women in that sport. That is why it is legal and customary for the highest levels in many sports to have women-only leagues that exclude men.

The bell-curve also explains representation in professions where academic intelligence is critical. The fact that some identifiable groups are under- (or over-) represented does not necessarily mean there is any illegal bias involved.

Ira Glickstein

Ira Glickstein; November 17, 2009 at 1:31 PM

Stewart A Denenberg said...: Thanks Ira and Howard for the timely responses. As fate would have it, after much more research I found this website:

http://people.hofstra.edu/Stefan_Waner/realWorld/finitetopic1/confint.html

which explains pretty well how just one sample from a population with unknown probability distribution, together with its mean and standard deviation, can be used to compute a confidence interval: an upper and lower limit around the sample mean (usually expressed in scaled units of the standard deviation) chosen so that, at a specified level of confidence, the range captures the population mean. If you read the text at the link, be careful, as the author is somewhat cavalier in his use of sigma and s.
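As a rough illustration of that one-sample procedure, here is a minimal Python sketch. It is not taken from the linked page. The function name mean_confidence_interval, the exponential "population" with true mean 10, and the sample size of 400 are assumptions chosen for the example. It leans on the CLT by treating X-Bar as approximately Normal and uses the sample standard deviation s in place of the unknown sigma, which is reasonable only when n is fairly large (for small n, Student's t multipliers are the usual substitute for z).

```python
import random
import statistics
from statistics import NormalDist


def mean_confidence_interval(sample, confidence=0.95):
    """Approximate confidence interval for the population mean.

    Uses the large-sample Normal approximation: X-Bar +/- z * s / sqrt(n),
    where s is the sample standard deviation standing in for sigma.
    """
    n = len(sample)
    xbar = statistics.fmean(sample)
    s = statistics.stdev(sample)
    # z is the Normal multiplier for the requested confidence (about 1.96 for 95%).
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * s / n ** 0.5
    return xbar - half_width, xbar + half_width


# Hypothetical study: one sample of n = 400 from a skewed, non-Normal
# population (exponential with true mean 10) that the researcher cannot see.
rng = random.Random(42)
sample = [rng.expovariate(1 / 10) for _ in range(400)]

low, high = mean_confidence_interval(sample)
print(f"sample mean: {statistics.fmean(sample):.2f}")
print(f"95% confidence interval: {low:.2f} to {high:.2f}")
```

Note that only one sample is drawn (m = 1). The repeated sampling lives in the interpretation: if the study were rerun many times, about 95% of the intervals built this way would contain the true mean.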

The reason I got interested in this was from my reading of "The Black Swan" by Nassim Taleb, who claims that the Normal or Gaussian or Bell Curve is not the best representation of a large chunk of random variables such as the stock market, book sales, and any population where mean and std dev. are not adequately descriptive statistics. Instead he proposes a fractal probability distribution for these cases --- so that is my next project, and while I understand what a fractal is, a fractal probability distribution blows my mind --- so it's back to Mandelbrot...BUT, before I go, could you (Ira) pls explain this quote from your response? Why the square law?
{Ira said:}
Here is a specific example. Young women, on average, are only about 5" shorter than young men, which is less than 10%. Therefore, you would expect to find only a 10% difference in the number of women and men in athletics. Right? WRONG!

It turns out that in sports where height (and weight, etc.) are critical, that 10% average height difference should result in a 100 to 1 ratio of men and women in that sport. That is why it is legal and customary for the highest levels in many sports to have women-only leagues that exclude men.

November 18, 2009 at 12:37 PM

Ira Glickstein said...: Here is Stu's Link in clickable form.

In my previous Comment I link to my 2007 Blog Topic where I say:

The height of young American women ranges from about 4' 9" to 6'. For young men it is 5' 2" to 6' 5". That's a difference of about five inches -- less than ten percent.

Therefore, in basketball and other sports where height is critical, you'd expect about ten percent fewer women than men. Right?

Anything less would be proof of discrimination against women. Right?

WRONG !!!

Actually, if you had a cut-off of six feet, over 100 men would qualify for every woman who qualified!

Stu asks me to "explain this quote from your response? Why the square law?"

I understand why it seemed like a "square law" because I said a 10% difference in average male/female heights would lead to 100:1 under-representation of women in sports where height is critical. However, it is not a square law, but rather a result of overlapping male and female normal curves.

Here is why: Standard deviation in height is about 2.5", so women are about two standard deviations shorter than men. Say championship play in a sport like basketball generally requires players who are over 72" tall. That is 1 sigma or more above the male average of about 69.5" (the upper bands on my graphic). Women over 72" would be 3 sigma or more above their average of about 64.5" (beyond even "extremely tall" in their range).

Using height data for young Americans, given 1000 men and 1000 women, about 136 men will be "tall", 21 will be "extremely tall", and about 1 above that, for a total of about 158 who qualify. For women, only about 1 will qualify. So, the actual ratio will be about 158:1.

As I point out in the Blog Topic, even if we relax the height requirement to 67" (just above average for the combined male/female population), there will be more than a 5:1 disparity between men and women. (Actual calculation: about 842 men and 158 women, a ratio of 5.3:1).
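The tail arithmetic above can be checked with a short Python sketch. The means of 69.5" for men and 64.5" for women are assumptions inferred from the height ranges quoted earlier in this Comment (the midpoints of 5' 2" to 6' 5" and 4' 9" to 6'), combined with the stated 2.5" standard deviation. The function name qualifiers_per_1000 is made up for the example.

```python
from statistics import NormalDist

SD = 2.5  # standard deviation in inches, as stated in the Comment

# Assumed means: midpoints of the quoted height ranges.
men = NormalDist(mu=69.5, sigma=SD)
women = NormalDist(mu=64.5, sigma=SD)


def qualifiers_per_1000(dist, cutoff):
    """Expected number out of 1000 people who are taller than cutoff inches."""
    return 1000 * (1 - dist.cdf(cutoff))


results = {}
for cutoff in (72, 67):
    m = qualifiers_per_1000(men, cutoff)
    w = qualifiers_per_1000(women, cutoff)
    results[cutoff] = (m, w, m / w)
    print(f'over {cutoff}": {m:6.1f} men, {w:6.1f} women, ratio {m / w:.1f}:1')
```

Under these assumptions the 72" cutoff gives roughly 159 men and 1.3 women per 1000, so the unrounded ratio is nearer 118:1 than 158:1, but still "over 100" as claimed. The 67" cutoff reproduces the 5.3:1 figure.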

Ira Glickstein; November 18, 2009 at 7:40 PM
