Friday, October 21, 2011

Education pays!

What costs society a quadrillion dollars of lost value? Not educating the poor! This is much bigger than the damage predicted by the worst nightmare models of global warming. On par with WW3 in lost value. So it is nice to see it getting summarized in this article. On a personal note, my father was the guy who first did the economic analysis of the Perry Preschool project. Basically the curious fact which seems to keep getting rediscovered is that preschool helps later life outcomes. It doesn't help 2nd grade--but it does help finding a job. My personal view is that it would help the 2nd grade marshmallow test, which is one of the best predictors of future success in life. But this experiment hasn't been run to my knowledge.

Sunday, October 25, 2009

Whining about Parker's reliability

Robert Parker has changed wine by adding numbers to ratings. Luckily for me, we both like big wines and so I can buy wines pretty much by the number. Unfortunately, these numbers don't seem as verifiable as one might like. What is their accuracy? Without that, it is hard to think of them as real measurements of anything of importance. Recently Parker tasted some wines he had previously rated. The relationship between the old and new ratings is pretty weak. For those that like numbers with their graphs, the p-value is about .5. The lower right point is what Parker likes the best now. But showing the power of the market--I can't seem to find this to buy. Others beat me to it?

Sunday, April 19, 2009

What the bell curve should have talked about

The Bell Curve argued that intelligence is mostly inherited. The authors went on to make a leap--therefore it is not worthwhile to educate the stupid since they will always be stupid. But they miss the point. In statistics, we are interested in contrasts. So the question isn't whether a stupid person can be made into a genius, but instead whether his IQ can be increased enough to get him a better job. It turns out, this is pretty easy to do and fairly cost effective. So once again, going for the controlled experiment helps focus attention on what can be changed, and further provides proof as to whether or not the change is worth it. Go statistics!

Saturday, April 18, 2009

More ice in my antarctic please

This one is for Adi. Not all greenhouse-related news is reported on equally.

Thursday, January 22, 2009

Statistics of War

I contend that the distinction between fact and opinion is unwarranted. Facts and opinions are really just varieties of statements, which are testable to varying degrees and "true" in the sense that they are supported by evidence of varying quality. So a statement like "rain yesterday" on the TV news is considered fact because it is 1) obviously testable and 2) reliable in the sense that the weatherman has been doing this for a long time and gets it right nearly every time. He also has no motive to lie, incentives to be accurate and consequences for errors. A thoughtful analysis of facts should consider the 1) supplier 2) testability 3) quality of supporting evidence. Here is an example: It was commonplace in the media to decry the horrors of Israel's war in Gaza. This is most easily done with a lament about the number of civilian deaths. So let's consider a factual claim made by Ethan Bronner in the NY Times on January 10th.
A tank shell landed outside the home of a family in Jabaliya, northeast of the city, killing eight members of the same family who were sitting outside, hospital officials said, bringing the death toll to more than 820. Nearly half of the dead were reported to be civilians.
Now the "facts" (i.e. statements) here are two: 1) the death toll (820) and 2) the civilian death toll (approx. 400). Let's analyze them closely:
  • Are these statements readily testable?
First, Hamas fighters do not wear uniforms. They fight in highly concentrated civilian areas and they readily employ young adults to provide cover. Now of course, these considerations require verification but there are abundant videos on the web that testify to these statements. Attribution of death is further complicated by cases of "friendly fire" or secondary explosions or a myriad of other inevitable accidents caused by placement of the machinery of war in the middle of a city. So the premise that casualty figures can even be determined accurately is questionable. Now this thesis itself suggests its own testable hypothesis: reported casualty figures should be inconsistent and variable. Indeed, that is exactly the case: on January 6th the NY Times reports that:
The death toll in Gaza reached around 640 on Tuesday, according to Palestinian health officials. The United Nations has estimated that about one-fourth of those killed were civilians, though there have been no reliable and current figures in recent days.
These two reports provide a credible estimate of the intrinsic variance. First, note that on Jan 6th it was reported that out of the 640 dead about 160 (one-fourth) were civilians. Then on Jan 10th it was reported that out of the 820 dead about 410 (nearly half) were civilians. So the reported number of total dead in the 4 days between the two Times articles grew by 180 while the number of civilian deaths (which must of course be lower than the total number of deaths) grew by 250. From this contradiction we can prove that the uncertainty in the casualty statistics is at least 100%. It is interesting, for those who like to dwell on MSM bias, that the Times' reporters do not suggest that these numbers are inaccurate, only that they may be out of date.
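The arithmetic behind that comparison can be checked in a few lines. This is only a sketch: the function name is made up for illustration, and the civilian shares are rounded readings of "about one-fourth" and "nearly half", so the civilian counts are approximate by construction.

```python
def implied_changes(total_a, civilian_share_a, total_b, civilian_share_b):
    """Return (growth in total deaths, growth in civilian deaths)
    implied by two successive reports."""
    civilians_a = total_a * civilian_share_a
    civilians_b = total_b * civilian_share_b
    return total_b - total_a, civilians_b - civilians_a


if __name__ == "__main__":
    # Jan 6th: 640 dead, "about one-fourth" civilian (taken as 0.25).
    # Jan 10th: 820 dead, "nearly half" civilian (rounded up to 0.50).
    total_growth, civilian_growth = implied_changes(640, 0.25, 820, 0.50)
    print("growth in total deaths:   ", total_growth)
    print("growth in civilian deaths:", civilian_growth)
    # A subset cannot grow by more than the whole, so the two reports
    # cannot both be accurate.
    print("consistent:", civilian_growth <= total_growth)
```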
  • Who is the supplier of these statements?
The data come from hospital officials, presumably Palestinian Arabs. Now hospital officials are certainly not in a position to distinguish civilians from fighters if the latter are not clearly identifiable. Hospital officials by definition are not on the battlefield and therefore can only communicate what they have been told. So the true "suppliers" are Hamas officials and other residents.
  • What is the extrinsic variability of the data?
The intrinsic accuracy of the casualty statistics is very poor; the problem by its very nature is hard to get right. But there are enormous questions related to extrinsic factors that have nothing to do with the estimation problem directly. We have to answer two questions:
  1. What are the incentives that the actual data suppliers have to communicate honestly?
  2. Do the suppliers have a history of error?
Before answering these questions we have to identify the strategy behind Hamas' war against Israel. Why do they fight at all? The battle is not remotely even; Israel could choose to level the Strip, sending Hamas as well as the civilian population to its destruction or exile. Somehow Hamas knows this will not happen. How do they know this? Certainly this is exactly how wars have taken place historically; this is what Hamas would do to Israel if they could; it is what Arab states do to each other; it is how Russia handles the Chechens; it is what everybody does in Africa. Hamas knows Israel will not wipe them out because 1) it hasn't yet and 2) Hamas expects and counts on the "International Community" to prevent a Western industrialized state with few allies from exterminating a poor, long-suffering, basically defenseless, third-world society. So to answer question one: Why would they lie about casualties? Because it is a powerful weapon and it is their only weapon. Finally, to answer question 2: Do the suppliers have a history of error? YES. There are numerous examples of outrageously false Palestinian casualty claims: the Jenin Massacre, Muhammad Al Durah, the Gaza Beach Massacre, Green Helmet Man, Fauxtography. Ordinarily, this kind of gross manipulation should lead to an enormous credibility problem which would undermine their goals. It doesn't. Outside of the natural set of ardent supporters of the State of Israel, Palestinian Arab claims are accepted as true until demonstrated false. The quest to find an acceptable explanation for this is the hardest problem of all.

Tuesday, December 09, 2008

Lilac Bloom

Researchers at the Climate Change Research Center at the University of New Hampshire have collaborated with the organization Clean Air-Cool Planet to study indicators of climate change in the Northeast. One of the indicators is the start of the lilac bloom. The hypothesis is that as the climate changes and temperatures rise, the lilacs will bloom earlier and earlier. The data consist of measurements made every year at numerous sites around New England. On average, the bloom has begun 1 day earlier per decade. Now is this result "statistically significant"? To assess this, one would need a model whereby some independence can be asserted. What should NOT be done is exactly what was done: a calculation of the P-value assuming independence. Now in a given year the data are measurements at nearby locations, so assuming independence among observations and across years is plain silly. Having overstated their sample size by a factor of 30, they find that the small decrease is now quite statistically significant. They should have measured every lilac and increased the sample size by a factor of 1000!
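To see how much damage the independence assumption can do, here is a small simulation sketch. Everything in it is hypothetical rather than taken from the study: 30 years, 30 sites, a shared year-to-year weather shock with standard deviation 1, site noise with standard deviation 0.5, and no trend at all. It compares how often a naive regression on every site reading "finds" a trend with how often a regression on one averaged value per year does. Even the second analysis assumes the years themselves are independent, which real climate data may not satisfy.

```python
import math
import random


def slope_t_stat(xs, ys):
    """t-statistic for the OLS slope of ys on xs, assuming independent errors."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))
    std_err = math.sqrt(rss / (n - 2) / sxx)
    return slope / std_err


def false_positive_rates(n_years=30, n_sites=30, n_sims=300, seed=0):
    """Simulate bloom dates with NO trend and count how often each analysis
    declares one anyway (|t| > 2, roughly the 5% level)."""
    rng = random.Random(seed)
    naive_hits = 0
    yearly_hits = 0
    for _ in range(n_sims):
        xs_all, ys_all, xs_year, ys_year = [], [], [], []
        for year in range(n_years):
            # Shared weather shock: every site moves together in a given year.
            year_effect = rng.gauss(0.0, 1.0)
            sites = [year_effect + rng.gauss(0.0, 0.5) for _ in range(n_sites)]
            xs_all.extend([year] * n_sites)
            ys_all.extend(sites)
            xs_year.append(year)
            ys_year.append(sum(sites) / n_sites)
        # Naive: treat all n_years * n_sites readings as independent.
        if abs(slope_t_stat(xs_all, ys_all)) > 2:
            naive_hits += 1
        # More careful: one averaged observation per year.
        if abs(slope_t_stat(xs_year, ys_year)) > 2:
            yearly_hits += 1
    return naive_hits / n_sims, yearly_hits / n_sims


if __name__ == "__main__":
    naive, yearly = false_positive_rates()
    print(f"naive false-positive rate:  {naive:.2f}")
    print(f"yearly false-positive rate: {yearly:.2f}")
```

Both analyses estimate the same slope here; only the standard error differs. The naive version divides by a sample size that is too large by a factor of n_sites.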

Thursday, September 04, 2008

Death to tall people

Once again tall people are at a higher risk of dying. If you're tall and haven't died in the past decade, you might recall that there was a study that showed each inch of height cost you one year of life. That is a lot of tree branches to run into in order to kill tall people off at that rate. (On a side note, that same year, left-handers were told they would die a decade early.) Well, the scientific story behind both of these old results is simply that early in the 1900s people were often short and right-handed. Later in the century, more people were left-handed and more people were taller. Hence if you died and were tall--you were more likely young. This is called a cohort effect. So the NYT article is quite possibly rediscovering the same old bad science. Without a better link to the underlying study, it is hard to know for sure. But that is where my money is.
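The cohort effect is easy to reproduce in a toy simulation. The sketch below uses made-up numbers throughout: births spread evenly over 1900 to 1980, average height rising 0.05 inches per birth year, and lifespans drawn with no reference to height at all. It then looks only at people who died in a ten-year study window, the way a study of death records would.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate_cohort_effect(n_people=100_000, seed=0):
    """Return (correlation in the whole population,
    correlation among deaths observed in the study window)."""
    rng = random.Random(seed)
    all_heights, all_ages = [], []
    study_heights, study_ages = [], []
    for _ in range(n_people):
        birth_year = rng.uniform(1900, 1980)
        # Later cohorts are taller on average (hypothetical trend).
        height = 66.0 + 0.05 * (birth_year - 1900) + rng.gauss(0.0, 2.5)
        # Lifespan is drawn WITHOUT any dependence on height.
        age_at_death = max(0.0, rng.gauss(70.0, 12.0))
        all_heights.append(height)
        all_ages.append(age_at_death)
        # The study only sees people who died between 1990 and 2000.
        if 1990 <= birth_year + age_at_death < 2000:
            study_heights.append(height)
            study_ages.append(age_at_death)
    return pearson(all_heights, all_ages), pearson(study_heights, study_ages)


if __name__ == "__main__":
    corr_all, corr_study = simulate_cohort_effect()
    print(f"height vs. age at death, everyone:     {corr_all:+.3f}")
    print(f"height vs. age at death, study window: {corr_study:+.3f}")
```

In the full population height and lifespan are unrelated by construction. Inside the window, anyone who died young was necessarily born late, and people born late are taller, so tall people appear to die younger.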