A look at Keiths paper

Posted on April 14, 2015 by phoodoo

So, here is the link to a paper which Keiths claims says something about income inequality, and I say is another example of the proliferation of shoddy science.

http://pss.sagepub.com/content/early/2015/03/19/0956797614567511.abstract

The highlight of the paper is this claim:

“We found that of the 40 search terms used more frequently in states with greater income inequality, more than 70% were classified as referring to status goods (e.g., designer brands, expensive jewelry, and luxury clothing). In contrast, 0% of the 40 search terms used more frequently in states with less income inequality were classified as referring to status goods.”

Where does one begin to critique the ridiculousness of this claim? 70% of the majority of searches are for luxury goods in some states, 0% of the most searched items in other states?

If one claims the difference in search patterns from one state to another is that dramatic, shouldn’t one’s BS detector already be ringing alarm bells?

And what is considered a luxury good? What is the cut-off for equal states and unequal states? Did they decide the luxury terms before or after they viewed the data? Who do they claim is doing all this searching for luxury, the haves or the have nots?

The red flags are everywhere. Isn’t it likely that they had a conclusion that they wished to reach, and that they fulfilled their own prophecy?

140 thoughts on “A look at Keiths paper”

Alan Fox on April 16, 2015 at 4:47 pm said:

phoodoo,

Then you need to tell me which comments you are injured by.
DNA_Jock on April 16, 2015 at 4:50 pm said:

phoodoo asks repeatedly:

I think you didn’t quite understand my question. For the purpose of the results, was Michigan considered a state with equality or inequality? What about Oregon? They are next to each other in the rankings.

But Hobbes very patiently explained to you yesterday:

To answer one of Phoodoo’s questions, there was no cut-off for equal or unequal states, rather each state was assigned a value that could be higher or lower.

Income inequality was treated as a continuous variable, not a dichotomous one. Thus the phrasing in the abstract which so confused you phoodoo – “of the 40 search terms used more frequently in states with greater income inequality” refers to the search terms that showed the strongest positive correlation with the residual Gini values.
This is the primary reason people are unwilling to engage you seriously phoodoo. You do not appear to comprehend the simplest things that people explain to you.
keiths on April 16, 2015 at 4:51 pm said:

Alan,

You are not obligated to honor phoodoo’s request. Moderation is supposed to make the site better, not worse. Honoring phoodoo’s hypocritical request will make it worse.
Alan Fox on April 16, 2015 at 4:57 pm said:

keiths,

You are not obligated to honor phoodoo’s request.

No indeed. It would help if everyone could attempt to avoid the chippy asides however.
Neil Rickert on April 16, 2015 at 5:00 pm said:

phoodoo,

It seems to me that the entire thread is a guano thread.

So maybe we can just remember that it is guano thread, and then we won’t have to move anything.
Alan Fox on April 16, 2015 at 5:03 pm said:

Neil Rickert: So maybe we can just remember that it is guano thread, and then we won’t have to move anything.

Trying for a venue that welcomes free and open discussion can be a rocky road. 🙂
Hobbes on April 16, 2015 at 5:05 pm said:

phoodoo:
Hobbes,
I think you didn’t quite understand my question. For the purpose of the results, was Michigan considered a state with equality or inequality? What about Oregon? They are next to each other in the rankings.

They weren’t considered either one. The data (the residual Gini values by state) were submitted to Google Correlate. Google Correlate then produced those search terms that were distributed by state (in terms of frequency of search) in as nearly the same way as the residual Gini values as possible. In other words, if Connecticut is at one end of the residual Gini value scale, Utah at the other end, and Michigan and Oregon near the middle, Google identified those terms that had the property of being searched for the most in Connecticut (and the states near it in the ranking), while being searched for least in Utah (and the states near it in the ranking), and while also being searched for an average number of times in Michigan and Oregon (and states near them in the rankings). Google Correlate identifies patterns of searches that match the pattern of some other data you submit to it, in this case the pattern of residual Gini values. No one ever had to decide if a particular state is equal or unequal.
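The matching step described above can be sketched in a few lines. This is only an illustration, assuming that "distributed in as nearly the same way as possible" means ranking terms by Pearson correlation with the submitted values; the five states, the three term names, and all numbers below are made up and are not the paper's data.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)

# Hypothetical residual Gini values for five made-up states.
residual_gini = [-2.0, -1.0, 0.0, 1.0, 2.0]

# Hypothetical per-state search frequencies for three made-up terms.
search_freq = {
    "term_a": [1.0, 2.0, 3.0, 4.0, 5.0],  # rises with residual Gini
    "term_b": [5.0, 4.0, 3.0, 2.0, 1.0],  # falls with residual Gini
    "term_c": [3.0, 1.0, 4.0, 1.0, 3.0],  # no linear pattern
}

def rank_terms(target, freqs):
    """Return (term, r) pairs sorted from most positive to most negative r."""
    scored = [(term, pearson(target, f)) for term, f in freqs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

ranking = rank_terms(residual_gini, search_freq)
for term, r in ranking:
    print(f"{term}: r = {r:+.2f}")
```

Note that no state is ever labelled "equal" or "unequal" here: each state contributes one number, and the terms are ordered by how closely their state-by-state pattern tracks those numbers.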
Hobbes on April 16, 2015 at 5:30 pm said:

phoodoo:
Hobbes,
How did they weight these variables? Does having more foreign born people make a state more or less equal? Does having a rural population make a state more or less equal?

Asking “how did they weight these variables?” isn’t really the right question.

The regression analysis is how the effects of the variables are accounted for, and the regression analysis reveals the “weights”; you don’t choose them. To simplify, let’s imagine that we only had to worry about the Gini values and the rural/urban ratio. If you plot on an x-y plot the Gini value for each state versus the percent of urban population for each state, which I did using the authors’ data, you get a big mess of scattered data points that doesn’t look like it will tell you a whole lot.

However, if you do a regression on the data — that is if you assume that there might be some relationship between Gini and %Urban population, and then use the regression mathematics to reveal what that relationship might be, which I also did with this data, you’ll find that there is a very slight trend toward higher Gini values with higher %Urban population; that is, states with relatively higher urban populations, have a statistical tendency toward higher inequality as measured by Gini value. But the “weighting”, to use your term, is relatively weak – the impact of the urban population on Gini value is there, but it isn’t very substantial.

Because the trend is small and there is a lot of scatter in the data, this means that %Urban population can be used to predict a small amount of the Gini value, but not the majority of it. The residual Gini value for each state is then that portion of the Gini value that cannot be attributed to differences in urban population after subtracting out the trend.

The important point is that the amount subtracted away isn’t just “picked”. It is determined by analyzing the data. That said, there are different ways data can be analyzed, and a researcher must pick a method based on some defensible rationale.
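The simplified one-variable case above can be written out as a short sketch. It assumes ordinary least squares with a single predictor, which is a simplification of whatever the authors actually ran; the four data points are invented for illustration and are not the paper's values.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, intercept

def residuals(xs, ys):
    """The part of each y that the fitted line does not account for."""
    slope, intercept = fit_line(xs, ys)
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]

# Hypothetical %urban and Gini values for four made-up states.
pct_urban = [60.0, 70.0, 80.0, 90.0]
gini = [0.44, 0.47, 0.45, 0.48]

slope, intercept = fit_line(pct_urban, gini)  # the "weight" comes out of the data
resid = residuals(pct_urban, gini)            # toy "residual Gini" per state

print(f"slope = {slope:.4f}, intercept = {intercept:.3f}")
print([round(r, 3) for r in resid])
```

The slope is computed from the data rather than chosen, and the residuals are what remains of each Gini value once the fitted trend is subtracted.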
hotshoe_ on April 16, 2015 at 5:31 pm said:

Hobbes: They weren’t considered either one. The data (the residual Gini values by state) were submitted to Google Correlate. Google Correlate then produced those search terms that were distributed by state (in terms of frequency of search) in as nearly the same way as the residual Gini values as possible. In other words, if Connecticut is at one end of the residual Gini value scale, Utah at the other end, and Michigan and Oregon near the middle, Google identified those terms that had the property of being searched for the most in Connecticut (and the states near it in the ranking), while being searched for least in Utah (and the states near it in the ranking), and while also being searched for an average number of times in Michigan and Oregon (and states near them in the rankings). Google Correlate identifies patterns of searches that match the pattern of some other data you submit to it, in this case the pattern of residual Gini values. No one ever had to decide if a particular state is equal or unequal.

Perfect! Thanks for explaining this so clearly.
phoodoo on April 16, 2015 at 5:35 pm said:

DNA_Jock,

Well, first I would suggest that is not the way the sentence is written. But fair enough, I don’t have the data from the study, I don’t know what these search terms were, but I did a little informal study using Google Correlate just now.

Anyone can do it. Just type in a word that you think relates to a status item, say Ferraris, or Patek Philippe watches, or Cartier or Chanel perfume. And guess what the results are, as consistent as can be?

With every one of these types of queries, the highest concentration of hits is virtually in the exact same places: California, Texas, New York, Florida, Louisiana, Illinois and Nevada. And guess where the lowest concentration of hits is? Montana, Wyoming, Idaho, Utah, Nebraska... Surprise, surprise!

In fact, it’s so predictable that you can say exactly what the map is going to look like before you finish the query. Type Mercedes Benz, and guess what the map will look like? A dark band of green around those coastal states with money and retirees. There is no mystery here. It is as clear as day. People who live in Texas and Florida and California and New York and New Jersey look for Mercedes Benzes. People who live in Montana and Wyoming look up equestrian and cattle ranching and saddlery.

That’s your answer. Just try it.

Hobbes asked me to improve on their study? I just did it.
phoodoo on April 16, 2015 at 5:54 pm said:

And here is how even more ridiculous this is. If you type in hot cowgirls, it lights up like a Christmas tree in Wyoming, Montana, Nebraska, Oklahoma, Utah, Idaho, but the lights are off in California, New York, Illinois.

But then if you type in sex, or porn, or even massage it gives you no results at all.

So much for the science.
Hobbes on April 16, 2015 at 6:03 pm said:

phoodoo:
Hobbes,
Also, do you agree or disagree with the link I provided you about most published studies being wrong?

I am aware of the article you linked to, “Why Most Published Research Findings Are False,” by Ioannidis. It is a valid reference for you to invoke given our discussion. I won’t say that I agree or disagree with it because the issue isn’t black and white. I think it is an interesting and valuable paper that raises some valid points, but it is itself also subject to criticism (for example, Why Most Published Research Findings Are False: Problems in the Analysis, by Goodman & Greenland, which argues why Ioannidis’ estimates are too high. http://journals.plos.org/plosmedicine/article?id=10.1371/journal.pmed.0040168).

[insert joke here about how, if most published research findings are false, we can be sure that the research finding that most research findings are false is not itself false, haha].

The details are way too involved to get into here (and frankly many are over my head). But let me say this. Despite its provocative title, Ioannidis’ paper was theoretical. He didn’t actually show that any particular research was incorrect, he simply made a statistical argument, based on certain assumptions, to show the possibility that many research findings of certain types (particularly observational studies in medicine) could be false positives due to lack of statistical power. Nonetheless, there is merit to the concerns he raised. In subsequent work he has backed up some of his claims empirically by documenting medical studies that drew conclusions based on small samples of observational data which were later overturned by better studies using larger, more randomized data sets.

But while acknowledging the issues raised in that paper, it needs to be put into some context. That paper has been one of the most downloaded research papers ever, and it has been cited by at least 2700 other articles according to Google Scholar. This is a good thing, because it indicates that the research community takes seriously the need to get things right. So whether Ioannidis’ analysis was accurate or not, it fostered discussion, and in the discussions that have occurred in the ten years or so since the article was published, there have been a lot of useful ideas about how to improve the quality of research and to identify questionable research.
DNA_Jock on April 16, 2015 at 6:25 pm said:

Great progress phoodoo, really great.
You have gone from
“Where does one begin to critique the ridiculousness of this claim?”
to
“Well, it’s obvious!”
Although you really haven’t improved on the paper.
You noticed a lot of hits for Mercs in Texas, Florida, California, and New York. Yes, rather obvious: those are 4 of the top 8 states in terms of raw Gini.
What Walasek and Brown did was to remove the effect of [income], [% foreign-born], [%urban] from the Gini score; after subtracting out these effects, Texas, Florida, and California end up in the bottom half for residual Gini – they become states with low inequality, once you have accounted for those other factors. But searches for status items are still correlated with inequality.
Try reading the paper. You might learn something.
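The point about raw versus residual Gini can be illustrated with a toy multiple regression. This is a sketch under loud assumptions: it uses plain least squares on only two of the three controls named above (%foreign-born and %urban; income is left out to keep it short), and the six "states" and their numbers are invented so that the effect is easy to see. It is not the authors' code or data.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [vr - factor * vc for vr, vc in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def ols_residuals(rows, ys):
    """Residuals of ys after least-squares regression on the predictors plus an intercept."""
    design = [[1.0] + list(r) for r in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * y for d, y in zip(design, ys)) for i in range(k)]
    beta = solve(xtx, xty)
    return [y - sum(b * v for b, v in zip(beta, d)) for d, y in zip(design, ys)]

# Hypothetical states: (pct_foreign_born, pct_urban) and raw Gini. All values invented.
names = ["A", "B", "C", "D", "E", "F"]
controls = [(25.0, 95.0), (20.0, 90.0), (10.0, 75.0), (5.0, 70.0), (15.0, 80.0), (3.0, 60.0)]
raw_gini = [0.4975, 0.49, 0.4525, 0.425, 0.465, 0.419]

resid = ols_residuals(controls, raw_gini)
for name, g, r in zip(names, raw_gini, resid):
    print(f"{name}: raw Gini = {g:.4f}, residual Gini = {r:+.4f}")
```

In this made-up data, state A has the highest raw Gini but a negative residual: once the two controls are accounted for, it is less unequal than the regression would predict. A state's position on a raw ranking therefore says little about its position on the residual ranking.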
Hobbes on April 16, 2015 at 6:32 pm said:

phoodoo:
DNA_Jock,

Hobbes asked me to improve on their study? I just did it.

I’m glad you found Google Correlate and played around with it. I think you are confusing me with someone else, however, since I never asked you to improve on the study. I also don’t think you just did. I agree with you that you can pick many search terms to plug into Google Correlate and probably predict in advance based on common sense about demographics what the resulting map distribution will look like. And it is kind of fun to do. But this has nothing to do with the research in the article, and says nothing at all about that research. Places with large populations of well-off people will obviously Google search for Mercedes more often than places with small populations of farmers. That’s obvious. It’s also why in the research they control for population and income. In fact the whole point of research is generally to discover those things that aren’t obvious because the obvious gets in the way of being able to see them.
Richardthughes on April 16, 2015 at 7:06 pm said:

Have you read the paper yet Phoodoo? Pathetic.
phoodoo on April 17, 2015 at 1:42 am said:

Hobbes,

Hobbes, I appreciate your replies. Yet I stand by my opening statements about the paper, even more so now. The study was supposed to show that in places where they have high income inequality, people have a disproportionate interest in luxury or status goods. But by simply typing in any status or luxury items, you can easily see that actually the data doesn’t show this at all. Quite the contrary in fact (according to their own formula). It is the places where there are large concentrations of wealth, as well as lots of retirees where they search for the most luxury items. I don’t think that could be any more clear, by simply typing in any luxury item you can think of.

As I also said, you can then take it a step further and type in a luxury item which would seem more pertinent to a region, and you also get the results you expect. If you type in horses, or ski boots, it lights up right where you would think, Utah, Wyoming, Montana, Colorado. It has zero to do with income inequality as they say, it has to do with the type of people who live in those areas, plain and simple. The results are stunningly easy to predict.

People are supposed to read their study and accept that buying luxury goods is about people wanting them where there is inequality. But I can easily show, and I just have, that really people buy luxury items where people who like luxury items live. In fact, before I typed in each of my queries, I tried to imagine where they would be popular. And it’s exactly where you would expect. People don’t search for more luxury goods in West Virginia and Tennessee and Kentucky. They do in Texas, California, Florida, New York, Nevada, Hawaii.

Dna Jock wants the study to be right, even without reading it, because he believes all statisticians must be good. You can find a way to make numbers say just about whatever you set out to. But the more obvious answer is the right one. The study does not prove what it purports to, because a more simple methodology shows the true answers.
keiths on April 17, 2015 at 2:00 am said:

phoodoo,

It is the places where there are large concentrations of wealth, as well as lots of retirees where they search for the most luxury items.

What part of the following sentence don’t you understand?

We used Google Correlate to find search terms that correlated with our measure of income inequality, and we controlled for income and other socioeconomic factors.

[Emphasis added]
phoodoo on April 17, 2015 at 2:22 am said:

DNA_Jock,

It’s overwhelmingly obvious the paper is wrong, not that it is right! They don’t search for more luxury items in Tennessee and Alabama and Kentucky and West Virginia. But you are so lacking in common sense that you will believe anything someone with a science sounding name tells you. It’s a typical “scientific skeptic” blind spot.

YOU haven’t read the paper and yet you are happy to believe it’s true. But spend two minutes on Correlate, and you can see a much simpler answer. They search for luxury items EXACTLY where you would think they do in America.

In my opinion all the paper really shows is that with enough multi-factorial regressions, and causal correlation analysis, any result is possible.
phoodoo on April 17, 2015 at 2:55 am said:

keiths,

California, Florida, Texas, Nevada, New York, Illinois.

What part of that don’t you understand?
phoodoo on April 17, 2015 at 3:12 am said:

Hobbes,

Here is how I would have conducted the study. Get a group of volunteers to list their 40 most likely luxury or status items (Mercedes, LV, Chanel, Ferragamo, Rolex, Patek Philippe, Macallan …) and then put those terms into Google. And what will you find? California, Florida, Nevada, Texas…

Then get the volunteers to come up with 40 luxury goods for rural folks (ski bibs, cowboy boots, horse jumping…) and see what you get. Montana, Colorado, Wyoming, Vermont, Idaho…

Study over, results show no surprises whatsoever.
Neil Rickert on April 17, 2015 at 3:23 am said:

phoodoo: Here is how I would have conducted the study. Get a group of volunteers to list their 40 most likely luxury or status items (Mercedes, LV, Chanel, Ferragamo, Rolex, Patek Philippe, Macallan …) and then put those terms into Google. And what will you find? California, Florida, Nevada, Texas…

Then get the volunteers to come up with 40 luxury goods for rural folks (ski bibs, cowboy boots, horse jumping…) and see what you get. Montana, Colorado, Wyoming, Vermont, Idaho…

Study over, results show no surprises whatsoever.

Awesomely bad experimental design. I would not trust any conclusions from such a study.
phoodoo on April 17, 2015 at 3:36 am said:

Neil Rickert,

Because there are no terms like “multifactorial regressions”, to impress the likes of you and Dna Jock?

Right, because it’s so hard to figure out where people like luxury goods.

Then how about sales figures from Mercedes Benz? Would that help you?
Neil Rickert on April 17, 2015 at 4:20 am said:

phoodoo: Because there are no terms like “multifactorial regressions”, to impress the likes of you and Dna Jock?

The fancy terminology isn’t important. But it does matter that you have good controls to make sure that you are not just grabbing data that happens to support your own pet theory.
phoodoo on April 17, 2015 at 4:38 am said:

Neil Rickert,

Right Neil, except this is exactly what happens throughout science. People are showing data which supports their pet theory, and not showing data which doesn’t. So I made it simple.

I just thought of as many obvious luxury status items as I could and typed them into Google Correlate. And the results were equally obvious. People in California and Texas like Mercedes and Patek Philippe. People in Montana like horse jumping and rodeos. People in Idaho like skiing equipment. People in Mississippi and Louisiana like wigs. People in Florida and New Jersey like cruises. People in New York like deodorant. People in Wyoming and Vermont don’t.

This is a hell of a lot more correct about America than what their study tries to show.
Neil Rickert on April 17, 2015 at 4:57 am said:

phoodoo: Right Neil, except this is exactly what happens throughout science. people are showing data which supports their pet theory, and not showing data which doesn’t.

No, that is not what happens “throughout science”. It happens sometimes, but scientists usually attempt to avoid that. And people who read the research look at the experimental design and criticize designs that are open to this kind of mistake.

I have not read the particular paper we are discussing, so I haven’t looked at their experimental design. However, if it got through peer review, then it would not be as bad as your suggested study.
keiths on April 17, 2015 at 5:34 am said:

This is too funny.

Phoodoo

1) tries to criticize a paper he hasn’t read;
2) completely misunderstands the abstract;
3) accuses the authors of a mistake they did not make;
4) makes mistake after mistake while being spoon-fed by a very patient Hobbes; and then
5) suggests an “improvement” to the paper in which he commits the very error that he falsely accused the authors of making!

That’s some “improvement”! Heckuva job, phoodoo.
phoodoo on April 17, 2015 at 5:41 am said:

keiths,

When are you going to get around to reading the paper Keiths?

After you figure out why you don’t mind square dancing perhaps?
keiths on April 17, 2015 at 5:41 am said:

A classic phoodoo flameout.
phoodoo on April 17, 2015 at 5:47 am said:

keiths,

California, Florida, Texas, Nevada.

Try to keep up keiths.
DNA_Jock on April 17, 2015 at 1:34 pm said:

keiths,

It is something of a repeat of the M&M thread.

1) phoodoo finds something incredible
2) people who are numerate gently explain that, while the result might be counter-intuitive, if you explore what is going on, you will see that it is correct
3) phoodoo invents his own personal idiosyncratically error-strewn version of the analysis and proudly presents his (meaningless) results, claiming that this somehow disproves the original analysis
4) the numerate roll their eyes and move on, leaving phoodoo to his ritual chanting.

I am particularly enjoying his focus on “California, Florida, Texas” & [wtf?] Nevada, which are all in the bottom half of residual GINI scores.

Sigh.

eta: To clarify, I don’t have an opinion on the validity or usefulness of this study. My point is merely that phoodoo does not understand how the analysis was performed
phoodoo on April 17, 2015 at 2:53 pm said:

DNA_Jock,

Right dufus, those states are all in the bottom half.

And Kentucky and Tennessee and Mississippi and Alabama and Georgia are listed as some of the most unequal states, as are West Virginia, South Carolina, North Dakota, Oklahoma… Precisely places where they DON’T search much for luxury goods (just slightly more than Wyoming if you don’t count anything having to do with horses).

So what the fuck does the paper really show? Nothing! Because anyone with a brain already knew that they like luxury goods in New York and California and Florida (and if you type virtually ANY luxury brand name into Correlate you can see THAT is where they search them most!), and that they don’t in Kentucky and Tennessee and West Virginia (just try it!)! And if a paper claims they overwhelmingly like luxury goods in West Virginia and Tennessee, because of their income inequality, the paper is bullshit.

But to clarify, you have no opinion, you have not read the paper, but you can understand what Hobbes told you. Congratulations science skeptic genius!

You can’t even understand why they like luxury goods in Nevada. Are you an idiot or from some country that has never heard about America?
Richardthughes on April 17, 2015 at 3:11 pm said:

“So what the fuck does the paper really show? Nothing!” – you’d have to read it and understand it first, Phoodoo. Both of which are beyond you.
Richardthughes on April 17, 2015 at 3:15 pm said:

Phoodoo *thinks* he has a $10,000 watch. We can’t be sure given his skill with numbers.
DNA_Jock on April 17, 2015 at 3:44 pm said:

You are sticking with step 3 above, I see.
You need to show why phoodoo-results are diametrically opposed to Walasek & Brown’s results. This will involve reading (& understanding) the paper.
Not.Holding.My.Breath.

For fun, I did analyze the drivers that W&B used to adjust GINI scores (thank you Hobbes), which confirmed my suspicion about the effects of %urban and %foreign-born on GINI. The co-efficient for urban% is ~23% larger than the co-efficient for foreign-born%. What would you conclude, phoodoo?
keiths on April 18, 2015 at 1:37 am said:

Rich,

Phoodoo *thinks* he has a $10,000 watch. We can’t be sure given his skill with numbers.

He probably overlooked the decimal point. It was really $100.00.
phoodoo on April 18, 2015 at 5:06 am said:

Countries with high income inequality:
Chile, Turkey, Mexico, Central African Republic, Honduras, Angola, Haiti, South Africa and Namibia, Paraguay, Seychelles

Countries with least income inequality:
France , Luxembourg, Germany, Sweden, Norway, Austria (basically all of Northern Europe) Japan, South Korea…

Places where they buy luxury status goods…..take a guess.
keiths on April 18, 2015 at 6:21 am said:

Seriously, phoodoo? After all this time, you still don’t understand what Walasek and Brown were testing for?
phoodoo on April 19, 2015 at 3:58 am said:

keiths,

I think it’s great that this is your final analysis of the situation. I hope we can highlight it.

You apparently want to say that the paper is NOT about trying to show that people are more prone to buy luxury goods where there is income inequality. If societies were just more equal in wealth, people wouldn’t need to bother buying nice cars and watches to show off. A ten-year-old pickup truck, and a Seiko (well, not an overpriced, showy Seiko, ok) will be good enough for everyone in Switzerland.
hotshoe_ on April 19, 2015 at 7:29 am said:

phoodoo:
keiths,
You apparently want to say that the paper is NOT about trying to show that people are more prone to buy luxury goods where there is income inequality. If societies were just more equal in wealth, people wouldn’t need to bother buying nice cars and watches to show off. A ten-year-old pickup truck, and a Seiko (well, not an overpriced, showy Seiko, ok) will be good enough for everyone in Switzerland.

Jayzus. phoodoo, could you possibly get things more backwards in three sentences ??

Keiths (and everyone else here besides you) knows that this paper IS showing that people are more prone to internet-searching for luxury goods where there is income inequality.

So, from that evidence, Keiths might predict that people won’t have such a strong need to look for nice cars and watches to show off with, when they live in a more-equal society. Keiths might predict that there will still be a large aggregate number of luxury-goods purchases in areas with large populations of well-to-do people, but that per capita purchases of showy-luxury goods will be fewer, even for wealthy people, in areas with income equality.

This prediction could be tested with data about actual purchases of status goods, not just searches for them, but that was not the scope of this paper.
OMagain on April 19, 2015 at 11:38 am said:

phoodoo: Congratulations science skeptic genius!

au contraire mon ami.

Any congratulations are all due to you. You have provided the perfect illustration of what it takes to be an ID supporter.