‘Geography of Hate’ Project Shows Racist, Homophobic Tweet Concentrations Across the U.S.

    May 13, 2013
    Josh Wolford

If you’ve spent any time at all on Twitter, you know that it can be a great place for a variety of things – real-time news, celeb-watching, comedy, and the list goes on and on. But you also know that Twitter is full of the kind of homophobic and racist language that can make you physically recoil. Now, a group of researchers have developed an interactive map of all the hate speech that Americans are pumping out on a daily basis.

The map was created by geography students at California’s Humboldt State University, the same group of people who brought us the post-election Twitter racism map back in November. Back then, they looked at racist tweets that focused on President Obama’s reelection and found that Mississippi and Alabama were the two hotbeds for such activity.

“Rather than focusing just on hate directed towards a single individual at a single point in time, we wanted to analyze a broader swath of discriminatory speech in social media, including the usage of racist, homophobic and ableist slurs,” say the researchers.

For instance, here’s the map of generally “homophobic” tweets, which are determined by the use of words like “dyke,” “fag,” “homo,” and “queer.”

And here’s the map of racist tweets – those containing the words “nigger,” “chink,” “wetback,” “gook,” or “spick”:

Of course, analysis like this is never going to be 100% accurate. Keyword analysis has inherent issues. For instance, the word “queer” is not always used in a derogatory, hate-filled manner. People could be tweeting out the word “fag” in another context, such as bemoaning its usage.

On the other hand, it’s hard to justify many uses of words like “wetback” on Twitter. Sure, it’s not completely solid analysis, but it’s pretty close. You have to imagine that the majority of people tweeting about fags, dykes, niggers, and chinks are doing so in a hateful manner.

But to cut out this sort of uncertainty, the researchers manually read and coded each tweet to judge its sentiment, “in order to address one of the earlier criticisms of our map of racism directed at Obama.” This way, they could know, for sure, whether a tweet that contained the word “queer” was actually posted in a hateful context.

Using DOLLY to search for all geotagged tweets in North America between June 2012 and April 2013, we discovered 41,306 tweets containing the word ‘nigger’, 95,123 referenced ‘homo’, among other terms. In order to address one of the earlier criticisms of our map of racism directed at Obama, students at Humboldt State manually read and coded the sentiment of each tweet to determine if the given word was used in a positive, negative or neutral manner. This allowed us to avoid using any algorithmic sentiment analysis or natural language processing, as many algorithms would have simply classified a tweet as ‘negative’ when the word was used in a neutral or positive way. For example the phrase ‘dyke’, while often negative when referring to an individual person, was also used in positive ways (e.g. “dykes on bikes #SFPride”). The students were able to discern which were negative, neutral, or positive. Only those tweets used in an explicitly negative way are included in the map.

You can check out the full interactive map here, where you can zoom in to see specific concentrations of Twitter hate speech.

[Floating Sheep via MIT Technology Review]

  • http://dailyfetch.blogspot.com/ Fran

    Pretty disgusting…the intrusion that is. Great assumptions in what people are actually talking about or to. And isn’t it true that some of these disgusting words are used among people that call each other these names…such as the “N” word? Oh, I guess this map goes to prove that we are a mean spirited society here in America, quite biased and racist, eh? I do wish someone would do a study on people helping others in need or in dire trouble….no matter who the victims are.

  • Paul in NJ

    Well, if you read the article carefully, you’d see that it claims that “students at Humboldt State manually read and coded the sentiment of each tweet to determine if the given word was used in a positive, negative or neutral manner.”

    However, they also note that only Tweets that are geocoded were included. But according to Twitter, “Tweeting with your location allows you to selectively add location information to your Tweets. This feature is off by default and you will need to opt-in to use it.” (emphasis added)

    Perhaps those over-eager students would like to re-do their study, and attempt to correct for regional bias WRT geotagged Tweets. For example, do liberals – who profess to be concerned about personal privacy – geotag less, more, or about the same as other ideological groups? Just sayin’.

  • abrabloom@hotmail.com

    Not only is this study flawed, but there is literally nothing to be gleaned from it and it’s sloppily executed.

    1.) As others have already pointed out, you have the bias of individuals reading the tweets.

    2.) Additionally you have to worry about what’s called internal validity. Using different readers to rate tweets means that a tweet one reader considers offensive may strike another reader as not offensive. In an extreme case, if only two readers had done the coding, one for the eastern half of the U.S. and one for the western half, all this study would show is that one reader judged tweets more harshly than the other.

    3.) A reader has no way to distinguish tweets that are jokes, for example posted by someone other than the user if the user accidentally leaves their account logged in. Although this point could be argued on the grounds that using words like “dyke” in a negative connotation is hateful even if meant to prank someone.

    4.) Geotagged tweets are a small sample of all tweets, and I would be interested to know if they ran significance tests to see if their results are even statistically different from zero, especially in rural areas with less tweeting activity. Spoiler alert: probably not at all significant.

    0/5 = 0, 0/250 = 0, 0/1,232,123,643,234,234 = 0. That is to say that 0 divided by anything is zero. So if you have an area that has only one geo-tagged tweet and it is not racist, your ratio for that area is going to be zero. Ditto for an area that has 5 billion geo-tagged tweets and not a racist one. So you’re going to show on this map that those two places have the same score on the “hate scale”, that is, zero hate. However, these two groups could actually have HUGELY different “hate” levels. The fact that the ONLY geo-tagged tweet in area 1 wasn’t racist doesn’t say anything about the population as a whole: maybe the only non-racist in the whole county just happened to post that tweet. Whereas in area 2, the fact that out of billions of tweets, only one was racist DOES say something about the population because there were lots and lots of tweets that weren’t racist, as opposed to only one that wasn’t.
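    To make that concrete: one standard way to expose this problem is to report a confidence interval on each county’s ratio instead of the raw proportion. A quick sketch using the Wilson score interval, with hypothetical counts (not numbers from the study):

    ```python
    import math

    def wilson_upper(successes: int, n: int, z: float = 1.96) -> float:
        """Upper bound of the 95% Wilson score interval for a proportion."""
        if n == 0:
            return 1.0  # no data: anything is possible
        p = successes / n
        denom = 1 + z**2 / n
        centre = p + z**2 / (2 * n)
        margin = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
        return (centre + margin) / denom

    # County A: 1 geotagged tweet, 0 hateful. Raw ratio = 0, but the
    # data are still consistent with a true rate of almost 80%.
    print(wilson_upper(0, 1))              # ~0.79

    # County B: 5 billion geotagged tweets, 0 hateful. Raw ratio is
    # also 0, but here the true rate really is pinned near zero.
    print(wilson_upper(0, 5_000_000_000))  # ~7.7e-10
    ```

    Both counties score 0/​n on the raw “hate scale”, yet the interval shows they tell us wildly different things — which is exactly the distinction the map throws away.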

    tl;dr Geo-mapping should be used to display more scientific, harder to understand/interpret information in a way that makes it accessible to a larger audience. However, if the study behind the map doesn’t make sense because it was conducted by people with the sole purpose of then displaying it, the results and the display will be meaningless… as anyone can see by looking at this map, which doesn’t provide any insight about anything.

  • Carl

    It looks like most of the tweets are coming from the areas that have the most people. Those areas would have the most racists and the most non-racists.

    • Scott Robinson

      I have looked at how these Google heatmaps work. They are not appropriate to represent this type of data or any data I have seen them used for.

      There are three separate reasons for this, but for the punchline just compare the map here to a map of counties of the US. The hotspots in the heatmap correlate to the density of counties, particularly the more you zoom out (the representation of the data should not change at different levels of zoom!!).

      Shame on Google for making their heatmap feature so useless and shame on everyone else for using it without thinking.

      • Scott Robinson

        Just to clarify, it is density of counties, not population, which is the problem. The data was normalized for population of counties, but the density of the counties along with Google’s weird heatmap settings causes the problem… and other problems.

  • Brad Haycraft

    White people in America will never get it. Blacks have more respect for the few whites that have the balls to be honest about how they feel about blacks, unlike the people commenting on this article, who are the worst type of cowards… the ones that will never be honest about their feelings toward blacks until they are in a protected setting like the internet. You guys make us all look like cowards, and some of us respect black men and women as our equals and appreciate them for being the toughest and wisest people in the world.

  • Brad Haycraft

    Abrabloom, you are a ridiculous person. The funny thing is you think they don’t see right thru you. That’s why you get no respect.