Data are Everywhere
From day one in my statistics course, I tell my students that data are everywhere. Even though everyday usage makes it sound like data is everywhere, the word data is plural; thus, they are everywhere.
Facebook helped me make the point recently when they posted a note and shared information gleaned from posting patterns (empirical data!) during the week that the Supreme Court heard arguments on marriage equality.
The note, written by Eytan Bakshy and the Facebook Data Science Team, discussed the changes in profile photo updates during the week of March 26 and 27, the two days of the proceedings. The Human Rights Campaign, an organization devoted to LGBT Equal Rights, posted the original red square with a pink equal sign on its Facebook page and asked that others make it their profile photo to show their solidarity while the Supreme Court was in session.
The increase in traffic was tremendous, even in following days, as creativity took over and multiple variations on the theme emerged.
The data analysis clearly showed an increase in profile photo changes, although they did not have access to what photos were being posted. One might assume that this campaign resulted in these posts, yet there is no way to confirm that other than looking more deeply into the data. I wonder whether any similar rapid increases in profile photo updates have occurred.
They also found that age played an important role in who was most likely to use this logo, finding that those around 30 were more likely to make these changes. I found it interesting that the typical pattern of changing one’s profile photo is related to age: the younger one is, the more often one changes their profile photo.
Geographically, some fascinating patterns emerge. The map (below) uses darker shades for the counties in which more changes were made. The note mentions that college towns and high-tech areas were more likely to have many changes, while large urban areas did not necessarily show as dense a pattern as they expected. The one exception to the urban lack of density was San Francisco County. They did mention that they accounted for density of population in various areas so that all would be comparable.
It struck me that this analysis is looking at Facebook users, who might tend to be located in college towns and high-tech areas. It isn’t clear if they controlled the data for this population of Facebook users versus simply looking at the increase in profile updates. Those around the age of 30 and those who live in these particularly dense profile posting areas are the target demographic for Facebook – and probably tend to be the more active users of Facebook.
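To show my students why the choice of denominator matters here, I put together a small sketch. Every number in it is made up for illustration (the county names, populations, user counts, and photo changes are all hypothetical, not Facebook's figures), and the two rate functions are my own simplification rather than the method the Data Science Team used. It compares photo changes per resident with photo changes per active Facebook user.

```python
from dataclasses import dataclass


@dataclass
class County:
    name: str
    residents: int      # total population (hypothetical)
    active_users: int   # Facebook users active that week (hypothetical)
    photo_changes: int  # profile photo updates that week (hypothetical)


def rate_per_resident(county: County) -> float:
    """Photo changes divided by everyone who lives in the county."""
    return county.photo_changes / county.residents


def rate_per_user(county: County) -> float:
    """Photo changes divided by active Facebook users only."""
    return county.photo_changes / county.active_users


def rank_by(counties, rate):
    """Return county names ordered from highest to lowest rate."""
    return [c.name for c in sorted(counties, key=rate, reverse=True)]


# Invented figures: a small town where most residents use Facebook,
# and a large city where a smaller share of residents do.
counties = [
    County("College Town", 100_000, 80_000, 4_000),
    County("Big City", 1_000_000, 400_000, 30_000),
]

for c in counties:
    print(f"{c.name}: {rate_per_resident(c):.3f} per resident, "
          f"{rate_per_user(c):.3f} per active user")

print("Ranked per resident:", rank_by(counties, rate_per_resident))
print("Ranked per active user:", rank_by(counties, rate_per_user))
```

With these invented figures the two rankings disagree, which is the reason I would like to know which denominator sits behind the map.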
I also wonder how Facebook users are connected into social networks: how many “friends” are not solely in the city in which we live but are spread out across the nation and the world? Thus, how did the profile update spread across those networks? Did posting in one geographical area prompt posting in another? Is a geographical analysis really useful (especially if not controlled for a population) or is it negated by the social geography of our networks?
Some have already started discussing how this phenomenon signals social activism and support for marriage rights. Since the HRC symbol is for marriage equality and profile posts went up so markedly, one conclusion could be that most of those posts were in support.
Before we can analyze the data with social activism in mind, we might need more details on what people were actually posting (pro or con) and how these posts were affecting their Facebook friends. Were some people posting the image without knowing what it meant? Were some people posting images that countered the original intent? Did some friends post either type of image for reasons that had nothing to do with the issue?
More importantly with respect to social activism, does any of this posting reflect actual voting patterns? Or work people might do to support or advance the side of the issue they support? Without knowing what they were actually posting, or how it worked through the social networks, or how such posting reflected behavioral patterns offline, we won’t know.
Facebook’s privacy settings and policies may prohibit such access; thus, we have to go with what they can share with us. It is clear that profile updates increased in a pattern tied to the HRC’s post and the Supreme Court’s activities.
The beauty and challenge with data is that once we have them, we want more! Analyzing data patterns might answer some questions, but it also brings up other issues. Those issues call for more data, which may be data we cannot access.
What do you propose to help us better understand what’s going on here? Within ethical limits, what can we do to better understand how people think and act on their opinions about social matters such as marriage equality?