This is a guest post by Robert J. Moore, the CEO and co-founder of RJMetrics, an on-demand database analytics and business intelligence startup. His last guest post was an analysis of Twitter user data.
It’s no surprise that Chatroulette is the latest media darling. It has all the elements of a good story: technology, mystery, celebrity, and sex. If you haven’t heard of Chatroulette, this Daily Show segment is a good primer.
We were itching to study Chatroulette in an RJMetrics Dashboard, but no one seemed to have any good data for us to explore. So we decided to compile the data ourselves by leveraging Chatroulette Map, some scrappy programming, and a passionate tech community. We soon had detailed data on 2,883 Chatroulette sessions that tied users to geography, gender, appearance, and more.
Here are a few highlights from our findings:
- About half of all Chatroulette spins connect you with someone from the USA. The next most likely country is France at 15%.
- Of the spins showing a single person, 89% were male and 11% were female.
- You are more likely to encounter a webcam featuring no person at all than one featuring a solo female.
- 8% of spins showed multiple people behind the camera. 1 in 3 females appear as part of such a group. That number is 1 in 12 for males.
- 1 in 8 spins yields something R-rated (or worse).
- You are twice as likely to encounter a sign requesting female nudity as you are to encounter actual female nudity.
How We Did It
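Thanks to RJMetrics, the analysis was easy. Getting the data, however, was a bit of a challenge. The good news is that a roulette wheel is the statistician’s best friend. The central limit theorem tells us that when observations are drawn at random, the proportions measured in a sample are approximately normally distributed around the true values, with a spread that shrinks as the sample grows. So a large set of random observations lets us draw high-confidence conclusions about the underlying data set.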
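To put a number on “high-confidence”: under the normal approximation, the 95% margin of error on an observed proportion p from n independent random samples is about 1.96 × sqrt(p(1 − p) / n). The sketch below applies this textbook (Wald) interval to our 2,883 sessions. It assumes the sessions are independent random draws, which is exactly the assumption questioned in the caveats further down; the function name `wald_interval` is just illustrative.

```python
import math


def wald_interval(p_hat: float, n: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate confidence interval for a proportion (normal approximation)."""
    # Standard error of a sample proportion under independent random sampling.
    se = math.sqrt(p_hat * (1.0 - p_hat) / n)
    margin = z * se
    # Clamp to [0, 1], since a proportion cannot fall outside that range.
    return max(0.0, p_hat - margin), min(1.0, p_hat + margin)


if __name__ == "__main__":
    n_sessions = 2883
    for label, p in [("solo male", 0.72), ("no person", 0.11), ("solo female", 0.09)]:
        low, high = wald_interval(p, n_sessions)
        print(f"{label:12s} {p:.0%}  95% CI: {low:.1%} to {high:.1%}")
```

With 2,883 sessions, the 72% solo-male figure carries a margin of roughly ±1.6 percentage points, provided the sampling really is random.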
We started our process at Chatroulette Map, an awesome new site that plots screenshots from random Chatroulette sessions on a map.
It’s a little-known fact that anyone you chat with on Chatroulette can determine your IP address using a program like Wireshark. Chatroulette Map uses this IP data to geolocate and map random chatters on their website (along with still photos from their chats).
Chatroulette Map is also nice enough to expose all of its data points to anyone who clicks “View Source.” Right in the raw source code of their homepage is the image URL, latitude, longitude, city, state, and country of every chatter on their map. As an added bonus, the file name of each image is a UNIX timestamp of when it was taken. Jackpot. (Note: we tried contacting the creators of Chatroulette Map to participate in this story but did not receive a response.)
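Because each image’s file name is a UNIX timestamp, recovering when a screenshot was taken takes only a couple of lines. The sketch below is illustrative only: the `ChatPoint` record mirrors the fields listed above, but its structure, the `example.com` URL, and the sample values are made up and do not reflect Chatroulette Map’s actual page format.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from pathlib import PurePosixPath
from urllib.parse import urlparse


@dataclass
class ChatPoint:
    """One map data point (hypothetical structure mirroring the fields in the page source)."""
    image_url: str
    latitude: float
    longitude: float
    city: str
    state: str
    country: str

    def taken_at(self) -> datetime:
        # The file name (minus extension) is assumed to be a UNIX timestamp in seconds.
        stem = PurePosixPath(urlparse(self.image_url).path).stem
        return datetime.fromtimestamp(int(stem), tz=timezone.utc)


# Made-up example record.
point = ChatPoint("http://example.com/images/1267012345.jpg",
                  40.71, -74.01, "New York", "NY", "United States")
print(point.taken_at().isoformat())  # 2010-02-24T11:52:25+00:00
```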
Once we had photos, times, and locations, we needed data on what was happening in each chat photo. We coded up a quick webpage that displayed a random photo from the data set and asked some basic multiple-choice questions about that photo. These included questions on age, gender, and what the person in the photo was doing. We set up the backend so that a photo wouldn’t be taken out of rotation until two votes from different IP addresses provided an identical set of answers.
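The retirement rule above can be sketched in a few lines. This is an assumed reconstruction, not our actual backend: the `VoteTracker` class is hypothetical, each ballot is modeled as a tuple of answers (for example age bracket, gender, activity), and the answer labels and IP addresses are placeholders. A photo is retired once two distinct IP addresses have submitted exactly the same tuple.

```python
from collections import defaultdict


class VoteTracker:
    """Tracks photo assessments and retires a photo once two IPs agree exactly."""

    def __init__(self) -> None:
        # photo_id -> answers tuple -> set of IP addresses that submitted it
        self._votes: dict[str, dict[tuple, set[str]]] = defaultdict(lambda: defaultdict(set))
        # photo_id -> the corroborated answers tuple
        self.retired: dict[str, tuple] = {}

    def submit(self, photo_id: str, ip: str, answers: tuple) -> bool:
        """Record a vote; return True if this vote retires the photo."""
        if photo_id in self.retired:
            return False  # Already corroborated; ignore further votes.
        voters = self._votes[photo_id][answers]
        voters.add(ip)  # A set, so repeat votes from the same IP don't count twice.
        if len(voters) >= 2:
            self.retired[photo_id] = answers
            return True
        return False


tracker = VoteTracker()
tracker.submit("img1", "192.0.2.1", ("20-39", "male", "chatting"))
tracker.submit("img1", "192.0.2.1", ("20-39", "male", "chatting"))     # same IP: no effect
tracker.submit("img1", "198.51.100.2", ("20-39", "female", "chatting"))  # disagrees
print(tracker.submit("img1", "203.0.113.3", ("20-39", "male", "chatting")))  # True
```

Requiring an exact match on the whole answer set, rather than a majority per question, means a photo stays in rotation until two independent voters see it the same way.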
We posted the link to Hacker News on Saturday night. In under two hours, we received 10,770 photo assessments from 1,012 distinct IP addresses. Every photo received a corroborated profile. We had our data.
Five minutes later, the data was loaded into a hosted dashboard on RJMetrics, which produced the results you see below.
Before we get to the data, we should point out the uncontrolled inputs that could be skewing these results:
- We know nothing about how Chatroulette matches up chatters, and we act on the assumption that pairings are truly random.
- We know nothing about the methodology used by Chatroulette Map. If they excluded data points for any reason or did not sample randomly, our analysis could be skewed.
- Geolocation by IP address is an imperfect science that is typically only accurate within a few dozen miles. It can also be thrown off by users taking advantage of proxy servers or using other techniques to disguise their IP addresses.
- Human image recognition is imperfect (even if mitigated by our vote convergence system). Any images that were judged incorrectly could skew the results.
- It’s also important to note that statistics about “the average chat session” (which we present here) are not the same as stats about “the average user.” For example, imagine if female chats averaged 100 seconds each, but male chats averaged 10 seconds each. Even if there were equal numbers of male and female users, males would enter the pool more often and would therefore appear in front of you more often, making the “average session” more likely to contain a male chat partner. Because of this, all of our statistics are about the average session and not the average user.
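The arithmetic behind that example is simple. If each user re-enters the pool as soon as a chat ends, a user generates sessions at a rate of one per average chat length, so a group’s share of sessions is proportional to its head count divided by its average chat length. A minimal sketch using the hypothetical 10-second and 100-second durations from the example above (the immediate re-entry is an assumption, and `session_share` is an illustrative name):

```python
def session_share(users: dict[str, int], avg_seconds: dict[str, float]) -> dict[str, float]:
    """Share of chat sessions per group, assuming users re-enter the pool immediately after each chat."""
    # Each user generates sessions at a rate of 1 / (average chat length).
    rates = {group: users[group] / avg_seconds[group] for group in users}
    total = sum(rates.values())
    return {group: rate / total for group, rate in rates.items()}


shares = session_share({"male": 500, "female": 500}, {"male": 10.0, "female": 100.0})
print(f"{shares['male']:.1%} of sessions are male")  # 90.9% of sessions are male
```

So even with a 50/50 user base, about 10 of every 11 sessions in this example would show a male.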
As you might expect, you’re most likely to encounter a solo male in any given chat session. 72% of our chat sessions were with solo males. Interestingly, 11% showed no person at all while only 9% showed a solo female. So, if you’re looking for women on Chatroulette, be forewarned: you’re more likely to encounter an empty chair.
Also interesting is the prevalence of groups on Chatroulette. In all, 8% of chats featured a group of people (4% all-male, 2% all-female, and 2% mixed). If you include groups, your chance of encountering a female grows to 13%. However, this means that if you do encounter a female, there is about a 1 in 3 chance that she will be part of a group. In contrast, the chance a male will be part of a group is only about 1 in 12.
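These group figures follow directly from the session breakdown. The sketch below recomputes them from the rounded percentages reported above (72% solo male, 9% solo female, 4% all-male groups, 2% all-female groups, 2% mixed groups). Because the inputs are rounded, the results are approximate: the male figure comes out nearer 1 in 13 than 1 in 12, a gap that is within rounding.

```python
# Rounded share of all sessions, as reported in the text.
shares = {
    "solo_male": 0.72,
    "solo_female": 0.09,
    "group_all_male": 0.04,
    "group_all_female": 0.02,
    "group_mixed": 0.02,
}

# Sessions that show at least one female / at least one male.
with_female = shares["solo_female"] + shares["group_all_female"] + shares["group_mixed"]
with_male = shares["solo_male"] + shares["group_all_male"] + shares["group_mixed"]

# Of the sessions showing a female (male), the fraction where she (he) is part of a group.
female_in_group = (shares["group_all_female"] + shares["group_mixed"]) / with_female
male_in_group = (shares["group_all_male"] + shares["group_mixed"]) / with_male

print(f"sessions with a female: {with_female:.0%}")  # sessions with a female: 13%
print(f"female in a group: {female_in_group:.0%} (about 1 in {1 / female_in_group:.0f})")
print(f"male in a group: {male_in_group:.0%} (about 1 in {1 / male_in_group:.0f})")
```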
Our age analysis excludes cams where age could not be estimated. As you might expect, most people were young adults between 20 and 39 (about 70%). About 20% were under 20 and about 10% were 40 or older.
When we combine age with the gender statistics that we tracked above, we learn even more. For example, females tended to be younger than males, with 23% under 20 (vs. 18% for males). Only 3% of females were 40 or older (vs. 8% for males).
Groups of females were even younger. Female-only groups were “Teen or Younger” 65% of the time, while male-only groups were “Teen or Younger” only 36% of the time. There were no groups at all of people 40 or older.
47% of the Chatroulette sessions we measured were with users in the United States. The most popular countries are shown below:
When we combine geography with gender and age, we learn even more:
- Italy had the highest concentration of solo males at 98%. It also had the highest concentration of “Men over 40” at 13% (more than 3x the US rate of 4%).
- The US has the highest concentration of groups at 13%, followed by the Netherlands at 9%.
- Canada had the highest concentration of solo females at 13%, followed by the US at 10%.
If you’ve ever used Chatroulette, you probably noticed that not everyone is there just to chat. Some users, whom we have affectionately labeled “perverts,” fit into at least one of these three categories:
- Appear to not be wearing any clothes whatsoever
- Are displaying explicit nudity
- Appear to be committing a lewd act
The overall pervert rate on Chatroulette is 13%. This means about 1 in 8 chat sessions will have something decidedly R-rated (or NC-17) on the other end. Of the perverts we identified, only 8% were female. Multiplying the two rates (13% × 8%), only about 1% of chats feature a female pervert.
Below, we see the “pervert rate” by country:
The United Kingdom dominates the rankings here with a pervert concentration of 22%! Turkey, France, and Germany tie for second place with rates of 15%. Bringing down the global average is the United States, which boasts the lowest pervert concentration of the bunch: 10%.
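Per-country rates like these are a simple group-and-divide over the labeled sessions. A minimal sketch, assuming each session has been reduced to a hypothetical `(country, is_flagged)` pair; it also reports each country’s sample size, since a rate drawn from a handful of sessions is much less certain than one drawn from hundreds. The sample data is a toy example, not our real counts.

```python
from collections import Counter


def rate_by_country(sessions: list[tuple[str, bool]]) -> dict[str, tuple[float, int]]:
    """Return {country: (share of flagged sessions, number of sessions)}."""
    totals = Counter(country for country, _ in sessions)
    flagged = Counter(country for country, is_flagged in sessions if is_flagged)
    # Counter returns 0 for countries with no flagged sessions.
    return {country: (flagged[country] / n, n) for country, n in totals.items()}


# Toy data for illustration only.
sample = [("US", False)] * 9 + [("US", True)] + [("UK", True)] + [("UK", False)] * 3
for country, (rate, n) in sorted(rate_by_country(sample).items(), key=lambda kv: -kv[1][0]):
    print(f"{country}: {rate:.0%} of {n} sessions")
```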
Also worth mentioning are the users who display signs (like the one below) requesting female nudity.
Signs like this make up between 1% and 2% of all chats. This means that you’re twice as likely to encounter a sign requesting female nudity as you are to encounter actual female nudity.
In trolling through the thousands of photos collected by Chatroulette Map, I came across this extremely interesting image. It contains a statistical breakdown of what this user saw during his many Chatroulette chat sessions. Sound familiar?
These stats appear to be based on a data set of 1,090 points (pretty impressive for a single user). The numbers are generally in the same ballpark as ours (although we observed a higher pervert rate). We’re not sure who was behind this, but we like their style: they managed to sum up the gist of this blog post in a single image.
The scarcity of data made this project both challenging and exciting. In an ideal world, it would be great to analyze things like average session length based on different attributes, chat user return rates, cohort analysis, and more. Because of the mostly-anonymous nature of Chatroulette, that data will be hard to come by. For now, at least you have a better idea of what you will see when you hit that Next button.
Guest author Robert J. Moore is the CEO of RJMetrics, a startup that helps online businesses measure, manage, and monetize better. He was previously a venture capital analyst and currently serves as an advisor to several New York startups. Robert blogs at The Metric System and can be followed on Twitter at @RJMetrics.