This is a guest post by Robert J. Moore, the CEO and co-founder of RJMetrics, an on-demand database analytics and business intelligence startup. His last guest post was an analysis of Twitter user data.
It’s no surprise that Chatroulette is the latest media darling. It has all the elements of a good story: technology, mystery, celebrity, and sex. If you haven’t heard of Chatroulette, this Daily Show segment is a good primer.
We were itching to study Chatroulette in an RJMetrics Dashboard, but no one seemed to have any good data for us to explore. So we decided to compile the data ourselves by leveraging Chatroulette Map, some scrappy programming, and a passionate tech community. We soon had detailed data on 2,883 Chatroulette sessions that tied users to geography, gender, appearance, and more.
Here are a few highlights from our findings:
Thanks to RJMetrics, the analysis was easy. Getting the data, however, was a bit of a challenge. The good news is that a roulette wheel is the statistician’s best friend. The central limit theorem tells us that, with a large enough set of random observations, sample averages and proportions land close to the true values, which lets us draw high-confidence conclusions about the underlying population.
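To give a feel for what "high-confidence" means at this sample size, here is a minimal sketch of a normal-approximation margin of error for an observed share. The function name and the 95% z-value of 1.96 are our own assumptions for illustration, not part of the original analysis; the inputs (2,883 sessions, 72% solo males) are figures reported in this post.

```python
import math

def margin_of_error(share, n, z=1.96):
    """Half-width of a normal-approximation confidence interval for a proportion.

    z=1.96 corresponds to roughly 95% confidence (an assumption for this sketch).
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0.0 <= share <= 1.0:
        raise ValueError("share must be between 0 and 1")
    return z * math.sqrt(share * (1.0 - share) / n)

# Figures taken from this post: 2,883 sessions, 72% of them solo males.
moe = margin_of_error(0.72, 2883)
print(f"72% +/- {moe * 100:.1f} percentage points")
```

The approximation only holds if the sessions really are a random sample, which is why the uncontrolled inputs matter.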
We started our process at Chatroulette Map, an awesome new site that plots screenshots from random Chatroulette sessions on a map.
It’s a little-known fact that anyone you chat with on Chatroulette can determine your IP address using a program like Wireshark. Chatroulette Map uses this IP data to geolocate and map random chatters on their website (along with still photos from their chats).
Chatroulette Map is also nice enough to expose all of its data points to anyone who clicks “View Source.” Right in the raw source code of their homepage is the image URL, latitude, longitude, city, state, and country of every chatter on their map. As an added bonus, the file name of each image is a UNIX timestamp of when it was taken. Jackpot. (Note: we tried contacting the creators of Chatroulette Map to participate in this story but did not receive a response.)
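Turning a file name into a capture time takes only a few lines. The sketch below assumes the file name, minus its extension, is a UNIX timestamp in seconds; the URL and the function name are hypothetical, and the real site's paths may look different.

```python
from datetime import datetime, timezone
from pathlib import PurePosixPath
from urllib.parse import urlparse

def capture_time(image_url):
    """Return the UTC capture time encoded in an image file name.

    Assumes the file name (without extension) is a UNIX timestamp in seconds.
    """
    stem = PurePosixPath(urlparse(image_url).path).stem
    return datetime.fromtimestamp(int(stem), tz=timezone.utc)

# Hypothetical URL, used only to exercise the function.
example = capture_time("http://example.com/images/1267315200.jpg")
print(example.isoformat())
```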
Once we had photos, times, and locations, we needed data on what was happening in each chat photo. We coded up a quick webpage that displayed a random photo from the data set and asked some basic multiple-choice questions about that photo. These included questions on age, gender, and what the person in the photo was doing. We coded up the backend so that a photo wouldn’t be taken out of rotation until two votes from different IP addresses provided an identical set of answers.
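The corroboration rule can be sketched roughly as follows. This is an illustration of the rule as described, not the code we actually ran; the class name, method names, and answer fields are all hypothetical.

```python
from collections import defaultdict

class PhotoPool:
    """Keeps a photo in rotation until two different IPs submit identical answers."""

    def __init__(self, photo_ids):
        self.open_photos = set(photo_ids)
        self.confirmed = {}  # photo_id -> agreed set of answers
        # photo_id -> answers (as a tuple) -> set of IPs that gave those answers
        self._votes = defaultdict(lambda: defaultdict(set))

    def submit(self, photo_id, ip, answers):
        """Record one assessment; return True once the photo is corroborated."""
        if photo_id in self.confirmed:
            return True
        if photo_id not in self.open_photos:
            raise KeyError(photo_id)
        key = tuple(sorted(answers.items()))
        voters = self._votes[photo_id][key]
        voters.add(ip)  # a repeat vote from the same IP does not grow the set
        if len(voters) >= 2:
            self.confirmed[photo_id] = dict(key)
            self.open_photos.discard(photo_id)
            del self._votes[photo_id]
            return True
        return False

pool = PhotoPool(["a.jpg"])
vote = {"gender": "male", "age": "adult"}
first = pool.submit("a.jpg", "10.0.0.1", vote)
repeat = pool.submit("a.jpg", "10.0.0.1", vote)   # same IP again
second = pool.submit("a.jpg", "10.0.0.2", vote)   # different IP, same answers
print(first, repeat, second)
```

Keying votes by the full set of answers means two voters must agree on every question, not just on most of them, before a photo leaves rotation.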
We posted the link to Hacker News on Saturday night. In under two hours, we received 10,770 photo assessments from 1,012 distinct IP addresses. Every photo received a corroborated profile. We had our data.
Five minutes later, the data was loaded into a hosted dashboard on RJMetrics and returning the results you see below.
Before we get to the data, we should point out the uncontrolled inputs that could be skewing these results:
As you might expect, you’re most likely to encounter a solo male in any given chat session. 72% of our chat sessions were with solo males. Interestingly, 11% showed no person at all while only 9% showed a solo female. So, if you’re looking for women on Chatroulette, be forewarned: you’re more likely to encounter an empty chair.
Also interesting is the prevalence of groups on Chatroulette. In all, 8% of chats featured a group of people (4% all-male, 2% all-female, and 2% mixed). If you include groups, your chance of encountering a female grows to 13%. However, this means that if you do encounter a female, there is about a 1 in 3 chance that she will be part of a group. In contrast, the chance a male will be part of a group is only about 1 in 12.
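The arithmetic behind those odds can be reproduced from the rounded shares quoted in this post. The variable names below are ours, and because the inputs are rounded percentages, the results can differ slightly from figures computed on the raw counts.

```python
# Shares of all chats, in percent, as quoted in this post (rounded).
solo_male, solo_female, nobody = 72, 9, 11
group_male, group_female, group_mixed = 4, 2, 2

# Chats in which at least one female (or male) is visible.
any_female = solo_female + group_female + group_mixed
any_male = solo_male + group_male + group_mixed

# Given that a female (or male) is visible, how often is it a group?
female_in_group = (group_female + group_mixed) / any_female
male_in_group = (group_male + group_mixed) / any_male

print(f"female visible in {any_female}% of chats")
print(f"female is part of a group: {female_in_group:.0%}")
print(f"male is part of a group: {male_in_group:.0%}")
```

With these rounded inputs the male figure works out closer to 1 in 13 than 1 in 12; the small gap is presumably a rounding effect.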
This analysis excludes cams where age could not be estimated. As you might expect, most people were young adults (about 70%). About 20% were under 20 and about 10% were 40 and older.
When we combine age with the gender statistics that we tracked above, we learn even more. For example, females tended to be younger than males, with 23% under 20 (vs. 18% for males). Only 3% of females were over 40 (vs. 8% for males).
Groups of females were even younger. Female-only groups were “Teen or Younger” 65% of the time, while groups of males were “Teen or Younger” only 36% of the time. There were no groups whatsoever of people 40 or older.
47% of the Chatroulette participants measured were from the United States. The most popular countries are shown below:
When we combine geography with gender and age, we learn even more:
If you’ve ever used Chatroulette, you probably noticed that not everyone is there just to chat. Some users, whom we have affectionately labeled “perverts,” fit into any of these three categories:
The overall pervert rate in Chatroulette is 13%. This means about 1 in 8 chat sessions will have something decidedly Rated R (or NC-17) on the other end. Of the perverts that were identified, only 8% were female. Combined with the overall pervert rate, that means only about 1% of chats feature a female pervert.
Below, we see the “pervert rate” by country:
The United Kingdom dominates the rankings here with a pervert concentration of 22%! Turkey, France, and Germany tie for second place with rates of 15%. Bringing down the global average is the United States, which boasts the lowest pervert concentration of the bunch: 10%.
Also worth mentioning are the users who display signs (like the one below) requesting female nudity.
Signs like this make up between 1% and 2% of all chats. This means that you’re up to twice as likely to encounter a sign requesting female nudity as you are to encounter actual female nudity.
In trolling through the thousands of photos collected by Chatroulette Map, I came across this extremely interesting image. It contains a statistical breakdown of what this user saw during his many Chatroulette chat sessions. Sound familiar?
These stats appear to be based on a data set of 1,090 points (pretty impressive for a single user). The numbers are generally in the same ballpark as ours (although we observed a higher pervert rate). We’re not sure who was behind this, but we like their style: they managed to sum up the gist of this blog post in a single image.
Scarcity of the data made this project both challenging and exciting. In an ideal world, it would be great to analyze things like average session length based on different attributes, chat user return rates, cohort analysis, and more. Because of the mostly-anonymous nature of Chatroulette, that data will be hard to come by. For now, at least you have a better idea of what you will see when you hit that Next button.
Guest author Robert J. Moore is the CEO of RJMetrics, a startup that helps online businesses measure, manage, and monetize better. He was previously a venture capital analyst and currently serves as an advisor to several New York startups. Robert blogs at The Metric System and can be followed on Twitter at @RJMetrics.