Twitter Data Analysis: An Investor's Perspective
by Guest Author on Oct 5, 2009


This is a guest post by Robert J. Moore, the CEO and co-founder of RJMetrics, an on-demand database analytics and business intelligence startup that helps online businesses measure, manage, and monetize better. He was previously a venture capital analyst and currently serves as an advisor to several New York startups. Robert blogs at The Metric System and can be followed on Twitter at @RJMetrics.

A few weeks ago, my former employer led a $100 million investment into Twitter and I must admit that I was quite jealous of my former colleagues. Chances are they got the opportunity to do some very cool analytics on Twitter's data.

Rather than wonder about what I missed, I decided to figure out what I could from the outside looking in. Using some statistical trickery, the Twitter API, and my RJMetrics dashboard, I uncovered a ton of astonishing new information about Twitter. Here are some highlights:

  • Twitter's user growth is no longer accelerating. The rate of new user acquisition has plateaued at around 8 million per month.
  • Over 14% of users don't have a single follower, and over 75% of users have 10 or fewer followers.
  • 38% of users have never sent a single tweet, and over 75% of users have sent fewer than 10 tweets.
  • 1 in 4 registered users tweets in any given month.
  • Once a user has tweeted once, there is a 65% chance that they will tweet again. After that second tweet, however, the chance of a third tweet goes up to 81%.
  • If someone is still tweeting in their second week as a user, it is extremely likely that they will remain on Twitter as a long-term user.
  • Users who joined in more recent months are less likely to stop using the service and more likely to tweet more often than users from the past.

Read on for some detailed charts and a deeper dive into the data.

How We Did It

In most cases, this kind of outside-looking-in exercise wouldn't be possible. Twitter, however, is a special case for a few reasons:

  • The company is pre-revenue, so its value is wrapped up in user activity and engagement
  • A Twitter user's activity data (tweets, followers, etc) is all public by default
  • Twitter's API allowed me to automatically download up to 20,000 data points per hour
  • Twitter uses auto-incrementing ID numbers (1,2,3,4…) for both users and tweets
  • The central limit theorem tells us, among other things, that a large enough random subset of a large data set will behave like its parent set with a high degree of statistical confidence

In the end, our sample size consisted of about 85,000 users and just over 3 Million tweets. By piecing all of these things together and pulling the data into the RJMetrics Dashboard, I was able to chart loads of information about Twitter's user base and user behavior. I've looked around, and this appears to be the largest public analysis of Twitter's user base online. Enjoy!
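To give a rough sense of what a sample of this size buys, here is a minimal sketch of the textbook margin-of-error calculation for a proportion. It assumes a simple random sample and a normal approximation, neither of which is verified here; the function name and the 25% example share are illustrative only.

```python
import math

def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """Half-width of the normal-approximation interval for a sample proportion."""
    return z * math.sqrt(p * (1.0 - p) / n)

sample_size = 85_000  # users in the sample, per the article
share = 0.25          # illustrative proportion, e.g. a "1 in 4" style estimate
moe = margin_of_error(share, sample_size)
print(f"{share:.0%} +/- {moe:.2%}")
```

Under those assumptions, a proportion measured on 85,000 users is pinned down to within a fraction of a percentage point, which is why a sample this small relative to the full user base can still be informative.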

Number of Twitter Users

This analysis leverages the fact that Twitter uses auto-incrementing ID numbers for both users and tweets. We identified the range of IDs that were consumed by the system in any given month and the percentage of them actually tied to real Twitter accounts. ("Dead" IDs are likely canceled accounts, SPAM accounts, test accounts, etc.) In combination, these numbers give us a reliable approximation of how many new users joined Twitter each month:

NewUsers

This shows us the exponential growth experienced by Twitter in 2009. In Q3, this plateaus at a rate of about 8 million new users per month. A chart of total cumulative users is below:

CumulativeUsers

Hockey, anyone? As of September 1st, the actual number of live Twitter accounts was just above 50 million.
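The ID-range estimate described in this section can be sketched roughly as follows. This is not the RJMetrics implementation, just one plausible reading of it: for each month, multiply the number of IDs consumed by the fraction of sampled IDs that resolved to a live account. The function name, the input shapes, and the numbers are all made up for illustration.

```python
from collections import defaultdict

def estimate_new_users(id_ranges, samples):
    """
    id_ranges: {month: (first_id, last_id)} IDs consumed in that month (inclusive).
    samples:   list of (user_id, is_live) results from probing randomly chosen IDs.
    Returns {month: estimated number of live accounts created that month}.
    """
    probed = defaultdict(int)
    live = defaultdict(int)
    for user_id, is_live in samples:
        for month, (lo, hi) in id_ranges.items():
            if lo <= user_id <= hi:
                probed[month] += 1
                live[month] += int(is_live)
                break
    estimates = {}
    for month, (lo, hi) in id_ranges.items():
        if probed[month] == 0:
            continue  # no sampled IDs fell in this month, so no estimate
        live_fraction = live[month] / probed[month]
        estimates[month] = round((hi - lo + 1) * live_fraction)
    return estimates

# Made-up numbers purely for illustration
id_ranges = {"2009-01": (1, 1000), "2009-02": (1001, 3000)}
samples = [(10, True), (500, False), (900, True), (999, True),
           (1500, True), (2500, False)]
estimates = estimate_new_users(id_ranges, samples)
```

Summing the monthly estimates gives the cumulative count, which is how a figure like the 50 million above could be approximated from a sample.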

Average Number of Followers

According to the data, the average Twitter user has 42 followers. It's interesting to see the distribution of users by the number of people following them:

FollowersPie

As you can see, the vast majority of users have ten or fewer followers, and over 14% have no followers at all! As we know, most users have been on the system for less than a year and, as shown in the chart below, the number of followers is proportional to the user's time since joining:

AvgFollowers

Number of Tweets

It's also interesting to look at the number of status updates, or "tweets" made by the average user. Obviously, the number of tweets from any given user grows over time (per the trend shown in the chart below):

UpdatesJoinDate

When we look at the distribution of tweets by user, we see a very surprising trend: over 75% of all Twitter users have tweeted fewer than ten times.

UpdatesPie

"Protected" (Private) Twitter Profiles

Before moving onto analyses at the tweet level, it's important to note that some of the users we identified have "protected" their tweets, meaning we were able to see how many followers they had and how many times they had tweeted, but were unable to download specific tweets (and, more importantly, tweet times).

The chart below shows how many users in our data set are "protected" by the month they joined. The overall number sits around 10% (and dropping):

ProtectedAccounts

Also interesting is how "protected" Twitter users differ from public users. As shown in the charts below, protected users tend to tweet far more often, but have far fewer followers:

AvgUpdates-protected

AvgFollowers-protected

Power Users

Another limitation of the API is that it can only return the 3,200 most recent tweets for any given user. This is obviously not a big deal for most users, but there are some users out there who have passed that mark. Our sample data set showed that less than 0.02% of Twitter users have sent more than 3,200 tweets. These users will have incomplete data sets in our study, but the population is so small that they should not have any meaningful impact on our conclusions.

Tweets by Source

It's interesting to see how different tweeting methods have risen up over time. Below I show the most popular methods and what percent of Twitter traffic came through them each month since 2007:

TweetsbySource

The web clearly dominates this list. Let's exclude it to get a closer look at which other sources are driving tweets:

tweetsbysourcenoweb

Twitterrific has clearly seen better days, and text messages (txt) have been declining as a channel, as well. Meanwhile, TweetDeck appears to be aggressively gobbling up market share.

Time Between Tweets

Since we know the timestamp of every tweet in our sample data set, we can study the time between tweets and the recency of tweets from the userbase.

Remarkably, the average time between any two tweets from the same user is about 24 hours.

The chart below shows the average amount of time between tweets for a user's first ten tweets (when applicable). The x-axis contains the number of the tweet in question, and the value is the average amount of time since the previous tweet.

TimeSincePreviousTweet

Surprisingly, the time between Tweets actually drops as users do more tweeting. However, this could be biased by the fact that most users have tweeted fewer than ten times. To clear things up, let's look at the average time between tweets based on how many times the user has tweeted:

TBTUsage

Indeed, as you might expect, users who send more tweets also tweet more frequently, and the dropoff is quite significant.
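For readers who want to reproduce the "time since previous tweet" view, here is a minimal sketch. It assumes you already have each user's tweet timestamps; the function name and data layout are hypothetical, and it starts at the second tweet because the first has no predecessor (how the original chart treats tweet 1 is not stated).

```python
from collections import defaultdict
from datetime import datetime

def mean_gap_by_tweet_number(timelines, max_n=10):
    """
    timelines: {user: [datetime, ...]} tweet timestamps per user (any order).
    Returns {n: mean hours between tweet n-1 and tweet n} for n = 2..max_n.
    """
    gaps = defaultdict(list)
    for times in timelines.values():
        ordered = sorted(times)
        for n in range(2, min(len(ordered), max_n) + 1):
            delta = ordered[n - 1] - ordered[n - 2]
            gaps[n].append(delta.total_seconds() / 3600.0)
    return {n: sum(v) / len(v) for n, v in sorted(gaps.items())}

# Made-up timelines purely for illustration
timelines = {
    "a": [datetime(2009, 9, 1, 0), datetime(2009, 9, 2, 0), datetime(2009, 9, 2, 12)],
    "b": [datetime(2009, 9, 1, 0), datetime(2009, 9, 3, 0)],
}
gap_hours = mean_gap_by_tweet_number(timelines)
```

Note that each value of n averages over a different set of users (only those who reached tweet n), which is exactly the bias the text above warns about.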

Probability of Incremental Tweets

Since there is such a huge dropoff in tweeting activity up until the 10 tweets mark, we thought it might be interesting to look at the "probability of an incremental tweet" based on how many tweets a given user has completed. This can be calculated with just a few clicks in RJMetrics:

ProbInc

As you might expect, with every Tweet a user performs, their chance of tweeting again goes up.
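Outside of RJMetrics, the same "probability of an incremental tweet" can be approximated from nothing more than each user's total tweet count. A minimal sketch, assuming the probability is defined as the share of users who reached n tweets and went on to send at least one more (the function name and sample counts are made up):

```python
def incremental_tweet_probability(tweet_counts, max_n=10):
    """
    tweet_counts: list of total tweets sent per user.
    Returns {n: P(user sends tweet n+1 | user has sent n tweets)} for n = 1..max_n.
    """
    probs = {}
    for n in range(1, max_n + 1):
        reached_n = sum(1 for c in tweet_counts if c >= n)
        if reached_n == 0:
            break  # nobody got this far, so the probability is undefined
        reached_next = sum(1 for c in tweet_counts if c >= n + 1)
        probs[n] = reached_next / reached_n
    return probs

# Made-up tweet counts purely for illustration
counts = [0, 1, 1, 2, 3, 5, 8, 40]
probs = incremental_tweet_probability(counts, max_n=3)
```

With real data, probs[1] and probs[2] would correspond to the 65% and 81% figures quoted in the highlights.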

Active Tweeters

We know that Twitter has 50 million registered users, but we also know that the vast majority of them have tweeted fewer than ten times. Let's investigate just how many of these registered users are actually actively tweeting.

Using our tweet data, we can identify what percent of the user base sent out at least one tweet in any given month. This "unique tweeters" statistic is charted below (to get a fair statistic we excluded protected accounts from our denominator):

PercentTweeting

The number seems to hover in the 25% range. In other words, only about 1 in 4 registered users is actually tweeting in any given month. (Although it's worth noting that some users may only be using Twitter to read others' tweets, meaning they are not full-fledged "zombie" accounts.)
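The "unique tweeters" statistic can be sketched like this. It is an illustration under assumptions, not the actual dashboard query: it takes the denominator to be all public users who had joined by the month in question, and the function name and input shapes are hypothetical.

```python
def percent_tweeting(join_month, tweet_months, month):
    """
    join_month:   {user: "YYYY-MM"} month each public (non-protected) user joined.
    tweet_months: {user: set of "YYYY-MM" months in which the user tweeted}.
    Returns the share of users registered by `month` who tweeted during it.
    """
    # "YYYY-MM" strings sort chronologically, so plain comparison works
    registered = [u for u, joined in join_month.items() if joined <= month]
    if not registered:
        return 0.0
    active = sum(1 for u in registered if month in tweet_months.get(u, set()))
    return active / len(registered)

# Made-up users purely for illustration
join_month = {"a": "2009-01", "b": "2009-02", "c": "2009-03",
              "d": "2009-03", "e": "2009-04"}
tweet_months = {"a": {"2009-01", "2009-03"}, "c": {"2009-03"}, "e": {"2009-04"}}
march_share = percent_tweeting(join_month, tweet_months, "2009-03")
```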

Notice the bump in early 2009, right around the time when new user growth began to accelerate aggressively. This suggests the obvious: on average, a newer user is more likely to tweet than an older user. When new user growth exploded in early 2009, the concentration of new users became denser, driving this average up. To illustrate this (and get a better look at how users behave over their lifetime), we turn to cohort analysis.

Cohort Analysis

A cohort analysis is a great way to look at user behavior and loyalty over time. Each line in the chart below represents a different "cohort" of Twitter users based on the month they joined (we chose 7 cohorts from different time periods to avoid clutter). In the chart below, we monitor what percent of the users in each cohort come back to tweet again in each month after having tweeted in the first month. Obviously, month 1 is 100% by definition:

MonthlyCohort

This is quite a telling chart:

  • There is an expected usage dropoff in month 2, but after that point usage holds predictably steady. This is great news for anyone trying to forecast user activity early on in a new user's lifetime.
  • The newer cohorts, despite being significantly larger in size, actually consist of more loyal users. The two highest lines are also the two most recent, meaning that users who joined in 2009 are actually more likely to keep tweeting after their first month than those who joined in the same month in 2008.
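For anyone who wants to build a similar retention chart from their own data, here is a minimal sketch of the monthly cohort calculation as described above: users are grouped by join month, only those who tweeted in their first month are tracked, and each later month reports the share who tweeted again. All names and sample data are hypothetical.

```python
from collections import defaultdict

def month_index(ym):
    """Convert "YYYY-MM" to a running month number so months can be subtracted."""
    year, month = ym.split("-")
    return int(year) * 12 + int(month) - 1

def cohort_retention(join_month, tweet_months):
    """
    join_month:   {user: "YYYY-MM"} month each user joined.
    tweet_months: {user: set of "YYYY-MM" months with at least one tweet}.
    Returns {cohort: {k: share of cohort tweeting in its k-th month}}, with k = 1
    being the join month.
    """
    cohort_size = defaultdict(int)
    active = defaultdict(lambda: defaultdict(int))
    for user, joined in join_month.items():
        months = tweet_months.get(user, set())
        if joined not in months:
            continue  # only users who tweeted in their first month are tracked
        cohort_size[joined] += 1
        for m in months:
            k = month_index(m) - month_index(joined) + 1
            if k >= 1:
                active[joined][k] += 1
    return {
        cohort: {k: n / cohort_size[cohort] for k, n in sorted(by_k.items())}
        for cohort, by_k in active.items()
    }

# Made-up users purely for illustration
join_month = {"a": "2009-01", "b": "2009-01", "c": "2009-01", "d": "2009-02"}
tweet_months = {
    "a": {"2009-01", "2009-02", "2009-03"},
    "b": {"2009-01"},
    "c": {"2009-02"},
    "d": {"2009-02", "2009-03"},
}
retention = cohort_retention(join_month, tweet_months)
```

Month 1 comes out as 100% by construction, matching the chart, and the weekly version is the same idea with week numbers in place of months.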

Since the dropoff in Month 2 is quite pronounced, let's zoom in and look at weekly cohorts to see if we can see how usage drops off at the weekly level:

WeeklyCohort

We see a similar pattern here, although more recent cohorts don't stand out as much as in the monthly analysis. Again, however, the dropoff in the second period doesn't seem to further decline as time goes on. This means that by the second week of a cohort's lifetime, Twitter can reliably predict its users' future behavior as a group.

Another cohort analysis that might be interesting is to look at how many tweets a cohort makes each month after joining. This metric will incorporate both the dropoff in usage from the users who churn in the first month and the uptick in activity from users who stay on the platform:

TweetCohorts

Wow! This is a remarkable image. Despite the massive dropoff in users after the first month, the tweeting activity from the users who are left is so voluminous that it makes the "tweets per month" of each cohort average over 100% (and, as before, the more recent cohorts are the more loyal)!

In other words, the users who stick around actually tweet so frequently (and at such a rapid pace compared to their first month) that they more than make up for the lost activity of those who churned after the first month. This is a very powerful and unexpected statistic.
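One way to read the "over 100%" figure is as each month's tweet volume for a cohort divided by that cohort's first-month volume. That interpretation is an assumption on my part, and the sketch below (with a hypothetical function name and made-up counts) only shows the arithmetic:

```python
def cohort_tweet_volume(monthly_tweets):
    """
    monthly_tweets: {cohort: [tweets in month 1, tweets in month 2, ...]}
    Returns each month's volume as a fraction of the cohort's month-1 volume.
    """
    return {
        cohort: [count / counts[0] for count in counts]
        for cohort, counts in monthly_tweets.items()
        if counts and counts[0] > 0  # skip cohorts with no first-month baseline
    }

# Made-up counts purely for illustration
relative = cohort_tweet_volume({"2009-01": [200, 150, 260]})
```

A value above 1.0 in a later month means the remaining users are sending more tweets in total than the whole cohort did in its first month, even though many members have churned.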

Conclusion

Everyone has their own feelings about Twitter's reported $1 billion valuation. I hope this article gave you a taste of what its new investors likely considered before coming up with that number.

To learn more about RJMetrics and our original blog posts including the business intelligence rap and our twitter followers guide, check out our website and follow us on Twitter @RJMetrics.


Responses


  • I found these points interesting:

    “After that second tweet, however, the chance of a third tweet goes up to 81%.”

    “If someone is still tweeting in their second week as a user, it is extremely likely that they will remain on Twitter as a long-term user.”

    “Users who joined in more recent months are less likely to stop using the service and more likely to tweet more often than users from the past.”

    I think the first wave might have been the sign-ups. A second wave might be tweets as their existing userbase becomes more active. Even if they don’t add more registered users, if they can increase the activity of their existing users, that could generate a lot of traffic and provide many ways to monetize. However, a long way to go to a 1 billion valuation.

  • I agree that there is a lot of useful data here. I also agree that getting to 1 billion is going to be quite the challenge.

    I’ve talked about this before in other places; I think they should worry about the usability and value of twitter as it is now. If every time I go to tweet I have to bypass some advertising that is in the way, I’m just going to stop using it and find something else.

    • I see you pretty often in the comments here on TechCrunch and I just have to comment on your website… Nice site. :D

      On topic:

      Very interesting data. It’s nothing unexpected but it’s nice to see some real numbers.

      • ye I was going to check if it was a spammer and then got distracted

        • Sometimes I am named Alyssa (click the link!), sometimes Becky, sometimes who knows what else.

          I realize you’re not concerned about my name, but would you be disappointed if I told you I was really a guy pretending to be his ex-girlfriend? With pictures?

        • Note also that all of the links on “my” “personal” site redirect to some “dating” site that I’m not a member of in any name.

          What I’m really doing is spamming with plausible deniability. Mixing in some ham with the spam. Shpamming. Sphamming?

          I make up quasi-reasonable comments to the content of the article (it isn’t difficult to exceed the relevance of the average blog comment), but really I just want to send traffic to my “dating” site partner. I also make sure to post first or nearly first, to maximize visibility, but never return for followup comments.

          But make no mistake, I’m exploiting the lowest common denominator of web content creation to further my own ends. This is fairly common among the self-promoting, but I up the ante by using fake names and merging my sense of self with a for-profit “dating” site.

          Yes, indirect astroturfing..! But isn’t the model I chose for the pictures kinda hot in a geeky way? I’d do her, if she was available.

          • Actually I find this comment chain nearly as interesting as the article, as my god, that is one of the most “trippiest” ways I have seen of getting round-about traffic to a non relevant site… At least the spammers are trying to be a bit inventive now :)

          • This is just awesome! Did you give such an honest answer because you knew they will not come back to read it so they will not care or was it something else :D

            I didn’t notice referral link (to dating site) before you pointed it out… who knew… pictures are clickable :)

          • I think this comment thread actually highlights my primary question of this analysis: how does the spsham account skew the results stated above? In particular, the account with 0-1 tweets and few, if any, followers but many followees?

  • Here is another (older) analysis: http://blog.unto.net/twitter/sampling-twitter/

    Might be worth comparing the methodologies used.

  • Fascinating post – thanks, Robert!

  • That was seriously awesome data. Thanks!

  • TwitterIsForSelfRighteousDoucheBags - October 5th, 2009 at 8:13 pm UTC

    fantastic data, proves Twitter is about as useless as a turd on my lawn, fertilize the surrounding grass and disappear!

  • I hope the VCs did a similar amount of due diligence before valuing the company at $1 Billion… 75% have less than 10 followers. Wow.

    • Enjoyed the whole article, except I think you way overshot on that sentence about how the investors likely came to that valuation. Having been involved with multiple VC firms, very few of the best analysts would ever run those numbers. And, if they did, it wouldn’t make an iota of difference to the partner who desperately wanted to invest.

      One more analysis that I’d like to see. How many tweets have an exclamation mark?

      • Yeah, If the VCs who invested knew 2/3 of users are one and done i don’t think that would go over too well…

        When the numbers are discussed about total users that is very misleading and the vast majority don’t even use the service anymore. I, like millions of others, am in that camp.

  • Data porn! Now this is an AMAZING article, thanks!

  • Wow. Fascinating and insightful stuff. Thanks! I’ll be tweeting this … my first tweet in a while. Turns out I am about average … based on RJMetrics analysis.

    • I have done my own data analysis on twitter, and the results show that all users on twitter are well below average. Get a life. Deadpool.

  • the real question is…. how many are bogus accounts or p0rn accounts?

  • Dear VC’s into Twitter,

    You were duped and played into believing that this was a hot property. Fake accounts played a HUGE role in the fast growth & 25% of all Twitter users drive traffic with 75% of users having 10 followers? Wow. That is impressive. When growth is only at 8 million new users, there is no real social networking within friends to share photos to drive up time spent on the site. You have an older audience, no “tweens” to drive ad sales and no freakin’ revenue with a $1 billion valuation. Pathetic. Especially when the technology can be copied (and or argued in court with some previous patents).

    Hand shake (jobs) all around for the VC’s involved in this deal.

  • OUCH !!!! those numbers are very telling . It looks like Twitter may be more hype than anything else. Sad but true

  • Wow – Great data! Thanks for sharing! There is a lot made about the concentration of twitter usage among power users.
    Do you think this is necessarily a bad thing for Twitter? It seems that many users join twitter to follow their favorite celebrities, bloggers, local news, etc. They primarily consume tweets and post little or nothing at all. Does this concentration at the top necessarily represent a negative, or is it what we’d expect at this early stage of the medium?
    Obviously Twitter would want everyone to be publishing, but it seems that in general a small % create content. As a comp, Gartner’s engagement quadrants suggest that in general 80% of web visitors are “lurkers” (passive consumers/not creating content). http://bit.ly/3Xc792

  • Well done. Hope to see more from you.

  • Awesome article Bob! You should get into the statistics business. :)

  • not that other people haven’t said this above, but wow that was great.

  • Very interesting data. While some people see the data on low follower count, etc and get worried about user activity, many people forget that consumption of information through Twitter is probably more important for the business than production.

    Think about it in the context of Google or Reuters:

    How many people search Google vs how many people actually create web content? Does Reuters make more money when they have more news providers or only when there are more people consuming their content?

    It is no surprise that the vast majority of content is created by a small minority. The biggest unanswered question in the data shown here is what % of tweets are delivered / read. For any advertising-based revenue model, this is going to be far more important than the number of content producers (and by extension their level of activity, follower count, etc).

  • data overload. Thanks for the concluding summary! Very interesting suggestion that twitter is overhyped.

  • I’m not normally one to comment, but I must reiterate all of the sentiments of the previous posters. THANK YOU very much for sharing this eye-opening, comprehensive report! Wow!

  • And on a side note, 95% of statistics are made up :)

  • Yet any one of us would still love to be a founder.

  • I love these kind of posts. Techcrunch gets a very interesting article, we get to read fantastic stuff, and RJ gets the best advertising one can imagine. Win-win-win and if I ever start investing in startups I’ll make sure to consider your services!

  • I love the analysis. It revealed some interesting things about the behaviour of Twitter users.

    I do however have two suggestions that relate to the presentation of data:

    1) Pie charts are not the best way to present data. A horizontal bar chart with an x-axis of 0-100% is a more effective way to present proportion of a whole category data.

    2) It would have been better if the line graphs with a percentage of the whole y-axis were all standardised to 0-100% rather than the 0-88%, 90% and 110% options that are shown in the report.

    • Why isn’t a pie chart an effective way to present proportion of a whole? Seemed completely clear to me.

    • I totally agree with Infoholic, and would add:

      1. “over 75% of all Twitter users have tweeted fewer than ten times” This point would be driven home by showing a frequency diagram or Pareto chart instead of the arbitrary selected bin sizes displayed in a 3D effect chartjunk pie.

      2. Overall, the vomit of different colors between all bars is distracting. Even worse, the colors are not consistent when “Tweets by Source” removes web to zoom in (which was a great move, by the way)

      3. The colors could have actually conveyed useful data in the cohort analysis if they were selected in a single hue of varying brightness. That way you could see the evolution of more recent cohort behavior without the eyeball pingpong with the legend.

      4. This article contains several examples where the x-axis labels should be horizontal for easy reading, not on edge. That’s just being lazy with your Flash charting module.

  • Thanks Robert

    I found this very interesting. I use Twitter as a business tool to track what is happening in my market. There must be ways that investors can see added value in this huge user base.

    I suppose a new term “Once tweeted twice, always a tweeter.” will ring true.

    Twitter should find ways to catch those 75% who don’t ever tweet, and almost force an introductory tweet – that might hook a couple more.

    Good article, thanks.

    Malcolm

  • how many times did you use the word ‘interesting’? That word is the kiss of the death when trying to describe something.

  • Lots of data!

    I would still like to see:
    1. how many tweets contain links
    2. how many are replies (conversations)
    3. clicks per link – would have to combine bit-ly api

  • ok NOT BAD STATS LESSONS. None the less you are missing one huge point. Investment in a product such as twitter is not based on what it is now, but what it can be tomorrow. This of course is not taught in statistics classes.

    Thus Robert great stuff but actually meaningless to the creative mind of a serious investor who understands the process. Twitter if it does not reinvent itself or gain some new life will certainly die a quick death.

    Fact is twitter’s future is to take up where hi5 failed and not just become a web1.0 or 2.0 social network of uploading pictures and playing senseless games and quizzes but more of a web3.0 portal.

    Thus they can think as much as they want but the fact is they have run out of real ideas and lack any more talent. Contacting me would not be a bad start. TC eat my friggin shorts.

  • What twitter has done for me to be honest (their concept) is really a bare bones social network and way cheaper cost to implement. Thanks guys, your contribution to free market economics has been great.

    FB biz model really does not exist. Twitter does not really know how to get off first base. And VCs are just too smart to know that they are not so smart. :)

  • If only they also knew that 90% of the new sign ups are spam bot generated and 90% of followers are from spam bot accounts pimping dating sites, porn, diet fads, and sex toys! How would that have changed the valuation?

  • It will be interesting to see how many accounts are filled automatically by rss-feeds, monitoring bots, etc. twitter is not only a website or community but also a service so intelligent apps can run on top of it. Future will tell if twitter itself will provide useful apps or someone else will.
