How early-stage startups can use data effectively

Koen Bok Contributor

Koen Bok is the co-founder of Framer, a design software company focused on helping people express their most creative ideas. Previously, Koen founded Sofa, an Apple Design Award-winning agency that was eventually acquired by Facebook.

It is a commonly held belief that startups can measure their way to success. And while there are always exceptions, early-stage companies often can’t leverage data easily, at least not in the way that later-stage companies can. It’s imperative that startups recognize this early on; it makes all the difference.

In this piece, I draw on my experiences using data to take Framer from seed round to Series B. More concretely, I’ll describe what to (not) focus on, and then, how to get real results.

There are good and bad ways for startups to use data. In my opinion, the bad way is unfortunately often preached on SaaS blogs, A/B testing tool marketing pages, and especially growth hacker conferences: that by simply measuring and looking at data you’ll find simple things to do that will drive explosive growth. Silver bullets, if you will.

The good way is comparable to first principles thinking. Below the surface of your day-to-day results, your startup can be described by a set of numbers. It takes some work to discover these numbers, but once you have them you can use them to make predictions and spot underlying trends. If everyone in your company knows these numbers by heart, they will inevitably make better decisions.

But most importantly, using data the right way will help answer the single most important – but complex – question at any moment for a startup: how are we really doing?

Let’s start with looking at what not to do as a startup.

Common pitfalls
Make a plan
Perfect setup
- Decide on infrastructure
- What to build, ideally
Typical problems
Discovering data

Common pitfalls

Don’t measure too much

Technically, it’s easy to measure everything, so most startups start out that way. But when you measure everything, you learn nothing. Just the sheer noise makes it hard to discover anything useful and it can be demotivating to look at piles of numbers in general.

My advice is to carefully plan what you want to measure upfront, then implement and conclude. You should only expand your set of measurements once you’ve made the most important ones actionable. Later in this article, I provide a clear set of ways to plan what you measure.

A/B tests are anti-startup

To make decisions based on data you need volume. Without volume, the data itself is not statistically significant and is basically just noise. To detect a 3% difference with 95% confidence you would need a sample size of 12,000 visitors, signups, or sales. That sample size is generally too high for most early-stage startups and forces your product development into long cycles.
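To get a feel for why volume matters, here is a minimal sketch of the standard two-proportion sample size approximation. The baseline conversion rate, the 80% power setting, and the function name are assumptions for illustration; the result changes a lot with the baseline, so this is not a reconstruction of the figure quoted above.

```python
import math
from statistics import NormalDist


def sample_size_per_variant(baseline, lift, confidence=0.95, power=0.80):
    """Approximate visitors needed per variant for a two-sided test.

    baseline: conversion rate of the control, e.g. 0.10 (assumed value)
    lift:     absolute difference you want to detect, e.g. 0.03
    """
    p1 = baseline
    p2 = baseline + lift
    # z-scores for the chosen confidence level and statistical power
    z_alpha = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    z_beta = NormalDist().inv_cdf(power)
    # Sum of the two binomial variances
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)


# Smaller differences need far more traffic than larger ones.
print(sample_size_per_variant(0.10, 0.03))
print(sample_size_per_variant(0.10, 0.01))
```

Plugging in your own baseline and the smallest difference you care about shows quickly whether a test can finish in days or would take months.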

While on the subject of shipping fast and iterating later, let’s talk about A/B testing. To get reliable measurements, you should only be changing one variable at a time. During the early stages of Framer, we changed our homepage in the middle of a checkout A/B test, which skewed our results. But as a startup, it was the right decision to adjust the way we marketed our product. What you’ll find is that those two factors are often incompatible. In general, constant improvements should trump tests that block quick reactive changes.

Understand your calculations

Once you’ve started measuring, you’ll typically settle on a set of KPIs, often calculated as things like monthly users or average use per user. While some of these are fairly straightforward, others get a bit more complex.

If you rely on third parties for metrics, make sure you understand exactly what they mean and how they’re calculated. This is especially important if you use these numbers in reports to others (e.g., explaining churn to shareholders). There are many ways to express churn, and not fully understanding how yours is calculated can dilute its importance. In the worst-case scenario, someone might discover that you don’t understand your own numbers, and you’d better hope that doesn’t happen during due diligence for your killer investment round. So, if you do one thing with data, make sure to deeply understand how you calculate your churn.

If you elect to use a third-party provider, spend a few hours rebuilding most of the calculations you report from raw data. Using a common provider like Baremetrics helps, but I still recommend reverse engineering your most important metrics. You don’t have to be technical to do this: a bit of Excel-fu should be enough, or you can use one of the example templates readily available online.
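As an illustration of how two reasonable churn definitions can disagree, here is a minimal sketch that rebuilds customer churn and revenue churn from raw subscription records. The record fields and both formulas are assumptions chosen for the example; your provider may define churn differently, which is exactly why it is worth checking.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Subscription:
    customer: str
    mrr: float                      # monthly recurring revenue
    started: date
    cancelled: Optional[date] = None


def active_on(sub, day):
    """True if the subscription was running on the given day."""
    return sub.started <= day and (sub.cancelled is None or sub.cancelled > day)


def customer_churn(subs, start, end):
    """Share of customers active at `start` who cancelled by `end`."""
    base = [s for s in subs if active_on(s, start)]
    lost = [s for s in base if s.cancelled is not None and s.cancelled <= end]
    return len(lost) / len(base) if base else 0.0


def revenue_churn(subs, start, end):
    """Share of MRR active at `start` that was cancelled by `end`."""
    base = [s for s in subs if active_on(s, start)]
    lost = [s for s in base if s.cancelled is not None and s.cancelled <= end]
    total = sum(s.mrr for s in base)
    return sum(s.mrr for s in lost) / total if total else 0.0


# Made-up records: one small customer cancels mid-month.
subs = [
    Subscription("a", 10.0, date(2023, 1, 1), cancelled=date(2023, 3, 15)),
    Subscription("b", 30.0, date(2023, 1, 1)),
    Subscription("c", 60.0, date(2023, 2, 1)),
]
print(customer_churn(subs, date(2023, 3, 1), date(2023, 3, 31)))
print(revenue_churn(subs, date(2023, 3, 1), date(2023, 3, 31)))
```

With these records, a third of the customers churned but only a tenth of the revenue did, so the number you report depends entirely on which definition you use.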

Use the real numbers

A startup is small for a long time before it gets big. So it’s common that founders feel ‘motivated’ to use slightly inflated numbers, or worse, data that means nothing. These are often called vanity metrics. A good example of this is when startups pay more attention to signups than they do active users. The signup number will likely seem a lot more impressive, but that doesn’t represent who truly uses your product.

If you are lucky enough to have access to a great data scientist, you can try to model an ‘ideal user’ and measure their activity. At Facebook, they use L5/7, which refers to people who use the application 5 out of 7 days per week. At Framer, we looked at our internal users and found out that our ideal designer uses Framer 3 out of 5 workdays for more than 10 minutes per day.
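A rule like "3 out of 5 workdays for more than 10 minutes" is simple to check once you have usage per day. The sketch below assumes you already have one week of usage minutes per calendar day for a user; the function name and input shape are hypothetical.

```python
from datetime import date


def is_ideal_user(minutes_by_day, min_days=3, min_minutes=10):
    """Check one week of usage against an 'N workdays over M minutes' rule.

    minutes_by_day: dict mapping a date to minutes of use on that day
                    (assumed to cover a single week).
    """
    qualifying_days = sum(
        1
        for day, minutes in minutes_by_day.items()
        # weekday() is 0-4 for Monday-Friday; weekends are ignored
        if day.weekday() < 5 and minutes > min_minutes
    )
    return qualifying_days >= min_days


week = {
    date(2024, 1, 1): 15,   # Monday
    date(2024, 1, 2): 5,    # Tuesday, too short to count
    date(2024, 1, 3): 30,   # Wednesday
    date(2024, 1, 5): 12,   # Friday
    date(2024, 1, 6): 90,   # Saturday, not a workday
}
print(is_ideal_user(week))
```

Counting the users who pass this check each week gives a much smaller number than monthly actives, but one that tracks real usage.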

Keep in mind, this exact number may only be a fraction of our typical monthly active userbase, but we already know that every one of these users is more motivated and can net a sale or organize a meetup. These real users are also a lot less prone to seasonality or marketing events, which helps avoid ending up with a user graph that is a series of splashes.

If an exact lower number bothers you, Weight Watchers offers an easy trick: Focus on the trends, not on the numbers. If you focus on the trend instead of the real numbers long enough, it will grow.

Don’t just use quantitative data

While numbers can help you improve your product or business, they will not help you crack the most important goal of any startup: making something people actually want. For that, you’ll have to talk to real people. Learn the basics of running great user tests and understand how to get information in an unbiased way.

Test with more people than just your friends or family, ideally people that closely resemble your target audience. A word of warning: most people don’t actually know what they want. So it’s up to you to observe patterns and read between the lines. This doesn’t have to be a ton of work; usability research suggests that you only need to test your product on about five users before you get diminishing returns. And if you have ever tested an early product on users, you’ll know how true this statement is. The more broken your product is, the fewer users you need to find out the obvious.

Make a plan

Define a high-level product funnel

A funnel is a set of stages someone goes through while using your product. It helps you identify and focus on specific parts to measure and improve. You can focus on a single part of the funnel or multiple parts at once.

These should seem pretty straightforward — as people hear about your product they try it out, do a few things and eventually end up buying (or not). Once they’ve bought, ideally they stick around for as long as possible and continue paying.
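As a rough illustration of how a funnel helps you pick a stage to focus on, the sketch below computes the conversion rate between consecutive stages. The stage names and counts are made up for the example and are not Framer's actual funnel.

```python
def stage_conversion(counts):
    """Return the conversion rate between each pair of consecutive stages.

    counts: ordered list of (stage_name, number_of_people) tuples.
    """
    rates = []
    for (prev_name, prev_count), (name, count) in zip(counts, counts[1:]):
        rate = count / prev_count if prev_count else 0.0
        rates.append((f"{prev_name} -> {name}", rate))
    return rates


# Hypothetical numbers for one month
funnel = [
    ("heard about it", 1000),
    ("tried it", 200),
    ("bought", 20),
    ("still paying after 3 months", 15),
]
for step, rate in stage_conversion(funnel):
    print(f"{step}: {rate:.0%}")
```

The step with the weakest rate relative to your expectations is usually the one worth measuring in more detail first.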

Define a metrics plan per stage and implement it

Now that the stages are clear, we can define them with real-world events for your app. The following example is a simplified version of what we use at Framer.

Get your whole team on board

There’s little point in having just one person fighting for data-driven decisions with the rest of the company working off of gut feelings. The only thing worse than not using data at all is having a small data-driven minority get shut down because their projects seem less fun to work on. You need a balance of both.

The best way to get the whole team on board is to make it founder-driven. Let the founder work on a simple data plan (like the one above), present it, and explain it to the team. It shouldn’t take longer than a day and there’s no better way to emphasize its importance.

If you have fewer than 20 employees, I’d recommend dividing up the most important metrics and assigning them to the employees responsible for each part of the company. Ask them to set goals for each metric and report weekly on progress. Once the company starts to scale, these metrics can become team-based. You may also want to consider rallying the entire company behind one important metric.

Our weekly key metric sheet a year ago.

It took us a few meetings to go through our trove of data and find a good format to present, but after a while we were able to focus on the anomalies. These ended up influencing the way we planned for the week. As a bonus, having people be responsible for specific metrics really encourages ownership.

But perhaps the best and biggest motivator for working with data is performance — both personal and company-wise. Good startups work on review cycles and make regular compensation adjustments based on those reviews and overall company performance. At Framer, we have four performance reviews per year and adjust compensation annually based on those. While not all those goals are quantitative, the best ones are because they are objective.

Perfect setup

Decide on infrastructure

There are many out-of-the-box platforms that help you track data. The most popular ones are Google Analytics, Mixpanel, and Crashlytics. These all provide quick all-in-one solutions to gather, analyze, and visualize data, and are a pretty great place to start. We, however, chose to invest in building some of our own infrastructure, and we had two great reasons for that:

To make data useful you need full control. You’ll likely need events from many inputs like your site, your app, your mailing list, and multiple payment provider apps. You’ll quickly find that without flexibility you can’t make your data reliable, correct, or even work at all.
It doesn’t take rocket science to build. Our tracking infrastructure is an endpoint that streams data to Cloud Storage and transforms it into BigQuery.

Things like Redshift or Postgres are fine data stores too, but I really like that you can connect Google Sheets to BigQuery directly. Make apps, not sheets.

What to build, ideally

The best way is to split your data into two buckets or stages. The first one is a messy, unstructured bucket where you just record everything you need from different sources in different formats. From there, you run the data through a set of transform functions that clean it up and structure it in the right way.

Some examples of our transform functions:

We have a “derived” source for our signups that sets the source to ‘producthunt’ if the referrer contains producthunt.com OR the utm_source is ‘producthunt’ OR the ref url variable is set to ‘producthunt’.
We connect sales to the users that bought the app before they signed up via an email address lookup (yes, this happens).
We consolidate the sale event into a single format from two payment providers so we can use a single accurate MRR / churn formula across both data sources.
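The “derived” source rule from the first example could be sketched like this. The function name, the argument shapes, and the fallback for non-matching traffic are assumptions; only the three Product Hunt conditions come from the description above.

```python
from urllib.parse import urlparse, parse_qs


def derive_source(referrer, landing_url):
    """Derive a single signup source from the referrer and landing URL."""
    query = parse_qs(urlparse(landing_url).query)
    utm_source = query.get("utm_source", [""])[0].lower()
    ref = query.get("ref", [""])[0].lower()

    # Any one of the three signals is enough to attribute to Product Hunt
    if (
        "producthunt.com" in (referrer or "").lower()
        or utm_source == "producthunt"
        or ref == "producthunt"
    ):
        return "producthunt"

    # Assumed fallback: use the utm_source if present, else "unknown"
    return utm_source or "unknown"


print(derive_source("https://www.producthunt.com/posts/example", "https://example.com/"))
print(derive_source(None, "https://example.com/?ref=producthunt"))
print(derive_source("https://news.example.org/", "https://example.com/"))
```

Keeping rules like this in one function means a new edge case only has to be fixed in one place before replaying the data.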

Having all the raw data, a playback mechanism, and flexible transform functions allows you to improve or fix the structure at any given time. Just do a full replay from the beginning of history and everything will be updated.

The transform functions are literally a set of Python functions that write each event to their own table in BigQuery. Optionally they clean up some data or combine different events together. Make sure they’re quite fast and can run incrementally from any point in the raw recorded data. But most importantly the transform functions need to be ‘pure’, which means that you get the exact same output every time you replay.
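To make the idea of a ‘pure’, replayable transform concrete, here is a minimal sketch that normalizes sale events from two payment providers into one format. The provider names and raw field names are hypothetical, and the rows are returned in memory rather than written to BigQuery, to keep the example self-contained.

```python
from typing import Optional


def normalize_sale(raw: dict) -> Optional[dict]:
    """Pure transform: the output depends only on the raw event passed in."""
    if raw.get("provider") == "provider_a":
        return {
            "email": raw["customer_email"].lower(),
            "amount_cents": raw["amount_cents"],
            "at": raw["created"],
        }
    if raw.get("provider") == "provider_b":
        return {
            "email": raw["buyer"]["email"].lower(),
            "amount_cents": round(raw["total"] * 100),
            "at": raw["timestamp"],
        }
    return None  # not a sale event we know how to handle


def replay(raw_events, start=0):
    """Run the transform over the raw log from any offset."""
    rows = []
    for raw in raw_events[start:]:
        row = normalize_sale(raw)
        if row is not None:
            rows.append(row)
    return rows


RAW = [
    {"provider": "provider_a", "customer_email": "Ann@Example.com",
     "amount_cents": 1500, "created": "2018-01-05T10:00:00Z"},
    {"provider": "provider_b", "buyer": {"email": "bob@example.com"},
     "total": 15.0, "timestamp": "2018-01-06T09:30:00Z"},
    {"provider": "something_else"},
]

# Replaying the same raw data always yields the same rows.
print(replay(RAW) == replay(RAW))
```

Because nothing in the transform reads the clock, a random number, or external state, a full replay after a bug fix rebuilds every table deterministically.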

Typical problems

Non-measurable data

Anyone who has ever looked at Google Analytics knows that acquisition reports can spit out a ton of useless information. That’s because there are a lot of technical challenges that make it hard to measure correctly:

Tracking behaviour across platforms and properties is almost impossible.
Website visitors have ad blockers installed.
Websites don’t send the referrer anymore (mainly because of https).
Most of our traffic is direct or has no clear origin.
A lot of our traffic comes from search without keyword information.

You first need to make sure that you measure at least everything you can. In practice, that simply means adding tracking to every link you publish. The best way to do so is through UTM tags. While they are a simple concept, it’s quite hard to implement them consistently and keep them (and thus your data) neatly organized. Attention to detail is required.
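One way to keep tags consistent is to never type them by hand. The helper below is a minimal sketch that adds normalized UTM parameters to any link; the function name and the normalization rules (lowercase, dashes instead of spaces) are assumptions, not a standard.

```python
from urllib.parse import urlparse, urlunparse, parse_qsl, urlencode


def _clean(value):
    """Normalize a tag so 'Spring Launch' and 'spring-launch' match."""
    return value.strip().lower().replace(" ", "-")


def add_utm(url, source, medium, campaign):
    """Return `url` with utm_source, utm_medium and utm_campaign set."""
    parts = urlparse(url)
    # Keep existing query parameters, but drop any stale utm_* values
    params = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if not key.startswith("utm_")
    ]
    params += [
        ("utm_source", _clean(source)),
        ("utm_medium", _clean(medium)),
        ("utm_campaign", _clean(campaign)),
    ]
    return urlunparse(parts._replace(query=urlencode(params)))


print(add_utm("https://example.com/blog?x=1", "Newsletter", "Email", "Spring Launch"))
```

Routing every published link through one helper like this keeps spelling and casing differences from splitting a single campaign into several rows in your reports.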

For ‘unmeasurable traffic’ the simple solution is just to ignore it. You can’t control it, so don’t worry about it too much. Make sure none of these sources drop without a good reason, but that’s about it. Focus on what you can control. As an added bonus, the sources you do measure are likely a sample of everything, so if you can grow those, the non-measurable traffic will grow with them.

Defining what results are “good”

This is one of the most important questions, but also one of the hardest to answer as it differs per company, question, and metric. There are basically two strategies to find answers.

The first is to compare yourself to others: benchmarking. Maybe you are lucky enough to have access to all your direct competitors’ data and can compare head to head. This is obviously the highest quality data, but often you’ll have to resort to less reliable sources. This is akin to being a private investigator, so it’s a bit of a gray area. Everyone does this, and you should assume your competition does as well. Proceed with caution.

You can also learn a lot from talking to competitor employees, corp dev, and venture capitalists. The last group often has great insights within their portfolio companies and can sometimes share without disclosing exact data about any one company. Most of their work is benchmarking when they evaluate whom to invest in.

Another great way is to research your competitors online and find proxies. Website traffic analysis, for example, is pretty easy to do through AdWords, Google’s ad system. If your subject has a fairly unique name, just type that name into the keyword tool and export the historical data.

Lastly, you can compare metrics by broad industry or vertical. Google Analytics has industry benchmarks built-in, and you can research reports from analytics firms. In some cases, service providers offer really nice overviews like MailChimp on click through rates.

Alternatively, you can compare yourself to yourself. The most common way to do this is an A/B test, but these typically measure small incremental changes. To find what is good, you are looking for extremes. So what you could do is make a few extreme versions that optimize (too much) towards a single goal, ignore everything else and test that on a small part of your audience. If you do this for a signup on your site, it would typically look like a huge signup action in the dead center and the bare minimal information. This is an obvious exaggeration, but you get the point – measure for a single optimized goal to get information on how far you can push something.

If you compare yourself to yourself, make sure to account for things like seasonality in time, other changes that might influence the results, and your own bias.

Turning data observations into actionable learnings

This is obviously the holy grail and biggest promise of data. And the way most people talk about it makes it seem like a self-evident step that follows the measuring and analyzing of data. But the truth is a lot more complex. It is often surprisingly hard to go from measuring to knowing what to do, so I’ll spend a whole chapter on it.

Discovering data

Visualization

People are terrible at discovering patterns from sets of raw numbers. But if you visualize them, they suddenly become great at it. In short, you need graphs and lots of them. Which leads you to… a dashboard.

General data visualizations are a well-solved problem; graphs, pie charts, etc. It gets harder if you want to make your visualization more extensive like multiple value scales or combining multiple graphs into one. The most effective graph prototyping tool is still a sheet. What works well for us is to use a combination of Google BigQuery and Google Sheets and then use transformations and graphs to sketch what we want to have.

A customer health graph prototype made in Google Sheets

From there, turn it into a real dashboard for everyone to use. There are many products that focus on data reporting, like Mode, Periscope, and Looker. But I strongly recommend also looking at some smaller players that can be just as good: Cluvio and my current favorite, Metabase.

What to look for

What makes ‘actionable data’ so hard is that you don’t always know what you are looking for. On top of that, it’s so easy to record data that the default is almost always to record everything you can. A tough combination. The best trick is to look for meta signals: patterns that hint at something interesting going on.

Analyze your splashes

The most common data in a startup unfortunately is not a hockey stick but a splash. A splash is a huge peak, often caused by a release or marketing event, and then normalization back to a hopefully higher base. Splashes are homework. You can’t easily draw learnings from them, but you need to compare them to previous ones to see if your event-driven efforts are improving or plateauing.

Look for trends that are growing structurally

Now you can graduate to monitoring structurally growing trends. This often takes the form of a piece of content or a product feature that people keep coming back to over time. As an example, most of our blog posts are splashes, but we have a few that keep growing structurally. The best one is on how to make high-quality GIFs. It keeps bringing new people in because it’s something people Google every day, and the content is truly useful. Try to find these nuggets and repeat them.

Look for trends that are growing exponentially

This one is the rarest and easiest to spot because it will find you if you don’t find it fast enough. Symptoms include: breaking servers, customers that want to give you money faster than you can handle, and investors sleeping on your doorsteps. But they are truly hard to spot when they start because the beginning of a hockey stick looks flat.

Typically you are spending time thinking about how to create these in a startup. A really good strategy is to start with a linear trend that has retention upside. So for example, you have an ideal user count that is growing linearly with weekly retention of about 50%. If you start focusing on both bringing users in and figuring out how to keep them around, it results in a compounding effect and your linear will become an exponential trend over time.
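A toy model can make the interaction between acquisition and retention easier to reason about. The sketch below assumes a fixed number of new ideal users per week and a constant weekly retention rate; both the model and the numbers are simplifications for illustration.

```python
def simulate_active_users(new_per_week, weekly_retention, weeks):
    """Active users per week: last week's retained users plus new ones."""
    active = 0.0
    history = []
    for _ in range(weeks):
        active = active * weekly_retention + new_per_week
        history.append(active)
    return history


# Same acquisition, two different retention rates (hypothetical numbers)
low = simulate_active_users(new_per_week=100, weekly_retention=0.5, weeks=20)
high = simulate_active_users(new_per_week=100, weekly_retention=0.8, weeks=20)
print(round(low[-1]), round(high[-1]))
```

In this model, with steady inflow the active count settles near new_per_week / (1 - weekly_retention), so retention acts as a multiplier on everything acquisition brings in. Raising both at the same time is what makes the two efforts compound.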

Look for trends that are dropping off cliffs

I’ll keep this one simple: avoid them. Spot them in time and pray that it is an error in your metrics recording. You’ll need a second independent system in place to cross-compare. In our case, we use Google Analytics for this.

Look for predictors: correlations

This is more of a bonus. Once you get truly advanced, you can work with a data scientist to discover predictors through linear regression, or even by training a neural net. The key is to end up with high-probability truths in the form of ‘if these things happen, that will happen’.

Examples of some great findings in popular products are:

Freemium pricing upsells: Slack asks you to become a subscriber once you reach a certain amount of message history or number of integrations. I’m going to assume that they experimented a lot with data to end up with these, as they are not ‘obvious’ limits.
Onboarding: Facebook found out that you were very likely to never become an active Facebook user if you did not make ~10 friends soon after you created a new account.

Deciding what to work on

Once you’ve spotted some trends and opportunities, it is time to start sequencing what to work on first. Every prioritization framework is different, but this is a decent general checklist to score potential projects by.

Things that have the right effect (signups, conversion).
Things that have major growth potential.
Things that you can influence.
Things that are structural (not one-offs).
Things that can grow exponentially (people referring each other).
Things that take relatively little time.
Things that are free (for you).
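The checklist above can be turned into a quick scoring pass over candidate projects. This is a minimal sketch assuming each criterion is a yes/no answer with equal weight; the criterion keys and the example projects are made up.

```python
# One key per checklist item, in the same order as the list above
CRITERIA = (
    "right_effect",
    "growth_potential",
    "can_influence",
    "structural",
    "can_grow_exponentially",
    "little_time",
    "free",
)


def score(answers):
    """Count how many checklist items a project satisfies."""
    return sum(1 for criterion in CRITERIA if answers.get(criterion))


def rank(projects):
    """Return project names sorted from highest to lowest score."""
    return sorted(projects, key=lambda name: score(projects[name]), reverse=True)


# Hypothetical candidates
projects = {
    "referral program": {
        "right_effect": True, "growth_potential": True, "can_influence": True,
        "structural": True, "can_grow_exponentially": True,
    },
    "one-off launch video": {
        "right_effect": True, "can_influence": True, "little_time": True,
    },
}
for name in rank(projects):
    print(name, score(projects[name]))
```

Equal weights are a crude starting point; if one criterion matters far more for your company, weighting it is a small change to the score function.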

Ironically, I started off this piece by stating that doing data the bad way meant applying what you read on the Internet to your own startup. But I hope that by sharing the trial, error, and successes we’ve had at Framer, I’ve managed to shed some light on the murky waters of growing your startup through data.