Trooly is using machine learning to judge trustworthiness from digital footprints

Trust greases the wheels of the sharing economy, paving the way for transactions to take place between total strangers. But figuring out who is trustworthy and who is not remains a sticky bottleneck for digital businesses wanting to scale faster. Meanwhile the consequences for customers when startups screw up these risk calculations can be very unpleasant indeed.

The traditional route to assessing risk is to run a full background check on an individual, a process that can be time-consuming and expensive, given it can involve sending an actual person to an actual courthouse to parse actual paper records.

Which is why, in recent years, as sharing economy businesses have been gunning to scale up, other entrepreneurs have spotted an opportunity to step in and offer online services for verifying identity and screening for unsavory behavior, to try to steal a march on more established but slower-paced background checkers. Startups entering this space in recent years include UK startup Onfido, which just this April announced a $25 million Series B round; and US-based Checkr, which raised a $40 million Series B in March.

Well here’s a third: Trooly, which is today announcing its Series A. Along with an earlier unannounced seed round it says it has now raised $10 million in external funding. Investors in the Series A are Bain Capital Ventures and Milliways Ventures, with the latter also investing in its seed, which it closed at the start of 2014.

Trooly’s big claim vs competitors is that it’s doing things drastically differently, even compared to the other digital newcomers, by applying machine learning algorithms to public digital footprint data. That means information that can be freely found online (and on the dark web), for example on social media services and police registers of offenders. The aim is to enable risk and trust to be assessed much more speedily and cheaply.

The idea being that businesses can use Trooly to perform quick pre-checks based on the public data that’s out in the wild in order to make decisions on whether to proceed with the time and expense of a full background check. Or to run periodic checks to keep more up-to-date tabs on the behavior of existing staff.

Trooly is not saying it can fully replace a “gold standard” background check (yet), but it argues there’s an advantage to offering businesses a fast risk assessment so they can hedge their bets about whether to even proceed with a full background check, whilst also helping them with their ongoing goal of reducing friction from online interactions.

“Background checks are very labour based, people based. It’s literally physical court runners that are going out to court houses to look at records. So typically the way we’re being used in a background check mode is as a triage before sending someone to the courthouse. So a first step,” says Trooly co-founder and CEO Savi Baveja.

“If we can probabilistically say here’s a person or business who is extremely unlikely to fail a background check… that’s worth a lot to our customers. Because then they can make a positive decision much earlier.”

In about half of the cases he says Trooly will be able to say the probability of a failed background check is “virtually zero”. Whereas around eight per cent of requests will get a negative hit from a background check — so that would reduce the business’ costs of performing a full background check in those instances, given they could just turn down a hire or a potential customer.

“Our big picture idea is to launch a way of evaluating the trustworthiness of an individual or business that fits better with modern use cases than legacy mechanisms do. We’re essentially trying to reinvent legacy mechanisms like background checks, credit checks and fraud mechanisms,” he says.

“The way we do that is we use public and legally permissible digital footprints, and in about 30 seconds — using very little input information about the individual or business — we return a scorecard that does three things: it verifies whether the input information is authentic; it screens for any relevant and seriously antisocial or pro-social prior behavior; and then it runs a series of predictive models on that footprint to say what is the propensity of this individual or small business for future antisocial or pro-social behavior.”

The advantage vs competing services is speed and cost, says Baveja. He also claims Trooly is more comprehensive, and better suited to diverse tech platform use-cases.

“Both Checkr and Onfido are nothing more than a slick technology workflow on top of the same old background check as has been done for decades,” he argues, discussing other startups in the ID verification and screening space. “So under the hood of these companies, there are court runners going to courthouses taking up to 3 weeks to return a record (vs 30 seconds for us).

“They charge on the order of $20 (vs <<$1 for us). And their results are subject to all the same problems with bias and underlying data incompleteness as the traditional background check companies like Sterling, HireRight and dozens of others… For every hit that a company like Checkr or Onfido would return, we would be returning one additional hit (at a fraction of the cost and time).”
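
To make the triage economics concrete, here is a back-of-the-envelope sketch using the figures Baveja cites: about half of requests cleared as “virtually zero” risk, around eight per cent returning a negative hit, and a full check costing on the order of $20. Several things here are assumptions for illustration only: the $1 pre-check price (Baveja says only that it is well under $1), the function name, and the idea that both cleared and flagged requests skip the full check entirely. It is not Trooly’s pricing or a description of any customer’s workflow.

```python
def expected_cost_per_applicant(full_check_cost, precheck_cost,
                                cleared_rate, flagged_rate):
    """Average cost when everyone gets a pre-check and only the
    undecided remainder is sent on for a full background check."""
    undecided_rate = 1.0 - cleared_rate - flagged_rate
    return precheck_cost + undecided_rate * full_check_cost


# Figures from the article, with $1.00 as an assumed upper bound
# on the pre-check price.
baseline = 20.0
with_triage = expected_cost_per_applicant(
    full_check_cost=20.0, precheck_cost=1.0,
    cleared_rate=0.50, flagged_rate=0.08,
)
print(f"baseline ${baseline:.2f}, with triage ${with_triage:.2f}")
```

Under those assumptions the average cost per applicant drops from $20 to about $9.40, because only the undecided 42 per cent still need a court runner.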

Baveja says Trooly has built its own dark web crawlers to be able to harvest relevant data to feed into its systems. It has also built in-house indexes of public registers of offenders, as well as licensing relevant public record data-sets from third parties (at least for its current North American markets).

One area where it’s applying machine learning is to help correctly interpret the data displayed on the web — to, for example, link a specific identity with a specific crime (and importantly avoid incorrectly linking pieces of data that might be being displayed on the same webpage but are not necessarily linked).

It’s also using machine learning to bridge gaps in public data — to help join some partial dots to improve the accuracy of its matches and also avoid making connections when there’s not enough coherent information to be sure.

“A lot of public records are incomplete. They don’t give you a date of the conviction, for example, they don’t give you a resolution of the conviction. It’s slightly messy data. And so because we’re applying machine learned identity and machine learned models on top of the underlying records we’re able much better to say: ok just throw out this record, you’re never going to figure out anything about what this person actually did or when they did it,” says Baveja.

“We also have machine learned models on top of the record itself telling the difference between a speeding infraction and an assault. Pretty important. So when we return something to our customer we distinguish between major crimes and minor crimes. We distinguish between recent crimes and non-recent crimes. In a way that allows them to be much more careful about what they use and don’t use.”
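
The record handling Baveja describes (throw out records too incomplete to interpret, then separate major from minor and recent from non-recent) can be sketched roughly as below. This is a simplified stand-in: the article says Trooly uses machine learned models for this, whereas the sketch uses a hard-coded lookup table. The offense list, the seven-year cutoff and all names are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional, Tuple

# Hypothetical severity lookup, standing in for a learned classifier.
MAJOR_OFFENSES = {"assault", "robbery", "fraud"}


@dataclass
class Record:
    offense: str
    conviction_date: Optional[date]  # often missing in public records


def label_record(record: Record, today: date,
                 recent_years: int = 7) -> Optional[Tuple[str, str]]:
    """Return (severity, recency) labels, or None when the record is
    too incomplete to interpret and should be thrown out."""
    if record.conviction_date is None:
        return None
    severity = "major" if record.offense.lower() in MAJOR_OFFENSES else "minor"
    age_days = (today - record.conviction_date).days
    recency = "recent" if age_days <= recent_years * 365 else "non-recent"
    return severity, recency


today = date(2016, 6, 1)
print(label_record(Record("assault", date(2015, 1, 1)), today))
print(label_record(Record("speeding", date(2000, 1, 1)), today))
print(label_record(Record("assault", None), today))
```

Returning None rather than a guess mirrors the point in the quote: a record with no usable date is discarded instead of being held against someone.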

To power its future behavior prediction feature, it’s using component models trained on the specific types of behaviors its customers want to identify. The core training for these models consisted of recruiting tens of thousands of people to fill out a series of standard personality/behavior instrument tests and psychology questions, according to Baveja.

“When we get a new customer we ask them to give us some data on what it is that they’re trying to predict, what bad or good behavior are they trying to predict, and then we tune our component models, we weight them to predict that good or bad behavior. So the last 10 per cent of what we do is tuned to the customer’s use case,” he adds.
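
One simple way to picture “weighting” component models against a customer’s own outcome data is a small logistic regression over the component scores, sketched below. This is an assumption made for illustration, not Trooly’s disclosed method; the data, function names and training settings are all hypothetical.

```python
import math


def predict(weights, bias, component_scores):
    """Combine component-model scores into one probability
    through a logistic link."""
    z = bias + sum(w * s for w, s in zip(weights, component_scores))
    return 1.0 / (1.0 + math.exp(-z))


def tune_weights(examples, labels, epochs=500, lr=0.5):
    """Fit per-component weights to a customer's labelled outcomes
    using plain stochastic gradient descent on the logistic loss."""
    weights, bias = [0.0] * len(examples[0]), 0.0
    for _ in range(epochs):
        for scores, label in zip(examples, labels):
            error = predict(weights, bias, scores) - label
            weights = [w - lr * error * s for w, s in zip(weights, scores)]
            bias -= lr * error
    return weights, bias


# Hypothetical data: each row holds the scores of two component models;
# each label is 1 if the customer observed the bad behavior, else 0.
examples = [[0.9, 0.2], [0.8, 0.7], [0.1, 0.3], [0.2, 0.8]]
labels = [1, 1, 0, 0]
weights, bias = tune_weights(examples, labels)
print(weights, bias)
```

In this toy data only the first component tracks the customer’s outcomes, so tuning pushes its weight up while the second component ends up contributing little.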

Given it’s working with partial and public data — and despite applying its salve of machine learning to heal some of the cracks — he concedes the system is still not always able to serve up a score on a particular individual or entity. In fact it’s only able to provide an answer in about two-thirds of the cases.

“For the remaining one-third of requests we have to tell the customer we’re not sure,” he admits. “There generally is 70 to 80 per cent that we are, with some degree of confidence, able to put into an inclusive bucket. The rest we just tell our customers we’re not sure, there’s no footprint, or we just can’t figure it out.”

The system also returns a confidence score — rating how confident Trooly is in a particular assessment. “Our customers use that confidence score to decide where they want to draw the line in terms of using our scorecard,” he adds.
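
How a customer might “draw the line” with that confidence score can be sketched as a simple threshold rule. The function, the 0.5 score cutoff and the label names are hypothetical; the article does not describe the actual scorecard format.

```python
from typing import Optional


def triage(score: Optional[float], confidence: float,
           min_confidence: float) -> str:
    """Map a hypothetical scorecard result to a customer-side action."""
    if score is None or confidence < min_confidence:
        # No usable footprint, or below the line the customer has drawn.
        return "not sure"
    return "flag" if score >= 0.5 else "clear"


print(triage(score=0.9, confidence=0.95, min_confidence=0.8))
print(triage(score=0.9, confidence=0.50, min_confidence=0.8))
print(triage(score=None, confidence=0.95, min_confidence=0.8))
```

Raising min_confidence sends more requests into the “not sure” bucket, which is the trade-off each customer sets for itself.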

Of course risk mitigation is never going to be an exact science, so a margin of uncertainty and error is to be expected. Although there are perhaps wider questions to be considered about whether technology services in general — as well as tech specifically aimed at speeding up background checks — might not be encouraging businesses and consumers to accept wider risk thresholds than they otherwise might. Given that where there’s risk there’s clearly also opportunity.

But Baveja disputes the notion that Trooly is encouraging more risk. “We are not encouraging companies to do less thorough background checks at all — rather we are suggesting that whatever trust mechanism companies use should actually be demonstrated to predict the behavior they are trying to prevent,” he tells TechCrunch.

“For example, if an on-demand cleaning company is worried about theft, then they should first prove/test that failing a background check is a good predictor of theft on their platform. Way too many companies default to using background checks as their basis for trust when there is little or no proof that using a background check will actually prevent the undesirable behavior from happening on their platform.”

“As a society, we have way overestimated the usefulness of background checks as a reliable predictor of anything, and, as a result, many companies live under the false security of having done a background check when the result of a background check might have no relationship to the behavior they are trying to prevent… We stand behind rigorous models that are based on proven predictive power with undesirable behaviors that our customers care about — can background checks make this claim?” he adds.

At this stage Trooly is not disclosing the exact number (or names) of its customers but Baveja says it’s in the single digits. Size-wise it’s going after “very very large” entities, working with “one of the major sharing economy companies” on its pilot to build and refine its models, with that company now a fully fledged customer. The full Trooly production service has been up and running since August of last year.

While it’s started focusing on the sharing economy, where trust is clearly a pressing issue, Trooly is actually eyeing financial services companies as its major target going forward.

“We are in active pilot with financial services companies and related use cases,” notes Baveja. “Here there’s a lot of interest in verify, the first step of what we do [such as from online lenders]… And there’s a lot of interest in the compliance part. The pressure on financial institutions for KYC [know your customer], and anti-money laundering and politically exposed people… That pressure is going way up… so financial institutions are quite keen to find ways to reduce the friction.”

A third use case for FS companies is Trooly’s predictive modeling, to help assess risk in growth areas for their lending businesses where they’re likely to encounter so-called ‘thin file’ customers, such as younger people, immigrants and smaller businesses.

“They don’t have great models for all those things. Traditional type of scores don’t do a good job of telling you about millennials,” he adds. “So what we do adds a lot of value there.”

Its investors clearly agree with that assessment. “Trooly has built an impressive service that demonstrates the full potential of machine learning,” says Ajay Agarwal, managing director of Bain Capital Ventures, in a supporting statement. “They’re not simply using data to automate an existing process. They’re addressing an entirely new need – one that’s particularly critical for today’s financial institutions and peer-to-peer marketplaces.”

Trooly’s Series A funding will go towards scaling up the engineering team specifically to focus on the financial services sector, as well as ramping up on marketing generally, says Baveja.

It also wants to explore possibilities for expanding internationally (it currently only offers services in the US and Canada), although he notes this will have to be done “extremely carefully” given the different legal, compliance and regulatory frameworks elsewhere.

“That obviously is going to take a lot of investment,” he adds.