TechCrunch recently ran a piece by Michael Wu of Lithium. The following is a response written by Ferenc Huszar, who, prior to joining PeerIndex as lead data scientist, was a PhD student at the Machine Learning Lab at Cambridge University.
Quantifying aspects of human behaviour and social phenomena has never been simple. But in today’s world, one thing is inescapable. We are creating a new market and ecosystem of personal preferences and patterns of influence. We are creating an exponentially growing amount of data – 3.2bn likes and comments per day, over 400m tweets per day, rapidly being joined by Pins and Cinemagrams. We are connected in some way to more people than ever before in history – an average of 229 friends on Facebook, and the potential to be amplified to hundreds of thousands or millions on Twitter.
And out of that seismic, epochal change rises the importance of social influence. Why? Because influence is the delicate concept that describes and predicts the behaviour embedded in these trillions of connections.
Intuitively, we understand that social influence exists, and we all have some idea of what it means. We believe that people recommend things to their friends, and their friends act on those recommendations. We believe reviewers on TripAdvisor affect which hotels people book. We believe that a friend who enthusiastically raves about a new shampoo might encourage us to try it.
But of course, influence is a complex system of many moving parts, involving the relationships between people, fraudsters who try to game the system, and so on. This complexity evokes a natural scepticism in people about whether these signals can be meaningfully analysed and whether it is possible at all to build predictive models of influence.
We believe it is possible. Recent developments and interest in academic research confirm that the study of social influence is a well-posed scientific problem. As online social networks become mainstream, their data gives scientists and companies unprecedented insight into social phenomena. Nine of ScienceDirect’s top 25 academic papers in Computer Science study human behaviour on online social networks. This summer Science, one of the most prestigious and hardest-to-get-into academic journals, featured an article on identifying influential and susceptible members of social networks. There is also a growing number of scientific meetings devoted to the study of online influence.
While academic researchers have the luxury of focusing on single challenges in isolation, public influence platforms like PeerIndex have to address many scientific challenges at once, while also maintaining a viable business model.
1. Rigour not instinct
A central element of science is the application of real rigour to measurement and validation, in favour of any gut instinct. Our methods should not be based on gut feeling and intuition; at PeerIndex, we try to limit subjective bias to the minimum and let the data speak for itself. With scoring methods based on a modern technique called statistical machine learning, the formulae used to measure degrees of social influence are the result of an optimisation that seeks to maximise the predictive value of our scores.
Overgeneralising on the basis of a sample is a risk, but more advanced procedures can tackle it. We use a technique called cross-validation, which reports the predictive performance and generalisation ability of statistical models. All data is split into two non-overlapping sets: the training set and the test set. When the model is trained – that is, when a formula for quantifying influence is found – the algorithm only has access to the training set. The performance of the method is then evaluated only on the held-out test set. It’s like taking driving lessons in one city and then having to take your driving test in a different city you have never visited before.
This careful procedure ensures that the formulae we find generalise to previously unseen data, or even previously unseen users, and it takes any circular reasoning out of the picture.
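As a minimal sketch of the holdout idea – with invented numbers, not PeerIndex’s data or formulae – a model is fitted on the training split only, and its error is then measured on the split it has never seen:

```python
import random

# Toy dataset: pairs of (activity signal, observed engagement).
# The numbers are invented purely to illustrate the procedure.
random.seed(0)
data = [(x, 2.0 * x + random.gauss(0, 0.5)) for x in range(100)]

# Split into two non-overlapping sets: training and held-out test.
random.shuffle(data)
train, test = data[:70], data[70:]

# "Train" a one-parameter model (least-squares slope through the
# origin) using ONLY the training set.
w = sum(x * y for x, y in train) / sum(x * x for x, _ in train)

# Evaluate ONLY on the held-out test set: this estimates how well
# the fitted formula generalises to data it has never seen.
test_mse = sum((y - w * x) ** 2 for x, y in test) / len(test)
```

Because the test set played no part in fitting the slope, a low test error is evidence of genuine generalisation rather than memorisation.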
Algorithms are also increasingly trained and validated using multiple independent data sources. While primarily based on social feeds, measurement companies also have proprietary data. PeerIndex uses information from audience experiments like the rate-my-mates Facebook app, and can use ‘preference learning’ methods to evaluate how well its ranking is aligned with a user’s intuitive expectations. Whilst this may not be the most important driver, it is an important independent validation. Fuse that with the data gleaned from a track record of brand campaigns and external data from partners, and the algorithms will only grow in power.
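One simple way this kind of preference-based validation can be sketched – the names, scores and judgements below are hypothetical, not PeerIndex’s actual data or method – is to count how often the model’s scores agree with a user’s stated pairwise preferences:

```python
def pairwise_agreement(scores, preferences):
    """Fraction of stated preference pairs (a preferred over b)
    that the scores rank in the same order."""
    correct = sum(1 for a, b in preferences if scores[a] > scores[b])
    return correct / len(preferences)

# Hypothetical influence scores, and user judgements of the form
# "a is more influential than b".
scores = {"alice": 72.0, "bob": 55.0, "carol": 61.0}
prefs = [("alice", "bob"), ("alice", "carol"), ("carol", "bob")]

agreement = pairwise_agreement(scores, prefs)  # 1.0: full agreement
```

An agreement well above 0.5 (random guessing) indicates the ranking lines up with users’ intuitive expectations.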
2. Transparency vs. innovation
Another area where social influence platforms have a hard decision to make is transparency. Users and clients want to know exactly how our formulae work.
What is the right level of transparency? From PeerIndex’s standpoint, we identify the principles we use to evaluate social capital and what we seek to predict. These principles are how you affect others, on which topics and with how much effort. We also describe our analytical methodologies, which are to use statistical machine learning to build an appropriate model.
Can you get more transparent than that? To a non-technical audience with limited time on their hands, probably not. The maths is hard, and the process by which the maths is applied is hard. And the scoring is neither as simple nor as linear as saying “a retweet gets you 10 points” and “a mention gets you 15” – it genuinely doesn’t work like that.
Additionally, the features and formulae used by the measurement platforms, in particular PeerIndex, are constantly validated, improved and modified, and signals are swapped in and out. The algorithms are constantly searching for the best, most predictive solutions, while work is ongoing with the academic world to devise new techniques for bringing further scientific rigour to the models of measuring influence.
I think the requirement of perfect transparency is at odds with the innovation and experimentation needed to provide the best possible solutions, acknowledging that no-one yet has final answers to all the questions we ask.
3. Playing the game
The final thorn in the scientific side is gameability: the concern that people can become obsessed with their scores and change their behaviour to achieve higher ones. Gameability is a real problem, and a fair amount of gaming has been happening on Twitter and Facebook since well before influence platforms became mainstream – following people in the hope they will follow you back, then unfollowing them, for example. But the extent to which PeerIndex’s scores can be gamed is actually something we can rigorously analyse, address and, to some degree, control if we rely on science.
The relevant branch of science is called mechanism design, a sub-field of game theory, the field pioneered by John Forbes Nash Jr, a Nobel laureate and the subject of the movie A Beautiful Mind. The same theory is used to examine the fairness of electoral systems, public auctions or speculation in financial markets, and it can be used to analyse the extent to which our scoring scheme is gameable. We are a long way from completely solving this problem, but we have made a start, and we do everything we can to keep up with developments in academic research.
Crucially, most influence scores depend not only on your behaviour but also on the behaviour of those who interact with you. If we observe that you tweet more often, that is an indication of increased activity, not of authority within a topic. If this increased activity is not matched by an expected increase in responses, retweets, likes or other forms of interaction, then your influence score, if anything, should be diluted. You cannot really game your score by changing your own behaviour alone; you would also have to change your peers’ behaviour and the way they interact with you – do that, and you’ll be influential regardless of the extent to which your influence is scientific.
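A toy illustration of this dilution effect – not PeerIndex’s actual formula – is a score that tracks interactions earned per unit of activity, so that raising activity without a matching rise in interactions lowers it:

```python
def engagement_rate_score(activity, interactions):
    """Toy score: interactions earned per unit of activity.

    More tweeting (activity up) with flat responses
    (interactions unchanged) dilutes the score.
    """
    if activity == 0:
        return 0.0
    return interactions / activity

before = engagement_rate_score(activity=10, interactions=50)   # 5.0
after = engagement_rate_score(activity=100, interactions=50)   # 0.5
```

Under any score of this shape, unilaterally tweeting ten times as much while your audience stays silent makes the score fall, not rise.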
Social influence exists. We know it exists because we all recognise its patterns at work. We know it exists because we can see those patterns in data flows on social networks. And we can make pretty good predictions about those flows.
The science exists to model those flows and to give strong indicators of people’s influence in topics. That science is hard and ever changing, but the goal remains the same: to help users understand and benefit from their influence and social capital.