Feedback loops and online abuse

I’ve long thought that much of the world can be explained by feedback loops. Why are small companies nimbler than large ones? Why are private companies generally more efficient than governments? Primarily because in each case, the former has a better feedback loop. When faced with a baffling question — such as, “why do online companies do such a terrible job at dealing with abuse?” — it’s often helpful to look at the feedback loops.

Let’s look at the small vs. large and private vs. government comparisons first, as examples. Small companies have extremely tight feedback loops; a single person makes a decision, sees the results, and pivots accordingly, without the need for meetings or cross-division consensus. Larger companies have to deal with other departments, internal politics, red tape, the blessing of multiple vice-presidents, legal analysis, etc., before they can make meaningful changes.

Similarly, if a private company’s initiative isn’t going well, its revenue immediately begins to plummet, a very strong signal that it needs to change course quickly. If a government initiative isn’t going well, the voters render their verdict … at the next election, mingled with their verdicts on all the other initiatives. In the absence of specific and meaningful external feedback, various proxies exist … but it’s difficult to separate actual signal from noise.

And when a social-media platform, especially an algorithm-driven one, determines what content to amplify — which implicitly means deciding which content to de-amplify — and which content to ban … what is its feedback loop? Revenue is one, of course. Amplifying content which leads to more engagement leads to more revenue. So they do that. Simple, right?

Ahahahahahaha no, as you may have noticed. Anything but simple. Content which is amplified is often bad content. Abuse. False news. Horrifyingly creepy YouTube videos. Etcetera.

Suppose that (many of) the employees of these platforms genuinely wish to deal with and hopefully eliminate these problems. I know that seems like a big supposition, but let’s just imagine it. Then why have they consistently seemed so spectacularly bad at doing so? Is it purely because they are money-grubbing monsters making hay off bullying, vitriol, the corrosion of the social contract, etc.?

Or is it that, because it did not occur to them to measure how susceptible their own systems were to bad actors, and how severe the effects were, they had to rely on others (journalists, politicians, the public) for a slow, imprecise form of feedback? Such as: “your recommendation algorithm is doing truly terrible things” or “you are amplifying content designed to fragment our culture and society” or “you are consistently letting assholes dogpile-abuse vulnerable people, while suspending the accounts of the wronged,” to name the major criticisms most often leveled at Google, Facebook, and Twitter respectively.

But this is a subtle and sluggish feedback loop, one primarily driven by journalists and politicians, who in turn have their own agendas, flaws, and feedback loops to which they respond. There is no immediately measurable response like there is with, say, revenue. And so whatever the platforms do in response is subject to that same slow and imprecise feedback.

So when Google finally responds by banning right-wing extremists, but also history teachers, which is clearly an insanely stupid thing to do, is this a transient, one-time, edge-case bug, or a sign that Google’s whole approach is fundamentally flawed and they need to rethink things? Either way, how can we tell? How can they tell?

(Before you object, no, it’s not done purely by algorithms or neural networks. Humans are in the loop — but clearly not enough of them. I mean, look at this channel which YouTube recently banned; it’s clear at first glance, and confirmed by subsequent study, that this is not right-wing extremism. This should not have been a tough call.)

I’ve long been suspicious of what I call “the scientific fallacy” — that if something cannot be measured, it does not exist. But at the same time, in order to construct meaningful feedback loops which allow your system to be guided in the desired direction, you need a meaningful measure for comparisons.

So I put it to you that a fundamental problem (although not the fundamental problem) with tackling the thorny problem of content curation in social media is that we have no way to concretely measure the scale of what we’re talking about when we say “abuse” or “fake news” or “corrupted recommendation algorithms.” Has it gotten better? Has it gotten worse? Your opinion is probably based on, er, your custom-curated social-media feed. That may not be the best source of truth.

Instead of measuring anything, we seem to be relying on Whack-a-Mole in response to viral outrage and/or media reports. That’s still much better than doing nothing at all. But I can’t help but wonder: do the tech platforms have any way of measuring what it is they’re trying to fight? Even if they did, would anyone else believe their measurements? Perhaps what we need is some form of trusted, or even crowdsourced, third-party measure of just how bad things are.

If you would like to make a meaningful difference to these problems (which are admittedly difficult, although, looking back at the banned history teacher’s YouTube channel, perhaps not so difficult as the companies claim), you could come up with a demonstrable, reliable way to measure them. Even an imprecise one would be better than the “outrage Whack-a-Mole” flailing quasi-responses which seem to be underway at the moment.