Audio fingerprinting being used to track web users, study finds

A wide-scale study of online trackers carried out by researchers at Princeton University has identified a new technique being used to try to strip web users of their privacy, and has quantified the ongoing use of some better-known tracking techniques.

The new technique unearthed by the study is based on fingerprinting a machine’s audio stack via the AudioContext API. So it’s not collecting sound played or recorded on a machine but rather harvesting the audio signature of the individual machine and using that as an identifier to track a web user.

The researchers write:

In the simplest case, a script from the company Liverail checks for the existence of an AudioContext and OscillatorNode to add a single bit of information to a broader fingerprint. More sophisticated scripts process an audio signal generated with an OscillatorNode to fingerprint the device. This technique appears conceptually similar to that of canvas fingerprinting. Audio signals processed on different machines or browsers may have slight differences due to hardware or software differences between the machines, while the same combination of machine and browser will produce the same output.
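
For illustration, the two approaches the researchers describe could look roughly like the TypeScript sketch below. It is not code from the study or from any tracker. The function names (`hasAudioStack`, `hashSamples`, `audioFingerprint`), the oscillator settings and the choice of an FNV-1a hash are all assumptions made for this example. The browser calls it relies on (`OfflineAudioContext`, `createOscillator`, `createDynamicsCompressor`, `startRendering`) are part of the standard Web Audio API.

```typescript
// Simplest case: one bit of information, namely whether the audio APIs exist.
// `scope` is the global object to inspect (e.g. `window` in a browser).
function hasAudioStack(scope: any): boolean {
  return (
    typeof scope.AudioContext === "function" &&
    typeof scope.OscillatorNode === "function"
  );
}

// Reduce a buffer of audio samples to a short hex string using 32-bit FNV-1a.
// The hash choice is an assumption; any stable hash over the raw bytes would do.
function hashSamples(samples: Float32Array): string {
  const bytes = new Uint8Array(
    samples.buffer,
    samples.byteOffset,
    samples.byteLength
  );
  let h = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < bytes.length; i++) {
    h ^= bytes[i];
    h = Math.imul(h, 0x01000193); // FNV prime, 32-bit multiply
  }
  return ("0000000" + (h >>> 0).toString(16)).slice(-8);
}

// More sophisticated case: render an oscillator signal offline (nothing is
// played or recorded) and hash the processed samples. Small numeric
// differences between audio stacks change the samples and so the hash.
async function audioFingerprint(scope: any): Promise<string | null> {
  const Ctx = scope.OfflineAudioContext;
  if (typeof Ctx !== "function") {
    return null; // API not available in this environment
  }
  // 1 channel, 44100 frames at 44100 Hz: one second of audio (arbitrary choice).
  const ctx = new Ctx(1, 44100, 44100);

  const oscillator = ctx.createOscillator();
  oscillator.type = "triangle"; // arbitrary waveform for the sketch
  oscillator.frequency.value = 10000; // arbitrary frequency in Hz

  // Route the signal through a processing node so implementation
  // differences have a chance to show up in the output.
  const compressor = ctx.createDynamicsCompressor();
  oscillator.connect(compressor);
  compressor.connect(ctx.destination);

  oscillator.start(0);
  const rendered = await ctx.startRendering();
  return hashSamples(rendered.getChannelData(0));
}
```

In a browser, calling `audioFingerprint(window)` would resolve to a short hex string that should stay stable across page loads on the same machine and browser, which is what makes it usable as an identifier. Hashing the raw sample bytes means even a difference in the last bit of a single sample yields a different value.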

The researchers have created a live demonstration page of the technique, which can be found here.

They found the audio fingerprinting technique was not widespread. But nor was it being picked up by some of the common tracker-blocking and privacy tools they also looked at, such as Ghostery.

To track the trackers, the researchers used an open-source measurement tool called OpenWPM, which they say enabled a wider-scale study covering the top one million sites, as ranked by Alexa. The study was also able to pick up more trackers because they used a fully featured consumer browser to harvest the necessary data, rather than a more stripped-down option.

“Without full support for new web technologies we would not have been able to discover and measure the use of the AudioContext API for device fingerprinting,” they write.

Tl;dr: online tracking is an ever-evolving arms race, and privacy-related research needs to reflect actual web browsing to get the fullest picture.

Other tracking techniques they looked at include WebRTC-based IP address discovery, canvas fingerprinting, canvas font fingerprinting and the Battery API.

While privacy tools such as Ghostery and Firefox’s third-party cookie blocker were rated as effective by the researchers, they found “obscure trackers” pose more of a challenge, concluding that: “The long tail of fingerprinting scripts are largely unblocked by current privacy tools.”

So even though audio fingerprinting might not be a very common tracking technique, it’s novel enough that it will probably fly under the radar of any privacy tools a user is running.

One technology the researchers envisage helping to keep on top of proliferating and ever-changing tracking techniques, specifically by keeping tabs on fingerprinting scripts, is machine learning. They hope it can be used to automatically detect and classify trackers, potentially replacing the current need for developers to curate block lists manually.

“If successful, this will greatly improve the effectiveness of browser privacy tools,” they write of machine learning. “Today such tools use tracking-protection lists that need to be created manually and laboriously, and suffer from significant false positives as well as false negatives. Our large-scale data provide the ideal source of ground truth for training classifiers to detect and categorize trackers.”

The study can be read in full here.