Digital audio listening is in the midst of a historic transition from standalone usage through dedicated hardware and applications to embedded experiences — something that will reshape the $65+ billion global audio sector.
We have already seen early signs, with audio embedded in smartphones becoming the dominant digital listening experience. Emerging car infotainment systems, wearables and smart homes promise even greater changes.
Having written elsewhere about the growth of podcasting, we decided to step back and map the digital audio ecosystem in order to begin understanding the most valuable elements.
Developing a perspective on the broader landscape is important to us because we are a young company with limited resources that needs to make smart bets about where we play and how we add value to other participants.
With a tip of the hat to the famous LUMA Partners charts, we mapped the digital audio ecosystem as a flow from marketers to consumers.1 Attempting to place players in boxes crystallized our most powerful takeaway from what might otherwise have been an academic exercise: reality on the ground is messy, and many companies are playing across multiple categories simultaneously, often with integrated offerings.
Companies in three categories have been especially active in taking on multiple roles:
- Ad networks and distribution reps often branch out to spoken-word production and ad tech
- Spoken-word producers sometimes extend directly to ad sales
- Hosting providers are developing ad networks and insertion technologies for clients
Companies are playing in multiple spaces partly to retain optionality at a time when it’s not clear which activities will generate the most value. Also, digital audio remains an immature business system, where category-leading specialists have not yet established themselves or gained scale that would help them provide clearly superior value versus in-house solutions.
So, participants can rationally justify build and buy decisions versus partnerships and arm’s-length vendor relationships.
As we look toward the future, we identified four fault lines in the industry landscape that will determine how digital audio takes shape during the transition to embedded audio — and where players can generate outsized value.
Fault Line 1: How Will Audiences Access Audio Content?
Traditionally, audio consumption has occurred in a dedicated manner with single-purpose hardware and single-purpose applications. We have seen the hardware side of this equation erode: Smartphones have displaced MP3 players, with iPod sales in 2014 dropping to less than one-third of their 2011 levels. More recently, multi-purpose infotainment systems have begun to displace the car stereo and home audio systems.
Even where multi-purpose hardware has made inroads, audio consumption at the application layer still occurs primarily through dedicated tools such as Pandora and Spotify in music and podcatchers in spoken word. Outside of audio, we have seen the primary source of access to digital content move from portals to search to, most recently, social and messaging platforms.
In the past year, social networks for the first time outstripped search engines as the largest drivers of referral traffic to web sites. Search has never been able to gain significance as an access mechanism for audio content, arguably because there is a mismatch between the intentional nature of search and the more passive nature of audio consumption.
As social and related messaging platforms continue to emerge, however, they may become access points for audio content. Looking to video as an analogy, we have seen social platforms emerge as a significant force in a way that search never did. For example, Facebook now rivals YouTube in aggregate video plays.
If socially driven access defines the future of audio, there is potentially significant value for enabling tools that bridge audio content into social platforms and possibly even new social platforms that are anchored in audio. Our direct experience as a company gives us conviction about the potential that social media holds for audio.
Since summer 2015, we have been in beta with tools that allow audio content to play natively on social media with a link back to the original source and a platform that allows fans to share their favorite audio highlights to social media. We have seen audio highlights of podcasts and other sonic content shared to social media generate a 10X incremental lift in playback, a strong indicator that social distribution can play an important role in digital audio.
Fault Line 2: How Will Content Silos In Text And Audio Relate To Each Other?
The definition of audio content is a second major fault line. Traditionally, text and audio have occupied separate spheres, with relatively limited overlap. Consuming audio editions of print sources has been a relatively fringe behavior (we speak from personal experience as longtime, enthusiastic members of this fringe group).
Among other reasons, listening to text can be tedious for some and, in the case of news sources, audio can be inefficient as compared with scanning top stories. Audio production with voice talent is also expensive, limiting the supply of available content.
Several trends are converging that promise to blur the line between text and audio. With Audible and others leading the charge, audio books are transforming from a bare-bones narrator reading text to high-production performance. There are even authors, such as British novelist David Hewson, who have released titles straight to audio, forgoing or deferring print editions. Audio books have seen a quiet explosion with 30 percent annual growth rates in recent years.
With news content, continued improvements in text-to-speech and voice recognition promise to bring a flood of new content to audio. Apple Siri, Google Now, Microsoft Cortana, Amazon Alexa and other efforts to develop virtual assistants have also brought about significant improvements in speech synthesis with narration that sounds increasingly human.
Likewise, speech recognition promises to make audio content browsable in new ways. For example, Google recently announced an almost 50 percent increase in the accuracy of its voicemail transcription service and a 3X improvement in voice recognition error rates on voice-based search.
If text and audio use cases continue to blur, there may be significant value from traditional text media companies entering audio and bringing along with them ad and consumer spend from text.
Likewise, tools that enhance the browsability of audio content through voice interfaces and format (e.g., short-form summaries of long-form text articles; playlisting of podcast intros and highlights) stand to generate value. In our own experience, we have observed that short-form audio has generated a compelling user experience.
We typically condense longer news stories by 90 percent into shorter audio highlights that link to the longer content. We have seen 7X higher rates of sharing and social engagement in bite-sized audio versus full-length audio news stories.
Fault Line 3: Which Ad Dollars Can Support Digital Audio?
Advertising in audio is primarily a local game — more than 60 percent of ad revenues are local spot. As consumer behavior continues to shift from broadcast to digital platforms, there are significant value creation opportunities for approaches that can bridge local spend into digital, as well as those that can bring new dollars into audio.
The most significant challenge in bridging local advertising to digital lies with ad sales operations. Despite inroads by automated, self-serve platforms such as Google AdWords and Facebook, high-touch relationship-based sales continue to be important to local advertisers. In the broader advertising landscape, more than 75 percent of local ad spend continues to go to traditional media.
In the past 24 months, Pandora has moved aggressively to capture local ad dollars by poaching sales reps from radio broadcasters. Pandora’s scale — where its digital platform generates 9 percent of all radio listening hours and is equivalent to the No. 1 radio station in 14 of the 15 largest U.S. markets — allows the company to take on the cost of building an in-house local ad sales force that can sell geo-targeted ads reaching its captive app users.
Pandora’s ability to scale local ad sales while maintaining positive returns on sales costs will serve as a bellwether that tests how quickly local dollars can support digital audio.
In addition to transitioning local dollars, consumer adoption of digital audio platforms creates an opportunity for new ad dollars to enter audio, particularly national spend. Local fragmentation and limited ability to track effectiveness have traditionally hindered the flow of national ad spend into radio. Digital platforms such as Pandora address the problem of geographic fragmentation by enabling ad insertion in disparate content formats and geographies.
The dominance of the smartphone as the preferred listening device and the promise of digital audio in the car create opportunities for ad targeting, as well as closed-loop measurement based on tracking user actions both in the physical world and on digital devices. There is potentially significant value for measurement and ad tech solutions that can reliably target ads and track ad effectiveness at a level that national advertisers are accustomed to with other digital formats.
Fault Line 4: How Will Spoken-Word Audio Be Aggregated?
Although spoken-word popularity trails music formats by a factor of more than 6:1, it still comprises 10-15 percent of all listening — and may have an outsized economic role due to a less contentious rights regime than music.
Among dedicated audio access platforms, Apple’s iTunes and its Podcast App dominate in spoken word, driving more than 60 percent of listening in the podcast niche. Apple, however, is closed to content owners and third parties that might seek dynamic ad insertion, geo-targeting or direct consumer payment. At the same time, Apple has not yet indicated that it plans to develop the Podcast App into much more than an RSS playback utility.
Other end-user platforms (aka podcatchers) in spoken word are similarly closed and highly fragmented, with single-digit market share that does not give them the kind of scale required to support significant in-house ad sales operations directly.
There is significant value in bringing scale to the distribution side of spoken-word audio and combining it with monetization tools (e.g., access to dynamic ad insertion, geo-targeting, payment mechanisms) or at least openness for content owners to monetize on-platform directly. With app stores highly saturated and cost per installed loyal user growing by more than 80 percent in the past year, an independent aggregation play would require significant capital in order to get traction and scale.
Leading music services have recently moved to bring spoken word more prominently onto their existing platforms. With established users, music services do not face the user acquisition costs that standalone spoken-word aggregators do.
Pandora recently announced an arrangement with This American Life to stream the popular Serial podcast in short chunks. Spotify and Google Music have both added podcasts to their content choices. In a similar vein, leading music service Deezer purchased leading spoken-word aggregator Stitcher in the past year. These companies will test whether a music-oriented app experience can simultaneously integrate spoken-word formats in a way that attracts listeners.
Coming from a different direction, Audible has an established user base for audiobooks. The company has recently been developing a content team with a spoken-word radio and podcast background, which could position it to aggregate audiences for paid and subscription spoken-word content outside of audiobooks.
Recent success in crowdfunding by PRX’s Radiotopia and early efforts by Midroll to place premium content behind a paywall with its Howl app suggest that paid models can be a viable part of spoken-word audio. If Audible is able to aggregate sufficient audiences on its platform, it can develop a broader marketplace that independent creators can tap to support consumer paid models such as subscriptions and purchases.
The Embedded Audio Era: Back To A Bright Future
Looking across the span of 200,000 years of human existence, audio is arguably the media format for which humans are most naturally wired. While reading and writing are relatively recent innovations that have emerged during the last 5 percent of that time span, we have always been able to speak and listen.
With increasingly ubiquitous smartphones, widespread mobile broadband and infotainment systems as standard features in most new car models, the audio sector is poised to undergo a period of rapid change. As existing players evolve and new entrants emerge, the digital audio landscape promises to be a dynamic place with a bright future.