Whose Data is it Anyway?
- Henry Marsden

- May 19, 2025
Updated: Feb 25

In the music publishing ecosystem, data is the oil that keeps the engine running. It defines what’s registered, what’s matched, and ultimately, what (and who) gets paid. But unlike your typical crude oil, rights data isn’t fungible or easily traded- it’s locked in silos, guarded as proprietary, and scattered across a network of databases and gatekeepers.
The core question that’s long hovered over this space is deceptively simple: who owns the data? The answer is, of course, anything but.
Societies: Gatekeepers or stewards?
Performing rights organisations and mechanical societies around the world have, by necessity, long held vast swathes of data on behalf of their members. It’s their job: to collect, manage, and distribute royalties based on repertoire and usage information. But over time many of these societies have developed a defensive posture around this data- particularly when it comes to bulk access, machine-readable formats, or APIs.
It’s a strange contradiction, given that before it ever reaches a society the data has typically passed through two parties already: songwriters and their publishers. Much of it is also publicly accessible, searchable via public interfaces like ASCAP and BMI’s Songview or through member portals like PRS’s, yet any attempt to interrogate it at scale is met with resistance. The argument is often framed around privacy, accuracy, or proprietary logic- but underneath lies a very real commercial truth: this data is often seen as a competitive moat.
Every society believes its data is cleaner, more complete, or better matched than everyone else’s. In a globally fragmented system, having “better” metadata is heralded as an advantage. The irony, of course, is that these entities are all trying to do the same thing- manage global rights- and in a digital world they’re all working from effectively the same set of underlying recordings and compositions (... the same 'hymn sheet', if you will).
Redundant work in a digital world
This lack of openness creates a tragic kind of inefficiency. In a market now dominated by digital-first, borderless, real-time platforms, rights data remains largely fragmented, duplicated, and manually reconciled. Each society builds and maintains its own database of works and its own recording-to-work matches for apportioning incoming revenue. The root cause of this redundant effort is captured by one of my favourite industry quotes (courtesy of Joe Conyers III): “Music is global, but Copyright is local”. The frictions we experience day to day are second-order effects of that fact.
Publishers maintain their own datasets too, and even here the lines blur. When catalogs change hands, data portability becomes a nightmare. Even simple use cases, like sharing correct metadata with societies or checking whether a co-writer's shares are registered, are made unnecessarily difficult by the lack of interoperable standards or shared repositories. Publishers are also disincentivised from making catalog transfers smooth, as they can benefit from any residual income that persists beyond the retention periods.
The failed attempt to build a Global Repertoire Database (GRD) stands as a cautionary tale. Despite industry-wide support and clear need, the project collapsed under the weight of political tension and the age-old question: who would control it?
Where creators lose out
In the middle of all this are songwriters- the very people this data is about. Not even in the middle, really… they are the nucleus! The initial seed! No data exists anywhere in the music industry without being born as the twin sibling of a creation- drawn from nothingness by the hands of a creator. They are the ones who suffer when works aren’t matched, when co-writers’ registrations don’t align, or when a sub-publisher in another country hasn’t received the right data. Publishers suffer too, but pro-rata market share payments (cough black box cough) can numb the pain somewhat for those with scale.
These aren’t hypothetical data issues. They’re the daily reason royalties go unclaimed or unpaid. And yet, creators and their publishers are rarely given any kind of access to see (or fix) these problems. Even when they do try to engage, the lack of bulk visibility makes it nearly impossible to participate meaningfully in data cleaning or revenue recovery.
The vast majority of individual claims (as we’ll see in future articles) are of very low value. Publishing in the digital age is a micro-penny business: it’s the sheer quantity of these claims that adds up to the headline-grabbing, billion-dollar figures. This means, however, that the only economical way to process them is with scalable solutions- with machine-accessible, machine-readable data.
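The economics here are simple arithmetic, and worth making concrete. The sketch below uses entirely hypothetical figures (the per-stream royalty and processing costs are assumptions for illustration, not real payout rates) to show why a long-tail claim is only worth recovering when the processing is automated:

```python
# Illustrative sketch of why micro-claims only pay off at scale.
# All figures are hypothetical assumptions, NOT real royalty rates or costs.

PER_STREAM_ROYALTY = 0.0005      # assumed publishing royalty per stream (USD)
MANUAL_COST_PER_CLAIM = 5.00     # assumed cost of a human resolving one claim
AUTOMATED_COST_PER_CLAIM = 0.01  # assumed cost via machine-readable data

def claim_value(streams: int) -> float:
    """Royalty value of a single unmatched work with this many streams."""
    return streams * PER_STREAM_ROYALTY

# A typical long-tail claim: 2,000 streams -> $1.00 of royalties.
small_claim = claim_value(2_000)

# Resolving it by hand loses money; resolving it programmatically does not.
manual_net = small_claim - MANUAL_COST_PER_CLAIM        # -$4.00 per claim
automated_net = small_claim - AUTOMATED_COST_PER_CLAIM  # +$0.99 per claim

# A million such claims is real money -- but only the automated path scales.
portfolio_net = 1_000_000 * automated_net
print(f"per-claim value:   ${small_claim:.2f}")
print(f"manual net:        ${manual_net:.2f}")
print(f"automated net:     ${automated_net:.2f}")
print(f"automated net across 1M claims: ${portfolio_net:,.0f}")
```

Whatever the true rates are, the shape of the result is the same: the per-claim margin flips from negative to positive only when the marginal cost of processing approaches zero.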
This is a missed opportunity for everyone. When the right people have the right data and the right tools to interrogate it, the entire system improves.
The case for shared data infrastructure
This isn’t “society bashing.” Every stakeholder in the publishing ecosystem has valid concerns and competing priorities. But there’s a better way forward that doesn’t require total reinvention, just shared intent.
API access, bulk downloads, and clear, consistent data standards aren’t radical ideas. They’re the table stakes of modern digital transformation. Surely the industry can agree there is a better way than CWR, a fixed-width flat-file format built on technology fashionable in the 1970s? In the rare cases where data is made open, we’ve seen real progress.
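To see why fixed-width flat files age badly, consider what consuming one looks like. The record below is a simplified schematic only- the field names and character offsets are invented for illustration and do not match the actual CWR specification:

```python
import json

# Schematic fixed-width record. Offsets and fields are INVENTED for
# illustration; they do not reflect the real CWR spec.
FIXED_WIDTH_RECORD = "NWR0000012345MY SONG TITLE       T0301234567"

# Parsing a fixed-width record means hard-coding character offsets.
# Any change to a field's width breaks every consumer downstream.
record = {
    "record_type":     FIXED_WIDTH_RECORD[0:3].strip(),
    "transaction_seq": FIXED_WIDTH_RECORD[3:13].strip(),
    "work_title":      FIXED_WIDTH_RECORD[13:33].strip(),
    "work_id":         FIXED_WIDTH_RECORD[33:].strip(),
}

# A self-describing format carries its own field names, tolerates
# added fields, and is trivially machine-readable over an API.
print(json.dumps(record, indent=2))
```

The point isn’t that JSON is magic; it’s that self-describing, schema-tolerant formats are what make bulk access and APIs cheap to build and safe to evolve.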
Look at the MLC. While it’s far from perfect, the MLC’s publicly available database- mandated by the Music Modernization Act- has set a new standard for transparency. Yes, the data is full of gaps. Yes, the matches aren’t always perfect. But because those issues are visible, they can be addressed. Transparency has incentivised publishers, administrators, and songwriters to engage in the clean-up process in a way that could never have happened in the dark.
Just last week I was on a call where multiple publishers were asking ICE (PRS, STIM and GEMA’s data and licensing hub JV) to make their new database accessible via API. There is clear demand, and real commercial opportunity. Again- when the infrastructure improves, everyone benefits. Fewer disputes, faster payments and greater trust.
Reframing the question
So maybe the question isn’t “who owns the data?” at all. Maybe it’s “who is best placed to make it work for everyone?”
Ownership implies control. But stewardship implies responsibility.
If publishers, societies and creators alike saw data not as territory to guard but as shared infrastructure to maintain, we’d be a lot closer to fixing one of the industry’s most persistent problems.
We don’t need to centralise everything- the post-GRD fashion for licensing hubs and SPVs has shown that a hybrid approach can work well. But we do need to connect the dots better. The goal isn’t to force creators to become data experts- it’s to make sure the data works for them and their publishers, not against them.
And for that, access is everything.



