Where the Money Really Is (... from real data)
- Henry Marsden

- May 26, 2025
Updated: Feb 25
We’ve been exploring the MLC data since 2021. It’s not the only dataset we use at Fix, but it’s certainly one of the more fascinating.

Firstly, it’s available (by legislation), which is unique in the CMO world and is creating its own second-order effects.
Secondly, it covers the biggest music market in the world and (again by legislation) contains data on every recording available on digital platforms in the US, an effective proxy for every recording available on a global scale.
Thirdly, it provides an intriguing insight into the overall state of the music publishing ecosystem, particularly when it comes to the interplay between data, matching, and revenue flow.
It’s this aspect we’ll explore in this post, illustrated by five key insights surfaced from the data itself. While the MLC reports steady progress in matching royalties to works, a deeper dive into the data reveals some intriguing nuances, and some hugely valuable opportunities.
Here are five things the MLC data tells us that every publisher should know:
1. There’s still a job to be done
The MLC has previously cited a 90% matching rate of recordings to musical works, measured by value. Its unmatched dashboard indicates 56.6% of historical value has been attributed (as of May 2025).
Diving into the data and looking through a different lens, only 22% of actual ISRCs in the entire dataset have been attributed to any work. That means over 170 million recordings (out of 217 million total) remain unlinked. In other words, while the top 90% of value may be accounted for, the vast majority of recordings (that is, the long-tail revenue) are still sitting unmatched and waiting to be claimed.
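To see how a value-weighted match rate and a count-based match rate can diverge so sharply, here is a minimal Python sketch on invented toy data. The `Recording` fields and every figure in it are hypothetical and are not drawn from the MLC dataset.

```python
from dataclasses import dataclass


@dataclass
class Recording:
    isrc: str       # hypothetical identifier, not a real ISRC
    value: float    # royalty value attached to the recording (toy units)
    matched: bool   # True if the recording is linked to a work


def match_rate_by_count(recordings):
    """Share of recordings that are matched, each recording counting once."""
    return sum(r.matched for r in recordings) / len(recordings)


def match_rate_by_value(recordings):
    """Share of total value that sits on matched recordings."""
    total = sum(r.value for r in recordings)
    return sum(r.value for r in recordings if r.matched) / total


# Toy catalog: two matched 'hits' and eight unmatched long-tail recordings.
recordings = [
    Recording("HIT-1", 500.0, True),
    Recording("HIT-2", 400.0, True),
] + [Recording(f"TAIL-{i}", 12.5, False) for i in range(8)]

print(f"by value: {match_rate_by_value(recordings):.0%}")  # by value: 90%
print(f"by count: {match_rate_by_count(recordings):.0%}")  # by count: 20%
```

In this toy catalog both statements are true at once: 90% of value is matched, yet 80% of recordings are not.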
The interesting part? Composer information is available for over 88 million of these unmatched recordings, with Fix identifying ‘matchable works’ for a significant portion of these. This isn’t the great recycling ‘washing machine of data’ approach, just pragmatic analysis of interconnected datasets at scale.
The data is out there. The value is on the table. It just needs to be connected.
2. Matched doesn’t necessarily mean matched
One of the more subtle problems we’re seeing is that ISRCs aren’t always matched consistently. The same recording will appear on multiple platforms, albums, or compilations, with each ‘instance’ being individually reported to the MLC (as an interesting side note, this granular detail is available from tech giants, though not always passed on by CMOs).
At the MLC these ‘instances’ are matched to works individually, not by ISRC. This creates a situation where the same ISRC can be both matched and unmatched (and in many cases, matched to more than one work). The underlying logic is sensible, as an ISRC cannot necessarily be trusted to be unique to one sound recording. Yet publishers are frustrated that this careful approach is creating lower match rates than are possible, and the data bears that out.
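The effect of instance-level matching can be sketched in a few lines of Python. This is an illustration only: the `(isrc, work_id)` pair layout, the function name, and the sample identifiers are assumptions made for the example, not the MLC’s actual schema.

```python
from collections import defaultdict


def summarise_isrc_status(instances):
    """Group instance-level matches by ISRC.

    `instances` is an iterable of (isrc, work_id) pairs, where work_id is
    None for an instance that has not been matched to any work.
    """
    matched_works = defaultdict(set)
    unmatched_count = defaultdict(int)
    for isrc, work_id in instances:
        if work_id is None:
            unmatched_count[isrc] += 1
        else:
            matched_works[isrc].add(work_id)

    summary = {}
    for isrc in set(matched_works) | set(unmatched_count):
        works = matched_works.get(isrc, set())
        n_unmatched = unmatched_count.get(isrc, 0)
        if works and n_unmatched:
            status = "mixed"        # both matched and unmatched instances
        elif works:
            status = "matched"
        else:
            status = "unmatched"
        summary[isrc] = {
            "status": status,
            "works": sorted(works),  # more than one entry = multiple works
            "unmatched_instances": n_unmatched,
        }
    return summary


# Hypothetical instances: the same ISRC reported several times.
instances = [
    ("ISRC-A", "WORK-1"),
    ("ISRC-A", None),
    ("ISRC-A", "WORK-2"),
    ("ISRC-B", "WORK-3"),
    ("ISRC-C", None),
]
summary = summarise_isrc_status(instances)
print(summary["ISRC-A"]["status"])  # mixed
```

An ISRC flagged as ‘mixed’ is the interesting case: a match already exists for some instances, so the remaining unmatched instances are natural candidates for review.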
We analysed the top 100 ISRCs by unclaimed value. Here are three recordings with some of the most unclaimed value, where there is money on the table and a match available to be made:
USUM71914275: ‘Dior’ by Pop Smoke (written by Andre Loblack, Bashar Jackson). 1,168 entries with value attached in the MLC’s unclaimed dataset. That’s real money waiting to be claimed by Warner Chappell
USUG11902877: ‘Woah’ by Lil Baby (written by Chris Rosser/Dominique Jones). 274 instances in the MLC’s unclaimed dataset that belong to Universal Music Publishing
USS1Z1001234: ‘Hey, Good Lookin’’ by Jett Williams (written by Hank Williams). 5,299 instances in the MLC’s unclaimed dataset that Sony Music Publishing should be collecting
For this analysis we only drew out recordings of traditional ‘songs’, skipping over what could be classified as ‘functional music’, e.g. ‘Spa Music’ and ‘Sleep Sounds’. It was pure coincidence that the first three that fulfilled these criteria should each be claimed by a different major.
These aren’t obscure tracks with immaterial value. The metadata is known, and the money is real (of course, we could help them claim it!)
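A ranking like the one described above can be sketched as follows. This is a simplified illustration, not the method actually used for the analysis: the entry fields (`isrc`, `title`, `value_cents`), the keyword filter for ‘functional music’, and the sample rows are all assumptions.

```python
from collections import defaultdict

# Keywords taken from the examples in the text; a real filter would need
# a far more careful definition of 'functional music'.
FUNCTIONAL_KEYWORDS = ("spa music", "sleep sounds")


def top_unclaimed_isrcs(entries, n=3):
    """Sum unclaimed value per ISRC, skip functional music, return the top n."""
    totals = defaultdict(int)
    for entry in entries:
        title = entry["title"].lower()
        if any(keyword in title for keyword in FUNCTIONAL_KEYWORDS):
            continue
        totals[entry["isrc"]] += entry["value_cents"]
    # Highest total first; ties broken by ISRC so the order is deterministic.
    return sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


# Hypothetical unclaimed entries (values in cents).
entries = [
    {"isrc": "ISRC-A", "title": "Song A", "value_cents": 40},
    {"isrc": "ISRC-A", "title": "Song A (Deluxe)", "value_cents": 35},
    {"isrc": "ISRC-B", "title": "Relaxing Spa Music Vol. 1", "value_cents": 500},
    {"isrc": "ISRC-C", "title": "Song C", "value_cents": 60},
]
ranking = top_unclaimed_isrcs(entries, n=2)
print(ranking)  # [('ISRC-A', 75), ('ISRC-C', 60)]
```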
3. The music industry is a hits business, but there's money in the long tail
The music industry is founded on hits, with any successful business within it building longevity around a handful of well-earning copyrights (read Blockbusters or Rockonomics for a wider exploration of the hits-driven nature of entertainment industries). The bulk of the money, exponentially so, is in the top tracks. This is naturally where traditional claiming approaches have focused.
Interestingly, analysis shows that tracks with the highest streaming volume are typically well matched. This makes sense. The more high-profile a song, the more likely it’s been prioritised for metadata accuracy and rights resolution. It’s a default consequence of the ‘top down’ approach. The fallout, however, is that as you go ‘further down’ a catalog, into mid-tier or long-tail recordings, the share of unmatched data (and associated value) rises sharply.
It also flips the industry norm. Most income tracking teams are wired to focus on ‘the hits’ because that’s where the bulk of revenue has traditionally lain. But in the MLC data, that strategy can mean leaving millions on the table. Our analysis shows there certainly is money to claim for the bigger hits, but there’s far more when you delve deeper down.
4. Small claims at scale are the hidden goldmine
This is the absolute kicker, and the most interesting discovery we’ve seen. It’s a similar insight drawn from the data, but from a different altitude: what holds true across entire catalogs also holds true for specific recordings.
The real revenue story is in the micro.
Here’s a specific example that’s illustrative of a trend we’re seeing across the whole dataset. For the Jett Williams track in point #2 there are nearly 5,000 unclaimed entries sitting in the MLC’s lowest-value band, each worth only cents. There are (at time of writing) four instances in the highest-value band. Though each of these is individually worth more than any single lower-value claim, the total value in the lowest band far outstrips the combined value of the highest-value claims. This is the long tail in action.
The MLC has done a fantastic job in creating tools for publishers (and creators) to match unclaimed royalties. When it comes to claiming, publishers have struggled with (a) where to focus their energy, understandably starting at the perceived top of the value chain, and (b) being able to efficiently claim every penny. It simply isn’t worth it to manually, individually, claim every instance. Scalable, machine-applied tools are logically the only way. The value is there, and now the software is available to retrieve it economically.
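The arithmetic behind this is simple to sketch. In the Python example below the band labels and per-entry amounts are invented for illustration (the real per-entry values are not given here); only the shape of the comparison, many tiny claims against a handful of larger ones, mirrors the example above.

```python
from collections import Counter


def totals_by_band(entries):
    """Return (total value in cents, entry count) per value band.

    `entries` is an iterable of (band, value_cents) pairs.
    """
    totals = Counter()
    counts = Counter()
    for band, value_cents in entries:
        totals[band] += value_cents
        counts[band] += 1
    return totals, counts


# Hypothetical figures: 5,000 entries worth 2 cents each in the lowest band,
# and 4 entries worth 500 cents each in the highest band.
entries = [("lowest", 2)] * 5000 + [("highest", 500)] * 4
totals, counts = totals_by_band(entries)

for band in ("lowest", "highest"):
    print(band, counts[band], "entries,", totals[band] / 100, "in total")
# lowest 5000 entries, 100.0 in total
# highest 4 entries, 20.0 in total
```

Under these assumed amounts each high-band claim is worth 250 times a low-band claim, yet the low band holds five times the total value.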
5. The most recorded songs in the world, and what they tell us
As a final curiosity, we pulled from the MLC dataset the songs with the most recordings attributed to them: the most recorded songs in history. The lion’s share of the top 100 fall into one of four buckets:
Christmas songs (... of course)
Public Domain songs or arrangements (e.g. Happy Birthday)
Generic or ambiguous titles (a source of endless mismatches. There were also many examples of classical pieces falling into this category)
‘Standards’: a specific set of Jazz/Swing/Showtune compositions, often from the 1930s-40s, and typically canonised in the Great American Songbook
These types of registrations all naturally generate huge numbers of matched recordings (I’ll let you decide if those are legitimate ‘covers’ or not, or in the case of PD works, whether the work claims themselves are in the spirit of the law).
There also aren’t any particular surprises when pulling out individual works not belonging to these categories. Here are the top 5 most recorded songs not in the above groups:
Besame Mucho (although arguably a standard)
These songs continue to earn, be re-recorded, and be rediscovered across generations. An ‘evergreen’ copyright is one that continues to earn consistently, year in, year out, and is what has become so attractive to investors and funds alike. Dependable revenue with a provable track record… if the data is matched.
So what does this tell us?
The value of publishing data isn’t just in the top line. It’s in the margins. The overlaps. The duplicates. The missed links. Careful analysis shows money is still hiding in the gaps between datasets, which is why publishers need scalable, data-driven tools to retrieve it.
The MLC is doing a remarkable job in making this data public, and providing the associated tooling to clean it up. This is only the first step. What publishers do with that data and those tools is what determines whether those lost pennies stay lost… or turn into meaningful revenue.



