After many, many iterations, I finally have the data in some semblance of order.
One day in 2004 I migrated my music library from Winamp to iTunes. Winamp had a plugin that attempted to collect playcount statistics, but it barely worked. iTunes, on the other hand, has playcount tracking built in. iTunes' own use of this number is minimal: the count is incremented whenever a track plays to the end of its audio, and you can sort by it, but that's it.
However, iTunes' library database is exportable, and once I learned that, I knew what I had to do. Finally, I would be able to see the advanced, math-intensive statistics that I knew my music library deserved. But first, I would have to build the system. Starting in 2007, I wrote a small script which could compute playcounts not just by track, but by album and artist. Then I added to it, making it show how many hours I had spent listening to my music. Then I added to it again, this time adding in trends and comparisons. And I've just kept adding new analyses ever since.
I am fundamentally aware that nobody except me cares about what music I listen to. This project is as self-serving as they get. I do, though, have a few altruistic goals for this project:
- To give definitive answers to the question "What's your favorite album?" It is pedantic and point-missing to feel a need for such a precise, detailed answer to this question, but I am a nerd, so this isn't surprising.
- To not forget about music I like. Like John Cusack's character in High Fidelity, I have a large and growing collection of music which must be constantly re-sorted to remind myself what's actually in it. These statistics have umpteen times helped me rediscover great music that I already own but had simply forgotten about.
- To guide my search for new music to add. These statistics are a great way of indicating which way my tastes are changing over time, and have been a great help for continuing to find new music from new artists.
- And lastly, to make cool statistics. A large part of the fun of this project has been challenging myself to work out how best to analyze and present the information. In the early years, the statistics were numbers dumped all over the screen. Over many iterations I've refined the display, and the latest major improvement was finished only now.
My largest hurdle, which I believe I have only recently solved, is how to sort albums. Sorting them purely by total playcount tends to favor shorter albums, whereas sorting by total time spent listening favors longer albums. The ideal is to favor neither – to have album rankings determined by an impartial mixture of playcount and time spent listening. I've tried many iterations of that over the years, but have only recently settled on this system: first find the highest playtime and the highest playcount in the whole library, then express each album's playtime and playcount as a percentage of those highest numbers, and then take the hypotenuse of those two percentages (the square root of the sum of their squares) as a "score". The higher the score, the higher the album is ranked.
Whether that's perfect or not is subjective, but there are fewer issues with this sorting method than any I've developed prior.
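To make the arithmetic concrete, here is a minimal Python sketch of that scoring. It is an illustration rather than the actual script: the function names and the input shape (a dict of album name to playcount and playtime) are made up for this example, and it assumes "highest in the whole library" means the highest per-album totals.

```python
import math


def album_scores(albums):
    """Score each album by the hypotenuse of its two normalized totals.

    albums: dict mapping album name -> (playcount, playtime_seconds).
    This input shape is an assumption made for the example.
    """
    if not albums:
        return {}
    # Highest playcount and highest playtime found among all albums.
    max_count = max(count for count, _ in albums.values())
    max_time = max(time for _, time in albums.values())
    scores = {}
    for name, (count, time) in albums.items():
        # Each total as a fraction of the library-wide maximum (0.0 to 1.0).
        count_pct = count / max_count if max_count else 0.0
        time_pct = time / max_time if max_time else 0.0
        # Hypotenuse: sqrt(count_pct**2 + time_pct**2).
        scores[name] = math.hypot(count_pct, time_pct)
    return scores


def rank_albums(albums):
    """Return album names ordered from highest score to lowest."""
    scores = album_scores(albums)
    return sorted(scores, key=scores.get, reverse=True)
```

Because both inputs are scaled to the same 0-to-1 range before being combined, an album needs to do well on both measures to reach the top; being the leader on only one of them is not enough on its own.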
Another major challenge has been how to calculate changes in popularity over time. iTunes gives no insight whatsoever into when a song was popular in your library, so the only way to calculate changes in popularity is to subtract an earlier export file from a later one. This is a simple concept, but very difficult in practice, and it took much effort to pull off. I would love to get the data to the point where there are line graphs showing playcounts over time (instead of the single arbitrary "recently" column there is now), but that remains a future goal.
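The core of the subtraction idea can be sketched in a few lines of Python. This is only the simple concept, not the hard part: it assumes each export has already been reduced to a dict of cumulative playcounts keyed by some identifier that stays stable between exports (for example a persistent track ID, if the export carries one). The function name and that input shape are hypothetical.

```python
def plays_between(earlier, later):
    """Return how many plays each track gained between two exports.

    earlier, later: dict mapping a stable track ID -> cumulative playcount.
    Both the ID scheme and the dict shape are assumptions for this sketch.
    """
    gained = {}
    for track_id, count_now in later.items():
        # A track missing from the earlier export is treated as starting at 0.
        count_then = earlier.get(track_id, 0)
        delta = count_now - count_then
        # Skip unchanged tracks, and ignore negative deltas (e.g. a reset count).
        if delta > 0:
            gained[track_id] = delta
    return gained
```

Repeating this over a series of dated exports would give one delta per interval, which is the raw material a playcounts-over-time graph would need.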
Sometimes I discover errors in music metadata, and they need to be corrected, but doing so can break the ability to match the corrected entry against its earlier, uncorrected versions. I do not have a great answer to this problem, so the current approach is to be extra diligent that new music enters the library cleanly and error-free. But sometimes revisions to data happen, and programmatically tying together different versions of the same data is an ongoing challenge – updates to different fields each require different solutions. For instance, changes to "genre" are worked around by re-interpreting old versions of the data using the new genre definitions. Changes to other data are worked around by maintaining a separate "fixed data" table with rules about specific rows in the library import. But this continues to be a challenge, and future revisions to how it is met are likely.
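As a rough illustration of those two workarounds, here is a small Python sketch that applies them to rows from an older export. Everything about its shape is an assumption for the example: the field names, keying the "fixed data" rules on (artist, album, title), and representing the genre definitions as a simple old-to-new mapping.

```python
def apply_fixes(rows, fixes, genre_map):
    """Return corrected copies of rows taken from an older export.

    rows: list of dicts with "artist", "album", "title" and "genre" keys.
    fixes: dict mapping the ORIGINAL (artist, album, title) -> dict of
        corrected field values; a stand-in for a "fixed data" table.
    genre_map: dict mapping an old genre name -> its current name.
    All of these shapes are hypothetical.
    """
    corrected = []
    for row in rows:
        row = dict(row)  # copy, so the caller's data is left untouched
        # Look the row up by its uncorrected values, then overlay the fixes.
        key = (row["artist"], row["album"], row["title"])
        row.update(fixes.get(key, {}))
        # Re-interpret the old genre using the current definitions.
        row["genre"] = genre_map.get(row["genre"], row["genre"])
        corrected.append(row)
    return corrected
```

Running every old export through the same rules before comparing it with a newer one means a corrected entry and its uncorrected ancestors end up with matching values.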
There are a few ideas brewing in my noggin, ideas which may work their way into reality if I squeeze my brain hard enough. One is a re-working of the way related artists are linked together. Right now, related artists are tied together one-to-one in a simple table, with no nuance to how those artists are related. Further, the linking is at artist-level, where I would love to get it so that tracks "featuring" other artists can link to those artists (if they're in the database).
Another planned addition is for statistics to be calculated by album type (such as LP, EP, live album, remix, etc.). Right now, album type data is used only for display purposes and no statistics on it are gathered at all, but I'd like that to change.
And a potential addition that may or may not happen is the inclusion of playlists in the data. I take heavy advantage of the iTunes playlist features, having hundreds of them, and somehow seeing that information reflected in these statistics is a possibility.
I know that these statistics are meaningless to everyone who's not me, but I still have found the project worthwhile and a good use of my time. I may not be able to play an instrument or write a song, but by golly I can tell you how many times I listened to The Decemberists' "The Hazards of Love" in 2015. I take what I can get.