Music Hash identification
#1
So now v17 is out, let's start to discuss some new features :)

I'm working with another media player application who are integrating a music scraper into their interface. The developer is frustrated with all the different tagging formats out there and is very keen to introduce a music "hash" for track and album lookups.

Does anyone have any thoughts on how it should be done? Any method needs to be fast and reliable, and work independently of a user's tagging.

I can handle the back-end database scraping stuff and API.

We've looked at AcoustID and OpenSubtitles hashing. Both have upsides and downsides.

Current thinking is to do a custom hash of a tiny amount of data after the 1 min mark of a track, and a 2nd hash of data 1 min from the end. Anyone got any comments about that approach?

EDIT: Please help out by hashing your music collection with our Windows app here:

Download

Then upload to here:

http://www.theaudiodb.com/submit_hash.php
Reply
#2
Some feedback on this code would be nice

https://github.com/bLightZP/Audio-File-H...udioDB.com

Quote:The algorithm itself is based on a modified version of the OpenSubtitles.org code:
http://www.yanniel.info/2012/01/open-sub...elphi.html

Unlike the OpenSubtitles.org hash, in this case, hash offset position within the file is determined by the file size to support smaller file sizes, while allowing larger TAG data (embedded images) to be changed without affecting both hashes (unless the embedded image changes the file size from under 2048KiB to over 2048KiB).
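To make the quoted description a bit more concrete, here is a rough Python sketch of an OpenSubtitles-style checksum applied to two windows of a file. Only the word-summing step (little-endian 64-bit words added together and wrapped to 64 bits) comes from the published OpenSubtitles scheme. The 64 KiB window, the two skip distances, and the way the 2048 KiB boundary picks between them are assumptions made up for illustration; they are not taken from the linked repository. The sketch also leaves the file size out of the sum, because mixing it in would make every tag edit that changes the size alter both hashes.

```python
import os

CHUNK = 64 * 1024            # window size (assumption, borrowed from OpenSubtitles)
THRESHOLD = 2048 * 1024      # size boundary mentioned in the quote
SMALL_SKIP = 128 * 1024      # hypothetical skip for files under the threshold
LARGE_SKIP = 1024 * 1024     # hypothetical skip for larger files
MASK = 0xFFFFFFFFFFFFFFFF    # wrap the running sum to 64 bits


def window_checksum(f, offset, length):
    """Sum the little-endian 64-bit words of one window, wrapping at 64 bits."""
    f.seek(offset)
    data = f.read(length)
    total = 0
    # Step through whole 8-byte words; a trailing partial word is ignored.
    for i in range(0, len(data) - 7, 8):
        total = (total + int.from_bytes(data[i:i + 8], "little")) & MASK
    return total


def audio_hashes(path):
    """Return (hash1, hash2) as 16-digit upper-case hex strings.

    hash1 covers a window measured from the start of the file,
    hash2 covers a window measured back from the end of the file.
    """
    size = os.path.getsize(path)
    skip = SMALL_SKIP if size < THRESHOLD else LARGE_SKIP
    start1 = min(skip, max(size - CHUNK, 0))   # clamp for very small files
    start2 = max(size - skip - CHUNK, 0)
    with open(path, "rb") as f:
        hash1 = window_checksum(f, start1, CHUNK)
        hash2 = window_checksum(f, start2, CHUNK)
    return f"{hash1:016X}", f"{hash2:016X}"
```

Because the second window is anchored to the end of the file, bytes inserted at the front (a bigger tag, an embedded image) leave hash2 alone as long as the file stays on the same side of the size boundary, which is the behaviour the quote describes.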

Windows app here:

Download

The idea is that it should make an XML file with a load of hashes for your music. It's virtually instant as it's using slightly modified OpenSubtitles code.

Any comments on the approach?

Example
Code:
<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<audiohash>
    <entry filename="C:\Disc1\01 - Erasure - Take A Chance On Me.mp3" fileext="mp3" filesize="3620477" hash1="84139935D9A5007C" hash2="051D6F57BDDE360F" image="False" />
    <entry filename="C:\Disc1\02 - Ce Ce Peniston - Finally.mp3" fileext="mp3" filesize="3957278" hash1="0C586509B16B3A0E" hash2="E5427597B973E3C6" image="False" />
    <entry filename="C:\Disc1\03 - K.W.S. - Please Don't Go.mp3" fileext="mp3" filesize="3461477" hash1="310C3EFB659EE74F" hash2="0763B46870A8453A" image="False" />
    <entry filename="C:\Disc1\04 - Take That - It Only Takes A Minute.mp3" fileext="mp3" filesize="3644752" hash1="9C9A3DF5D125A0D9" hash2="070FD6F4324666AE" image="False" />
    <entry filename="C:\Disc1\05 - Nick Berry - Heartbeat.mp3" fileext="mp3" filesize="2153228" hash1="B1B7DAFE1DA7F569" hash2="AB34357613C25E6B" image="False" />
    <entry filename="C:\Disc1\06 - Snap - Rhythm Is A Dancer.mp3" fileext="mp3" filesize="3560246" hash1="CAF1A72E5000EF93" hash2="930717624C274A1B" image="False" />
    <entry filename="C:\Disc1\07 - Utah Saints - Something Good.mp3" fileext="mp3" filesize="3403347" hash1="585F5C896A2B1469" hash2="F2ECEFDEBB221EAE" image="False" />
    <entry filename="C:\Disc1\08 - The Cure - Friday I'm In Love.mp3" fileext="mp3" filesize="3401275" hash1="13361AB581B81F59" hash2="D47A8D498096294D" image="False" />
    <entry filename="C:\Disc1\09 - Marc Almond - The Days Of Pearly Spencer.mp3" fileext="mp3" filesize="4290321" hash1="97A159AD6D96DD34" hash2="7B80F2CB9B316CE7" image="False" />
    <entry filename="C:\Disc1\10 - The Beautiful South - Bell Bottomed Tear.mp3" fileext="mp3" filesize="4403740" hash1="5F10954F3A35A85F" hash2="2D1217303422592C" image="False" />
    <entry filename="C:\Disc1\11 - Prince And The New Power Generation - Thunder.mp3" fileext="mp3" filesize="5528382" hash1="8C31C0C58741DFE5" hash2="36CB74967C6D67A8" image="False" />
    <entry filename="C:\Disc1\12 - U2 - Even Better Than The Real Thing.mp3" fileext="mp3" filesize="3533477" hash1="238D4D67B8C7BBB3" hash2="A37C38138D7DB5C0" image="False" />
    <entry filename="C:\Disc1\13 - The Shamen - L.S.I..mp3" fileext="mp3" filesize="3709558" hash1="C66621CDB7F805EC" hash2="736CA1E002C0F96E" image="False" />
    <entry filename="C:\Disc1\14 - Electronic - Disappointed.mp3" fileext="mp3" filesize="4173577" hash1="DEA31CD4C653F1CC" hash2="A4F476BF0A68F7A8" image="False" />
    <entry filename="C:\Disc1\15 - Shakespears Sister - I Don't Care.mp3" fileext="mp3" filesize="4231754" hash1="BC408890174BD273" hash2="D2BCB0A7E87C1CB7" image="False" />
    <entry filename="C:\Disc1\16 - Carter The Unstoppable Sex Machine - Do Re Me, So Far So Good.mp3" fileext="mp3" filesize="2944884" hash1="8A20CDA049BDC739" hash2="E092511E040D4A4E" image="False" />
    <entry filename="C:\Disc1\17 - Ugly Kid Joe - Everything About You.mp3" fileext="mp3" filesize="3952686" hash1="3C5A7B7E36B48CA3" hash2="1A647DA507111CC7" image="False" />
    <entry filename="C:\Disc1\18 - SL2 - On A Ragga Tip.mp3" fileext="mp3" filesize="3594068" hash1="604A4F2F9C2B01CB" hash2="C5E40FC85EF261A0" image="False" />
    <entry filename="C:\Disc1\19 - The Orb - Blue Room.mp3" fileext="mp3" filesize="3958959" hash1="2D3F9F6F28C32342" hash2="4E4F52200AD679A5" image="False" />
    <entry filename="C:\Disc2\01 - Richard Marx - Hazard.mp3" fileext="mp3" filesize="4614485" hash1="A60E111670B97CF2" hash2="6EA1BB60A655C8D7" image="False" />
    <entry filename="C:\Disc2\02 - Elton John - The One.mp3" fileext="mp3" filesize="5630140" hash1="D974DB873DC7019F" hash2="2A2FAA91816A6C9A" image="False" />
    <entry filename="C:\Disc2\03 - Roy Orbison - I Drove All Night.mp3" fileext="mp3" filesize="3589506" hash1="54101929EE926389" hash2="D5EFBB87C57014AA" image="False" />
    <entry filename="C:\Disc2\04 - Jimmy Nail - Ain't No Doubt.mp3" fileext="mp3" filesize="3805344" hash1="8D79D531D8CD45A9" hash2="B7280B4741B3D352" image="False" />
    <entry filename="C:\Disc2\05 - Joe Cocker - Unchain My Heart (90's Version).mp3" fileext="mp3" filesize="4908623" hash1="95066880CAE13087" hash2="A570FC94145C88AC" image="False" />
    <entry filename="C:\Disc2\06 - Curtis Stigers - You're All That Matters To Me.mp3" fileext="mp3" filesize="4458461" hash1="EB6FAAA889737E5A" hash2="03E7F2940BC47C73" image="False" />
    <entry filename="C:\Disc2\07 - Wilson Phillips - You Won't See Me Cry.mp3" fileext="mp3" filesize="3706628" hash1="22393C6AFCC2C255" hash2="19BED1821D85A500" image="False" />
    <entry filename="C:\Disc2\08 - Crowded House - Four Seasons In One Day.mp3" fileext="mp3" filesize="2693341" hash1="AB5591090A84E4C4" hash2="151DD0DB6F1FC765" image="False" />
    <entry filename="C:\Disc2\09 - Annie Lennox - Why.mp3" fileext="mp3" filesize="4729559" hash1="4A702BFA31556C3C" hash2="0814FDF5649F3410" image="False" />
    <entry filename="C:\Disc2\10 - George Michael And Elton John - Don't Let The Sun Go Down On Me.mp3" fileext="mp3" filesize="5561001" hash1="340D542B30BAF520" hash2="D565A110121CBFAF" image="False" />
    <entry filename="C:\Disc2\11 - Diana Ross - One Shining Moment.mp3" fileext="mp3" filesize="4578093" hash1="B15A4BDB51787850" hash2="ECD3C353AB838D14" image="False" />
    <entry filename="C:\Disc2\12 - Vanessa Williams - Save The Best For Last.mp3" fileext="mp3" filesize="3511696" hash1="40169A301B3E3A52" hash2="882D74F7585CCBE2" image="False" />
    <entry filename="C:\Disc2\13 - En Vogue - My Lovin'.mp3" fileext="mp3" filesize="4050085" hash1="C5E3BA1EB9DB38FD" hash2="BA499DBDAC909A44" image="False" />
    <entry filename="C:\Disc2\14 - Soul II Soul - Joy.mp3" fileext="mp3" filesize="4006580" hash1="579ED9EC463F1EFB" hash2="CEB47109B193FB1F" image="False" />
    <entry filename="C:\Disc2\15 - Incognito - Don't You Worry 'Bout A Thing.mp3" fileext="mp3" filesize="3971560" hash1="97CAFCC10C9947DF" hash2="CFCA8A9CE44BEE24" image="False" />
</audiohash> 
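For anyone wanting to consume a file like this, a minimal Python sketch of reading it back is below. It relies only on the element and attribute names visible in the example above; the function name `read_audiohash` and the dictionary layout are illustrative choices, not part of the app or the site.

```python
import xml.etree.ElementTree as ET

# Two entries copied from the example above. The raw bytes literal keeps the
# backslashes in the Windows paths from being treated as escape sequences.
SAMPLE = rb"""<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<audiohash>
    <entry filename="C:\Disc1\01 - Erasure - Take A Chance On Me.mp3" fileext="mp3" filesize="3620477" hash1="84139935D9A5007C" hash2="051D6F57BDDE360F" image="False" />
    <entry filename="C:\Disc1\02 - Ce Ce Peniston - Finally.mp3" fileext="mp3" filesize="3957278" hash1="0C586509B16B3A0E" hash2="E5427597B973E3C6" image="False" />
</audiohash>
"""


def read_audiohash(xml_data):
    """Parse an <audiohash> document into a list of plain dictionaries."""
    root = ET.fromstring(xml_data)
    entries = []
    for entry in root.iter("entry"):
        entries.append({
            "filename": entry.get("filename"),
            "fileext": entry.get("fileext"),
            "filesize": int(entry.get("filesize")),
            "hash1": entry.get("hash1"),
            "hash2": entry.get("hash2"),
            # The app writes the strings "True" / "False".
            "image": entry.get("image") == "True",
        })
    return entries


if __name__ == "__main__":
    for item in read_audiohash(SAMPLE):
        print(item["hash1"], item["hash2"], item["filename"])
```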
Reply
#3
We have a proof of concept now working and importing to TADB.

Next step is to setup an API method to get the Artist, Album and Track MusicBrainz ID from the hash we store.

Some initial results
Code:
Original MP3 file       hash1="659D200D3B4F2BAD"   hash2="0AF56452067CF197"   image="False" 
Change filename         hash1="659D200D3B4F2BAD"   hash2="0AF56452067CF197"   image="False" 
Change id3 tag year     hash1="659D200D3B4F2BAD"   hash2="0AF56452067CF197"   image="False" 
Embed cover image       hash1="CC2E2B80F6A3C3AC"   hash2="0AF56452067CF197"   image="True"
Remove tag completely   hash1="9F65097AD1E38352"   hash2="EDD5972919703D39"   image="False" 
Retag with Picard       hash1="F46F746965C28CF8"   hash2="0AF56452067CF197"   image="False" 

As you can see it's all pretty good up until the removing of the tag.

What's surprising is that when I retag the empty MP3 with MusicBrainz Picard, the 2nd hash is identical to the earlier tests. Very cool :)
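The results above can be checked mechanically. The short Python sketch below takes the hash pairs from the table and reports which of the two hashes still equals the original after each change. The helper name `surviving_hashes` is made up for this example, and the idea of treating a match on either hash as a hit is an assumption about how a lookup might work, not a description of the real backend.

```python
# Hash pairs (hash1, hash2) copied from the results table above.
BASELINE = ("659D200D3B4F2BAD", "0AF56452067CF197")

RESULTS = {
    "Change filename":       ("659D200D3B4F2BAD", "0AF56452067CF197"),
    "Change id3 tag year":   ("659D200D3B4F2BAD", "0AF56452067CF197"),
    "Embed cover image":     ("CC2E2B80F6A3C3AC", "0AF56452067CF197"),
    "Remove tag completely": ("9F65097AD1E38352", "EDD5972919703D39"),
    "Retag with Picard":     ("F46F746965C28CF8", "0AF56452067CF197"),
}


def surviving_hashes(baseline, candidate):
    """Return the names of the hashes that are unchanged from the baseline."""
    names = ("hash1", "hash2")
    return [n for n, b, c in zip(names, baseline, candidate) if b == c]


if __name__ == "__main__":
    for label, hashes in RESULTS.items():
        kept = surviving_hashes(BASELINE, hashes)
        print(f"{label:<22} {', '.join(kept) or 'none'}")
```

Under that "either hash matches" assumption, four of the five modified files would still be recognised; only the file with its tag removed completely would be missed.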
Reply
#4
How would this work for the same song ripped from 2 different sources? For example, vinyl and cd?
Reply
#5
(2017-02-15, 22:10)helta Wrote: How would this work for the same song ripped from 2 different sources? For example, vinyl and cd?

Those 2 formats would have no hash relation to each other and would require a separate hash.

Same goes for something ripped to MP3 and FLAC.

We can store as many hashes as we want though; it's not limited to just one set.

I'd imagine for popular tracks, there would be hundreds of hashes submitted over time.
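As a rough picture of "many hashes per track", here is a small in-memory Python sketch. The class `HashIndex`, its method names, and the track id `"track-A"` are all hypothetical; the real database schema is not described in this thread. The hash values are borrowed from the test results posted earlier, treating the original file and the tag-stripped file as two separate submissions for the same track.

```python
class HashIndex:
    """Maps any number of submitted hashes onto a single track id."""

    def __init__(self):
        self._track_by_hash = {}     # hash string -> track id
        self._hashes_by_track = {}   # track id -> set of hash strings

    def add(self, track_id, hash1, hash2):
        """Register one submitted (hash1, hash2) pair for a track."""
        for h in (hash1, hash2):
            self._track_by_hash[h] = track_id
            self._hashes_by_track.setdefault(track_id, set()).add(h)

    def lookup(self, hash1, hash2):
        """Return the track id if either hash is known, else None."""
        for h in (hash1, hash2):
            if h in self._track_by_hash:
                return self._track_by_hash[h]
        return None

    def hash_count(self, track_id):
        """Number of distinct hashes stored for a track."""
        return len(self._hashes_by_track.get(track_id, ()))


index = HashIndex()
# Two different files of the same (hypothetical) track.
index.add("track-A", "659D200D3B4F2BAD", "0AF56452067CF197")
index.add("track-A", "9F65097AD1E38352", "EDD5972919703D39")

# A Picard-retagged copy shares only its second hash with the first submission.
match = index.lookup("F46F746965C28CF8", "0AF56452067CF197")
```

Each new rip or encode of a track just adds more keys pointing at the same id, so the number of stored hashes can grow without changing how a lookup works.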
Reply
#6
So I've been working hard on this feature with the lead developer of ZoomPlayer and I think we have something that is working now...

Example hash lookup - http://www.theaudiodb.com/api/v1/json/1/...88B105020B

Effectively we can look up an album's or artist's details and artwork from a single track hash :D Boom!

The hashing code is very quick and simple. I've hashed my own collection of scene FLAC files (23,000 tracks) and it only took 2 minutes.

EDIT: This method could be extended to video as well...
Reply
#7
I was just thinking about this actually. A couple of different questions:

1) I haven't ripped any CDs in a few years. Let's say, for example, I had Track 1, of Artist 1, from his Album 1. If I were to rip it and convert it to FLAC, would this hash be exactly the same between my CD ripper's hardware and another person's hardware? Or does something like AccurateRip now make all the rips exactly the same? I guess the version of the FLAC encoder you use would also make a difference.

2) Is there a "preferred" or "correct" hash? I could imagine people downloading the same exact track from the same exact source, but one being "corrupt" and not being bit perfect to the original. Could you verify your tracks against a database? Kind of like CDDB but on a track level?
Reply
#8
(2017-02-23, 19:58)helta Wrote: 1) I haven't ripped any CDs in a few years. Let's say, for example, I had Track 1, of Artist 1, from his Album 1. If I were to rip it and convert it to FLAC, would this hash be exactly the same between my CD ripper's hardware and another person's hardware? Or does something like AccurateRip now make all the rips exactly the same? I guess the version of the FLAC encoder you use would also make a difference.

Theoretically if you used the same default settings of the ripper, then yes it would have the same hash.

But this method works far better on common sources, such as tracks downloaded from online stores (iTunes, Amazon) or peer-to-peer networks.

I'd be interested to know whether the online stores add any extra metadata that breaks the hashes. I've yet to test that properly.

(2017-02-23, 19:58)helta Wrote: 2) Is there a "preferred" or "correct" hash? I could imagine people downloading the same exact track from the same exact source, but one being "corrupt" and not being bit perfect to the original. Could you verify your tracks against a database? Kind of like CDDB but on a track level?

Yep, that's exactly what TheAudioDB is already doing on the backend; it just needs the app written to check it. The beauty here is that we take 2 hashes that are independent of any ID3v2 tagging, so it should be pretty accurate, but also blazingly fast to create the hash.

I suspect, if this becomes popular, we can store millions of hashes and all kinds of cool apps can be created on top of it.
Reply
#9
Quote:OK we have a new song hashing feature! But we need help to fill the hashes... Please download this small Windows app and run it on the root of your MP3 or FLAC music collection:

http://zoomplayer.com/t/AudioHash.zip

It should only take a few minutes to hash, even on huge collections.

And then can you upload the resulting XML file to this page please?

http://www.theaudiodb.com/submit_hash.php

Thanks in advance!

Thanks to the couple of people who have submitted to the hash DB so far. We already have 50,000 file hashes ;)
Reply
#10
I will do this for you when I get home.... expect a decent size update.....


There you go, uploaded.
Reply
#11
(2017-02-27, 19:30)docwra Wrote:
Quote:OK we have a new song hashing feature! But we need help to fill the hashes... Please download this small Windows app and run it on the root of your MP3 or FLAC music collection:

http://zoomplayer.com/t/AudioHash.zip

It should only take a few minutes to hash, even on huge collections.

And then can you upload the resulting XML file to this page please?

http://www.theaudiodb.com/submit_hash.php

Thanks in advance!

Thanks to the couple of people who have submitted to the hash DB so far. We already have 50,000 file hashes ;)


Where does it save the XML file?

Found it, it's in the folder you passed to it as the root?
Reply
#12
A couple of files uploaded for you.
Reply
#13
Will that AudioHash app work with Wine on Linux?
Reply
#14
(2017-03-01, 14:58)Rusendusen Wrote: Will that audiohash work with wine on Linux?

Yes I believe it should work. It just needs the mediainfo.dll which is included.

Let us know if it does.
Reply
#15
Thanks to tkgafs, Manfred and Helta we have another 20,000 hashes imported!

I've been testing this more and more, and it's quite accurate when you start looking up hashes now. For example, a popular track like:

http://www.theaudiodb.com/track/32794824

It already has 10 hashes, both FLAC and MP3.

And it's surprising how common the tracks are around the internet, and how many tracks have duplicate hashes :) even with wildly different filenames and tagging.

What we really need is for people who have already tagged their collections with MusicBrainz Picard to submit their hashes; that would really be a goldmine.

EDIT: we now have a total of 133,203 file hashes
Reply
