waider wrote 2005-04-09 10:27 pm

my contribution to the DMCA argument

So I got a Sony music-playing gadget for Christmas. It's not an MP3 player; it's still mired in Sony's proprietary ATRAC format, and of course all the tools for getting data on and off the device are Windows-only, which is very hard on us Linux folk. And they're resolutely Windows-only, meaning they don't play nice with Wine. I did have some sort of Byzantine setup involving VMware, but the less said about that the better.

Now, when I say "mired in Sony's proprietary ATRAC format" I mean that dropping MP3 files as-is onto the device didn't work, and using the MP3FileManager tool that actually lives on the device left me with a bunch of files that weren't MP3s. In fact, I didn't know what they were. Due to some cross-licensing deal, RealPlayer can play some ATRAC files, but it sure as hell didn't recognise these. More to the point, various people on various fora made comments to the effect that the MP3 files were transcoded to ATRAC for the player, and Sony's bulk conversion tool seems to want to do exactly that.

However.

One thing that caught my eye very early on was that the MP3FileManager tool moved the data very quickly, about as quickly as copying the file straight onto the device would have. Transcoding tends to be a slow process; you've got to reconstitute the PCM data, and then do the new compression magic. This tends to be math-intensive stuff, particularly the second step, and it takes time. I could not really see this time being taken, yet the files on the device at the end of the process were certainly not MP3s.

As geeks do, I got curious.

The first thing I do in situations like this is crack open the file and look for some identifying file magic that I can google for. These files contained the string WMMP at the head of the file, but googling that produced nothing. So I poked and prodded some more. There was also a pair of files which seemed to hold all the metadata, and these were tagged WMPLESYS. Also a non-starter with google. So much for that idea.
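For anyone wanting to repeat the "look for file magic" step, here is a minimal Python sketch. The function name and the 16-byte limit are my own illustrative choices, not anything from the tools described here; it simply returns whatever printable ASCII run sits at the very start of a blob of bytes.

```python
import string

# Characters we are willing to treat as part of a magic string (assumption:
# magic strings are plain ASCII letters, digits and punctuation).
PRINTABLE = set(string.ascii_letters + string.digits + string.punctuation)


def leading_magic(data: bytes, limit: int = 16) -> str:
    """Return the run of printable ASCII characters at the start of data."""
    out = []
    for byte in data[:limit]:
        ch = chr(byte)
        if ch not in PRINTABLE:
            break  # first non-printable byte ends the candidate magic
        out.append(ch)
    return "".join(out)
```

To use it on a real file, read the first few bytes with `open(path, "rb").read(16)` and pass them in; the resulting string is what you would then go and search for.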

The metadata files were easy enough to crack; the metadata was plainly obvious (although encoded in Windows' big-endian UTF16 format) and after a bit of poking around and experimenting with adding and deleting files I figured out most of the structure. Then I turned my attention back to the music files.
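Spotting big-endian UTF-16 text in a binary blob can be sketched roughly as below. This is an illustration only: it assumes the titles are plain ASCII characters stored as two-byte big-endian units (a zero byte followed by the character), and the function name and minimum run length are made up for the example. It says nothing about the actual layout of the metadata files.

```python
import re


def utf16be_strings(data: bytes, min_chars: int = 4) -> list:
    """Find runs of ASCII text stored as big-endian UTF-16 inside data."""
    # Each character is a 0x00 high byte followed by a printable ASCII byte.
    pattern = re.compile(rb"(?:\x00[\x20-\x7e]){%d,}" % min_chars)
    return [m.group().decode("utf-16-be") for m in pattern.finditer(data)]
```

Non-ASCII titles would need a looser pattern, so treat this as a first pass for getting your bearings rather than a parser.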

Following my normal approach to decoding file formats, I started poking around for file structure. Things like two or four bytes that match the file length to within a few bytes (to allow for both absolute-size headers and number-of-bytes-beyond-this-point headers) are a good place to start. Somewhere in the process of doing this, I had done a hexdump of the first 256 bytes or so of one of the encoded files, and I was flipping between this and some other stuff, and accidentally hexdumped the wrong file.
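The "find bytes that match the file length" trick can be sketched like this. The scan window, the slack allowance and the function name are assumptions chosen for illustration; the idea is just to try every offset near the start of the file, in both byte orders, and report any two- or four-byte value that lands within a few bytes of the total size.

```python
def find_length_fields(data: bytes, scan: int = 256, slack: int = 64) -> list:
    """Report (offset, width, byte_order, value) for candidate length fields.

    A candidate is any 2- or 4-byte integer in the first `scan` bytes whose
    value is at most `slack` bytes short of the total length, which covers
    both "absolute size" and "bytes beyond this point" style headers.
    """
    total = len(data)
    hits = []
    for width in (2, 4):
        for offset in range(0, min(scan, total) - width + 1):
            chunk = data[offset:offset + width]
            for order in ("big", "little"):
                value = int.from_bytes(chunk, order)
                if 0 <= total - value <= slack:
                    hits.append((offset, width, order, value))
    return hits
```

Expect false positives, especially from the two-byte case on small files; the hits are leads to check by hand, not answers.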

The original MP3 file.

And lo, there was a marked similarity in the structure of the file, even if the actual byte values were different. So I checked the filesizes of both, and they were close, but not equal. Then I remembered that the metadata was in the WMPLESYS file, so I stripped it all out of the MP3 - and removed the obvious header from the WMMP file - and lo, byte-for-byte size match. Obviously I was onto something here. Further investigation and a bit of Perl produced a conversion array of array[converted_byte]=original_byte, and after running the converted file through it, I had the original MP3. Woot!
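The conversion array described above, array[converted_byte]=original_byte, can be rebuilt from any pair of equal-length files. The sketch below is in Python rather than the Perl actually used, the function names are invented for the example, and it makes no claim about what the real mapping looks like. It also refuses to continue if one converted byte would have to map to two different original bytes, since that would mean the scheme is not a simple substitution.

```python
def build_table(converted: bytes, original: bytes) -> dict:
    """Derive table[converted_byte] = original_byte from a matched pair."""
    if len(converted) != len(original):
        raise ValueError("inputs must be the same length")
    table = {}
    for c, o in zip(converted, original):
        # setdefault records the first mapping seen; later ones must agree.
        if table.setdefault(c, o) != o:
            raise ValueError("not a simple byte-for-byte substitution")
    return table


def apply_table(table: dict, converted: bytes) -> bytes:
    """Map every converted byte back through the table."""
    # A KeyError here means the byte never appeared in the training pair.
    return bytes(table[c] for c in converted)
```

Running `apply_table(build_table(enc, orig), enc)` on the pair the table came from should give back `orig` exactly; the interesting test is what happens on a second file.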

Then I picked up the second file on the player, ran it through the conversion process, and got a bunch of garbage. CRAP. The conversion array obviously changed per file. And so I went back to my hex editor (hexl-find-file in emacs) and scribbled a lot on a piece of paper, and pretty much stumbled on the conversion algorithm. I set up a test that ran through thirty files on the player and compared them to the original via the discovered algorithm, and all was well. HURRAH!

I'm leaving out some detail here. In particular, the number of times I did completely stupid things with my code and, not noticing, assumed I'd screwed up the algorithm or misunderstood the file format. Also the quest for a usable MP3 decoding library: in order to successfully build the header for the WMMP file you need to know both the length of the song in milliseconds and the number of MP3 frames in the file, and you need to filter off the ID3 tags. What I ended up doing, after a frustrating number of days spent arguing with various libraries, was to rip a chunk out of XMMS' MP3 plugin and throw away anything I didn't need. Right now I've got a half-assed library and some command-line tools that allow me to load, list and unload the device, and I've posted it on my website (here) as much as an incentive to myself to clean up and finish the library as anything else. Oh, and I should probably document in places other than the comments what I've actually discovered about the file formats. The big win for me, though, is simply being able to load and unload the player from Linux.
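Filtering off the ID3 tags is the one piece of that list that doesn't need a decoding library. The following is a rough Python sketch, not the code lifted from XMMS: it relies only on the standard tag layouts (an ID3v2 block at the front with a 10-byte header and a four-byte "syncsafe" size, and a 128-byte ID3v1 block at the end starting with TAG). The function name is invented, and oddities such as multiple or appended ID3v2 tags are ignored.

```python
def strip_id3(data: bytes) -> bytes:
    """Return data with a leading ID3v2 tag and trailing ID3v1 tag removed."""
    start, end = 0, len(data)

    # ID3v2: "ID3", two version bytes, one flags byte, four size bytes.
    if len(data) >= 10 and data[:3] == b"ID3":
        size = 0
        for b in data[6:10]:
            size = (size << 7) | (b & 0x7F)  # syncsafe: 7 useful bits per byte
        start = 10 + size                    # size excludes the 10-byte header
        if data[5] & 0x10:                   # footer-present flag
            start += 10

    # ID3v1: fixed 128-byte block at the very end, starting with "TAG".
    if end - start >= 128 and data[end - 128:end - 125] == b"TAG":
        end -= 128

    return data[start:end]
```

Counting frames and working out the duration still means walking the MP3 frame headers, which is where a real decoder (or a chunk ripped out of one) earns its keep.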

The amusing part of all this is that I'm pretty sure it can be argued, in light of Adobe's pursuit of Elcomsoft over unlocking PDFs, that this is a DMCA violation. I'm not sure how you claim that the trivial obfuscation of an MP3 - which is argued by those who would like to own the keys to all your media to be a copyright violation itself - is a copyright protection device, but that's a handwave.

[identity profile] boutell.livejournal.com 2005-04-09 10:17 pm (UTC)
Nice job, man!

[identity profile] waider.livejournal.com 2005-04-10 12:52 am (UTC)
Thank you kindly, sir, especially for your graphic.

[identity profile] nothings.livejournal.com 2005-04-10 02:27 am (UTC)
sweet