waider ([personal profile] waider) wrote 2005-04-10 04:11 pm

geekery update

(Further update: Hello Slashdot. Please READ CAREFULLY. This has NOTHING to do with the SonicStage application which, to the best of my knowledge, uses ATRAC encoding/DRM. This is SOLELY concerned with the MP3FileManager application which lives on the NW-S23. Thank you for your time, energy, and bandwidth usage. Also, please do NOT post links directly to my site from this thread; I will delete them.)

(minor update/clarification: note, this is not how ATRAC files work. This is solely what happens with MP3 files that are dropped onto the Sony device with their MP3FileManager application as featured on the device itself.)

I’ve written up the file formats from the previous entry. If you’re curious, read on:
Sony Network Walkman NW-S23 MP3 File Storage

Introduction

The NW-S23 apparently supports real MP3 playback; it just obfuscates
the files before writing them to the device. I’ve reverse-engineered
the obfuscation sufficiently that I can read and write files from the
device, and the few bits I haven’t managed to explain away don’t seem
to matter - they’re probably “magic bytes” (version, etc.) or other
non-critical data.

File Mechanism and Layout

When you load an MP3 file to the device via the MP3FileManager
application, the first thing it does is to create a folder for the
file. If you’ve just dropped a single MP3 file onto the application,
your folder will be called “New Folder” or similar; if you drag an
entire folder to the application, the folder name will be copied. The
folder data is stored in a file called PBLIST1.DAT; whenever this file
is modified, it is backed up to PBLIST0.DAT and the new data written
to a fresh PBLIST1.DAT. Once the folder has been created, the MP3
file’s ID3 information is stripped, some of which ends up in the
PBLIST1.DAT file, the MP3 file is obfuscated and written to the
device, and the track number and folder/playlist position of the
obfuscated file are copied to the PBLIST1.DAT file. I’m not clear on
the exact order of how this happens, but it seems logical that the
application would attempt to first write the obfuscated file, and only
if that succeeds update the PBLIST1.DAT file. The obfuscated file is
named MPXXXX.DAT, where XXXX is the track number in zero-padded hex
format. Note that this has no relation to the ID3 track number; it’s
simply an internal index used by the NW-S23 to identify the
track. It’s also used in the obfuscation algorithm, as you’ll see
below.

To finish with the grosser details of the file handling, the files are
located on the device as follows:

[device root] (e.g. /media/NW-S23 on Linux, E: on Windows)
|
+-control     (various files I don’t know/care about live here)
+-esys
| |
| +-nw-mp3    ** MPXXXX.DAT files go here
| |
| +-PBLIST0.DAT  (backup playlist)
| +-PBLIST1.DAT  (live playlist)
|
+-hifi        (more files I don’t know/care about, probably ATRAC area)
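
Putting the naming rule and the directory layout together, the path of an obfuscated file could be built along these lines. This is only a sketch: the function name mp_dat_path is made up for illustration, and the uppercase hex digits are an assumption based on the MPXXXX.DAT spelling above.

```c
#include <stdio.h>
#include <stdint.h>

/* Hypothetical helper: write the path of the obfuscated file for the
 * given track number into buf. Returns the path length, or -1 if buf
 * is too small. Uppercase hex digits are an assumption. */
int mp_dat_path(char *buf, size_t len, const char *root, uint16_t trackno)
{
    int n = snprintf(buf, len, "%s/esys/nw-mp3/MP%04X.DAT",
                     root, (unsigned)trackno);
    return (n < 0 || (size_t)n >= len) ? -1 : n;
}
```

For example, calling it with a root of /media/NW-S23 and track number 0x1A fills the buffer with /media/NW-S23/esys/nw-mp3/MP001A.DAT.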

Specifics of File Formats

General

All multi-byte integers are stored in big-endian format, which means
if you’re writing code for glibc on Intel chips to interface with this
you’ll need to do an amount of byteswapping. The library code I’ve
written does this for you where appropriate, e.g. in extracting track
numbers, but foldernames and suchlike are left in their on-disk
format.
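
If you would rather not depend on the host byte order at all, one common approach is to assemble the integers a byte at a time. The helper names below (rd_be16, rd_be32, wr_be32) are hypothetical and are not taken from the library code mentioned above.

```c
#include <stdint.h>

/* Read a big-endian 16-bit word from a byte buffer. */
uint16_t rd_be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* Read a big-endian 32-bit longword from a byte buffer. */
uint32_t rd_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Write a 32-bit value to a byte buffer in big-endian order. */
void wr_be32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}
```

Because these work on individual bytes, the same code gives the same result on little-endian and big-endian hosts, with no explicit swapping.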

All text strings appear to be UTF-16 (or maybe UCS-2) with null
termination. Folder names run to a maximum of 126 characters + NULL,
and other metadata runs to 127 characters + NULL.
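
As a rough illustration of the fixed-size string fields, the sketch below writes a plain ASCII string into a zero-padded field of a given number of 2-byte words. It assumes the text is stored big-endian like the integers, which is not confirmed above, and it always leaves room for a terminator. The function name is hypothetical, and anything outside ASCII is replaced with '?' rather than converted properly.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: store s as UTF-16 (big-endian assumed) in a
 * field of `words` 2-byte words, zero-padded and always terminated.
 * Only ASCII is handled; other bytes become '?'.
 * Returns the number of characters stored. */
size_t ascii_to_utf16be_field(uint8_t *field, size_t words, const char *s)
{
    size_t n = 0;

    if (words == 0)
        return 0;
    memset(field, 0, words * 2);          /* zero padding + terminator */
    while (s[n] != '\0' && n < words - 1) {
        unsigned char c = (unsigned char)s[n];
        field[2 * n]     = 0x00;          /* high byte first */
        field[2 * n + 1] = (c < 0x80) ? c : (uint8_t)'?';
        n++;
    }
    return n;
}
```

A metadata field would use words = 128, giving 127 characters plus the terminator.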

PBLIST format

* The file starts with an 8-byte signature consisting of the
  characters “WMPLESYS”.

* There are six 2-byte words following this, the first two of which
  appear to be a timestamp with an epoch of 15:36 on May 26 1978
  (honest, I did the math), but that’s only speculation - the player
  doesn’t seem to care what data you put in here. The next two words
  are 0x08 0x9F 0x9E 0xFF, a sequence which also appears in the
  MP*.DAT files and may indicate a version number. The last two words
  of this block are 0x00 0x03 0xCE 0xA0.

* Next, there’s a pair of 4-byte longwords containing the number of
  folders and the number of tracks on the device.

* The last four bytes of header data are a longword XOR checksum of
  the header bytes. If you take the entire header including this field
  as longwords, and XOR them, you should end up with 0.

* Next we get to actual data. First comes the folder list: for each
  folder on the device, there’s a 256-byte block. The first 252 bytes
  are 126 words containing the folder name in UTF-16 (I think)
  format. The last four bytes make up a longword pointing to the start
  of the tracklist for this folder as an absolute file offset - you
  should be able to fseek() to this offset and start reading the
  tracklist for the folder.

* After all the folders comes the tracklist which the folders’
  longword pointer points into. There is only a single tracklist on
  the device, consisting of a list of words representing each
  track. Thus for multiple folders the list is something like “Folder
  1 Track 1”, “Folder 1 Track 2” ... “Folder 2 Track 1”. Obviously
  this allows you to trivially move tracks between folders or change
  the order of tracks without having to reencode the files. The block
  is rounded up to the nearest multiple of eight bytes by
  zero-padding, and the only way you can find out how big the block is
  is by using the number of tracks field in the header plus a bit of
  math. When writing files to the device, the Sony application appears
  to try to fill holes in the existing tracklist before allocating new
  numbers, so for example if you’ve got tracks 1, 2 and 4 and you add
  a new track, it will be numbered 3 rather than 5. Your tracklist
  will still be written out in the correct order, i.e. 1, 2, 4, 3.

* After the tracklist comes the track metadata. For each track the
  device stores the original filename, the track title, and the
  artist, in that order, in fixed-sized blocks of 128 words. Unused
  space (i.e. for short strings) is zero-padded.

* The file ends at the last block of metadata - there’s no trailer.
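
The layout described in this list can be sketched in code as follows. The sizes are derived from the description (8 + 12 + 8 + 4 = 32 header bytes, 256 bytes per folder, 2 bytes per tracklist entry, 3 x 128 words per track of metadata); all function and macro names are hypothetical, and the assumption that the metadata starts immediately after the padded tracklist is exactly that, an assumption.

```c
#include <stdint.h>
#include <string.h>

#define PBLIST_HEADER_SIZE     32u
#define PBLIST_FOLDER_SIZE     256u
#define PBLIST_TRACK_META_SIZE (3u * 128u * 2u)  /* filename, title, artist */

static uint32_t pb_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* XOR of all eight header longwords; 0 when the checksum is right. */
uint32_t pblist_header_xor(const uint8_t *hdr)
{
    uint32_t x = 0;
    for (size_t i = 0; i < PBLIST_HEADER_SIZE; i += 4)
        x ^= pb_be32(hdr + i);
    return x;
}

/* Recompute the checksum longword in the last four header bytes. */
void pblist_set_checksum(uint8_t *hdr)
{
    memset(hdr + 28, 0, 4);
    uint32_t x = pblist_header_xor(hdr);
    hdr[28] = (uint8_t)(x >> 24);
    hdr[29] = (uint8_t)(x >> 16);
    hdr[30] = (uint8_t)(x >> 8);
    hdr[31] = (uint8_t)x;
}

/* Signature and checksum test. */
int pblist_header_ok(const uint8_t *hdr)
{
    return memcmp(hdr, "WMPLESYS", 8) == 0 && pblist_header_xor(hdr) == 0;
}

uint32_t pblist_folder_count(const uint8_t *hdr) { return pb_be32(hdr + 20); }
uint32_t pblist_track_count(const uint8_t *hdr)  { return pb_be32(hdr + 24); }

/* Tracklist starts right after the folder blocks. */
uint32_t pblist_tracklist_offset(uint32_t folders)
{
    return PBLIST_HEADER_SIZE + PBLIST_FOLDER_SIZE * folders;
}

/* One 2-byte word per track, rounded up to a multiple of 8 bytes. */
uint32_t pblist_tracklist_size(uint32_t tracks)
{
    return (tracks * 2u + 7u) & ~7u;
}

/* Assumed start of the per-track metadata blocks. */
uint32_t pblist_metadata_offset(uint32_t folders, uint32_t tracks)
{
    return pblist_tracklist_offset(folders) + pblist_tracklist_size(tracks);
}
```

With 2 folders and 5 tracks, for instance, this puts the tracklist at offset 544, pads its 10 bytes out to 16, and places the metadata at offset 560.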

MPDAT format

This is the fun one.

* The file starts with a 4-byte signature, “WMMP”

* Next is a 4-byte longword giving the total file-size in bytes. This
  includes the file header, i.e. it’s exactly what you’d see displayed
  in a directory listing of the file.

* Next is the duration of the track in milliseconds, again in a 4-byte
  longword.

* The third 4-byte longword gives the number of frames in the file. If
  you’re trying to write a file to the device using your own code, I
  recommend ripping bits out of XMMS or mp3info to get this number, as
  I had difficulty locating a library that would calculate it without
  actually decoding the entire file.

* There are 16 bytes of magic: there’s the 0x08 0x9f 0x9e 0xff
  sequence that occurs in the PBLIST file, followed by 0x01, and
  padded out to 16 bytes with 0x00. I’ve no idea what any of this is
  but it seems unchanging.

* The rest of the file is the obfuscated MP3 data, with no ID3 frames
  - strip those out before you encode or your file will not play in
  the device.
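
A minimal sketch of writing the 32-byte header described in this list (4 + 4 + 4 + 4 + 16 bytes) might look like this. The function name is hypothetical, the magic bytes are simply the ones listed above, and obtaining the duration and frame count is left to the caller.

```c
#include <stdint.h>
#include <string.h>

#define MPDAT_HEADER_SIZE 32u

static void mp_wr_be32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}

/* Hypothetical helper: fill hdr (32 bytes) for an MP3 payload of
 * mp3_bytes bytes (ID3 data already stripped). */
void mpdat_write_header(uint8_t *hdr, uint32_t mp3_bytes,
                        uint32_t duration_ms, uint32_t frames)
{
    /* magic as observed above; the remaining 11 bytes stay zero */
    static const uint8_t magic[5] = {0x08, 0x9F, 0x9E, 0xFF, 0x01};

    memset(hdr, 0, MPDAT_HEADER_SIZE);
    memcpy(hdr, "WMMP", 4);
    mp_wr_be32(hdr + 4, mp3_bytes + MPDAT_HEADER_SIZE); /* total file size */
    mp_wr_be32(hdr + 8, duration_ms);
    mp_wr_be32(hdr + 12, frames);
    memcpy(hdr + 16, magic, sizeof magic);
}
```

The obfuscated MP3 bytes would then be appended directly after these 32 bytes.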

The Obfuscation Mechanism

The obfuscation mechanism is a trivial “substitution cypher” based on
the track number. Start off with a 256-byte array (one for each
possible byte value) and fill it with array[index] = 255 -
index. Then, start working your way through powers of 2 from 1 up to
the biggest power of 2 less than or equal to the track number. For
each power N, if the track number has the bit with value N set, go
through your array in blocks of 2*N bytes, and swap the first N bytes
of the block with the second N bytes. Here’s the C code I’ve written
to do this:

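#include <glib.h>   /* guint8, guint16 */

/* Fill conv[] (256 entries) with the substitution table for the
 * given track number. */
void mple_build_conv_array( guint16 trackno, guint8 *conv ) {
  guint16 bit;
  guint16 i;

  /* start with the byte values reversed: 0 -> 255, 1 -> 254, ... */
  for ( i = 0; i < 256; i++ ) {
    conv[i] = 255 - i;
  }

  /* Only bits 0-7 can select a swap inside a 256-entry table, so
   * stop at 256; this also keeps conv[j + k + bit] inside the array
   * for track numbers of 256 and up. How the device itself treats
   * such track numbers is not something I've established. */
  bit = 1;
  while( bit < 256 && bit <= trackno ) {
    if ( trackno & bit ) {
      guint16 j;
      guint16 k;
      /* swap the two halves of every block of 2 * bit entries */
      for ( j = 0; j < 256; j+= bit * 2 ) {
        for ( k = 0; k < bit; k++ ) {
          guint8 temp;
          temp = conv[j + k];
          conv[j + k] = conv[j + k + bit];
          conv[j + k + bit] = temp;
        }
      }
    }
    bit <<= 1;
  }
}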

Note that this array works for conversion in either direction.
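
One way to see why the array works in both directions: swapping the two halves of every block of 2*N entries moves the entry at index i to index i XOR N, so the finished table appears to be the same as XORing each byte with a single key, 0xFF XOR (low 8 bits of the track number). XORing twice with the same key gives back the original byte. The sketch below uses plain stdint types rather than glib, and the names build_conv, conv_key and mple_convert are hypothetical. Restricting the key to the low 8 bits is an assumption about track numbers of 256 and up.

```c
#include <stdint.h>
#include <stddef.h>

/* Table construction as described above, limited to bits 0-7. */
void build_conv(uint16_t trackno, uint8_t *conv)
{
    for (unsigned i = 0; i < 256; i++)
        conv[i] = (uint8_t)(255 - i);

    for (unsigned bit = 1; bit < 256 && bit <= trackno; bit <<= 1) {
        if (!(trackno & bit))
            continue;
        for (unsigned j = 0; j < 256; j += bit * 2) {
            for (unsigned k = 0; k < bit; k++) {
                uint8_t tmp = conv[j + k];
                conv[j + k] = conv[j + k + bit];
                conv[j + k + bit] = tmp;
            }
        }
    }
}

/* Single-byte key that should be equivalent to the table lookup. */
uint8_t conv_key(uint16_t trackno)
{
    return (uint8_t)(0xFF ^ (trackno & 0xFF));
}

/* Obfuscate or de-obfuscate a buffer in place (same operation). */
void mple_convert(uint8_t *data, size_t len, uint16_t trackno)
{
    uint8_t key = conv_key(trackno);
    for (size_t i = 0; i < len; i++)
        data[i] ^= key;
}
```

Calling mple_convert twice on the same buffer with the same track number returns the original bytes, which is what lets one table serve for both reading and writing.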

===============================================================================
v1.0 / Ronan Waide / April 10, 2005 / Distribute as you see fit

I have no interest whatsoever in reverse-engineering the ATRAC stuff because (a) it’s likely to be far harder and (b) all my music is in MP3 format thanks to several months of ripping my CDs, correcting track information, etc. and I’m really very unlikely to go through it again just to use a different encoding format.

[identity profile] bitpuddle.livejournal.com 2005-04-10 03:41 pm (UTC)(link)
Zoinks. How long did that take to do?

[identity profile] waider.livejournal.com 2005-04-10 03:48 pm (UTC)(link)
The hard part - the obfuscation algorithm - fell out pretty much by accident, as related in the previous entry, along with a bit of guesswork as to what was going on with the powers-of-two nonsense; I didn't work that out by doing any math, I just dumped out conversion arrays for a bunch of files obtained from comparing the encoded file with the unencoded file and then stared at them for a bit before trying a few things until I stumbled on the algorithm. The rest of it was pretty trivial. All in all I've been dinking with it most evenings for the past fortnight.

(Anonymous) 2005-04-10 06:43 pm (UTC)(link)
That's dumb and Sony is dumb.

[identity profile] ac-slater.livejournal.com 2005-04-10 06:54 pm (UTC)(link)
I bet if you send this to 2600 they'd publish it.

(Anonymous) 2005-04-11 07:25 am (UTC)(link)
That's not saying much.

(Anonymous) 2005-04-11 08:09 am (UTC)(link)
for sure. now its just a bunch of lame articles about setting up a dynamic dns service and running apache on windows.

DMCA violation?

[identity profile] kineticfactory.livejournal.com 2005-04-11 09:28 am (UTC)(link)
The encryption doesn't have to be good enough to be hard to break. It just has to be good enough for Sony to be able to get any attempts to break it taken down, and/or threaten anyone attempting to do so with ruinous lawsuits.

[identity profile] dex.livejournal.com 2005-04-12 12:33 am (UTC)(link)
Any chance anyone could do this with Hi-MD?

(Anonymous) 2005-04-12 09:33 pm (UTC)(link)
(The_Stamp) from minidisc.org
I'll be doing some tests once my rh910 comes in :D

im also thinking on making a shopping list program for the hi-md 2nd gen. wouldnt that be useful?

congratulations and thanks Waider

(Anonymous) 2005-04-13 08:55 am (UTC)(link)
Thank you for detail information. Have a drink on me.

Cheers

Grey Mouse

[identity profile] zqfmbg.livejournal.com 2005-04-13 07:37 pm (UTC)(link)
Nice.
I'll try it with my RH10 when I get home.

(Anonymous) 2005-04-13 07:48 pm (UTC)(link)
heh... better in a more trendy, less technical mag such as CPU. might as well add a bit of editorial about how Sony is evil for obfuscating our files for us and make it a soapbox piece ;)

(Anonymous) 2005-04-13 07:59 pm (UTC)(link)
Is there any compiled program or script that will do the ofuscation for you? I'd like to be able to use my NW in linux but don't know any C.

Re: DMCA violation?

(Anonymous) 2005-04-13 08:46 pm (UTC)(link)
Last time I checked, the DMCA was an american law, I doubt it is available in Europe!

[identity profile] guitarromantic.livejournal.com 2005-04-13 09:31 pm (UTC)(link)
Slashdot called you a brave hacker! Good work soldier.

(Anonymous) 2005-04-13 09:45 pm (UTC)(link)
How do you mount the nw on linux (I tried sudo mount -t /dev/sda1 /media/sony, but this does not work. I have a NW-E75 and I don't think this is recognized as a mass storage system on my system.

[identity profile] widowwolf77.livejournal.com 2005-04-13 10:48 pm (UTC)(link)
THERE IS SOMETHING TO SAY FOR PEOPLE WHO HAVE TOO MUCH TIME ON THIER HANDS, BUT USE IT TO DO SOMETHING CONSTRUCTIVE(WELL AT LEAST IN OUR EYES) CHEERS MATE!

Initialize With Linux? Just bought one, don't have Windows.

[identity profile] shae.livejournal.com 2005-04-13 10:50 pm (UTC)(link)
I guess the title says it all. I bought one of these today, I can mount it just fine, but it has no files on it and I don't have access to a copy of windows. Any idea if I can get it working without Windows? Any help would be appreciated.

PS. I played with your code, looks awesome! Hope I can use it :-)

Re: Initialize With Linux? Just bought one, don't have Windows.

(Anonymous) 2005-04-13 11:17 pm (UTC)(link)
How do you mount it ? What modules (output of lsmod) are loaded when you mount it ?

Re: Initialize With Linux? Just bought one, don't have Windows.

[identity profile] waider.livejournal.com 2005-04-13 11:24 pm (UTC)(link)
Try using mple-load. If it doesn't work, let me know and I'll see what other files might be necessary to get the thing working.

[identity profile] mskala.livejournal.com 2005-04-14 01:06 am (UTC)(link)
The epoch sounds about right to be the birth date/time of some engineer. I'd cast his or her horoscope if I knew the time zone.

Re: DMCA violation?

[identity profile] kineticfactory.livejournal.com 2005-04-14 01:17 pm (UTC)(link)
The EU has the European Copyright Directive, which is just as bad if not worse. It certainly has the draconian anti-circumvention provisions of the DMCA.

NW-E99

(Anonymous) 2005-07-04 06:04 pm (UTC)(link)
Almost same scheme for my NW-E99, but mp????.dat files is cyphered using simple XOR on all bytes by some value depending on tracknum. (i've got 2 drives of my e99 and its 0x4F-tracknum for drive1 and 0x50+tracknum for drive2)

BTW, that unchanging number in pblist at 0x0C and in mp????.dat at 0x10 is Media Serial Number, got by GetVolumeInfo at Windows

Re: NW-E99

(Anonymous) 2005-07-04 06:12 pm (UTC)(link)
A little investigation shows that 0x4F is a last digit in Media Serial Number at drive 1 and 0x50 in drive 2.

HD Player?

(Anonymous) 2005-07-09 06:52 am (UTC)(link)
Has anyone tried the nw-hd1-5 players on linux using this method?

Re: HD Player?

(Anonymous) 2005-08-31 11:27 pm (UTC)(link)
NW-HD5 (bought yesterday in EU) has totally different directory tree and control
file contents..

Re: HD Player?

(Anonymous) 2005-10-23 07:32 am (UTC)(link)
Just bought one of these too...

...anyone out there brave enough to use this information to create a decent windows app for transferring stuff?

Even...gulp...an extension to Windows Explorer?!!!

Re: HD Player?

(Anonymous) 2005-11-13 07:33 pm (UTC)(link)
The right approach is to write it in Java. That way it will run on just about any machine/OS out there. Instant support for Mac, Linux, and PC.

Re: HD Player?

(Anonymous) 2005-11-13 10:53 pm (UTC)(link)


Java being cross platform? You're taking the piss! I've seen apps written in Java that will only run on one version of one vendor's VM on a single platform. I've got one Java app in use right now that will run on Red Hat Linux, but won't work on Debian!

Re: HD Player?

(Anonymous) 2005-11-14 11:29 pm (UTC)(link)
Check this:
http://www.atraclife.com/index.php?showtopic=440