waider | public service announcement

Current Mood: apathetic
Current Music: Something Happens!: Behind Your Teeth

Entry tags: geek, html

public service announcement

You might want to check that your client for this and other "weblogs" converts accented characters (such as é) to HTML entities. Because if it doesn't, then (a) you're violating spec and (b) you're counting on everyone reading your page in, one presumes, iso-latin-1 AKA iso-8859-1. Me, I browse in UTF-8, so I see question marks everywhere you've got an unconverted character.
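
For the curious, here is a rough sketch in Python of the sort of conversion meant here. It assumes the client already holds the entry text as a Unicode string; the function name is invented for illustration and is not taken from any actual client. It uses a named entity where HTML defines one and falls back to a numeric character reference otherwise, so characters outside Latin-1 are covered too.

```python
from html.entities import codepoint2name


def to_html_entities(text: str) -> str:
    """Replace every non-ASCII character with an HTML entity.

    Hypothetical helper: named entities where HTML defines one,
    numeric character references (&#NNN;) for everything else.
    """
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 128:
            out.append(ch)  # plain ASCII passes through untouched
        elif cp in codepoint2name:
            out.append(f"&{codepoint2name[cp]};")
        else:
            out.append(f"&#{cp};")
    return "".join(out)


print(to_html_entities("café"))  # caf&eacute;
```

The output is pure ASCII, so it reads the same whatever encoding the browser happens to guess.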

This makes sense, but there isn't a hope in hades that it will ever be adopted in Japanese web pages, where I have the most such headaches. As far as I can tell they assume that if you know the encoding for one page, you'll assume that encoding for any other page in that hierarchy unless otherwise indicated.

Actually, I don't really know what the right thing to do for Japanese and other such non-ISO pages is, since the HTML entities list seems only to cover the various ISO widgets.

Blithely misusing the term/word/thing "ISO", I know. Japanese, after all, is iso-2022-jp.

My initial thought would be to use a charset header to tell the browser/user what charset is in use, but they're frowned upon by the W3C: Character Set considered harmful.

i emailed the phoenix (mac os 9 client) developer about it. he replied:

Phoenix does automatically transmute an é to "%E9" which is the Unix Ascii
code for an Acute E. Thats what the livejournal documentation suggests I do.

I have no clue if that helps at all.
~Chris

So sounds like your quarrel is with the livejournal dev team for recommending Unicode over entities, rather than with the individual client developers.

This would be the part in the protocol spec which says:

Convert everything else to %hh where hh is the hex representation of the character's ASCII value.

Aside from noting that the above snippet in the docs doesn't close the <LI> tag, and is thus actually in violation of spec itself, the above is the encoding convention used for posting form data in such a way that it doesn't get mangled in the transfer. Which is fine; it means your non-standard character has the same numerical value on the server as it does on the client. It doesn't, however, alter the fact that it is wrong to put such characters in an HTML page. The documentation could perhaps be clearer on this point, I guess.
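
To make the distinction concrete, here is a small Python sketch of the two separate steps: undoing the %hh transport encoding, then converting what comes out into something safe to put in a page. The function name and the assumption that the client sent iso-8859-1 bytes are mine, not anything the protocol documentation states.

```python
from urllib.parse import unquote


def form_value_to_html(raw: str, charset: str = "iso-8859-1") -> str:
    """Decode a %hh-encoded form value, then make it ASCII-safe for HTML.

    Hypothetical helper; `charset` is an assumption about what the
    client used when it produced the %hh escapes.
    """
    # Step 1: transport decoding. "%E9" becomes the byte 0xE9, which is
    # then interpreted in the assumed charset.
    text = unquote(raw, encoding=charset)
    # Step 2: page encoding. Every non-ASCII character becomes a numeric
    # character reference. Markup characters are left alone on purpose,
    # since an entry body may legitimately contain HTML.
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


print(form_value_to_html("caf%E9"))  # caf&#233;
```

The first step only gets the character across the wire intact; the second is the one that determines what ends up in the page.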
