waider wrote 2002-04-15 09:50 pm

public service announcement

You might want to check that your client for this and other "weblogs" converts accented characters (such as é) to HTML entities. Because if it doesn't, then (a) you're violating the spec and (b) you're counting on everyone reading your page in, one presumes, iso-latin-1 AKA iso-8859-1. Me, I browse in UTF-8, so I see question marks everywhere you've got an unconverted character.
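
A minimal sketch of the conversion described above, in Python. The function name `to_entities` is hypothetical and is not taken from any LiveJournal client; it uses the HTML 4 named entity where the standard library knows one and falls back to a numeric character reference otherwise. It deliberately leaves `&`, `<` and `>` alone, on the assumption that the entry already contains markup.

```python
import html.entities


def to_entities(text: str) -> str:
    """Replace every non-ASCII character with an HTML entity.

    Hypothetical helper for illustration only.
    """
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 128:
            # Plain ASCII passes through untouched.
            out.append(ch)
        elif cp in html.entities.codepoint2name:
            # Named entity from the HTML 4 list, e.g. é -> &eacute;
            out.append("&" + html.entities.codepoint2name[cp] + ";")
        else:
            # No named entity: use a decimal character reference.
            out.append("&#" + str(cp) + ";")
    return "".join(out)


print(to_entities("café"))
```

The result is pure ASCII, so it should display the same way whichever encoding the reader's browser assumes.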

pobig.livejournal.com 2002-04-16 06:46 am (UTC)
This makes sense, but there isn't a hope in hades that it will ever be adopted in Japanese web pages, where I have the most such headaches. As far as I can tell they assume that if you know the encoding for one page, you'll assume that encoding for any other page in that hierarchy otherwise not indicated.

waider.livejournal.com 2002-04-16 01:14 pm (UTC)
Actually, I don't really know what the right thing to do for Japanese and other such non-ISO pages is, since the HTML entities list seems only to cover the various ISO widgets.

Blithely misusing the term/word/thing "ISO", I know. Japanese, after all, is iso-2022-jp.

My initial thought would be to use a charset header to tell the browser/user what charset is in use, but those are frowned upon by the W3C: "Character Set considered harmful".

ikkyu2.livejournal.com 2002-04-16 09:22 am (UTC)
i emailed the phoenix (mac os 9 client) developer about it. he replied:
Phoenix does automatically transmute an é to "%E9" which is the Unix Ascii
code for an Acute E. Thats what the livejournal documentation suggests I do.

I have no clue if that helps at all.
~Chris


So sounds like your quarrel is with the livejournal dev team for recommending Unicode over entities, rather than with the individual client developers.

waider.livejournal.com 2002-04-16 01:21 pm (UTC)
This would be the part in the protocol spec which says:

  • Convert everything else to %hh where hh is the hex representation of the character's ASCII value.

Aside from noting that the above snippet in the docs doesn't close the <LI> tag, and is thus actually in violation of spec itself, the above is the encoding convention used for posting form data in such a way that it doesn't get mangled in the transfer. Which is fine; it means your non-ASCII character has the same numerical value on the server as it does on the client. It doesn't, however, alter the fact that it is wrong to put such characters in an HTML page. The documentation could perhaps be clearer on this point, I guess.
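
To make the distinction concrete, here is a rough Python sketch that keeps the two steps apart. The helper names `post_encode`, `post_decode` and `html_safe` are made up for illustration, and the choice of iso-8859-1 as the client's charset is an assumption; neither comes from the LiveJournal protocol docs.

```python
from urllib.parse import quote, unquote


def post_encode(text: str, charset: str = "iso-8859-1") -> str:
    """%hh-encode text for transport as form data (assumed charset)."""
    return quote(text, safe="", encoding=charset)


def post_decode(wire: str, charset: str = "iso-8859-1") -> str:
    """Undo post_encode on the receiving side."""
    return unquote(wire, encoding=charset)


def html_safe(text: str) -> str:
    """Replace non-ASCII characters with numeric character references."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


wire = post_encode("é")       # what travels in the POST body
received = post_decode(wire)  # the same character arrives on the server
page = html_safe(received)    # what is safe to embed in the HTML page
print(wire, received, page)
```

The %hh step only protects the character in transit. The entity conversion is a separate step, and it is the one that decides what readers in a different encoding end up seeing.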