waider ([personal profile] waider) wrote 2002-04-15 09:50 pm

public service announcement

You might want to check that your client for this and other "weblogs" converts accented characters (such as é) to HTML entities. Because if it doesn't, then (a) you're violating spec and (b) you're counting on everyone reading your page in, one presumes, iso-latin-1, AKA iso-8859-1. Me, I browse in UTF-8, so I see question marks everywhere you've got an unconverted character.
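For anyone wondering what that conversion looks like, here's a rough sketch in Python. The function names `to_entities` and `as_seen_in_utf8` are made up for illustration and aren't taken from any actual weblog client; the second one just assumes the page was saved as iso-8859-1 and read as UTF-8, which is the situation described above.

```python
from html.entities import codepoint2name


def to_entities(text: str) -> str:
    """Replace every non-ASCII character with an HTML character reference.

    Uses a named entity (e.g. &eacute;) where the standard table has one,
    and falls back to a numeric reference (e.g. &#233;) otherwise.
    ASCII characters, including & and <, are left untouched.
    """
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 128:
            out.append(ch)
        elif cp in codepoint2name:
            out.append(f"&{codepoint2name[cp]};")
        else:
            out.append(f"&#{cp};")
    return "".join(out)


def as_seen_in_utf8(text: str) -> str:
    """Simulate raw iso-8859-1 bytes being decoded by a UTF-8 reader.

    Bytes that are not valid UTF-8 come out as U+FFFD, the replacement
    character that browsers tend to draw as a question mark.
    """
    return text.encode("iso-8859-1").decode("utf-8", errors="replace")


print(to_entities("café"))      # caf&eacute;
print(as_seen_in_utf8("café"))  # "caf" followed by a replacement character
```

The entity version is plain ASCII, so it survives whatever encoding the reader's browser happens to guess.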

[identity profile] pobig.livejournal.com 2002-04-16 06:46 am (UTC)
This makes sense, but there isn't a hope in Hades that it will ever be adopted in Japanese web pages, where I have the most such headaches. As far as I can tell they assume that if you know the encoding for one page, you'll assume that encoding for any other page in that hierarchy unless otherwise indicated.

[identity profile] waider.livejournal.com 2002-04-16 01:14 pm (UTC)
Actually, I don't really know what the right thing to do for Japanese and other such non-ISO pages is, since the HTML entities list seems only to cover the various ISO widgets.

Blithely misusing the term/word/thing "ISO", I know. Japanese, after all, is iso-2022-jp.

My initial thought would be to use a charset header to tell the browser/user what charset is in use, but those are frowned upon by the W3C: "Character Set considered harmful".