waider ([personal profile] waider) wrote 2004-04-30 12:22 pm

RSS WILL DOOM US ALL

Death of Internet Predicted, Film at 11. Couple of hints: 1. Don't provide an RSS feed. 2. Don't advertise your RSS feed beyond the people you actually want reading it. 3. Provide an RSS feed with correct syndication headers, so that RSS readers know to hit your feed only once a week or whatever - sure, not everyone respects this, but it's at least worth doing.
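
For what it's worth, hint 3 boils down to a handful of channel-level elements in the RSS spec - <ttl>, <skipHours> and <skipDays> - that a polite aggregator can use to decide how often to poll. A rough Python sketch of a feed carrying a couple of them (the title, URL and numbers are invented for illustration):

    import xml.etree.ElementTree as ET

    # Rough sketch only -- the feed details below are made up for illustration.
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = "Example Weblog"
    ET.SubElement(channel, "link").text = "http://example.com/"
    ET.SubElement(channel, "description").text = "Updates about once a week."

    # <ttl> says the feed may be cached for this many minutes before re-fetching.
    ET.SubElement(channel, "ttl").text = "1440"  # i.e. poll at most once a day

    # <skipDays>/<skipHours> list times when polling is pointless.
    skip_days = ET.SubElement(channel, "skipDays")
    for day in ("Saturday", "Sunday"):
        ET.SubElement(skip_days, "day").text = day

    print(ET.tostring(rss, encoding="unicode"))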

[identity profile] wisn.livejournal.com 2004-04-30 06:19 am (UTC)(link)
Well, sure. And to elaborate on point one, save on hosting fees by taking your website offline.

The proposed solutions to the RSS volume problem in the article sound more like a throwback to Usenet than those involved would have the professional pride to admit.

It reads to me like the healthiest approach to the problems in RSS is not to offload the burden to centralized services - this is the internet, I think - but to stop treating RSS as a sideline/ideology (cf. Dave Winer) and begin treating it as a platform or protocol with a standards committee, and formalize how to deal with clients that misbehave. Or write better servers.

I thought one of the goals of RSS was to reduce the transmission volume of a blog by piping out only relevant changes when appropriate, and reduce the bulk data transfers that occur every time somebody loads the site again on the off chance that a couple kb of text has been added.

[identity profile] denshi.livejournal.com 2004-04-30 08:17 am (UTC)(link)
...every time somebody loads the site again on the off chance that a couple kb of text has been added.

HTTP already has a Last-Modified field. Well-behaved clients should send an If-Modified-Since header with their time of last access, and well-behaved servers should respond with HTTP 304 when the page is unchanged since that time.

No bulk data transfers. 7 year old spec. Solved problem.
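
A minimal sketch of what that looks like from the client end, assuming an invented feed URL and a timestamp saved from the previous fetch:

    import urllib.error
    import urllib.request
    from email.utils import formatdate

    # Sketch only; the feed URL and the stored timestamp are invented.
    FEED_URL = "http://example.com/index.rss"
    last_fetch = 1083326400  # seconds since the epoch, saved from last time

    req = urllib.request.Request(FEED_URL)
    # Tell the server when we last fetched. A well-behaved server answers
    # "304 Not Modified" with no body if nothing has changed since then.
    req.add_header("If-Modified-Since", formatdate(last_fetch, usegmt=True))

    try:
        with urllib.request.urlopen(req) as resp:
            body = resp.read()  # 200: the feed really did change
            print("fetched", len(body), "bytes")
    except urllib.error.HTTPError as err:
        if err.code == 304:
            print("not modified; nothing to download")
        else:
            raise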

[identity profile] waider.livejournal.com 2004-04-30 08:22 am (UTC)(link)
As I was discussing with [livejournal.com profile] wisn elsewhere, the problem is that it's a non-enforceable protocol, in much the same way as robots.txt is a non-enforceable way of telling spiders how to treat your site. Sure, the bulk of the spidering code might do it right, but that doesn't mean that everyone will. And it only takes one badly-written but widespread client to cause the sort of havoc that the guy in the original article is talking about - i.e. what if <conspiracy>Microsoft accidentally deployed a crap version of an RSS client in Outlook which coincidentally respected some proprietary header handed out by IIS?</conspiracy>

That aside, the syndication tags are a step up from Last-Modified, although they kinda replicate the functionality of Expires.

[identity profile] denshi.livejournal.com 2004-04-30 08:34 am (UTC)(link)
Quite right. OTOH, servers could quite easily track which clients are broken and only serve them one page a day, or whatever, depending on how frequently the site is updated.
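
Something like this, say - a toy sketch of the idea, with the interval and all the names invented for illustration:

    import time

    # Toy sketch: remember when each client last got a freshly generated
    # feed, and give anyone asking more often than that the cached copy
    # instead of rebuilding it. (Deciding which clients count as "broken"
    # is left out; this just caps everyone.)
    MIN_INTERVAL = 24 * 60 * 60   # at most one fresh page per client per day
    last_fresh = {}               # client IP -> time of last fresh response
    cached_feed = b"<rss>...</rss>"

    def serve_feed(client_ip, generate):
        global cached_feed
        now = time.time()
        if now - last_fresh.get(client_ip, 0) < MIN_INTERVAL:
            return cached_feed    # impatient client: cache only, no rebuild
        last_fresh[client_ip] = now
        cached_feed = generate()  # patient client: regenerate the feed
        return cached_feed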

[identity profile] ronebofh.livejournal.com 2004-04-30 03:05 pm (UTC)(link)
This has already happened with sites that suck a private hierarchy, then repropagate it to Usenet-at-large. Or misconfigured suck sites that end up reposting their spool with new message IDs.

Solution: guns. Lots of guns.

[identity profile] waider.livejournal.com 2004-04-30 03:15 pm (UTC)(link)
Tell you what, I'll go have a beer and see if it helps. I'll have one for you, too, just in case. Guinness okay for ya?

[identity profile] mskala.livejournal.com 2004-04-30 09:38 am (UTC)(link)
There was an IP that was hitting my Supreme Court RSS feed (an especially fat one, because it contains a paragraph of legalese for every entry) twice a minute, 24/7. That feed updates at most once a day, because that's when the script that generates it runs, and in practice it's usually more like once a week, because that's when the Web site it's watching actually updates. So I stuck the offending IP in my .htaccess with instructions to return 403 Forbidden.

Several months later I got an email from the admin of that IP asking about it; I explained the reason for the block, he said he'd make his script behave, and so I removed the block.

IMHO, this is the way such things ought to work. I'm hesitant to try to make official Rules for dealing with such situations, and since the point of providing RSS feeds is to increase readership of the stuff I'm publishing, it would seem foreign to me to cry too many tears if in fact the readership does increase. If the traffic got to be more than I wanted to pay for, I'd start selling ad space.