Recently in Syndication Category

In Atom, doing so...

Commenting on the application of "desperate heuristics" to make sense of feeds like RSS, Phil Ringnalda adds:

In Atom, doing so is equivalent to gathering up everyone who spent years enduring the endless arguments on the mailing list, and urinating on them.

Brilliant! Well put! The original post about some Atom tests he ran and additional comments are here.

Nick Bradbury notes that the "full vs. partial feed content debate has risen from the ashes again" in this post. My comment turned into a post of sorts, so I thought I'd repost a lightly edited version here.

I'm with Nick. I'm rather tired of this whole argument really. Some of the full feed arguments I've read seem to want to replace going to web pages and viewing the HTML entirely. That seems a bit off and extreme to me. I don't think this is a black and white issue anyways. For some sites it makes sense (read: has value) to have full feeds and some it does not.

Still, I can see the need for both and agree it is a user interface issue more than anything.

I'm a proponent of Atom because you can include both in one feed in a clear and concise manner. Atom also provides the aggregator with better metadata for interfaces that work consistently. My expectation in subscribing to Atom feeds is that the aggregator won't have to make guesses and make me suffer from issues like those highlighted in this post over on Signal vs. Noise. That entry illustrates an interface issue that is created by the ambiguities found in RSS feeds.

Just my view as a user and a developer.

During the syndication wars that eventually led to the Atom format's emergence, many argued that users should not care about formats. That is true, but formats and user experience are not entirely divorced from each other. If you use a crappy format, making a good user interface becomes difficult if not impossible to achieve. This is why the suggestion of using OPML for attention data is concerning to me. I don't need to pound any more nails into the floor with my forehead.

Syndicated Book Reading.

Tim O'Reilly notes a remix of Cory Doctorow's Someone Comes to Town, Someone Leaves Town using a syndication feed where, once subscribed, you get a couple pages every day. No matter when you subscribe to it, it sends you the book starting from the beginning.

This is a very clever example of syndication's potential to do more than broadcast news headlines. Bravo!
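How such a feed decides what to send is not described in the post, so the following is only a guess at the idea: serve each subscriber the pages that correspond to the number of days since they subscribed. The function name, the two-pages-per-day rate and the page count are all assumptions made for illustration.

```python
from datetime import date

def pages_for(subscribed_on, today, pages_per_day=2, total_pages=300):
    """Return the page numbers a subscriber should receive today.

    Hypothetical sketch: every subscriber starts at page 1 on the day
    they subscribe, regardless of the calendar date.
    """
    day_index = (today - subscribed_on).days
    first = day_index * pages_per_day + 1
    if day_index < 0 or first > total_pages:
        return range(0)  # not subscribed yet, or the book is finished
    last = min(first + pages_per_day - 1, total_pages)
    return range(first, last + 1)

first_day = list(pages_for(date(2005, 7, 1), date(2005, 7, 1)))
third_day = list(pages_for(date(2005, 7, 1), date(2005, 7, 3)))
```

Two readers who subscribe a month apart would each get pages 1 and 2 on their own first day, which is the behavior Tim O'Reilly describes.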

Is a Crappy Format Worth Saving?

Nick Bradbury writes:

Bottom line: the imprecise RSS specification resulted in a lot of guess work, which complicated things for developers, end users and feed producers.

The solution? We clarified the RSS spec. While problems with entity-encoded HTML haven't disappeared completely, in my experience they're far less common than they used to be (and when they do occur, we now have examples to point to).

And that's all that's needed here, too. Clarify the OPML spec, and we can skip another prolonged format battle.

Clearly RSS feeds have gotten much better, but I think the Feed Validator had a much greater impact on that clean-up than the clarification Nick cites. Don't forget RSS 1.0 and 0.9x feeds were just as busted as the 2.0 version he points to.

I've had my fair share of frustration with both RSS and OPML, and OPML's "spec" is far more ridiculous than RSS ever was, and that is saying something. I have to wonder -- is it worth saving OPML? I'm not so sure. By default OPML is used as an import/export format by aggregators -- there was nothing else proposed and it just spread as the market for tools exploded. This isn't too dissimilar to how RSS grew. The difference here, though, is that OPML's use is quite limited in comparison to its intended scope. OPML was specified to be a general purpose outline format; however, it is only really used for representing blog rolls, and it does that poorly. Without a real specification of the values, the attributes or even the attributes' case, there are many variants of the OPML blogroll format in the wild. So is it "really" fixable if it were specified? Not without a lot of breakage, and you'd still need to be ready for all the crazy variants that grew out of the void left by the lack of a good specification.

I guess what I'm saying is it doesn't really matter whether it gets better specified or not. Best of luck to those who are taking it on. My view is that the toothpaste is out of the tube and OPML as a blog roll format will only be a bit player at best.

I'm much more interested in and intrigued by XHTML outlines -- the microformat better known as XOXO. I've used it before in a few instances and it's worked very well. (X)HTML's system of providing outlines has been around longer, is better specified and is more widely supported in the grand scheme of Internet software, so I have to wonder: what does OPML provide that makes it better? What does specifying and developing another format buy us? I can't think of anything. The current universal support of import/export by aggregators is nothing to sneer at, but how hard would it be to convert current blog rolls to XOXO? Trivial. OPML blog rolls don't contain any more information per entry than the XHTML link tag. Which requires more effort: fixing the spec and then all the OPML blog rolls, or just converting to XOXO? I think it's a tie. Which would provide the better footing going forward? I think clearly XHTML, because you can do more now and it's already here, well specified and well supported.
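To give a feel for how small that conversion is, here is a rough sketch in Python. It assumes a flat blogroll whose outline elements carry text and htmlUrl attributes (only one of the variants found in the wild), and the sample data and the function name opml_to_xoxo are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical OPML blogroll; real files use varying attribute names.
OPML = """<opml version="1.0">
  <head><title>Blogroll</title></head>
  <body>
    <outline text="Example Blog A" htmlUrl="http://a.example/"/>
    <outline text="Example Blog B" htmlUrl="http://b.example/"/>
  </body>
</opml>"""

def opml_to_xoxo(opml_text):
    """Turn each OPML outline entry into a list item in an XOXO list."""
    root = ET.fromstring(opml_text)
    ol = ET.Element("ol", {"class": "xoxo"})
    for outline in root.iter("outline"):
        li = ET.SubElement(ol, "li")
        a = ET.SubElement(li, "a", {"href": outline.get("htmlUrl", "")})
        a.text = outline.get("text", "")
    return ET.tostring(ol, encoding="unicode")

xoxo = opml_to_xoxo(OPML)
```

The result is an ordinary XHTML ordered list of links, which any browser can already render. Nested outlines and the other attribute spellings would need extra handling that this sketch leaves out.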

It's understandable why OPML is supported in aggregators, and that should continue to be exploited; however, I just don't see furthering OPML for blog rolls. Let sleeping dogs lie. It's served its purpose. Let's not get trapped in past foibles; let's move on to new and better things.

UPDATE: Sam Ruby, who I believe is the most patient and persistent person you'll ever find in technology, has entered into the OPML conversation. The Feed Validator that he was instrumental in driving has OPML validation with a call for more tests. This, I believe, supports my assertion that his validator was more instrumental in cleaning up bad RSS than clarifying the specification, as Nick Bradbury wrote. So perhaps there is a bit more hope of a clean-up than I had before.

tima@OSCON

I'm on my way out to the O'Reilly Open Source Convention. Ben Hammersley and I will be presenting 45 syndication hacks in 45 minutes. I have the utmost faith in Ben to pull off something so mad and only hope I can keep up. We got the time slot I always get – last day, last session. Oh, well. It still should be fun.

The Atom End-Game.

Last week Tim Bray wrote on his weblog: "I recently proposed to the IETF Atom Working Group that we might be nearly finished. Some people think that’s a mistake because, as they point out, Atom doesn’t have much more in the way of features than RSS." He goes on to explain why he disagrees and relates the Atom work to his work on XML itself.

He concludes with an excellent observation that all standards groups should heed.

The worst thing the Atom WG could possibly do would be to spend another year or two trying to invent wonderful new syndication goodies. What on earth would give us the idea that we’re smart enough to predict what features the world is going to want? Our job is to write down what we already know works, to do it as cleanly and clearly as possible in as few pages as possible, then get out of the way.

You don’t think this can change the world? Just watch.

Atom is not RSS.

I've made it out to the west coast to attend the O'Reilly Mac OS X DevCon (quite happy to be here and to present) and am listening to the first keynote by Chris Bourdon on the new features of Tiger, Apple's next version of Mac OS X. I like what I'm seeing a lot, but one slide bothered me as being a bit uncharacteristically off. It said something like RSS support – RSS 0.9x, RSS1, RSS2 and Atom. I'm glad to see this support of course, but Atom is not RSS. Rather than pick one over the other, it seems to me the term syndication would be more accurate and encompassing while being a bit less geeky for normal folk.

It's the API silly!

Phil Windley posted that there have been some interesting questions and discussions on his forum, highlighting a reader's post about RSS vs. Atom.

I thought it was worth reposting the reply I made on the forum here since I've been too occupied to post much else:

One thing that is always missing or overlooked in the discussion/debate/furor of RSS vs. Atom is that Atom provides a unified feed format and API – RSS, in any form, does not. The real value is when you need a publishing API and feed in your system. (The more powerful case for the Atom effort is the Blogger API vs. MetaWeblog API vs. Atom.) If all you need is a one-way syndication format then one of the RSS formats will suffice. That is why Google, SixApart and more recently Nokia are signing on to Atom.

As someone developing software for these syndication formats, I believe the eventual benefit of the Atom feed to the average Joe/Jane user will be that their aggregator software will provide a more reliable and consistent experience. Because of the extremely loose specs, (many) multiple versions, and the large number of optional and overlapping tags with similar meanings, it requires a lot of work, independent research and trial-and-error to reliably present any and all feeds to the average user. I've also found it requires on-going tweaking as new patterns emerge. (I wince when someone refers to RSS as simple. I realize that it's the part of me that is a developer doing the wincing though.) Given the effort and care going into the Atom feed format and its pending submission to the IETF as a formal standard, I'm fairly optimistic that Atom will be an improvement in this regard.

I too will use both and let users decide.

The Automatic Discovery indeX (ADX).

While I was out on my latest blogging hiatus, James Snell picked up on my response to an earlier discussion by Jeremy Zawodny and Diego Doval about developing a means of more robust RSS auto-discovery. He writes:

As much as I like WSIL, it's pretty much officially dead. There is no further work going into it at all. So while I like where Tim is going with this, I think an alternative approach needs to be developed.

He goes on to lay out an example of a new alternate WSIL-like format he calls The Automatic Discovery indeX (ADX).

I agree with James and had suspected as much about WSIL. Given that James' employer is one of the authors of that spec, I'll take it as fact.

I also like where he's gone with ADX. Like WSIL, it's more RESTful than UDDI (which just needs to die) and relatively simple and versatile enough to integrate numerous format pointers into one mechanism – SOAP, RSS, OPML, Atom and so on. This could also be used as a more robust and eventually a better formed and documented blogroll format.

While it's a good start, I think ADX as James has detailed it could use some refinement – mostly what I think are nits.

  • Keep the tags all lowercase. Most formats do it that way, so I see nothing gained by switching to proper-case tag names.
  • I like the reuse of existing RSS modules such as Dublin Core; however, their use seems inconsistent. For instance, James uses dc:title, but does not namespace Description. Name is also not namespaced and is about the equivalent of dc:title. I realize that James was transcribing my WSIL examples, but since this is a new format we might as well clean that up. Perhaps if Dublin Core is going to (and should) play such an important role, those elements should just be folded into the syntax of this format?
  • In the spirit of Dublin Core, I'd reuse this element set's naming conventions as much as possible. Service.EndPoint becomes source or perhaps the RSS standard link and so on.
  • IndexRef should be expanded to allow for additional metadata to be associated with a reference to another index. For instance, what type of index is on the other side of this link? Another ADX? Or perhaps a UDDI directory? Or perhaps an OPML file? This is also an important allowance for blogroll use.
  • For argument's sake, I'd like to see an example of a WSDL file and a UDDI pointer.
  • Having a schema is good, but I think it should be optional in an ADX document.
  • I'm really hesitant about the DNS Service Discovery method because most users do not have the knowledge or access to implement such a thing.

Mostly nits. So here is my riff on James' original ADX proposal where I incorporate my feedback into an example:

 <?xml version="1.0" encoding="UTF-8"?>
 <index 
   xmlns:dcterms="http://purl.org/dc/terms/" 
   xmlns="urn:temporary:uri">

  <title>News4Humans feedOnFeeds</title>
  <dcterms:modified>2003-09-12T23:45:37-00:00</dcterms:modified> 
  <source>http://news4humans.com/index.adx</source>
  <description>All the news preferred by highly evolved primates.</description>
  <language>en-us</language>
  <creator>newsfor@news4humans.com</creator>

  <service>
  <name>Latest News</name>
  <description>A syndication feed of the 15 most recent news posts.</description>
  <source>http://news4humans.com/feeds/index.xml</source>
  <format>http://purl.org/rss/2.0/</format>
  <dcterms:modified>2003-09-12T23:35:52-00:00</dcterms:modified>
  </service>

  <service>
  <name>Google Search</name>
  <description>A SOAP interface to the Google search engine.</description>
  <format>http://schemas.xmlsoap.org/wsdl/</format> 
  <source>http://api.google.com/GoogleSearch.wsdl</source> 
  </service>

  <!-- This was IndexRef -->
  <link>
  <title>News4Humans Technology News Feeds</title>
  <description>All the news preferred by highly evolved primates.</description>
  <source>http://news4humans.com/tech.adx</source>
  <format>urn:temporary:uri</format>
  <creator>News4Humans</creator>
  </link>

 </index>
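
To show how little a client would need to do with a document like this, here is a rough Python sketch that reads a trimmed copy of the example. ADX is only a proposal, so the element names and the urn:temporary:uri placeholder namespace are taken from the example above rather than from any published schema, and the function name list_entries is made up.

```python
import xml.etree.ElementTree as ET

NS = {"adx": "urn:temporary:uri"}  # placeholder namespace from the example

# Trimmed copy of the example index (assumed shape, not a real schema).
ADX = """<index xmlns:dcterms="http://purl.org/dc/terms/" xmlns="urn:temporary:uri">
  <title>News4Humans feedOnFeeds</title>
  <service>
    <name>Latest News</name>
    <source>http://news4humans.com/feeds/index.xml</source>
    <format>http://purl.org/rss/2.0/</format>
  </service>
  <link>
    <title>News4Humans Technology News Feeds</title>
    <source>http://news4humans.com/tech.adx</source>
    <format>urn:temporary:uri</format>
  </link>
</index>"""

def list_entries(adx_text):
    """Return (services, links): what to consume and where to look next."""
    root = ET.fromstring(adx_text)
    services = [
        (s.findtext("adx:name", namespaces=NS),
         s.findtext("adx:format", namespaces=NS),
         s.findtext("adx:source", namespaces=NS))
        for s in root.findall("adx:service", NS)
    ]
    links = [l.findtext("adx:source", namespaces=NS)
             for l in root.findall("adx:link", NS)]
    return services, links

services, links = list_entries(ADX)
```

A client could dispatch on the format value of each service (RSS, WSDL and so on) and follow each link to another index.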

Thoughts?

An Example of RSS Auto-Discovery in WSIL.

Summarizing the discussion on a more advanced RSS auto-discovery format that was recently started by Jeremy Zawodny, Diego Doval writes:

if Tima or someone else would have a bit of time to re-write my mock-up structure using WSIL, it would be most welcome!

Done. Here is a quick mockup of both approaches Diego used to represent hierarchical content in RSS.

The first is a single file example where I use the dc:subject element to define the category in which a client could group feed pointers.

http://www.timaoutloud.org/files/diego/index.wsil

I think this example is pretty self-explanatory. service is the equivalent of RSS's item.

The second example I constructed uses multiple files and WSIL's ability to point to feeds and other WSIL files.

http://www.timaoutloud.org/files/diego/index2.wsil
http://www.timaoutloud.org/files/diego/tech.wsil
http://www.timaoutloud.org/files/diego/world.wsil
http://www.timaoutloud.org/files/diego/various.wsil

index2.wsil contains links to the other files. (I used the fictitious news4humans domain in the URLs, so you'll have to do the mappings.)

I think the second option is the way to go because it scales for sites like Yahoo, though I don't have a problem with looking at supporting both. I added a latest news feed to index2.wsil just to demonstrate that services and links to other WSIL files can sit side by side in one file.

There are a few caveats to what I did here.

  • I took a few liberties with the WSIL 1.0 spec, but the files are completely legal XML. Mainly I used RSS modules to bring in additional metadata wherever needed instead of making up my own tags.
  • RSS 0.9x and 2.0 don't have an official namespace, which continues to be an unfortunate design flaw. I made up one for the example – http://purl.org/rss/2.0/.
  • dc:language should probably be on a per service basis, but for the sake of replicating Diego's example I left it where it was.
  • service.dc:date should probably be dcterms:modified
  • Added an abstract, error reports to the examples.
  • I could have very easily added pointers to web services via WSDL files or UDDI directories. I could just as easily have added Atom feeds or archives or OPML files for that matter.

I'm pretty convinced that WSIL is along the lines of what would be optimal in creating one scalable format that can be inclusive enough to handle many formats in addition to web services. That said, I think WSIL in its current form leaves much to be desired. For instance, I think supporting extensibility through namespaces is the way to go, but there are probably too many elements with namespaces in my mockups. Some of the tag names are a bit off and could be better. Let me go out on a limb here – I'm also not sure the RDF syntax is really of much value in this non-RDF format. We could stand to factor those out. (I'm sure the semantic web mob will be on me for that.)

RSS Auto-Discovery 2.0 and WSIL.

Yahoo! techie Jeremy Zawodny has posted his thoughts and suggestions on RSS auto discovery, stating: "It has occurred to me that there's some non-existent infrastructure that we (whoever 'we' really is) need to build if RSS is going to really, really take off the way it should." He has also started threads on the syndication and aggregators mailing lists. Diego Doval contributes with some experiments using OPML and RSS.

I think whatever becomes of this directory format, it should be inclusive of the many formats that feeds exist in (RSS 0.9x, 1.0, 2.0, Echo) in addition to Web services.

From my own experience and that of others, the OPML/blogroll formats are all over the place. (See Meg Hourihan's recent post on the subject for more.) There really was never a specification, to my knowledge. With OPML being frozen and not supporting proper (read: determinable) extensibility, I'm not in favor of pursuing anything based on OPML or existing blogroll formats.

Joe Gregorio posted a suggestion to use link tags, which seems fine, but it doesn't seem scalable. Won't I have to repeat all of that information in every one of my HTML pages? Won't that be a major pain for someone like Yahoo that could have hundreds of these? Perhaps I'm reading it incorrectly.
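For reference, this is roughly what the link-tag approach asks of a client, sketched in Python with the standard library's HTML parser. The page markup, the feed URL and the class name FeedLinkParser are made up for illustration, and matching on rel="alternate" is my assumption about how such tags would be marked.

```python
from html.parser import HTMLParser

class FeedLinkParser(HTMLParser):
    """Collect (type, title, href) from <link rel="alternate"> tags."""

    def __init__(self):
        super().__init__()
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        a = dict(attrs)
        rel = (a.get("rel") or "").lower().split()
        if "alternate" in rel and a.get("href"):
            self.feeds.append((a.get("type"), a.get("title"), a["href"]))

# Hypothetical page head; every page on the site would repeat these tags.
PAGE = """<html><head>
<link rel="alternate" type="application/rss+xml" title="Latest News" href="http://news4humans.com/feeds/index.xml">
<link rel="stylesheet" href="/site.css">
</head><body></body></html>"""

parser = FeedLinkParser()
parser.feed(PAGE)
feeds = parser.feeds
```

The client side is easy enough. The cost lands on the publisher, who has to carry one such tag per feed in the head of every page.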

While everyone is throwing out ideas I figured I may as well unearth this post for some additional food for thought: WSIL meets RSS.

WSIL is Web Services Inspection Language. It seems right in line with what Jeremy and others are talking about. WSIL may not exactly be the ticket since it's an underutilized and seemingly abandoned specification – then again wasn't that the case for RSS?

Here is a topline of the case I made then:

  • I have asserted that RSS syndication feeds are Web services and perhaps the most widely deployed Web services across the Internet.
  • In many ways, WSIL is like RSS for Web services. RSS is a file format with pointers to published content that can be syndicated and aggregated. WSIL is a file format with references to published Web services that can be discovered and bound.
  • I find WSIL intriguing because its simplicity and lightweight implementation are more RESTful than UDDI. WSIL leaves the processing logic to the developer and makes its information trivial to access, creating the potential for innovative and novel applications to arise.

Back when I wrote the aforementioned post I created a few quick and rough samples to get my point across. Here are the links:

http://www.timaoutloud.org/index.wsil (Extended)
http://www.timaoutloud.org/index2.wsil (Traditional)
http://www.timaoutloud.org/index-rsd.wsil (RSD-like)

One of WSIL's nifty traits is that it can point to other WSIL files, which can be helpful with large and distributed sites. For example, using Troy Hakala's case, Yahoo! could have a main WSIL file, www.yahoo.com/index.wsil, that covers their primary feeds and points to other sections' WSIL files like sports.yahoo.com/index.wsil and finance.yahoo.com/index.wsil and so on. Another benefit: if Yahoo! were to introduce some Web service interfaces, they too could be included in these files for applications to auto-discover.
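Here is a rough Python sketch of how a client might walk such a tree of files. The documents are deliberately simplified, un-namespaced stand-ins for WSIL (the real WSIL 1.0 format is namespaced and richer), the feed locations are made up, and the in-memory DOCS dictionary takes the place of real HTTP fetches.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical WSIL-like documents keyed by URL.
DOCS = {
    "http://www.yahoo.com/index.wsil": """<inspection>
  <service><description location="http://www.yahoo.com/feeds/top.xml"/></service>
  <link location="http://sports.yahoo.com/index.wsil"/>
  <link location="http://finance.yahoo.com/index.wsil"/>
</inspection>""",
    "http://sports.yahoo.com/index.wsil": """<inspection>
  <service><description location="http://sports.yahoo.com/feeds/scores.xml"/></service>
  <link location="http://www.yahoo.com/index.wsil"/>
</inspection>""",
    "http://finance.yahoo.com/index.wsil": """<inspection>
  <service><description location="http://finance.yahoo.com/feeds/markets.xml"/></service>
</inspection>""",
}

def discover(url, fetch, seen=None):
    """Collect service locations from url and every file it links to."""
    seen = set() if seen is None else seen
    if url in seen:  # guard against files that link back to each other
        return []
    seen.add(url)
    root = ET.fromstring(fetch(url))
    found = [d.get("location") for d in root.findall("./service/description")]
    for link in root.findall("./link"):
        found.extend(discover(link.get("location"), fetch, seen))
    return found

feeds = discover("http://www.yahoo.com/index.wsil", DOCS.__getitem__)
```

Each section of the site maintains its own file, and the client only needs the address of the top one.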

Just thinking outloud.

Today Meg Hourihan writes:

I usually keep quiet during the myriad technology debates that flood certain web circles, preferring to just do my coding and building of things. So when I do dig into some technology or other – often way after all the geeks have argued and hashed to death some obscure techie implementation tidbit – I'm shocked to discover just how messed up it is. This week's struggle lies with OPML.

Keeping quiet during the myriad technology debates is probably a very wise and more productive thing to do, but I'm not smart enough to listen.

She continues…

Unfortunately OPML has a DTD that says you can extend OPML anyway you want (which is crazy talk to me, a DTD you can change? What's the point?), meaning you can add more elements, or more attributes to your elements. So when someone (me) tries to implement something with various OPML outputs, you (again me) realize that one tool outputs an attribute url while another outputs htmlUrl and a third htmlurl – all to signify the same thing! Sure, some RegEx can clean this up, but weren't we trying to avoid all that with XML in the first place? Argh! I just want to be able to develop something and have a strong contract defined. Is that too much to ask? No extends XYZ, no I changed this just this is how you express X and that's it. Maybe if the format you're using requires you to change it to represent your data, you're not using the right format in the first place.

Reading Megnut's post pains me as I've experienced exactly what she writes about when it comes to OPML. One of my first MovableType plugin attempts was to develop the ability to display OPML blogroll data in an MT layout. Astounded by the lack of a spec or any clear guidance and beguiled by the differing implementations, I gave up and never finished it. (I took up the RSS plugin for MT, which was and still is equally frustrating, but is far more useful and important in concept than OPML or blogrolls.) One Internet standards pundit once referred to it as the ghetto of XML. I have to agree. It's pretty absurd and insane. It is also unnecessarily difficult to develop software for.
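The url, htmlUrl and htmlurl variants Meg describes are typical of the defensive code this forces on a developer. A rough Python sketch follows; the sample blogroll, the list of accepted names and the function name outline_url are illustrative assumptions, not an exhaustive catalog of what tools emit.

```python
import xml.etree.ElementTree as ET

# Lowercased attribute names that have been used to mean "the site's URL".
URL_ATTRS = ("htmlurl", "url")

def outline_url(outline):
    """Return the site URL of an outline element, whatever it is called."""
    attrs = {k.lower(): v for k, v in outline.attrib.items()}
    for name in URL_ATTRS:
        if attrs.get(name):
            return attrs[name]
    return None

# Hypothetical blogroll mixing the three spellings.
OPML = """<opml><body>
<outline text="A" url="http://a.example/"/>
<outline text="B" htmlUrl="http://b.example/"/>
<outline text="C" htmlurl="http://c.example/"/>
</body></opml>"""

urls = [outline_url(o) for o in ET.fromstring(OPML).iter("outline")]
```

Nothing here is hard, but every new spelling that turns up means another entry in the list and another round of testing.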

It's times like these, though, that I feel a bit vindicated, when an intelligent and normal person like Meg comes to experience, first hand, something that agitates and bedevils me on a regular basis as a career developer and that I've been quite outspoken about. Developers exist for users and not the other way around. Absolutely agree. Technical implementation should be transparent and irrelevant to the user. Absolutely agreed again. What is lost in these idealistic views is the indirect effect a poorly designed and documented specification has on the user experience.

(There are a great many shared issues between OPML/blogrolls and RSS 0.9x/2.0. Is it any coincidence that the same genius is behind both?)

As a developer I only have so much time and energy. If I am forced to write and test an extensive amount of bozo code to protect users from such messiness spilling over into their experience, that means less time for me to implement new features. There is also a good chance that new and unaccounted-for bozo will come along and make software that was once working as it should become unreliable or simply break.

Here is a small example of what I mean. I wrote and released the mt-rssfeed plugin last year, before RSS 2.0. It worked as advertised until the new, unneighborly and backwards-compatibility-hostile <guid isPermaLink> was introduced. Thanks to the unfunkification of RSS feeds like Mark Pilgrim's and Jason Kottke's, my plugin doesn't know how to find the link to insert into MT layouts. (Click here for more of the funky RSS stupidity.)

I admit that my code could have been better and I need to fix it, but unfortunately I don't have time for what is not a quick fix. It doesn't help that such careless and poor design decisions in a specification are behind this breakage.
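For illustration only, the kind of fallback involved looks roughly like this in Python. This is not the plugin's code (mt-rssfeed is a MovableType plugin), and the function name item_link and the sample items are made up. It relies on the RSS 2.0 rule that a guid with no isPermaLink attribute is treated as a permalink.

```python
import xml.etree.ElementTree as ET

def item_link(item):
    """Find a usable link for an RSS item, falling back to its guid."""
    link = (item.findtext("link") or "").strip()
    if link:
        return link
    guid = item.find("guid")
    if guid is not None and guid.text:
        # RSS 2.0 treats a missing isPermaLink attribute as "true".
        if guid.get("isPermaLink", "true").lower() == "true":
            return guid.text.strip()
    return None

with_link = ET.fromstring(
    "<item><link> http://example.org/a </link><guid>http://example.org/g</guid></item>")
guid_only = ET.fromstring("<item><guid>http://example.org/b</guid></item>")
opaque_guid = ET.fromstring(
    '<item><guid isPermaLink="false">tag:example.org,2003:1</guid></item>')

links = [item_link(i) for i in (with_link, guid_only, opaque_guid)]
```

An item that carries only an opaque guid yields nothing, which is exactly the case where a layout ends up with no link to insert.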

What's more ridiculous is that these specifications have been declared frozen and unchangeable. We now have to hobble along or suffer through the reinvention of the wheel.

Meg concludes:

Which makes me realize that I think some of the problems we've had in the weblog community around formats like RSS and OPML might stem from the fact that we use them in manners for which they weren't designed.

I don't know if I entirely agree with this assessment. If a specification is designed properly for extensibility – a simple core and the ability to extend it with namespaced modules – it can grow and adapt in a sane and reliable manner. These specifications were not, and we see where that has gotten us. The problem is only compounded by a specification being frozen when it needs to address new common uses and the shortcomings of its predecessors. Unlike meat, freezing a specification doesn't mean it will resist rot and decay.

We need to come up with better ideas and formats that work and can handle change and adaptation. That is my rant for another day though.

Sam Ruby

I've taken a 2003/07/01 snapshot of the maximal example of the format previously known as echo.  The reason for this exercise is that I plan to start prototyping using this as a baseline.  I invite others to do likewise.  Let me know via comments of any implementations.  In particular, I am interested in templates that others can use.

Fair enough. This all seems like a rather silly reinvention of the wheel though as it has not incorporated sound elements of prior art. I've created three prototype MT templates for producing the conceptual Necho model in funky RSS 2.0, Dublin Core Metadata Element Set and Dublin Core with optional RDF elements.

[UPDATE: Werner Vogels wrote to let me know the links to my MT templates were incorrect. DRAT. This only proves that RSS/Necho is making me lose my marbles and I have to find a form of torture… I mean hobby.]

Necho in Funky RSS 2.0
This is an adaptation of my previous XSS and RSS Profiles within the loose confines of the conceptual model. MovableType template here.

Necho using the Dublin Core Metadata Elements Set
This is an example of Necho as I had proposed last week, utilizing the naming and semantics of Dublin Core, the standard for cross-domain information resource description. MovableType template here.

Necho using Dublin Core with optional and minimal RDF.
This is an adaptation of another experimental proposal I put forward, mostly based on the work and wisdom of Shelley Powers and Sean Palmer. This example is the same as the previous example, but it uses some of the latest RDF/XML serialization in order to drastically reduce the RDF tax that was present in RSS 1.0. I propose these elements could be optional with a little added consideration to the core. MovableType template here.

Bray: Stamp Out Creativity Now

Tim Bray writes:

I am worried that the next-gen syndication process rooted in Sam’s Wiki is in danger of going seriously off the rails, because some of the participants have got the loony idea that it’s about trying to invent new technology or improve RSS.

What the Echo-that-was project should be about is picking the stuff that’s already been proven to work and be interoperable, and writing it down in a clean, clear way, and arranging for the specification to be clearly out of the clutches of any vendor.

He then goes on to conclude:

In the Wiki, people are madly flinging proposals for radical new capabilities against the wall, like content-by-reference, and multiple-URIs-per-author and so on and so on ad nauseum and I’ll translate that Latin for free, it means to the puking point. Please stop.

Well put, Tim. I have to agree. I've been a bit put off by some of the discussion and direction the Wiki has taken recently also. As I noted earlier when I proposed we consider the use of the Dublin Core, there seems to be an unhealthy desire to reinvent the wheel instead of working from the stuff that we have and building from there.

It's not that the prior art is completely broken and worthless so much as it is stuck in neutral – meaning not going forward – without clarification and some neutral, less politically charged ground. So far that's been achieved. Let's

And for Pete's sake, can we settle on a name and quit wasting cycles on this!

(In case you missed it, it was learned that Echo is the name of a Java application framework. Sam asked that project's leads if they cared and they said yes, they do. It's back to the drawing board now.)

UPDATE: Sam Ruby weighs in and has started an interesting thread on Bray's post with discussion of what should and shouldn't be included in the core.

Echo and the Dublin Core.

The Echo wiki has been nothing short of amazing and the result good so far, but I'm not terribly thrilled with the Echo Example syntax proposals that have been posted recently. They seem a bit unfocused. I think consensus is missing on the principles the syntax will be built on; establishing it would provide better clarity and focus to the discussion. The proposals are up, though, and I can deal with that – even if following it makes my head hurt.

My concern is that the primary proposal seems like a reinvention of the wheel in many ways. Conceptually the elements of an entry can be expressed in RSS. The elements also have near perfect alignment with those in the Dublin Core Metadata Element Set. (Which is good in that it validates the analysis of the group thus far.) Granted, expressing Echo in Dublin Core doesn't mean we're all done, but I don't know why we wouldn't leverage this type of prior art more extensively. Dublin Core is in use by a lot of other systems and formats than RSS. I find it puzzling that it has been passed over or abandoned so quickly.

Wikis are not for the meek, so I've posted my rough examples there. I thought I'd post a copy here also. (Who knows what may happen to it once the wiki way gets to it.) Feedback is welcome. Just use the wiki.

UPDATE: Ken McLeod points out that there is a complementary DublinCore page on the wiki that provides background information and links on its use in Echo. (Let the Refactoring begin!)

Originally posted to EchoExampleInDublinCore

This proposal contains some examples and discussion of how Echo could leverage the Dublin Core Metadata Element Set in its syntax.

NOTE: This proposal would fold the Dublin Core semantics/labels into the core Echo namespace, not create a separate modularized namespace. The use of dc: in the text below is meant to clarify what is coming from Dublin Core and what is not.

Dublin Core elements are already in common use within RSS feeds to supplement the core item elements of title, description and link. The Dublin Core has corresponding elements for rss:title and rss:description, but a specific tag for (perma)link is not explicitly defined. The dc:source tag can reasonably be considered the link's counterpart in the Dublin Core. So with the Dublin Core you can assemble what almost looks like an RSS feed item.
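
As a rough illustration of that correspondence, here is a small Python sketch that renames the three core RSS item elements to their Dublin Core labels inside an entry container. The mapping table and the function name rss_item_to_entry are mine, treating link as source is this proposal's suggestion rather than anything Dublin Core defines, and namespaces are left out for brevity.

```python
import xml.etree.ElementTree as ET

# RSS item element -> Dublin Core label, as proposed in the text above.
RSS_TO_DC = {"title": "title", "description": "description", "link": "source"}

def rss_item_to_entry(item_xml):
    """Re-express the core elements of an RSS item using Dublin Core names."""
    item = ET.fromstring(item_xml)
    entry = ET.Element("entry")
    for rss_name, dc_name in RSS_TO_DC.items():
        value = item.findtext(rss_name)
        if value is not None:
            ET.SubElement(entry, dc_name).text = value
    return ET.tostring(entry, encoding="unicode")

entry = rss_item_to_entry(
    "<item><title>Hello</title>"
    "<link>http://www.example.org/archives/000000.html</link></item>"
)
```

The transformation is a straight rename, which is the point: very little separates an RSS item from a Dublin Core description of the same entry.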

Looking at the Echo ConceptualModel, all of the required elements and many of the highly-recommended optional elements have corresponding elements in the Dublin Core. Further clarifications and constraints are likely needed in the context of Echo's use. For example, according to the Dublin Core documentation, dc:source is said to be "A Reference to a resource from which the present resource is derived. The present resource may be derived from the Source resource in whole or in part. Recommended best practice is to identify the referenced resource by means of a string or number conforming to a formal identification system." In Echo the formal identification system would be a URL/permalink.

Other examples are the dc:creator and dc:publisher elements. Currently these elements are just strings. Echo may further define their format and content, optionally allowing for additional meta-rich extensions such as FOAF to be substituted.

Echo would also benefit from defining maximum lengths and element optionality.

PROS

 * Leverages prior art
 * Leverages an international standard
 * Is not a radical departure from RSS today

CONS

 * Tag naming may not always be ideal
 * Additional clarification and constraints are needed
 * Elements may not be as meta rich as preferred

These examples illustrate what such an approach for Echo may look like. They assume that the Dublin Core namespace (http://purl.org/dc/elements/1.1/) is part of the default namespace. These elements are wrapped in a container tag of entry. Content is embedded using the root or container tag of the native format (assuming it can be expressed in well-formed XML). Alternatively, content:encoded with CDATA encoding could be used to embed non-well-formed textual content. Binary sources should not be embedded, but referenced via a dc:related link.

Core ConceptualModel Entry


<entry>
<source>http://www.example.org/archives/000000.html</source>
<creator>Paul Harrison (http://www.example.org)</creator> 
<date>2003-06-25T10:42:00-04:00</date>
<body xmlns="http://www.w3.org/1999/xhtml">
<p>A do try alone, my your you with get on friends.  a my my, out from 
get. And i i your you're my do high. Think not from it you lend going 
a friends sang you. Be, lend away love little little. I because, to i
ears, end, from do tune, alone a your help. Friends, i out, out get 
little, on if little of with my. Of of do a to key i get a my no sang 
is. Think <a href="http://www.foo.com/">help with what you sang</a> 
help a by i a how what get tune does because friends. Do do help tune, 
sad are when my from, feel own, sing a, you me the what get friends a.</p> 
</body>
</entry>
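To make the shape of the core entry a little more concrete, here is a minimal sketch of how a consumer might read it. It uses Python's standard xml.etree.ElementTree; the function name read_core_entry, the REQUIRED tuple and the shortened sample body are my own assumptions for illustration and are not part of the proposal.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# A shortened copy of the core entry above (body text trimmed for brevity).
ENTRY = """<entry>
<source>http://www.example.org/archives/000000.html</source>
<creator>Paul Harrison (http://www.example.org)</creator>
<date>2003-06-25T10:42:00-04:00</date>
<body xmlns="http://www.w3.org/1999/xhtml"><p>Think <a href="http://www.foo.com/">help with what you sang</a>.</p></body>
</entry>"""

# Assumption for this sketch: these three elements are treated as required.
REQUIRED = ("source", "creator", "date")


def read_core_entry(xml_text):
    """Return the entry metadata plus the plain text of its XHTML body."""
    entry = ET.fromstring(xml_text)
    fields = {}
    for name in REQUIRED:
        value = entry.findtext(name)
        if value is None:
            raise ValueError(f"missing required element: {name}")
        fields[name] = value.strip()
    # The body carries its own namespace, so it is looked up by qualified name.
    body = entry.find(f"{{{XHTML_NS}}}body")
    if body is None:
        raise ValueError("missing XHTML body")
    fields["text"] = "".join(body.itertext()).strip()
    return fields
```

Because the content travels as well-formed XHTML in its own namespace, the consumer can tell metadata from content without guessing at escaped markup.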

Extended ConceptualModel Entry


<entry>
<title>With a Little Help From My Friends</title>
<description>You a, you're get. Do my with. What with how think, sad 
on would  how you try own a if by help and a i sang.</description>
<source>http://www.example.org/archives/000000.html</source>
<creator>Paul Harrison (http://www.example.org)</creator> 
<date>2003-06-25T10:42:00-04:00</date>
<related>http://www.bar.net/000000.html</related>
<related>http://www.baz.org/hello.html</related>
<subject>hello world</subject>
<identifier>1056595208</identifier>
<rights>Copyright 2003 Paul Harrison</rights>
<body xmlns="http://www.w3.org/1999/xhtml">
<p>A do try alone, my your you with get on friends.  a my my, out from 
get. And i i your you're my do high. Think not from it you lend going 
a friends sang you. Be, lend away love little little. I because, to i
ears, end, from do tune, alone a your help. Friends, i out, out get 
little, on if little of with my. Of of do a to key i get a my no sang 
is. Think <a href="http://www.foo.com/">help with what you sang</a> 
help a by i a how what get tune does because friends. Do do help tune, 
sad are when my from, feel own, sing a, you me the what get friends a.</p> 
</body>
</entry>

Sam's wiki discussion continues full throttle. We have a name – it's Echo. Numerous people are in support of the roadmap. SixApart, Blogger and LiveJournal have all said they support this effort. Even logos are appearing. Excellent.

Trying to follow it is a full-time job which has my head hurting and leaves me too spent for words or any really interesting analysis. Shelley Powers makes an excellent post to summarize the action so far:

One only has to look at the change log to see the number of edits to realize that this is not an environment for the cautious, the tame, or the wiki-challenged (or for those who want to sleep or eat, either). I'm not necessarily cautious or tame, but I do raise my hand for being wiki-challenged. Still, there are points that are solidifying out at the wiki, and I thought to duplicate these here in a format that, if nothing else, will help me understand what it's all about.

Well put, Shelley, and thank you. My head hurts a little bit less after reading your well done summary. The discussion of what is known as Echo has been engrossing to the point of being detrimental to any other work and what's left of my personality.

Our favorite deity Clay Shirky made a brief appearance on the Echo wiki today and wrote:

Right now, the conversation looks muddled, because a lot of questions that were asked and answered in the development of RSS itself (it should be 7 bit; it should be represented in XML; _required_ metadata should be kept at a minimum; it should not try to be an input to the Semantic Web) are coming up again, to no good effect, imo.

In his reply, Sam Ruby makes an interesting point that intrigues me a lot more than the rest of the syntax discussions: One possible use of this analysis would be to produce a proper usage profile of RSS.

The conversation and collaboration on the Echo wiki have been nothing short of amazing and the result good so far, but I am curious as to whether this loose, fast-paced collaboration can produce something practical.

In a comment on his weblog, Sam writes, "the pace does seem dizzying at the moment. I would like to see the result be something that we would feel comfortable living with for quite some time."

Agreed. We shall see where this sometimes wild ride takes us.

Phase 2 Roadmap.

Sam Ruby writes:

A week ago I quietly introduced a wiki to discuss the anatomy of a well formed log entry. It got a lot of interest. And a week later, it is still being actively developed.

Wow. It seems that I am not the only one desiring a bit of forward motion in this area.

Amen, Sam. There has been some great progress towards consensus and clarity on the matter, which is exciting. As I mentioned in my last post, discussion has been generally conceptual and needs to move forward to the next, more concrete phase. And move forward it will! A roadmap has been posted here. I think this is the logical way to go next and continue this great effort.

Update: I've posted a similar post here on my O'Reilly weblog.

As the discussion on Sam Ruby's Well Formed Log Entry wiki has progressed a certain amount of consensus and clarity is forming. Here is my summary of the key requirements being discussed.

Post date. There are three types of timestamps used in weblog systems today: created-on, last-modified and publication timestamps. The publication timestamp (or post date) is the most important and is therefore a required element of a well formed log entry. It allows an author to control the ordering of log posts, which are chronological by nature. The publication date is similar in nature to the date on a newspaper or magazine, where for editorial reasons the content is commonly published before the date on the cover. Created-on and last-modified timestamps are optional and part of an optional extension module called Authoring, which ties into versioning. The created-on and last-modified dates of the actual representation (the HTML produced when a log entry and template are merged) are outside the scope of the discussion, though they may be substituted if post-created and post-last-modified dates are not tracked by the system in use.
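As a rough illustration of why the publication timestamp matters for ordering, here is a small sketch. The ENTRIES list, its field names and the newest_first function are hypothetical; the only thing taken from the discussion is that each entry carries a publication date with a time zone offset.

```python
from datetime import datetime

# Hypothetical entries; "published" is the author-controlled post date.
ENTRIES = [
    {"title": "B", "published": "2003-06-25T15:00:00+01:00"},
    {"title": "C", "published": "2003-06-24T23:30:00-07:00"},
    {"title": "A", "published": "2003-06-25T10:42:00-04:00"},
]


def newest_first(entries):
    """Order entries by publication date, most recent first.

    The timestamps are parsed rather than compared as strings, because
    entries published in different time zones do not sort correctly as text.
    """
    return sorted(
        entries,
        key=lambda entry: datetime.fromisoformat(entry["published"]),
        reverse=True,
    )
```

Created-on and last-modified dates play no part in the ordering here, which matches the idea that they belong in an optional module.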

Author. One and only one author is required in a well formed post. The author can be a system such as a Wiki or CVS. While a post may be developed collaboratively with multiple individuals participating, only one person or system is the primary author, with the remainder optionally listed as contributors.

Permalink. The discussion seems to be near consensus, though it doesn't seem to have been totally reached. Sam Ruby listed one of the requirements for the discussion as "must be on the Web." Tim Bray wrote a nice post on this topic and versioning that I think (there seems to be some confusion as to what the other Tim actually meant) nails it. He writes, "A log entry's primary identifier should be a URI, just because this is about the Web and if you have a URI you're on it, otherwise not." Permalinks are URIs, but URIs are not necessarily permalinks, so some clarification is needed. I think that a permalink should be required and is the primary identifier of a well formed log entry. Bray does seem to indicate similar thinking when he writes, "In the world of weblog entries, I can't imagine why you'd use an identifier that doesn't double as a locator, so I don't think URNs are particularly relevant." URNs don't necessarily resolve to anything, so is Bray suggesting that URIs must be permalinks, should be permalinks, or neither? Here is the confusion amongst the group. I agree with his conclusion that a URI (permalink?) along with version info uniquely identifies a well formed entry. The version info is a string whose value can be determined by the author and/or system developer – a modified date, digest hash or sequential number.
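Here is one way the permalink-plus-version idea might look in practice. This is only a sketch: the EntryId class and the identify function are names I made up, and the digest hash is just one of the three version options mentioned above (SHA-1 is picked arbitrarily).

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class EntryId:
    """Hypothetical identity of an entry: its permalink plus a version string."""
    permalink: str
    version: str


def digest_version(content):
    """Derive a version string from the entry content (the digest hash option)."""
    return hashlib.sha1(content.encode("utf-8")).hexdigest()


def identify(permalink, content):
    """Build the identity of one version of an entry."""
    return EntryId(permalink, digest_version(content))
```

Two revisions of the same post share a permalink but get different version strings, so an aggregator could tell an edited entry from a new one.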

Content. This is still the least defined area of the whole discussion. There are still many issues left open and unresolved. Someone articulated my previous questions and concerns on the context of the content better than I did. The answer is that it depends on whether the log entry is being used in the context of an internal or external model. Here is the explanation as it reads as I write this:

An internal model is used by the source or provider of an entry and contains the entire breadth and depth of the entry.

An external model is used to convey information about an entry, possibly up to including the entire breadth and depth of an entry. For sets of entries, an external model may represent sets of entries differently depending on the purpose of the set.

This distinction helps to answer questions like is the content in the entry or available at the permalink, does the permalink point to a rendered presentation of the entry (HTML, image, etc.) or does the permalink point to the entry data (see content and PermaLinks ), and should contact information for the publisher be included.

The answer could be both, depending, then depending can be further clarified.

Internal models are most often used in content-management APIs (RESTlog, CommentAPI, BloggerAPI, MetaWeblogAPI), external models are most often used in meta-data and syndication APIs (TrackBack, RSS, aggregators, portals).

Well put. (I wish I knew who wrote it. The joy of the wiki. Update: The joy of the blog. It was Ken MacLeod.) I think it goes to the heart of why content is the most vague and has the most open issues. It is hard to reach consensus when it depends on the context and model in use.

I'm not much for the abstract and have a tendency to start in on concrete application when studying the theoretical. This exercise has been a fascinating and sometimes quite brain-twisting learning experience for me. Fighting my tendencies has been the hardest part. With that bias disclosed, I think it's time to move the conversation forward and drill down deeper into the context/models of a well formed log entry.

In related news, Tim Bray contributes another great post on the promise and peril of RSS. All the more reason why we need to do this work.

Log Entry Anatomy 3.14

Yesterday Sam Ruby opened a wiki to develop a well formed log entry.

Like Tim Bray, I'm not much for abstract discussion, but I'm willing to play along.

This has all been very interesting and undoubtedly progress has been made. The development of optional modules is great. Discussion of permalinks and URIs and unique identifiers has also been excellent. Of course, as is human nature, I won't dwell on the good stuff and will focus instead on what bugs me.

While this has been and will continue to be interesting to observe, I'm not sure I'm sold on the use of a wiki. I've become a bit confused trying to follow the evolution of the content and the discussion.

For instance, when Sam originally opened the wiki he wrote that the purpose of the wiki was for describing a conceptual data model of what constitutes a well formed log entry. During the course of the day, Aaron Swartz, exercising the wiki way, modified it to read the conceptual data model of weblog entries. From my viewpoint this change was significant. I thought Sam's use of the word log instead of weblog was deliberate, to convey a broader notion of an entry beyond weblogs. The conceptual optional modules that have been added and developed add to the impression that Sam's initial minimum requirements were a core that could be built on.

This was my understanding and I proceeded to comment accordingly. Then I noticed the change in the guidelines for the wiki and became unsure whether my comments were appropriate or on target – the scope had seemingly changed. To add to my confusion, the wiki had no clear and immediate way of asking for clarification. (For this I'm falling back on my weblog in hopes someone answers me.) Personally I like Greg Reinacker's suggestion that we think of it as syndicated content.

This being a wiki, I could just go in and change Aaron's changes, and perhaps I still will, but that doesn't solve the real problem. There seems to be some confusion as to what we are talking about. I make my change. Aaron thinks I'm wrong, because he has a different notion of what the scope of this effort is, and changes it back. I could disagree and change it again and so on. The wiki way doesn't seem to facilitate the ability to clarify and focus a group's attention.

Also at issue is the context of the well formed log entry. Are we talking about the concept of an entry in a syndication feed or the more general concept of a log entry? Mark Cidade and I went around and around today and it would seem that the two of us are on somewhat different pages. Having discussed the issue of syndication feeds and beyond with Sam for months now, I was under the impression that was the focus of this effort. It was not explicitly spelled out (or someone deleted that context) and I sense that context was not understood by Mark. From what I gather Mark was arguing the general notion of content that can take any form. I could be mistaken though.

This confusion from the rapid and easily evolved effort has led me to seek answers:

  • Are we speaking about weblog entries or a more general notion that transcends weblogs to what Greg Reinacker refers to as syndicated content?
  • What is meant by content in the minimum requirements? What relationship does that content have to the required permalink? What do you link and what do you embed?

Assuming we are a) talking about the conceptual entry in a syndication feed and b) that the permalink points to more from the source, I feel strongly that some type of textual description (title/excerpt etc.) is required.

The discussion continues.

The Yahoo! Buzz is RSS.

Jeremy Zawodny points out that the Yahoo! Buzz Index is now available as a dozen or so RSS feeds. All are in the 0.91 format. All are perfectly valid when I checked. Good work Yahoo!!

It's good to see the big boys continuing to adopt this stuff even if it's at a snail's pace at times. Despite all of the bickering over RSS 1.0 and RSS 2.0 they chose to do their new feeds in 0.91. I don't blame them at all.

Tim Bray asks how do we explain [RSS] to people who don't know that they need to know? This is an excellent question and one worth answering. Bray explains the need when he writes:

when I explain something to someone and they don't get it, that's my problem, not theirs. So I'd genuinely welcome – and I think it would be good for all of us – some discussion of how we can do a better job of explaining what it is we're up to here and why it matters.

Let the discussion begin. The question is where?

RSS Core Profile DRAFT 2

Based on feedback collected from the comments on Sam's weblog I've updated the draft of the core profile. Changes are documented at the bottom of this post. Please continue to direct any feedback on this draft to the comments area Sam is so graciously hosting.

RSS Core Profile

DRAFT 2

ABSTRACT

The RSS Core profile defines a restricted subset of RSS 2.0 that balances ease of use and authoring with ease of consumption by applications while maintaining the richness necessary to extend and adapt to various problem domains. It is designed for authors wishing to provide a well-formed "feed" of information to consumers. It is designed to provide a foundation for other more focused profiles to be based on.

The RSS Core profile is designed around a simple core of elements that may be easily extended through namespaces and modules. It is also designed to maximize backward compatibility with the RSS 0.91 format and its descendents. This allows the profile and derivatives to leverage the existing install base of 0.91 feeds and prior bodies of work such as Dublin Core meta data and RSS 1.0 modules.

The goal of the RSS profile is to serve as guidelines to best practices in a balanced and simplified approach to authoring and consuming of resources with RSS.

COMMON CORE TAGS

<rss>
Description: The root tag for the syndicated resources collection.
Sub-Elements: channel (required)
Attributes: version - a string identifying the version including profile and document type.
Notes: Only one channel is permitted.

<channel>
Description: Container tag for a specific channel
Sub-Elements: title (required), description (required), link (required), item (required)
Attributes: none.
Notes: Only one each of title, description and link is permitted. Language is deprecated.

<item>
Description: Container tag whose contents represents one resource in the channel. At least one item must be present in a channel.
Sub-Elements: title (required), link (required), description (optional, but highly recommended)
Attributes: none.
Notes: none.

<link>
Description: A unique URI using an IANA-registered scheme that specifies the location of the channel or item (resource). Applications are required to support only one of these IANA URI schemes; however, http:// is highly recommended. A required sub-element of channel and item.
Sub-Elements: none.
Attributes: none.

<description>
Description: A plain text excerpt of the channel or item (resource). A required sub-element of channel. Optional, though highly recommended, sub-element of item.
Sub-Elements: none.
Attributes: none.
Notes: The RSS Core profile supports plain text and does not permit encoded markup such as HTML to be included in the description. Recommended not to exceed 500 characters. Those wishing to embed markup language or larger pieces of content in the description tag should use the mod_content module.

<title>
Description: A plain text descriptive title of the channel or item (resource).
Sub-Elements: none.
Attributes: none.
Notes: Is the equivalent of the HTML title and only supports plain text. Encoded markup such as HTML is not permitted to be included in the title. Recommended to be no more than 100 characters.
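To show how these rules could be checked mechanically, here is a rough sketch of a validator. The function name check_core_profile, the SAMPLE feed and the wording of the messages are my own assumptions; the rules themselves are the ones listed above, with the recommended length limits reported as warnings rather than errors.

```python
import xml.etree.ElementTree as ET

# A hypothetical minimal feed that should satisfy the profile.
SAMPLE = (
    '<rss version="2.0"><channel>'
    "<title>Example</title>"
    "<description>An example channel</description>"
    "<link>http://www.example.org/</link>"
    "<item><title>Hello</title>"
    "<link>http://www.example.org/archives/000000.html</link></item>"
    "</channel></rss>"
)


def check_core_profile(xml_text):
    """Return a list of problems found; an empty list means none were found."""
    root = ET.fromstring(xml_text)
    if root.tag != "rss":
        return ["root element must be <rss>"]
    channels = root.findall("channel")
    if len(channels) != 1:
        return [f"expected exactly one <channel>, found {len(channels)}"]
    channel = channels[0]
    problems = []
    # Exactly one each of title, description and link directly under channel.
    for name in ("title", "description", "link"):
        count = len(channel.findall(name))
        if count != 1:
            problems.append(f"channel needs exactly one <{name}>, found {count}")
    items = channel.findall("item")
    if not items:
        problems.append("channel needs at least one <item>")
    for position, item in enumerate(items, start=1):
        for name in ("title", "link"):
            if item.find(name) is None:
                problems.append(f"item {position} is missing <{name}>")
        # Length limits are recommendations, so they are flagged as warnings.
        if len(item.findtext("title", default="")) > 100:
            problems.append(f"warning: item {position} title exceeds 100 characters")
        if len(item.findtext("description", default="")) > 500:
            problems.append(f"warning: item {position} description exceeds 500 characters")
    return problems
```

The sketch does not try to detect encoded markup in title or description, which would need its own heuristic.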

DEPRECATED TAGS

Tags not defined herein that have appeared in documentation for RSS 0.91 and its descendents are considered deprecated from the core in the RSS core profile. This data can be furnished through various modules such as Dublin Core, mod_content and mod_admin. All deprecated tags in the RSS core profile were previously considered optional except language, which was required by the 0.91 specification but made optional in later specifications.

TAG MAPPING

As it exists today, most tags that would be considered deprecated by this profile, have a modular equivalent.

For brevity the following prefixes are assumed to be mapped to the listed namespace URIs. The prefixes used are based on the most common in use today. The namespace URIs have been linked to their documentation.

The following is an initial, less than perfect mapping of tags for those who opt to comply with this profile.

  • language: dc:language (see footnote in outstanding issues.)
  • copyright: dc:rights
  • lastBuildDate: dcterms:modified matching HTTP 1.1 Last-Modified.
  • managingEditor: dc:publisher
  • pubDate: dc:date
  • guid: link (links should be unique URI.)
  • webMaster: admin:errorReportsTo
  • category: dc:subject
  • expirationDate: dcterms:valid
  • generator: admin:generatorAgent
  • cloud: cp:server
  • rating: rss091:rating
  • source: dc:source or ag:source & ag:sourceURL
  • skipDays & skipHours: rss091:skipDays or rss091:skipHours or use mod_syndication for more advanced functionality.
  • enclosure: mod_image, mod_audio or mod_streaming depending.
  • ttl: similar functionality is in mod_syndication
  • docs: annotate:reference (Or should this be the namespace URI?)
  • comments: annotate:reference
  • image: see mod_image (Though mod_image is designed to work with and supplement the existing image tags from RSS0.91 from what I can tell. The conversation trailed off and never completed. Jon Hanna has proposed another specification that I thought wasn't as appropriate as Kevin Burton's module. http://www.benhammersley.com/archives/003096.html.)
  • textinput: (Needs to be addressed. RSS 1.0 allows it so no module has been designed to date. annotate:reference helps provide a link; however, it's an empty tag into which you couldn't insert a dc:title or other tags.)
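The one-to-one part of this mapping could be expressed as a simple lookup table. The sketch below is illustrative only: TAG_MAP covers just the entries above that map to a single module element, and to_core_profile is a hypothetical helper name.

```python
# Deprecated core tag -> module equivalent, taken from the one-to-one
# entries in the list above. Entries that need judgment (enclosure, image,
# textinput and so on) are deliberately left out of this sketch.
TAG_MAP = {
    "language": "dc:language",
    "copyright": "dc:rights",
    "lastBuildDate": "dcterms:modified",
    "managingEditor": "dc:publisher",
    "pubDate": "dc:date",
    "webMaster": "admin:errorReportsTo",
    "category": "dc:subject",
    "expirationDate": "dcterms:valid",
    "generator": "admin:generatorAgent",
}


def to_core_profile(fields):
    """Rename deprecated tags to their module equivalents; keep the rest as-is."""
    return {TAG_MAP.get(name, name): value for name, value in fields.items()}
```

Note that this only renames tags. It does not convert values, so a date or a generator string may still need reformatting to suit the target module.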

EXTENSIBILITY

Without detailed information on extending RSS 2.0 with modules and XML namespaces, the RSS Core Profile will follow the guidelines set forth in the RSS 1.0 format module documentation. Modules should make every attempt to keep module syntax streamlined and simple by minimizing the use of RDF/XML constructs.

EXAMPLES

OUTSTANDING ISSUES

  • How to identify the profile/document type? Place in version? or use XML document type? Both?
  • link tag language sufficient? http:// is highly recommended or should it be required?
  • Need to better address non-English language feeds. Morten Frederiksen suggests inclusion of a clause like "if the language is not English, please include the dc:language element according to XXX." Should non-English feeds be forced to use Extensible or Transitional? Is there a better way?
  • RSS 2.0 can never use a default namespace? Currently being tested out and considered to be optionally permissible.
  • content:encoded, xhtml:body or both

CHANGE LOG

DRAFT 2, Jun 04 2003

  • Switched depreciated to deprecated. (Lance at Brainpolis)
  • Change language under channel as per this post (Lance at Brainpolis)
  • Dropped the 15 items per channel note under item. A limit may (or may not) be set in profiles that inherit from the Core profile based on context.
  • Added prefix to namespace URI mappings with links to their documentation (Dare Obasanjo)
  • Added "less than perfect" clarification on tag mappings.
  • Changed "existing RSS formats" to "RSS 2.0" in the abstract for clarity.
  • Added content:encoded, xhtml:body or both to outstanding issues.
  • Added a note that a default namespace is being tested.
  • Various spelling and grammar fixes.

DRAFT 1, Jun 03 2003

RSS Core Profile DRAFT 1

UPDATE: A second draft has been published here.

This is a slightly more formalized write-up of a proposed profile for RSS that has been discussed and brought up again. Rather than just produce an iteration of what Don Box wrote up, I thought I'd take a step back to codify some of the design considerations on which other profiles may be built. I believe that with this established, going forward will be much smoother and better focused.

While we may not be able or want to identify them right now, there will be different profiles depending on the context. A comments feed will have different requirements than a news/weblog feed, which will have different requirements than an embedded item in an API message. While these contexts and their needs will certainly vary, they needn't be entirely different. The Core Profile is an attempt to create a foundation from which other profiles, such as "RSS for Weblogs," can be derived.

UPDATE: Sam Ruby has opened a comments area on this post. Please direct any feedback on this draft there.

RSS Core Profile

DRAFT 1

ABSTRACT

The RSS Core profile defines a restricted subset of existing RSS formats that balances ease of use and authoring with ease of consumption by applications while maintaining the richness necessary to extend and adapt to various problem domains. It is designed for authors wishing to provide a well-formed "feed" of information to consumers. It is designed to provide a foundation for other more focused profiles to be based on.

The RSS Core profile is designed around a simple core of elements that may be easily extended through namespaces and modules. It is also designed to maximize backward compatibility with the RSS 0.91 format and its descendents. This allows the profile and derivatives to leverage the existing install base of 0.91 feeds and prior bodies of work such as Dublin Core meta data and RSS 1.0 modules.

The goal of the RSS profile is to serve as guidelines to best practices in a balanced and simplified approach to authoring and consuming of resources with RSS.

COMMON CORE TAGS

<rss>
Description: The root tag for the syndicated resources collection.
Sub-Elements: channel (required)
Attributes: version - a string identifying the version including profile and document type.
Notes: Only one channel is permitted.

<channel>
Description: Container tag for a specific channel
Sub-Elements: title (required), description (required), link (required), item (required)
Attributes: none.
Notes: Only one title, description or link is permitted. Language is depreciated.

<item>
Description: Container tag whose contents represents one resource in the channel. At least one item must be present in a channel.
Sub-Elements: title (required), link (required), description (optional but highly recommended)
Attributes: none.
Notes: Recommended to not exceed 15 per channel.

<link>
Description: A unique URI using an IANA-registered scheme that specifies the location of the channel or item (resource). Applications are required to support only one of these IANA URI schemes; however, http:// is highly recommended. A required sub-element of channel and item.
Sub-Elements: none.
Attributes: none.

<description>
Description: A plain text excerpt of the channel or item (resource). A required sub-element of channel. Optional, though highly recommended, sub-element of item.
Sub-Elements: none.
Attributes: none.
Notes: The RSS Core profile supports plain text and does not permit encoded markup such as HTML to be included in the description. Recommended not to exceed 500 characters. Those wishing to embed markup language or larger pieces of content in the description tag should use the mod_content module.

<title>
Description: A plain text descriptive title of the channel or item (resource).
Sub-Elements: none.
Attributes: none.
Notes: Is the equivalent of the HTML title and only supports plain text. Encoded markup such as HTML is not permitted to be included in the title. Recommended to be no more than 100 characters.

DEPRECIATED TAGS

Tags not defined here (other tags from RSS 0.91 and its descendents) are considered depreciated from the core in the RSS core profile. This data can be furnished through various modules such as Dublin Core, mod_content and mod_admin. All depreciated tags in the RSS core profile were previously considered optional except language, which was required by the 0.91 specification but made optional in later specifications.

TAG MAPPING

As it exists today, most tags that would be considered depreciated by this profile have a modularized equivalent. The following is an initial mapping of tags for those who opt to comply with this profile.

  • language: dc:language (see footnote in outstanding issues.)
  • copyright: dc:rights
  • lastBuildDate: dcterms:modified matching HTTP 1.1 Last Modified.
  • managingEditor: dc:publisher
  • pubDate: dc:date
  • guid: link (links should be unique URI.)
  • webMaster: admin:errorReportsTo
  • category: dc:subject
  • expirationDate: dcterms:valid
  • generator: admin:generatorAgent
  • cloud: cp:server
  • rating: rss091:rating
  • source: dc:source or ag:source & ag:sourceURL
  • skipDays & skipHours: rss091:skipDays or rss091:skipHours or use mod_syndication for more advanced functionality.
  • enclosure: mod_image, mod_audio or mod_streaming depending.
  • ttl: similar functionality is in mod_syndication
  • docs: annotate:reference (Or should this be the namespace URI?)
  • comments: annotate:reference
  • image: see mod_image (Though mod_image is designed to work with and supplement the existing image tags from RSS0.91 from what I can tell. The conversation trailed off and never completed. Jon Hanna has proposed another specification that I thought wasn't as good as Kevin Burton's mod_image. http://www.benhammersley.com/archives/003096.html.)
  • textinput: (Needs to be addressed. RSS 1.0 allows it so no module has been designed to date. annotate:reference helps provide a link; however, it's an empty tag into which you couldn't insert a dc:title or other tags.)

EXTENSIBILITY

Without detailed information on extending RSS 2.0 with modules and XML namespaces, the RSS Core Profile will follow the guidelines set forth in the RSS 1.0 format module documentation. Modules should make every attempt to keep module syntax streamlined and simple by minimizing the use of RDF/XML constructs.

EXAMPLES

OUTSTANDING ISSUES

  • How to identify the profile/document type? Place in version? or use XML document type? Both?
  • link tag language sufficient? http:// is highly recommended or should it be required?
  • Need to better address non-English language feeds. Morten Frederiksen suggests inclusion of a clause like "if the language is not English, please include the dc:language element according to XXX." Should non-English feeds be forced to use Extensible or Transitional? Is there a better way?
  • RSS 2.0 can never use a default namespace?

CHANGE LOG

DRAFT 1, Jun 03 2003

In another thread on my O'Reilly post on Weblogs, Web services and the future, an anonymous poster raises the notion of using SOAP instead of XML-RPC, stating "SOAP is a great service, supported by all the big players." Quite a valid thought. Another anonymous poster chimes in: "SOAP is a useless waste of time and a wonderful example of NeedlessComplexity. Use HTTP. What else needs to be done besides GETting data, POSTing new data, PUTing changed data and DELETEing dead data? Focus on clean resource-space identifiers [URIs] and meaningful XML blocks. Then, DO NOT wrap them in an XML-RPC or SOAP noise. Just use the data."

I understand where this anonymous poster is coming from. As I have said repeatedly, RSS is the Web service we already have. I also admit to having RESTful leanings; however, they are not absolute.

I asserted that creating a SOAP/RSS hybrid would be simpler than most people think when document literal encoding is applied. Most developers are familiar with the RPC encoding, which does have more overhead and issues, particularly in this case. Encoding document-based content into RPC forms is part of the reason why the XML-RPC based solutions struggle. Sam demonstrates the notion of a SOAP interface with RSS at the end of his essay. Some additional reading on the matter is here, here and here. In comparison, SOAP/RSS is not that much more complex than just RSS over HTTP (the pure REST way).
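To give a feel for how little the document literal style adds, here is a minimal sketch that drops an RSS item into a SOAP 1.1 envelope unchanged. The function name wrap_in_soap and the sample item are my own; this is not the interface Sam demonstrates in his essay, just an illustration of the general idea.

```python
import xml.etree.ElementTree as ET

# The SOAP 1.1 envelope namespace.
SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"


def wrap_in_soap(item_xml):
    """Place an RSS item, as-is, inside a SOAP envelope body (document literal)."""
    ET.register_namespace("soap", SOAP_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    # The item is appended as a document, not re-encoded into RPC parameters.
    body.append(ET.fromstring(item_xml))
    return ET.tostring(envelope, encoding="unicode")
```

The payload inside the body is the same XML you would send over plain HTTP; the envelope is the only addition.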

SOAP does have its place. For example, wouldn't it be helpful to post to your weblog via an email message in a standard way? (This assumes the client is handling the SOAP encoding of course.) I think, yes.

In recent weeks a significant amount of discussion has been ongoing as to the future of Weblog APIs. At issue is that there are two similar, but different Web service APIs in use – the Blogger and MetaWeblog APIs. Within each of those APIs are various interoperability and implementation issues and even some extensions. The community clearly wants one tool-agnostic API that all can utilize and integrate tools with, but there are differing views as to how this will and should happen. Full post on my O'Reilly Weblog.

A very productive discussion of an RSS profile has continued throughout the weekend and into this morning. Many have dived in enthusiastically – myself included. I still maintain that an important consideration has not been discussed and needs to be: what are our design goals with this profile? (More specifically, questions like the one I raised in my previous post: at what point does the specification stop and extensible modules begin?) This is a bit of a concern to me. I don't think there is a clear picture, even if a certain level of consensus is apparent in specific decisions. I am not so much stuck on a specific set of constraints as I am on purposeful and consistent decision making.

I think some discussion and clarity on this would be productive in furthering this discussion and its results.

Ben Trott has posted his thoughts on the issue of context in which an RSS profile should be developed laying out some of the type of information that needs to be considered.

The following is something in between that I am putting out there as a conversation starter. It's loosely based on the design goals I set in the XSS profile I proposed last fall. I think they also coincide with what has been discussed thus far. Comments are welcome. Please use Sam's weblog here.

[UPDATE: Sam has opened a new thread on my post. I have modified the link above accordingly.]
  • Balance ease of use and authoring with ease of consumption by applications while maintaining the richness necessary to extend and adapt to various weblogging tools[1] and beyond. (In short, simplify and clarify what we have and how to extend and evolve it.)
  • Promote best practices of emitting well-formed and useful RSS feeds.
  • Design around a simple common core of elements (title, link, description, channel, item) that may be extended with namespaces and modules.
  • Opt for namespaced elements in modules over tags in the default namespace. (i.e. dc:date over pubDate, dc:subject over category, etc.)
  • Design to maximize backward compatibility with the RSS 0.91 format and its descendants, though some discontinuities may be unavoidable in achieving long-term benefit. This allows the profile to leverage the existing install base of 0.91 feeds and prior bodies of work such as Dublin Core meta data and RSS 1.0 modules.
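To make the "common core plus namespaced modules" goals a little more concrete, here is a minimal sketch in Python of a feed built from the five core elements with dc:date and dc:subject standing in for pubDate and category. The function name, the entry dictionary keys and the version attribute value are assumptions for illustration only; nothing here is part of any agreed profile.

```python
import xml.etree.ElementTree as ET

# Dublin Core element set namespace, bound to the conventional "dc" prefix.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def build_feed(channel_title, channel_link, channel_description, entries):
    """Build a feed from the common core, with meta data carried in dc: elements.

    `entries` is a list of dicts with hypothetical keys:
    title, link, description, date, subject.
    """
    rss = ET.Element("rss", {"version": "2.0"})  # version value is an assumption
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = channel_title
    ET.SubElement(channel, "link").text = channel_link
    ET.SubElement(channel, "description").text = channel_description

    for entry in entries:
        item = ET.SubElement(channel, "item")
        # Core elements live in the default (empty) namespace.
        ET.SubElement(item, "title").text = entry["title"]
        ET.SubElement(item, "link").text = entry["link"]
        ET.SubElement(item, "description").text = entry["description"]
        # Meta data comes from the module rather than pubDate/category.
        ET.SubElement(item, "{%s}date" % DC_NS).text = entry["date"]
        ET.SubElement(item, "{%s}subject" % DC_NS).text = entry["subject"]

    return ET.tostring(rss, encoding="unicode")
```

A consumer that only understands the core can ignore anything with a namespace prefix, which is the backward-compatibility argument in a nutshell.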

[1] I understand what Ben is getting at when he writes as soon as we start discussing a core profile for RSS, we need to define the context in which that profile applies. I'm not sure if Weblogging is the correct term to apply to the context at hand. It seems that this profile would be quite applicable to other domains like traditional online news sites and publications such as CNet, the BBC and so on. Just a thought.

RSS Profile Feedback.

As the discussion of a unified RSS profile has continued on Sam's weblog, Microsoft's Don Box has offered an initial proposal of an RSS profile for comment. (Don has only addressed items so far.) Separately, Ben Trott has posted some of his own thoughts on what the format should be.

I really think before this goes further an important issue needs to be addressed. At what point does the specification stop and extensible modules begin?

For instance, looking at Don's item proposal, you have pubDate, comments, category, and author elements. The Dublin Core (dc) module has elements for date, category (subject), and author. The dc module also has a lot of other elements not included in this initial proposal. As I read Don's proposal, either a new meta data module must be developed that is Dublin Core sans the overlapping elements (which is silly), or we end up designing overlapping and redundant elements (which is silly AND confusing). Some tags need to be deprecated in order for RSS to move forward and achieve its greater goals.

I still maintain the best approach overall is going to a simple basic core with modules (many of which have already been developed) to easily extend the format based on the context in which it is being used. It would raise RSS feed quality and make feeds more neighborly for users.

Getting more specific on Don's proposal…

I'm liking the use of xhtml:body more and more. I wasn't sure what I thought of it initially, but it's growing on me. I applaud the call for the description to be for excerpts only and not contain markup.

Instead of title and description being an either/or, where you use the description if the title is missing and use the title if the description is missing, revert to the rules set in the 0.91 spec. A title such as "Don Box has proposed a profile…" derived from a description is not nearly as helpful to me in a summary view as a proper title like "An RSS 2.0 Profile". If you are the type of person who writes many small entries and doesn't want to title them, you could use something like "tima thinking outloud. May 11 2003 20:12 -5:00". (Blog name and timestamp.)

-I don't agree with the proposed definition of the guid or link elements. The vast majority of feeds use the link tag to point to the site the feed originates from. What value is gained from this change? No one who seems to understand the concept of a URI has ever been able to explain to me why we need a guid element to replace the link tag. What function does the link tag as used today not provide?-

UPDATE: Just as I posted this I see that Don has posted a version 0.3 in the meanwhile where the delta states Embraced <link> as the one true URL container :-)

Looking forward to the continuing discussion.

Wind'em up and watch'em go.

Via Sam Ruby: Dave Winer is calling for a profile of RSS 2.0 that blogging tools could strictly comply with in order to achieve interoperability. I would have preferred that we just had a specification that resolves the issues, but I'd favor an RSS 2 profile. I proposed one when more of the forking and flaming was going on last fall. The profile I proposed is not perfect, but it maintains a high level of backward compatibility with RSS 0.91, which is still the most widely used format. Whatever the case, this should be an interesting comments thread to follow.

Wind'em up and watch'em go.

(Cross-posted on my O'Reilly weblog. )

In a Sun InnerCircle publication Sun Chief Technology Evangelist Simon Phipps writes:

Still, despite wide consensus, the technologies usually associated with Web services are not actually standards or recommendations of any open standards organization. To the surprise of many, Web services are not just about SOAP and things that start with WS-*, as some vendors would like you to believe. Some of the most widespread Web services today — for instance, those in use by the fast-growing Web-logging ('blogging') community — are based on other technologies like RSS and XML-RPC.

I'm glad to finally see a major technology vendor acknowledge that SOAP etc. is not all there is to Web services and that RSS is a legitimate technology in that space. Given its emerging uses, RSS is not just a format for syndicating content. As I've written in the past and others have noted, RSS feeds do qualify under the principles of the REST architectural style that the Web was built on.

Simon's mention of weblogging raises a question I've been meaning to ask for some time. Are there any Sun employees blogging? Microsoft and IBM have a handful that I know of off the top of my head. Macromedia is the model corporate blogging citizen with some significant top brass ( Kevin Lynch and, until recently, Jeremy Allaire ) making regular posts.

If I haven't overlooked any Sun employees blogging, this is quite an oversight. Microsoft is not the only company that could use a human face.

UPDATE: In my original post I omitted that I think Apple's Safari developer extraordinaire David Hyatt weblogging his work and views is marvelous and another great example of blogging's potential in these firms. David's communications with his existing and potential user base are not only interesting, but provide me with a sense of understanding and confidence in his (and Apple's) work.

Also, some Sun employee blog sightings are coming in via private email and the comments on my O'Reilly weblog entry. One of these is Simon Phipps's own personal weblog.

Lastly, you may have noticed that I ignored Simon's mention of XML-RPC. While it certainly has its uses today, the limitations I've come to know lead me to not support its proliferation going forward. Its flaws are serious. Today, in randomly surfing around, I happened upon this rant by Charles Cook on XML-RPC and weblogging APIs that covers some of the significant shortcomings quite well, and figured I'd share the link and state my view.

Text Processing Innards.

Mark Pilgrim's latest Dive Into XML column has been published. Mark details how the RSS Validator is architected to process RSS and identify problems even if it's malformed, invalid XML. (Mark promises something other than RSS next month.) Reading this article, a few comments came to mind.

Mark never covered the topic of unneighborly RSS – RSS that is perfectly legal by the RSS spec (or lack thereof), but causes logical errors, garbled display or in some cases causes receiving applications to just choke. (Oh wait, I did that already. ) My real wish is for the RSS Validator to provide warnings about these unneighborly practices along with tips to rectify the issues.

The system that Mark uses to parse a file is of interest to me currently and quite timely. In working to refactor the TikiText engine to be easier to extend and more efficient to run, I've actually been considering a similar approach. I originally based the code loosely on Text::WikiFormat and other Wiki implementations. Generally speaking, a series of individual regexes (regular expressions) are passed over the same string. Order of operations becomes critical and sometimes, as I've found, conflicts are irreconcilable.
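A small sketch in Python of why order of operations matters with this approach. The two markup rules are hypothetical and are not TikiText's actual syntax; the point is only that each pass sees the output of the previous one.

```python
import re

# Hypothetical wiki-style rules (pattern, replacement); not TikiText's syntax.
BOLD = (re.compile(r"\*(.+?)\*"), r"<b>\1</b>")
ITALIC = (re.compile(r"/(.+?)/"), r"<i>\1</i>")


def apply_passes(text, rules):
    """Run each regex over the whole string in turn, feeding output forward."""
    for pattern, replacement in rules:
        text = pattern.sub(replacement, text)
    return text


source = "*bold* and /italic/"

# Italic first, then bold: works as intended.
print(apply_passes(source, [ITALIC, BOLD]))
# <b>bold</b> and <i>italic</i>

# Bold first: the italic pass now trips over the "/" inside "</b>".
print(apply_passes(source, [BOLD, ITALIC]))
# <b>bold<<i>b> and </i>italic/
```

Reordering fixes this pair, but with enough rules some other pair starts colliding, which is where the irreconcilable conflicts come from.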

So while it works to a degree, it has become clear that this approach is flawed and that, in terms of processing, TikiText is not much different than XML. I've been restudying Robert Cameron's REX, a regular expression XML parser, that is the basis of the XML::Parser::Lite module I know all about. I've always marveled at REX because, when expanded, it's the single longest and most complex regular expression I've viewed. It actually works quite well assuming you don't hit your head on the limitations of Perl's regular expression handling of Unicode characters. The problems of my bad hair day were introduced during the implementation of REX into XML::Parser::Lite.

Currently I'm working on creating a couple of large regular expressions that create a stream of tokens that are passed (sometimes recursively) to various handlers and buffers. With all of my free time (ha. ha.) I hope to have something working by the end of the weekend.
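Roughly, the idea looks like the following sketch in Python: one combined expression splits the source into tokens in a single pass, and each token is handed to a handler that may recurse into its contents. The markup rules and names are hypothetical and only illustrate the shape of the approach, not the TikiText engine itself.

```python
import re

# One combined expression; each alternative is a named token type.
# Hypothetical markup: *bold*, /italic/, anything else is plain text.
TOKEN = re.compile(
    r"\*(?P<bold>[^*]+)\*"        # *bold*
    r"|/(?P<italic>[^/]+)/"       # /italic/
    r"|(?P<text>[^*/]+|[*/])"     # a run of plain text, or a stray marker
)


def tokenize(source):
    """Yield (kind, value) pairs in document order."""
    for match in TOKEN.finditer(source):
        kind = match.lastgroup
        yield kind, match.group(kind)


def render(source):
    """Dispatch each token to a handler; inline handlers recurse."""
    handlers = {
        "bold": lambda value: "<b>%s</b>" % render(value),
        "italic": lambda value: "<i>%s</i>" % render(value),
        "text": lambda value: value,
    }
    return "".join(handlers[kind](value) for kind, value in tokenize(source))
```

Because tokens are only ever matched against source text and never against generated markup, one rule's output cannot be misread by another rule.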

CVS2RSS: This is, of course, what RSS was invented for: Kellan's CVS 2 RSS - it generates an RSS feed of CVS checkins. (via Ben Hammersley via Jeremy Allaire's Weblog )

This is neat and a great example of an RSS-based Web service. I'll note that Jon Udell isn't the only one that has been speculating about RSS' expanding role beyond content syndication and into the broader space of Web services. Joe Gregorio, DJ Adams and myself are among those who have also been speculating for some time. Interestingly this script uses the RDF-happy RSS 1.0 rather than RSS 2.0. RSS 2.0 is of course not compatible with 1.0, but could be with 0.91 if you choose… (Shouldn't we have resolved this by now?)

Then again this all could be moot since Interwoven has been granted a patent for using versioning systems to create web sites (via SixLog ). This is complete rubbish and yet another absurd patent awarded. As Ben Trott points out many a web site developer/publisher has been using CVS for just this purpose.

Perhaps the US Patent Office would benefit from weblogging these things before they grant a claim and create another potential legal mess.

Parsing RSS Without A Parser.

The latest of Mark Pilgrim's Dive Into XML column is out. This month covers the topic of coping with invalid XML disguised as RSS feeds without an XML parser. Before you get too excited Mark concludes:

Hopefully we're trying to use a real XML parser first and only falling back on this messy regular expressions-based sgmllib parser when that fails. However, in flagrant abuse of all things pure and sacred, I have managed to extend this script into a full-fledged parse-at-all-costs RSS parser that supports all the advanced features of RSS, including namespaces. It even handles exotic variations of RSS 0.90 and 1.0, where everything is explicitly placed in a namespace (even the basic title, link, and description tags). I don't recommend it, but it works for me.

Mark makes excellent observations as he presents his case and shows that he has the balls to write an article for XML.com demonstrating how to parse an XML-based format without an XML parser. (His words from his weblog.)

At the same time this article is more of the same news we RSS-aware people have already heard. I suppose it can't be reiterated enough.

Incidentally, it was Mark's initial observations on RSS and the release of his ultra liberal RSS parser that led me into my foray with RSS that still curses me to this day.

I will still foolishly continue to advocate well-formed feeds and hope for the day when only 1% of feeds are malformed.

RSS: The Web Service We Already Have.

DJ Adams writes:

It seems that beyond carrying syndication information, RSS is a very useful and flexible way to get all sorts of application data pushed to a user over time. In the same way that a web browser is a universal canvas upon which limitless services and information can be painted, so (in an albeit much smaller way) an RSS reader/aggregator might also find its place as an inbox for time-related delivery of all sorts of information.

Amen DJ! In my prior weblog post Web Services We'd Like To See, I wrote: Whether it is just assumed or simply overlooked, RSS is the most widely deployed Web service across the Internet. Granted, most RSS feeds have very simple interfaces with almost as simple backends that are unlike the Web services that usually come to mind. (Who says Web services need to be complex or sophisticated anyhow?) Under the principles of the REST architectural style that the Web was built on, RSS feeds do qualify. Consider that any site search engine becomes a Web service if it could emit results in RSS and the format's potential in the realm of Web services becomes more apparent. It is because of this perceived potential that I've been an advocate of getting the RSS format's house in order.

Numerous instances of the format's value in service delivery are popping up in experiments around the network. Sam Ruby's experiments with automatic linkbacks. Joe Gregorio's RESTful interface for publishing to weblogs. (As an aside, Joe and I are co-conspirators in starting a mailing list to discuss these very notions. Come one, come all.) DJ integrates SOAP and RSS web services in his experimental booktalk application script. Matthew Langham posted in his O'Reilly weblog about the use of RSS in delivering business data. There is also the on-going discussion of remote commenting and tracking using RSS.

I'm happy to see the message spreading. Today Jeremy Allaire writes in An Explosion of Web Services?:

As I've been reading and writing today I've come to a somewhat obvious conclusion: there's been an explosion of 'web services' in the past year, and it has nothing to do with SOAP, WSDL and such standards as described in the industry but with the ascending role of RSS and RDF as XML data and syndication formats.

Lots of industry analysts have commented that 'public web services' (e.g. web services that can be accessed and used through Internet-accessible public APIs) haven't really happened. When one looks at RSS aggregation sites such as Syndic8.com it's quick to see that there are thousands of "web services" out there for people today.

In an earlier post Jeremy points to some recent thoughts on RSS published by Tim Bray where he outlines some of the issues: growing bandwidth consumption, settling on a media-type and the eventual demise of RSS aggregators as a standalone application class. He concludes:

...[RSS feed reader/aggregators] just belongs in the browser. It will take a couple of years for it to get cooked into mainstream browsers in a mature enough form to be usable, so the guys with the RSS-reader software should make hay while the sun shines and start figuring out their Next Big Thing.

He also adds anyone who does any kind of publishing software had better start offering a real-easy-to-use RSS interface and sooner rather than later or they're just not going to be in the game.

I think gord says it best when he comments I am so impassioned by this recent explosion in open and well documented interfaces. We are so near a cascade level event with all of the emergent software that is evolving. And each new generation of software raises the potential interactions at an exponential rate. I concur and look forward to where this is all going.

This is a summary-so-far, for posterity, of an on-again-off-again discussion that has been ongoing for nearly a year -- comment tracking and monitoring.

There is a fairly simple solution: comments should be available as an RSS web service and aggregators should be able to subscribe to those feeds on an on-going or trial subscription basis.

Sites with comments et al. need to provide them as an RSS web service. (They can be generated dynamically or statically -- the implementation doesn't matter.) Some already do this, others could add some form of it quite easily.

Aggregators then need to evolve to support subscriptions based on time and/or activity. (Subscribe for 30 days. Unsubscribe if a new post is not made to this feed for 14 days.) An add-on to this is that aggregators can create groups and sort on them, perhaps even offer an activity log so you know when a subscription has been dropped and so on.
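As a rough sketch of what such a subscription rule might look like inside an aggregator, here it is in Python. The function name, parameter names and reason strings are all hypothetical; the returned reason is the kind of thing that could be written to an activity log.

```python
from datetime import datetime, timedelta


def drop_reason(subscribed_at, last_post_at, now,
                max_age_days=None, max_idle_days=None):
    """Return why a subscription should be dropped, or None to keep it.

    max_age_days:  time-based rule ("subscribe for 30 days").
    max_idle_days: activity-based rule ("unsubscribe if no new post
                   is made to this feed for 14 days").
    Either rule may be left as None to disable it.
    """
    if max_age_days is not None:
        if now - subscribed_at >= timedelta(days=max_age_days):
            return "subscription period of %d days ended" % max_age_days
    if max_idle_days is not None:
        if now - last_post_at >= timedelta(days=max_idle_days):
            return "no new posts for %d days" % max_idle_days
    return None


# Example: a comment feed subscribed to 9 days ago, quiet for 19 days.
reason = drop_reason(
    subscribed_at=datetime(2003, 2, 20),
    last_post_at=datetime(2003, 2, 10),
    now=datetime(2003, 3, 1),
    max_age_days=30,
    max_idle_days=14,
)
print(reason)  # no new posts for 14 days
```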

The bottom line is there is no need for a new web app and it doesn't have to be that difficult. Ben Hammersley also sounds off here.

For further reading and the source of this simple genius follow these links:

TrackBack in motion.

Interesting work is afoot in the world of TrackBack and other related concepts.

I received an email from Aaron Straup Cope that he has put my newly released XML::TrackBack module to work. Aaron is developing an OOP-ish interface to the Internet Topic Exchange dubbed Net::ITE.

The Internet Topic Exchange site is an implementation of the Ridiculously Easy Group Forming concept. In its current form, ITE is a TrackBack repository with a twist -- participants can create channels that they and others can ping. The integration of a Wiki into the mix, albeit a loose one, is intriguing and one that has yet to be touched upon and explored.

David Raynes has been working on two concepts based on TrackBack infrastructure that he calls ComeBack and Post-It. Post-It allows users to publish whole entries to a MovableType weblog while ComeBack enables distributed comment authoring. I tested both with some basic test scripts using XML::TrackBack. Post-It works without issue. ComeBack uses a slightly different interface that returned an error when pinged. David has now integrated the two in one site achieving a forum-like effect where a user can make a post and others can comment on it.

The notion of a remote commenting interface that ComeBack represents is an intriguing one. This is a topic I will return to in a later post. Too much to write about here. The value of Post-It is not as apparent to me. As a publishing API, Post-It bears a great deal of similarity in principle to the RESTLog API. I'm not entirely sure about the value of the free-for-all posting that it enables via TrackBack.

Yesterday, Ben Hammersley was his own guinea pig as he attempted to implement TrackBack threading on his site. Ben had to retreat for the time being and shares his learnings in a later post here.

Sam Ruby recently put a different spin on Mark Pilgrim's automatic linkbacks system by utilizing RSS feeds as its source of excerpts. Sam explains: To participate, you don't need to use weblogging software that supports trackback or pingback, you simply have to update your templates to have a link to your RSS feed. In a follow-up post he reasons: I actually experimented with mark's code for a bit, but the biggest problem I had was that it looked like it would require continual investment to weed out the ever growing number of portals and personal aggregators. I was also concerned about the feedback loop that could occur given the amount of back traffic I get whenever I mention anything on Mark's page.

Shelley Powers continues to advance something she calls BackTrack on top of TrackBack information. In this post she explains its purpose: In each individual posting page is a section labeled with Sticky Strands and listing all of the TB pings the posting issued. The functionality I added today takes those pings, follows them back to the posted weblog, and then lists all of the trackbacks that weblog posting has received. Sam Ruby has joined in.

Both are excellent ideas that underscore the increasing value (and necessity) of meaningful titles and excerpts.

This experimentation has all been very intriguing and worthwhile in our discovery and understanding of the network and social effects of two-way hyperlinking systems. In reviewing this work I'm beginning to see some emerging issues and topics coming into focus. (In no particular order.)

  • Extensibility of TrackBack. How should this be achieved without breaking some semblance of interoperability? For instance, I was unable to make a ComeBack post with XML::TrackBack because email has been added, excerpt renamed comment and blog_name renamed agent. All are required. So a TrackBack enabled tool cannot interoperate with a ComeBack interface, but does it have to be that way? It would seem not, if these situations were examined with an eye toward developing a standard.
  • What is the appropriate use and display of these various mechanisms? What improves usability and what degrades it? In commenting on Ben Hammersley's TrackBack threading experiment I wrote it seems the time is near, even here, where we need to begin discussing when is it appropriate/useful to use these different mechanisms and how are they best presented. Another case in point: since implementing a number of these mechanisms, Sam Ruby's comment boards have been filling up with various links and excerpts to the point it's becoming hard to grok.
  • Integration of RSS and a consolidation of efforts. Post-It uses a superset of TrackBack. ComeBack was based on TrackBack's infrastructure and has a very similar interface, but breaks compatibility. Post-It is quite similar to RESTLog. All make use of RSS or RSS-like structures, including Sam Ruby's automatic linkbacks and, let's not forget, the MLTFO (More Like This From Others) effort that happened over the holiday season. One thing is becoming clear: RSS is bloody important and highly useful and far more than just a way to read news outside of the browser so we can stick it to the BigCos.
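To illustrate how small the interoperability gap in the first point is, here is a sketch in Python of the two request bodies side by side. The TrackBack field names are the standard ping parameters; the ComeBack side uses only the renames and the added email field described above, and everything else about that interface (including these function names) is an assumption.

```python
from urllib.parse import urlencode

# Standard TrackBack ping parameters, sent as a form-encoded POST body.
TRACKBACK_FIELDS = ("title", "url", "excerpt", "blog_name")

# Renames as described above; the rest of ComeBack is assumed, not documented here.
COMEBACK_RENAMES = {"excerpt": "comment", "blog_name": "agent"}


def trackback_body(ping):
    """Form-encode a ping dict using the TrackBack field names."""
    return urlencode([(name, ping[name]) for name in TRACKBACK_FIELDS if name in ping])


def comeback_body(ping, email):
    """Translate the same ping into the renamed fields and add the email field."""
    fields = [(COMEBACK_RENAMES.get(name, name), ping[name])
              for name in TRACKBACK_FIELDS if name in ping]
    fields.append(("email", email))
    return urlencode(fields)


ping = {
    "title": "An example entry",
    "url": "http://example.org/archives/000001.html",
    "excerpt": "A short excerpt of the entry.",
    "blog_name": "Example Weblog",
}
print(trackback_body(ping))
print(comeback_body(ping, "author@example.org"))
```

A thin mapping like this is all that separates the two, which suggests the break was a matter of naming rather than of substance.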

The subtle and underlying theme I draw from all of this is that the RESTful interfaces that are inherent in the Web's design work and have yet to be fully explored.

Here is to experimentation, innovation and evolution.

Dive Into XML Debuts.

Mark Pilgrim's monthly column on xml.com debuted yesterday with "What is RSS?" Mark's article is well done as expected and a great primer for beginners.

Since I'm not the rock star Mark is, I'll point out to those who missed it my O'Reilly article on RSS and my behind the scenes story.

WSIL meets RSS.

Since writing An Introduction to WSIL for O'Reilly, I have been occasionally giving thought to interesting ways this under-utilized specification could be applied.

In that article I said, "In many ways, WSIL is like RSS for Web services. RSS is a file format with pointers to published content that can be syndicated and aggregated. WSIL is a file format with references to published Web services that can be discovered and bound."

I find WSIL intriguing because of its simplicity, and because its lightweight implementation is more RESTful than UDDI. WSIL leaves the processing logic to the developer and makes its information trivial to access, creating the potential for innovative and novel applications to arise.

More recently I have asserted that RSS syndication feeds are Web services and perhaps the most widely deployed Web services across the Internet. Most RSS feeds have very simple interfaces and backends, comparatively speaking. However, under the principles of the REST architectural style that the Web was built on, RSS feeds do qualify. Take for instance Joe Gregorio's experiment, the RESTLog API, a RESTful XML-over-HTTP interface using RSS syntax, and you should be able to see what I mean.

These two assertions got me thinking about how WSIL and RSS relate and complement each other. Since it's been over a year since WSIL was introduced with little progress, I have an experimental proposal to make that demonstrates some of the potential I see between Web services and RSS syndication.

If RSS feeds are Web services and WSIL is a format designed to facilitate the discovery and aggregation of Web service descriptions, couldn't WSIL be used to discover RSS feeds?

What I propose is utilizing WSIL to create collections of related syndication feeds (presumably from one site) and links to external collections or feeds in one file. Putting this in weblogging terms, combine RSS auto-discovery and channel rolls into one file format. I've created an initial sample of what this file would look like here.

I won't go into the basics of the WSIL format here. (You can read my O'Reilly article for that.) However the WSIL format was not specifically designed with this usage in mind requiring some additional clarification and guidance.

In my example I point to services (aka feeds) my weblog produces. service.name maps to channel.title and service.abstract to channel.description. service.description@location would be the URI of the syndication feed.
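A sketch in Python of how an aggregator might read that mapping out of a WSIL file. The sample document, its URLs and the referencedNamespace value are made up for illustration, and the inspection namespace URI is the one I understand WSIL 1.0 to use; check it against the specification before relying on it.

```python
import xml.etree.ElementTree as ET

# Assumed WSIL 1.0 inspection namespace; verify against the specification.
WSIL_NS = "http://schemas.xmlsoap.org/ws/2001/10/inspection/"

# Hypothetical sample document; the feed URL and referencedNamespace are made up.
SAMPLE = """<inspection xmlns="http://schemas.xmlsoap.org/ws/2001/10/inspection/">
  <service>
    <abstract>Main feed for an example weblog</abstract>
    <name>Example Weblog</name>
    <description referencedNamespace="http://example.org/ns/rss"
                 location="http://example.org/index.rdf"/>
  </service>
</inspection>"""


def feeds_from_wsil(document):
    """Map each WSIL service onto channel-style fields.

    service.name                 -> title
    service.abstract             -> description
    service.description@location -> url
    """
    ns = {"w": WSIL_NS}
    feeds = []
    for service in ET.fromstring(document).findall("w:service", ns):
        description = service.find("w:description", ns)
        feeds.append({
            "title": service.findtext("w:name", default="", namespaces=ns),
            "description": service.findtext("w:abstract", default="", namespaces=ns),
            "url": description.get("location") if description is not None else None,
        })
    return feeds


print(feeds_from_wsil(SAMPLE))
```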

I also include links to syndication feeds based on my own channel roll. Ideally, the link tags (using the location attribute) would point to other WSIL files and not directly to a single RSS feed. The last link, which I have commented out, is a fictitious one to a WSIL file on movabletype.org, but demonstrates what a reference to another WSIL collection would look like. Using WSIL pointers allows applications to discover an entity's entire collection of feeds and their channel roll with one additional step. (A future path of exploration is combining "traditional" Web services and syndication feed links in one WSIL file and examining the new dimensions to social networking it may or may not enable.)

This proposal for providing RSS feed and channel roll discovery in a single file works well within the WSIL 1.0 specification, however it could use further refinement that begins to break compatibility with the format in its strictest sense.

The WSIL specification defines the location attribute as optional. In order for this usage of WSIL to be effective, location must be treated as required.

I made several criticisms of the WSIL format's initial design that this experiment illustrates. For instance, WSIL is extensible using XML Namespaces, though this extensibility is limited to the service.description and link blocks. What is lacking is richer meta data facilities in the root inspection. The WSIL 1.0 format only allows for the catch-all abstract to be applied. I propose that modules such as mod_dublincore and mod_admin be utilized in the root inspection tag. While not permitted by the WSIL specification, it is a valid use of Namespaces. Here is an example of what a WSIL file with these modules applied may look like.

The WSIL specification defines two approaches to auto-discovering a WSIL file. One is by specifying a standard file location. This added convention would allow aggregators and other agents to begin the discovery process by pinging a remote site for the existence of a file. Using HTTP, a 404 code means keep looking while a 200 means the application can continue processing. This is a more efficient use of bandwidth and processing than RSS auto-discovery because the application does not have to download and then hack out any XHTML link tags pointing to RSS feeds the associated site has to offer. When multiple feeds are offered, it also saves the agent additional trips back to the server to retrieve additional information such as the title and description of each feed. (This of course assumes that Dublin Core extensions and other meta data have been properly applied.)
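The fixed-location check can be sketched in a few lines of Python. The function names are hypothetical, the use of a HEAD request is my assumption about what "pinging" would mean in practice, and treating any status other than 200 as "keep looking" is a simplification.

```python
import urllib.error
import urllib.request
from urllib.parse import urljoin


def http_status(url):
    """Ask for headers only and return the HTTP status code (assumes HEAD is allowed)."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code


def discover_wsil(site_url, get_status=http_status, filenames=("inspection.wsil",)):
    """Return the first well-known location that answers 200, else None.

    A 404 (or, in this sketch, anything other than 200) means keep looking.
    """
    for filename in filenames:
        candidate = urljoin(site_url, filename)
        if get_status(candidate) == 200:
            return candidate
    return None
```

Passing get_status as a parameter keeps the lookup logic testable without touching the network; a caller would fall back to the embedded-reference approach when this returns None.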

One of my nits with the 1.0 format is that the fixed filename is inspection.wsil rather than conforming to the common naming convention of index.*. I suggest the use of index.wsil, as I have done with the second convention for WSIL auto-discovery.

If a WSIL file cannot be located by fixed filename, the specification optionally supports embedded references in HTML documents. This is similar in nature to the way stylesheets and RSS files can be auto-discovered with an XHTML link tag. WSIL misuses the meta tag instead. I propose that implementers use the link tag as I have.

This notion is still a bit rough, but it is my belief that this approach could blur the lines between RSS syndication and traditional Web services and create the potential for innovative applications to form.

Your comments and feedback are welcome. For those interested in discussing this proposal and other similar topics (Joe's RESTLog experiment, my RSS/XSS profile, TrackBack) I encourage you to join and participate in the mailing list I've recently opened on Yahoo Groups. (I've been meaning to make a more formal and grandiose announcement for the last couple of days, but this will have to do for now.)

XHTML as Syndication Format?

[The following are highlights of comments I made in response to Anil Dash's proposal to use XHTML as a syndication format instead of RSS. I've lightly edited the text for relevancy.]

A set of tags already exists: they're called RSS. The point of XHTML (or one of them at least) is that it can leverage the full range of XML toolkits and specification. This includes XML Namespaces that allows tags from other schemas to be included thereby extending the original from its designed purposes. Some have already been experimenting with combining RSS and XHTML tags into their pages here. (Do a view source to see what I mean.)

What Anil proposes (H3 with this class...) was done back before we had XML -- it's called screen scraping -- albeit more refined screen scraping, but screen scraping nonetheless. It all seems rather retro to me.

I don't mean to sound harsh. I really don't. It's an interesting notion that has its merits. I can understand the argument that ideally content authors shouldn't have to create two versions. I think that in practice its limitations will outweigh its benefits.

A separate syndication file should be more bandwidth efficient, especially with aggregators and the like banging away frequently on them. Aggregators have recently improved from their early days of brute force updates -- downloading a feed on some interval regardless of changes. RSS is more about data (that just so happens to be about content) where XHTML is more about display. Combining the two is fine, but inefficient in that the information necessary for one task must be ignored when the document is used for the other.

I completely object to the claim that RSS exists out of laziness, as Anil says. If a content author is "too lazy" to generate two versions of their content, I'd suggest that they author their content in RSS. You can convert RSS into XHTML more easily, efficiently and reliably than the reverse. RSS is for machine processing while XHTML is designed for display. In fact I could be really lazy as a content author and have multiple XHTML pages generated from one RSS file.
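As a rough illustration of how little work the RSS-to-XHTML direction takes, here is a sketch in Python. It assumes a plain 0.91-style feed with un-namespaced channel and item elements and plain-text descriptions; the function name and the choice of output markup are mine, not anything prescribed by either format.

```python
import xml.etree.ElementTree as ET


def rss_to_xhtml(rss_text):
    """Render a simple, un-namespaced RSS channel as an XHTML fragment."""
    channel = ET.fromstring(rss_text).find("channel")
    body = ET.Element("div")
    ET.SubElement(body, "h1").text = channel.findtext("title", default="")
    for item in channel.findall("item"):
        # Each item becomes a linked heading followed by its excerpt.
        heading = ET.SubElement(body, "h2")
        anchor = ET.SubElement(heading, "a", {"href": item.findtext("link", default="")})
        anchor.text = item.findtext("title", default="")
        ET.SubElement(body, "p").text = item.findtext("description", default="")
    return ET.tostring(body, encoding="unicode")


FEED = """<rss version="0.91">
  <channel>
    <title>Example Weblog</title>
    <link>http://example.org/</link>
    <description>A hypothetical channel</description>
    <item>
      <title>First</title>
      <link>http://example.org/a</link>
      <description>Tom &amp; Jerry</description>
    </item>
  </channel>
</rss>"""

print(rss_to_xhtml(FEED))
```

Going the other way means guessing which headings, links and paragraphs in a page correspond to items, which is the screen scraping problem all over again.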

I think of RSS files as more of a Web service than a web page. That may help provide a different perspective.

In responding to my comments Scott Andrew LePera writes:

Properly-structured XHTML is far more robust than RSS for providing syntactic structure for a Web document, and is just as machine readable. The fact that an <EM> is rendered as italicized text in a browser is completely incidental.

The more I learn about these issues, the more I become convinced that it's wrong to ask authors to jump through additional hoops to support formats for alternate endpoints like RSS newsreaders. At the end of the day, I'm paying the RSS tax through additional bandwidth and ensuring that what I put in my XHTML won't break my RSS (like matching character encodings and avoiding relative links).

Syntactic structure of a Web document is not the intended purpose of RSS. RSS was designed to syndicate a collection of online resources called a channel. Admittedly the RSS format and its documentation have been a disaster and are lacking, so let me clarify: RSS files were never intended to contain or transport HTML. If you go back to the version 0.91 format documents, you will see that the description was to be "plain text" that was required/recommended to be 500 characters or less. The <description> element was to be an excerpt or brief abstract of the content on the other side of the <link>. What has become common practice is the work of a one-man design committee and his personal agenda. Let us not confuse intent with misuse.

I'm all for experiments such as Anil's proposal, though I'm personally skeptical that it will be more successful than (proper) RSS and XHTML working together.

[UPDATE: In relation to this growing discussion, Mark Pilgrim writes in "The rebellion will be syndicated": Tantek Çelik: XHTML vs. the world. Bet on the world. Another good point.]

A RESTful Publishing API.

I thought the Blogger Web service API and its close cousin the MetaWeblog API were interesting ideas. After attempting to experiment with them a bit I came to realize that their implementations were lacking the features or flexibility to take full advantage of MovableType features. For instance, neither API supported MT's extended fields such as excerpt, "more text" and, recently, a keywords/meta data field. In the case of Blosxom, another blogging tool I use, implementing these APIs would be more substantial than the tool itself -- a disproportionate fit to say the least.

While (re)acquainting myself with the concepts of REST architecture and other related thoughts, I began to consider how they could be applied to developing a flexible and scalable interface that was better suited for a wider range of publishing tools like MT and Blosxom. The thought has been bouncing around my head for months and then I saw the makings of what I was thinking.

Joe Gregorio has opened The Well-Formed Web site and begun work on a weblogging tool (dubbed RESTLog) featuring a Web service API based on REST architecture principles. (Nice work getting something out there Joe!)

While a good piece of work, the implementation of the RESTful interface is not exactly what I had in mind though. Also at issue, the API and the tool are currently married. This is primarily why the Blogger API is ineffective: it reflects the simple functionality of Blogger and was not designed with other tools in mind. I'd suggest that the scope of the API not be limited to blogging tools. This API could perhaps be added to a Wiki tool or other like tool.

If this API were to have a life away from Joe's tool, the language currently in the API's description would need to reflect the different implementations behind the tool. For instance when POSTing a weblog entry, the current interface documentation reads "Creates a new news item and rebuilds the index.html and index.rss files..." Not all tools need to statically generate (rebuild) pages. For example, Blosxom dynamically generates its pages. Other implementations may not want (or be able) to use the index.html or index.rss names. Another such instance is the template interfaces that assume only two templates are in use -- one for HTML and another for RSS. This of course is not the case for most other tools.

The interface should allow for processing options to be passed to the receiving system. Going back to my experience using MovableType, authors can check options for automatically converting line breaks to HTML and for allowing comments or TrackBack pings. The interface would have to allow for such options to be sent and modified.

I think Joe has the right idea leveraging RSS in the interface. However, the interface needs to be more specific by setting very clear constraints and guidelines, particularly when it comes to RSS 2.0. Without additional constraints, interoperability and the reliable expectation of elements become issues because the RSS 2.0 format is too loose and ambiguous. The XSS Profile is an example of such an attempt. Perhaps it could offer some suggestions -- a simple core of mostly required elements augmented by modules via XML namespaces with all prior cruft deprecated.

While a good starting point, I am a bit skeptical that RSS will entirely suffice in this role. If the <description> element is being used to transport the body of a post, where would an excerpt go? The <description> element was originally designed to carry an excerpt of the entry. Something like <content:encoded> could be used instead, but should an element that is core be in a module? A <body> would make sense, but there is no such tag in the RSS format. Deviating from the RSS format by introducing tags specific to the task at hand is, in my opinion, likely to be necessary.

I think this effort is a good idea that has been long overdue. I'm enthusiastic to see it advance. Hopefully my comments are helpful in doing so.

Raising the Bar on the RSS Validator.

Sam Ruby links to my O'Reilly RSS article and asks "Perhaps the RSS validator should optionally issue warnings (as opposed to errors) nudging people in the directions of best practices such as the ones that Tim has outlined?"

Needless to say I agree. As I've written in the past, being valid RSS does not guarantee that a feed's content is well-formed enough to be useful to an end user. Thanks to the loose design of the RSS 2 format, which generally has made things worse rather than better, there are many perfectly "legal", but less than neighborly uses. It's why I drafted the XSS profile and it's why I support such warnings being added to the RSS Validator.

Raising the Bar on RSS Feed Quality.

Earlier this week O'Reilly published my latest article "Raising the Bar on RSS Feed Quality." In it I offer recommendations for authoring more useful and effective feeds with an approach that is neutral, practical, and conservative.

This article requires more than just a summary and link though. It was much too involved an effort to not say more.

I am no stranger to publishing, having co-produced my own indie music fanzine for several years; it eventually made its way onto the web and into the My Netscape Network that started it all. What spurred the interest that eventually led to this article was a conversation I've highlighted here before.

"You see," lamented Mark Pilgrim, "most RSS Feeds Suck." Mark's comments couldn't have been timelier when first published. I had been experimenting with ways to streamline my intake of weblogs and news using Rael Dornfest's lightweight Perl aggregator blagg. (I now have taken on furthering an RSS feed plugin for MovableType and have other related projects in the works.) While I had achieved a certain level of success, I was surprised and taken aback by the varying quality and inconsistencies of RSS feeds that made my solution less than optimal and at times unreliable. I stopped reading some weblogs because their feeds were too poorly done and simply not useful or worth the hassle. I'm certain I am not the first.

Mark's solution for the technical issues was to develop and publish an "ultra liberal parser" that would allow for common mistakes and other anomalies while processing output.

Joe Gregorio, developer of the Aggie RSS news aggregator, agreed with Mark's assessment, but questioned if Mark's solution was too liberal. "...where is the motivation to fix those feeds?" he asked.

Mark followed with a very noteworthy response: "I. Was. Missing. News." End users don't care about standards. They care about, in this case, getting their news. Developers care about standards because they help developers.

Both viewpoints expressed are valid, important and symbiotic. Without the predictability and structure that standards provide, application developers will struggle to reliably deliver content from feeds to end users.

This exchange only focused on the technical issues of consuming a feed. It does not address other issues that can detract from a feed's utility and effectiveness such as the absence of basic elements or a lack of descriptive and meaningful content that are sadly quite prevalent. (Hence my article.)

Not to sit idly by, Mark along with Sam Ruby and Bill Kearney developed the RSS Validator service that checks RSS feeds for problems and generates friendly and instructive messages for fixing them. The service is optimized for RSS 2.0, but supports other versions of the format. This recent development is significant because it provides a much-needed tool for alerting publishers to issues in their syndication feeds. More work is still needed.

As I mentioned, this article was involved. Not necessarily the article itself, but all of the discussions, debates and projects it led me to. In addition to the mt-rssfeed plugin that I completely rewrote, I got involved in the great RSS "war" back in September. I lobbied for RSS 1.0 reform with a simpler RDF-based format and learned a bit about the shortcomings and merits of RDF. I also drafted a "more sane" RSS 2 profile dubbed Extremely Simple Syndication or XSS when it became apparent that RSS 2 was going to be one step forward and two steps back. I also learned about the proper use of XML namespaces in developing my own liberal parser (for the plugin), the shortcomings of the XML::Parser::Lite module (the hard way), using HTTP ETags and Last-Modified, and all about proper XML encoding. Currently I have a plugin for MovableType that does proper XML encoding/decoding (UTF8, CDATA...) nearing release and version 1.1 of the mt-rssfeed plugin on the drawing board.

Now that the article is finally out after nearly 3 months (it's a long story I won't go into), I'm looking forward to finishing some of these projects off -- at least for a while. I'd like to turn my attention to something a bit different since I'm a technology generalist.

It's been a long strange trip.

RDF Follow-Up.

The recent "what's wrong with RDF?" discussion has been highly enlightening to watch from the sidelines. It clarified some of the issues RDF has yet to address adequately and put other aspects into perspective for me.

Over at O'Reilly, Simon St. Laurent follows my summary with "What's right with RDF" where he writes "RDF is excellent at addressing a particular set of problems. The Resource Description Framework's primary approach is description. XML often presents something (a document, a table) directly; RDF more typically presents a description of something, not the thing itself. For some applications - like metadata and ontology development - this approach fits beautifully with the problem set." He continues "if RDF fits your problem set, run with it. If it doesn't fit, fight it - for that problem set. We're not all going to be happy all of the time, but RDF's strengths should not be forgotten in the arguing."

Also over at O'Reilly, Kendall Grant Clark reviews Tim Bray's proposal for an alternate RDF/XML serialization, called RPV, that is unambiguous and highly human-readable. The goal of RPV is not to support the full RDF specification, but rather the most common elements. Bray explained, "What RDF needs is the equivalent of XML, a brutal reduction (at least at the syntax level) that hits 80/20 points and anybody can figure out in 15 minutes by looking at it." He continued "I'm not saying RPV is the way to go, it's just a challenge: it proves that there is a way to encode resource/property/value triples in XML that is human-readable and human-writeable."

Shelley Powers makes some final clarifications on her comments during the debate.

...I do not discount the complexity and difficulty inherent with RDF. I am aware, all too aware, of how complex the RDF Model documents can be. I know that there is much of the lab and not enough of the real world associated with the effort. And I'm not trying to dismiss people's concerns with the model or the RDF/XML serialization when I say that we need to release the RDF specification rather than start over.

When I say that I don't have problems with the RDF/XML, people should be aware that this is because I spent an enormous amount of time with the RDF specifications learning the core of the RDF model. I then spent a considerable amount of time learning how RDF is serialized with RDF/XML. I will now spend a significant amount of time reading through the newly released specifications to see where my understanding differs from the newest releases.

All of this has taken time and effort. I do not deny this.

I sincerely appreciate Shelley's honesty, effort, patience and passion in this recent debate. My personal encounters with RDF advocates/experts have been lacking and generally unsatisfactory. They've lacked a sense of clarity or an acknowledgment of the realities of the "real world." Shelley's comments provided the perspective and sense of mutual understanding I wish most of the RDF community would exercise. (I don't hold this against RDF though.)

Shelley writes "...I'm not speaking for the RDF Working Group, in any way. I am giving my own viewpoints and opinions, which the WG may not agree with. No one can speak for the WG members, but they, themselves."

I wish Shelley did speak for the RDF working group. I found it rather odd, almost disconcerting, during the whole affair that few (any?) members of the RDF working group got involved. I would feel even better about this recent discussion hearing their viewpoints and knowing they are listening and have taken heed.

Commenting on Shelley's post Simon St. Laurent writes:

Thanks, Shelley.

You've brought some really difficult issues to a much broader group of people than the usual suspects, and I'm hoping that we'll see some interesting results over the next few months as people think about the questions you and other participants in the discussion have raised.

I doubt we'll all be living in peace and harmony by then, but we might at least have a better perspective on what we all see going on with these technologies.

I agree.

What's Wrong With RDF?

"What puzzles and confuses me is why there is so much animosity towards RDF" writes Shelley Powers, author of O'Reilly's upcoming book on RDF.

Shelley's post was made in response to Tim Bray's attempt to implement an RDF model into the RDDL specification that ultimately led to his recommendation to use XLink instead. Bray's comments were picked up throughout the community, unleashing a torrent of criticism and "animosity" directed at RDF. Jonathan Borden summarizes the significance of Bray's comments when he wrote, "this is the crux of the problem. If Tim Bray can't do RDDL/RDF using his little toe, with his hand tied behind his back and the rest of him hog tied and upside down, then what prayer to we have trying to foist this upon the rest of the world, i.e. people who just want to create and document XML namespaces?"

Shelley Powers' post to the xml-dev mailing list touched off a heated discussion late last week that continued across mailing lists and weblogs through the weekend. In this weblog post I will attempt to highlight and summarize this conversation. I have ordered the comments so that they make sense and roughly, though not entirely, follow chronological order. I have also tried to compensate for the distributed and parallel nature of the conversation in order to maintain some semblance of its flow.

"I am particularly unhappy because of Tim Bray's involvement in all of this," wrote Shelley Powers on her weblog. "There's an implication and an assumption made that because Tim Bray 'invented' XML, he's qualified to be a definitive judge of RDF and RDF/XML. However, the two efforts are not the same: XML deals with meta-language, RDF with meta-data. Tim has a right to his opinion, and I don't fault him for it though I don't have a tremendous amount of respect for his half-hearted and rather dubious effort to use RDF/XML to model RDDL."

(Jonathan Borden subsequently posted an RDF-compliant RDDL format to demonstrate that a human-readable RDDL format could be created with an RDF model.)

Shelley offered some advice to anyone put off by RDF: "If you don't understand it, and don't want to take the time to understand it, or don't feel it will buy you anything, or hate the acronym, or you're in a general bitchy mood that's easily triggered if someone uses "Semantic" in the same sentence that contains "Web", the solution is simple: don't use it. Don't use it. Don't study it, look at it, listen about it, work with it, sleep with it, or generally go out and dance late at night with it."

She also notes "However you may feel about RDF, the spec, or RDF/XML, the serialization, I would hope that you all remember one thing: in the last few days, the RDF Working Group has released not one, not two, but six new working drafts. Six. That's a hell of a lot of work."

(See this post for more on these latest RDF drafts from the W3C's Semantic Web Activity.)

Simon St. Laurent writes "I have a lot of respect for certain RDF applications that appear to be working, a general lack of interest in describing the world as graphs, and a serious distaste for RDF syntax. I genuinely resent what I see as the unfortunate influence of RDF on XML's post-1.0 development and the URI-centric viewpoint it has foisted on XML."

Simon later went on to say "RDF is powerful stuff, great for those who want to use it. Just keep it off _my_ dance floor, please."

Tim Bray responded "I'd go further. I think the current RDF/XML syntax is so B.A.D. (broken as designed) that it has seriously got in the way of people being open-minded about RDF. I'm baffled why the RDF working group has been forbidden to work on replacing that syntax."

In response Shelley Powers posted "because, Tim, there are implementations of RDF/XML as described, including Mozilla and RSS 1.0. I know you don't approve of them, but they are real, they are production, they are in use. Bitch about them as much as you want, but people use them."

On the comment board to Shelley's weblog Mark Pilgrim offered his take. "The fundamental flaw of the overzealous RDF advocates is the implicit assumption that "because I want to work with this data as RDF, it must be produced and stored natively as RDF." This is demonstrably false, and is what people are objecting to when they talk about "the RDF tax"."

Joe Gregorio published similar sentiments, "...my animosity comes from a push by possibly overzealous RDF proponents to change every format they come in contact with into valid RDF serialized as XML. I can point to RSS 1.0 and now the abortive RDDL as RDF attempt as failures of that strategy. On the other hand I can point to the use of RDF in Mozilla as a successful strategy of *leaving the native format alone* but still getting the benefits of RDF, as I pointed out Wednesday."

(Gregorio later published that Mozilla's use of RDF is smaller than first believed according to an OSAF mailing list thread.)

Gregorio continues, "I think a healthy dose of skepticism and a critical eye turned on it by people outside of the usual circle would be helpful to RDF, and it certainly couldn't hurt the XML serialization."

Elsewhere Tim Bray offered: "<famous-anecdote>Stuart Feldman, the Bell Labs guy who invented "make", woke up one morning a few weeks after he'd released it, and realized that the syntax basically sucked - all those tabs and colons and weird continuation rules. He started working on something better and was shot down because someone said "Stuart, there are *dozens* of people using this, it's too late to change it."</famous-anecdote> I think the number of people who are now using RDF is comparable, in relation to the number of people who need something like RDF, to the couple of dozen make users in 1970-something. It is *not* too late to fix the RDF syntax, it just takes some courage and initiative."

Shelley Powers responded:

"Yeah, but who is to say that [Stuart Feldman's] new approach would have been better? We can work and work and work a spec until we're blue in the face and not find a perfect solution. People learn to work the situation, or they learn to automate it -- i.e. autoconf, automake, and libtool.

Tim, we need the [workgroup] to finish. We have been waiting over a year for them to finish. We need something stable that we can work with. We do NOT need to start all over again. I would pack it in at that point. I really would."

Responding to a comparison of RDF now to HTML in the early days of the Web, Bray wrote, "HTML was by no means "bad". It was exactly what the world needed, and millions of people started using it because they liked it and because they could do "view source" and figure it out. My gripe with RDF/XML is precisely that it's failing to learn this lesson from HTML's success. Thus not enough people are using it, even though it's arguably what they need."

Shelley Powers notes, "the RDF Working Group was given a charter not to rewrite RDF/XML but to answer issues and provide as much cleanup and clarification as they could but to still remain within that support for previous implementations. It's sad that one can't just throw things out and start over again, but that's the way of the real world."

To that Bray responds "No it's not and yes you can, and you should."

Elsewhere Mark Pilgrim wrote in response to similar comments by Shelley, "you're hurting yourself more than anyone else by defending the status quo. You have a lot invested in RDF (the theory), and it'll all go to hell. The rest of the world will remain blissfully unaware that there was this great idea here, buried under mounds and mounds of incomprehensible angle brackets."

Tim Bray also wrote "The proponents of RDF (including myself) say that RDF's value add is that it allows the efficient interchange and manipulation of [Resource, Property, Value] triples. I happen to believe this propaganda and I also believe that one of the obstacles is the human-incomprehensible syntax. If you believe that RDF/XML's current syntax is not a problem please continue with your project of trying to sell it to the world, but it feels to me you're trying to accomplish a good thing with one hand tied behind your back."

(Mark Pilgrim offers a personal account of his attempts and frustrations with RDF here. I don't quote it here since the entire post is worth a read as a first-hand account of the issues being discussed throughout this discussion.)

In addressing the XML serialization of RDF Danny Ayers offers, "probably the primary cause of the ugliness of RDF/XML is the mismatch between the tree model of XML and the graph model of RDF. To explicitly represent a graph in XML the syntax will start getting ugly whatever you do. This is a weakness of XML, not RDF."

In a post to the xml-dev mailing list Shelley Powers wrote "I'll be honest, I don't care about the human readable/writeable aspects of RDF/XML as much as I do care that there are tools and APIs that manage it all for me. Sorry -- but I just don't think that is the most important aspect of either XML or RDF/XML. Again, IMHO."

To which Sean McGrath replied:

"I'm afraid, I take a diametrically opposite view. Things should be as complex as necessary but not more complex.

Punting to tools and APIs to salvage mankind from complexity of its own making is one of the main reason this industry is constantly battling the alligators rather than clearing out the swamps."

Elsewhere Jack William Bell echoed the same sentiment. "I have a problem with [an (easily) human readable format not being necessary]. If you don't care about being able to read it easily, why not use a binary format of some kind in the first place and reduce the bandwidth footprint?"

Tim Bray writes "I guess where Shelley and I would agree to disagree is that she doesn't think that easy human-readability is very important in the data formats she uses, and I think it's terribly, terribly important; I think one of the central lessons of the Web is that enabling people to do a "View Source" and roll their own based on what they see is, well... there's nothing more important."

Shelley Powers explained "RDF/XML is a mapping of that model to XML -- a mapping that's not necessarily easy or uncomplicated. XML was picked because XML is the prime metalanguage format used in many intra-mechanical transitions, such as forming the messages and providing the framework for something such as SOAP. It wasn't necessarily picked because XML is human readable, though we hope that would be a side benefit."

Tim Bray writes:

"for the record, I did *not* invent XML, I was a member of a [workgroup] of 11 people supported by an interest group of another hundred or so who subsetted an existing standard called SGML whose position was spookily similar to where RDF is today: it's important, some smart people are using it to do some big things, but it has no grass-roots uptake.

Turns out that some of the things you could do with SGML you can't do with XML, and some of them are awfully handy, but in the end it turned out that the complexity cost for doing them pushed the cost/benefit ratio into really lousy territory. Hmm, there's an echo in here."

In response to a post on why RDF is hard, Simon St. Laurent wrote "I don't think the RDF community has ever really understood that what they do is genuinely difficult for most people. The RDF community seems very self-selecting to me - those who can cope with RDF like it, and the rest of us keep our distance. I'm not sure it's ever been clear to people who find RDF intuitive why so many people bounce off of it completely, and I'm not convinced that it's possible to explain that to someone who genuinely likes RDF."

Shelley Powers replied "No one is forcing anyone to use RDF. This isn't a dismissal -- this was meant to be a reassurance."

RSS 1.0's Deeper Value?

Bill Kearney writes referring to Edd Dumbill's article Addressing RSS's logical model:

Basically what Edd's saying is that there could be more than one channel in a feed document and that the items could be shared among those different channels. This as opposed to sending out a separate feed for each channel and duplicating items all over the place. The channel itself has a sequence container that indicates not only what order to use but what items to use. That rdf:Seq container is telling you what items to use in the channel.

So, once again, the strange way RSS-1.0 appears to be doing things has a much deeper value.

While in theory RSS 1.0 may be doing something of deeper value, that value is generally not feasible to realize in practice. This was a question I raised some time ago, and I never got a satisfactory reply that justified the need for this "value."

Recent threads have indicated that RSS bandwidth consumption, whether centralized or decentralized, is becoming a major issue. Part of this can be solved by designing smarter aggregators, which has begun to happen. Another part of handling this issue rests on publishers developing feeds that conserve bandwidth while still remaining useful to their target audience.

In certain circumstances having one item in multiple channels may help conserve bandwidth, but overall I believe it would not. In fact, if put into practice this would enlarge RSS feed files even further and make it more difficult for consumers to conserve bandwidth. Suppose I were to use RSS 1.0 to create one "uber" RSS file with channels for my latest posts overall and in each category. If a consumer wanted the items in one channel, they would have to download all the items and channels, then resolve my channel's <Seq> list against the items. Perhaps the items in the channel of interest have not been updated while those in another have, thereby changing the file's last-modified timestamp -- more wasted bandwidth. Since I use a tool that generates my RSS feeds, as do most others, creating separate files for each channel is trivial. This approach can conserve bandwidth because each file contains only the information relevant to its channel. (This is of the utmost importance when the end user is in a low-bandwidth, resource-constrained environment such as a mobile phone.) It also allows end users to subscribe only to the channels that interest them, each by a unique URI. (Multiple channels in one file/URI is not very RESTful. RESTonians have railed against SOAP's one-URI, many-methods approach; how is this different?) It's also more straightforward to process and allows the HTTP Last-Modified and ETag headers to be reliable and accurate.
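
To make the Last-Modified and ETag point concrete, here is a minimal sketch of a conditional GET as an aggregator might issue it against a one-channel-per-file feed. The function names are made up for illustration, and the validators are assumed to have been saved from an earlier response; this is one way to do it, not how any particular aggregator works.

```python
import urllib.error
import urllib.request


def conditional_headers(etag=None, last_modified=None):
    """Build request headers from validators saved on a previous fetch."""
    headers = {}
    if etag:
        headers["If-None-Match"] = etag
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    return headers


def fetch_if_changed(url, etag=None, last_modified=None):
    """Return (body, etag, last_modified); body is None when the feed is unchanged."""
    request = urllib.request.Request(
        url, headers=conditional_headers(etag, last_modified)
    )
    try:
        with urllib.request.urlopen(request) as response:
            return (
                response.read(),
                response.headers.get("ETag"),
                response.headers.get("Last-Modified"),
            )
    except urllib.error.HTTPError as err:
        # urllib reports 304 Not Modified as an HTTPError; nothing was downloaded.
        if err.code == 304:
            return None, etag, last_modified
        raise
```

With a single "uber" file, any change to any channel changes the validators, so a subscriber to one category downloads everything again. With one file per channel, a 304 response really does mean "nothing new for you."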

I'm of the mind that the <rss> and <channel> tags should get folded together into one tag -- preferably just <channel>. Assuming the window has not already closed, I'm also of the mind that, rather than trying to justify its design decisions with theoretical idealism, the RSS 1.0 working group should spend its time modifying the specification to be more useful and practical -- or perhaps they should be the ones to take up knitting.

Un-Neighborly RSS.

[From my post to the RSS2-Support list.]

I've noticed recently that a number of feeds I follow have not been working properly in that my aggregator doesn't generate a hyperlink to the entry. I took a look at these feeds and noted they are all RSS 2.0 being generated by Radio. Furthermore, there is a <guid> element, but no <link> tag. The <guid> is in fact a valid URL that points to the content, so all that has happened is a renaming of a tag.

What was the point of renaming the <link> element? I know the "spec" says this is legal, but it seems silly since, in Sam Ruby's quick survey of RSS tag usage not long ago, nearly 88% of feeds provided an item <link>. It's breaking interoperability that the user base has defined, and for what? In order for me to read these feeds the way I like I have to modify/upgrade my software? That doesn't seem very neighborly at all.
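
For what it's worth, this is roughly the workaround consumers end up writing. It is a hedged sketch using Python's standard XML parser, with a made-up helper name (item_url) and made-up sample data: prefer the item's <link>, and fall back to <guid> only when it is not marked isPermaLink="false" (RSS 2.0 treats a missing isPermaLink attribute as true).

```python
import xml.etree.ElementTree as ET

# Made-up sample feed: the first item has only a <guid>, as described above.
SAMPLE = """<rss version="2.0"><channel><title>Example</title>
<item><title>A</title><guid>http://example.com/a</guid></item>
<item><title>B</title><link>http://example.com/b</link>
  <guid isPermaLink="false">tag:b</guid></item>
<item><title>C</title><guid isPermaLink="false">tag:c</guid></item>
</channel></rss>"""


def item_url(item):
    """Return the best URL for an <item> element, or None if there is none."""
    link = item.findtext("link")
    if link and link.strip():
        return link.strip()
    guid = item.find("guid")
    if guid is not None and guid.text:
        # A guid is a permalink unless the feed explicitly says otherwise.
        if guid.get("isPermaLink", "true").lower() != "false":
            return guid.text.strip()
    return None


urls = [item_url(item) for item in ET.fromstring(SAMPLE).iter("item")]
```

It is only a dozen lines, but every consumer has to write them, and the third item still ends up with no hyperlink at all.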

Phil Ringnalda writes about the irony in Sam Ruby's post about valid RSS and the problems with entity encoded HTML. I've been quite vocal that entity encoded HTML is a bad idea that we need to get away from. Phil's post illustrates exactly what I'm getting at. The forefathers of the XML specification intended for markup to be encoded with CDATA, NOT with entity encoding. Had the practice of entity encoding been avoided as bad form (as it should have been), issues like Phil's would be almost non-existent.

In related news, Shelley Powers points out an unexpected surprise she had when using the MovableType templates that the RSS Validator provides -- they include the entire text of the post in the feed. This is OK if that is your intention, but it also means large feeds which suck up bandwidth. In being a good citizen I've begun to offer a feed with the full content in addition to the standard feed with just a descriptive excerpt.

The Next Logical Step for RSS.

It's no secret that the RSS 2.0 format contains many questionable design decisions that were developed with an even more questionable approach. Guidelines for developing namespaces are virtually non-existent in the current documentation. The default namespace cannot be declared. Entity encoded HTML is not only permitted (and evil) but encouraged. (My thoughts here.) Overlapping tags such as language and webmaster are not deprecated and no guidelines are given on precedence between these tags and their modular equivalents. Then there is my favorite: all tags within an item are optional, so long as either a title or a description is present. The list goes on and is quite long. The RSS 2.0 design team had the opportunity to rectify these issues and failed to do so. In fact in some cases it made the situation worse.

Some will argue that RSS's current design is advantageous because it allows publishers to choose what suits them. I disagree because these choices generally undermine predictability for consumers, impeding them with a plethora of variables that seem to change daily. This is unfortunate because developers of RSS consumer software, such as aggregators or toolkits, will be forced to repeatedly put effort into addressing these odd, yet "legal," variants. This effort would better serve the community and the proliferation of RSS if it were focused on advancing RSS applications. I believe in flexibility and evolvable formats to a degree, but these issues cross that line into design flaws.

Recently, while testing the latest release of my RSS Feed plugin for MovableType, I found a feed that uses a guid tag instead of the commonplace link tag that most software expects to be present. Perfectly legal, but it breaks existing software. So what is the point?
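
To show the kind of extra work this pushes onto consumers, here is a minimal Python sketch of the fallback an aggregator might need. The function name item_url and the sample item are hypothetical, and this is not how my plugin handles it; the only fact it leans on is that RSS 2.0 treats a guid as a permalink unless isPermaLink says otherwise.

```python
import xml.etree.ElementTree as ET

def item_url(item):
    """Return the best available URL for an <item>, or None (hypothetical helper)."""
    link = item.findtext("link")
    if link and link.strip():
        return link.strip()
    guid = item.find("guid")
    # In RSS 2.0, isPermaLink defaults to "true" when the attribute is absent.
    if guid is not None and guid.text and guid.get("isPermaLink", "true") == "true":
        return guid.text.strip()
    return None

# A made-up item that carries a guid but no link.
item = ET.fromstring(
    "<item><title>Example</title>"
    "<guid>http://example.org/2002/10/01.html</guid></item>"
)
print(item_url(item))
```

Every consumer has to carry a branch like this just to find out where an item lives.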

Brent Simmons asked whether RSS feeds should allow relative hyperlinks in embedded HTML and, if so, what an aggregator like his NetNewsWire should do. The general consensus was that hyperlinks should be absolute, but there were those who disagreed and felt that, like browsers, aggregators should resolve relative URLs. The lack of documentation leaves the question open to interpretation. (For the record, I believe hyperlinks should be absolute. It's easier for the publisher to resolve them, and RSS feeds often exist in a different location than the HTML that is embedded.) The fact that this question needs to be asked seems silly to me.
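
A small Python sketch makes the ambiguity concrete. The URLs are hypothetical; the point is only that the same relative link resolves to different places depending on which base the aggregator guesses at, and nothing in the RSS documentation says which base is right.

```python
from urllib.parse import urljoin

# Hypothetical locations: the feed and the HTML page live in different places.
feed_url = "http://example.org/xml/index.xml"
page_url = "http://example.org/archives/2002/10/post.html"

relative_href = "images/chart.png"

# The same relative link resolves differently depending on the base chosen.
resolved_against_feed = urljoin(feed_url, relative_href)
resolved_against_page = urljoin(page_url, relative_href)

print(resolved_against_feed)
print(resolved_against_page)
```

A publisher who writes the absolute URL in the first place removes the guess entirely.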

Sam Ruby points out "a number of things which are quite legal RSS, but are less than neighborly." He is calling for the community to discuss best practices that he and Mark Pilgrim will then implement as warnings in their RSS Validator service. I completely agree and back this as the next logical step.

The purpose of the XSS profile I drafted was to add constraints to the existing RSS 2.0 format that make feeds more predictable and more "neighborly." The XSS profile draft attempts to balance ease of use and authoring with ease of consumption by applications while maintaining the richness necessary to be extended and adapted. I resubmit XSS for the community's consideration in the discussion Sam has proposed.

Being valid RSS (or XSS for that matter) does not guarantee that a feed's content is well-formed enough to be useful to an end user. Here are some additional editorial recommendations for making feeds more useful and well-formed:

Use CDATA for embedding HTML in description tags. I can't advocate this enough. This is perhaps the most important recommendation I can make because it goes a long way to avoiding malformed XML/RSS files with almost no fuss. The method of entity-encoded HTML, also known as double entity-encoding, is quite common and not going away anytime soon, but you should consider avoiding it and saving yourself and others some headaches. Besides being a nonstandard practice within the XML specification, this method requires more processing cycles and adds to the file size unnecessarily. It's also prone to occasional error.
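
For those who want to see the difference, here is a minimal Python sketch that wraps the same snippet both ways and parses it back. The helper as_cdata and the sample HTML are my own illustration, not part of any RSS tool. One caveat the sketch handles: a CDATA section cannot contain the sequence "]]>", so it has to be split across two sections.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Sample HTML for illustration only.
html = '<p>Tom & Jerry say <a href="http://example.org/">hello</a></p>'

def as_cdata(text):
    """Wrap text in CDATA, splitting any "]]>" so the section stays well-formed."""
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"

cdata_item = "<description>" + as_cdata(html) + "</description>"
escaped_item = "<description>" + escape(html) + "</description>"

# Both forms parse back to the same string; only the serialized form differs.
cdata_text = ET.fromstring(cdata_item).text
escaped_text = ET.fromstring(escaped_item).text

print(len(cdata_item), len(escaped_item))
```

For this sample the CDATA form is the shorter of the two, and the markup stays readable in the raw feed.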

Minimize the use of HTML in descriptions. Jon Postel's maxim on robust protocols says "be conservative in what you do..." and it's in this same spirit that I make this recommendation. None of the RSS specifications actually limits what you can embed in a description tag. While feed consumers should be prepared to strip out unwanted formatting, it's simply good manners as a content publisher to help avoid issues that could break their aggregator or layout.

Include a descriptive title for each item. Examine any collection of written thought, such as a magazine, newspaper or book, and you will note how information is organized in layers that can be easily scanned and processed by a reader. A good title, subtitle or summary (referred to as heads, desks and leads in media parlance) will not say anything that isn't contained in the main body of the piece. Without scannability, content consumption simply becomes so laborious and time-intensive that most of us would hardly bother reading a thing. Try removing titles from any magazine or newspaper and you'll come to appreciate what I'm referring to. Besides being good for scannability, descriptive titles are good for accessibility. Despite these time-tested best practices, many feeds fail to include titles, let alone informative ones. Some webloggers claim it's too time-consuming and difficult to create a title for the numerous short posts that they make daily. While I appreciate their standpoint, even a title such as the site or collection name with a timestamp ("tima thinking outloud: September 1, 2002 20:13 -4:00") is more helpful and perhaps more appropriate. The end user's application then does not have to guess at a title, usually by taking some number of characters or words from the beginning of the description ("Today I saw something that...").

Avoid embedding HTML in the title. The channel and item titles in RSS, like their counterpart in HTML, are considered metadata and are therefore not expected to contain display elements such as HTML tags. Embedding markup, even encoded with CDATA, could break an end user's application with your feed. Keep HTML in the description only, if at all.

Consider writing a meaningful and concise summary for the description. Like including a descriptive title, including a meaningful and descriptive summary improves the scannability, and thereby the utility, of your feeds. It helps readers determine if they want to continue reading and communicates the main point of the content to readers lacking time.

If you insist on including the full content of items in a feed, offer end users a choice. This can be a bit of a controversial issue, as some users prefer that the full content of an item be included in the feed so they can read it in their aggregator. Others prefer concise excerpts that can be quickly scanned or consumed over low-bandwidth connections. These viewpoints are neither right nor wrong. It is the content publisher's decision, based on the use of their content and the needs of their intended audience. However, publishers would be wise to offer end users a choice. Since most feeds are generated by a tool, this is not difficult to provide. Also consider that end users may only be interested in a particular topic or resource. RSS is highly versatile and can be used to create feeds on a specific topic or resource like a calendar of events, mailing list archive, recent comments, or document repository.

Include contact information in your feed. With vague documentation and varying interpretations among RSS implementations, issues will happen. Publishing an email address for the contact responsible for the generation and management of the feed opens the lines of communication for rectifying these issues and collecting feedback.

I look forward to the feedback and insights that these community discussions will certainly produce.

Mark Pilgrim and Sam Ruby have announced the launch of an RSS validator service. The service is optimized for the recently proposed RSS 2.0 format, but supports prior versions. This is some excellent work! I particularly appreciate the friendly and instructive error messages it generates. This validator will be quite helpful and instructive to those who do not understand (or simply don't have the patience for) the mostly vague and loosely worded documentation. Being valid RSS does not guarantee that a feed's content is well-formed enough to be useful to an end user.

In other related news, while the flaming on RSS has subsided for the time being, active discussion continues. Joel Spolsky raised the issue of RSS bandwidth consumption. The persistent downloading of his feed by aggregators using a brute-force GET is costing him in bandwidth overruns. Mark suffers from the same issues and summarizes the conversation thus far here. Sam proposes a bandwidth-friendlier solution here. I've noticed that my syndication feed hits also outweigh my web page hits -- however, I'm not the rock star that Mark and Joel are, so I haven't had to pay extra. Yet. I know that blagg, which I use as the basis for my own homegrown aggregator, is one of those pigs. Implementing DJ Adams' ETag-enabled wget script with blagg should help.
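
As a rough illustration of the alternative to a brute-force GET, here is a minimal Python sketch of an HTTP conditional GET. It is not DJ Adams' script and not necessarily what Sam proposed; the function names conditional_headers and fetch_if_changed are hypothetical, and the sketch assumes the server sends ETag or Last-Modified headers and honors them.

```python
import urllib.error
import urllib.request

def conditional_headers(etag=None, last_modified=None):
    """Build request headers that let the server answer 304 Not Modified."""
    headers = {}
    if etag:
        headers["If-None-Match"] = etag
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    return headers

def fetch_if_changed(url, etag=None, last_modified=None):
    """Return (body, etag, last_modified); body is None if the feed is unchanged."""
    request = urllib.request.Request(
        url, headers=conditional_headers(etag, last_modified)
    )
    try:
        with urllib.request.urlopen(request) as response:
            return (
                response.read(),
                response.headers.get("ETag"),
                response.headers.get("Last-Modified"),
            )
    except urllib.error.HTTPError as err:
        if err.code == 304:  # Not Modified: keep using the cached copy.
            return None, etag, last_modified
        raise
```

An aggregator would store the returned ETag and Last-Modified values next to its cached copy and pass them back on the next poll, so an unchanged feed costs a few headers instead of the whole file.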

Ben Hammersley points out Ian Davis' LiSA "a SAX-like API for processing syndication formats. It's intention is to abstract away all the fluff contained in the current raft of syndication formats into a set of event notifications such as startDocument, startChannel, startItem, endItem etc."
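
To give a feel for the idea (and only the idea; this is not LiSA's actual API), here is a minimal Python sketch that turns raw SAX element events into feed-level events. The class name FeedEvents and the sample feed are hypothetical; the event names are borrowed from the quote above.

```python
import xml.sax

class FeedEvents(xml.sax.ContentHandler):
    """Translate raw SAX element events into feed-level events (hypothetical)."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        if name == "channel":
            self.events.append("startChannel")
        elif name == "item":
            self.events.append("startItem")

    def endElement(self, name):
        if name == "item":
            self.events.append("endItem")
        elif name == "channel":
            self.events.append("endChannel")

# A made-up two-item feed.
feed = b"""<rss version="2.0"><channel><title>t</title>
<item><title>a</title></item><item><title>b</title></item>
</channel></rss>"""

handler = FeedEvents()
xml.sax.parseString(feed, handler)
print(handler.events)
```

An application written against events like these would not need to care which syndication format produced them.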

I haven't been writing as much because I've been heads down coding the next version of the mt-rssfeed plugin and the underlying liberal parser. The liberal parser will be fully namespace-aware -- or at least as I understand it. The plugin has been completely rewritten and has too many new features and enhancements to list here. I should be starting my own internal testing today with a public beta test in the coming week.

In Praise of Evolvable Formats.

Mark Pilgrim rewrites Clay Shirky's In Praise of Evolvable Systems to explain RSS.

RSS 0.9x and 2.0 are the Whoopee Cushion and Joy Buzzer of syndication formats. For anyone who has tried to accomplish anything serious with metadata, it’s pretty obvious that of the various implementations of a worldwide syndication format, we have the worst one possible.

Except, of course, for all the others.

Well done, Mark. I agree with your general sentiment, but think some of the flaws cited could have been easily avoided and the format would have been just as evolvable. Instead we get to endure the shortcomings of one person's rash decision making and whimsy.

The XSS profile I have proposed is an attempt to apply some common sense and make the most of what has already been done.

This is a second draft of the Extremely Simple Syndication (XSS) profile I proposed here. XSS was drafted as a more detailed description of recent discussions and proposals for a simple core that maintains a high level of backward compatibility with RSS 0.91 and its descendants. I have classified XSS as a profile (a restricted subset) instead of a specification. I'm developing it in the spirit of "release early, release often." Feedback and suggestions to improve this description are greatly appreciated.

This draft incorporates additional clarification and feedback I received after publishing the first draft September 27.

I've set up my weblog to produce feeds in this format (here and here). The MovableType templates I'm using can be found here and here.

Extremely Simple Syndication (XSS) Profile

DRAFT 2

ABSTRACT

The XSS profile defines a restricted subset of previous RSS formats that balances ease of use and authoring with ease of consumption by applications while maintaining the richness necessary to extend and adapt to various problem domains. It is designed for applications operating in resource-constrained environments such as mobile phones, PDAs and other devices or sites needing to conserve bandwidth for economic or physical reasons. It is also designed for authors wishing to provide a well-formed "feed" of information to consumers.

The XSS profile is designed around a simple common core of elements that may be extended. It is also designed to maximize backward compatibility with the RSS 0.91 format and its descendants. This allows XSS to leverage the existing install base of 0.91 feeds and prior bodies of work such as Dublin Core meta data and RSS 1.0 modules. The XSS profile contains three document types -- Strict, Transitional, and Extensible.

The goal of the XSS profile is to serve as guidelines to best practices in a balanced and simplified approach to authoring and consuming of syndicated resources with RSS.

COMMON CORE TAGS

<rss>
Description: The root tag for the syndicated resources collection.
Sub-Elements: channel (required)
Attributes: version - a string identifying the version including profile and document type.
Notes: Only one channel is permitted.

<channel>
Description: Container tag for a specific channel
Sub-Elements: title (required), description (required), link (required), item (required)
Attributes: none.
Notes: Only one title, description or link is permitted. Language is deprecated.

<item>
Description: Container tag whose contents represent one resource in the channel. At least one item must be present in a channel.
Sub-Elements: title (required), link (required), description (optional but highly recommended)
Attributes: none.
Notes: Recommended to not exceed 15 per channel.

<link>
Description: A unique URI using an IANA-registered scheme that specifies the location of the channel or item (resource). Applications are required to support only one of these IANA URI schemes; however, http:// is highly recommended. A required sub-element of channel and item.
Sub-Elements: none.
Attributes: none.

<description>
Description: A plain text excerpt of the channel or item (resource). A required sub-element of channel. Optional, though highly recommended, sub-element of item.
Sub-Elements: none.
Attributes: none.
Notes: The XSS profile supports plain text and does not permit encoded markup such as HTML to be included in the description. Recommended not to exceed 500 characters. Those wishing to embed markup language or larger pieces of content in the description tag should use the mod_content module.

<title>
Description: A plain text descriptive title of the channel or item (resource).
Sub-Elements: none.
Attributes: none.
Notes: Is the equivalent of the HTML title and only supports plain text. Encoded markup such as HTML is not permitted to be included in the title. Recommended to be no more than 100 characters.
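
As a rough aid for implementers, the required elements listed above can be checked mechanically. The following Python sketch is an assumption-laden illustration, not part of the profile: the function name check_xss_core is hypothetical, it covers only the required core elements (not the recommended limits on item count or text length), and the version value in the sample is a placeholder because the profile has not yet settled how a document type is identified.

```python
import xml.etree.ElementTree as ET

def check_xss_core(feed_xml):
    """Return a list of problems found against the required core elements."""
    root = ET.fromstring(feed_xml)
    if root.tag != "rss":
        return ["root element must be rss"]
    channels = root.findall("channel")
    if len(channels) != 1:
        return ["exactly one channel is required"]
    channel = channels[0]
    problems = []
    # Channel: exactly one each of title, description and link.
    for name in ("title", "description", "link"):
        if len(channel.findall(name)) != 1:
            problems.append("channel needs exactly one " + name)
    items = channel.findall("item")
    if not items:
        problems.append("channel needs at least one item")
    # Item: title and link are required; description is optional.
    for number, item in enumerate(items, start=1):
        for name in ("title", "link"):
            if not (item.findtext(name) or "").strip():
                problems.append("item %d is missing %s" % (number, name))
    return problems

# Placeholder version value; the sample feed is made up.
good = (
    '<rss version="2.0"><channel><title>Example</title>'
    "<description>An example channel</description>"
    "<link>http://example.org/</link>"
    "<item><title>First</title><link>http://example.org/1</link></item>"
    "</channel></rss>"
)
print(check_xss_core(good))
```

An empty list means the feed carries every required core element; anything else names what is missing.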

DEPRECATED TAGS

Tags from RSS 0.91 and its descendants that are not defined above are considered deprecated from the core in the XSS profile. This data can be furnished through various modules such as Dublin Core, mod_content and mod_admin. All deprecated tags in the XSS profile were previously considered optional except language, which was required by the 0.91 specification but made optional in later specifications.

DOCUMENT TYPES

Strict. RSS 2.0/XSS-strict. This document type only permits the use of core elements. Deprecated tags are not permitted. This document type does not expect the use of extension modules in order to maintain a simple lightweight footprint. Authors requiring richer meta data and/or extensibility should use the Extensible document type. This document type is compatible with versions 0.92, 0.93 and 2.0. Other than the removal of the required language tag and content lengths, this document type is also compatible with 0.91. Since modules are not permitted, a namespace cannot be declared.

Transitional. RSS 2.0/XSS-transitional. All the stated expectations of the Strict document type apply. Deprecated tags such as language, copyright, and image found in the 0.91 format and its descendants are allowed, but should be ignored in favor of their extension module counterparts when those counterparts are present. All tags deprecated in the XSS profile were previously considered optional (except language, which was required by the 0.91 specification but made optional in later specifications).

Extensible. RSS 2.0/XSS-extensible. All the stated expectations of the Strict document type apply except modules are permitted and expected. The default namespace must be defined. With this added capability to the document type, the X can be considered as representing 'extensible' rather than 'extremely.' The Extensible document type can become quite complex and tedious to author without automated tools for generation. This document type breaks compatibility with formats prior to version 2.0. (XML namespace aware applications developed prior to version 2.0's release are likely to be able to process this document type, but are not guaranteed to do so.)

Content publishers are recommended to always utilize the Strict document type and only utilize the module extension document types (Extensible and Transitional) if richer meta data is required. When such a need exists, content publishers are also recommended to produce both a Strict document type feed and an Extensible or Transitional feed. This practice provides consuming applications a choice as to the depth of information and the necessary sophistication to process it.
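
The precedence rule for the Transitional document type can be sketched in a few lines of Python. This is an illustration under assumptions rather than a normative part of the profile: the function name channel_language is hypothetical, the sample feed is made up, and Dublin Core's dc:language is assumed to be the module counterpart of the deprecated language tag (the mapping of deprecated tags to modules is still an outstanding issue).

```python
import xml.etree.ElementTree as ET

# Dublin Core element namespace, in ElementTree's {uri}name notation.
DC = "{http://purl.org/dc/elements/1.1/}"

def channel_language(channel):
    """Prefer dc:language; fall back to the deprecated core language tag."""
    return channel.findtext(DC + "language") or channel.findtext("language")

# A made-up Transitional-style feed that carries both forms.
feed = ET.fromstring(
    '<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/">'
    "<channel><title>t</title>"
    "<language>en-us</language><dc:language>da</dc:language>"
    "</channel></rss>"
)
print(channel_language(feed.find("channel")))
```

When both are present the module value wins; when only the deprecated tag is present it is still honored.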

EXTENSIBILITY

In the absence of detailed guidance on extending RSS 2.0 with modules and XML namespaces, the XSS profile Extensible and Transitional document types will follow the guidelines set forth in the RSS 1.0 format module documentation.

EXAMPLES

OUTSTANDING ISSUES

  • How to identify the profile/document type? Place in version? or use XML document type? Both?
  • Need to build DTD or schemas?
  • Need to list out deprecated tags and map to specific modules and fields.
  • Is the link tag language sufficient? http:// is highly recommended; should it be required?
  • Need to better address non-English language feeds. Morten Frederiksen suggests inclusion of a clause like "if the language is not English, please include the dc:language element according to XXX." Should non-English feeds be forced to use Extensible or Transitional? Is there a better way?
  • RSS 2.0 can never use a default namespace?

CHANGE LOG

DRAFT 2, Oct 4 2002

  • Corrected numerous offenses against the English language. (Misspellings etc.)
  • Added revisions to Document Types Strict paragraph. (Michael Bernstein)
  • Resolved omission of item as sub-element of channel. (Morten Frederiksen)
  • Added omitted item description.
  • Added hyperlink to RSS 1.0 modules document.
  • link tag now has an IANA-registered URI scheme clause. (Morbus Iff)
  • Recommendation on Document Type usage selection. (Sam Ruby)
  • Added compatibility and namespace notes to document type descriptions.
  • Added clarification that Extensible is a superset of the Strict document type.
  • Added mod_slash tags to Extensible MovableType template.

DRAFT 1, Sept 27 2002

  • Initial release.

In his content syndication weblog, Ben Hammersley points out a nifty mobile application named PeekAndPick that he's begun using on his new Java enabled phone. PeekAndPick is a J2ME MIDP application for reading headlines and excerpts from RSS feeds and selecting the items you are interested in reading later. Your selections can be compiled into a list of links and emailed to your desktop for later reading.

PeekAndPick, the work of Jonathan Knudsen, is a brilliant mobile app because it illustrates good mobile application design. It doesn't try to do too much or be like a desktop application. It does demonstrate how mobile and traditional Internet channels can be used together to provide a useful and appropriate solution -- though it could do that a bit better. (More on that in a bit.)

The application's design acknowledges that mobile means "on the move." This mobility generally finds the end user in a less than ideal setting (context) that requires information be concise and interactions simple in order to be useful. Mobile users do not browse -- especially on a screen that typically is not much larger than a postage stamp. (Note to telcos and device manufacturers: Web content was not designed with mobile devices or users in mind. It is futile to try, because it really doesn't work out and it just annoys people.)

While it is a very well done application for which we have to give Jonathan a great deal of credit, PeekAndPick is not without its issues or room for improvement.

PeekAndPick underlines the need for RSS feed publishers to exercise good form and provide descriptive titles and concise well-formed excerpts. Since quite often this is not the case, PeekAndPick would probably benefit from a server-side proxy that retrieves feeds, cleanses them and strips any extraneous information that the app does not need. Besides heading off trouble this approach also helps keep bandwidth use to a minimum. Optimizing bandwidth usage not only improves response time, but it saves the end user some money. Remember mobile data services are typically priced by the bandwidth consumed.

Most of the configuration for the app, such as defining your list of feeds, should be handled through a web interface and not the mobile device. At a former employer, where I consulted on a number of mobile projects, we professed "configure while seated, consume while mobile." Who wants to type in a URI like http://www.timaoutloud.org/xml/index.xml with their phone's keypad?

In my cursory review of the code, the application does not queue messages if a connection is not available. Given the unreliable nature of mobile networks, handling these occurrences is important, even crucial, to developing user-friendly and reliable mobile applications.

I could really use PeekAndPick myself because I find myself woefully behind when I'm away from my desk for any extended period of time. Alas, I cannot use it because I use a Blackberry 957 that is not Java-enabled. (I love my Blackberry and still find it useful, but I've been deeply disappointed that RIM has forsaken users and not continued to advance this model's capabilities by providing J2ME support as it has for its descendants. I suppose that's for another post though.)

Check out PeekAndPick and consider your mobile application future.

Extremely Simple Syndication (XSS).

[UPDATE October 4 2002: A second draft has been published here.]

I continue my exploration of syndication formats. Inspired by some of yesterday's conversations on the RSS-DEV list and past proposals for a simple core, I drafted up a more detailed description of such a format. (I'm releasing it in the spirit of "release early, release often.") It maintains backward compatibility with RSS 0.91 and its descendants.

From what I can tell this is the same format that Sam Ruby and Mark Pilgrim proposed except I've documented it in more detail and have classified it as a profile (a restricted subset) instead of a specification. Feedback and suggestions to improve this description are greatly appreciated.

I've set up my weblog to produce feeds in this format (here and here). The MovableType templates I'm using can be found here and here.

[UPDATE: Mark Pilgrim and Justin Rudd have written in to point out that I forgot to add an RDF namespace declaration to the XSS-extensible format. Thanks gents. Fixed. (The RSS 1.0 modules use RDF to structure the extensions in a principled way and avoid trouble.)]

Extremely Simple Syndication (XSS) Profile

DRAFT 1

ABSTRACT

The XSS profile defines a restricted subset of previous RSS formats that balances ease of use and authoring with ease of consumption by applications while maintaining the richness necessary to extend and adapt to various problem domains. It is designed for applications operating in resource-constrained environments such as mobile phones, PDAs and other devices or sites needing to conserve bandwidth for economic or physical reasons. It is also designed for authors wishing to provide a well-formed "feed" of information to consumers.

The XSS profile is designed around a simple common core of elements that may be extended. It is also designed to maximize backward compatibility with the RSS 0.91 format and its descendants. This allows XSS to leverage the existing install base of 0.91 feeds and prior bodies of work such as Dublin Core meta data and RSS 1.0 modules. The XSS profile contains three document types -- Transitional, Strict, and Extensible.

The goal of the XSS profile is to serve as guidelines to best practices in a balanced and simplified approach to authoring and consuming of syndicated resources with RSS.

COMMON CORE TAGS

<rss>
Description: The root tag for the syndicated resources collection.
Sub-Elements: channel (required)
Attributes: version - a string identifying the version including profile and document type.
Notes: Only one channel is permitted.

<channel>
Description: Container tag for a specific channel
Sub-Elements: title (required), description (required), link (required)
Attributes: none.
Notes: Only one title, description or link is permitted. Language is deprecated.

<item>
Description:
Sub-Elements: title (required), link (required), description (optional but highly recommended)
Attributes: none.
Notes: Recommended to not exceed 15 per channel.

<link>
Description: A unique URI that specifies the location of the channel or item (resource). A required sub-element of channel and item.
Sub-Elements: none.
Attributes: none.

<description>
Description: A plain text excerpt of the channel or item (resource). A required subelement of channel. Optional, though highly recommended, sub-element of item.
Sub-Elements: none.
Attributes: none.
Notes: The XSS profile supports plain text and does not permit encoded markup such as HTML to be included in the description. Recommended not to exceed 500 characters. Those wishing to embed markup language or larger pieces of content in the description tag should use the mod_content module.

<title>
Description: A plain text descriptive title of the channel or item (resource).
Sub-Elements: none.
Attributes: none.
Notes: Is the equivalent of the HTML title and only supports plain text. Encoded markup such as HTML is not permitted to be included in the title. Recommended to be no more than 100 characters.

DEPRECATED TAGS

Tags from RSS 0.91 and its descendants that are not defined above are considered deprecated from the core in the XSS profile. This data can be furnished through various modules such as Dublin Core, mod_content and mod_admin. All deprecated tags in the XSS profile were previously considered optional except language, which was required by the 0.91 specification but made optional in later specifications.

DOCUMENT TYPES

Transitional. RSS 2.0/XSS-transitional. Deprecated tags such as language, copyright, and image found in the 0.91 format and its descendants are allowed, but should be ignored in favor of their extension module counterparts. All deprecated tags in the XSS profile were previously considered optional except language, which was required by the 0.91 specification but made optional in later specifications.

Strict. RSS 2.0/XSS-strict. This document type only permits the use of core elements. Deprecated tags are not permitted. This document type does not expect the use of extension modules in order to maintain a simple light-weight footprint. Authors requiring richer metadata and/or extensibility should use the Extensible document type.

Extensible. RSS 2.0/XSS-extensible. All the stated expectations of the Strict document type apply except modules are permitted and expected. With this added capability to the document type, the X can be considered as representing 'extensible' rather than 'extremely.' The Extensible document type can become quite complex and tedious to author without automated tools for generation.

In the absence of detailed guidance on extending RSS 2.0 with modules and XML namespaces, the XSS profile Extensible document type will follow the guidelines set forth in the RSS 1.0 format module documentation.

EXAMPLES

OUTSTANDING ISSUES

  • How to identify the profile/document type? Place in version or use XML document type?
  • Should a DTD and/or schema be built?
  • Some deprecated tags are lacking corresponding modules.
  • Need to list out deprecated tags and map to specific modules and fields.
  • Add Background and Requirements section?

Jon Hanna gives his admittedly late comments on Shelley Powers' questions.

The Internet and the Web both have current uses quite different to those, and many of their parts are used in manners further removed from their origins (e.g. the use of HTML in many Windows technologies doesn't even make use of the Web's distributed nature).

I maintain that RSS is such a technology. It's designed purpose is the syndication of "news" items from individual websites to a portal website, to be more specific to the MyNetscape portal website. It was *not* designed to be of any particular use to bloggers, aggregators, or metadata providers, but it *does* serve them and others.

Morbus Iff asked "Why is RSS 2.0 bad?"

Dan Brickley replied:

My concern: it introduces the ability to use XML namespaces for future extensions, whilst simultaneously failing to use it for the extensions introduced in 2.0. If those extensions were folded into an extension module, 2.0 and 1.0 would be both technically and conceptually a lot closer. The core of RSS doesn't need much more than 'item', 'channel', 'link', 'title', 'description'. A simple common core could represent that.

Ben Hammersley replied:

...my major concern is also with the namespaces. There appears to be no indication as to how modules should work. With RSS 1.0 there is, more or less (whoa there), a 'grammar' of sorts that tell us how to relate the data within, and how to describe the vocabulary with a schema. With 2.0, there is no guidance, and no thought as to how to deal with the world when there are 50+ modules. Without blathering on semanticwebbaly, I really really like the potential for schema-aware readers that comes with RDF, and the discipline it imposes. I'm lacking that cosy feeling with 2.0.

Morbus replied to Ben:

Ok. Well, if you want, I can take the 1.0 Module Building doc, turn it into a 2.0 Module Building doc, and then that "2.0 roadblock" will be removed.

He later announced that he had begun drafting up "Extending RSS 2.0 With Namespaces."

Sean Palmer posted a detailed (read: long) list of issues. Dave Winer replied on his web site.

Phil Ringnalda writes:

I am not a developer. I am not a professional. I am an interested amateur publisher. I am your audience.

Dave Winer: "I'm going to start posting to this list as long as personal comments about me don't dominate."

Morten Frederiksen notes:

The "simplicity" of HTML was great for its quick widespread production, but as mentioned elsewhere, it's a pain to consume.

This is, I believe, because simplicity is confused with "optional" and other lax definitions, either in the definition or the use.

I would have thought that a lesson was learned there, that what is really needed is *strict* definitions, at least if it is in any way meant to be consumed by machines.

I.e. a <description> element that is not defined as to its contents, and a <link> element that can point anywhere, is useless in practice, even though it's easy to produce.

This is why I like vocabularies like the Dublin Core, which provide precise definitions for syntax and semantics. They may be somewhat more difficult to produce, perhaps needing date format conversion, but they are easily used.

This is also why I dislike RSS 0.9x - it's too loose, it is unusable for anything *but* "the display of headlines in a browser for human consumption", and why I think that the path of RSS 1.0 is better.

The discussion continues.

Shelley Powers, through her weblog and posts to RSS-DEV site, offers a refreshingly different and enlightening take on the current discussions about RDF in RSS that has clarified much for me.

The title of her post, "Who is your audience, and what are you trying to accomplish?", is her overriding point and brings into focus a crucial issue that is seemingly at the root of much confusion and circular debate.

If RSS, past and current, is based on providing syndication and aggregation feeds, and nothing more, than I agree with those that say RDF adds nothing to the mix, and not because RDF adds complexity -- the reason is because the business of RSS isn't necessarily compatible with the business of RDF.

She continues:

If this group wants to continue providing a specification that defines syndication feeds, then it needs to consider that RDF not only doesn't buy the group anything -- it can harm the tool developers that use the spec. (Not to mention that trying to use RDF inappropriately can actually negatively impact the acceptance of the RDF specification.)

If, however, this group sees that what they're working on transcends throwaway syndication feeds, then it needs to formally define exactly what the business is _before_ trying to create a spec that implements it. Hence my questions: who is your audience and what are you trying to accomplish?

In echoing these thoughts on the RSS-DEV list, Shelley asked for some use cases of RDF in RSS, of which Danny Ayers offered two:

*** Use Case 1 : Meme of the Moment ***
How do I find those events?
How do I find the people involved in those events?
How do I find the location of those events?
How do I find future events related to these?

The first of these is 'news-like' data, the other information is relatively static. I want this all in a feed.

*** Use Case 2 : Dubya Gets Drunk ***
A newspaper can provide a news headline saying 'Dubya gets drunk', which is fine, if the readers already know all about Dubya. If they don't, then the newspaper can provide some background in Section 2. The background information about Dubya will remain relatively constant, but the headline will change. If semantically rich data is available, then that background information can automatically be linked into the feed.

Shelley responded that case 1 expanded the scope of RSS beyond its current use, and that case 2 was in scope and fits a potential need. However, her questions of purpose and audience were never really answered or agreed upon, which prompted her post today.

Making a long story short: though I respect many of the individuals involved with RSS 1.0, their effort and hard work and intelligence and capability as well as energy, I can't continue to support RSS 1.0 or RSS-Dev. Not with this current level of confusion about what the group sees as its purpose.

Unfortunately, not supporting RSS 1.0 is seen as giving victory to Dave Winer at Userland, by forcing us into choosing an RSS 0.9x/RSS 2.0 path. However, I still don't approve of Dave's approach to implementing RSS and his unwillingness to give up ownership of it. I can respect Dave's contribution, and his hard work and effort, and his intelligence and capability, but I can't support a supposedly 'open' spec that's controlled by one company.

Ultimately, supporting either specification means, to me, continuing to support this competition between the groups, competition which threatens to Never...Go...Away, as can be seen in the comments to Phil's posting.

Sometimes, when I read these types of comments, I feel as if you and I don't matter at all; that you and I are nothing more than scraps of meat being fought over by two junk yard dogs. Well, this just peeves me. So, I'm taking the route that's been available to consumers since the beginning of time: I'm not buying.

I understand how Shelley feels. I've appreciated her comments on the RSS-DEV list recently as they have clarified a lot of my confusion and given me a new perspective.

In related news and in commentary made to one of Shelley's posts, Dave Menendez wrote:

This thread got me thinking about what makes RSS useful in the first place. What is RSS about at its core?

My understanding now is that the key feature of RSS is the list of items in the channel. Everything else--textareas, images, titles, descriptions--is secondary (but useful). Boiled down to its essence, the essential part of RSS is rss:items.

Dave has authored a proposal he calls RDF channel that is quite an intriguing concept, assuming it fits the purpose once (if?) it's defined.

More FFKAR, RDF and FOAF.

The RSS-DEV list has been alive with much discussion and debate about the future direction of the group's efforts. Bill Kearney made a motion to consider a name change for the specification, given the nasty circumstances and the potential for mass confusion, which sparked a discussion of redefining the format's purpose. The extent of RDF's role in what I humorously refer to as the "format formerly known as RSS" or FFKAR (I'm going to be called a problem or a rat or a monster yet) continues to be a hot topic of discussion and debate. I like what Sam Ruby and Mark Pilgrim proposed (stripped down core, namespace support, but no RDF), but I can't help but be intrigued by RDF and its declared potential. Late this week I dove into the debate to play devil's advocate to the many well-meaning and passionate advocates of RDF. My hope is to strike a balance between the most RDF support and the least amount of markup while improving my understanding of RDF's merits and uses in FFKAR and beyond. Maybe others will learn from my public ignorance.

Jon Hanna posted his strawman year-zero proposal. It was an interesting and valiant attempt, but my personal view is that the "RDF tax" (syntax and rules) is too heavy. The developer community will struggle to understand its use or value and soon thereafter revolt.

As I wrote on the RSS-DEV list, ideally I see the purpose of FFKAR as "...a simple (single?) focus that has broad applications is the way to go. Different 'features' and restraints will come from extensions (modules) and their combination dependent upon the problem domain." I continued, "[The format] should be a simple core of a handful of elements and an absolute minimum of required RDF structures. (Additional RDF would come through optional extensions.) It is a simple, yet extensible, format to represent a collection of resources (URIs) with meta data. When I think beyond news syndication and weblogging this notion of a "collection of resources" makes a lot of sense to me. It is simple and focused, but can be applied to virtually anything and extended for specific problem domains. It also ties into the principles of Web architecture that the W3C and REST advocates profess."

In the spirit of friendly debate and experimentation, I've created my own FFKAR experimental format proposal that is based on one first made by Shelley Powers and later refined by Sean Palmer. A simple version is here and a meta-rich version is here. The MovableType template files can be found here (simple) and here (meta-rich).

They are both well-formed XML and valid RDF. What I like about this format is how it streamlines the overall syntax while still supporting RDF. According to Sean's original post, it uses the DAML list construct, which has been approved (though not formally) for addition to the RDF namespace in the future. (The W3C RDF validator doesn't support this yet so I used the DAML namespace.)

My modifications were in fact minimal to what Sean produced. (Like I'm an RDF master.) I took the Dublin Core elements back out of the core in keeping with a small tight core. I also changed some of the tag names to all lowercase.

This format is not perfect, but it demonstrates what I think is an important consideration in these proceedings.

The redundant info rss:link and rdf:about will be questioned. Does this have to be this way? In order to not completely break existing aggregators and toolkits you have to leave the rss:link tags. In order to not break with RDF standards you need rdf:about. This is one area that perhaps will have to be solved with education and evangelization over time.

This format does break backward compatibility with all previous specifications. As I have asserted before, the question of compatibility is a difficult one. It's my belief that, under the current circumstances, backward compatibility with existing aggregators and toolkits is more important. I tested both feeds with my liberal parser/MovableType plugin (worked fine) and AmphetaDesk (choked).

This format limits the file to one channel. Personally, I think this is a good thing, but others I know disagree. Someone noted on the RSS-DEV list (I can't recall who) that an item can belong to multiple channels. While understandable, I worry that this capability will be taken too far. The issues of bandwidth efficiency, processing multiple channels, and mapping items are likely to outweigh the benefits. I make it a policy of staying out of trouble by avoiding it altogether.

Food for thought. Your comments are welcome.

Mark Pilgrim points to Bill Kearney who asks "what's so bad about RSS 1.0?". My instinct is that the added syntax for ordering is seemingly redundant and/or of little value to the average developer's needs now. This is a question I have to ponder further though. I'd be interested to hear others' thoughts.

Mark and DJ Adams continue their exploration in RDF sharing their links of interest and thoughts along the way. Thanks guys.

Mark has also begun experimenting with the RDF-based FOAF (Friend Of A Friend) format this weekend. Mark wonders what this buys him, but notes he'll try anything once. (Me too.) He is "scrounging" for FOAF files to add to his profile. I've added my FOAF to the world here.

RSS Gets Ugly.

With today's post of his latest missive "Monsters", Dave Winer has taken the "RSS wars" to a whole new level of ugliness. This is sad and nothing I'm going to dwell on, but I am shocked and appalled even after my exchange on Blogroots yesterday. In my exchange with him he told me to be "positive" and "take the high road." I respectfully disagreed with his definition of what that high road was, but is this what he had in mind?

Yesterday I said he was "incorrigible." Today's post proved he is no longer capable of participating in further discussion of RSS and is now irrelevant.

More RSS Fun. To RDF or Not to RDF?

Dave Winer announced "The new target date for RSS 2.0 is Tuesday of next week. Maybe Wednesday. But no later than that." This is despite objections and issues raised by the likes of Sam Ruby and Ben Hammersley. An interesting thread on RSS was started over at Blogroots. I had a refreshing debate with Dave Winer ending with his trademark answer-tough-questions-with-the-same-question technique. (My apologies to Meg and all for taking the thread off track.) This dialog was constructive for me (and I hope others) because it helped me determine that he is incorrigible and that I should spend my effort elsewhere. I'm starting right now.

There is some coalescing in the community on balancing the simplicity and extensibility (via XML Namespaces) of future RSS formats. The topic of constructive conversation is currently centered on RDF -- how can the "tax" be minimized, or does it have a place in RSS at all? RDF is not understood well enough by most to really say (a problem for RDF), so there is a lot of instruction on its merits being exchanged. Here are some links on the subject:

There seem to be two unification "2.0" formats that are resonating through the mailing lists and weblogs talking up RSS. One is derived from the thoughts of Sam Ruby with refinements by Mark Pilgrim. Another is the RDF-friendly proposal made by Shelley Powers and refined by Sean Palmer. Interestingly, both proposals break a certain amount of backward compatibility with previous specifications. This is still (at least to me) an outstanding issue -- compatible with what?

RSS 0.91 broke compatibility with 0.90. 1.0 is based on 0.90 while 0.92-0.94 is based on 0.91, but breaks backward compatibility with it. Yet most aggregators and toolkits (I just finished developing one) will derive basic RSS feed information regardless of which specification a feed may or may not comply with. Standards are not as useful as applications. Shouldn't the primary concern be a format that works with most applications rather than a specific standard or fork?

In order to settle this argument, I'm of the mind to let backward compatibility of the specifications suffer a bit in order to serve the greater good of developing a strong "core" with extensibility through namespaces. (I keep an open mind on RDF and have yet to form an opinion.) This is why I like and support the proposals by Mark/Sam and Shelley/Sean and will be studying them more closely.

RSSFeed v0.35 Released.

RSSFeed v0.35, a MovableType plugin for retrieving and inserting RSS feed content into templates, is now available for download here. (Finally.) This release is a maintenance release to fix a few things and test a new parser module.

This release includes an additional module, XML::RSS::LP, a liberal RSS parser. This parser is "liberal" in that it does not demand compliance with a specific RSS version and will ignore tags it does not expect or understand. The parser's only requirement is that the file is well-formed XML. The module is an improvement over the XML::RSS module used by versions prior to 0.35 in that it is leaner -- the majority of the XML::RSS code was for generating RSS files, which is of no use to us here. XML::RSS::LP will also fall back on XML::Parser::Lite if XML::Parser cannot be found. This will help users who do not have XML::Parser on their system and are unable to compile it. (XML::Parser::Lite is a simple pure Perl module that does have some limitations. Please consult the module's documentation for further information. If you have SOAP::Lite installed you have at least this.)
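To make the "liberal" idea concrete, here is a minimal sketch in Python rather than the plugin's Perl. It is not the XML::RSS::LP code; the function names and the sample feed are made up for illustration. It assumes only what is described above: the input must be well-formed XML, the RSS version is never checked, and unexpected tags are skipped.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip any '{namespace}' prefix so 0.9x and 1.0 tags compare equal."""
    return tag.rsplit("}", 1)[-1]


def parse_feed_liberally(xml_text):
    """Return a list of item dicts from any well-formed RSS-like document.

    Hypothetical helper: only well-formedness is required, the RSS version
    is never inspected, and unrecognized tags are silently ignored.
    """
    wanted = {"title", "link", "description"}
    root = ET.fromstring(xml_text)  # raises ParseError if not well-formed
    items = []
    for element in root.iter():
        if local_name(element.tag) != "item":
            continue
        item = {}
        for child in element:
            name = local_name(child.tag)
            if name in wanted and child.text:
                item[name] = child.text.strip()
        items.append(item)
    return items


# Made-up sample feed containing a tag the parser does not know about.
SAMPLE = """<rss version="0.92">
  <channel>
    <title>Example</title>
    <item>
      <title>First post</title>
      <link>http://example.com/1</link>
      <madeUpTag>ignored</madeUpTag>
    </item>
  </channel>
</rss>"""

print(parse_feed_liberally(SAMPLE))
```

Because items are matched by local name wherever they appear in the tree, the same function picks them up whether they sit inside the channel (0.9x style) or beside it (1.0 style).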

I ran a few tests against all (yes all) the feeds in the Syndic8 directory that were reachable and had a status of either Syndicated or Awaiting Repair. Just under 10k in all. The success rates (meaning the feed was well-formed XML and did not choke the parser) were encouraging:

Syndicated feeds: 96%
Awaiting Repair feeds: 93.3%
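For anyone who wants to run a similar check, a rough Python sketch of the measurement might look like this. It is not the script I used; the function name and the four sample documents are invented, and it only tests well-formedness rather than a full parse.

```python
import xml.etree.ElementTree as ET


def success_rate(documents):
    """Percentage of documents that parse as well-formed XML."""
    if not documents:
        return 0.0
    ok = 0
    for text in documents:
        try:
            ET.fromstring(text)
        except ET.ParseError:
            continue  # not well-formed: counts as a failure
        ok += 1
    return 100.0 * ok / len(documents)


# Invented samples: two well-formed, two broken in common ways.
samples = [
    "<rss version='0.91'><channel><title>A</title></channel></rss>",
    "<rss version='0.92'><channel/></rss>",
    "<rss><channel><title>A & B</title></channel></rss>",  # bare ampersand
    "<rss><channel></rss>",  # mismatched tags
]

print("%.1f%%" % success_rate(samples))
```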

Here is what's new in this release.

  • Dropped in RSS liberal parser with minor code edits
  • Eliminated version parameter and default
  • Eliminated code for fixing malformed XML
  • Minor bug fixes and optimizations (Thanks Chui Tey)
  • Added "Exists" and deprecated "If" tags

Nothing too exciting; it is a maintenance release after all. I promise the next one will be more exciting. Your feedback is welcome. Enjoy.

I'm catching up on my weblog reading. Jon Udell posted a response to a message I sent him pointing out that he should use XML's CDATA rather than entity-encoded HTML content (also known as HTML-escaped content). He goes on to quote the mod_content (an RSS 1.0 extension module) specification, which allows this technique to be used. (Good point. Missed that.) Userland's 0.92 format document also states that entity-encoded HTML is permitted. He concludes "clearly entity-encoding fits best with current practice. But I agree that CDATA is more desirable for a variety of reasons."

It may be the current practice, but it's one that needs to change. The reasons are many if you're a content publisher or developer:

  1. Unnecessary file "bloat"
  2. Non-standard XML encoding
  3. It can be prone to error
  4. CDATA requires less processing
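To show the two encodings side by side, here is a small Python sketch. The HTML snippet is made up, and the example assumes the content never contains the sequence "]]>", which would end a CDATA section early.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Made-up HTML content for a feed item description.
html = '<p>Read <a href="http://example.com/">this</a> &amp; reply.</p>'

# Entity encoding: every markup character in the HTML is escaped.
entity_form = "<description>" + escape(html) + "</description>"

# CDATA: the HTML is carried verbatim inside a CDATA section.
cdata_form = "<description><![CDATA[" + html + "]]></description>"

# An XML parser recovers the same text from both forms.
assert ET.fromstring(entity_form).text == html
assert ET.fromstring(cdata_form).text == html

print(entity_form)
print(cdata_form)
```

On this sample the entity-encoded form comes out longer, and the already-escaped ampersand ends up double-escaped, which is the kind of thing that makes hand-built feeds error prone.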

Unwittingly, I was just as guilty of not encoding my RSS as I should. MovableType's default RSS templates use entity-encoded HTML. Tonight I modified those templates and CDATA-encoded my RSS feeds. I am told that some aggregators do not handle CDATA-encoded descriptions. If you have trouble reading my feeds now, please let me know.

In the continuing discussion of the future of RSS, Rael Dornfest, RSS 1.0 working group chairperson, addresses Dave Winer's plan to release the final draft of what he is now calling "RSS 2.0." (It was originally conceived as 0.94.) Rael writes "I believe what you're hearing from various quarters is that there are mega problems, minor problems, not the least of which is that you're about to cut off some of the most productive discussion on the future of RSS -- both in terms of community and technical details -- in two years."

It seems a lot of the action has been at Ben Hammersley's RSS weblog lately and I had been missing out. Ben is collecting questions for Dave Winer on his RSS proposal. The comments attached to his posts have some insightful observations worth reading that illustrate the need to stop rushing the design of the specification. Ben also points to Sam Ruby's suggestions for the Userland proposal.

I like Sam's thinking. With agreement on namespaces in principle, I think cleaning up the "core" is a smart idea. Sam's notion of removing (deprecating?) redundant and "optional" tags in modules strikes a reasonable balance in satisfying the divergent opinions. It also simplifies the format.

It does not end the discussion. In fact it raises a major question in my mind that I have not seen discussed directly: What do we break compatibility with? Looking at it another way, what format do we maintain compatibility with? This is not clear in any of the discussions I've read.

RSS 0.91 broke compatibility with 0.90. 1.0 is based on 0.90 while 0.92-0.94 is based on 0.91, but breaks backward compatibility with it. Proposals by Rael (2 years ago), Sam, Jon, Mark and Shelley (with refinements by Sean Palmer) strike some interesting compromises with varying degrees and types of compatibility with previous formats.

Perhaps it has not been discussed because it is the no-win scenario. Perhaps backward compatibility with existing aggregators and toolkits is the key. Nonetheless, it would seem to me that until some consensus is formed on this issue, other discussions cannot be had.

I'd like to be wrong. Am I?

RSS Makes My Eyes Bleed.

I've completed my RSS article and sent it along for editing. I'm not done with RSS yet as I intend to finish up my work on some RSS facilities for MovableType.

The debate has continued through the weekend into today with the reported status of "RSS 2.0" and its roadmap ranging from "frozen/last call" to "have we started yet?"

Mark Pilgrim continues to collect and report many interesting posts and viewpoints on the debate here. Sam Ruby points to three different viewpoints, noting one example where the extremely rushed design process has been disappointing with its results.

On a personal level, I found today terribly confusing and disheartening. Who is this shadow committee making these decisions? What is the truth of the matter? "I feel like Fox Mulder" I quipped in one of those emails.

If RSS 2.0 is to be taken seriously it will need to be the community's specification, designed by the community through discussions and debates in a community forum like a mailing list. If it's not, we may as well stop trying to further a simple extensible syndication format for the Internet.

The rushed design process for this increasingly vital format simply will not stand.

Sam Ruby writes "Lots of really, really, really good progress on RSS.  Now, I'd like to make a plea.  Slow down."

I'll second Sam's motion!

Since we are, as Jon Udell put it, "...still at the beginning of the RSS adoption curve", it doesn't seem logical to just say "what's done is done" and continue to build on a shaky foundation. Backward compatibility should be a concern -- some transitional and deprecation work is warranted to start straightening the path forward.

I don't think full backward compatibility is possible with all of the RSS formats that have been published. For instance .9 and 1.0 use <rdf:RDF> as their root while the other .9x formats use <rss version="">.
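A toolkit that wants to cope with both families has to start by looking at the root element. Here is a small Python sketch of that check; the function name is made up, and anything it does not recognize is simply reported as "unknown".

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def feed_family(xml_text):
    """Classify a feed by its root element only (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    if root.tag == "{%s}RDF" % RDF_NS:
        return "rdf"  # the 0.9 and 1.0 style root
    if root.tag == "rss":
        # the other 0.9x formats carry a version attribute on the root
        return "rss " + root.get("version", "unknown")
    return "unknown"


print(feed_family('<rdf:RDF xmlns:rdf="%s"/>' % RDF_NS))
print(feed_family('<rss version="0.91"><channel/></rss>'))
```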

My Life is RSS Feeds.

This past week, I've been up to my eyeballs in the delights of RSS feeds. I'm currently writing an article and working to further the MovableType RSSFeed plugin. I've also been following the rapid-fire and sometimes fierce discussion of how RSS should move forward. Keeping tabs on this discussion has been a full-time job. It's been disheartening to see such an acute level of acrimony with little relevance to the mission of RSS. Today the path forward began to come into focus. Here are some of the more insightful, level-headed posts to note.

My hope is for a social engineering miracle to allow it to happen now.

I made the following comment in response to a post by Karsten Januszewski of Microsoft on the recently formed Aggregators mailing list:

With all due respect to Microsoft and in the spirit of feedback, the recent post suggesting RSS feeds be registered and discovered in UDDI, and the associated paper, puzzle me. Why would I want to discover a simple lightweight syndication mechanism (RSS) through a mechanism that is significantly more complex and heavyweight, such as UDDI? Otherwise, from a technical perspective and at risk of sounding like a REST advocate, why would I want to use SOAP API calls to discover a "service" that I retrieve with a simple HTTP GET? There are a number of mechanisms already deployed that allow one to register and discover feeds in a more lightweight fashion. How does this make the architecture more sustainable when UDDI is a centralized service that does not have any type of push or pubsub mechanism built in?

I am of the opinion that while it may be "a logical application of UDDI in its mission" it defeats the mission of RSS.

[UPDATE: The latest version of this plugin can be found here.]

When MovableType first introduced plugins, Richard Rainwater was one of the first to release a plugin for retrieving RSS feeds and inserting them into MovableType layouts. In recent weeks (months?) Richard's site with the plugin source has been offline and he has been unreachable. Interest in this plugin still persists, with occasional requests for it being posted to the MovableType Plugin Development forum.

It just so happens I downloaded a copy of what I believe was his last public release (RSSFeed v.3) before Richard's site went offline. As I understand the license Richard released it under, I am allowed to "redistribute it freely" under terms that I believe I have adhered to. It is now available here. (If I am mistaken about the license please contact me and I will take it down.)

All is not well though because as Richard indicated in his first post to this thread, the XML::RSS module his plugin uses has "serious problems handling minor problems in RSS files." He had planned to rewrite it with a different parser. To my knowledge that version was never released.

Based on my own experience writing a homegrown RSS feed aggregator with Blagg and examining the XML::RSS feed module, I have a sense of what the problems are. Mark Pilgrim put it most elegantly when he said "You see, most RSS feeds suck." Many RSS feeds break compliance with the sometimes vague RSS specs. Other product-specific (sometimes user-specific) tags have been added under the guise of a new specification. Worse, many more are not even well-formed XML.

Mark's solution is to use an "ultra liberal parser" to grab information out of RSS files. In order to raise the function and reliability of this plugin, it will need to take a similar route until (if?) feed quality improves. It's something I'm considering doing if there is interest in the MT community.

RSS Feeds Suck.

Mark Pilgrim: "You see, most RSS feeds suck."

Amen! Mark laments the sad state of most RSS feeds' compliance with specifications and consistency with each other. In response, he's developed an "ultra-liberal RSS parser" in Python that attempts to handle many of the common mistakes so news aggregators and other automated agents don't simply "choke" trying to process large numbers of feeds.

I've been experimenting with ways to streamline my intake of blogs and news using Rael Dornfest's blagg. While I've achieved a certain level of success, issues from poorly formed feeds persist.

We can do better and must. RSS feeds are simply too useful to do otherwise.

About this Archive

This page is an archive of recent entries in the Syndication category.