Saturday, August 05, 2006

The Windows RSS platform and DTDs

I really like the idea of the Windows RSS platform. One central feed store on your machine, which each application can access.

What I don't like about the Windows RSS platform is that it does not support feeds with DTDs. If you use IE7 to open a feed that uses a DTD, you get the following error:

Internet Explorer does not support feeds with DTDs.

When I imported my OPML into the Windows RSS platform using IE7 (see my description of how to do that here), several feeds would not display.

I asked the Windows RSS team why these feeds are not supported. Here is the reply from Walter von Koch:

You are correct that the RSS Platform does not support feeds with DTDs. This is an unfortunate limitation based on the technology used for parsing feeds. Specifically, certain version of MSXML do not allow for DTDs being ignored. In later versions of MSXML it is possible to ignore DTDs. We are looking into possible using such flag. However, it would only work if that newer version of MSXML is installed on the machine.

With previous versions of IE, it was common to ship a new version of MSXML. See this knowledge base entry for a list of IE and MSXML versions.

However, Microsoft customers apparently ran into problems with existing applications when a new version of MSXML was installed. Walter von Koch:

Redistributing new versions of MSXML is not a trivial undertaking. MSXML is the basis for many corporate applications and many enterprises tightly control which versions of MSXML are allowed on their machines. Hence, having IE7 require and install a new version MSXML can have significant impact on many customers.

Sounds like some DLL hell issues.

Why does the Windows RSS platform have to ignore the DTD? It turns out that parsing a DTD is a security risk. You can create a DTD that is so complex that the computer parsing it will use all its memory and CPU. So by publishing a malicious DTD, you can run a denial-of-service attack against every parser that processes it.
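
To get a feeling for how little text such a DTD needs, here is a rough Python sketch. It is my own illustration, not anything from the RSS team: the names `build_nested_entity_dtd` and `expanded_length` are made up, and nested entity references are only one way a DTD can be made expensive. The code just builds the text and does the arithmetic; it never hands the DTD to a parser.

```python
def build_nested_entity_dtd(levels: int, fanout: int) -> str:
    """Build a DOCTYPE with an internal subset of nested entities.

    e0 is a single character; every further entity e{i} consists of
    `fanout` references to e{i-1}. Hypothetical helper, for illustration.
    """
    lines = ["<!DOCTYPE feed [", '  <!ENTITY e0 "x">']
    for i in range(1, levels + 1):
        refs = f"&e{i - 1};" * fanout
        lines.append(f'  <!ENTITY e{i} "{refs}">')
    lines.append("]>")
    return "\n".join(lines)


def expanded_length(levels: int, fanout: int) -> int:
    """Characters produced if a parser fully expands the top entity."""
    size = 1  # e0 expands to one character
    for _ in range(levels):
        size *= fanout  # each level multiplies the size by the fan-out
    return size


if __name__ == "__main__":
    dtd = build_nested_entity_dtd(levels=9, fanout=10)
    print(f"DTD text: {len(dtd)} characters")
    print(f"Fully expanded top entity: {expanded_length(9, 10)} characters")
```

With nine levels and a fan-out of ten, the DTD text stays well under a kilobyte, while a parser that expands the top entity naively would have to produce a billion characters.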

My next idea was to strip the DTD reference from the feed's XML, as parsing the feed should be possible without a DTD. Turns out that this is easier said than done, too. Walter von Koch also replied to this idea:

Stripping out the reference to a DTD is not a simple task. Unfortunately it's not just applying a simple regex and strip the DTD. Parsing of xml is actually a non-trivial task given the number of encodings etc. Getting it right, and making sure it's done in a secure and safe way is challenging. Hence, the RSS Platform relies on MSXML for the xml parsing.
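
To convince myself, I tried the obvious regex in Python. This is only a sketch of the naive approach, not what the RSS Platform does; the pattern, the `naive_strip` helper and the two sample documents are my own assumptions. It also sidesteps the encoding problem completely, because it works on an already decoded string, whereas a real feed arrives as bytes in whatever encoding the publisher chose.

```python
import re

# The "simple regex": everything from <!DOCTYPE up to the next '>'.
NAIVE_DOCTYPE = re.compile(r"<!DOCTYPE[^>]*>")


def naive_strip(xml_text: str) -> str:
    """Remove the first DOCTYPE-looking span (hypothetical helper)."""
    return NAIVE_DOCTYPE.sub("", xml_text, count=1)


# Case 1: a plain external DTD reference. The regex copes with this.
simple = (
    '<?xml version="1.0"?>\n'
    '<!DOCTYPE rss SYSTEM "http://example.invalid/rss.dtd">\n'
    "<rss/>"
)

# Case 2: an internal subset. The first '>' belongs to the ENTITY
# declaration, so the match ends too early and leaves ']>' behind.
tricky = (
    '<?xml version="1.0"?>\n'
    "<!DOCTYPE rss [\n"
    '  <!ENTITY copy "(c)">\n'
    "]>\n"
    "<rss/>"
)

if __name__ == "__main__":
    print(repr(naive_strip(simple)))
    print(repr(naive_strip(tricky)))
```

The first document comes out clean. The second one still contains a stray `]>` in front of the root element, so it is no longer well-formed XML. Handling that correctly means understanding quoted strings, comments and the internal subset, which is already a good part of an XML parser.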

Recently, several of the sites I had problems with have started to offer an alternate feed without a DTD. In IE7, if you are on a site with multiple feed formats, the feed icon shows a drop-down menu where you can choose between the different formats. So maybe by the time IE7 ships, this will be mostly a non-issue.
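
Sites usually advertise their feeds with `<link rel="alternate">` elements in the page head, and I assume that is what the drop-down menu is built from. Here is a small Python sketch that lists those links for a page; the `FeedLinkFinder` class, the `find_feeds` helper and the sample page are made up for illustration, and I only look for the RSS and Atom MIME types.

```python
from html.parser import HTMLParser

# MIME types commonly used to advertise feeds.
FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


class FeedLinkFinder(HTMLParser):
    """Collect (title, type, href) for every advertised feed link."""

    def __init__(self):
        super().__init__()
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        # Attribute values can be None for valueless attributes.
        attr = {name: (value or "") for name, value in attrs}
        rel_values = attr.get("rel", "").lower().split()
        feed_type = attr.get("type", "").lower()
        if "alternate" in rel_values and feed_type in FEED_TYPES:
            self.feeds.append(
                (attr.get("title", ""), feed_type, attr.get("href", ""))
            )


def find_feeds(html: str):
    """Return the feeds advertised in an HTML page (hypothetical helper)."""
    finder = FeedLinkFinder()
    finder.feed(html)
    return finder.feeds


if __name__ == "__main__":
    page = """
    <html><head>
      <link rel="stylesheet" href="/style.css">
      <link rel="alternate" type="application/rss+xml"
            title="RSS 2.0" href="/feed.rss">
      <link rel="alternate" type="application/atom+xml"
            title="Atom" href="/feed.atom" />
    </head><body></body></html>
    """
    for title, feed_type, href in find_feeds(page):
        print(title, feed_type, href)
```

A page that offers both a feed with a DTD and one without would show up here with two entries, one per format.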

However, I still hope that IE7 will support feeds with DTDs, at least if an updated MSXML version is already installed on the system.
