
Podcasting and WordPress

IMHO there should be a field in the post editor where I enter the URL of the item’s enclosure.

If there’s nothing there, no enclosure.

Scraping the HTML gives unpredictable results.

Apparently it misses the MP3 sometimes, as in this post:

http://rebootnews.com/2009/12/17/rebooting-the-news-37/

A Google search shows the problem has been reported.

http://www.google.com/search?q=wordpress+podcast+enclosure
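Here’s a minimal Python sketch of the explicit-field idea. The helper name and parameters are hypothetical, not a WordPress API. It assumes the length (in bytes) and MIME type are already known, say from an HTTP HEAD request on the URL. The point is just the rule: the enclosure comes from the field, and an empty field means no enclosure.

```python
from typing import Optional
from xml.sax.saxutils import quoteattr


def enclosure_element(url: str, length: int, mime_type: str) -> Optional[str]:
    """Build an RSS 2.0 <enclosure> from an explicit editor field (hypothetical helper).

    Nothing is scraped from the post's HTML. An empty or whitespace-only
    URL field means the item gets no enclosure at all.
    """
    url = url.strip()
    if not url:
        return None
    # quoteattr escapes the value and wraps it in quotes for use as an attribute.
    return f'<enclosure url={quoteattr(url)} length="{length}" type={quoteattr(mime_type)}/>'


print(enclosure_element("http://example.com/show.mp3", 24986239, "audio/mpeg"))
print(enclosure_element("", 0, "audio/mpeg"))  # None: no enclosure
```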

Cross-platform issue in Frontier kernel

Just spent about an hour chasing down a bug that was caused by a cross-platform difference in how the kernel converts dates to strings on Mac and Windows.

On the Mac, if you convert a date to a string, the year is expressed in two digits; on Windows, in four.

For example, on Windows:

string (clock.now ()) == "12/6/2009; 7:09:24 PM"

And on Mac:

string (clock.now ()) == "12/6/09; 7:09:24 PM"

Usually it doesn’t matter; the date is just converted to a string for display purposes.

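In one of my applications I use the date string to index into a table. It works fine until you take a table from the Mac and use it on Windows. Then it doesn’t work, of course: the keys stored on the Mac have two-digit years, the strings Windows generates have four-digit years, so the lookups fail.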

Probably can’t fix this without breaking something. At least I’m documenting it. :-)
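The mismatch is easy to reproduce outside Frontier. Here’s a Python sketch (not UserTalk, and not how the kernel is implemented) that mimics the two outputs above and, as one possible application-level workaround, keys the table by a fixed ISO 8601 string instead of the display string. The function names are hypothetical, and the formatting details are inferred from the two examples.

```python
from datetime import datetime


def display_string(dt: datetime, four_digit_year: bool) -> str:
    """Mimic the two outputs shown above: Windows uses a four-digit year, Mac two."""
    year = str(dt.year) if four_digit_year else f"{dt.year % 100:02d}"
    hour = dt.hour % 12 or 12
    suffix = "AM" if dt.hour < 12 else "PM"
    return f"{dt.month}/{dt.day}/{year}; {hour}:{dt.minute:02d}:{dt.second:02d} {suffix}"


def table_key(dt: datetime) -> str:
    """Hypothetical platform-independent key: ISO 8601 text, same on every machine."""
    return dt.isoformat(timespec="seconds")


when = datetime(2009, 12, 6, 19, 9, 24)
windows_key = display_string(when, four_digit_year=True)   # "12/6/2009; 7:09:24 PM"
mac_key = display_string(when, four_digit_year=False)      # "12/6/09; 7:09:24 PM"
print(windows_key == mac_key)  # False: same moment, two different table keys
print(table_key(when))         # 2009-12-06T19:09:24
```

The display strings can stay as they are; only the table index changes, which is why this sidesteps rather than fixes the kernel difference.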

Ideas for a BitTorrent namespace

Background: On Friday, I started reviewing the RSS produced by some of the BitTorrent sites. What I found was pretty great. There were some problems, but nothing that can’t be easily sorted out. What was really exciting was that some of the developers of the feeds, and of the apps that use them, got in touch through Twitter. As a result of those discussions, I agreed to outline ideas for a possible BitTorrent namespace. That’s what I’m doing here.

These are just ideas. Please don’t implement anything based on what you read here. If you have observations, please post a comment below. If this yields a namespace that people use, there will be a fixed URL that contains the docs for the namespace. It won’t be on unberkeley.com. :-)

All elements in this namespace are optional. It’s perfectly valid to have an RSS file that represents torrents in enclosures without using any of these elements.

Each RSS item describes a single torrent. The enclosure element describes the torrent file itself. The length attribute is the size, in bytes, of the torrent file. The type attribute is the MIME type of the torrent file. The url attribute is the address of the torrent file.
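A .torrent file is small, the files it points to can be huge, and the enclosure length is the former. Here’s a rough Python sketch that reads a torrent’s bencoded metainfo (the info dictionary has either a single length or a files list) and returns both numbers plus the file count. It assumes a well-formed torrent, skips the validation a real client would do, and the function names are mine.

```python
from typing import Any, Tuple


def bdecode(data: bytes, i: int = 0) -> Tuple[Any, int]:
    """Decode one bencoded value starting at data[i]; return (value, next index)."""
    c = data[i:i + 1]
    if c == b"i":                          # integer: i<digits>e
        end = data.index(b"e", i)
        return int(data[i + 1:end]), end + 1
    if c == b"l":                          # list: l<items>e
        i += 1
        items = []
        while data[i:i + 1] != b"e":
            item, i = bdecode(data, i)
            items.append(item)
        return items, i + 1
    if c == b"d":                          # dictionary: d<key><value>...e
        i += 1
        d = {}
        while data[i:i + 1] != b"e":
            key, i = bdecode(data, i)
            d[key], i = bdecode(data, i)
        return d, i + 1
    colon = data.index(b":", i)            # byte string: <length>:<bytes>
    n = int(data[i:colon])
    start = colon + 1
    return data[start:start + n], start + n


def torrent_sizes(torrent: bytes) -> Tuple[int, int, int]:
    """Return (enclosure length, total content bytes, number of content files)."""
    meta, _ = bdecode(torrent)
    info = meta[b"info"]
    if b"files" in info:                   # multi-file torrent
        lengths = [f[b"length"] for f in info[b"files"]]
    else:                                  # single-file torrent
        lengths = [info[b"length"]]
    return len(torrent), sum(lengths), len(lengths)


single = b"d4:infod6:lengthi367450064e4:name5:x.avi12:piece lengthi16384e6:pieces0:ee"
print(torrent_sizes(single))  # (74, 367450064, 1)
```

The first number is what belongs in the enclosure’s length attribute; the other two are the kind of values the contentLength and contentFiles elements below would carry.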

While the namespace is designed for use in RSS 2.0 feeds, it may be used in any XML-based file format that allows extension through namespaces, such as Atom or OPML 2.0.

For terminology, I used the BitTorrent vocabulary Wikipedia page as a guide.

Now for some of the possible elements of the namespace.

torrent:contentLength

  • The total number of bytes in all the files the torrent makes downloadable.

torrent:contentFiles

  • The number of files the torrent makes downloadable.

torrent:seeds

  • The number of clients that have complete copies of all files made available by the torrent.
  • It’s a guide to how quickly the files may be downloaded.

torrent:peers

  • The number of clients that have a partial copy of the files made available by the torrent.
  • Note: This was originally “leechers” but was changed because there was a consensus that it should be called “peers.” DW 12/6/09

torrent:verified

  • A boolean value. If true, the torrent has been verified: it’s a real working download, not a fake or scam. If false, it has not been verified.
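To see how these might look in a feed, here’s a Python sketch that builds one item with xml.etree.ElementTree. The namespace URI is a placeholder, since no URL has been chosen, and the title, URLs and numbers are illustrative. application/x-bittorrent is the MIME type commonly used for torrent files. In a real feed the xmlns:torrent declaration would normally sit on the <rss> element rather than on the item.

```python
import xml.etree.ElementTree as ET

# Placeholder URI: the real namespace URL hasn't been decided.
TORRENT_NS = "http://example.com/hypothetical/torrent-namespace"
ET.register_namespace("torrent", TORRENT_NS)


def torrent_item(title, torrent_url, torrent_file_length, content_length,
                 content_files, seeds, peers, verified):
    """Build an RSS <item> using the proposed (not final) torrent: elements."""
    item = ET.Element("item")
    ET.SubElement(item, "title").text = title
    # The enclosure describes the .torrent file itself, not the content.
    ET.SubElement(item, "enclosure", {
        "url": torrent_url,
        "length": str(torrent_file_length),
        "type": "application/x-bittorrent",
    })
    values = {
        "contentLength": content_length,
        "contentFiles": content_files,
        "seeds": seeds,
        "peers": peers,
        "verified": "true" if verified else "false",
    }
    for name, value in values.items():
        ET.SubElement(item, f"{{{TORRENT_NS}}}{name}").text = str(value)
    return item


item = torrent_item("Example show", "http://example.com/show.torrent",
                    14680, 367450064, 1, 120, 45, True)
print(ET.tostring(item, encoding="unicode"))
```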

Reviewing BitTorrent RSS feeds

First, let me say that it’s great that RSS is so widely supported by BitTorrent sites.

When I met Bram Cohen at a Wired awards ceremony in 2003 (BitTorrent won, Skype was also nominated, as was RSS), I told him I thought the two technologies were a great match. Both were low-tech, open formats for content distribution that solved parallel problems.

So it’s great that six years later so much is being done at the intersection between BitTorrent and RSS.

That said, there are some obvious improvements possible. I’m going to take some notes here, and hopefully add to them over time.

ezrss.it

Looking at the feeds produced by ezrss.it. An example.

1. The length attribute on the enclosure must be the length of the torrent file, not the length of the file the torrent describes. For example, the correct length of this torrent is 14680, not 367450064 as the RSS feed indicates it is.

2. It would be great if the feed items had <category> elements. Obviously everything on eztv.it would be in the TV Show category. Perhaps use the same categories that Mininova uses?

3. It’s a good idea for <item>s to have <guid>s. It’s the foolproof way for clients to tell if they’ve seen an item before. It might make the file a bit larger, but not by very much. It’s really worth it. ;->
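Here’s roughly what “tell if they’ve seen an item before” looks like on the client side, as a Python sketch. The function name is hypothetical; items without a guid fall back to the link or title, which is exactly the guesswork guids avoid.

```python
import xml.etree.ElementTree as ET


def new_items(rss_text: str, seen: set) -> list:
    """Return items not seen on an earlier poll, and remember their keys.

    The guid is the key. Items without one fall back to link, then title,
    which can misfire if a site edits titles or reuses links.
    """
    fresh = []
    for item in ET.fromstring(rss_text).iter("item"):
        key = item.findtext("guid") or item.findtext("link") or item.findtext("title")
        if key not in seen:
            seen.add(key)
            fresh.append(item)
    return fresh


feed = """<rss version="2.0"><channel>
<item><title>Ep 1</title><guid>http://example.com/t/1</guid></item>
<item><title>Ep 2</title><guid>http://example.com/t/2</guid></item>
</channel></rss>"""
seen = set()
print(len(new_items(feed, seen)))  # 2 on the first poll
print(len(new_items(feed, seen)))  # 0 on the next poll: nothing new
```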

isohunt.com

Looking at feeds produced by isohunt.com. An example.

1. Add some carriage returns and tabs to the XML text. It makes the feeds easier to read.

2. The content-type that’s being returned is text/html. That’s pretty hokey for XML data. Try text/xml for a better result.

3. <item>s have <guid>s. Good! You can leave off isPermaLink="true"; it’s the default value. Any client that doesn’t default this way has a bug.

4. They make the same length mistake that ezrss.it makes. The length should be the length of the torrent file not the length of the file it describes.

5. The category is provided, but it’s encoded in HTML in the description. That’s okay, but there should also be a <category> element. It’s especially important for a site like isohunt that handles multiple types of torrents (not just TV, like eztv.it).
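Two of these points are mechanical. Here’s a Python sketch, assuming Python 3.9 or later for ET.indent; the helper names are hypothetical. The first function shows the isPermaLink default from a reader’s point of view; the second re-serializes a feed with newlines and tabs. Whatever generates the feed would then serve it with a Content-Type: text/xml header.

```python
import xml.etree.ElementTree as ET


def guid_is_permalink(guid: ET.Element) -> bool:
    """RSS 2.0: a <guid> without isPermaLink is treated as isPermaLink="true"."""
    return guid.get("isPermaLink", "true").strip().lower() == "true"


def pretty(rss_text: str) -> str:
    """Re-serialize a feed with one element per line, indented with tabs."""
    root = ET.fromstring(rss_text)
    ET.indent(root, space="\t")  # requires Python 3.9 or later
    return ET.tostring(root, encoding="unicode")


print(guid_is_permalink(ET.fromstring("<guid>http://example.com/t/1</guid>")))  # True
print(pretty("<rss><channel><item><title>x</title></item></channel></rss>"))
```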

kickasstorrents.com

Here’s an example feed produced by kickass.

This feed is interesting because it provides good hints for extensions to RSS that might prove useful for BitTorrent applications.

But first, a list of things that need immediate attention.

1. There should be an enclosure element on each item; that’s how the client knows what to download.

2. The author element must be an email address.

3. The following elements are not part of RSS, and probably should be part of a namespace just for BitTorrent clients:
  • torrentLink
  • hash
  • peers
  • seeds
  • leechs (probably should be spelled leeches)
  • size (good idea, since the enclosure length would be the size of the torrent file, but it’s probably better to call this length for consistency)
  • verified (use true/false instead of 0/1, slightly easier for a non-programmer to understand)
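Here’s a Python sketch of moving kickass-style extras into a namespace, using the element names from the namespace ideas post above (leechs becomes peers, size becomes contentLength, verified goes from 0/1 to false/true). The namespace URI is a placeholder and the function is hypothetical. torrentLink and hash have no counterpart in those ideas yet, and their own peers element isn’t clearly the same thing as the proposed one, so the sketch leaves all three alone.

```python
import xml.etree.ElementTree as ET

TORRENT_NS = "http://example.com/hypothetical/torrent-namespace"  # placeholder URI

# Hypothetical mapping from the feed's ad-hoc element names to the proposed ones.
RENAMES = {"seeds": "seeds", "leechs": "peers", "size": "contentLength", "verified": "verified"}


def move_to_namespace(item: ET.Element) -> None:
    """Rewrite a kickass-style <item> in place so the BitTorrent extras are namespaced."""
    for old, new in RENAMES.items():
        el = item.find(old)
        if el is None:
            continue
        value = (el.text or "").strip()
        if old == "verified":              # 0/1 becomes false/true
            value = "true" if value == "1" else "false"
        item.remove(el)
        ET.SubElement(item, f"{{{TORRENT_NS}}}{new}").text = value
```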

Good stuff…

1. They use the category element! That’s good.

torrentzap.com

Here’s an example feed produced by torrentzap.

1. There’s no enclosure element, so it would be hard to write a client that processed this RSS feed.

2. Good — it uses the category element correctly.

3. There is no size element in RSS. This again highlights the need for a BitTorrent-specific namespace.

4. It would be easier to debug if there were some carriage returns in the file.

5. Please use guids.
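The same handful of problems keeps coming up across these feeds, so here’s a rough Python checker for them: missing enclosure, missing guid, missing category, and un-namespaced elements that aren’t part of RSS 2.0. It’s a sketch, not a validator; the function name is hypothetical, and it doesn’t look at the content type, the whitespace, or whether the enclosure length is right.

```python
import xml.etree.ElementTree as ET

# Sub-elements RSS 2.0 defines for <item>. Anything else without a namespace
# is an extension that should live in one.
RSS_ITEM_ELEMENTS = {"title", "link", "description", "author", "category",
                     "comments", "enclosure", "guid", "pubDate", "source"}


def check_items(rss_text: str) -> list:
    """Return human-readable problems for each <item> in the feed."""
    problems = []
    for n, item in enumerate(ET.fromstring(rss_text).iter("item"), start=1):
        if item.find("enclosure") is None:
            problems.append(f"item {n}: no enclosure")
        if item.find("guid") is None:
            problems.append(f"item {n}: no guid")
        if item.find("category") is None:
            problems.append(f"item {n}: no category")
        for child in item:
            # Namespaced tags look like "{uri}name" and are allowed.
            if not child.tag.startswith("{") and child.tag not in RSS_ITEM_ELEMENTS:
                problems.append(f"item {n}: <{child.tag}> is not an RSS 2.0 element")
    return problems
```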

Update: Here are some ideas for a Torrent namespace.