Official Google Reader Blog - News, Tips and Tricks from the Reader team

XML Errors in Feeds

12/23/2005 09:50:00 AM
Posted by Mihai Parparita, Software Engineer

Dealing with the millions of RSS and Atom feeds out there is hard work. We're not trying to make you feel sorry for the Reader team, but as anyone who has attempted to implement a feed parser knows, there are many subtle deviations from the spec that you have to handle if you want to have any hope of satisfying the needs of your users (who shouldn't have to care about such things).

The feed generating/parsing world has had the debate about Postel's Law, as it applies to XML and feeds, several times. We are not here to weigh in on either side of the argument. Instead, we hope to provide some data so that such discussions can rest on more than philosophical grounds. Without further ado, here are the top XML errors that we have encountered when parsing all of the feeds that our users have added to Reader (and there are a lot of them):

% of errors   Error description
15.6%         Input claims to be UTF-8 but contains invalid characters
14.9%         Opening and ending tags mismatch
13.9%         An undefined entity is used (e.g. &nbsp; in an XML document without importing the HTML set)
7.8%          Document expected to begin with a start tag, but no < was found
5.7%          Disallowed control characters present
5.5%          Extra content at the end of the document
4.2%          Unterminated entity reference (missing semicolon)
4.2%          Unquoted attribute value
3.8%          Premature end of data in tag (truncated feed)
3.3%          Naked ampersand (should be represented as &amp;)
2.1%          XML declaration allowed only at the start of the document
1.8%          Namespace prefix is used but not defined
0.75%         Comment not terminated
0.64%         Attribute without value
0.17%         Unescaped < not allowed in attribute values
0.11%         Malformed numerical entity reference
0.11%         Unsupported/invalid encoding
0.10%         Comment must not contain '--'
0.10%         Attribute defined more than once
0.07%         Char out of allowed range
0.03%         Comment not terminated
0.02%         Sequence ]]> not allowed in content
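
Producers may want to know whether their own feeds hit two of the entity-related rows above: naked ampersands and unterminated entity references. A rough first check is to scan for any '&' that does not begin a well-formed reference. The sketch below uses a hypothetical helper and is not something Reader runs. Its name pattern is simplified to ASCII, and it checks syntax only. Because of that, an undefined entity such as &nbsp; still passes.

```javascript
// Hypothetical diagnostic helper: flags '&' characters that do not start a
// well-formed reference (&name;, &#123; or &#x7B;). Syntax check only:
// it does not know which named entities are actually defined.

// Sticky (/y) regex: test() only matches exactly at lastIndex.
// The name pattern is simplified to ASCII; real XML names allow more characters.
const REFERENCE = /&(?:[A-Za-z_:][A-Za-z0-9._:-]*|#[0-9]+|#x[0-9A-Fa-f]+);/y;

function findBadAmpersands(text) {
  const offsets = [];
  for (let i = text.indexOf("&"); i !== -1; i = text.indexOf("&", i + 1)) {
    REFERENCE.lastIndex = i;
    if (!REFERENCE.test(text)) {
      offsets.push(i); // naked '&', or a reference missing its ';'
    }
  }
  return offsets;
}

// Example: findBadAmpersands("Tom & Jerry &amp; friends &nbsp") returns [4, 26]
```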

As a whole, about seven percent of all the feeds that we know about have at least one of these errors (this data is based on a one-day snapshot, so transient errors may be present). Note that these are all XML errors, meaning that the feed is not well-formed. We are not talking about complying with and validating against the RSS or Atom specs; that is an even higher bar than the one we have set here. In general, our recommendation to feed producers is to check their feeds with the Feed Validator that the community has built.
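
Feed producers can also avoid several of these errors at the source by escaping every piece of text as it is written out. Here is a minimal sketch, assuming the feed is assembled as a JavaScript string; escapeForXml is a hypothetical helper name. It escapes the five XML special characters. It also drops anything outside the XML 1.0 Char production, which covers the "disallowed control characters" row.

```javascript
// Hypothetical producer-side helper: escape text before writing it into a feed.

// Characters allowed by the XML 1.0 "Char" production:
// #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
function isXmlChar(cp) {
  return cp === 0x9 || cp === 0xa || cp === 0xd ||
    (cp >= 0x20 && cp <= 0xd7ff) ||
    (cp >= 0xe000 && cp <= 0xfffd) ||
    (cp >= 0x10000 && cp <= 0x10ffff);
}

// Safe for both element content and quoted attribute values.
const ESCAPES = { "&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&apos;" };

function escapeForXml(text) {
  let out = "";
  // for...of walks the string by code point, so surrogate pairs stay intact.
  for (const ch of text) {
    const cp = ch.codePointAt(0);
    if (!isXmlChar(cp)) continue; // drop control characters and lone surrogates
    out += ESCAPES[ch] ?? ch;
  }
  return out;
}
```

This does not address the top row, invalid UTF-8. That problem arises when bytes are encoded or decoded, not when a well-formed string is escaped.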

On a related note, we're aware that Reader has some issues with titles. It's great that there are test cases, and we will add this bug to our to-do list.

Why should text have all the fun?

12/15/2005 11:58:00 AM
Posted by Mihai Parparita, Software Engineer

We at the Reader team like to receive some visual stimulation with our reading, so we're subscribed to a bunch of photo feeds. It's great that RSS and Atom can deliver more than just text, but it gets boring to view everything in the exact same fashion.

We've therefore come up with what we call "photo templates," which is a special display mode we have for photo sites. When it's triggered, we try our best to expand thumbnails to full-size photos. Additionally, on the right side of the screen we display a list of clickable thumbnails of other photos from that feed, so that you can cherry-pick the best ones to view. Right now we support the feeds from a few sites; here's a list of them and a sample feed from each one:

This is great if you use one of these photo services, but what about other sites or self-hosted photo blogs? For now we've specifically whitelisted the above five sites for photo template support. This doesn't scale that well: there are thousands of sites and only a few overworked Reader engineers.

Our plan is to support the Media RSS extension to RSS and Atom (the thumbnail and content tags are most relevant to photo feeds). This way, if you include the right tags, Reader will be able to display your feed with the photo template without us having to do any work. The Media RSS spec is pretty thorough, and you can use Flickr's feeds as examples of usage.
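
For publishers who want to try this, a feed item carrying both tags might be generated as below. This is a sketch under stated assumptions. The helper names, the sample data, and the fixed image/jpeg type are illustrative, and the code does not reflect any particular rule Reader uses to trigger the photo template. The media namespace URI is the one defined by the Media RSS spec.

```javascript
// Sketch of an RSS 2.0 feed whose items carry Media RSS tags.
// MEDIA_NS is the namespace URI from the Media RSS spec; the helper names,
// the sample data and the fixed "image/jpeg" type are assumptions.
const MEDIA_NS = "http://search.yahoo.com/mrss/";

// Escapes text for element content and double-quoted attribute values.
function xmlEscape(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

function photoItem({ title, link, photoUrl, thumbUrl, width, height }) {
  return [
    "<item>",
    `  <title>${xmlEscape(title)}</title>`,
    `  <link>${xmlEscape(link)}</link>`,
    // Full-size photo: what a photo-aware reader could expand to.
    `  <media:content url="${xmlEscape(photoUrl)}" type="image/jpeg" medium="image"` +
      ` width="${xmlEscape(width)}" height="${xmlEscape(height)}"/>`,
    // Small preview, e.g. for a strip of clickable thumbnails.
    `  <media:thumbnail url="${xmlEscape(thumbUrl)}"/>`,
    "</item>",
  ].join("\n");
}

function photoFeed(channel, items) {
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    `<rss version="2.0" xmlns:media="${MEDIA_NS}">`,
    "<channel>",
    `<title>${xmlEscape(channel.title)}</title>`,
    `<link>${xmlEscape(channel.link)}</link>`,
    `<description>${xmlEscape(channel.description)}</description>`,
    ...items.map(photoItem),
    "</channel>",
    "</rss>",
  ].join("\n");
}
```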

Subscribing to feeds via little Google buttons

11/28/2005 10:20:00 AM
Posted by Chris Wetherell, Software Engineer

The web is full of little buttons these days. Little buttons pop up everywhere to email an article, watch a video, play a song, post to your blog, or bookmark a site. They can claim affiliation to various ideas, communities, or ideologies. Browsing the web these days with an eye towards looking at these tiny, active buttons is almost zoological in nature.


In recent days we added a little button to the button zoo. Google is now offering a little "Add to Google" button that you can put on your site, blog, or corner of the web to make it easy for people to subscribe to your feed. Here are some instructions for adding the button to your site.

If you'd prefer more direct links from your browser (and if you're a bit brave), you can try dragging any of the following bookmarklets to your links toolbar. Now here's something funny: some feed readers strip out potentially malicious scripting, and that includes the kind of code found in bookmarklets. Google Reader is one of them, so if you're reading this post there, you'll have to visit our blog to get 'em. Once you've added them, click one to preview the site you're visiting in Reader and easily subscribe to it. We can't offer a warranty on this approach, because we might change something: Reader isn't even two months old yet. (A toddler!)

  • → Subscribe - Views the first available feed in Google Reader.
  • → Show all feeds - Lists all feeds and links them to Google Reader. Sadly this link won't work in IE6 with SP2 due to recent changes Microsoft has been making to provide a more secure browser. If you're using Internet Explorer then we recommend skipping this one.
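
The "Subscribe" bookmarklet's job is to find the first feed a page advertises. That is the standard feed autodiscovery pattern, which looks for <link rel="alternate"> elements with an RSS or Atom type. Here is a rough sketch of that lookup. It assumes nothing about how the actual bookmarklets were written, and READER_VIEW_URL is a hypothetical placeholder, not Reader's real URL format.

```javascript
// Sketch of feed autodiscovery for a "subscribe" bookmarklet.
// READER_VIEW_URL is a hypothetical placeholder, not Reader's real URL scheme.
const READER_VIEW_URL = "https://reader.example.com/view?feed=";
const FEED_TYPES = new Set(["application/rss+xml", "application/atom+xml"]);

// `links` is anything shaped like document.querySelectorAll("link"):
// an iterable of objects exposing getAttribute(name).
function firstFeedUrl(links, pageUrl) {
  for (const link of links) {
    const rels = (link.getAttribute("rel") || "").toLowerCase().split(/\s+/);
    const type = (link.getAttribute("type") || "").toLowerCase().trim();
    const href = link.getAttribute("href");
    if (rels.includes("alternate") && FEED_TYPES.has(type) && href) {
      return new URL(href, pageUrl).href; // resolve relative hrefs
    }
  }
  return null; // the page advertises no feed
}

function readerUrlFor(feedUrl) {
  return READER_VIEW_URL + encodeURIComponent(feedUrl);
}

// In a browser (e.g. inside a bookmarklet) this might be used as:
//   const feed = firstFeedUrl(document.querySelectorAll("link"), location.href);
//   if (feed) location.href = readerUrlFor(feed);
```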

We have our eye on further solutions for one-click subscriptions, and like many others we're looking into ways we can help. For now, we hope a little button makes for happier subscribing and reading.

May we get you some chips and a soda too?

11/09/2005 06:18:00 PM
Posted by Jason Shellen, Product Manager

It turns out some folks, like Reader-fan Moebius, are enjoying Reader in new ways:

I don't like to use my laptop on my lap, because of heat and other reasons, and I don't like to be pushing the 'J' key very often, so I downloaded "JoyToKey" to use my gamepad for browsing Google Reader. With JoyToKey I mapped "J" to down and "K" to up, "V" to right, and "Ctrl-W" to left. The other joystick was mapped to other normal browser commands. So, I can read Google News very comfortably seated on my sofa.

Watch out for gamepad thumb and that other RSS.

Warning: Geekery ahead!

11/03/2005 12:39:00 PM
Posted by Mihai Parparita, Software Engineer

You may have noticed that some Greasemonkey scripts broke with the recent release (for example, the excellent Google Reader Auto-Read). First, a bit of background. Reader uses JavaScript. A lot of it. So much that it would take a while to download even on a broadband connection. What we (and other Google products) do is to compress it before sending it to the user. So this line of code:

FR_Queue_currentQueue.pageDown();

becomes:

t.ma()

Only people care about descriptive names like pageDown; to a computer, ma is just as good. It turns out that these compressed names will change from release to release, as we tweak the code (the more often a name is used, the shorter the compressed name that's chosen for it). Greasemonkey scripts that rely on these compressed names (like the aforementioned one) will therefore break.
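
A toy version of frequency-based renaming makes that rule concrete. It counts how often each identifier is used, sorts by that count, and hands out a, b, c, ... z, aa, ab, ... in that order. This is a hypothetical sketch, not the compressor Reader actually uses, and it skips only a partial list of reserved words. It also shows why names are unstable between releases: a few extra uses of one identifier can move it up the ranking and shift every name after it.

```javascript
// Toy frequency-based renamer (hypothetical; not Google's compressor).

// Partial list of reserved words that could collide with generated
// names of one to three letters.
const RESERVED = new Set(["do", "if", "in", "for", "let", "new", "try", "var"]);

// 0 -> "a", 25 -> "z", 26 -> "aa", 27 -> "ab", ...
function shortName(index) {
  let name = "";
  let n = index;
  do {
    name = String.fromCharCode(97 + (n % 26)) + name;
    n = Math.floor(n / 26) - 1;
  } while (n >= 0);
  return name;
}

// useCounts: Map from original identifier to how often it appears.
// Returns a Map from original identifier to its compressed name.
function assignShortNames(useCounts) {
  // Most-used identifiers first, so they receive the shortest names.
  const byFrequency = [...useCounts.entries()].sort((x, y) => y[1] - x[1]);
  const renames = new Map();
  let next = 0;
  for (const [identifier] of byFrequency) {
    let candidate;
    do {
      candidate = shortName(next++);
    } while (RESERVED.has(candidate));
    renames.set(identifier, candidate);
  }
  return renames;
}
```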

So far the situation sounds pretty dire. How can more stable scripts be written? The answer turns out to be quite simple. Reader has UI controls for most things that you'd want to do from a script. For example, if you want to automatically move the queue down, you can think of that as being equivalent to the user clicking the "Down" button repeatedly. Those buttons have IDs that we promise won't change without a good reason, and through JavaScript you can simulate user clicks on them. Therefore, if your Greasemonkey script relies on those IDs, you'll be all set. To give an example, I've written a modified version of the auto-read script that uses this method. It has code like the following (the simulateClick function is included in the script):

simulateClick(getNode("queue-down"));
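
The script's own simulateClick isn't reproduced in this post, so here is one plausible way to write such a helper. Treat it as an assumption, not the script's code. Likewise, getNode here is a hypothetical stand-in built on document.getElementById. Browsers of 2005 created synthetic events with document.createEvent and initMouseEvent; this sketch uses the newer event constructors instead.

```javascript
// Hypothetical simulateClick: fires mousedown, mouseup and click, since some
// UI code reacts to the earlier events rather than to "click" itself.
function simulateClick(node) {
  for (const type of ["mousedown", "mouseup", "click"]) {
    const init = { bubbles: true, cancelable: true };
    // Use MouseEvent in browsers; fall back to a plain Event elsewhere.
    const event = typeof MouseEvent === "function"
      ? new MouseEvent(type, init)
      : new Event(type, init);
    node.dispatchEvent(event);
  }
}

// Hypothetical stand-in for the script's getNode: look a control up by ID.
function getNode(id) {
  return document.getElementById(id);
}
```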

I hope this helps other Greasemonkey script authors who are trying to tweak Reader (and other sites too).

P.S. We just pushed a new Reader release. No new features, but we have fixed a few bugs with unsubscribing and keeping things unread.

A new Reader release

10/27/2005 06:02:00 PM
Posted by Jason Shellen, Product Manager

Earlier this week we pushed out a new release of Reader. Most of the changes are under the hood and should make for a faster, smoother experience. However, there were a few user interface tweaks too. My favorite is support for the space keyboard shortcut. In all browsers, pressing the space key moves down in the current page. Reader's addition to that is to advance to the next item if you're at the end of the current one. This means that you can read your entire reading list with just one finger press! I'm sure there is some sort of Pavlov's dog joke to be made here, but we can't take too much credit for the one-click advance, since it's been present in email clients for ages.
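
The space-bar rule boils down to one comparison. If the bottom of the viewport has reached the end of the current item, advance to the next item; otherwise, scroll down by roughly a page. Here is a minimal sketch written as a pure function. The parameter names and the small rounding tolerance are hypothetical, and this is not Reader's code.

```javascript
// Hypothetical model of the space-bar shortcut; not Reader's actual code.
// scrollTop:      how far the current item is scrolled (pixels)
// viewportHeight: visible height of the reading pane
// itemHeight:     total height of the current item
// slack:          tolerance for rounding when deciding we're "at the end"
function spaceAction({ scrollTop, viewportHeight, itemHeight }, slack = 1) {
  const atEnd = scrollTop + viewportHeight >= itemHeight - slack;
  if (atEnd) {
    return { action: "next-item" };
  }
  // Otherwise behave like the browser: scroll down about one screenful,
  // without going past the end of the item.
  const to = Math.min(scrollTop + viewportHeight, itemHeight - viewportHeight);
  return { action: "scroll", to };
}
```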

Here's a more complete list of other changes we've made:

  • Progress messages for most operations.
  • Usability tweaks when subscribing to feeds.
  • Stopped using "click here" for link text (thanks for reminding us Philipp).
  • Fixed OPML import issues for Newsgator users.
  • Fixed issues with item links in some Planet and Odeo feeds.
  • Fixed Firefox issue that made it eat up the entire CPU when loading items.

We plan on keeping the features and improvements going strong. Feedback = better Reader. Thanks for your help.

Greasemonkey Scripts

10/21/2005 07:51:00 AM
Posted by Mihai Parparita, Software Engineer

I've written my share of Greasemonkey scripts. I'm therefore very glad that in turn other people are writing their own scripts for Google Reader. We make no guarantees that we won't (inadvertently) break them, but we'll certainly be looking at them for inspiration as to what our users want out of the application.

Get Google Reader scripts and more at the Userscripts.org repository. To learn more about Greasemonkey and learn how to install scripts, check out the excellent Dive Into Greasemonkey.

Google Reader: Two weeks

10/21/2005 04:51:00 AM
Posted by Chris Wetherell, Software Engineer

First post! Everyone from the Google Reader team would like to say hello. (Say hello, everyone.)

(Everyone looks up while still typing.) "Hello, internet."

I'm lucky I got their attention - the last two weeks have been a whirlwind. Most products at Google see incredible attention whenever they're released and Reader followed this now familiar pattern:

  1. Speculation
  2. Deluge
  3. Feature requests

Given that some servers survived their newfound celebrity and that all of the team members are still breathing (just checked again) I'm willing to call this a remarkable success. Especially for a Labs launch of this scope and for an actual beta-level project. I'd like a recap now - which is as much for my benefit as yours since we've been heads-down for a bit.

Bellwether, Labs

A small Labs effort can be used to gauge the amount of interest in Google helping in some area. Since Reader accounts number in the hundreds of thousands after only our first two weeks, it seems fair to say that there is some. Demonstrated need drives development, so we think we can go ahead with many of our plans, which include more interfaces (the lens is just one of several planned approaches), better ways of recommending new things to you, and performance improvements.

Big kitchen? Big table.

Every few seconds or so there's a bit more of everything on the internet, and feeds reliably so. Reader uses Google's BigTable to create a haven for what is likely to be a massive trove of items. BigTable is a system for storing and managing very large amounts of structured data. Jeff Dean just gave a talk about it at the University of Washington, and Andrew Hitchcock was nice enough to write a summary for those interested in an overview.
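
What does "structured data" mean here? Google has described BigTable's data model as a sparse, sorted map indexed by row key, column key, and timestamp. The toy class below mimics only that data model, in a single process. It says nothing about how BigTable distributes or persists data, or how Reader lays out its items. The row and column names in the example comment are hypothetical.

```javascript
// Toy, in-memory illustration of a BigTable-style data model:
// (row key, column key, timestamp) -> value. Not how BigTable is implemented.
class ToyTable {
  constructor() {
    // row key -> Map(column key -> [{ ts, value }], newest version first)
    this.rows = new Map();
  }

  put(row, column, ts, value) {
    if (!this.rows.has(row)) this.rows.set(row, new Map());
    const columns = this.rows.get(row);
    if (!columns.has(column)) columns.set(column, []);
    const versions = columns.get(column);
    versions.push({ ts, value });
    versions.sort((a, b) => b.ts - a.ts); // keep the newest version first
  }

  // Returns the newest value written at or before `ts` (default: latest).
  get(row, column, ts = Infinity) {
    const versions = this.rows.get(row)?.get(column) ?? [];
    const hit = versions.find((v) => v.ts <= ts);
    return hit ? hit.value : undefined;
  }

  // Row keys in [startRow, endRow), in sorted order; keeping rows sorted
  // is what makes range scans over related keys cheap.
  scan(startRow, endRow) {
    return [...this.rows.keys()].filter((k) => k >= startRow && k < endRow).sort();
  }
}

// Example (hypothetical keys):
//   t.put("com.example/feed", "item:title", 1, "Hello");
//   t.get("com.example/feed", "item:title"); // "Hello"
```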

With a little help from the internet

Like many geeks, we love it when people tweak, twist, and push a technology to be more useful in the ways that suit them best. Here are some recent favorites:

If you develop anything Reader-related drop us a line. We'd be happy to post about it here. We're excited to be making Reader - most of us slept overnight at the office during launch week. It's been an amazing experience.

We're curious about one thing, though, and maybe the developers of other feed reader projects can tell us about their experience when testing their products...

How do you stop from being distracted by, well, the whole internet? It's an endless divertimento - I mean, seriously, it just keeps coming...