I want you to think about XML for a moment.
Now, I want you to reflect on how similar
<p>A paragraph</p>
and
(p "A paragraph")
really are.
Now, what about:
(div
  (p "Some text")
  (a (@ (href "http://www.rockalypse.org")))
  (p "More text"))
We're basically looking at a list of symbols and strings. The first list (really a tree) begins with the symbol div. The rest of that list contains three more lists.
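If you want to convince yourself of that, you can poke at the tree with ordinary list operations. This is only a small sketch; the name doc is mine, not something from the template file.

```scheme
;; The example tree from above, quoted so Scheme treats it as data.
;; (The name doc is just for illustration.)
(define doc
  '(div
    (p "Some text")
    (a (@ (href "http://www.rockalypse.org")))
    (p "More text")))

(car doc)                    ; the symbol div
(length (cdr doc))           ; 3, one entry per child list
(cadr doc)                   ; the first child, (p "Some text")
(symbol? (car doc))          ; #t
(string? (cadr (cadr doc)))  ; #t, because "Some text" is a string
```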
My goal for you, in this section, is to write a function called parse that takes in an SXML expression and returns a list of all of the URLs contained in the document. You'll recognize a URL because it is the second element of a list that starts with the symbol href.
;; CONTRACT
;; parse :: SXML -> (list-of string)
;; PURPOSE
;; Takes an SXML document representing the RSS feed from a weblog,
;; and returns all the URLs referenced in all of the posts in the blog.
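To make "the second element of a list starting with the symbol href" concrete, here is a minimal sketch of how you might recognize such a list and pull the URL out of it. The helper names href-node? and href-url are my own inventions for illustration; nothing requires you to structure your solution this way.

```scheme
;; href-node? :: any -> boolean
;; True when x is a non-empty list whose first element is the symbol href.
;; (Hypothetical helper, not part of the template file.)
(define (href-node? x)
  (and (pair? x) (eq? (car x) 'href)))

;; href-url :: href-node -> string
;; Returns the second element of an href list, which is the URL.
;; (Also a hypothetical helper.)
(define (href-url node)
  (cadr node))

(define link '(href "http://www.rockalypse.org"))

(href-node? link)              ; #t
(href-url link)                ; "http://www.rockalypse.org"
(href-node? '(p "Some text"))  ; #f, this list starts with p
(href-node? "just a string")   ; #f, strings are not lists
```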
My solution was 13 lines of code with a single cond expression. You have all the tools to do this; I didn't do anything particularly fancy. (UPDATE: When I say my solution was 13 lines of code, I do not mean it would pass a code walk. Upon reflection, my solution is the shortest, ugliest solution you can write. A more readable solution will be a bit longer, but it will also be clearer and more maintainable.)
- Write a function called flatten. It should consume a list of lists and symbols, and return a "flat" list of symbols. For example, if I give you (a (b c) d (e f)), you should give me (a b c d e f). Take a look at the Scheme function append. Otherwise, this (roughly) follows the template for lists. (Don't forget that you can find out whether something is a list by asking list?.)
- We will parse weblog RSS feeds.
- Write a function called parse. It should take an SXML expression (sexp) and wander all the way down and through the tree, looking for description nodes. In short, you are uninterested in the actual content of all nodes save description nodes.
description nodes will get handled specially, because they contain a string containing yet more XML!
- Once you find a description node, you need to apply the function cleanup to the string you find there, and keep parsing it. cleanup converts the string full of XML into another SXML tree that your function will enjoy chewing up.
- You're on the lookout for href nodes. When you find one, return the URL found there.
- Pass the results of parse to flatten, and you should end up with a nice, neat list.
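The steps above can be sketched roughly as follows. Treat this as one possible shape under some loud assumptions, not as the intended solution: find-urls is a helper name I made up, the cleanup defined here is only a stand-in that returns a fixed tree so the sketch runs by itself (the real cleanup does the actual string-to-SXML conversion), and I am assuming a description node looks like (description "some string"). I have also read the contract as saying that parse is the flattened result, with the helper doing the wandering.

```scheme
;; flatten :: nested list -> flat list
;; Splices every sublist into its parent; empty sublists vanish.
(define (flatten x)
  (cond ((null? x) '())
        ((list? (car x)) (append (flatten (car x)) (flatten (cdr x))))
        (else (cons (car x) (flatten (cdr x))))))

;; STAND-IN ONLY: ignores its argument and returns a fixed SXML tree.
;; Replace this with the real cleanup before doing anything serious.
(define (cleanup str)
  '(p (a (@ (href "http://example.com/post")) "a link")))

;; find-urls :: SXML -> nested list of strings   (hypothetical helper)
;; Walks the tree; the result mirrors the tree's nesting.
(define (find-urls sexp)
  (cond ((not (pair? sexp)) '())               ; symbols and strings hold no URLs
        ((and (eq? (car sexp) 'href) (pair? (cdr sexp)))
         (list (cadr sexp)))                   ; found a URL
        ((and (eq? (car sexp) 'description)    ; assumed shape: (description "...")
              (pair? (cdr sexp))
              (string? (cadr sexp)))
         (find-urls (cleanup (cadr sexp))))    ; keep parsing the embedded XML
        (else (map find-urls (cdr sexp)))))    ; recur into every child

;; parse :: SXML -> (list-of string)
(define (parse sexp)
  (flatten (find-urls sexp)))

;; A made-up feed fragment, just big enough to exercise each branch.
(define feed
  '(rss
    (channel
     (title "A weblog")
     (item
      (title "First post")
      (description "<p><a href=\"http://example.com/post\">a link</a></p>")))))

(flatten '(a (b c) d (e f)))  ; (a b c d e f)
(parse feed)                  ; ("http://example.com/post"), via the stand-in cleanup
(parse '(div (p "Some text")
             (a (@ (href "http://www.rockalypse.org")))
             (p "More text")))  ; ("http://www.rockalypse.org")
```

Notice that find-urls happily returns a messy nest of empty lists and one-element lists; flatten is what tidies that up at the end, which is why it is worth writing first.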
I've provided a template file that will get you started. By "get you started," I mean "I've written some tests for you and everything."