XML (Tree) Lab

cassette-tape-breakdown

You've seen lots of lists in the last week. This exercise takes a step back from the heavy-hitting practice you've been doing these last two weeks, and hopefully will let you see the forest for the trees

"Forest for the Trees" was one of the tracks on the chart-topping 1986 hit album Fore! by Huey Lewis & The News. It was the first cassette tape that I owned.

A compact cassette tape is pictured above; it is an antiquated analog format capable of storing 90 minutes of audio, 45 minutes per side. Mix tapes are generally considered to be the historical source of the playlist. See also High Fidelity.

Warmup

  1. Define a structure called an empty-tree. This should be a structure with zero fields. It is, for all intents and purposes, the same as an onion-core.

  2. Define a data structure called a binary-tree. The data definition for this structure should look like

    ;; DATA DEFINITION
    ;; A binary-tree is EITHER
    ;;   - empty-tree OR
    ;;   - (make-binary-tree number binary-tree binary-tree)
    

  3. Declare several binary-trees for testing. One should be the empty tree, and each of your other trees should have some property that you think would be worth testing. I would encourage you to have between three and five trees defined for testing. Give them awesome names.

  4. Write a method called sum-tree-nodes. It should consume a binary-tree and return a number. That number should be the sum of all the values in the tree.

  5. Write a method called sum-wonky. If the node has an even? number, then you should only sum the nodes in the right-hand side of that node. If the node is an odd? number, then you should only sum the left-hand side of that node. For example, the tree

    (make-binary-tree 1
      (make-binary-tree 1 (make-empty-tree) (make-empty-tree))
      (make-binary-tree 1000000 (make-empty-tree) (make-empty-tree)))

      will return 2. If it is not clear, draw the tree.

  6. Write a function called build-binary-tree. It should take one parameter, a positive number. Your function should build a tree to the depth given as a parameter. A tree of depth zero should return the empty-tree. If you are being a good computer scientist, you'll think about how many nodes are in a binary tree of depth 25 before you try testing it. Each node should carry a value equal to its distance from the leaves.

  7. When testing the previous problem, use sum-wonky (among any other tests you write). What is this equivalent to? What method from the previous problem set have you just created a memory-intensive version of? 

    UPDATE: If you prefer... if you create a tree of depth 20, and run sum-wonky over that tree, how many nodes does the function touch? What structure is that like?

These problems should, I hope, get you started, but be somewhat less grueling than the last problem set.


ALL YOUR BLOG

aybabtu

I want you to think about XML for a moment. 

Now, I want you to reflect on how similar

<p>A paragraph</p>

and

(p "A paragraph")

really are. 


Now, what about:

(div
   (p "Some text")
   (a (@ (href "http://www.rockalypse.org")))
   (p "More text"))

We're basically looking at a list of symbols and strings. The first list (really a tree) begins with the symbol div. The rest of that list contains three more lists. 

My goal for you, in this section, is to write a function called parse that takes in an XML expression, and returns a list of all of the URLs that are contained in the document. You'll know these, because those will be the second element of a list starting with the symbol href

;; CONTRACT
;; parse :: SXML -> (list-of string)
;; PURPOSE
;; Takes an SXML document representing the RSS feed from a weblog,
;; and returns all the URLs referenced in all of the posts in the blog.


My solution was 13 lines of code, and one cond statement. You have all the tools to do this—I didn't do anything particularly fancy. (UPDATE: When I say my solution was 13 lines of code, I do not mean it would pass a code walk. Upon reflection, my solution is the shortest, ugliest solution you can write. A more readable solution will be a bit longer, and as a result, it would be clearer and more maintainable.)

  1. Write a function called flatten. It should consume a list of lists and symbols, and return a "flat" list of symbols. For example, if I give you (a (b c) d (e f)), you should give me (a b c d e f). Take a look at the Scheme function append. Otherwise, this (roughly) follows the template for lists. (Don't forget that you can find out if something is a list by asking list?). 

  2. We will parse weblog RSS feeds. This awesome video explains RSS feeds:

  3. Write a function called parse. It should take an SXML expression (sexp) and wander all the way down and through the tree, looking for description nodes. In short, you are uninterested in the actual content of all nodes save description nodes. 

    description nodes will get handled specially, because they contain a string containing yet more XML!

  4. Once you find a description node, you need to apply the function cleanup to the string you find there, and keep parsing it. cleanup converts the string full of XML into another SXML tree that your function will enjoy chewing up.

  5. You're on the lookout for href nodes. When you find one, return the URL found there. 

  6. Pass the results of parse to flatten, and you should end up with a nice, neat list. 


I've provided a template file that will get you started. By get you started, I mean "I've written some tests for you and everything." 

Template Code

You'll want to download the template code that I've provided to get you started.

In Case of Emergency

You should probably watch this video, too:


In truth, if you would like more practice with higher-order functions, and you would like to use this laboratory as part of your portfolio, then I would encourage you to include the Higher-Order Functions Contest work as part of your portfolio submission. It is a duplication of problems from the previous lab, but you are asked to rewrite them using map and fold. Most of them are one-liners, but may take a touch of thought.

Feedback

I had 3 goals in assigning this laboratory.

  1. Provide you with the opportunity to practice traversing structure-based trees.

  2. Provide you with a real-world tree-processing task. In particular, the conversion of XML into SXML (Scheme lists of lists), and then the processing of that data to extract useful information.

  3. Provide a break from the drill-and-practice nature of previous labs, and give you a chance to stretch your brains just a touch.

If you would please rate this laboratory based on these goals, and provide whatever feedback you feel is necessary to improve this lab in the future, that would be excellent.