
What I learned building Candle in Rust

Candle is a tool that takes in a body of HTML plus some CSS selectors, then runs the CSS against the HTML to find what you’re looking for. For example:

$ echo "<h1 id='my-id'><span>text</span></h1>" | candle 'h1 {text}, h1 attr{id}'
text
my-id

It’s the largest project I’ve built in Rust so far, and along the way I learned about some excellent parts of Rust that I expect to keep using in future projects.

I like filter_map

It’s a small thing, but I like the filter_map method. Here’s an example of where it’s useful.

In Candle, each combination of a selector and an operation (e.g. a.highlighted attr{href}) is called a Finder. Each Finder is tested against each HTML element and its operation is run if it matches:

// Lightly edited to remove some code that's not important here
impl<'a> Finder<'a> {
    fn match_and_apply(&self, element: &ElementRef) -> Option<String> {
        if self.selector.matches(element) {
            match self.operation {
                FinderOperation::Text => Some(...),
                FinderOperation::Attr(attr) => {
                  // Some(value) or None, if the attr doesn't exist
                },
                FinderOperation::Html => Some(...)
            }
        } else {
            None
        }
    }
}

If the selector matches and the operation returned something, it returns Some(...). Otherwise, it returns None. Returning an Option here makes it easy to filter out empty results using filter_map:

finders.iter().filter_map(|finder| finder.match_and_apply(&element))

filter_map runs the closure on each item. If the closure returns Some(x), it yields x (i.e. it unwraps the value for you); if it returns None, it discards the item entirely. In effect, it turns an iterator of Option<String> into an iterator of String, which you can then collect into a Vec<String>. This pattern pops up pretty often and filter_map is a neat solution.
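Here is a minimal, self-contained sketch of the same pattern outside of Candle. The names parse_if_number and collect_numbers are made up for illustration; parse_if_number plays the role that match_and_apply plays above, returning Some for a usable result and None otherwise.

```rust
/// Hypothetical stand-in for `match_and_apply`: returns the parsed number
/// if the input is numeric, or None if it is not.
fn parse_if_number(input: &str) -> Option<i32> {
    input.parse().ok()
}

/// Keeps only the inputs that produced a value, already unwrapped.
fn collect_numbers(inputs: &[&str]) -> Vec<i32> {
    inputs
        .iter()
        .filter_map(|input| parse_if_number(input))
        .collect()
}

fn main() {
    let numbers = collect_numbers(&["1", "two", "3"]);
    println!("{:?}", numbers); // prints [1, 3]
}
```

The same result could be had with a map followed by a filter and an unwrap, but filter_map does it in one step and never needs an unwrap.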

The Read trait and testing

The Read trait allows for reading bytes from a source. It requires one method, read. Other methods are defined in terms of read. (This is similar to how Rust’s Iterator trait only requires a next method but gives you many other methods defined in terms of next.)
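To make that concrete, here is a small sketch of a type that implements only read and gets read_to_string for free. The Repeat struct is hypothetical and not part of Candle; it just hands out the same byte a fixed number of times.

```rust
use std::io::{self, Read};

/// Hypothetical reader that yields `byte` exactly `remaining` times.
struct Repeat {
    byte: u8,
    remaining: usize,
}

impl Read for Repeat {
    // The only required method: fill as much of `buf` as we can and
    // report how many bytes were written. Returning Ok(0) signals the end.
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        let n = buf.len().min(self.remaining);
        for slot in &mut buf[..n] {
            *slot = self.byte;
        }
        self.remaining -= n;
        Ok(n)
    }
}

fn main() {
    let mut reader = Repeat { byte: b'a', remaining: 5 };
    let mut out = String::new();
    // read_to_string is a provided method built on top of `read`.
    reader.read_to_string(&mut out).unwrap();
    println!("{}", out); // prints aaaaa
}
```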

I had a method called read_from_stdin that took zero arguments and read from io::stdin. In order to test it, I changed it to instead take in anything that implements Read:

-fn read_from_stdin() -> Option<String> {
+fn read_from<R: Read>(mut reader: R) -> Option<String> {

Now instead of calling read_from_stdin, I call read_from(io::stdin()). The real benefit comes from testing the method. I can now pass in a Cursor, which lets us use slices or vectors as if they’re files. Here’s the full test:

#[test]
fn test_less_than_1024_bytes_of_html(){
    let html = r#"
        <!DOCTYPE html>
        <meta charset="utf-8">
        <title>Hello, world!</title>
        <h1 class="foo">Hello, <i>world!</i></h1>
    "#;
    let result = read_from(Cursor::new(html));
    assert_eq!(result, Some(html.to_string()));
}

(Here’s the commit where I added Read and Cursor.)

Cursor is a very helpful struct and I’ll definitely keep it in mind when writing future tests.
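For reference, here is roughly what a function with that signature could look like. The body is my assumption, since only the signature is shown above; this sketch simply reads everything into a String and returns None on any error, including invalid UTF-8.

```rust
use std::io::{Cursor, Read};

// Assumed body: only the signature comes from the diff above.
fn read_from<R: Read>(mut reader: R) -> Option<String> {
    let mut buffer = String::new();
    match reader.read_to_string(&mut buffer) {
        Ok(_) => Some(buffer),
        Err(_) => None, // I/O failure or input that is not valid UTF-8
    }
}

fn main() {
    // A Cursor over a &str behaves like a readable file.
    let from_str = read_from(Cursor::new("<h1>hi</h1>"));
    println!("{:?}", from_str); // prints Some("<h1>hi</h1>")

    // A Cursor over a Vec<u8> works too; these bytes are not valid UTF-8.
    let from_bytes = read_from(Cursor::new(vec![0xff_u8, 0xfe]));
    println!("{:?}", from_bytes); // prints None
}
```

Because the function is generic over Read, the test never has to touch the real stdin.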

Pipes and panic

In an early version of Candle, piping the output to anything that truncated the output (like head) would cause it to panic:

$ curl --silent daringfireball.net | candle 'html {html}' | head -1
<html class="daringfireball-net" lang="en">
thread 'main' panicked at 'failed printing to stdout: Broken pipe (os error 32)', src/libstd/io/stdio.rs:792:9
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace.

This is a common problem for Rust programs that write to stdout. There’s an open Rust issue about it. As I understand it, once head has read enough output it exits, which closes the reading end of the pipe. The next time the application that’s generating output (i.e. candle) writes to that pipe, the kernel sends it a SIGPIPE signal, which by default kills it quietly. However, Rust ignores SIGPIPE, so the write fails with a broken-pipe error instead, and the program panics on that error.

Specifically, the issue is this:

println!("{}", &string);

println! panics if writing to stdout fails. The fix is to use writeln!, which does not panic but instead returns std::io::Result<()>. We can then catch broken pipes:

use std::io::{self, ErrorKind, Write};
use std::process;

let mut stdout = io::stdout();

if let Err(e) = writeln!(stdout, "{}", &string) {
    if e.kind() != ErrorKind::BrokenPipe {
        eprintln!("{}", e);
        process::exit(1);
    }
}

This new code does two things:

  1. It catches the Err and ignores it if it’s a broken pipe, and
  2. If it’s a real error, it prints the error to stderr and exits with an error code (but without printing a gross panic message).

This is much better than println!’s panicky behavior.
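One way to make this pattern testable, in the same spirit as the Read change earlier, is to write to anything that implements Write. The sketch below is not Candle’s actual code: write_line and ClosedPipe are hypothetical names, and ClosedPipe is a fake writer that always fails the way a closed pipe would.

```rust
use std::io::{self, ErrorKind, Write};

/// Writes one line. Returns Ok(true) if it was written, Ok(false) if the
/// reader has gone away (broken pipe), and Err for any other failure.
fn write_line<W: Write>(writer: &mut W, line: &str) -> io::Result<bool> {
    match writeln!(writer, "{}", line) {
        Ok(()) => Ok(true),
        Err(e) if e.kind() == ErrorKind::BrokenPipe => Ok(false),
        Err(e) => Err(e),
    }
}

/// Hypothetical writer that fails every write with a broken-pipe error.
struct ClosedPipe;

impl Write for ClosedPipe {
    fn write(&mut self, _buf: &[u8]) -> io::Result<usize> {
        Err(io::Error::new(ErrorKind::BrokenPipe, "broken pipe"))
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

fn main() {
    // A Vec<u8> implements Write, so it can stand in for stdout.
    let mut buffer = Vec::new();
    assert!(write_line(&mut buffer, "text").unwrap());
    assert_eq!(buffer, b"text\n");

    // A broken pipe is reported as "stop writing", not as an error.
    assert!(!write_line(&mut ClosedPipe, "text").unwrap());
}
```

The caller can stop producing output when it sees Ok(false), and only print to stderr and exit with an error code when it gets an Err.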

(I did this in two PRs: #13 and #14.)