What I learned building Candle in Rust
Candle is a tool that takes in a body of HTML plus some CSS selectors, then runs the CSS against the HTML to find what you’re looking for. For example:
$ echo "<h1 id='my-id'><span>text</span></h1>" | candle 'h1 {text}, h1 attr{id}'
text
my-id
It’s the largest project I’ve built in Rust so far, and along the way I picked up some parts of Rust that I expect to keep using in new projects.
I like filter_map
It’s a small thing, but I like the filter_map method. Here’s an example of where it’s useful.
In Candle, each combination of a selector and an operation (e.g. a.highlighted attr{href}) is called a Finder. Each Finder is tested against each HTML element and its operation is run if it matches:
// Lightly edited to remove some code that's not important here
impl<'a> Finder<'a> {
    fn match_and_apply(&self, element: &ElementRef) -> Option<String> {
        if self.selector.matches(element) {
            match self.operation {
                FinderOperation::Text => Some(...),
                FinderOperation::Attr(attr) => {
                    // Some(value) or None, if the attr doesn't exist
                },
                FinderOperation::Html => Some(...)
            }
        } else {
            None
        }
    }
}
If the selector matches and the operation returned something, it returns Some(...). Otherwise, it returns None. Returning an Option here makes it easy to filter out empty results using filter_map:
finders.iter().filter_map(|finder| finder.match_and_apply(&element))
filter_map runs the closure on each item. If the closure returns Some(x), it yields x (i.e. it auto-unwraps the value), and if it returns None, filter_map discards the item entirely. So in effect it turns what would be a Vec<Option<String>> into a Vec<String>. This pattern pops up pretty often and filter_map is a neat solution.
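Here’s a minimal, self-contained sketch of the same pattern. The Element, Operation, and Finder types below are simplified stand-ins I made up for illustration (they match on a tag name only); they are not Candle’s real types, which work with real CSS selectors.

```rust
use std::collections::HashMap;

// Hypothetical, simplified stand-ins for Candle's types (assumptions, not the real ones).
struct Element {
    tag: String,
    text: String,
    attrs: HashMap<String, String>,
}

enum Operation {
    Text,
    Attr(String),
}

struct Finder {
    tag: String, // stands in for a real CSS selector
    operation: Operation,
}

impl Finder {
    // Some(value) if the finder matches and produced something, otherwise None.
    fn match_and_apply(&self, element: &Element) -> Option<String> {
        if self.tag != element.tag {
            return None;
        }
        match &self.operation {
            Operation::Text => Some(element.text.clone()),
            // `get` returns None when the attribute doesn't exist.
            Operation::Attr(name) => element.attrs.get(name).cloned(),
        }
    }
}

// filter_map keeps the Some values (unwrapped) and drops every None.
fn apply_all(finders: &[Finder], element: &Element) -> Vec<String> {
    finders
        .iter()
        .filter_map(|finder| finder.match_and_apply(element))
        .collect()
}

fn demo() -> Vec<String> {
    let mut attrs = HashMap::new();
    attrs.insert("id".to_string(), "my-id".to_string());
    let element = Element { tag: "h1".to_string(), text: "text".to_string(), attrs };
    let finders = vec![
        Finder { tag: "h1".to_string(), operation: Operation::Text },
        Finder { tag: "h1".to_string(), operation: Operation::Attr("id".to_string()) },
        Finder { tag: "h1".to_string(), operation: Operation::Attr("class".to_string()) }, // no such attr
        Finder { tag: "p".to_string(), operation: Operation::Text }, // tag doesn't match
    ];
    apply_all(&finders, &element)
}

fn main() {
    println!("{:?}", demo());
}
```

Two of the four finders produce nothing, and the caller never has to look at a None.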
The Read trait and testing
The Read trait allows for reading bytes from a source. It requires one method, read. Other methods are defined in terms of read. (This is similar to how Rust’s Iterator trait only requires a next method but gives many other methods defined in terms of next.)
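To make the “one required method” point concrete, here’s a small sketch. OneByteAtATime is a made-up reader that only implements read, and deliberately hands out a single byte per call; read_to_string comes for free from the trait.

```rust
use std::io::{self, Read};

// Hypothetical reader (not from Candle): yields its data one byte per call.
struct OneByteAtATime<'a> {
    remaining: &'a [u8],
}

impl<'a> Read for OneByteAtATime<'a> {
    // The only method we have to write ourselves.
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        if buf.is_empty() {
            return Ok(0);
        }
        let data = self.remaining;
        match data.split_first() {
            Some((first, rest)) => {
                buf[0] = *first;
                self.remaining = rest;
                Ok(1)
            }
            None => Ok(0), // Ok(0) signals end of input
        }
    }
}

// read_to_string is a provided method, built on top of read.
fn read_all(data: &str) -> io::Result<String> {
    let mut reader = OneByteAtATime { remaining: data.as_bytes() };
    let mut out = String::new();
    reader.read_to_string(&mut out)?;
    Ok(out)
}

fn main() {
    println!("{}", read_all("hello").unwrap());
}
```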
I had a method called read_from_stdin that read from io::stdin and took zero arguments. In order to test it, I changed it to instead take in anything that implements Read:
-fn read_from_stdin() -> Option<String> {
+fn read_from<R: Read>(mut reader: R) -> Option<String> {
Now instead of calling read_from_stdin, I call read_from(io::stdin()). The real benefit comes from testing the method. I can now pass in a Cursor, which lets us use slices or vectors as if they’re files. Here’s the full test:
#[test]
fn test_less_than_1024_bytes_of_html() {
    let html = r#"
<!DOCTYPE html>
<meta charset="utf-8">
<title>Hello, world!</title>
<h1 class="foo">Hello, <i>world!</i></h1>
"#;
    let result = read_from(Cursor::new(html));
    assert_eq!(result, Some(html.to_string()));
}
(Here’s the commit where I added Read and Cursor.)
Cursor is a very helpful struct and I’ll definitely keep it in mind when writing future tests.
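For a version you can compile on its own, here’s a rough sketch. Only the signature of read_from is shown above, so the body here is my assumption of a plausible implementation (read everything into a String, return None on any error), not necessarily what Candle does.

```rust
use std::io::{Cursor, Read};

// Assumed body: only the signature comes from the real code.
fn read_from<R: Read>(mut reader: R) -> Option<String> {
    let mut buffer = String::new();
    match reader.read_to_string(&mut buffer) {
        Ok(_) => Some(buffer),
        Err(_) => None, // e.g. an I/O error or invalid UTF-8
    }
}

fn main() {
    // In the real program this would be read_from(std::io::stdin()).
    let from_memory = read_from(Cursor::new("<h1>hi</h1>"));
    println!("{:?}", from_memory);

    // A Cursor over a Vec<u8> works too; these bytes aren't valid UTF-8.
    let invalid = read_from(Cursor::new(vec![0xff_u8, 0xfe]));
    println!("{:?}", invalid);
}
```

Because the function is generic over Read, the test never has to touch the real stdin.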
Pipes and panic
In an early version of Candle, piping the output to anything that truncated the output (like head) would cause it to panic:
$ curl --silent daringfireball.net | candle 'html {html}' | head -1
<html class="daringfireball-net" lang="en">
thread 'main' panicked at 'failed printing to stdout: Broken pipe (os error 32)', src/libstd/io/stdio.rs:792:9
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace.
This is a common problem for any Rust program that writes to stdout. There’s an open Rust issue about it. As I understand it, once head has read enough output it exits, which closes the read end of the pipe. The next time the program generating output (i.e. candle) writes to the pipe, the kernel sends it a SIGPIPE signal, which would normally make it die quietly. However, Rust ignores SIGPIPE, so the write fails with a broken-pipe error instead, and the program panics when it hits that error.
Specifically, the issue is this:
println!("{}", &string);
println! panics when it runs into any errors. The fix is to use writeln!, which does not panic but instead returns std::io::Result<()>. We can then catch broken pipes:
let mut stdout = io::stdout();
if let Err(e) = writeln!(stdout, "{}", &string) {
    if e.kind() != ErrorKind::BrokenPipe {
        eprintln!("{}", e);
        process::exit(1);
    }
}
This new code does 2 things:
- It catches the Err and ignores it if it’s a broken pipe, and
- If it’s a real error, it prints the error to STDERR and exits with an error code (but without printing a gross error message)
This is much better than println!’s panicky behavior.
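One way to make this behavior testable is to pull it into a function that’s generic over Write. This is only a sketch: write_line and ClosedPipe are names I made up for illustration, and unlike the snippet above, this version stops writing once the pipe is broken instead of carrying on.

```rust
use std::io::{self, ErrorKind, Write};
use std::process;

// Hypothetical helper: Ok(true) if the line was written,
// Ok(false) if the reader has gone away (broken pipe).
fn write_line<W: Write>(out: &mut W, line: &str) -> io::Result<bool> {
    match writeln!(out, "{}", line) {
        Ok(()) => Ok(true),
        Err(e) if e.kind() == ErrorKind::BrokenPipe => Ok(false),
        Err(e) => Err(e),
    }
}

// Hypothetical test double: every write fails as if the pipe were closed.
struct ClosedPipe;

impl Write for ClosedPipe {
    fn write(&mut self, _buf: &[u8]) -> io::Result<usize> {
        Err(io::Error::new(ErrorKind::BrokenPipe, "reader closed the pipe"))
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

fn main() {
    let mut stdout = io::stdout();
    for line in ["one", "two", "three"].iter() {
        match write_line(&mut stdout, line) {
            Ok(true) => {}
            Ok(false) => break, // nobody is reading any more, so stop quietly
            Err(e) => {
                eprintln!("{}", e);
                process::exit(1);
            }
        }
    }
}
```

Since write_line takes any Write, a test can pass a Vec<u8> to check the happy path and ClosedPipe to check the broken-pipe path, without involving a real pipe.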