Archive for October, 2010

How I Tamed VimClojure

October 24th, 2010 15 comments

Update 1/2013: The future of Clojuring with Vim is tpope’s foreplay.vim. You can find a tutorial here.

NOTE: I really love the setup described below. But I admit it’s a lot to digest. If you just want vimclojure working now, you may be interested in vimclojure-easy


UPDATED October 7th, 2011 for VimClojure 2.3.0

TL;DR VimClojure Download List


Introduction

Like Windows users at a Ruby conference, vim-using Clojurians can feel a little left out. Everyone uses Emacs. Well, I played a ton of Rogue when I was 11, so I’ve always had a soft spot for vi/vim’s way of doing things, at least the movement keys anyway.

Up until now I’ve been doing Clojure development in a cobbled together mix of screen.vim and VimClojure. I could start a simple repl with screen and rely on VimClojure for basic syntax highlighting, indenting, etc. I’ve been aware that VimClojure had much more to offer, but the setup docs I found were varied, inconsistent and generally confusing. Other blogposts on the subject seemed to either surrender to a half solution like I already had, or were so inconvenient that I wouldn’t use it. So, I finally sucked it up, slogged through the setup, and got everything working to my liking.

Here, I’ll try to clearly document my full setup in hopes that someone else might find it useful. My main goal was a setup where I could make the most of VimClojure directly from vim without a bunch of preparation. If I want to just play around, I don’t want to fiddle around with Maven or Leiningen or anything. I want to start vim, get a repl, and start exploring!

Everything here is in my vimfiles repo on github.com

Prerequisites

Note that throughout this document, if you’re on Windows, replace references to the ~/.vim directory with ~/vimfiles

Pathogen

If you’re not interested in using pathogen, just adjust paths accordingly throughout the rest of this post

To get going, you’ll need to download a few things. First, for vim plugin sanity, I use pathogen.vim. Download it, put pathogen.vim in ~/.vim/autoload, and add the following to the top of your .vimrc file:

" Load plugins from .vim/bundle using .vim/autoload/pathogen.vim
call pathogen#runtime_append_all_bundles()

filetype off " On some Linux systems, this is necessary to make sure pathogen
             " picks up ftdetect directories in plugins! :(
syntax on
filetype plugin indent on

Now add a ~/.vim/bundle directory. This is where you’ll expand all your plugins, each in its own directory.

screen.vim

For this setup, I’m still using screen.vim as a simple way of starting up VimClojure’s Nailgun server directly from vim. It makes sure the server is killed when vim exits and is generally useful anyway. Download and install it. Since I’m not a fan of vimball (vba) installers, I just grabbed the source directly from github (hit the “Downloads” button) and dropped it in my bundle directory.

That’s it. Now you can use :ScreenShell args ... to start a simple screen session.

Note to Windows users: Obviously, Windows doesn’t have GNU screen. The least painful approach seems to be to install Cygwin. I don’t care that much for Cygwin, mostly because it has no uninstaller (wtf !?!?), but a super-minimal Cygwin install seems to be working ok for me. Just be sure to add c:\cygwin\bin (or whatever) to your system path.

VimClojure Setup

Install

And of course you need VimClojure itself. Everything here assumes VimClojure 2.3.0. Hopefully, it doesn’t change too much. Download VimClojure and extract it to your ~/.vim/bundle folder. For full interactive support, you’ll need two other things:

  • The Nailgun client (note this is 2.2.0, but still works fine with 2.3.0) – A tiny executable that allows vim to communicate with a running JVM. (as a side-note, Nailgun is an occasionally useful way to avoid the horrendous startup times of the JVM)
  • The VimClojure Nailgun server jar (you can browse here for other versions) – The other half of the Nailgun protocol where the running instance of Clojure will live in a JVM. Note that as documented in the README.markdown file in the VimClojure archive, you can also specify this as a dependency in Leiningen, Maven and friends. This isn’t that useful for ad hoc Clojure exploration though.

Assuming you’ve installed VimClojure to ~/.vim/bundle/vimclojure-2.3.0, here’s what I did with these. I created a folder, ~/.vim/bundle/vimclojure-2.3.0/lib and added the following to it:

bundle/vimclojure-2.3.0/lib
|-- nailgun
|   |-- Makefile
|   |-- ng
|   |-- ng.exe
|   |-- ngclient
|   |   `-- ng.c
|   `-- readme.txt
`-- server-2.3.0.jar

Note that the ng executable on Unix must be built with make.

Now the only files we still need are the clojure core and contrib jars. I drop these in ~/.vim/lib. Which version you put there is up to you. We’ll set things up in .vimrc below so that this is the last resort for finding Clojure jars. If a different version is in your project, that should get used instead. If you just start up vim to experiment outside a full project, this is the version you’ll get.

.vimrc Settings

Ok, we have everything we need in our .vim folder. Now we just need to wire it all up in .vimrc. First, let’s do some stuff to make things cross-platform:

" Let's remember some things, like where the .vim folder is.
if has("win32") || has("win64")
    let windows=1
    let vimfiles=$HOME . "/vimfiles"
    let sep=";"
else
    let windows=0
    let vimfiles=$HOME . "/.vim"
    let sep=":"
endif

Now we know where the .vim folder is and the correct Java classpath separator (stupid!) to use. So we can build a generic classpath that will cover most project configurations. I tend to always open vim at the root of a project and never change working directory so your mileage may vary with this:

let classpath = join(
   \[".", 
   \ "src", "src/main/clojure", "src/main/resources", 
   \ "test", "src/test/clojure", "src/test/resources",
   \ "classes", "target/classes",
   \ "lib/*", "lib/dev/*", 
   \ "bin", 
   \ vimfiles."/lib/*"
   \], 
   \ sep)

Basically we’ve just built a big, catchall classpath. Note that the jars in .vim are at the very end so they’ll be a fallback.

Finally, we’ll pull it all together by configuring VimClojure:

" Settings for VimClojure
let vimclojureRoot = vimfiles."/bundle/vimclojure-2.3.0"
let vimclojure#HighlightBuiltins=1
let vimclojure#HighlightContrib=1
let vimclojure#DynamicHighlighting=1
let vimclojure#ParenRainbow=1
let vimclojure#WantNailgun = 1
let vimclojure#NailgunClient = vimclojureRoot."/lib/nailgun/ng"
if windows
    " In stupid windows, no forward slashes, and tack on .exe
    let vimclojure#NailgunClient = substitute(vimclojure#NailgunClient, "/", "\\", "g") . ".exe"
endif

" Start vimclojure nailgun server (uses screen.vim to manage lifetime)
nmap <silent> <Leader>sc :execute "ScreenShell java -cp \"" . classpath . sep . vimclojureRoot . "/lib/*" . "\" vimclojure.nailgun.NGServer 127.0.0.1" <cr>
" Start a generic Clojure repl (uses screen.vim)
nmap <silent> <Leader>sC :execute "ScreenShell java -cp \"" . classpath . "\" clojure.main" <cr>

Basically we tell VimClojure where our Nailgun client is (and handle Windows silliness) and set up some helpers for starting a Clojure repl (<Leader>sC) or VimClojure Nailgun server (<Leader>sc) with screen.vim. If you don’t feel like using the screen shortcut, you can of course start the server manually as described in the VimClojure docs:

$ java -cp "classpath including clojure jars and server-2.3.0.jar" vimclojure.nailgun.NGServer 127.0.0.1

Kick the Tires

With all that in place, we’re ready to give it a try:

  • Start vim
  • Hit <Leader>sc (or whatever you mapped the server to) to start up the Nailgun server. Note that it must be running before you open a Clojure buffer!
  • Open a new Clojure file: :e test.clj
  • At this point, there might be a slight, or not so slight, delay (or error message!) as the Nailgun server loads Clojure for the first time. Remember that JVM startup time I mentioned?
  • Hit <LocalLeader>sr to start a repl inside vim (see :help maplocalleader if you don’t know what <LocalLeader> is)

All the other commands and interactive features of VimClojure are described in the docs (~/.vim/bundle/vimclojure-2.3.0/doc/clojure.txt). Here’s my current shortlist:

  • <LocalLeader>sw – Show source for the word under the cursor, including clojure.core! <– this is key if you care about understanding a platform!
  • <LocalLeader>sr – Start a repl
  • <LocalLeader>sR – Start a repl in the current namespace
  • <LocalLeader>eb – Evaluate current visual block. There are several “eX” variations.
  • <LocalLeader>lw – Lookup docs for word under cursor. There are several “lX” lookup variations

Conclusion

Now we have a Clojure development environment that’s not so embarrassingly inadequate compared to Emacs. This is working well for me so far, but I’m sure there’s room for improvement. Let me know if you see anything really dumb I’m doing. Thanks!

Categories: clojure, vim

Remedial Clojure: duck-streams, let style, and pronunciation

October 22nd, 2010 3 comments

This is the “If it weren’t for my family and my stupid conscience I’d drive to Clojure Conj” edition of Remedial Clojure


UPDATE, 10/16/2011: Ok, it’s been a year. Now clojure.contrib is really, truly deprecated. Don’t use duck-streams or anything else from monolithic contrib. See Where Did Clojure.Contrib Go?, maintained tirelessly by Sean Corfield, for more info.


EDIT, 10/26/2010: It’s come to my attention that duck-streams is technically deprecated. It’s still in clojure.contrib and usable, but it’s been superseded by functions in clojure.java.io. Everything I say about the usefulness of duck-streams still applies though. Wherever you see (read-lines "foo"), replace it with something like this:

(with-open [rdr (clojure.java.io/reader "foo")] ...)

Ok, here’s another entry in my Clojure journey. Previously, I covered getting started with Leiningen for builds and Lazytest for TDD. I’ll be doing the same here. There are plenty of places where I’m unsure about the most “Clojure-y” way to do things, so feel free to poke holes in the code.

In the next couple posts, I’ll build a super simple rhyming dictionary using the CMU Pronouncing Dictionary to determine whether words rhyme. Here’s the overall structure of the program:

  • Read the lines of the dictionary using the clojure.contrib.duck-streams library.
  • Parse each line
  • Build a map from words to pronunciations
  • Create a function for determining whether two words rhyme (part 2)
  • Build a fast data structure for finding all rhymes for a word (part 2)

The final source for this post is all on github.com.

Initial Project Setup

The setup for this project is pretty vanilla. I’m using Leiningen again, following the same procedure as my last post to create a stub project. This time I’m going to create a new namespace, rhymetime.pronounce where we’ll keep all the code for parsing the dictionary. We’ll end up with these new files:

  • src/rhymetime/pronounce.clj
  • test/rhymetime/test/pronounce.clj

Let’s Code

Ok, let’s get down to it. First we want to read the dictionary from file… but that’s a little boring. What if we want to read it from the web, or a string, or a test resource? This is where the clojure.contrib.duck-streams library comes in. Given an input (a file name, url, reader, input stream, etc), it magically determines how to open it. Well, maybe not magically, but it does its best to do what you’d expect it to do. Here are some examples:

(use 'clojure.contrib.duck-streams)

; Lazy seq of lines from file named README
(read-lines "README")
; ... or ...
(read-lines (java.io.File. "README"))

; Lazy seq of lines from a URL (2009 Tornado data)
(read-lines "http://www.spc.noaa.gov/wcm/data/2009_torn.csv")
; ... or ...
(read-lines (java.net.URL. "http://www.spc.noaa.gov/wcm/data/2009_torn.csv"))

In these examples, read-lines can be replaced with reader which returns a buffered Java Reader as well as slurp* which returns the contents of the file or resource as a single string.

Anyway, the nice thing about duck-streams is that our parser can be agnostic about where input comes from without adding a bunch of configuration hassle. It also helps with testing!
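Since duck-streams has been superseded by clojure.java.io (see the edit note at the top of this post), here’s a minimal sketch of the same source-agnostic idea using it. The name read-all-lines is a hypothetical helper of mine, not a library function, and unlike read-lines it reads everything eagerly rather than lazily:

```clojure
(require '[clojure.java.io :as io])

(defn read-all-lines
  "Hypothetical helper: eagerly read every line from anything io/reader accepts."
  [source]
  (with-open [rdr (io/reader source)]
    ;; doall forces the lazy line-seq before with-open closes the reader
    (doall (line-seq rdr))))

;; The same function works for an in-memory reader ...
(read-all-lines (java.io.StringReader. "BETTY  B EH1 T IY0\nREADY  R EH1 D IY0"))

;; ... and for a file on disk (a temp file here, so the example is self-contained)
(let [f (java.io.File/createTempFile "dict" ".txt")]
  (spit f "BETTY  B EH1 T IY0\n")
  (read-all-lines f))
```

The StringReader case is what makes this style easy to test: no fixture files are needed.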

“… It’s not that hard: Na-ghee-na-na-jar. Nagheenanajar. “

The CMU Pronouncing Dictionary has been around for a long, long time. It’s a flat-file database of about 130,000 words. Each word includes a pronunciation as a list of Arpabet (which is really fun to say) phonemes, each with an optional stress. Each entry has the following form:

word phoneme0 phoneme1 ...

and each phoneme consists of alphabetic characters followed by, if it’s a vowel sound, an optional stress identifier, “0”, “1”, or “2”. Here are some examples:

BETTY  B EH1 T IY0
READY  R EH1 D IY0
SPAGHETTI  S P AH0 G EH1 T IY0
MACARONI  M AE2 K ER0 OW1 N IY0

Creating small…

Let’s write some tests first. In test/rhymetime/test/pronounce.clj, starting bottom-up, we’ll try to parse a single phoneme into a phoneme/stress pair:

(describe parse-phoneme
  (it "can parse a phoneme with no stress"
    (= ["K" nil] (parse-phoneme "K")))
  (it "can parse a phoneme with null stress"
    (= ["AH" :n] (parse-phoneme "AH0")))
  (it "can parse a phoneme with primary stress"
    (= ["AH" :p] (parse-phoneme "AH1")))
  (it "can parse a phoneme with secondary stress"
    (= ["AH" :s] (parse-phoneme "AH2"))))

Here I’ve decided to represent these entries as a vector pair rather than, say, a map. I honestly don’t know if this is good style or not. The map might be clearer, and memory usage isn’t much of a concern since the set of all phonemes is small and could be easily cached. Thoughts?

This seems like a good place for a regex. In Clojure, regular expression matches are sequences and can participate in destructuring so you can very easily extract values from a match without a lot of fuss. Here’s what I came up with:

(defn parse-phoneme
  "Parse a phoneme into a phone/accent pair"
  [s]
  (let [[_ phone stress] (re-matches #"([A-Z]+)([012])?" s)]
    [phone (case stress "0" :n "1" :p "2" :s nil)]))

Here, the result of the regex match is unpacked into [_ phone stress]. When you don’t care about a particular field when destructuring, I guess it’s Clojure convention to use an underscore. Done. I use a case to map from the dictionary’s stress code to a symbolic code, :n for none, :p for primary, and :s for secondary.
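To see what the destructuring is actually working with, it helps to look at what re-matches returns on its own. A small repl-style sketch (phoneme-pattern is just a name I’m giving the regex from parse-phoneme for this illustration):

```clojure
(def phoneme-pattern #"([A-Z]+)([012])?")

;; A match with groups is a vector: the whole match first, then each group
(re-matches phoneme-pattern "AH1") ;=> ["AH1" "AH" "1"]

;; An optional group that did not take part in the match comes back as nil
(re-matches phoneme-pattern "K")   ;=> ["K" "K" nil]

;; No match at all is nil, and destructuring nil binds every name to nil
(let [[_ phone stress] (re-matches phoneme-pattern "k9")]
  [phone stress])                  ;=> [nil nil]
```

One consequence worth knowing: for malformed input, parse-phoneme as written quietly returns [nil nil] instead of throwing.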

... testable ...

Parsing a full entry is also pretty easy. Here's the test:

(describe parse-entry
  (it "can parse an entry with a name and phoneme list"
    (= { :word "CLOJURE" 
         :phonemes [["K" nil] ["L" nil]["OW" :p] ["ZH" nil] ["ER" :n]] } 
       (parse-entry "CLOJURE K L OW1 ZH ER0"))))

and the code:

(defn parse-entry
  "Parse a dictionary entry"
  [s]
  (let [parts (re-split #"\s+" s)]
    { :word     (first parts) 
      :phonemes (map parse-phoneme (rest parts)) }))

Just split the line on whitespace. Use the first result as the word, and apply parse-phoneme to the rest. Is returning a map here necessary or good? I think it makes the code more readable, but if that's the case why didn't I do the same for parse-phoneme? Lau B. Jensen has some interesting thoughts on this topic in his "Taking Uncle Bob to School" article. Using a map gives us somewhere to stick additional info in the future. YAGNI? If I'm wrong, at least I'm in good company :)

... functions ...

Finally, we want to parse an entire dictionary. We'll complicate things a bit because we want support for comments (lines starting with a ;) and empty lines. But first, we need to set up some tests:

(describe parse-dictionary
  (given [line1 "CLOSURE K L OW1 ZH ER0"
          line2 "MACARONI  M AE2 K ER0 OW1 N IY0"
          expected {"CLOSURE"  (:phonemes (parse-entry line1))
                    "MACARONI" (:phonemes (parse-entry line2))}
          input (fn [& s] (java.io.StringReader. (apply str s)))]

  (it "can parse multiple entries"
    (= expected (parse-dictionary (input line1 "\n" line2))))

  (it "can parse a dictionary with comments"
    (= expected (parse-dictionary (input "; comment\n" line1 "\n;;; another\n" line2))))

  (it "can parse a dictionary with empty lines"
    (= expected (parse-dictionary (input line1 "\n\n   \n" line2))))))

Here I've used Lazytest's given macro to define some constants used across all test examples. given acts just like a let except its body consists of examples and nested test groups. I pre-define the expected result and then test it against various input combinations. It at least saves some typing.

The other interesting bit is the input helper function which converts its arguments into a StringReader suitable for input to duck-streams. Remember what I said about testability above? I think this helper function should really go somewhere else, but I'll let future development guide that ...

... is the Clojure way!

And here's the resulting code for parse-dictionary:

(defn- is-dictionary-entry?
  "Returns true if the given line is a dictionary entry"
  [line]
  (and (not (empty? line))
       (not (.startsWith line ";"))))

(defn parse-dictionary
  [reader]
  (let [lines    (read-lines reader)
        trimmed  (map #(.trim %1) lines)
        filtered (filter is-dictionary-entry? trimmed)
        parsed   (map parse-entry filtered)]
    (reduce #(assoc %1 (:word %2) (:phonemes %2)) {}  parsed)))

I've read that it's idiomatic in Clojure to use the Java libraries when available, but I don't yet know the Clojure core API well enough to choose between them. For example, I originally had (.isEmpty line) in the is-dictionary-entry? helper because I know the Java String API really well off the top of my head. Later, I found the empty? function which seemed more appropriate and, I think, reads better. So, are there better ways to express my usage of .startsWith and .trim? Time will tell ... and now I look at the API doc for empty? and see that I should be using (seq line) instead!
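For comparison, here is a sketch of how the helper might look with (seq line) and the clojure.string functions instead of Java interop. This is an alternative, not the code used in this post. The name dictionary-entry? is hypothetical, the function trims internally (the version above expects an already trimmed line), and clojure.string/starts-with? assumes Clojure 1.8 or later, well after this post was written:

```clojure
(require '[clojure.string :as str])

(defn dictionary-entry?
  "Hypothetical variant: true if line is neither blank nor a ; comment."
  [line]
  (let [trimmed (str/trim line)]
    ;; (seq trimmed) is nil for an empty string, so `and` short-circuits;
    ;; boolean turns the nil/true result into false/true
    (boolean (and (seq trimmed)
                  (not (str/starts-with? trimmed ";"))))))

(dictionary-entry? "BETTY  B EH1 T IY0") ;=> true
(dictionary-entry? "   ")                ;=> false
(dictionary-entry? ";;; comment")        ;=> false
```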

Note that I've broken down the parsing into a number of transformation steps in a let form. For me, this made the data flow much clearer since I can name the result of each step. Of course, what do I know? So I asked and got some really great feedback both validating my approach and offering equally readable alternatives. In particular, the ->> macro threads a value through multiple forms, passing the return value from each form as the last argument of the next form.

Here's the code above translated to use ->>:

(defn parse-dictionary
  [reader]
  (->> (read-lines reader)
       (map #(.trim %1))
       (filter is-dictionary-entry?)
       (map parse-entry)
       (reduce #(assoc %1 (:word %2) (:phonemes %2)) {})))

It's arguably more readable. It's also less typing because you don't have to pass the results along manually. But having the named results is a nice form of documentation, and if you need to use a result in multiple places downstream, ->> doesn't help much. I guess it's a tossup and depends on the situation.
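A tiny, self-contained illustration of the tradeoff, using made-up data rather than the dictionary code:

```clojure
(require '[clojure.string :as str])

(def words ["b" "a" "b"])

;; ->> inserts each result as the LAST argument of the next form ...
(->> words
     (map str/upper-case)
     (filter #(= "B" %))
     count)
;=> 2

;; ... so it is shorthand for this inside-out nesting
(count (filter #(= "B" %) (map str/upper-case words)))
;=> 2

;; When one intermediate result feeds two later steps, let keeps it nameable
(let [upper (map str/upper-case words)
      bs    (filter #(= "B" %) upper)]
  {:total (count upper) :bs (count bs)})
;=> {:total 3, :bs 2}
```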

In the same thread, there was also some interesting discussion of performance characteristics of these approaches and a nice demonstration of using macros to create a test harness for analyzing them.

Conclusion

If you've read this far, I hope you've found this post informative. I know that I've learned a lot in the course of writing and refining just this little bit of code. As I mentioned in the introduction, I'll be adding to it in future posts. Bear with me; I think this is all going somewhere.

What have we learned?

  • We can use given in Lazytest to set up constants used across multiple test examples
  • clojure.contrib.duck-streams is super handy. It makes it easy to read data from multiple sources and, by decoupling our code from specific input sources, makes it easier to test.
  • It's ok to bind a stream of operations in a let, but there are alternatives like the ->> macro.
  • Building up code by creating small, testable functions is preferable to creating a giant, kitchen sink function.
  • The Clojure community is really helpful and friendly, going above and beyond to answer a noob question.

Until next time ...

Remedial Clojure: Leiningen, Lazytest, and some code

October 17th, 2010 6 comments

Update/Note 10/27/2011: Some of the code in this post makes use of monolithic clojure.contrib. This library is now deprecated so please don’t use it in new Clojure code. For more info see “Where Did Clojure.Contrib Go?”


I’m starting to get friendly with Clojure. There’s no better way to learn a new language than to build something and write (that is, reflect) about it. Let’s try building a little program that covers some of the basics. I’m going to assume you have access to some Clojure reference material such as the Clojure wiki or “Programming Clojure”. Here we’ll be using Clojure, Leiningen for builds and Lazytest for continuous testing. This program, histowords, will read a text file, count word occurrences and produce a histogram of the results. Nothing fancy, but the journey of a thousand miles and all that…

The final source for this post is all on github.com.

Initial Project Setup

First, let’s create a new project with Leiningen. Leiningen is a Clojure dependency management and build tool similar to Maven, hopefully without the betrayal and heartache (it’s actually built on top of Maven). Download “lein” and put it on your path somewhere. Of course, you’ll need Java installed and on your path as well.

$ lein new histowords
$ cd histowords

(Note that the first invocation of lein might take a while as it downloads the internet)

Now we’ve got a skeleton project that looks something like this:

histowords/
|-- README
|-- lib
|-- project.clj
|-- src
|   `-- histowords
|       `-- core.clj
`-- test
    `-- histowords
        `-- test
            `-- core.clj

project.clj is the Leiningen project file where we define our dependencies and everything. Let’s add our Clojure and Lazytest dependencies now:

(defproject histowords "1.0.0-SNAPSHOT"
  :description "FIXME: write"
  :dependencies [[org.clojure/clojure "1.2.0"]
                 [org.clojure/clojure-contrib "1.2.0"]]
  :dev-dependencies [[com.stuartsierra/lazytest "1.1.2"]]
  :repositories {"stuartsierra-releases" "http://stuartsierra.com/maven2"}
  :main histowords.core)

We can now ask Leiningen to download the dependent jars into the lib and lib/dev folders:

$ lein deps

Leiningen has some other useful targets as well:

  • clean – clean the project
  • jar – Build a jar
  • uberjar – Build a standalone jar with Clojure and everything bundled.
  • repl – Start a repl with the classpath all set up

Lazytest Setup

Lazytest is an RSpec-like BDD/testing framework created by Stuart Sierra. Before we write any code, let’s get Lazytest running in “watch” mode so every time we save a file, it’ll re-run our tests automatically. First, we’ll add some setup code to our test file, test/histowords/test/core.clj. Replace the default code generated by Leiningen with this:

(ns histowords.test.core
  (:use [lazytest.describe :only (describe it)])
  (:use histowords.core))

Now, in a new console, fire up Lazytest:

$ cd histowords
$ java -cp "src:test:lib/*:lib/dev/*" lazytest.watch src test

This tells Lazytest to watch for changes in the src/ and test/ directories. You’ll see output like this:

Namespaces (no cases run)

Ran 0 test cases.
0 failures.

Done.

Let’s Write Some Code

As mentioned above, we want to make a histogram by counting words in a file. So the input “Why Betty, why Betty, why?” will generate this:

why   ###
betty ##

I’m going to take a bottom-up approach to this problem. First, let’s break up the input into words, discarding whitespace and punctuation, and converting to lowercase. Here are the tests I came up with:

(describe gather-words
  (it "splits words on whitespace"
    (= ["mary" "had" "a" "little" "lamb"] (gather-words "   mary had a\tlittle\n   lamb    ")))
  (it "removes punctuation"
    (= ["mary" "had" "a" "little" "lamb"] (gather-words "., mary, had... a little; lamb!")))
  (it "converts words to lower case"
    (= ["mary" "had" "a" "little" "lamb"] (gather-words "., MaRy, hAd... A liTTle; lAmb!"))))

Add these to the test file and save it. Lazytest will immediately complain. Try implementing gather-words in src/histowords/core.clj. Here’s what I came up with:

(defn gather-words 
  "Given a string, return a list of lower-case words with whitespace and
   punctuation removed"
  [s]
  (map #(.toLowerCase %) (filter #(not (.isEmpty %)) (seq (.split #"[\s\W]+" s)))))

There are several things going on here. First we split on whitespace and non-word characters (yes, this isn’t perfect). We take that sequence and filter out any empty strings. Finally, we convert the results to lower case. I think the best way to go about this is to add each test one at a time and refine the implementation as you go. That worked well for me anyway.
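Another way to look at the same problem is to search for the words themselves rather than splitting on the gaps between them. This is only an alternative sketch (gather-words-alt is a hypothetical name, and it uses clojure.string/lower-case in place of the .toLowerCase interop call), but it passes the same three tests and has no empty strings to filter out:

```clojure
(require '[clojure.string :as str])

(defn gather-words-alt
  "Alternative sketch: find runs of word characters instead of splitting on the gaps."
  [s]
  ;; re-seq returns each successive match of the pattern, in order
  (map str/lower-case (re-seq #"\w+" s)))

(gather-words-alt "., MaRy, hAd... A liTTle; lAmb!")
;=> ("mary" "had" "a" "little" "lamb")
```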

Next, we want to count distinct words in the sequence returned by gather-words. The result will be a map from word to count. Here’s a test:

(describe count-words
  (it "counts words into a map"
    (= {"mary" 2 "why" 3 } (count-words ["why" "mary" "why" "mary" "why"]))))

Again, after you add the test, Lazytest will complain and you can start implementing the count-words function. In an imperative language like Java, you’d create an empty map and then iterate over the word list. If the word’s in the map, increment the count and put it back in the map, otherwise, add an entry to the map with an initial count of 1. Something like this:

public static Map<String, Integer> countWords(Collection<String> words) {
    final Map<String, Integer> result = new HashMap<String, Integer>();
    for(String word : words) {
        final Integer count = result.get(word);
        result.put(word, count != null ? count + 1 : 1);
    }
    return result;
}

In other words, a bureaucratic nightmare.

In Clojure, the idea’s the same, but we don’t need a for-loop. Instead, we can use the reduce function over the list of words. In other languages, this function is known as fold, foldl, inject, accumulate, and other names. Essentially, a callback function is called for each element in the sequence. It’s passed the element and the result of the previous call to the function. So we’re going to take in a word and a map, update the word count and return a new map. Here’s my implementation:

(defn count-words 
  "Take a seq of words and return a map from word to word count"
  [words]
  (reduce 
    (fn [m w] (assoc m w (+ 1 (m w 0)))) 
    {} 
    words))

Note the {} which is the initial value for our accumulator map. Our anonymous function looks up the current count for the word (default to 0 if missing) and increments by 1. Then we return a new map (the assoc function) with the updated count.
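If the accumulator is hard to picture, reductions shows every intermediate value that reduce would pass along. In this sketch count-step is just a name I’m giving the anonymous function from count-words. It’s also worth knowing that clojure.core ships a frequencies function that does this whole job:

```clojure
(defn count-step
  "The same function used inside count-words, given a name."
  [m w]
  (assoc m w (+ 1 (m w 0))))

;; Each element of the result is the map after one more word has been counted
(reductions count-step {} ["why" "mary" "why"])
;=> ({} {"why" 1} {"why" 1, "mary" 1} {"why" 2, "mary" 1})

;; The built-in equivalent of the whole reduce
(frequencies ["why" "mary" "why" "mary" "why"])
;=> {"why" 3, "mary" 2}
```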

So now we have a map from words to counts. The next steps towards our histogram are to “flatten” the map into a list of word/count pairs and sort by count so our histogram looks nice. Conveniently, Clojure maps are already a “seq” of key/value pairs. That is, if we use a map in a context where a sequence is needed, the map will act like a sequence of pairs. This is easy to see in a repl:

$ lein repl
"REPL started; server listening on localhost:46775."
histowords.core=> (seq { "a" 1 "b" 2 "c" 3 })
(["a" 1] ["b" 2] ["c" 3])

So all we need to do is sort by the second field of each pair. Here’s a test:

(describe sort-counted-words
  (it "sorts and returns a list of word/count pairs"
    (= [["a" 1] ["b" 2] ["c" 3]] (sort-counted-words {"b" 2 "c" 3 "a" 1}))))

and here’s my implementation of sort-counted-words:

(defn sort-counted-words 
  "Given a sequence of word/count pairs, sort by count"
  [words]
  (sort-by #(% 1) words))

The sort-by function takes a key function and the sequence to sort, and orders the elements by comparing the keys. Here our key function just grabs the second field (index 1) of each pair. This function is so simple it's almost unnecessary, but it's nice when code is readable.
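A quick sketch of the options here: the core function second does the same job as #(% 1), and sort-by also accepts an optional comparator between the key function and the collection, which is one way to put the biggest counts first. The counts map is made-up sample data:

```clojure
(def counts {"b" 2 "c" 3 "a" 1})

;; Ascending by count, using second as the key function
(sort-by second counts)
;=> (["a" 1] ["b" 2] ["c" 3])

;; Descending by count: > is used to compare the keys
(sort-by second > counts)
;=> (["c" 3] ["b" 2] ["a" 1])
```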

Now we have a list of word/count pairs, sorted by count. Now we just need to turn it into a histogram. We're going to need a function to generate the histogram bars. Here's the test:

(describe repeat-str
  (it "returns the empty string if count is zero"
    (= "" (repeat-str "*" 0)))
  (it "repeats the input string n times"
    (= "xxxxx" (repeat-str "x" 5))))

Can you implement it?
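Spoiler alert: if you'd rather not work it out yourself, here is one possible answer. It's a minimal sketch using only clojure.core, and not necessarily the implementation in the final source for this post:

```clojure
(defn repeat-str
  "One possible answer: concatenate n copies of s."
  [s n]
  ;; (repeat n s) is a seq of n copies; (apply str ...) of an empty seq is ""
  (apply str (repeat n s)))

(repeat-str "x" 5) ;=> "xxxxx"
(repeat-str "*" 0) ;=> ""
```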

Next let's try generating a single entry in the histogram. histogram-entry will take a word/count pair and the width of the name column as parameters and return a string. Here's the test:

(describe histogram-entry
  (it "can generate a single histogram entry"
    (= "betty   ######" (histogram-entry ["betty" 6] 7))))

Here's my implementation:

(defn histogram-entry 
  "Make a histogram entry for a word/count pair and maximum word width"
  [[w n] width]
  (let [r (- width (.length w))]
    (str w (repeat-str " " r) " " (repeat-str "#" n))))

Note the use of destructuring ([w n]) in the parameter list to bind the word and count from the pair to variables rather than extracting with indices. Otherwise this is pretty straightforward. Calculate the required padding and concatenate some strings.
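As an aside, the padding arithmetic can also be handed off to format, which wraps Java's String.format. This is an alternative sketch under a hypothetical name, not the version used in the rest of this post, and it assumes width is at least 1:

```clojure
(defn histogram-entry-alt
  "Alternative sketch: let format left-justify the word in a column of the given width."
  [[w n] width]
  ;; For width 7 the format string is "%-7s %s": pad the word to 7, one space, then the bar
  (format (str "%-" width "s %s") w (apply str (repeat n "#"))))

(histogram-entry-alt ["betty" 6] 7) ;=> "betty   ######"
```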

Finally, we're ready to pull it all together. The histogram function takes a sequence of word/count pairs and generates a full histogram for them with nice alignment and everything. Here's the test:

(describe histogram
  (it "can generate a histogram from word counts"
    (= "mary ##\nwhy  ###\n" (histogram [["mary" 2] ["why" 3]]))))

And here's the implementation:

(defn histogram 
  "Make a histogram for a seq of word/count pairs"
  [words]
  (let [width (apply max (map #(.length (%1 0)) words))]
    (reduce 
      (fn [acc pair] (str acc (histogram-entry pair width) "\n")) 
      "" 
      words)))

We use the max function to calculate the width of the widest word in the input sequence. Then we use reduce again to generate the output string. Previously we were accumulating a map. This time we're accumulating a string, thus the initial value is the empty string.
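One edge case to be aware of: with an empty sequence of pairs, (apply max ...) ends up calling max with no arguments, which throws. A sketch of a variant that tolerates empty input by giving reduce a starting width of 0 (histogram-safe is a hypothetical name, and the entry formatting is inlined so the example stands alone):

```clojure
(defn histogram-safe
  "Sketch: like histogram, but returns \"\" for an empty seq of pairs."
  [words]
  ;; (reduce max 0 ...) yields 0 when there are no words at all
  (let [width (reduce max 0 (map #(count (first %)) words))]
    (apply str
           (for [[w n] words]
             (str w
                  (apply str (repeat (- width (count w)) " ")) ; pad to column width
                  " "
                  (apply str (repeat n "#"))                   ; the bar itself
                  "\n")))))

(histogram-safe [["mary" 2] ["why" 3]]) ;=> "mary ##\nwhy  ###\n"
(histogram-safe [])                     ;=> ""
```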

Pulling It All Together

Now we've got a bunch of passing tests. How do we turn this into a program we can run? Leiningen to the rescue. We'll add a main function to src/histowords/core.clj. Let's string all our functions together there. We'll assume that a file name is given as a command-line parameter, and use slurp to read it into a string:

(defn -main [& args]
  (println 
    (histogram 
      (sort-counted-words 
        (count-words 
          (gather-words 
            (slurp (first args))))))))

Additionally, we need to tell the Clojure compiler to generate a class for this file so it can be used as a main entry point. At the top of the file, add the :gen-class keyword:

(ns histowords.core
  (:use [clojure.contrib.str-utils :only (str-join)])
  (:gen-class))

With that in place, we can use Leiningen to generate a standalone uber-jar for us:

$ lein uberjar
$ java -jar histowords-1.0.0-SNAPSHOT-standalone.jar mary.txt
against    #
but        #
lingered   #
snow       #
near       #
go         #
still      #
fleece     #
does       ##
did        ##
a          ###
teacher    ###
was        ###
... snip ...
to         ####
turned     ####
day        ####
you        ####
school     #####
so         ######
it         #########
and        ##########
lamb       ############
mary       #############
the        ##############

Conclusions

I think this is pretty cool. I find the code nice and compact while remaining readable. Lazytest makes testing easy and Leiningen gives us the basic features of Maven without the XML nightmare. But, this is just a toy app, right? In real life, it'll get a lot messier. The thing I'm most excited about is that, since I started playing with Clojure last week, I've read a bunch of "realworld" Clojure code in github, and it's still just as compact and readable. I know I'm just scratching the surface too. Fun!

There are still some newb questions/issues I have though:

  • How do I get Leiningen to run my Lazytests as part of the build? A plugin maybe? What about Maven integration and JUnit-style reports so I can run all this stuff in Hudson?
  • Sometimes I have to stare at the Lazytest failure messages a bit to figure out what failed

Anyway, I'm just getting started, so if you see any glaring mistakes or ways to improve the code, I'd love to hear about it.

Categories: clojure

A Brief Note On Pathogen For Vim

October 12th, 2010 4 comments

I spent a while tracking down a vim issue this evening. For plugin-management purposes, I’ve recently switched to the excellent Pathogen plugin. Right at the top of that page it says:

Add this to your vimrc:

     call pathogen#runtime_append_all_bundles()

That seems easy enough, and everything worked great on Windows. Moving over to Linux, things weren’t going as well. Whenever I opened a Clojure file, its filetype wasn’t detected. I had to manually execute “set filetype=clojure” to get syntax highlighting and even then, indenting was weird.

So, I debugged. One great thing I learned along the way is how to enable vim’s logging for debugging purposes. Just do something like this:

   $ gvim -V9log.txt ...

That will log everything vim does to log.txt in the current directory. I was able to compare my Windows log to the one on Linux and see that when it searched for ftdetect plugin directories, it wasn’t including any of the plugins managed by Pathogen. Hmmm… I googled it for myself and was led back to … that’s right, the same Pathogen page, down toward the bottom where it says:

Note that you need to invoke the pathogen functions before invoking “filetype plugin indent on” if you want it to load ftdetect files. On Debian (and probably other distros), the system vimrc does this early on, so you actually need to “filetype off” before “filetype plugin indent on” to force reloading.

My evening would have been funner if the top of the page just said this in the first place:

Add this to your vimrc:

     call pathogen#runtime_append_all_bundles()
     filetype off
     syntax on
     filetype plugin indent on

Yes, I should have read all the instructions. Everything’s up on github for the curious.

Categories: vim