Laziness and chunking in Clojure
Posted on 2016-11-05 Edit on GitHub
Surprises with side effects
One of Clojure's more surprising behaviors is the interaction between functions like map
1 and functions with side effects2.
Let's map a side-effecting function like prn
over a list3:
(take 1 (map prn '(1 2 3 4 5)))
You may expect that, since there is a (take 1 ...)
, only one number will be printed. This is indeed the case:
1 (nil)
But try doing the same to a vector:
(take 1 (map prn [1 2 3 4 5]))
We get:
1 2 3 4 5 (nil)
What's going on?
Chunking
In the definition of take
, seq
is called on the collection. seq
causes a lazy sequence to realize its value, i.e. the (map prn ...)
is evaluated.
Now we need to take a look at the definition of map
; it is sufficient to look at one arity:
([f coll] (lazy-seq (when-let [s (seq coll)] (if (chunked-seq? s) (let [c (chunk-first s) size (int (count c)) b (chunk-buffer size)] (dotimes [i size] (chunk-append b (f (.nth c i)))) (chunk-cons (chunk b) (map f (chunk-rest s)))) (cons (f (first s)) (map f (rest s)))))))
The difference is that [1 2 3 4 5]
goes down the chunked-seq?
path, while (1 2 3 4 5)
does not.
(seq [1 2 3 4 5])
returns aclojure.lang.PersistentVector$ChunkedSeq
, which is a chunked seq (an instance ofIChunkedSeq
).(seq '(1 2 3 4 5))
simply returns aclojure.lang.PersistentList
, which is not.
From above, we see that if the collection is a chunked seq, map
uses chunk-first
to take elements from it. For performance reasons, chunk-first
takes 32 elements. Therefore, prn
is called 32 times.
We can see this in the following:
(take 1 (map prn (range 100)))
which outputs:
0 1 2 ... 31 (nil)
If the collection is not a chunked seq, map
realizes only one element of it.
Avoiding chunking
To avoid chunking, we can explicity "unchunk" the lazy sequence:
(defn unchunk [s] (when (seq s) (lazy-seq (cons (first s) (unchunk (next s))))))
In short, unchunk
turns the collection into something that's not chunkable (a cons cell).
We see that:
(take 1 (map prn (unchunk (range 100))))
produces
0 (nil)
Laziness and side effects
In general, mixing laziness and side effects is a bad idea. It makes reasoning about when thing will be evaluated difficult.
If it has to be done, however, understanding chunking and when it occurs is important. Most of the time, the side effects are much costlier than printing a value.
Footnotes:
And concat
, reduce
, filter
, and many others.
Since Clojure 1.1.0
For some interactive examples, see Ian Truslove's presentation.