Coding some sequence processing in Clojure, I was wondering how efficient is the test for sequence emptiness. the first thing which comes in mind is:

 (when-not (empty? coll) …) 

Sometimes, this leads to unreadable code and for instance, Joy of Clojure recommends to simply use the following pun:

 (when (seq coll) …) 

So basically converting the collection into a sequence every time, leveraging the fact that empty sequence is nil, i.e. false.

The ancient C developer in me started screaming about the complexity of seq.

Well, let us see, what it really does:

(source seq)
(def ^{
   :arglists '(^clojure.lang.ISeq [coll])
   :doc "Returns a seq on the collection.
   If the collection is empty, returns nil.
   (seq nil) returns nil. seq also works
   on Strings, native Java arrays
   (of reference types) and any objects that
   implement Iterable."
   :tag clojure.lang.ISeq
   :added "1.0"<br>
   :static true}
   seq (fn ^:static seq ^clojure.lang.ISeq [coll]
     (. clojure.lang.RT (seq coll))))

As for many clojure.lang functions, it is simply a wrapper over a Java defined method. In this case, we are looking at Java class clojure.lang.RT.

Aside being almost a classbook example of lost type information and downcasting, this basically says that the performance depends heavily on the type of the collection we are trying to convert. For many cases, this is just a downcast – not a significant performance hit (we live in the Java world, right). For some, the conversion seems linear (have a look at the RT.seqFrom() method). So I have written two test functions to see how big hit the seq function is, when it comes to Java arrays for instance.


(defn hungry-sum1
([coll] (hungry-sum1 0 coll))
([s coll] (
if (seq coll)
(recur (+ s (first coll)) (rest coll))
s)))

(defn hungry-sum2
([coll] (hungry-sum2 0 coll))
([s coll] (
if (empty? coll) s
(recur (+ s (first coll)) (rest coll)))))

(def test-data
(into-array (range 1000000)))

(defn test1 []
(seq (repeatedly 1 #(hungry-sum1 test-data))))
(defn test2 []
(seq (repeatedly 1 #(hungry-sum2 test-data))))
(println "Testing with seq for emptiness.")
(time test1)
(println "Testing with empty? for emptiness.")
(time test2)

When you load this, to clojure REPL, you might get something like this:

user=> (load-file "seqloop.clj")
Testing with seq for emptiness.
"Elapsed time: 0.018768 msecs"
Testing with empty? for emptiness.
"Elapsed time: 0.01805 msecs"

Basically meaning the speed is the same. Well, definitely not something I would expect from this code.

Dig in:

(source empty?)
(defn empty?
  "Returns true if coll has no items -
   same as (not (seq coll)).
  Please use the idiom (seq x) rather
  than (not (empty? x))"
  {:added "1.0"
   :static true}
  [coll] (not (seq coll)))
nil

Surprise!!! Well, let’s just say that this is where i should have started in the first place :-//. I am going to play with this a bit and will get back, hopefully with some faster way how to test for collection emptiness. I am still not sure I like how Clojure treats sequences.

And yes, I know I should have read the documentation first ;-).

Comments