While coding some sequence processing in Clojure, I was wondering how efficient the test for sequence emptiness is. The first thing that comes to mind is:

 (when-not (empty? coll) …) 

Sometimes this leads to unreadable code; Joy of Clojure, for instance, recommends simply using the following pun:

 (when (seq coll) …) 

So basically we convert the collection into a sequence every time, leveraging the fact that seq returns nil for an empty collection, and nil is falsey.
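To make the pun concrete, here is a minimal sketch of how it behaves on a few inputs. The helper name describe is made up for this illustration and is not part of Clojure or of the book.

```clojure
;; Sketch only: `describe` is a hypothetical helper for illustration.
(defn describe
  "Reports whether coll has any items, using the (seq coll) idiom."
  [coll]
  (if (seq coll) :non-empty :empty))

(println (seq []))           ; prints nil
(println (seq [1 2 3]))      ; prints (1 2 3)
(println (describe ""))      ; prints :empty
(println (describe nil))     ; prints :empty
(println (describe {:a 1}))  ; prints :non-empty
```

Since nil is one of only two falsey values in Clojure (the other being false), the result of seq can be used directly as the condition.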

The ancient C developer in me started screaming about the complexity of seq.

Well, let us see what it really does:

(source seq)
(def ^{
   :arglists '(^clojure.lang.ISeq [coll])
   :doc "Returns a seq on the collection.
   If the collection is empty, returns nil.
   (seq nil) returns nil. seq also works
   on Strings, native Java arrays
   (of reference types) and any objects that
   implement Iterable."
   :tag clojure.lang.ISeq
   :added "1.0"
   :static true}
   seq (fn ^:static seq ^clojure.lang.ISeq [coll]
     (. clojure.lang.RT (seq coll))))

As with many core Clojure functions, it is simply a wrapper over a method defined in Java. In this case, we are looking at the Java class clojure.lang.RT.

Aside from being almost a textbook example of lost type information and downcasting, this basically says that the performance depends heavily on the type of the collection we are trying to convert. For many cases, this is just a downcast, which is not a significant performance hit (we live in the Java world, right). For some, the conversion seems linear (have a look at the RT.seqFrom() method). So I have written two test functions to see how big a hit the seq function is when it comes to Java arrays, for instance.
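The type dependence can be observed from the REPL without reading the Java source. The following is a small sketch; the names a-list, a-vector and an-array are chosen just for this example.

```clojure
;; Sketch: what seq hands back depends on the type of the input.
(def a-list '(1 2 3))
(def a-vector [1 2 3])
(def an-array (into-array [1 2 3]))

;; A non-empty list is already a seq, so the very same object comes back.
(println (identical? a-list (seq a-list)))      ; prints true

;; A vector is not a seq, so a separate seq object is created over it.
(println (identical? a-vector (seq a-vector)))  ; prints false

;; A Java array gets wrapped in some ISeq implementation.
(println (instance? clojure.lang.ISeq (seq an-array)))  ; prints true
(println (class (seq an-array)))
```

This says nothing about how expensive each wrapper is to create; it only shows that there are different code paths behind the single seq call.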


;; Sums coll, testing for the end with (seq coll).
(defn hungry-sum1
  ([coll] (hungry-sum1 0 coll))
  ([s coll]
   (if (seq coll)
     (recur (+ s (first coll)) (rest coll))
     s)))

;; Sums coll, testing for the end with (empty? coll).
(defn hungry-sum2
  ([coll] (hungry-sum2 0 coll))
  ([s coll]
   (if (empty? coll)
     s
     (recur (+ s (first coll)) (rest coll)))))

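(def test-data
  (into-array (range 1000000)))

;; repeatedly returns a lazy seq; the outer seq forces the single call.
(defn test1 []
  (seq (repeatedly 1 #(hungry-sum1 test-data))))
(defn test2 []
  (seq (repeatedly 1 #(hungry-sum2 test-data))))

(println "Testing with seq for emptiness.")
;; Note the extra parentheses: (time test1) would only time evaluating the symbol.
(time (test1))
(println "Testing with empty? for emptiness.")
(time (test2))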

When you load this into the Clojure REPL, you might get something like this:

user=> (load-file "seqloop.clj")
Testing with seq for emptiness.
"Elapsed time: 0.018768 msecs"
Testing with empty? for emptiness.
"Elapsed time: 0.01805 msecs"

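Basically, this means the speed is the same, which is definitely not something I would expect from this code. (In fairness, the run above timed the bare symbols test1 and test2 rather than calling them, so the sub-millisecond figures say nothing about summing a million elements; the calls have to be written as (time (test1)) and (time (test2)).)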

Dig in:

(source empty?)
(defn empty?
  "Returns true if coll has no items -
   same as (not (seq coll)).
  Please use the idiom (seq x) rather
  than (not (empty? x))"
  {:added "1.0"
   :static true}
  [coll] (not (seq coll)))
nil

Surprise!!! Well, let’s just say that this is where I should have started in the first place :-//. I am going to play with this a bit and will get back, hopefully with some faster way to test for collection emptiness. I am still not sure I like how Clojure treats sequences.
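The equivalence stated in the docstring can also be checked from the REPL. This is just a sanity check over a handful of hand-picked values (the names samples and agree? are made up here), and other Clojure versions may implement empty? differently from the source listed above while still returning the same answers.

```clojure
;; Sketch: compare (empty? x) with (not (seq x)) on a few sample values.
(def samples
  [nil [] [1] "" "a" {} {:k 1} #{} '() (into-array Object []) (range 0) (range 3)])

(def agree?
  (every? #(= (empty? %) (not (seq %))) samples))

(println agree?)  ; prints true
```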

And yes, I know I should have read the documentation first ;-).

As our pet project, my product management mates and I decided to improve the way we write user stories at the Y Soft R&D department.

The biggest challenge is not finding what to do, as we are quite sure about that. The biggest challenge for us is to split an epic into user stories in a way that is useful for planning but still understandable, so that developers see the value.

As the input user story, we took a story already posted here by our fellow Y Softer Ondra: I, as an End User, want the system to control access to my documents in a way visible to me, so that I can trust that my documents remain confidential.

This story is great as it represents the stakeholder value as well as the product vision. Our job is now to check whether the story is ready for development or if more backlog grooming is required.

We found “How to split a user story” by Richard Lawrence (Agileforall.com) quite useful.

At our first session, we started with step 1, the INVEST model, which helps to prepare the input story. More information about INVEST can be found on Wikipedia. The funny thing about the INVEST model is that even if you split the story into small pieces, each resulting story must still satisfy it. Therefore you have clear guidance on how far you can go with splitting.

So let’s go evaluate our story 🙂

Independent?

Yes.

Negotiable?

Yes, because the story is too high level. What’s negotiable: the system (can be the user), the documents (can be control of something else), and the value.

Valuable?

Yes, definitely.

Estimable?

Partially, but not estimable within one sprint, since the output of the story can be anything from human-computer interaction to biometrics.

Small?

No. Level of certainty is too low.

Testable?

No. It is very hard to find a metric for this story.

What does this mean?

The story is not done yet, so let’s continue to the next round and split the story!

In the next session we will try to split some more stories as well. 😉