Sorting by tuple slots
Friday, January 9, 2009
Factor’s sorting has been extended to support sorting tuples by multiple slots, one after the other. In a music player application, you may wish to sort your songs first by artist, then by album, then track number. If you had a tuple representing every song, the code for such a sort is now very easy:
{ { artist>> <=> } { album>> <=> } { track>> <=> } } sort-by-slots
The main word at work is sort-by-slots ( seq sort-specs -- seq' )
,
where a sort-spec is an accessor paired with a comparator, one of
<=>
, >=<
, human-<=>
, human->=<
. Human sort first converts
consecutive digits to integers and then makes the comparison, e.g.
{ "a1" "a10" "a03" "a2" }
sorts as { "a1" "a2" "a03" "a10" }
instead
of { "a03" "a1" "a10" "a2" }
, as it would with natural-sort
.
Sorting in reverse order is possible with the new operator >=<
, which
inverts the result of the <=>
comparator. To sort a playlist by the
most played songs in reverse order (most played first):
{ { play-count>> >=< } } sort-by-slots
Source code for sort-by-slots
: slot-comparator ( accessor comparator -- quot )
'[ [ _ execute ] bi@ _ execute dup +eq+ eq? [ drop f ] when ] ;
MACRO: compare-slots ( sort-specs -- <=> )
#! sort-spec: { accessor comparator }
[ first2 slot-comparator ] map '[ _ 2|| +eq+ or ] ;
: sort-by-slots ( seq sort-specs -- seq' )
'[ _ compare-slots ] sort ;
First, a bit about how Factor’s sort ( seq quot -- sortedseq )
word
works. It expects a quotation that compares two objects
lexicographically, which is element by element in dictionary order.
Thus, the macro compare-slots
expands the sort-specs into a quotation
that compares tuples slot-by-slot until there is a difference. The
macro-expansion for the first example looks like this:
( scratchpad ) [ { { artist>> <=> } { album>> <=> } { track>> <=> } } compare-slots ] expand-macros .
f 2 [
\ artist>> \ <=> [ [ execute ] curry bi@ ] dip execute
dup +eq+ eq? [ drop f ] when
] [ [ drop ] dip ndup ] dip call dup
[ 2 [ ndrop ] curry dip ] [
2 [
\ album>> \ <=> [ [ execute ] curry bi@ ] dip execute
dup +eq+ eq? [ drop f ] when
] [ [ drop ] dip ndup ] dip call dup
[ 2 [ ndrop ] curry dip ] [
2 [
\ track>> \ <=> [ [ execute ] curry bi@ ] dip
execute dup +eq+ eq? [ drop f ] when
] [ [ drop ] dip ndup ] dip call dup
[ 2 [ ndrop ] curry dip ]
[ 2 [ drop ] dip ndrop t [ f ] [ no-cond ] if ] if
] if
] if +eq+ or
The macro first extracts the slots and compares with the first
comparator. If the comparator returns +lt+
or +gt+
then the
comparison is over, but if the comparator returns +eq+
then the next
comparison is invoked and the sort continues in the same way. Notice
that the slot-comparator
word replaces +eq+
with f
to avoid
short-circuiting the iteration done by 2||
. The final sort-by-slots
word is a trivial call to sort
with the new comparator word we just
defined.
Algorithmic complexity and ideas for improvement
The sorting bound is still O(n log n) for time, but of course the more
comparators that you need to break ties, the longer the algorithm will
take to run.
If your data has to be sorted anyway, it’s possible to sort and then
split the data into related slices using a sequence of accessors like
the above. You could then compute the playing time of every album, or
find your most-played song from every artist. Splitting with the same
accessors takes O(n) time, but the odds are that you probably won’t have
already sorted the data. So for unsorted data, there is a more efficient
way to extract features – you can instead partition it in O(n) time and
find features on each partition, avoiding sorting altogether. This will
be the subject of a future blog post.