The complete correctness of sorting

Published: 2013-05-23T12:43Z

Tags: agda

License: CC-BY

A while ago I set out to prove the correctness of merge sort in Agda. Of course this has been done before. But most proofs you find are far from complete. All they prove is a lemma such as

is-sorted : ∀ (xs : List A) → IsSortedList (sort xs)

Maybe even restricted to lists of natural numbers. While it is nice that a sort function indeed produces a sorted output, that is only half of the story. Consider this function:

cheat-sort : List A → List A
cheat-sort _ = []

Clearly the empty list is sorted. So we are done. What is missing is the second half of correctness of sorting: that the output is a permutation of the input. You want something like:

sort : (xs : List A) → Sorted' A
record Sorted' (xs : List A) : Set where
  field
    ys       : List A
    isSorted : IsSorted ys
    isPerm   : IsPermutation ys xs

While I was at it, I decided to add the third half of correctness: a bound on the runtime or computational complexity. In the end I was able to define:

insertion-sort : ∀ xs → (Sorted xs) in-time (length xs * length xs)
selection-sort : ∀ xs → (Sorted xs) in-time (length xs * length xs)
merge-sort : ∀ xs → (Sorted xs) in-time (length xs * ⌈log₂ length xs ⌉)

This was not as easy as I would have hoped. In this post I will not bore you with all the details, I'll just go over some of the highlights. The full code is on github.

What it means to be sorted

There are roughly two ways to define sorted lists that I know of:

Parametrize the sorted list by a lower bound on the values it contains. For a cons cell the head should be smaller than the lower bound, and the tail should be larger than the head. This requires the type to have a smallest element, but you can adjoin -∞ with a new datatype.
Parametrize the sorted list by a list of all values in it. For a cons cell require that the head is smaller than all the values in the tail.

Since I already need to parametrize by all values in the list to show that the sorted list contains a permutation of them, I went with the second approach:

-- A proof that x is less than all values in xs
data _≤*_ (x : A) : List A → Set where
  []  : x ≤* []
  _∷_ : ∀ {y ys} → (x ≤ y) → x ≤* ys → x ≤* (y ∷ ys)

-- Proof that a list is sorted
data IsSorted : List A → Set where
  []  : IsSorted []
  _∷_ : ∀ {x xs} → x ≤* xs → IsSorted xs → IsSorted (x ∷ xs)

What it means to be a permutation

To show that one list is a permutation of another I again used two data types. Suppose that we know that xs is a permutation of ys. Then when is x ∷ xs a permutation of some list xys? Well, we can permute xs to ys, and insert x anywhere. I used ◂ to denote this insertion,

-- x ◂ xs ≡ xys means that xys is equal to xs with x inserted somewhere
data _◂_≡_ (x : A) : List A → List A → Set a where
  here  : ∀ {xs}           → x ◂ xs ≡ (x ∷ xs)
  there : ∀ {y} {xs} {xys} → (p : x ◂ xs ≡ xys) → x ◂ (y ∷ xs) ≡ (y ∷ xys)

-- Proof that a list is a permutation of another one
data IsPermutation : List A → List A → Set a where
  []  : IsPermutation [] []
  _∷_ : ∀ {x xs ys xys}
      → (p : x ◂ ys ≡ xys)
      → (ps : IsPermutation xs ys)
      → IsPermutation (x ∷ xs) xys

Now the Sorted data type has three components: the sorted list, a proof that it is sorted, and a proof that it is a permutation of the input. These parts are either all [], or they are all _∷_. It turns out to be much nicer to combine the parts together,

-- Sorted permutations of a list
data Sorted : List A → Set  where
  []   : Sorted []
  cons : ∀ x {xs xxs}
       → (p : x ◂ xs ≡ xxs) -- inserting x somewhere into xs gives xxs
       → (least : x ≤* xs)  -- x is the smallest element of the list
       → (rest : Sorted xs) -- and we have also sorted xs
       → Sorted xxs

Of course Sorted and Sorted' are equivalent.

As an aside, these are all the ingredients necessary for proving

sorted-unique : ∀ {xs} → (ys zs : Sorted xs)
              → sorted-to-List ys ≡ sorted-to-List zs

A monad for keeping track of the runtime

To be able to reason about the runtime, as measured in the number of comparisons performed, I decided to use a monad. The type is simply

data _in-time_ (A : Set) (n : ℕ) : Set a where
  box : A → C A n

the constructor box is private, and it can only be accessed through the standard monad operations,

return : ∀ {A n} → A → A in-time n

_>>=_ : ∀ {A B} {m n} → A in-time n → (A → B in-time m) → B in-time (n + m)

Then the sorting functions will be parametrized by a function that for some partial order decides between x ≤ y and y ≤ x in one step, using the monad we defined above:

module Sorting
    {A : Set} {l} {_≤_ : Rel A l}
    (isPartialOrder : IsPartialOrder _≡_ _≤_) 
    (_≤?_ : (x y : A) → (x ≤ y ⊎ y ≤ x) in-time 1)
  where ...

Note that I specify that _≤_ is a partial order, because the Agda standard library definition of a total order actually comes with a function

total : ∀ x y → (x ≤ y) ⊎ (y ≤ x)

which would defeat the whole prupose of _≤?_. In fact, the standard TotalOrders are decidable up to base equality, and if the base equality is propositional equality, then they are decidable. I.e.

total-decidable : ∀ {a r} {A : Set a} → (_≤_ : Rel A r)
                → IsTotalOrder _≡_ _≤_
                → IsDecTotalOrder _≡_ _≤_

See the source for the proof of this side theorem. It relies on a trick to show that total x y can only be different from total y x if x ≢ y. Which holds for propositional equality, but not in general.

Logarithms

To be able to complete the specification of merge sort, we still need to add some missing functions on natural numbers. In particular, we need a logarithm. This logarithm turns out to be surprisingly tricky to define in Agda. Why? Because the usual definition uses non-structural recursion. In haskell you would write

-- @log n@ calculates ⌊log₂ (n+1)⌋
log 0 = 0
log n = 1 + log (n `div` 2)

But Agda is not able to see that n `div` 2 (or in agda notation, ⌊ n /2⌋) is smaller than n. There are two approaches to circumvent this problem:

Use a different algorithm: Convert n to a binary representation, and count the number of digits.
Use well-founded recursion, manually supplying a proof that ⌊ n /2⌋ < n.

I went with the second option, because I will also be using the same shape of recursion inside merge sort itself. The standard way to use well-founded recursion is through the function <-rec, which works a bit like fix in haskell, except that you need to pass in a proof that the argument is smaller. The code would look like this:

log = <-rec log'
  where
  log′ self 0 = 0
  log′ self (suc n) = 1 + self ⌊ suc n /2⌋ ({-proof ommitted-})

But this leads to a problem as soon as you want to prove a property of logarithms. For example, you would think that log (suc n) ≡ 1 + (log ⌊ suc n /2⌋). But that is not definitionally true, since one <-rec is not like another. I found that the well-founded recursion library was in general a pain to work with, especially because it uses so many type synonyms. My solution was to use the slightly lower level accessibility relation. A value of type Acc _<′_ n allows you to do recursion with any m <′ n. Now I can use actual recursion:

log-acc : ∀ n → Acc _<′_ n → ℕ
log-acc 0 _ = 0
log-acc (suc n) (acc more) = 1 + log-acc ⌊ suc n /2⌋ (more _ {-proof ommitted-})

And use the well-foundedness of ℕ to get an Acc for any number:

log : ℕ → ℕ
log n = log-acc n (<-well-founded n)

⌈log₂_⌉ : ℕ → ℕ
⌈log₂ n ⌉ = log (pred n)

There is still a snag when proving properties of log or log-acc, namely that you need to prove that (more n ...) ≡ <-well-founded n. But the accessibility relation doesn't actually matter for the computation, so I decided to just postulate

postulate acc-irrelevance : ∀ {n : ℕ} → {a b : Acc _<′_ n} → a ≡ b
 -- this also follows from function extensionality

If anyone knows a better way to prove properties of functions defined with well-founded recursion, I am open to suggestions.

Vectors versus lists

While working on the proofs I had to choose: Do I use fixed length Vecs or variable length Lists? Both have their pros and cons.

On the one hand, the sorting functions with vectors look a bit nicer, because we can use n instead of length xs:

merge-sort : ∀ {n} (xs : Vec A n) → Sorted xs in-time (n * ⌈log₂ n ⌉)

Additionally, with lists we can only do recursion on the input list, with vectors we can do recursion on the length of the list. The former works fine for insertion sort, where in each step you do something with the head element of the list; but it fails for selection and merge sort.

On the other hand, with vectors you sometimes can't even state the property that one vector is equal to another. For the term xs ≡ ys ++ zs to be well-typed, xs must have the type Vec A (m + n).

I went back and forth a couple of times between vectors and lists. In the end I settled for using vectors only when needed, and specifying properties in terms of lists. For example the split function for merge sort has the type

splitHalf : ∀ {n} → (xs : Vec A n)
          → ∃₂ \(ys : Vec A ⌈ n /2⌉) (zs : Vec A ⌊ n /2⌋)
               → toList ys ++ toList zs ≡ toList xs

So instead of using Vec._++_, I use List._++_. In this style 'select' from selection sort looks like

select : ∀ {n} (xs : Vec A (suc n))
       → (∃₂ \y ys → (y ◂ toList ys ≡ toList xs) × (y ≤* toList ys)) in-time n

I.e. given a vector xs with n+1 elements, return a vector ys with n elements, such that inserting y into it gives us back xs. And this item y should be the smallest one.

Extension: expected runtime

An extension of this post would be to look at randomized sorting algorithms. In particular, quick sort with a randomly chosen pivot has expected runtime O(n * log n). At first I thought that all that would be needed is a function

expected : ∀ {P}
         → (ns : List ℕ)             -- A list of numbers
         → All (\n → P in-time n) ns -- for each n we have P in-time n
         → P in-time ⌈mean ns ⌉      -- then expect time is mean of ns

But that is not quite right, since if we actually knew the runtimes ns we could just pick the fastest one. With the randomized quicksort you will end up in a situation where you have two or more computations to choose from, and you know that some are faster than the others, but you don't yet know which one. That sounds a bit classical. A second idea is to return the runtimes at a later time, something like

expected : ∀ {P} {long-time}
         → (xs : List (\ex n P in-time n) in-time long-time)
         → P in-time ⌈mean map proj1 xs ⌉

But this is not quite right either, since after long-time computing P (i.e. a sorting) can be done in 0 time. Rather, we need to decouple the proof about the runtime from the computation. This is not possible with the _in-time_ monad. We would need to get rid of the runtime from the type, and store it as a value instead.

I have tried redoing the proofs in this post with the monad

data Timed (A : Set) : Set a where
  _in-time_ : A → ℕ → Timed A
runtime : Timed A → ℕ

But I didn't succeed; I ended up with the baffling error message

runtime (big-lambda-term (unbox (x ≤? u)))
!=
runtime (big-lambda-term (unbox (x ≤? u)))

Another extension: lower bound on runtime

So far I have proved that you can sort a list in time n * log n. It would also be interesting to look at the well known lower bound on the runtime of sorting, and prove a theorem such as

can't-sort-in-linear-time : ¬ ∃ \k → ∀ xs → Sorted xs in-time k * length xs

unfortunately this statement is not actually true for all types. For finite sets you actually can sort in linear time with counting sort. It also fails if we happen to have some decidable total order for that type lying around. But it might be possible to prove

can't-sort-in-linear-time
  : (no-fast-compare : ∀ x y → (x ≤ y ⊎ y ≤ x) in-time 0 → x ≡ y)
  → ¬ ∃ \k → ∀ xs → Sorted xs in-time k * length xs

But you have to be really careful with a term like no-fast-compare, because inside the runtime monad we do have values of type (x ≤ y ⊎ y ≤ x). And so you can derive ∀ x y → x ≡ y in-time 1, and therefore also ⊥ in-time 1 for non trivial types. Which certainly looks wrong to me.

I don't know a way around this problem, but it might be related to the same issue as expected runtime. I.e. the problem is that all information about the runtime is bundled together with the return value. The lower bound proof essentially asks to sort a 'random' list, and by a counting argument shows that at least a certain number of comparisons are needed to be able to produce all outputs.

Comments

EelisDate: 2013-05-24T13:19Zx

In 2008 I wrote a paper that used monadic instrumentation to prove the average-case complexity of Quicksort in Coq: http://eel.is/research/quicksort/

gallaisDate: 2013-05-24T14:06Zx

``If anyone knows a better way to prove properties of functions defined with well-founded recursion, I am open to suggestions.''

I used a Bove-Capretta approach: mutually (here it's not necessary) define the domain of your function as an inductive datatype basically mimicking its call graph and the function as a partial algorithm working only on elements which are proven to be in the domain. Then prove the domain to be total and accessibility proofs to be unique.

This lets you get rid of the postulate as well as giving you a convenient way to prove properties of the function (by induction... on its call graph!).

https://gist.github.com/gallais/5643718/revisions

Nils Anders DanielssonDate: 2013-06-17T11:44Zx

In my paper "Bag Equivalence via a Proof-Relevant Membership Relation" (http://www.cse.chalmers.se/~nad/publications/danielsson-bag-equivalence.html) I use a different definition of permutation, that I have found to be rather nice to work with. The paper includes a proof sketch showing that tree sort produces a permutation of its input. The accompanying code extends this to full correctness, using Conor McBride's "Pivotal pragmatism" to enforce sortedness (http://www.cse.chalmers.se/~nad/publications/danielsson-bag-equivalence/Tree-sort.Full.html).

You may also be interested in some code by Bob Atkey (https://github.com/bobatkey/sorting-types/blob/master/agda/Linear.agda). He uses a deep embedding of linear logic in Agda to prove that insertion sort returns a permutation.

Finally I would like to note that you can use something like _in-time_ to show that head ∘ insertion-sort takes linear time, see "Lightweight Semiformal Time Complexity Analysis for Purely Functional Data Structures" (http://www.cse.chalmers.se/~nad/publications/danielsson-popl2008.html).

Twan van LaarhovenDate: 2013-06-17T14:56Zx

gallais: that seems to work, thanks. But it is a bit annoying to have to do this for each different structure of recursion.

Nils: thank you for the links, it looks like I have some reading to do :)

Dan PeeblesDate: 2013-09-16T04:49Zx

The random question is difficult, and I do think it's sort of classical. A lower bound, at first glance, would appear to rely on some sort of parametricity. Yes, for specific types you can sort in better time, but the general statement is about a comparison-based sort (i.e., a sort that knows nothing but a comparison function). Doing something useful with this would rely on some statement of parametricity, but that seems possible to do if you're willing to postulate it. Oddly enough, parametricity is anti-classical.

AnonymousDate: 2014-06-11T07:33Zx

You cite the inability to state equality propositions on Vec as a disadvantage. Did you try using HeterogeneousEquality? Is that inconvenient as well?

david karapetyanDate: 2014-11-14T07:15Zx

Why are you showing that the sorted list is a permutation of the unsorted list? Why not just show that they are equal as multisets?

Twan van LaarhovenDate: 2014-11-14T11:33Zx

Anonymous: With regard to HeterogeneousEquality: yes, I have tried it, and it doesn't really help. For instance, when using associativity of _++_ you still need to invoke associativity of _+_. In Agda, if you know that two values x and y are heterogeneously equal, then this doesn't tell you that their types are equal. In other words (x : A) ≅ (y : B) means A ≡ B → x ≡ y, not (A ≡ B) × (x ≡ y).

David: That would be another option, it is certainly easier to formalize the notion of equal multisets compared to permutations. In Agda it is probably convenient to generalize a bit more, and write:

count : (A → ℕ) → List A → ℕ
count p [] = 0
count p (x ∷ xs) = p x + count p xs

EqualMultiset : List A → List A → Set a
EqualMultiset xs ys = ∀ p → count p xs ≡ count p ys

But I'm not sure if it is actually simpler overall. For insertion and selection sort you already need the notion of insertion (_◂_≡_), and once you have that IsPermutation follows naturally. I haven't tried it, but I expect that to prove things about the count function you will probably need lots of lemmas involving commutativity and associativity of +. These are a pain to use in Agda, because not all arguments can be inferred. In Coq this might be nicer with the omega tactic.

Note that while IsPermutation implies EqualMultiset, the converse is only true if equality on A is decidable (which it should ultimately be, since we are sorting the things).

Matthew BrecknellDate: 2015-03-02T00:56Zx

I have to admit I'm not a fan of your setup for accounting for costs. Making box private might prevent you from recanting an admission that you've done some work, but it doesn't force you to admit to all the work you've done. I suspect there's no easy way around this: you probably need to formalise an abstract machine with a cost model, and implement your sorting algorithms on that machine.

Name
Homepage	(optional)
Email	(optional, will not be revealed)
	Subscribe to comments on this article
Human?	Name of the lazy functional programming language I write about: