Discussion:
"Reverse" heapify
Ross Clement
2004-10-16 08:25:00 UTC
Permalink
Hi. I'm writing a program where I need to perform a beam search. I
maintain my fixed-size list of "best current solutions" in a heap,
with the "worst" solution at the top of the heap, and better solutions
lower down. I can therefore know instantly whether a new solution
should be included in my beam, as it will be better than the solution
on top of the heap (in position 0 of my array). If it is better, then
I drop the worst solution, replace it with the new one, and then
heapify() the new one down to a suitable place to rebuild my heap.
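That replace-and-sift-down step might look roughly like the following. This is a sketch under assumptions, not the poster's actual code: it uses plain ints (larger = better) in place of `node *` and a stand-in `better()`, with the worst element kept at index 0 as described.

```c
#include <stddef.h>

/* Illustrative stand-in for the poster's better(): here "better" = larger. */
static int better(int a, int b) { return a > b; }

/* Sift the element at index 0 down until neither child is worse than it.
   The heap keeps the WORST element on top, so the worse child floats up. */
void siftDown(int heap[], int n)
{
    int i = 0;
    for (;;) {
        int worst = i;
        int left = 2 * i + 1, right = 2 * i + 2;
        /* pick whichever child should be nearer the top (the worse one) */
        if (left < n && better(heap[worst], heap[left]))
            worst = left;
        if (right < n && better(heap[worst], heap[right]))
            worst = right;
        if (worst == i)
            break;
        int tmp = heap[i];
        heap[i] = heap[worst];
        heap[worst] = tmp;
        i = worst;
    }
}

/* If the candidate beats the current worst, replace the top and re-heapify. */
void replaceTopIfBetter(int heap[], int n, int candidate)
{
    if (n > 0 && better(candidate, heap[0])) {
        heap[0] = candidate;
        siftDown(heap, n);
    }
}
```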

So far so good. But, the above discussion assumes that my beam is
full. When the beam is not yet full, then all new solutions will be
added to it. Effectively, I'm building a heap bit by bit. So, the
method I use is to place the new solution at the end of the array, and
then "reverse heapify" it back up to a proper position in the heap. I
expected this to be a bit tricky, but it turned out very suspiciously
easy. Basically, I look for the parent (at position (n-1)/2 where n
starts at 0 and division is integer), and see if the new node and the
parent should be swapped. If so, I swap them, and then continue
"reverse heapifying" from the parent. The code is:

void reverseHeapify( node *sink[], int start )
{
    int parent;
    node *temp;

    if ( start == 0 )
    {
        return;
    }

    parent = ( start - 1 ) / 2;

    if ( better( sink[parent], sink[start] ))
    {
        temp = sink[parent];
        sink[parent] = sink[start];
        sink[start] = temp;

        reverseHeapify( sink, parent );
    }
}

As I said, this does seem *very* easy. So much so that as far as I can
see, it should be the standard way of building a heap rather than the
much more complex heapify(left), heapify(right), pushdown(current). Is
there something wrong in what I've done? Certainly, it's O(log n) to
add a new element to a heap, and therefore O(n log n) to build a heap
of n elements. The build could be done in place by starting with a
heap of one element, 'adding' the next element by incrementing an
index, reverse heapifying it, 'adding' the next, and so on.

Would this be slower than the standard method? If so, why?
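For concreteness, the in-place build just described could be sketched like this. It is an int-array sketch, not the poster's code: `better()` (larger = better) and the iterative equivalent of `reverseHeapify` are assumptions for illustration.

```c
/* Illustrative: "better" = larger; the worst (smallest) ends up at index 0. */
static int better(int a, int b) { return a > b; }

/* Iterative sift-up, equivalent to the recursive reverseHeapify above. */
static void siftUp(int heap[], int start)
{
    while (start > 0) {
        int parent = (start - 1) / 2;
        if (!better(heap[parent], heap[start]))
            break;                       /* heap property restored */
        int tmp = heap[parent];
        heap[parent] = heap[start];
        heap[start] = tmp;
        start = parent;
    }
}

/* Build a heap in place by 'adding' one element at a time, as described. */
void buildHeapBySiftUp(int a[], int n)
{
    for (int i = 1; i < n; i++)
        siftUp(a, i);
}
```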

Note: I'm not confident my code is 100% debugged. I wrote the code
(including a standard heapify) in one go, and it immediately worked.
Past experience suggests that hidden bugs not exposed by my test case
are far more likely than my having written that much code without a
single bug.

Cheers,

Ross-c
AngleWyrm
2004-10-16 10:41:30 UTC
Permalink
Post by Ross Clement
added to it. Effectively, I'm building a heap bit by bit. So, the
method I use is to place the new solution at the end of the array, and
then "reverse heapify" it back up to a proper position in the heap. I
expected this to be a bit tricky, but it turned out very suspiciously
easy. Basically, I look for the parent (at position (n-1)/2 where n
starts at 0 and division is integer), and see if the new node and the
parent should be swapped. If so, I swap them, and then continue
"reverse heapifying" from the parent.
It looks like the code is designed to swap several times--up to O(log n) record
movements--on a given insert. How does this compare to the standard heap
algorithm?
Ross Clement
2004-10-16 19:12:20 UTC
Permalink
Post by AngleWyrm
Post by Ross Clement
added to it. Effectively, I'm building a heap bit by bit. So, the
method I use is to place the new solution at the end of the array, and
then "reverse heapify" it back up to a proper position in the heap. I
expected this to be a bit tricky, but it turned out very suspiciously
easy. Basically, I look for the parent (at position (n-1)/2 where n
starts at 0 and division is integer), and see if the new node and the
parent should be swapped. If so, I swap them, and then continue
"reverse heapifying" from the parent.
It looks like the code is designed to swap several times--up to O(log n) record
movements--on a given insert. How does this compare to the standard heap
algorithm?
As far as I know, it's the same. I saw O(n log n) quoted for building
a heap, and of course heapsort is O(n log n). I wasn't thinking that
this method would differ from the standard method in magnitude, but
that as far as I can see, it should be a constant factor faster due to
the code being simpler.

I could easily be wrong of course.

Cheers,

Ross-c
Ben Pfaff
2004-10-16 20:34:22 UTC
Permalink
Post by Ross Clement
As far as I know, it's the same. I saw O(n log n) quoted for building
a heap,
No. Building a heap takes O(n) time.
Post by Ross Clement
and of course heapsort is O(n log n). I wasn't thinking that
this method would differ from the standard method in magnitude,
but that as far as I can see, it should be a constant factor
faster due to the code being simpler.
--
Ben Pfaff
email: ***@cs.stanford.edu
web: http://benpfaff.org
Ross Clement
2004-10-17 11:32:32 UTC
Permalink
Post by Ben Pfaff
Post by Ross Clement
As far as I know, it's the same. I saw O(n log n) quoted for building
a heap,
No. Building a heap takes O(n) time.
Can you please give an O(n) algorithm for building a heap? I had a
quick check around, and it seems to be O(n log n) from what I can see.

E.g. http://www2.toki.or.id/book/AlgDesignManual/LEC/LECTUR17/NODE4.HTM

Note: The above link also describes a method identical to mine for
building the heap, i.e. append new items at the end and push them up
the tree. So, that seems to answer my first question.

Cheers,

Ross-c
Ben Pfaff
2004-10-17 17:29:58 UTC
Permalink
Post by Ross Clement
Post by Ben Pfaff
Post by Ross Clement
As far as I know, it's the same. I saw O(n log n) quoted for building
a heap,
No. Building a heap takes O(n) time.
Can you please give an O(n) algorithm for building a heap? I had a
quick check around, and it seems to be O(n log n) from what I can see.
http://www.nist.gov/dads/HTML/buildHeap.html
http://www.sgi.com/tech/stl/make_heap.html
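The O(n) construction those pages describe is the bottom-up build (Floyd's method): sift each internal node down, starting from the last parent. A minimal sketch, assuming ints with larger = better and the worst element on top, as elsewhere in the thread:

```c
static int better(int a, int b) { return a > b; }

/* Sift node i down in a heap of size n (worst element kept at the root). */
static void siftDown(int heap[], int n, int i)
{
    for (;;) {
        int worst = i, left = 2 * i + 1, right = 2 * i + 2;
        if (left < n && better(heap[worst], heap[left]))
            worst = left;
        if (right < n && better(heap[worst], heap[right]))
            worst = right;
        if (worst == i)
            break;
        int tmp = heap[i];
        heap[i] = heap[worst];
        heap[worst] = tmp;
        i = worst;
    }
}

/* O(n): most nodes sit near the leaves and sift down only a short distance. */
void makeHeap(int a[], int n)
{
    for (int i = n / 2 - 1; i >= 0; i--)
        siftDown(a, n, i);
}
```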
--
"...In the UNIX world, people tend to interpret `non-technical user'
as meaning someone who's only ever written one device driver."
--Daniel Pead
Dan Kegel
2004-10-17 17:50:22 UTC
Permalink
Post by Ben Pfaff
Post by Ross Clement
Post by Ben Pfaff
No. Building a heap takes O(n) time.
Can you please give an O(n) algorithm for building a heap? I had a
quick check around, and it seems to be O(n log n) from what I can see.
http://www.nist.gov/dads/HTML/buildHeap.html
http://www.sgi.com/tech/stl/make_heap.html
Here's a nice one that explains why it's O(n) even though it seems
at first glance it should be O(n log n):

http://www.dgp.toronto.edu/people/JamesStewart/378notes/08buildheap/
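In outline, the standard counting argument behind that page: a heap of $n$ nodes has at most $\lceil n/2^{h+1} \rceil$ nodes at height $h$, and sifting one of them down costs $O(h)$, so the total work is

```latex
\sum_{h=0}^{\lfloor \log_2 n \rfloor} \left\lceil \frac{n}{2^{h+1}} \right\rceil O(h)
  = O\!\left( n \sum_{h=0}^{\infty} \frac{h}{2^{h+1}} \right)
  = O(n),
```

since the series converges (to 1). Repeated sift-up, by contrast, does $O(\log i)$ work for the $i$-th insertion, and most insertions happen near the bottom of the tree where that cost is largest, giving $\Theta(n \log n)$ in the worst case.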

Thanks for bringing the topic up, this was an interesting read.
- Dan
Ross Clement
2004-10-18 10:18:28 UTC
Permalink
Post by Ben Pfaff
Post by Ross Clement
Post by Ben Pfaff
Post by Ross Clement
As far as I know, it's the same. I saw O(n log n) quoted for building
a heap,
No. Building a heap takes O(n) time.
Can you please give an O(n) algorithm for building a heap? I had a
quick check around, and it seems to be O(n log n) from what I can see.
http://www.nist.gov/dads/HTML/buildHeap.html
http://www.sgi.com/tech/stl/make_heap.html
Thank you.

The first link that I read was one sent to me by email (thanks D).
Having seen the proof, I'm convinced. Unfortunately the proof seems to
rely on things being pushed down from above, and hence shouldn't apply
in my case where I push things up from below. I'm going to have a
quiet think to see if I can use the push-down from above approach in
my program. My problem is that I don't have a fixed list of numbers
to heapify; rather, I'm maintaining a heap with a maximum size while
adding things to it (which potentially knock things currently in the
heap out of it). However, given what my program is doing, the time
required to maintain the heap is likely to be the bottleneck, so it
will be worth my while optimising that part of it.

Cheers,

Ross-c
Ross Clement
2004-10-20 08:28:16 UTC
Permalink
Post by Ross Clement
The first link that I read was one sent to me by email (thanks D).
Having seen the proof, I'm convinced. Unfortunately the proof seems to
rely on things being pushed down from above, and hence shouldn't apply
in my case where I push things up from below. I'm going to have a
quiet think to see if I can use the push-down from above approach in
my program. My problem is that I don't have a fixed list of numbers
to heapify; rather, I'm maintaining a heap with a maximum size while
adding things to it (which potentially knock things currently in the
heap out of it).
Just as a late PS to this, I can use the more efficient method after
all. Up to the point where my beam is full, I have no need to maintain
it as a heap. So, I can allow it to grow as an unordered list. When it
finally fills up, I can convert the whole thing to a heap, and then
maintain it as a heap from then on. Hence I can use the more efficient
push-down-from-above approach.
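That two-phase scheme could be sketched as follows. This is a hedged illustration, not the poster's program: the `beam` struct, `better()` (larger = better), and the helper names are all invented for the example, with the worst element kept at index 0.

```c
static int better(int a, int b) { return a > b; }

/* Sift node i down in a heap of size n (worst element kept at the root). */
static void siftDown(int heap[], int n, int i)
{
    for (;;) {
        int worst = i, left = 2 * i + 1, right = 2 * i + 2;
        if (left < n && better(heap[worst], heap[left]))
            worst = left;
        if (right < n && better(heap[worst], heap[right]))
            worst = right;
        if (worst == i)
            break;
        int tmp = heap[i];
        heap[i] = heap[worst];
        heap[worst] = tmp;
        i = worst;
    }
}

typedef struct {
    int *a;      /* caller-supplied storage of capacity cap */
    int size;
    int cap;
} beam;

/* Append while the beam is filling, heapify once when it fills (O(n)),
   then replace-the-worst thereafter. */
void beamAdd(beam *b, int candidate)
{
    if (b->size < b->cap) {
        b->a[b->size++] = candidate;
        if (b->size == b->cap)               /* just filled: bottom-up build */
            for (int i = b->cap / 2 - 1; i >= 0; i--)
                siftDown(b->a, b->cap, i);
    } else if (better(candidate, b->a[0])) { /* beats the current worst */
        b->a[0] = candidate;
        siftDown(b->a, b->cap, 0);
    }
}
```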

Cheers,

Ross-c

rossum
2004-10-16 22:12:19 UTC
Permalink
Post by Ross Clement
Hi. I'm writing a program where I need to perform a beam search. I
maintain my fixed-size list of "best current solutions" in a heap,
with the "worst" solution at the top of the heap, and better solutions
lower down. I can therefore know instantly whether a new solution
should be included in my beam, as it will be better than the solution
on top of the heap (in position 0 of my array). If it is better, then
I drop the worst solution, replace it with the new one, and then
heapify() the new one down to a suitable place to rebuild my heap.
So far so good. But, the above discussion assumes that my beam is
full. When the beam is not yet full, then all new solutions will be
added to it. Effectively, I'm building a heap bit by bit. So, the
method I use is to place the new solution at the end of the array, and
then "reverse heapify" it back up to a proper position in the heap. I
expected this to be a bit tricky, but it turned out very suspiciously
easy. Basically, I look for the parent (at position (n-1)/2 where n
starts at 0 and division is integer), and see if the new node and the
parent should be swapped. If so, I swap them, and then continue
void reverseHeapify( node *sink[], int start )
{
    int parent;
    node *temp;

    if ( start == 0 )
    {
        return;
    }

    parent = ( start - 1 ) / 2;

    if ( better( sink[parent], sink[start] ))
    {
        temp = sink[parent];
        sink[parent] = sink[start];
        sink[start] = temp;

        reverseHeapify( sink, parent );
    }
}
As I said, this does seem *very* easy. So much so that as far as I can
see, it should be the standard way of building a heap rather than the
much more complex heapify(left), heapify(right), pushdown(current). Is
there something wrong in what I've done? Certainly, it's O(log n) to
add a new element to a heap, and therefore O(n log n) to build a heap
of n elements, which could be done in place by starting with a heap of
1, 'adding' the next element by incrementing an index, reverse
heapifying it, 'adding' the next one.
Would this be slower than the standard method? If so, why?
Note: I'm not confident my code is 100% debugged. I wrote the code
(including a standard heapify) in one go, and it immediately worked.
Past experience suggests that hidden bugs not exposed by my test case
are far more likely than my having written that much code without a
single bug.
Cheers,
Ross-c
A thought. When you find the first solution fill your beam with n
copies of the first solution, dead easy since there is no need to sort
with all elements identical. After that the heap is always full size.

Possible problem: what if more than one of the copies of the initial
solution survives to the end? I don't know how your code will cope
with identical solutions. Is there a dummy solution which must be
worse than all possible actual solutions? That would avoid having any
duplicates surviving.

rossum




--

The ultimate truth is that there is no Ultimate Truth
Ross Clement
2004-10-18 10:36:46 UTC
Permalink
Post by rossum
A thought. When you find the first solution fill your beam with n
copies of the first solution, dead easy since there is no need to sort
with all elements identical. After that the heap is always full size.
Possible problem, what if more than one of the copies of the initial
solution survives to the end? I don't know how your code will cope
with identical solutions. Is there a dummy solution which must be
worse than all possible actual solution? That would avoid having any
duplicates surviving.
This is very likely to happen. Having thought about what my program is
doing a bit more carefully, I'm actually performing a breadth-first
search, with the beam being applied to solutions of length N, to
handle the combinatoric explosion. Hence I couldn't add in the first
solution repeated many times, as it could be a good one, and hence
duplicates might remain to the end.

However, I can very easily create dummy solutions which are guaranteed
to be worse than any solutions actually created. I'll look into that.
In fact, I could fill the heap with the first solution, except that
I've fiddled the evaluation measure on all but one of the copies so
that it's worse than any other solution found. The current evaluation
can't get worse than 0, so I could enter dummies with a worth of -1.
The single copy with the proper evaluation would be the last one in
the array, and that would give a proper heap.

However, in that case, because the worst solution is at the top of my
heap, I'd end up pushing the first #beam entries down to the bottom of
the heap, rather than be adding them to a small heap. I'd have to
experiment to find which approach is faster.
Post by rossum
rossum
Cheers,

Ross-c
CBFalconer
2004-10-17 04:54:43 UTC
Permalink
Post by Ross Clement
Hi. I'm writing a program where I need to perform a beam search.
What is a beam?
--
Chuck F (***@yahoo.com) (***@worldnet.att.net)
Available for consulting/temporary embedded and systems.
<http://cbfalconer.home.att.net> USE worldnet address!
Ross Clement
2004-10-17 11:12:44 UTC
Permalink
Post by CBFalconer
Post by Ross Clement
Hi. I'm writing a program where I need to perform a beam search.
What is a beam?
"Beam search" is a compound noun. It's wsed in state-space search.
Instead of (in theory) maintaining a list of all possible open states
which can be expanded, keep only the best N open states, for a fixed
value of N.

Cheers,

Ross-c