Server cannot startup after shutdown

Brian Frank Cooper
Hi folks,

I'm using 0.4 beta1 and had six servers loaded with 20 GB of data per server. (In this test, records are 10 KB each and the JVM has 2 GB of heap space.) I stopped the servers using what I think is the recommended method, the kill command. When I tried to restart them, some servers threw a UTFDataFormatException and others threw an OutOfMemoryError. None of them started.

Is this a known issue?

ERROR - Fatal exception in thread Thread[main,5,main]
java.lang.OutOfMemoryError: Java heap space
        at org.apache.cassandra.db.CommitLog.recover(CommitLog.java:274)
        at org.apache.cassandra.db.RecoveryManager.doRecovery(RecoveryManager.java:63)
        at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:96)
        at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:171)

ERROR - Exception encountered during startup.
java.io.UTFDataFormatException: malformed input around byte 5497
        at java.io.DataInputStream.readUTF(DataInputStream.java:639)
        at java.io.DataInputStream.readUTF(DataInputStream.java:547)
        at org.apache.cassandra.db.RowSerializer.deserialize(Row.java:218)
        at org.apache.cassandra.db.CommitLog.recover(CommitLog.java:285)
        at org.apache.cassandra.db.RecoveryManager.doRecovery(RecoveryManager.java:63)
        at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:96)
        at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:171)
Exception encountered during startup.
java.io.UTFDataFormatException: malformed input around byte 5497
        at java.io.DataInputStream.readUTF(DataInputStream.java:639)
        at java.io.DataInputStream.readUTF(DataInputStream.java:547)
        at org.apache.cassandra.db.RowSerializer.deserialize(Row.java:218)
        at org.apache.cassandra.db.CommitLog.recover(CommitLog.java:285)
        at org.apache.cassandra.db.RecoveryManager.doRecovery(RecoveryManager.java:63)
        at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:96)
        at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:171)

Thanks for the help!

Brian

Re: Server cannot startup after shutdown

Jonathan Ellis
The malformed input bug was fixed after beta1 and should be in a
nightly build by now.  (I introduced a regression where recovery
couldn't handle the last entry in the commitlog being incomplete.  So
after upgrading, you should be able to restart on the existing
commitlogs.)
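
The behavior described above (replay every complete entry, then stop quietly when the last entry is incomplete) can be sketched as follows. This is only an illustration: the [int length][payload] record format and the names TolerantReplay, replay and sampleLog are assumptions made for the example, not Cassandra's actual commitlog layout or code.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Illustrative only: a hypothetical [int length][payload] log format,
// not Cassandra's real commitlog layout.
class TolerantReplay {

    /** Replays every complete record and stops quietly at a truncated tail. */
    static List<byte[]> replay(byte[] log) {
        List<byte[]> records = new ArrayList<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(log))) {
            while (true) {
                byte[] payload;
                try {
                    int length = in.readInt();  // EOFException if fewer than 4 bytes remain
                    payload = new byte[length];
                    in.readFully(payload);      // EOFException if the payload is cut short
                } catch (EOFException truncatedTail) {
                    break;                      // incomplete last entry: skip it, keep the rest
                }
                records.add(payload);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return records;
    }

    /** Builds a log with `complete` whole records, optionally followed by a partial one. */
    static byte[] sampleLog(int complete, boolean truncatedTail) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            for (int i = 0; i < complete; i++) {
                out.writeInt(8);
                out.write(new byte[8]);
            }
            if (truncatedTail) {
                out.writeInt(8);         // header promises 8 bytes...
                out.write(new byte[3]);  // ...but only 3 were written before the "kill"
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        // Two complete records plus a partial third: only the two are replayed.
        System.out.println(replay(sampleLog(2, true)).size()); // prints 2
    }
}
```

The sketch treats end-of-file inside a record as "nothing more to replay" rather than as a fatal error, which is the difference between a server that starts and one that does not.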

The OOM puzzles me a little; I'm not sure how it could be unable to
replay a mutation that it was able to write to the commitlog in the
first place.  You could try setting the memtable object and memory
thresholds lower temporarily and see if that leaves enough extra free
to do the replay.

-Jonathan

On Wed, Aug 19, 2009 at 7:12 PM, Brian Frank
Cooper<[hidden email]> wrote:

> Hi folks,
>
> I'm using 0.4 beta1 and had six servers loaded with 20 GB of data per server. (In this test, records are 10 KB each and the JVM has 2 GB of heap space.) I stopped the servers using what I think is the recommended method, the kill command. When I tried to restart them, some servers threw a UTFDataFormatException and others threw an OutOfMemoryError. None of them started.
>
> Is this a known issue?
>
> ERROR - Fatal exception in thread Thread[main,5,main]
> java.lang.OutOfMemoryError: Java heap space
>        at org.apache.cassandra.db.CommitLog.recover(CommitLog.java:274)
>        at org.apache.cassandra.db.RecoveryManager.doRecovery(RecoveryManager.java:63)
>        at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:96)
>        at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:171)
>
> ERROR - Exception encountered during startup.
> java.io.UTFDataFormatException: malformed input around byte 5497
>        at java.io.DataInputStream.readUTF(DataInputStream.java:639)
>        at java.io.DataInputStream.readUTF(DataInputStream.java:547)
>        at org.apache.cassandra.db.RowSerializer.deserialize(Row.java:218)
>        at org.apache.cassandra.db.CommitLog.recover(CommitLog.java:285)
>        at org.apache.cassandra.db.RecoveryManager.doRecovery(RecoveryManager.java:63)
>        at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:96)
>        at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:171)
> Exception encountered during startup.
> java.io.UTFDataFormatException: malformed input around byte 5497
>        at java.io.DataInputStream.readUTF(DataInputStream.java:639)
>        at java.io.DataInputStream.readUTF(DataInputStream.java:547)
>        at org.apache.cassandra.db.RowSerializer.deserialize(Row.java:218)
>        at org.apache.cassandra.db.CommitLog.recover(CommitLog.java:285)
>        at org.apache.cassandra.db.RecoveryManager.doRecovery(RecoveryManager.java:63)
>        at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:96)
>        at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:171)
>
> Thanks for the help!
>
> Brian

RE: Server cannot startup after shutdown

Brian Frank Cooper
Thanks for the reply. I'll try playing with the memory settings.

brian
________________________________________
From: Jonathan Ellis [[hidden email]]
Sent: Wednesday, August 19, 2009 7:46 PM
To: [hidden email]
Subject: Re: Server cannot startup after shutdown

The malformed input bug was fixed after beta1 and should be in a
nightly build by now.  (I introduced a regression where recovery
couldn't handle the last entry in the commitlog being incomplete.  So
after upgrading, you should be able to restart on the existing
commitlogs.)

The OOM puzzles me a little; I'm not sure how it could be unable to
replay a mutation that it was able to write to the commitlog in the
first place.  You could try setting the memtable object and memory
thresholds lower temporarily and see if that leaves enough extra free
to do the replay.

-Jonathan


Re: Server cannot startup after shutdown

Jonathan Ellis
I saw that you're testing with different tokens now -- how did the
replay OOM work out?

On Wed, Aug 19, 2009 at 10:54 PM, Brian Frank
Cooper<[hidden email]> wrote:

> Thanks for the reply. I'll try playing with the memory settings.
>
> brian

RE: Server cannot startup after shutdown

Brian Frank Cooper
I haven't had a chance to play with that yet. I'm just trying to get a bunch of data loaded so I can run some tests. Once the tests are done I will look at starting and stopping servers again.

The tokens thing helped out quite a lot.

brian

-----Original Message-----
From: Jonathan Ellis [mailto:[hidden email]]
Sent: Thursday, August 20, 2009 10:50 AM
To: [hidden email]
Subject: Re: Server cannot startup after shutdown

I saw that you're testing with different tokens now -- how did the
replay OOM work out?


Re: Server cannot startup after shutdown

Jonathan Ellis
Oops, my bad -- that patch has been sitting unreviewed in
CASSANDRA-370.  I thought it was in trunk by now.  I'll try to get
someone to review that today.

-Jonathan

On Wed, Aug 19, 2009 at 9:46 PM, Jonathan Ellis<[hidden email]> wrote:

> The malformed input bug was fixed after beta1 and should be in a
> nightly build by now.  (I introduced a regression where recovery
> couldn't handle the last entry in the commitlog being incomplete.  So
> after upgrading, you should be able to restart on the existing
> commitlogs.)
>
> The OOM puzzles me a little; I'm not sure how it could be unable to
> replay a mutation that it was able to write to the commitlog in the
> first place.  You could try setting the memtable object and memory
> thresholds lower temporarily and see if that leaves enough extra free
> to do the replay.
>
> -Jonathan

RE: Server cannot startup after shutdown

Brian Frank Cooper
Hi, Jonathan,

I tried to shut down and restart Cassandra again this morning. I still get the malformed input bug (which, as you say below, your patch fixes). I also get:

ERROR - Exception encountered during startup.
java.lang.NegativeArraySizeException
        at org.apache.cassandra.db.CommitLog.recover(CommitLog.java:274)
        at org.apache.cassandra.db.RecoveryManager.doRecovery(RecoveryManager.java:63)
        at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:96)
        at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:171)
Exception encountered during startup.
java.lang.NegativeArraySizeException
        at org.apache.cassandra.db.CommitLog.recover(CommitLog.java:274)
        at org.apache.cassandra.db.RecoveryManager.doRecovery(RecoveryManager.java:63)
        at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:96)
        at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:171)

No out of memory error this time, though.
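
Both this exception and the earlier OutOfMemoryError are reported at CommitLog.java:274. One possible reading, which nothing in this thread confirms, is that this line allocates a byte array whose size was just read from the log, so a corrupt size field would fail differently depending on its value. The sketch below only illustrates that idea; the class EntrySizeGuard and its methods are hypothetical and are not taken from the Cassandra source.

```java
import java.nio.ByteBuffer;

// Illustrative only: assumes replay reads a 4-byte size field and then
// allocates a buffer of that size. The thread does not confirm this.
class EntrySizeGuard {

    /** Interprets four raw bytes as the big-endian int a DataInputStream would read. */
    static int sizeField(byte[] fourBytes) {
        return ByteBuffer.wrap(fourBytes).getInt();
    }

    /** A size is only believable if it is non-negative and fits in what is left of the file. */
    static boolean plausible(int size, long bytesRemaining) {
        return size >= 0 && size <= bytesRemaining;
    }

    /** Reports whether an unchecked `new byte[size]` fails with NegativeArraySizeException. */
    static boolean allocationThrowsNegative(int size) {
        try {
            byte[] ignored = new byte[size];
            return false;
        } catch (NegativeArraySizeException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        int negative = sizeField(new byte[] {(byte) 0xFF, (byte) 0xFF, (byte) 0xFF, (byte) 0xF0});
        int huge = sizeField(new byte[] {0x7F, (byte) 0xFF, (byte) 0xFF, (byte) 0xFF});

        System.out.println(negative);                           // prints -16
        System.out.println(allocationThrowsNegative(negative)); // prints true
        System.out.println(huge);                               // prints 2147483647
        // Allocating `huge` bytes would need about 2 GB of heap, so it is only checked here.
        System.out.println(plausible(huge, 134L * 1000 * 1000)); // prints false
    }
}
```

Checking the size against the bytes remaining in the file before allocating would turn either failure into an ordinary "corrupt entry" condition that recovery can report or skip.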

I'm also curious about your comment "I introduced a regression where it couldn't handle the last entry in the commitlog being incomplete." Does an incomplete last entry in the commit log mean that the last update or set of updates was not fully committed to the log, and is therefore lost? I thought that since I had set "<CommitLogSync>true</CommitLogSync>", all updates would be fully flushed before returning to the caller.

(BTW thanks for all the help with setting up Cassandra, it really made it easier to run experiments...)

brian
________________________________________
From: Jonathan Ellis [[hidden email]]
Sent: Monday, August 24, 2009 12:51 PM
To: [hidden email]
Subject: Re: Server cannot startup after shutdown

Oops, my bad -- that patch has been sitting unreviewed in
CASSANDRA-370.  I thought it was in trunk by now.  I'll try to get
someone to review that today.

-Jonathan


Re: Server cannot startup after shutdown

Jonathan Ellis
On Wed, Aug 26, 2009 at 1:03 AM, Brian Frank
Cooper<[hidden email]> wrote:
> Hi, Jonathan,
>
> I tried to shut down and restart Cassandra again this morning. I still get the malformed input bug (which, as you say below, your patch fixes). I also get:
>
> ERROR - Exception encountered during startup.
> java.lang.NegativeArraySizeException

That could be a related problem.  Or it might be a different bug. :)
Is the commitlog small enough that you can gzip it and attach to JIRA
(10 MB limit)?

> No out of memory error this time, though.
>
> I'm also curious about your comment "I introduced a regression where it couldn't handle the last entry in the commitlog being incomplete." Does an incomplete last entry in the commit log mean that the last update or set of updates was not fully committed to the log, and is therefore lost? I thought that since I had set "<CommitLogSync>true</CommitLogSync>", all updates would be fully flushed before returning to the caller.

Right, but you can still have an incomplete write in progress if you
shut down while writes are still happening.
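
To make the window concrete, here is a rough sketch of a synced append, assuming a hypothetical [int length][payload] record format. SyncedAppend, append and newTempLog are names invented for the example, and this is not how Cassandra's CommitLog is implemented. The caller only gets control back after force() returns, but a process killed during the write loop still leaves a partial record at the end of the file.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Illustrative only: a hypothetical synced append, not Cassandra's CommitLog.
class SyncedAppend {

    /** Appends one length-prefixed record, syncs it to disk, and returns the new file size. */
    static long append(Path log, byte[] payload) {
        ByteBuffer record = ByteBuffer.allocate(4 + payload.length);
        record.putInt(payload.length);
        record.put(payload);
        record.flip();
        try (FileChannel channel = FileChannel.open(log,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE, StandardOpenOption.APPEND)) {
            while (record.hasRemaining()) {
                channel.write(record);  // a kill here leaves a partial record on disk
            }
            channel.force(true);        // only after this returns is the record durable
            return channel.size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Creates an empty temporary file to stand in for a commitlog segment. */
    static Path newTempLog() {
        try {
            return Files.createTempFile("commitlog-sketch", ".log");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path log = newTempLog();
        System.out.println(append(log, new byte[10])); // prints 14
        System.out.println(append(log, new byte[10])); // prints 28
    }
}
```

Opening the channel on every append keeps the example short; a real log would keep it open. The point is only that a sync guarantees acknowledged writes are on disk, while an unacknowledged write can still be cut off partway.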

-Jonathan

RE: Server cannot startup after shutdown

Brian Frank Cooper
> Is the commitlog small enough that you can gzip it and attach to JIRA
> (10 MB limit)?

/var/cassandra/commitlog has 215 files totaling about 28 GB. Most are 134 MB; the last one is 6 MB. Which one would be useful to you?

> Right, but you can still have an incomplete write in progress if you
> shut down while writes are still happening.

That's the curious thing; there were no writes in progress. In fact, my experiment had finished about 24 hours before, and there was no load in between, and then I shut down and still couldn't restart. I figured all the writes would have committed by then.

thanks...

brian

Re: Server cannot startup after shutdown

Jonathan Ellis
On Wed, Aug 19, 2009 at 9:46 PM, Jonathan Ellis<[hidden email]> wrote:
> The OOM puzzles me a little; I'm not sure how it could be unable to
> replay a mutation that it was able to write to the commitlog in the
> first place.

Ah, I think I know: if a compaction starts during recovery, that could
suck up a bunch of memory.

Re: Server cannot startup after shutdown

Jonathan Ellis
On Wed, Aug 26, 2009 at 12:26 PM, Brian Frank
Cooper<[hidden email]> wrote:
>> Is the commitlog small enough that you can gzip it and attach to JIRA
>> (10 MB limit)?
>
> /var/cassandra/commitlog has 215 files totaling about 28 GB. Most are 134 MB; the last one is 6 MB. Which one would be useful to you?

Can you update to trunk and re-run recovery with log level set to
DEBUG?  It will log the file it is in like this:

DEBUG - Replaying /var/lib/cassandra/commitlog/CommitLog-1251137387800.log starting at 117

then, just before it errors out, it will log the last entry it read, like

DEBUG - Reading mutation at 666
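
With 215 segments to replay, the DEBUG output can get long. A small scanner along these lines could pull out the last file and offset seen before the failure. It is a sketch that assumes each DEBUG message sits on a single line of the log file with exactly the wording shown above; ReplayLogScan and lastPosition are hypothetical names for the example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative only: assumes the two DEBUG message formats quoted in this thread.
class ReplayLogScan {
    private static final Pattern REPLAYING =
            Pattern.compile("DEBUG - Replaying (\\S+) starting at (\\d+)");
    private static final Pattern READING =
            Pattern.compile("DEBUG - Reading mutation at (\\d+)");

    /** Returns "file@offset" for the last position logged, or null if no replay line was seen. */
    static String lastPosition(List<String> lines) {
        String file = null;
        long offset = -1;
        for (String line : lines) {
            Matcher replaying = REPLAYING.matcher(line);
            if (replaying.find()) {
                file = replaying.group(1);                   // a new segment begins
                offset = Long.parseLong(replaying.group(2));
                continue;
            }
            Matcher reading = READING.matcher(line);
            if (reading.find()) {
                offset = Long.parseLong(reading.group(1));   // latest mutation in that segment
            }
        }
        return file == null ? null : file + "@" + offset;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "DEBUG - Replaying /var/lib/cassandra/commitlog/CommitLog-1251137387800.log starting at 117",
                "DEBUG - Reading mutation at 666");
        // prints /var/lib/cassandra/commitlog/CommitLog-1251137387800.log@666
        System.out.println(lastPosition(lines));
    }
}
```

The reported file is the one segment worth compressing and attaching, and the offset says roughly where in it the bad entry starts.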

> That's the curious thing; there were no writes in progress. In fact, my experiment had finished about 24 hours before, and there was no load in between, and then I shut down and still couldn't restart. I figured all the writes would have committed by then.

Must be a different bug, then.  Thanks for finding it for us! :)

-Jonathan