Deleted snapshot files filling up /var/lib/cassandra

Deleted snapshot files filling up /var/lib/cassandra

David Wahler
We have a 16-node, globally-distributed cluster running Cassandra
2.0.12. We're using the DataStax packages on CentOS 6.5.

Even though the total amount of data on each server is only a few
hundred MB (as measured by both du and the "load" metric), we're
seeing a problem where the disk usage is steadily increasing and
eventually filling up the 10GB /var/lib/cassandra partition. Running
"lsof" on the Cassandra process shows that it has open file handles
for thousands of deleted snapshot files:

$ sudo lsof -p 4753 | grep DEL -c
13314
$ sudo lsof -p 4753 | grep DEL | head
java 4753 cassandra DEL REG 253,6 538873
/var/lib/cassandra/data/keyspace/cf/snapshots/65bc8170-cc20-11e4-a355-0d37e54cc22e/keyspace-cf-jb-3979-Index.db
java 4753 cassandra DEL REG 253,6 538899
/var/lib/cassandra/data/keyspace/cf/snapshots/8cb41770-cc20-11e4-a355-0d37e54cc22e/keyspace-cf-jb-3983-Index.db
...etc...
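
To get a sense of how much space those deleted-but-open files are pinning, lsof's +L1 flag (list open files with a link count below 1, i.e. deleted) is handy, assuming your lsof supports it:

$ sudo lsof -p 4753 +L1
$ sudo lsof -p 4753 +L1 | awk 'NR > 1 { sum += $7 } END { printf "%.1f MB\n", sum / 1048576 }'

The awk line just sums the SIZE/OFF column (field 7 on our build; the column layout can differ between lsof versions, and memory-mapped entries may not report a size) to estimate the space that won't be reclaimed until the descriptors are closed.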

We're not manually creating these snapshots; they're being generated
by periodic runs of "nodetool repair -pr". There are some errors in
system.log that seem to be related:

ERROR [RepairJobTask:10] 2015-03-16 02:02:12,485 RepairJob.java (line
143) Error occurred during snapshot phase
java.lang.RuntimeException: Could not create snapshot at /10.1.1.188
        at org.apache.cassandra.repair.SnapshotTask$SnapshotCallback.onFailure(SnapshotTask.java:81)
        at org.apache.cassandra.net.MessagingService$5$1.run(MessagingService.java:344)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
ERROR [AntiEntropySessions:4] 2015-03-16 02:02:12,486
RepairSession.java (line 288) [repair
#55a8eb50-cbaa-11e4-9af9-27d7677e5965] session completed with the
following error
java.io.IOException: Failed during snapshot creation.
        at org.apache.cassandra.repair.RepairSession.failedSnapshot(RepairSession.java:323)
        at org.apache.cassandra.repair.RepairJob$2.onFailure(RepairJob.java:144)
        at com.google.common.util.concurrent.Futures$4.run(Futures.java:1160)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
ERROR [AntiEntropySessions:4] 2015-03-16 02:02:12,488
CassandraDaemon.java (line 199) Exception in thread
Thread[AntiEntropySessions:4,5,RMI Runtime]
java.lang.RuntimeException: java.io.IOException: Failed during
snapshot creation.
        at com.google.common.base.Throwables.propagate(Throwables.java:160)
        at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:32)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Failed during snapshot creation.
        at org.apache.cassandra.repair.RepairSession.failedSnapshot(RepairSession.java:323)
        at org.apache.cassandra.repair.RepairJob$2.onFailure(RepairJob.java:144)
        at com.google.common.util.concurrent.Futures$4.run(Futures.java:1160)
        ... 3 more

Has anyone encountered this problem before? The same stack trace shows
up in CASSANDRA-8020, but that bug was supposedly introduced in 2.1.0
and fixed in 2.1.1. In any case, we don't want to upgrade to 2.1.x,
since the consensus on this list seems to be that it's not yet
production-ready.

I'm fairly new to Cassandra, so general troubleshooting tips would
also be much appreciated.

Thanks,
-- David

Re: Deleted snapshot files filling up /var/lib/cassandra

Jan
David,

All the packaged installations use the /var/lib/cassandra directory.
Could you check your yaml config files to see whether you are using this default directory for backups?

You may want to change it to a location with more disk space.
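
For reference, these are the settings I would check; the values shown below are just the packaged defaults, not a recommendation:

data_file_directories:
    - /var/lib/cassandra/data
commitlog_directory: /var/lib/cassandra/commitlog
saved_caches_directory: /var/lib/cassandra/saved_caches

Snapshots are created under each table's data directory, so data_file_directories is the one that matters for snapshot space.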



Hope this helps,
Jan
 




Re: Deleted snapshot files filling up /var/lib/cassandra

David Wahler

We're using the default /var/lib/cassandra as our data directory,
mounted as its own LVM volume. I don't see anything in cassandra.yaml
about a backup directory. There is an "incremental_backups" option
which is set to false.

Increasing the available disk space doesn't really seem like a
solution. We have only about 450MB of live data on the most
heavily-loaded server, and the space taken up by these deleted files
is growing by several GB per day. For now we can work around the
problem by periodically restarting servers to close the file handles,
but that hurts our availability and seems like a hack.
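
For what it's worth, the gap is easy to see by comparing du, which only counts linked files, against df, which counts allocated blocks including deleted-but-open files; the difference between the two is roughly the space pinned by dead descriptors:

$ du -sh /var/lib/cassandra
$ df -h /var/lib/cassandra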

Re: Deleted snapshot files filling up /var/lib/cassandra

Ben Bromhead
Cassandra will by default snapshot your data directory on the following events:
  • TRUNCATE and DROP schema events
  • when you run nodetool repair
  • when you run nodetool snapshot
Snapshots are just hardlinks to existing SSTables, so the only disk space they take up is for files that have since been compacted away. Disk space for a snapshot is freed when the last link to its files is removed. You can remove all snapshots in a cluster using nodetool clearsnapshot.
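
If the hardlink behaviour is unclear, a quick demo with plain shell (nothing Cassandra-specific) shows why snapshots are nearly free until compaction removes the original files:

$ echo data > sstable.db
$ ln sstable.db snapshot.db    # hardlink: same inode, no extra space
$ rm sstable.db                # data still on disk, reachable via snapshot.db
$ rm snapshot.db               # last link gone; space is freed, unless a process still holds the file open

That last caveat is exactly the situation your lsof output shows: unlinked files whose space is still held by open descriptors.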

Snapshots will fail if you are out of disk space (counterintuitive given the above, but true), if you have not increased the number of available file descriptors, or if there are permission issues.
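
To rule out the descriptor limit, it is worth checking what the running process actually got (the packaged installs normally raise this via /etc/security/limits.d/, but overrides get missed):

$ cat /proc/$(pgrep -u cassandra -f CassandraDaemon)/limits | grep 'Max open files'
$ sudo lsof -p $(pgrep -u cassandra -f CassandraDaemon) | wc -l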

Out of curiosity, how often are you running repair? 




--

Ben Bromhead

Instaclustr | www.instaclustr.com | @instaclustr | (650) 284 9692


Re: Deleted snapshot files filling up /var/lib/cassandra

David Wahler

Thanks for the information. We're running repair once per week, as
recommended by the DataStax documentation. The repair is staggered to
run on one machine at a time with the --partitioner-range option in
order to spread out the load.
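
For concreteness, the staggering can be as simple as offset cron entries, one per node (a sketch, not our exact setup; nodetool must be on cron's PATH):

# node 1: Mondays at 02:00
0 2 * * 1 nodetool repair -pr >> /var/log/cassandra/repair.log 2>&1
# node 2: Tuesdays at 02:00
0 2 * * 2 nodetool repair -pr >> /var/log/cassandra/repair.log 2>&1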

Running "nodetool clearsnapshot" doesn't free up any space. I'm
guessing that because the snapshot files have been deleted from the
filesystem, Cassandra thinks the snapshots are already gone. But
because it still has the file descriptors open, the disk space hasn't
actually been reclaimed.
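
The kernel's view backs this up: the deleted snapshot files still appear as open descriptors under /proc (or as memory mappings), which is why clearsnapshot has nothing left to do:

$ ls -l /proc/4753/fd | grep -c '(deleted)'
$ grep -c deleted /proc/4753/maps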

Re: Deleted snapshot files filling up /var/lib/cassandra

Ben Bromhead
If you are running a sequential repair (or have previously run a sequential repair that is still running) Cassandra will still have the file descriptors open for files in the snapshot it is using for the repair operation.

From http://www.datastax.com/dev/blog/repair-in-cassandra:

Cassandra 1.2 introduced a new option to repair to help manage the problems caused by the nodes all repairing with each other at the same time; it is called a snapshot repair, or sequential repair. As of Cassandra 2.1, sequential repair is the default, and the old parallel repair is an option. Sequential repair has all of the nodes involved take a snapshot; the snapshot lives until the repair finishes, and then is removed. By taking a snapshot, repair can proceed in a serial fashion, such that only two nodes are ever comparing with each other at a time. This makes the overall repair process slower, but decreases the burden placed on the nodes, and means you have less impact on reads/writes to the system.
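
If the per-repair snapshots turn out to be the culprit, one option on 2.0 is to force the old parallel behaviour, which skips the snapshot phase at the cost of all replicas repairing at once (check nodetool help on your exact version before relying on this flag):

$ nodetool repair -par -pr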




--

Ben Bromhead

Instaclustr | www.instaclustr.com | @instaclustr | (650) 284 9692


Re: Deleted snapshot files filling up /var/lib/cassandra

David Wahler

Yeah, that aligns with my understanding of how the repair process
works. But the cluster has no repair sessions active (I think; when I
run "nodetool tpstats", the AntiEntropyStage and AntiEntropySessions
values are zero on all nodes) and the space still hasn't been freed.
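
For completeness, this is what I'm checking on each node; tpstats should show AntiEntropy activity, and compactionstats should list any validation compactions still running:

$ nodetool tpstats | grep -i antientropy
$ nodetool compactionstats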

Re: Deleted snapshot files filling up /var/lib/cassandra

Ben Bromhead
Sorry for the late reply. 

To solve the immediate problem, you can restart Cassandra; the open file descriptors to the deleted snapshots will be released and the space reclaimed.
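
Something like this on each node should do it; draining first makes the restart clean (commands assume the packaged init script on CentOS 6):

$ nodetool drain
$ sudo service cassandra restart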

As for why it happened, I would first address the disk space issue and see if the snapshot errors and open-file-descriptor buildup still occur (I am unclear as to whether you got the snapshot exception before or after the disk filled up). If repair still won't let go of snapshotted files even with free disk space, I would raise a ticket in Jira.




--

Ben Bromhead

Instaclustr | www.instaclustr.com | @instaclustr | (650) 284 9692


Re: Deleted snapshot files filling up /var/lib/cassandra

David Wahler
Sorry if I was unclear. The /var/lib/cassandra partition didn't
literally become 100% full. Our low-disk-space alarms started going
off, and that's when we noticed that the disk usage on several nodes
was steadily increasing much faster than expected. We restarted the
worst-affected node before it could run entirely out of free space,
and that brought the disk usage back down to the expected level, but
it immediately started creeping upwards again. The snapshot failures
have been happening periodically, starting before we noticed the high
disk usage.

I'll do some more digging through the logs to see if the exceptions
are recurring, and open a Jira ticket once I have more information.
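
In case anyone wants to follow along, I'm just counting occurrences across the rotated logs (paths assume the packaged layout):

$ grep -c 'Failed during snapshot creation' /var/log/cassandra/system.log*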
