Disastrous profusion of SSTables


Disastrous profusion of SSTables

Dave Galbraith
Hey! So I'm running Cassandra 2.1.2 and using the SizeTieredCompactionStrategy. I'm doing about 3k writes/sec on a single node. My read performance is terrible, all my queries just time out. So I do nodetool cfstats:

    Read Count: 42071
    Read Latency: 67.47804242827601 ms.
    Write Count: 131964300
    Write Latency: 0.011721604274792501 ms.
    Pending Flushes: 0
        Table: metrics16513
        SSTable count: 641
        Space used (live): 6366740812
        Space used (total): 6366740812
        Space used by snapshots (total): 0
        SSTable Compression Ratio: 0.25272488401992765
        Memtable cell count: 0
        Memtable data size: 0
        Memtable switch count: 1016
        Local read count: 42071
        Local read latency: 67.479 ms
        Local write count: 131964300
        Local write latency: 0.012 ms
        Pending flushes: 0
        Bloom filter false positives: 994
        Bloom filter false ratio: 0.00000
        Bloom filter space used: 37840376
        Compacted partition minimum bytes: 104
        Compacted partition maximum bytes: 24601
        Compacted partition mean bytes: 255
        Average live cells per slice (last five minutes): 111.67243951154147
        Maximum live cells per slice (last five minutes): 1588.0
        Average tombstones per slice (last five minutes): 0.0
        Maximum tombstones per slice (last five minutes): 0.0

and nodetool cfhistograms:

Percentile  SSTables     Write Latency      Read Latency    Partition Size        Cell Count
                              (micros)          (micros)           (bytes)                 
50%            46.00              6.99         154844.95               149                 1
75%           430.00              8.53        3518837.53               179                 1
95%           430.00             11.32        7252897.25               215                 2
98%           430.00             15.54       22103886.34               215                 3
99%           430.00             29.86       22290608.19              1597                50
Min             0.00              1.66             26.91               104                 0
Max           430.00         269795.38       27311364.89             24601               924

Gross!! There are 641 SSTables in there, and all my reads are hitting hundreds of them and timing out. How could this possibly have happened, and what can I do about it? Nodetool compactionstats says pending tasks: 0, by the way. Thanks!
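To put those cfhistograms numbers in perspective, the latency columns are in microseconds; a quick conversion (plain Python, values copied from the table above) shows why queries time out against the stock 5-second read timeout (`read_request_timeout_in_ms: 5000` in a default cassandra.yaml):

```python
# Read-latency percentiles from the cfhistograms output above, in microseconds.
read_latency_micros = {"50%": 154844.95, "75%": 3518837.53, "95%": 7252897.25}

for pct, us in read_latency_micros.items():
    # 1 ms = 1,000 us; 1 s = 1,000,000 us
    print(f"{pct}: {us / 1000:,.0f} ms ({us / 1e6:.2f} s)")
# 50%: 155 ms; 75%: 3.52 s; 95%: 7.25 s -- the upper percentiles blow
# well past a 5 s read timeout, matching the timed-out queries.
```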

Re: Disastrous profusion of SSTables

Anishek Agarwal
Are you frequently updating the same rows? What is the memtable flush size? Can you post the table's CREATE query here, please?

On Thu, Mar 26, 2015 at 1:21 PM, Dave Galbraith <[hidden email]> wrote:
[original message quoted in full; snipped]


Re: Disastrous profusion of SSTables

graham sanderson
you may be seeing related issues (which end up with excessive numbers of sstables)

we applied

diff --git a/src/java/org/apache/cassandra/db/compaction/SizeTieredCompactionStrategy.java b/src/java/org/apache/cassandra/db/compaction/SizeTieredCompactionStrategy.java
index fbd715c..cbb8c8b 100644
--- a/src/java/org/apache/cassandra/db/compaction/SizeTieredCompactionStrategy.java
+++ b/src/java/org/apache/cassandra/db/compaction/SizeTieredCompactionStrategy.java
@@ -118,7 +118,11 @@ public class SizeTieredCompactionStrategy extends AbstractCompactionStrategy
     static List<SSTableReader> filterColdSSTables(List<SSTableReader> sstables, double coldReadsToOmit, int minThreshold)
     {
         if (coldReadsToOmit == 0.0)
+        {
+            if (!sstables.isEmpty())
+                logger.debug("Skipping cold sstable filter for list sized {} containing {}", sstables.size(), sstables.get(0).getFilename());
             return sstables;
+        }
         // Sort the sstables by hotness (coldest-first). We first build a map because the hotness may change during the sort.
         final Map<SSTableReader, Double> hotnessSnapshot = getHotnessMap(sstables);
diff --git a/src/java/org/apache/cassandra/db/compaction/SizeTieredCompactionStrategyOptions.java b/src/java/org/apache/cassandra/db/compaction/SizeTieredCompactionStrategyOptions.java
index 84e7d61..c6c5f1b 100644
--- a/src/java/org/apache/cassandra/db/compaction/SizeTieredCompactionStrategyOptions.java
+++ b/src/java/org/apache/cassandra/db/compaction/SizeTieredCompactionStrategyOptions.java
@@ -26,7 +26,7 @@ public final class SizeTieredCompactionStrategyOptions
     protected static final long DEFAULT_MIN_SSTABLE_SIZE = 50L * 1024L * 1024L;
     protected static final double DEFAULT_BUCKET_LOW = 0.5;
     protected static final double DEFAULT_BUCKET_HIGH = 1.5;
-    protected static final double DEFAULT_COLD_READS_TO_OMIT = 0.05;
+    protected static final double DEFAULT_COLD_READS_TO_OMIT = 0.0;
     protected static final String MIN_SSTABLE_SIZE_KEY = "min_sstable_size";
     protected static final String BUCKET_LOW_KEY = "bucket_low";
     protected static final String BUCKET_HIGH_KEY = "bucket_high";

to our 2.1.3, though the entire coldReadsToOmit option is removed in 2.1.4
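The effect of that default can be illustrated with a simplified sketch (plain Python, not Cassandra's actual code; the per-sstable read rates are made up): with coldReadsToOmit > 0, sstables that receive almost no reads are dropped from the compaction candidate list, so a write-mostly table accumulates sstables that never compact.

```python
# Simplified illustration of STCS cold-sstable filtering. NOT Cassandra's
# real implementation -- just the filtering idea: omit the coldest
# sstables whose combined reads stay under cold_reads_to_omit of the total.
def filter_cold_sstables(reads_per_sstable, cold_reads_to_omit):
    """Return indices of sstables that remain compaction candidates."""
    total = sum(reads_per_sstable) or 1.0
    coldest_first = sorted(range(len(reads_per_sstable)),
                           key=lambda i: reads_per_sstable[i])
    omitted, cold_reads = set(), 0.0
    for i in coldest_first:
        # Stop once omitting this sstable would exceed the allowed fraction.
        if (cold_reads + reads_per_sstable[i]) / total > cold_reads_to_omit:
            break
        cold_reads += reads_per_sstable[i]
        omitted.add(i)
    return [i for i in range(len(reads_per_sstable)) if i not in omitted]

# A write-mostly table: 640 sstables with ~no reads each, one hot sstable.
reads = [1.0] * 640 + [100000.0]
print(len(filter_cold_sstables(reads, 0.05)))  # 1: the 640 cold sstables
                                               # are filtered out and never
                                               # become compaction candidates
print(len(filter_cold_sstables(reads, 0.0)))   # 641: 0.0 disables the filter
```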

Note you don’t have to patch your code; you can set the value on each table (we just have a lot of them, and dynamically generated ones) - basically try setting coldReadsToOmit back to 0, which was the default in 2.0.x
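As a concrete sketch of the per-table override (the keyspace name here is hypothetical; `metrics16513` is the table from the cfstats output above, and this `compaction` subproperty exists only through 2.1.3):

```sql
-- Disable the cold-sstable filter for one table without patching Cassandra.
ALTER TABLE mykeyspace.metrics16513
  WITH compaction = {
    'class': 'SizeTieredCompactionStrategy',
    'cold_reads_to_omit': 0.0
  };
```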

On Mar 26, 2015, at 3:56 AM, Anishek Agarwal <[hidden email]> wrote:

[earlier messages quoted in full; snipped]




Re: Disastrous profusion of SSTables

Dave Galbraith
It looks like it was CASSANDRA-8860, setting that cold reads to omit thing down to zero took my SSTable count from 641 to 1 and made all my queries work. Thank you!!

On Thu, Mar 26, 2015 at 4:55 AM, graham sanderson <[hidden email]> wrote:
[earlier messages quoted in full; snipped]


Re: Disastrous profusion of SSTables

Robert Coli-3
On Thu, Mar 26, 2015 at 12:51 AM, Dave Galbraith <[hidden email]> wrote:
Hey! So I'm running Cassandra 2.1.2 and using the SizeTieredCompactionStrategy. I'm doing about 3k writes/sec on a single node. My read performance is terrible, all my queries just time out. So I do nodetool cfstats:

For the record, the cassandra download page currently advises using 2.0.x as the stable production version.

"
The most stable release of Apache Cassandra is 2.0.13 (released on 2015-03-16). If you are in production or planning to be soon, download this one.
"

=Rob