Slow bulk loading

Slow bulk loading

Pierre
Hi,

I'm streaming a large SSTable with the sstableloader bulk loader, but it's very slow (3 MB/s):

Summary statistics: 
   Connections per host:         : 1         
   Total files transferred:      : 1         
   Total bytes transferred:      : 10357947484
   Total duration (ms):          : 3280229   
   Average transfer rate (MB/s): : 3         
   Peak transfer rate (MB/s):    : 3   
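
As a sanity check on the summary above, the reported rate can be recomputed from the byte count and the duration. This is only a sketch: the function name is made up for illustration, and it assumes the tool reports binary megabytes (with decimal megabytes the figure is slightly higher, but it still rounds to 3).

```python
def transfer_rate_mib_per_s(total_bytes, duration_ms):
    """Average throughput in MiB/s from a byte count and a duration in ms."""
    seconds = duration_ms / 1000
    return total_bytes / seconds / (1024 * 1024)


# Figures copied from the sstableloader summary statistics.
TOTAL_BYTES = 10_357_947_484
DURATION_MS = 3_280_229

rate = transfer_rate_mib_per_s(TOTAL_BYTES, DURATION_MS)
minutes = DURATION_MS / 60_000
print(f"{rate:.2f} MiB/s over {minutes:.1f} minutes")
```

So the whole transfer took a little under an hour for a single file, which matches the reported average of roughly 3 MB/s.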

I'm on a single-node configuration with an empty keyspace and table, on good hardware (8 cores at 2.8 GHz, 32 GB RAM) dedicated to Cassandra, so the process has plenty of resources. I'm uploading from another server.

The SSTable is 9 GB in size and has 4 partitions, but a lot of rows per partition (around 100 million). The clustering key is an INT and there are 4 other regular columns, so approximately 500 million cells per ColumnFamily.

When I upload, I notice that one core of the Cassandra node is at full CPU (all other cores are idle), so I assume I'm CPU-bound on the node side. But why? What is the node doing? Why does it take so long?


Re: Slow bulk loading

Nate McCall-4


> When I upload, I notice that one core of the Cassandra node is at full CPU (all other cores are idle),

Take a look at the interrupt distribution (cat /proc/interrupts). You'll probably see disk and network interrupts mostly or entirely bound to CPU0. If that is the case, this article has an excellent description of the underlying issue as well as some workarounds: http://www.alexonlinux.com/smp-affinity-and-proper-interrupt-handling-in-linux
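
To make the output of /proc/interrupts easier to read, a small script can compute how each interrupt source is spread across CPUs. The following is a rough sketch, not a standard tool: the function names are made up for illustration, and it assumes the usual Linux layout (a header row of CPU names, then one row per interrupt with one counter per CPU followed by a description).

```python
def parse_interrupts(text):
    """Parse /proc/interrupts text into (cpu_names, {irq: (counts, description)})."""
    lines = text.strip("\n").splitlines()
    cpus = lines[0].split()
    rows = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        fields = rest.split()
        counts = []
        for field in fields[:len(cpus)]:
            if not field.isdigit():
                break
            counts.append(int(field))
        if len(counts) != len(cpus):
            continue  # rows without one counter per CPU (e.g. ERR) are skipped
        rows[label.strip()] = (counts, " ".join(fields[len(cpus):]))
    return cpus, rows


def cpu_share(counts):
    """Fraction of an interrupt's total handled by each CPU."""
    total = sum(counts)
    return [count / total if total else 0.0 for count in counts]


def report(path="/proc/interrupts"):
    """Print the per-CPU share for every interrupt source (Linux only)."""
    with open(path) as handle:
        cpus, rows = parse_interrupts(handle.read())
    for irq, (counts, description) in rows.items():
        shares = " ".join(f"{share:6.1%}" for share in cpu_share(counts))
        print(f"{irq:>5} {shares}  {description}")
```

Calling report() on the Cassandra node prints one line per interrupt source. If the rows for the network card and the disk controller show nearly 100% in the first column, those interrupts are all being handled by CPU0.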



--
-----------------
Nate McCall
Austin, TX
@zznate

Co-Founder & Sr. Technical Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com

Re: Slow bulk loading

Mike Neir
In reply to this post by Pierre
It sounds as though you could be having trouble with garbage collection. Check
your Cassandra system logs and search for "GC". If you see frequent garbage
collections taking more than a second or two to complete, you're going to need
to do some configuration tweaking.
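
A quick way to do that search is a few lines of Python. This is only a sketch: the helper name is made up, and it assumes each GC log line contains the text "GC" and a duration written as a number followed by "ms". The exact wording of these lines differs between Cassandra versions, so adjust the pattern to match your logs.

```python
import re

# Assumption: the pause length appears as "<number> ms" or "<number>ms".
DURATION_MS = re.compile(r"(\d+)\s*ms\b")


def slow_gc_lines(lines, threshold_ms=1000):
    """Return (duration_ms, line) for GC log lines at or above the threshold."""
    hits = []
    for line in lines:
        if "GC" not in line:
            continue
        match = DURATION_MS.search(line)
        if match and int(match.group(1)) >= threshold_ms:
            hits.append((int(match.group(1)), line.rstrip("\n")))
    return hits
```

For example, slow_gc_lines(open(path_to_system_log)) lists every collection that took a second or longer, where path_to_system_log is a placeholder for wherever your node writes its system log.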

On 05/07/2015 04:44 AM, Pierre Devops wrote:

> Hi,
>
> I'm streaming a large SSTable with the sstableloader bulk loader, but it's very
> slow (3 MB/s):
>
> Summary statistics:
>     Connections per host:         : 1
>     Total files transferred:      : 1
>     Total bytes transferred:      : 10357947484
>     Total duration (ms):          : 3280229
>     Average transfer rate (MB/s): : 3
>     Peak transfer rate (MB/s):    : 3
>
> I'm on a single-node configuration with an empty keyspace and table, on good
> hardware (8 cores at 2.8 GHz, 32 GB RAM) dedicated to Cassandra, so the process
> has plenty of resources. I'm uploading from another server.
>
> The SSTable is 9 GB in size and has 4 partitions, but a lot of rows per
> partition (around 100 million). The clustering key is an INT and there are 4
> other regular columns, so approximately 500 million cells per ColumnFamily.
>
> When I upload, I notice that one core of the Cassandra node is at full CPU (all
> other cores are idle), so I assume I'm CPU-bound on the node side. But why?
> What is the node doing? Why does it take so long?
>

--



Mike Neir
Liquid Web, Inc.
Infrastructure Administrator
