Multinode Cassandra and sstableloader

Multinode Cassandra and sstableloader

Serega Sheypak

Hi, I have a simple question and can't find the relevant info in the docs.

I have cluster1 with 3 nodes and cluster2 with 5 nodes. I want to transfer the whole keyspace named 'mykeyspace' from cluster1 to cluster2 using sstableloader. I understand that it's not the best solution; I need it for testing purposes.

What I'm going to do:

  1. Recreate the keyspace schema on cluster2 using the schema from cluster1
  2. Run nodetool flush for mykeyspace.source_table, the table being exported from cluster1 to cluster2
  3. Run sstableloader for each table on cluster1.node01:

    sstableloader -d cluster2.nodeXXX.com /var/lib/cassandra/data/mykeyspace/source_table-83f369e0d6e511e4b3a6010e8d2b68af/
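For reference, the flush in step 2 is just the standard nodetool invocation:

    nodetool flush mykeyspace source_table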

What should I get as a result on cluster2?

ALL data from source_table?

or

Just the part of source_table's data stored on cluster1.node01?

I'm confused. The doc says I just run this command to export a table from cluster1 to cluster2, but I'm specifying the path to only part of source_table's data, since the other parts of the table should live on other nodes.

Re: Multinode Cassandra and sstableloader

arodrime
Hi,

Despite of "I understand that it's not the best solution, I need it for testing purposes", I have to ask you if you considered doing an Alter keyspace, change RF > 1 for mykeyspace on cluster2 and "nodetool rebuild" to add a new DC (your cluster2) ?

In case you go your way (sstableloader), I also advise you to take a snapshot (instead of just flushing) to avoid failures due to compactions on your active cluster1.
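Something like this (the snapshot tag "migrate" is just an example name):

    # A snapshot hard-links the current sstables, so compaction on the live
    # cluster1 can't delete them while you stream:
    nodetool snapshot -t migrate -cf source_table mykeyspace

    # The files land under the table's data directory:
    #   .../mykeyspace/source_table-<id>/snapshots/migrate/
    # Note that sstableloader infers keyspace and table from the last two
    # path components, so copy the snapshot into a <keyspace>/<table>/
    # directory layout before pointing the loader at it.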

To answer your question: sstableloader is supposed to distribute the data correctly on the new cluster according to its RF and topology.
Basically, if you run sstableloader just on the sstables of c1.node1, my guess is that all the data present on c1.node1 will be stored on the new c2 (each piece of data going to its corresponding node). So if you have RF=3 on c1, you should get all the data onto c2 just by running sstableloader from c1.node1; if you are using RF=1 on c1, then you need to load data from each node of c1. I suppose that which cluster2.nodeXXX you pass doesn't matter; it just acts as a coordinator.
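For the RF=1 case, a rough sketch of loading from every node (host names are placeholders; the table directory name, UUID suffix included, is the same on every node since it comes from the schema):

    for host in c1-node1 c1-node2 c1-node3; do
      ssh "$host" sstableloader -d cluster2.nodeXXX.com \
        /var/lib/cassandra/data/mykeyspace/source_table-83f369e0d6e511e4b3a6010e8d2b68af/
    done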

I have never used the tool, but that's what would be "logical" imho. Wait for confirmation, as I wouldn't want to lead you to a failure of any kind. Also, I don't know whether data is replicated directly by sstableloader or whether you need to repair c2 after loading the data.

C*heers,

Alain

Re: Multinode Cassandra and sstableloader

Serega Sheypak
>I have to ask you if you considered doing an Alter keyspace, change RF 
The idea is dead simple: 
get data from cluster1, 
put it to cluster2
wipe cluster1

I understand the drawbacks of the sstableloader streaming approach; right now I need something easy. Later we'll consider switching to Priam, since it does backup/restore the right way.


Re: Multinode Cassandra and sstableloader

arodrime
IMHO, the most straightforward solution is to add cluster2 as a new DC for mykeyspace and then drop the old DC.

That's how we migrated to VPC (AWS), and we love this approach since you don't have to mess with your existing cluster; the sync happens automatically, and you can then drop your old DC safely once you are sure.
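Once you have verified the new DC is in sync, the teardown is just (DC names again placeholders, continuing the earlier sketch):

    # Stop replicating mykeyspace to the old DC:
    cqlsh -e "ALTER KEYSPACE mykeyspace WITH replication =
      {'class': 'NetworkTopologyStrategy', 'DC2': 3};"

    # Then retire each cluster1 node, one at a time:
    nodetool decommission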


"get data from cluster1, 
put it to cluster2
wipe cluster1"

I would definitely use this method (add a new DC, then drop the old one) to do this (I actually did already, multiple times).

Up to you. I once heard that there are almost as many ways of doing operations on Cassandra as there are operators :). You should go with the method you are confident with. I can assure you the one I propose is quite safe.

C*heers,

Alain

Re: Multinode Cassandra and sstableloader

arodrime
From Michael Laing (posted on the wrong thread):

"We use Alain's solution as well to make major operational revisions.

We have a "red team" and a "blue team in each AWS region, so we just add and drop datacenters to get where we want to be.

Pretty simple."

Re: Multinode Cassandra and sstableloader

Serega Sheypak
So, sstableloader streams the portion of the data stored in the /var/lib/cassandra/data/keyspace/table directory of the node it runs on.
If we have 3 nodes and RF=1, then only 1/3 of the data would be streamed to the other cluster; with RF=3, each node holds a full replica, so loading from a single node streams everything.
Problem solved.
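A quick sanity check after loading, if the test dataset is small enough that count(*) won't time out (host names as in the commands above):

    cqlsh cluster1.node01 -e "SELECT count(*) FROM mykeyspace.source_table;"
    cqlsh cluster2.nodeXXX.com -e "SELECT count(*) FROM mykeyspace.source_table;"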

