Despite your "I understand that it's not the best solution, I need it for testing purposes", I have to ask whether you considered doing an ALTER KEYSPACE to set RF > 1 for mykeyspace on cluster2 and then running "nodetool rebuild" to add a new DC (your cluster2)?
If you go your way (sstableloader), I also advise you to take a snapshot (instead of just flushing) to avoid failures due to compactions on your active cluster1.
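To make the snapshot-then-load idea concrete, here is a minimal Python sketch that only builds the two command lines and does not run anything. The keyspace name comes from the thread; the snapshot tag, the target host and the staging path are made-up placeholders. It assumes the snapshot files have first been copied into a directory whose path ends in keyspace/table, since that is how sstableloader works out where the data belongs.

```python
from typing import List


def snapshot_command(keyspace: str, tag: str) -> List[str]:
    """argv for taking a named snapshot of one keyspace on the local node."""
    return ["nodetool", "snapshot", "-t", tag, keyspace]


def loader_command(target_hosts: List[str], table_dir: str) -> List[str]:
    """argv for streaming the sstables found in table_dir to the target cluster.

    table_dir is assumed to end in <keyspace>/<table>.
    """
    return ["sstableloader", "-d", ",".join(target_hosts), table_dir]


# Hypothetical values, for illustration only.
print(" ".join(snapshot_command("mykeyspace", "migration")))
print(" ".join(loader_command(["c2.node1"], "/tmp/load/mykeyspace/mytable")))
```

Run the snapshot on each cluster1 node you plan to load from, then point the loader at the copied snapshot files rather than at the live data directory.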
To answer your question, sstableloader is supposed to distribute data correctly on the new cluster depending on your RF and topology.
Basically, if you run sstableloader only on the sstables from c1.node1, my guess is that all the data present on c1.node1 will be stored on the new c2 (each piece of data on its corresponding node). So if you have RF=3 on c1, you should get all the data onto c2 just by running sstableloader from c1.node1; if you are using RF=1 on c1, then you need to load data from every c1 node. I suppose it doesn't matter which cluster2 node you point at, since it just acts as a coordinator.
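The RF reasoning can be checked with a toy model. The Python sketch below is not Cassandra code: it assumes a single DC, one evenly sized token range per node, and a SimpleStrategy-like placement where each range is copied to the next RF-1 nodes on the ring. The function names are invented for this illustration.

```python
from typing import List, Set


def ranges_held(node: int, n_nodes: int, rf: int) -> Set[int]:
    """Token ranges replicated on `node` in the toy ring.

    Range i is owned by node i and copied to the next rf-1 nodes clockwise,
    so node j holds its own range plus those of the rf-1 nodes before it.
    """
    copies = min(rf, n_nodes)
    return {(node - k) % n_nodes for k in range(copies)}


def nodes_to_load(n_nodes: int, rf: int) -> List[int]:
    """Greedily pick source nodes whose sstables together cover every range."""
    missing = set(range(n_nodes))
    chosen: List[int] = []
    while missing:
        # Take the node that holds the most ranges not yet covered.
        best = max(
            range(n_nodes),
            key=lambda cand: len(ranges_held(cand, n_nodes, rf) & missing),
        )
        chosen.append(best)
        missing -= ranges_held(best, n_nodes, rf)
    return chosen


print(nodes_to_load(3, 3))  # one source node is enough
print(nodes_to_load(3, 1))  # every node has to be loaded
```

Under these assumptions a 3-node cluster with RF=3 needs only one source node, while RF=1 needs all three, which matches the guess above.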
I have never used the tool, but that's what would be "logical" imho. Wait for confirmation, as I wouldn't want to lead you to a failure of any kind. Also, I don't know whether sstableloader replicates the data directly or whether you need to repair c2 after loading.
>I have to ask you if you considered doing an Alter keyspace, change RF
The idea is dead simple:
get data from cluster1,
put it to cluster2
I understand the drawbacks of the streaming sstableloader approach, but right now I need something easy. Later we will consider switching to Priam, since it does backup/restore the right way.
2015-03-31 14:45 GMT+02:00 Alain RODRIGUEZ <[hidden email]>:
IMHO, the most straightforward solution is to add cluster2 as a new DC for mykeyspace and then drop the old DC.
That's how we migrated to VPC (AWS), and we love this approach since you don't have to mess with your existing cluster, the sync is done automatically, and you can then drop your old DC safely once you are sure.
I posted the steps on this ML a long time ago: https://mail-archives.apache.org/mod_mbox/incubator-cassandra-user/201406.mbox/%3CCA+VSrLopop7Th8nX20aOZ3As75g2jrJm3ryX119dekLYNHqFwA@...%3E
Also Datastax docs: https://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_add_dc_to_cluster_t.html
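As a rough sketch of the two central steps (not a full procedure, the linked docs cover snitch and seed configuration), the Python below just builds the CQL statement and the nodetool command as strings. The DC names "DC1" and "DC2" and the replica counts are assumptions for illustration; use the names your snitch actually reports.

```python
from typing import Dict, List


def alter_keyspace_cql(keyspace: str, replicas_per_dc: Dict[str, int]) -> str:
    """CQL that makes the keyspace replicate to every listed datacenter."""
    opts = ", ".join(f"'{dc}': {rf}" for dc, rf in replicas_per_dc.items())
    return (
        f"ALTER KEYSPACE {keyspace} WITH replication = "
        f"{{'class': 'NetworkTopologyStrategy', {opts}}};"
    )


def rebuild_command(source_dc: str) -> List[str]:
    """argv to run on each node of the new DC to stream data from source_dc."""
    return ["nodetool", "rebuild", "--", source_dc]


# Hypothetical DC names and replica counts.
print(alter_keyspace_cql("mykeyspace", {"DC1": 3, "DC2": 3}))
print(" ".join(rebuild_command("DC1")))
```

Once the rebuild has finished on every new node and clients point at the new DC, the old DC can be removed from the replication settings and its nodes decommissioned.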
"get data from cluster1,
put it to cluster2"
I would definitely use this method to do this (I actually did already, multiple times).
Up to you. I once heard that there are almost as many ways of doing operations on Cassandra as there are operators :). You should go with the method you are confident in. I can assure you the one I propose is quite safe.
2015-03-31 15:32 GMT+02:00 Serega Sheypak <[hidden email]>:
From Michael Laing - posted on the wrong thread :
"We use Alain's solution as well to make major operational revisions.
We have a "red team" and a "blue team" in each AWS region, so we just add and drop datacenters to get where we want to be."
2015-03-31 15:50 GMT+02:00 Alain RODRIGUEZ <[hidden email]>:
So, sstableloader streams only the portion of data stored in the /var/lib/cassandra/data/keyspace/table directory.
If we have 3 nodes and RF=3, then only 1/3 of the data would be streamed to the other cluster.
Problem is solved.
2015-04-01 12:05 GMT+02:00 Alain RODRIGUEZ <[hidden email]>: