Problems after trying a migration


David CHARBONNIER

Hi,

 

We’re using Cassandra through the Datastax Enterprise package in version 4.5.1 (Cassandra version 2.0.8.39) with 7 nodes in a single datacenter.

 

We need to move our Cassandra cluster from France to another country. To do this, we want to add a second 7-node datacenter to our cluster and stream all data between the two countries before dropping the first datacenter.

 

On January 31st, we tried to do so but ran into some problems:

-          The new nodes in the other country were installed like the French nodes, except for the Datastax Enterprise version (4.5.1 in France and 4.6.1 in the other country, which means Cassandra version 2.0.8.39 in France and 2.0.12.200 in the other country).

-          We followed this procedure: http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_add_dc_to_cluster_t.html but an error occurred during step 3: the new nodes were started before the cassandra-topology.properties file had been updated on the original datacenter. As a result, the new nodes appeared in the original datacenter instead of the new one.

-          To recover our original cluster, we decommissioned every node of the new datacenter with the nodetool decommission command.
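One plausible way for new nodes to land in the wrong datacenter is that an original node's cassandra-topology.properties has no entry for them yet, so they fall back to the `default` entry. The sketch below checks a file's content before any new node is started. The IP addresses, the datacenter names `FR` and `LU`, and the helper names are assumptions made for illustration; they do not come from the attached configuration.

```python
def parse_topology(text):
    """Parse cassandra-topology.properties content into {key: (dc, rack)}."""
    mapping = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, _, value = line.partition("=")
        dc, _, rack = value.strip().partition(":")
        mapping[key.strip()] = (dc, rack)
    return mapping


def misplaced_nodes(topology, expected_dc_by_ip):
    """Return {ip: resolved_dc} for nodes that would not land in the expected DC.

    An IP that is missing from the file resolves to the 'default' entry.
    """
    default = topology.get("default", (None, None))
    problems = {}
    for ip, expected_dc in expected_dc_by_ip.items():
        dc = topology.get(ip, default)[0]
        if dc != expected_dc:
            problems[ip] = dc
    return problems


# Hypothetical file content on an original node, before it was updated.
OLD_FILE = """
# existing French nodes
10.1.0.1=FR:RAC1
10.1.0.2=FR:RAC1
default=FR:RAC1
"""

# Hypothetical expectation: 10.2.0.1 is a new node meant for the new datacenter.
EXPECTED = {"10.1.0.1": "FR", "10.2.0.1": "LU"}

print(misplaced_nodes(parse_topology(OLD_FILE), EXPECTED))
```

Running a check like this against the file on every original node, and starting the new nodes only once it reports nothing, would catch this ordering mistake early.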

 

On February 9th, the nodes in the second datacenter were restarted and joined the cluster again. We had to decommission them just like before.

 

On February 11th, we added disk space to our 7 running French nodes. To do this, we restarted the cluster, but the nodes refreshed their peering information and the nodes from Luxembourg (decommissioned on February 9th) showed up again. This behaviour is described here: https://issues.apache.org/jira/browse/CASSANDRA-7825. So we cleaned the content of the system.peers table.
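The system.peers cleanup can be sketched as follows: given the peer addresses read from one node's system.peers table and the addresses of the nodes that should really be in the cluster, build one DELETE statement per stale row. This is only a sketch under assumptions. The addresses and the helper name are hypothetical, and because system.peers is local to each node, the statements would have to be run on every node that still lists the stale peers.

```python
import ipaddress


def stale_peer_deletes(peers, live_nodes):
    """Return CQL DELETE statements for peers that are not live nodes.

    Addresses are validated with ipaddress before being placed in the CQL text.
    """
    live = {str(ipaddress.ip_address(ip)) for ip in live_nodes}
    stale = sorted({str(ipaddress.ip_address(p)) for p in peers} - live)
    return [f"DELETE FROM system.peers WHERE peer = '{ip}';" for ip in stale]


# Hypothetical values: 10.2.0.x stand for the decommissioned nodes.
peers_seen = ["10.1.0.2", "10.2.0.1", "10.2.0.2"]
live = ["10.1.0.1", "10.1.0.2"]

for statement in stale_peer_deletes(peers_seen, live):
    print(statement)
```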

 

On March 11th, we needed to add an 8th node to our existing French cluster. We installed the same Datastax Enterprise version (4.5.1 with Cassandra 2.0.8.39) and tried to add this node to the cluster with this procedure: http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_add_node_to_cluster_t.html. In OPSCenter, the node was shown as joining the cluster, and data streaming got stuck at 100%. After several hours, nodetool status still showed the node as joining, but nothing in the logs indicated a problem. We restarted the node, but it had no effect. Then we wiped the data and commitlog directories and tried to add the node to the cluster again, without result.

Our last attempt was to add the node with auto_bootstrap : false in order to add it to the cluster manually, but this messed up the data. So we shut down the node and removed it from the cluster (with nodetool removenode). The whole cluster has been repaired, and we have not changed anything since.
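Since nothing in the logs showed a problem, one way to keep an eye on a node that stays in the joining state is to parse the text printed by `nodetool status` and list every node whose two-letter code is not `UN` (Up/Normal). The sketch below works on captured output. The sample text, addresses and host IDs are made up for illustration, and the helper names are assumptions.

```python
import re

# A node line starts with a status letter (U/D) and a state letter (N/L/J/M).
NODE_LINE = re.compile(r"^([UD][NLJM])\s+(\S+)\s")


def parse_status(output):
    """Return a list of (datacenter, code, address) from `nodetool status` text."""
    nodes, dc = [], None
    for line in output.splitlines():
        if line.startswith("Datacenter:"):
            dc = line.split(":", 1)[1].strip()
            continue
        match = NODE_LINE.match(line)
        if match:
            nodes.append((dc, match.group(1), match.group(2)))
    return nodes


def not_normal(nodes):
    """Keep only the nodes that are not Up/Normal."""
    return [node for node in nodes if node[1] != "UN"]


# Hypothetical captured output; UJ means Up/Joining.
SAMPLE = """\
Datacenter: FR
==============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load      Tokens  Owns   Host ID                               Rack
UN  10.1.0.1   120.5 GB  256     14.2%  aaaaaaaa-0000-0000-0000-000000000001  RAC1
UJ  10.1.0.8   98.1 GB   256     ?      aaaaaaaa-0000-0000-0000-000000000008  RAC1
"""

print(not_normal(parse_status(SAMPLE)))
```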

 

Now our cluster has only 7 French nodes, to which we can’t add any node. The OPSCenter data has disappeared, so we are working without any information about how our cluster is running.

 

You’ll find attached to this email our current configuration and a screenshot of our OPSCenter metric page.

 

Do you have any idea how to clean up the mess and get our cluster running cleanly before we start our migration (from France to another country, as described at the beginning of this email)?

 

Thank you.

 

Best regards,

 

David CHARBONNIER

Sysadmin

T : +33 411 934 200

[hidden email]

ZAC Aéroport

125 Impasse Adam Smith

34470 Pérols - France

www.rgsystem.com

 

 

 


Attachments: dse.yaml (7K), cassandra.yaml (43K), cassandra-topology.properties (1K), OPSCenter.png (60K)

Re: Problems after trying a migration

Fabien Rousseau
Hi David,

There is an excellent article that describes exactly what you want to do (i.e. migrate from one DC to another DC):


RE: Problems after trying a migration

David CHARBONNIER

Hi Fabien,

 

Thank you for the link! That’s exactly what we want to do.

But before starting this, we need to clean up the mess in order to get a clean cluster.

 

Thanks for your help.

 

Best regards,

 

David CHARBONNIER

From: Fabien Rousseau [mailto:[hidden email]]
Sent: Wednesday, March 18, 2015 17:32
To: user
Subject: Re: Problems after trying a migration

 


Re: Problems after trying a migration

Robert Coli-3
In reply to this post by David CHARBONNIER
On Wed, Mar 18, 2015 at 9:05 AM, David CHARBONNIER <[hidden email]> wrote:

-          New nodes in the other country have been installed like French nodes except for Datastax Enterprise version (4.5.1 in France and 4.6.1 in the other country which means Cassandra version 2.0.8.39 in France and 2.0.12.200 in the other country)


This is officially unsupported, and might be a cause of problems during this process.

=Rob
 

Re: Problems after trying a migration

Robert Coli-3
On Wed, Mar 18, 2015 at 12:58 PM, Robert Coli <[hidden email]> wrote:
This is officially unsupported, and might be a cause of problems during this process.

As regards your other situation, I suggest joining #cassandra and pointing people there towards your summary and interactively discussing it with them. Mailing list lag is not best for operational issues. :)

=Rob 

Re: Problems after trying a migration

Jan
In reply to this post by Robert Coli-3

Hi David,

Some input to get back to where you were:
a) Start with the French cluster only and get it working with DSE 4.5.1.
b) The OpsCenter keyspace is RF1 by default; alter the keyspace to RF3.
c) Take a full snapshot of all your nodes and copy the files to a safe location on all the nodes.

To migrate the data into the new cluster:
a) Use the same version, DSE 4.5.1, in Luxembourg and bring up 1 node at a time. Check that the node has come up in the new datacenter.
b) Bring up the new nodes in the new datacenter one at a time.
c) After all your new nodes are UP in Luxembourg, run a 'nodetool repair -parallel'.
d) Check in OpsCenter that all your nodes are showing up (new and old).
e) Start taking down your nodes in France, one at a time.
f) After all the nodes in France are down, run a 'nodetool repair -parallel' again.
g) Upgrade the nodes in Luxembourg to DSE 4.6.1.
h) Run a 'nodetool repair -parallel' again.
i) Upgrade to OpsCenter 5.1.

Best of luck, hope this helps.

Jan/
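The "alter the keyspace to RF3" step above can be sketched as a small helper that builds the CQL statement. It assumes the keyspace is named "OpsCenter" and that NetworkTopologyStrategy is wanted, with datacenter names (`FR`, `LU`) that are hypothetical and would have to match the names reported by the snitch. A repair of that keyspace is still needed afterwards so that existing data reaches the additional replicas.

```python
def alter_replication(keyspace, rf_by_dc):
    """Build an ALTER KEYSPACE statement using NetworkTopologyStrategy.

    rf_by_dc maps a datacenter name to its replication factor.
    """
    options = ", ".join(f"'{dc}': {rf}" for dc, rf in sorted(rf_by_dc.items()))
    return (
        f'ALTER KEYSPACE "{keyspace}" WITH replication = '
        f"{{'class': 'NetworkTopologyStrategy', {options}}};"
    )


# Hypothetical datacenter name for the French cluster.
print(alter_replication("OpsCenter", {"FR": 3}))
```

Passing both datacenters, for example `{"FR": 3, "LU": 3}`, would produce the statement for the period when the two datacenters coexist.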






RE: Problems after trying a migration

David CHARBONNIER

Hi Jan,

 

Thank you for your help; we’ll look into this next week.

 

Have a nice day.

 

Best regards,

 

David CHARBONNIER


From: Jan [mailto:[hidden email]]
Sent: Thursday, March 19, 2015 05:09
To: [hidden email]
Subject: Re: Problems after trying a migration

 

 
