VMs versus Physical machines

Shahab Yunus
Hello,

We are deciding whether to get VMs or physical machines for a Cassandra cluster. I know this is a very high-level question that depends on lots of factors; in fact, what I want to know is how to tackle it and which factors we should take into consideration while trying to find the answer.

Data size? Write speed (whether the use cases are write-heavy or not)? Random-read use cases? Column family design/how we store data?

Any pointers, documents, guidance, or advice would be appreciated.

Thanks a lot.

Regards,
Shahab
Re: VMs versus Physical machines

Aaron Turner
Physical machines unless you're running your cluster in the cloud (AWS/etc).

Reason is simple: Look how Cassandra scales and provides redundancy.  

Aaron Turner
http://synfin.net/         Twitter: @synfinatic
https://github.com/synfinatic/tcpreplay - Pcap editing and replay tools for Unix & Windows
Those who would give up essential Liberty, to purchase a little temporary
Safety, deserve neither Liberty nor Safety. 
    -- Benjamin Franklin



Re: VMs versus Physical machines

Shahab Yunus
Thanks, Aaron, for the reply. Yes, the nodes will be VMs in the cloud if we don't go the physical route.

"Look how Cassandra scales and provides redundancy."
But how does it differ between physical machines and VMs (in the cloud)? Or, following your first comment, are you saying that there is no difference whether we use physical machines or VMs (in the cloud)?

Regards,
Shahab


Re: VMs versus Physical machines

Robert Coli-3
On Wed, Sep 11, 2013 at 4:40 PM, Shahab Yunus <[hidden email]> wrote:
But how does it differ between physical machines and VMs (in the cloud)? Or, following your first comment, are you saying that there is no difference whether we use physical machines or VMs (in the cloud)?

Physical will always outperform virtual. He's just saying don't buy one big physical box, virtualize on it, and then run Cassandra on those VMs.

=Rob 
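
One way to read the "one big box" warning, sketched in Python: Cassandra keeps several replicas of each row on different nodes, but if those nodes are VMs on the same physical host, a single host failure removes all of them at once. The node names, host names, and replication factor of 3 below are hypothetical and only illustrate the reasoning; this is not Cassandra's actual replica placement logic.

```python
def surviving_replicas(replica_nodes, node_to_host, failed_host):
    """Return the replica nodes that are still up after one physical host fails."""
    return [n for n in replica_nodes if node_to_host[n] != failed_host]


def worst_case_survivors(replica_nodes, node_to_host):
    """Smallest number of replicas left after any single physical host failure."""
    hosts = {node_to_host[n] for n in replica_nodes}
    return min(
        len(surviving_replicas(replica_nodes, node_to_host, h)) for h in hosts
    )


# Hypothetical example: three replicas of one row (replication factor 3).
replicas = ["node1", "node2", "node3"]

# Layout A: three VMs carved out of one big physical box.
one_big_box = {"node1": "hostA", "node2": "hostA", "node3": "hostA"}

# Layout B: three separate physical machines (or VMs on three separate hosts).
separate_hosts = {"node1": "hostA", "node2": "hostB", "node3": "hostC"}

print(worst_case_survivors(replicas, one_big_box))     # 0
print(worst_case_survivors(replicas, separate_hosts))  # 2
```

In layout A the cluster looks redundant to Cassandra but is not redundant in hardware terms; in layout B any single host failure still leaves two replicas.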
Re: VMs versus Physical machines

Aaron Turner
On Wed, Sep 11, 2013 at 4:40 PM, Shahab Yunus <[hidden email]> wrote:
Thanks, Aaron, for the reply. Yes, the nodes will be VMs in the cloud if we don't go the physical route.

"Look how Cassandra scales and provides redundancy."
But how does it differ between physical machines and VMs (in the cloud)? Or, following your first comment, are you saying that there is no difference whether we use physical machines or VMs (in the cloud)?

They're different, but both can and do work... VMs just require more virtual servers than going the physical route.

Sorry, but without you providing any actual information about your needs all you're going to get is generalizations and hand-waving.



Re: VMs versus Physical machines

Shahab Yunus
I admit I left out the details; sorry about that. I was looking for high-level guidance so that we could then sort out for ourselves what fits our requirements and use cases (mainly because we are at a stage where they can still be molded to hardware and software limitations/features). For example, a recommendation such as "for heavy reads, physical is better".

Anyway, just to give you a quick recap:
1- Cassandra 1.2.8
2- A row is a unique userid and can have one or more columns. Every cell is basically a blob of data (using Avro). All information is in this one table. No joins or other access patterns.
3- Writes can be either in bulk (which will of course have less strict performance requirements) or real-time. All writes are per userid, hence at the row level, and consist of adding new rows (with some column values, of course) or updating specific cells (columns) of an existing row.
4- Reads are per userid, i.e. per row, and 90% of the time they are random reads for a single user rather than bulk reads.
5- Both the read and write interfaces are exposed through a REST service as well as a direct Java client API.
6- Reads and writes, as mentioned in 3 and 4, can be for one or more columns at a time.

Regards,
Shahab




Re: VMs versus Physical machines

Aaron Turner



Your total data set size and number of reads/writes per second are the important things here. Also, how sensitive are you to latency spikes (which tend to happen with VMs)?

Long story short, the safest option is always physical, IMHO. Use VM/cloud if you need to use VM/cloud for some reason (for example, because all the other servers talking to Cassandra are also in AWS). Cloud can work (Netflix uses Cassandra on AWS), but your performance will be a lot more consistent on physical hardware, and Cassandra, like all databases, likes lots of RAM (although this can be offset somewhat with SSDs), which tends to be expensive in the cloud.
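
To make the point about data set size and reads/writes per second concrete, here is a rough back-of-the-envelope sketch in Python. It estimates a node count as the larger of a storage-driven and a throughput-driven requirement. The function name and every number in it (data size, per-node disk, per-node throughput) are made-up assumptions for illustration, not benchmarks or recommendations; real per-node figures would have to come from load-testing your own workload on the hardware or instance type in question.

```python
import math


def nodes_needed(data_gb, replication_factor, usable_disk_gb_per_node,
                 peak_ops_per_sec, ops_per_sec_per_node):
    """Estimate node count as the larger of the storage and throughput needs.

    ops_per_sec_per_node is assumed to be the sustained share of cluster-level
    client operations that one node can absorb (a measured, workload-specific
    number), so replication overhead is already folded into it.
    """
    by_storage = math.ceil(
        data_gb * replication_factor / usable_disk_gb_per_node
    )
    by_throughput = math.ceil(peak_ops_per_sec / ops_per_sec_per_node)
    # Never go below the replication factor, or replicas cannot be spread out.
    return max(by_storage, by_throughput, replication_factor)


# Hypothetical workload: 2 TB of raw data, replication factor 3, 40k ops/s peak.
workload = dict(data_gb=2000, replication_factor=3, peak_ops_per_sec=40000)

# Hypothetical per-node capacities; the VM figures are deliberately lower to
# mirror the claim that VMs need more servers than the physical route.
physical = nodes_needed(usable_disk_gb_per_node=1000,
                        ops_per_sec_per_node=10000, **workload)
virtual = nodes_needed(usable_disk_gb_per_node=400,
                       ops_per_sec_per_node=4000, **workload)

print(physical)  # 6
print(virtual)   # 15
```

The sketch ignores latency spikes entirely; it only captures capacity, which is why the per-node numbers should be measured under the latency target you actually care about.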



