Arbitrary nested tree hierarchy data model

Arbitrary nested tree hierarchy data model

List
Not sure if this is the right place to ask, but we are trying to model a
user-generated tree hierarchy in which users create child objects of a
root node, and can create an arbitrary number of children (and children
of children, and so on). So far we have looked at: storing each tree as
a single JSON document and reading/writing it in its entirety;
materialized paths, where we store the root id with every child and the
tree structure above the child as a map; and some form of adjacency
list (which does not appear to be viable, since looking up the entire
tree would require a ridiculous number of queries).

The hope is to end up with a data model that lets us display the entire
tree quickly, as well as see the entire path to a leaf when selecting
that leaf. If anyone has suggestions or experience on how to model such
a tree hierarchy, we would greatly appreciate your input.


Re: Arbitrary nested tree hierarchy data model

Robert Wille-2
I have a cluster which stores tree structures. I keep several hundred unrelated trees. The largest has about 180 million nodes, and the smallest has 1 node. The largest fanout is almost 400K. Depth is arbitrary, but in practice is probably less than 10. I am able to page through children and siblings. It works really well.

Doesn’t sound like it’s exactly what you’re looking for, but if you want any pointers on how I went about implementing mine, I’d be happy to share.


Re: Arbitrary nested tree hierarchy data model

Fabian Siddiqi
Hi Robert,

We're trying to do something similar to the OP and finding it a bit difficult. Would it be possible to provide more details about how you're doing it?

Thanks.

--
Fabian Siddiqi
Software Engineer
T: (+44) 776 335 1398

Re: Arbitrary nested tree hierarchy data model

List
In reply to this post by Robert Wille-2

Robert,

This certainly sounds like a step in the right direction, so yes,
please do share! Thank you.


Re: Arbitrary nested tree hierarchy data model

Jonathan Haddad
In reply to this post by Robert Wille-2
I'd be interested to see that data model. I think the entire list would benefit!

Re: Arbitrary nested tree hierarchy data model

Ben Bromhead
+1 would love to see how you do it

--

Ben Bromhead

Instaclustr | www.instaclustr.com | @instaclustr | (650) 284 9692


Re: Arbitrary nested tree hierarchy data model

Jack Krupansky-2
Hmmm... If you serialize the tree properly in a partition, you could always read an entire sub-tree as a single slice (consecutive CQL rows). Is there much more to it?
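(A minimal sketch of that idea, with invented names — nothing here is from anyone's actual implementation: give each node a pre-order ordinal, so every subtree occupies a consecutive run of rows. With the ordinal as the clustering column, any subtree read becomes one contiguous slice.)

```python
def preorder_rows(node, rows=None, path=""):
    """Flatten a nested dict {'name': ..., 'children': [...]} into
    (ordinal, path) rows in pre-order traversal order."""
    if rows is None:
        rows = []
    path = path + "/" + node["name"]
    rows.append((len(rows), path))
    for child in node.get("children", []):
        preorder_rows(child, rows, path)
    return rows

tree = {"name": "root", "children": [
    {"name": "a", "children": [{"name": "a1"}, {"name": "a2"}]},
    {"name": "b"},
]}
rows = preorder_rows(tree)
# rows[1:4] is the entire subtree under /root/a -- one consecutive slice
```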

-- Jack Krupansky



Re: Arbitrary nested tree hierarchy data model

Robert Wille-2
In reply to this post by Ben Bromhead
Okay, this is going to be a pretty long post, but I think it’s an interesting data model, and hopefully someone will find it worth going through.

First, I think it will be easier to understand the modeling choices I made if you see the end product. Go to http://www.fold3.com/browse.php#249|hzUkLqDmI. What you see looks like one big tree, but actually is a combination of trees spliced together. There is one tree in a relational database that forms what I call the top-level browse. The top-level browse is used to navigate through categories until you arrive at a publication. When you drill down into a publication, you are then viewing data stored in Cassandra. The link provided above points to the root of a publication (in this case, maps from the Civil War), so to the left is top-level browse coming from MySQL, and to the right is the Cassandra browse. Each publication has an independent tree in Cassandra, with all trees stored in the same set of tables (I do not dynamically create tables for each publication — I personally think that’s a bad practice). We currently have 458 publications, and collectively they have about half a billion nodes and consume about 400 GB (RF=3).

My trees are immutable. When there are changes to a publication (e.g. adding new documents), it is very difficult to know what changes need to be made to the tree to edit it in place. Also, it would be impossible to maintain navigational consistency while a tree is in the process of being restructured. So, when a publication changes, I create a completely new tree. Once the new tree is built, I change a reference to point to the new tree. I have a process that nightly pages through the tables and deletes records that belong to obsolete trees. This process takes about five hours. If it were much longer than that, I would probably run it weekly. My database has quite a bit of churn, which is fairly unusual for a Cassandra-based application. Most nights build two or three trees, generally resulting in a few tens of millions of new records and slightly fewer deletions. Size-tiered compaction is a bad choice for churn, so I use leveled compaction. Most publications are at most a few million nodes, and generally build in less than 20 minutes.
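(A toy model of that build/swap/sweep lifecycle, with dicts standing in for the Cassandra tables; all names here are my own invention, not Robert's code.)

```python
import random

trees = {}   # tree_id -> pub_id  (the small reference table)
nodes = {}   # (hpath, tree_id) -> node payload

def build_tree(pub_id, hpaths):
    """Write a complete new tree under a fresh random id, then publish it
    by registering the reference. Readers of the old tree are unaffected."""
    tree_id = random.getrandbits(31)
    for hpath in hpaths:
        nodes[(hpath, tree_id)] = {"pub": pub_id}
    trees[tree_id] = pub_id   # the swap: the new tree becomes visible
    return tree_id

def sweep():
    """Nightly pass: drop node rows belonging to unreferenced trees."""
    live = set(trees)
    for key in [k for k in nodes if k[1] not in live]:
        del nodes[key]

old = build_tree(42, ["h1", "h1a"])
new = build_tree(42, ["h2", "h2a", "h2b"])  # rebuild: pub 42 briefly has two trees
del trees[old]                              # retire the obsolete tree
sweep()
```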

Since any modeling exercise requires knowing the queries, I should describe that before getting into the model. Here are the features I need to support. For browsing the tree, I need to be able to get the children of a node (paginated), the siblings of a node (also paginated), and the ancestors of a node. The leaves of each tree are images and form a filmstrip. You can use the filmstrip to navigate through all the images in a publication in the tree’s natural order. If you go to my browse page and keep drilling down, you’ll eventually get to an image. The filmstrip appears at the bottom of the image viewer.

Before I discuss the schema, I should discuss a couple of other non-obvious things that are relevant to the data model. One very common operation is to retrieve a node and all of its ancestors in order to display a path. Denormalization would suggest that I store the data for each node along with that of all of its ancestors. That would mean that in my biggest tree, I would store the root node 180 million times. I didn’t consider that kind of bloat to be acceptable, so I do not denormalize ancestors.

I also wanted to retrieve a node and its ancestors in constant time, rather than the O(n) that would be typical for tree traversal. To accomplish this, I use a somewhat unusual node primary key: I create a hash from information in the node, and then append it to the hash of its parent. So, the primary key is really a path. When I need to retrieve a node and its ancestors, I tokenize the path and issue queries in parallel to get all the nodes in the ancestry at the same time. In keeping with this pattern of not denormalizing, my auxiliary tables do not have node data in them, but instead provide a means of getting hash paths, which I then tokenize and make parallel requests with. Most requests that use an auxiliary table just query it to get the hash path, and then retrieve the node and its ancestors from the node table. Three or fewer trips to Cassandra are sufficient for all my APIs.
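(A rough sketch of the hash-path key, assuming fixed-width per-node hashes so the path tokenizes without separators. The hash function, width, and all names are my guesses, not details from the post.)

```python
import hashlib

WIDTH = 8  # hex characters contributed by each node (assumed)

def node_hash(payload):
    return hashlib.sha1(payload.encode()).hexdigest()[:WIDTH]

def child_hpath(parent_hpath, payload):
    """A child's primary key is its parent's hash path plus its own hash."""
    return parent_hpath + node_hash(payload)

def ancestor_hpaths(hpath):
    """Tokenize a hash path into the keys of every ancestor plus the node
    itself, ready to be fetched with parallel point queries."""
    return [hpath[:i] for i in range(WIDTH, len(hpath) + 1, WIDTH)]

root = child_hpath("", "civil-war-maps")
leaf = child_hpath(child_hpath(root, "atlas"), "plate-17")
# ancestor_hpaths(leaf) yields three keys: root, atlas, plate-17
```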

Without further ado, here’s my schema (with commentary):

CREATE TABLE tree (
    tree INT,
    pub INT,
    rhpath VARCHAR,
    atime TIMESTAMP,
    ccount INT,
    ncount INT,
    PRIMARY KEY (tree)
) WITH gc_grace_seconds = 864000;

This table maintains the references to the root nodes for each tree. pub is the primary key of the publication table in my relational database. There is usually just one record per publication, but while a tree is being built (and until the old one is retired), a publication may have more than one tree. This table is small (458 records), so I cache it in memory and maintain inverted indexes. Not caching it would cost one additional trip to Cassandra for every API. Each tree is assigned a random tree ID (an INT rather than a UUID, for reasons beyond this discussion). All my tables carry the tree ID so I can tell which tree each node belongs to, since the hash path alone is not unique.
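(One way to picture that in-memory cache with an inverted index from publication to tree ids; purely illustrative, with invented names.)

```python
from collections import defaultdict

class TreeCache:
    def __init__(self, rows):
        # rows: (tree_id, pub_id) pairs loaded from the small 'tree' table
        self.pub_by_tree = dict(rows)
        self.trees_by_pub = defaultdict(set)   # the inverted index
        for tree_id, pub_id in rows:
            self.trees_by_pub[pub_id].add(tree_id)

    def trees_for_pub(self, pub_id):
        return self.trees_by_pub.get(pub_id, set())

# pub 1 is mid-rebuild, so it briefly has two trees
cache = TreeCache([(101, 1), (102, 1), (103, 2)])
```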

CREATE TABLE node (
    hpath VARCHAR,
    tree INT,
    node VARCHAR,
    ccount INT,
    PRIMARY KEY (hpath, tree)
) WITH gc_grace_seconds = 0 AND compaction = { 'class' : 'LeveledCompactionStrategy', 'sstable_size_in_mb' : 160 };

This is the main table that holds all the text and ordinal information for all my nodes. This table contains the lion’s share of the data in my cluster. The node column is a JSON document. ccount is the number of child nodes.

CREATE TABLE path_by_page (
    page BIGINT,
    tree INT,
    hpath VARCHAR,
    pord INT,
    ord INT,
    PRIMARY KEY (page, tree)
) WITH gc_grace_seconds = 0 AND compaction = { 'class' : 'LeveledCompactionStrategy', 'sstable_size_in_mb' : 160 };

This table allows me to get the hash path for any image. page is the primary key of the page from my relational database. pord is the image’s ordinal in the publication filmstrip. ord is the page’s ordinal amongst its siblings.

CREATE TABLE path_by_pub (
    tree INT,
    bucket INT,
    pord INT,
    ord INT,
    hpath VARCHAR,
    page BIGINT,
    PRIMARY KEY ((tree, bucket), pord)
) WITH gc_grace_seconds = 0 AND compaction = { 'class' : 'LeveledCompactionStrategy', 'sstable_size_in_mb' : 160 };

This table allows me to do the filmstrip. pord is what I key off of to paginate.

CREATE TABLE path_by_parent (
    phpath VARCHAR,
    bucket INT,
    tree INT,
    ord INT,
    hpath VARCHAR,
    PRIMARY KEY ((phpath, bucket, tree), ord)
) WITH gc_grace_seconds = 0 AND compaction = { 'class' : 'LeveledCompactionStrategy', 'sstable_size_in_mb' : 160 };

This table allows me to get the children for a node. ord is the node's ordinal within its siblings. It is what I key off of to paginate. The link I provided above is to a publication that has a fanout of 1 for the leaf node’s parent nodes, so isn’t very interesting (although the content is very interesting). Here’s a more interesting node that has bigger fanout: http://www.fold3.com/browse.php#246|hrRUXqv6Rj6GFxKAo. And finally, here’s a node with a fanout of 378622: http://www.fold3.com/browse.php#1|hhqJwp03TQBwCAyoD.
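(My guess at how the bucket column in path_by_pub and path_by_parent keeps partitions bounded under a ~400K fanout: derive the bucket from the ordinal, so a pager knows exactly which partitions hold any slice of siblings. The bucket size and names are invented, not from the post.)

```python
BUCKET_SIZE = 10_000   # clustering rows per (phpath, bucket, tree) partition (assumed)

def bucket_for(ord_):
    """Partition bucket holding a given sibling ordinal."""
    return ord_ // BUCKET_SIZE

def page_partition_buckets(start_ord, page_len):
    """Buckets a reader must query to fetch ordinals
    [start_ord, start_ord + page_len); usually just one."""
    first = bucket_for(start_ord)
    last = bucket_for(start_ord + page_len - 1)
    return list(range(first, last + 1))
```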

As long as this post is, it probably wasn’t enough to fully understand everything I do with my schema. I have dozens of queries. If anyone would like me to dig a little deeper, I’d be happy to. Just email me.

Robert 




Re: Arbitrary nested tree hierarchy data model

daemeon reiydelle

Fascinating. Both the MySQL front end and this delightful inverted search solution. Your creativity makes me wonder what other delights your query solutions might expose!

sent from my mobile
Daemeon C.M. Reiydelle
USA 415.501.0198
London +44.0.20.8144.9872


Re: Arbitrary nested tree hierarchy data model

Peter Lin
In reply to this post by Robert Wille-2
That's neat, thanks for sharing.

Sounds like the solution is partly inspired by Merkle trees, making lookups fast and easy.

peter

On Fri, Mar 27, 2015 at 10:07 PM, Robert Wille <[hidden email]> wrote:
Okay, this is going to be a pretty long post, but I think it’s an interesting data model, and hopefully someone will find it worth going through.

First, I think it will be easier to understand the modeling choices I made if you see the end product. Go to http://www.fold3.com/browse.php#249|hzUkLqDmI. What you see looks like one big tree, but is actually a combination of trees spliced together. There is one tree in a relational database that forms what I call the top-level browse. The top-level browse is used to navigate through categories until you arrive at a publication. When you drill down into a publication, you are then viewing data stored in Cassandra. The link provided above points to the root of a publication (in this case, maps from the Civil War), so to the left is the top-level browse coming from MySQL, and to the right is the Cassandra browse. Each publication has an independent tree in Cassandra, with all trees stored in the same set of tables (I do not dynamically create tables for each publication — I personally think that’s a bad practice). We currently have 458 publications, and collectively they have about half a billion nodes and consume about 400 GB (RF=3).

My trees are immutable. When there are changes to a publication (e.g. adding new documents), it is very difficult to know what changes need to be made to the tree to edit it in place. Also, it would be impossible to maintain navigational consistency while a tree is in the process of being restructured. So, when a publication changes, I create a completely new tree. Once the new tree is built, I change a reference to point to the new tree. I have a process that nightly pages through the tables and deletes records that belong to obsolete trees. This process takes about five hours. If it were much longer than that, I would probably run it weekly. My database has quite a bit of churn, which is fairly unusual for a Cassandra-based application. Most nights build two or three trees, generally resulting in a few tens of millions of new records and slightly fewer deletions. Size-tiered compaction is a bad choice for churn, so I use leveled compaction. Most publications are at most a few million nodes, and generally build in less than 20 minutes.
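The rebuild-and-swap pattern above can be sketched in miniature. This is purely illustrative; the class and method names are hypothetical, not from the actual implementation:

```java
import java.util.HashMap;
import java.util.Map;

// Minimal in-memory sketch of the rebuild-and-swap pattern: build the
// whole new tree first, then flip the reference, so readers never see
// a half-built tree. Names here are illustrative assumptions.
public class TreeSwap {
    // publication id -> currently live tree id (the role the "tree" table plays)
    private final Map<Integer, Integer> liveTree = new HashMap<>();

    // Called only after every row of the new tree has been written.
    public void publish(int pub, int newTreeId) {
        liveTree.put(pub, newTreeId);
    }

    // The nightly sweep deletes rows whose tree id is no longer referenced.
    public boolean isObsolete(int pub, int treeId) {
        return !Integer.valueOf(treeId).equals(liveTree.get(pub));
    }
}
```

Until the reference flips, the old tree id stays live, which is why a publication can briefly have two trees during a build.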

Since any modeling exercise requires knowing the queries, I should describe that before getting into the model. Here are the features I need to support. For browsing the tree, I need to be able to get the children of a node (paginated), the siblings of a node (also paginated), and the ancestors of a node. The leaves of each tree are images and form a filmstrip. You can use the filmstrip to navigate through all the images in a publication in the tree’s natural order. If you go to my browse page and keep drilling down, you’ll eventually get to an image. The filmstrip appears at the bottom of the image viewer.

Before I discuss the schema, I should discuss a couple of other non-obvious things that are relevant to the data model. One very common operation is to retrieve a node and all of its ancestors in order to display a path. Denormalization would suggest that I store the data for each node along with that of all of its ancestors. That would mean that in my biggest tree, I would store the root node 180 million times. I didn’t consider that kind of bloat to be acceptable, so I do not denormalize ancestors. I also wanted to retrieve a node and its ancestors in constant time, rather than the O(n) that would be typical for tree traversal. To accomplish this, I use a fairly unusual idea for a node's primary key. I create a hash from information in the node, and then append it to the hash of its parent. So, the primary key is really a path. When I need to retrieve a node and its ancestors, I tokenize the path and issue queries in parallel to get all the nodes in the ancestry at the same time. In keeping with this pattern of not denormalizing, my auxiliary tables do not have node data in them, but instead provide a means of getting hash paths, which I then tokenize and make parallel requests with. Requests that use an auxiliary table generally just query it for the hash path, and then retrieve the node and its ancestors from the node table. Three or fewer trips to Cassandra are sufficient for all my API’s.
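The hash-path key can be sketched roughly like this. The post doesn't give the real hash function, token width, or encoding, so fixed-width 8-character tokens are assumed here purely for illustration:

```java
import java.util.ArrayList;
import java.util.List;

// Rough sketch of the hash-path primary key idea. The fixed 8-character
// token width is an illustrative assumption, not the actual scheme.
public class HashPath {
    static final int TOKEN_LEN = 8;

    // A child's key is its own hash token appended to its parent's path.
    static String childPath(String parentPath, String childToken) {
        return parentPath + childToken;
    }

    // Tokenizing a path yields the key of the node and of every ancestor,
    // so all of them can be fetched from the node table in parallel.
    static List<String> ancestorPaths(String hpath) {
        List<String> keys = new ArrayList<>();
        for (int end = TOKEN_LEN; end <= hpath.length(); end += TOKEN_LEN) {
            keys.add(hpath.substring(0, end));
        }
        return keys;
    }
}
```

A path of depth d tokenizes into d keys, so the whole ancestry comes back in one round of parallel single-partition reads rather than d sequential hops.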

Without further ado, here’s my schema (with commentary):

CREATE TABLE tree (
tree INT, 
pub INT,
rhpath VARCHAR, 
atime TIMESTAMP,
ccount INT,
ncount INT,
PRIMARY KEY (tree)
) WITH gc_grace_seconds = 864000;

This table maintains the references to the root nodes for each tree. pub is the primary key for the publication table in my relational database. There is usually just one record for each publication. When a tree is being built (and until the old one is retired), a publication may have more than one tree. This table is small (458 records), and I cache it in memory and maintain inverted indexes. Not caching this table would result in one additional trip to Cassandra for all API’s. Each tree is assigned a random tree ID (which is an INT instead of UUID for reasons beyond this discussion). All my tables have a tree ID in them so I can know which tree each node belongs to, since the hash path is not unique.
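Caching a small table like this with an inverted index might look something like the following sketch (class and method names are hypothetical, not the author's code):

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Illustrative in-memory cache of the small "tree" table, with an
// inverted index from publication id to tree ids. A publication
// usually maps to one tree, but can briefly map to more mid-build.
public class TreeCache {
    private final Map<Integer, Integer> pubByTree = new HashMap<>();
    private final Map<Integer, Set<Integer>> treesByPub = new HashMap<>();

    // Load one row of the tree table into the cache.
    public void put(int tree, int pub) {
        pubByTree.put(tree, pub);
        treesByPub.computeIfAbsent(pub, k -> new HashSet<>()).add(tree);
    }

    // Inverted lookup: which trees belong to this publication?
    public Set<Integer> treesFor(int pub) {
        return treesByPub.getOrDefault(pub, Collections.emptySet());
    }
}
```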

CREATE TABLE node (
hpath VARCHAR, 
tree INT,
node VARCHAR,
ccount INT,
PRIMARY KEY (hpath, tree)
) WITH gc_grace_seconds = 0 AND compaction = { 'class' : 'LeveledCompactionStrategy', 'sstable_size_in_mb' : 160 };

This is the main table that holds all the text and ordinal information for all my nodes. This table contains the lion’s share of the data in my cluster. The node column is a JSON document. ccount is the number of child nodes.

CREATE TABLE path_by_page (
page BIGINT, 
tree INT,
hpath VARCHAR,
pord INT,
ord INT,
PRIMARY KEY (page, tree)
) WITH gc_grace_seconds = 0 AND compaction = { 'class' : 'LeveledCompactionStrategy', 'sstable_size_in_mb' : 160 };

This table allows me to get the hash path for any image. page is the primary key of the page from my relational database. pord is the image’s ordinal in the publication filmstrip. ord is the page’s ordinal amongst its siblings.

CREATE TABLE path_by_pub (
tree INT,
bucket INT,
pord INT,
ord INT,
hpath VARCHAR,
page BIGINT,
PRIMARY KEY ((tree, bucket), pord)
) WITH gc_grace_seconds = 0 AND compaction = { 'class' : 'LeveledCompactionStrategy', 'sstable_size_in_mb' : 160 };

This table allows me to do the filmstrip. pord is what I key off of to paginate.

CREATE TABLE path_by_parent (
phpath VARCHAR,
bucket INT,
tree INT,
ord INT,
hpath VARCHAR,
PRIMARY KEY ((phpath, bucket, tree), ord)
) WITH gc_grace_seconds = 0 AND compaction = { 'class' : 'LeveledCompactionStrategy', 'sstable_size_in_mb' : 160 };

This table allows me to get the children for a node. ord is the node's ordinal within its siblings. It is what I key off of to paginate. The link I provided above is to a publication that has a fanout of 1 for the leaf node’s parent nodes, so isn’t very interesting (although the content is very interesting). Here’s a more interesting node that has bigger fanout: http://www.fold3.com/browse.php#246|hrRUXqv6Rj6GFxKAo. And finally, here’s a node with a fanout of 378622: http://www.fold3.com/browse.php#1|hhqJwp03TQBwCAyoD.
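The bucket column in these path tables keeps any single partition from growing without bound under large fanout. The post doesn't say how buckets are assigned; one plausible scheme, shown here strictly as an assumption, derives the bucket from the ordinal:

```java
// Hypothetical bucket arithmetic for the bucketed partitions above.
// BUCKET_SIZE and the ordinal-to-bucket mapping are assumptions, not
// taken from the original post.
public class Buckets {
    static final int BUCKET_SIZE = 10_000;

    // Ordinal N lands in partition N / BUCKET_SIZE, so paging walks
    // ord within a bucket and then advances to the next bucket.
    static int bucketFor(int ord) {
        return ord / BUCKET_SIZE;
    }

    static int firstOrdInBucket(int bucket) {
        return bucket * BUCKET_SIZE;
    }
}
```

Something along these lines would explain the getBucket(start) call that appears in the query-execution example later in the thread.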

As long as this post is, it probably wasn’t enough to fully understand everything I do with my schema. I have dozens of queries. If anyone would like me to dig a little deeper, I’d be happy to. Just email me.

Robert 

On Mar 27, 2015, at 5:35 PM, Ben Bromhead <[hidden email]> wrote:

+1 would love to see how you do it

On 27 March 2015 at 07:18, Jonathan Haddad <[hidden email]> wrote:
I'd be interested to see that data model. I think the entire list would benefit!






Re: Arbitrary nested tree hierarchy data model

Robert Wille-2
In reply to this post by Robert Wille-2
Ben Bromhead sent an email to me directly and expressed an interest in seeing some of my queries. I may as well post them for everyone. Here are my queries for the part of my code that reads and cleans up browse trees.

@NamedCqlQueries({
@NamedCqlQuery(
name = DocumentBrowseDaoImpl.Q_CHECK_TREE_EXISTS,
query = "SELECT tree FROM tree WHERE tree = :tree",
keyspace = KeyspaceFamilyImpl.BROWSE,
consistencyLevel = ConsistencyLevel.LOCAL_QUORUM
),
@NamedCqlQuery(
name = DocumentBrowseDaoImpl.Q_GET_ALL_TREES,
query = "SELECT tree, atime, pub, rhpath FROM tree",
keyspace = KeyspaceFamilyImpl.BROWSE,
consistencyLevel = ConsistencyLevel.LOCAL_QUORUM
),
@NamedCqlQuery(
name = DocumentBrowseDaoImpl.Q_GET_ALL_DOC_BROWSE_TREE,
query = "SELECT tree FROM tree",
keyspace = KeyspaceFamilyImpl.BROWSE,
consistencyLevel = ConsistencyLevel.LOCAL_QUORUM
),
@NamedCqlQuery(
name = DocumentBrowseDaoImpl.Q_GET_ALL_DOC_BROWSE_NODE,
query = "SELECT hpath, tree FROM node",
keyspace = KeyspaceFamilyImpl.BROWSE,
consistencyLevel = ConsistencyLevel.LOCAL_ONE
),
@NamedCqlQuery(
name = DocumentBrowseDaoImpl.Q_GET_ALL_DOC_BROWSE_INDEX_PAGE,
query = "SELECT page, tree FROM path_by_page",
keyspace = KeyspaceFamilyImpl.BROWSE,
consistencyLevel = ConsistencyLevel.LOCAL_ONE
),
@NamedCqlQuery(
name = DocumentBrowseDaoImpl.Q_GET_ALL_DOC_BROWSE_INDEX_PUB,
query = "SELECT DISTINCT tree, bucket FROM path_by_pub",
keyspace = KeyspaceFamilyImpl.BROWSE,
consistencyLevel = ConsistencyLevel.LOCAL_ONE
),
@NamedCqlQuery(
name = DocumentBrowseDaoImpl.Q_GET_ALL_DOC_BROWSE_INDEX_CHILD,
query = "SELECT DISTINCT phpath, bucket, tree FROM path_by_parent",
keyspace = KeyspaceFamilyImpl.BROWSE,
consistencyLevel = ConsistencyLevel.LOCAL_ONE
),
@NamedCqlQuery(
name = DocumentBrowseDaoImpl.Q_CLEAN_DOC_BROWSE_TREE,
query = "DELETE FROM tree WHERE tree IN :tree",
keyspace = KeyspaceFamilyImpl.BROWSE,
consistencyLevel = ConsistencyLevel.LOCAL_ONE
),
@NamedCqlQuery(
name = DocumentBrowseDaoImpl.Q_CLEAN_DOC_BROWSE_NODE,
query = "DELETE FROM node WHERE hpath IN :hpath AND tree = :tree",
keyspace = KeyspaceFamilyImpl.BROWSE,
consistencyLevel = ConsistencyLevel.LOCAL_ONE
),
@NamedCqlQuery(
name = DocumentBrowseDaoImpl.Q_CLEAN_DOC_BROWSE_INDEX_PAGE,
query = "DELETE FROM path_by_page WHERE page IN :page AND tree = :tree",
keyspace = KeyspaceFamilyImpl.BROWSE,
consistencyLevel = ConsistencyLevel.LOCAL_ONE
),
@NamedCqlQuery(
name = DocumentBrowseDaoImpl.Q_CLEAN_DOC_BROWSE_INDEX_PUB,
query = "DELETE FROM path_by_pub WHERE tree = :tree AND bucket IN :bucket",
keyspace = KeyspaceFamilyImpl.BROWSE,
consistencyLevel = ConsistencyLevel.LOCAL_ONE
),
@NamedCqlQuery(
name = DocumentBrowseDaoImpl.Q_CLEAN_DOC_BROWSE_INDEX_CHILD,
query = "DELETE FROM path_by_parent WHERE phpath = :phpath AND bucket = :bucket AND tree IN :tree",
keyspace = KeyspaceFamilyImpl.BROWSE,
consistencyLevel = ConsistencyLevel.LOCAL_ONE
),
@NamedCqlQuery(
name = DocumentBrowseDaoImpl.Q_GET_MAX_ORDINAL,
query = "SELECT pord FROM path_by_pub WHERE tree = :tree AND bucket = :bucket ORDER BY pord DESC",
keyspace = KeyspaceFamilyImpl.BROWSE,
consistencyLevel = ConsistencyLevel.LOCAL_QUORUM
),
@NamedCqlQuery(
name = DocumentBrowseDaoImpl.Q_GET_PAGE,
query = "SELECT page, tree, ord, hpath FROM path_by_page WHERE page = :page AND tree = :tree",
keyspace = KeyspaceFamilyImpl.BROWSE,
consistencyLevel = ConsistencyLevel.LOCAL_ONE
),
@NamedCqlQuery(
name = DocumentBrowseDaoImpl.Q_GET_PAGE_ALL_TREES,
query = "SELECT page, tree, ord, hpath FROM path_by_page WHERE page = :page",
keyspace = KeyspaceFamilyImpl.BROWSE,
consistencyLevel = ConsistencyLevel.LOCAL_ONE
),
@NamedCqlQuery(
name = DocumentBrowseDaoImpl.Q_GET_NODE,
query = "SELECT tree, hpath, node, ccount FROM node WHERE hpath = :hpath AND tree = :tree",
keyspace = KeyspaceFamilyImpl.BROWSE,
consistencyLevel = ConsistencyLevel.LOCAL_ONE
),
@NamedCqlQuery(
name = DocumentBrowseDaoImpl.Q_GET_NODE_ALL_TREES,
query = "SELECT tree, hpath, node, ccount FROM node WHERE hpath = :hpath",
keyspace = KeyspaceFamilyImpl.BROWSE,
consistencyLevel = ConsistencyLevel.LOCAL_ONE
),
@NamedCqlQuery(
name = DocumentBrowseDaoImpl.Q_GET_TREE_FOR_HASHPATH,
query = "SELECT tree, node FROM node WHERE hpath = :hpath",
keyspace = KeyspaceFamilyImpl.BROWSE,
consistencyLevel = ConsistencyLevel.LOCAL_ONE
),
@NamedCqlQuery(
name = DocumentBrowseDaoImpl.Q_GET_CHILDREN,
query = "SELECT hpath FROM path_by_parent WHERE phpath = :phpath AND bucket = :bucket AND tree = :tree AND ord >= :ord",
keyspace = KeyspaceFamilyImpl.BROWSE,
consistencyLevel = ConsistencyLevel.LOCAL_ONE
),
@NamedCqlQuery(
name = DocumentBrowseDaoImpl.Q_GET_ALL_CHILDREN,
query = "SELECT hpath FROM path_by_parent WHERE phpath = :phpath AND bucket = :bucket AND tree = :tree",
keyspace = KeyspaceFamilyImpl.BROWSE,
consistencyLevel = ConsistencyLevel.LOCAL_ONE
),
@NamedCqlQuery(
name = DocumentBrowseDaoImpl.Q_GET_NEIGHBORS_NEXT,
query = "SELECT hpath FROM path_by_pub WHERE tree = :tree AND bucket = :bucket AND pord > :pord ORDER BY pord",
keyspace = KeyspaceFamilyImpl.BROWSE,
consistencyLevel = ConsistencyLevel.LOCAL_ONE
),
@NamedCqlQuery(
name = DocumentBrowseDaoImpl.Q_GET_NEIGHBORS_PREVIOUS,
query = "SELECT hpath FROM path_by_pub WHERE tree = :tree AND bucket = :bucket AND pord < :pord ORDER BY pord DESC",
keyspace = KeyspaceFamilyImpl.BROWSE,
consistencyLevel = ConsistencyLevel.LOCAL_ONE
),
@NamedCqlQuery(
name = DocumentBrowseDaoImpl.Q_GET_HASHPATHS_FOR_TREE,
query = "SELECT hpath FROM path_by_pub WHERE tree = :tree AND bucket = :bucket ORDER BY pord",
keyspace = KeyspaceFamilyImpl.BROWSE,
consistencyLevel = ConsistencyLevel.LOCAL_ONE
),
@NamedCqlQuery(
name = DocumentBrowseDaoImpl.Q_GET_PAGE_INFO_FOR_TREE,
query = "SELECT page, hpath, ord, pord FROM path_by_pub WHERE tree = :tree AND bucket = :bucket ORDER BY pord",
keyspace = KeyspaceFamilyImpl.BROWSE,
consistencyLevel = ConsistencyLevel.LOCAL_ONE
),
@NamedCqlQuery(
name = DocumentBrowseDaoImpl.Q_DELETE_NODE,
query = "DELETE FROM node WHERE hpath = :hpath AND tree = :tree",
keyspace = KeyspaceFamilyImpl.BROWSE,
consistencyLevel = ConsistencyLevel.LOCAL_QUORUM
),
@NamedCqlQuery(
name = DocumentBrowseDaoImpl.Q_DELETE_PAGE,
query = "DELETE FROM path_by_page WHERE page = :page AND tree = :tree",
keyspace = KeyspaceFamilyImpl.BROWSE,
consistencyLevel = ConsistencyLevel.LOCAL_QUORUM
),
@NamedCqlQuery(
name = DocumentBrowseDaoImpl.Q_DELETE_PUB,
query = "DELETE FROM path_by_pub WHERE tree = :tree AND bucket = :bucket AND pord = :pord",
keyspace = KeyspaceFamilyImpl.BROWSE,
consistencyLevel = ConsistencyLevel.LOCAL_QUORUM
)
})

The annotations are inspired by JPA, but the design and implementation are my own. They allow me to execute statements with code that looks like this:

ResultSet rs = cassandraUtil.executeNamedQuery(Q_GET_CHILDREN, new StatementOption(StatementOption.Type.FETCH_SIZE, count), 
"phpath", parent.getHashPath(), 
"bucket", getBucket(start), 
"tree", parent.getTreeId(), 
"ord", start
);

cassandraUtil executes statements synchronously or asynchronously, logs errors and queries that take a long time, and creates histograms of query times for each statement. It also provides a convenient way of limiting the number of concurrent asynchronous queries and aggregating the results of parallel queries. It took some time to write, but is really nice. And in case you’re wondering, sorry, but I won’t donate it to the open-source community. My employer has onerous policies governing that, and it’s not really worth the hassle or my time.

A number of you seemed to have appreciated seeing my data model. I have a number of upcoming projects to migrate data from MySQL to Cassandra that I’ll be working on over the next several months. If I have any modeling exercises that seem unique or particularly clever, I’ll post them for all to enjoy.

Robert

On Mar 28, 2015, at 4:41 PM, Ben Bromhead <[hidden email]> wrote:

Thanks for that Robert. I would love to hear more (e.g. various queries etc) and even clean it up and publish it on instaclustr.com

We are starting to work with some of our customers as well as those in the community to collect solutions to common problems that people have implemented in Cassandra and publish it in a single easily accessible format. 

The other people that would probably love to see more on this would be the folk over at planet cassandra (http://planetcassandra.org/).

Cheers

Ben