Tuesday, December 20, 2016

Married With Children - Why Nested Documents are the absolute Worst Feature that Solr released

Nested Documents is a relatively new feature in Solr, and you should NEVER EVER USE IT.


So what are nested documents? It's a relatively new feature in Solr that lets a user create documents inside documents and get a hierarchical pattern. It is a convenient way to store data that is relevant only to the parent: a 1:N relationship.
How is it done? Solr doesn’t really have this hierarchical pattern in place. It has a flat structure where all the children and parents are equal. What it does behind the scenes is place the parent and its children inside the same block. A block can have only one parent, and it is referred to by a field called _root_. This field always holds the id of the parent. By querying it, a user will see the parent and all of its children.
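To make the block idea concrete, here is a toy Python sketch of the flattening described above. It is not Solr's code, only a model of the behavior: the function name flatten_block and the sample fields (title, package) are made up for illustration, and placing the parent last in the block is an assumption of this sketch.

```python
def flatten_block(parent):
    """Toy model: turn one nested document into a flat block of documents.

    Every document in the block gets a _root_ value equal to the parent's id.
    """
    root_id = parent["id"]
    block = []
    for child in parent.get("_childDocuments_", []):
        flat_child = dict(child)
        flat_child["_root_"] = root_id
        block.append(flat_child)
    # The parent itself is stored flat too, without the nested list.
    # (Putting it last in the block is an assumption of this sketch.)
    flat_parent = {k: v for k, v in parent.items() if k != "_childDocuments_"}
    flat_parent["_root_"] = root_id
    block.append(flat_parent)
    return block


# Hypothetical gig with two packages.
gig = {
    "id": "123",
    "title": "logo design",
    "_childDocuments_": [
        {"id": "123_1", "package": "basic"},
        {"id": "123_2", "package": "premium"},
    ],
}

for doc in flatten_block(gig):
    print(doc)
```

The point of the sketch: after indexing there is no real hierarchy, just flat documents that share the same _root_ value.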

All of that is nice and everything. BUT it is the absolute worst feature that Solr released. And I'll explain why.
Just some exposition: on our site we have this exact hierarchical pattern that we wanted to index. A gig is a service that a seller can offer, and inside of it there are packages. So the gig will be the parent and the packages will be the children. Very intuitive.

But then the horror began:

DUPLICATES
When it introduced the block and the _root_ field, Solr broke the uniqueness by id that it has in its spec. There can be 2 documents with the same id: one has children and the other doesn’t. One will be saved inside a block and one will not. But the end result will be 2 documents.
For example, the docs {id: 123} and {id: 123, '_childDocuments_': [{id: 123_1}, {id: 123_2}]} will create 2 documents(!) The new way of identifying a unique document is by its _root_ field together with its id. So in the example, the document without children will have _root_: nil, id: 123, and the document with children will have _root_: 123, id: 123.
I mean - what the hell? Why? After we first launched the feature, we got a million duplicates, and no one knew why. And of course there is no documentation whatsoever. We only found it afterwards on Stack Overflow and in open issues. So we changed our environment to always index with children.
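If you want to check whether you already have this problem, something like the following Python sketch can help. It assumes you have already pulled the id and _root_ of every document out of the index by some means (not shown here); the function name find_duplicate_ids and the sample data are hypothetical.

```python
from collections import defaultdict


def find_duplicate_ids(docs):
    """Return the ids that appear under more than one _root_ value.

    docs is an iterable of dicts with an "id" key and an optional "_root_"
    key (missing or None means the document is not inside a block).
    """
    roots_by_id = defaultdict(set)
    for doc in docs:
        roots_by_id[doc["id"]].add(doc.get("_root_"))
    return sorted(doc_id for doc_id, roots in roots_by_id.items() if len(roots) > 1)


# Sample data mirroring the example above: id 123 exists twice,
# once outside a block and once as the parent of a block.
indexed = [
    {"id": "123", "_root_": None},
    {"id": "123", "_root_": "123"},
    {"id": "123_1", "_root_": "123"},
    {"id": "123_2", "_root_": "123"},
    {"id": "456", "_root_": None},
]

print(find_duplicate_ids(indexed))
```

Note that this only flags ids that show up with different _root_ values, which is the duplicate pattern described in this section.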

UPDATES
When updating a document that has children, a user must always send the document together with its children. Updating it without its children will cause the same duplicate-id behavior, because it will be created outside a block. And that really sucks. Think about an existing indexing architecture where you now need to change everything so that every update includes the children.

ATOMIC UPDATES
You can’t use atomic updates on documents that have children. Even when the children are included inside the atomic update, it won’t work. The atomic update causes UNSTABLE behavior where one time the document is saved with its children and another time it isn’t. It also creates an even more unsettling behavior where children “stick” to the next parent in the next block. So another parent might end up with more children than it had before the operation happened (!!)
So think about this situation: we have a lot of atomic updates happening. We check them. Everything looks good. Then we go to production. Suddenly millions of children are switching parents. Why? What the f*ck, Solr? There seems to be some race condition that causes this weirdness. Anyway, all of our environment now uses full updates instead of atomic updates, even though atomic updates are so much better. And that (again) sucks.
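Here is a rough Python sketch of what "full updates instead of atomic updates" means in practice. It assumes you can load the complete current document, children included, from your own source of truth (that part is not shown). The helper name atomic_to_full_update is hypothetical, and it only handles the "set" modifier of Solr's atomic update syntax.

```python
import copy


def atomic_to_full_update(current_doc, atomic_update):
    """Apply an atomic-style change to a full copy of the document.

    The result is a complete document, children included, meant to be
    re-indexed as a whole instead of sending the atomic update itself.
    Only {"set": value} modifiers are handled in this sketch.
    """
    if atomic_update.get("id") != current_doc.get("id"):
        raise ValueError("atomic update targets a different document")
    full_doc = copy.deepcopy(current_doc)
    for field, modifier in atomic_update.items():
        if field == "id":
            continue
        if not (isinstance(modifier, dict) and set(modifier) == {"set"}):
            raise ValueError(f"only 'set' modifiers are handled here: {field}")
        full_doc[field] = modifier["set"]
    return full_doc


# Hypothetical gig as it currently exists, with its packages.
current = {
    "id": "123",
    "price": 5,
    "_childDocuments_": [{"id": "123_1"}, {"id": "123_2"}],
}
# What we would have liked to send as an atomic update.
atomic = {"id": "123", "price": {"set": 10}}

print(atomic_to_full_update(current, atomic))
```

The price you pay is obvious: every small change now means loading and re-sending the whole block.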

DELETING DOCUMENTS
A user must delete the children first and then the parents. Deleting a parent by id should be replaced by deleting by _root_. Deleting by query should first delete all the children and then delete all the parents.
Deleting only the parents creates unstable behavior with orphaned children. Orphaned children will also “stick” to the next parent in the next block. So again, think about it - someone accidentally deletes a parent, and its children won't disappear. But not only that - they go to a nearby parent(!) WTF? How could that be?
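As a small Python sketch of "delete by _root_ instead of by id": the function below builds a delete-by-query command in Solr's JSON update format that targets the whole block. The helper name delete_block_command is hypothetical, quoting the id is a precaution of this sketch, and actually posting the command to your update handler is left out.

```python
import json


def delete_block_command(parent_id):
    """Build a JSON delete-by-query command for a parent and its children.

    Matching on _root_ instead of id is meant to remove the whole block,
    so no children are left behind as orphans.
    """
    # Quote the id and escape backslashes and quotes so that special
    # characters in the id do not change the meaning of the query.
    escaped = str(parent_id).replace("\\", "\\\\").replace('"', '\\"')
    return {"delete": {"query": f'_root_:"{escaped}"'}}


command = delete_block_command("123")
print(json.dumps(command))
```

Compare that with a plain delete by id, which only targets the parent and leaves the children where they are.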

SO WHY DID WE END UP USING IT?
So why do we use them? We invested a lot of time in building our environment around children. It was a perfect honey trap: each time we saw a problem ahead, we said "hmm... that will take about a week to fix. Better to just do that instead of reverting everything, which will take 2-3 weeks". But each time we found another new and exciting bug.

So beware - it's too late for us. But you can still save yourself.
Our environment is pretty much stable now. But still - from time to time we get parent dups, children dups, orphans, and parents with mismatched children. We have so many scripts and jobs to fix this fragile situation, but still - it's just too unstable.

REFERENCES

https://issues.apache.org/jira/browse/SOLR-6596

http://stackoverflow.com/questions/5584857/solr-documents-with-child-elements

http://yonik.com/solr-nested-objects/

https://medium.com/@alisazhila/solr-s-nesting-on-solr-s-capabilities-to-handle-deeply-nested-document-structures-50eeaaa4347a#.ras3cfw1v

https://people.apache.org/~hossman/bbuzz2014/