Why Large Hadron Collider Scientists are Using CouchDB (readwriteweb.com)
52 points by jchrisa on Aug 26, 2010 | 15 comments


Most likely it just handles metadata or aggregated data, or serves as a proxy to their main DB.

I just can't believe they converted 10 PB of binary data into JSON ;)


Yep. MongoDB, for example, is used for their Dataset Bookkeeping Service. JSON is indeed used, but only for 'good run' lists.


CouchDB supports binary attachments on documents.
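
A minimal sketch of that standalone-attachment API, assuming a CouchDB 1.x server on localhost and an already-created database; the database, document, and file names are made up for illustration:

    import requests

    BASE = "http://localhost:5984/lhc_reports/run-142305"

    # Create a document, then attach a binary payload against its current _rev.
    rev = requests.put(BASE, json={"type": "run_report"}).json()["rev"]
    with open("histogram.root", "rb") as f:
        requests.put(
            BASE + "/histogram.root",
            params={"rev": rev},
            headers={"Content-Type": "application/octet-stream"},
            data=f,
        )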


Well, storage is basically free these days, right? :-)


I have no idea why they don't use something Dynamo-based like Riak or Cassandra, which can automatically shard data and scale really well. CouchDB only supports replication; there's no sharding unless you use third-party libraries. With Riak they could use the built-in map/reduce support, and with Cassandra they could use Hadoop to analyze data and split the workload across several nodes.
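
For a concrete sense of that map/reduce support, here's a rough sketch of a job submitted to Riak's HTTP interface as it looked in that era; the node address and bucket name are assumptions, and Riak.reduceSum is one of Riak's built-in JavaScript reduce functions:

    import requests

    # Count the objects in a bucket: map each object to [1], then sum.
    job = {
        "inputs": "run_reports",
        "query": [
            {"map": {"language": "javascript",
                     "source": "function(v) { return [1]; }"}},
            {"reduce": {"language": "javascript", "name": "Riak.reduceSum"}},
        ],
    }
    resp = requests.post("http://localhost:8098/mapred", json=job)
    print(resp.json())  # e.g. [1523]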


If you read the case study, you will see that building apps rapidly was a huge win for them. CouchDB gets compared with other DBs, but it is a full-stack app dev environment. The other NoSQL DBs need a Rails or PHP or Python or Java or C# or ... stack in front, and app dev there is nowhere near as fast as with CouchApp.
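
A toy sketch of what "full stack" means in practice: one design document holds both the index (a view) and the HTML rendering (a show function), and CouchDB serves all of it over HTTP with no separate app server. The database, view, and field names here are invented:

    import requests

    design = {
        "views": {
            "by_site": {
                "map": "function(doc) { if (doc.site) emit(doc.site, doc.jobs); }",
                "reduce": "_sum",
            }
        },
        "shows": {
            "summary": (
                "function(doc, req) {"
                "  return '<h1>' + doc.site + ': ' + doc.jobs + ' jobs</h1>';"
                "}"
            ),
        },
    }
    requests.put("http://localhost:5984/lhc_reports/_design/reports", json=design)

    # Both of these are then served straight out of CouchDB:
    #   GET /lhc_reports/_design/reports/_view/by_site?group=true
    #   GET /lhc_reports/_design/reports/_show/summary/run-142305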


Riak, e.g., also has a REST interface (https://wiki.basho.com/display/RIAK/REST+API). Could you elaborate on what else is part of the "full stack app dev"? Futon? P.S. Please don't say "Rails" when you mean "Ruby". Kittens die every time!
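
For what it's worth, the key/value part of that REST interface is a plain PUT/GET round trip; the path layout below follows the wiki page of that era, and the bucket and key are invented:

    import requests

    url = "http://localhost:8098/riak/run_reports/run-142305"
    requests.put(url, json={"site": "CERN", "jobs": 42})
    print(requests.get(url).json())  # -> {'site': 'CERN', 'jobs': 42}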


They use it to store aggregated reports for distribution and analysis. It isn't doing any heavy lifting, and it's not a terribly compelling case study of anything.


Wouldn't it be a good case study in rapid development of a web based aggregate reporting portal? :-)


I hope they didn't upgrade to 1.0.0.


For the record, I should note that CouchDB's durable storage format ensured that all the data affected by the 1.0.0 bug is recoverable. We think that in the end, because we announced the bug loudly and clearly, and provided a repair tool within days, no one lost any critical data.


If they cared about data integrity above performance, they would be running with delayed_commits=false anyway, and thus would not have been subject to the bug.
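
For reference, that setting lives in the [couchdb] section of local.ini (delayed_commits = false), and on a 1.x server it can also be flipped at runtime through the _config endpoint; the localhost URL is assumed:

    import requests

    # Trade write throughput for an fsync on every update.
    requests.put(
        "http://localhost:5984/_config/couchdb/delayed_commits",
        data='"false"',
    )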


Obviously CERN is doing heisenbug research.


I am happy to see that CERN uses Oracle extensively, presumably for the data that matters. Our tax dollars should only go to a reliable, proven database solution.


I was being sarcastic.



