Why Large Hadron Collider Scientists are Using CouchDB (readwriteweb.com)
52 points by jchrisa on Aug 26, 2010 | 15 comments


Most likely it just handles metadata or aggregated data, or serves as a proxy to their main DB.

I just can't believe they converted 10 PB of binary data into JSON ;)


Yep. MongoDB, for example, is used for their Dataset Bookkeeping Service. JSON is indeed used, but only for 'good run' lists.


CouchDB supports binary attachments on documents.
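
A minimal sketch of that standalone-attachment API, assuming a CouchDB 1.x server on localhost and an already-created database; the database, document, and file names are made up for illustration:

    import requests

    BASE = "http://localhost:5984/lhc_reports/run-142305"

    # Create a document, then attach a binary payload against its current _rev.
    rev = requests.put(BASE, json={"type": "run_report"}).json()["rev"]
    with open("histogram.root", "rb") as f:
        requests.put(
            BASE + "/histogram.root",
            params={"rev": rev},
            headers={"Content-Type": "application/octet-stream"},
            data=f,
        )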


Well, storage is basically free these days, right? :-)


I have no idea why they don't use something Dynamo-based like Riak or Cassandra, which can automatically shard data and scale really well. CouchDB only supports replication; there's no sharding unless you use third-party libraries. With Riak they could use the built-in map/reduce support, and with Cassandra they could use Hadoop to analyze data and split the workload across several nodes.
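
For a concrete sense of that map/reduce support, here's a rough sketch of a job submitted to Riak's HTTP interface as it looked in that era; the node address and bucket name are assumptions, and Riak.reduceSum is one of Riak's built-in JavaScript reduce functions:

    import requests

    # Count the objects in a bucket: map each object to [1], then sum.
    job = {
        "inputs": "run_reports",
        "query": [
            {"map": {"language": "javascript",
                     "source": "function(v) { return [1]; }"}},
            {"reduce": {"language": "javascript", "name": "Riak.reduceSum"}},
        ],
    }
    resp = requests.post("http://localhost:8098/mapred", json=job)
    print(resp.json())  # e.g. [1523]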


If you read the case study, you will see that building apps rapidly was a huge win for them. CouchDB gets compared with other DBs, but it is a full-stack app dev environment. The other NoSQL DBs need a Rails or PHP or Python or Java or C# or ... stack in front, and app dev there is nowhere near as fast as with CouchApp.
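
A toy sketch of what "full stack" means in practice: one design document holds both the index (a view) and the HTML rendering (a show function), and CouchDB serves all of it over HTTP with no separate app server. The database, view, and field names here are invented:

    import requests

    design = {
        "views": {
            "by_site": {
                "map": "function(doc) { if (doc.site) emit(doc.site, doc.jobs); }",
                "reduce": "_sum",
            }
        },
        "shows": {
            "summary": (
                "function(doc, req) {"
                "  return '<h1>' + doc.site + ': ' + doc.jobs + ' jobs</h1>';"
                "}"
            ),
        },
    }
    requests.put("http://localhost:5984/lhc_reports/_design/reports", json=design)

    # Both of these are then served straight out of CouchDB:
    #   GET /lhc_reports/_design/reports/_view/by_site?group=true
    #   GET /lhc_reports/_design/reports/_show/summary/run-142305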


Riak, e.g., also has a REST interface (https://wiki.basho.com/display/RIAK/REST+API). Could you elaborate on what else is part of the "full stack app dev"? Futon? P.S. Please don't say "Rails" when you mean "Ruby". Kittens die every time!
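
For what it's worth, the key/value part of that REST interface is a plain PUT/GET round trip; the path layout below follows the wiki page of that era, and the bucket and key are invented:

    import requests

    url = "http://localhost:8098/riak/run_reports/run-142305"
    requests.put(url, json={"site": "CERN", "jobs": 42})
    print(requests.get(url).json())  # -> {'site': 'CERN', 'jobs': 42}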


They use it to store aggregated reports for distribution and analysis. It isn't doing any heavy lifting, and it's not a terribly compelling case study of anything.


Wouldn't it be a good case study in rapid development of a web based aggregate reporting portal? :-)


I hope they didn't upgrade to 1.0.0.


For the record, I should note that CouchDB's durable storage format ensured that all the data affected by the 1.0.0 bug is recoverable. We think that in the end, because we announced the bug loudly and clearly, and provided a repair tool within days, no one lost any critical data.


If they cared about data integrity above performance, they would be running with delayed_commits=false anyway, and thus would not have been subject to the bug.
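
For reference, that setting lives in the [couchdb] section of local.ini (delayed_commits = false), and on a 1.x server it can also be flipped at runtime through the _config endpoint; the localhost URL is assumed:

    import requests

    # Trade write throughput for an fsync on every update.
    requests.put(
        "http://localhost:5984/_config/couchdb/delayed_commits",
        data='"false"',
    )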


Obviously CERN is doing heisenbug research.


I am happy to see that CERN uses Oracle extensively, presumably for the data that matters. Our tax dollars should only go to a reliable, proven database solution.


I was being sarcastic.



