Michelangelo: Uber’s Machine Learning Platform (uber.com)
268 points by holografix on Sept 6, 2017 | hide | past | favorite | 93 comments


This reminds me of LivingSocial. For years, LivingSocial was like a mini black hole of really good startup-like talent in the DC area.

But when you have that many smart people under one roof, you have to keep them busy. So you get these kinds of reinventing-a-beautiful-wheel projects. Not knocking the quality, just knocking the growing distraction and temptation to build in-house rather than use existing tech.

LivingSocial had such good quality output, but they were all over the place. They had to keep all those productive people busy after all. And the best way was to keep churning out new products.


I agree. "Keeping them busy" also raises an uncomfortable question: if Uber and other highly valued unicorns can execute 99.999% of their mission with 25% of the engineers they hire, do they need the other 75%?

Even considering crunch-time and potential diversification/expansion, full-time employment of non-critical engineers is obviously inefficient. And, as Uber proved, politically toxic and consequently harmful to the productivity of the business.

More pressing is the existential tension underlying this quandary: that our economy will generally and increasingly forgo the skilled and unskilled labor our society provides.

Framed concretely: one day, we may define a "10x engineer" as an engineer who can send ten other engineers to the soup kitchen.

I'm still looking for a well-thought-out solution.


This is not really new; just think of all the stuff that came out of AT&T labs...


Yeah, as a techie that used to live in DC, LivingSocial people were always at the Meetups. I was always blown away by the talent being used to work on a shitty coupon app.


Are there any ML-as-a-service platforms that offer something comparable to what Uber has here? And importantly, do you actually like using them?

That is, suppose I'm willing to do the following:

* Send said service a firehose of data

* Write code for feature selection, data cleanup, transformations, etc. -- i.e. the sort of code you'd expect to find in a data scientist's Jupyter notebook.

* Specify the type of ML model and parameters I'd like to experiment with.

Is there a service that handles data warehousing, serving up predictions via a REST API, deployment of new models, reporting model performance over time, and general scaling issues? I know it's possible to set up a large portion of this using AWS services, but ideally, it'd be nice to have something with a Heroku-like ease-of-use, but for ML.
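For the "serving up predictions via a REST API" part at least, what most of these platforms put in front of you is thin: load a versioned model artifact and expose a JSON-over-HTTP predict endpoint. A minimal stdlib-only Python sketch of that pattern (the `LinearModel` class here is a hypothetical stand-in for whatever artifact your training step actually produces):

```python
import json
import pickle
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical stand-in for a trained model artifact: a hand-rolled
# linear scorer. In a real system this would come out of a model store.
class LinearModel:
    def __init__(self, weights, bias):
        self.weights, self.bias = weights, bias

    def predict(self, features):
        return sum(w * x for w, x in zip(self.weights, features)) + self.bias

# Simulate loading a pickled, versioned model from storage.
MODEL = pickle.loads(pickle.dumps(LinearModel([0.5, -0.2], 1.0)))

class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Expect {"features": [...]} and return {"prediction": ...}.
        body = json.loads(self.rfile.read(int(self.headers["Content-Length"])))
        payload = json.dumps({"prediction": MODEL.predict(body["features"])}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, *args):  # silence per-request logging
        pass

# To serve: HTTPServer(("", 8080), PredictHandler).serve_forever()
```

The hard parts the original question asks about (warehousing, deployment, monitoring, scaling) are everything around this endpoint, which is exactly what the platforms in the replies below try to package up.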


Disclosure: I work for Google. Google Cloud does this with Dataflow and ML Engine in a fairly simple way, and the internals are open sourced as Apache Beam and TensorFlow so you can migrate to other platforms if you want.


Not quite there, but parts are:

* You can send your firehose to BigQuery, and write Javascript UDFs to transform the data.

* You can use a service like DataRobot to automatically train a bunch of models.

* You can deploy the trained models using a serverless platform like Lambda.

Nothing that's one-size-fits-all, but watch out for trends in that direction as more ML workflows get standardized.


Skymind's SKIL does this. (I work on the team.) You don't have to send it your data though, you can just install it inside your system and have it connect to the data source or stream to train models (optionally doing hyper-parameter search), deploy them to REST servers and monitor their performance.


Azure ML


Algorithmia Enterprise (algorithmia.com/enterprise). Inference- and deep learning-centric. (I'm part of the team).


You should take a look at DataRobot, it does most of the things you've listed.


I wrote MLJAR (https://mljar.com) that is doing model and parameters search - and it helps me a lot with ML projects.


I wonder if this thing is used for Uber Eats in Singapore... Because I use Uber Eats all the time, and the estimated time is MASSIVELY inaccurate. It's not even within 30% of the actual time.

The estimated time is always less than the actual time, but every minute the estimate re-adjusts. By the time it's actually delivered, it's about 15 to 20 minutes more than the original estimate.

Note: I'm just complaining because the article uses Uber Eats as an example, and as a service I use often, I wonder if this ML platform is used at all...


Maybe they did use ML in your scenario just not the way you thought:

It takes 50 mins to get your delivery, but to make you order they need to tell you it's going to take 30 mins. ML also tells them that the 20-minute discrepancy won't stop you ordering next time.

Cycle continues.


Chuckle. Same experience. As far as I can tell the estimated time is about as good as the Windows 3.1 file copy time estimation.


Maybe they trained on global data and it's biased against the portion of the dataset that represents Singapore?


Wouldn't the ML learn that times differ globally tho?


Only with the right models and features.


ML is only as good as its inputs, and if it doesn't have any Singaporean inputs then it's gonna potentially suck for Singapore if Singapore is an outlier.


This solves a real problem. ML pipelines are complicated, and often involve multiple languages and platforms (just as described in this post).

But... there are existing solutions in this area. Notably, Luigi[1] (from Spotify) and Airflow[2] (from Airbnb) both seem to have a lot of overlap with this.

I'm most familiar with Luigi, and it does many of the things that are listed here.

Some of the model visualisations look pretty nice, and don't come out-of-the-box in the other platforms.

So I'm not really sure what makes this unique.

[1] https://github.com/spotify/luigi

[2] https://airflow.incubator.apache.org/


What? Luigi and Airflow are ETL frameworks. They express dependencies between tasks, schedule their execution, and that's about it. Model storage and versioning, feature definition and extraction (batch and real-time), model deployment... Luigi and Airflow do almost nothing described in this post.


They do model/feature storage and versioning just fine. Or at least Luigi does - I use it daily.

And feature extraction and modelling is my main usecase. It works really well, across Spark, Scikit and R, saving data in HDFS.

Some examples:

https://blog.dominodatalab.com/luigi-pipelines-domino/

http://blog.richardweiss.org/2016/10/13/kaggle-with-luigi.ht...

https://github.com/Atreya22/luigi_rosmann_sales
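For anyone wondering what "model/feature storage and versioning" means in a Luigi-style pipeline, the core idiom is small. Here it is as a plain-Python sketch (no Luigi dependency; Luigi wraps the same idea in its Task/Target classes): a task's output path encodes a hash of its parameters, the file's existence marks the task complete, and a new parameter set is automatically a new version.

```python
import hashlib
import json
import os

def target_path(root, task_name, params):
    """Output path keyed by task name plus a parameter hash (the 'version')."""
    digest = hashlib.sha1(json.dumps(params, sort_keys=True).encode()).hexdigest()[:8]
    return os.path.join(root, "%s-%s.json" % (task_name, digest))

def run_task(root, task_name, params, compute):
    """Run `compute(params)` only if this (task, params) version isn't
    already materialized; otherwise return the cached output."""
    path = target_path(root, task_name, params)
    if not os.path.exists(path):  # analogous to Luigi's complete() check
        with open(path, "w") as f:
            json.dump(compute(params), f)
    with open(path) as f:
        return json.load(f)
```

A feature-extraction task then runs once per distinct parameter set; re-running with the same parameters is a cheap cache hit, and changing a parameter produces (and keeps) a new versioned artifact alongside the old one.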


Yeah, it does all those things, as long as you write all the code, set up all the systems for it to interact with, and manage all the logic yourself. That's like saying Python does all those things.


It's not exactly clear that Michelangelo does anything different?

A quote:

We provide containers and scheduling to run regular jobs to compute features which can be made private to a project or published to the Feature Store (see below) and shared across teams, while batch jobs run on a schedule or a trigger and are integrated with data quality monitoring tools to quickly detect regressions in the pipeline–either due to local or upstream code or data issues.

That certainly sounds like you write code to run inside the containers. The deep integration with the standardized feature store etc sounds nice, but not radically different.

we created a DSL (domain specific language) that modelers use to select, transform, and combine the features that are sent to the model at training and prediction times. The DSL is implemented as sub-set of Scala.

So this is pretty much the equivalent of Spark Dataset API, or maybe the RFormula stuff in Spark ML[1] except in Scala right?

[1] https://spark.apache.org/docs/latest/ml-features.html#rformu...
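For concreteness, here's a toy illustration (mine, not Uber's DSL and not Spark's actual RFormula implementation) of what an R-formula-style feature DSL buys you: a string like "label ~ colA + colB" names a label column and assembles the listed columns into a per-row feature vector.

```python
def apply_formula(formula, rows):
    """Parse 'label ~ col1 + col2 + ...' and turn each row (a dict)
    into a (feature_vector, label) pair."""
    label_col, rhs = [s.strip() for s in formula.split("~")]
    feature_cols = [c.strip() for c in rhs.split("+")]
    return [([row[c] for c in feature_cols], row[label_col]) for row in rows]

rows = [{"fare": 12.5, "dist_km": 3.1, "hour": 8},
        {"fare": 30.0, "dist_km": 9.4, "hour": 17}]
dataset = apply_formula("fare ~ dist_km + hour", rows)
```

The real versions (RFormula, or Uber's Scala subset) additionally handle transforms like one-hot encoding and interactions, but the select-and-combine core is the same shape.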


Pachyderm (github.com/pachyderm/pachyderm) combines model versioning with data pipelining. It's basically Domino meets Luigi/Airflow.

Pachyderm.io (I'm one of the founders)


>For every model that is trained in Michelangelo, we store a versioned object in our model repository in Cassandra.

>For important model types, we provide sophisticated visualization tools to help modelers understand why a model behaves as it does, as well as to help debug it if necessary

How is this done - what serialisation format (pickle, PMML, etc.) is being used? And more importantly, how are you tracing into the model? It looks like this is a Spark-based framework.


It's probably being "serialized" as a JAR file which just contains the code you call into. That was Spark's online deployment story last I heard, and jibes with them saying the library mode has a Java API.


SAP has Leonardo. Uber has Michelangelo. Who has Donatello and Raphael?


Personally I don't get the obsession with ninja turtles... /s


Funny you thought of Ninja Turtles, I was thinking Renaissance artists.


Then this Epic Rap Battle of History is for you (salty language warning):

https://www.youtube.com/watch?v=6HZ5V9rT96M


Splinter is the best


I expect this will be an unpopular opinion, but a proliferation of frameworks does not create value for anyone besides the developers employed in making it.

Sure, different frameworks handle different uses better than others, but it doesn't change the fact that in a given genre (machine learning, web frontend, ORM, etc.) there is a lot of duplication going on.

If you want to look cool and impress people, make a tool that doesn't exist, develop a new algorithm, or contribute a great feature to an existing framework.


This is like FB's FBLearner, which I've used a little bit.

This is a platform, so it's not at the SKL/Tensorflow/Caffe level. This is something which says: put your data in our data warehouse in this format, tell me what model you want to run on it, go have a coffee, and by the time you're back I'll magically have run your model on a big cluster, quickly; if you want I can also try out different hyperparams/models, and I'll give you standard plots/metrics to evaluate the thing.

You can also use 1000s of existing feature vectors/metrics that other people have put in the DWH, or clone an existing model and tweak it. If you're happy with the model, you can deploy it: either behind my API, where you query it and I tell you the prediction, or I can publish the model in binary form to a datastore and you can just import it in your iOS code in 3 lines. You can also schedule nightly re-trains, and I'll notify you if something big changes in the accuracy metrics, etc.
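The nightly re-train-and-notify piece is less magic than it sounds; at its core it's a gate like the following sketch (all names here are mine and purely illustrative, not FBLearner's or Michelangelo's API):

```python
def check_retrain(prev_auc, new_auc, tolerance=0.02):
    """Decide whether to promote a freshly re-trained model.

    Returns (deploy, alert): deploy the new model only if its metric
    hasn't regressed by more than `tolerance`; otherwise keep the old
    model and return a human-readable alert message.
    """
    drop = prev_auc - new_auc
    if drop > tolerance:
        return False, "AUC regressed by %.3f; holding previous model" % drop
    return True, None

# Nightly job: compare last night's AUC against tonight's.
deploy, alert = check_retrain(prev_auc=0.91, new_auc=0.87)
```

The platform value is in wiring this gate into the scheduler, metric store, and paging system so that every model gets it for free.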

The point is, this sort of ML platform lets engineers at big companies leverage existing data/metrics/feature vectors, models, infra, etc., and move 10-100x faster than startups/hobbyists. When I say 10-100x, I actually mean it; I also run ML models at home, and the amount of time I waste on this glue stuff is huge (starting from data prep to f---ing around with tensor ranks).

I'm sure there are already startups working on platforms like this for everybody, we'll just have to wait for one of those to become good.


> I'm sure there are already startups working on platforms like this for everybody

Does anyone know of any such open-source product actually usable today?

This would solve a real need.

(Also I'd love to work on such a thing)


I've been working on something very similar to that. It allows you to connect a SQL database, S3 bucket, etc. then write a query for the data that you want to train the model on. It takes a couple minutes and then it'll expose the trained model over a REST endpoint.

It also has feedback loops to fine tune the model on real-time data and has an option to store all the predictions made by model into Redshift or S3 to be visualized on a tool like Tableau.

The tool itself is not open-source but it can be installed on the user's AWS, Azure, or GCP account to keep their models and data private.

It's still in alpha testing (bunch of pilots) and I'll probably do a "Show HN" post when it is ready for a public beta


Things to look at:

H2O, Amazon ML Platform, Azure ML, Google Cloud

Actually, in this case I'd bet on the big cloud providers to eventually deliver something nice (or acquire and integrate).


I would prefer self-hosted, or at least without vendor lock-in.

Looks like H2O would fit the bill.

Anyone has experience with pipeline.ai?


Sure, you sound like anyone can just have a eureka moment and come up with an algorithm. Uber has a legitimate use case for ML that fits their needs, so they built for that need. And why not also talk about it as a lesson learned for others?

Do you simply have very high expectation of everything in this world?


There are plenty of meaningful ways to contribute to the body of open source software that don't require strokes of genius. Bug fixes and refactoring come to mind.

I just picked this thread as a starting point; Uber may have a legit need for a particular framework.

My main point is there is no need to make frameworks for frameworking's sake. Purposeful OSS collaboration is just as good a way to gain influence as a PR splash, if not better.

Of course, that assumes the goal is to make cool things, which seems loosely connected with wooing investors at best. (Again, in general, not picking on Uber.)


I don't work on Uber's ML team, so I can't answer this, but I also can't simply assume everything is built for a PR splash. You can argue they should explain why they built this rather than reuse an existing OSS framework, but assuming they did it for the sake of PR lacks factual support. Are we now living in a world where we assume everything Uber does is for PR? Last I heard they had 1500+ engineers, and at that size I'm sure they can simply talk about things they've done in the last few years.


In this particular genre there are no off-the-shelf platforms, or even an accepted set of best practices, for something as comprehensive as this generalized ML pipelining infrastructure. It's still very Wild West. I've done a lot of research into what pipelines look like at different companies and cloud providers, and I can tell you this is a nice one. Some have a good training story, some have a good deployment story, some have a good experimentation story, and so on, but none have all the things. Projects like TensorFlow Serving and PipelineAI are heading there but are still missing quite a lot compared with what was presented here.


Your comment is indeed not helpful; it discourages people from open-sourcing their work unless they think it is awesome.

This tool may not be great, but it can inspire similar tools, may be helpful for developers who want to learn how to develop similar software, or may attract developers to contribute to the project and eventually become a standard tool as a UI for ML pipelines, etc.


Have you read the article? Michelangelo is a platform, not a framework. It abstracts the underlying frameworks.


Yes, but in some cases a platform and framework do help in formalizing things.


It's a bet similar to a startup - your "framework" might not be popular but if it is - for no particular reason other than a cool name and lots of retweets - then you are sorted. How many JS "frameworks" are there now?


That's how I feel about most projects I see on here. Toy languages are my personal pet peeve. So much talent being wasted on useless projects. I'm not really one to speak - none of my projects make a substantially positive impact on society, but they have their niches.


Do you really feel this way? It seems like a pretty strange worldview to see someone post a thing they built on a site called Hacker News and conclude that their talent is being wasted, or that their project is useless.

Not only does it seem wrong to judge things people post here that way, anyone out there doing good work knows that the path to doing meaningful work begins with "useless projects."


The reality for the people who post on here is that most of us aren't doing anything truly substantial. I include myself in that group. You can believe your IOT AI messenger bot is going to revolutionize an industry, but the more likely situation is that it doesn't. That being said, such a project is not something I'd consider useless, but it's in the same range as the things I do consider useless. Take toy languages for example: could it revolutionize software development? Maybe. Will it? Almost certainly not. How many languages are commonly used in software engineering, and how much ability have you gained by doing it?

It's not for me to say where you apply your time and from what you get your enjoyment from, but I do believe there are objectively more important problems that could be solved. There are plenty of problems out there that need to be solved, and engineers here have the capability to solve them -- I think it's fair to say we're more well equipped than most to solve some of the largest problems humans face.

Here's a quick list of problems that are worth solving (in my opinion), vs. things that are figuratively useless:

- Rising global temperatures will disrupt agriculture and food supplies, how will we farm when desertification consumes the croplands?

- Thousands of people walk by homeless people on the street every day, if a small portion of those people stopped and engaged with the homeless, how many lives could be improved?

- Children in third world countries struggle to receive proper education, how can we reach them and improve their education?

- Governments around the world are failing to represent the needs and desires of their people, what can be done to help governments succeed?

- Growing levels of automation are replacing a staggering number of jobs, and if the growth continues, the great depression will be ahead of us, not behind us.

- Global financial markets are controlled by companies in which you and I have no say, yet they hold most of the world's wealth.

In contrast:

- I want X programming language that does Y because Z doesn't do Y, in all likelihood, I will be the only one who ever uses this language, but it'll be a personal accomplishment.

- I don't want to leave the house or stop working for half an hour to make a meal, how can I improve/speed up the time it takes for me to get a meal from my favourite restaurant?

- Making X sucks, I want a robot to do it

- Bad AI for X

- IOT for X

- Framework for X

- Uber for X

How often do I see the latter vs. the former? Imagine if you saw FOSS projects attempting to solve problems on the first list as often as you saw projects attempting to solve problems on the second list. I truly believe the world would be a better place if that was the case.

I said earlier I'm guilty of this myself. I build consumer facing supporting software for entertainment media. I like what I do, but I don't think I'm contributing much to society, and that makes me uneasy.


> Take toy languages for example: could it revolutionize software development? Maybe. Will it? Almost certainly not. How many languages are commonly used in software engineering, and how much ability have you gained by doing it?

Maybe revolutionizing the world is not the point. Maybe the people who create toy languages gain a deeper understanding of some issue from pursuing the project. Maybe sharing it helps others discover that deeper issue or hidden complexities and elegant solutions. Or maybe they are just an inspiration and serve as a reminder that everything that exists is built by humans. Everything can be understood. There is no magic.


I don't think toy languages are a waste of time, because the people who make them improve their own skills. With those skills they might contribute to, say, LLVM, and all our code gets 0.1% faster. But there's a deeper reason: I don't think it's an either-or between working on toy languages and working on global warming. The choice that I personally face is between enhancing my skills and Netflix/Twitch, and I suspect many other people face it as well, i.e. the choice between creation and consumption.

In my opinion, it's absolutely fine to create anything as a hobby if it makes you happy, whether that's woodworking, drone photography, or even reading or watching something that makes you think deeply. It's impossible for us to judge a priori which of these contributions will be useful to society in the long run. So if the people doing it are happy, why stop them?


I used to think like you, until I started a company to work on something because I felt like it was a "useful problem" and it crashed and burned quickly because I was not truly motivated: I was thinking with the head not the heart. Then I started working on a problem space I enjoy out of sheer technical fun (VR) and have been amazed to discover that it has massive potential for helping solve some of the more "useful" problems you mention.

The truth is, when you are working on a project you care about, nobody really knows where it will lead you. Big things start out small and usually with humble ambitions. Usually the things that end up impacting the world start out as being considered "useless" and "toys."

I say, work on interesting problems that you feel good about working on but most importantly enjoy working on. Feeling like the thing you are working on is an important problem to solve is a necessary but insufficient condition to ensuring you will get through the tough parts.

Richard Hamming once wrote that he asked his peers the question: "Are you working on the most important problems in your field? If not, why not?" And I think this is a valid question and not a leading one at all. But it also is fair to answer "No" to this question if you are able to clearly answer "why not?" to yourself in a way you can live with.


Be wary of projecting your personal values as if they were absolutes.


Your first list is full of things which would be difficult for a team of experts dedicated to tackling that problem to solve. Your second list is full of things which could feasibly be tackled by a single hobbyist.

Are you really that surprised that you see more of the latter than the former?


I started reading your comments and was initially supportive and thought you had a good thought but your examples are not equivalent and are quite biased towards your own values.


I think this is a case of different 'cultures' on HN clashing.

On the one hand I completely understand your point, and from the perspective of one 'culture' it feels like a waste of time and instead we should focus on 'getting things done' and 'making an impact' (in the shape of a successful business or whatnot).

But on the other hand, one of the main reasons I love HN is that there are still plenty of posts that are just about hacking/tinkering, without concern for 'usefulness' and 'purpose'. In fact, more effort put into something as silly as possible often makes the whole thing even more delightful.

Personally I try to focus on the latter as long as I can afford it. While I really appreciate the startup advice and 'useful' stuff, a few years ago I developed a burnout that, in hindsight, was caused in part by the fact that I stopped being able to just enjoy things for their own sake.

I noticed this same thing happen to friends of mine (without the burnout, usually): in our twenties we often wanted to do cool stuff together because it was cool, even though we really needed money! But then, in our late twenties, it all became about whether we could make money from an idea or whether the idea was 'useful' by some other metric, despite the fact that we actually had enough money finally to not be in survival mode anymore.

For me one of the best improvements to my life (and stress levels) has been to try and bring more of that 'silliness' in my life. And judging by the people I admire, that seems like a good approach to life in general. And already now I notice how my 'silly' explorations actually end up providing tons of benefit for my 'career'.


> But on the other hand, one of the main reasons I love HN is that there are still plenty of posts that are just about hacking/tinkering, without concern for 'usefulness' and 'purpose'. In fact, more effort put into something as silly as possible often makes the whole thing even more delightful.

Indeed, this is what makes most of us engineers. But if I separate myself from it and look at it from the outside, we "waste" a ton of time doing things that... well... don't matter. The exceptional few make advances most of us dream of, and they contribute a lot to the advancements of engineering. Many of us just tinker for the sake of it. I respect people want to do what makes them happy, but I find the lack of interest in solving problems bigger than what can fit inside a hard drive a bit... depressing. It feels like resignation.

> On the one hand I completely understand your point, and from the perspective of one 'culture' it feels like a waste of time and instead we should focus on 'getting things done' and 'making an impact' (in the shape of a successful business or whatnot).

I don't think it needs to be a business, just something that serves as a testament to a legacy that says "I left the planet in better shape than I found it."

The world could benefit a lot from more stoicism, and I think we'll need it in the future that's to come.


I can understand your point of view, but I think there are two assumptions there that many would not agree with.

The first assumption is that it is better to advance engineering, or somehow 'leave a mark' (fame?), or 'make the world a better place', than to simply live a happy life, all else being equal. While personally I do enjoy it to see others enjoying or being grateful for the fruits of my labor, I don't actually consider this a good incentive. I'm a huge fan of the 'wu-wei' concept in taoism though.

And while I'm still actively figuring out how to apply all this to my life, I generally find that some (and possibly most) of the most intensely happy moments in my life were rather... insular and independent of (rational) 'context'.

The second assumption is that the kinds of things we engineers do are a net benefit to the world. I've begun to doubt that, despite my love for all things engineering. If I were to do anything that has a large enough impact to 'matter', there's always a chance that this might end up having unintended consequences that also matter. I can think of many pursuits that seem more unambiguously 'good'.

All that said, it's possible that you are either 'wired' to want to make the world a better place, or that your life took a path where this is who you've become. By my own logic, I can't really disagree with you wanting to pursue that! Just offering a different point of view.


Learning is useful, and some people just enjoy publishing the results.


It would be more useful to publish anonymized datasets, if they got any useful data.


A random idea: in this article, they talked about a Feature Store shared across teams. Feature extraction also seems a major bottleneck in their modeling. It seems a very natural and valuable thing to have a platform to share "generic features".

Yann LeCun once said that when a deep neural network predicts many targets, say 1000 classes in ImageNet, it is possible for the model to learn quite generic features. So it makes sense to pre-train on a large amount of data with a reasonable number of targets, and then share the learned feature extractors with others.

Could this be a business? Or a community? Thoughts?
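The shared-extractor idea can be sketched schematically in plain Python (everything here is a stand-in: the frozen matrix plays the role of a pretrained network's penultimate layer, and only the tiny head is fit on the downstream task):

```python
# "Pretrained" extractor: a frozen projection standing in for, e.g., the
# penultimate layer of an ImageNet-trained network. It is shared across
# teams and never updated by downstream consumers.
W_FROZEN = [[0.8, 0.1, 0.3, -0.2],
            [-0.1, 0.9, 0.5, 0.4]]

def extract(x):
    """Map a raw 4-d input into the shared 2-d feature space (frozen)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W_FROZEN]

def fit_head(data):
    """Fit only a 2-weight linear head on the extracted features.

    For this two-example toy we solve the 2x2 linear system exactly via
    Cramer's rule instead of running gradient descent."""
    (f1, y1), (f2, y2) = [(extract(x), y) for x, y in data]
    det = f1[0] * f2[1] - f1[1] * f2[0]
    return [(y1 * f2[1] - y2 * f1[1]) / det,
            (f1[0] * y2 - f2[0] * y1) / det]

def predict(head, x):
    return sum(h * f for h, f in zip(head, extract(x)))

# Downstream task: only `head` is task-specific; `extract` is reused as-is.
data = [([1, 0, 0, 0], 1.0), ([0, 1, 0, 0], -1.0)]
head = fit_head(data)
```

The business/community question then amounts to who curates and serves the frozen part, which is roughly what public model zoos already do.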


Pretrained models for generic things exist publicly, e.g. models trained on ImageNet, or pretrained word vectors.

Not a whole lot of general use in having a feature extractor tuned to Uber's data.


I am surprised by this - h2o.ai seems to have many if not all of the features of this, and is open source.

"Specifically, there were no systems in place to build reliable, uniform, and reproducible pipelines for creating and managing training and prediction data at scale. Prior to Michelangelo, it was not possible to train models larger than what would fit on data scientists’ desktop machines, and there was neither a standard place to store the results of training experiments nor an easy way to compare one experiment to another."

Seems like h2o.ai fits a lot of that bill.


h2o.ai doesn't really do data pipelines, though it does appear they are eager to go into this space through their new driverless.ai tool. However this does not appear to be open source.


Something else that throws me off on this:

My team's experience with MLlib has been bad. Especially compared to H2O Sparkling Water.

Anyone else on here find MLlib to not be as good as advertised? I was surprised to see Uber using it.


Spark is good if it fits your use case (or if you can make it fit your use case). We have definitely found that scaling can be a real issue though.


The "subscribe for updates" modal is extremely distracting. It has a long lag before it repositions at the top of the viewport, at least in my browser.


Is Michelangelo the name of their previously published paper "Scaling Machine Learning as a Service" https://news.ycombinator.com/item?id=14708761 , or is this something separate?


Yes, it's the same platform.


> batch jobs run on a schedule or a trigger and are integrated with data quality monitoring tools to quickly detect regressions in the pipeline–either due to local or upstream code or data issues

I'm curious what tooling they're using to ensure data quality, in particular time series data.
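One common tactic (an illustration, not a claim about Uber's actual tooling) is to compare each batch's summary statistics against a trailing baseline and flag outliers:

```python
def detect_drift(history, today, threshold=3.0):
    """Flag today's batch if its value sits more than `threshold`
    standard deviations from the trailing-window mean.

    This is a crude z-score check; real time-series monitors also have
    to handle seasonality, trend, and gradual drift."""
    n = len(history)
    mean = sum(history) / n
    std = (sum((x - mean) ** 2 for x in history) / n) ** 0.5 or 1e-9
    return abs(today - mean) / std > threshold

# e.g. daily row counts emitted by a feature-extraction batch job
daily_row_counts = [1000, 1020, 980, 1010, 990]
```

Running the same check per feature column (null rate, mean, cardinality) catches many upstream data regressions before they silently degrade a model.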


So I have seen this Michelangelo pop up in other articles.

The key question is: is Uber going to open source it? If not, why bother writing articles about the specifics of their platform?


PR & attracting talent? Though open sourcing would be great and do the same thing, but more so.


This doesn't seem to be an open source project, unless I'm missing something. It's kind of a show-and-tell basically.


Describing your ML platform is nice. Open sourcing it would have been nicer.


From my experience (and from what this looks like), there isn't a ton of novel IP to open source here. A lot of the time, ML "platforms" are composed of a number of open source components glued together in an elegant way (which I'm not implying is easy, because it's very difficult).

Looking at this from the outside, the only things in this platform that might be open-sourceable are the job scheduling and visualizations, and there are already variants of open source tooling which could be repurposed for those tasks (or may even be powering those components).

The main purpose of this post seems to be describing Uber's way of standardizing their workflows plus a little extra glue (which they're calling their platform). It still provides a lot of value. Also, Uber does have a few cool open source projects: https://uber.github.io/ (but could admittedly have more).


You have a point. Describing their workflow has value to it.

With that said, the title of the post is not "Meet our workflow", it's "Meet X: bla bla". In the world of software, one expects to actually see a product named X that bla blas.


> You have a point. Describing their workflow has value to it.

This is a really valuable thing to acknowledge. There is some sharing of company philosophies, but seldom do I see companies fully "open sourcing" their workflows and strategies. Perhaps because the people at the top see that as the real value their company brings - that knowledge. Nevertheless, it's extremely valuable and I wish I could see more things like that. Basecamp's book "Getting Real" is close to what that might look like, I think.


Absolutely! I assumed they had open sourced it and was searching for the Github link :(


Former Uber engineer here.

I don't think you would find an incredible amount of use from an open-sourced Michelangelo. The biggest advantage that Michelangelo has for Uber is that it is easy to integrate into all of Uber's other tools.

Depending on what your machine learning needs are, you could get pretty far with just Spark + MLLib, and wouldn't need any of the customization that Michelangelo has on top.


This is the sense that I got. I am a one-person data science team at my startup, and I basically cobbled together most of the automation described in Michelangelo over the course of a few months: spinning off Spark ML jobs on EMR and saving metadata to a database.


Why make all that noise with a detailed blog post then? If it's a custom-fit internal tool, then good for you, the rest of the world doesn't care. Each company has internal tools and stuff.


There is value in the sharing of ideas. Maybe they couldn't open source it, but were given permission to publish about it. Google never open-sourced some of their greatest contributions, just the ideas behind them.


I think blog posts like these are an interesting way to show off what goes on in a large company like Uber.

If you're a tiny startup, then Spark + MLLib is more than enough. Even that would be overkill if your data fits on a single machine.

But if you're at a young, but quickly-growing company with:

- terabytes of data

- tens of thousands of features extracted from the data

- dozens or hundreds of unique machine learning models being tweaked over time

then hopefully a blog post like this is helpful. It shows off various effective patterns for solving machine learning problems at scale. Presumably, you'll want to build your own internal system with its own set of hooks, but the best practices and lessons learned should be roughly the same.


Recruitment, internal PR


I didn't make it past "democratizing" and "internal" in the same sentence. I think their words are different from our words.


TLC anyone ?


What is TLC?


An autocorrect of tldr


I prefer to think he likes a hug.


Also: Tender Loving Care or Taxi & Limousine Commission :0p


So, what? Uber is now the new IBM Watson consultancy?


Former Uber engineer here.

I don't think Uber is anything like IBM Watson, mostly because Uber uses machine learning to solve business problems for its own products--not for other companies' products.

Most large companies that use machine learning in production will have something similar to Uber's Michelangelo. For example, Facebook has FBLearner Flow[1]. Catherine Dong recently wrote a TechCrunch article describing this broader industry trend[2].

[1]: https://code.facebook.com/posts/1072626246134461/introducing...

[2]: https://techcrunch.com/2017/08/08/the-evolution-of-machine-l...


I'm not really sure what similarities you see between the two.



