Similarly, you can compile ffmpeg on Lambda, in 0.5 minutes, for 9 cents.[1] Versus 10 minutes on one core, for ~free. And while -j200 on ffmpeg is nice, -j1000 on the Linux kernel is... wow, like seeing the future.
Very cool. I hope more of the "value add" stuff from cloud providers ($$$) can be replaced with open source running on their cloud functions. My suggestions:
* FFmpeg supports http/https as input protocols if compiled with the options enabled. See `ffmpeg -protocols`
* Try larger memory sizes. On Lambda, larger memory also means more CPU, which may mean shorter transcodes. You might even pay about the same amount if the transcodes are CPU-bound and finish in roughly linear time with respect to CPU
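For example, a minimal sketch of transcoding straight from an HTTPS URL, assuming a build with the HTTPS protocol enabled (the input URL is a placeholder):

```shell
# Read directly over HTTPS -- no separate download step needed
ffmpeg -i "https://example.com/input.wav" -codec:a libmp3lame -b:a 128k out.mp3
```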
This is not "cool" - this is either doing less than Amazon's encoding service or just exposing the pricing model. If it's cheaper they could charge less, unless they are terrible programmers who can't even use their own lambda functions.
Next installment: "Running FFmpeg on a tower under my desk for 1% the cost of AWS Lambda".
Is there any effort toward on-prem Lambda-style stuff yet? I know it's a moving target, but I wouldn't recommend getting into cloud stuff you can't migrate out of.
Yes! If one uses something like Kubeless[1], you have something like AWS Lambda, but where the backend is Kubernetes rather than AWS. Yes you're still on a framework, but it's a vendor-agnostic, open-source one. There are some other attempts at similar things, too. I am partial to this one for now.
This is what I see as the true power of kubernetes -- once people start developing high quality (hopefully open source) applications for platforms like kubernetes, providers like AWS should lose their "value-added" benefits, and be reduced to more like colo providers, maybe offering 24/7 support as well.
That will be when the ubiquitous cloud truly arrives -- run on whatever provider in the sky, and as long as they run kubernetes you can run your workloads there.
At a pure level, Lambda "integration" is essentially an interface for passing in runtime arguments+environment to your application. If you're concerned about lock-in, you can write or integrate another entry point that executes your function on on-prem Tomcat or whatever other environment (Cloud Functions?) you want to run in.
The much harder challenge would be provisioning thousands of on-prem servers to handle the load, but I wouldn't necessarily qualify a dependency on cloud-like autoscaling as lock-in.
I guess the lock-in might come as you unwittingly couple yourself to the intricacies of the rest of AWS's offerings as your app architecture grows more complex.
Very misleading title. Elastic Transcoder pricing applies primarily to video. This tutorial only covers audio transcoding, which is much, much less resource-intensive.
Nothing misleading here, they compared their project's cost to the Elastic Transcoder audio pricing.
Elastic Transcoder Audio is $0.00450 per minute [1], this article says that with Lambda it cost "$0.00008273 per minute of audio, a full factor of 54 times less than Elastic Transcoder."
Misleading title. The article is about audio encoding, not video. Better: “Using FFmpeg on AWS Lambda for audio encoding at 1.9% of the cost of AWS Elastic Transcoder”
This works just fine on AWS Lambda. The `ffmpeg` binary there weighs in at 46 MB. Unless you need something not bundled with that build, it seems like this is sufficient and is easier to set up.
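For reference, a rough sketch of how one might bundle a static build into a deployment package (johnvansickle.com is one commonly used source of static builds; `handler.py` is a placeholder for your own handler):

```shell
# Fetch a static ffmpeg build and extract just the binary
curl -L -o ffmpeg.tar.xz \
  https://johnvansickle.com/ffmpeg/releases/ffmpeg-release-amd64-static.tar.xz
tar -xf ffmpeg.tar.xz --strip-components=1 --wildcards '*/ffmpeg'

# Bundle it next to your handler in the Lambda deployment package
zip -r lambda-package.zip ffmpeg handler.py
```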
It looks like the youtube download has to complete before ffmpeg can start; is there a way to start processing the head while the tail is still being written?
This problem comes up a lot with storage blobs. The bigger they are, the more it hurts to serialize the write and the read.
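One possible workaround is streaming the download straight into ffmpeg over a pipe, so encoding starts as soon as the first bytes arrive. A sketch, assuming youtube-dl is available (the video ID is a placeholder, and the input container has to be streamable, e.g. not an MP4 with its moov atom at the end):

```shell
# Pipe the download into ffmpeg's stdin; encoding begins before the
# download finishes instead of waiting for the whole file.
youtube-dl -f bestaudio -o - 'https://www.youtube.com/watch?v=VIDEO_ID' \
  | ffmpeg -i pipe:0 -codec:a libmp3lame -b:a 128k out.mp3
```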
This is great. I wouldn't have thought to run FFmpeg on Lambda.
I'm going to stick with Elastic Transcoder for now though. I like that I have no upgrades to maintain and very little code. I feel like if I did this, it would take me years to recoup the cost even with a 99% savings.
But that is only because I only have a few videos a month. Roughly $1.00 on Elastic Transcoder. If I had thousands or even hundreds of videos this seems like a great and worthwhile project. Especially since this article appears to take a lot of the trial and error and proof of concept out of the mix.
I worked for a large Internet company that had a Netflix-like product back in 2007. The transcoders were literally just plugged in underneath people's cubicles. Kept things nice and warm in the winter, and I'm sure the costs were pretty low.
I'm far from an FFmpeg expert, but I believe it's possible to segment the input video, transcode the segments one by one, and then concatenate them. Not sure how the segmentation and concatenation steps perform, but if that's fast, this might even improve your overall transcoding speed due to the parallelization.
Media companies are already taking this approach using ffmpeg, AWS Lambda, and AWS Step Functions. I heard from two companies using such approaches at AWS re:Invent in October 2017, so it's definitely possible.
Rolling your own approach like this is certainly more complex to build/maintain than using Elastic Transcoder though.
If you know that you'll need more than 8 minutes, why wouldn't you just run ffmpeg on EC2? EC2 is now pay-per-second. I haven't looked at the prices recently, is AWS lambda so much cheaper that it's worth jumping through all these extra hoops?
Also not an expert, but since video is encoded as keyframes plus changes applied to those keyframes, I don't think it's as simple as segmenting something like a CSV. Transcoding is probably required just for the segmentation process. Putting it back together might be easier, but the final output file might also be larger because of overhead.
The segmentation code is keyframe-aware, so it only splits along keyframe edges. In other words: requesting segments of 30 seconds each probably won't get you segments that are exactly 30 seconds long. Still, there could be plenty of other obstacles I'm not aware of.
Neither am I. It's pretty simple to do, though, and the steps other than encoding are a lot quicker, since they mainly just copy the encoded streams to an intermediate format and then concatenate them.
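Not an authoritative recipe, but a minimal sketch of the split/transcode/join approach using ffmpeg's segment and concat (de)muxers (filenames are placeholders):

```shell
# 1. Split along keyframes into ~30 s chunks without re-encoding:
ffmpeg -i input.mp4 -codec copy -f segment -segment_time 30 \
  -reset_timestamps 1 chunk_%03d.mp4

# 2. Transcode each chunk in parallel (e.g. one Lambda invocation per
#    chunk), producing chunk_000.out.mp4, chunk_001.out.mp4, ...

# 3. Stitch the transcoded chunks back together with the concat demuxer:
for f in chunk_*.out.mp4; do echo "file '$f'"; done > list.txt
ffmpeg -f concat -safe 0 -i list.txt -codec copy output.mp4
```

Because `-codec copy` splits only at keyframes, the chunks won't be exactly 30 seconds each, matching the caveat above.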
Assuming resulting audio of 3 minutes, then 1000 uses would result in 9 GB, or about 81 cents. As long as you can get ads for $1 per mille, you should be good. That said, you'd probably need to implement something to prevent abuse (single user bypassing the frontend and just spamming your backend).
What if you use an S3 upload hook? Are you charged for bandwidth S3 -> lambda? Also, wouldn't you use the same bandwidth with elastic transcoder anyway?
I used FFmpeg static build to transcode WAV to mp3, but the latest 64-bit build gave me corrupt files, so I had to hunt down an archived version. Works well though!
However if I wget "[horrible_url]" -O audio, it still takes forever to download, so I guess rate-limiting might be the issue. But if download time is the problem, you could have one server that just downloads the data slowly to S3 and then kicks off the lambda job on the completed file.
Most YT videos use the DASH protocol to serve you video and audio separately. That's because the same audio stream can be reused with different resolution videos. The youtube-dl script can download just the audio file, without downloading video data.
Yes well, now you just have to pay the fee for licensing the codecs FFmpeg gives you ;). What was it? One million dollars for MP4? Good luck with that.
[1] demo in a talk: https://www.youtube.com/watch?v=O9qqSZAny3I&t=55m15s (the actual run (sans uploading) is at https://www.youtube.com/watch?v=O9qqSZAny3I&t=1h2m58s ); code: https://github.com/StanfordSNR/gg ; some slides (page 24): http://www.serverlesscomputing.org/wosc2/presentations/s2-wo...