Shout-out for "AutoSpotting", which transparently re-launches a regular On-Deman...

londons_explore · on Feb 25, 2020

> runs out of every single instance type you have requested, terminates your Spot instances, and you can't launch any more On-Demand ones.

This is more common than you think.

Internally, cloud providers schedule instance types onto real hardware. Running out of an instance type most likely means they have run out of capacity, and only a small amount remains, scattered in fragments across hosts. To access that small remainder, they'll terminate spot instances and migrate live users (which they have to do very slowly) to make space for a few more of whichever instance types make the most business sense (which depends on the mix of physical hardware and existing instance types).
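A toy model of the fragmentation point: an instance must fit entirely on one host, so a fleet can have plenty of free capacity in aggregate and still be unable to place a single instance of a given size. The host sizes below are hypothetical and the model ignores everything real schedulers consider except vCPU count.

```go
package main

import "fmt"

// totalFree sums free vCPUs across all hosts (hypothetical units).
func totalFree(free []int) int {
	sum := 0
	for _, f := range free {
		sum += f
	}
	return sum
}

// firstFit returns the index of the first host with room for an instance
// needing `need` vCPUs, or -1 if no single host can hold it.
func firstFit(free []int, need int) int {
	for i, f := range free {
		if f >= need {
			return i
		}
	}
	return -1
}

func main() {
	// Hypothetical fleet: lots of free vCPUs overall, but scattered.
	free := []int{6, 4, 6, 2, 6}
	fmt.Println("total free vCPUs:", totalFree(free)) // 24
	fmt.Println("host for 8 vCPUs:", firstFit(free, 8)) // -1: no host fits
}
```

Freeing up an 8-vCPU slot here would mean evicting (spot) or migrating (live) instances off one host, which is the expensive step described above.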

It takes someone like AWS a good few weeks, sometimes months, to provision new physical hardware.

It isn't uncommon for big users to be told they'll be given a service credit if they move away from a capacity-constrained zone.

cm2187 · on Feb 25, 2020

Is there a concept similar to an airline upgrade? That's better than denying a paying customer boarding. Surely there must be spare capacity somewhere in the datacentre with slightly better specs.

londons_explore · on Feb 25, 2020

Yes - they totally do that. If there is only space for a large instance, but you want a small one, they fit your small one in the free capacity, and there is now space for someone else to fit another small one next to it.

For business reasons they might decide not to do that though - your small instance might mean they have to say no to a big allocation later.

Instead they just delay your instance starting and hope other instances moving around opens up a more suitable location for it.
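The trade-off described above (squeeze the small instance in now, or keep a large slot free and delay it) can be sketched as a placement policy. This is a hypothetical illustration only: the sizes, the best-fit rule and the `reserveLarge` switch are assumptions, not how any particular provider schedules instances.

```go
package main

import "fmt"

// Hypothetical instance sizes in vCPUs.
const (
	smallSize = 2
	largeSize = 8
)

// place picks a host for a small instance using best fit (least free space
// left over). If reserveLarge is true, it refuses any host where placing
// the small instance would destroy the last slot able to fit a large one.
// It returns the host index, or -1 meaning "delay and retry later".
func place(free []int, reserveLarge bool) int {
	largeSlots := 0
	for _, f := range free {
		if f >= largeSize {
			largeSlots++
		}
	}
	best := -1
	for i, f := range free {
		if f < smallSize {
			continue // no room at all
		}
		breaksLastLarge := f >= largeSize && f-smallSize < largeSize && largeSlots == 1
		if reserveLarge && breaksLastLarge {
			continue
		}
		if best == -1 || f < free[best] {
			best = i
		}
	}
	return best
}

func main() {
	onlyLarge := []int{8} // only space for one large instance
	fmt.Println(place(onlyLarge, false)) // 0: fit the small one in now
	fmt.Println(place(onlyLarge, true))  // -1: delay, keep the large slot
	fmt.Println(place([]int{8, 3}, true)) // 1: a smaller gap exists
}
```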

There's an entire paper on the topic: https://dl.acm.org/doi/10.1145/2797211

alien_ · on Feb 25, 2020

The AutoSpotting author here. It always feels great to see my little pet project mentioned by happy users. Thank you for making my day!

To set the record straight, AutoSpotting predates the newer AutoScaling mixed instance types functionality by a couple of years, and it (intentionally) doesn't use it under the hood, for reliability reasons related to failover to on-demand. To avoid race conditions, AutoSpotting currently ignores any groups configured with a mixed instances policy.

In the default configuration, AutoSpotting implements lazy, best-effort on-demand to spot replacement logic, with built-in failover to on-demand and to different spot instance types. To keep costs down, the failover is only triggered when launching new spot instances fails (for whatever reason, including insufficient spot capacity).

We iterate over the compatible spot instance types in increasing order of spot price until we successfully launch one (compatible meaning roughly at least as large as the original from a CPU/memory/disk perspective, but cheaper per hour). If all compatible spot instances fail to launch, the group keeps running the existing on-demand capacity, and we retry every few minutes until we eventually succeed.

There's currently no failover to multiple on-demand instance types (this is a known limitation), but this could be implemented with reasonable effort.

We're also working on significantly improving the current replacement logic to address a number of edge cases, through a significant architectural change (making use of instance launch events). I'm very excited about this improvement and looking forward to having it land, hopefully within a few weeks.
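For readers unfamiliar with launch events: EC2 emits an EventBridge event with detail-type "EC2 Instance State-change Notification" whenever an instance changes state. The sketch below assumes (this is not confirmed above) that such an event is the trigger, and only shows extracting the ID of a newly launching instance; the `launchedInstance` helper is hypothetical and the actual replacement step is omitted.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// stateChangeEvent keeps only the fields needed from the EventBridge event.
type stateChangeEvent struct {
	DetailType string `json:"detail-type"`
	Source     string `json:"source"`
	Detail     struct {
		InstanceID string `json:"instance-id"`
		State      string `json:"state"`
	} `json:"detail"`
}

// launchedInstance returns the instance ID when the event reports an
// instance entering the "pending" state, i.e. one that is being launched.
func launchedInstance(raw []byte) (string, bool, error) {
	var ev stateChangeEvent
	if err := json.Unmarshal(raw, &ev); err != nil {
		return "", false, err
	}
	if ev.Source != "aws.ec2" || ev.DetailType != "EC2 Instance State-change Notification" {
		return "", false, nil
	}
	if ev.Detail.State != "pending" {
		return "", false, nil
	}
	return ev.Detail.InstanceID, true, nil
}

func main() {
	raw := []byte(`{
		"detail-type": "EC2 Instance State-change Notification",
		"source": "aws.ec2",
		"detail": {"instance-id": "i-0123456789abcdef0", "state": "pending"}
	}`)
	id, ok, err := launchedInstance(raw)
	fmt.Println(id, ok, err) // i-0123456789abcdef0 true <nil>
}
```

Reacting to each launch as it happens, instead of polling groups on a schedule, is presumably what lets the new design handle the edge cases mentioned above.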

At the end of the day, unlike most tools in this space (including AWS offerings), AutoSpotting is an open source project, so if anyone is interested in helping implement any of these improvements (or others), while at the same time gaining experience with Go and the AWS APIs, which are very valuable skills nowadays, you're more than welcome to join the fun.

alien_ · on March 3, 2020

Thanks for the shout-out, really appreciate it.

If you don't mind I'd like to get some feedback/feature ideas from users like you.

Please get in touch with me on https://gitter.im/cristim

ignoramous · on Feb 25, 2020

ASG, per the blog post you linked to, now supports launching both on-demand and spot instances, so what's the use of AutoSpotting?

alien_ · on Feb 25, 2020

The author of AutoSpotting here. This question comes up often and I'm happy to clarify it.

Mixed-capacity ASGs currently run at reduced capacity when they fail to launch spot instances. AutoSpotting automatically fails over to on-demand capacity when spot capacity is lost, and back to spot once it can launch it again.
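A toy capacity model of the difference described above, while no spot capacity can be launched. It assumes, following the comment rather than AWS documentation, that a mixed group simply runs without its spot share, whereas the fallback approach fills every slot with on-demand; the function names and numbers are hypothetical.

```go
package main

import "fmt"

// runningMixed: a mixed group keeps its on-demand part, but its spot share
// stays missing while spot can't be launched (per the description above).
func runningMixed(desired, onDemandCount int, spotAvailable bool) int {
	if spotAvailable {
		return desired
	}
	return onDemandCount
}

// runningWithFallback: any slot that can't be spot is kept as on-demand,
// so the total always equals the desired capacity.
func runningWithFallback(desired int, spotAvailable bool) (spot, onDemand int) {
	if spotAvailable {
		return desired, 0
	}
	return 0, desired
}

func main() {
	desired, onDemandCount := 10, 2 // hypothetical group: 2 on-demand, 8 spot
	fmt.Println("mixed, spot lost:", runningMixed(desired, onDemandCount, false)) // 2
	s, o := runningWithFallback(desired, false)
	fmt.Println("fallback, spot lost:", s+o, "running,", o, "on-demand") // 10 running, 10 on-demand
}
```

The fallback keeps full capacity at on-demand prices until spot returns; the mixed group stays cheaper but under-provisioned.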

Another useful feature is that it usually requires no configuration changes to existing on-demand ASGs, because it can simply take them over and replace their nodes with compatible spot instances.

This makes it very popular with people who run legacy infrastructure that can't be touched for whatever reason, as well as for large-scale rollouts across hundreds of accounts. Someone recently deployed it on infrastructure still running on EC2 Classic, set up in 2008 or so and untouched for years.

Another large company deployed it with the default opt-in configuration across hundreds of AWS accounts owned by as many teams, many with legacy instances that had been running for years. Coordinated as a mass migration, this would normally have taken them years, but migrating to spot took only a couple of months. The teams could opt in and try it out on their applications, or opt out known sensitive workloads. A few weeks later they centrally switched the configuration to opt-out mode, converting most of their infrastructure to spot literally overnight and saving lots of money with very little configuration effort and very little disruption to the teams.
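The opt-in/opt-out switch can be pictured as a tag filter over groups. This is a sketch under assumptions: the `spot-enabled` tag key and its values are hypothetical here, and the real tool's tag names and settings may differ.

```go
package main

import "fmt"

// Hypothetical tag key marking whether a group should be converted.
const tagKey = "spot-enabled"

// managed decides whether a group is converted to spot.
// Opt-in mode: only groups explicitly tagged "true" are processed.
// Opt-out mode: every group is processed unless explicitly tagged "false".
func managed(tags map[string]string, optOutMode bool) bool {
	v, ok := tags[tagKey]
	if optOutMode {
		return !(ok && v == "false")
	}
	return ok && v == "true"
}

func main() {
	groups := map[string]map[string]string{
		"legacy-web":   {},                // never touched by its team
		"trial-app":    {tagKey: "true"},  // team opted in early
		"sensitive-db": {tagKey: "false"}, // team opted out
	}
	for _, optOut := range []bool{false, true} {
		for name, tags := range groups {
			fmt.Println("opt-out mode:", optOut, name, managed(tags, optOut))
		}
	}
}
```

Flipping `optOutMode` centrally is what turns every untagged group (like `legacy-web`) into a candidate at once, matching the "overnight" conversion described above.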

If you want to learn more about it have a look at our FAQ at https://autospotting.org/faq/index.html

It's also the most prominent open source tool in this space. Most of the competition consists of closed-source, commercial (and often quite expensive) tools, so if you run into issues or are missing functionality, anyone skilled enough can submit a pull request with a fix or improvement.

616c · on Feb 26, 2020

Where can I read about some of these more impressive use cases you describe?

alien_ · on Feb 26, 2020

Have a look at https://github.com/AutoSpotting/AutoSpotting or the FAQ section on https://autospotting.org

If those don't answer your questions feel free to reach out to me and I'll do my best to explain further.

kondro · on Feb 25, 2020

It replaces on-demand instances in place. If there are no spot instances available, it leaves them running. If a spot instance gets terminated, it is started again as on-demand.

It sounds a bit hinky, but it tends to leave you with the number of instances you want running without having to determine what percentage of the ASG should be on demand or spot — especially with the possibility of not being able to start new spot instances if they’ve been terminated.