I interviewed with a company that was like this. Their Engineering Manager took me through a systems design quiz. The problem was simple:
* 100k users updating their gas / electricity meter readings every second
My first question was what unit the meter reading was in; he said it was the current total (called a totalizer in the biz). I'd worked for a flow metering company before, so I said:
"There's no need to upload the total every second; reduce the frequency and the problem becomes easier."
He didn't accept that, so I said OK, a few optimized servers behind a HAProxy load balancer would do the trick, as the processing is very simple here. The database is the harder part: you can't take a naive approach, because 100k requests/second is going to cause contention for most DBs. You'd need to partition the database in a clever way to reduce locking and contention.
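A minimal sketch of that partitioning idea: route each meter's writes to one of N shards by hashing the meter ID, so no single table or lock becomes a hot spot. The shard count and IDs here are illustrative assumptions, not anything from the actual interview.

```python
import hashlib

NUM_SHARDS = 16  # illustrative; a real deployment would size this to the DB hardware

def shard_for(meter_id: str) -> int:
    """Map a meter ID to a stable shard index via a hash."""
    digest = hashlib.sha256(meter_id.encode()).digest()
    return int.from_bytes(digest[:4], "big") % NUM_SHARDS

# Readings for the same meter always land on the same shard, so
# per-meter ordering is preserved while load spreads across shards.
print(shard_for("meter-000123"))
```

The point is that the routing logic is a few lines; the cleverness is in choosing a partition key (meter ID works well here, since meters don't contend with each other).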
This answer was unacceptable; he ignored it and asked me to start building some insane processing system where you use Azure Functions behind an API gateway to push every request into a queue. Then you have a scheduled job in Azure (whose product name has probably changed by now). This job reads the queue in batches of, say, up to 1000 records at a time and writes these in bulk into the DB. Once it's finished, it immediately reinvokes itself to create a "job loop".
You would create multiple of these "job loops" running in parallel, then scale them up until they drain the queue fast enough to keep pace with 100k requests per second.
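The batch-drain part of his design can be sketched without any cloud machinery at all. This is my reconstruction, not his code: a plain loop standing in for the self-reinvoking Azure job, pulling up to 1000 records and bulk-writing them, with the queue and `db_bulk_insert` as stand-in names.

```python
import queue

def job_loop(q, db_bulk_insert, batch_size=1000):
    """One 'job loop' worker: drain up to batch_size records from the
    queue, bulk-insert them, and repeat. A while loop stands in for
    the Azure job reinvoking itself after each pass."""
    while True:
        batch = []
        while len(batch) < batch_size:
            try:
                batch.append(q.get_nowait())
            except queue.Empty:
                break
        if not batch:
            break  # queue drained; the real job would reinvoke or sleep
        db_bulk_insert(batch)

# Usage with a fake DB writer that just collects rows:
written = []
q = queue.Queue()
for i in range(2500):
    q.put({"meter": i, "total": i * 10})
job_loop(q, written.extend)
print(len(written))  # 2500
```

Stripped of the managed services, it's just a batching consumer, which is part of why the cloud-native framing felt like overkill.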
You would also need something to handle errors, for example requests that are broken somehow. These would go into a broken-request queue and be retried automatically a certain number of times; if they exceeded the retry limit, they would go into a dead-letter queue to be looked at manually.
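That retry-then-dead-letter path is a standard pattern; a hedged sketch of it, with the retry limit, queue objects, and `process` callback all being illustrative assumptions:

```python
MAX_RETRIES = 3  # illustrative limit

def handle(record, process, retry_q, dead_letter_q):
    """Try to process a record; on failure, bump its retry count and
    requeue it, or move it to the dead-letter queue once the limit
    is exceeded (to be looked at manually)."""
    try:
        process(record)
    except Exception:
        record["retries"] = record.get("retries", 0) + 1
        if record["retries"] > MAX_RETRIES:
            dead_letter_q.append(record)
        else:
            retry_q.append(record)

# Usage: a record that always fails ends up dead-lettered after
# MAX_RETRIES requeues.
retry_q, dead_q = [], []
def always_fails(rec):
    raise ValueError("broken payload")
rec = {"meter": 42}
for _ in range(MAX_RETRIES + 1):
    handle(rec, always_fails, retry_q, dead_q)
print(len(dead_q))  # 1
```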
I'm pretty sure this was the actual process he would take in designing this type of system: use every option the cloud gives him to build some overly complex thing that is probably quite unreliable. I also suspect that's the reason I wasn't hired, as I was just looking at the problem sensibly rather than in the "cloud native" way.
He doesn't work for that company anymore and is now back to a normal (non-senior) software developer.
He was very upset when I questioned the requirement.
To give him the benefit of the doubt, maybe it was an artificial question and the real question was how you would build a large-scale "cloud native" system, say something like Netflix or Facebook.
Of course, the problem with those systems is that it's very difficult to explain every part of them in detail in an interview, so his example was just a small piece of such a system. Also, most companies do not require that level of scalability, though I concede such systems are fun to build, just not very fun to maintain and operate.
I have no doubt that the architecture I described was something he had personally built as part of what he was doing at the company. Me questioning that was probably taken as an attack.
I never got the job or even a single word of feedback.
In his hypothetical system of queues and tasks, I wonder how many resources are wasted on management overhead (receiving tasks from the network, scheduling functions, Azure Function cold starts, etc.) compared to a boring solution running from cron on a high-performance bare-metal machine costing a fraction of his setup.