Yeah it is pretty nice. Not sure how long it took, but less than the time to make a sandwich (~2 minutes). It costs 2-3c a pop, so sadly more expensive than GPT-3.5. However, maybe it can be optimised, or maybe there is some init cost that could be stored in state.
(modal) fme:/mnt/c/temp/modal$ modal run openllama.py
✓ Initialized. View app at https://modal.com/apps/ap-9...
✓ Created objects.
├── 🔨 Created download_models.
├── 🔨 Created mount /mnt/c/temp/modal/openllama.py
├── 🔨 Created OpenLlamaModel.generate.
└── 🔨 Created mount /mnt/c/temp/modal/openllama.py
Downloading shards: 100%|██████████| 2/2 [00:00<00:00, 1733.54it/s]
Loading checkpoint shards: 100%|██████████| 2/2 [00:12<00:00, 6.23s/it]
Building a website can be done in 10 simple steps:
1. Choose a domain name. 2. Choose a web hosting service. 3. Choose a web hosting package. 4. Choose a web hosting plan. 5. Choose a web hosting package. 6. Choose a web hosting plan. 7. Choose a web hosting package. 8. Choose a web hosting plan. 9. Choose a web hosting package. 10. Choose a web hosting plan. 11. Choose a web hosting package. 12. Choose a web hosting package. 13. Choose a web hosting package. 14. Choose a web hosting
✓ App completed.
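On the "init cost stored in state" point: the idea is to pay the checkpoint-load cost once per container and reuse the loaded model across calls, rather than reloading per request. A rough, framework-agnostic sketch of that pattern (class and method names are hypothetical placeholders, not the actual example code):

```python
class OpenLlamaModel:
    """Sketch of the load-once, generate-many pattern.

    In a serverless setup this load would happen in the container's
    startup/enter hook; here it's plain Python to show the idea.
    """

    _model = None  # cached once per container/process, then reused

    def _load(self):
        # stand-in for downloading and loading the checkpoint shards
        return {"name": "open_llama_7b"}  # hypothetical placeholder

    def generate(self, prompt: str) -> str:
        if OpenLlamaModel._model is None:  # pay the init cost only once
            OpenLlamaModel._model = self._load()
        return prompt + " ..."


m = OpenLlamaModel()
m.generate("warm-up")  # first call pays the load cost
out = m.generate("Building a website")  # later calls reuse the cached model
```

So the 2-3c figure likely mixes the one-off load time into the per-call cost; warm calls against a live container should be cheaper.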
2-3c per run seems very high. That's probably just the cost if you have to spin up a new container. You can shorten the idle timeout on a container if it's typically going to serve just one request. If it's going to serve more requests, then the startup and idle-shutdown cost is amortized over more requests :)
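The amortization works out like this (illustrative numbers, not measured costs):

```python
def cost_per_request(startup_cost: float, per_request: float, n_requests: int) -> float:
    """Total container cost spread over the requests that container serves."""
    return (startup_cost + per_request * n_requests) / n_requests


# e.g. 2c to cold-start the container, 0.5c of GPU time per request (made-up numbers)
print(cost_per_request(2.0, 0.5, 1))   # a single request eats the whole startup cost
print(cost_per_request(2.0, 0.5, 20))  # startup spread over 20 requests is much cheaper
```

The first case comes to 2.5c per request, the second to 0.6c, which is why keeping a container around for more requests brings the per-call cost down.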
I found this was the cost per call to a web function. I used `deploy` to deploy it. The function just does what the main did in the example repo (earlier in this thread):
https://github.com/modal-labs/modal-examples/blob/main/06_gp...