Show HN: Chromiumoxide – An Async Headless Chrome API in Rust (github.com/mattsse)
127 points by matsche on Dec 14, 2020 | 19 comments


What are pdl files? Aren't protocol definition files in JSON?

(shameless plug) I guess I'll have to update my CDP Java library: https://github.com/kklisura/chrome-devtools-java-client


> The pdl files are the canonical protocol definitions and are maintained manually by the devtools team [0].

The JSON protocol definition files are in fact generated from those pdl files. You can find them mirrored in the devtools-protocol repo [1].

[0] https://chromedevtools.github.io/devtools-protocol/

[1] https://github.com/ChromeDevTools/devtools-protocol/


This project looks great! In a similar vein, I've been using Fantoccini[0] for web browser automation tasks with great success! It uses the WebDriver protocol under the hood, which means it works with Chrome, Firefox, Edge (both the old and new one) and Safari.

Chromiumoxide looks like a great alternative when there is a need for deep introspection. Its API appears to be extremely complete: at the protocol level you can get access to things like the state of the WebAudio engine or set up DOM breakpoints, things that are not possible with WebDriver!

EDIT: forgot the link ^^'. Thx izietto.

[0] https://github.com/jonhoo/fantoccini


You put a [0] but you forgot the link: https://github.com/jonhoo/fantoccini


All of the examples await each and every call. Where does being async become most useful? I can’t think of anything I might want to do really that won’t have me wanting to wait for the previous thing I was saying to Chrome to first complete.


> I can’t think of anything I might want to do really that won’t have me wanting to wait for the previous thing I was saying to Chrome to first complete.

That's precisely the point. By using async/await you can describe each of the things you want to do as a sequence of steps depending/waiting on previous results, but at the places you'd have to pause anyway (waiting on chromium, a database, etc) you can clearly signal that the machine is free to do other things till it has your results. If you have other tasks you'd _also_ like to be doing then those can be similarly coded with async/await, and they can run during times when the first task would otherwise have been idle (or however they're being scheduled).

As an aside, async/await is just a tool to try to make your life easier. It's moderately easy to reason about (each task is sequential after all, and shared structures can only be modified at explicit pause points), the syntax is only mildly more cumbersome than writing the sequential task you were going to write anyway, and it's remarkably easy to write a scheduler to execute an async program with some degree of efficiency. You can absolutely use any other parallel execution model you'd like or none at all as the situation demands.
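
To make the scheduler remark concrete, here is a minimal sketch using only the Rust standard library. It is a toy, not how async-std or tokio work internally: the names `run_all`, `yield_now`, `task` and `interleaving_demo` are made up for illustration, and the waker does nothing because the loop simply re-polls every task in turn. Each task is sequential on its own, yet the two interleave at their pause points.

```rust
use std::cell::RefCell;
use std::future::Future;
use std::pin::Pin;
use std::rc::Rc;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Boxed, type-erased task as stored by the toy scheduler.
type Task = Pin<Box<dyn Future<Output = ()>>>;

/// Waker that does nothing: the scheduler below re-polls every task anyway.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Future that is `Pending` on the first poll and `Ready` on the second.
/// It stands in for a pause point such as waiting on a browser response.
struct YieldNow {
    yielded: bool,
}

impl Future for YieldNow {
    type Output = ();

    fn poll(mut self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<()> {
        if self.yielded {
            Poll::Ready(())
        } else {
            self.yielded = true;
            Poll::Pending
        }
    }
}

fn yield_now() -> YieldNow {
    YieldNow { yielded: false }
}

/// Round-robin scheduler: poll every unfinished task until none are left.
fn run_all(mut tasks: Vec<Task>) {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    while !tasks.is_empty() {
        // Keep only the tasks that are still pending after this round.
        tasks.retain_mut(|task| task.as_mut().poll(&mut cx).is_pending());
    }
}

/// Hypothetical task: sequential steps, each followed by a pause point.
async fn task(name: &'static str, steps: u32, log: Rc<RefCell<Vec<String>>>) {
    for step in 1..=steps {
        log.borrow_mut().push(format!("{} step {}", name, step));
        yield_now().await;
    }
}

/// Run two tasks and return the order in which their steps executed.
fn interleaving_demo() -> Vec<String> {
    let log = Rc::new(RefCell::new(Vec::new()));
    let mut tasks: Vec<Task> = Vec::new();
    tasks.push(Box::pin(task("a", 2, log.clone())));
    tasks.push(Box::pin(task("b", 2, log.clone())));
    run_all(tasks);
    let order = log.borrow().clone();
    order
}
```

Calling `interleaving_demo()` yields the steps in the order a1, b1, a2, b2: while task "a" is paused, task "b" gets to run. A real executor would park until a waker fires instead of spinning in a loop.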


You can use this for integration testing. If you have enough memory, you can then run several tests in parallel.

Or, of course, if you need to do something with headless chrome while serving a request.


Lol someone is still having an async syntax debate? How passé.

Essentially, programmers want the ability to write concurrent code without specifying the actual constructs (like threading), and the ouija board of the programming community has spoken: explicit syntax is ideal. Promises were cool for a hot second, but then everyone coalesced on async/await. I thought other approaches like Python's gevent were a really clever way of handling the semantics, but I guess the majority didn't agree; people like explicit. I will say the syntax in this Rust library does look odd to me, but I agree 100% that the library made the right choice in making it explicitly async. I really wish Selenium in Python used asyncio.

You're literally bikeshedding on someone's amazing effort.


What is the difference between this and the headless_chrome [0] crate?

[0]: https://github.com/atroche/rust-headless-chrome


Unfortunately, this crate has been abandoned by its author for quite some time now. The main differences are that chromiumoxide uses async Rust by design and that all the devtools protocol types are generated.

rust-headless-chrome was very helpful and bootstrapped some parts of chromiumoxide, such as the key and mouse clicking. However, I mostly looked at chromedp and puppeteer itself.


Looks like that isn't async.


So can this API handle JS injection and HTTP requests, and behave like a web driver (Selenium)?


It only supports the Chrome DevTools Protocol [0]. chromiumoxide is therefore more like a (limited) Rust port of puppeteer. Support for JavaScript page injection is currently being worked on.

[0] https://chromedevtools.github.io/devtools-protocol/


Out of curiosity how much interest is there in a JS injection utility (written in JS) that walks the DOM, executes events, reports pass/fail/error, has no dependencies, and does nothing else?


Thank you. Amazing work. I was writing CDP requests by hand the other day.


I started that way, but that was more work than writing a generator. Luckily I found https://github.com/chromedp/pdlgen
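
For a rough idea of what a generator saves, here is a hedged sketch in plain Rust. A CDP request is a JSON object with `id`, `method` and `params`. The `Command` trait, `NavigateParams`, `escape_json` and `to_request` below are hypothetical names for illustration only, not chromiumoxide's or pdlgen's actual output. A generator would emit one such type per protocol command, so the envelope code is written once.

```rust
/// Hypothetical trait a generator could implement for every protocol command.
trait Command {
    /// Protocol method name, e.g. "Page.navigate".
    const METHOD: &'static str;

    /// Serialize the command parameters as a JSON object.
    fn params_json(&self) -> String;
}

/// Hypothetical generated type for the `Page.navigate` command.
struct NavigateParams {
    url: String,
}

impl Command for NavigateParams {
    const METHOD: &'static str = "Page.navigate";

    fn params_json(&self) -> String {
        format!(r#"{{"url":"{}"}}"#, escape_json(&self.url))
    }
}

/// Escape backslashes and double quotes only; a real implementation would
/// use a JSON library rather than this simplified helper.
fn escape_json(s: &str) -> String {
    s.replace('\\', "\\\\").replace('"', "\\\"")
}

/// Wrap any command in the request envelope: id, method, params.
fn to_request<C: Command>(id: u64, cmd: &C) -> String {
    format!(
        r#"{{"id":{},"method":"{}","params":{}}}"#,
        id,
        C::METHOD,
        cmd.params_json()
    )
}
```

Calling `to_request(1, &NavigateParams { url: "https://example.com".to_string() })` produces the same JSON text one would otherwise type out by hand for every command.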


Can I run it with Tokio?


It uses async-tungstenite [0] for the websocket part. At the moment it is configured only with async-std-runtime, but since async-tungstenite also supports tokio, an option to use tokio instead of async-std will be added as well.

[0] https://github.com/sdroege/async-tungstenite




