The rxv64 Operating System: MIT's xv6, in Rust, for SMP x86_64 machines

anta40 · on Sept 9, 2023

One thing I really like about rxv64 is pretty easy to be built on MacOS, without having to built a custom GCC, like the original xv6.

Of course, there are other various "Rust xv6" forks and unfortunately they won't compile with recent Rust...

kelas · on Sept 9, 2023

> easy on macOS

afaik macOS currently targets two architectures. just how smooth is the apple silicon ride, i wonder?

inferiorhuman · on Sept 9, 2023

My experience has been that in general LLVM + Rust do a very good job of making cross compilation painless. There is unpleasantness when cross compiling from an Apple host, but that's almost entirely limited to actually compiling rustc and some of the host tools. In terms of smoothing those edges over I think most of the patches I've submitted will land in 1.73 or 1.74.

With that in mind if rxv64 is easy enough to build on an Intel Mac, then it should do fine on an ARM Mac as well.

Obviously if you're talking about porting rxv64 to an ARM host, that's a whole different ball of wax.

anta40 · on Sept 9, 2023

No idea how to do it on ARM MacOS, I'm still on Intel. Here's how to built it (had some discussions with the author on Twitter):

https://gist.github.com/anta40/4625d6a2752551aada9c91a72e787...

Done. Will try on M2 Mac probably in the next 2 weeks

fulafel · on Sept 9, 2023

It's targeting just amd64, so depends on how smooth you consider emulating it on ARM to be. (eg https://github.com/dancrossnyc/rxv64/blob/main/kernel/src/sw... )

fulafel · on Sept 9, 2023

How is the code size vs the C version?

kelas · on Sept 10, 2023

as of yet, it doesn’t shine quite as bright as current C/rv64 original.

but as they say, a long road begins with a single step.

basementcat · on Sept 9, 2023

Is there a reason why mkfs is still in C?

rollcat · on Sept 9, 2023

I would imagine "nobody got around to rewriting it yet" is the most likely reason? mkfs is a fairly simple procedure on the surface: you take a couple basic parameters, and bang out some bits onto a block device. The on-disk data structures likely come from a header file/crate shared with the kernel.

The actual dragons live in the FS driver.

matt3210 · on Sept 9, 2023

C is a better language for this kind of stuff.

yjftsjthsd-h · on Sept 9, 2023

Surely the same exact argument is equally true for mkfs and everything else in the repo? Or is there something special about it?

worik · on Sept 9, 2023

afr0ck · on Sept 9, 2023

"better" is subjective.

kelas · on Sept 9, 2023

[flagged]

cross · on Sept 9, 2023

xv6 was originally written for 32-bit x86; the RISC-V port is a relatively recent development. See e.g. https://github.com/mit-pdos/xv6-public for some of the earlier history.

rxv64 was written for a specific purpose: we had to ramp up professional engineers on both 64-bit x86_64 and kernel development in Rust; we were pointing them to the MIT materials, which at the time still focused on x86, but they were getting tripped up 32-bit-isms and the original PC peripherals (e.g., accessing the IDE disk via programmed IO). Interestingly, the non sequitur about C++ aside, porting to Rust exposed several bugs or omissions in the C original; fixes were contributed back to MIT and applied (and survived into the RISC-V port).

Oh, by the way, the use of the term "SMP" predates Intel's usage by decades.

kelas · on Sept 9, 2023

(all points taken, all valid in their own right)

yes, riscv arch is a relatively new thing, which now quietly powers every modern nvidia gpu, led to a final demise of MIPS, and for whatever reason made MIT PDOS abandon x86 as their OS teaching platform back in 2018 (ask them why).

but perhaps i didn’t stress my central point enough, which is “textbook”. Xv6 used to have a make target which produced 99 pages of pdf straight out of C code. i don’t think the latest riscv release (rev3) still has it, probably because it is no longer deemed necessary - the code now documents itself entirely, and the tree is down to just kernel and user(land), both implemented in consistent and uniform style.

rxv6, at least its userland, still seems to be written in C, which (correct me if i’m wrong) must be creating a lot of pressure on the rust kernel along the lines of ‘unsafe’ and ‘extern “C”’.

i only hope the said group of pro engineers who needed to be ramped up on all of that at the same time plus the essentials of how an OS even works got ramped up alright.

again, no offense. and not to start a holiwar “rust will never replace C”. why not, maybe it will, where appropriate. which is why the notion of C++ is a sequitur all the way.

cross · on Sept 9, 2023

> rxv6, at least its userland, still seems to be written in C, which (correct me if i’m wrong) must be creating a lot of pressure on the rust kernel along the lines of ‘unsafe’ and ‘extern “C”’.

Yes. I didn't feel the need to rewrite most of that. The C library is written in Rust, but as a demonstration, most of the userspace programs are C to show how one can invoke OS services. Are there unmangled `extern "C"` interfaces and some unsafe code? Yes.

Userspace interacts with the kernel via a well-defined interface: the kernel provides system calls, and userspace programs invoke those to request services from the kernel. The kernel doesn't particularly care what language userspace programs are written in; they could be C, Rust, C++, FORTRAN, etc. If they are able to make system calls using the kernel-defined interface, they should work (barring programmer error). Part of the reason rxv64 leaves userspace code in C is to demonstrate this.

The rxv64 kernel, however, is written in almost entirely in Rust, with some assembly required.

> i only hope the said group of pro engineers who needed to be ramped up on all of that at the same time plus the essentials of how an OS even works got ramped up alright.

They did just fine.

kelas · on Sept 9, 2023

okay, fair. i only got misled by the title of the post, which claims all-rust xv6 port.

now that we cleared the userland part, here’s what I’m contemplating on the kernel side. i can’t think of anything simpler and more staple than this, so:

https://github.com/dancrossnyc/rxv64/blob/main/kernel/src/ua...

https://github.com/mit-pdos/xv6-riscv/blob/riscv/kernel/uart...

honestly - i don’t feel at ease to tell which driver code is more instructional, which is easier to read, which is better documented, which is better covered with tests, which has more unsafety built into it (explicit or otherwise), what size are the object files, and what is easier to cross-compile and run on the designated target from, say, one of now-ubiquitous apple silicon devices.

lest we forget that the whole point of it is “pedagogical”, i.e. to learn something about how a modern OS can be organized, and how computer generally works.

and i’m just not sure.

cross · on Sept 9, 2023

Well, you're free to study both in detail and draw your own conclusions. But the UART driver in both is pretty uninteresting, and I suspect whatever conclusions one may draw from comparing the two will be generally specious.

Perhaps compare the process code, instead, and look at how the use of RAII around locks compares to explicit lock/unlock pairs in C, or compare how system calls are implemented: in rxv64, most syscalls are actually methods on the Proc type; by taking a reference to a proc, we know, statically, that the system call is always operating on the correct process, versus in C, where the "current" process is taken from the environment via per-CPU storage. Similarly with some of the code in the filesystem and block storage layer, where operations on a block are done by passing a thunk to `with_block`, which wraps a block in a `read`/`relse` pair.

Of course I'm biased here, but one of the nice things about Rust IMO is that it makes entire classes of problems unrepresentable. E.g., forgetting to release a lock in an error path, since the lock guard frees the lock automatically when it goes out of scope, or forgetting to `brelse` a block when you're done with it if the block is manipulated inside of `bio::with_block`. Indeed, the ease of error handling let me make some semantic changes where some things that caused the kernel to `panic` in response to a system call in xv6 are bubbled up back up to userspace as errors in rxv64. (Generally speaking, a user program should not be able to make the kernel panic.)

kelas · on Sept 9, 2023

thanks, this is useful, let me ponder. obviously i am also a subject of severe cognitive bias, granted.

but ok, if not uart driver - what other direct comparison in r/xv6 kernel spaces you would use to show where rust shows a real hard edge over C?

not a loaded question, I’m seriously asking for a valid pointer (no pun intended)

cross · on Sept 10, 2023

Sorry, I thought my last message did give suggestions for things to compare?

phendrenad2 · on Sept 9, 2023

Okay let's take your argument to the logical conclusion. If elegance is all that matters, let's write a kernel in Standard ML and run it on a virtual Turing machine. But even the v6 learning Unix targets 32-bit x86, which is NOT elegant in any way (love that bootstrap assembly that switches from 16-bit 8086 mode to 32-bit 386 mode, but it's not elegant).

What's missing from your theory is the ability to tinker. Nobody is going to try to improve upon an esoteric kernel that sacrifices all usability for theoretical purity. But a Rust kernel for x86-64 (which is not dying, BTW) is something students can run on real hardware, and that's cool and engaging in a way that is hard to put a value on.

kelas · on Sept 9, 2023

> elegance matters

absolutely, but no need for strong violence. ANSI C and RISC-V are doing fine.

> x86-64 which is not dying

true. technically speaking, it is long gone - any Intel CPU since about Pentium Pro is actually a well-hidden RISC, because it is more efficient. Sure that requires a pretty beefy CISC decoder in front, but the sheer power of having a virtual 8086 mode next to amd64 must totally justify all that extra TDP.

> [x86 is] something students can run on real hardware, and that's cool and engaging

also true, but nothing compares to booting the real xv6 on a real RISC-V system for the first time, which is easy and fun (and cheap, if we’re talking value). i recommend.

nine_k · on Sept 9, 2023

I'd say that C is not elegant as a language at all. It's a good (as in "working") solution of many problems that other system languages had in late 1960s, and I'd hazard to say that the highly economical ways that were used to solve them are elegant. E.g. using #include is terrible from many angles, but is elegant as a highly minimalist way to solve the modularity problem (reduce the problem to one already solved).

kelas · on Sept 9, 2023

> C is not elegant

okay, tastes differ, but it remains what it was in one of its original definitions: a sufficiently portable assembly.

> #include hell / modularity

okay, so rust has cargo, so we’re all better now, right? but what if someone will dare to argue that C has a whole bunch of cargos, some less terrible than others, if we reframe the problem like so:

$ [rpm|dpkg|dnf|yum|apt|brew|pkg-add] install libcurl libcurl-dev(el)

(this makes me wonder how any software even works anymore in absence of universal cargo cult, but hey)

lproven · on Sept 9, 2023

> a sufficiently portable assembly.

No, it isn't, and it hasn't been for about 40 years.

« C Is Not a Low-level Language Your computer is not a fast PDP-11. »

https://queue.acm.org/detail.cfm?id=3212479

David Chisnall, ACM Queue, April 30, 2018

kelas · on Sept 9, 2023

Sounds expensive, thanks. So:

Preface to the Digital Edition of Kernighan and Ritchie's The C Programming Language Oct 31, 2012

“… Remarkably, in spite of all of this change, C retains a central position. It is still the core language for operating system implementation and tool building. It remains unequaled for portability, efficiency, and ability to get close to the hardware when necessary. C has sometimes been called a high-level assembler, and this is not a bad characterization of how well it spans the range from intricate data structure and control flow to the lowest level of external devices.”

(sorry, I messed up the lyrics a bit, but also I didn’t, really)

Also, my computer is a fast Turing machine. And so is yours.

lproven · on Sept 9, 2023

I can't make any sense of a single sentence of that.

> Sounds expensive, thanks

Huh? What does? The paper is entirely free. That's why I posted a link. Go read it. It's not very long, it's interesting, it's accessible, it's provocative and insightful and profound.

> Kernighan and Ritchie

Creator of C is biased about the language he created. Shock. Pictures at 11.

K&R was published in 1988. Then, your PC was to some degree a fast PDP-11. That was over 1/3 of a century ago and it is no longer even remotely true.

> Also, my computer is a fast Turing machine. And so is yours.

Reductio ad absurdam.

Here's a talk from the inventor of the Arm architecture. It's pretty good.

https://www.youtube.com/watch?v=6lOnpQgn-9s

She talks about her more recent baby, Firepath.

It can, for instance, in a single opcode and a single CPU cycle load 4 separate different 32 bit integers, multiply them by 4 other 32-bit ints and store the result.

(Around 25 min in but it's very worth watching the whole talk.)

Her point is that C can't express stuff like this well at all.

A modern CPU has dozens of execution units doing stuff in parallel, applying matrix operations and transformations to multiple different streams of data simultaneously: SIMD and MIMD combined. C has no way of even beginning to express any of this.

C is no more a high-level assembly language for any 21st century CPU than a Falcon-9 reusable orbital launch vehicle is a space-going version of the Wright Flyer.

kelas · on Sept 9, 2023

> The paper is entirely free. That's why I posted a link. Go read it. It's not very long, it's interesting, it's accessible, it's provocative and insightful and profound…

…and I already read it and you already have my reflections on this piece of work :)

> …SIMD and MIMD combined. C has no way of even beginning to express any of this.

Man, no hard feelings, but I suggest you start with K&R refresher and then work your way up to modern C. It is remarkably easy to learn, you’ll come around. What you deem impossible is very possible. Once you’re comfortable with basic C, I recommend cloning the repo of Lemire’s blog.

Just to see what’s available.

kelas · on Sept 9, 2023

> No, [C is not a high-level assembler], and it hasn't been for about 40 years. «C Is Not a Low-level Language Your computer is not a fast PDP-11.» some dude in ACM Queue

Please correct me if I’m wrong, but this ACM rant seems to amount to blaming C language for the most disastrous embarrassment in the history of Intel Corp, i.e. spectre/meltdown. Well, thank you very much, but that’s just deranged thinking, or someone was on INTL payroll in 2018.

but wow, thank you, what a gem. he then proceeds to elucidate how C is broken and again not low-level enough, because his 1337 skills require knowing the cache line size. well, this is absolutely true, captain Chisnall. but what is complete and utter nonsense is the conclusion that C is retarded by not letting the programmer manipulate the cache directly:

“The cache is, as its name implies, hidden from the programmer and so is not visible to C. Efficient use of the cache is one of the most important ways of making code run quickly on a modern processor, yet this is completely hidden by the abstract machine, and programmers must rely on knowing implementation details of the cache (for example, two values that are 64-byte-aligned may end up in the same cache line) to write efficient code.“

This deserves a WAT.

lproven · on Sept 11, 2023

It seems to me that you take a fundamentalist view on this: anything which attempts to point out the problems, inadequacies, or failings of C and possibly of xNix in general is heresy, to be mocked rather than addressed.

I have often encountered this attitude in real life from fundie/evangelical christians, but in the tech world, it's also a thing among a certain hardcore xNix enthusiast.

I feel that I have encountered one here, and you are not willing or not able to engage in discussion. You merely want to proselytise at me.

I am not interested in that, so I won't comment further.

For me, as an xNix user for some 35Y and a full-time writer about xNix for a decade or more, it's just one of many interesting, significant or valuable OS ideas, and C is a rather poor and deeply flawed language which is a vastly expensive handicap on the computer industry.

viraptor · on Sept 9, 2023

Different things entirely. System-specific packages have no concept of C building or linking. Things work because of some agreement about paths + pkgconfig + others. This is very far from the idea of a dependency manager like cargo which is very much aware of project-specific and language features.

kelas · on Sept 9, 2023

> Different things entirely

yes, they are.

> System-specific packages have no concept of C building or linking.

every day i learn something new, but this time around I’m gonna go ahead a tell you that you are very mistaken.

> Things work because of some agreement about paths + pkgconfig + others.

True. By convention, we call it “convention”, or “standard” for short, e.g. POSIX or ISO C. Call us back once you have something like that in rustland.

> This is very far from the idea of a dependency manager like cargo which is very much aware of project-specific and language features.

All good and sound, and makes total sense for userland software. But what is being discussed here is an operating system and core userland. If you can imagine that to be your “project”, such as linux or OpenBSD, maybe my tongue-in-cheek comparison of a system-wide dependency manager suddenly looks less dissimilar to cargo.

needless to say, things like Linux and OpenBSD will stay written in C for a little while longer. In case of BSD specifically, you can go as crazy as to bump and rebuild your entire system from source in place, kernel and userland. BSD is one pretty old cargo cult in that sense, and pretty cool, i might add.

> language features

right… rust 2018 edition is not really rust as of 2023, or is it? This where those pesky standards come very handy.

viraptor · on Sept 9, 2023

> tell you that you are very mistaken.

Go on then. Beyond RPMs ability to depend on soname, what else is there on the structure/installation/usage side (not on the building side which can include whatever scripts you want). Where are they now aware of the libraries more than tar.gz is?

kelas · on Sept 9, 2023

I think I see your point. Thing is, a C program can be built and executed without that cargo dependency pyramid sandwiched into a gigantic runtime polyp. The whole operating system is your environment, and yes - if ‘man ldconfig’ is not your friend, then maybe pkg-config is, or, worse to worst, link against project-local structure, where you can indeed build the bejesus out of your imagination and ship it. That’s ok. It’s linux, after all.

That’s what we do for embedded systems, anyway. I haven’t seen much success for rust in this department, although people are clearly trying to make it work. It just keeps coming out fat and ugly. A few more decades of hard push, call me a maybe.

Do you know what riscv version of xv6 rev3 compiles to in terms of object size?

I will not spoil it for you, but it compiles to total shreds.

phendrenad2 · on Sept 9, 2023

> nothing compares to booting the real xv6 on a real RISC-V system for the first time

Why is that?

> any Intel CPU since about Pentium Pro is actually a well-hidden RISC

Well if it's well-hidden, it doesn't matter to the person learning to write an OS. ;)

kelas · on Sept 9, 2023

> Why is that?

you said it best - the “tinkering” part is a small thing that makes all the difference when one is first led up to a computer.

> if it's hidden, it doesn't matter

right… and if you ignore it, maybe it’ll go away. like Intel Management Engine, perhaps. Which OS does it run, if you know? :)

phendrenad2 · on Sept 9, 2023

> the “tinkering” part is a small thing that makes all the difference when one is first led up to a computer

True, but when making an OS, assembly code is an incredibly small part of it. You basically need to run a few instructions to put the processor in the right mode, and you need ONE line of assembly code (often inline C) to allow usermode to call supervisor mode. When choosing an architecture for a learning OS, I think other factors would dominate, like availability of hardware (many people have a PC already, saving them $100-$200 on RISC-V hardware).

> right… and if you ignore it, maybe it’ll go away. like Intel Management Engine, perhaps. Which OS does it run, if you know? :)

I'm not interested in gatekeeping tech to only those who care about personal privacy, so I consider this line of thought completely irrelevant to the discussion. But that's just me...

kelas · on Sept 9, 2023

> True, but when making an OS, assembly code is an incredibly small part of it.

A few comments up this thread i briefly mentioned r/xv6 entry.S, so here’s what i meant:

rv64:

https://github.com/mit-pdos/xv6-riscv/blob/riscv/kernel/entr...

amd64:

https://github.com/dancrossnyc/rxv64/blob/main/kernel/src/en...

> You basically need to run a few instructions to put the processor in the right mode, and you need ONE line of assembly code (often inline C) to allow usermode to call supervisor mode.

Right… Please see above. Also, in riscv land we have machine, supervisor and user modes, with hypervisor mode being in the works.

> When choosing an architecture for a learning OS, I think other factors would dominate, like availability of hardware (many people have a PC already, saving them $100-$200 on RISC-V hardware).

yet, MIT PDOS thinks Fabrice Bellard’s ubiquitous qemu is a better, cheaper and more forgiving starting point, although i personally prefer to spend a few bucks on something as absurdly cheap and cool as kendryte k210/k510 SoC and enjoy the real deal.

> Intel hate keeping tech is irrelevant [I don’t care]

Yes, it’s just you. Also, it runs Minix.

phendrenad2 · on Sept 10, 2023

Cool links and all, but can you express in words how the invalidate my point? And do you actually know how RISC-V machine/supervisor/user/hypervisor modes work? I assure you, it's just a few instructions to switch between them, so my point still stands, unless I'm wrong about that, which I am not. So let's move on.

> qemu

A poor substitute for running on real hardware, from a psychological perspective. Just look at how people nerd out on HN about porting Doom to a smartwatch. People like to run things on hardware: fact. PCs are more ubiquitous than RISC-V boards which you have to order online: fact.

> Intel hate keeping tech is irrelevant [I don’t care]

I didn't get it, can you explain?

> Also, it runs Minix

Everyone knows this already, it makes the front page of HN at least once per year.

astrange · on Sept 9, 2023

> true. technically speaking, it is long gone - any Intel CPU since about Pentium Pro is actually a well-hidden RISC, because it is more efficient. Sure that requires a pretty beefy CISC decoder in front, but the sheer power of having a virtual 8086 mode next to amd64 must totally justify all that extra TDP.

It is not, the µcode on an x86 is still often "CISCy" and rightfully so - a complicated instruction like "lea" is still perfectly suited to hardware.

adgjlsfhk1 · on Sept 9, 2023

this is true, but modern RISC cpus are also pretty CISCy now. The risc-V vector extension has an vfwmacc.vf which is a broadcasted vectorized widening fused multiply accumulate instruction.

wtallis · on Sept 9, 2023

> (well, i guess someone has to)

No, it really was unnecessary for you to do that. Most of your comment is snarky swipes that are not at all substantive. And pretending like this that xv6 didn't also target x86 is dishonest.

djur · on Sept 9, 2023

"Someone has to", "don't kill the messenger", "no offense" are all signs of someone saying something they know is unnecessarily hostile and dismissive.

yjftsjthsd-h · on Sept 9, 2023

Or simply that they expect to be poorly received while still believing to be true.

kelas · on Sept 9, 2023

Thanks, that’s what I expected, but I don’t think I’m poorly received. It came out pretty positive and insightful. Hopefully not just for me.

ylyn · on Sept 9, 2023

As someone who partly works on the Linux kernel for a living I still don't really know what your point is.

nickysielicki · on Sept 9, 2023

xv6 is an oversimplification that falsely convinces professors that they’ve taught something useful and students that they’ve learned something relevant. It’s not “just enough” operating system, it’s “just not enough” operating system.

anta40 · on Sept 9, 2023

Hmm I wonder if each person has a different criteria of "enough".

I see xv6 as playground for students to implement basic concepts like filesystem, scheduler, etc. And it provides userspace so they can easily test their implementation.

Perhaps you suggest to use production-grade kernel like Linux, instead?

kelas · on Sept 9, 2023

+1

only xv6 gives traps, stack, uart, kalloc, preemptive scheduler, rudimentary fs, init(0) and minimal userland out of the box. students are not expected to implement any of that, it is hard - instead, is a book to read, a beautiful one. assignments are built on top of the core public codebase.

> Perhaps you suggest to use production-grade kernel like Linux, instead?

i imagine the expression on the student’s face once they run into their first:

/* not sure what this code does and why it is even here */

if it wasn’t for xv6, the best available choice would probably be OpenBSD codebase (not yet reimplemented in rust), but that’d be one hell of a graduate course.

xv6 is material for undergraduate 6.1810, a gentle introduction to operating systems.

kelas · on Sept 9, 2023

> just not enough an OS

but of course, which is why a typical assignment would be like “and now implement a NIC driver”.

cloning kernel.org is definitely a better way to learn something relevant, depending on one’s definition of “something” and “just enough”.

bitwize · on Sept 9, 2023

Because Rust is what C should've been. It is the correction to the multidecadal error that is C, and basing everything on C, thus inviting the gremlins of UB, UAF, buffer overflows, and data races to tapdance on our critical infrastructure.

SICP is gaining a new audience thanks to a JavaScript reimplementation.

kelas · on Sept 9, 2023

more very valid points.

> should’ve been

if it could’ve, it would’ve, but reality is such that k&r gave us this lingua franca, and not that other one.

the very fabric which holds the entire humpty dumpty together is still mostly written in C, so there is that. writing code in C still requires a tremendous amount of discipline and professional culture, true that. these things notoriously scale very poorly.

As for UB, IDB, BJJ and other scary words, i guess we get to discuss that once we get to read the first ISO standard for rust and its stdlib, which is definitely going to be written in a better, clearer, less Aesop language compared to ISO C, which resorts to CYA strategy on every 10th page, because people who write it aren’t exactly stupid and understand the amount of responsibility they bear.

> sicp is now JS

(what do you want me to say)