Naming it Cause and Effect is misleading; it makes it seem like Bayes' rule only applies when there is a causal relation somewhere. It also makes the symmetry of the first line seem off. I would write the first line as
P(A and B are both true)
= P(A is true) x P(B is also true, given that we know A is already true)
= P(B is true) x P(A is also true, given that we know B is already true)
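The symmetry can be checked numerically. A minimal sketch, with made-up probabilities for A and B (the numbers are not from the original discussion, just an illustration):

```python
# Joint distribution over two binary events A and B (invented numbers).
# Keys are (A, B) outcomes; values are probabilities summing to 1.
p = {(True, True): 0.12, (True, False): 0.18,
     (False, True): 0.28, (False, False): 0.42}

def marginal(index, value):
    """Sum the joint over all outcomes where the given event has the given value."""
    return sum(pr for outcome, pr in p.items() if outcome[index] == value)

p_a = marginal(0, True)     # P(A)
p_b = marginal(1, True)     # P(B)
p_ab = p[(True, True)]      # P(A and B)

p_b_given_a = p_ab / p_a    # P(B | A)
p_a_given_b = p_ab / p_b    # P(A | B)

# Both factorizations recover the same joint probability,
# with no causal direction assumed in either:
assert abs(p_a * p_b_given_a - p_ab) < 1e-12
assert abs(p_b * p_a_given_b - p_ab) < 1e-12
```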
I was actually going to write A and B at first, but then realized that wouldn't show the relationship of the formula to the disease example very obviously.
I thought it'd be better to make it easier to understand (and more practical) by showing how it helps you infer causes from effects.
Unfortunately, "Cause" and "Effect" are actively misleading choices of names here. ("Evidence" and "Hypothesis" on the other hand are frequently used.)
ADDED. Explanation: hearing squeaking noises in the night is evidence for the hypothesis that the cheese will have bite marks on it when we look in the morning (in the sense that the squeaking noises increase the probability of the bite-mark hypothesis) even though the squeaking noises do not cause the bite marks nor do the bite marks cause the squeaking noises. You and I know that the squeaking noises and the bite marks have the same underlying cause, but Bayes's rule is useful in situations where cause-and-effect remain unknown. E.g., it can be used by a space alien without knowledge of mice who cannot afford to ponder on the possible causes of the bite marks and the squeaking noises.
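The mouse example can be made concrete with a small common-cause model. A sketch with invented numbers (a mouse M causes both squeaking S and bite marks B; S and B are conditionally independent given M), showing that the squeaks still raise the probability of bite marks:

```python
# Hypothetical common-cause model; all probabilities are invented for illustration.
p_mouse = 0.2                               # P(M): a mouse is in the house
p_squeak_given = {True: 0.9, False: 0.05}   # P(S | M)
p_bite_given   = {True: 0.8, False: 0.01}   # P(B | M)

def joint(m, s, b):
    """P(M=m, S=s, B=b), with S and B conditionally independent given M."""
    pm = p_mouse if m else 1 - p_mouse
    ps = p_squeak_given[m] if s else 1 - p_squeak_given[m]
    pb = p_bite_given[m] if b else 1 - p_bite_given[m]
    return pm * ps * pb

booleans = (True, False)
p_bite = sum(joint(m, s, True) for m in booleans for s in booleans)
p_squeak = sum(joint(m, True, b) for m in booleans for b in booleans)
p_bite_and_squeak = sum(joint(m, True, True) for m in booleans)
p_bite_given_squeak = p_bite_and_squeak / p_squeak

# Squeaking is evidence for bite marks even though neither causes the other;
# the space alien needs only the joint distribution, not the causal story.
assert p_bite_given_squeak > p_bite
```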
posterior now includes random draws from the posterior p(unknowns|observed).
This is my favorite explanation of Bayesian statistics, since it implements Bayes' theorem without math. It is also the underlying intuition behind the probabilistic programming approach to Bayesian statistics. Rubin (1984) has a great explanation: https://twitter.com/tristanzajonc/status/325120025428119552
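That "Bayes without math" intuition can be sketched as rejection sampling: draw the unknowns from the prior, simulate data, and keep only the draws whose simulated data match what was actually observed. The kept draws are samples from p(unknowns | observed). The coin-flip setup below is my own hypothetical example, not from the linked thread:

```python
import random

random.seed(0)
observed_heads = 7  # suppose we saw 7 heads in 10 flips (made-up data)

posterior_draws = []
for _ in range(100_000):
    theta = random.random()  # prior draw: theta ~ Uniform(0, 1)
    # Simulate 10 flips with this theta.
    heads = sum(random.random() < theta for _ in range(10))
    # Keep the draw only if the simulation reproduces the observation.
    if heads == observed_heads:
        posterior_draws.append(theta)

# The retained draws approximate the posterior p(theta | 7 heads in 10 flips);
# their mean should be near the exact Beta(8, 4) posterior mean, 8/12.
posterior_mean = sum(posterior_draws) / len(posterior_draws)
```

No algebra is performed anywhere; conditioning is done purely by simulation and filtering, which is exactly what makes this framing math-free.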
As someone who has tried on occasion to get used to Haskell (and failed; Haskell makes me want to claw my eyes out), and who has minimal maths background, I found reading Bayes's original paper substantially easier than figuring out the Haskell in that post.
Instead, I saw Bayes' rule expressed in Haskell, a form more idiosyncratic than ordinary probability notation.
Simple is in the eye of the beholder. Perhaps making monads isn't actually making Bayes' rule simpler.