OpenMP was easy to use in Visual Studio to parallelize obvious data-parallel sections of the code, without changing the code too much and without spending lots of time learning OpenMP itself. For things like texture encoding, filtering, and processing it was the right fit, and much better than rolling yet another hand-written worker-thread version. Also, DEBUG builds had it disabled. Another plus was easier control from the environment.
This is a great step forward! This will make a lot of scientific codes easier to use with LLVM. Once there is first-class Fortran support maybe I can convince some crufty scientists to switch (never gonna happen, likely, oh well).
The auto-vectorization that I know of is the ability of a compiler to batch several scalar operations of a sequential loop into a single vector operation (e.g. SSE). There is still only one processor/core at work, but it processes several items at once.
OpenMP is a standard that eases multithreading: it lets you use several parallel threads (cores) without the trouble of creating and managing threads by hand.
Edit: SIMD auto-vectorization directives are part of OpenMP 4, released 2 years ago.
So, auto-vectorization and auto-parallelization are closely related problems in compilers. The legality analysis needed to do either is mostly the same (the profitability analysis is very different).
ICC used to do both. Now they mainly focus on vectorization.
The reasons that auto-vectorization is much more "advanced" are many-fold:
1. Most cores have more resources for auto-vectorization than parallelization.
2. Auto-parallelization has more communication overhead, and as you scale, communication overhead dominates, so doing both at once doesn't help as much as you'd think.
Most compilers, including clang, will try to vectorize loops. The OpenMP pragma is a step beyond and is telling the compiler, "hey, you really, really should try vectorizing this."
They haven't kept up with the times, however. MSVC is stuck at OpenMP 2.0, even with MSVC 13. Further, MS has stated they have no plans to change that.
For what? Most of OpenMP is useful because you can slap a pre-processor directive on a for loop and make it parallel. Rust has the ability to provide ergonomic fork-join parallelism at the library level. Right now, collections that can iterate provide an .iter() method that yields an iterator; I see no reason that equivalent functionality couldn't be provided by something like a .par_iter() or similar.
>Rust has the ability to provide ergonomic fork-join parallelism at the library level
I'm just guessing because I have only vague familiarity with this topic - but implementing that and the rest of OpenMP constructs efficiently does not sound trivial.
Pragmas are just a way to consume OpenMP in C++; that part could be done through a library. But leveraging an existing OpenMP implementation (schedulers and the like) could be useful.