As a CPU buyer, I'd like AMD and Intel to get stuck at 50/50 market share and fight tooth and nail by introducing cheaper and faster models every year. And maybe discover low-power computing, although that's wishful thinking.
The upcoming Phoenix APUs from AMD are a game-changer for portable and handheld devices. Between 15 and 45W TDP, Zen4/RDNA3... They're slowly trickling out to thin and light laptops, but I can't wait to get them in the next Steam Deck killer.
Not as nice as my old Atom router because of the active cooling on the CPU and the ATX power connector. The latter I can fix with a PicoPSU but the former...
I'd rather have a passive radiator on the CPU and add some airflow via a huge case fan - which is what I'm doing now for my router box. That way it's a lot quieter.
I could totally believe that in another year or two there will be sufficient performance uplift. A Steam Deck with 50% more performance at the same battery life would be great.
I hope a dynamic 120Hz screen (like in the iPhone but obviously not as screamingly expensive) is also in the pipeline.
With a 120Hz screen, you can run the UI and overlays at 120FPS, run cinematics at 24FPS or 30FPS and the game at 40FPS or 60FPS. All fits neatly into 120Hz.
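To spell it out, 24, 30, 40 and 60 all divide 120 evenly, so every frame gets held for a whole number of refreshes and nothing judders. A quick sanity check (plain Python, just arithmetic):

    # Each target frame rate divides 120 Hz evenly, so a frame is shown
    # for a whole number of refreshes: no uneven frame pacing.
    for fps in (24, 30, 40, 60):
        print(f"{fps:>3} fps -> held for {120 // fps} refreshes, remainder {120 % fps}")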
They are going to need more performance by then too though. They are tied to the PC gaming universe and the games people want to play will keep getting more demanding.
They have it a lot harder than something like Switch.
The basic system requirements for most PC games have barely budged in years, mostly due to high GPU costs and a lack of killer features that would require higher requirements. Neither factor seems to be immediately changing.
We already had a game changer, Apple Silicon. AMD chips are a bit better than Intel's on mobile but not by much. Mobile Zen4 and RDNA3 SoC would not be a game changer. We already know how they perform on desktop and how much power they use per performance.
Let me know when Apple decides to sell their chips to 3rd party manufacturers, when they end up in a handheld gaming device, and when most games run natively on ARM with competitive performance.
Apple silicon was only a game changer for productivity tasks, and only for people willing to jump into the Apple ecosystem. In all other cases, especially for gaming, an APU with the performance of Zen4/RDNA3 at the announced TDP doesn't exist yet. So, yes, it's a game changer.
> when most games run natively on ARM with competitive performance.
They just need to run natively on ARM to get performance that's more than competitive on the CPU side. The GPU is no slouch but is not top of the line (yet). Seems trivial to pair the ARM CPU with an external GPU.
So why would you phrase your argument as "it's not a game changer because it boosts if there's available power/cooling", rather than "it's not a game changer because Apple's CPUs are better"?
Other benchmarks like C-ray show ZERO slowdown going from 230W down to 125W and even 105W. Intel drops by 21% at 125W and 30% at 105W.
Personally I'd rather get the 7900X (or the non-X variant), where you'd keep an even larger fraction of the performance within a given power limit. It's pretty clear which vendor has the weaker core and has to push further up the high-clock, terrible-power-efficiency end of the curve for not much gain.
I'm a little dubious of benchmarks that compare only one pair of chips when it comes to details like power efficiency and tweaking. The silicon lottery can make a big difference.
Sure. Various forums corroborate that running the Ryzen 7000 series at a 65W TDP is often less than a 10% decrease in performance.
Also keep in mind that "performance mode" on Intel can significantly increase the default 253 watt TDP. AnandTech's review of the i9-13900K hit 380 watts, I believe.
So sure there's chip to chip variation, but generally the AMD chips are more power efficient and have much lower penalties for reducing the TDP.
I'm not sure whether I've won the lottery or Intel's software is unreliable.
I've been playing around with the 13900K's settings and so far I've managed to get it down to a stable 210W TDP (according to XTU under stress testing). That's with a 0.110V undervolt (likely could do more) and the P-core turbo multipliers tapering down to 50 with all cores active, from 57 with 1 core active. I've yet to mess with the E-cores though; could probably shave off some more perf/W still.
I think managing a 100mV undervolt is pretty common for the 13900K. That's what I have as well, with a power limit of 200W. The penalty in cinebench is only ~5% despite the 20% reduction in power.
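If you're on Linux and want to poke at this without XTU, the powercap interface exposes the long-term package limit. A minimal sketch, assuming the intel_rapl driver is loaded and the package shows up as intel-rapl:0 (paths vary by kernel/platform, and writing needs root):

    from pathlib import Path

    # Long-term (PL1) package power limit, in microwatts.
    # Assumption: package 0 lives at this path on your system.
    LIMIT = Path("/sys/class/powercap/intel-rapl:0/constraint_0_power_limit_uw")

    def read_limit_watts() -> float:
        return int(LIMIT.read_text()) / 1_000_000

    def set_limit_watts(watts: float) -> None:
        LIMIT.write_text(str(int(watts * 1_000_000)))  # needs root

    print(f"current long-term limit: {read_limit_watts():.0f} W")
    # set_limit_watts(200)  # e.g. cap the package at ~200 W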
The easiest way is to go into the BIOS or CPU settings and set a negative Curve Optimizer offset under Precision Boost Overdrive (PBO). That'll reduce the stock voltage at a given frequency. It'll also potentially speed your chip up, since it lets the chip boost to higher frequencies at a given power level. I think the lowest you can go is -30, but try -20 or so and see if your system is stable before trying lower values.
I just tested my 7900x yesterday and it gets about 95% of the max performance at 105W TDP.
The system idles at half the wattage, too, so it's saving a ton of energy not running at the stock 170W TDP. Not sure why AMD didn't go for the efficiency crown, especially for non-halo SKUs like the 7950.
Have you tested draw from the outlet? I assumed that modern chips could scale down their power usage significantly when idle. Which is to say I thought consumers would rarely ever hit maximum power draw, so I would expect really modest real-world savings.
Happy to be proven wrong, because it is a brilliant idea. My machine is wildly oversized for my typical usage, and I could easily take a 10%+ performance haircut and likely would not even notice.
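Short of a wall meter, the CPU's own energy counter gives a decent feel for how far the package drops at idle (it won't include PSU/VRM losses or the GPU, so wall draw will be higher). A rough sketch, assuming Linux with the intel_rapl powercap interface exposed:

    import time
    from pathlib import Path

    # Cumulative package energy in microjoules (assumed path; check your /sys tree).
    ENERGY = Path("/sys/class/powercap/intel-rapl:0/energy_uj")

    def package_watts(interval_s: float = 5.0) -> float:
        e0 = int(ENERGY.read_text())
        time.sleep(interval_s)
        e1 = int(ENERGY.read_text())
        return (e1 - e0) / 1_000_000 / interval_s  # uJ -> J, then J/s = W (ignores counter wrap)

    print(f"average package power over 5 s: {package_watts():.1f} W")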
That's part of the reason they're given such a high TDP by default. The cost of squeezing out the last 5-15% may be worth it if it's only for short periods of time.
But if you use these chips for tasks that max out all cores for several hours per day, it may be better to simply run a chip with more cores and run those in eco mode. A 7950X@64W may perform as well as a 7900X@200W for such compute tasks, and is probably cheaper over time.
And if you use it in your house, it generates less noise, can use a cheaper cooler, and doesn't dump as much heat into your room.
Thanks for the thoughts. My untested suspicion is that my CPU hits max load for <5% of total use. Meaning lowering max TDP is unlikely to noticeably impact my electrical bill, but it’s such a cheap optimization to enable, why not?
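Back-of-envelope with made-up numbers, just to show why the bill barely moves if the CPU is only pinned occasionally:

    # Illustrative figures only: 65 W saved under load, 1 h/day at full load,
    # $0.30/kWh. Plug in your own numbers.
    watts_saved   = 170 - 105
    hours_per_day = 1.0
    price_per_kwh = 0.30

    kwh_per_year = watts_saved * hours_per_day * 365 / 1000
    print(f"{kwh_per_year:.0f} kWh/year, roughly ${kwh_per_year * price_per_kwh:.0f}/year")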
I blame Intel for that. They pushed the clock speed high for silly small perf gains. AMD could take the high road, but then the media would brag about how fast Intel is and AMD would lose market share.
As you should. I worked at Intel back in the 00s when they were in the "megahertz wars". All they talked about internally was "we're winning the MHz wars!!! Yay!". When anyone mentioned power consumption, or how fucking loud the cooling fans were, or MIPS/Watt, they didn't care; all that mattered was MHz. Hitching their wagon to RAMBUS memory was part of this. Then they got pissed and indignant when consumers didn't want to spend a fortune on RAMBUS memory and bought AMD CPUs with cheap SDRAM instead.
That fits. The P4 had a long pipeline, high clock speed, and poor perf per MHz. AMD pushed an overall performance rating instead (I forget the name), and it seems like they succeeded. Doubly so since they shipped x86-64 first, which people did care about.
I wrote a few microbenchmarks to explore the performance promises of rambus, and didn't find anything.
Indeed. The game Intel used to play is now the game everyone plays -- Intel, AMD, and NVIDIA, for both consumer CPUs and GPUs. At least on the desktop. They tend to run them to redline by default, because numbers. It's all about getting that big splash at launch, to ride it for higher average sale price as long as possible.
Which is unfortunate, because when it comes to benchmarks, understanding the context of a test setup is extraordinarily important. Are you buying that very high-end memory the test setup used and then manually adjusting timings? No? Then you're not getting those numbers. Heck, you may be losing 20% or more performance compared to the test rig on just the difference in memory. Never mind adjusting other things, ensuring the system's thermals are entirely kept in check to prevent throttling, etc.
You can set various power settings that suggest they are limits, but at least in the Skylake NUC I have, they really don't do much and certainly don't cap the maximum power the system uses at all. The article doesn't talk enough about actual power use versus the settings, although it sounds like, even though both vendors exceed them, there might actually be some limiting on the recent chips.
That Anandtech article did include some power measurements on page three, but given those measurements show the limits are applied inconsistently between AMD and Intel, it would have been nice to have power measurements on the other graphs too.
Indeed, ideally 3 companies to help ensure there's not some agreement to raise prices. Apple's largely not in the same market, but is pushing the edge when it comes to performance per watt, even shipping products, *gasp*, without fans.
Seems like Apple's increasingly happy to hit lower price points and disrupt the price structure for AMD and Intel based product lines.
Similarly, Intel seems to finally be getting its act together and setting up to disrupt the AMD and Nvidia GPU market. Here's hoping. I saw a pretty decent GPU (RTX 3080) for $420 recently!
Until I can slap an Apple chip into my random Linux build, those numbers are meaningless to me. Would require absolutely enormous performance improvements to justify making such a huge leap.
The M1 sits near the bottom of NVIDIA's GPU lineup in terms of ML performance, putting it just slightly ahead of the $200 1660 Ti.
When doing a quick Google search on the topic, it turned out that several of the top results were blatantly misleading in that they limited the competing hardware to match the M1's limitations. For example, the top result, a wandb article [1] claims that the M1 is competitive with the V100, yet their own data shows that they aren't even fully utilizing the V100 and that when properly utilized it obviously totally outperforms the M1.
Similarly, Apple in its marketing for the M1 Ultra was extremely manipulative, bordering on outright lying, when it compared the chip to the 3090. It presented them on a "relative performance vs power" graph making it look like the chip matched the 3090 while consuming less than half the power, when what it was saying was that it's more efficient than the 3090 when you underutilize the 3090 to the point of matching the M1 Ultra's performance.
However, for AI/ML duty the Apple Neural Engine (ANE) looks pretty promising. On DenseNet-121, the ANE on the M2 Max is almost 7 times faster than the M2 Max GPU.
Seems like the M2 Max does pretty well compared to a plugged-in RTX 3070 to 3080 Ti. The big bonuses are that you can use all of the RAM (not limited to 10-16GB of VRAM) and you get the same performance even on battery.
Sorry bro, you're not able to run anything on that iGPU, either because of the limitations of the OS or the limitations of the architecture. It's just marketing.
I found a bunch of ML benchmarks; only one had a comparison to an RTX 3070. Seems quite a bit better than "not able to run anything". In particular you can use up to 96GB of RAM, 4x the 4090. Granted, the M2 Max is only approximately a 3070 in speed for such workloads; at least it doesn't decrease when unplugged.
I agree with you, although I'd prefer to see (at least) a three way duke out between Intel, AMD, and ARM. I really don't think living in a world where there's (basically) only one CPU architecture and instruction set is going to be as beneficial as one in which there is competition on more than simply price and power consumption.
No, it is better for AMD to pull far ahead so that the next challenger is forced to come up with even greater innovation to steal market share. This is better than just having two large vendors locked in a back-and-forth game of one-upmanship and incremental innovation.
Screw that; I want them to dump x86 altogether and move to ARM, or maybe come up with something even better, to compete with Apple's M2. Why are we stuck with this shitty old ISA from the 1970s?
ARM came out the same year as 32 bit x86. Both architectures are very old.
I very much doubt that architecture is all that relevant for their advancements in power usage. Apple's chips even implement a significant x86 feature (TSO memory ordering, for Rosetta) without ruining battery life. Meanwhile, Qualcomm is struggling to compete with Apple in both performance and efficiency despite being in the ARM space for much longer.
I'm sure if Apple could've gotten an x64 license ten years ago, they would've made their own x64 chips instead of switching to ARM. When Apple's plans started coming together, there simply were no competing architectures they could base their chips on. MIPS was practically dead already, x64 was extremely closed off, and RISC-V hadn't even been announced when Apple started selling their own chips, and it still struggles to keep up today.
Maybe they could've licensed POWER6 or an early version of POWER7? The POWER architecture isn't exactly widely used or designed to be power efficient; power management wasn't introduced until 2017 and even then it was optional.
There simply weren't any serious alternatives to licensing ARM and Apple would be stupid to develop an entirely separate CPU architecture for their desktop/laptop/tablet form factors.
The PC platform is standardised and open. Everything else is a fragmentary shitshow. It will clearly be superseded at some point, but I pray it takes its time.
Yes and no. The (very simplified) answer is that yes, some ISA (front-end) instructions are decoded into simpler back-end operations, but the design of the ISA still imposes constraints on the implementation of that decoder [1] and on the design of the back-end.
Then there are concerns like register pressure. x86 has so few general-purpose registers that values need to be spilled to the stack and reloaded when needed. Some of the performance impact can be reduced by complex decoder logic, but making complex logic fast nearly always leads to high power consumption.
[1] E.g., the highly variable length of x86/x86-64 instructions puts a limit on the number of instructions that can be decoded per cycle.
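To make [1] concrete, here's a toy illustration (a made-up mini-ISA, not real x86 encoding) of why variable-length decoding is inherently serial: you only learn where instruction N+1 starts after working out the length of instruction N, whereas fixed-width boundaries are known up front and can be found in parallel.

    # Hypothetical encoding: low 2 bits of the first byte give (length - 1), i.e. 1-4 bytes.
    def boundaries_variable(code: bytes, count: int) -> list[int]:
        offs, pc = [], 0
        for _ in range(count):
            offs.append(pc)
            pc += (code[pc] & 0b11) + 1  # must look at this byte before the next offset is known
        return offs

    # Fixed-width ISA (classic RISC style): every boundary is known without decoding anything.
    def boundaries_fixed(count: int, width: int = 4) -> list[int]:
        return [i * width for i in range(count)]

    print(boundaries_variable(bytes([0b10, 0, 0, 0b01, 0, 0b00]), 3))  # [0, 3, 5]
    print(boundaries_fixed(3))                                         # [0, 4, 8]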
It's got plenty in 64-bit mode (15 + RSP) + you can spill to SSE registers instead of the stack.
It also needs fewer registers than (most) RISCs because it has a more flexible way of specifying memory addresses (base, index, scale, offset + PC-relative) and it also has proper immediates.
> the highly variable length of x86/x86-64 instructions puts a limit on the number of instructions that can be decoded per cycle.
Not that much of a limit, actually. Yes, a parallel decoder that takes arbitrary byte sequences and decodes them is hard to scale up. The instruction lengths can be cached, though. In fact, they used to be cached as extra bits in L1 back when the size of a wide decoder was a significant fraction of the CPU transistor budget. It should be possible to use that idea again to go wider.
The newer x86 CPUs also have a µop cache, so no decoding is even needed for tight loops.