this post was submitted on 27 Jul 2023
49 points (98.0% liked)

Programming


[–] [email protected] 22 points 1 year ago* (last edited 1 year ago)

The changes:

Intel® APX doubles the number of general-purpose registers (GPRs) from 16 to 32. This allows the compiler to keep more values in registers; as a result, APX-compiled code contains 10% fewer loads and more than 20% fewer stores than the same code compiled for an Intel® 64 baseline.² Register accesses are not only faster, but they also consume significantly less dynamic power than complex load and store operations.

Intel® APX adds conditional forms of load, store, and compare/test instructions, and it also adds an option for the compiler to suppress the status flags writes of common instructions. These enhancements expand the applicability of if-conversion to much larger code regions, cutting down on the number of branches that may incur misprediction penalties. All these conditional ISA improvements are implemented via EVEX prefix extensions of existing legacy instructions.
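To make "if-conversion" concrete, here is a hedged C sketch (the function names are hypothetical). The first loop stores only when the condition holds. The second computes a select and stores unconditionally. A compiler generally may not turn the first form into the second on its own, because that would add stores the program never asked for, which can fault or race with another thread. Conditional store instructions like the ones APX adds are meant to let a compiler if-convert code like the first loop without that problem. Whether a given compiler actually does so depends on its version and flags.

```c
#include <stddef.h>

/* Branchy form: a[i] is written only when it exceeds the limit.
   The compiler must preserve "no store otherwise". */
void cap_branchy(int *a, size_t n, int limit) {
    for (size_t i = 0; i < n; i++) {
        if (a[i] > limit)
            a[i] = limit;
    }
}

/* If-converted form written by hand: always select, always store.
   The programmer has explicitly accepted the extra stores here; the
   select may compile to a cmov (or be vectorized) instead of a branch. */
void cap_select(int *a, size_t n, int limit) {
    for (size_t i = 0; i < n; i++) {
        int v = a[i];
        a[i] = (v > limit) ? limit : v;
    }
}
```

Both functions leave the array in the same state; the difference is only in which memory writes the source permits the compiler to emit.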

[–] [email protected] 5 points 1 year ago (1 children)

Aren't most x86 executables still being built to favor compatibility over performance? I think I've read that just targeting the current generation of CPUs when compiling can bring improvements of up to 20%.
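For reference, GCC and Clang expose the instruction set chosen at build time through predefined macros, so the same source can be built for the original x86-64 baseline (which only guarantees SSE/SSE2) or for a newer level, for example with `-march=x86-64-v3` or `-march=native`. The sketch below just reports which level the compiler was allowed to assume; the function name is hypothetical, and any real speed-up depends heavily on the workload.

```c
/* Reports the newest SIMD extension the compiler was told it may assume.
   These macros are set from -march / -m flags at compile time; they say
   nothing about the CPU the program later runs on. */
const char *build_target_level(void) {
#if defined(__AVX512F__)
    return "avx512f";
#elif defined(__AVX2__)
    return "avx2";
#elif defined(__SSE4_1__)
    return "sse4.1";
#else
    return "baseline";
#endif
}
```

Building the same file with `cc -O2` and with `cc -O2 -march=x86-64-v3` would typically return different strings, since `x86-64-v3` implies AVX2. A binary built the second way may crash with an illegal-instruction error on an older CPU.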

[–] [email protected] 7 points 1 year ago* (last edited 1 year ago) (1 children)

For consumer software, yes: most is still being built with a baseline target instruction set from the early/mid-2000s. In 2019 there were reports of Apex Legends requiring SSE4.1, an instruction set from circa 2007. It will probably be close to a couple of decades before consumer software commonly requires these instructions.

However, in more specialized environments, such as scientific and high-performance computing, it's much more common to use custom software designed for a specific task and to recompile it whenever you get new hardware. In those applications these instructions can make a huge impact, because you know exactly which capabilities the hardware supports and can use everything available.

I believe there are also some (possibly limited) situations where a program can check which instructions a processor supports and, depending on the result, use either the newer (higher-performance) version or the slower, more widely supported version. There may, however, be limits on how often that can be done.
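This kind of run-time check does exist in GCC and Clang on x86. A minimal sketch, assuming one of those compilers and an x86-64 target: `__builtin_cpu_supports` queries the CPU the program is actually running on, and the `target("avx2")` attribute lets a single function be compiled for AVX2 while the rest of the file stays at the baseline. All function and type names here are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Portable fallback: safe on any x86-64 CPU. */
static uint64_t sum_generic(const uint32_t *v, size_t n) {
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += v[i];
    return total;
}

/* Same loop, but this one function may use AVX2 instructions.
   It must only be called on CPUs that report AVX2 support. */
__attribute__((target("avx2")))
static uint64_t sum_avx2(const uint32_t *v, size_t n) {
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += v[i];
    return total;
}

typedef uint64_t (*sum_fn)(const uint32_t *, size_t);

/* Run-time dispatch: inspect the running CPU and return the best version. */
sum_fn select_sum(void) {
    __builtin_cpu_init(); /* harmless here; required if run before constructors */
    return __builtin_cpu_supports("avx2") ? sum_avx2 : sum_generic;
}
```

Calling `select_sum()` once and reusing the returned pointer keeps the check out of hot loops.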

[–] [email protected] 5 points 1 year ago

In 2019 there were reports of Apex Legends requiring SSE4.1, an instruction set from circa 2007.

It's not just about when it was released: sometimes budget processors, or in this case AMD processors, don't support them straight away, or ever.

[–] [email protected] 2 points 1 year ago (3 children)

@keenkoon I wonder if it would make sense to store the regular compiled code and the extension-optimized code in one binary, and only load the extension code if the binary is executed on a supporting architecture, staying compatible with older architectures otherwise.
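GCC (and Clang, in recent versions) can already do a per-function version of this with the `target_clones` attribute. The compiler emits one copy of the function per listed target plus a `default` copy, and a resolver picks one when the program is loaded, based on the running CPU. A minimal sketch, assuming an ELF platform with IFUNC support such as Linux/glibc; `scale` is a hypothetical function.

```c
#include <stddef.h>

/* Two copies are generated: one compiled for AVX2 and a baseline one.
   Callers just call scale(); the loader binds it to the right copy. */
__attribute__((target_clones("avx2", "default")))
void scale(float *v, size_t n, float k) {
    for (size_t i = 0; i < n; i++)
        v[i] *= k;
}
```

This only duplicates the functions you annotate, so the rest of the binary stays at the baseline.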

[–] [email protected] 7 points 1 year ago

This is why .NET code compiles to platform-independent binaries that get JIT-translated to machine code and optimized for the target CPU. Developers don't need to do anything (the applications don't even need to be recompiled); they will just get conditionally optimized when appropriate.

[–] [email protected] 4 points 1 year ago* (last edited 1 year ago)

This is really the only way to move forward with ISA extensions.

Though I think for this update we don't need to be too concerned. Since it changes the code so extensively, compiler writers will be strongly incentivised to produce this duplicate path themselves, instead of leaving the burden of dispatching on the programmer as with AVX and friends.

[–] [email protected] 4 points 1 year ago

For an extension like this - unlike most prior extensions - you're best off with essentially an entirely separately compiled copy of the program/library. So IFUNC is a poor fit, even with peer optimization.
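For readers unfamiliar with IFUNC: it is a GNU/ELF "indirect function" mechanism where a resolver function chooses the implementation a symbol is bound to, at startup or when the symbol is first bound. A minimal sketch, assuming GCC or Clang on x86-64 Linux with glibc; all names are hypothetical.

```c
#include <stddef.h>

/* Two implementations of the same (hypothetical) function. */
static long long sum_i32_generic(const int *v, size_t n) {
    long long total = 0;
    for (size_t i = 0; i < n; i++)
        total += v[i];
    return total;
}

/* Compiled with AVX2 enabled; only safe on CPUs that support it. */
__attribute__((target("avx2")))
static long long sum_i32_avx2(const int *v, size_t n) {
    long long total = 0;
    for (size_t i = 0; i < n; i++)
        total += v[i];
    return total;
}

/* Resolver: run by the loader, not by the program. It may run before
   constructors, so the CPU model data must be initialised explicitly. */
static long long (*resolve_sum_i32(void))(const int *, size_t) {
    __builtin_cpu_init();
    return __builtin_cpu_supports("avx2") ? sum_i32_avx2 : sum_i32_generic;
}

/* Callers see an ordinary function bound to the resolver's choice. */
long long sum_i32(const int *v, size_t n)
    __attribute__((ifunc("resolve_sum_i32")));
```

The dispatch unit here is a single function symbol. Because APX's extra registers and new encodings would affect code generation in essentially every function, the comment's point is that dispatching function by function this way would mean cloning nearly everything, which is why a separately compiled copy looks like the better fit.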
