Port amd64 SIMD libc optimizations to aarch64

Project description

Project proposal (gopher)

The goal of the project is to port the SIMD-optimized routines written for amd64 to aarch64 using Arm NEON instructions. Several string functions already had SIMD routines located in /contrib/arm-optimized-routines, but some of these were less than optimal; in those cases the amd64 variants were ported with great success.

Code to be ported is located at /src/lib/libc/amd64/string

https://github.com/freebsd/freebsd-src/tree/main/lib/libc/amd64/string

New code is located in /src/lib/libc/aarch64/string

https://github.com/freebsd/freebsd-src/tree/main/lib/libc/aarch64/string

Project outcome

Almost all string functions are now SIMD enhanced for aarch64, pending an exp-run before merge into -CURRENT.

A bug was also discovered in the existing memccpy implementation: it could read past the end of the source buffer (an overread), which could cause a segfault.
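One common way to catch this class of bug is to place the source buffer flush against an unreadable guard page, so that any read past the terminating byte faults immediately. The sketch below is only an illustration of that idea, not the project's test code: the names check_no_overread, ref_memccpy and memccpy_fn are hypothetical, and the page does not describe how the bug was actually found.

```c
#define _DEFAULT_SOURCE /* exposes MAP_ANON on glibc; harmless elsewhere */
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical type: any function with the memccpy signature. */
typedef void *(*memccpy_fn)(void *, const void *, int, size_t);

/* Byte-at-a-time reference that never reads past the first match. */
void *ref_memccpy(void *dst, const void *src, int c, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    for (size_t i = 0; i < n; i++) {
        d[i] = s[i];
        if (s[i] == (unsigned char)c)
            return d + i + 1;
    }
    return NULL;
}

/*
 * Puts len bytes ('a'... followed by NUL) directly before a PROT_NONE page
 * and calls fn with a count larger than the readable region.
 * Returns 1 if the copy is correct, 0 if it is wrong, -1 on setup failure.
 * An implementation that overreads into the guard page dies with SIGSEGV.
 */
int check_no_overread(memccpy_fn fn, size_t len)
{
    long pagesz = sysconf(_SC_PAGESIZE);
    if (pagesz <= 0 || len == 0 || len > (size_t)pagesz)
        return -1;

    size_t maplen = 2 * (size_t)pagesz;
    unsigned char *map = mmap(NULL, maplen, PROT_READ | PROT_WRITE,
        MAP_PRIVATE | MAP_ANON, -1, 0);
    if (map == MAP_FAILED)
        return -1;

    unsigned char *dst = malloc(len);
    if (dst == NULL || mprotect(map + pagesz, (size_t)pagesz, PROT_NONE) != 0) {
        free(dst);
        munmap(map, maplen);
        return -1;
    }

    /* The terminator is the last readable byte before the guard page. */
    unsigned char *src = map + pagesz - len;
    memset(src, 'a', len - 1);
    src[len - 1] = '\0';

    /* The count deliberately exceeds the readable bytes after src. */
    unsigned char *end = fn(dst, src, '\0', len + 64);
    int ok = (end == dst + len) && memcmp(dst, src, len) == 0;

    free(dst);
    munmap(map, maplen);
    return ok;
}
```

Calling check_no_overread(memccpy, len) for a range of lengths (and therefore source alignments) would exercise the libc routine under test; the reference implementation is included only so the harness itself can be sanity-checked.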

What's left to do

strspn and strcspn would benefit from a SIMDized check of which bytes are in a set.

NEON has no convenient instruction for this, unlike pcmpistri on amd64 or MATCH in SVE2, so such an algorithm could work well.
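To make the idea concrete, here is a scalar C model of one well-known way to test set membership with two 16-entry tables indexed by the low nibble of each byte, with the high nibble selecting a bit. This is a sketch under assumptions rather than the algorithm the project would necessarily use: the names nibble_set, nibble_set_init, nibble_set_contains, sketch_strspn and sketch_strcspn are hypothetical, and a NEON version would presumably replace the per-byte table reads with TBL lookups over a whole vector.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout: bit (hi & 7) of the row for nibble lo marks byte (hi << 4 | lo). */
struct nibble_set {
    uint8_t lo_half[16]; /* bytes 0x00..0x7f (high nibble 0..7)  */
    uint8_t hi_half[16]; /* bytes 0x80..0xff (high nibble 8..15) */
};

/* Build both tables from a NUL-terminated set string. */
void nibble_set_init(struct nibble_set *s, const char *set)
{
    memset(s, 0, sizeof(*s));
    for (; *set != '\0'; set++) {
        unsigned c = (unsigned char)*set;
        unsigned lo = c & 15, hi = c >> 4;

        if (hi < 8)
            s->lo_half[lo] |= (uint8_t)(1u << hi);
        else
            s->hi_half[lo] |= (uint8_t)(1u << (hi - 8));
    }
}

/* One table read plus one bit test per byte; NUL is never a member. */
int nibble_set_contains(const struct nibble_set *s, unsigned char c)
{
    unsigned lo = c & 15, hi = c >> 4;
    uint8_t row = hi < 8 ? s->lo_half[lo] : s->hi_half[lo];

    return (row >> (hi & 7)) & 1;
}

/* Length of the initial run of str consisting only of bytes in set. */
size_t sketch_strspn(const char *str, const char *set)
{
    struct nibble_set s;
    size_t i = 0;

    nibble_set_init(&s, set);
    while (nibble_set_contains(&s, (unsigned char)str[i]))
        i++;
    return i;
}

/* Length of the initial run of str consisting only of bytes not in set. */
size_t sketch_strcspn(const char *str, const char *set)
{
    struct nibble_set s;
    size_t i = 0;

    nibble_set_init(&s, set);
    while (str[i] != '\0' && !nibble_set_contains(&s, (unsigned char)str[i]))
        i++;
    return i;
}
```

The two tables are 16 bytes each, so each fits in a single 128-bit vector register, which is what makes this representation a plausible fit for a table-lookup based SIMD loop.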

Reviews

memcmp

strlen

strcmp

strncmp

memccpy

strlcpy

memcpy

strlcat

strspn

strcspn

strcat

strncat

strpbrk

Progress reports

Blog Posts

Update 1

Update 2

Update 3

Update 4

Update 5

Deliverables

HEADER          FUNCTION        NOTES
string.h        stpcpy          String copy functions
                stpncpy
                strcat
                strncat
                strcpy
                strncpy
                strlcpy
                strlcat
                strchrnul
                strrchr
                strcspn
                strspn
                strpbrk
                strsep          String tokenisation functions
                strtok_r
                strcmp          String comparison functions
                strncmp
                memcpy          Memory copy functions
                memccpy
                memset          Memory initialisation functions
                memchr          Memory search functions
                memrchr
                memmem
                memcmp          Memory comparison function
                strlen          String length

Milestones

Test Plan

Code will be tested on a Raspberry Pi 5 using the available FreeBSD tests and those borrowed from NetBSD. Additional tests will be written if needed. Performance will be measured using fuz's tool strperf (https://github.com/clausecker/strperf), and the results will be analyzed using benchstat from devel/go-perf.

The Code

https://git.sr.ht/~getz/aarch64_string.h

https://github.com/soppelmann/freebsd-src

Notes

I will publish progress reports here, and in-depth writeups of interesting solutions on my blog (https://df.lth.se/~getz or https://getz.sdf.org).

https://community.arm.com/arm-community-blogs/b/infrastructure-solutions-blog/posts/porting-x86-vector-bitmask-optimizations-to-arm-neon

https://danlark.org/2023/06/06/csinc-the-arm-instruction-you-didnt-know-you-wanted/

https://www.corsix.org/content/whirlwind-tour-aarch64-vector-instructions

https://branchfree.org/2019/03/26/an-intel-programmer-jumps-over-the-wall-first-impressions-of-arm-simd-programming/

https://branchfree.org/2019/04/01/fitting-my-head-through-the-arm-holes-or-two-sequences-to-substitute-for-the-missing-pmovmskb-instruction-on-arm-neon/

SummerOfCode2024Projects/PortingAmd64LibcSIMDEnhancementsToArm64 (last edited 2024-08-23T12:29:38+0000 by GetzMikalsen)