BPF Archive mirror
 help / color / mirror / Atom feed
* [PATCH bpf] arm32, bpf: reimplement sign-extension mov instruction
@ 2024-04-19 18:28 Puranjay Mohan
  2024-04-22 11:14 ` Russell King (Oracle)
  2024-04-22 15:30 ` patchwork-bot+netdevbpf
  0 siblings, 2 replies; 3+ messages in thread
From: Puranjay Mohan @ 2024-04-19 18:28 UTC (permalink / raw
  To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Eduard Zingerman, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Russell King, Russell King (Oracle), bpf, linux-arm-kernel,
	linux-kernel
  Cc: puranjay12

The current implementation of the mov instruction with sign extension
has the following problems:

  1. It clobbers the source register if it is not stacked because it
     sign extends the source and then moves it to the destination.
  2. If the dst_reg is stacked, the current code doesn't write the value
     back in case of 64-bit mov.
  3. There is room for improvement by emitting fewer instructions.

The steps for fixing this and the instructions emitted by the JIT are
explained below with examples in all combinations:

Case A: offset == 32:
=====================
  Case A.1: src and dst are stacked registers:
  --------------------------------------------
    1. Load src_lo into tmp_lo
    2. Store tmp_lo into dst_lo
    3. Sign extend tmp_lo into tmp_hi
    4. Store tmp_hi to dst_hi

    Example: r3 = (s32)r3
	r3 is a stacked register

	ldr     r6, [r11, #-16]	// Load r3_lo into tmp_lo
	// str to dst_lo is not emitted because src_lo == dst_lo
	asr     r7, r6, #31	// Sign extend tmp_lo into tmp_hi
	str     r7, [r11, #-12] // Store tmp_hi into r3_hi

  Case A.2: src is stacked but dst is not:
  ----------------------------------------
    1. Load src_lo into dst_lo
    2. Sign extend dst_lo into dst_hi

    Example: r6 = (s32)r3
	r6 maps to {ARM_R5, ARM_R4} and r3 is stacked

	ldr     r4, [r11, #-16] // Load r3_lo into r6_lo
	asr     r5, r4, #31	// Sign extend r6_lo into r6_hi

  Case A.3: src is not stacked but dst is stacked:
  ------------------------------------------------
    1. Store src_lo into dst_lo
    2. Sign extend src_lo into tmp_hi
    3. Store tmp_hi to dst_hi

    Example: r3 = (s32)r6
	r3 is stacked and r6 maps to {ARM_R5, ARM_R4}

	str     r4, [r11, #-16] // Store r6_lo to r3_lo
	asr     r7, r4, #31	// Sign extend r6_lo into tmp_hi
	str     r7, [r11, #-12]	// Store tmp_hi to dest_hi

  Case A.4: Both src and dst are not stacked:
  -------------------------------------------
    1. Mov src_lo into dst_lo
    2. Sign extend src_lo into dst_hi

    Example: (bf) r6 = (s32)r6
	r6 maps to {ARM_R5, ARM_R4}

	// Mov not emitted because dst == src
	asr     r5, r4, #31 // Sign extend r6_lo into r6_hi

Case B: offset != 32:
=====================
  Case B.1: src and dst are stacked registers:
  --------------------------------------------
    1. Load src_lo into tmp_lo
    2. Sign extend tmp_lo according to offset.
    3. Store tmp_lo into dst_lo
    4. Sign extend tmp_lo into tmp_hi
    5. Store tmp_hi to dst_hi

    Example: r9 = (s8)r3
	r9 and r3 are both stacked registers

	ldr     r6, [r11, #-16] // Load r3_lo into tmp_lo
	lsl     r6, r6, #24	// Sign extend tmp_lo
	asr     r6, r6, #24	// ..
	str     r6, [r11, #-56] // Store tmp_lo to r9_lo
	asr     r7, r6, #31	// Sign extend tmp_lo to tmp_hi
	str     r7, [r11, #-52] // Store tmp_hi to r9_hi

  Case B.2: src is stacked but dst is not:
  ----------------------------------------
    1. Load src_lo into dst_lo
    2. Sign extend dst_lo according to offset.
    3. Sign extend tmp_lo into dst_hi

    Example: r6 = (s8)r3
	r6 maps to {ARM_R5, ARM_R4} and r3 is stacked

	ldr     r4, [r11, #-16] // Load r3_lo to r6_lo
	lsl     r4, r4, #24	// Sign extend r6_lo
	asr     r4, r4, #24	// ..
	asr     r5, r4, #31	// Sign extend r6_lo into r6_hi

  Case B.3: src is not stacked but dst is stacked:
  ------------------------------------------------
    1. Sign extend src_lo into tmp_lo according to offset.
    2. Store tmp_lo into dst_lo.
    3. Sign extend src_lo into tmp_hi.
    4. Store tmp_hi to dst_hi.

    Example: r3 = (s8)r1
	r3 is stacked and r1 maps to {ARM_R3, ARM_R2}

	lsl     r6, r2, #24 	// Sign extend r1_lo to tmp_lo
	asr     r6, r6, #24	// ..
	str     r6, [r11, #-16] // Store tmp_lo to r3_lo
	asr     r7, r6, #31	// Sign extend tmp_lo to tmp_hi
	str     r7, [r11, #-12] // Store tmp_hi to r3_hi

  Case B.4: Both src and dst are not stacked:
  -------------------------------------------
    1. Sign extend src_lo into dst_lo according to offset.
    2. Sign extend dst_lo into dst_hi.

    Example: r6 = (s8)r1
	r6 maps to {ARM_R5, ARM_R4} and r1 maps to {ARM_R3, ARM_R2}

	lsl     r4, r2, #24	// Sign extend r1_lo to r6_lo
	asr     r4, r4, #24	// ..
	asr     r5, r4, #31	// Sign extend r6_lo to r6_hi

Fixes: fc832653fa0d ("arm32, bpf: add support for sign-extension mov instruction")
Reported-by: syzbot+186522670e6722692d86@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/000000000000e9a8d80615163f2a@google.com/
Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
---
 arch/arm/net/bpf_jit_32.c | 56 ++++++++++++++++++++++++++++++---------
 1 file changed, 43 insertions(+), 13 deletions(-)

diff --git a/arch/arm/net/bpf_jit_32.c b/arch/arm/net/bpf_jit_32.c
index 1d672457d02f..72b5cd697f5d 100644
--- a/arch/arm/net/bpf_jit_32.c
+++ b/arch/arm/net/bpf_jit_32.c
@@ -871,16 +871,11 @@ static inline void emit_a32_alu_r64(const bool is64, const s8 dst[],
 }
 
 /* dst = src (4 bytes)*/
-static inline void emit_a32_mov_r(const s8 dst, const s8 src, const u8 off,
-				  struct jit_ctx *ctx) {
+static inline void emit_a32_mov_r(const s8 dst, const s8 src, struct jit_ctx *ctx) {
 	const s8 *tmp = bpf2a32[TMP_REG_1];
 	s8 rt;
 
 	rt = arm_bpf_get_reg32(src, tmp[0], ctx);
-	if (off && off != 32) {
-		emit(ARM_LSL_I(rt, rt, 32 - off), ctx);
-		emit(ARM_ASR_I(rt, rt, 32 - off), ctx);
-	}
 	arm_bpf_put_reg32(dst, rt, ctx);
 }
 
@@ -889,15 +884,15 @@ static inline void emit_a32_mov_r64(const bool is64, const s8 dst[],
 				  const s8 src[],
 				  struct jit_ctx *ctx) {
 	if (!is64) {
-		emit_a32_mov_r(dst_lo, src_lo, 0, ctx);
+		emit_a32_mov_r(dst_lo, src_lo, ctx);
 		if (!ctx->prog->aux->verifier_zext)
 			/* Zero out high 4 bytes */
 			emit_a32_mov_i(dst_hi, 0, ctx);
 	} else if (__LINUX_ARM_ARCH__ < 6 &&
 		   ctx->cpu_architecture < CPU_ARCH_ARMv5TE) {
 		/* complete 8 byte move */
-		emit_a32_mov_r(dst_lo, src_lo, 0, ctx);
-		emit_a32_mov_r(dst_hi, src_hi, 0, ctx);
+		emit_a32_mov_r(dst_lo, src_lo, ctx);
+		emit_a32_mov_r(dst_hi, src_hi, ctx);
 	} else if (is_stacked(src_lo) && is_stacked(dst_lo)) {
 		const u8 *tmp = bpf2a32[TMP_REG_1];
 
@@ -917,17 +912,52 @@ static inline void emit_a32_mov_r64(const bool is64, const s8 dst[],
 static inline void emit_a32_movsx_r64(const bool is64, const u8 off, const s8 dst[], const s8 src[],
 				      struct jit_ctx *ctx) {
 	const s8 *tmp = bpf2a32[TMP_REG_1];
-	const s8 *rt;
+	s8 rs;
+	s8 rd;
 
-	rt = arm_bpf_get_reg64(dst, tmp, ctx);
+	if (is_stacked(dst_lo))
+		rd = tmp[1];
+	else
+		rd = dst_lo;
+	rs = arm_bpf_get_reg32(src_lo, rd, ctx);
+	/* rs may be one of src[1], dst[1], or tmp[1] */
+
+	/* Sign extend rs if needed. If off == 32, lower 32-bits of src are moved to dst and sign
+	 * extension only happens in the upper 64 bits.
+	 */
+	if (off != 32) {
+		/* Sign extend rs into rd */
+		emit(ARM_LSL_I(rd, rs, 32 - off), ctx);
+		emit(ARM_ASR_I(rd, rd, 32 - off), ctx);
+	} else {
+		rd = rs;
+	}
+
+	/* Write rd to dst_lo
+	 *
+	 * Optimization:
+	 * Assume:
+	 * 1. dst == src and stacked.
+	 * 2. off == 32
+	 *
+	 * In this case src_lo was loaded into rd(tmp[1]) but rd was not sign extended as off==32.
+	 * So, we don't need to write rd back to dst_lo as they have the same value.
+	 * This saves us one str instruction.
+	 */
+	if (dst_lo != src_lo || off != 32)
+		arm_bpf_put_reg32(dst_lo, rd, ctx);
 
-	emit_a32_mov_r(dst_lo, src_lo, off, ctx);
 	if (!is64) {
 		if (!ctx->prog->aux->verifier_zext)
 			/* Zero out high 4 bytes */
 			emit_a32_mov_i(dst_hi, 0, ctx);
 	} else {
-		emit(ARM_ASR_I(rt[0], rt[1], 31), ctx);
+		if (is_stacked(dst_hi)) {
+			emit(ARM_ASR_I(tmp[0], rd, 31), ctx);
+			arm_bpf_put_reg32(dst_hi, tmp[0], ctx);
+		} else {
+			emit(ARM_ASR_I(dst_hi, rd, 31), ctx);
+		}
 	}
 }
 
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [PATCH bpf] arm32, bpf: reimplement sign-extension mov instruction
  2024-04-19 18:28 [PATCH bpf] arm32, bpf: reimplement sign-extension mov instruction Puranjay Mohan
@ 2024-04-22 11:14 ` Russell King (Oracle)
  2024-04-22 15:30 ` patchwork-bot+netdevbpf
  1 sibling, 0 replies; 3+ messages in thread
From: Russell King (Oracle) @ 2024-04-22 11:14 UTC (permalink / raw
  To: Puranjay Mohan
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Eduard Zingerman, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	bpf, linux-arm-kernel, linux-kernel, puranjay12

On Fri, Apr 19, 2024 at 06:28:32PM +0000, Puranjay Mohan wrote:
> The current implementation of the mov instruction with sign extension
> has the following problems:
> 
>   1. It clobbers the source register if it is not stacked because it
>      sign extends the source and then moves it to the destination.
>   2. If the dst_reg is stacked, the current code doesn't write the value
>      back in case of 64-bit mov.
>   3. There is room for improvement by emitting fewer instructions.
> 
> The steps for fixing this and the instructions emitted by the JIT are
> explained below with examples in all combinations:
> 
> Case A: offset == 32:
> =====================
>   Case A.1: src and dst are stacked registers:
>   --------------------------------------------
>     1. Load src_lo into tmp_lo
>     2. Store tmp_lo into dst_lo
>     3. Sign extend tmp_lo into tmp_hi
>     4. Store tmp_hi to dst_hi
> 
>     Example: r3 = (s32)r3
> 	r3 is a stacked register
> 
> 	ldr     r6, [r11, #-16]	// Load r3_lo into tmp_lo
> 	// str to dst_lo is not emitted because src_lo == dst_lo
> 	asr     r7, r6, #31	// Sign extend tmp_lo into tmp_hi
> 	str     r7, [r11, #-12] // Store tmp_hi into r3_hi
> 
>   Case A.2: src is stacked but dst is not:
>   ----------------------------------------
>     1. Load src_lo into dst_lo
>     2. Sign extend dst_lo into dst_hi
> 
>     Example: r6 = (s32)r3
> 	r6 maps to {ARM_R5, ARM_R4} and r3 is stacked
> 
> 	ldr     r4, [r11, #-16] // Load r3_lo into r6_lo
> 	asr     r5, r4, #31	// Sign extend r6_lo into r6_hi
> 
>   Case A.3: src is not stacked but dst is stacked:
>   ------------------------------------------------
>     1. Store src_lo into dst_lo
>     2. Sign extend src_lo into tmp_hi
>     3. Store tmp_hi to dst_hi
> 
>     Example: r3 = (s32)r6
> 	r3 is stacked and r6 maps to {ARM_R5, ARM_R4}
> 
> 	str     r4, [r11, #-16] // Store r6_lo to r3_lo
> 	asr     r7, r4, #31	// Sign extend r6_lo into tmp_hi
> 	str     r7, [r11, #-12]	// Store tmp_hi to dest_hi
> 
>   Case A.4: Both src and dst are not stacked:
>   -------------------------------------------
>     1. Mov src_lo into dst_lo
>     2. Sign extend src_lo into dst_hi
> 
>     Example: (bf) r6 = (s32)r6
> 	r6 maps to {ARM_R5, ARM_R4}
> 
> 	// Mov not emitted because dst == src
> 	asr     r5, r4, #31 // Sign extend r6_lo into r6_hi
> 
> Case B: offset != 32:
> =====================
>   Case B.1: src and dst are stacked registers:
>   --------------------------------------------
>     1. Load src_lo into tmp_lo
>     2. Sign extend tmp_lo according to offset.
>     3. Store tmp_lo into dst_lo
>     4. Sign extend tmp_lo into tmp_hi
>     5. Store tmp_hi to dst_hi
> 
>     Example: r9 = (s8)r3
> 	r9 and r3 are both stacked registers
> 
> 	ldr     r6, [r11, #-16] // Load r3_lo into tmp_lo
> 	lsl     r6, r6, #24	// Sign extend tmp_lo
> 	asr     r6, r6, #24	// ..
> 	str     r6, [r11, #-56] // Store tmp_lo to r9_lo
> 	asr     r7, r6, #31	// Sign extend tmp_lo to tmp_hi
> 	str     r7, [r11, #-52] // Store tmp_hi to r9_hi
> 
>   Case B.2: src is stacked but dst is not:
>   ----------------------------------------
>     1. Load src_lo into dst_lo
>     2. Sign extend dst_lo according to offset.
>     3. Sign extend tmp_lo into dst_hi
> 
>     Example: r6 = (s8)r3
> 	r6 maps to {ARM_R5, ARM_R4} and r3 is stacked
> 
> 	ldr     r4, [r11, #-16] // Load r3_lo to r6_lo
> 	lsl     r4, r4, #24	// Sign extend r6_lo
> 	asr     r4, r4, #24	// ..
> 	asr     r5, r4, #31	// Sign extend r6_lo into r6_hi
> 
>   Case B.3: src is not stacked but dst is stacked:
>   ------------------------------------------------
>     1. Sign extend src_lo into tmp_lo according to offset.
>     2. Store tmp_lo into dst_lo.
>     3. Sign extend src_lo into tmp_hi.
>     4. Store tmp_hi to dst_hi.
> 
>     Example: r3 = (s8)r1
> 	r3 is stacked and r1 maps to {ARM_R3, ARM_R2}
> 
> 	lsl     r6, r2, #24 	// Sign extend r1_lo to tmp_lo
> 	asr     r6, r6, #24	// ..
> 	str     r6, [r11, #-16] // Store tmp_lo to r3_lo
> 	asr     r7, r6, #31	// Sign extend tmp_lo to tmp_hi
> 	str     r7, [r11, #-12] // Store tmp_hi to r3_hi
> 
>   Case B.4: Both src and dst are not stacked:
>   -------------------------------------------
>     1. Sign extend src_lo into dst_lo according to offset.
>     2. Sign extend dst_lo into dst_hi.
> 
>     Example: r6 = (s8)r1
> 	r6 maps to {ARM_R5, ARM_R4} and r1 maps to {ARM_R3, ARM_R2}
> 
> 	lsl     r4, r2, #24	// Sign extend r1_lo to r6_lo
> 	asr     r4, r4, #24	// ..
> 	asr     r5, r4, #31	// Sign extend r6_lo to r6_hi
> 
> Fixes: fc832653fa0d ("arm32, bpf: add support for sign-extension mov instruction")
> Reported-by: syzbot+186522670e6722692d86@syzkaller.appspotmail.com
> Closes: https://lore.kernel.org/all/000000000000e9a8d80615163f2a@google.com/
> Signed-off-by: Puranjay Mohan <puranjay@kernel.org>

This looks really great, thanks for putting in the extra effort!

Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>

Thanks!

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH bpf] arm32, bpf: reimplement sign-extension mov instruction
  2024-04-19 18:28 [PATCH bpf] arm32, bpf: reimplement sign-extension mov instruction Puranjay Mohan
  2024-04-22 11:14 ` Russell King (Oracle)
@ 2024-04-22 15:30 ` patchwork-bot+netdevbpf
  1 sibling, 0 replies; 3+ messages in thread
From: patchwork-bot+netdevbpf @ 2024-04-22 15:30 UTC (permalink / raw
  To: Puranjay Mohan
  Cc: ast, daniel, andrii, martin.lau, eddyz87, song, yonghong.song,
	john.fastabend, kpsingh, sdf, haoluo, jolsa, linux, rmk+kernel,
	bpf, linux-arm-kernel, linux-kernel, puranjay12

Hello:

This patch was applied to bpf/bpf.git (master)
by Daniel Borkmann <daniel@iogearbox.net>:

On Fri, 19 Apr 2024 18:28:32 +0000 you wrote:
> The current implementation of the mov instruction with sign extension
> has the following problems:
> 
>   1. It clobbers the source register if it is not stacked because it
>      sign extends the source and then moves it to the destination.
>   2. If the dst_reg is stacked, the current code doesn't write the value
>      back in case of 64-bit mov.
>   3. There is room for improvement by emitting fewer instructions.
> 
> [...]

Here is the summary with links:
  - [bpf] arm32, bpf: reimplement sign-extension mov instruction
    https://git.kernel.org/bpf/bpf/c/c6f48506ba30

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2024-04-22 15:30 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-04-19 18:28 [PATCH bpf] arm32, bpf: reimplement sign-extension mov instruction Puranjay Mohan
2024-04-22 11:14 ` Russell King (Oracle)
2024-04-22 15:30 ` patchwork-bot+netdevbpf

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).