All the mail mirrored from lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] x86: always_inline wrapper for x86's test_bit
@ 2008-04-13 11:23 Alexander van Heukelum
  2008-04-13 16:58 ` Andi Kleen
  2008-04-14  7:52 ` Ingo Molnar
  0 siblings, 2 replies; 9+ messages in thread
From: Alexander van Heukelum @ 2008-04-13 11:23 UTC (permalink / raw
  To: Ingo Molnar; +Cc: linux-kernel, heukelum

On x86, test_bit is currently implemented as a preprocessor macro. It
uses gcc's __builtin_constant_p to determine if the bit position is 
known at compile time and defers to one of two functions depending
on that. This changes the same logic to an __always_inline wrapper
instead.

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>

---

Hi Ingo,

This patch is a cleanup, changing a macro into an inline function.

However, the size comparison here is interesting. Allowing gcc to
uninline functions marked inline causes a nice decrease in size
for a defconfig, but on my tiny config the size increases.

   text    data     bss     dec     hex vmlinux/i386
4850554  474688  622592 5947834  5ac1ba defconfig+forced inlining
4850888  474688  622592 5948168  5ac308 defconfig+forced inlining (patched)
4721175  474688  622592 5818455  58c857 defconfig (uninlining)
4705595  474688  622592 5802875  588b7b defconfig (uninlining, patched)
   
   text    data     bss     dec     hex vmlinux/i386
 759524   77400   81920  918844   e053c tiny (forced inlining)
 759562   77400   81920  918882   e0562 tiny (forced inlining, patched)
 787508   77400   81920  946828   e728c tiny (uninlining)
 784492   77400   81920  943812   e66c4 tiny (uninlining, patched)

If gcc decides to emit a separate function instead of inlining it,
the image can have multiple instances of this uninlined function.
Moreover, gcc decides if a function is too big to be inlined using
a heuristic and sometimes gets it wrong. In particular, it uninlined
constant_bit_test which indeed looks a bit big, but should reduce to
a single assembly instruction after constant folding.

So I guess we should sprinkle some __always_inlines in the core
parts of the kernel? The most problematic ones are easily spotted
using: nm -S vmlinux|uniq -f2 -D

Greetings,
	Alexander

 arch/x86/boot/bitops.h   |   20 ++++++++++----------
 include/asm-x86/bitops.h |   23 +++++++++--------------
 2 files changed, 19 insertions(+), 24 deletions(-)

diff --git a/arch/x86/boot/bitops.h b/arch/x86/boot/bitops.h
index 878e4b9..b1a4841 100644
--- a/arch/x86/boot/bitops.h
+++ b/arch/x86/boot/bitops.h
@@ -16,12 +16,7 @@
 #define BOOT_BITOPS_H
 #define _LINUX_BITOPS_H		/* Inhibit inclusion of <linux/bitops.h> */
 
-static inline int constant_test_bit(int nr, const void *addr)
-{
-	const u32 *p = (const u32 *)addr;
-	return ((1UL << (nr & 31)) & (p[nr >> 5])) != 0;
-}
-static inline int variable_test_bit(int nr, const void *addr)
+static inline int variable_test_bit(int nr, const volatile unsigned long *addr)
 {
 	u8 v;
 	const u32 *p = (const u32 *)addr;
@@ -30,10 +25,15 @@ static inline int variable_test_bit(int nr, const void *addr)
 	return v;
 }
 
-#define test_bit(nr,addr) \
-(__builtin_constant_p(nr) ? \
- constant_test_bit((nr),(addr)) : \
- variable_test_bit((nr),(addr)))
+static __always_inline int test_bit(int nr, const volatile unsigned long *addr)
+{
+	if (__builtin_constant_p(nr)) {
+		return ((1UL << (nr % BITS_PER_LONG)) &
+			(((unsigned long *)addr)[nr / BITS_PER_LONG])) != 0;
+	}
+
+	return variable_test_bit(nr, addr);	
+}
 
 static inline void set_bit(int nr, void *addr)
 {
diff --git a/include/asm-x86/bitops.h b/include/asm-x86/bitops.h
index b81a4d4..85dc2e7 100644
--- a/include/asm-x86/bitops.h
+++ b/include/asm-x86/bitops.h
@@ -266,13 +266,7 @@ static inline int test_and_change_bit(int nr, volatile void *addr)
 	return oldbit;
 }
 
-static inline int constant_test_bit(int nr, const volatile void *addr)
-{
-	return ((1UL << (nr % BITS_PER_LONG)) &
-		(((unsigned long *)addr)[nr / BITS_PER_LONG])) != 0;
-}
-
-static inline int variable_test_bit(int nr, volatile const void *addr)
+static inline int variable_test_bit(int nr, const volatile unsigned long *addr)
 {
 	int oldbit;
 
@@ -285,19 +279,20 @@ static inline int variable_test_bit(int nr, volatile const void *addr)
 	return oldbit;
 }
 
-#if 0 /* Fool kernel-doc since it doesn't do macros yet */
 /**
  * test_bit - Determine whether a bit is set
  * @nr: bit number to test
  * @addr: Address to start counting from
  */
-static int test_bit(int nr, const volatile unsigned long *addr);
-#endif
+static __always_inline int test_bit(int nr, const volatile unsigned long *addr)
+{
+	if (__builtin_constant_p(nr)) {
+		return ((1UL << (nr % BITS_PER_LONG)) &
+			(((unsigned long *)addr)[nr / BITS_PER_LONG])) != 0;
+	}
 
-#define test_bit(nr, addr)			\
-	(__builtin_constant_p((nr))		\
-	 ? constant_test_bit((nr), (addr))	\
-	 : variable_test_bit((nr), (addr)))
+	return variable_test_bit(nr, addr);	
+}
 
 /**
  * __ffs - find first set bit in word


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH] x86: always_inline wrapper for x86's test_bit
  2008-04-13 11:23 [PATCH] x86: always_inline wrapper for x86's test_bit Alexander van Heukelum
@ 2008-04-13 16:58 ` Andi Kleen
  2008-04-13 18:03   ` Alexander van Heukelum
  2008-04-14  7:52 ` Ingo Molnar
  1 sibling, 1 reply; 9+ messages in thread
From: Andi Kleen @ 2008-04-13 16:58 UTC (permalink / raw
  To: Alexander van Heukelum; +Cc: Ingo Molnar, linux-kernel, heukelum

Alexander van Heukelum <heukelum@mailshack.com> writes:

> On x86, test_bit is currently implemented as a preprocessor macro. It
> uses gcc's __builtin_constant_p to determine if the bit position is 
> known at compile time and defers to one of two functions depending
> on that. This changes the same logic to an __always_inline wrapper
> instead.

Some old gccs didn't support __builtin_constant_p in inline properly,
that is why it was always written in macros.

Please double check with the oldest still supported gcc (3.2) if it 
really generates the expected code for the constant/non constant case.

-Andi

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] x86: always_inline wrapper for x86's test_bit
  2008-04-13 16:58 ` Andi Kleen
@ 2008-04-13 18:03   ` Alexander van Heukelum
  2008-04-13 18:16     ` Andi Kleen
  0 siblings, 1 reply; 9+ messages in thread
From: Alexander van Heukelum @ 2008-04-13 18:03 UTC (permalink / raw
  To: Andi Kleen, Alexander van Heukelum; +Cc: Ingo Molnar, linux-kernel

On Sun, 13 Apr 2008 18:58:30 +0200, "Andi Kleen" <andi@firstfloor.org>
said:
> Alexander van Heukelum <heukelum@mailshack.com> writes:
> 
> > On x86, test_bit is currently implemented as a preprocessor macro. It
> > uses gcc's __builtin_constant_p to determine if the bit position is 
> > known at compile time and defers to one of two functions depending
> > on that. This changes the same logic to an __always_inline wrapper
> > instead.
> 
> Some old gccs didn't support __builtin_constant_p in inline properly,
> that is why it was always written in macros.
> 
> Please double check with the oldest still supported gcc (3.2) if it 
> really generates the expected code for the constant/non constant case.

Hi Andi,

I have googled, but the only problem I found was concerning dead code 
elimination, and in particular references to unavailable object from
code that was expected to be discarded. The worst that can happen in 
this case is that gcc might produce a strange construction where a
runtime check will choose between the two alternative implementations of
test_bit. Another is that it will select the 'wrong' implementation. 
Both will result is some code-bloat, but at least the code should work
properly.

I have not checked with 3.2. The oldest compiler I have available here
is 3.3. That version compiles the functions as expected: I have found
instances of either type in the objdump and I have not found strange
constructions with both types there.

If you were thinking of another/bigger problem with gcc-3.2, could you 
please give me a pointer?

Greetings,
        Alexander

> -Andi
-- 
  Alexander van Heukelum
  heukelum@fastmail.fm

-- 
http://www.fastmail.fm - Or how I learned to stop worrying and
                          love email again


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] x86: always_inline wrapper for x86's test_bit
  2008-04-13 18:03   ` Alexander van Heukelum
@ 2008-04-13 18:16     ` Andi Kleen
  2008-04-14 12:24       ` Alexander van Heukelum
  0 siblings, 1 reply; 9+ messages in thread
From: Andi Kleen @ 2008-04-13 18:16 UTC (permalink / raw
  To: Alexander van Heukelum
  Cc: Andi Kleen, Alexander van Heukelum, Ingo Molnar, linux-kernel

> I have googled, but the only problem I found was concerning dead code 

It might come as a surprise, but google is still not omniscient.

> elimination, and in particular references to unavailable object from
> code that was expected to be discarded. The worst that can happen in 
> this case is that gcc might produce a strange construction where a
> runtime check will choose between the two alternative implementations of
> test_bit. Another is that it will select the 'wrong' implementation. 
> Both will result is some code-bloat, but at least the code should work
> properly.

Yes, but extreme code bloat can be fatal.

> I have not checked with 3.2. The oldest compiler I have available here
> is 3.3. That version compiles the functions as expected: I have found
> instances of either type in the objdump and I have not found strange
> constructions with both types there.

It would be good to check on 3.2 too to avoid potential nasty surprises.

> If you were thinking of another/bigger problem with gcc-3.2, could you 
> please give me a pointer?

Not sure what you mean here.

-Andi

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] x86: always_inline wrapper for x86's test_bit
  2008-04-13 11:23 [PATCH] x86: always_inline wrapper for x86's test_bit Alexander van Heukelum
  2008-04-13 16:58 ` Andi Kleen
@ 2008-04-14  7:52 ` Ingo Molnar
  2008-04-14 13:49   ` Alexander van Heukelum
  1 sibling, 1 reply; 9+ messages in thread
From: Ingo Molnar @ 2008-04-14  7:52 UTC (permalink / raw
  To: Alexander van Heukelum; +Cc: linux-kernel, heukelum


* Alexander van Heukelum <heukelum@mailshack.com> wrote:

> On x86, test_bit is currently implemented as a preprocessor macro. It 
> uses gcc's __builtin_constant_p to determine if the bit position is 
> known at compile time and defers to one of two functions depending on 
> that. This changes the same logic to an __always_inline wrapper 
> instead.

thanks Alexander, applied.

> Hi Ingo,
> 
> This patch is a cleanup, changing a macro into an inline function.
> 
> However, the size comparison here is interesting. Allowing gcc to 
> uninline functions marked inline causes a nice decrease in size for a 
> defconfig, but on my tiny config the size increases.
> 
>    text    data     bss     dec     hex vmlinux/i386
> 4850554  474688  622592 5947834  5ac1ba defconfig+forced inlining
> 4850888  474688  622592 5948168  5ac308 defconfig+forced inlining (patched)
> 4721175  474688  622592 5818455  58c857 defconfig (uninlining)
> 4705595  474688  622592 5802875  588b7b defconfig (uninlining, patched)
>    
>    text    data     bss     dec     hex vmlinux/i386
>  759524   77400   81920  918844   e053c tiny (forced inlining)
>  759562   77400   81920  918882   e0562 tiny (forced inlining, patched)
>  787508   77400   81920  946828   e728c tiny (uninlining)
>  784492   77400   81920  943812   e66c4 tiny (uninlining, patched)

when gcc is allowed to uninline (the new CONFIG_OPTIMIZE_INLINING=y 
feature in x86.git/latest) then your change gives a nice size reduction 
of around 0.400% on both large and small kernel images.

the size increase on the 'big' kernel is small, and it's rather 
immaterial as well, forced-inlining kernels are +3.080% larger anyway.

the behavior on tiny is interesting - could you post the config you used 
for your tiny kernel?

what would be interesting to see is the effect of allowing gcc to 
optimize for size (CONFIG_CC_OPTIMIZE_FOR_SIZE=y) - and compare 
!CC_OPTIMIZE_FOR_SIZE+!OPTIMIZE_INLINING against 
CC_OPTIMIZE_FOR_SIZE=y+OPTIMIZE_INLINING=y kernels - the size difference 
should be _brutal_.

> If gcc decides to emit a separate function instead of inlining it, the 
> image can have multiple instances of this uninlined function. 
> Moreover, gcc decides if a function is too big to be inlined using a 
> heuristic and sometimes gets it wrong. In particular, it uninlined 
> constant_bit_test which indeed looks a bit big, but should reduce to a 
> single assembly instruction after constant folding.

yep, gcc could definitely improve here. But if we never give it the 
possibility to do so (by always forcing inlining) then gcc (or any other 
compiler) wont really be optimized for Linux in this area.

we saw that with CC_OPTIMIZE_FOR_SIZE=y well - when the Linux kernel 
started using it then both the correctness and the quality of gcc's -Os 
output improved.

> So I guess we should sprinkle some __always_inlines in the core parts 
> of the kernel? The most problematic ones are easily spotted using: nm 
> -S vmlinux|uniq -f2 -D

yes - butwe should use a separate (new) __hint_always_inline tag - and 
keep __always_inline for the cases where the code _must_ be inlined for 
correctness reasons. (there are a few places where we do that)

	Ingo

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] x86: always_inline wrapper for x86's test_bit
  2008-04-13 18:16     ` Andi Kleen
@ 2008-04-14 12:24       ` Alexander van Heukelum
  2008-04-14 12:28         ` Ingo Molnar
  2008-04-14 12:34         ` Andi Kleen
  0 siblings, 2 replies; 9+ messages in thread
From: Alexander van Heukelum @ 2008-04-14 12:24 UTC (permalink / raw
  To: Andi Kleen; +Cc: Alexander van Heukelum, Ingo Molnar, linux-kernel

On Sun, 13 Apr 2008 20:16:42 +0200, "Andi Kleen" <andi@firstfloor.org>
said:
> > I have googled, but the only problem I found was concerning dead code 
> 
> It might come as a surprise, but google is still not omniscient.

Hi,

It's just not always easy to think of the right question to ask the
oracle ;).

> > elimination, and in particular references to unavailable object from
> > code that was expected to be discarded. The worst that can happen in 
> > this case is that gcc might produce a strange construction where a
> > runtime check will choose between the two alternative implementations of
> > test_bit. Another is that it will select the 'wrong' implementation. 
> > Both will result is some code-bloat, but at least the code should work
> > properly.
> 
> Yes, but extreme code bloat can be fatal.
> 
> > I have not checked with 3.2. The oldest compiler I have available here
> > is 3.3. That version compiles the functions as expected: I have found
> > instances of either type in the objdump and I have not found strange
> > constructions with both types there.
> 
> It would be good to check on 3.2 too to avoid potential nasty surprises.

Checked with a stock 3.2.3 form gcc.gnu.org (compiled using debian's
gcc-2.95). It gets it right: places that should get the C version get
the C version and places that should get the inline-assembly version
get the inline-assembly version. No funny things at all.

B.T.W., debian and ubuntu don't provide gcc-3.2 packages. Is there a
good reason why gcc-3.2.3 (5 years old in a week) should still be
supported?

> > If you were thinking of another/bigger problem with gcc-3.2, could you 
> > please give me a pointer?
> 
> Not sure what you mean here.

I just thought you might have been thinking of a specific problem. A
bug-report or something like that would have been convenient to have.
Looking for a problem is easier than looking for no problems ;).

Greetings,
    Alexander

> -Andi
-- 
  Alexander van Heukelum
  heukelum@fastmail.fm

-- 
http://www.fastmail.fm - Does exactly what it says on the tin


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] x86: always_inline wrapper for x86's test_bit
  2008-04-14 12:24       ` Alexander van Heukelum
@ 2008-04-14 12:28         ` Ingo Molnar
  2008-04-14 12:34         ` Andi Kleen
  1 sibling, 0 replies; 9+ messages in thread
From: Ingo Molnar @ 2008-04-14 12:28 UTC (permalink / raw
  To: Alexander van Heukelum; +Cc: Andi Kleen, Alexander van Heukelum, linux-kernel


* Alexander van Heukelum <heukelum@fastmail.fm> wrote:

> B.T.W., debian and ubuntu don't provide gcc-3.2 packages. Is there a 
> good reason why gcc-3.2.3 (5 years old in a week) should still be 
> supported?

we normally desupport given gcc versions only if their support in the 
kernel is very difficult and leads to lots of uglinesses, or if a gcc 
version is known to fatally miscompile the kernel, with no workaround 
available.

i.e. we try not to break people's build setups (folks with a specific 
gcc version in a cross-build setup), without a good reason.

	Ingo

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] x86: always_inline wrapper for x86's test_bit
  2008-04-14 12:24       ` Alexander van Heukelum
  2008-04-14 12:28         ` Ingo Molnar
@ 2008-04-14 12:34         ` Andi Kleen
  1 sibling, 0 replies; 9+ messages in thread
From: Andi Kleen @ 2008-04-14 12:34 UTC (permalink / raw
  To: Alexander van Heukelum
  Cc: Andi Kleen, Alexander van Heukelum, Ingo Molnar, linux-kernel

> Checked with a stock 3.2.3 form gcc.gnu.org (compiled using debian's
> gcc-2.95). It gets it right: places that should get the C version get
> the C version and places that should get the inline-assembly version
> get the inline-assembly version. No funny things at all.

Great. Thanks for testing. Perhaps the problem was limited to 2.9x only.

> B.T.W., debian and ubuntu don't provide gcc-3.2 packages. Is there a
> good reason why gcc-3.2.3 (5 years old in a week) should still be
> supported?

The compiler version is normally only moved up when there is some 
bug in an older version that cannot be reasonably be worked around.
So far 3.2 didn't have anything like this, at least not on x86
(on some other architectures there are problems with older toolkits) 

-Andi

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] x86: always_inline wrapper for x86's test_bit
  2008-04-14  7:52 ` Ingo Molnar
@ 2008-04-14 13:49   ` Alexander van Heukelum
  0 siblings, 0 replies; 9+ messages in thread
From: Alexander van Heukelum @ 2008-04-14 13:49 UTC (permalink / raw
  To: Ingo Molnar, Alexander van Heukelum; +Cc: linux-kernel

On Mon, 14 Apr 2008 09:52:04 +0200, "Ingo Molnar" <mingo@elte.hu> said:
> the behavior on tiny is interesting - could you post the config you used 
> for your tiny kernel?

I use the following mini-config (../klibc.i386.mini). To expand it:
make allnoconfig KCONFIG_ALLCONFIG="../klibc.i386.mini"


CONFIG_EXPERIMENTAL=y
CONFIG_BLK_DEV_INITRD=y
CONFIG_INITRAMFS_SOURCE="../initramfs"
CONFIG_CC_OPTIMIZE_FOR_SIZE=y
CONFIG_EMBEDDED=y
CONFIG_KALLSYMS=y
CONFIG_PRINTK=y
CONFIG_BUG=y
CONFIG_SLOB=y
CONFIG_MODULES=y
CONFIG_MODULE_UNLOAD=y
CONFIG_MPENTIUMII=y
CONFIG_HZ_100=y
CONFIG_PHYSICAL_START=0x110000
CONFIG_PHYSICAL_ALIGN=0x10000
CONFIG_BINFMT_ELF=y
CONFIG_SERIAL_8250=y
CONFIG_SERIAL_8250_CONSOLE=y
CONFIG_PROC_FS=y
CONFIG_SYSFS=y
CONFIG_OPTIMIZE_INLINING=y


This references the file initramfs:


#####################
# Shared klibc initramfs
# See gen_init_cpio for syntax

dir dev 0755 0 0
dir lib 0755 0 0
dir lib/modules 0755 0 0
dir bin 0755 0 0

nod dev/console 0600 0 0 c 5 1
file lib/klibc-K9OvCTU5pCdyhCSkxwBjXuR7KA4.so
../klibc/usr/klibc/klibc-K9OvCTU5pCdyhCSkxw$
file init ../init 0755 0 0

file bin/sh ../klibc/usr/dash/sh.shared 0755 0 0
file bin/kinit ../klibc/usr/kinit/kinit.shared 0755 0 0
file bin/gzip ../klibc/usr/gzip/gzip 0755 0 0
file bin/cat ../klibc/usr/utils/shared/cat 0755 0 0
file bin/chroot ../klibc/usr/utils/shared/chroot 0755 0 0
file bin/cpio ../klibc/usr/utils/shared/cpio 0755 0 0
file bin/dd ../klibc/usr/utils/shared/dd 0755 0 0
file bin/false ../klibc/usr/utils/shared/false 0755 0 0
file bin/halt ../klibc/usr/utils/shared/halt 0755 0 0
file bin/kill ../klibc/usr/utils/shared/kill 0755 0 0
file bin/ln ../klibc/usr/utils/shared/ln 0755 0 0
file bin/ps ../klibc/usr/utils/shared/minips 0755 0 0
file bin/mkdir ../klibc/usr/utils/shared/mkdir 0755 0 0
file bin/mkfifo ../klibc/usr/utils/shared/mkfifo 0755 0 0
file bin/mknod ../klibc/usr/utils/shared/mknod 0755 0 0
file bin/mount ../klibc/usr/utils/shared/mount 0755 0 0
file bin/nuke ../klibc/usr/utils/shared/nuke 0755 0 0
file bin/pivot_root ../klibc/usr/utils/shared/pivot_root 0755 0 0
file bin/poweroff ../klibc/usr/utils/shared/poweroff 0755 0 0
file bin/readlink ../klibc/usr/utils/shared/readlink 0755 0 0
file bin/reboot ../klibc/usr/utils/shared/reboot 0755 0 0
file bin/sleep ../klibc/usr/utils/shared/sleep 0755 0 0
file bin/sync ../klibc/usr/utils/shared/sync 0755 0 0
file bin/true ../klibc/usr/utils/shared/true 0755 0 0
file bin/umount ../klibc/usr/utils/shared/umount 0755 0 0
file bin/uname ../klibc/usr/utils/shared/uname 0755 0 0


And ../init is a small script:


#! /bin/sh
mkdir proc
mount -t proc proc /proc
mkdir sys
mount -t sysfs sysfs /sys
bin/sh -i
reboot


This is usually the first config I try to build if I change something.
Klibc is here: git://git.kernel.org/pub/scm/libs/klibc/klibc.git
If all goes well you will be able to boot to a shell with a recent
qemu using: qemu -m 4 -smp 2 -cpu pentium2 -nographic -no-reboot
-serial stdio -cdrom arch/x86/boot/image.iso.

> what would be interesting to see is the effect of allowing gcc to 
> optimize for size (CONFIG_CC_OPTIMIZE_FOR_SIZE=y) - and compare 
> !CC_OPTIMIZE_FOR_SIZE+!OPTIMIZE_INLINING against 
> CC_OPTIMIZE_FOR_SIZE=y+OPTIMIZE_INLINING=y kernels - the size difference 
> should be _brutal_.

I see I had CONFIG_DEBUG_SECTION_MISMATCH enabled all the time.
This has quite a bit of influence on code generation! I did not
expect that. Here are the numbers for the configuration given
above for all combinations of the three settings.

Using: gcc (GCC) 4.1.3 20070929 (prerelease) (Ubuntu 4.1.2-16ubuntu2)

CONFIG_DEBUG_SECTION_MISMATCH=y
   text    data     bss     dec     hex vmlinux/i386
 806039   81464   73728  961231   eaacf OPT_SIZE=y OPT_INL=y
 778053   81464   73728  933245   e3d7d OPT_SIZE=y OPT_INL=n
 929940   81464   73728 1085132  108ecc OPT_SIZE=n OPT_INL=y
 929541   81464   73728 1084733  108d3d OPT_SIZE=n OPT_INL=n

CONFIG_DEBUG_SECTION_MISMATCH=n
   text    data     bss     dec     hex vmlinux/i386
 768031   81464   73728  923223   e1657 OPT_SIZE=y OPT_INL=y
 763675   81464   73728  918867   e0553 OPT_SIZE=y OPT_INL=n
 904284   81464   73728 1059476  102a94 OPT_SIZE=n OPT_INL=y
 904764   81464   73728 1059956  102c74 OPT_SIZE=n OPT_INL=n

The patch still helps OPT_SIZE=y OPT_INL=y, though:
 765896   81464   73728  921088   e0e00 OPT_SIZE=y OPT_INL=y, patched
 763717   81464   73728  918909   e057d OPT_SIZE=y OPT_INL=n, patched
 904490   81464   73728 1059682  102b62 OPT_SIZE=n OPT_INL=y, patched
 904938   81464   73728 1060130  102d22 OPT_SIZE=n OPT_INL=n, patched

> > If gcc decides to emit a separate function instead of inlining it, the 
> > image can have multiple instances of this uninlined function. 
> > Moreover, gcc decides if a function is too big to be inlined using a 
> > heuristic and sometimes gets it wrong. In particular, it uninlined 
> > constant_bit_test which indeed looks a bit big, but should reduce to a 
> > single assembly instruction after constant folding.
> 
> yep, gcc could definitely improve here. But if we never give it the 
> possibility to do so (by always forcing inlining) then gcc (or any other 
> compiler) wont really be optimized for Linux in this area.
> 
> we saw that with CC_OPTIMIZE_FOR_SIZE=y well - when the Linux kernel 
> started using it then both the correctness and the quality of gcc's -Os 
> output improved.
> 
> > So I guess we should sprinkle some __always_inlines in the core parts 
> > of the kernel? The most problematic ones are easily spotted using: nm 
> > -S vmlinux|uniq -f2 -D
> 
> yes - butwe should use a separate (new) __hint_always_inline tag - and 
> keep __always_inline for the cases where the code _must_ be inlined for 
> correctness reasons. (there are a few places where we do that)

That's becoming quite long. What about __wrapper? It should only be
applied to functions that do nothing more than select an implementation
based on compile-time properties of the arguments of the function or on
configuration variables.

Greetings,
    Alexander

> 	Ingo
-- 
  Alexander van Heukelum
  heukelum@fastmail.fm

-- 
http://www.fastmail.fm - Or how I learned to stop worrying and
                          love email again


^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2008-04-14 13:49 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-04-13 11:23 [PATCH] x86: always_inline wrapper for x86's test_bit Alexander van Heukelum
2008-04-13 16:58 ` Andi Kleen
2008-04-13 18:03   ` Alexander van Heukelum
2008-04-13 18:16     ` Andi Kleen
2008-04-14 12:24       ` Alexander van Heukelum
2008-04-14 12:28         ` Ingo Molnar
2008-04-14 12:34         ` Andi Kleen
2008-04-14  7:52 ` Ingo Molnar
2008-04-14 13:49   ` Alexander van Heukelum

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.