* [PATCH] x86: always_inline wrapper for x86's test_bit
@ 2008-04-13 11:23 Alexander van Heukelum
2008-04-13 16:58 ` Andi Kleen
2008-04-14 7:52 ` Ingo Molnar
0 siblings, 2 replies; 9+ messages in thread
From: Alexander van Heukelum @ 2008-04-13 11:23 UTC (permalink / raw
To: Ingo Molnar; +Cc: linux-kernel, heukelum
On x86, test_bit is currently implemented as a preprocessor macro. It
uses gcc's __builtin_constant_p to determine if the bit position is
known at compile time and defers to one of two functions depending
on that. This changes the same logic to an __always_inline wrapper
instead.
Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
---
Hi Ingo,
This patch is a cleanup, changing a macro into an inline function.
However, the size comparison here is interesting. Allowing gcc to
uninline functions marked inline causes a nice decrease in size
for a defconfig, but on my tiny config the size increases.
text data bss dec hex vmlinux/i386
4850554 474688 622592 5947834 5ac1ba defconfig+forced inlining
4850888 474688 622592 5948168 5ac308 defconfig+forced inlining (patched)
4721175 474688 622592 5818455 58c857 defconfig (uninlining)
4705595 474688 622592 5802875 588b7b defconfig (uninlining, patched)
text data bss dec hex vmlinux/i386
759524 77400 81920 918844 e053c tiny (forced inlining)
759562 77400 81920 918882 e0562 tiny (forced inlining, patched)
787508 77400 81920 946828 e728c tiny (uninlining)
784492 77400 81920 943812 e66c4 tiny (uninlining, patched)
If gcc decides to emit a separate function instead of inlining it,
the image can have multiple instances of this uninlined function.
Moreover, gcc decides if a function is too big to be inlined using
a heuristic and sometimes gets it wrong. In particular, it uninlined
constant_bit_test which indeed looks a bit big, but should reduce to
a single assembly instruction after constant folding.
So I guess we should sprinkle some __always_inlines in the core
parts of the kernel? The most problematic ones are easily spotted
using: nm -S vmlinux|uniq -f2 -D
Greetings,
Alexander
arch/x86/boot/bitops.h | 20 ++++++++++----------
include/asm-x86/bitops.h | 23 +++++++++--------------
2 files changed, 19 insertions(+), 24 deletions(-)
diff --git a/arch/x86/boot/bitops.h b/arch/x86/boot/bitops.h
index 878e4b9..b1a4841 100644
--- a/arch/x86/boot/bitops.h
+++ b/arch/x86/boot/bitops.h
@@ -16,12 +16,7 @@
#define BOOT_BITOPS_H
#define _LINUX_BITOPS_H /* Inhibit inclusion of <linux/bitops.h> */
-static inline int constant_test_bit(int nr, const void *addr)
-{
- const u32 *p = (const u32 *)addr;
- return ((1UL << (nr & 31)) & (p[nr >> 5])) != 0;
-}
-static inline int variable_test_bit(int nr, const void *addr)
+static inline int variable_test_bit(int nr, const volatile unsigned long *addr)
{
u8 v;
const u32 *p = (const u32 *)addr;
@@ -30,10 +25,15 @@ static inline int variable_test_bit(int nr, const void *addr)
return v;
}
-#define test_bit(nr,addr) \
-(__builtin_constant_p(nr) ? \
- constant_test_bit((nr),(addr)) : \
- variable_test_bit((nr),(addr)))
+static __always_inline int test_bit(int nr, const volatile unsigned long *addr)
+{
+ if (__builtin_constant_p(nr)) {
+ return ((1UL << (nr % BITS_PER_LONG)) &
+ (((unsigned long *)addr)[nr / BITS_PER_LONG])) != 0;
+ }
+
+ return variable_test_bit(nr, addr);
+}
static inline void set_bit(int nr, void *addr)
{
diff --git a/include/asm-x86/bitops.h b/include/asm-x86/bitops.h
index b81a4d4..85dc2e7 100644
--- a/include/asm-x86/bitops.h
+++ b/include/asm-x86/bitops.h
@@ -266,13 +266,7 @@ static inline int test_and_change_bit(int nr, volatile void *addr)
return oldbit;
}
-static inline int constant_test_bit(int nr, const volatile void *addr)
-{
- return ((1UL << (nr % BITS_PER_LONG)) &
- (((unsigned long *)addr)[nr / BITS_PER_LONG])) != 0;
-}
-
-static inline int variable_test_bit(int nr, volatile const void *addr)
+static inline int variable_test_bit(int nr, const volatile unsigned long *addr)
{
int oldbit;
@@ -285,19 +279,20 @@ static inline int variable_test_bit(int nr, volatile const void *addr)
return oldbit;
}
-#if 0 /* Fool kernel-doc since it doesn't do macros yet */
/**
* test_bit - Determine whether a bit is set
* @nr: bit number to test
* @addr: Address to start counting from
*/
-static int test_bit(int nr, const volatile unsigned long *addr);
-#endif
+static __always_inline int test_bit(int nr, const volatile unsigned long *addr)
+{
+ if (__builtin_constant_p(nr)) {
+ return ((1UL << (nr % BITS_PER_LONG)) &
+ (((unsigned long *)addr)[nr / BITS_PER_LONG])) != 0;
+ }
-#define test_bit(nr, addr) \
- (__builtin_constant_p((nr)) \
- ? constant_test_bit((nr), (addr)) \
- : variable_test_bit((nr), (addr)))
+ return variable_test_bit(nr, addr);
+}
/**
* __ffs - find first set bit in word
^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH] x86: always_inline wrapper for x86's test_bit
2008-04-13 11:23 [PATCH] x86: always_inline wrapper for x86's test_bit Alexander van Heukelum
@ 2008-04-13 16:58 ` Andi Kleen
2008-04-13 18:03 ` Alexander van Heukelum
2008-04-14 7:52 ` Ingo Molnar
1 sibling, 1 reply; 9+ messages in thread
From: Andi Kleen @ 2008-04-13 16:58 UTC (permalink / raw
To: Alexander van Heukelum; +Cc: Ingo Molnar, linux-kernel, heukelum
Alexander van Heukelum <heukelum@mailshack.com> writes:
> On x86, test_bit is currently implemented as a preprocessor macro. It
> uses gcc's __builtin_constant_p to determine if the bit position is
> known at compile time and defers to one of two functions depending
> on that. This changes the same logic to an __always_inline wrapper
> instead.
Some old gccs didn't support __builtin_constant_p in inline properly,
that is why it was always written in macros.
Please double check with the oldest still supported gcc (3.2) if it
really generates the expected code for the constant/non constant case.
-Andi
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] x86: always_inline wrapper for x86's test_bit
2008-04-13 16:58 ` Andi Kleen
@ 2008-04-13 18:03 ` Alexander van Heukelum
2008-04-13 18:16 ` Andi Kleen
0 siblings, 1 reply; 9+ messages in thread
From: Alexander van Heukelum @ 2008-04-13 18:03 UTC (permalink / raw
To: Andi Kleen, Alexander van Heukelum; +Cc: Ingo Molnar, linux-kernel
On Sun, 13 Apr 2008 18:58:30 +0200, "Andi Kleen" <andi@firstfloor.org>
said:
> Alexander van Heukelum <heukelum@mailshack.com> writes:
>
> > On x86, test_bit is currently implemented as a preprocessor macro. It
> > uses gcc's __builtin_constant_p to determine if the bit position is
> > known at compile time and defers to one of two functions depending
> > on that. This changes the same logic to an __always_inline wrapper
> > instead.
>
> Some old gccs didn't support __builtin_constant_p in inline properly,
> that is why it was always written in macros.
>
> Please double check with the oldest still supported gcc (3.2) if it
> really generates the expected code for the constant/non constant case.
Hi Andi,
I have googled, but the only problem I found was concerning dead code
elimination, and in particular references to unavailable object from
code that was expected to be discarded. The worst that can happen in
this case is that gcc might produce a strange construction where a
runtime check will choose between the two alternative implementations of
test_bit. Another is that it will select the 'wrong' implementation.
Both will result is some code-bloat, but at least the code should work
properly.
I have not checked with 3.2. The oldest compiler I have available here
is 3.3. That version compiles the functions as expected: I have found
instances of either type in the objdump and I have not found strange
constructions with both types there.
If you were thinking of another/bigger problem with gcc-3.2, could you
please give me a pointer?
Greetings,
Alexander
> -Andi
--
Alexander van Heukelum
heukelum@fastmail.fm
--
http://www.fastmail.fm - Or how I learned to stop worrying and
love email again
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] x86: always_inline wrapper for x86's test_bit
2008-04-13 18:03 ` Alexander van Heukelum
@ 2008-04-13 18:16 ` Andi Kleen
2008-04-14 12:24 ` Alexander van Heukelum
0 siblings, 1 reply; 9+ messages in thread
From: Andi Kleen @ 2008-04-13 18:16 UTC (permalink / raw
To: Alexander van Heukelum
Cc: Andi Kleen, Alexander van Heukelum, Ingo Molnar, linux-kernel
> I have googled, but the only problem I found was concerning dead code
It might come as a surprise, but google is still not omniscient.
> elimination, and in particular references to unavailable object from
> code that was expected to be discarded. The worst that can happen in
> this case is that gcc might produce a strange construction where a
> runtime check will choose between the two alternative implementations of
> test_bit. Another is that it will select the 'wrong' implementation.
> Both will result is some code-bloat, but at least the code should work
> properly.
Yes, but extreme code bloat can be fatal.
> I have not checked with 3.2. The oldest compiler I have available here
> is 3.3. That version compiles the functions as expected: I have found
> instances of either type in the objdump and I have not found strange
> constructions with both types there.
It would be good to check on 3.2 too to avoid potential nasty surprises.
> If you were thinking of another/bigger problem with gcc-3.2, could you
> please give me a pointer?
Not sure what you mean here.
-Andi
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] x86: always_inline wrapper for x86's test_bit
2008-04-13 11:23 [PATCH] x86: always_inline wrapper for x86's test_bit Alexander van Heukelum
2008-04-13 16:58 ` Andi Kleen
@ 2008-04-14 7:52 ` Ingo Molnar
2008-04-14 13:49 ` Alexander van Heukelum
1 sibling, 1 reply; 9+ messages in thread
From: Ingo Molnar @ 2008-04-14 7:52 UTC (permalink / raw
To: Alexander van Heukelum; +Cc: linux-kernel, heukelum
* Alexander van Heukelum <heukelum@mailshack.com> wrote:
> On x86, test_bit is currently implemented as a preprocessor macro. It
> uses gcc's __builtin_constant_p to determine if the bit position is
> known at compile time and defers to one of two functions depending on
> that. This changes the same logic to an __always_inline wrapper
> instead.
thanks Alexander, applied.
> Hi Ingo,
>
> This patch is a cleanup, changing a macro into an inline function.
>
> However, the size comparison here is interesting. Allowing gcc to
> uninline functions marked inline causes a nice decrease in size for a
> defconfig, but on my tiny config the size increases.
>
> text data bss dec hex vmlinux/i386
> 4850554 474688 622592 5947834 5ac1ba defconfig+forced inlining
> 4850888 474688 622592 5948168 5ac308 defconfig+forced inlining (patched)
> 4721175 474688 622592 5818455 58c857 defconfig (uninlining)
> 4705595 474688 622592 5802875 588b7b defconfig (uninlining, patched)
>
> text data bss dec hex vmlinux/i386
> 759524 77400 81920 918844 e053c tiny (forced inlining)
> 759562 77400 81920 918882 e0562 tiny (forced inlining, patched)
> 787508 77400 81920 946828 e728c tiny (uninlining)
> 784492 77400 81920 943812 e66c4 tiny (uninlining, patched)
when gcc is allowed to uninline (the new CONFIG_OPTIMIZE_INLINING=y
feature in x86.git/latest) then your change gives a nice size reduction
of around 0.400% on both large and small kernel images.
the size increase on the 'big' kernel is small, and it's rather
immaterial as well, forced-inlining kernels are +3.080% larger anyway.
the behavior on tiny is interesting - could you post the config you used
for your tiny kernel?
what would be interesting to see is the effect of allowing gcc to
optimize for size (CONFIG_CC_OPTIMIZE_FOR_SIZE=y) - and compare
!CC_OPTIMIZE_FOR_SIZE+!OPTIMIZE_INLINING against
CC_OPTIMIZE_FOR_SIZE=y+OPTIMIZE_INLINING=y kernels - the size difference
should be _brutal_.
> If gcc decides to emit a separate function instead of inlining it, the
> image can have multiple instances of this uninlined function.
> Moreover, gcc decides if a function is too big to be inlined using a
> heuristic and sometimes gets it wrong. In particular, it uninlined
> constant_bit_test which indeed looks a bit big, but should reduce to a
> single assembly instruction after constant folding.
yep, gcc could definitely improve here. But if we never give it the
possibility to do so (by always forcing inlining) then gcc (or any other
compiler) wont really be optimized for Linux in this area.
we saw that with CC_OPTIMIZE_FOR_SIZE=y well - when the Linux kernel
started using it then both the correctness and the quality of gcc's -Os
output improved.
> So I guess we should sprinkle some __always_inlines in the core parts
> of the kernel? The most problematic ones are easily spotted using: nm
> -S vmlinux|uniq -f2 -D
yes - butwe should use a separate (new) __hint_always_inline tag - and
keep __always_inline for the cases where the code _must_ be inlined for
correctness reasons. (there are a few places where we do that)
Ingo
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] x86: always_inline wrapper for x86's test_bit
2008-04-13 18:16 ` Andi Kleen
@ 2008-04-14 12:24 ` Alexander van Heukelum
2008-04-14 12:28 ` Ingo Molnar
2008-04-14 12:34 ` Andi Kleen
0 siblings, 2 replies; 9+ messages in thread
From: Alexander van Heukelum @ 2008-04-14 12:24 UTC (permalink / raw
To: Andi Kleen; +Cc: Alexander van Heukelum, Ingo Molnar, linux-kernel
On Sun, 13 Apr 2008 20:16:42 +0200, "Andi Kleen" <andi@firstfloor.org>
said:
> > I have googled, but the only problem I found was concerning dead code
>
> It might come as a surprise, but google is still not omniscient.
Hi,
It's just not always easy to think of the right question to ask the
oracle ;).
> > elimination, and in particular references to unavailable object from
> > code that was expected to be discarded. The worst that can happen in
> > this case is that gcc might produce a strange construction where a
> > runtime check will choose between the two alternative implementations of
> > test_bit. Another is that it will select the 'wrong' implementation.
> > Both will result is some code-bloat, but at least the code should work
> > properly.
>
> Yes, but extreme code bloat can be fatal.
>
> > I have not checked with 3.2. The oldest compiler I have available here
> > is 3.3. That version compiles the functions as expected: I have found
> > instances of either type in the objdump and I have not found strange
> > constructions with both types there.
>
> It would be good to check on 3.2 too to avoid potential nasty surprises.
Checked with a stock 3.2.3 form gcc.gnu.org (compiled using debian's
gcc-2.95). It gets it right: places that should get the C version get
the C version and places that should get the inline-assembly version
get the inline-assembly version. No funny things at all.
B.T.W., debian and ubuntu don't provide gcc-3.2 packages. Is there a
good reason why gcc-3.2.3 (5 years old in a week) should still be
supported?
> > If you were thinking of another/bigger problem with gcc-3.2, could you
> > please give me a pointer?
>
> Not sure what you mean here.
I just thought you might have been thinking of a specific problem. A
bug-report or something like that would have been convenient to have.
Looking for a problem is easier than looking for no problems ;).
Greetings,
Alexander
> -Andi
--
Alexander van Heukelum
heukelum@fastmail.fm
--
http://www.fastmail.fm - Does exactly what it says on the tin
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] x86: always_inline wrapper for x86's test_bit
2008-04-14 12:24 ` Alexander van Heukelum
@ 2008-04-14 12:28 ` Ingo Molnar
2008-04-14 12:34 ` Andi Kleen
1 sibling, 0 replies; 9+ messages in thread
From: Ingo Molnar @ 2008-04-14 12:28 UTC (permalink / raw
To: Alexander van Heukelum; +Cc: Andi Kleen, Alexander van Heukelum, linux-kernel
* Alexander van Heukelum <heukelum@fastmail.fm> wrote:
> B.T.W., debian and ubuntu don't provide gcc-3.2 packages. Is there a
> good reason why gcc-3.2.3 (5 years old in a week) should still be
> supported?
we normally desupport given gcc versions only if their support in the
kernel is very difficult and leads to lots of uglinesses, or if a gcc
version is known to fatally miscompile the kernel, with no workaround
available.
i.e. we try not to break people's build setups (folks with a specific
gcc version in a cross-build setup), without a good reason.
Ingo
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] x86: always_inline wrapper for x86's test_bit
2008-04-14 12:24 ` Alexander van Heukelum
2008-04-14 12:28 ` Ingo Molnar
@ 2008-04-14 12:34 ` Andi Kleen
1 sibling, 0 replies; 9+ messages in thread
From: Andi Kleen @ 2008-04-14 12:34 UTC (permalink / raw
To: Alexander van Heukelum
Cc: Andi Kleen, Alexander van Heukelum, Ingo Molnar, linux-kernel
> Checked with a stock 3.2.3 form gcc.gnu.org (compiled using debian's
> gcc-2.95). It gets it right: places that should get the C version get
> the C version and places that should get the inline-assembly version
> get the inline-assembly version. No funny things at all.
Great. Thanks for testing. Perhaps the problem was limited to 2.9x only.
> B.T.W., debian and ubuntu don't provide gcc-3.2 packages. Is there a
> good reason why gcc-3.2.3 (5 years old in a week) should still be
> supported?
The compiler version is normally only moved up when there is some
bug in an older version that cannot be reasonably be worked around.
So far 3.2 didn't have anything like this, at least not on x86
(on some other architectures there are problems with older toolkits)
-Andi
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] x86: always_inline wrapper for x86's test_bit
2008-04-14 7:52 ` Ingo Molnar
@ 2008-04-14 13:49 ` Alexander van Heukelum
0 siblings, 0 replies; 9+ messages in thread
From: Alexander van Heukelum @ 2008-04-14 13:49 UTC (permalink / raw
To: Ingo Molnar, Alexander van Heukelum; +Cc: linux-kernel
On Mon, 14 Apr 2008 09:52:04 +0200, "Ingo Molnar" <mingo@elte.hu> said:
> the behavior on tiny is interesting - could you post the config you used
> for your tiny kernel?
I use the following mini-config (../klibc.i386.mini). To expand it:
make allnoconfig KCONFIG_ALLCONFIG="../klibc.i386.mini"
CONFIG_EXPERIMENTAL=y
CONFIG_BLK_DEV_INITRD=y
CONFIG_INITRAMFS_SOURCE="../initramfs"
CONFIG_CC_OPTIMIZE_FOR_SIZE=y
CONFIG_EMBEDDED=y
CONFIG_KALLSYMS=y
CONFIG_PRINTK=y
CONFIG_BUG=y
CONFIG_SLOB=y
CONFIG_MODULES=y
CONFIG_MODULE_UNLOAD=y
CONFIG_MPENTIUMII=y
CONFIG_HZ_100=y
CONFIG_PHYSICAL_START=0x110000
CONFIG_PHYSICAL_ALIGN=0x10000
CONFIG_BINFMT_ELF=y
CONFIG_SERIAL_8250=y
CONFIG_SERIAL_8250_CONSOLE=y
CONFIG_PROC_FS=y
CONFIG_SYSFS=y
CONFIG_OPTIMIZE_INLINING=y
This references the file initramfs:
#####################
# Shared klibc initramfs
# See gen_init_cpio for syntax
dir dev 0755 0 0
dir lib 0755 0 0
dir lib/modules 0755 0 0
dir bin 0755 0 0
nod dev/console 0600 0 0 c 5 1
file lib/klibc-K9OvCTU5pCdyhCSkxwBjXuR7KA4.so
../klibc/usr/klibc/klibc-K9OvCTU5pCdyhCSkxw$
file init ../init 0755 0 0
file bin/sh ../klibc/usr/dash/sh.shared 0755 0 0
file bin/kinit ../klibc/usr/kinit/kinit.shared 0755 0 0
file bin/gzip ../klibc/usr/gzip/gzip 0755 0 0
file bin/cat ../klibc/usr/utils/shared/cat 0755 0 0
file bin/chroot ../klibc/usr/utils/shared/chroot 0755 0 0
file bin/cpio ../klibc/usr/utils/shared/cpio 0755 0 0
file bin/dd ../klibc/usr/utils/shared/dd 0755 0 0
file bin/false ../klibc/usr/utils/shared/false 0755 0 0
file bin/halt ../klibc/usr/utils/shared/halt 0755 0 0
file bin/kill ../klibc/usr/utils/shared/kill 0755 0 0
file bin/ln ../klibc/usr/utils/shared/ln 0755 0 0
file bin/ps ../klibc/usr/utils/shared/minips 0755 0 0
file bin/mkdir ../klibc/usr/utils/shared/mkdir 0755 0 0
file bin/mkfifo ../klibc/usr/utils/shared/mkfifo 0755 0 0
file bin/mknod ../klibc/usr/utils/shared/mknod 0755 0 0
file bin/mount ../klibc/usr/utils/shared/mount 0755 0 0
file bin/nuke ../klibc/usr/utils/shared/nuke 0755 0 0
file bin/pivot_root ../klibc/usr/utils/shared/pivot_root 0755 0 0
file bin/poweroff ../klibc/usr/utils/shared/poweroff 0755 0 0
file bin/readlink ../klibc/usr/utils/shared/readlink 0755 0 0
file bin/reboot ../klibc/usr/utils/shared/reboot 0755 0 0
file bin/sleep ../klibc/usr/utils/shared/sleep 0755 0 0
file bin/sync ../klibc/usr/utils/shared/sync 0755 0 0
file bin/true ../klibc/usr/utils/shared/true 0755 0 0
file bin/umount ../klibc/usr/utils/shared/umount 0755 0 0
file bin/uname ../klibc/usr/utils/shared/uname 0755 0 0
And ../init is a small script:
#! /bin/sh
mkdir proc
mount -t proc proc /proc
mkdir sys
mount -t sysfs sysfs /sys
bin/sh -i
reboot
This is usually the first config I try to build if I change something.
Klibc is here: git://git.kernel.org/pub/scm/libs/klibc/klibc.git
If all goes well you will be able to boot to a shell with a recent
qemu using: qemu -m 4 -smp 2 -cpu pentium2 -nographic -no-reboot
-serial stdio -cdrom arch/x86/boot/image.iso.
> what would be interesting to see is the effect of allowing gcc to
> optimize for size (CONFIG_CC_OPTIMIZE_FOR_SIZE=y) - and compare
> !CC_OPTIMIZE_FOR_SIZE+!OPTIMIZE_INLINING against
> CC_OPTIMIZE_FOR_SIZE=y+OPTIMIZE_INLINING=y kernels - the size difference
> should be _brutal_.
I see I had CONFIG_DEBUG_SECTION_MISMATCH enabled all the time.
This has quite a bit of influence on code generation! I did not
expect that. Here are the numbers for the configuration given
above for all combinations of the three settings.
Using: gcc (GCC) 4.1.3 20070929 (prerelease) (Ubuntu 4.1.2-16ubuntu2)
CONFIG_DEBUG_SECTION_MISMATCH=y
text data bss dec hex vmlinux/i386
806039 81464 73728 961231 eaacf OPT_SIZE=y OPT_INL=y
778053 81464 73728 933245 e3d7d OPT_SIZE=y OPT_INL=n
929940 81464 73728 1085132 108ecc OPT_SIZE=n OPT_INL=y
929541 81464 73728 1084733 108d3d OPT_SIZE=n OPT_INL=n
CONFIG_DEBUG_SECTION_MISMATCH=n
text data bss dec hex vmlinux/i386
768031 81464 73728 923223 e1657 OPT_SIZE=y OPT_INL=y
763675 81464 73728 918867 e0553 OPT_SIZE=y OPT_INL=n
904284 81464 73728 1059476 102a94 OPT_SIZE=n OPT_INL=y
904764 81464 73728 1059956 102c74 OPT_SIZE=n OPT_INL=n
The patch still helps OPT_SIZE=y OPT_INL=y, though:
765896 81464 73728 921088 e0e00 OPT_SIZE=y OPT_INL=y, patched
763717 81464 73728 918909 e057d OPT_SIZE=y OPT_INL=n, patched
904490 81464 73728 1059682 102b62 OPT_SIZE=n OPT_INL=y, patched
904938 81464 73728 1060130 102d22 OPT_SIZE=n OPT_INL=n, patched
> > If gcc decides to emit a separate function instead of inlining it, the
> > image can have multiple instances of this uninlined function.
> > Moreover, gcc decides if a function is too big to be inlined using a
> > heuristic and sometimes gets it wrong. In particular, it uninlined
> > constant_bit_test which indeed looks a bit big, but should reduce to a
> > single assembly instruction after constant folding.
>
> yep, gcc could definitely improve here. But if we never give it the
> possibility to do so (by always forcing inlining) then gcc (or any other
> compiler) wont really be optimized for Linux in this area.
>
> we saw that with CC_OPTIMIZE_FOR_SIZE=y well - when the Linux kernel
> started using it then both the correctness and the quality of gcc's -Os
> output improved.
>
> > So I guess we should sprinkle some __always_inlines in the core parts
> > of the kernel? The most problematic ones are easily spotted using: nm
> > -S vmlinux|uniq -f2 -D
>
> yes - butwe should use a separate (new) __hint_always_inline tag - and
> keep __always_inline for the cases where the code _must_ be inlined for
> correctness reasons. (there are a few places where we do that)
That's becoming quite long. What about __wrapper? It should only be
applied to functions that do nothing more than select an implementation
based on compile-time properties of the arguments of the function or on
configuration variables.
Greetings,
Alexander
> Ingo
--
Alexander van Heukelum
heukelum@fastmail.fm
--
http://www.fastmail.fm - Or how I learned to stop worrying and
love email again
^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2008-04-14 13:49 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-04-13 11:23 [PATCH] x86: always_inline wrapper for x86's test_bit Alexander van Heukelum
2008-04-13 16:58 ` Andi Kleen
2008-04-13 18:03 ` Alexander van Heukelum
2008-04-13 18:16 ` Andi Kleen
2008-04-14 12:24 ` Alexander van Heukelum
2008-04-14 12:28 ` Ingo Molnar
2008-04-14 12:34 ` Andi Kleen
2008-04-14 7:52 ` Ingo Molnar
2008-04-14 13:49 ` Alexander van Heukelum
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.