* TOS target broken wrt RFC 1349
@ 2004-04-28 12:15 Chris Wilson
2004-04-28 23:26 ` Henrik Nordstrom
0 siblings, 1 reply; 8+ messages in thread
From: Chris Wilson @ 2004-04-28 12:15 UTC
To: Fred N. van Kempen; +Cc: Netfilter Developers
Summary: the TOS target (and indeed the whole Linux kernel) appears to use
different bits in the IP header than those defined in RFC 1349 for TOS
use, therefore TOS under Linux is seriously broken?
Hi all,
I have a question about the TOS target, because I'm confused about how it
works. I know that TOS has been superseded by DSCP, but most routers
and networks still do not support DSCP but do support TOS.
RFC 1349 [ftp://ftp.isi.edu/in-notes/rfc1349.txt] defines the use of the
TOS octet in the IP header as follows:
The Type of Service octet consists of three fields:
0 1 2 3 4 5 6 7
+-----+-----+-----+-----+-----+-----+-----+-----+
| | | |
| PRECEDENCE | TOS | MBZ |
| | | |
+-----+-----+-----+-----+-----+-----+-----+-----+
The first field, labeled "PRECEDENCE" above, is intended to denote
the importance or priority of the datagram. This field is not
discussed in detail in this memo.
The second field, labeled "TOS" above, denotes how the network should
make tradeoffs between throughput, delay, reliability, and cost. The
TOS field is the primary topic of this memo.
Now, if we look at linux/include/ip.h:
#define IPTOS_TOS_MASK 0x1E
#define IPTOS_TOS(tos) ((tos)&IPTOS_TOS_MASK)
#define IPTOS_LOWDELAY 0x10
#define IPTOS_THROUGHPUT 0x08
#define IPTOS_RELIABILITY 0x04
#define IPTOS_MINCOST 0x02
It seems to me that the four bits which ip.h specifies for TOS flags are
bits 1-5, not bits 3-7, in the TOS octet. Bits 0 and 1 are currently
redefined by RFC 3168 [ftp://ftp.isi.edu/in-notes/rfc3168.txt] for use in
ECN.
Therefore, it appears that use of the iptables TOS target, which modifies
bits 1-5 using the constants defined in ip.h, will have unexpected results,
corrupting the ECN flags and not setting the actual TOS flags to the
expected values.
Can anyone tell me where I'm going wrong, if at all, or whether in their
experience the TOS target works as expected under Linux? I would guess
that this problem permeates the whole kernel, and as a result would only
be a problem for interoperability between Linux and non-Linux systems.
tcpdump shows that the value of the TOS octet (which it reports as
[tos 0x??]) is set to the value specified on the iptables command line,
and provides additional confirmation that the TOS target does indeed set
the wrong bits in this octet. For example, with "-j TOS --set-tos 0x02",
we see:
12:55:28.150000 xxx.115.249.2.smtp > xxx.255.116.207.3225: . ack 2770120
win 62780 (DF) [tos 0x2,ECT]
(tcpdump's interpretation of the ECT bit is also wrong under RFC 2481
[ftp://ftp.isi.edu/in-notes/rfc2481.txt], since it uses bits 0-1 instead
of 6-7 as mandated by the RFC for ECN).
Cheers, Chris.
--
_ __ __ _
/ __/ / ,__(_)_ | Chris Wilson -- UNIX Firewall Lead Developer |
/ (_ ,\/ _/ /_ \ | NetServers.co.uk http://www.netservers.co.uk |
\__/_/_/_//_/___/ | 21 Signet Court, Cambridge, UK. 01223 576516 |
* Re: TOS target broken wrt RFC 1349
2004-04-28 12:15 TOS target broken wrt RFC 1349 Chris Wilson
@ 2004-04-28 23:26 ` Henrik Nordstrom
2004-04-29 13:52 ` Chris Wilson
2004-04-29 17:31 ` TOS target broken wrt ECN (was RFC 1349) Chris Wilson
0 siblings, 2 replies; 8+ messages in thread
From: Henrik Nordstrom @ 2004-04-28 23:26 UTC
To: Chris Wilson; +Cc: Netfilter Developers
On Wed, 28 Apr 2004, Chris Wilson wrote:
> RFC 1349 [ftp://ftp.isi.edu/in-notes/rfc1349.txt] defines the use of the
> TOS octet in the IP header as follows:
>
> The Type of Service octet consists of three fields:
>
> 0 1 2 3 4 5 6 7
> +-----+-----+-----+-----+-----+-----+-----+-----+
> | | | |
> | PRECEDENCE | TOS | MBZ |
> | | | |
> +-----+-----+-----+-----+-----+-----+-----+-----+
If I am not mistaken network byte order defines the highest order bit as
bit 0... this is more obvious when looking at larger structures such as an
IP header and can be confusing when looking at a field which maps directly
to an octet like the IP TOS field.
> It seems to me that the four bits which ip.h specifies for TOS flags are
> bits 1-5, not bits 3-7, in the TOS octet. Bits 0 and 1 are currently
> redefined by RFC 3168 [ftp://ftp.isi.edu/in-notes/rfc3168.txt] for use in
> ECN.
Actually this is bits 6 & 7 (network order), which maps to the last bit of
the TOS field and the MBZ field above, or bits 0+1 in host order when
reading the TOS octet alone.
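To make the two numbering conventions concrete, here is a minimal C sketch (not taken from the kernel or from any RFC) that converts the RFC-style bit numbers, which count from the most significant bit, into ordinary masks. The helper names rfc_bit_mask and rfc_field_mask are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* RFC diagrams number bits from the most significant end: RFC bit 0 is 0x80. */
uint8_t rfc_bit_mask(unsigned rfc_bit)
{
    assert(rfc_bit < 8);
    return (uint8_t)(0x80u >> rfc_bit);
}

/* Mask covering RFC bits first..last, inclusive. */
uint8_t rfc_field_mask(unsigned first, unsigned last)
{
    uint8_t mask = 0;

    assert(first <= last && last < 8);
    for (unsigned bit = first; bit <= last; bit++)
        mask |= rfc_bit_mask(bit);
    return mask;
}
```

Under this convention RFC bits 3-6 (the RFC 1349 TOS field) give 0x1E, the IPTOS_TOS_MASK value quoted from ip.h, and RFC bits 6-7 give 0x03.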
> Therefore, it appears that use of the iptables TOS target, which modifies
> bits 1-5 using the constants defined in ip.h, will have unexpected results,
> corrupting the ECN flags and not setting the actual TOS flags to the
> expected values.
The TOS target allows you to manipulate the whole Type of Service octet as
defined by STD 5 (RFC 791). As this target gives you full power to
manipulate this field it is your responsibility to not mess around with
bits in ways outside of their defined scope.
> (tcpdump's interpretation of the ECT bit is also wrong under RFC 2481
> [ftp://ftp.isi.edu/in-notes/rfc2481.txt], since it uses bits 0-1 instead
> of 6-7 as mandated by the RFC for ECN).
Actually I think both Linux, tcpdump and the RFC are correct here. Just a
matter of different bit notions obfuscating matters.
Regards
Henrik
* Re: TOS target broken wrt RFC 1349
2004-04-28 23:26 ` Henrik Nordstrom
@ 2004-04-29 13:52 ` Chris Wilson
2004-04-29 17:58 ` Henrik Nordstrom
2004-04-29 17:31 ` TOS target broken wrt ECN (was RFC 1349) Chris Wilson
1 sibling, 1 reply; 8+ messages in thread
From: Chris Wilson @ 2004-04-29 13:52 UTC
To: Henrik Nordstrom; +Cc: Netfilter Developers
Hi Henrik,
> If I am not mistaken network byte order defines the highest order bit as
> bit 0... this is more obvious when looking at larger structures such as an
> IP header and can be confusing when looking at a field which maps directly
> to an octet like the IP TOS field.
OK, thank you for clearing that up for me. However, either tcpdump or the
kernel must be wrong, because when I set TOS to 0x02, tcpdump says that I
set the ECT bit! (I know you will all say that tcpdump is wrong :-)
> > Therefore, it appears that use of the iptables TOS target, which modifies
> > bits 1-5 using the constants defined in ip.h, will have unexpected results,
> > corrupting the ECN flags and not setting the actual TOS flags to the
> > expected values.
>
> The TOS target allows you to manipulate the whole Type of Service octet as
> defined by STD 5 (RFC 791). As this target gives you full power to
> manipulate this field it is your responsibility to not mess around with
> bits in ways outside of their defined scope.
Well, it doesn't give you full power over the TOS octet, see ipt_TOS.c:
if (tos != IPTOS_LOWDELAY
&& tos != IPTOS_THROUGHPUT
&& tos != IPTOS_RELIABILITY
&& tos != IPTOS_MINCOST
&& tos != IPTOS_NORMALSVC) {
printk(KERN_WARNING "TOS: bad tos value %#x\n", tos);
return 0;
}
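The effect of that check can be reproduced outside the kernel. The following is only a userspace approximation, assuming the ip.h values quoted earlier in the thread and assuming that IPTOS_NORMALSVC is 0; the SKETCH_ constants and the function tos_value_accepted are made up for this example and are not kernel symbols.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values copied from the ip.h excerpt quoted earlier in this thread. */
enum {
    SKETCH_TOS_LOWDELAY    = 0x10,
    SKETCH_TOS_THROUGHPUT  = 0x08,
    SKETCH_TOS_RELIABILITY = 0x04,
    SKETCH_TOS_MINCOST     = 0x02,
    SKETCH_TOS_NORMALSVC   = 0x00   /* assumption: normal service is 0 */
};

/* Sanity check: the four flag bits together make up the 0x1E TOS mask. */
static_assert((SKETCH_TOS_LOWDELAY | SKETCH_TOS_THROUGHPUT |
               SKETCH_TOS_RELIABILITY | SKETCH_TOS_MINCOST) == 0x1E,
              "flag bits should fill the TOS mask");

/* Mirrors the quoted check: only a single flag, or zero, is accepted. */
bool tos_value_accepted(uint8_t tos)
{
    return tos == SKETCH_TOS_LOWDELAY
        || tos == SKETCH_TOS_THROUGHPUT
        || tos == SKETCH_TOS_RELIABILITY
        || tos == SKETCH_TOS_MINCOST
        || tos == SKETCH_TOS_NORMALSVC;
}
```

With this sketch, 0x10 and 0x02 are accepted, while a combination such as 0x18 or any value carrying precedence bits (for example 0x20) is rejected, which is the sense in which the target does not expose the whole octet.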
> Actually I think both Linux, tcpdump and the RFC are correct here. Just a
> matter of different bit notions obfuscating matters.
I wasn't implying that the RFC was wrong :-) but tcpdump and the kernel
disagree over the meanings of these bits. I will contact the tcpdump
maintainers.
Cheers, Chris.
--
_ __ __ _
/ __/ / ,__(_)_ | Chris Wilson -- UNIX Firewall Lead Developer |
/ (_ ,\/ _/ /_ \ | NetServers.co.uk http://www.netservers.co.uk |
\__/_/_/_//_/___/ | 21 Signet Court, Cambridge, UK. 01223 576516 |
* Re: TOS target broken wrt ECN (was RFC 1349)
2004-04-28 23:26 ` Henrik Nordstrom
2004-04-29 13:52 ` Chris Wilson
@ 2004-04-29 17:31 ` Chris Wilson
2004-04-30 9:25 ` Glen Turner
1 sibling, 1 reply; 8+ messages in thread
From: Chris Wilson @ 2004-04-29 17:31 UTC
To: Henrik Nordstrom; +Cc: Netfilter Developers
Hi Henrik,
Sorry to bother you again, but based on the information in the RFCs
(condensed):
RFC 1349:
0 1 2 3 4 5 6 7
+-----+-----+-----+-----+-----+-----+-----+-----+
| PRECEDENCE | TOS | MBZ |
+-----+-----+-----+-----+-----+-----+-----+-----+
RFC 2474:
0 1 2 3 4 5 6 7
+---+---+---+---+---+---+---+---+
| DSCP | CU |
+---+---+---+---+---+---+---+---+
RFC 3168:
0 1 2 3 4 5 6 7
+-----+-----+-----+-----+-----+-----+-----+-----+
| DS FIELD, DSCP | ECN FIELD |
+-----+-----+-----+-----+-----+-----+-----+-----+
Would you say that the bits allocated to ECN overlap with the old TOS and
MBZ fields (as it appears to me now), rather than the bit order being
reversed in one or two of these diagrams?
If so, then the TOS target is dangerous since it will disrupt the
operation of ECN, and the documentation should mention that. I'd be happy
to write a patch to update it if people are interested.
Cheers, Chris.
--
_ __ __ _
/ __/ / ,__(_)_ | Chris Wilson -- UNIX Firewall Lead Developer |
/ (_ ,\/ _/ /_ \ | NetServers.co.uk http://www.netservers.co.uk |
\__/_/_/_//_/___/ | 21 Signet Court, Cambridge, UK. 01223 576516 |
* Re: TOS target broken wrt RFC 1349
2004-04-29 13:52 ` Chris Wilson
@ 2004-04-29 17:58 ` Henrik Nordstrom
0 siblings, 0 replies; 8+ messages in thread
From: Henrik Nordstrom @ 2004-04-29 17:58 UTC
To: Chris Wilson; +Cc: Netfilter Developers
On Thu, 29 Apr 2004, Chris Wilson wrote:
> OK, thank you for clearing that up for me. However, either tcpdump or the
> kernel must be wrong, because when I set TOS to 0x02, tcpdump says that I
> set the ECT bit! (I know you will all say that tcpdump is wrong :-)
It is a matter of conflicting RFCs.
RFC 1349 (the obsolete TOS RFC) and RFC 3168 (ECN) define different
meanings for bit #6 (0x02).
STD 5 (RFC 791) defines 3 bits of precedence and 3 bits of type of service.
The last 2 bits (#6, #7) are undefined.
RFC 1349 defines 3 bits of precedence (as in STD 5) and 4 bits of TOS. This
matches the Linux TOS definitions. Only 1 bit of the TOS octet is left
undefined in this RFC (MBZ, Must Be Zero).
RFC 2474 defines 6 bits of DS, combining the precedence and TOS bits of
STD 5 into one single 6-bit field. The code points xxx000 are defined in the
same manner as the Precedence of STD 5. The last 2 bits of the TOS octet
are left undefined.
RFC 3168 defines the two unused bits of RFC 2474 for use by ECN.
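The overlapping layouts can be compared side by side by decoding one octet under each of them. This is a rough sketch only; the struct tos_octet_views and the function decode_tos_octet are hypothetical names introduced for this illustration, with the field boundaries taken from the RFC 1349, RFC 2474 and RFC 3168 diagrams quoted in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One octet, seen through the layouts discussed in this thread. */
struct tos_octet_views {
    uint8_t precedence;  /* RFC 791 / RFC 1349: RFC bits 0-2 */
    uint8_t tos;         /* RFC 1349: RFC bits 3-6 */
    uint8_t mbz;         /* RFC 1349: RFC bit 7 */
    uint8_t dscp;        /* RFC 2474: RFC bits 0-5 */
    uint8_t ecn;         /* RFC 3168: RFC bits 6-7 */
};

/* Fill *out with every interpretation of the same octet. */
void decode_tos_octet(uint8_t octet, struct tos_octet_views *out)
{
    assert(out != NULL);
    out->precedence = (uint8_t)(octet >> 5);
    out->tos        = (uint8_t)((octet & 0x1E) >> 1);
    out->mbz        = (uint8_t)(octet & 0x01);
    out->dscp       = (uint8_t)(octet >> 2);
    out->ecn        = (uint8_t)(octet & 0x03);
}
```

For the octet 0x02 this yields a TOS field of 1 (the minimize-cost bit) under RFC 1349 and an ECN field of binary 10 (the ECT(0) codepoint) under RFC 3168: the same bit seen through two layouts.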
> Well, it doesn't give you full power over the TOS octet, see ipt_TOS.c:
>
> if (tos != IPTOS_LOWDELAY
> && tos != IPTOS_THROUGHPUT
> && tos != IPTOS_RELIABILITY
> && tos != IPTOS_MINCOST
> && tos != IPTOS_NORMALSVC) {
> printk(KERN_WARNING "TOS: bad tos value %#x\n", tos);
> return 0;
> }
Ugh.. did not know of this. But ok.. this is not good. The TOS is far too
overloaded to assume that one wants to use an obsoleted RFC. In addition,
for all practical purposes Precedence is a form of TOS, which is why it is
included in DS (and its values are in fact the only defined codepoints of DS).
Regards
Henrik
* Re: TOS target broken wrt ECN (was RFC 1349)
2004-04-29 17:31 ` TOS target broken wrt ECN (was RFC 1349) Chris Wilson
@ 2004-04-30 9:25 ` Glen Turner
2004-04-30 10:22 ` Chris Wilson
0 siblings, 1 reply; 8+ messages in thread
From: Glen Turner @ 2004-04-30 9:25 UTC
To: Chris Wilson; +Cc: Henrik Nordstrom, Netfilter Developers
On Fri, 2004-04-30 at 03:01, Chris Wilson wrote:
> Sorry to bother you again, but based on the information in the RFCs
> (condensed):
>
> RFC 1349:
>
> 0 1 2 3 4 5 6 7
> +-----+-----+-----+-----+-----+-----+-----+-----+
> | PRECEDENCE | TOS | MBZ |
> +-----+-----+-----+-----+-----+-----+-----+-----+
> RFC 3168:
>
> 0 1 2 3 4 5 6 7
> +-----+-----+-----+-----+-----+-----+-----+-----+
> | DS FIELD, DSCP | ECN FIELD |
> +-----+-----+-----+-----+-----+-----+-----+-----+
Bit 6, the one at issue, was occasionally used for the "minimize cost"
TOS. It never gained much OS support at the time, and Netfilter offers
better support for that ToS than the OSs of the pre-DiffServ era.
Monitoring for that ToS bit by backbone ISPs during the IETF discussions
about ECN didn't show any use. Ironically Netfilter could alter that
situation :-)
> If so, then the TOS target is dangerous since it will disrupt the
> operation of ECN, and the documentation should mention that. I'd be happy
> to write a patch to update it if people are interested.
Best if Netfilter makes it clear that Bit 6 is discouraged. Also the
description
# iptables --match tos --help
[!] --tos value Match Type of Service field from one of the
following numeric or descriptive values:
Minimize-Delay 16 (0x10)
Maximize-Throughput 8 (0x08)
Maximize-Reliability 4 (0x04)
Minimize-Cost 2 (0x02)
Normal-Service 0 (0x00)
is misleading. Unlike DSCP there are no values for the ToS field as the
ToS field isn't a variable but a group of independent bits, of which
only one can be set at a time (of course there's an internal value, but
you shouldn't be leaking that as it gets confusing. For example lots of
software (including other parts of the kernel like setsockopt(IP_TOS))
use the value of the Precedence/TOS/MBZ octet and mask out the
irrelevant bits).
It would help immensely if the tos page noted that ToS is deprecated and
if ToS-backward-compatibility were added to DSCP.
# iptables --match dscp --dscp tos-minimize-delay
Obviously there would be no tos-minimize-cost, but that would be no
loss.
--
Glen Turner Tel: (08) 8303 3936 or +61 8 8303 3936
Network Engineer Email: glen.turner@aarnet.edu.au
Australian Academic & Research Network www.aarnet.edu.au
* Re: TOS target broken wrt ECN (was RFC 1349)
2004-04-30 9:25 ` Glen Turner
@ 2004-04-30 10:22 ` Chris Wilson
2004-05-04 5:20 ` Glen Turner
0 siblings, 1 reply; 8+ messages in thread
From: Chris Wilson @ 2004-04-30 10:22 UTC
To: Glen Turner; +Cc: Henrik Nordstrom, Netfilter Developers
Hi Glen,
> Bit 6, the one at issue, was occasionally used for the "minimize cost"
> TOS. It never gained much OS support at the time, and Netfilter offers
> better support for that ToS than the OSs of the pre-DiffServ era.
> Monitoring for that ToS bit by backbone ISP during the IETF discussions
> about ECN didn't show any use. Ironically Netfilter could alter that
> situation :-)
Yes, I was trying to use it to reduce the priority of TCP packets outbound
from our leased line router to the Internet, and some firewalls out there
were dropping our TCP traffic because this bit was set (presumably because
they thought it was ECN, and took a dislike to it ;)
> > If so, then the TOS target is dangerous since it will disrupt the
> > operation of ECN, and the documentation should mention that. I'd be happy
> > to write a patch to update it if people are interested.
>
> Best if Netfilter makes it clear that Bit 6 is discouraged.
Not just bit 6. From line 36 of ipt_TOS.c:
iph->tos = (iph->tos & IPTOS_PREC_MASK) | tosinfo->tos;
IPTOS_PREC_MASK is #defined to 0xE0, so this will clear the lowest five
bits of the TOS octet, including both ECN bits, and replace them with
whatever value the user specifies.
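The arithmetic of that line is easy to try in isolation. The sketch below repeats the same expression in userspace; SKETCH_PREC_MASK has the 0xE0 value stated above, while SKETCH_ECN_MASK and the two function names are assumptions made for this example only.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PREC_MASK 0xE0u  /* same value as IPTOS_PREC_MASK */
#define SKETCH_ECN_MASK  0x03u  /* RFC 3168 ECN field, RFC bits 6-7 */

/* Same arithmetic as the quoted ipt_TOS.c line. */
uint8_t tos_target_rewrite(uint8_t old_octet, uint8_t new_tos)
{
    /* The values accepted by the target carry no precedence bits. */
    assert((new_tos & SKETCH_PREC_MASK) == 0);
    return (uint8_t)((old_octet & SKETCH_PREC_MASK) | new_tos);
}

/* ECN field left in the octet after the rewrite. */
uint8_t ecn_after_rewrite(uint8_t old_octet, uint8_t new_tos)
{
    return (uint8_t)(tos_target_rewrite(old_octet, new_tos) & SKETCH_ECN_MASK);
}
```

For example, an incoming octet of 0xA3 (precedence 5, both ECN bits set) rewritten with 0x10 comes out as 0xB0: the precedence survives but the ECN field is zeroed.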
Unfortunately, I have to agree with you that minimize-cost and
maximize-security TOS values are now useless since the IETF has redefined
their bits to something else. It would appear that backwards compatibility
has been completely broken, because ECN packets can be handled badly by
TOS routers, and TOS packets handled badly by ECN routers.
Therefore, I think there should be a big fat warning on the TOS target
that it breaks ECN and should not be used on the Internet.
A less drastic option is to not clear the lower two bits, and disable the
minimize-cost TOS target. However, this target will still have strange
interactions with ECN, since an ECN router can set bit 6 to signal
CE, which causes other TOS routers to treat the packet differently
(minimize-cost or an invalid TOS value, depending on the values of the
other bits).
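As a rough sketch of that less drastic option (not a proposed patch, and not code from ipt_TOS.c), the rewrite could preserve the two ECN bits and only touch the three remaining TOS flags. The function name and mask names here are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_KEEP_MASK     0xE3u  /* precedence (0xE0) plus ECN (0x03) */
#define SKETCH_SETTABLE_MASK 0x1Cu  /* low-delay, throughput, reliability */

/* Rewrite the TOS flags while leaving precedence and ECN untouched.
 * A requested minimize-cost bit (0x02) is silently ignored, because
 * that bit now belongs to the ECN field. */
uint8_t tos_rewrite_keep_ecn(uint8_t old_octet, uint8_t new_tos)
{
    /* As with the existing target, no precedence bits are expected. */
    assert((new_tos & 0xE0u) == 0);
    return (uint8_t)((old_octet & SKETCH_KEEP_MASK) |
                     (new_tos & SKETCH_SETTABLE_MASK));
}
```

This only stops the local rewrite from clobbering ECN; it does nothing about the interaction described above, where a downstream router still reading the octet as RFC 1349 TOS misinterprets ECN markings.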
> It would help immensely if the tos page noted that ToS is deprecated and
> if ToS-backward-compatibility were added to DSCP.
I agree, but I'd say that it's not just deprecated, but dangerous as well.
> # iptables --match dscp --dscp tos-minimize-delay
>
> Obviously there would be no tos-minimize-cost, but that would be no
> loss.
Except financially ;)
Cheers, Chris.
--
_ __ __ _
/ __/ / ,__(_)_ | Chris Wilson -- UNIX Firewall Lead Developer |
/ (_ ,\/ _/ /_ \ | NetServers.co.uk http://www.netservers.co.uk |
\__/_/_/_//_/___/ | 21 Signet Court, Cambridge, UK. 01223 576516 |
* Re: TOS target broken wrt ECN (was RFC 1349)
2004-04-30 10:22 ` Chris Wilson
@ 2004-05-04 5:20 ` Glen Turner
0 siblings, 0 replies; 8+ messages in thread
From: Glen Turner @ 2004-05-04 5:20 UTC
To: Chris Wilson; +Cc: Netfilter Developers
On Fri, 2004-04-30 at 19:52, Chris Wilson wrote:
> > Obviously there would be no tos-minimize-cost, but that would be no
> > loss.
>
> Except financially ;)
Um no. No ISP offers least charge routing for that ToS.
If you are building a network you can easily offer a
least cost DSCP. For example, we offer a DSCP=8 "worst
effort" service for bulk file transfers. DSCP=8 maps
to IP Precedence=1, allowing for older OSs to participate.
And the default mappings from DSCP to link layer markings
like 802.1p and MPLS Exp work well.
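The DSCP=8 to IP Precedence=1 mapping follows directly from the field layout: the DSCP occupies the top six bits of the octet and the precedence the top three. A small sketch, with hypothetical helper names, shows the arithmetic:

```c
#include <assert.h>
#include <stdint.h>

/* Place a 6-bit DSCP in the top six bits of the former TOS octet. */
uint8_t dscp_to_tos_octet(uint8_t dscp)
{
    assert(dscp < 64);
    return (uint8_t)(dscp << 2);
}

/* Read the top three bits as an RFC 791 precedence value. */
uint8_t tos_octet_precedence(uint8_t octet)
{
    return (uint8_t)(octet >> 5);
}
```

DSCP 8 becomes the octet 0x20, whose top three bits are 001, so a device that only understands IP Precedence sees precedence 1.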
Search Google for "Scavenger service". The AARNet web
site has iptables rules and code changes for Apache and
some FTP servers.
--
Glen Turner Tel: (08) 8303 3936 or +61 8 8303 3936
Network Engineer Email: glen.turner@aarnet.edu.au
Australian Academic & Research Network www.aarnet.edu.au
end of thread, other threads:[~2004-05-04 5:20 UTC | newest]
Thread overview: 8+ messages
2004-04-28 12:15 TOS target broken wrt RFC 1349 Chris Wilson
2004-04-28 23:26 ` Henrik Nordstrom
2004-04-29 13:52 ` Chris Wilson
2004-04-29 17:58 ` Henrik Nordstrom
2004-04-29 17:31 ` TOS target broken wrt ECN (was RFC 1349) Chris Wilson
2004-04-30 9:25 ` Glen Turner
2004-04-30 10:22 ` Chris Wilson
2004-05-04 5:20 ` Glen Turner