* TOS target broken wrt RFC 1349
@ 2004-04-28 12:15 Chris Wilson
  2004-04-28 23:26 ` Henrik Nordstrom
  0 siblings, 1 reply; 8+ messages in thread
From: Chris Wilson @ 2004-04-28 12:15 UTC
  To: Fred N. van Kempen; +Cc: Netfilter Developers

Summary: the TOS target (and indeed the whole Linux kernel) appears to use 
different bits in the IP header than those defined in RFC 1349 for TOS 
use, therefore TOS under Linux is seriously broken?

Hi all,

I have a question about the TOS target, because I'm confused about how it 
works. I know that TOS has been superseded by DSCP, but most routers 
and networks still do not support DSCP but do support TOS.

RFC 1349 [ftp://ftp.isi.edu/in-notes/rfc1349.txt] defines the use of the 
TOS octet in the IP header as follows:

   The Type of Service octet consists of three fields:

                0     1     2     3     4     5     6     7
             +-----+-----+-----+-----+-----+-----+-----+-----+
             |                 |                       |     |
             |   PRECEDENCE    |          TOS          | MBZ |
             |                 |                       |     |
             +-----+-----+-----+-----+-----+-----+-----+-----+

   The first field, labeled "PRECEDENCE" above, is intended to denote
   the importance or priority of the datagram.  This field is not
   discussed in detail in this memo.

   The second field, labeled "TOS" above, denotes how the network should
   make tradeoffs between throughput, delay, reliability, and cost.  The
   TOS field is the primary topic of this memo.

Now, if we look at include/linux/ip.h:

   #define IPTOS_TOS_MASK          0x1E
   #define IPTOS_TOS(tos)          ((tos)&IPTOS_TOS_MASK)
   #define IPTOS_LOWDELAY          0x10
   #define IPTOS_THROUGHPUT        0x08
   #define IPTOS_RELIABILITY       0x04
   #define IPTOS_MINCOST           0x02

It seems to me that the four bits which ip.h specifies for TOS flags are 
bits 1-5, not bits 3-7, in the TOS octet. Bits 0 and 1 are currently 
redefined by RFC 3168 [ftp://ftp.isi.edu/in-notes/rfc3168.txt] for use in 
ECN.

Therefore, it appears that use of the iptables TOS target, which modifies
bits 1-5 using the constants defined in ip.h, will have unexpected results,
corrupting the ECN flags and not setting the actual TOS flags to the 
expected values.

Can anyone tell me where I'm going wrong, if at all, or whether in their
experience the TOS target works as expected under Linux? I would guess
that this problem permeates the whole kernel, and as a result would only
be a problem for interoperability between Linux and non-Linux systems.

tcpdump shows that the value of the TOS octet (which it reports as
[tos 0x??]) is set to the value specified on the iptables command line, 
and provides additional confirmation that the TOS target does indeed set 
the wrong bits in this octet. For example, with "-j TOS --set-tos 0x02", 
we see:

  12:55:28.150000 xxx.115.249.2.smtp > xxx.255.116.207.3225: . ack 2770120 
	win 62780 (DF) [tos 0x2,ECT]

(tcpdump's interpretation of the ECT bit is also wrong under RFC 2481
[ftp://ftp.isi.edu/in-notes/rfc2481.txt], since it uses bits 0-1 instead
of 6-7 as mandated by the RFC for ECN).

Cheers, Chris.
-- 
_  __ __     _
 / __/ / ,__(_)_  | Chris Wilson -- UNIX Firewall Lead Developer |
/ (_  ,\/ _/ /_ \ | NetServers.co.uk http://www.netservers.co.uk |
\__/_/_/_//_/___/ | 21 Signet Court, Cambridge, UK. 01223 576516 |

* Re: TOS target broken wrt RFC 1349
  2004-04-28 12:15 TOS target broken wrt RFC 1349 Chris Wilson
@ 2004-04-28 23:26 ` Henrik Nordstrom
  2004-04-29 13:52   ` Chris Wilson
  2004-04-29 17:31   ` TOS target broken wrt ECN (was RFC 1349) Chris Wilson
  0 siblings, 2 replies; 8+ messages in thread
From: Henrik Nordstrom @ 2004-04-28 23:26 UTC
  To: Chris Wilson; +Cc: Netfilter Developers

On Wed, 28 Apr 2004, Chris Wilson wrote:

> RFC 1349 [ftp://ftp.isi.edu/in-notes/rfc1349.txt] defines the use of the 
> TOS octet in the IP header as follows:
> 
>    The Type of Service octet consists of three fields:
> 
>                 0     1     2     3     4     5     6     7
>              +-----+-----+-----+-----+-----+-----+-----+-----+
>              |                 |                       |     |
>              |   PRECEDENCE    |          TOS          | MBZ |
>              |                 |                       |     |
>              +-----+-----+-----+-----+-----+-----+-----+-----+

If I am not mistaken network byte order defines the highest order bit as 
bit 0... this is more obvious when looking at larger structures such as an 
IP header and can be confusing when looking at a field which maps directly 
to an octet like the IP TOS field.

> It seems to me that the four bits which ip.h specifies for TOS flags are 
> bits 1-5, not bits 3-7, in the TOS octet. Bits 0 and 1 are currently 
> redefined by RFC 3168 [ftp://ftp.isi.edu/in-notes/rfc3168.txt] for use in 
> ECN.

Actually these are bits 6 & 7 (network order), which map to the last bit of
the TOS field and the MBZ field above, or bits 0+1 in host order when 
reading the TOS octet alone.
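
A minimal sketch of the two numbering conventions, assuming nothing beyond the RFC 1349 diagram and the ip.h masks quoted above. The helper names rfc_bit_mask and rfc_field_mask are hypothetical and exist only for illustration; they turn the RFC's bit numbers (bit 0 is the most significant bit) into the masks one would write in C.

```c
#include <assert.h>
#include <stdint.h>

/* RFC diagrams number bits from the most significant end:
 * RFC bit 0 is 0x80 and RFC bit 7 is 0x01. */
uint8_t rfc_bit_mask(unsigned rfc_bit)
{
    assert(rfc_bit <= 7);
    return (uint8_t)(0x80u >> rfc_bit);
}

/* Mask covering RFC bits first..last, inclusive. */
uint8_t rfc_field_mask(unsigned first, unsigned last)
{
    uint8_t mask = 0;

    assert(first <= last && last <= 7);
    for (unsigned bit = first; bit <= last; bit++)
        mask |= rfc_bit_mask(bit);
    return mask;
}
```

With this numbering, rfc_field_mask(3, 6) gives 0x1E, which is the value of IPTOS_TOS_MASK, and rfc_bit_mask(6) gives 0x02, which is the value of IPTOS_MINCOST.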

> Therefore, it appears that use of the iptables TOS target, which modifies
> bits 1-5 using the constants defined in ip.h, will have unexpected results,
> corrupting the ECN flags and not setting the actual TOS flags to the 
> expected values.

The TOS target allows you to manipulate the whole Type of Service octet as
defined by STD 5 (RFC 791). As this target gives you full power to
manipulate this field it is your responsibility to not mess around with
bits in ways outside of their defined scope.

> (tcpdump's interpretation of the ECT bit is also wrong under RFC 2481
> [ftp://ftp.isi.edu/in-notes/rfc2481.txt], since it uses bits 0-1 instead
> of 6-7 as mandated by the RFC for ECN).

Actually I think Linux, tcpdump and the RFC are all correct here. It is just a 
matter of different bit notations obfuscating matters.

Regards
Henrik

* Re: TOS target broken wrt RFC 1349
  2004-04-28 23:26 ` Henrik Nordstrom
@ 2004-04-29 13:52   ` Chris Wilson
  2004-04-29 17:58     ` Henrik Nordstrom
  2004-04-29 17:31   ` TOS target broken wrt ECN (was RFC 1349) Chris Wilson
  1 sibling, 1 reply; 8+ messages in thread
From: Chris Wilson @ 2004-04-29 13:52 UTC
  To: Henrik Nordstrom; +Cc: Netfilter Developers

Hi Henrik,

> If I am not mistaken network byte order defines the highest order bit as 
> bit 0... this is more obvious when looking at larger structures such as an 
> IP header and can be confusing when looking at a field which maps directly 
> to an octet like the IP TOS field.

OK, thank you for clearing that up for me. However, either tcpdump or the
kernel must be wrong, because when I set TOS to 0x02, tcpdump says that I
set the ECT bit! (I know you will all say that tcpdump is wrong :-)

> > Therefore, it appears that use of the iptables TOS target, which modifies
> > bits 1-5 using the constants defined in ip.h, will have unexpected results,
> > corrupting the ECN flags and not setting the actual TOS flags to the 
> > expected values.
> 
> The TOS target allows you to manipulate the whole Type of Service octet as
> defined by STD 5 (RFC 791). As this target gives you full power to
> manipulate this field it is your responsibility to not mess around with
> bits in ways outside of their defined scope.

Well, it doesn't give you full power over the TOS octet, see ipt_TOS.c:

        if (tos != IPTOS_LOWDELAY
            && tos != IPTOS_THROUGHPUT
            && tos != IPTOS_RELIABILITY
            && tos != IPTOS_MINCOST
            && tos != IPTOS_NORMALSVC) {
                printk(KERN_WARNING "TOS: bad tos value %#x\n", tos);
                return 0;
        }
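
To see how narrow that check is, here is a small userspace sketch that mirrors it. It is an illustration under stated assumptions, not the kernel code: the constants are copied from the ip.h excerpt earlier in the thread, TOS_NORMALSVC is assumed to be 0, and the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values copied from the ip.h excerpt; TOS_NORMALSVC = 0 is an assumption. */
enum {
    TOS_LOWDELAY    = 0x10,
    TOS_THROUGHPUT  = 0x08,
    TOS_RELIABILITY = 0x04,
    TOS_MINCOST     = 0x02,
    TOS_NORMALSVC   = 0x00
};

/* Same comparison chain as the quoted check, returning true when accepted. */
bool tos_target_accepts(uint8_t tos)
{
    return tos == TOS_LOWDELAY
        || tos == TOS_THROUGHPUT
        || tos == TOS_RELIABILITY
        || tos == TOS_MINCOST
        || tos == TOS_NORMALSVC;
}

/* Count how many of the 256 possible octet values pass the check. */
unsigned count_accepted_values(void)
{
    unsigned accepted = 0;

    for (unsigned value = 0; value <= 0xFF; value++)
        if (tos_target_accepts((uint8_t)value))
            accepted++;
    return accepted;
}
```

Only the five named values pass; any combination of flags, such as 0x18, is rejected.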

> Actually I think Linux, tcpdump and the RFC are all correct here. It is just a 
> matter of different bit notations obfuscating matters.

I wasn't implying that the RFC was wrong :-) but tcpdump and the kernel 
disagree over the meanings of these bits. I will contact the tcpdump 
maintainers.

Cheers, Chris.
-- 
_  __ __     _
 / __/ / ,__(_)_  | Chris Wilson -- UNIX Firewall Lead Developer |
/ (_  ,\/ _/ /_ \ | NetServers.co.uk http://www.netservers.co.uk |
\__/_/_/_//_/___/ | 21 Signet Court, Cambridge, UK. 01223 576516 |

* Re: TOS target broken wrt ECN (was RFC 1349)
  2004-04-28 23:26 ` Henrik Nordstrom
  2004-04-29 13:52   ` Chris Wilson
@ 2004-04-29 17:31   ` Chris Wilson
  2004-04-30  9:25     ` Glen Turner
  1 sibling, 1 reply; 8+ messages in thread
From: Chris Wilson @ 2004-04-29 17:31 UTC
  To: Henrik Nordstrom; +Cc: Netfilter Developers

Hi Henrik,

Sorry to bother you again, but based on the information in the RFCs 
(condensed):

  RFC 1349:

         0     1     2     3     4     5     6     7
      +-----+-----+-----+-----+-----+-----+-----+-----+
      |   PRECEDENCE    |          TOS          | MBZ |
      +-----+-----+-----+-----+-----+-----+-----+-----+

  RFC 2474:

        0   1   2   3   4   5   6   7
      +---+---+---+---+---+---+---+---+
      |         DSCP          |  CU   |
      +---+---+---+---+---+---+---+---+

  RFC 3168:

         0     1     2     3     4     5     6     7
      +-----+-----+-----+-----+-----+-----+-----+-----+
      |          DS FIELD, DSCP           | ECN FIELD |
      +-----+-----+-----+-----+-----+-----+-----+-----+

Would you say that the bits allocated to ECN overlap with the old TOS and 
MBZ fields (as it appears to me now), rather than the bit order being 
reversed in one or two of these diagrams?
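
A short sketch of the overlap, assuming all three diagrams number bits from the most significant end. It splits one octet according to the RFC 1349 layout and according to the RFC 3168 layout (RFC 2474 uses the same 6+2 split, with the last two bits marked CU). The struct and function names are hypothetical.

```c
#include <stdint.h>

/* RFC 1349 view: 3 bits precedence, 4 bits TOS, 1 bit MBZ. */
struct rfc1349_view {
    uint8_t precedence;
    uint8_t tos;
    uint8_t mbz;
};

/* RFC 3168 view: 6 bits DSCP, 2 bits ECN. */
struct rfc3168_view {
    uint8_t dscp;
    uint8_t ecn;
};

struct rfc1349_view decode_rfc1349(uint8_t octet)
{
    struct rfc1349_view view;

    view.precedence = (uint8_t)(octet >> 5);
    view.tos        = (uint8_t)((octet >> 1) & 0x0F);
    view.mbz        = (uint8_t)(octet & 0x01);
    return view;
}

struct rfc3168_view decode_rfc3168(uint8_t octet)
{
    struct rfc3168_view view;

    view.dscp = (uint8_t)(octet >> 2);
    view.ecn  = (uint8_t)(octet & 0x03);
    return view;
}
```

For the octet 0x02, the RFC 1349 view has a non-zero TOS field (the lowest TOS flag) while the RFC 3168 view has DSCP 0 and a non-zero ECN field, so the same bit lands in both fields.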

If so, then the TOS target is dangerous since it will disrupt the 
operation of ECN, and the documentation should mention that. I'd be happy 
to write a patch to update it if people are interested.

Cheers, Chris.
-- 
_  __ __     _
 / __/ / ,__(_)_  | Chris Wilson -- UNIX Firewall Lead Developer |
/ (_  ,\/ _/ /_ \ | NetServers.co.uk http://www.netservers.co.uk |
\__/_/_/_//_/___/ | 21 Signet Court, Cambridge, UK. 01223 576516 |

* Re: TOS target broken wrt RFC 1349
  2004-04-29 13:52   ` Chris Wilson
@ 2004-04-29 17:58     ` Henrik Nordstrom
  0 siblings, 0 replies; 8+ messages in thread
From: Henrik Nordstrom @ 2004-04-29 17:58 UTC
  To: Chris Wilson; +Cc: Netfilter Developers

On Thu, 29 Apr 2004, Chris Wilson wrote:

> OK, thank you for clearing that up for me. However, either tcpdump or the
> kernel must be wrong, because when I set TOS to 0x02, tcpdump says that I
> set the ECT bit! (I know you will all say that tcpdump is wrong :-)

It is a matter of conflicting RFCs.

RFC 1349 (the obsolete TOS RFC) and RFC 3168 (ECN) define different
meanings for bit #6 (0x02).

STD 5 (RFC 791) defines 3 bits of precedence and 3 bits of type of service. 
The last 2 bits (#6, #7) are undefined.

RFC 1349 defines 3 bits of precedence (as in STD 5) and 4 bits of TOS. This
matches the Linux TOS definitions. Only 1 bit of the TOS octet is left
undefined by this RFC (MBZ, Must Be Zero).

RFC 2474 defines 6 bits of DS, combining the precedence and TOS bits of
STD 5 into one single 6-bit field. The code points xxx000 are defined in the
same manner as the Precedence of STD 5. The last 2 bits of the TOS octet
are left undefined.

RFC 3168 defines the two unused bits of RFC 2474 for use by ECN.
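
A rough sketch of why tcpdump reports ECT for a TOS value of 0x02, assuming the ECN codepoint assignments from RFC 3168 (Not-ECT = 00, ECT(1) = 01, ECT(0) = 10, CE = 11). The enum and function names are hypothetical.

```c
#include <stdint.h>

/* ECN codepoints as assigned by RFC 3168 to the last two bits of the octet. */
enum ecn_codepoint {
    ECN_NOT_ECT = 0x0,
    ECN_ECT_1   = 0x1,
    ECN_ECT_0   = 0x2,
    ECN_CE      = 0x3
};

/* Extract the ECN field (RFC bits 6-7, i.e. mask 0x03) from the octet. */
enum ecn_codepoint ecn_from_octet(uint8_t octet)
{
    return (enum ecn_codepoint)(octet & 0x03);
}

/* Human-readable name for a codepoint. */
const char *ecn_name(enum ecn_codepoint codepoint)
{
    switch (codepoint) {
    case ECN_ECT_1: return "ECT(1)";
    case ECN_ECT_0: return "ECT(0)";
    case ECN_CE:    return "CE";
    default:        return "Not-ECT";
    }
}
```

Under this reading, the octet 0x02 that RFC 1349 calls "minimize cost" decodes as ECT(0), while 0x10, 0x08 and 0x04 leave the ECN field at Not-ECT.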

> Well, it doesn't give you full power over the TOS octet, see ipt_TOS.c:
> 
>         if (tos != IPTOS_LOWDELAY
>             && tos != IPTOS_THROUGHPUT
>             && tos != IPTOS_RELIABILITY
>             && tos != IPTOS_MINCOST
>             && tos != IPTOS_NORMALSVC) {
>                 printk(KERN_WARNING "TOS: bad tos value %#x\n", tos);
>                 return 0;
>         }

Ugh.. did not know of this. But ok.. this is not good. The TOS is far too
overloaded to assume that one wants to use an obsoleted RFC. In addition,
for all practical purposes Precedence is a form of TOS, which is why it is 
included in DS (and is in fact the only defined codepoints of DS).

Regards
Henrik

* Re: TOS target broken wrt ECN (was RFC 1349)
  2004-04-29 17:31   ` TOS target broken wrt ECN (was RFC 1349) Chris Wilson
@ 2004-04-30  9:25     ` Glen Turner
  2004-04-30 10:22       ` Chris Wilson
  0 siblings, 1 reply; 8+ messages in thread
From: Glen Turner @ 2004-04-30  9:25 UTC
  To: Chris Wilson; +Cc: Henrik Nordstrom, Netfilter Developers

On Fri, 2004-04-30 at 03:01, Chris Wilson wrote:

> Sorry to bother you again, but based on the information in the RFCs 
> (condensed):
> 
>   RFC 1349:
> 
>          0     1     2     3     4     5     6     7
>       +-----+-----+-----+-----+-----+-----+-----+-----+
>       |   PRECEDENCE    |          TOS          | MBZ |
>       +-----+-----+-----+-----+-----+-----+-----+-----+

>   RFC 3168:
> 
>          0     1     2     3     4     5     6     7
>       +-----+-----+-----+-----+-----+-----+-----+-----+
>       |          DS FIELD, DSCP           | ECN FIELD |
>       +-----+-----+-----+-----+-----+-----+-----+-----+

Bit 6, the one at issue, was occasionally used for the "minimize cost"
TOS. It never gained much OS support at the time, and Netfilter offers
better support for that ToS than the OSs of the pre-DiffServ era. 
Monitoring for that ToS bit by backbone ISPs during the IETF discussions
about ECN didn't show any use. Ironically Netfilter could alter that
situation :-)

> If so, then the TOS target is dangerous since it will disrupt the 
> operation of ECN, and the documentation should mention that. I'd be happy 
> to write a patch to update it if people are interested.

Best if Netfilter makes it clear that Bit 6 is discouraged. Also the
description

# iptables --match tos --help
[!] --tos value                 Match Type of Service field from one of the
                                following numeric or descriptive values:
                                     Minimize-Delay 16 (0x10)
                                     Maximize-Throughput 8 (0x08)
                                     Maximize-Reliability 4 (0x04)
                                     Minimize-Cost 2 (0x02)
                                     Normal-Service 0 (0x00)
 
is misleading. Unlike DSCP, the ToS field has no values as such: it isn't
a variable but a group of independent bits, of which only one can be set
at a time (of course there's an internal value, but you shouldn't be
leaking that as it gets confusing. For example, lots of software,
including other parts of the kernel like setsockopt(IP_TOS), uses the
value of the whole Precedence/TOS/MBZ octet and masks out the
irrelevant bits).
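
For reference, a minimal sketch of the setsockopt(IP_TOS) interface mentioned above, which takes the whole octet as an integer rather than a named flag. The wrapper names set_socket_tos and get_socket_tos are hypothetical, and how the kernel treats individual bits of the supplied value is not shown here.

```c
#include <netinet/in.h>
#include <sys/socket.h>

/* Set the TOS octet used for packets sent on an IPv4 socket.
 * Returns 0 on success and -1 on error, like setsockopt() itself. */
int set_socket_tos(int fd, int tos)
{
    return setsockopt(fd, IPPROTO_IP, IP_TOS, &tos, sizeof(tos));
}

/* Read back the TOS octet currently configured on the socket.
 * Returns the value, or -1 on error. */
int get_socket_tos(int fd)
{
    int tos = 0;
    socklen_t len = sizeof(tos);

    if (getsockopt(fd, IPPROTO_IP, IP_TOS, &tos, &len) < 0)
        return -1;
    return tos;
}
```

A caller would pass a raw value such as 0x10 (the "Minimize-Delay" entry in the help text), which is exactly the kind of internal number the help output exposes.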

It would help immensely if the tos page noted that ToS is deprecated and
if ToS-backward-compatibility were added to DSCP.

# iptables --match dscp --dscp tos-minimize-delay

Obviously there would be no tos-minimize-cost, but that would be no
loss.

-- 
Glen Turner         Tel: (08) 8303 3936 or +61 8 8303 3936 
Network Engineer          Email: glen.turner@aarnet.edu.au
Australian Academic & Research Network   www.aarnet.edu.au

* Re: TOS target broken wrt ECN (was RFC 1349)
  2004-04-30  9:25     ` Glen Turner
@ 2004-04-30 10:22       ` Chris Wilson
  2004-05-04  5:20         ` Glen Turner
  0 siblings, 1 reply; 8+ messages in thread
From: Chris Wilson @ 2004-04-30 10:22 UTC
  To: Glen Turner; +Cc: Henrik Nordstrom, Netfilter Developers

Hi Glen,

> Bit 6, the one at issue, was occasionally used for the "minimize cost"
> TOS. It never gained much OS support at the time, and Netfilter offers
> better support for that ToS than the OSs of the pre-DiffServ era. 
> Monitoring for that ToS bit by backbone ISPs during the IETF discussions
> about ECN didn't show any use. Ironically Netfilter could alter that
> situation :-)

Yes, I was trying to use it to reduce the priority of TCP packets outbound 
from our leased line router to the Internet, and some firewalls out there 
were dropping our TCP traffic because this bit was set (presumably because 
they thought it was ECN, and took a dislike to it ;)

> > If so, then the TOS target is dangerous since it will disrupt the 
> > operation of ECN, and the documentation should mention that. I'd be happy 
> > to write a patch to update it if people are interested.
> 
> Best if Netfilter makes it clear that Bit 6 is discouraged.

Not just bit 6. From line 36 of ipt_TOS.c:

	iph->tos = (iph->tos & IPTOS_PREC_MASK) | tosinfo->tos;

IPTOS_PREC_MASK is #defined to 0xE0, so this will clear the lowest five 
bits of the TOS octet, including both ECN bits, and replace them with 
whatever value the user specifies.
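
A small userspace sketch of that assignment, assuming only the 0xE0 mask value stated above. The function name tos_target_apply is hypothetical; it reproduces the arithmetic of the quoted line, not the surrounding kernel code.

```c
#include <stdint.h>

#define PREC_MASK 0xE0 /* same value as IPTOS_PREC_MASK */

/* Keep the three precedence bits of the old octet and overwrite the
 * remaining five bits with the user-supplied TOS value. */
uint8_t tos_target_apply(uint8_t old_octet, uint8_t new_tos)
{
    return (uint8_t)((old_octet & PREC_MASK) | new_tos);
}
```

For example, an incoming octet of 0x03 (both ECN bits set) rewritten with 0x10 comes out as 0x10, so the ECN marking is lost.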

Unfortunately, I have to agree with you that minimize-cost and 
maximize-security TOS values are now useless since the IETF has redefined 
their bits to something else. It would appear that backwards compatibility 
has been completely broken, because ECN packets can be handled badly by 
TOS routers, and TOS packets handled badly by ECN routers.

Therefore, I think there should be a big fat warning on the TOS target 
that it breaks ECN and should not be used on the Internet.

A less drastic option is to not clear the lower two bits, and disable the 
minimize-cost TOS target. However, this target will still have strange 
interactions with ECN, since an ECN router can set bit 6 to signal 
CE, which causes other TOS routers to treat the packet differently 
(minimize-cost or an invalid TOS value, depending on the values of the 
other bits).
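
A sketch of what that less drastic option could look like, as an illustration only and not a proposed patch. It assumes the precedence bits are 0xE0 and the ECN bits are 0x03; the function name set_tos_preserving_ecn is hypothetical.

```c
#include <stdint.h>

#define PREC_MASK 0xE0u /* precedence, RFC bits 0-2 */
#define ECN_MASK  0x03u /* ECN field, RFC bits 6-7 */

/* Keep precedence and ECN from the old octet, and take only the three
 * remaining bits (0x1C) from the requested TOS value. A request for
 * minimize-cost (0x02) is silently dropped, since that bit now
 * belongs to ECN. */
uint8_t set_tos_preserving_ecn(uint8_t old_octet, uint8_t requested_tos)
{
    uint8_t kept     = (uint8_t)(old_octet & (PREC_MASK | ECN_MASK));
    uint8_t settable = (uint8_t)(requested_tos & ~(PREC_MASK | ECN_MASK));

    return (uint8_t)(kept | settable);
}
```

With this variant an octet of 0x03 rewritten with 0x10 becomes 0x13, so the ECN bits survive the rewrite.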

> It would help immensely if the tos page noted that ToS is deprecated and
> if ToS-backward-compatibility were added to DSCP.

I agree, but I'd say that it's not just deprecated, but dangerous as well.

> # iptables --match dscp --dscp tos-minimize-delay
> 
> Obviously there would be no tos-minimize-cost, but that would be no
> loss.

Except financially ;)

Cheers, Chris.
-- 
_  __ __     _
 / __/ / ,__(_)_  | Chris Wilson -- UNIX Firewall Lead Developer |
/ (_  ,\/ _/ /_ \ | NetServers.co.uk http://www.netservers.co.uk |
\__/_/_/_//_/___/ | 21 Signet Court, Cambridge, UK. 01223 576516 |

* Re: TOS target broken wrt ECN (was RFC 1349)
  2004-04-30 10:22       ` Chris Wilson
@ 2004-05-04  5:20         ` Glen Turner
  0 siblings, 0 replies; 8+ messages in thread
From: Glen Turner @ 2004-05-04  5:20 UTC
  To: Chris Wilson; +Cc: Netfilter Developers

On Fri, 2004-04-30 at 19:52, Chris Wilson wrote:

> > Obviously there would be no tos-minimize-cost, but that would be no
> > loss.
> 
> Except financially ;)

Um no.  No ISP offers least charge routing for that ToS.

If you are building a network you can easily offer a
least cost DSCP. For example, we offer a DSCP=8 "worst
effort" service for bulk file transfers. DSCP=8 maps
to IP Precedence=1, allowing for older OSs to participate.
And the default mappings from DSCP to link layer markings
like 802.1p and MPLS Exp work well.
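
The DSCP-to-precedence mapping can be sketched as follows, assuming the 6-bit DSCP occupies the top six bits of the octet as in the RFC 2474 layout. The function names are hypothetical.

```c
#include <stdint.h>

/* Place a 6-bit DSCP value into the top six bits of the TOS octet. */
uint8_t dscp_to_tos_octet(uint8_t dscp)
{
    return (uint8_t)((dscp & 0x3F) << 2);
}

/* The old 3-bit IP Precedence is the top three bits of the DSCP. */
uint8_t dscp_to_precedence(uint8_t dscp)
{
    return (uint8_t)((dscp & 0x3F) >> 3);
}
```

For DSCP=8 this gives an octet of 0x20 and a precedence of 1, leaving the last two (ECN) bits of the octet untouched.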

Search Google for "Scavenger service".  The AARNet web
site has iptables rules and code changes for Apache and
some FTP servers.

-- 
Glen Turner         Tel: (08) 8303 3936 or +61 8 8303 3936 
Network Engineer          Email: glen.turner@aarnet.edu.au
Australian Academic & Research Network   www.aarnet.edu.au
