Thomas Graf [Mon, 26 Mar 2007 06:24:24 +0000 (23:24 -0700)]
[NET] rules: Unified rules dumping
Implements a unified, protocol independant rules dumping function
which is capable of both, dumping a specific protocol family or
all of them. This speeds up dumping as less lookups are required.
Thomas Graf [Thu, 22 Mar 2007 18:48:11 +0000 (11:48 -0700)]
[RTNL]: Message handler registration interface
This patch adds a new interface to register rtnetlink message
handlers replacing the exported rtnl_links[] array which
required many message handlers to be exported unnecessarly.
Gerrit Renker [Tue, 20 Mar 2007 18:31:56 +0000 (15:31 -0300)]
[CCID3]: Use initial RTT sample from SYN exchange
The patch follows the following recommendation made in an erratum to RFC 4342:
"Senders MAY additionally make use of other available RTT measurements,
including those from the initial Request-Response packet exchange."
It implements larger initial windows with regard to this inital RTT measurement,
using the mechanism suggested in draft-ietf-dccp-rfc3448bis, section 4.2.
Gerrit Renker [Tue, 20 Mar 2007 18:23:18 +0000 (15:23 -0300)]
[DCCP]: Provide function for RTT sampling
A recurring problem, in particular in the CCID code, is that RTT samples
from packets with timestamp echo and elapsed time options need to be taken.
This service is provided via a new function dccp_sample_rtt in this patch.
Furthermore, to protect against `insane' RTT samples, the sampled value
is bounded between 100 microseconds and 4 seconds - for which u32 is sufficient.
Gerrit Renker [Tue, 20 Mar 2007 18:04:30 +0000 (15:04 -0300)]
[CCID3]: Remove build warnings for 64bit
This clears the following sparc64 build warnings:
1) warning: format "%ld" expects type "long int", but argument 3 has type "suseconds_t"
2) warning: format "%llu" expects type "long long unsigned int", but argument 3 has type "__u64"
Fixed by using typecast to unsigned. This is argued to be safe, since the quantities, after
de-scaling (factor 2^6) fit all in u32.
Gerrit Renker [Tue, 20 Mar 2007 18:00:28 +0000 (15:00 -0300)]
[DCCP]: More debug information for dccp_wait_for_ccid
This adds more detail in the wait_for_ccid packet scheduling loop.
In particular, it informs about (i) when delay is used and (ii) why
a packet is discarded.
Gerrit Renker [Tue, 20 Mar 2007 17:59:23 +0000 (14:59 -0300)]
[DCCP]: Always use debug-toggle parameters
Currently debugging output (when configured) is automatically enabled when
DCCP modules are compiled into the kernel rather than built as loadable modules.
This is not necessary, since the module parameters in this case become kernel
commandline parameters, e.g. DCCP or CCID3 debug output can be enabled for a
static build by appending the following at the boot prompt:
dccp.dccp_debug=1 dccp_ccid3.ccid3_debug=1
This patch therefore does away with the more complicated way of always enabling
debug output for static builds
Gerrit Renker [Tue, 20 Mar 2007 17:28:44 +0000 (14:28 -0300)]
[CCID3]: Use MSS for larger initial windows
This improves the slow-start phase by using the MSS
(as suggested in RFC 4342, sec. 5) instead of the packet size s.
Also figured out that __u32 is ample resource enough.
Gerrit Renker [Tue, 20 Mar 2007 16:11:24 +0000 (13:11 -0300)]
[CCID3]: Re-order CCID 3 source file
No code change at all.
This splits ccid3.c into a RX and a TX section, so that the file has an
organisation similar to the other ones (e.g. packet_history.{h,c}).
Gerrit Renker [Tue, 20 Mar 2007 16:10:15 +0000 (13:10 -0300)]
[CCID3]: Remove redundant `len' test
Since CCID3 avoids sending 0-byte data packets (cf. ccid3_hc_tx_send_packet),
testing for zero-payload length, as performed by ccid3_hc_tx_update_s, is
redundant - hence removed by this patch.
Gerrit Renker [Tue, 20 Mar 2007 16:08:19 +0000 (13:08 -0300)]
[DCCP]: Remove ambiguity in the way before48 is used
This removes two ambiguities in employing the new definition of before48,
following the analysis on http://www.mail-archive.com/[email protected]/msg01295.html
(1) Updating GSR when P.seqno >= S.SWL
With the old definition we did not update when P.seqno and S.SWL are 2^47 apart. To
ensure the same behaviour as with the old definition, this is replaced with the
equivalent condition dccp_delta_seqno(S.SWL, P.seqno) >= 0
(2) Sending SYNC when P.seqno >= S.OSR
Here it is debatable whether the new definition causes an ambiguity: the case is
similar to (1); and to have consistency with the case (1), we use the equivalent
condition dccp_delta_seqno(S.OSR, P.seqno) >= 0
Gerrit Renker [Tue, 20 Mar 2007 16:03:47 +0000 (13:03 -0300)]
[DCCP]: Fix for follows48
The follows48 relation identifies whether 48-bit sequence number
x is the direct successor of y. Currently, it does not handle cases
of the following type correctly:
follows48(0x(prefix)10000LL, 0x(prefix)0FFFFLL)
where prefix is an arbitrary hex sequence of up to 7 digits.
This is fixed by reusing the new dccp_delta_seqno function.
Gerrit Renker [Tue, 20 Mar 2007 15:26:51 +0000 (12:26 -0300)]
[DCCP]: 48-bit sequence number arithmetic
This patch
* organizes the sequence arithmetic functions into one corner of dccp.h
* performs a small modification of dccp_set_seqno to make it more widely reusable
(now it is safe to use any number, since it performs modulo-2^48 assignment)
* adds functions and generic macros for 48-bit sequence arithmetic:
--48 bit complement
--modulo-48 addition and modulo-48 subtraction
--dccp_inc_seqno now a special case of add48
Constants renamed following a suggestion by Arnaldo.
For the common "(struct nlmsghdr *)skb->data" sequence, so that we reduce the
number of direct accesses to skb->data and for consistency with all the other
cast skb member helpers.
Robert Olsson [Mon, 19 Mar 2007 23:29:58 +0000 (16:29 -0700)]
[IPV4]: fib_trie root node settings
The threshold for root node can be more aggressive set to get
better tree compression. The new setting mekes the root grow
from 16 to 19 bits and substansial improvemnt in Aver depth
this with the current table of 214393 prefixes
But really the dynamic resize should need more investigation
both in terms convergence and performance and maybe it should
be possible to change...
Maybe just for the brave to start with or we may have to back
this out.
Robert Olsson [Mon, 19 Mar 2007 23:27:37 +0000 (16:27 -0700)]
[IPV4]: fib_trie resize break
The patch below adds break condition for the resize operations. If
we don't achieve the desired fill factor a warning is printed. Trie
should still be operational but new thresholds should be considered.
So that it is also an offset from skb->head, reduces its size from 8 to 4 bytes
on 64bit architectures, allowing us to combine the 4 bytes hole left by the
layer headers conversion, reducing struct sk_buff size to 256 bytes, i.e. 4
64byte cachelines, and since the sk_buff slab cache is SLAB_HWCACHE_ALIGN...
:-)
Many calculations that previously required that skb->{transport,network,
mac}_header be first converted to a pointer now can be done directly, being
meaningful as offsets or pointers.
[SK_BUFF]: Use offsets for skb->{mac,network,transport}_header on 64bit architectures
With this we save 8 bytes per network packet, leaving a 4 bytes hole to be used
in further shrinking work, likely with the offsetization of other pointers,
such as ->{data,tail,end}, at the cost of adds, that were minimized by the
usual practice of setting skb->{mac,nh,n}.raw to a local variable that is then
accessed multiple times in each function, it also is not more expensive than
before with regards to most of the handling of such headers, like setting one
of these headers to another (transport to network, etc), or subtracting, adding
to/from it, comparing them, etc.
Now we have this layout for sk_buff on a x86_64 machine:
On 32 bits nothing changes, and pointers continue to be used with the compiler
turning all this abstraction layer into dust. But there are some sk_buff
validation tricks that are now possible, humm... :-)
[SK_BUFF]: unions of just one member don't get anything done, kill them
Renaming skb->h to skb->transport_header, skb->nh to skb->network_header and
skb->mac to skb->mac_header, to match the names of the associated helpers
(skb[_[re]set]_{transport,network,mac}_header).
For the common sequence "skb->h.raw - skb->nh.raw", similar to skb->mac_len,
that is precalculated tho, don't think we need to bloat skb with one more
member, so just use this new helper, reducing the number of non-skbuff.h
references to the layer headers even more.
Patrick McHardy [Fri, 16 Mar 2007 19:34:52 +0000 (12:34 -0700)]
[NET_SCHED]: Fix warning
net/sched/sch_api.c: In function 'psched_show':
net/sched/sch_api.c:1219: warning: format '%08x' expects type 'unsigned int', but argument 6 has type 's64'
Patrick McHardy [Fri, 16 Mar 2007 19:31:28 +0000 (12:31 -0700)]
[NET_SCHED]: sch_cbq: fix watchdog scheduled too late
q->now is increased during dequeue and doesn't contain the current time
afterwards, resulting in a too large timeout value for the qdisc watchdog.
Use "now" instead, which still contains the current time.
Patrick McHardy [Fri, 16 Mar 2007 08:23:28 +0000 (01:23 -0700)]
[NET_SCHED]: Export real timer resolution in /proc/net/psched
The timer resolution exported in /proc/net/psched is used by userspace to
calculate HTB's burst values. Currently it is set to HZ, since we're now
using hrtimers, use KTIME_MONOTONIC_RES, which makes HTB use smaller burst
values.
This patch also affects libnl, which incorrectly uses this value for
the SFQ perturbation parameter, which is always in seconds, and some
routing cache values, which are in USER_HZ, so both cases are broken
anyway.
Patrick McHardy [Fri, 16 Mar 2007 08:23:02 +0000 (01:23 -0700)]
[NET_SCHED]: kill jiffie conversion macros
Now that all packet schedulers have been converted to hrtimers most users
of PSCHED_JIFFIE2US and PSCHED_US2JIFFIE are gone. The remaining users use
it to convert external time units to packet scheduler clock ticks, so use
PSCHED_TICKS_PER_SEC instead.
Patrick McHardy [Fri, 16 Mar 2007 08:22:20 +0000 (01:22 -0700)]
[NET_SCHED]: sch_cbq: use hrtimer for delay_timer
Switch delay_timer to hrtimer.
The class penalty parameter is changed to use psched ticks as units.
Since iproute never supported using this and the only existing user
(libnl) incorrectly assumes psched ticks as units anyway, this
shouldn't break anything.
Patrick McHardy [Fri, 16 Mar 2007 08:21:40 +0000 (01:21 -0700)]
[NET_SCHED]: sch_cbq: fix cbq_undelay_prio for non-active priorites
cbq_undelay_prio is supposed to return a time delta, but returns the
current time for non-active priorities, causing cbq_undelay to mark
the priority as active and schedule a timer for twice the current
time.
Patrick McHardy [Fri, 16 Mar 2007 08:18:42 +0000 (01:18 -0700)]
[NET_SCHED]: Use ktime as clocksource
Get rid of the manual clock source selection mess and use ktime. Also
use a scalar representation, which allows to clean up pkt_sched.h a bit
more and results in less ktime_to_ns() calls in most cases.
The PSCHED_US2JIFFIE/PSCHED_JIFFIE2US macros are implemented quite
inefficient by this patch, following patches will convert all qdiscs
to hrtimers and get rid of them entirely.
[SK_BUFF]: More skb_put related skb_reset_transport_header
This time we have to set it to skb->tail that is not anymore equal to
skb->data, so we either add a new helper or just add the skb->tail - skb->data
offset, for now do the later.
ip6_nd_hdr is always called immediately after a alloc_skb + skb_reserve
sequence, i.e. when skb->tail is equal to skb->data, making it correct to use
skb_reset_network_header().
[NETFILTER]: nfnetlink: parse attributes with nfattr_parse in nfnetlink_check_attribute
Use nfattr_parse to parse attributes, this patch also modifies the default
behaviour since unknown attributes will be ignored instead of returning
EINVAL. This ensure backward compatibility: new libraries with new
attributes and old kernels can work.
Willy Tarreau [Wed, 14 Mar 2007 23:44:53 +0000 (16:44 -0700)]
[NETFILTER]: TCP conntrack: factorize out the PUSH flag
The PUSH flag is accepted with every other valid combination.
Let's get it out of the tcp_valid_flags table and reduce the
number of combinations we have to handle. This does not
significantly reduce the table size however (8 bytes).
Willy Tarreau [Wed, 14 Mar 2007 23:44:31 +0000 (16:44 -0700)]
[NETFILTER]: TCP conntrack: accept RST|PSH as valid
This combination has been encountered on an IBM AS/400 in response
to packets sent to a closed session. There is no particular reason
to mark it invalid.
The retrying after an allocation failure is not necessary anymore
since we're holding the mutex the entire time, for the same
reason the double allocation race can't happen anymore.
Now that we don't use nf_conntrack_lock anymore but a single mutex for
all protocol handling, no need to release and grab it again for sysctl
registration.
Patrick McHardy [Wed, 14 Mar 2007 23:38:25 +0000 (16:38 -0700)]
[NETFILTER]: nf_conntrack: remove ugly hack in l4proto registration
Remove ugly special-casing of nf_conntrack_l4proto_generic, all it
wants is its sysctl tables registered, so do that explicitly in an
init function and move the remaining protocol initialization and
cleanup code to nf_conntrack_proto.c as well.
Patrick McHardy [Wed, 14 Mar 2007 23:37:52 +0000 (16:37 -0700)]
[NETFILTER]: nf_conntrack: switch protocol registration/unregistration to mutex
The protocol lookups done by nf_conntrack are already protected by RCU,
there is no need to keep taking nf_conntrack_lock for registration
and unregistration. Switch to a mutex.
For the places where we need a pointer to the transport header, it is
still legal to touch skb->h.raw directly if just adding to,
subtracting from or setting it to another layer header.