Git Repo - linux.git/log

bpf, docs: Document the opcode classes

Add a description for each opcode class.

Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]

bpf, docs: Add subsections for ALU and JMP instructions

Add a little more stucture to the ALU/JMP documentation with sections and
improve the example text.

Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]

bpf, docs: Add a setion to explain the basic instruction encoding

The eBPF instruction set document does not currently document the basic
instruction encoding. Add a section to do that.

Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]

bpf, selftests: Add verifier test for mem_or_null register with offset.

Add a new test case with mem_or_null typed register with off > 0 to ensure
it gets rejected by the verifier:

  # ./test_verifier 1011
  #1009/u check with invalid reg offset 0 OK
  #1009/p check with invalid reg offset 0 OK
  Summary: 2 PASSED, 0 SKIPPED, 0 FAILED

Signed-off-by: Daniel Borkmann <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>

bpf: Don't promote bogus looking registers after null check.

If we ever get to a point again where we convert a bogus looking <ptr>_or_null
typed register containing a non-zero fixed or variable offset, then lets not
reset these bounds to zero since they are not and also don't promote the register
to a <ptr> type, but instead leave it as <ptr>_or_null. Converting to a unknown
register could be an avenue as well, but then if we run into this case it would
allow to leak a kernel pointer this way.

Fixes: f1174f77b50c ("bpf/verifier: rework value tracking")
Signed-off-by: Daniel Borkmann <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>

bpf, sockmap: Fix double bpf_prog_put on error case in map_link

sock_map_link() is called to update a sockmap entry with a sk. But, if the
sock_map_init_proto() call fails then we return an error to the map_update
op against the sockmap. In the error path though we need to cleanup psock
and dec the refcnt on any programs associated with the map, because we
refcnt them early in the update process to ensure they are pinned for the
psock. (This avoids a race where user deletes programs while also updating
the map with new socks.)

In current code we do the prog refcnt dec explicitely by calling
bpf_prog_put() when the program was found in the map. But, after commit
'38207a5e81230' in this error path we've already done the prog to psock
assignment so the programs have a reference from the psock as well. This
then causes the psock tear down logic, invoked by sk_psock_put() in the
error path, to similarly call bpf_prog_put on the programs there.

To be explicit this logic does the prog->psock assignment:

  if (msg_*)
    psock_set_prog(...)

Then the error path under the out_progs label does a similar check and
dec with:

  if (msg_*)
     bpf_prog_put(...)

And the teardown logic sk_psock_put() does ...

  psock_set_prog(msg_*, NULL)

... triggering another bpf_prog_put(...). Then KASAN gives us this splat,
found by syzbot because we've created an inbalance between bpf_prog_inc and
bpf_prog_put calling put twice on the program.

  BUG: KASAN: vmalloc-out-of-bounds in __bpf_prog_put kernel/bpf/syscall.c:1812 [inline]
  BUG: KASAN: vmalloc-out-of-bounds in __bpf_prog_put kernel/bpf/syscall.c:1812 [inline] kernel/bpf/syscall.c:1829
  BUG: KASAN: vmalloc-out-of-bounds in bpf_prog_put+0x8c/0x4f0 kernel/bpf/syscall.c:1829 kernel/bpf/syscall.c:1829
  Read of size 8 at addr ffffc90000e76038 by task syz-executor020/3641

To fix clean up error path so it doesn't try to do the bpf_prog_put in the
error path once progs are assigned then it relies on the normal psock
tear down logic to do complete cleanup.

For completness we also cover the case whereh sk_psock_init_strp() fails,
but this is not expected because it indicates an incorrect socket type
and should be caught earlier.

Fixes: 38207a5e8123 ("bpf, sockmap: Attach map progs to psock early for feature probes")
Reported-by: [email protected]
Signed-off-by: John Fastabend <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]

bpf, sockmap: Fix return codes from tcp_bpf_recvmsg_parser()

Applications can be confused slightly because we do not always return the
same error code as expected, e.g. what the TCP stack normally returns. For
example on a sock err sk->sk_err instead of returning the sock_error we
return EAGAIN. This usually means the application will 'try again'
instead of aborting immediately. Another example, when a shutdown event
is received we should immediately abort instead of waiting for data when
the user provides a timeout.

These tend to not be fatal, applications usually recover, but introduces
bogus errors to the user or introduces unexpected latency. Before
'c5d2177a72a16' we fell back to the TCP stack when no data was available
so we managed to catch many of the cases here, although with the extra
latency cost of calling tcp_msg_wait_data() first.

To fix lets duplicate the error handling in TCP stack into tcp_bpf so
that we get the same error codes.

These were found in our CI tests that run applications against sockmap
and do longer lived testing, at least compared to test_sockmap that
does short-lived ping/pong tests, and in some of our test clusters
we deploy.

Its non-trivial to do these in a shorter form CI tests that would be
appropriate for BPF selftests, but we are looking into it so we can
ensure this keeps working going forward. As a preview one idea is to
pull in the packetdrill testing which catches some of this.

Fixes: c5d2177a72a16 ("bpf, sockmap: Fix race in ingress receive verdict with redirect to self")
Signed-off-by: John Fastabend <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]

bpf, arm64: Use emit_addr_mov_i64() for BPF_PSEUDO_FUNC

The following error is reported when running "./test_progs -t for_each"
under arm64:

  bpf_jit: multi-func JIT bug 58 != 56
  [...]
  JIT doesn't support bpf-to-bpf calls

The root cause is the size of BPF_PSEUDO_FUNC instruction increases
from 2 to 3 after the address of called bpf-function is settled and
there are two bpf-to-bpf calls in test_pkt_access. The generated
instructions are shown below:

  0x48:  21 00 C0 D2    movz x1, #0x1, lsl #32
  0x4c:  21 00 80 F2    movk x1, #0x1

  0x48:  E1 3F C0 92    movn x1, #0x1ff, lsl #32
  0x4c:  41 FE A2 F2    movk x1, #0x17f2, lsl #16
  0x50:  81 70 9F F2    movk x1, #0xfb84

Fixing it by using emit_addr_mov_i64() for BPF_PSEUDO_FUNC, so
the size of jited image will not change.

Fixes: 69c087ba6225 ("bpf: Add bpf_for_each_map_elem() helper")
Signed-off-by: Hou Tao <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]

net: gemini: allow any RGMII interface mode

The four RGMII interface modes take care of the required RGMII delay
configuration at the PHY and should not be limited by the network MAC
driver. Sadly, gemini was only permitting RGMII mode with no delays,
which would require the required delay to be inserted via PCB tracking
or by the MAC.

However, there are designs that require the PHY to add the delay, which
is impossible without Gemini permitting the other three PHY interface
modes. Fix the driver to allow these.

Signed-off-by: Russell King (Oracle) <[email protected]>
Reviewed-by: Linus Walleij <[email protected]>
Tested-by: Corentin Labbe <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

Merge branch 'fix-rgmii-delays-for-88e1118'

Russell King says:

====================
Fix RGMII delays for 88E1118

This series fixes the RGMII delays for 88E1118 Marvell PHYs, after
a report by Corentin Labbe that the Marvell driver fails to work.

Patch 1 cleans up the paged register accesses in m88e1118_config_init()
and patch 2 adds the RGMII delay configuration.

This comes with an element of risk as existing DT may need to be fixed
for this in a similar way as we have done in the recent past for other
PHY drivers that have misinterpreted the RGMII interface modes.
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

net: phy: marvell: configure RGMII delays for 88E1118

Corentin Labbe reports that the SSI 1328 does not work when allowing
the PHY to operate at gigabit speeds, but does work with the generic
PHY driver.

This appears to be because m88e1118_config_init() writes a fixed value
to the MSCR register, claiming that this is to enable 1G speeds.
However, this always sets bits 4 and 5, enabling RGMII transmit and
receive delays. The suspicion is that the original board this was
added for required the delays to make 1G speeds work.

Add the necessary configuration for RGMII delays for the 88E1118 to
bring this into line with the requirements for RGMII support, and thus
make the SSI 1328 work.

Corentin Labbe has tested this on gemini-ssi1328 and gemini-ns2502.

Reported-by: Corentin Labbe <[email protected]>
Tested-by: Corentin Labbe <[email protected]>
Signed-off-by: Russell King (Oracle) <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

net: phy: marvell: use phy_write_paged() to set MSCR

Use phy_write_paged() in m88e1118_config_init() to set the MSCR value.
We leave the other paged write for the LEDs in case the DT register
parsing is relying on this page.

Signed-off-by: Russell King (Oracle) <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Tested-by: Corentin Labbe <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

selftests: set amt.sh executable

amt.sh test script will not work because it doesn't have execution
permission. So, it adds execution permission.

Reported-by: Hangbin Liu <[email protected]>
Fixes: c08e8baea78e ("selftests: add amt interface selftest script")
Signed-off-by: Taehee Yoo <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

Revert "net: usb: r8152: Add MAC passthrough support for more Lenovo Docks"

This reverts commit f77b83b5bbab53d2be339184838b19ed2c62c0a5.

This change breaks multiple usb to ethernet dongles attached on Lenovo
USB hub.

Fixes: f77b83b5bbab ("net: usb: r8152: Add MAC passthrough support for more Lenovo Docks")
Signed-off-by: Aaron Ma <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

netlink: do not allocate a device refcount tracker in ethnl_default_notify()

As reported by Johannes, the tracker allocated in
ethnl_default_notify() is not really needed, as this
function is not expected to change a device reference count.

Fixes: e4b8954074f6 ("netlink: add net device refcount tracker to struct ethnl_req_info")
Signed-off-by: Eric Dumazet <[email protected]>
Reported-by: Johannes Berg <[email protected]>
Tested-by: Johannes Berg <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

net/sched: add missing tracker information in qdisc_create()

qdisc_create() error path needs to use dev_put_track()
because qdisc_alloc() allocated the tracker.

Fixes: 606509f27f67 ("net/sched: add net device refcount tracker to struct Qdisc")
Signed-off-by: Eric Dumazet <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

Merge tag 'gpio-fixes-for-v5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux

Pull gpio fixes from Bartosz Golaszewski:
"Here are two last fixes for this release cycle from the GPIO
  subsystem:

   - fix irq offset calculation in gpio-aspeed-sgpio

   - update the MAINTAINERS entry for gpio-brcmstb"

* tag 'gpio-fixes-for-v5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
  MAINTAINERS: update gpio-brcmstb maintainers
  gpio: gpio-aspeed-sgpio: Fix wrong hwirq base in irq handler

Merge tag 'ieee802154-for-net-2022-01-05' of git://git.kernel.org/pub/scm/linux/kernel/git/sschmidt/wpan

Stefan Schmidt says:

====================
pull-request: ieee802154 for net 2022-01-05

Below I have a last minute fix for the atusb driver.

Pavel fixes a KASAN uninit report for the driver. This version is the
minimal impact fix to ease backporting. A bigger rework of the driver to
avoid potential similar problems is ongoing and will come through net-next
when ready.

* tag 'ieee802154-for-net-2022-01-05' of git://git.kernel.org/pub/scm/linux/kernel/git/sschmidt/wpan:
ieee802154: atusb: fix uninit value in atusb_set_extended_addr
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

Merge branch 'dsa-notifier-cleanup'

Vladimir Oltean says:

====================
DSA cross-chip notifier cleanup

This series deletes the no-op cross-chip notifier support for MRP and
HSR, features which were introduced relatively recently and did not get
full review at the time. The new code is functionally equivalent, but
simpler.

Cc: Horatiu Vultur <[email protected]>
Cc: George McCollister <[email protected]>
====================

Signed-off-by: David S. Miller <[email protected]>

net: dsa: remove cross-chip support for HSR

The cross-chip notifiers for HSR are bypass operations, meaning that
even though all switches in a tree are notified, only the switch
specified in the info structure is targeted.

We can eliminate the unnecessary complexity by deleting the cross-chip
notifier logic and calling the ds->ops straight from port.c.

Cc: George McCollister <[email protected]>
Signed-off-by: Vladimir Oltean <[email protected]>
Reviewed-by: George McCollister <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: dsa: remove cross-chip support for MRP

The cross-chip notifiers for MRP are bypass operations, meaning that
even though all switches in a tree are notified, only the switch
specified in the info structure is targeted.

We can eliminate the unnecessary complexity by deleting the cross-chip
notifier logic and calling the ds->ops straight from port.c.

Cc: Horatiu Vultur <[email protected]>
Signed-off-by: Vladimir Oltean <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: dsa: fix incorrect function pointer check for MRP ring roles

The cross-chip notifier boilerplate code meant to check the presence of
ds->ops->port_mrp_add_ring_role before calling it, but checked
ds->ops->port_mrp_add instead, before calling
ds->ops->port_mrp_add_ring_role.

Therefore, a driver which implements one operation but not the other
would trigger a NULL pointer dereference.

There isn't any such driver in DSA yet, so there is no reason to
backport the change. Issue found through code inspection.

Cc: Horatiu Vultur <[email protected]>
Fixes: c595c4330da0 ("net: dsa: add MRP support")
Signed-off-by: Vladimir Oltean <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

mlxsw: pci: Avoid flow control for EMAD packets

Locally generated packets ingress the device through its CPU port. When
the CPU port is congested and there are not enough credits in its
headroom buffer, packets can be dropped.

While this might be acceptable for data packets that traverse the
network, configuration packets exchanged between the host and the device
(EMADs) should not be subjected to this flow control.

The "sdq_lp" bit in the SDQ (Send Descriptor Queue) context allows the
host to instruct the device to treat packets sent on this queue as
"local processing" and always process them, regardless of the state of
the CPU port's headroom.

Add the definition of this bit and set it for the dedicated SDQ reserved
for the transmission of EMAD packets. This makes the "local processing"
bit in the WQE (Work Queue Element) redundant, so clear it.

Signed-off-by: Danielle Ratson <[email protected]>
Signed-off-by: Ido Schimmel <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge tag 'linux-can-next-for-5.17-20220105' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next

Marc Kleine-Budde says:

====================
pull-request: can-next 2022-01-05

this is a pull request of 15 patches for net-next/master.

The first patch is by me and removed an unused variable from the
usb_8dev driver.

Andy Shevchenko contributes a patch for the mcp251x driver, which
removes an unneeded assignment.

Jimmy Assarsson's patch for the kvaser_usb makes use of units.h in the
assignment of frequencies.

Lad Prabhakar provides 2 patches, converting the ti_hecc and the
sja1000 driver to make use of platform_get_irq().

The 10 remaining patches are by Vincent Mailhol. First the etas_es58x
driver populates the net_device::dev_port. The next 5 patches cleanup
the handling of CAN error and CAN RTR messages of all drivers. The
remaining 4 patches enhance the CAN controller mode flag handling and
export it via netlink to user space.
====================

Signed-off-by: David S. Miller <[email protected]>

Merge branch 'dsa-cleanups'

Vladimir Oltean says:

====================
Cleanup to main DSA structures

This series contains changes that do the following:

- struct dsa_port reduced from 576 to 544 bytes, and first cache line a
  bit better organized
- struct dsa_switch from 160 to 136 bytes, and first cache line a bit
  better organized
- struct dsa_switch_tree from 112 to 104 bytes, and first cache line a
  bit better organized
====================

Signed-off-by: David S. Miller <[email protected]>

net: dsa: combine two holes in struct dsa_switch_tree

There is a 7 byte hole after dst->setup and a 4 byte hole after
dst->default_proto. Combining them, we have a single hole of just 3
bytes on 64 bit machines.

Before:

pahole -C dsa_switch_tree net/dsa/slave.o
struct dsa_switch_tree {
        struct list_head           list;                 /*     0    16 */
        struct list_head           ports;                /*    16    16 */
        struct raw_notifier_head   nh;                   /*    32     8 */
        unsigned int               index;                /*    40     4 */
        struct kref                refcount;             /*    44     4 */
        struct net_device * *      lags;                 /*    48     8 */
        bool                       setup;                /*    56     1 */

        /* XXX 7 bytes hole, try to pack */

        /* --- cacheline 1 boundary (64 bytes) --- */
        const struct dsa_device_ops  * tag_ops;          /*    64     8 */
        enum dsa_tag_protocol      default_proto;        /*    72     4 */

        /* XXX 4 bytes hole, try to pack */

        struct dsa_platform_data * pd;                   /*    80     8 */
        struct list_head           rtable;               /*    88    16 */
        unsigned int               lags_len;             /*   104     4 */
        unsigned int               last_switch;          /*   108     4 */

        /* size: 112, cachelines: 2, members: 13 */
        /* sum members: 101, holes: 2, sum holes: 11 */
        /* last cacheline: 48 bytes */
};

After:

pahole -C dsa_switch_tree net/dsa/slave.o
struct dsa_switch_tree {
        struct list_head           list;                 /*     0    16 */
        struct list_head           ports;                /*    16    16 */
        struct raw_notifier_head   nh;                   /*    32     8 */
        unsigned int               index;                /*    40     4 */
        struct kref                refcount;             /*    44     4 */
        struct net_device * *      lags;                 /*    48     8 */
        const struct dsa_device_ops  * tag_ops;          /*    56     8 */
        /* --- cacheline 1 boundary (64 bytes) --- */
        enum dsa_tag_protocol      default_proto;        /*    64     4 */
        bool                       setup;                /*    68     1 */

        /* XXX 3 bytes hole, try to pack */

        struct dsa_platform_data * pd;                   /*    72     8 */
        struct list_head           rtable;               /*    80    16 */
        unsigned int               lags_len;             /*    96     4 */
        unsigned int               last_switch;          /*   100     4 */

        /* size: 104, cachelines: 2, members: 13 */
        /* sum members: 101, holes: 1, sum holes: 3 */
        /* last cacheline: 40 bytes */
};

Signed-off-by: Vladimir Oltean <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: dsa: move dsa_switch_tree :: ports and lags to first cache line

dst->ports is accessed most notably by dsa_master_find_slave(), which is
invoked in the RX path.

dst->lags is accessed by dsa_lag_dev(), which is invoked in the RX path
of tag_dsa.c.

dst->tag_ops, dst->default_proto and dst->pd don't need to be in the
first cache line, so they are moved out by this change.

Before:

pahole -C dsa_switch_tree net/dsa/slave.o
struct dsa_switch_tree {
        struct list_head           list;                 /*     0    16 */
        struct raw_notifier_head   nh;                   /*    16     8 */
        unsigned int               index;                /*    24     4 */
        struct kref                refcount;             /*    28     4 */
        bool                       setup;                /*    32     1 */

        /* XXX 7 bytes hole, try to pack */

        const struct dsa_device_ops  * tag_ops;          /*    40     8 */
        enum dsa_tag_protocol      default_proto;        /*    48     4 */

        /* XXX 4 bytes hole, try to pack */

        struct dsa_platform_data * pd;                   /*    56     8 */
        /* --- cacheline 1 boundary (64 bytes) --- */
        struct list_head           ports;                /*    64    16 */
        struct list_head           rtable;               /*    80    16 */
        struct net_device * *      lags;                 /*    96     8 */
        unsigned int               lags_len;             /*   104     4 */
        unsigned int               last_switch;          /*   108     4 */

        /* size: 112, cachelines: 2, members: 13 */
        /* sum members: 101, holes: 2, sum holes: 11 */
        /* last cacheline: 48 bytes */
};

After:

pahole -C dsa_switch_tree net/dsa/slave.o
struct dsa_switch_tree {
        struct list_head           list;                 /*     0    16 */
        struct list_head           ports;                /*    16    16 */
        struct raw_notifier_head   nh;                   /*    32     8 */
        unsigned int               index;                /*    40     4 */
        struct kref                refcount;             /*    44     4 */
        struct net_device * *      lags;                 /*    48     8 */
        bool                       setup;                /*    56     1 */

        /* XXX 7 bytes hole, try to pack */

        /* --- cacheline 1 boundary (64 bytes) --- */
        const struct dsa_device_ops  * tag_ops;          /*    64     8 */
        enum dsa_tag_protocol      default_proto;        /*    72     4 */

        /* XXX 4 bytes hole, try to pack */

        struct dsa_platform_data * pd;                   /*    80     8 */
        struct list_head           rtable;               /*    88    16 */
        unsigned int               lags_len;             /*   104     4 */
        unsigned int               last_switch;          /*   108     4 */

        /* size: 112, cachelines: 2, members: 13 */
        /* sum members: 101, holes: 2, sum holes: 11 */
        /* last cacheline: 48 bytes */
};

Signed-off-by: Vladimir Oltean <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: dsa: make dsa_switch :: num_ports an unsigned int

Currently, num_ports is declared as size_t, which is defined as
__kernel_ulong_t, therefore it occupies 8 bytes of memory.

Even switches with port numbers in the range of tens are exotic, so
there is no need for this amount of storage.

Additionally, because the max_num_bridges member right above it is also
4 bytes, it means the compiler needs to add padding between the last 2
fields. By reducing the size, we don't need that padding and can reduce
the struct size.

Before:

pahole -C dsa_switch net/dsa/slave.o
struct dsa_switch {
        struct device *            dev;                  /*     0     8 */
        struct dsa_switch_tree *   dst;                  /*     8     8 */
        unsigned int               index;                /*    16     4 */
        u32                        setup:1;              /*    20: 0  4 */
        u32                        vlan_filtering_is_global:1; /*    20: 1  4 */
        u32                        needs_standalone_vlan_filtering:1; /*    20: 2  4 */
        u32                        configure_vlan_while_not_filtering:1; /*    20: 3  4 */
        u32                        untag_bridge_pvid:1;  /*    20: 4  4 */
        u32                        assisted_learning_on_cpu_port:1; /*    20: 5  4 */
        u32                        vlan_filtering:1;     /*    20: 6  4 */
        u32                        pcs_poll:1;           /*    20: 7  4 */
        u32                        mtu_enforcement_ingress:1; /*    20: 8  4 */

        /* XXX 23 bits hole, try to pack */

        struct notifier_block      nb;                   /*    24    24 */

        /* XXX last struct has 4 bytes of padding */

        void *                     priv;                 /*    48     8 */
        void *                     tagger_data;          /*    56     8 */
        /* --- cacheline 1 boundary (64 bytes) --- */
        struct dsa_chip_data *     cd;                   /*    64     8 */
        const struct dsa_switch_ops  * ops;              /*    72     8 */
        u32                        phys_mii_mask;        /*    80     4 */

        /* XXX 4 bytes hole, try to pack */

        struct mii_bus *           slave_mii_bus;        /*    88     8 */
        unsigned int               ageing_time_min;      /*    96     4 */
        unsigned int               ageing_time_max;      /*   100     4 */
        struct dsa_8021q_context * tag_8021q_ctx;        /*   104     8 */
        struct devlink *           devlink;              /*   112     8 */
        unsigned int               num_tx_queues;        /*   120     4 */
        unsigned int               num_lag_ids;          /*   124     4 */
        /* --- cacheline 2 boundary (128 bytes) --- */
        unsigned int               max_num_bridges;      /*   128     4 */

        /* XXX 4 bytes hole, try to pack */

        size_t                     num_ports;            /*   136     8 */

        /* size: 144, cachelines: 3, members: 27 */
        /* sum members: 132, holes: 2, sum holes: 8 */
        /* sum bitfield members: 9 bits, bit holes: 1, sum bit holes: 23 bits */
        /* paddings: 1, sum paddings: 4 */
        /* last cacheline: 16 bytes */
};

After:

pahole -C dsa_switch net/dsa/slave.o
struct dsa_switch {
        struct device *            dev;                  /*     0     8 */
        struct dsa_switch_tree *   dst;                  /*     8     8 */
        unsigned int               index;                /*    16     4 */
        u32                        setup:1;              /*    20: 0  4 */
        u32                        vlan_filtering_is_global:1; /*    20: 1  4 */
        u32                        needs_standalone_vlan_filtering:1; /*    20: 2  4 */
        u32                        configure_vlan_while_not_filtering:1; /*    20: 3  4 */
        u32                        untag_bridge_pvid:1;  /*    20: 4  4 */
        u32                        assisted_learning_on_cpu_port:1; /*    20: 5  4 */
        u32                        vlan_filtering:1;     /*    20: 6  4 */
        u32                        pcs_poll:1;           /*    20: 7  4 */
        u32                        mtu_enforcement_ingress:1; /*    20: 8  4 */

        /* XXX 23 bits hole, try to pack */

        struct notifier_block      nb;                   /*    24    24 */

        /* XXX last struct has 4 bytes of padding */

        void *                     priv;                 /*    48     8 */
        void *                     tagger_data;          /*    56     8 */
        /* --- cacheline 1 boundary (64 bytes) --- */
        struct dsa_chip_data *     cd;                   /*    64     8 */
        const struct dsa_switch_ops  * ops;              /*    72     8 */
        u32                        phys_mii_mask;        /*    80     4 */

        /* XXX 4 bytes hole, try to pack */

        struct mii_bus *           slave_mii_bus;        /*    88     8 */
        unsigned int               ageing_time_min;      /*    96     4 */
        unsigned int               ageing_time_max;      /*   100     4 */
        struct dsa_8021q_context * tag_8021q_ctx;        /*   104     8 */
        struct devlink *           devlink;              /*   112     8 */
        unsigned int               num_tx_queues;        /*   120     4 */
        unsigned int               num_lag_ids;          /*   124     4 */
        /* --- cacheline 2 boundary (128 bytes) --- */
        unsigned int               max_num_bridges;      /*   128     4 */
        unsigned int               num_ports;            /*   132     4 */

        /* size: 136, cachelines: 3, members: 27 */
        /* sum members: 128, holes: 1, sum holes: 4 */
        /* sum bitfield members: 9 bits, bit holes: 1, sum bit holes: 23 bits */
        /* paddings: 1, sum paddings: 4 */
        /* last cacheline: 8 bytes */
};

Signed-off-by: Vladimir Oltean <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: dsa: merge all bools of struct dsa_switch into a single u32

struct dsa_switch has 9 boolean properties, many of which are in fact
set by drivers for custom behavior (vlan_filtering_is_global,
needs_standalone_vlan_filtering, etc etc). The binary layout of the
structure could be improved. For example, the "bool setup" at the
beginning introduces a gratuitous 7 byte hole in the first cache line.

The change merges all boolean properties into bitfields of an u32, and
places that u32 in the first cache line of the structure, since many
bools are accessed from the data path (untag_bridge_pvid, vlan_filtering,
vlan_filtering_is_global).

We place this u32 after the existing ds->index, which is also 4 bytes in
size. As a positive side effect, ds->tagger_data now fits into the first
cache line too, because 4 bytes are saved.

Before:

pahole -C dsa_switch net/dsa/slave.o
struct dsa_switch {
        bool                       setup;                /*     0     1 */

        /* XXX 7 bytes hole, try to pack */

        struct device *            dev;                  /*     8     8 */
        struct dsa_switch_tree *   dst;                  /*    16     8 */
        unsigned int               index;                /*    24     4 */

        /* XXX 4 bytes hole, try to pack */

        struct notifier_block      nb;                   /*    32    24 */

        /* XXX last struct has 4 bytes of padding */

        void *                     priv;                 /*    56     8 */
        /* --- cacheline 1 boundary (64 bytes) --- */
        void *                     tagger_data;          /*    64     8 */
        struct dsa_chip_data *     cd;                   /*    72     8 */
        const struct dsa_switch_ops  * ops;              /*    80     8 */
        u32                        phys_mii_mask;        /*    88     4 */

        /* XXX 4 bytes hole, try to pack */

        struct mii_bus *           slave_mii_bus;        /*    96     8 */
        unsigned int               ageing_time_min;      /*   104     4 */
        unsigned int               ageing_time_max;      /*   108     4 */
        struct dsa_8021q_context * tag_8021q_ctx;        /*   112     8 */
        struct devlink *           devlink;              /*   120     8 */
        /* --- cacheline 2 boundary (128 bytes) --- */
        unsigned int               num_tx_queues;        /*   128     4 */
        bool                       vlan_filtering_is_global; /*   132     1 */
        bool                       needs_standalone_vlan_filtering; /*   133     1 */
        bool                       configure_vlan_while_not_filtering; /*   134     1 */
        bool                       untag_bridge_pvid;    /*   135     1 */
        bool                       assisted_learning_on_cpu_port; /*   136     1 */
        bool                       vlan_filtering;       /*   137     1 */
        bool                       pcs_poll;             /*   138     1 */
        bool                       mtu_enforcement_ingress; /*   139     1 */
        unsigned int               num_lag_ids;          /*   140     4 */
        unsigned int               max_num_bridges;      /*   144     4 */

        /* XXX 4 bytes hole, try to pack */

        size_t                     num_ports;            /*   152     8 */

        /* size: 160, cachelines: 3, members: 27 */
        /* sum members: 141, holes: 4, sum holes: 19 */
        /* paddings: 1, sum paddings: 4 */
        /* last cacheline: 32 bytes */
};

After:

pahole -C dsa_switch net/dsa/slave.o
struct dsa_switch {
        struct device *            dev;                  /*     0     8 */
        struct dsa_switch_tree *   dst;                  /*     8     8 */
        unsigned int               index;                /*    16     4 */
        u32                        setup:1;              /*    20: 0  4 */
        u32                        vlan_filtering_is_global:1; /*    20: 1  4 */
        u32                        needs_standalone_vlan_filtering:1; /*    20: 2  4 */
        u32                        configure_vlan_while_not_filtering:1; /*    20: 3  4 */
        u32                        untag_bridge_pvid:1;  /*    20: 4  4 */
        u32                        assisted_learning_on_cpu_port:1; /*    20: 5  4 */
        u32                        vlan_filtering:1;     /*    20: 6  4 */
        u32                        pcs_poll:1;           /*    20: 7  4 */
        u32                        mtu_enforcement_ingress:1; /*    20: 8  4 */

        /* XXX 23 bits hole, try to pack */

        struct notifier_block      nb;                   /*    24    24 */

        /* XXX last struct has 4 bytes of padding */

        void *                     priv;                 /*    48     8 */
        void *                     tagger_data;          /*    56     8 */
        /* --- cacheline 1 boundary (64 bytes) --- */
        struct dsa_chip_data *     cd;                   /*    64     8 */
        const struct dsa_switch_ops  * ops;              /*    72     8 */
        u32                        phys_mii_mask;        /*    80     4 */

        /* XXX 4 bytes hole, try to pack */

        struct mii_bus *           slave_mii_bus;        /*    88     8 */
        unsigned int               ageing_time_min;      /*    96     4 */
        unsigned int               ageing_time_max;      /*   100     4 */
        struct dsa_8021q_context * tag_8021q_ctx;        /*   104     8 */
        struct devlink *           devlink;              /*   112     8 */
        unsigned int               num_tx_queues;        /*   120     4 */
        unsigned int               num_lag_ids;          /*   124     4 */
        /* --- cacheline 2 boundary (128 bytes) --- */
        unsigned int               max_num_bridges;      /*   128     4 */

        /* XXX 4 bytes hole, try to pack */

        size_t                     num_ports;            /*   136     8 */

        /* size: 144, cachelines: 3, members: 27 */
        /* sum members: 132, holes: 2, sum holes: 8 */
        /* sum bitfield members: 9 bits, bit holes: 1, sum bit holes: 23 bits */
        /* paddings: 1, sum paddings: 4 */
        /* last cacheline: 16 bytes */
};

Signed-off-by: Vladimir Oltean <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: dsa: move dsa_port :: type near dsa_port :: index

Both dsa_port :: type and dsa_port :: index introduce a 4 octet hole
after them, so we can group them together and the holes would be
eliminated, turning 16 octets of storage into just 8. This makes the
cpu_dp pointer fit in the first cache line, which is good, because
dsa_slave_to_master(), called by dsa_enqueue_skb(), uses it.

Before:

pahole -C dsa_port net/dsa/slave.o
struct dsa_port {
        union {
                struct net_device * master;              /*     0     8 */
                struct net_device * slave;               /*     0     8 */
        };                                               /*     0     8 */
        const struct dsa_device_ops  * tag_ops;          /*     8     8 */
        struct dsa_switch_tree *   dst;                  /*    16     8 */
        struct sk_buff *           (*rcv)(struct sk_buff *, struct net_device *); /*    24     8 */
        enum {
                DSA_PORT_TYPE_UNUSED = 0,
                DSA_PORT_TYPE_CPU    = 1,
                DSA_PORT_TYPE_DSA    = 2,
                DSA_PORT_TYPE_USER   = 3,
        } type;                                          /*    32     4 */

        /* XXX 4 bytes hole, try to pack */

        struct dsa_switch *        ds;                   /*    40     8 */
        unsigned int               index;                /*    48     4 */

        /* XXX 4 bytes hole, try to pack */

        const char  *              name;                 /*    56     8 */
        /* --- cacheline 1 boundary (64 bytes) --- */
        struct dsa_port *          cpu_dp;               /*    64     8 */
        u8                         mac[6];               /*    72     6 */
        u8                         stp_state;            /*    78     1 */
        u8                         vlan_filtering:1;     /*    79: 0  1 */
        u8                         learning:1;           /*    79: 1  1 */
        u8                         lag_tx_enabled:1;     /*    79: 2  1 */
        u8                         devlink_port_setup:1; /*    79: 3  1 */
        u8                         setup:1;              /*    79: 4  1 */

        /* XXX 3 bits hole, try to pack */

        struct device_node *       dn;                   /*    80     8 */
        unsigned int               ageing_time;          /*    88     4 */

        /* XXX 4 bytes hole, try to pack */

        struct dsa_bridge *        bridge;               /*    96     8 */
        struct devlink_port        devlink_port;         /*   104   288 */
        /* --- cacheline 6 boundary (384 bytes) was 8 bytes ago --- */
        struct phylink *           pl;                   /*   392     8 */
        struct phylink_config      pl_config;            /*   400    40 */
        struct net_device *        lag_dev;              /*   440     8 */
        /* --- cacheline 7 boundary (448 bytes) --- */
        struct net_device *        hsr_dev;              /*   448     8 */
        struct list_head           list;                 /*   456    16 */
        const struct ethtool_ops  * orig_ethtool_ops;    /*   472     8 */
        const struct dsa_netdevice_ops  * netdev_ops;    /*   480     8 */
        struct mutex               addr_lists_lock;      /*   488    32 */
        /* --- cacheline 8 boundary (512 bytes) was 8 bytes ago --- */
        struct list_head           fdbs;                 /*   520    16 */
        struct list_head           mdbs;                 /*   536    16 */

        /* size: 552, cachelines: 9, members: 30 */
        /* sum members: 539, holes: 3, sum holes: 12 */
        /* sum bitfield members: 5 bits, bit holes: 1, sum bit holes: 3 bits */
        /* last cacheline: 40 bytes */
};

After:

pahole -C dsa_port net/dsa/slave.o
struct dsa_port {
        union {
                struct net_device * master;              /*     0     8 */
                struct net_device * slave;               /*     0     8 */
        };                                               /*     0     8 */
        const struct dsa_device_ops  * tag_ops;          /*     8     8 */
        struct dsa_switch_tree *   dst;                  /*    16     8 */
        struct sk_buff *           (*rcv)(struct sk_buff *, struct net_device *); /*    24     8 */
        struct dsa_switch *        ds;                   /*    32     8 */
        unsigned int               index;                /*    40     4 */
        enum {
                DSA_PORT_TYPE_UNUSED = 0,
                DSA_PORT_TYPE_CPU    = 1,
                DSA_PORT_TYPE_DSA    = 2,
                DSA_PORT_TYPE_USER   = 3,
        } type;                                          /*    44     4 */
        const char  *              name;                 /*    48     8 */
        struct dsa_port *          cpu_dp;               /*    56     8 */
        /* --- cacheline 1 boundary (64 bytes) --- */
        u8                         mac[6];               /*    64     6 */
        u8                         stp_state;            /*    70     1 */
        u8                         vlan_filtering:1;     /*    71: 0  1 */
        u8                         learning:1;           /*    71: 1  1 */
        u8                         lag_tx_enabled:1;     /*    71: 2  1 */
        u8                         devlink_port_setup:1; /*    71: 3  1 */
        u8                         setup:1;              /*    71: 4  1 */

        /* XXX 3 bits hole, try to pack */

        struct device_node *       dn;                   /*    72     8 */
        unsigned int               ageing_time;          /*    80     4 */

        /* XXX 4 bytes hole, try to pack */

        struct dsa_bridge *        bridge;               /*    88     8 */
        struct devlink_port        devlink_port;         /*    96   288 */
        /* --- cacheline 6 boundary (384 bytes) --- */
        struct phylink *           pl;                   /*   384     8 */
        struct phylink_config      pl_config;            /*   392    40 */
        struct net_device *        lag_dev;              /*   432     8 */
        struct net_device *        hsr_dev;              /*   440     8 */
        /* --- cacheline 7 boundary (448 bytes) --- */
        struct list_head           list;                 /*   448    16 */
        const struct ethtool_ops  * orig_ethtool_ops;    /*   464     8 */
        const struct dsa_netdevice_ops  * netdev_ops;    /*   472     8 */
        struct mutex               addr_lists_lock;      /*   480    32 */
        /* --- cacheline 8 boundary (512 bytes) --- */
        struct list_head           fdbs;                 /*   512    16 */
        struct list_head           mdbs;                 /*   528    16 */

        /* size: 544, cachelines: 9, members: 30 */
        /* sum members: 539, holes: 1, sum holes: 4 */
        /* sum bitfield members: 5 bits, bit holes: 1, sum bit holes: 3 bits */
        /* last cacheline: 32 bytes */
};

Signed-off-by: Vladimir Oltean <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: dsa: merge all bools of struct dsa_port into a single u8

struct dsa_port has 5 bool members which create quite a number of 7 byte
holes in the structure layout. By merging them all into bitfields of an
u8, and placing that u8 in the 1-byte hole after dp->mac and dp->stp_state,
we can reduce the structure size from 576 bytes to 552 bytes on arm64.

Before:

pahole -C dsa_port net/dsa/slave.o
struct dsa_port {
        union {
                struct net_device * master;              /*     0     8 */
                struct net_device * slave;               /*     0     8 */
        };                                               /*     0     8 */
        const struct dsa_device_ops  * tag_ops;          /*     8     8 */
        struct dsa_switch_tree *   dst;                  /*    16     8 */
        struct sk_buff *           (*rcv)(struct sk_buff *, struct net_device *); /*    24     8 */
        enum {
                DSA_PORT_TYPE_UNUSED = 0,
                DSA_PORT_TYPE_CPU    = 1,
                DSA_PORT_TYPE_DSA    = 2,
                DSA_PORT_TYPE_USER   = 3,
        } type;                                          /*    32     4 */

        /* XXX 4 bytes hole, try to pack */

        struct dsa_switch *        ds;                   /*    40     8 */
        unsigned int               index;                /*    48     4 */

        /* XXX 4 bytes hole, try to pack */

        const char  *              name;                 /*    56     8 */
        /* --- cacheline 1 boundary (64 bytes) --- */
        struct dsa_port *          cpu_dp;               /*    64     8 */
        u8                         mac[6];               /*    72     6 */
        u8                         stp_state;            /*    78     1 */

        /* XXX 1 byte hole, try to pack */

        struct device_node *       dn;                   /*    80     8 */
        unsigned int               ageing_time;          /*    88     4 */
        bool                       vlan_filtering;       /*    92     1 */
        bool                       learning;             /*    93     1 */

        /* XXX 2 bytes hole, try to pack */

        struct dsa_bridge *        bridge;               /*    96     8 */
        struct devlink_port        devlink_port;         /*   104   288 */
        /* --- cacheline 6 boundary (384 bytes) was 8 bytes ago --- */
        bool                       devlink_port_setup;   /*   392     1 */

        /* XXX 7 bytes hole, try to pack */

        struct phylink *           pl;                   /*   400     8 */
        struct phylink_config      pl_config;            /*   408    40 */
        /* --- cacheline 7 boundary (448 bytes) --- */
        struct net_device *        lag_dev;              /*   448     8 */
        bool                       lag_tx_enabled;       /*   456     1 */

        /* XXX 7 bytes hole, try to pack */

        struct net_device *        hsr_dev;              /*   464     8 */
        struct list_head           list;                 /*   472    16 */
        const struct ethtool_ops  * orig_ethtool_ops;    /*   488     8 */
        const struct dsa_netdevice_ops  * netdev_ops;    /*   496     8 */
        struct mutex               addr_lists_lock;      /*   504    32 */
        /* --- cacheline 8 boundary (512 bytes) was 24 bytes ago --- */
        struct list_head           fdbs;                 /*   536    16 */
        struct list_head           mdbs;                 /*   552    16 */
        bool                       setup;                /*   568     1 */

        /* size: 576, cachelines: 9, members: 30 */
        /* sum members: 544, holes: 6, sum holes: 25 */
        /* padding: 7 */
};

After:

pahole -C dsa_port net/dsa/slave.o
struct dsa_port {
        union {
                struct net_device * master;              /*     0     8 */
                struct net_device * slave;               /*     0     8 */
        };                                               /*     0     8 */
        const struct dsa_device_ops  * tag_ops;          /*     8     8 */
        struct dsa_switch_tree *   dst;                  /*    16     8 */
        struct sk_buff *           (*rcv)(struct sk_buff *, struct net_device *); /*    24     8 */
        enum {
                DSA_PORT_TYPE_UNUSED = 0,
                DSA_PORT_TYPE_CPU    = 1,
                DSA_PORT_TYPE_DSA    = 2,
                DSA_PORT_TYPE_USER   = 3,
        } type;                                          /*    32     4 */

        /* XXX 4 bytes hole, try to pack */

        struct dsa_switch *        ds;                   /*    40     8 */
        unsigned int               index;                /*    48     4 */

        /* XXX 4 bytes hole, try to pack */

        const char  *              name;                 /*    56     8 */
        /* --- cacheline 1 boundary (64 bytes) --- */
        struct dsa_port *          cpu_dp;               /*    64     8 */
        u8                         mac[6];               /*    72     6 */
        u8                         stp_state;            /*    78     1 */
        u8                         vlan_filtering:1;     /*    79: 0  1 */
        u8                         learning:1;           /*    79: 1  1 */
        u8                         lag_tx_enabled:1;     /*    79: 2  1 */
        u8                         devlink_port_setup:1; /*    79: 3  1 */
        u8                         setup:1;              /*    79: 4  1 */

        /* XXX 3 bits hole, try to pack */

        struct device_node *       dn;                   /*    80     8 */
        unsigned int               ageing_time;          /*    88     4 */

        /* XXX 4 bytes hole, try to pack */

        struct dsa_bridge *        bridge;               /*    96     8 */
        struct devlink_port        devlink_port;         /*   104   288 */
        /* --- cacheline 6 boundary (384 bytes) was 8 bytes ago --- */
        struct phylink *           pl;                   /*   392     8 */
        struct phylink_config      pl_config;            /*   400    40 */
        struct net_device *        lag_dev;              /*   440     8 */
        /* --- cacheline 7 boundary (448 bytes) --- */
        struct net_device *        hsr_dev;              /*   448     8 */
        struct list_head           list;                 /*   456    16 */
        const struct ethtool_ops  * orig_ethtool_ops;    /*   472     8 */
        const struct dsa_netdevice_ops  * netdev_ops;    /*   480     8 */
        struct mutex               addr_lists_lock;      /*   488    32 */
        /* --- cacheline 8 boundary (512 bytes) was 8 bytes ago --- */
        struct list_head           fdbs;                 /*   520    16 */
        struct list_head           mdbs;                 /*   536    16 */

        /* size: 552, cachelines: 9, members: 30 */
        /* sum members: 539, holes: 3, sum holes: 12 */
        /* sum bitfield members: 5 bits, bit holes: 1, sum bit holes: 3 bits */
        /* last cacheline: 40 bytes */
};

Signed-off-by: Vladimir Oltean <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: dsa: move dsa_port :: stp_state near dsa_port :: mac

The MAC address of a port is 6 octets in size, and this creates a 2
octet hole after it. There are some other u8 members of struct dsa_port
that we can put in that hole. One such member is the stp_state.

Before:

pahole -C dsa_port net/dsa/slave.o
struct dsa_port {
        union {
                struct net_device * master;              /*     0     8 */
                struct net_device * slave;               /*     0     8 */
        };                                               /*     0     8 */
        const struct dsa_device_ops  * tag_ops;          /*     8     8 */
        struct dsa_switch_tree *   dst;                  /*    16     8 */
        struct sk_buff *           (*rcv)(struct sk_buff *, struct net_device *); /*    24     8 */
        enum {
                DSA_PORT_TYPE_UNUSED = 0,
                DSA_PORT_TYPE_CPU    = 1,
                DSA_PORT_TYPE_DSA    = 2,
                DSA_PORT_TYPE_USER   = 3,
        } type;                                          /*    32     4 */

        /* XXX 4 bytes hole, try to pack */

        struct dsa_switch *        ds;                   /*    40     8 */
        unsigned int               index;                /*    48     4 */

        /* XXX 4 bytes hole, try to pack */

        const char  *              name;                 /*    56     8 */
        /* --- cacheline 1 boundary (64 bytes) --- */
        struct dsa_port *          cpu_dp;               /*    64     8 */
        u8                         mac[6];               /*    72     6 */

        /* XXX 2 bytes hole, try to pack */

        struct device_node *       dn;                   /*    80     8 */
        unsigned int               ageing_time;          /*    88     4 */
        bool                       vlan_filtering;       /*    92     1 */
        bool                       learning;             /*    93     1 */
        u8                         stp_state;            /*    94     1 */

        /* XXX 1 byte hole, try to pack */

        struct dsa_bridge *        bridge;               /*    96     8 */
        struct devlink_port        devlink_port;         /*   104   288 */
        /* --- cacheline 6 boundary (384 bytes) was 8 bytes ago --- */
        bool                       devlink_port_setup;   /*   392     1 */

        /* XXX 7 bytes hole, try to pack */

        struct phylink *           pl;                   /*   400     8 */
        struct phylink_config      pl_config;            /*   408    40 */
        /* --- cacheline 7 boundary (448 bytes) --- */
        struct net_device *        lag_dev;              /*   448     8 */
        bool                       lag_tx_enabled;       /*   456     1 */

        /* XXX 7 bytes hole, try to pack */

        struct net_device *        hsr_dev;              /*   464     8 */
        struct list_head           list;                 /*   472    16 */
        const struct ethtool_ops  * orig_ethtool_ops;    /*   488     8 */
        const struct dsa_netdevice_ops  * netdev_ops;    /*   496     8 */
        struct mutex               addr_lists_lock;      /*   504    32 */
        /* --- cacheline 8 boundary (512 bytes) was 24 bytes ago --- */
        struct list_head           fdbs;                 /*   536    16 */
        struct list_head           mdbs;                 /*   552    16 */
        bool                       setup;                /*   568     1 */

        /* size: 576, cachelines: 9, members: 30 */
        /* sum members: 544, holes: 6, sum holes: 25 */
        /* padding: 7 */
};

After:

pahole -C dsa_port net/dsa/slave.o
struct dsa_port {
        union {
                struct net_device * master;              /*     0     8 */
                struct net_device * slave;               /*     0     8 */
        };                                               /*     0     8 */
        const struct dsa_device_ops  * tag_ops;          /*     8     8 */
        struct dsa_switch_tree *   dst;                  /*    16     8 */
        struct sk_buff *           (*rcv)(struct sk_buff *, struct net_device *); /*    24     8 */
        enum {
                DSA_PORT_TYPE_UNUSED = 0,
                DSA_PORT_TYPE_CPU    = 1,
                DSA_PORT_TYPE_DSA    = 2,
                DSA_PORT_TYPE_USER   = 3,
        } type;                                          /*    32     4 */

        /* XXX 4 bytes hole, try to pack */

        struct dsa_switch *        ds;                   /*    40     8 */
        unsigned int               index;                /*    48     4 */

        /* XXX 4 bytes hole, try to pack */

        const char  *              name;                 /*    56     8 */
        /* --- cacheline 1 boundary (64 bytes) --- */
        struct dsa_port *          cpu_dp;               /*    64     8 */
        u8                         mac[6];               /*    72     6 */
        u8                         stp_state;            /*    78     1 */

        /* XXX 1 byte hole, try to pack */

        struct device_node *       dn;                   /*    80     8 */
        unsigned int               ageing_time;          /*    88     4 */
        bool                       vlan_filtering;       /*    92     1 */
        bool                       learning;             /*    93     1 */

        /* XXX 2 bytes hole, try to pack */

        struct dsa_bridge *        bridge;               /*    96     8 */
        struct devlink_port        devlink_port;         /*   104   288 */
        /* --- cacheline 6 boundary (384 bytes) was 8 bytes ago --- */
        bool                       devlink_port_setup;   /*   392     1 */

        /* XXX 7 bytes hole, try to pack */

        struct phylink *           pl;                   /*   400     8 */
        struct phylink_config      pl_config;            /*   408    40 */
        /* --- cacheline 7 boundary (448 bytes) --- */
        struct net_device *        lag_dev;              /*   448     8 */
        bool                       lag_tx_enabled;       /*   456     1 */

        /* XXX 7 bytes hole, try to pack */

        struct net_device *        hsr_dev;              /*   464     8 */
        struct list_head           list;                 /*   472    16 */
        const struct ethtool_ops  * orig_ethtool_ops;    /*   488     8 */
        const struct dsa_netdevice_ops  * netdev_ops;    /*   496     8 */
        struct mutex               addr_lists_lock;      /*   504    32 */
        /* --- cacheline 8 boundary (512 bytes) was 24 bytes ago --- */
        struct list_head           fdbs;                 /*   536    16 */
        struct list_head           mdbs;                 /*   552    16 */
        bool                       setup;                /*   568     1 */

        /* size: 576, cachelines: 9, members: 30 */
        /* sum members: 544, holes: 6, sum holes: 25 */
        /* padding: 7 */
};

Signed-off-by: Vladimir Oltean <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge branch 'hns3-stats-refactor'

Jie Wang says:

====================
net: hns3: refactor rss/tqp stats functions

Currently, hns3 PF and VF module have two sets of rss and tqp stats APIs
to provide get and set functions. Most of these APIs are the same. There is
no need to keep these two sets of same functions for double development and
bugfix work.

This series refactor the rss and tqp stats APIs in hns3 PF and VF by
implementing one set of common APIs for PF and VF reuse and deleting the
old APIs.
====================

Signed-off-by: David S. Miller <[email protected]>

net: hns3: create new common cmd code for PF and VF modules

Currently PF and VF use two sets of command code for modules to interact
with firmware. These codes values are same espect the macro names. It is
redundent to keep two sets of command code for same functions between PF
and VF.

So this patch firstly creates a unified command code for PF and VF module.
We keep the macro name same with the PF command code name to avoid too many
meaningless modifications. Secondly the new common command codes are used
to replace the old ones in VF and deletes the old ones.

Signed-off-by: Jie Wang <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: hns3: refactor VF tqp stats APIs with new common tqp stats APIs

This patch firstly uses new tqp struct(hclge_comm_tqp) and removes the
old VF tqp struct(hclgevf_tqp). All the tqp stats members used in VF module
are modified according to the new hclge_comm_tqp.

Secondly VF tqp stats APIs are refactored to use new common tqp stats APIs.
The old tqp stats APIs in VF are deleted.

Signed-off-by: Jie Wang <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: hns3: refactor PF tqp stats APIs with new common tqp stats APIs

This patch firstly uses new tqp struct(hclge_comm_tqp) and deletes the
old PF tqp struct(hclge_tqp). All the tqp stats members used in PF module
are modified according to the new hclge_comm_tqp.

Secondly PF tqp stats APIs are refactored to use new common tqp stats APIs.
The old tqp stats APIs in PF are deleted.

Signed-off-by: Jie Wang <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: hns3: create new set of common tqp stats APIs for PF and VF reuse

This patch creates new set of common tqp stats structures and APIs for PF
and VF tqp stats module. Subfunctions such as get tqp stats, update tqp
stats and reset tqp stats are inclued in this patch.

These new common tqp stats APIs will be used to replace the old PF and VF
tqp stats APIs in next patches.

Signed-off-by: Jie Wang <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: hns3: refactor VF rss init APIs with new common rss init APIs

This patch uses common rss init APIs to replace the old APIs in VF rss
module and removes the old VF rss init APIs. Several related Subfunctions
and macros are also modified in this patch.

Signed-off-by: Jie Wang <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: hns3: refactor PF rss init APIs with new common rss init APIs

This patch uses common rss init APIs to replace the old APIs in PF rss
module and deletes the old PF rss init APIs. Some related subfunctions and
macros are also modified in this patch.

Signed-off-by: Jie Wang <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: hns3: create new set of common rss init APIs for PF and VF reuse

This patch creates new set of common rss init APIs for PF and VF rss
module. Subfunctions called by rss init process are also created include
rss tuple configuration and rss indirect table configuration.

These new common rss init APIs will be used to replace the old PF and VF
rss init APIs in next patches.

Signed-off-by: Jie Wang <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: hns3: refactor VF rss set APIs with new common rss set APIs

This patch uses new common rss set APIs to replace the old APIs in VF rss
module and removes those old rss set APIs. The related macros in VF are
also modified.

Signed-off-by: Jie Wang <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: hns3: refactor PF rss set APIs with new common rss set APIs

This patch uses new common rss set APIs to replace the old APIs in PF rss
module and deletes the old rss set APIs. The related macros are also
modified.

Signed-off-by: Jie Wang <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: hns3: create new set of common rss set APIs for PF and VF module

Currently, hns3 PF and VF rss module have two sets of rss set APIs to
configure rss. There is no need to keep two sets of these same APIs.

So this patch creates new set of common rss set APIs for PF and VF reuse.
These new APIs will be used to unify old APIs in next patches.

Signed-off-by: Jie Wang <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: hns3: refactor VF rss get APIs with new common rss get APIs

This patch firstly uses new rss parameter struct(hclge_comm_rss_cfg) as
child member of hclgevf_dev and deletes the original child rss parameter
member(hclgevf_rss_cfg). All the rss parameter members used in VF rss
module is modified according to the new hclge_comm_rss_cfg.

Secondly VF rss get APIs are refactored to use new common rss get APIs. The
old rss get APIs in VF are deleted.

Signed-off-by: Jie Wang <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: hns3: refactor PF rss get APIs with new common rss get APIs

This patch firstly uses new rss parameter struct(hclge_comm_rss_cfg) as
child member of hclge_dev and deletes the original child rss parameter
members in vport. All the vport child rss parameter members used in PF rss
module is modified according to the new hclge_comm_rss_cfg.

Secondly PF rss get APIs are refactored to use new common rss get APIs. The
old rss get APIs in PF are deleted.

Signed-off-by: Jie Wang <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: hns3: create new set of common rss get APIs for PF and VF rss module

The PF and VF rss get APIs are almost the same espect the suffixes of API
names. These same impementions bring double development and bugfix work.

So this patch creates new common rss get APIs for PF and VF rss module.
Subfunctions called by rss query process are also created(e.g. rss tuple
conversion APIs).

These new common rss get APIs will be used to replace PF and VF old rss
APIs in next patches.

Signed-off-by: Jie Wang <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: hns3: refactor hclge_comm_send function in PF/VF drivers

Currently, there are two different sets of special command codes in PF and
VF cmdq modules, this is because VF driver only uses small part of all the
command codes. In other words, these not used command codes in VF are also
sepcial command codes theoretically.

So this patch unifes the special command codes and deletes the bool param
is_pf of hclge_comm_send. All the related functions are refactored
according to the new hclge_comm_send function prototype.

Signed-off-by: Jie Wang <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: hns3: create new rss common structure hclge_comm_rss_cfg

Currently PF stores its rss parameters in vport structure. VF stores rss
configurations in hclgevf_rss_cfg structure. Actually hns3 rss parameters
are same beween PF and VF. The two set of rss parameters are redundent and
may add extra bugfix work.

So this patch creates new common rss parameter struct(hclge_comm_rss_cfg)
to unify PF and VF rss configurations.

These new structures will be used to unify rss configurations in PF and VF
rss APIs in next patches.

Signed-off-by: Jie Wang <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

bpf/selftests: Fix namespace mount setup in tc_redirect

The tc_redirect umounts /sys in the new namespace, which can be
mounted as shared and cause global umount. The lazy umount also
takes down mounted trees under /sys like debugfs, which won't be
available after sysfs mounts again and could cause fails in other
tests.

  # cat /proc/self/mountinfo | grep debugfs
  34 23 0:7 / /sys/kernel/debug rw,nosuid,nodev,noexec,relatime shared:14 - debugfs debugfs rw
  # cat /proc/self/mountinfo | grep sysfs
  23 86 0:22 / /sys rw,nosuid,nodev,noexec,relatime shared:2 - sysfs sysfs rw
  # mount | grep debugfs
  debugfs on /sys/kernel/debug type debugfs (rw,nosuid,nodev,noexec,relatime)

  # ./test_progs -t tc_redirect
  #164 tc_redirect:OK
  Summary: 1/4 PASSED, 0 SKIPPED, 0 FAILED

  # mount | grep debugfs
  # cat /proc/self/mountinfo | grep debugfs
  # cat /proc/self/mountinfo | grep sysfs
  25 86 0:22 / /sys rw,relatime shared:2 - sysfs sysfs rw

Making the sysfs private under the new namespace so the umount won't
trigger the global sysfs umount.

Reported-by: Hangbin Liu <[email protected]>
Signed-off-by: Jiri Olsa <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Cc: Jussi Maki <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]

bpftool: Probe for instruction set extensions

This patch introduces new probes to check whether the kernel supports
instruction set extensions v2 and v3. The first introduced eBPF
instructions BPF_J{LT,LE,SLT,SLE} in commit 92b31a9af73b ("bpf: add
BPF_J{LT,LE,SLT,SLE} instructions"). The second introduces 32-bit
variants of all jump instructions in commit 092ed0968bb6 ("bpf:
verifier support JMP32").

These probes are useful for userspace BPF projects that want to use newer
instruction set extensions on newer kernels, to reduce the programs'
sizes or their complexity. LLVM already provides an mcpu=probe option to
automatically probe the kernel and select the newest-supported
instruction set extension. That is however not flexible enough for all
use cases. For example, in Cilium, we only want to use the v3
instruction set extension on v5.10+, even though it is supported on all
kernels v5.1+.

Signed-off-by: Paul Chaignon <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Reviewed-by: Quentin Monnet <[email protected]>
Link: https://lore.kernel.org/bpf/3bfedcd9898c1f41ac67ca61f144fec84c6c3a92.1641314075.git.paul@isovalent.com

bpftool: Probe for bounded loop support

This patch introduces a new probe to check whether the verifier supports
bounded loops as introduced in commit 2589726d12a1 ("bpf: introduce
bounded loops"). This patch will allow BPF users such as Cilium to probe
for loop support on startup and only unconditionally unroll loops on
older kernels.

The results are displayed as part of the miscellaneous section, as shown
below.

  $ bpftool feature probe | grep loops
  Bounded loop support is available
  $ bpftool feature probe macro | grep LOOPS
  #define HAVE_BOUNDED_LOOPS
  $ bpftool feature probe -j | jq .misc
  {
    "have_large_insn_limit": true,
    "have_bounded_loops": true
  }

Signed-off-by: Paul Chaignon <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Reviewed-by: Quentin Monnet <[email protected]>
Link: https://lore.kernel.org/bpf/f7807c0b27d79f48e71de7b5a99c680ca4bd0151.1641314075.git.paul@isovalent.com

bpftool: Refactor misc. feature probe

There is currently a single miscellaneous feature probe,
HAVE_LARGE_INSN_LIMIT, to check for the 1M instructions limit in the
verifier. Subsequent patches will add additional miscellaneous probes,
which follow the same pattern at the existing probe. This patch
therefore refactors the probe to avoid code duplication in subsequent
patches.

The BPF program type and the checked error numbers in the
HAVE_LARGE_INSN_LIMIT probe are changed to better generalize to other
probes. The feature probe retains its current behavior despite those
changes.

Signed-off-by: Paul Chaignon <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Reviewed-by: Quentin Monnet <[email protected]>
Link: https://lore.kernel.org/bpf/956c9329a932c75941194f91790d01f31dfbe01b.1641314075.git.paul@isovalent.com

Merge branch 'lan966x-extend-switchdev-and-mdb-support'

Horatiu Vultur says:

====================
net: lan966x: Extend switchdev with mdb support

This patch series extends lan966x with mdb support by implementing
the switchdev callbacks: SWITCHDEV_OBJ_ID_PORT_MDB and
SWITCHDEV_OBJ_ID_HOST_MDB.
It adds support for both ipv4/ipv6 entries and l2 entries.

v2->v3:
- rename PGID_FIRST and PGID_LAST to PGID_GP_START and PGID_GP_END
- don't forget and relearn an entry for the CPU if there are more
references to the cpu.

v1->v2:
- rename lan966x_mac_learn_impl to __lan966x_mac_learn
- rename lan966x_mac_cpu_copy to lan966x_mac_ip_learn
- fix grammar and typos in comments and commit messages
- add reference counter for entries that copy frames to CPU
====================

Reviewed-by: Vladimir Oltean <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: lan966x: Extend switchdev with mdb support

Extend lan966x driver with mdb support by implementing the switchdev
calls: SWITCHDEV_OBJ_ID_PORT_MDB and SWITCHDEV_OBJ_ID_HOST_MDB.
It is allowed to add both ipv4/ipv6 entries and l2 entries. To add
ipv4/ipv6 entries is not required to use the PGID table while for l2
entries it is required. The PGID table is much smaller than MAC table
so only fewer l2 entries can be added.

Signed-off-by: Horatiu Vultur <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: lan966x: Add PGID_GP_START and PGID_GP_END

The first entries in the PGID table are used by the front ports while
the last entries are used for different purposes like flooding mask,
copy to CPU, etc. So add these macros to define which entries can be
used for general purpose.

Signed-off-by: Horatiu Vultur <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: lan966x: Add function lan966x_mac_ip_learn()

Extend mac functionality with the function lan966x_mac_ip_learn. This
function adds an entry in the MAC table for IP multicast addresses.
These entries can copy a frame to the CPU but also can forward on the
front ports.
This functionality is needed for mdb support. In case the CPU and some
of the front ports subscribe to an IP multicast address.

Signed-off-by: Horatiu Vultur <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge branch 'mtk_eth_soc-refactoring-and-clause45'

Daniel Golle says:

====================
net: ethernet: mtk_eth_soc: refactoring and Clause 45

Rework value and type of mdio read and write functions in mtk_eth_soc
and generally clean up and unify both functions.
Then add support to access Clause 45 phy registers, using newly
introduced helper inline functions added by a patch Russell King has
suggested in a reply to an earlier version of this series [1].

All three commits are tested on the Bananapi BPi-R64 board having
MediaTek MT7531BE DSA gigE switch using clause 22 MDIO and
Ubiquiti UniFi 6 LR access point having Aquantia AQR112C PHY using
clause 45 MDIO.

[1]: https://lore.kernel.org/netdev/[email protected]/

v11: also address return value of mtk_mdio_busy_wait
v10: correct order of SoB lines in 2/3, change patch order in series
v9: improved formatting and Cc missing maintainer
v8: add patch from Russel King, switch to bitfield helper macros
v7: remove unneeded variables and order OR-ed call parameters
v6: further clean up functions and more cleanly separate patches
v5: fix wrong variable name in first patch covered by follow-up patch
v4: clean-up return values and types, split into two commits
v3: return -1 instead of 0xffff on error in _mtk_mdio_write
v2: use MII_DEVADDR_C45_SHIFT and MII_REGADDR_C45_MASK to extract
    device id and register address. Unify read and write functions to
    have identical types and parameter names where possible as we are
    anyway already replacing both function bodies.
====================

Signed-off-by: David S. Miller <[email protected]>

net: ethernet: mtk_eth_soc: implement Clause 45 MDIO access

Implement read and write access to IEEE 802.3 Clause 45 Ethernet
phy registers while making use of new mdiobus_c45_regad and
mdiobus_c45_devad helpers.

Tested on the Ubiquiti UniFi 6 LR access point featuring
MediaTek MT7622BV WiSoC with Aquantia AQR112C.

Signed-off-by: Daniel Golle <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: mdio: add helpers to extract clause 45 regad and devad fields

Add a couple of helpers and definitions to extract the clause 45 regad
and devad fields from the regnum passed into MDIO drivers.

Tested-by: Daniel Golle <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Signed-off-by: Russell King (Oracle) <[email protected]>
Signed-off-by: Daniel Golle <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: ethernet: mtk_eth_soc: fix return values and refactor MDIO ops

Instead of returning -1 (-EPERM) when MDIO bus is stuck busy
while writing or 0xffff if it happens while reading, return the
appropriate -ETIMEDOUT. Also fix return type to int instead of u32.
Refactor functions to use bitfield helpers instead of having various
masking and shifting constants in the code, which also results in the
register definitions in the header file being more obviously related
to what is stated in the MediaTek's Reference Manual.

Fixes: 656e705243fd0 ("net-next: mediatek: add support for MT7623 ethernet")
Signed-off-by: Daniel Golle <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue

Tony Nguyen says:

====================
Intel Wired LAN Driver Updates 2022-01-04

This series contains updates to i40e and iavf drivers.

Mateusz adjusts displaying of failed VF MAC message when the failure is
expected as well as modifying an NVM info message to not confuse the user
for i40e.

Di Zhu fixes a use-after-free issue MAC filters for i40e.

Jedrzej fixes an issue with misreporting of Rx and Tx queues during
reinitialization for i40e.

Karen correct checking of channel queue configuration to occur against
active queues for iavf.
====================

Signed-off-by: David S. Miller <[email protected]>

can: netlink: report the CAN controller mode supported flags

Currently, the CAN netlink interface provides no easy ways to check
the capabilities of a given controller. The only method from the
command line is to try each CAN_CTRLMODE_* individually to check
whether the netlink interface returns an -EOPNOTSUPP error or not
(alternatively, one may find it easier to directly check the source
code of the driver instead...)

This patch introduces a method for the user to check both the
supported and the static capabilities. The proposed method introduces
a new IFLA nest: IFLA_CAN_CTRLMODE_EXT which extends the current
IFLA_CAN_CTRLMODE. This is done to guaranty a full forward and
backward compatibility between the kernel and the user land
applications.

The IFLA_CAN_CTRLMODE_EXT nest contains one single entry:
IFLA_CAN_CTRLMODE_SUPPORTED. Because this entry is only used in one
direction: kernel to userland, no new struct nla_policy are
introduced.

Below table explains how IFLA_CAN_CTRLMODE_SUPPORTED (hereafter:
"supported") and can_ctrlmode::flags (hereafter: "flags") allow us to
identify both the supported and the static capabilities, when masked
with any of the CAN_CTRLMODE_* bit flags:

supported & flags & Controller capabilities
CAN_CTRLMODE_* CAN_CTRLMODE_*
-----------------------------------------------------------------------
false false Feature not supported (always disabled)
false true Static feature (always enabled)
true false Feature supported but disabled
true true Feature supported and enabled

Link: https://lore.kernel.org/all/[email protected]
Signed-off-by: Vincent Mailhol <[email protected]>
Signed-off-by: Marc Kleine-Budde <[email protected]>

can: dev: reorder struct can_priv members for better packing

Save eight bytes of holes on x86-64 architectures by reordering the
members of struct can_priv.

Before:

| $ pahole -C can_priv drivers/net/can/dev/dev.o
| struct can_priv {
| struct net_device *        dev;                  /*     0     8 */
| struct can_device_stats    can_stats;            /*     8    24 */
| const struct can_bittiming_const  * bittiming_const; /*    32     8 */
| const struct can_bittiming_const  * data_bittiming_const; /*    40     8 */
| struct can_bittiming       bittiming;            /*    48    32 */
| /* --- cacheline 1 boundary (64 bytes) was 16 bytes ago --- */
| struct can_bittiming       data_bittiming;       /*    80    32 */
| const struct can_tdc_const  * tdc_const;         /*   112     8 */
| struct can_tdc             tdc;                  /*   120    12 */
| /* --- cacheline 2 boundary (128 bytes) was 4 bytes ago --- */
| unsigned int               bitrate_const_cnt;    /*   132     4 */
| const u32  *               bitrate_const;        /*   136     8 */
| const u32  *               data_bitrate_const;   /*   144     8 */
| unsigned int               data_bitrate_const_cnt; /*   152     4 */
| u32                        bitrate_max;          /*   156     4 */
| struct can_clock           clock;                /*   160     4 */
| unsigned int               termination_const_cnt; /*   164     4 */
| const u16  *               termination_const;    /*   168     8 */
| u16                        termination;          /*   176     2 */
|
| /* XXX 6 bytes hole, try to pack */
|
| struct gpio_desc *         termination_gpio;     /*   184     8 */
| /* --- cacheline 3 boundary (192 bytes) --- */
| u16                        termination_gpio_ohms[2]; /*   192     4 */
| enum can_state             state;                /*   196     4 */
| u32                        ctrlmode;             /*   200     4 */
| u32                        ctrlmode_supported;   /*   204     4 */
| int                        restart_ms;           /*   208     4 */
|
| /* XXX 4 bytes hole, try to pack */
|
| struct delayed_work        restart_work;         /*   216    88 */
|
| /* XXX last struct has 4 bytes of padding */
|
| /* --- cacheline 4 boundary (256 bytes) was 48 bytes ago --- */
| int                        (*do_set_bittiming)(struct net_device *); /*   304     8 */
| int                        (*do_set_data_bittiming)(struct net_device *); /*   312     8 */
| /* --- cacheline 5 boundary (320 bytes) --- */
| int                        (*do_set_mode)(struct net_device *, enum can_mode); /*   320     8 */
| int                        (*do_set_termination)(struct net_device *, u16); /*   328     8 */
| int                        (*do_get_state)(const struct net_device  *, enum can_state *); /*   336     8 */
| int                        (*do_get_berr_counter)(const struct net_device  *, struct can_berr_counter *); /*   344     8 */
| unsigned int               echo_skb_max;         /*   352     4 */
|
| /* XXX 4 bytes hole, try to pack */
|
| struct sk_buff * *         echo_skb;             /*   360     8 */
|
| /* size: 368, cachelines: 6, members: 32 */
| /* sum members: 354, holes: 3, sum holes: 14 */
| /* paddings: 1, sum paddings: 4 */
| /* last cacheline: 48 bytes */
| };

After:

| $ pahole -C can_priv drivers/net/can/dev/dev.o
| struct can_priv {
| struct net_device *        dev;                  /*     0     8 */
| struct can_device_stats    can_stats;            /*     8    24 */
| const struct can_bittiming_const  * bittiming_const; /*    32     8 */
| const struct can_bittiming_const  * data_bittiming_const; /*    40     8 */
| struct can_bittiming       bittiming;            /*    48    32 */
| /* --- cacheline 1 boundary (64 bytes) was 16 bytes ago --- */
| struct can_bittiming       data_bittiming;       /*    80    32 */
| const struct can_tdc_const  * tdc_const;         /*   112     8 */
| struct can_tdc             tdc;                  /*   120    12 */
| /* --- cacheline 2 boundary (128 bytes) was 4 bytes ago --- */
| unsigned int               bitrate_const_cnt;    /*   132     4 */
| const u32  *               bitrate_const;        /*   136     8 */
| const u32  *               data_bitrate_const;   /*   144     8 */
| unsigned int               data_bitrate_const_cnt; /*   152     4 */
| u32                        bitrate_max;          /*   156     4 */
| struct can_clock           clock;                /*   160     4 */
| unsigned int               termination_const_cnt; /*   164     4 */
| const u16  *               termination_const;    /*   168     8 */
| u16                        termination;          /*   176     2 */
|
| /* XXX 6 bytes hole, try to pack */
|
| struct gpio_desc *         termination_gpio;     /*   184     8 */
| /* --- cacheline 3 boundary (192 bytes) --- */
| u16                        termination_gpio_ohms[2]; /*   192     4 */
| unsigned int               echo_skb_max;         /*   196     4 */
| struct sk_buff * *         echo_skb;             /*   200     8 */
| enum can_state             state;                /*   208     4 */
| u32                        ctrlmode;             /*   212     4 */
| u32                        ctrlmode_supported;   /*   216     4 */
| int                        restart_ms;           /*   220     4 */
| struct delayed_work        restart_work;         /*   224    88 */
|
| /* XXX last struct has 4 bytes of padding */
|
| /* --- cacheline 4 boundary (256 bytes) was 56 bytes ago --- */
| int                        (*do_set_bittiming)(struct net_device *); /*   312     8 */
| /* --- cacheline 5 boundary (320 bytes) --- */
| int                        (*do_set_data_bittiming)(struct net_device *); /*   320     8 */
| int                        (*do_set_mode)(struct net_device *, enum can_mode); /*   328     8 */
| int                        (*do_set_termination)(struct net_device *, u16); /*   336     8 */
| int                        (*do_get_state)(const struct net_device  *, enum can_state *); /*   344     8 */
| int                        (*do_get_berr_counter)(const struct net_device  *, struct can_berr_counter *); /*   352     8 */
|
| /* size: 360, cachelines: 6, members: 32 */
| /* sum members: 354, holes: 1, sum holes: 6 */
| /* paddings: 1, sum paddings: 4 */
| /* last cacheline: 40 bytes */
| };

Link: https://lore.kernel.org/all/[email protected]
Signed-off-by: Vincent Mailhol <[email protected]>
Signed-off-by: Marc Kleine-Budde <[email protected]>

can: dev: add sanity check in can_set_static_ctrlmode()

Previous patch removed can_priv::ctrlmode_static to replace it with
can_get_static_ctrlmode().

A condition sine qua non for this to work is that the controller
static modes should never be set in can_priv::ctrlmode_supported
(c.f. the comment on can_priv::ctrlmode_supported which states that it
is for "options that can be *modified* by netlink"). Also, this
condition is already correctly fulfilled by all existing drivers
which rely on the ctrlmode_static feature.

Nonetheless, we added an extra safeguard in can_set_static_ctrlmode()
to return an error value and to warn the developer who would be
adventurous enough to set to static a given feature that is already
set to supported.

The drivers which rely on the static controller mode are then updated
to check the return value of can_set_static_ctrlmode().

Link: https://lore.kernel.org/all/[email protected]
Signed-off-by: Vincent Mailhol <[email protected]>
Signed-off-by: Marc Kleine-Budde <[email protected]>

can: dev: replace can_priv::ctrlmode_static by can_get_static_ctrlmode()

The statically enabled features of a CAN controller can be retrieved
using below formula:

| u32 ctrlmode_static = priv->ctrlmode & ~priv->ctrlmode_supported;

As such, there is no need to store this information. This patch remove
the field ctrlmode_static of struct can_priv and provides, in
replacement, the inline function can_get_static_ctrlmode() which
returns the same value.

Link: https://lore.kernel.org/all/[email protected]
Signed-off-by: Vincent Mailhol <[email protected]>
Signed-off-by: Marc Kleine-Budde <[email protected]>

can: do not increase tx_bytes statistics for RTR frames

The actual payload length of the CAN Remote Transmission Request (RTR)
frames is always 0, i.e. no payload is transmitted on the wire.
However, those RTR frames still use the DLC to indicate the length of
the requested frame.

As such, net_device_stats::tx_bytes should not be increased when
sending RTR frames.

The function can_get_echo_skb() already returns the correct length,
even for RTR frames (c.f. [1]). However, for historical reasons, the
drivers do not use can_get_echo_skb()'s return value and instead, most
of them store a temporary length (or dlc) in some local structure or
array. Using the return value of can_get_echo_skb() solves the
issue. After doing this, such length/dlc fields become unused and so
this patch does the adequate cleaning when needed.

This patch fixes all the CAN drivers.

Finally, can_get_echo_skb() is decorated with the __must_check
attribute in order to force future drivers to correctly use its return
value (else the compiler would emit a warning).

[1] commit ed3320cec279 ("can: dev: __can_get_echo_skb():
fix real payload length return value for RTR frames")

Link: https://lore.kernel.org/all/[email protected]
Cc: Nicolas Ferre <[email protected]>
Cc: Alexandre Belloni <[email protected]>
Cc: Ludovic Desroches <[email protected]>
Cc: Maxime Ripard <[email protected]>
Cc: Chen-Yu Tsai <[email protected]>
Cc: Jernej Skrabec <[email protected]>
Cc: Yasushi SHOJI <[email protected]>
Cc: Oliver Hartkopp <[email protected]>
Cc: Stephane Grosjean <[email protected]>
Cc: Andreas Larsson <[email protected]>
Tested-by: Jimmy Assarsson <[email protected]> # kvaser
Signed-off-by: Vincent Mailhol <[email protected]>
Acked-by: Stefan Mätje <[email protected]> # esd_usb2
Tested-by: Stefan Mätje <[email protected]> # esd_usb2
[mkl: add conversion for grcan]
Signed-off-by: Marc Kleine-Budde <[email protected]>

can: do not increase rx_bytes statistics for RTR frames

The actual payload length of the CAN Remote Transmission Request (RTR)
frames is always 0, i.e. no payload is transmitted on the wire.
However, those RTR frames still use the DLC to indicate the length of
the requested frame.

As such, net_device_stats::rx_bytes should not be increased for the
RTR frames.

This patch fixes all the CAN drivers.

Link: https://lore.kernel.org/all/[email protected]
Cc: Marc Kleine-Budde <[email protected]>
Cc: Nicolas Ferre <[email protected]>
Cc: Alexandre Belloni <[email protected]>
Cc: Ludovic Desroches <[email protected]>
Cc: Chandrasekar Ramakrishnan <[email protected]>
Cc: Maxime Ripard <[email protected]>
Cc: Chen-Yu Tsai <[email protected]>
Cc: Jernej Skrabec <[email protected]>
Cc: Yasushi SHOJI <[email protected]>
Cc: Appana Durga Kedareswara rao <[email protected]>
Cc: Naga Sureshkumar Relli <[email protected]>
Cc: Michal Simek <[email protected]>
Cc: Stephane Grosjean <[email protected]>
Tested-by: Jimmy Assarsson <[email protected]> # kvaser
Signed-off-by: Vincent Mailhol <[email protected]>
Acked-by: Stefan Mätje <[email protected]> # esd_usb2
Tested-by: Stefan Mätje <[email protected]> # esd_usb2
Signed-off-by: Marc Kleine-Budde <[email protected]>

can: do not copy the payload of RTR frames

The actual payload length of the CAN Remote Transmission Request (RTR)
frames is always 0, i.e. no payload is transmitted on the wire.
However, those RTR frames still use the DLC to indicate the length of
the requested frame.

For this reason, it is incorrect to copy the payload of RTR frames
(the payload buffer would only contain garbage data). This patch
encapsulates the payload copy in a check toward the RTR flag.

Link: https://lore.kernel.org/all/[email protected]
Cc: Yasushi SHOJI <[email protected]>
Tested-by: Yasushi SHOJI <[email protected]>
Signed-off-by: Vincent Mailhol <[email protected]>
Signed-off-by: Marc Kleine-Budde <[email protected]>

can: kvaser_usb: do not increase tx statistics when sending error message frames

The CAN error message frames (i.e. error skb) are an interface
specific to socket CAN. The payload of the CAN error message frames
does not correspond to any actual data sent on the wire. Only an error
flag and a delimiter are transmitted when an error occurs (c.f. ISO
11898-1 section 10.4.4.2 "Error flag").

For this reason, it makes no sense to increment the tx_packets and
tx_bytes fields of struct net_device_stats when sending an error
message frame because no actual payload will be transmitted on the
wire.

N.B. Sending error message frames is a very specific feature which, at
the moment, is only supported by the Kvaser Hydra hardware. Please
refer to [1] for more details on the topic.

[1] https://lore.kernel.org/linux-can/CAMZ6RqK0rTNg3u3mBpZOoY51jLZ-et-J01tY6-+mWsM4meVw-A@mail.gmail.com/t/#u

Link: https://lore.kernel.org/all/[email protected]
Co-developed-by: Jimmy Assarsson <[email protected]>
Signed-off-by: Jimmy Assarsson <[email protected]>
Signed-off-by: Vincent Mailhol <[email protected]>
Signed-off-by: Marc Kleine-Budde <[email protected]>

can: do not increase rx statistics when generating a CAN rx error message frame

The CAN error message frames (i.e. error skb) are an interface
specific to socket CAN. The payload of the CAN error message frames
does not correspond to any actual data sent on the wire. Only an error
flag and a delimiter are transmitted when an error occurs (c.f. ISO
11898-1 section 10.4.4.2 "Error flag").

For this reason, it makes no sense to increment the rx_packets and
rx_bytes fields of struct net_device_stats because no actual payload
were transmitted on the wire.

This patch fixes all the CAN drivers.

Link: https://lore.kernel.org/all/[email protected]
CC: Marc Kleine-Budde <[email protected]>
CC: Nicolas Ferre <[email protected]>
CC: Alexandre Belloni <[email protected]>
CC: Ludovic Desroches <[email protected]>
CC: Chandrasekar Ramakrishnan <[email protected]>
CC: Maxime Ripard <[email protected]>
CC: Chen-Yu Tsai <[email protected]>
CC: Jernej Skrabec <[email protected]>
CC: Appana Durga Kedareswara rao <[email protected]>
CC: Naga Sureshkumar Relli <[email protected]>
CC: Michal Simek <[email protected]>
CC: Stephane Grosjean <[email protected]>
Tested-by: Jimmy Assarsson <[email protected]> # kvaser
Signed-off-by: Vincent Mailhol <[email protected]>
Acked-by: Stefan Mätje <[email protected]> # esd_usb2
Tested-by: Stefan Mätje <[email protected]> # esd_usb2
Signed-off-by: Marc Kleine-Budde <[email protected]>

can: etas_es58x: es58x_init_netdev: populate net_device::dev_port

The field dev_port of struct net_device indicates the port number of a
network device [1]. This patch populates this field.

This field can be helpful to distinguish between the two network
interfaces of a dual channel device (i.e. ES581.4 or ES582.1). Indeed,
at the moment, all the network interfaces of a same device share the
same static udev attributes c.f. output of:

| udevadm info --attribute-walk /sys/class/net/canX

The dev_port attribute can then be used to write some udev rules to,
for example, assign a permanent name to each network interface based
on the serial/dev_port pair (which is convenient when you have a test
bench with several CAN devices connected simultaneously and wish to
keep consistent interface names upon reboot).

[1] https://www.kernel.org/doc/Documentation/ABI/testing/sysfs-class-net

Link: https://lore.kernel.org/all/[email protected]
Suggested-by: Lukas Magel <[email protected]>
Signed-off-by: Vincent Mailhol <[email protected]>
Signed-off-by: Marc Kleine-Budde <[email protected]>

can: sja1000: sp_probe(): use platform_get_irq() to get the interrupt

It is preferred that drivers use platform_get_irq() instead of
irq_of_parse_and_map(), so replace.

Link: https://lore.kernel.org/all/[email protected]
Signed-off-by: Lad Prabhakar <[email protected]>
Signed-off-by: Marc Kleine-Budde <[email protected]>

can: ti_hecc: ti_hecc_probe(): use platform_get_irq() to get the interrupt

platform_get_resource(pdev, IORESOURCE_IRQ, ..) relies on static
allocation of IRQ resources in DT core code, this causes an issue when
using hierarchical interrupt domains using "interrupts" property in
the node as this bypasses the hierarchical setup and messes up the irq
chaining.

In preparation for removal of static setup of IRQ resource from DT
core code use platform_get_irq().

Link: https://lore.kernel.org/all/[email protected]
Signed-off-by: Lad Prabhakar <[email protected]>
Signed-off-by: Marc Kleine-Budde <[email protected]>

can: kvaser_usb: make use of units.h in assignment of frequency

Use the MEGA define plus the comment /* Hz */ when assigning
frequencies.

Link: https://lore.kernel.org/all/[email protected]
Signed-off-by: Jimmy Assarsson <[email protected]>
Signed-off-by: Marc Kleine-Budde <[email protected]>

can: mcp251x: mcp251x_gpio_setup(): Get rid of duplicate of_node assignment

GPIO library does copy the of_node from the parent device of the GPIO
chip, there is no need to repeat this in the individual drivers.
Remove assignment here.

For the details one may look into the of_gpio_dev_init()
implementation.

Link: https://lore.kernel.org/all/[email protected]
Signed-off-by: Andy Shevchenko <[email protected]>
Signed-off-by: Marc Kleine-Budde <[email protected]>

can: usb_8dev: remove unused member echo_skb from struct usb_8dev_priv

This patch removes the unused memberecho_skb from the struct
usb_8dev_priv.

Fixes: 0024d8ad1639 ("can: usb_8dev: Add support for USB2CAN interface from 8 devices")
Link: https://lore.kernel.org/all/[email protected]
Signed-off-by: Marc Kleine-Budde <[email protected]>

Revert "net: wwan: iosm: Keep device at D0 for s2idle case"

Depending on BIOS configuration IOSM driver exchanges
protocol required for putting device into D3L2 or D3L1.2.

ipc_pcie_suspend_s2idle() is implemented to put device to D3L1.2.

This patch forces PCI core know this device should stay at D0.
- pci_save_state()is expensive since it does a lot of slow PCI
config reads.

The reported issue is not observed on x86 platform. The supurios
wake on AMD platform needs to be futher debugged with orignal patch
submitter [1]. Also the impact of adding pci_save_state() needs to be
assessed by testing it on other platforms.

This reverts commit f4dd5174e273("net: wwan: iosm: Keep device
at D0 for s2idle case").

[1] https://lore.kernel.org/all/20211224081914 [email protected]/

Signed-off-by: M Chetan Kumar <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

sfc: The RX page_ring is optional

The RX page_ring is an optional feature that improves
performance. When allocation fails the driver can still
function, but possibly with a lower bandwidth.
Guard against dereferencing a NULL page_ring.

Fixes: 2768935a4660 ("sfc: reuse pages to avoid DMA mapping/unmapping costs")
Signed-off-by: Martin Habets <[email protected]>
Reported-by: Jiasheng Jiang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

iavf: Fix limit of total number of queues to active queues of VF

In the absence of this validation, if the user requests to
configure queues more than the enabled queues, it results in
sending the requested number of queues to the kernel stack
(due to the asynchronous nature of VF response), in which
case the stack might pick a queue to transmit that is not
enabled and result in Tx hang. Fix this bug by
limiting the total number of queues allocated for VF to
active queues of VF.

Fixes: d5b33d024496 ("i40evf: add ndo_setup_tc callback to i40evf")
Signed-off-by: Ashwin Vijayavel <[email protected]>
Signed-off-by: Karen Sornek <[email protected]>
Tested-by: Konrad Jankowski <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>

i40e: Fix incorrect netdev's real number of RX/TX queues

There was a wrong queues representation in sysfs during
driver's reinitialization in case of online cpus number is
less than combined queues. It was caused by stopped
NetworkManager, which is responsible for calling vsi_open
function during driver's initialization.
In specific situation (ex. 12 cpus online) there were 16 queues
in /sys/class/net/<iface>/queues. In case of modifying queues with
value higher, than number of online cpus, then it caused write
errors and other errors.
Add updating of sysfs's queues representation during driver
initialization.

Fixes: 41c445ff0f48 ("i40e: main driver core")
Signed-off-by: Lukasz Cieplicki <[email protected]>
Signed-off-by: Jedrzej Jagielski <[email protected]>
Tested-by: Gurucharan G <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>

i40e: Fix for displaying message regarding NVM version

When loading the i40e driver, it prints a message like: 'The driver for the
device detected a newer version of the NVM image v1.x than expected v1.y.
Please install the most recent version of the network driver.' This is
misleading as the driver is working as expected.

Fix that by removing the second part of message and changing it from
dev_info to dev_dbg.

Fixes: 4fb29bddb57f ("i40e: The driver now prints the API version in error message")
Signed-off-by: Mateusz Palczewski <[email protected]>
Tested-by: Gurucharan G <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>

i40e: fix use-after-free in i40e_sync_filters_subtask()

Using ifconfig command to delete the ipv6 address will cause
the i40e network card driver to delete its internal mac_filter and
i40e_service_task kernel thread will concurrently access the mac_filter.
These two processes are not protected by lock
so causing the following use-after-free problems.

print_address_description+0x70/0x360
? vprintk_func+0x5e/0xf0
kasan_report+0x1b2/0x330
i40e_sync_vsi_filters+0x4f0/0x1850 [i40e]
i40e_sync_filters_subtask+0xe3/0x130 [i40e]
i40e_service_task+0x195/0x24c0 [i40e]
process_one_work+0x3f5/0x7d0
worker_thread+0x61/0x6c0
? process_one_work+0x7d0/0x7d0
kthread+0x1c3/0x1f0
? kthread_park+0xc0/0xc0
ret_from_fork+0x35/0x40

Allocated by task 2279810:
kasan_kmalloc+0xa0/0xd0
kmem_cache_alloc_trace+0xf3/0x1e0
i40e_add_filter+0x127/0x2b0 [i40e]
i40e_add_mac_filter+0x156/0x190 [i40e]
i40e_addr_sync+0x2d/0x40 [i40e]
__hw_addr_sync_dev+0x154/0x210
i40e_set_rx_mode+0x6d/0xf0 [i40e]
__dev_set_rx_mode+0xfb/0x1f0
__dev_mc_add+0x6c/0x90
igmp6_group_added+0x214/0x230
__ipv6_dev_mc_inc+0x338/0x4f0
addrconf_join_solict.part.7+0xa2/0xd0
addrconf_dad_work+0x500/0x980
process_one_work+0x3f5/0x7d0
worker_thread+0x61/0x6c0
kthread+0x1c3/0x1f0
ret_from_fork+0x35/0x40

Freed by task 2547073:
__kasan_slab_free+0x130/0x180
kfree+0x90/0x1b0
__i40e_del_filter+0xa3/0xf0 [i40e]
i40e_del_mac_filter+0xf3/0x130 [i40e]
i40e_addr_unsync+0x85/0xa0 [i40e]
__hw_addr_sync_dev+0x9d/0x210
i40e_set_rx_mode+0x6d/0xf0 [i40e]
__dev_set_rx_mode+0xfb/0x1f0
__dev_mc_del+0x69/0x80
igmp6_group_dropped+0x279/0x510
__ipv6_dev_mc_dec+0x174/0x220
addrconf_leave_solict.part.8+0xa2/0xd0
__ipv6_ifa_notify+0x4cd/0x570
ipv6_ifa_notify+0x58/0x80
ipv6_del_addr+0x259/0x4a0
inet6_addr_del+0x188/0x260
addrconf_del_ifaddr+0xcc/0x130
inet6_ioctl+0x152/0x190
sock_do_ioctl+0xd8/0x2b0
sock_ioctl+0x2e5/0x4c0
do_vfs_ioctl+0x14e/0xa80
ksys_ioctl+0x7c/0xa0
__x64_sys_ioctl+0x42/0x50
do_syscall_64+0x98/0x2c0
entry_SYSCALL_64_after_hwframe+0x65/0xca

Fixes: 41c445ff0f48 ("i40e: main driver core")
Signed-off-by: Di Zhu <[email protected]>
Signed-off-by: Rui Zhang <[email protected]>
Tested-by: Gurucharan G <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>

i40e: Fix to not show opcode msg on unsuccessful VF MAC change

Hide i40e opcode information sent during response to VF in case when
untrusted VF tried to change MAC on the VF interface.

This is implemented by adding an additional parameter 'hide' to the
response sent to VF function that hides the display of error
information, but forwards the error code to VF.

Previously it was not possible to send response with some error code
to VF without displaying opcode information.

Fixes: 5c3c48ac6bf5 ("i40e: implement virtual device interface")
Signed-off-by: Grzegorz Szczurek <[email protected]>
Signed-off-by: Mateusz Palczewski <[email protected]>
Reviewed-by: Paul M Stillwell Jr <[email protected]>
Reviewed-by: Aleksandr Loktionov <[email protected]>
Tested-by: Tony Brelinski <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>

ieee802154: atusb: fix uninit value in atusb_set_extended_addr

Alexander reported a use of uninitialized value in
atusb_set_extended_addr(), that is caused by reading 0 bytes via
usb_control_msg().

Fix it by validating if the number of bytes transferred is actually
correct, since usb_control_msg() may read less bytes, than was requested
by caller.

Fail log:

BUG: KASAN: uninit-cmp in ieee802154_is_valid_extended_unicast_addr include/linux/ieee802154.h:310 [inline]
BUG: KASAN: uninit-cmp in atusb_set_extended_addr drivers/net/ieee802154/atusb.c:1000 [inline]
BUG: KASAN: uninit-cmp in atusb_probe.cold+0x29f/0x14db drivers/net/ieee802154/atusb.c:1056
Uninit value used in comparison: 311daa649a2003bd stack handle: 000000009a2003bd
ieee802154_is_valid_extended_unicast_addr include/linux/ieee802154.h:310 [inline]
atusb_set_extended_addr drivers/net/ieee802154/atusb.c:1000 [inline]
atusb_probe.cold+0x29f/0x14db drivers/net/ieee802154/atusb.c:1056
usb_probe_interface+0x314/0x7f0 drivers/usb/core/driver.c:396

Fixes: 7490b008d123 ("ieee802154: add support for atusb transceiver")
Reported-by: Alexander Potapenko <[email protected]>
Acked-by: Alexander Aring <[email protected]>
Signed-off-by: Pavel Skripkin <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Stefan Schmidt <[email protected]>

Merge tag 'mac80211-next-for-net-next-2022-01-04' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next

Johannes Berg says:

====================
Just a few more changes:
- mac80211: allow non-standard VHT MCSes 10/11
- mac80211: add sleepable station iterator for drivers
- nl80211: clarify a comment
- mac80211: small cleanup to use typed element helpers

* tag 'mac80211-next-for-net-next-2022-01-04' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next:
  mac80211: use ieee80211_bss_get_elem()
  nl80211: clarify comment for mesh PLINK_BLOCKED state
  mac80211: Add stations iterator where the iterator function may sleep
  mac80211: allow non-standard VHT MCS-10/11
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

Merge tag 'mac80211-for-net-2022-01-04' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211

Johannes Berg says:

====================
Two more changes:
- mac80211: initialize a variable to avoid using it uninitialized
- mac80211 mesh: put some data structures into the container to
   fix bugs with and not have to deal with allocation failures

* tag 'mac80211-for-net-2022-01-04' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211:
  mac80211: mesh: embedd mesh_paths and mpp_paths into ieee80211_if_mesh
  mac80211: initialize variable have_higher_than_11mbit
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

mac80211: use ieee80211_bss_get_elem()

Instead of ieee80211_bss_get_ie(), use the more typed
ieee80211_bss_get_elem().

Link: https://lore.kernel.org/r/20211220113609.56f8e2a70152.Id5a56afb8a4f9b38d10445e5a1874e93e84b5251@changeid
Signed-off-by: Johannes Berg <[email protected]>

nl80211: clarify comment for mesh PLINK_BLOCKED state

When a mesh link is in blocked state, it is very useful to still allow
auth requests from the peer to re-establish it.
When a remote node is power cycled, the peer state can easily end up
in blocked state if multiple auth attempts are performed. Since this
can lead to several minutes of downtime, we should accept auth attempts
of the peer after it has come back.

Signed-off-by: Felix Fietkau <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Johannes Berg <[email protected]>

mac80211: Add stations iterator where the iterator function may sleep

ieee80211_iterate_active_interfaces() and
ieee80211_iterate_active_interfaces_atomic() already exist, where the
former allows the iterator function to sleep. Add
ieee80211_iterate_stations() which is similar to
ieee80211_iterate_stations_atomic() but allows the iterator to sleep.
This is needed for adding SDIO support to the rtw88 driver. Some
interators there are reading or writing registers. With the SDIO ops
(sdio_readb, sdio_writeb and friends) this means that the iterator
function may sleep.

Signed-off-by: Martin Blumenstingl <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Johannes Berg <[email protected]>

mac80211: allow non-standard VHT MCS-10/11

Some AP can possibly try non-standard VHT rate and mac80211 warns and drops
packets, and leads low TCP throughput.

    Rate marked as a VHT rate but data is invalid: MCS: 10, NSS: 2
    WARNING: CPU: 1 PID: 7817 at net/mac80211/rx.c:4856 ieee80211_rx_list+0x223/0x2f0 [mac8021

Since commit c27aa56a72b8 ("cfg80211: add VHT rate entries for MCS-10 and MCS-11")
has added, mac80211 adds this support as well.

After this patch, throughput is good and iw can get the bitrate:
    rx bitrate: 975.1 MBit/s VHT-MCS 10 80MHz short GI VHT-NSS 2
or
    rx bitrate: 1083.3 MBit/s VHT-MCS 11 80MHz short GI VHT-NSS 2

Buglink: https://bugzilla.suse.com/show_bug.cgi?id=1192891
Reported-by: Goldwyn Rodrigues <[email protected]>
Signed-off-by: Ping-Ke Shih <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Johannes Berg <[email protected]>

mac80211: mesh: embedd mesh_paths and mpp_paths into ieee80211_if_mesh

Syzbot hit NULL deref in rhashtable_free_and_destroy(). The problem was
in mesh_paths and mpp_paths being NULL.

mesh_pathtbl_init() could fail in case of memory allocation failure, but
nobody cared, since ieee80211_mesh_init_sdata() returns void. It led to
leaving 2 pointers as NULL. Syzbot has found null deref on exit path,
but it could happen anywhere else, because code assumes these pointers are
valid.

Since all ieee80211_*_setup_sdata functions are void and do not fail,
let's embedd mesh_paths and mpp_paths into parent struct to avoid
adding error handling on higher levels and follow the pattern of others
setup_sdata functions

Fixes: 60854fd94573 ("mac80211: mesh: convert path table to rhashtable")
Reported-and-tested-by: [email protected]
Signed-off-by: Pavel Skripkin <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Johannes Berg <[email protected]>

mac80211: initialize variable have_higher_than_11mbit

Clang static analysis reports this warnings

mlme.c:5332:7: warning: Branch condition evaluates to a
  garbage value
    have_higher_than_11mbit)
    ^~~~~~~~~~~~~~~~~~~~~~~

have_higher_than_11mbit is only set to true some of the time in
ieee80211_get_rates() but is checked all of the time.  So
have_higher_than_11mbit needs to be initialized to false.

Fixes: 5d6a1b069b7f ("mac80211: set basic rates earlier")
Signed-off-by: Tom Rix <[email protected]>
Reviewed-by: Nick Desaulniers <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Johannes Berg <[email protected]>

ethernet/sfc: remove redundant rc variable

Return value from efx_mcdi_rpc() directly instead
of taking this in another redundant variable.

Reported-by: Zeal Robot <[email protected]>
Signed-off-by: Minghao Chi <[email protected]>
Signed-off-by: CGEL ZTE <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge branch 'namespacify-mtu-ipv4'

xu xin says:

====================
ipv4: Namespaceify two sysctls related with mtu

The following patch series enables the min_pmtu and mtu_expires to
be visible and configurable per net namespace. Different namespace
application might have different requirements on the setting of
min_pmtu and mtu_expires.

If these two patches are applied, inside a net namespace we create,
we can see two more sysctls under /proc/sys/net/ipv4/route:
1. min_pmtu
2. mtu_expires

where min_pmtu and mtu_expires are configurable.
====================

Signed-off-by: David S. Miller <[email protected]>

Namespaceify mtu_expires sysctl

This patch enables the sysctl mtu_expires to be configured per net
namespace.

Signed-off-by: xu xin <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Namespaceify min_pmtu sysctl

This patch enables the sysctl min_pmtu to be configured per net
namespace.

Signed-off-by: xu xin <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

sch_qfq: prevent shift-out-of-bounds in qfq_init_qdisc

tx_queue_len can be set to ~0U, we need to be more
careful about overflows.

__fls(0) is undefined, as this report shows:

UBSAN: shift-out-of-bounds in net/sched/sch_qfq.c:1430:24
shift exponent 51770272 is too large for 32-bit type 'int'
CPU: 0 PID: 25574 Comm: syz-executor.0 Not tainted 5.16.0-rc7-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x201/0x2d8 lib/dump_stack.c:106
ubsan_epilogue lib/ubsan.c:151 [inline]
__ubsan_handle_shift_out_of_bounds+0x494/0x530 lib/ubsan.c:330
qfq_init_qdisc+0x43f/0x450 net/sched/sch_qfq.c:1430
qdisc_create+0x895/0x1430 net/sched/sch_api.c:1253
tc_modify_qdisc+0x9d9/0x1e20 net/sched/sch_api.c:1660
rtnetlink_rcv_msg+0x934/0xe60 net/core/rtnetlink.c:5571
netlink_rcv_skb+0x200/0x470 net/netlink/af_netlink.c:2496
netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline]
netlink_unicast+0x814/0x9f0 net/netlink/af_netlink.c:1345
netlink_sendmsg+0xaea/0xe60 net/netlink/af_netlink.c:1921
sock_sendmsg_nosec net/socket.c:704 [inline]
sock_sendmsg net/socket.c:724 [inline]
____sys_sendmsg+0x5b9/0x910 net/socket.c:2409
___sys_sendmsg net/socket.c:2463 [inline]
__sys_sendmsg+0x280/0x370 net/socket.c:2492
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x44/0xae

Fixes: 462dbc9101ac ("pkt_sched: QFQ Plus: fair-queueing service at DRR cost")
Signed-off-by: Eric Dumazet <[email protected]>
Reported-by: syzbot <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

netrom: fix copying in user data in nr_setsockopt

This code used to copy in an unsigned long worth of data before
the sockptr_t conversion, so restore that.

Fixes: a7b75c5a8c41 ("net: pass a sockptr_t into ->setsockopt")
Reported-by: Dan Carpenter <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: fixup build after bpf header changes

Recent bpf-next merge brought in header changes which uncovered
includes missing in net-next which were not present in bpf-next.
Build problems happen only on less-popular arches like hppa,
sparc, alpha etc.

I could repro the build problem with ice but not the mlx5 problem
Abdul was reporting. mlx5 does look like it should include filter.h,
anyway.

Reported-by: Abdul Haleem <[email protected]>
Fixes: e63a02348958 ("Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next")
Link: https://lore.kernel.org/all/[email protected]/
Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: lantiq_xrx200: add ingress SG DMA support

This patch adds support for scatter gather DMA. DMA in PMAC splits
the packet into several buffers when the MTU on the CPU port is
less than the MTU of the switch. The first buffer starts at an
offset of NET_IP_ALIGN. In subsequent buffers, dma ignores the
offset. Thanks to this patch, the user can still connect to the
device in such a situation. For normal configurations, the patch
has no effect on performance.

Signed-off-by: Aleksander Jan Bajkowski <[email protected]>
Signed-off-by: David S. Miller <[email protected]>