linux.git
3 years agosmb3: add rasize mount parameter to improve readahead performance
Steve French [Sun, 25 Apr 2021 02:46:23 +0000 (21:46 -0500)]
smb3: add rasize mount parameter to improve readahead performance

In some cases readahead of more than the read size can help
(to allow parallel i/o of read ahead which can improve performance).

Ceph introduced a mount parameter "rasize" to allow controlling this.
Add mount parameter "rasize" to allow control of amount of readahead
requested of the server. If rasize is not set, it defaults to the
negotiated rsize as before.

Reviewed-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
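
The defaulting rule described in the commit above can be sketched in plain C. This is a minimal user-space illustration, not the cifs code: the struct, its field names, and the helper `readahead_pages()` are hypothetical, and "not set" is assumed to be represented by 0.

```c
#include <stddef.h>

/* Hypothetical, simplified view of the parsed mount options. */
struct mount_opts {
    size_t rsize;   /* negotiated read size in bytes */
    size_t rasize;  /* requested readahead in bytes; 0 means "not set" (assumption) */
};

/* Return the readahead window in pages: rasize if given, otherwise rsize. */
static size_t readahead_pages(const struct mount_opts *opts, size_t page_size)
{
    size_t bytes = opts->rasize ? opts->rasize : opts->rsize;

    return bytes / page_size;
}
```

With a 4 MiB rsize and 4 KiB pages, leaving rasize unset gives a 1024-page window, while an 8 MiB rasize doubles it to 2048 pages.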
3 years agoMerge branch 'bpf: Tracing and lsm programs re-attach'
Alexei Starovoitov [Mon, 26 Apr 2021 04:09:03 +0000 (21:09 -0700)]
Merge branch 'bpf: Tracing and lsm programs re-attach'

Jiri Olsa says:

====================

hi,
while adding a test for pinning the module while there's a
trampoline attached to it, I noticed that we don't allow
link detach and a subsequent re-attach for trampolines.
This series adds that for tracing and lsm programs.

You need patch [1] from the bpf tree for the test module
attach test to pass.

v5 changes:
  - fixed missing hlist_del_init change
  - fixed several ASSERT calls
  - added extra patch for missing ';'
  - added ASSERT macros to lsm test
  - added acks

thanks,
jirka

[1] https://lore.kernel.org/bpf/20210326105900.151466-1-jolsa@kernel.org/
====================

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
3 years agoselftests/bpf: Use ASSERT macros in lsm test
Jiri Olsa [Wed, 14 Apr 2021 19:51:47 +0000 (21:51 +0200)]
selftests/bpf: Use ASSERT macros in lsm test

Replacing CHECK with ASSERT macros.

Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210414195147.1624932-8-jolsa@kernel.org
3 years agoselftests/bpf: Test that module can't be unloaded with attached trampoline
Jiri Olsa [Wed, 14 Apr 2021 19:51:46 +0000 (21:51 +0200)]
selftests/bpf: Test that module can't be unloaded with attached trampoline

Adding test to verify that once we attach module's trampoline,
the module can't be unloaded.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210414195147.1624932-7-jolsa@kernel.org
3 years agoselftests/bpf: Add re-attach test to lsm test
Jiri Olsa [Wed, 14 Apr 2021 19:51:45 +0000 (21:51 +0200)]
selftests/bpf: Add re-attach test to lsm test

Adding the test to re-attach (detach/attach again) lsm programs,
plus check that already linked program can't be attached again.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210414195147.1624932-6-jolsa@kernel.org
3 years agoselftests/bpf: Add re-attach test to fexit_test
Jiri Olsa [Wed, 14 Apr 2021 19:51:44 +0000 (21:51 +0200)]
selftests/bpf: Add re-attach test to fexit_test

Adding the test to re-attach (detach/attach again) tracing
fexit programs, plus check that already linked program can't
be attached again.

Also switching to ASSERT* macros.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210414195147.1624932-5-jolsa@kernel.org
3 years agoselftests/bpf: Add re-attach test to fentry_test
Jiri Olsa [Wed, 14 Apr 2021 19:51:43 +0000 (21:51 +0200)]
selftests/bpf: Add re-attach test to fentry_test

Adding the test to re-attach (detach/attach again) tracing
fentry programs, plus check that already linked program can't
be attached again.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210414195147.1624932-4-jolsa@kernel.org
3 years agobpf: Allow trampoline re-attach for tracing and lsm programs
Jiri Olsa [Wed, 14 Apr 2021 19:51:41 +0000 (21:51 +0200)]
bpf: Allow trampoline re-attach for tracing and lsm programs

Currently we don't allow re-attaching of trampolines. Once
it's detached, it can't be re-attached even when the program
is still loaded.

Adding the possibility to re-attach the loaded tracing and
lsm programs.

Fixing missing unlock with proper cleanup goto jump reported
by Julia.

Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Julia Lawall <julia.lawall@lip6.fr>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: KP Singh <kpsingh@kernel.org>
Link: https://lore.kernel.org/bpf/20210414195147.1624932-2-jolsa@kernel.org
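
The behaviour this series tests (attach, refuse a second attach, detach, attach again) can be modelled with a tiny state machine. This is only a toy sketch under stated assumptions: `struct toy_link`, `toy_attach()` and `toy_detach()` are hypothetical names, and the specific error codes are illustrative rather than taken from the kernel change.

```c
#include <errno.h>
#include <stdbool.h>

/* Toy stand-in for a loaded tracing/lsm program's link; not a kernel type. */
struct toy_link {
    bool attached;
};

/* Attach succeeds only when the program is not currently linked. */
static int toy_attach(struct toy_link *l)
{
    if (l->attached)
        return -EBUSY;   /* already linked: refuse a second attach (assumed code) */
    l->attached = true;
    return 0;
}

/* Detach resets the state, so a later attach of the same program can succeed. */
static int toy_detach(struct toy_link *l)
{
    if (!l->attached)
        return -ENOENT;  /* nothing to detach (assumed code) */
    l->attached = false;
    return 0;
}
```

The key point is that detach leaves the program in a clean, re-attachable state instead of a terminal one.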
3 years agonetfilter: nfnetlink: add struct nfnl_info and pass it to callbacks
Pablo Neira Ayuso [Thu, 22 Apr 2021 22:17:09 +0000 (00:17 +0200)]
netfilter: nfnetlink: add struct nfnl_info and pass it to callbacks

Add a new structure to reduce callback footprint and to facilitate
extensions of the nfnetlink callback interface in the future.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
3 years agonetfilter: nftables: add nft_pernet() helper function
Pablo Neira Ayuso [Thu, 22 Apr 2021 22:17:08 +0000 (00:17 +0200)]
netfilter: nftables: add nft_pernet() helper function

Consolidate call to net_generic(net, nf_tables_net_id) in this
wrapper function.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
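
The idea of consolidating a repeated `net_generic(net, nf_tables_net_id)` lookup into one wrapper can be shown with a user-space model. Everything prefixed `toy_` is hypothetical, including the slot number; only the shape of the wrapper mirrors what the commit describes.

```c
#include <stddef.h>

#define TOY_MAX_IDS 8

/* Toy model of a network namespace with per-subsystem private slots. */
struct toy_net {
    void *gen[TOY_MAX_IDS];
};

/* Toy per-namespace state for the nf_tables subsystem. */
struct toy_nftables_pernet {
    unsigned int base_seq;
};

static unsigned int toy_nf_tables_net_id = 3; /* assumed slot number */

/* Stand-in for net_generic(): look up a subsystem's private data by id. */
static void *toy_net_generic(const struct toy_net *net, unsigned int id)
{
    return net->gen[id];
}

/* The wrapper: the one place that knows which id nf_tables uses. */
static struct toy_nftables_pernet *toy_nft_pernet(const struct toy_net *net)
{
    return toy_net_generic(net, toy_nf_tables_net_id);
}
```

Callers then write `toy_nft_pernet(net)` and get a typed pointer back, rather than repeating the id and the cast at every call site.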
3 years agoMerge branch 'bnxt_en-next'
David S. Miller [Mon, 26 Apr 2021 01:37:39 +0000 (18:37 -0700)]
Merge branch 'bnxt_en-next'

Michael Chan says:

====================
bnxt_en: Updates for net-next.

This series includes these main enhancements:

1. Link related changes
    - add NRZ/PAM4 link signal mode to the link up message if known
    - rely on firmware to bring down the link during ifdown

2. SRIOV related changes
    - allow VF promiscuous mode if the VF is trusted
    - allow ndo operations to configure VF when the PF is ifdown
    - fix the scenario of the VF taking back control of its MAC address
    - add Hyper-V VF device IDs

3. Support the option to transmit without FCS/CRC.

4. Implement .ndo_features_check() to disable offload when the UDP
   encap. packets are not supported.

v2: Patch10: Reverse the check for supported UDP ports to be more
straightforward.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agobnxt_en: Implement .ndo_features_check().
Michael Chan [Sun, 25 Apr 2021 17:45:27 +0000 (13:45 -0400)]
bnxt_en: Implement .ndo_features_check().

For UDP encapsulations, we only support the offloaded Vxlan port and
Geneve port.  All other ports, including FOU and GUE, are not supported, so
we need to turn off TSO and checksum features.

v2: Reverse the check for supported UDP ports to be more straightforward.

Reviewed-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Reviewed-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
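
The decision described in the commit above (keep offloads only for the offloaded VXLAN and Geneve UDP ports, otherwise drop TSO and checksum) can be sketched as follows. This is a simplified model, not the bnxt_en code: the feature bits, `struct toy_nic`, and both helpers are hypothetical, and a port value of 0 is assumed to mean "no port offloaded".

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical feature bits; the kernel uses netdev_features_t flags. */
#define TOY_F_CSUM 0x1u
#define TOY_F_TSO  0x2u
#define TOY_F_SG   0x4u

struct toy_nic {
    uint16_t vxlan_port;   /* UDP port offloaded for VXLAN, 0 if none */
    uint16_t geneve_port;  /* UDP port offloaded for Geneve, 0 if none */
};

/* Positive check: is this destination port one the hardware can offload? */
static bool toy_port_supported(const struct toy_nic *nic, uint16_t dport)
{
    return dport != 0 &&
           (dport == nic->vxlan_port || dport == nic->geneve_port);
}

/* Return the features usable for this packet. */
static unsigned int toy_features_check(const struct toy_nic *nic,
                                       bool udp_encap, uint16_t dport,
                                       unsigned int features)
{
    if (!udp_encap || toy_port_supported(nic, dport))
        return features;
    /* Unsupported UDP encapsulation: turn off TSO and checksum offload. */
    return features & ~(TOY_F_CSUM | TOY_F_TSO);
}
```

Checking for the supported ports first and falling through to the restrictive case matches the "reverse the check" note in v2: the supported set is small and explicit, while the unsupported set is open-ended.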
3 years agobnxt_en: Support IFF_SUPP_NOFCS feature to transmit without ethernet FCS.
Michael Chan [Sun, 25 Apr 2021 17:45:26 +0000 (13:45 -0400)]
bnxt_en: Support IFF_SUPP_NOFCS feature to transmit without ethernet FCS.

If firmware is capable, set the IFF_SUPP_NOFCS flag to support the
sockets option to transmit packets without FCS.  This is mainly used
for testing.

Reviewed-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agobnxt_en: Add PCI IDs for Hyper-V VF devices.
Michael Chan [Sun, 25 Apr 2021 17:45:25 +0000 (13:45 -0400)]
bnxt_en: Add PCI IDs for Hyper-V VF devices.

Support VF device IDs used by the Hyper-V hypervisor.

Reviewed-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Reviewed-by: Andy Gospodarek <gospo@broadcom.com>
Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agobnxt_en: Call bnxt_approve_mac() after the PF gives up control of the VF MAC.
Michael Chan [Sun, 25 Apr 2021 17:45:24 +0000 (13:45 -0400)]
bnxt_en: Call bnxt_approve_mac() after the PF gives up control of the VF MAC.

When the PF is no longer enforcing an assigned MAC address on a VF, the
VF needs to call bnxt_approve_mac() to tell the PF what MAC address it is
now using.  Otherwise it gets out of sync and the PF won't know what
MAC address the VF wants to use.  Ultimately the VF will fail when it
tries to set up the L2 MAC filter for the vnic.

Reviewed-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agobnxt_en: Move bnxt_approve_mac().
Michael Chan [Sun, 25 Apr 2021 17:45:23 +0000 (13:45 -0400)]
bnxt_en: Move bnxt_approve_mac().

Move it before bnxt_update_vf_mac().  In the next patch, we need to call
bnxt_approve_mac() from bnxt_update_vf_mac() under some conditions.  This
will avoid forward declaration.

Reviewed-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agobnxt_en: allow VF config ops when PF is closed
Edwin Peer [Sun, 25 Apr 2021 17:45:22 +0000 (13:45 -0400)]
bnxt_en: allow VF config ops when PF is closed

It is perfectly legal for the stack to query and configure VFs via PF
NDOs while the NIC is administratively down.  Remove the unnecessary
check for the PF to be in open state.

Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agobnxt_en: allow promiscuous mode for trusted VFs
Edwin Peer [Sun, 25 Apr 2021 17:45:21 +0000 (13:45 -0400)]
bnxt_en: allow promiscuous mode for trusted VFs

Firmware previously only allowed promiscuous mode for VFs associated with
a default VLAN. It is now possible to enable promiscuous mode for a VF
having no VLAN configured provided that it is trusted. In such cases the
VF will see all packets received by the PF, irrespective of destination
MAC or VLAN.

Note, it is necessary to query firmware at the time of bnxt_promisc_ok()
instead of in bnxt_hwrm_func_qcfg() because the trusted status might be
altered by the PF after the VF has been configured. This check must now
also be deferred because the firmware call sleeps.

Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agobnxt_en: Add support for fw managed link down feature.
Michael Chan [Sun, 25 Apr 2021 17:45:20 +0000 (13:45 -0400)]
bnxt_en: Add support for fw managed link down feature.

In the current code, the driver will not shut down the link during
IFDOWN if there are still VFs sharing the port.  Newer firmware will
manage the link down decision when the port is shared by VFs, so
we can just call firmware to shut down the port unconditionally and
let firmware make the final decision.

Reviewed-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agobnxt_en: Add a new phy_flags field to the main driver structure.
Michael Chan [Sun, 25 Apr 2021 17:45:19 +0000 (13:45 -0400)]
bnxt_en: Add a new phy_flags field to the main driver structure.

Copy the phy related feature flags from the firmware call
HWRM_PORT_PHY_QCAPS to this new field.  We can also remove the flags
field in the bnxt_test_info structure.  It's cleaner to have all PHY
related flags in one location, directly copied from the firmware.

To keep the BNXT_PHY_CFG_ABLE() macro logic the same, we need to make
a slight adjustment to check that it is a PF.

Reviewed-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agobnxt_en: report signal mode in link up messages
Edwin Peer [Sun, 25 Apr 2021 17:45:18 +0000 (13:45 -0400)]
bnxt_en: report signal mode in link up messages

Firmware reports link signalling mode for certain speeds. In these
cases, print the signalling modes in kernel log link up messages.

Reviewed-by: Andy Gospodarek <gospo@broadcom.com>
Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agomacvlan: Add nodst option to macvlan type source
Jethro Beekman [Sun, 25 Apr 2021 09:22:03 +0000 (11:22 +0200)]
macvlan: Add nodst option to macvlan type source

The default behavior for source MACVLAN is to duplicate packets to
appropriate type source devices, and then do the normal destination MACVLAN
flow. This patch adds an option to skip destination MACVLAN processing if
any matching source MACVLAN device has the option set.

This allows setting up a "catch all" device for source MACVLAN: create one
or more devices with type source nodst, and one device with e.g. type vepa,
and incoming traffic will be received on exactly one device.

v2: netdev wants non-standard line length

Signed-off-by: Jethro Beekman <kernel@jbeekman.nl>
Signed-off-by: David S. Miller <davem@davemloft.net>
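
The receive flow described in the macvlan commit above can be modelled in a few lines. This is a hedged sketch, not the macvlan driver code: `struct toy_src_dev` and `toy_forward_source()` are hypothetical, and source-MAC matching is reduced to a precomputed boolean.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model of a MACVLAN device in "source" mode. */
struct toy_src_dev {
    bool matches;   /* does this device match the frame's source MAC? */
    bool nodst;     /* was the device created with the nodst option? */
    int  rx_count;  /* frames delivered to this device */
};

/*
 * Deliver a frame to every matching source-mode device.  Returns true when
 * the normal destination MACVLAN processing should still run afterwards,
 * i.e. when no matching source device has nodst set.
 */
static bool toy_forward_source(struct toy_src_dev *devs, size_t n)
{
    bool run_dst = true;

    for (size_t i = 0; i < n; i++) {
        if (!devs[i].matches)
            continue;
        devs[i].rx_count++;
        if (devs[i].nodst)
            run_dst = false;
    }
    return run_dst;
}
```

A frame whose source MAC matches a nodst device is consumed there and skips the destination flow; any other frame falls through to the destination (for example vepa) device, which gives the "exactly one device" behaviour in the catch-all setup.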
3 years agoMerge tag 'mlx5-updates-2021-04-21' of git://git.kernel.org/pub/scm/linux/kernel...
David S. Miller [Mon, 26 Apr 2021 01:31:35 +0000 (18:31 -0700)]
Merge tag 'mlx5-updates-2021-04-21' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5-updates-2021-04-21

devlink external port attribute for SF (Sub-Function) port flavour

This adds the support to instantiate Sub-Functions on external hosts.
E.g. when the Eswitch manager is enabled on the ARM SmartNIC SoC CPU, users
are now able to spawn new Sub-Functions on the Host server CPU.

Parav Pandit Says:
==================

This series introduces and uses external attribute for the SF port to
indicate that a SF port belongs to an external controller.

This is needed to generate a unique phys_port_name when PF and SF numbers
overlap between local and external controllers.
For example, with two controllers 0 and 1, both controllers can have an SF
with PF number 0 and SF number 77. Here, phys_port_name has a duplicate
entry because it doesn't include the controller number.

Hence, add controller number optionally when a SF port is for an
external controller. This extension is similar to existing PF and VF
eswitch ports of the external controller.

When a SF is for external controller an example view of external SF
port and config sequence:

On eswitch system:
$ devlink dev eswitch set pci/0033:01:00.0 mode switchdev

$ devlink port show
pci/0033:01:00.0/196607: type eth netdev enP51p1s0f0np0 flavour physical port 0 splittable false
pci/0033:01:00.0/131072: type eth netdev eth0 flavour pcipf controller 1 pfnum 0 external true splittable false
  function:
    hw_addr 00:00:00:00:00:00

$ devlink port add pci/0033:01:00.0 flavour pcisf pfnum 0 sfnum 77 controller 1
pci/0033:01:00.0/163840: type eth netdev eth1 flavour pcisf controller 1 pfnum 0 sfnum 77 splittable false
  function:
    hw_addr 00:00:00:00:00:00 state inactive opstate detached

phys_port_name construction:
$ cat /sys/class/net/eth1/phys_port_name
c1pf0sf77

Patch summary:
The first 3 patches prepare the eswitch to handle vports in a more generic
way, using xarray to look up a vport from its unique vport number.
Patch-1 returns maximum eswitch ports only when eswitch is enabled
Patch-2 prepares eswitch to return eswitch max ports from a struct
Patch-3 uses xarray for vport and representor lookup
Patch-4 considers SF for an additional range of SF vports
Patch-5 relies on SF hw table to check SF support
Patch-6 extends SF devlink port attribute for external flag
Patch-7 stores the per controller SF allocation attributes
Patch-8 uses SF function id for filtering events
Patch-9 uses helper for allocation and free
Patch-10 splits hw table into per controller table and generic one
Patch-11 extends sf table for additional range

==================

====================

Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: ethernet: ixp4xx: Support device tree probing
Linus Walleij [Sun, 25 Apr 2021 00:30:38 +0000 (02:30 +0200)]
net: ethernet: ixp4xx: Support device tree probing

This adds device tree probing to the IXP4xx ethernet
driver.

Add a platform data bool to tell us whether to
register an MDIO bus for the device or not, as well
as the corresponding NPE.

We need to drop the memory region request as part of
this since the OF core will request the memory for the
device.

Cc: Zoltan HERPAI <wigyori@uid0.hu>
Cc: Raylynn Knight <rayknight@me.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: ethernet: ixp4xx: Retire ancient phy retrieveal
Linus Walleij [Sun, 25 Apr 2021 00:30:37 +0000 (02:30 +0200)]
net: ethernet: ixp4xx: Retire ancient phy retrieveal

This driver was using a really dated way of obtaining the
phy by printing a string and using it with phy_connect().
Switch to using more reasonable modern interfaces.

Suggested-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: ethernet: ixp4xx: Add DT bindings
Linus Walleij [Sun, 25 Apr 2021 00:30:36 +0000 (02:30 +0200)]
net: ethernet: ixp4xx: Add DT bindings

This adds device tree bindings for the IXP4xx ethernet
controller with optional MDIO bridge.

Cc: Zoltan HERPAI <wigyori@uid0.hu>
Cc: Raylynn Knight <rayknight@me.com>
Cc: devicetree@vger.kernel.org
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agor8152: remove some bit operations
Hayes Wang [Sat, 24 Apr 2021 06:09:03 +0000 (14:09 +0800)]
r8152: remove some bit operations

Remove DELL_TB_RX_AGG_BUG and LENOVO_MACPASSTHRU flags of rtl8152_flags.
They are only set during initialization and never change afterwards. It is
enough to record them with variables.

Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonetfilter: nf_log_syslog: Unset bridge logger in pernet exit
Phil Sutter [Wed, 21 Apr 2021 10:34:21 +0000 (12:34 +0200)]
netfilter: nf_log_syslog: Unset bridge logger in pernet exit

Without this, a stale pointer remains in pernet loggers after module
unload causing a kernel oops during dereference. Easily reproduced by:

| # modprobe nf_log_syslog
| # rmmod nf_log_syslog
| # cat /proc/net/netfilter/nf_log

Fixes: 77ccee96a6742 ("netfilter: nf_log_bridge: merge with nf_log_syslog")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
3 years agonetfilter: remove all xt_table anchors from struct net
Florian Westphal [Wed, 21 Apr 2021 07:51:10 +0000 (09:51 +0200)]
netfilter: remove all xt_table anchors from struct net

No longer needed, table pointer arg is now passed via netfilter core.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
3 years agonetfilter: ip6_tables: pass table pointer via nf_hook_ops
Florian Westphal [Wed, 21 Apr 2021 07:51:09 +0000 (09:51 +0200)]
netfilter: ip6_tables: pass table pointer via nf_hook_ops

Same patch as the ip_tables one: removal of all accesses to ip6_tables
xt_table pointers.  After this patch the struct net xt_table anchors
can be removed.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
3 years agonetfilter: arp_tables: pass table pointer via nf_hook_ops
Florian Westphal [Wed, 21 Apr 2021 07:51:08 +0000 (09:51 +0200)]
netfilter: arp_tables: pass table pointer via nf_hook_ops

Same change as previous patch.  Only difference:
no need to handle NULL template_ops parameter, the only caller
(arptable_filter) always passes non-NULL argument.

This removes all remaining accesses to net->ipv4.arptable_filter.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
3 years agonetfilter: ip_tables: pass table pointer via nf_hook_ops
Florian Westphal [Wed, 21 Apr 2021 07:51:07 +0000 (09:51 +0200)]
netfilter: ip_tables: pass table pointer via nf_hook_ops

iptable_x modules rely on 'struct net' to contain a pointer to the
table that should be evaluated.

In order to remove these pointers from struct net, pass them via
the 'priv' pointer in a similar fashion as nf_tables passes the
rule data.

To do that, duplicate the nf_hook_info array passed in from the
iptable_x modules, update the ops->priv pointers of the copy to
refer to the table and then change the hookfn implementations to
just pass the 'priv' argument to the traverser.

After this patch, the xt_table pointers can already be removed
from struct net.

However, changes to struct net result in re-compile of the entire
network stack, so do the removal after arptables and ip6tables
have been converted as well.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
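
The "duplicate the ops array, point each copy's priv at the table, and have the hook function use priv" pattern from the commit above can be sketched in user-space C. All `toy_` names are hypothetical; `malloc` plus `memcpy` stands in for the kernel's kmemdup, and the hook function simply returns the table name instead of traversing a ruleset.

```c
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for a per-namespace table. */
struct toy_table {
    const char *name;
};

/* Toy stand-in for a hook registration entry. */
struct toy_hook_ops {
    unsigned int hooknum;
    void *priv;          /* opaque pointer handed back to the hook function */
};

/* Copy the template array and make every copy's priv refer to the table. */
static struct toy_hook_ops *toy_dup_ops(const struct toy_hook_ops *tmpl,
                                        size_t n, struct toy_table *table)
{
    struct toy_hook_ops *ops = malloc(n * sizeof(*ops));

    if (!ops)
        return NULL;
    memcpy(ops, tmpl, n * sizeof(*ops));
    for (size_t i = 0; i < n; i++)
        ops[i].priv = table;
    return ops;
}

/* The hook function no longer looks the table up; it just uses priv. */
static const char *toy_hookfn(void *priv)
{
    const struct toy_table *table = priv;

    return table->name;
}
```

Because the template is copied rather than modified, each namespace gets its own ops array with its own priv, and the shared template stays untouched. The caller owns the copy and frees it on teardown.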
3 years agonetfilter: xt_nat: pass table to hookfn
Florian Westphal [Wed, 21 Apr 2021 07:51:06 +0000 (09:51 +0200)]
netfilter: xt_nat: pass table to hookfn

This changes how ip(6)table nat passes the ruleset/table to the
evaluation loop.

At the moment, it will fetch the table from struct net.

This change stores the table in the hook_ops 'priv' argument
instead.

This requires duplicating the hook_ops for each netns, so
they can store the (per-net) xt_table structure.

The duplicated nat hook_ops get stored in the net_generic data area.
They are freed in the namespace exit path.

This is a pre-requisite to remove the xt_table/ruleset pointers
from struct net.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
3 years agonetfilter: x_tables: remove paranoia tests
Florian Westphal [Wed, 21 Apr 2021 07:51:05 +0000 (09:51 +0200)]
netfilter: x_tables: remove paranoia tests

No need for these.
There is only one caller, the xtables core, when the table is registered
for the first time with a particular network namespace.

After ->table_init() call, the table is linked into the tables[af] list,
so next call to that function will skip the ->table_init().

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
3 years agonetfilter: arptables: unregister the tables by name
Florian Westphal [Wed, 21 Apr 2021 07:51:04 +0000 (09:51 +0200)]
netfilter: arptables: unregister the tables by name

and again, this time for arptables.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
3 years agonetfilter: ip6tables: unregister the tables by name
Florian Westphal [Wed, 21 Apr 2021 07:51:03 +0000 (09:51 +0200)]
netfilter: ip6tables: unregister the tables by name

Same as the previous patch, but for ip6tables.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
3 years agonetfilter: iptables: unregister the tables by name
Florian Westphal [Wed, 21 Apr 2021 07:51:02 +0000 (09:51 +0200)]
netfilter: iptables: unregister the tables by name

xtables stores the xt_table structs in the struct net.  This isn't
needed anymore, the structures could be passed via the netfilter hook
'private' pointer to the hook functions, which would allow us to remove
those pointers from struct net.

As a first step, reduce the number of accesses to the
net->ipv4.iptable_{raw,filter,...} pointers.
This allows the tables to get unregistered by name instead of having to
pass the raw address.

The xt_table structure can be looked up by name+address family instead.

This patch is useless as-is (the backends still have the raw pointer
address), but it lowers the bar to remove those.

It also allows putting the 'was table registered in the first place' check
into ip_tables.c rather than have it in each table sub module.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
3 years agonetfilter: x_tables: add xt_find_table
Florian Westphal [Wed, 21 Apr 2021 07:51:01 +0000 (09:51 +0200)]
netfilter: x_tables: add xt_find_table

This will be used to obtain the xt_table struct given address family and
table name.

Followup patches will reduce the number of direct accesses to the xt_table
structures via net->ipv{4,6}.ip(6)table_{nat,mangle,...} pointers, then
remove them.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
3 years agonetfilter: x_tables: remove ipt_unregister_table
Florian Westphal [Wed, 21 Apr 2021 07:51:00 +0000 (09:51 +0200)]
netfilter: x_tables: remove ipt_unregister_table

It's the same function as ipt_unregister_table_exit.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
3 years agonetfilter: ebtables: remove the 3 ebtables pointers from struct net
Florian Westphal [Wed, 21 Apr 2021 07:50:59 +0000 (09:50 +0200)]
netfilter: ebtables: remove the 3 ebtables pointers from struct net

ebtables stores the table internal data (what gets passed to the
ebt_do_table() interpreter) in struct net.

nftables keeps the internal interpreter format in pernet lists
and passes it via the netfilter core infrastructure (priv pointer).

Do the same for ebtables: the nf_hook_ops are duplicated via kmemdup,
then the ops->priv pointer is set to the table that is being registered.

After that, the netfilter core passes this table info to the hookfn.

This allows removing the pointers from struct net.

Same pattern can be applied to ip/ip6/arptables.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
3 years agonetfilter: disable defrag once its no longer needed
Florian Westphal [Wed, 21 Apr 2021 07:45:40 +0000 (09:45 +0200)]
netfilter: disable defrag once its no longer needed

When I changed defrag hooks to no longer get registered by default I
intentionally made it so that registration can only be un-done by unloading
the nf_defrag_ipv4/6 module.

In hindsight this was too conservative; there is no reason to keep defrag
on while there is no feature dependency anymore.

Moreover, this won't work if the user isn't allowed to remove the nf_defrag module.

This adds the disable() functions for both ipv4 and ipv6 and calls them
from conntrack, TPROXY and the xtables socket module.

ipvs isn't converted here, it will behave as before this patch and
will need module removal.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
3 years agonetfilter: nft_socket: add support for cgroupsv2
Pablo Neira Ayuso [Tue, 20 Apr 2021 23:12:44 +0000 (01:12 +0200)]
netfilter: nft_socket: add support for cgroupsv2

Allow matching on the cgroupsv2 id from ancestor level.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
3 years agonetfilter: nat: move nf_xfrm_me_harder to where it is used
Florian Westphal [Mon, 19 Apr 2021 16:16:49 +0000 (18:16 +0200)]
netfilter: nat: move nf_xfrm_me_harder to where it is used

Remove the export and make it static.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
3 years agohv_netvsc: Make netvsc/VF binding check both MAC and serial number
Dexuan Cui [Sat, 24 Apr 2021 01:12:35 +0000 (18:12 -0700)]
hv_netvsc: Make netvsc/VF binding check both MAC and serial number

Currently the netvsc/VF binding logic only checks the PCI serial number.

The Microsoft Azure Network Adapter (MANA) supports multiple net_device
interfaces (each such interface is called a "vPort", and has its unique
MAC address) which are backed by the same VF PCI device, so the binding
logic should check both the MAC address and the PCI serial number.

The change should not break any other existing VF drivers, because
Hyper-V NIC SR-IOV implementation requires the netvsc network
interface and the VF network interface have the same MAC address.

Co-developed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Co-developed-by: Shachar Raindel <shacharr@microsoft.com>
Signed-off-by: Shachar Raindel <shacharr@microsoft.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
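
The tightened binding rule from the commit above (a VF binds to a netvsc interface only if both the PCI serial number and the MAC address agree) can be expressed as a small predicate. This is an illustrative sketch only: `struct toy_nic_id` and `toy_vf_matches()` are hypothetical and do not reflect the hv_netvsc data structures.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define TOY_ETH_ALEN 6

/* Toy identity of a network interface: MAC address plus PCI serial number. */
struct toy_nic_id {
    uint8_t  mac[TOY_ETH_ALEN];
    uint32_t serial;
};

/* Bind only when both the serial number and the MAC address match. */
static bool toy_vf_matches(const struct toy_nic_id *netvsc,
                           const struct toy_nic_id *vf)
{
    return netvsc->serial == vf->serial &&
           memcmp(netvsc->mac, vf->mac, TOY_ETH_ALEN) == 0;
}
```

With a serial-only check, two vPorts backed by the same VF PCI device would be indistinguishable; adding the MAC comparison separates them while leaving the single-interface case unchanged, since there the MAC addresses already agree.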
3 years agobnxt_en: Fix RX consumer index logic in the error path.
Michael Chan [Fri, 23 Apr 2021 22:13:19 +0000 (18:13 -0400)]
bnxt_en: Fix RX consumer index logic in the error path.

In bnxt_rx_pkt(), the RX buffers are expected to complete in order.
If the RX consumer index indicates an out of order buffer completion,
it means we are hitting a hardware bug and the driver will abort all
remaining RX packets and reset the RX ring.  The RX consumer index
that we pass to bnxt_discard_rx() is not correct.  We should be
passing the current index (tmp_raw_cons) instead of the old index
(raw_cons).  This bug can cause us to be at the wrong index when
trying to abort the next RX packet.  It can crash like this:

 #0 [ffff9bbcdf5c39a8] machine_kexec at ffffffff9b05e007
 #1 [ffff9bbcdf5c3a00] __crash_kexec at ffffffff9b111232
 #2 [ffff9bbcdf5c3ad0] panic at ffffffff9b07d61e
 #3 [ffff9bbcdf5c3b50] oops_end at ffffffff9b030978
 #4 [ffff9bbcdf5c3b78] no_context at ffffffff9b06aaf0
 #5 [ffff9bbcdf5c3bd8] __bad_area_nosemaphore at ffffffff9b06ae2e
 #6 [ffff9bbcdf5c3c28] bad_area_nosemaphore at ffffffff9b06af24
 #7 [ffff9bbcdf5c3c38] __do_page_fault at ffffffff9b06b67e
 #8 [ffff9bbcdf5c3cb0] do_page_fault at ffffffff9b06bb12
 #9 [ffff9bbcdf5c3ce0] page_fault at ffffffff9bc015c5
    [exception RIP: bnxt_rx_pkt+237]
    RIP: ffffffffc0259cdd  RSP: ffff9bbcdf5c3d98  RFLAGS: 00010213
    RAX: 000000005dd8097f  RBX: ffff9ba4cb11b7e0  RCX: ffffa923cf6e9000
    RDX: 0000000000000fff  RSI: 0000000000000627  RDI: 0000000000001000
    RBP: ffff9bbcdf5c3e60   R8: 0000000000420003   R9: 000000000000020d
    R10: ffffa923cf6ec138  R11: ffff9bbcdf5c3e83  R12: ffff9ba4d6f928c0
    R13: ffff9ba4cac28080  R14: ffff9ba4cb11b7f0  R15: ffff9ba4d5a30000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018

Fixes: a1b0e4e684e9 ("bnxt_en: Improve RX consumer index validity check.")
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Reviewed-by: Andy Gospodarek <gospo@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoch_ktls: Remove redundant variable result
Jiapeng Chong [Fri, 23 Apr 2021 09:52:23 +0000 (17:52 +0800)]
ch_ktls: Remove redundant variable result

The variable result is assigned a value from a calculation,
but it is never read, so this redundant variable
can be removed.

Cleans up the following clang-analyzer warning:

drivers/net/ethernet/chelsio/inline_crypto/ch_ktls/chcr_ktls.c:1488:2:
warning: Value stored to 'pos' is never read
[clang-analyzer-deadcode.DeadStores].

drivers/net/ethernet/chelsio/inline_crypto/ch_ktls/chcr_ktls.c:876:3:
warning: Value stored to 'pos' is never read
[clang-analyzer-deadcode.DeadStores].

drivers/net/ethernet/chelsio/inline_crypto/ch_ktls/chcr_ktls.c:36:3:
warning: Value stored to 'start' is never read
[clang-analyzer-deadcode.DeadStores].

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
David S. Miller [Mon, 26 Apr 2021 01:02:32 +0000 (18:02 -0700)]
Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next

Alexei Starovoitov says:

====================
pull-request: bpf-next 2021-04-23

The following pull-request contains BPF updates for your *net-next* tree.

We've added 69 non-merge commits during the last 22 day(s) which contain
a total of 69 files changed, 3141 insertions(+), 866 deletions(-).

The main changes are:

1) Add BPF static linker support for extern resolution of global, from Andrii.

2) Refine retval for bpf_get_task_stack helper, from Dave.

3) Add a bpf_snprintf helper, from Florent.

4) A bunch of miscellaneous improvements from many developers.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agosmb3: limit noisy error
Steve French [Tue, 20 Apr 2021 04:22:37 +0000 (23:22 -0500)]
smb3: limit noisy error

For servers which don't support copy_range (SMB3 CopyChunk), the
logging of:
 CIFS: VFS: \\server\share refcpy ioctl error -95 getting resume key
can fill the client logs and make debugging real problems more
difficult.  Change the -EOPNOTSUPP on copy_range to a "warn once".
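
The "warn once" idea can be sketched in userspace C as below. This is only an illustration, not the cifs code: warn_once() and count_warnings() are hypothetical names, and the flag handling is not thread-safe.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical userspace helper: print a message only the first time
 * it is reached. Returns true if the message was actually printed. */
static bool warn_once(bool *already_warned, const char *msg)
{
	assert(already_warned != NULL && msg != NULL);
	if (*already_warned)
		return false;
	*already_warned = true;
	fprintf(stderr, "warning: %s\n", msg);
	return true;
}

/* Hypothetical caller standing in for a copy_range path that the
 * server keeps rejecting. Returns how many warnings were printed
 * across 'attempts' failed calls. */
static int count_warnings(int attempts)
{
	static bool warned;	/* one flag per call site */
	int printed = 0;

	for (int i = 0; i < attempts; i++)
		if (warn_once(&warned, "server does not support copy_range"))
			printed++;
	return printed;
}
```

However many times the failing path runs, the log receives a single line, which keeps unrelated errors visible.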

Signed-off-by: Steve French <stfrench@microsoft.com>
3 years agocifs: fix leak in cifs_smb3_do_mount() ctx
David Disseldorp [Thu, 22 Apr 2021 22:14:03 +0000 (00:14 +0200)]
cifs: fix leak in cifs_smb3_do_mount() ctx

cifs_smb3_do_mount() calls smb3_fs_context_dup() and then
cifs_setup_volume_info(). The latter's subsequent smb3_parse_devname()
call overwrites the cifs_sb->ctx->UNC string already dup'ed by
smb3_fs_context_dup(), resulting in a leak. E.g.

unreferenced object 0xffff888002980420 (size 32):
  comm "mount", pid 160, jiffies 4294892541 (age 30.416s)
  hex dump (first 32 bytes):
    5c 5c 31 39 32 2e 31 36 38 2e 31 37 34 2e 31 30  \\192.168.174.10
    34 5c 72 61 70 69 64 6f 2d 73 68 61 72 65 00 00  4\rapido-share..
  backtrace:
    [<00000000069e12f6>] kstrdup+0x28/0x50
    [<00000000b61f4032>] smb3_fs_context_dup+0x127/0x1d0 [cifs]
    [<00000000c6e3e3bf>] cifs_smb3_do_mount+0x77/0x660 [cifs]
    [<0000000063467a6b>] smb3_get_tree+0xdf/0x220 [cifs]
    [<00000000716f731e>] vfs_get_tree+0x1b/0x90
    [<00000000491d3892>] path_mount+0x62a/0x910
    [<0000000046b2e774>] do_mount+0x50/0x70
    [<00000000ca7b64dd>] __x64_sys_mount+0x81/0xd0
    [<00000000b5122496>] do_syscall_64+0x33/0x40
    [<000000002dd397af>] entry_SYSCALL_64_after_hwframe+0x44/0xae

This change is a bandaid until the cifs_setup_volume_info() TODO and
error handling issues are resolved.
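
The leak pattern described above (an owned string overwritten without freeing the previous copy) can be sketched in userspace C. struct ctx and ctx_set_unc() are hypothetical stand-ins; this shows one way to avoid such a leak and is not necessarily how the patch does it.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a mount context that owns its UNC string. */
struct ctx {
	char *unc;
};

/* Replace c->unc with a copy of new_unc, releasing the old string.
 * Returns 0 on success, -1 if the copy cannot be allocated (the old
 * string is then left in place). */
static int ctx_set_unc(struct ctx *c, const char *new_unc)
{
	size_t n;
	char *copy;

	assert(c != NULL && new_unc != NULL);
	n = strlen(new_unc) + 1;
	copy = malloc(n);
	if (!copy)
		return -1;
	memcpy(copy, new_unc, n);

	free(c->unc);	/* without this, the earlier copy would leak */
	c->unc = copy;
	return 0;
}
```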

Signed-off-by: David Disseldorp <ddiss@suse.de>
Acked-by: Ronnie Sahlberg <lsahlber@redhat.com>
Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
CC: <stable@vger.kernel.org> # v5.11+
Signed-off-by: Steve French <stfrench@microsoft.com>
3 years agocifs: remove unnecessary copies of tcon->crfid.fid
Muhammad Usama Anjum [Thu, 15 Apr 2021 15:24:09 +0000 (20:24 +0500)]
cifs: remove unnecessary copies of tcon->crfid.fid

pfid is set to point at tcon->crfid.fid, and the two are then copied
into each other multiple times. Remove these memcpy() calls, since the
source and destination are the same memory location.

Addresses-Coverity: ("Overlapped copy")
Fixes: 9e81e8ff74b9 ("cifs: return cached_fid from open_shroot")
Signed-off-by: Muhammad Usama Anjum <musamaanjum@gmail.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
3 years agocifs: Return correct error code from smb2_get_enc_key
Paul Aurich [Tue, 13 Apr 2021 21:25:27 +0000 (14:25 -0700)]
cifs: Return correct error code from smb2_get_enc_key

Avoid a warning if the error percolates back up:

[440700.376476] CIFS VFS: \\otters.example.com crypt_message: Could not get encryption key
[440700.386947] ------------[ cut here ]------------
[440700.386948] err = 1
[440700.386977] WARNING: CPU: 11 PID: 2733 at /build/linux-hwe-5.4-p6lk6L/linux-hwe-5.4-5.4.0/lib/errseq.c:74 errseq_set+0x5c/0x70
...
[440700.397304] CPU: 11 PID: 2733 Comm: tar Tainted: G           OE     5.4.0-70-generic #78~18.04.1-Ubuntu
...
[440700.397334] Call Trace:
[440700.397346]  __filemap_set_wb_err+0x1a/0x70
[440700.397419]  cifs_writepages+0x9c7/0xb30 [cifs]
[440700.397426]  do_writepages+0x4b/0xe0
[440700.397444]  __filemap_fdatawrite_range+0xcb/0x100
[440700.397455]  filemap_write_and_wait+0x42/0xa0
[440700.397486]  cifs_setattr+0x68b/0xf30 [cifs]
[440700.397493]  notify_change+0x358/0x4a0
[440700.397500]  utimes_common+0xe9/0x1c0
[440700.397510]  do_utimes+0xc5/0x150
[440700.397520]  __x64_sys_utimensat+0x88/0xd0

Fixes: 61cfac6f267d ("CIFS: Fix possible use after free in demultiplex thread")
Signed-off-by: Paul Aurich <paul@darkrain42.org>
CC: stable@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
3 years agocifs: fix out-of-bound memory access when calling smb3_notify() at mount point
Eugene Korenevsky [Fri, 16 Apr 2021 07:35:30 +0000 (10:35 +0300)]
cifs: fix out-of-bound memory access when calling smb3_notify() at mount point

If smb3_notify() is called at mount point of CIFS, build_path_from_dentry()
returns the pointer to kmalloc-ed memory with terminating zero (this is
empty FileName to be passed to SMB2 CREATE request). This pointer is assigned
to the `path` variable.
Then `path + 1` (to skip first backslash symbol) is passed to
cifs_convert_path_to_utf16(). This is incorrect for empty path and causes
out-of-bound memory access.

Get rid of this "increase by one". cifs_convert_path_to_utf16() already
contains the check for leading backslash in the path.
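
A minimal userspace sketch of why an unconditional `path + 1` is unsafe for an empty string, and what a checked skip looks like. skip_leading_backslash() is a hypothetical helper, not the cifs function.

```c
#include <assert.h>
#include <stddef.h>

/* Skip one leading backslash only if it is actually there. For the
 * empty string "" this returns the string itself, whereas an
 * unconditional path + 1 would point past the terminating NUL. */
static const char *skip_leading_backslash(const char *path)
{
	assert(path != NULL);
	return (path[0] == '\\') ? path + 1 : path;
}
```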

BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=212693
CC: <stable@vger.kernel.org> # v5.6+
Signed-off-by: Eugene Korenevsky <ekorenevsky@astralinux.ru>
Signed-off-by: Steve French <stfrench@microsoft.com>
3 years agosmb2: fix use-after-free in smb2_ioctl_query_info()
Aurelien Aptel [Fri, 9 Apr 2021 13:47:01 +0000 (15:47 +0200)]
smb2: fix use-after-free in smb2_ioctl_query_info()

* rqst[1,2,3] is allocated in vars
* each rqst->rq_iov is also allocated in vars or using pooled memory

SMB2_open_free, SMB2_ioctl_free, SMB2_query_info_free are iterating on
each rqst after vars has been freed (use-after-free), and they are
freeing the kvec a second time (double-free).
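
The ordering problem can be illustrated with a small userspace sketch: anything reachable through the container has to be released before the container itself. struct req, struct vars, vars_alloc() and vars_free() are hypothetical simplifications, not the cifs structures.

```c
#include <assert.h>
#include <stdlib.h>

#define NREQ 3

struct req  { void *iov; };			/* request owning one buffer */
struct vars { struct req rqst[NREQ]; };		/* container holding the requests */

/* Release every per-request buffer first, then the container.
 * Returns the number of buffers released. */
static int vars_free(struct vars *v)
{
	int released = 0;

	if (!v)
		return 0;
	for (int i = 0; i < NREQ; i++) {
		if (v->rqst[i].iov) {
			free(v->rqst[i].iov);
			v->rqst[i].iov = NULL;	/* guards against a second free */
			released++;
		}
	}
	assert(released <= NREQ);
	free(v);	/* last: nothing touches v->rqst[] after this */
	return released;
}

static struct vars *vars_alloc(void)
{
	struct vars *v = calloc(1, sizeof(*v));

	if (!v)
		return NULL;
	for (int i = 0; i < NREQ; i++) {
		v->rqst[i].iov = malloc(16);
		if (!v->rqst[i].iov) {
			vars_free(v);
			return NULL;
		}
	}
	return v;
}
```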

How to trigger:

* compile with KASAN
* mount a share

$ smbinfo quota /mnt/foo
Segmentation fault
$ dmesg

 ==================================================================
 BUG: KASAN: use-after-free in SMB2_open_free+0x1c/0xa0
 Read of size 8 at addr ffff888007b10c00 by task python3/1200

 CPU: 2 PID: 1200 Comm: python3 Not tainted 5.12.0-rc6+ #107
 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.14.0-0-g155821a-rebuilt.opensuse.org 04/01/2014
 Call Trace:
  dump_stack+0x93/0xc2
  print_address_description.constprop.0+0x18/0x130
  ? SMB2_open_free+0x1c/0xa0
  ? SMB2_open_free+0x1c/0xa0
  kasan_report.cold+0x7f/0x111
  ? smb2_ioctl_query_info+0x240/0x990
  ? SMB2_open_free+0x1c/0xa0
  SMB2_open_free+0x1c/0xa0
  smb2_ioctl_query_info+0x2bf/0x990
  ? smb2_query_reparse_tag+0x600/0x600
  ? cifs_mapchar+0x250/0x250
  ? rcu_read_lock_sched_held+0x3f/0x70
  ? cifs_strndup_to_utf16+0x12c/0x1c0
  ? rwlock_bug.part.0+0x60/0x60
  ? rcu_read_lock_sched_held+0x3f/0x70
  ? cifs_convert_path_to_utf16+0xf8/0x140
  ? smb2_check_message+0x6f0/0x6f0
  cifs_ioctl+0xf18/0x16b0
  ? smb2_query_reparse_tag+0x600/0x600
  ? cifs_readdir+0x1800/0x1800
  ? selinux_bprm_creds_for_exec+0x4d0/0x4d0
  ? do_user_addr_fault+0x30b/0x950
  ? __x64_sys_openat+0xce/0x140
  __x64_sys_ioctl+0xb9/0xf0
  do_syscall_64+0x33/0x40
  entry_SYSCALL_64_after_hwframe+0x44/0xae
 RIP: 0033:0x7fdcf1f4ba87
 Code: b3 66 90 48 8b 05 11 14 2c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d e1 13 2c 00 f7 d8 64 89 01 48
 RSP: 002b:00007ffef1ce7748 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
 RAX: ffffffffffffffda RBX: 00000000c018cf07 RCX: 00007fdcf1f4ba87
 RDX: 0000564c467c5590 RSI: 00000000c018cf07 RDI: 0000000000000003
 RBP: 00007ffef1ce7770 R08: 00007ffef1ce7420 R09: 00007fdcf0e0562b
 R10: 0000000000000100 R11: 0000000000000246 R12: 0000000000004018
 R13: 0000000000000001 R14: 0000000000000003 R15: 0000564c467c5590

 Allocated by task 1200:
  kasan_save_stack+0x1b/0x40
  __kasan_kmalloc+0x7a/0x90
  smb2_ioctl_query_info+0x10e/0x990
  cifs_ioctl+0xf18/0x16b0
  __x64_sys_ioctl+0xb9/0xf0
  do_syscall_64+0x33/0x40
  entry_SYSCALL_64_after_hwframe+0x44/0xae

 Freed by task 1200:
  kasan_save_stack+0x1b/0x40
  kasan_set_track+0x1c/0x30
  kasan_set_free_info+0x20/0x30
  __kasan_slab_free+0xe5/0x110
  slab_free_freelist_hook+0x53/0x130
  kfree+0xcc/0x320
  smb2_ioctl_query_info+0x2ad/0x990
  cifs_ioctl+0xf18/0x16b0
  __x64_sys_ioctl+0xb9/0xf0
  do_syscall_64+0x33/0x40
  entry_SYSCALL_64_after_hwframe+0x44/0xae

 The buggy address belongs to the object at ffff888007b10c00
  which belongs to the cache kmalloc-512 of size 512
 The buggy address is located 0 bytes inside of
  512-byte region [ffff888007b10c00, ffff888007b10e00)
 The buggy address belongs to the page:
 page:0000000044e14b75 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x7b10
 head:0000000044e14b75 order:2 compound_mapcount:0 compound_pincount:0
 flags: 0x100000000010200(slab|head)
 raw: 0100000000010200 ffffea000015f500 0000000400000004 ffff888001042c80
 raw: 0000000000000000 0000000000100010 00000001ffffffff 0000000000000000
 page dumped because: kasan: bad access detected

 Memory state around the buggy address:
  ffff888007b10b00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
  ffff888007b10b80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
 >ffff888007b10c00: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                    ^
  ffff888007b10c80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
  ffff888007b10d00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 ==================================================================

Signed-off-by: Aurelien Aptel <aaptel@suse.com>
CC: <stable@vger.kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
3 years agocifs: export supported mount options via new mount_params /proc file
Aurelien Aptel [Thu, 18 Mar 2021 12:52:59 +0000 (13:52 +0100)]
cifs: export supported mount options via new mount_params /proc file

Can aid in making mount problems easier to diagnose

Signed-off-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
3 years agocifs: log mount errors using cifs_errorf()
Aurelien Aptel [Mon, 1 Mar 2021 18:34:02 +0000 (19:34 +0100)]
cifs: log mount errors using cifs_errorf()

This makes the errors accessible from userspace via dmesg and
the fs_context fd.

Signed-off-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
3 years agocifs: add fs_context param to parsing helpers
Aurelien Aptel [Mon, 1 Mar 2021 18:32:09 +0000 (19:32 +0100)]
cifs: add fs_context param to parsing helpers

Add fs_context param to parsing helpers so they can log into it in the
next patch.

Make some helpers static as they are not used outside of fs_context.c

Signed-off-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
3 years agocifs: make fs_context error logging wrapper
Aurelien Aptel [Mon, 1 Mar 2021 18:25:00 +0000 (19:25 +0100)]
cifs: make fs_context error logging wrapper

This new helper will be used in the fs_context mount option parsing
code. It logs errors to both:
* the fs_context log queue for userspace to read
* kernel printk buffer (dmesg, old behaviour)

Signed-off-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
3 years agocifs: add FALLOC_FL_INSERT_RANGE support
Ronnie Sahlberg [Fri, 26 Mar 2021 20:31:30 +0000 (06:31 +1000)]
cifs: add FALLOC_FL_INSERT_RANGE support

Emulated via server side copy and setsize for
SMB3 and later. In the future we could compound
this (and/or optionally use DUPLICATE_EXTENTS
if supported by the server).
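
As a rough in-memory analogy of "copy plus setsize" (an assumption-laden sketch, not the SMB3 implementation): inserting a range means moving the tail of the data forward by len bytes and growing the size. insert_range() is a hypothetical helper operating on a plain buffer; alignment rules that real filesystems impose are ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* buf holds *size valid bytes and has room for cap bytes in total.
 * Open a zero-filled gap of len bytes at offset off.
 * Returns 0 on success, -1 if off is not inside the data or the
 * result would not fit. */
static int insert_range(char *buf, size_t *size, size_t cap,
			size_t off, size_t len)
{
	assert(buf != NULL && size != NULL);
	if (*size > cap || off >= *size || len > cap - *size)
		return -1;

	memmove(buf + off + len, buf + off, *size - off);	/* "copy" the tail */
	memset(buf + off, 0, len);				/* the new gap */
	*size += len;						/* "setsize" */
	return 0;
}
```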

Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
3 years agocifs: add support for FALLOC_FL_COLLAPSE_RANGE
Ronnie Sahlberg [Fri, 26 Mar 2021 19:52:29 +0000 (05:52 +1000)]
cifs: add support for FALLOC_FL_COLLAPSE_RANGE

Emulated for SMB3 and later via server side copy
and setsize. Eventually this could be compounded.

Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
3 years agocifs: check the timestamp for the cached dirent when deciding on revalidate
Ronnie Sahlberg [Mon, 8 Mar 2021 23:07:35 +0000 (09:07 +1000)]
cifs: check the timestamp for the cached dirent when deciding on revalidate

Improves directory metadata caching

Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
3 years agocifs: pass the dentry instead of the inode down to the revalidation check functions
Ronnie Sahlberg [Mon, 8 Mar 2021 23:07:34 +0000 (09:07 +1000)]
cifs: pass the dentry instead of the inode down to the revalidation check functions

Needed for the final patch in the directory caching series

Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
3 years agocifs: add a timestamp to track when the lease of the cached dir was taken
Ronnie Sahlberg [Mon, 8 Mar 2021 23:07:33 +0000 (09:07 +1000)]
cifs: add a timestamp to track when the lease of the cached dir was taken

and clear the timestamp when we receive a lease break.

Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
3 years agocifs: add a function to get a cached dir based on its dentry
Ronnie Sahlberg [Mon, 8 Mar 2021 23:07:32 +0000 (09:07 +1000)]
cifs: add a function to get a cached dir based on its dentry

Needed for subsequent patches in the directory caching
series.

Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
3 years agocifs: Grab a reference for the dentry of the cached directory during the lifetime...
Ronnie Sahlberg [Mon, 8 Mar 2021 23:07:31 +0000 (09:07 +1000)]
cifs: Grab a reference for the dentry of the cached directory during the lifetime of the cache

We need to hold both a reference for the root/superblock as well as the directory that we
are caching. We need to drop these references before we call kill_anon_sb().

At this point, the root and the cached dentries are always the same but this will change
once we start caching other directories as well.

Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
3 years agocifs: store a pointer to the root dentry in cifs_sb_info once we have completed mount...
Ronnie Sahlberg [Mon, 8 Mar 2021 23:07:30 +0000 (09:07 +1000)]
cifs: store a pointer to the root dentry in cifs_sb_info once we have completed mounting the share

And use this to allow taking out a shared handle only once the mount has completed and the
sb becomes available.
This will become important in follow-up patches where we will start holding a reference to the
directory dentry for the shared handle during the lifetime of the handle.

Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
3 years agocifs: rename the *_shroot* functions to *_cached_dir*
Ronnie Sahlberg [Mon, 8 Mar 2021 23:07:29 +0000 (09:07 +1000)]
cifs: rename the *_shroot* functions to *_cached_dir*

These functions will eventually be used to cache any directory, not just the root
so change the names.

Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
3 years agocifs: pass a path to open_shroot and check if it is the root or not
Ronnie Sahlberg [Mon, 8 Mar 2021 23:07:28 +0000 (09:07 +1000)]
cifs: pass a path to open_shroot and check if it is the root or not

Move the check for the directory path into the open_shroot() function
but still fail for any non-root directories.
This is preparation for later when we will start using the cache also
for other directories than the root.

Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
3 years agocifs: move the check for nohandlecache into open_shroot
Ronnie Sahlberg [Mon, 8 Mar 2021 23:07:27 +0000 (09:07 +1000)]
cifs: move the check for nohandlecache into open_shroot

instead of doing it in the callsites for open_shroot.

Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
3 years agocifs: switch build_path_from_dentry() to using dentry_path_raw()
Al Viro [Sat, 6 Mar 2021 02:53:48 +0000 (21:53 -0500)]
cifs: switch build_path_from_dentry() to using dentry_path_raw()

The cost is that we might need to flip '/' to '\\' in more than
just the prefix.  Needs profiling, but I suspect that we won't
get slowdown on that.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Steve French <stfrench@microsoft.com>
3 years agocifs: allocate buffer in the caller of build_path_from_dentry()
Al Viro [Fri, 5 Mar 2021 22:36:04 +0000 (17:36 -0500)]
cifs: allocate buffer in the caller of build_path_from_dentry()

build_path_from_dentry() open-codes dentry_path_raw().  The reason
we can't use dentry_path_raw() in there (and postprocess the
result as needed) is that the callers of build_path_from_dentry()
expect that the object to be freed on cleanup and the string to
be used are at the same address.  That's painful, since the path
is naturally built end-to-beginning - we start at the leaf and
go through the ancestors, accumulating the pathname.

Life would be easier if we left the buffer allocation to callers.
It wouldn't be exact-sized buffer, but none of the callers keep
the result for long - it's always freed before the caller returns.
So there's no need to do exact-sized allocation; better use
__getname()/__putname(), same as we do for pathname arguments
of syscalls.  What's more, there's no need to do allocation under
spinlocks, so GFP_ATOMIC is not needed.

Next patch will replace the open-coded dentry_path_raw() (in
build_path_from_dentry_optional_prefix()) with calling the real
thing.  This patch only introduces wrappers for allocating/freeing
the buffers and switches to new calling conventions:
build_path_from_dentry(dentry, buf)
expects buf to be address of a page-sized object or NULL,
return value is a pathname built inside that buffer on success,
ERR_PTR(-ENOMEM) if buf is NULL and ERR_PTR(-ENAMETOOLONG) if
the pathname won't fit into page.  Note that we don't need to
check for failure when allocating the buffer in the caller -
build_path_from_dentry() will do the right thing.
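
The calling convention can be illustrated with a userspace sketch. It assumes a hypothetical struct node in place of a dentry and returns NULL where the kernel code returns ERR_PTR() values. The path is built from the leaf towards the root, so the returned string generally does not start at the beginning of the buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a dentry: a name and a parent link.
 * The root has parent == NULL and contributes nothing to the path. */
struct node {
	const char *name;
	const struct node *parent;
};

/* Build "\a\b\c" for leaf inside buf[0..len), filling the buffer from
 * its end towards its start. Returns a pointer to the first character
 * of the path (somewhere inside buf), or NULL if buf is NULL or the
 * path does not fit. */
static char *build_path(const struct node *leaf, char *buf, size_t len)
{
	char *p;

	if (!buf || len == 0)
		return NULL;
	p = buf + len - 1;
	*p = '\0';
	for (const struct node *n = leaf; n && n->parent; n = n->parent) {
		size_t l = strlen(n->name);

		if ((size_t)(p - buf) < l + 1)
			return NULL;	/* name plus separator will not fit */
		p -= l;
		memcpy(p, n->name, l);
		*--p = '\\';
	}
	return p;
}
```

Because the caller owns buf, it frees buf itself rather than the returned pointer, which is the separation the commit message argues for.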

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Steve French <stfrench@microsoft.com>
3 years agocifs: make build_path_from_dentry() return const char *
Al Viro [Thu, 18 Mar 2021 19:47:35 +0000 (15:47 -0400)]
cifs: make build_path_from_dentry() return const char *

... and adjust the callers.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Steve French <stfrench@microsoft.com>
3 years agocifs: constify pathname arguments in a bunch of helpers
Al Viro [Thu, 18 Mar 2021 19:44:05 +0000 (15:44 -0400)]
cifs: constify pathname arguments in a bunch of helpers

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Steve French <stfrench@microsoft.com>
3 years agocifs: constify path argument of ->make_node()
Al Viro [Thu, 18 Mar 2021 05:38:53 +0000 (01:38 -0400)]
cifs: constify path argument of ->make_node()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Steve French <stfrench@microsoft.com>
3 years agocifs: constify get_normalized_path() properly
Al Viro [Thu, 18 Mar 2021 05:03:34 +0000 (01:03 -0400)]
cifs: constify get_normalized_path() properly

As it is, it takes const char * and, in some cases, stores it in
caller's variable that is plain char *.  Fortunately, none of the
callers actually proceeded to modify the string via now-non-const
alias, but that's trouble waiting to happen.

It's easy to do properly, anyway...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Steve French <stfrench@microsoft.com>
3 years agocifs: don't cargo-cult strndup()
Al Viro [Fri, 5 Mar 2021 20:02:34 +0000 (15:02 -0500)]
cifs: don't cargo-cult strndup()

strndup(s, strlen(s)) is a highly unidiomatic way to spell strdup(s);
it's *NOT* safer in any way, since strlen() is just as sensitive to
NUL-termination as strdup() is.

strndup() is for situations when you need a copy of a known-sized
substring, not a magic security juju to drive the bad spirits away.
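
In userspace terms, using the POSIX functions of the same names, the distinction looks like this. copy_whole() and copy_prefix() are hypothetical wrappers added only for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Whole-string copy: strdup() is the idiomatic spelling.
 * strndup(s, strlen(s)) would produce the same result with an extra
 * scan and no added safety, since strlen() needs the NUL just as much. */
static char *copy_whole(const char *s)
{
	assert(s != NULL);
	return strdup(s);
}

/* Copy of a known-sized prefix: the case strndup() exists for. */
static char *copy_prefix(const char *s, size_t n)
{
	assert(s != NULL);
	return strndup(s, n);
}
```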

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Steve French <stfrench@microsoft.com>
3 years agoSMB3: update structures for new compression protocol definitions
Steve French [Sat, 10 Apr 2021 01:16:41 +0000 (20:16 -0500)]
SMB3: update structures for new compression protocol definitions

Protocol has been extended for additional compression headers.
See MS-SMB2 section 2.2.42

Signed-off-by: Steve French <stfrench@microsoft.com>
3 years agocifs: remove old dead code
Aurelien Aptel [Mon, 12 Apr 2021 16:01:43 +0000 (18:01 +0200)]
cifs: remove old dead code

While reviewing a patch clarifying locks and locking hierarchy I
realized some locks were unused.

This commit removes old data and code that isn't actually used
anywhere, or hidden in ifdefs which cannot be enabled from the kernel
config.

* The uid/gid trees and associated locks are left-overs from when
  uid/sid mapping had an extra caching layer on top of the keyring and
  are now unused.
  See commit faa65f07d21e ("cifs: simplify id_to_sid and sid_to_id mapping code")
  from 2012.

* cifs_oplock_break_ops is a left-over from when slow_work was replaced
  by regular workqueue and is now unused.
  See commit 9b646972467f ("cifs: use workqueue instead of slow-work")
  from 2010.

* CIFSSMBSetAttrLegacy is SMB1 cruft dealing with some legacy
  NT4/Win9x behaviour.

* Remove CONFIG_CIFS_DNOTIFY_EXPERIMENTAL left-overs. This was already
  partially removed in 392e1c5dc9cc ("cifs: rename and clarify CIFS_ASYNC_OP and CIFS_NO_RESP")
  from 2019. Kill it completely.

* Another candidate that was considered but spared is
  CONFIG_CIFS_NFSD_EXPORT which has an empty implementation and cannot
  be enabled by a config option (although it is listed but disabled with
  "BROKEN" as a dep). It's unclear whether this could even function
  today in its current form but it has its own .c file and Kconfig
  entry which is a bit more involved to remove and might make a come
  back?

Signed-off-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
3 years agocifs: cifspdu.h: Replace one-element array with flexible-array member
Gustavo A. R. Silva [Fri, 26 Mar 2021 01:11:17 +0000 (20:11 -0500)]
cifs: cifspdu.h: Replace one-element array with flexible-array member

There is a regular need in the kernel to provide a way to declare having
a dynamically sized set of trailing elements in a structure. Kernel code
should always use “flexible array members”[1] for these cases. The older
style of one-element or zero-length arrays should no longer be used[2].

Also, this helps with the ongoing efforts to enable -Warray-bounds by
fixing the following warning:

  CC [M]  fs/cifs/cifssmb.o
fs/cifs/cifssmb.c: In function ‘CIFSFindNext’:
fs/cifs/cifssmb.c:4636:23: warning: array subscript 1 is above array bounds of ‘char[1]’ [-Warray-bounds]
 4636 |   pSMB->ResumeFileName[name_len+1] = 0;
      |   ~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~

[1] https://en.wikipedia.org/wiki/Flexible_array_member
[2] https://www.kernel.org/doc/html/v5.10/process/deprecated.html#zero-length-and-one-element-arrays
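
A minimal sketch of the flexible-array-member idiom in standard C. struct find_req and find_req_new() are hypothetical and much simpler than the structures in cifspdu.h; only the trailing-array declaration and the allocation size are the point.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical request with a variable-length trailing name. */
struct find_req {
	unsigned short name_len;
	char name[];	/* flexible array member: no fake [1] or [0] size */
};

/* Allocate a request with room for name plus its terminating NUL.
 * Returns NULL on allocation failure or if the name is too long for
 * the 16-bit length field. */
static struct find_req *find_req_new(const char *name)
{
	size_t len;
	struct find_req *r;

	assert(name != NULL);
	len = strlen(name);
	if (len > 0xfffe)
		return NULL;
	r = malloc(sizeof(*r) + len + 1);
	if (!r)
		return NULL;
	r->name_len = (unsigned short)len;
	memcpy(r->name, name, len + 1);
	return r;
}
```

With `char name[1]` the compiler sees a one-byte array, so an index such as `name[name_len + 1]` looks out of bounds to it. With `char name[]` the array has no declared bound to compare the index against.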

Link: https://github.com/KSPP/linux/issues/79
Link: https://github.com/KSPP/linux/issues/109
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
3 years agofs: cifs: Remove repeated struct declaration
Wan Jiabing [Fri, 9 Apr 2021 02:46:39 +0000 (10:46 +0800)]
fs: cifs: Remove repeated struct declaration

struct cifs_writedata is declared twice.
One declaration is at line 209, and struct cifs_writedata
is defined below.
The declaration here is not needed. Remove the duplicate.

Signed-off-by: Wan Jiabing <wanjiabing@vivo.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
3 years agoDocumentation/admin-guide/cifs: document open_files and dfscache
Aurelien Aptel [Mon, 22 Mar 2021 17:34:37 +0000 (18:34 +0100)]
Documentation/admin-guide/cifs: document open_files and dfscache

Add missing documentation for open_files and dfscache /proc files.

Signed-off-by: Aurelien Aptel <aaptel@suse.com>
Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Steve French <stfrench@microsoft.com>
3 years agocifs: simplify SWN code with dummy funcs instead of ifdefs
Aurelien Aptel [Fri, 9 Apr 2021 14:31:37 +0000 (16:31 +0200)]
cifs: simplify SWN code with dummy funcs instead of ifdefs

This commit doesn't change the logic of SWN.

Add dummy implementation of SWN functions when SWN is disabled instead
of using ifdef sections.

The dummy functions get optimized out. This leads to clearer code and
compile-time type-checking regardless of config options, with no
runtime penalty.
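
The stub pattern can be sketched as follows. FEATURE_WITNESS, witness_register() and mount_share() are hypothetical names standing in for the Kconfig option and the SWN functions; the real signatures differ.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical feature switch standing in for a Kconfig option. */
#ifdef FEATURE_WITNESS
int witness_register(int id);	/* real implementation lives elsewhere */
#else
/* Stub with the same signature: type-checked at every call site and
 * trivially removable by the compiler. */
static inline int witness_register(int id)
{
	(void)id;
	return 0;
}
#endif

/* The caller needs no #ifdef: it compiles the same way whether the
 * feature is enabled or not. */
static int mount_share(int id, bool use_witness)
{
	assert(id >= 0);
	if (use_witness)
		return witness_register(id);
	return 0;
}
```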

Leave the simple ifdefs section as-is.

A single bitfield (bool foo:1) on its own will use up one int. Move
tcon->use_witness out of ifdefs with the other tcon bitfields.

Signed-off-by: Aurelien Aptel <aaptel@suse.com>
Reviewed-by: Samuel Cabrero <scabrero@suse.de>
Signed-off-by: Steve French <stfrench@microsoft.com>
3 years agosmb3: update protocol header definitions based to include new flags
Steve French [Fri, 9 Apr 2021 20:20:24 +0000 (15:20 -0500)]
smb3: update protocol header definitions based to include new flags

[MS-SMB2] protocol specification was recently updated to include
new flags, new negotiate context and some minor changes to fields.
Update smb2pdu.h structure definitions to match the newest version
of the protocol specification.  Updates to the compression context
values will be in a follow-on patch.

Signed-off-by: Steve French <stfrench@microsoft.com>
3 years agocifs: correct comments explaining internal semaphore usage in the module
Steve French [Fri, 9 Apr 2021 19:49:15 +0000 (14:49 -0500)]
cifs: correct comments explaining internal semaphore usage in the module

A few of the semaphores had been removed, and one additional one
needed to be noted in the comments.

Signed-off-by: Steve French <stfrench@microsoft.com>
3 years agocifs: Remove useless variable
Jiapeng Chong [Thu, 8 Apr 2021 08:31:02 +0000 (16:31 +0800)]
cifs: Remove useless variable

Fix the following gcc warning:

fs/cifs/cifsacl.c:1097:8: warning: variable ‘nmode’ set but not used
[-Wunused-but-set-variable].

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
3 years agocifs: Fix spelling of 'security'
jack1.li_cp [Sat, 10 Apr 2021 03:00:37 +0000 (22:00 -0500)]
cifs: Fix spelling of 'security'

secuirty -> security

Signed-off-by: jack1.li_cp <liliu1@yulong.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
3 years agoLinux 5.12 v5.12
Linus Torvalds [Sun, 25 Apr 2021 20:49:08 +0000 (13:49 -0700)]
Linux 5.12

3 years agocan: proc: fix rcvlist_* header alignment on 64-bit system
Erik Flodin [Sun, 25 Apr 2021 14:14:35 +0000 (16:14 +0200)]
can: proc: fix rcvlist_* header alignment on 64-bit system

Before this fix, the function and userdata columns weren't aligned:
  device   can_id   can_mask  function  userdata   matches  ident
   vcan0  92345678  9fffffff  0000000000000000  0000000000000000         0  raw
   vcan0     123    00000123  0000000000000000  0000000000000000         0  raw

After the fix they are:
  device   can_id   can_mask      function          userdata       matches  ident
   vcan0  92345678  9fffffff  0000000000000000  0000000000000000         0  raw
   vcan0     123    00000123  0000000000000000  0000000000000000         0  raw
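
One way to keep such columns aligned is to derive the header width from the pointer size. The sketch below is an assumption about the general approach, not the patch itself: it right-aligns the titles, whereas the output above centers them. PTR_HEX_WIDTH, format_header() and format_row() are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>

/* Hex digits needed to print a pointer: 8 on 32-bit, 16 on 64-bit. */
#define PTR_HEX_WIDTH ((int)(2 * sizeof(void *)))

/* Format the two pointer-column titles, padded to the pointer width. */
static int format_header(char *buf, size_t len)
{
	assert(buf != NULL);
	return snprintf(buf, len, "%*s  %*s",
			PTR_HEX_WIDTH, "function",
			PTR_HEX_WIDTH, "userdata");
}

/* Format one row with zero-padded hex values of the same width. */
static int format_row(char *buf, size_t len,
		      unsigned long long fn, unsigned long long data)
{
	assert(buf != NULL);
	return snprintf(buf, len, "%0*llx  %0*llx",
			PTR_HEX_WIDTH, fn,
			PTR_HEX_WIDTH, data);
}
```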

Link: https://lore.kernel.org/r/20210425141440.229653-1-erik@flodin.me
Signed-off-by: Erik Flodin <erik@flodin.me>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
3 years agokconfig: refactor .gitignore
Masahiro Yamada [Sat, 24 Apr 2021 13:55:24 +0000 (22:55 +0900)]
kconfig: refactor .gitignore

Add '/' prefix to clarify that the generated files exist right under
scripts/kconfig/, but not in any sub-directory.

Replace '*conf-cfg' with '[gmnq]conf-cfg' to make it explicit, and
still short enough.

Use '[gmnq]conf' to combine gconf, mconf, nconf, and qconf.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
3 years agoMerge tag 'perf-tools-fixes-for-v5.12-2021-04-25' of git://git.kernel.org/pub/scm...
Linus Torvalds [Sun, 25 Apr 2021 16:48:46 +0000 (09:48 -0700)]
Merge tag 'perf-tools-fixes-for-v5.12-2021-04-25' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux

Pull perf tools fixes from Arnaldo Carvalho de Melo:

 - Fix potential NULL pointer dereference in the auxtrace option parser

 - Fix access to PID in an array when setting a PID filter in 'perf ftrace'

 - Fix error return code in the 'perf data' tool and in maps__clone(),
   found using a static analysis tool from Huawei

* tag 'perf-tools-fixes-for-v5.12-2021-04-25' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux:
  perf map: Fix error return code in maps__clone()
  perf ftrace: Fix access to pid in array when setting a pid filter
  perf auxtrace: Fix potential NULL pointer dereference
  perf data: Fix error return code in perf_data__create_dir()

3 years agoMerge tag 'perf_urgent_for_v5.12' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sun, 25 Apr 2021 16:42:06 +0000 (09:42 -0700)]
Merge tag 'perf_urgent_for_v5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 perf fixes from Borislav Petkov:

 - Fix Broadwell Xeon's stepping in the PEBS isolation table of CPUs

 - Fix a panic when initializing perf uncore machinery on Haswell and
   Broadwell servers

* tag 'perf_urgent_for_v5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86/kvm: Fix Broadwell Xeon stepping in isolation_ucodes[]
  perf/x86/intel/uncore: Remove uncore extra PCI dev HSWEP_PCI_PCU_3

3 years agoio_uring: io_sq_thread() no longer needs to reset current->pf_io_worker
Stefan Metzmacher [Sat, 24 Apr 2021 23:26:04 +0000 (01:26 +0200)]
io_uring: io_sq_thread() no longer needs to reset current->pf_io_worker

This is done by create_io_thread() now.

Signed-off-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agokernel: always initialize task->pf_io_worker to NULL
Stefan Metzmacher [Sat, 24 Apr 2021 23:26:03 +0000 (01:26 +0200)]
kernel: always initialize task->pf_io_worker to NULL

Otherwise io_wq_worker_{running,sleeping}() may dereference an
invalid pointer (in future). Currently all users of create_io_thread()
are fine and get task->pf_io_worker = NULL implicitly from the
wq_manager, which got it either from the userspace thread
or the sq_thread, which explicitly reset it to NULL.

I think it's safer to always reset it in order to avoid future
problems.

Fixes: 3bfe6106693b ("io-wq: fork worker threads from original task")
cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: update sq_thread_idle after ctx deleted
Hao Xu [Sat, 24 Apr 2021 09:26:20 +0000 (17:26 +0800)]
io_uring: update sq_thread_idle after ctx deleted

We must update sq_thread_idle whenever a ctx is deleted from ctx_list.

Fixes: 734551df6f9b ("io_uring: fix shared sqpoll cancellation hangs")

Signed-off-by: Hao Xu <haoxu@linux.alibaba.com>
Link: https://lore.kernel.org/r/1619256380-236460-1-git-send-email-haoxu@linux.alibaba.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years ago io_uring: add full-fledged dynamic buffers support
Pavel Begunkov [Sun, 25 Apr 2021 13:32:26 +0000 (14:32 +0100)]
io_uring: add full-fledged dynamic buffers support

Hook buffers into all rsrc infrastructure, including tagging and
updates.

Suggested-by: Bijan Mottahedeh <bijan.mottahedeh@oracle.com>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/119ed51d68a491dae87eb55fb467a47870c86aad.1619356238.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years ago io_uring: implement fixed buffers registration similar to fixed files
Bijan Mottahedeh [Sun, 25 Apr 2021 13:32:25 +0000 (14:32 +0100)]
io_uring: implement fixed buffers registration similar to fixed files

Apply fixed_rsrc functionality for fixed buffers support.

Signed-off-by: Bijan Mottahedeh <bijan.mottahedeh@oracle.com>
[rebase, remove multi-level tables, fix unregister on exit]
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/17035f4f75319dc92962fce4fc04bc0afb5a68dc.1619356238.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years ago io_uring: prepare fixed rw for dynamic buffers
Pavel Begunkov [Sun, 25 Apr 2021 13:32:24 +0000 (14:32 +0100)]
io_uring: prepare fixed rw for dynamic buffers

With dynamic buffer updates, registered buffers in the table may change
at any moment. First of all we want to prevent future races between
updating and importing (i.e. io_import_fixed()), where the latter one
may happen without uring_lock held, e.g. from io-wq.

Save the first loaded io_mapped_ubuf buffer and reuse it.
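
The load-once-and-reuse idea can be sketched in plain userspace C. This is
an illustration only, not the kernel code: struct mapped_buf, struct request
and request_get_buf() are hypothetical names, and the sketch is
single-threaded, so it leaves out the ordering guarantees a real
implementation would need.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a registered buffer. */
struct mapped_buf {
	int id;
};

/* Hypothetical request: remembers the buffer it resolved first. */
struct request {
	unsigned int buf_index;
	struct mapped_buf *buf;	/* NULL until the first lookup */
};

/*
 * Resolve the buffer for a request. The table is consulted only on the
 * first call; later calls return the saved pointer even if the table
 * slot has been updated in the meantime.
 */
static struct mapped_buf *request_get_buf(struct request *req,
					  struct mapped_buf *const *table)
{
	assert(req && table);

	if (!req->buf)
		req->buf = table[req->buf_index];
	return req->buf;
}
```

With this shape, a request that is retried later keeps using the buffer it
saw first, whatever has happened to the table slot since.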

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/21a2302d07766ae956640b6f753292c45200fe8f.1619356238.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years ago io_uring: keep table of pointers to ubufs
Pavel Begunkov [Sun, 25 Apr 2021 13:32:23 +0000 (14:32 +0100)]
io_uring: keep table of pointers to ubufs

Instead of keeping a table of ubufs, keep a table of pointers to ubufs,
so we can atomically read one pointer and be sure that the content of
the ubuf won't change.

Because it was already dynamically allocating imu->bvec, throw both
imu and bvec into a single structure so they can be allocated together.
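
A minimal userspace sketch of the two ideas, assuming a C99 flexible array
member for the combined allocation. All names here (struct seg, struct
mapped_buf, mapped_buf_alloc(), table_replace()) are hypothetical and do not
mirror the kernel definitions; the sketch is single-threaded and uses plain
pointer stores where concurrent code would need atomic accessors.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical segment descriptor, standing in for a bvec-like entry. */
struct seg {
	void *page;
	unsigned int len;
	unsigned int offset;
};

/* Header and segment array live in one allocation. */
struct mapped_buf {
	unsigned int nr_segs;
	struct seg segs[];	/* flexible array member, nr_segs entries */
};

/* One zeroed allocation covers the header plus nr_segs trailing entries. */
static struct mapped_buf *mapped_buf_alloc(unsigned int nr_segs)
{
	struct mapped_buf *buf;

	buf = calloc(1, sizeof(*buf) + (size_t)nr_segs * sizeof(buf->segs[0]));
	if (buf)
		buf->nr_segs = nr_segs;
	return buf;
}

/*
 * The table holds pointers, so replacing an entry is a single pointer
 * store. The old pointer is returned so the caller can release it once
 * nobody uses it any more.
 */
static struct mapped_buf *table_replace(struct mapped_buf **table,
					unsigned int idx,
					struct mapped_buf *new_buf)
{
	struct mapped_buf *old;

	assert(table);
	old = table[idx];
	table[idx] = new_buf;
	return old;
}
```

A reader that has already fetched the old pointer keeps seeing a consistent
object, because an update swaps the pointer instead of rewriting the object
in place.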

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/b96efa4c5febadeccf41d0e849ac099f4c83b0d3.1619356238.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years ago io_uring: add generic rsrc update with tags
Pavel Begunkov [Sun, 25 Apr 2021 13:32:22 +0000 (14:32 +0100)]
io_uring: add generic rsrc update with tags

Add IORING_REGISTER_RSRC_UPDATE, which also supports passing in rsrc
tags. Implement it for registered files.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/d4dc66df204212f64835ffca2c4eb5e8363f2f05.1619356238.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years ago io_uring: add IORING_REGISTER_RSRC
Pavel Begunkov [Sun, 25 Apr 2021 13:32:21 +0000 (14:32 +0100)]
io_uring: add IORING_REGISTER_RSRC

Add a new io_uring_register() opcode for rsrc registration. Instead of
accepting a pointer to resources (fds or iovecs), @arg now points to a
struct io_uring_rsrc_register, and the second argument gives the size of
that struct, so it can easily be extended by adding new fields.

All that is done mainly to be able to pass in a pointer with tags. Pass
it in and enable CQE posting for file resources. Doesn't support setting
tags on update yet.

A design choice made here is to not post CQEs on rsrc de-registration,
but only when a resource is replaced or removed by a dynamic rsrc update.
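
The message does not say how the passed size is validated. As an
illustration only, here is one common way to keep a size-tagged argument
struct extensible, written as userspace C. The struct layout, field names
and parse_register_arg() are assumptions made for this sketch and are not
taken from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical argument struct; the fields are illustrative only. */
struct rsrc_register_arg {
	uint32_t type;
	uint32_t nr;
	uint64_t data;	/* address of the resource array */
	uint64_t tags;	/* address of the per-resource tag array */
};

/*
 * Copy a caller-supplied struct of 'size' bytes into 'out'.
 * - A shorter (older) struct is accepted; missing fields read as zero.
 * - A longer (newer) struct is accepted only if the unknown tail is zero.
 */
static int parse_register_arg(const void *arg, size_t size,
			      struct rsrc_register_arg *out)
{
	const unsigned char *bytes = arg;
	size_t known = sizeof(*out);
	size_t i;

	assert(arg && out);

	if (size == 0)
		return -EINVAL;

	memset(out, 0, known);
	if (size > known) {
		for (i = known; i < size; i++)
			if (bytes[i])
				return -E2BIG;
		size = known;
	}
	memcpy(out, arg, size);
	return 0;
}
```

A stricter alternative is to reject any size other than the one the
receiver knows about; that is simpler but gives up compatibility between
old callers and a newer receiver.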

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/c498aaec32a4bb277b2406b9069662c02cdda98c.1619356238.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years ago io_uring: enumerate dynamic resources
Pavel Begunkov [Sun, 25 Apr 2021 13:32:20 +0000 (14:32 +0100)]
io_uring: enumerate dynamic resources

As resources gain more support and share more common parts, it is more
convenient to enumerate the resource types and use that enumeration for
indexing.
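
A small sketch of what indexing by an enumeration can look like, in
userspace C. The enum values, the name table and rsrc_name() are
hypothetical and only illustrate the pattern.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical resource types; the last entry counts the real ones. */
enum rsrc_type {
	RSRC_FILE,
	RSRC_BUFFER,
	RSRC_NR
};

/* Per-type data lives in an array indexed by the enum. */
static const char *const rsrc_names[RSRC_NR] = {
	[RSRC_FILE]   = "file",
	[RSRC_BUFFER] = "buffer",
};

/* Return the name for a type, or NULL if the type is out of range. */
static const char *rsrc_name(unsigned int type)
{
	if (type >= RSRC_NR)
		return NULL;
	/* Every known type is expected to have an entry in the table. */
	assert(rsrc_names[type] != NULL);
	return rsrc_names[type];
}
```

Adding a resource type then means adding one enum value before RSRC_NR and
one table entry, and generic code can loop from 0 to RSRC_NR.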

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/f0be63e9310212d5601d36277c2946ff7a040485.1619356238.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>