]> Git Repo - linux.git/log
linux.git
3 years agonvme-rdma: fix in-casule data send for chained sgls
Sagi Grimberg [Fri, 28 May 2021 01:16:38 +0000 (18:16 -0700)]
nvme-rdma: fix in-casule data send for chained sgls

We have only 2 inline sg entries and we allow 4 sg entries for the send
wr sge. Larger sgls entries will be chained. However when we build
in-capsule send wr sge, we iterate without taking into account that the
sgl may be chained and still fit in-capsule (which can happen if the sgl
is bigger than 2, but lower-equal to 4).

Fix in-capsule data mapping to correctly iterate chained sgls.

Fixes: 38e1800275d3 ("nvme-rdma: Avoid preallocating big SGL for data")
Reported-by: Walker, Benjamin <[email protected]>
Signed-off-by: Sagi Grimberg <[email protected]>
Reviewed-by: Max Gurtovoy <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
3 years agoLinux 5.13-rc4 v5.13-rc4
Linus Torvalds [Sun, 30 May 2021 21:58:25 +0000 (11:58 -1000)]
Linux 5.13-rc4

3 years agonet: stmmac: fix kernel panic due to NULL pointer dereference of mdio_bus_data
Sriranjani P [Fri, 28 May 2021 07:10:56 +0000 (12:40 +0530)]
net: stmmac: fix kernel panic due to NULL pointer dereference of mdio_bus_data

Fixed link does not need mdio bus and in that case mdio_bus_data will
not be allocated. Before using mdio_bus_data we should check for NULL.

This patch fix the kernel panic due to NULL pointer dereference of
mdio_bus_data when it is not allocated.

Without this patch we do see following kernel crash caused due to kernel
NULL pointer dereference.

Call trace:
stmmac_dvr_probe+0x3c/0x10b0
dwc_eth_dwmac_probe+0x224/0x378
platform_probe+0x68/0xe0
really_probe+0x130/0x3d8
driver_probe_device+0x68/0xd0
device_driver_attach+0x74/0x80
__driver_attach+0x58/0xf8
bus_for_each_dev+0x7c/0xd8
driver_attach+0x24/0x30
bus_add_driver+0x148/0x1f0
driver_register+0x64/0x120
__platform_driver_register+0x28/0x38
dwc_eth_dwmac_driver_init+0x1c/0x28
do_one_initcall+0x78/0x158
kernel_init_freeable+0x1f0/0x244
kernel_init+0x14/0x118
ret_from_fork+0x10/0x30
Code: f9002bfb 9113e2d9 910e6273 aa0003f7 (f9405c78)
---[ end trace 32d9d41562ddc081 ]---

Fixes: e5e5b771f684 ("net: stmmac: make in-band AN mode parsing is supported for non-DT")
Signed-off-by: Sriranjani P <[email protected]>
Signed-off-by: Pankaj Dubey <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
3 years agomt76: mt7921: remove leftover 80+80 HE capability
Felix Fietkau [Fri, 28 May 2021 12:03:04 +0000 (14:03 +0200)]
mt76: mt7921: remove leftover 80+80 HE capability

Fixes interop issues with some APs that disable HE Tx if this is present

Signed-off-by: Felix Fietkau <[email protected]>
Signed-off-by: Kalle Valo <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
3 years agomt76: mt7615: do not set MT76_STATE_PM at bootstrap
Lorenzo Bianconi [Sat, 15 May 2021 13:26:12 +0000 (15:26 +0200)]
mt76: mt7615: do not set MT76_STATE_PM at bootstrap

Remove MT76_STATE_PM in mt7615_init_device() and introduce
__mt7663s_mcu_drv_pmctrl for fw loading in mt7663s.
This patch fixes a crash at bootstrap for device (e.g. mt7622) that do
not support runtime-pm

Fixes: 7f2bc8ba11a0 ("mt76: connac: introduce wake counter for fw_pmctrl synchronization")
Signed-off-by: Lorenzo Bianconi <[email protected]>
Signed-off-by: Kalle Valo <[email protected]>
Link: https://lore.kernel.org/r/e5a2618574007113d844874420f7855891abf167.1621085028.git.lorenzo@kernel.org
3 years agoALSA: hda: Add AlderLake-M PCI ID
Kai Vehmanen [Fri, 28 May 2021 18:51:23 +0000 (21:51 +0300)]
ALSA: hda: Add AlderLake-M PCI ID

Add HD Audio PCI ID for Intel AlderLake-M. Add rules to
snd_intel_dsp_find_config() to choose SOF driver for ADL-M systems with
PCH-DMIC or Soundwire codecs, and legacy driver for the rest.

Signed-off-by: Kai Vehmanen <[email protected]>
Reviewed-by: Péter Ujfalusi <[email protected]>
Reviewed-by: Ranjani Sridharan <[email protected]>
Reviewed-by: Pierre-Louis Bossart <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Takashi Iwai <[email protected]>
3 years agoMerge branch 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa...
Linus Torvalds [Sun, 30 May 2021 04:24:00 +0000 (18:24 -1000)]
Merge branch 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux

Pull i2c fixes from Wolfram Sang:
 "This is a bit larger than usual at rc4 time. The reason is due to
  Lee's work of fixing newly reported build warnings.

  The rest is fixes as usual"

* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: (22 commits)
  MAINTAINERS: adjust to removing i2c designware platform data
  i2c: s3c2410: fix possible NULL pointer deref on read message after write
  i2c: mediatek: Disable i2c start_en and clear intr_stat brfore reset
  i2c: i801: Don't generate an interrupt on bus reset
  i2c: mpc: implement erratum A-004447 workaround
  powerpc/fsl: set fsl,i2c-erratum-a004447 flag for P1010 i2c controllers
  powerpc/fsl: set fsl,i2c-erratum-a004447 flag for P2041 i2c controllers
  dt-bindings: i2c: mpc: Add fsl,i2c-erratum-a004447 flag
  i2c: busses: i2c-stm32f4: Remove incorrectly placed ' ' from function name
  i2c: busses: i2c-st: Fix copy/paste function misnaming issues
  i2c: busses: i2c-pnx: Provide descriptions for 'alg_data' data structure
  i2c: busses: i2c-ocores: Place the expected function names into the documentation headers
  i2c: busses: i2c-eg20t: Fix 'bad line' issue and provide description for 'msgs' param
  i2c: busses: i2c-designware-master: Fix misnaming of 'i2c_dw_init_master()'
  i2c: busses: i2c-cadence: Fix incorrectly documented 'enum cdns_i2c_slave_mode'
  i2c: busses: i2c-ali1563: File headers are not good candidates for kernel-doc
  i2c: muxes: i2c-arb-gpio-challenge: Demote non-conformant kernel-doc headers
  i2c: busses: i2c-nomadik: Fix formatting issue pertaining to 'timeout'
  i2c: sh_mobile: Use new clock calculation formulas for RZ/G2E
  i2c: I2C_HISI should depend on ACPI
  ...

3 years agoMerge tag 'seccomp-fixes-v5.13-rc4' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 30 May 2021 04:16:09 +0000 (18:16 -1000)]
Merge tag 'seccomp-fixes-v5.13-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull seccomp fixes from Kees Cook:
 "This fixes a hard-to-hit race condition in the addfd user_notif
  feature of seccomp, visible since v5.9.

  And a small documentation fix"

* tag 'seccomp-fixes-v5.13-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  seccomp: Refactor notification handler to prepare for new semantics
  Documentation: seccomp: Fix user notification documentation

3 years agoMerge tag 'riscv-for-linus-5.13-rc4' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 30 May 2021 04:10:10 +0000 (18:10 -1000)]
Merge tag 'riscv-for-linus-5.13-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux

Pull RISC-V fixes from Palmer Dabbelt:
 "A handful of RISC-V related fixes:

   - avoid errors when the stack tracing code is tracing itself.

   - resurrect the memtest= kernel command line argument on RISC-V,
     which was briefly enabled during the merge window before a
     refactoring disabled it.

   - build fix and some warning cleanups"

* tag 'riscv-for-linus-5.13-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
  riscv: kexec: Fix W=1 build warnings
  riscv: kprobes: Fix build error when MMU=n
  riscv: Select ARCH_USE_MEMTEST
  riscv: stacktrace: fix the riscv stacktrace when CONFIG_FRAME_POINTER enabled

3 years agoMerge tag 'xfs-5.13-fixes-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Linus Torvalds [Sun, 30 May 2021 03:47:19 +0000 (17:47 -1000)]
Merge tag 'xfs-5.13-fixes-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

Pull xfs fixes from Darrick Wong:
 "This week's pile mitigates some decades-old problems in how extent
  size hints interact with realtime volumes, fixes some failures in
  online shrink, and fixes a problem where directory and symlink
  shrinking on extremely fragmented filesystems could fail.

  The most user-notable change here is to point users at our (new) IRC
  channel on OFTC. Freedom isn't free, it costs folks like you and me;
  and if you don't kowtow, they'll expel everyone and take over your
  channel. (Ok, ok, that didn't fit the song lyrics...)

  Summary:

   - Fix a bug where unmapping operations end earlier than expected,
     which can cause chaos on multi-block directory and symlink shrink
     operations.

   - Fix an erroneous assert that can trigger if we try to transition a
     bmap structure from btree format to extents format with zero
     extents. This was exposed by xfs/538"

* tag 'xfs-5.13-fixes-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  xfs: bunmapi has unnecessary AG lock ordering issues
  xfs: btree format inode forks can have zero extents
  xfs: add new IRC channel to MAINTAINERS
  xfs: validate extsz hints against rt extent size when rtinherit is set
  xfs: standardize extent size hint validation
  xfs: check free AG space when making per-AG reservations

3 years agoio_uring: fix misaccounting fix buf pinned pages
Pavel Begunkov [Sat, 29 May 2021 11:01:02 +0000 (12:01 +0100)]
io_uring: fix misaccounting fix buf pinned pages

As Andres reports "... io_sqe_buffer_register() doesn't initialize imu.
io_buffer_account_pin() does imu->acct_pages++, before calling
io_account_mem(ctx, imu->acct_pages).", leading to evevntual -ENOMEM.

Initialise the field.

Reported-by: Andres Freund <[email protected]>
Fixes: 41edf1a5ec967 ("io_uring: keep table of pointers to ubufs")
Signed-off-by: Pavel Begunkov <[email protected]>
Link: https://lore.kernel.org/r/438a6f46739ae5e05d9c75a0c8fa235320ff367c.1622285901.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <[email protected]>
3 years agoriscv: Use -mno-relax when using lld linker
Khem Raj [Fri, 14 May 2021 21:37:41 +0000 (14:37 -0700)]
riscv: Use -mno-relax when using lld linker

lld does not implement the RISCV relaxation optimizations like GNU ld
therefore disable it when building with lld, Also pass it to
assembler when using external GNU assembler ( LLVM_IAS != 1 ), this
ensures that relevant assembler option is also enabled along. if these
options are not used then we see following relocations in objects

0000000000000000 R_RISCV_ALIGN     *ABS*+0x0000000000000002

These are then rejected by lld
ld.lld: error: capability.c:(.fixup+0x0): relocation R_RISCV_ALIGN requires unimplemented linker relaxation; recompile with -mno-relax but the .o is already compiled with -mno-relax

Signed-off-by: Khem Raj <[email protected]>
Reviewed-by: Nathan Chancellor <[email protected]>
Signed-off-by: Palmer Dabbelt <[email protected]>
3 years agoseccomp: Refactor notification handler to prepare for new semantics
Sargun Dhillon [Mon, 17 May 2021 19:39:06 +0000 (12:39 -0700)]
seccomp: Refactor notification handler to prepare for new semantics

This refactors the user notification code to have a do / while loop around
the completion condition. This has a small change in semantic, in that
previously we ignored addfd calls upon wakeup if the notification had been
responded to, but instead with the new change we check for an outstanding
addfd calls prior to returning to userspace.

Rodrigo Campos also identified a bug that can result in addfd causing
an early return, when the supervisor didn't actually handle the
syscall [1].

[1]: https://lore.kernel.org/lkml/20210413160151[email protected]/

Fixes: 7cf97b125455 ("seccomp: Introduce addfd ioctl to seccomp user notifier")
Signed-off-by: Sargun Dhillon <[email protected]>
Acked-by: Tycho Andersen <[email protected]>
Acked-by: Christian Brauner <[email protected]>
Signed-off-by: Kees Cook <[email protected]>
Tested-by: Rodrigo Campos <[email protected]>
Cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]
3 years agoMerge tag 'thermal-v5.13-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/therma...
Linus Torvalds [Sat, 29 May 2021 16:55:55 +0000 (06:55 -1000)]
Merge tag 'thermal-v5.13-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/thermal/linux

Pull thermal fixes from Daniel Lezcano:

 - Fix uninitialized error code value for the SPMI adc driver (Yang
   Yingliang)

 - Fix kernel doc warning (Yang Li)

 - Fix wrong read-write thermal trip point initialization (Srinivas
   Pandruvada)

* tag 'thermal-v5.13-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/thermal/linux:
  thermal/drivers/qcom: Fix error code in adc_tm5_get_dt_channel_data()
  thermal/ti-soc-thermal: Fix kernel-doc
  thermal/drivers/intel: Initialize RW trip to THERMAL_TEMP_INVALID

3 years agoMerge tag 'char-misc-5.13-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregk...
Linus Torvalds [Sat, 29 May 2021 16:41:50 +0000 (06:41 -1000)]
Merge tag 'char-misc-5.13-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc

Pull char/misc driver fixes from Greg KH:
 "Here are some tiny char/misc driver fixes for 5.13-rc4.

  Nothing huge here, just some tiny fixes for reported issues:

   - two interconnect driver fixes

   - kgdb build warning fix for gcc-11

   - hgafb regression fix

   - soundwire driver fix

   - mei driver fix

  All have been in linux-next with no reported issues"

* tag 'char-misc-5.13-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
  mei: request autosuspend after sending rx flow control
  kgdb: fix gcc-11 warnings harder
  video: hgafb: correctly handle card detect failure during probe
  soundwire: qcom: fix handling of qcom,ports-block-pack-mode
  interconnect: qcom: Add missing MODULE_DEVICE_TABLE
  interconnect: qcom: bcm-voter: add a missing of_node_put()

3 years agoMerge tag 'driver-core-5.13-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sat, 29 May 2021 16:33:28 +0000 (06:33 -1000)]
Merge tag 'driver-core-5.13-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core fixes from Greg KH:
 "Here are three small driver core / debugfs fixes for 5.13-rc4:

   - debugfs fix for incorrect "lockdown" mode for selinux accesses

   - two device link changes, one bugfix and one cleanup

  All of these have been in linux-next for over a week with no reported
  problems"

* tag 'driver-core-5.13-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
  drivers: base: Reduce device link removal code duplication
  drivers: base: Fix device link removal
  debugfs: fix security_locked_down() call for SELinux

3 years agoMerge tag 'staging-5.13-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh...
Linus Torvalds [Sat, 29 May 2021 16:29:13 +0000 (06:29 -1000)]
Merge tag 'staging-5.13-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging

Pull staging and IIO driver fixes from Greg KH:
 "Here are some small IIO and staging driver fixes for reported issues
  for 5.13-rc4.

  Nothing major here, tiny changes for reported problems, full details
  are in the shortlog if people are curious.

  All have been in linux-next for a while with no reported problems"

* tag 'staging-5.13-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
  iio: adc: ad7793: Add missing error code in ad7793_setup()
  iio: adc: ad7923: Fix undersized rx buffer.
  iio: adc: ad7768-1: Fix too small buffer passed to iio_push_to_buffers_with_timestamp()
  iio: dac: ad5770r: Put fwnode in error case during ->probe()
  iio: gyro: fxas21002c: balance runtime power in error path
  staging: emxx_udc: fix loop in _nbu2ss_nuke()
  staging: iio: cdc: ad7746: avoid overwrite of num_channels
  iio: adc: ad7192: handle regulator voltage error first
  iio: adc: ad7192: Avoid disabling a clock that was never enabled.
  iio: adc: ad7124: Fix potential overflow due to non sequential channel numbers
  iio: adc: ad7124: Fix missbalanced regulator enable / disable on error.

3 years agoMerge tag 'tty-5.13-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Linus Torvalds [Sat, 29 May 2021 16:25:16 +0000 (06:25 -1000)]
Merge tag 'tty-5.13-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty

Pull tty / serial driver fixes from Greg KH:
 "Here are some small fixes for reported problems for tty and serial
  drivers for 5.13-rc4.

  They consist of:

   - 8250 bugfixes and new device support

   - lockdown security mode fixup

   - syzbot found problems fixed

   - 8250_omap fix for interrupt storm

   - revert of 8250_omap driver fix as it caused worse problem than the
     original issue

  All but the last patch have been in linux-next for a while, the last
  one is a revert of a problem found in linux-next with the 8250_omap
  driver change"

* tag 'tty-5.13-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
  Revert "serial: 8250: 8250_omap: Fix possible interrupt storm"
  serial: 8250_pci: handle FL_NOIRQ board flag
  serial: rp2: use 'request_firmware' instead of 'request_firmware_nowait'
  serial: 8250_pci: Add support for new HPE serial device
  serial: 8250: 8250_omap: Fix possible interrupt storm
  serial: 8250: Use BIT(x) for UART_{CAP,BUG}_*
  serial: 8250: Add UART_BUG_TXRACE workaround for Aspeed VUART
  serial: 8250_dw: Add device HID for new AMD UART controller
  serial: sh-sci: Fix off-by-one error in FIFO threshold register setting
  serial: core: fix suspicious security_locked_down() call
  serial: tegra: Fix a mask operation that is always true

3 years agoMerge tag 'usb-5.13-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Linus Torvalds [Sat, 29 May 2021 16:11:21 +0000 (06:11 -1000)]
Merge tag 'usb-5.13-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb

Pull USB / Thunderbolt fixes from Greg KH:
 "Here are a number of tiny USB and Thunderbolt driver fixes for
  5.13-rc4.

  They consist of:

   - thunderbolt fixes for some NVM bound issues

   - xhci fixes for reported problems

   - control-request fixups

   - documentation build warning fixes

   - new usb-serial driver device ids

   - typec bugfixes for reported issues

   - usbfs warning fixups (could be triggered from userspace)

   - other tiny fixes for reported problems.

  All of these have been in linux-next with no reported issues"

* tag 'usb-5.13-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (22 commits)
  xhci: Fix 5.12 regression of missing xHC cache clearing command after a Stall
  xhci: fix giving back URB with incorrect status regression in 5.12
  usb: gadget: udc: renesas_usb3: Fix a race in usb3_start_pipen()
  usb: typec: tcpm: Respond Not_Supported if no snk_vdo
  usb: typec: tcpm: Properly interrupt VDM AMS
  USB: trancevibrator: fix control-request direction
  usb: Restore the usb_header label
  usb: typec: tcpm: Use LE to CPU conversion when accessing msg->header
  usb: typec: ucsi: Clear pending after acking connector change
  usb: typec: mux: Fix matching with typec_altmode_desc
  misc/uss720: fix memory leak in uss720_probe
  usb: dwc3: gadget: Properly track pending and queued SG
  USB: usbfs: Don't WARN about excessively large memory allocations
  thunderbolt: usb4: Fix NVM read buffer bounds and offset issue
  thunderbolt: dma_port: Fix NVM read buffer bounds and offset issue
  usb: chipidea: udc: assign interrupt number to USB gadget structure
  usb: cdnsp: Fix lack of removing request from pending list.
  usb: cdns3: Fix runtime PM imbalance on error
  USB: serial: pl2303: add device id for ADLINK ND-6530 GC
  USB: serial: ti_usb_3410_5052: add startech.com device id
  ...

3 years agoMerge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Linus Torvalds [Sat, 29 May 2021 16:02:25 +0000 (06:02 -1000)]
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM fixes from Paolo Bonzini:
 "ARM fixes:

   - Another state update on exit to userspace fix

   - Prevent the creation of mixed 32/64 VMs

   - Fix regression with irqbypass not restarting the guest on failed
     connect

   - Fix regression with debug register decoding resulting in
     overlapping access

   - Commit exception state on exit to usrspace

   - Fix the MMU notifier return values

   - Add missing 'static' qualifiers in the new host stage-2 code

  x86 fixes:

   - fix guest missed wakeup with assigned devices

   - fix WARN reported by syzkaller

   - do not use BIT() in UAPI headers

   - make the kvm_amd.avic parameter bool

  PPC fixes:

   - make halt polling heuristics consistent with other architectures

  selftests:

   - various fixes

   - new performance selftest memslot_perf_test

   - test UFFD minor faults in demand_paging_test"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (44 commits)
  selftests: kvm: fix overlapping addresses in memslot_perf_test
  KVM: X86: Kill off ctxt->ud
  KVM: X86: Fix warning caused by stale emulation context
  KVM: X86: Use kvm_get_linear_rip() in single-step and #DB/#BP interception
  KVM: x86/mmu: Fix comment mentioning skip_4k
  KVM: VMX: update vcpu posted-interrupt descriptor when assigning device
  KVM: rename KVM_REQ_PENDING_TIMER to KVM_REQ_UNBLOCK
  KVM: x86: add start_assignment hook to kvm_x86_ops
  KVM: LAPIC: Narrow the timer latency between wait_lapic_expire and world switch
  selftests: kvm: do only 1 memslot_perf_test run by default
  KVM: X86: Use _BITUL() macro in UAPI headers
  KVM: selftests: add shared hugetlbfs backing source type
  KVM: selftests: allow using UFFD minor faults for demand paging
  KVM: selftests: create alias mappings when using shared memory
  KVM: selftests: add shmem backing source type
  KVM: selftests: refactor vm_mem_backing_src_type flags
  KVM: selftests: allow different backing source types
  KVM: selftests: compute correct demand paging size
  KVM: selftests: simplify setup_demand_paging error handling
  KVM: selftests: Print a message if /dev/kvm is missing
  ...

3 years agoMerge tag 's390-5.13-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Linus Torvalds [Sat, 29 May 2021 15:51:53 +0000 (05:51 -1000)]
Merge tag 's390-5.13-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux

Pull s390 fixes from Vasily Gorbik:
 "Fix races in vfio-ccw request handling"

* tag 's390-5.13-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  vfio-ccw: Serialize FSM IDLE state with I/O completion
  vfio-ccw: Reset FSM state to IDLE inside FSM
  vfio-ccw: Check initialized flag in cp_init()

3 years agoselftests: kvm: fix overlapping addresses in memslot_perf_test
Paolo Bonzini [Fri, 28 May 2021 19:10:58 +0000 (15:10 -0400)]
selftests: kvm: fix overlapping addresses in memslot_perf_test

vm_create allocates memory and maps it close to GPA.  This memory
is separate from what is allocated in subsequent calls to
vm_userspace_mem_region_add, so it is incorrect to pass the
test memory size to vm_create_default.  Just pass a small
fixed amount of memory which can be used later for page table,
otherwise GPAs are already allocated at MEM_GPA and the
test aborts.

Signed-off-by: Paolo Bonzini <[email protected]>
3 years agox86/apic: Mark _all_ legacy interrupts when IO/APIC is missing
Thomas Gleixner [Tue, 25 May 2021 11:08:41 +0000 (13:08 +0200)]
x86/apic: Mark _all_ legacy interrupts when IO/APIC is missing

PIC interrupts do not support affinity setting and they can end up on
any online CPU. Therefore, it's required to mark the associated vectors
as system-wide reserved. Otherwise, the corresponding irq descriptors
are copied to the secondary CPUs but the vectors are not marked as
assigned or reserved. This works correctly for the IO/APIC case.

When the IO/APIC is disabled via config, kernel command line or lack of
enumeration then all legacy interrupts are routed through the PIC, but
nothing marks them as system-wide reserved vectors.

As a consequence, a subsequent allocation on a secondary CPU can result in
allocating one of these vectors, which triggers the BUG() in
apic_update_vector() because the interrupt descriptor slot is not empty.

Imran tried to work around that by marking those interrupts as allocated
when a CPU comes online. But that's wrong in case that the IO/APIC is
available and one of the legacy interrupts, e.g. IRQ0, has been switched to
PIC mode because then marking them as allocated will fail as they are
already marked as system vectors.

Stay consistent and update the legacy vectors after attempting IO/APIC
initialization and mark them as system vectors in case that no IO/APIC is
available.

Fixes: 69cde0004a4b ("x86/vector: Use matrix allocator for vector assignment")
Reported-by: Imran Khan <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Cc: [email protected]
Link: https://lkml.kernel.org/r/[email protected]
3 years agoMerge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Linus Torvalds [Sat, 29 May 2021 00:47:48 +0000 (14:47 -1000)]
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI fixes from James Bottomley:
 "Ten small fixes, all in drivers"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: target: qla2xxx: Wait for stop_phase1 at WWN removal
  scsi: hisi_sas: Drop free_irq() of devm_request_irq() allocated irq
  scsi: vmw_pvscsi: Set correct residual data length
  scsi: bnx2fc: Return failure if io_req is already in ABTS processing
  scsi: aic7xxx: Remove multiple definition of globals
  scsi: aic7xxx: Restore several defines for aic7xxx firmware build
  scsi: target: iblock: Fix smp_processor_id() BUG messages
  scsi: libsas: Use _safe() loop in sas_resume_port()
  scsi: target: tcmu: Fix xarray RCU warning
  scsi: target: core: Avoid smp_processor_id() in preemptible code

3 years agoMerge tag 'block-5.13-2021-05-28' of git://git.kernel.dk/linux-block
Linus Torvalds [Sat, 29 May 2021 00:42:37 +0000 (14:42 -1000)]
Merge tag 'block-5.13-2021-05-28' of git://git.kernel.dk/linux-block

Pull block fixes from Jens Axboe:

 - NVMe pull request (Christoph):
      - fix a memory leak in nvme_cdev_add (Guoqing Jiang)
      - fix inline data size comparison in nvmet_tcp_queue_response (Hou
        Pu)
      - fix false keep-alive timeout when a controller is torn down
        (Sagi Grimberg)
      - fix a nvme-tcp Kconfig dependency (Sagi Grimberg)
      - short-circuit reconnect retries for FC (Hannes Reinecke)
      - decode host pathing error for connect (Hannes Reinecke)

 - MD pull request (Song):
      - Fix incorrect chunk boundary assert (Christoph)

 - Fix s390/dasd verification panic (Stefan)

* tag 'block-5.13-2021-05-28' of git://git.kernel.dk/linux-block:
  nvmet: fix false keep-alive timeout when a controller is torn down
  nvmet-tcp: fix inline data size comparison in nvmet_tcp_queue_response
  nvme-tcp: remove incorrect Kconfig dep in BLK_DEV_NVME
  md/raid5: remove an incorrect assert in in_chunk_boundary
  s390/dasd: add missing discipline function
  nvme-fabrics: decode host pathing error for connect
  nvme-fc: short-circuit reconnect retries
  nvme: fix potential memory leaks in nvme_cdev_add

3 years agoMerge tag 'io_uring-5.13-2021-05-28' of git://git.kernel.dk/linux-block
Linus Torvalds [Sat, 29 May 2021 00:35:55 +0000 (14:35 -1000)]
Merge tag 'io_uring-5.13-2021-05-28' of git://git.kernel.dk/linux-block

Pull io_uring fixes from Jens Axboe:
 "A few minor fixes:

   - Fix an issue with hashed wait removal on exit (Zqiang, Pavel)

   - Fix a recent data race introduced in this series (Marco)"

* tag 'io_uring-5.13-2021-05-28' of git://git.kernel.dk/linux-block:
  io_uring: fix data race to avoid potential NULL-deref
  io-wq: Fix UAF when wakeup wqe in hash waitqueue
  io_uring/io-wq: close io-wq full-stop gap

3 years agoMerge tag 'drm-fixes-2021-05-29' of git://anongit.freedesktop.org/drm/drm
Linus Torvalds [Sat, 29 May 2021 00:28:58 +0000 (14:28 -1000)]
Merge tag 'drm-fixes-2021-05-29' of git://anongit.freedesktop.org/drm/drm

Pull drm fixes from Dave Airlie:
 "Pretty quiet this week, couple of amdgpu, one i915, and a few misc otherwise.

  ttm:
   - prevent irrelevant swapout

  amdgpu:
   - MultiGPU fan fix
   - VCN powergating fixes

  amdkfd:
   - Fix SDMA register offset error

  meson:
   - fix shutdown crash

  i915:
   - Re-enable LTTPR non-transparent LT mode for DPCD_REV < 1.4"

* tag 'drm-fixes-2021-05-29' of git://anongit.freedesktop.org/drm/drm:
  drm/ttm: Skip swapout if ttm object is not populated
  drm/i915: Reenable LTTPR non-transparent LT mode for DPCD_REV<1.4
  drm/meson: fix shutdown crash when component not probed
  drm/amdgpu/jpeg3: add cancel_delayed_work_sync before power gate
  drm/amdgpu/jpeg2.5: add cancel_delayed_work_sync before power gate
  drm/amdgpu/jpeg2.0: add cancel_delayed_work_sync before power gate
  drm/amdgpu/vcn3: add cancel_delayed_work_sync before power gate
  drm/amdgpu/vcn2.5: add cancel_delayed_work_sync before power gate
  drm/amdgpu/vcn2.0: add cancel_delayed_work_sync before power gate
  drm/amdgpu/vcn1: add cancel_delayed_work_sync before power gate
  drm/amdkfd: correct sienna_cichlid SDMA RLC register offset error
  drm/amd/pm: correct MGpuFanBoost setting

3 years agoMerge tag 'perf-tools-fixes-for-v5.13-2021-05-28' of git://git.kernel.org/pub/scm...
Linus Torvalds [Sat, 29 May 2021 00:23:05 +0000 (14:23 -1000)]
Merge tag 'perf-tools-fixes-for-v5.13-2021-05-28' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux

Pull perf tools fixes from Arnaldo Carvalho de Melo:

 - Fix error checking of BPF prog attachment in 'perf stat'.

 - Fix getting maximum number of fds in the vendor events JSON parser.

 - Move debug initialization earlier, fixing a segfault in some cases.

 - Fix eventcode of power10 JSON events.

* tag 'perf-tools-fixes-for-v5.13-2021-05-28' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux:
  perf vendor events powerpc: Fix eventcode of power10 JSON events
  perf stat: Fix error check for bpf_program__attach
  perf debug: Move debug initialization earlier
  perf jevents: Fix getting maximum number of fds

3 years agoMerge tag '5.13-rc4-smb3' of git://git.samba.org/sfrench/cifs-2.6
Linus Torvalds [Sat, 29 May 2021 00:15:47 +0000 (14:15 -1000)]
Merge tag '5.13-rc4-smb3' of git://git.samba.org/sfrench/cifs-2.6

Pull cifs fixes from Steve French:
 "Three SMB3 fixes.

  Two for stable, and the other fixes a problem pointed out with a
  recently added ioctl"

* tag '5.13-rc4-smb3' of git://git.samba.org/sfrench/cifs-2.6:
  cifs: change format of CIFS_FULL_KEY_DUMP ioctl
  cifs: fix string declarations and assignments in tracepoints
  cifs: set server->cipher_type to AES-128-CCM for SMB3.0

3 years agoMerge branch 'mptcp-fixes-for-5-13'
Jakub Kicinski [Fri, 28 May 2021 20:51:43 +0000 (13:51 -0700)]
Merge branch 'mptcp-fixes-for-5-13'

Mat Martineau says:

====================
mptcp: Fixes for 5.13

These patches address two issues in MPTCP.

Patch 1 fixes a locking issue affecting MPTCP-level retransmissions.

Patches 2-4 improve handling of out-of-order packet arrival early
in a connection, so it falls back to TCP rather than forcing a
reset. Includes a selftest.
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
3 years agomptcp: update selftest for fallback due to OoO
Paolo Abeni [Thu, 27 May 2021 23:31:40 +0000 (16:31 -0700)]
mptcp: update selftest for fallback due to OoO

The previous commit noted that we can have fallback
scenario due to OoO (or packet drop). Update the self-tests
accordingly

Signed-off-by: Paolo Abeni <[email protected]>
Signed-off-by: Mat Martineau <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
3 years agomptcp: do not reset MP_CAPABLE subflow on mapping errors
Paolo Abeni [Thu, 27 May 2021 23:31:39 +0000 (16:31 -0700)]
mptcp: do not reset MP_CAPABLE subflow on mapping errors

When some mapping related errors occurs we close the main
MPC subflow with a RST. We should instead fallback gracefully
to TCP, and do the reset only for MPJ subflows.

Fixes: d22f4988ffec ("mptcp: process MP_CAPABLE data option")
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/192
Reported-by: Matthieu Baerts <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
Signed-off-by: Mat Martineau <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
3 years agomptcp: always parse mptcp options for MPC reqsk
Paolo Abeni [Thu, 27 May 2021 23:31:38 +0000 (16:31 -0700)]
mptcp: always parse mptcp options for MPC reqsk

In subflow_syn_recv_sock() we currently skip options parsing
for OoO packet, given that such packets may not carry the relevant
MPC option.

If the peer generates an MPC+data TSO packet and some of the early
segments are lost or get reorder, we server will ignore the peer key,
causing transient, unexpected fallback to TCP.

The solution is always parsing the incoming MPTCP options, and
do the fallback only for in-order packets. This actually cleans
the existing code a bit.

Fixes: d22f4988ffec ("mptcp: process MP_CAPABLE data option")
Reported-by: Matthieu Baerts <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
Signed-off-by: Mat Martineau <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
3 years agomptcp: fix sk_forward_memory corruption on retransmission
Paolo Abeni [Thu, 27 May 2021 23:31:37 +0000 (16:31 -0700)]
mptcp: fix sk_forward_memory corruption on retransmission

MPTCP sk_forward_memory handling is a bit special, as such field
is protected by the msk socket spin_lock, instead of the plain
socket lock.

Currently we have a code path updating such field without handling
the relevant lock:

__mptcp_retrans() -> __mptcp_clean_una_wakeup()

Several helpers in __mptcp_clean_una_wakeup() will update
sk_forward_alloc, possibly causing such field corruption, as reported
by Matthieu.

Address the issue providing and using a new variant of blamed function
which explicitly acquires the msk spin lock.

Fixes: 64b9cea7a0af ("mptcp: fix spurious retransmissions")
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/172
Reported-by: Matthieu Baerts <[email protected]>
Tested-by: Matthieu Baerts <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
Signed-off-by: Mat Martineau <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
3 years agoMerge tag 'nfs-for-5.13-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Linus Torvalds [Fri, 28 May 2021 18:53:19 +0000 (08:53 -1000)]
Merge tag 'nfs-for-5.13-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client bugfixes from Trond Myklebust:
"Stable fixes:
   - Fix v4.0/v4.1 SEEK_DATA return -ENOTSUPP when set NFS_V4_2 config
   - Fix Oops in xs_tcp_send_request() when transport is disconnected
   - Fix a NULL pointer dereference in pnfs_mark_matching_lsegs_return()

  Bugfixes:
   - Fix instances where signal_pending() should be fatal_signal_pending()
   - fix an incorrect limit in filelayout_decode_layout()
   - Fixes for the SUNRPC backlogged RPC queue
   - Don't corrupt the value of pg_bytes_written in nfs_do_recoalesce()
   - Revert commit 586a0787ce35 ("Clean up rpcrdma_prepare_readch()")"

* tag 'nfs-for-5.13-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  nfs: Remove trailing semicolon in macros
  xprtrdma: Revert 586a0787ce35
  NFSv4: Fix v4.0/v4.1 SEEK_DATA return -ENOTSUPP when set NFS_V4_2 config
  NFS: Clean up reset of the mirror accounting variables
  NFS: Don't corrupt the value of pg_bytes_written in nfs_do_recoalesce()
  NFS: Fix an Oopsable condition in __nfs_pageio_add_request()
  SUNRPC: More fixes for backlog congestion
  SUNRPC: Fix Oops in xs_tcp_send_request() when transport is disconnected
  NFSv4: Fix a NULL pointer dereference in pnfs_mark_matching_lsegs_return()
  SUNRPC in case of backlog, hand free slots directly to waiting task
  pNFS/NFSv4: Remove redundant initialization of 'rd_size'
  NFS: fix an incorrect limit in filelayout_decode_layout()
  fs/nfs: Use fatal_signal_pending instead of signal_pending

3 years agoMerge tag 'sound-5.13-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai...
Linus Torvalds [Fri, 28 May 2021 18:47:50 +0000 (08:47 -1000)]
Merge tag 'sound-5.13-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
 "A slightly high volume at this time due to pending ASoC fixes.

  While there are a few generic simple-card fixes for regressions, most
  of the changes are device-specific fixes: ASoC Intel SOF, codec
  clocks, other codec / platform fixes as well as usual HD-audio and
  USB-audio"

* tag 'sound-5.13-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (37 commits)
  ALSA: hda/realtek: fix mute/micmute LEDs and speaker for HP Zbook Fury 17 G8
  ALSA: hda/realtek: fix mute/micmute LEDs and speaker for HP Zbook Fury 15 G8
  ALSA: hda/realtek: fix mute/micmute LEDs and speaker for HP Zbook G8
  ALSA: hda/realtek: fix mute/micmute LEDs for HP 855 G8
  ALSA: hda/realtek: Chain in pop reduction fixup for ThinkStation P340
  ALSA: usb-audio: scarlett2: snd_scarlett_gen2_controls_create() can be static
  ALSA: hda/realtek: the bass speaker can't output sound on Yoga 9i
  ALSA: hda/realtek: Headphone volume is controlled by Front mixer
  ALSA: usb-audio: scarlett2: Improve driver startup messages
  ALSA: usb-audio: scarlett2: Fix device hang with ehci-pci
  ALSA: usb-audio: fix control-request direction
  ASoC: qcom: lpass-cpu: Use optional clk APIs
  ASoC: cs35l33: fix an error code in probe()
  ASoC: SOF: Intel: hda: don't send DAI_CONFIG IPC for older firmware
  ASoC: fsl: fix SND_SOC_IMX_RPMSG dependency
  ASoC: cs42l52: Minor tidy up of error paths
  ASoC: cs35l32: Add missing regmap use_single config
  ASoC: cs35l34: Add missing regmap use_single config
  ASoC: cs42l73: Add missing regmap use_single config
  ASoC: cs53l30: Add missing regmap use_single config
  ...

3 years agoMerge tag 'clang-features-v5.13-rc4' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Fri, 28 May 2021 18:31:48 +0000 (08:31 -1000)]
Merge tag 'clang-features-v5.13-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull clang feature fixes from Kees Cook:

 - Correctly pass stack frame size checking under LTO (Nick Desaulniers)

 - Avoid CFI mismatches by checking initcall_t types (Marco Elver)

* tag 'clang-features-v5.13-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  Makefile: LTO: have linker check -Wframe-larger-than
  init: verify that function is initcall_t at compile-time

3 years agoMerge tag 'mips-fixes_5.13_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips...
Linus Torvalds [Fri, 28 May 2021 18:24:13 +0000 (08:24 -1000)]
Merge tag 'mips-fixes_5.13_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux

Pull MIPS fixes from Thomas Bogendoerfer:

 - fix function/preempt trace hangs

 - a few build fixes

* tag 'mips-fixes_5.13_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
  MIPS: Fix kernel hang under FUNCTION_GRAPH_TRACER and PREEMPT_TRACER
  MIPS: ralink: export rt_sysc_membase for rt2880_wdt.c
  MIPS: launch.h: add include guard to prevent build errors
  MIPS: alchemy: xxs1500: add gpio-au1000.h header file

3 years agoMerge tag 'kvmarm-fixes-5.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git...
Paolo Bonzini [Fri, 28 May 2021 17:02:03 +0000 (13:02 -0400)]
Merge tag 'kvmarm-fixes-5.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/arm64 fixes for 5.13, take #2

- Another state update on exit to userspace fix
- Prevent the creation of mixed 32/64 VMs

3 years agoKVM: X86: Kill off ctxt->ud
Wanpeng Li [Fri, 28 May 2021 00:01:37 +0000 (17:01 -0700)]
KVM: X86: Kill off ctxt->ud

ctxt->ud is consumed only by x86_decode_insn(), we can kill it off by
passing emulation_type to x86_decode_insn() and dropping ctxt->ud
altogether. Tracking that info in ctxt for literally one call is silly.

Suggested-by: Sean Christopherson <[email protected]>
Signed-off-by: Wanpeng Li <[email protected]>
Reviewed-by: Sean Christopherson <[email protected]>
Message-Id: <1622160097[email protected]>

3 years agoKVM: X86: Fix warning caused by stale emulation context
Wanpeng Li [Fri, 28 May 2021 00:01:36 +0000 (17:01 -0700)]
KVM: X86: Fix warning caused by stale emulation context

Reported by syzkaller:

  WARNING: CPU: 7 PID: 10526 at linux/arch/x86/kvm//x86.c:7621 x86_emulate_instruction+0x41b/0x510 [kvm]
  RIP: 0010:x86_emulate_instruction+0x41b/0x510 [kvm]
  Call Trace:
   kvm_mmu_page_fault+0x126/0x8f0 [kvm]
   vmx_handle_exit+0x11e/0x680 [kvm_intel]
   vcpu_enter_guest+0xd95/0x1b40 [kvm]
   kvm_arch_vcpu_ioctl_run+0x377/0x6a0 [kvm]
   kvm_vcpu_ioctl+0x389/0x630 [kvm]
   __x64_sys_ioctl+0x8e/0xd0
   do_syscall_64+0x3c/0xb0
   entry_SYSCALL_64_after_hwframe+0x44/0xae

Commit 4a1e10d5b5d8 ("KVM: x86: handle hardware breakpoints during emulation())
adds hardware breakpoints check before emulation the instruction and parts of
emulation context initialization, actually we don't have the EMULTYPE_NO_DECODE flag
here and the emulation context will not be reused. Commit c8848cee74ff ("KVM: x86:
set ctxt->have_exception in x86_decode_insn()) triggers the warning because it
catches the stale emulation context has #UD, however, it is not during instruction
decoding which should result in EMULATION_FAILED. This patch fixes it by moving
the second part emulation context initialization into init_emulate_ctxt() and
before hardware breakpoints check. The ctxt->ud will be dropped by a follow-up
patch.

syzkaller source: https://syzkaller.appspot.com/x/repro.c?x=134683fdd00000

Reported-by: [email protected]
Fixes: 4a1e10d5b5d8 (KVM: x86: handle hardware breakpoints during emulation)
Signed-off-by: Wanpeng Li <[email protected]>
Reviewed-by: Sean Christopherson <[email protected]>
Message-Id: <1622160097[email protected]>

3 years agoKVM: X86: Use kvm_get_linear_rip() in single-step and #DB/#BP interception
Yuan Yao [Wed, 26 May 2021 06:38:28 +0000 (14:38 +0800)]
KVM: X86: Use kvm_get_linear_rip() in single-step and #DB/#BP interception

The kvm_get_linear_rip() handles x86/long mode cases well and has
better readability, __kvm_set_rflags() also use the paired
function kvm_is_linear_rip() to check the vcpu->arch.singlestep_rip
set in kvm_arch_vcpu_ioctl_set_guest_debug(), so change the
"CS.BASE + RIP" code in kvm_arch_vcpu_ioctl_set_guest_debug() and
handle_exception_nmi() to this one.

Signed-off-by: Yuan Yao <[email protected]>
Message-Id: <20210526063828[email protected]>
Reviewed-by: Sean Christopherson <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
3 years agoDocumentation: seccomp: Fix user notification documentation
Sargun Dhillon [Mon, 17 May 2021 19:39:05 +0000 (12:39 -0700)]
Documentation: seccomp: Fix user notification documentation

The documentation had some previously incorrect information about how
userspace notifications (and responses) were handled due to a change
from a previously proposed patchset.

Signed-off-by: Sargun Dhillon <[email protected]>
Acked-by: Tycho Andersen <[email protected]>
Acked-by: Christian Brauner <[email protected]>
Signed-off-by: Kees Cook <[email protected]>
Fixes: 6a21cc50f0c7 ("seccomp: add a return code to trap to userspace")
Cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]
3 years agoMAINTAINERS: adjust to removing i2c designware platform data
Lukas Bulwahn [Mon, 19 Apr 2021 06:18:09 +0000 (08:18 +0200)]
MAINTAINERS: adjust to removing i2c designware platform data

Commit 5a517b5bf687 ("i2c: designware: Get rid of legacy platform data")
removes ./include/linux/platform_data/i2c-designware.h, but misses to
adjust the SYNOPSYS DESIGNWARE I2C DRIVER section in MAINTAINERS.

Hence, ./scripts/get_maintainer.pl --self-test=patterns complains:

  warning: no file matches F: include/linux/platform_data/i2c-designware.h

Remove the file entry to this removed file as well.

Signed-off-by: Lukas Bulwahn <[email protected]>
Reviewed-by: Andy Shevchenko <[email protected]>
Signed-off-by: Wolfram Sang <[email protected]>
3 years agoKVM: PPC: Book3S HV: Save host FSCR in the P7/8 path
Nicholas Piggin [Wed, 26 May 2021 12:58:51 +0000 (22:58 +1000)]
KVM: PPC: Book3S HV: Save host FSCR in the P7/8 path

Similar to commit 25edcc50d76c ("KVM: PPC: Book3S HV: Save and restore
FSCR in the P9 path"), ensure the P7/8 path saves and restores the host
FSCR. The logic explained in that patch actually applies there to the
old path well: a context switch can be made before kvmppc_vcpu_run_hv
restores the host FSCR and returns.

Now both the p9 and the p7/8 paths now save and restore their FSCR, it
no longer needs to be restored at the end of kvmppc_vcpu_run_hv

Fixes: b005255e12a3 ("KVM: PPC: Book3S HV: Context-switch new POWER8 SPRs")
Cc: [email protected] # v3.14+
Signed-off-by: Nicholas Piggin <[email protected]>
Reviewed-by: Fabiano Rosas <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
3 years agopowerpc: Fix reverse map real-mode address lookup with huge vmalloc
Nicholas Piggin [Wed, 26 May 2021 12:00:05 +0000 (22:00 +1000)]
powerpc: Fix reverse map real-mode address lookup with huge vmalloc

real_vmalloc_addr() does not currently work for huge vmalloc, which is
what the reverse map can be allocated with for radix host, hash guest.

Extract the hugepage aware equivalent from eeh code into a helper, and
convert existing sites including this one to use it.

Fixes: 8abddd968a30 ("powerpc/64s/radix: Enable huge vmalloc mappings")
Signed-off-by: Nicholas Piggin <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
3 years agoperf vendor events powerpc: Fix eventcode of power10 JSON events
Kajol Jain [Tue, 25 May 2021 06:37:23 +0000 (12:07 +0530)]
perf vendor events powerpc: Fix eventcode of power10 JSON events

Fixed the eventcode values in the power10 JSON event files to prepend
"0x" since these are hexadecimal values.

The patch also changes the event description of the PM_EXEC_STALL_LOAD_FINISH
and PM_EXEC_STALL_NTC_FLUSH event and move some events to correct files.

Fixes: 32daa5d7899e ("perf vendor events: Initial JSON/events list for power10 platform")
Signed-off-by: Kajol Jain <[email protected]>
Reviewed-by: Paul A. Clarke <[email protected]>
Tested-by: Nageswara R Sastry <[email protected]>
Cc: Athira Jajeev <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Madhavan Srinivasan <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Ravi Bangoria <[email protected]>
Cc: [email protected]
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
3 years agopowerpc/kprobes: Fix validation of prefixed instructions across page boundary
Naveen N. Rao [Wed, 19 May 2021 10:47:17 +0000 (16:17 +0530)]
powerpc/kprobes: Fix validation of prefixed instructions across page boundary

When checking if the probed instruction is the suffix of a prefixed
instruction, we access the instruction at the previous word. If the
probed instruction is the very first word of a module, we can end up
trying to access an invalid page.

Fix this by skipping the check for all instructions at the beginning of
a page. Prefixed instructions cannot cross a 64-byte boundary and as
such, we don't expect to encounter a suffix as the very first word in a
page for kernel text. Even if there are prefixed instructions crossing
a page boundary (from a module, for instance), the instruction will be
illegal, so preventing probing on the suffix of such prefix instructions
isn't worthwhile.

Fixes: b4657f7650ba ("powerpc/kprobes: Don't allow breakpoints on suffixes")
Cc: [email protected] # v5.8+
Reported-by: Christophe Leroy <[email protected]>
Signed-off-by: Naveen N. Rao <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/0df9a032a05576a2fa8e97d1b769af2ff0eafbd6.1621416666.git.naveen.n.rao@linux.vnet.ibm.com
3 years agoRevert "serial: 8250: 8250_omap: Fix possible interrupt storm"
Greg Kroah-Hartman [Fri, 28 May 2021 08:58:49 +0000 (10:58 +0200)]
Revert "serial: 8250: 8250_omap: Fix possible interrupt storm"

This reverts commit 31fae7c8b18c3f8029a2a5dce97a3182c1a167a0.

Tony writes:
I just noticed this causes the following regression in Linux
next when pressing a key on uart console after boot at least on
omap3. This seems to happen on serial_port_in(port, UART_RX) in
the quirk handling.

So let's drop this.

Cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]
Fixes: 31fae7c8b18c ("serial: 8250: 8250_omap: Fix possible interrupt storm")
Reported-by: Tony Lindgren <[email protected]>
Cc: Jan Kiszka <[email protected]>
Cc: Vignesh Raghavendra <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
3 years agoi2c: s3c2410: fix possible NULL pointer deref on read message after write
Krzysztof Kozlowski [Wed, 26 May 2021 12:39:37 +0000 (08:39 -0400)]
i2c: s3c2410: fix possible NULL pointer deref on read message after write

Interrupt handler processes multiple message write requests one after
another, till the driver message queue is drained.  However if driver
encounters a read message without preceding START, it stops the I2C
transfer as it is an invalid condition for the controller.  At least the
comment describes a requirement "the controller forces us to send a new
START when we change direction".  This stop results in clearing the
message queue (i2c->msg = NULL).

The code however immediately jumped back to label "retry_write" which
dereferenced the "i2c->msg" making it a possible NULL pointer
dereference.

The Coverity analysis:
1. Condition !is_msgend(i2c), taking false branch.
   if (!is_msgend(i2c)) {

2. Condition !is_lastmsg(i2c), taking true branch.
   } else if (!is_lastmsg(i2c)) {

3. Condition i2c->msg->flags & 1, taking true branch.
   if (i2c->msg->flags & I2C_M_RD) {

4. write_zero_model: Passing i2c to s3c24xx_i2c_stop, which sets i2c->msg to NULL.
   s3c24xx_i2c_stop(i2c, -EINVAL);

5. Jumping to label retry_write.
   goto retry_write;

6. var_deref_model: Passing i2c to is_msgend, which dereferences null i2c->msg.
   if (!is_msgend(i2c)) {"

All previous calls to s3c24xx_i2c_stop() in this interrupt service
routine are followed by jumping to end of function (acknowledging
the interrupt and returning).  This seems a reasonable choice also here
since message buffer was entirely emptied.

Addresses-Coverity: Explicit null dereferenced
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Krzysztof Kozlowski <[email protected]>
Signed-off-by: Wolfram Sang <[email protected]>
3 years agoi2c: mediatek: Disable i2c start_en and clear intr_stat brfore reset
Qii Wang [Thu, 27 May 2021 12:04:04 +0000 (20:04 +0800)]
i2c: mediatek: Disable i2c start_en and clear intr_stat brfore reset

The i2c controller driver do dma reset after transfer timeout,
but sometimes dma reset will trigger an unexpected DMA_ERR irq.
It will cause the i2c controller to continuously send interrupts
to the system and cause soft lock-up. So we need to disable i2c
start_en and clear intr_stat to stop i2c controller before dma
reset when transfer timeout.

Fixes: aafced673c06("i2c: mediatek: move dma reset before i2c reset")
Signed-off-by: Qii Wang <[email protected]>
Signed-off-by: Wolfram Sang <[email protected]>
3 years agoMerge tag 'drm-intel-fixes-2021-05-27' of ssh://git.freedesktop.org/git/drm/drm-intel...
Dave Airlie [Fri, 28 May 2021 03:28:18 +0000 (13:28 +1000)]
Merge tag 'drm-intel-fixes-2021-05-27' of ssh://git.freedesktop.org/git/drm/drm-intel into drm-fixes

drm/i915 fixes for v5.13-rc4:
- Re-enable LTTPR non-transparent LT mode for DPCD_REV<1.4

Signed-off-by: Dave Airlie <[email protected]>
From: Jani Nikula <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
3 years agoMerge tag 'drm-misc-fixes-2021-05-27' of ssh://git.freedesktop.org/git/drm/drm-misc...
Dave Airlie [Fri, 28 May 2021 03:24:25 +0000 (13:24 +1000)]
Merge tag 'drm-misc-fixes-2021-05-27' of ssh://git.freedesktop.org/git/drm/drm-misc into drm-fixes

A fix in meson for a crash at shutdown and one for TTM to prevent
irrelevant swapout

Signed-off-by: Dave Airlie <[email protected]>
From: Maxime Ripard <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/20210527120828.3w7f53krzkslc4ii@gilmour
3 years agoperf stat: Fix error check for bpf_program__attach
Namhyung Kim [Thu, 27 May 2021 22:00:52 +0000 (15:00 -0700)]
perf stat: Fix error check for bpf_program__attach

It seems the bpf_program__attach() returns a negative error code instead
of a NULL pointer in case of error.

Fixes: 7fac83aaf2ee ("perf stat: Introduce 'bperf' to share hardware PMCs with BPF")
Signed-off-by: Namhyung Kim <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Song Liu <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
3 years agoMerge tag 'amd-drm-fixes-5.13-2021-05-26' of https://gitlab.freedesktop.org/agd5f...
Dave Airlie [Thu, 27 May 2021 23:18:04 +0000 (09:18 +1000)]
Merge tag 'amd-drm-fixes-5.13-2021-05-26' of https://gitlab.freedesktop.org/agd5f/linux into drm-fixes

amd-drm-fixes-5.13-2021-05-26:

amdgpu:
- MultiGPU fan fix
- VCN powergating fixes

amdkfd:
- Fix SDMA register offset error

Signed-off-by: Dave Airlie <[email protected]>
From: Alex Deucher <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
3 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf
Jakub Kicinski [Thu, 27 May 2021 22:54:11 +0000 (15:54 -0700)]
Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf

Pablo Neira Ayuso says:

====================
Netfilter/IPVS fixes for net

The following patchset contains Netfilter/IPVS fixes for net:

1) Fix incorrect sockopts unregistration from error path,
   from Florian Westphal.

2) A few patches to provide better error reporting when missing kernel
   netfilter options are missing in .config.

3) Fix dormant table flag updates.

4) Memleak in IPVS  when adding service with IP_VS_SVC_F_HASHED flag.

* git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf:
  ipvs: ignore IP_VS_SVC_F_HASHED flag when adding service
  netfilter: nf_tables: fix table flag updates
  netfilter: nf_tables: extended netlink error reporting for chain type
  netfilter: nf_tables: missing error reporting for not selected expressions
  netfilter: conntrack: unregister ipv4 sockopts on error unwind
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
3 years agoMerge branch 'for-5.13-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/dennis...
Linus Torvalds [Thu, 27 May 2021 22:01:26 +0000 (12:01 -1000)]
Merge branch 'for-5.13-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/dennis/percpu

Pull percpu fixes from Dennis Zhou:
 "This contains a cleanup to lib/percpu-refcount.c and an update to the
  MAINTAINERS file to more formally take over support for lib/percpu*"

* 'for-5.13-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/dennis/percpu:
  MAINTAINERS: Add lib/percpu* as part of percpu entry
  percpu_ref: Don't opencode percpu_ref_is_dying

3 years agoMerge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Linus Torvalds [Thu, 27 May 2021 21:58:26 +0000 (11:58 -1000)]
Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 fixes from Catalin Marinas:

 - Don't use contiguous or block mappings for the linear map when KFENCE
   is enabled.

 - Fix link in the arch_counter_enforce_ordering() comment.

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm64: mm: don't use CON and BLK mapping if KFENCE is enabled
  arm64: Fix stale link in the arch_counter_enforce_ordering() comment

3 years agoMerge tag 'for-5.13/dm-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Thu, 27 May 2021 21:54:36 +0000 (11:54 -1000)]
Merge tag 'for-5.13/dm-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm

Pull device mapper fixes from Mike Snitzer:

 - Fix DM verity target's 'require_signatures' module_param permissions.

 - Revert DM snapshot fix from v5.13-rc3 and then properly fix crash
   when an origin has no snapshots. This allows only the proper fix to
   go to stable@ (since the original fix was successfully dropped).

* tag 'for-5.13/dm-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
  dm snapshot: properly fix a crash when an origin has no snapshots
  dm snapshot: revert "fix a crash when an origin has no snapshots"
  dm verity: fix require_signatures module_param permissions

3 years agonet/sched: act_ct: Fix ct template allocation for zone 0
Ariel Levkovich [Wed, 26 May 2021 17:01:10 +0000 (20:01 +0300)]
net/sched: act_ct: Fix ct template allocation for zone 0

Fix current behavior of skipping template allocation in case the
ct action is in zone 0.

Skipping the allocation may cause the datapath ct code to ignore the
entire ct action with all its attributes (commit, nat) in case the ct
action in zone 0 was preceded by a ct clear action.

The ct clear action sets the ct_state to untracked and resets the
skb->_nfct pointer. Under these conditions and without an allocated
ct template, the skb->_nfct pointer will remain NULL which will
cause the tc ct action handler to exit without handling commit and nat
actions, if such exist.

For example, the following rule in OVS dp:
recirc_id(0x2),ct_state(+new-est-rel-rpl+trk),ct_label(0/0x1), \
in_port(eth0),actions:ct_clear,ct(commit,nat(src=10.11.0.12)), \
recirc(0x37a)

Will result in act_ct skipping the commit and nat actions in zone 0.

The change removes the skipping of template allocation for zone 0 and
treats it the same as any other zone.

Fixes: b57dc7c13ea9 ("net/sched: Introduce action ct")
Signed-off-by: Ariel Levkovich <[email protected]>
Acked-by: Marcelo Ricardo Leitner <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
3 years agonet/sched: act_ct: Offload connections with commit action
Paul Blakey [Wed, 26 May 2021 11:44:09 +0000 (14:44 +0300)]
net/sched: act_ct: Offload connections with commit action

Currently established connections are not offloaded if the filter has a
"ct commit" action. This behavior will not offload connections of the
following scenario:

$ tc_filter add dev $DEV ingress protocol ip prio 1 flower \
  ct_state -trk \
  action ct commit action goto chain 1

$ tc_filter add dev $DEV ingress protocol ip chain 1 prio 1 flower \
  action mirred egress redirect dev $DEV2

$ tc_filter add dev $DEV2 ingress protocol ip prio 1 flower \
  action ct commit action goto chain 1

$ tc_filter add dev $DEV2 ingress protocol ip prio 1 chain 1 flower \
  ct_state +trk+est \
  action mirred egress redirect dev $DEV

Offload established connections, regardless of the commit flag.

Fixes: 46475bb20f4b ("net/sched: act_ct: Software offload of established flows")
Reviewed-by: Oz Shlomo <[email protected]>
Reviewed-by: Jiri Pirko <[email protected]>
Acked-by: Marcelo Ricardo Leitner <[email protected]>
Signed-off-by: Paul Blakey <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
3 years agodevlink: Correct VIRTUAL port to not have phys_port attributes
Parav Pandit [Wed, 26 May 2021 20:00:27 +0000 (23:00 +0300)]
devlink: Correct VIRTUAL port to not have phys_port attributes

Physical port name, port number attributes do not belong to virtual port
flavour. When VF or SF virtual ports are registered they incorrectly
append "np0" string in the netdevice name of the VF/SF.

Before this fix, VF netdevice name were ens2f0np0v0, ens2f0np0v1 for VF
0 and 1 respectively.

After the fix, they are ens2f0v0, ens2f0v1.

With this fix, reading /sys/class/net/ens2f0v0/phys_port_name returns
-EOPNOTSUPP.

Also devlink port show example for 2 VFs on one PF to ensure that any
physical port attributes are not exposed.

$ devlink port show
pci/0000:06:00.0/65535: type eth netdev ens2f0np0 flavour physical port 0 splittable false
pci/0000:06:00.3/196608: type eth netdev ens2f0v0 flavour virtual splittable false
pci/0000:06:00.4/262144: type eth netdev ens2f0v1 flavour virtual splittable false

This change introduces a netdevice name change on systemd/udev
version 245 and higher which honors phys_port_name sysfs file for
generation of netdevice name.

This also aligns to phys_port_name usage which is limited to switchdev
ports as described in [1].

[1] https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/tree/Documentation/networking/switchdev.rst

Fixes: acf1ee44ca5d ("devlink: Introduce devlink port flavour virtual")
Signed-off-by: Parav Pandit <[email protected]>
Reviewed-by: Jiri Pirko <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
3 years agobtrfs: fix deadlock when cloning inline extents and low on available space
Filipe Manana [Tue, 25 May 2021 10:05:28 +0000 (11:05 +0100)]
btrfs: fix deadlock when cloning inline extents and low on available space

There are a few cases where cloning an inline extent requires copying data
into a page of the destination inode. For these cases we are allocating
the required data and metadata space while holding a leaf locked. This can
result in a deadlock when we are low on available space because allocating
the space may flush delalloc and two deadlock scenarios can happen:

1) When starting writeback for an inode with a very small dirty range that
   fits in an inline extent, we deadlock during the writeback when trying
   to insert the inline extent, at cow_file_range_inline(), if the extent
   is going to be located in the leaf for which we are already holding a
   read lock;

2) After successfully starting writeback, for non-inline extent cases,
   the async reclaim thread will hang waiting for an ordered extent to
   complete if the ordered extent completion needs to modify the leaf
   for which the clone task is holding a read lock (for adding or
   replacing file extent items). So the cloning task will wait forever
   on the async reclaim thread to make progress, which in turn is
   waiting for the ordered extent completion which in turn is waiting
   to acquire a write lock on the same leaf.

So fix this by making sure we release the path (and therefore the leaf)
every time we need to copy the inline extent's data into a page of the
destination inode, as by that time we do not need to have the leaf locked.

Fixes: 05a5a7621ce66c ("Btrfs: implement full reflink support for inline extents")
CC: [email protected] # 5.10+
Signed-off-by: Filipe Manana <[email protected]>
Signed-off-by: David Sterba <[email protected]>
3 years agobtrfs: fix fsync failure and transaction abort after writes to prealloc extents
Filipe Manana [Mon, 24 May 2021 10:35:53 +0000 (11:35 +0100)]
btrfs: fix fsync failure and transaction abort after writes to prealloc extents

When doing a series of partial writes to different ranges of preallocated
extents with transaction commits and fsyncs in between, we can end up with
a checksum items in a log tree. This causes an fsync to fail with -EIO and
abort the transaction, turning the filesystem to RO mode, when syncing the
log.

For this to happen, we need to have a full fsync of a file following one
or more fast fsyncs.

The following example reproduces the problem and explains how it happens:

  $ mkfs.btrfs -f /dev/sdc
  $ mount /dev/sdc /mnt

  # Create our test file with 2 preallocated extents. Leave a 1M hole
  # between them to ensure that we get two file extent items that will
  # never be merged into a single one. The extents are contiguous on disk,
  # which will later result in the checksums for their data to be merged
  # into a single checksum item in the csums btree.
  #
  $ xfs_io -f \
           -c "falloc 0 1M" \
           -c "falloc 3M 3M" \
           /mnt/foobar

  # Now write to the second extent and leave only 1M of it as unwritten,
  # which corresponds to the file range [4M, 5M[.
  #
  # Then fsync the file to flush delalloc and to clear full sync flag from
  # the inode, so that a future fsync will use the fast code path.
  #
  # After the writeback triggered by the fsync we have 3 file extent items
  # that point to the second extent we previously allocated:
  #
  # 1) One file extent item of type BTRFS_FILE_EXTENT_REG that covers the
  #    file range [3M, 4M[
  #
  # 2) One file extent item of type BTRFS_FILE_EXTENT_PREALLOC that covers
  #    the file range [4M, 5M[
  #
  # 3) One file extent item of type BTRFS_FILE_EXTENT_REG that covers the
  #    file range [5M, 6M[
  #
  # All these file extent items have a generation of 6, which is the ID of
  # the transaction where they were created. The split of the original file
  # extent item is done at btrfs_mark_extent_written() when ordered extents
  # complete for the file ranges [3M, 4M[ and [5M, 6M[.
  #
  $ xfs_io -c "pwrite -S 0xab 3M 1M" \
           -c "pwrite -S 0xef 5M 1M" \
           -c "fsync" \
           /mnt/foobar

  # Commit the current transaction. This wipes out the log tree created by
  # the previous fsync.
  sync

  # Now write to the unwritten range of the second extent we allocated,
  # corresponding to the file range [4M, 5M[, and fsync the file, which
  # triggers the fast fsync code path.
  #
  # The fast fsync code path sees that there is a new extent map covering
  # the file range [4M, 5M[ and therefore it will log a checksum item
  # covering the range [1M, 2M[ of the second extent we allocated.
  #
  # Also, after the fsync finishes we no longer have the 3 file extent
  # items that pointed to 3 sections of the second extent we allocated.
  # Instead we end up with a single file extent item pointing to the whole
  # extent, with a type of BTRFS_FILE_EXTENT_REG and a generation of 7 (the
  # current transaction ID). This is due to the file extent item merging we
  # do when completing ordered extents into ranges that point to unwritten
  # (preallocated) extents. This merging is done at
  # btrfs_mark_extent_written().
  #
  $ xfs_io -c "pwrite -S 0xcd 4M 1M" \
           -c "fsync" \
           /mnt/foobar

  # Now do some write to our file outside the range of the second extent
  # that we allocated with fallocate() and truncate the file size from 6M
  # down to 5M.
  #
  # The truncate operation sets the full sync runtime flag on the inode,
  # forcing the next fsync to use the slow code path. It also changes the
  # length of the second file extent item so that it represents the file
  # range [3M, 5M[ and not the range [3M, 6M[ anymore.
  #
  # Finally fsync the file. Since this is a fsync that triggers the slow
  # code path, it will remove all items associated to the inode from the
  # log tree and then it will scan for file extent items in the
  # fs/subvolume tree that have a generation matching the current
  # transaction ID, which is 7. This means it will log 2 file extent
  # items:
  #
  # 1) One for the first extent we allocated, covering the file range
  #    [0, 1M[
  #
  # 2) Another for the first 2M of the second extent we allocated,
  #    covering the file range [3M, 5M[
  #
  # When logging the first file extent item we log a single checksum item
  # that has all the checksums for the entire extent.
  #
  # When logging the second file extent item, we also lookup for the
  # checksums that are associated with the range [0, 2M[ of the second
  # extent we allocated (file range [3M, 5M[), and then we log them with
  # btrfs_csum_file_blocks(). However that results in ending up with a log
  # that has two checksum items with ranges that overlap:
  #
  # 1) One for the range [1M, 2M[ of the second extent we allocated,
  #    corresponding to the file range [4M, 5M[, which we logged in the
  #    previous fsync that used the fast code path;
  #
  # 2) One for the ranges [0, 1M[ and [0, 2M[ of the first and second
  #    extents, respectively, corresponding to the files ranges [0, 1M[
  #    and [3M, 5M[. This one was added during this last fsync that uses
  #    the slow code path and overlaps with the previous one logged by
  #    the previous fast fsync.
  #
  # This happens because when logging the checksums for the second
  # extent, we notice they start at an offset that matches the end of the
  # checksums item that we logged for the first extent, and because both
  # extents are contiguous on disk, btrfs_csum_file_blocks() decides to
  # extend that existing checksums item and append the checksums for the
  # second extent to this item. The end result is we end up with two
  # checksum items in the log tree that have overlapping ranges, as
  # listed before, resulting in the fsync to fail with -EIO and aborting
  # the transaction, turning the filesystem into RO mode.
  #
  $ xfs_io -c "pwrite -S 0xff 0 1M" \
           -c "truncate 5M" \
           -c "fsync" \
           /mnt/foobar
  fsync: Input/output error

After running the example, dmesg/syslog shows the tree checker complained
about the checksum items with overlapping ranges and we aborted the
transaction:

  $ dmesg
  (...)
  [756289.557487] BTRFS critical (device sdc): corrupt leaf: root=18446744073709551610 block=30720000 slot=5, csum end range (16777216) goes beyond the start range (15728640) of the next csum item
  [756289.560583] BTRFS info (device sdc): leaf 30720000 gen 7 total ptrs 7 free space 11677 owner 18446744073709551610
  [756289.562435] BTRFS info (device sdc): refs 2 lock_owner 0 current 2303929
  [756289.563654]  item 0 key (257 1 0) itemoff 16123 itemsize 160
  [756289.564649]  inode generation 6 size 5242880 mode 100600
  [756289.565636]  item 1 key (257 12 256) itemoff 16107 itemsize 16
  [756289.566694]  item 2 key (257 108 0) itemoff 16054 itemsize 53
  [756289.567725]  extent data disk bytenr 13631488 nr 1048576
  [756289.568697]  extent data offset 0 nr 1048576 ram 1048576
  [756289.569689]  item 3 key (257 108 1048576) itemoff 16001 itemsize 53
  [756289.570682]  extent data disk bytenr 0 nr 0
  [756289.571363]  extent data offset 0 nr 2097152 ram 2097152
  [756289.572213]  item 4 key (257 108 3145728) itemoff 15948 itemsize 53
  [756289.573246]  extent data disk bytenr 14680064 nr 3145728
  [756289.574121]  extent data offset 0 nr 2097152 ram 3145728
  [756289.574993]  item 5 key (18446744073709551606 128 13631488) itemoff 12876 itemsize 3072
  [756289.576113]  item 6 key (18446744073709551606 128 15728640) itemoff 11852 itemsize 1024
  [756289.577286] BTRFS error (device sdc): block=30720000 write time tree block corruption detected
  [756289.578644] ------------[ cut here ]------------
  [756289.579376] WARNING: CPU: 0 PID: 2303929 at fs/btrfs/disk-io.c:465 csum_one_extent_buffer+0xed/0x100 [btrfs]
  [756289.580857] Modules linked in: btrfs dm_zero dm_dust loop dm_snapshot (...)
  [756289.591534] CPU: 0 PID: 2303929 Comm: xfs_io Tainted: G        W         5.12.0-rc8-btrfs-next-87 #1
  [756289.592580] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
  [756289.594161] RIP: 0010:csum_one_extent_buffer+0xed/0x100 [btrfs]
  [756289.595122] Code: 5d c3 e8 76 60 (...)
  [756289.597509] RSP: 0018:ffffb51b416cb898 EFLAGS: 00010282
  [756289.598142] RAX: 0000000000000000 RBX: fffff02b8a365bc0 RCX: 0000000000000000
  [756289.598970] RDX: 0000000000000000 RSI: ffffffffa9112421 RDI: 00000000ffffffff
  [756289.599798] RBP: ffffa06500880000 R08: 0000000000000000 R09: 0000000000000000
  [756289.600619] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000
  [756289.601456] R13: ffffa0652b1d8980 R14: ffffa06500880000 R15: 0000000000000000
  [756289.602278] FS:  00007f08b23c9800(0000) GS:ffffa0682be00000(0000) knlGS:0000000000000000
  [756289.603217] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [756289.603892] CR2: 00005652f32d0138 CR3: 000000025d616003 CR4: 0000000000370ef0
  [756289.604725] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  [756289.605563] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  [756289.606400] Call Trace:
  [756289.606704]  btree_csum_one_bio+0x244/0x2b0 [btrfs]
  [756289.607313]  btrfs_submit_metadata_bio+0xb7/0x100 [btrfs]
  [756289.608040]  submit_one_bio+0x61/0x70 [btrfs]
  [756289.608587]  btree_write_cache_pages+0x587/0x610 [btrfs]
  [756289.609258]  ? free_debug_processing+0x1d5/0x240
  [756289.609812]  ? __module_address+0x28/0xf0
  [756289.610298]  ? lock_acquire+0x1a0/0x3e0
  [756289.610754]  ? lock_acquired+0x19f/0x430
  [756289.611220]  ? lock_acquire+0x1a0/0x3e0
  [756289.611675]  do_writepages+0x43/0xf0
  [756289.612101]  ? __filemap_fdatawrite_range+0xa4/0x100
  [756289.612800]  __filemap_fdatawrite_range+0xc5/0x100
  [756289.613393]  btrfs_write_marked_extents+0x68/0x160 [btrfs]
  [756289.614085]  btrfs_sync_log+0x21c/0xf20 [btrfs]
  [756289.614661]  ? finish_wait+0x90/0x90
  [756289.615096]  ? __mutex_unlock_slowpath+0x45/0x2a0
  [756289.615661]  ? btrfs_log_inode_parent+0x3c9/0xdc0 [btrfs]
  [756289.616338]  ? lock_acquire+0x1a0/0x3e0
  [756289.616801]  ? lock_acquired+0x19f/0x430
  [756289.617284]  ? lock_acquire+0x1a0/0x3e0
  [756289.617750]  ? lock_release+0x214/0x470
  [756289.618221]  ? lock_acquired+0x19f/0x430
  [756289.618704]  ? dput+0x20/0x4a0
  [756289.619079]  ? dput+0x20/0x4a0
  [756289.619452]  ? lockref_put_or_lock+0x9/0x30
  [756289.619969]  ? lock_release+0x214/0x470
  [756289.620445]  ? lock_release+0x214/0x470
  [756289.620924]  ? lock_release+0x214/0x470
  [756289.621415]  btrfs_sync_file+0x46a/0x5b0 [btrfs]
  [756289.621982]  do_fsync+0x38/0x70
  [756289.622395]  __x64_sys_fsync+0x10/0x20
  [756289.622907]  do_syscall_64+0x33/0x80
  [756289.623438]  entry_SYSCALL_64_after_hwframe+0x44/0xae
  [756289.624063] RIP: 0033:0x7f08b27fbb7b
  [756289.624588] Code: 0f 05 48 3d 00 (...)
  [756289.626760] RSP: 002b:00007ffe2583f940 EFLAGS: 00000293 ORIG_RAX: 000000000000004a
  [756289.627639] RAX: ffffffffffffffda RBX: 00005652f32cd0f0 RCX: 00007f08b27fbb7b
  [756289.628464] RDX: 00005652f32cbca0 RSI: 00005652f32cd110 RDI: 0000000000000003
  [756289.629323] RBP: 00005652f32cd110 R08: 0000000000000000 R09: 00007f08b28c4be0
  [756289.630172] R10: fffffffffffff39a R11: 0000000000000293 R12: 0000000000000001
  [756289.631007] R13: 00005652f32cd0f0 R14: 0000000000000001 R15: 00005652f32cc480
  [756289.631819] irq event stamp: 0
  [756289.632188] hardirqs last  enabled at (0): [<0000000000000000>] 0x0
  [756289.632911] hardirqs last disabled at (0): [<ffffffffa7e97c29>] copy_process+0x879/0x1cc0
  [756289.633893] softirqs last  enabled at (0): [<ffffffffa7e97c29>] copy_process+0x879/0x1cc0
  [756289.634871] softirqs last disabled at (0): [<0000000000000000>] 0x0
  [756289.635606] ---[ end trace 0a039fdc16ff3fef ]---
  [756289.636179] BTRFS: error (device sdc) in btrfs_sync_log:3136: errno=-5 IO failure
  [756289.637082] BTRFS info (device sdc): forced readonly

Having checksum items covering ranges that overlap is dangerous as in some
cases it can lead to having extent ranges for which we miss checksums
after log replay or getting the wrong checksum item. There were some fixes
in the past for bugs that resulted in this problem, and were explained and
fixed by the following commits:

  27b9a8122ff71a ("Btrfs: fix csum tree corruption, duplicate and outdated checksums")
  b84b8390d6009c ("Btrfs: fix file read corruption after extent cloning and fsync")
  40e046acbd2f36 ("Btrfs: fix missing data checksums after replaying a log tree")
  e289f03ea79bbc ("btrfs: fix corrupt log due to concurrent fsync of inodes with shared extents")

Fix the issue by making btrfs_csum_file_blocks() taking into account the
start offset of the next checksum item when it decides to extend an
existing checksum item, so that it never extends the checksum to end at a
range that goes beyond the start range of the next checksum item.

When we can not access the next checksum item without releasing the path,
simply drop the optimization of extending the previous checksum item and
fallback to inserting a new checksum item - this happens rarely and the
optimization is not significant enough for a log tree in order to justify
the extra complexity, as it would only save a few bytes (the size of a
struct btrfs_item) of leaf space.

This behaviour is only needed when inserting into a log tree because
for the regular checksums tree we never have a case where we try to
insert a range of checksums that overlap with a range that was previously
inserted.

A test case for fstests will follow soon.

Reported-by: Philipp Fent <[email protected]>
Link: https://lore.kernel.org/linux-btrfs/[email protected]/
CC: [email protected] # 5.4+
Tested-by: Anand Jain <[email protected]>
Signed-off-by: Filipe Manana <[email protected]>
Signed-off-by: David Sterba <[email protected]>
3 years agobtrfs: abort in rename_exchange if we fail to insert the second ref
Josef Bacik [Wed, 19 May 2021 18:04:21 +0000 (14:04 -0400)]
btrfs: abort in rename_exchange if we fail to insert the second ref

Error injection stress uncovered a problem where we'd leave a dangling
inode ref if we failed during a rename_exchange.  This happens because
we insert the inode ref for one side of the rename, and then for the
other side.  If this second inode ref insert fails we'll leave the first
one dangling and leave a corrupt file system behind.  Fix this by
aborting if we did the insert for the first inode ref.

CC: [email protected] # 4.9+
Signed-off-by: Josef Bacik <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>
3 years agobtrfs: check error value from btrfs_update_inode in tree log
Josef Bacik [Wed, 19 May 2021 15:26:25 +0000 (11:26 -0400)]
btrfs: check error value from btrfs_update_inode in tree log

Error injection testing uncovered a case where we ended up with invalid
link counts on an inode.  This happened because we failed to notice an
error when updating the inode while replaying the tree log, and
committed the transaction with an invalid file system.

Fix this by checking the return value of btrfs_update_inode.  This
resolved the link count errors I was seeing, and we already properly
handle passing up the error values in these paths.

CC: [email protected] # 4.4+
Reviewed-by: Johannes Thumshirn <[email protected]>
Reviewed-by: Qu Wenruo <[email protected]>
Signed-off-by: Josef Bacik <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>
3 years agobtrfs: fixup error handling in fixup_inode_link_counts
Josef Bacik [Wed, 19 May 2021 17:13:15 +0000 (13:13 -0400)]
btrfs: fixup error handling in fixup_inode_link_counts

This function has the following pattern

while (1) {
ret = whatever();
if (ret)
goto out;
}
ret = 0
out:
return ret;

However several places in this while loop we simply break; when there's
a problem, thus clearing the return value, and in one case we do a
return -EIO, and leak the memory for the path.

Fix this by re-arranging the loop to deal with ret == 1 coming from
btrfs_search_slot, and then simply delete the

ret = 0;
out:

bit so everybody can break if there is an error, which will allow for
proper error handling to occur.

CC: [email protected] # 4.4+
Signed-off-by: Josef Bacik <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>
3 years agobtrfs: mark ordered extent and inode with error if we fail to finish
Josef Bacik [Wed, 19 May 2021 13:38:27 +0000 (09:38 -0400)]
btrfs: mark ordered extent and inode with error if we fail to finish

While doing error injection testing I saw that sometimes we'd get an
abort that wouldn't stop the current transaction commit from completing.
This abort was coming from finish ordered IO, but at this point in the
transaction commit we should have gotten an error and stopped.

It turns out the abort came from finish ordered io while trying to write
out the free space cache.  It occurred to me that any failure inside of
finish_ordered_io isn't actually raised to the person doing the writing,
so we could have any number of failures in this path and think the
ordered extent completed successfully and the inode was fine.

Fix this by marking the ordered extent with BTRFS_ORDERED_IOERR, and
marking the mapping of the inode with mapping_set_error, so any callers
that simply call fdatawait will also get the error.

With this we're seeing the IO error on the free space inode when we fail
to do the finish_ordered_io.

CC: [email protected] # 4.19+
Signed-off-by: Josef Bacik <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>
3 years agobtrfs: return errors from btrfs_del_csums in cleanup_ref_head
Josef Bacik [Wed, 19 May 2021 14:52:46 +0000 (10:52 -0400)]
btrfs: return errors from btrfs_del_csums in cleanup_ref_head

We are unconditionally returning 0 in cleanup_ref_head, despite the fact
that btrfs_del_csums could fail.  We need to return the error so the
transaction gets aborted properly, fix this by returning ret from
btrfs_del_csums in cleanup_ref_head.

Reviewed-by: Qu Wenruo <[email protected]>
CC: [email protected] # 4.19+
Signed-off-by: Josef Bacik <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>
3 years agobtrfs: fix error handling in btrfs_del_csums
Josef Bacik [Wed, 19 May 2021 14:52:45 +0000 (10:52 -0400)]
btrfs: fix error handling in btrfs_del_csums

Error injection stress would sometimes fail with checksums on disk that
did not have a corresponding extent.  This occurred because the pattern
in btrfs_del_csums was

while (1) {
ret = btrfs_search_slot();
if (ret < 0)
break;
}
ret = 0;
out:
btrfs_free_path(path);
return ret;

If we got an error from btrfs_search_slot we'd clear the error because
we were breaking instead of goto out.  Instead of using goto out, simply
handle the cases where we may leave a random value in ret, and get rid
of the

ret = 0;
out:

pattern and simply allow break to have the proper error reporting.  With
this fix we properly abort the transaction and do not commit thinking we
successfully deleted the csum.

Reviewed-by: Qu Wenruo <[email protected]>
CC: [email protected] # 4.4+
Signed-off-by: Josef Bacik <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>
3 years agobtrfs: fix compressed writes that cross stripe boundary
Qu Wenruo [Tue, 25 May 2021 05:52:43 +0000 (13:52 +0800)]
btrfs: fix compressed writes that cross stripe boundary

[BUG]
When running btrfs/027 with "-o compress" mount option, it always
crashes with the following call trace:

  BTRFS critical (device dm-4): mapping failed logical 298901504 bio len 12288 len 8192
  ------------[ cut here ]------------
  kernel BUG at fs/btrfs/volumes.c:6651!
  invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
  CPU: 5 PID: 31089 Comm: kworker/u24:10 Tainted: G           OE     5.13.0-rc2-custom+ #26
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
  Workqueue: btrfs-delalloc btrfs_work_helper [btrfs]
  RIP: 0010:btrfs_map_bio.cold+0x58/0x5a [btrfs]
  Call Trace:
   btrfs_submit_compressed_write+0x2d7/0x470 [btrfs]
   submit_compressed_extents+0x3b0/0x470 [btrfs]
   ? mark_held_locks+0x49/0x70
   btrfs_work_helper+0x131/0x3e0 [btrfs]
   process_one_work+0x28f/0x5d0
   worker_thread+0x55/0x3c0
   ? process_one_work+0x5d0/0x5d0
   kthread+0x141/0x160
   ? __kthread_bind_mask+0x60/0x60
   ret_from_fork+0x22/0x30
  ---[ end trace 63113a3a91f34e68 ]---

[CAUSE]
The critical message before the crash means we have a bio at logical
bytenr 298901504 length 12288, but only 8192 bytes can fit into one
stripe, the remaining 4096 bytes go to another stripe.

In btrfs, all bios are properly split to avoid cross stripe boundary,
but commit 764c7c9a464b ("btrfs: zoned: fix parallel compressed writes")
changed the behavior for compressed writes.

Previously if we find our new page can't be fitted into current stripe,
ie. "submit == 1" case, we submit current bio without adding current
page.

       submit = btrfs_bio_fits_in_stripe(page, PAGE_SIZE, bio, 0);

   page->mapping = NULL;
   if (submit || bio_add_page(bio, page, PAGE_SIZE, 0) <
       PAGE_SIZE) {

But after the modification, we will add the page no matter if it crosses
stripe boundary, leading to the above crash.

       submit = btrfs_bio_fits_in_stripe(page, PAGE_SIZE, bio, 0);

   if (pg_index == 0 && use_append)
           len = bio_add_zone_append_page(bio, page, PAGE_SIZE, 0);
   else
           len = bio_add_page(bio, page, PAGE_SIZE, 0);

   page->mapping = NULL;
   if (submit || len < PAGE_SIZE) {

[FIX]
It's no longer possible to revert to the original code style as we have
two different bio_add_*_page() calls now.

The new fix is to skip the bio_add_*_page() call if @submit is true.

Also to avoid @len to be uninitialized, always initialize it to zero.

If @submit is true, @len will not be checked.
If @submit is not true, @len will be the return value of
bio_add_*_page() call.
Either way, the behavior is still the same as the old code.

Reported-by: Josef Bacik <[email protected]>
Fixes: 764c7c9a464b ("btrfs: zoned: fix parallel compressed writes")
Reviewed-by: Johannes Thumshirn <[email protected]>
Signed-off-by: Qu Wenruo <[email protected]>
Signed-off-by: David Sterba <[email protected]>
3 years agocifs: change format of CIFS_FULL_KEY_DUMP ioctl
Aurelien Aptel [Fri, 21 May 2021 15:19:28 +0000 (17:19 +0200)]
cifs: change format of CIFS_FULL_KEY_DUMP ioctl

Make CIFS_FULL_KEY_DUMP ioctl able to return variable-length keys.

* userspace needs to pass the struct size along with optional
  session_id and some space at the end to store keys
* if there is enough space kernel returns keys in the extra space and
  sets the length of each key via xyz_key_length fields

This also fixes the build error for get_user() on ARM.

Sample program:

#include <stdlib.h>
#include <stdio.h>
#include <stdint.h>
#include <sys/fcntl.h>
#include <sys/ioctl.h>

struct smb3_full_key_debug_info {
        uint32_t   in_size;
        uint64_t   session_id;
        uint16_t   cipher_type;
        uint8_t    session_key_length;
        uint8_t    server_in_key_length;
        uint8_t    server_out_key_length;
        uint8_t    data[];
        /*
         * return this struct with the keys appended at the end:
         * uint8_t session_key[session_key_length];
         * uint8_t server_in_key[server_in_key_length];
         * uint8_t server_out_key[server_out_key_length];
         */
} __attribute__((packed));

#define CIFS_IOCTL_MAGIC 0xCF
#define CIFS_DUMP_FULL_KEY _IOWR(CIFS_IOCTL_MAGIC, 10, struct smb3_full_key_debug_info)

void dump(const void *p, size_t len) {
        const char *hex = "0123456789ABCDEF";
        const uint8_t *b = p;
        for (int i = 0; i < len; i++)
                printf("%c%c ", hex[(b[i]>>4)&0xf], hex[b[i]&0xf]);
        putchar('\n');
}

int main(int argc, char **argv)
{
        struct smb3_full_key_debug_info *keys;
        uint8_t buf[sizeof(*keys)+1024] = {0};
        size_t off = 0;
        int fd, rc;

        keys = (struct smb3_full_key_debug_info *)&buf;
        keys->in_size = sizeof(buf);

        fd = open(argv[1], O_RDONLY);
        if (fd < 0)
                perror("open"), exit(1);

        rc = ioctl(fd, CIFS_DUMP_FULL_KEY, keys);
        if (rc < 0)
                perror("ioctl"), exit(1);

        printf("SessionId      ");
        dump(&keys->session_id, 8);
        printf("Cipher         %04x\n", keys->cipher_type);

        printf("SessionKey     ");
        dump(keys->data+off, keys->session_key_length);
        off += keys->session_key_length;

        printf("ServerIn Key   ");
        dump(keys->data+off, keys->server_in_key_length);
        off += keys->server_in_key_length;

        printf("ServerOut Key  ");
        dump(keys->data+off, keys->server_out_key_length);

        return 0;
}

Usage:

$ gcc -o dumpkeys dumpkeys.c

Against Windows Server 2020 preview (with AES-256-GCM support):

# mount.cifs //$ip/test /mnt -o "username=administrator,password=foo,vers=3.0,seal"
# ./dumpkeys /mnt/somefile
SessionId      0D 00 00 00 00 0C 00 00
Cipher         0002
SessionKey     AB CD CC 0D E4 15 05 0C 6F 3C 92 90 19 F3 0D 25
ServerIn Key   73 C6 6A C8 6B 08 CF A2 CB 8E A5 7D 10 D1 5B DC
ServerOut Key  6D 7E 2B A1 71 9D D7 2B 94 7B BA C4 F0 A5 A4 F8
# umount /mnt

With 256 bit keys:

# echo 1 > /sys/module/cifs/parameters/require_gcm_256
# mount.cifs //$ip/test /mnt -o "username=administrator,password=foo,vers=3.11,seal"
# ./dumpkeys /mnt/somefile
SessionId      09 00 00 00 00 0C 00 00
Cipher         0004
SessionKey     93 F5 82 3B 2F B7 2A 50 0B B9 BA 26 FB 8C 8B 03
ServerIn Key   6C 6A 89 B2 CB 7B 78 E8 04 93 37 DA 22 53 47 DF B3 2C 5F 02 26 70 43 DB 8D 33 7B DC 66 D3 75 A9
ServerOut Key  04 11 AA D7 52 C7 A8 0F ED E3 93 3A 65 FE 03 AD 3F 63 03 01 2B C0 1B D7 D7 E5 52 19 7F CC 46 B4

Signed-off-by: Aurelien Aptel <[email protected]>
Reviewed-by: Ronnie Sahlberg <[email protected]>
Signed-off-by: Steve French <[email protected]>
3 years agoi2c: i801: Don't generate an interrupt on bus reset
Jean Delvare [Tue, 25 May 2021 15:03:36 +0000 (17:03 +0200)]
i2c: i801: Don't generate an interrupt on bus reset

Now that the i2c-i801 driver supports interrupts, setting the KILL bit
in a attempt to recover from a timed out transaction triggers an
interrupt. Unfortunately, the interrupt handler (i801_isr) is not
prepared for this situation and will try to process the interrupt as
if it was signaling the end of a successful transaction. In the case
of a block transaction, this can result in an out-of-range memory
access.

This condition was reproduced several times by syzbot:
https://syzkaller.appspot.com/bug?extid=ed71512d469895b5b34e
https://syzkaller.appspot.com/bug?extid=8c8dedc0ba9e03f6c79e
https://syzkaller.appspot.com/bug?extid=c8ff0b6d6c73d81b610e
https://syzkaller.appspot.com/bug?extid=33f6c360821c399d69eb
https://syzkaller.appspot.com/bug?extid=be15dc0b1933f04b043a
https://syzkaller.appspot.com/bug?extid=b4d3fd1dfd53e90afd79

So disable interrupts while trying to reset the bus. Interrupts will
be enabled again for the following transaction.

Fixes: 636752bcb517 ("i2c-i801: Enable IRQ for SMBus transactions")
Reported-by: [email protected]
Signed-off-by: Jean Delvare <[email protected]>
Acked-by: Andy Shevchenko <[email protected]>
Cc: Jarkko Nikula <[email protected]>
Tested-by: Jarkko Nikula <[email protected]>
Signed-off-by: Wolfram Sang <[email protected]>
3 years agoi2c: mpc: implement erratum A-004447 workaround
Chris Packham [Tue, 11 May 2021 21:20:52 +0000 (09:20 +1200)]
i2c: mpc: implement erratum A-004447 workaround

The P2040/P2041 has an erratum where the normal i2c recovery mechanism
does not work. Implement the alternative recovery mechanism documented
in the P2040 Chip Errata Rev Q.

Signed-off-by: Chris Packham <[email protected]>
Signed-off-by: Wolfram Sang <[email protected]>
3 years agopowerpc/fsl: set fsl,i2c-erratum-a004447 flag for P1010 i2c controllers
Chris Packham [Tue, 11 May 2021 21:20:51 +0000 (09:20 +1200)]
powerpc/fsl: set fsl,i2c-erratum-a004447 flag for P1010 i2c controllers

The i2c controllers on the P1010 have an erratum where the documented
scheme for i2c bus recovery will not work (A-004447). A different
mechanism is needed which is documented in the P1010 Chip Errata Rev L.

Signed-off-by: Chris Packham <[email protected]>
Acked-by: Michael Ellerman <[email protected]>
Signed-off-by: Wolfram Sang <[email protected]>
3 years agopowerpc/fsl: set fsl,i2c-erratum-a004447 flag for P2041 i2c controllers
Chris Packham [Tue, 11 May 2021 21:20:50 +0000 (09:20 +1200)]
powerpc/fsl: set fsl,i2c-erratum-a004447 flag for P2041 i2c controllers

The i2c controllers on the P2040/P2041 have an erratum where the
documented scheme for i2c bus recovery will not work (A-004447). A
different mechanism is needed which is documented in the P2040 Chip
Errata Rev Q (latest available at the time of writing).

Signed-off-by: Chris Packham <[email protected]>
Acked-by: Michael Ellerman <[email protected]>
Signed-off-by: Wolfram Sang <[email protected]>
3 years agodt-bindings: i2c: mpc: Add fsl,i2c-erratum-a004447 flag
Chris Packham [Tue, 11 May 2021 21:20:49 +0000 (09:20 +1200)]
dt-bindings: i2c: mpc: Add fsl,i2c-erratum-a004447 flag

Document the fsl,i2c-erratum-a004447 flag which indicates the presence
of an i2c erratum on some QorIQ SoCs.

Signed-off-by: Chris Packham <[email protected]>
Acked-by: Rob Herring <[email protected]>
Signed-off-by: Wolfram Sang <[email protected]>
3 years agoi2c: busses: i2c-stm32f4: Remove incorrectly placed ' ' from function name
Lee Jones [Thu, 20 May 2021 19:01:03 +0000 (20:01 +0100)]
i2c: busses: i2c-stm32f4: Remove incorrectly placed ' ' from function name

Fixes the following W=1 kernel build warning(s):

 drivers/i2c/busses/i2c-stm32f4.c:321: warning: expecting prototype for stm32f4_i2c_write_ byte()(). Prototype was for stm32f4_i2c_write_byte() instead

Signed-off-by: Lee Jones <[email protected]>
Reviewed-by: Alain Volmat <[email protected]>
Signed-off-by: Wolfram Sang <[email protected]>
3 years agoi2c: busses: i2c-st: Fix copy/paste function misnaming issues
Lee Jones [Thu, 20 May 2021 19:01:02 +0000 (20:01 +0100)]
i2c: busses: i2c-st: Fix copy/paste function misnaming issues

Fixes the following W=1 kernel build warning(s):

 drivers/i2c/busses/i2c-st.c:531: warning: expecting prototype for st_i2c_handle_write(). Prototype was for st_i2c_handle_read() instead
 drivers/i2c/busses/i2c-st.c:566: warning: expecting prototype for st_i2c_isr(). Prototype was for st_i2c_isr_thread() instead

Fix the "enmpty" typo while here.

Signed-off-by: Lee Jones <[email protected]>
Reviewed-by: Alain Volmat <[email protected]>
Signed-off-by: Wolfram Sang <[email protected]>
3 years agoi2c: busses: i2c-pnx: Provide descriptions for 'alg_data' data structure
Lee Jones [Thu, 20 May 2021 19:01:00 +0000 (20:01 +0100)]
i2c: busses: i2c-pnx: Provide descriptions for 'alg_data' data structure

Fixes the following W=1 kernel build warning(s):

 drivers/i2c/busses/i2c-pnx.c:147: warning: Function parameter or member 'alg_data' not described in 'i2c_pnx_start'
 drivers/i2c/busses/i2c-pnx.c:147: warning: Excess function parameter 'adap' description in 'i2c_pnx_start'
 drivers/i2c/busses/i2c-pnx.c:202: warning: Function parameter or member 'alg_data' not described in 'i2c_pnx_stop'
 drivers/i2c/busses/i2c-pnx.c:202: warning: Excess function parameter 'adap' description in 'i2c_pnx_stop'
 drivers/i2c/busses/i2c-pnx.c:231: warning: Function parameter or member 'alg_data' not described in 'i2c_pnx_master_xmit'
 drivers/i2c/busses/i2c-pnx.c:231: warning: Excess function parameter 'adap' description in 'i2c_pnx_master_xmit'
 drivers/i2c/busses/i2c-pnx.c:301: warning: Function parameter or member 'alg_data' not described in 'i2c_pnx_master_rcv'
 drivers/i2c/busses/i2c-pnx.c:301: warning: Excess function parameter 'adap' description in 'i2c_pnx_master_rcv'

Signed-off-by: Lee Jones <[email protected]>
Acked-by: Vladimir Zapolskiy <[email protected]>
Signed-off-by: Wolfram Sang <[email protected]>
3 years agoi2c: busses: i2c-ocores: Place the expected function names into the documentation...
Lee Jones [Thu, 20 May 2021 19:00:59 +0000 (20:00 +0100)]
i2c: busses: i2c-ocores: Place the expected function names into the documentation headers

Fixes the following W=1 kernel build warning(s):

 drivers/i2c/busses/i2c-ocores.c:253: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
 drivers/i2c/busses/i2c-ocores.c:267: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
 drivers/i2c/busses/i2c-ocores.c:299: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
 drivers/i2c/busses/i2c-ocores.c:347: warning: expecting prototype for It handles an IRQ(). Prototype was for ocores_process_polling() instead

Signed-off-by: Lee Jones <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Reviewed-by: Peter Korsgaard <[email protected]>
Signed-off-by: Wolfram Sang <[email protected]>
3 years agoi2c: busses: i2c-eg20t: Fix 'bad line' issue and provide description for 'msgs' param
Lee Jones [Thu, 20 May 2021 19:00:57 +0000 (20:00 +0100)]
i2c: busses: i2c-eg20t: Fix 'bad line' issue and provide description for 'msgs' param

Fixes the following W=1 kernel build warning(s):

 drivers/i2c/busses/i2c-eg20t.c:151: warning: bad line:                          PCH i2c controller
 drivers/i2c/busses/i2c-eg20t.c:369: warning: Function parameter or member 'msgs' not described in 'pch_i2c_writebytes'

Signed-off-by: Lee Jones <[email protected]>
Signed-off-by: Wolfram Sang <[email protected]>
3 years agoi2c: busses: i2c-designware-master: Fix misnaming of 'i2c_dw_init_master()'
Lee Jones [Thu, 20 May 2021 19:00:56 +0000 (20:00 +0100)]
i2c: busses: i2c-designware-master: Fix misnaming of 'i2c_dw_init_master()'

Fixes the following W=1 kernel build warning(s):

 drivers/i2c/busses/i2c-designware-master.c:176: warning: expecting prototype for i2c_dw_init(). Prototype was for i2c_dw_init_master() instead

Signed-off-by: Lee Jones <[email protected]>
Acked-by: Jarkko Nikula <[email protected]>
Signed-off-by: Wolfram Sang <[email protected]>
3 years agoi2c: busses: i2c-cadence: Fix incorrectly documented 'enum cdns_i2c_slave_mode'
Lee Jones [Thu, 20 May 2021 19:00:55 +0000 (20:00 +0100)]
i2c: busses: i2c-cadence: Fix incorrectly documented 'enum cdns_i2c_slave_mode'

Fixes the following W=1 kernel build warning(s):

 drivers/i2c/busses/i2c-cadence.c:157: warning: expecting prototype for enum cdns_i2c_slave_mode. Prototype was for enum cdns_i2c_slave_state instead

Signed-off-by: Lee Jones <[email protected]>
Reviewed-by: Michal Simek <[email protected]>
Signed-off-by: Wolfram Sang <[email protected]>
3 years agoi2c: busses: i2c-ali1563: File headers are not good candidates for kernel-doc
Lee Jones [Thu, 20 May 2021 19:00:52 +0000 (20:00 +0100)]
i2c: busses: i2c-ali1563: File headers are not good candidates for kernel-doc

Fixes the following W=1 kernel build warning(s):

 drivers/i2c/busses/i2c-ali1563.c:24: warning: expecting prototype for i2c(). Prototype was for ALI1563_MAX_TIMEOUT() instead

Signed-off-by: Lee Jones <[email protected]>
Reviewed-by: Jean Delvare <[email protected]>
Signed-off-by: Wolfram Sang <[email protected]>
3 years agoi2c: muxes: i2c-arb-gpio-challenge: Demote non-conformant kernel-doc headers
Lee Jones [Thu, 20 May 2021 19:00:51 +0000 (20:00 +0100)]
i2c: muxes: i2c-arb-gpio-challenge: Demote non-conformant kernel-doc headers

Fixes the following W=1 kernel build warning(s):

 drivers/i2c/muxes/i2c-arb-gpio-challenge.c:43: warning: Function parameter or member 'muxc' not described in 'i2c_arbitrator_select'
 drivers/i2c/muxes/i2c-arb-gpio-challenge.c:43: warning: Function parameter or member 'chan' not described in 'i2c_arbitrator_select'
 drivers/i2c/muxes/i2c-arb-gpio-challenge.c:86: warning: Function parameter or member 'muxc' not described in 'i2c_arbitrator_deselect'
 drivers/i2c/muxes/i2c-arb-gpio-challenge.c:86: warning: Function parameter or member 'chan' not described in 'i2c_arbitrator_deselect'

Signed-off-by: Lee Jones <[email protected]>
Acked-by: Douglas Anderson <[email protected]>
Acked-by: Peter Rosin <[email protected]>
Signed-off-by: Wolfram Sang <[email protected]>
3 years agoi2c: busses: i2c-nomadik: Fix formatting issue pertaining to 'timeout'
Lee Jones [Thu, 20 May 2021 19:00:50 +0000 (20:00 +0100)]
i2c: busses: i2c-nomadik: Fix formatting issue pertaining to 'timeout'

Fixes the following W=1 kernel build warning(s):

 drivers/i2c/busses/i2c-nomadik.c:184: warning: Function parameter or member 'timeout' not described in 'nmk_i2c_dev'

Signed-off-by: Lee Jones <[email protected]>
Reviewed-by: Linus Walleij <[email protected]>
Signed-off-by: Wolfram Sang <[email protected]>
3 years agocifs: fix string declarations and assignments in tracepoints
Shyam Prasad N [Fri, 21 May 2021 06:35:52 +0000 (06:35 +0000)]
cifs: fix string declarations and assignments in tracepoints

We missed using the variable length string macros in several
tracepoints. Fixed them in this change.

There's probably more useful macros that we can use to print
others like flags etc. But I'll submit sepawrate patches for
those at a future date.

Signed-off-by: Shyam Prasad N <[email protected]>
Cc: <[email protected]> # v5.12
Signed-off-by: Steve French <[email protected]>
3 years agocifs: set server->cipher_type to AES-128-CCM for SMB3.0
Aurelien Aptel [Fri, 21 May 2021 15:19:27 +0000 (17:19 +0200)]
cifs: set server->cipher_type to AES-128-CCM for SMB3.0

SMB3.0 doesn't have encryption negotiate context but simply uses
the SMB2_GLOBAL_CAP_ENCRYPTION flag.

When that flag is present in the neg response cifs.ko uses AES-128-CCM
which is the only cipher available in this context.

cipher_type was set to the server cipher only when parsing encryption
negotiate context (SMB3.1.1).

For SMB3.0 it was set to 0. This means cipher_type value can be 0 or 1
for AES-128-CCM.

Fix this by checking for SMB3.0 and encryption capability and setting
cipher_type appropriately.

Signed-off-by: Aurelien Aptel <[email protected]>
Cc: <[email protected]>
Signed-off-by: Steve French <[email protected]>
3 years agoMerge tag 'acpi-5.13-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael...
Linus Torvalds [Thu, 27 May 2021 18:39:05 +0000 (08:39 -1000)]
Merge tag 'acpi-5.13-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull ACPI fix from Rafael Wysocki:
 "Fix a recent ACPI power management regression causing boot issues to
  occur on some systems due to attempts to turn off ACPI power resources
  that are already off (which should work according to the ACPI
  specification)"

* tag 'acpi-5.13-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  ACPI: power: Refine turning off unused power resources

3 years agokbuild: Quote OBJCOPY var to avoid a pahole call break the build
Javier Martinez Canillas [Wed, 26 May 2021 21:52:28 +0000 (23:52 +0200)]
kbuild: Quote OBJCOPY var to avoid a pahole call break the build

The ccache tool can be used to speed up cross-compilation, by calling the
compiler and binutils through ccache. For example, following should work:

    $ export ARCH=arm64 CROSS_COMPILE="ccache aarch64-linux-gnu-"

    $ make M=drivers/gpu/drm/rockchip/

but pahole fails to extract the BTF info from DWARF, breaking the build:

      CC [M]  drivers/gpu/drm/rockchip//rockchipdrm.mod.o
      LD [M]  drivers/gpu/drm/rockchip//rockchipdrm.ko
      BTF [M] drivers/gpu/drm/rockchip//rockchipdrm.ko
    aarch64-linux-gnu-objcopy: invalid option -- 'J'
    Usage: aarch64-linux-gnu-objcopy [option(s)] in-file [out-file]
     Copies a binary file, possibly transforming it in the process
    ...
    make[1]: *** [scripts/Makefile.modpost:156: __modpost] Error 2
    make: *** [Makefile:1866: modules] Error 2

this fails because OBJCOPY is set to "ccache aarch64-linux-gnu-copy" and
later pahole is executed with the following command line:

    LLVM_OBJCOPY=$(OBJCOPY) $(PAHOLE) -J --btf_base vmlinux $@

which gets expanded to:

    LLVM_OBJCOPY=ccache aarch64-linux-gnu-objcopy pahole -J ...

instead of:

    LLVM_OBJCOPY="ccache aarch64-linux-gnu-objcopy" pahole -J ...

Fixes: 5f9ae91f7c0d ("kbuild: Build kernel module BTFs if BTF is enabled and pahole supports it")
Signed-off-by: Javier Martinez Canillas <[email protected]>
Signed-off-by: Andrii Nakryiko <[email protected]>
Acked-by: Andrii Nakryiko <[email protected]>
Acked-by: Arnaldo Carvalho de Melo <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
3 years agodrm/tegra: sor: Fix AUX device reference leak
Thierry Reding [Thu, 27 May 2021 18:09:08 +0000 (20:09 +0200)]
drm/tegra: sor: Fix AUX device reference leak

In the case where the AUX provides an I2C-over-AUX DDC channel, a
reference is taken on the AUX parent device of the DDC channel rather
than the DDC channel like it would be for regular I2C controllers. To
make sure the correct reference is dropped, move the unreferencing code
into the SOR driver and make sure not to drop the I2C adapter reference
in that case.

Signed-off-by: Thierry Reding <[email protected]>
3 years agodrm/tegra: Get ref for DP AUX channel, not its ddc adapter
Lyude Paul [Fri, 14 May 2021 22:13:05 +0000 (18:13 -0400)]
drm/tegra: Get ref for DP AUX channel, not its ddc adapter

While we're taking a reference of the DDC adapter for a DP AUX channel in
tegra_sor_probe() because we're going to be using that adapter with the
SOR, now that we've moved where AUX registration happens the actual device
structure for the DDC adapter isn't initialized yet. Which means that we
can't really take a reference from it to try to keep it around anymore.

This should be fine though, because we can just take a reference of its
parent instead.

v2:
* Avoid calling i2c_put_adapter() in tegra_output_remove() for eDP/DP cases

Signed-off-by: Lyude Paul <[email protected]>
Fixes: 39c17ae60ea9 ("drm/tegra: Don't register DP AUX channels before connectors")
Cc: Lyude Paul <[email protected]>
Cc: Thierry Reding <[email protected]>
Cc: Jonathan Hunter <[email protected]>
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Thierry Reding <[email protected]>
3 years agoMerge tag 'iommu-fixes-v5.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Thu, 27 May 2021 18:06:36 +0000 (08:06 -1000)]
Merge tag 'iommu-fixes-v5.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu

Pull iommu fixes from Joerg Roedel:

 - Important fix for the AMD IOMMU driver in the recently added
   page-specific invalidation code to fix a calculation.

 - Fix a NULL-ptr dereference in the AMD IOMMU driver when a device
   switches domain types.

 - Fixes for the Intel VT-d driver to check for allocation failure and
   do correct cleanup.

 - Another fix for Intel VT-d to not allow supervisor page requests from
   devices when using second level page translation.

 - Add a MODULE_DEVICE_TABLE to the VIRTIO IOMMU driver

* tag 'iommu-fixes-v5.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
  iommu/vt-d: Fix sysfs leak in alloc_iommu()
  iommu/vt-d: Use user privilege for RID2PASID translation
  iommu/vt-d: Check for allocation failure in aux_detach_device()
  iommu/virtio: Add missing MODULE_DEVICE_TABLE
  iommu/amd: Fix wrong parentheses on page-specific invalidations
  iommu/amd: Clear DMA ops when switching domain

3 years agoperf debug: Move debug initialization earlier
Ian Rogers [Wed, 19 May 2021 16:44:47 +0000 (09:44 -0700)]
perf debug: Move debug initialization earlier

This avoids segfaults during option handlers that use pr_err. For
example, "perf --debug nopager list" segfaults before this change.

Fixes: 8abceacff87d (perf debug: Add debug_set_file function)
Signed-off-by: Ian Rogers <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
3 years agoafs: Fix the nlink handling of dir-over-dir rename
David Howells [Thu, 27 May 2021 10:24:33 +0000 (11:24 +0100)]
afs: Fix the nlink handling of dir-over-dir rename

Fix rename of one directory over another such that the nlink on the deleted
directory is cleared to 0 rather than being decremented to 1.

This was causing the generic/035 xfstest to fail.

Fixes: e49c7b2f6de7 ("afs: Build an abstraction around an "operation" concept")
Signed-off-by: David Howells <[email protected]>
Reviewed-by: Marc Dionne <[email protected]>
cc: [email protected]
Link: https://lore.kernel.org/r/162194384460.3999479.7605572278074191079.stgit@warthog.procyon.org.uk/
Signed-off-by: Linus Torvalds <[email protected]>
3 years agoBluetooth: fix the erroneous flush_work() order
Lin Ma [Tue, 25 May 2021 12:39:02 +0000 (14:39 +0200)]
Bluetooth: fix the erroneous flush_work() order

In the cleanup routine for failed initialization of HCI device,
the flush_work(&hdev->rx_work) need to be finished before the
flush_work(&hdev->cmd_work). Otherwise, the hci_rx_work() can
possibly invoke new cmd_work and cause a bug, like double free,
in late processings.

This was assigned CVE-2021-3564.

This patch reorder the flush_work() to fix this bug.

Cc: Marcel Holtmann <[email protected]>
Cc: Johan Hedberg <[email protected]>
Cc: Luiz Augusto von Dentz <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Jakub Kicinski <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Lin Ma <[email protected]>
Signed-off-by: Hao Xiong <[email protected]>
Cc: stable <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
Signed-off-by: Marcel Holtmann <[email protected]>
3 years agoxfs: bunmapi has unnecessary AG lock ordering issues
Dave Chinner [Thu, 27 May 2021 15:11:01 +0000 (08:11 -0700)]
xfs: bunmapi has unnecessary AG lock ordering issues

large directory block size operations are assert failing because
xfs_bunmapi() is not completely removing fragmented directory blocks
like so:

XFS: Assertion failed: done, file: fs/xfs/libxfs/xfs_dir2.c, line: 677
....
Call Trace:
 xfs_dir2_shrink_inode+0x1a8/0x210
 xfs_dir2_block_to_sf+0x2ae/0x410
 xfs_dir2_block_removename+0x21a/0x280
 xfs_dir_removename+0x195/0x1d0
 xfs_rename+0xb79/0xc50
 ? avc_has_perm+0x8d/0x1a0
 ? avc_has_perm_noaudit+0x9a/0x120
 xfs_vn_rename+0xdb/0x150
 vfs_rename+0x719/0xb50
 ? __lookup_hash+0x6a/0xa0
 do_renameat2+0x413/0x5e0
 __x64_sys_rename+0x45/0x50
 do_syscall_64+0x3a/0x70
 entry_SYSCALL_64_after_hwframe+0x44/0xae

We are aborting the bunmapi() pass because of this specific chunk of
code:

                /*
                 * Make sure we don't touch multiple AGF headers out of order
                 * in a single transaction, as that could cause AB-BA deadlocks.
                 */
                if (!wasdel && !isrt) {
                        agno = XFS_FSB_TO_AGNO(mp, del.br_startblock);
                        if (prev_agno != NULLAGNUMBER && prev_agno > agno)
                                break;
                        prev_agno = agno;
                }

This is designed to prevent deadlocks in AGF locking when freeing
multiple extents by ensuring that we only ever lock in increasing
AG number order. Unfortunately, this also violates the "bunmapi will
always succeed" semantic that some high level callers depend on,
such as xfs_dir2_shrink_inode(), xfs_da_shrink_inode() and
xfs_inactive_symlink_rmt().

This AG lock ordering was introduced back in 2017 to fix deadlocks
triggered by generic/299 as reported here:

https://lore.kernel.org/linux-xfs/800468eb-3ded-9166-20a4-047de8018582@gmail.com/

This codebase is old enough that it was before we were defering all
AG based extent freeing from within xfs_bunmapi(). THat is, we never
actually lock AGs in xfs_bunmapi() any more - every non-rt based
extent free is added to the defer ops list, as is all BMBT block
freeing. And RT extents are not RT based, so there's no lock
ordering issues associated with them.

Hence this AGF lock ordering code is both broken and dead. Let's
just remove it so that the large directory block code works reliably
again.

Tested against xfs/538 and generic/299 which is the original test
that exposed the deadlocks that this code fixed.

Fixes: 5b094d6dac04 ("xfs: fix multi-AG deadlock in xfs_bunmapi")
Signed-off-by: Dave Chinner <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
Signed-off-by: Darrick J. Wong <[email protected]>
3 years agoxfs: btree format inode forks can have zero extents
Dave Chinner [Thu, 27 May 2021 02:57:42 +0000 (19:57 -0700)]
xfs: btree format inode forks can have zero extents

xfs/538 is assert failing with this trace when testing with
directory block sizes of 64kB:

XFS: Assertion failed: !xfs_need_iread_extents(ifp), file: fs/xfs/libxfs/xfs_bmap.c, line: 608
....
Call Trace:
 xfs_bmap_btree_to_extents+0x2a9/0x470
 ? kmem_cache_alloc+0xe7/0x220
 __xfs_bunmapi+0x4ca/0xdf0
 xfs_bunmapi+0x1a/0x30
 xfs_dir2_shrink_inode+0x71/0x210
 xfs_dir2_block_to_sf+0x2ae/0x410
 xfs_dir2_block_removename+0x21a/0x280
 xfs_dir_removename+0x195/0x1d0
 xfs_remove+0x244/0x460
 xfs_vn_unlink+0x53/0xa0
 ? selinux_inode_unlink+0x13/0x20
 vfs_unlink+0x117/0x220
 do_unlinkat+0x1a2/0x2d0
 __x64_sys_unlink+0x42/0x60
 do_syscall_64+0x3a/0x70
 entry_SYSCALL_64_after_hwframe+0x44/0xae

This is a check to ensure that the extents have been read into
memory before we are doing a ifork btree manipulation. This assert
is bogus in the above case.

We have a fragmented directory block that has more extents in it
than can fit in extent format, so the inode data fork is in btree
format. xfs_dir2_shrink_inode() asks to remove all remaining 16
filesystem blocks from the inode so it can convert to short form,
and __xfs_bunmapi() removes all the extents. We now have a data fork
in btree format but have zero extents in the fork. This incorrectly
trips the xfs_need_iread_extents() assert because it assumes that an
empty extent btree means the extent tree has not been read into
memory yet. This is clearly not the case with xfs_bunmapi(), as it
has an explicit call to xfs_iread_extents() in it to pull the
extents into memory before it starts unmapping.

Also, the assert directly after this bogus one is:

ASSERT(ifp->if_format == XFS_DINODE_FMT_BTREE);

Which covers the context in which it is legal to call
xfs_bmap_btree_to_extents just fine. Hence we should just remove the
bogus assert as it is clearly wrong and causes a regression.

The returns the test behaviour to the pre-existing assert failure in
xfs_dir2_shrink_inode() that indicates xfs_bunmapi() has failed to
remove all the extents in the range it was asked to unmap.

Signed-off-by: Dave Chinner <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
Signed-off-by: Darrick J. Wong <[email protected]>
3 years agoiommu/vt-d: Fix sysfs leak in alloc_iommu()
Rolf Eike Beer [Tue, 25 May 2021 07:08:02 +0000 (15:08 +0800)]
iommu/vt-d: Fix sysfs leak in alloc_iommu()

iommu_device_sysfs_add() is called before, so is has to be cleaned on subsequent
errors.

Fixes: 39ab9555c2411 ("iommu: Add sysfs bindings for struct iommu_device")
Cc: [email protected] # 4.11.x
Signed-off-by: Rolf Eike Beer <[email protected]>
Acked-by: Lu Baolu <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Joerg Roedel <[email protected]>
This page took 0.157258 seconds and 4 git commands to generate.