netfilter: conntrack: remove offload_pickup sysctl again
These two sysctls were added because the hardcoded defaults (2 minutes,
tcp, 30 seconds, udp) turned out to be too low for some setups.
They appeared in 5.14-rc1 so it should be fine to remove it again.
Marcelo convinced me that there should be no difference between a flow
that was offloaded vs. a flow that was not wrt. timeout handling.
Thus the default is changed to those for TCP established and UDP stream,
5 days and 120 seconds, respectively.
Marcelo also suggested to account for the timeout value used for the
offloading, this avoids increase beyond the value in the conntrack-sysctl
and will also instantly expire the conntrack entry with altered sysctls.
This will remove offloaded udp flows after one minute, rather than two.
An earlier version of this patch also cleared the ASSURED bit to
allow nf_conntrack to evict the entry via early_drop (i.e., table full).
However, it looks like we can safely assume that connection timed out
via HW is still in established state, so this isn't needed.
Quoting Oz:
[..] the hardware sends all packets with a set FIN flags to sw.
[..] Connections that are aged in hardware are expected to be in the
established state.
In case it turns out that back-to-sw-path transition can occur for
'dodgy' connections too (e.g., one side disappeared while software-path
would have been in RETRANS timeout), we can adjust this later.
netfilter: nfnetlink_hook: Use same family as request message
Use the same family as the request message, for consistency. The
netlink payload provides sufficient information to describe the hook
object, including the family.
This makes it easier to userspace to correlate the hooks are that
visited by the packets for a certain family.
Fixes: e2cf17d3774c ("netfilter: add new hook nfnl subsystem") Signed-off-by: Pablo Neira Ayuso <[email protected]>
netfilter: nfnetlink_hook: use the sequence number of the request message
The sequence number allows to correlate the netlink reply message (as
part of the dump) with the original request message.
The cb->seq field is internally used to detect an interference (update)
of the hook list during the netlink dump, do not use it as sequence
number in the netlink dump header.
Fixes: e2cf17d3774c ("netfilter: add new hook nfnl subsystem") Signed-off-by: Pablo Neira Ayuso <[email protected]>
The family is relevant for pseudo-families like NFPROTO_INET
otherwise the user needs to rely on the hook function name to
differentiate it from NFPROTO_IPV4 and NFPROTO_IPV6 names.
Add nfnl_hook_chain_desc_attributes instead of using the existing
NFTA_CHAIN_* attributes, since these do not provide a family number.
Fixes: e2cf17d3774c ("netfilter: add new hook nfnl subsystem") Signed-off-by: Pablo Neira Ayuso <[email protected]>
netfilter: conntrack: collect all entries in one cycle
Michal Kubecek reports that conntrack gc is responsible for frequent
wakeups (every 125ms) on idle systems.
On busy systems, timed out entries are evicted during lookup.
The gc worker is only needed to remove entries after system becomes idle
after a busy period.
To resolve this, always scan the entire table.
If the scan is taking too long, reschedule so other work_structs can run
and resume from next bucket.
After a completed scan, wait for 2 minutes before the next cycle.
Heuristics for faster re-schedule are removed.
GC_SCAN_INTERVAL could be exposed as a sysctl in the future to allow
tuning this as-needed or even turn the gc worker off.
Takashi Iwai [Fri, 6 Aug 2021 15:00:51 +0000 (17:00 +0200)]
Merge tag 'asoc-fix-v5.14-rc4' of https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus
ASoC: Fixes for v5.14
Quite a lot of fixes here, the biggest set being for the cs42l42 driver
which is reasonably old but has seen a sudden uptick in activity.
There's also some fixes for correctly referencing PCM buffer addresses
and the removal of some driver-local bodges that had been done for the
lack of prefix handling in DAPM which were broken by the core handling
that as expected.
tracepoint: Use rcu get state and cond sync for static call updates
State transitions from 1->0->1 and N->2->1 callbacks require RCU
synchronization. Rather than performing the RCU synchronization every
time the state change occurs, which is quite slow when many tracepoints
are registered in batch, instead keep a snapshot of the RCU state on the
most recent transitions which belong to a chain, and conditionally wait
for a grace period on the last transition of the chain if one g.p. has
not elapsed since the last snapshot.
This applies to both RCU and SRCU.
This brings the performance regression caused by commit 231264d6927f
("Fix: tracepoint: static call function vs data state mismatch") back to
what it was originally.
Before this commit:
# trace-cmd start -e all
# time trace-cmd start -p nop
real 0m10.593s
user 0m0.017s
sys 0m0.259s
After this commit:
# trace-cmd start -e all
# time trace-cmd start -p nop
Hao Xu [Thu, 5 Aug 2021 10:05:38 +0000 (18:05 +0800)]
io-wq: fix lack of acct->nr_workers < acct->max_workers judgement
There should be this judgement before we create an io-worker
Fixes: 685fe7feedb9 ("io-wq: eliminate the need for a manager thread") Signed-off-by: Hao Xu <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
Hao Xu [Thu, 5 Aug 2021 10:05:37 +0000 (18:05 +0800)]
io-wq: fix no lock protection of acct->nr_worker
There is an acct->nr_worker visit without lock protection. Think about
the case: two callers call io_wqe_wake_worker(), one is the original
context and the other one is an io-worker(by calling
io_wqe_enqueue(wqe, linked)), on two cpus paralelly, this may cause
nr_worker to be larger than max_worker.
Let's fix it by adding lock for it, and let's do nr_workers++ before
create_io_worker. There may be a edge cause that the first caller fails
to create an io-worker, but the second caller doesn't know it and then
quit creating io-worker as well:
say nr_worker = max_worker - 1
cpu 0 cpu 1
io_wqe_wake_worker() io_wqe_wake_worker()
nr_worker < max_worker
nr_worker++
create_io_worker() nr_worker == max_worker
failed return
return
But the chance of this case is very slim.
Fixes: 685fe7feedb9 ("io-wq: eliminate the need for a manager thread") Signed-off-by: Hao Xu <[email protected]>
[axboe: fix unconditional create_io_worker() call] Signed-off-by: Jens Axboe <[email protected]>
Kan Liang [Tue, 3 Aug 2021 13:25:28 +0000 (06:25 -0700)]
perf/x86/intel: Apply mid ACK for small core
A warning as below may be occasionally triggered in an ADL machine when
these conditions occur:
- Two perf record commands run one by one. Both record a PEBS event.
- Both runs on small cores.
- They have different adaptive PEBS configuration (PEBS_DATA_CFG).
Different from the big core, the small core requires the ACK right
before re-enabling counters in the NMI handler, otherwise a stale PEBS
record may be dumped into the later NMI handler, which trigger the
warning.
Add a new mid_ack flag to track the case. Add all PMI handler bits in
the struct x86_hybrid_pmu to track the bits for different types of
PMUs. Apply mid ACK for the small cores on an Alder Lake machine.
The existing hybrid() macro has a compile error when taking address of
a bit-field variable. Add a new macro hybrid_bit() to get the
bit-field value of a given PMU.
Hans de Goede [Fri, 6 Aug 2021 11:55:15 +0000 (13:55 +0200)]
platform/x86: pcengines-apuv2: Add missing terminating entries to gpio-lookup tables
The gpiod_lookup_table.table passed to gpiod_add_lookup_table() must
be terminated with an empty entry, add this.
Note we have likely been getting away with this not being present because
the GPIO lookup code first matches on the dev_id, causing most lookups to
skip checking the table and the lookups which do check the table will
find a matching entry before reaching the end. With that said, terminating
these tables properly still is obviously the correct thing to do.
Hans de Goede [Mon, 2 Aug 2021 14:10:00 +0000 (16:10 +0200)]
platform/x86: Make dual_accel_detect() KIOX010A + KIOX020A detect more robust
360 degree hinges devices with dual KIOX010A + KIOX020A accelerometers
always have both a KIOX010A and a KIOX020A ACPI device (one for each
accel).
Theoretical some vendor may re-use some DSDT for a non-convertible
stripping out just the KIOX020A ACPI device from the DSDT. Check that
both ACPI devices are present to make the check more robust.
John Hubbard [Fri, 6 Aug 2021 06:53:30 +0000 (23:53 -0700)]
net: mvvp2: fix short frame size on s390
On s390, the following build warning occurs:
drivers/net/ethernet/marvell/mvpp2/mvpp2.h:844:2: warning: overflow in
conversion from 'long unsigned int' to 'int' changes value from
'18446744073709551584' to '-32' [-Woverflow]
844 | ((total_size) - MVPP2_SKB_HEADROOM - MVPP2_SKB_SHINFO_SIZE)
This happens because MVPP2_SKB_SHINFO_SIZE, which is 320 bytes (which is
already 64-byte aligned) on some architectures, actually gets ALIGN'd up
to 512 bytes in the s390 case.
So then, when this is invoked:
MVPP2_RX_MAX_PKT_SIZE(MVPP2_BM_SHORT_FRAME_SIZE)
...that turns into:
704 - 224 - 512 == -32
...which is not a good frame size to end up with! The warning above is a
bit lucky: it notices a signed/unsigned bad behavior here, which leads
to the real problem of a frame that is too short for its contents.
Increase MVPP2_BM_SHORT_FRAME_SIZE by 32 (from 704 to 736), which is
just exactly big enough. (The other values can't readily be changed
without causing a lot of other problems.)
DENG Qingfang [Fri, 6 Aug 2021 04:05:27 +0000 (12:05 +0800)]
net: dsa: mt7530: add the missing RxUnicast MIB counter
Add the missing RxUnicast counter.
Fixes: b8f126a8d543 ("net-next: dsa: add dsa support for Mediatek MT7530 switch") Signed-off-by: DENG Qingfang <[email protected]> Signed-off-by: David S. Miller <[email protected]>
RDMA/iw_cxgb4: Fix refcount underflow while destroying cqs.
Previous atomic increment/decrement logic expects the atomic count to be
'0' after the final decrement.
Replacing atomic count with refcount does not allow that, as
refcount_dec() considers count of 1 as underflow and triggers a kernel
splat.
Fix the current refcount logic by using the usual pattern of decrementing
the refcount and test if it is '0' on the final deref in
c4iw_destroy_cq(). Use wait_for_completion() instead of wait_event().
drm/amd/display: workaround for hard hang on HPD on native DP
[Why]
HPD disable and enable sequences are not mutually exclusive
on Linux. For HPDs that spans over 1s (i.e. HPD low = 1s),
part of the disable sequence (specifically, a request to SMU
to lower refclk) could come right before the call to PHY
enable, causing DMUB to access an unresponsive PHY
and thus a hard hang on the system.
drm/amd/display: Fix resetting DCN3.1 HW when resuming from S4
[Why] On S4 resume we also need to fix detection of when to reload DMCUB
firmware because we're currently using the VBIOS version which isn't
compatible with the driver version.
[How] Update the hardware init check for DCN31 since it's the ASIC that
has this issue.
[WHY]
For DCN31 onward, LTTPR is to be enabled and set to Transparent by
VBIOS. Driver is to assume that VBIOS has done this without needing to
check the VBIOS interop bit.
[HOW]
Add LTTPR enable and interop VBIOS bits into dc->caps, and force-set the
interop bit to true for DCN31+.
Randy Dunlap [Fri, 30 Jul 2021 03:03:47 +0000 (20:03 -0700)]
drm/amdgpu: fix checking pmops when PM_SLEEP is not enabled
'pm_suspend_target_state' is only available when CONFIG_PM_SLEEP
is set/enabled. OTOH, when both SUSPEND and HIBERNATION are not set,
PM_SLEEP is not set, so this variable cannot be used.
../drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c: In function ‘amdgpu_acpi_is_s0ix_active’:
../drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c:1046:11: error: ‘pm_suspend_target_state’ undeclared (first use in this function); did you mean ‘__KSYM_pm_suspend_target_state’?
return pm_suspend_target_state == PM_SUSPEND_TO_IDLE;
^~~~~~~~~~~~~~~~~~~~~~~
__KSYM_pm_suspend_target_state
Also use shorter IS_ENABLED(CONFIG_foo) notation for checking the
2 config symbols.
----> [ delayed here on one CPU (e.g. vcpu preempted by the host) ]
static_call(tp_func_##name)(__data, args); \
} \
} while (0)
It has loaded the tp->funcs of the old callback, so it will try to use the old
data. This can be fixed by adding a RCU sync anywhere in the 1->0->1
transition chain.
On a N->2->1 transition, we need an rcu-sync because you may have a
sequence of 3->2->1 (or 1->2->1) where the element 0 data is unchanged
between 2->1, but was changed from 3->2 (or from 1->2), which may be
observed by the static call. This can be fixed by adding an
unconditional RCU sync in transition 2->1.
Note, this fixes a correctness issue at the cost of adding a tremendous
performance regression to the disabling of tracepoints.
Before this commit:
# trace-cmd start -e all
# time trace-cmd start -p nop
real 0m0.778s
user 0m0.000s
sys 0m0.061s
After this commit:
# trace-cmd start -e all
# time trace-cmd start -p nop
real 0m10.593s
user 0m0.017s
sys 0m0.259s
A follow up fix will introduce a more lightweight scheme based on RCU
get_state and cond_sync, that will return the performance back to what it
was. As both this change and the lightweight versions are complex on their
own, for bisecting any issues that this may cause, they are kept as two
separate changes.
tracepoint: static call: Compare data on transition from 2->1 callees
On transition from 2->1 callees, we should be comparing .data rather
than .func, because the same callback can be registered twice with
different data, and what we care about here is that the data of array
element 0 is unchanged to skip rcu sync.
Linus Torvalds [Thu, 5 Aug 2021 19:26:00 +0000 (12:26 -0700)]
Merge tag 'net-5.14-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from ipsec.
Current release - regressions:
- sched: taprio: fix init procedure to avoid inf loop when dumping
- sctp: move the active_key update after sh_keys is added
Current release - new code bugs:
- sparx5: fix build with old GCC & bitmask on 32-bit targets
Previous releases - regressions:
- xfrm: redo the PREEMPT_RT RCU vs hash_resize_mutex deadlock fix
- xfrm: fixes for the compat netlink attribute translator
- phy: micrel: Fix detection of ksz87xx switch
Previous releases - always broken:
- gro: set inner transport header offset in tcp/udp GRO hook to avoid
crashes when such packets reach GSO
- vsock: handle VIRTIO_VSOCK_OP_CREDIT_REQUEST, as required by spec
- dsa: sja1105: fix static FDB entries on SJA1105P/Q/R/S and SJA1110
- bridge: validate the NUD_PERMANENT bit when adding an extern_learn
FDB entry
- usb: lan78xx: don't modify phy_device state concurrently
- usb: pegasus: check for errors of IO routines"
* tag 'net-5.14-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (48 commits)
net: vxge: fix use-after-free in vxge_device_unregister
net: fec: fix use-after-free in fec_drv_remove
net: pegasus: fix uninit-value in get_interrupt_interval
net: ethernet: ti: am65-cpsw: fix crash in am65_cpsw_port_offload_fwd_mark_update()
bnx2x: fix an error code in bnx2x_nic_load()
net: wwan: iosm: fix recursive lock acquire in unregister
net: wwan: iosm: correct data protocol mask bit
net: wwan: iosm: endianness type correction
net: wwan: iosm: fix lkp buildbot warning
net: usb: lan78xx: don't modify phy_device state concurrently
docs: networking: netdevsim rules
net: usb: pegasus: Remove the changelog and DRIVER_VERSION.
net: usb: pegasus: Check the return value of get_geristers() and friends;
net/prestera: Fix devlink groups leakage in error flow
net: sched: fix lockdep_set_class() typo error for sch->seqlock
net: dsa: qca: ar9331: reorder MDIO write sequence
VSOCK: handle VIRTIO_VSOCK_OP_CREDIT_REQUEST
mptcp: drop unused rcu member in mptcp_pm_addr_entry
net: ipv6: fix returned variable type in ip6_skb_dst_mtu
nfp: update ethtool reporting of pauseframe control
...
spi: cadence-quadspi: Fix check condition for DTR ops
buswidth and dtr fields in spi_mem_op are only valid when the
corresponding spi_mem_op phase has a non-zero length. For example,
SPI NAND core doesn't set buswidth when using SPI_MEM_OP_NO_ADDR
phase.
Fix the dtr checks in set_protocol() and suppports_mem_op() to
ignore empty spi_mem_op phases, as checking for dtr field in
empty phase will result in false negatives.
I2S always has two LRCLK phases and both CH1 and CH2 of the RX
must be enabled (corresponding to the low and high phases of LRCLK.)
The selection of the valid data channels is done by setting the DAC
CHA_SEL and CHB_SEL. CHA_SEL is always the first (left) channel,
CHB_SEL depends on the number of active channels.
Previously for mono ASP CH2 was not enabled, the result was playing
mono data would not produce any audio output.
ASoC: cs42l42: Constrain sample rate to prevent illegal SCLK
The lowest valid SCLK corresponds to 44.1 kHz at 16-bit. Sample
rates less than this would produce SCLK below the minimum when using
a normal I2S frame. A constraint must be applied to prevent this.
The constraint is not applied if the machine driver sets SCLK, to
allow setups where the host generates additional bits per LRCLK
phase to increase the SCLK frequency. In these cases the machine
driver would always have to inform this driver of the actual SCLK,
and it must select a legal SCLK.
An I2S frame starts on the falling edge of LRCLK so ASP_STP must
be 0.
At the same time, move other format settings in the same register
from cs42l42_pll_config() to cs42l42_set_dai_fmt() where you'd
expect to find them, and merge into a single write.
ASoC: cs42l42: PLL must be running when changing MCLK_SRC_SEL
Both SCLK and PLL clocks must be running to drive the glitch-free mux
behind MCLK_SRC_SEL and complete the switchover.
This patch moves the writing of MCLK_SRC_SEL to when the PLL is started
and stopped, so that it only transitions while the PLL is running.
The unconditional write MCLK_SRC_SEL=0 in cs42l42_mute_stream() is safe
because if the PLL is not running MCLK_SRC_SEL is already 0.
Tetsuo Handa [Wed, 4 Aug 2021 10:26:56 +0000 (19:26 +0900)]
Bluetooth: defer cleanup of resources in hci_unregister_dev()
syzbot is hitting might_sleep() warning at hci_sock_dev_event() due to
calling lock_sock() with rw spinlock held [1].
It seems that history of this locking problem is a trial and error.
Commit b40df5743ee8 ("[PATCH] bluetooth: fix socket locking in
hci_sock_dev_event()") in 2.6.21-rc4 changed bh_lock_sock() to
lock_sock() as an attempt to fix lockdep warning.
Then, commit 4ce61d1c7a8e ("[BLUETOOTH]: Fix locking in
hci_sock_dev_event().") in 2.6.22-rc2 changed lock_sock() to
local_bh_disable() + bh_lock_sock_nested() as an attempt to fix the
sleep in atomic context warning.
Then, commit 4b5dd696f81b ("Bluetooth: Remove local_bh_disable() from
hci_sock.c") in 3.3-rc1 removed local_bh_disable().
Then, commit e305509e678b ("Bluetooth: use correct lock to prevent UAF
of hdev object") in 5.13-rc5 again changed bh_lock_sock_nested() to
lock_sock() as an attempt to fix CVE-2021-3573.
This difficulty comes from current implementation that
hci_sock_dev_event(HCI_DEV_UNREG) is responsible for dropping all
references from sockets because hci_unregister_dev() immediately
reclaims resources as soon as returning from
hci_sock_dev_event(HCI_DEV_UNREG).
But the history suggests that hci_sock_dev_event(HCI_DEV_UNREG) was not
doing what it should do.
Therefore, instead of trying to detach sockets from device, let's accept
not detaching sockets from device at hci_sock_dev_event(HCI_DEV_UNREG),
by moving actual cleanup of resources from hci_unregister_dev() to
hci_cleanup_dev() which is called by bt_host_release() when all
references to this unregistered device (which is a kobject) are gone.
Since hci_sock_dev_event(HCI_DEV_UNREG) no longer resets
hci_pi(sk)->hdev, we need to check whether this device was unregistered
and return an error based on HCI_UNREGISTER flag. There might be subtle
behavioral difference in "monitor the hdev" functionality; please report
if you found something went wrong due to this patch.
Linus Torvalds [Thu, 5 Aug 2021 19:06:31 +0000 (12:06 -0700)]
Merge tag 'selinux-pr-20210805' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux
Pull selinux fix from Paul Moore:
"One small SELinux fix for a problem where an error code was not being
propagated back up to userspace when a bogus SELinux policy is loaded
into the kernel"
* tag 'selinux-pr-20210805' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
selinux: correct the return value when loads initial sids
Linus Torvalds [Thu, 5 Aug 2021 19:00:00 +0000 (12:00 -0700)]
Merge branch 'for-v5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull ucounts fix from Eric Biederman:
"Fix a subtle locking versus reference counting bug in the ucount
changes, found by syzbot"
* 'for-v5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
ucounts: Fix race condition between alloc_ucounts and put_ucounts
Linus Torvalds [Thu, 5 Aug 2021 18:53:34 +0000 (11:53 -0700)]
Merge tag 'trace-v5.14-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing fixes from Steven Rostedt:
"Various tracing fixes:
- Fix NULL pointer dereference caused by an error path
- Give histogram calculation fields a size, otherwise it breaks
synthetic creation based on them.
- Reject strings being used for number calculations.
- Fix recordmcount.pl warning on llvm building RISC-V allmodconfig
- Fix the draw_functrace.py script to handle the new trace output
- Fix warning of smp_processor_id() in preemptible code"
* tag 'trace-v5.14-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracing: Quiet smp_processor_id() use in preemptable warning in hwlat
scripts/tracing: fix the bug that can't parse raw_trace_func
scripts/recordmcount.pl: Remove check_objcopy() and $can_use_local
tracing: Reject string operand in the histogram expression
tracing / histogram: Give calculation hist_fields a size
tracing: Fix NULL pointer dereference in start_creating
Linus Torvalds [Thu, 5 Aug 2021 18:23:09 +0000 (11:23 -0700)]
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm fixes from Paolo Bonzini:
"Mostly bugfixes; plus, support for XMM arguments to Hyper-V hypercalls
now obeys KVM_CAP_HYPERV_ENFORCE_CPUID.
Both the XMM arguments feature and KVM_CAP_HYPERV_ENFORCE_CPUID are
new in 5.14, and each did not know of the other"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: x86/mmu: Fix per-cpu counter corruption on 32-bit builds
KVM: selftests: fix hyperv_clock test
KVM: SVM: improve the code readability for ASID management
KVM: SVM: Fix off-by-one indexing when nullifying last used SEV VMCB
KVM: Do not leak memory for duplicate debugfs directories
KVM: selftests: Test access to XMM fast hypercalls
KVM: x86: hyper-v: Check if guest is allowed to use XMM registers for hypercall input
KVM: x86: Introduce trace_kvm_hv_hypercall_done()
KVM: x86: hyper-v: Check access to hypercall before reading XMM registers
KVM: x86: accept userspace interrupt only if no event is injected
Shyam Prasad N [Wed, 4 Aug 2021 18:37:22 +0000 (18:37 +0000)]
cifs: create sd context must be a multiple of 8
We used to follow the rule earlier that the create SD context
always be a multiple of 8. However, with the change:
cifs: refactor create_sd_buf() and and avoid corrupting the buffer
...we recompute the length, and we failed that rule.
Fixing that with this change.
Yu Kuai [Thu, 5 Aug 2021 12:46:45 +0000 (20:46 +0800)]
blk-iolatency: error out if blk_get_queue() failed in iolatency_set_limit()
If queue is dying while iolatency_set_limit() is in progress,
blk_get_queue() won't increment the refcount of the queue. However,
blk_put_queue() will still decrement the refcount later, which will
cause the refcout to be unbalanced.
Jakub Kicinski [Thu, 5 Aug 2021 14:29:55 +0000 (07:29 -0700)]
Merge branch 'net-fix-use-after-free-bugs'
Pavel Skripkin says:
====================
net: fix use-after-free bugs
I've added new checker to smatch yesterday. It warns about using
netdev_priv() pointer after free_{netdev,candev}() call. I hope, it will
get into next smatch release.
Some of the reported bugs are fixed and upstreamed already, but Dan ran new
smatch with allmodconfig and found 2 more. Big thanks to Dan for doing it,
because I totally forgot to do it.
====================
Pavel Skripkin [Wed, 4 Aug 2021 15:52:20 +0000 (18:52 +0300)]
net: vxge: fix use-after-free in vxge_device_unregister
Smatch says:
drivers/net/ethernet/neterion/vxge/vxge-main.c:3518 vxge_device_unregister() error: Using vdev after free_{netdev,candev}(dev);
drivers/net/ethernet/neterion/vxge/vxge-main.c:3518 vxge_device_unregister() error: Using vdev after free_{netdev,candev}(dev);
drivers/net/ethernet/neterion/vxge/vxge-main.c:3520 vxge_device_unregister() error: Using vdev after free_{netdev,candev}(dev);
drivers/net/ethernet/neterion/vxge/vxge-main.c:3520 vxge_device_unregister() error: Using vdev after free_{netdev,candev}(dev);
Since vdev pointer is netdev private data accessing it after free_netdev()
call can cause use-after-free bug. Fix it by moving free_netdev() call at
the end of the function
Pavel Skripkin [Wed, 4 Aug 2021 15:51:51 +0000 (18:51 +0300)]
net: fec: fix use-after-free in fec_drv_remove
Smatch says:
drivers/net/ethernet/freescale/fec_main.c:3994 fec_drv_remove() error: Using fep after free_{netdev,candev}(ndev);
drivers/net/ethernet/freescale/fec_main.c:3995 fec_drv_remove() error: Using fep after free_{netdev,candev}(ndev);
Since fep pointer is netdev private data, accessing it after free_netdev()
call can cause use-after-free bug. Fix it by moving free_netdev() call at
the end of the function
Pavel Skripkin [Wed, 4 Aug 2021 14:30:05 +0000 (17:30 +0300)]
net: pegasus: fix uninit-value in get_interrupt_interval
Syzbot reported uninit value pegasus_probe(). The problem was in missing
error handling.
get_interrupt_interval() internally calls read_eprom_word() which can
fail in some cases. For example: failed to receive usb control message.
These cases should be handled to prevent uninit value bug, since
read_eprom_word() will not initialize passed stack variable in case of
internal failure.
tracing: Quiet smp_processor_id() use in preemptable warning in hwlat
The hardware latency detector (hwlat) has a mode that it runs one thread
across CPUs. The logic to move from the currently running CPU to the next
one in the list does a smp_processor_id() to find where it currently is.
Unfortunately, it's done with preemption enabled, and this triggers a
warning for using smp_processor_id() in a preempt enabled section.
As it is only using smp_processor_id() to get information on where it
currently is in order to simply move it to the next CPU, it doesn't really
care if it got moved in the mean time. It will simply balance out later if
such a case arises.
Switch smp_processor_id() to raw_smp_processor_id() to quiet that warning.
net: ethernet: ti: am65-cpsw: fix crash in am65_cpsw_port_offload_fwd_mark_update()
The am65_cpsw_port_offload_fwd_mark_update() causes NULL exception crash
when there is at least one disabled port and any other port added to the
bridge first time.
Unable to handle kernel NULL pointer dereference at virtual address 0000000000000858
pc : am65_cpsw_port_offload_fwd_mark_update+0x54/0x68
lr : am65_cpsw_netdevice_event+0x8c/0xf0
Call trace:
am65_cpsw_port_offload_fwd_mark_update+0x54/0x68
notifier_call_chain+0x54/0x98
raw_notifier_call_chain+0x14/0x20
call_netdevice_notifiers_info+0x34/0x78
__netdev_upper_dev_link+0x1c8/0x290
netdev_master_upper_dev_link+0x1c/0x28
br_add_if+0x3f0/0x6d0 [bridge]
Fix it by adding proper check for port->ndev != NULL.
Fixes: 2934db9bcb30 ("net: ti: am65-cpsw-nuss: Add netdevice notifiers") Signed-off-by: Grygorii Strashko <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Dan Carpenter [Thu, 5 Aug 2021 10:38:26 +0000 (13:38 +0300)]
bnx2x: fix an error code in bnx2x_nic_load()
Set the error code if bnx2x_alloc_fw_stats_mem() fails. The current
code returns success.
Fixes: ad5afc89365e ("bnx2x: Separate VF and PF logic") Signed-off-by: Dan Carpenter <[email protected]> Signed-off-by: David S. Miller <[email protected]>
kbuild: cancel sub_make_done for the install target to fix DKMS
Since commit bcf637f54f6d ("kbuild: parse C= and M= before changing the
working directory"), external module builds invoked by DKMS fail because
M= option is not parsed.
I wanted to add 'unset sub_make_done' in install.sh but similar scripts,
arch/*/boot/install.sh, are duplicated, so I set sub_make_done empty in
the top Makefile.
Fixes: bcf637f54f6d ("kbuild: parse C= and M= before changing the working directory") Reported-by: John S Gruber <[email protected]> Signed-off-by: Masahiro Yamada <[email protected]> Tested-by: John S Gruber <[email protected]>
When cross compiling a MIPS kernel on a BSD based HOSTCC leads
to errors like
SYNC include/config/auto.conf.cmd - due to: .config
egrep: empty (sub)expression
UPD include/config/kernel.release
HOSTCC scripts/dtc/dtc.o - due to target missing
It turns out that egrep uses this egrep pattern:
(|MINOR_|PATCHLEVEL_)
This is not valid syntax or gives undefined results according
to POSIX 9.5.3 ERE Grammar
It seems to be silently accepted by the Linux egrep implementation
while a BSD host complains.
Such patterns can be replaced by a transformation like
"(|p1|p2)" -> "(p1|p2)?"
Fixes: 48c35b2d245f ("[MIPS] There is no __GNUC_MAJOR__") Signed-off-by: H. Nikolaus Schaller <[email protected]> Signed-off-by: Masahiro Yamada <[email protected]>
Trying to run a cross-compiled x86 relocs tool on a BSD based
HOSTCC leads to errors like
VOFFSET arch/x86/boot/compressed/../voffset.h - due to: vmlinux
CC arch/x86/boot/compressed/misc.o - due to: arch/x86/boot/compressed/../voffset.h
OBJCOPY arch/x86/boot/compressed/vmlinux.bin - due to: vmlinux
RELOCS arch/x86/boot/compressed/vmlinux.relocs - due to: vmlinux
empty (sub)expressionarch/x86/boot/compressed/Makefile:118: recipe for target 'arch/x86/boot/compressed/vmlinux.relocs' failed
make[3]: *** [arch/x86/boot/compressed/vmlinux.relocs] Error 1
It turns out that relocs.c uses patterns like
"something(|_end)"
This is not valid syntax or gives undefined results according
to POSIX 9.5.3 ERE Grammar
David S. Miller [Thu, 5 Aug 2021 10:28:56 +0000 (11:28 +0100)]
Merge branch 'eean-iosm-fixes'
M Chetan Kumar says:
====================
net: wwan: iosm: fixes
This patch series contains IOSM Driver fixes. Below is the patch
series breakdown.
PATCH1:
* Correct the td buffer type casting & format specifier to fix lkp buildbot
warning.
PATCH2:
* Endianness type correction for nr_of_bytes. This field is exchanged
as part of host-device protocol communication.
PATCH3:
* Correct ul/dl data protocol mask bit to know which protocol capability
does device implement.
PATCH4:
* Calling unregister_netdevice() inside wwan del link is trying to
acquire the held lock in ndo_stop_cb(). Instead, queue net dev to
be unregistered later.
====================
M Chetan Kumar [Wed, 4 Aug 2021 16:09:52 +0000 (21:39 +0530)]
net: wwan: iosm: fix recursive lock acquire in unregister
Calling unregister_netdevice() inside wwan del link is trying to
acquire the held lock in ndo_stop_cb(). Instead, queue net dev to
be unregistered later.
Kyle Tso [Tue, 3 Aug 2021 09:13:14 +0000 (17:13 +0800)]
usb: typec: tcpm: Keep other events when receiving FRS and Sourcing_vbus events
When receiving FRS and Sourcing_Vbus events from low-level drivers, keep
other events which come a bit earlier so that they will not be ignored
in the event handler.
Wesley Cheng [Wed, 4 Aug 2021 06:24:05 +0000 (23:24 -0700)]
usb: dwc3: gadget: Avoid runtime resume if disabling pullup
If the device is already in the runtime suspended state, any call to
the pullup routine will issue a runtime resume on the DWC3 core
device. If the USB gadget is disabling the pullup, then avoid having
to issue a runtime resume, as DWC3 gadget has already been
halted/stopped.
This fixes an issue where the following condition occurs:
This leads to a situation where the DWC3 gadget is not properly
stopped, as the runtime resume would have re-enabled EP0 and event
interrupts, and since we avoided the DWC3 gadget suspend, these
resources were never disabled.
usb: dwc3: gadget: Use list_replace_init() before traversing lists
The list_for_each_entry_safe() macro saves the current item (n) and
the item after (n+1), so that n can be safely removed without
corrupting the list. However, when traversing the list and removing
items using gadget giveback, the DWC3 lock is briefly released,
allowing other routines to execute. There is a situation where, while
items are being removed from the cancelled_list using
dwc3_gadget_ep_cleanup_cancelled_requests(), the pullup disable
routine is running in parallel (due to UDC unbind). As the cleanup
routine removes n, and the pullup disable removes n+1, once the
cleanup retakes the DWC3 lock, it references a request who was already
removed/handled. With list debug enabled, this leads to a panic.
Ensure all instances of the macro are replaced where gadget giveback
is used.
Example call stack:
Thread#1:
__dwc3_gadget_ep_set_halt() - CLEAR HALT
-> dwc3_gadget_ep_cleanup_cancelled_requests()
->list_for_each_entry_safe()
->dwc3_gadget_giveback(n)
->dwc3_gadget_del_and_unmap_request()- n deleted[cancelled_list]
->spin_unlock
->Thread#2 executes
...
->dwc3_gadget_giveback(n+1)
->Already removed!
Thread#2:
dwc3_gadget_pullup()
->waiting for dwc3 spin_lock
...
->Thread#1 released lock
->dwc3_stop_active_transfers()
->dwc3_remove_requests()
->fetches n+1 item from cancelled_list (n removed by Thread#1)
->dwc3_gadget_giveback()
->dwc3_gadget_del_and_unmap_request()- n+1
deleted[cancelled_list]
->spin_unlock
Fix this condition by utilizing list_replace_init(), and traversing
through a local copy of the current elements in the endpoint lists.
This will also set the parent list as empty, so if another thread is
also looping through the list, it will be empty on the next iteration.
Merge tag 'usb-serial-5.14-rc5' of https://git.kernel.org/pub/scm/linux/kernel/git/johan/usb-serial into usb-linus
Johan writes:
USB-serial fixes for 5.14-rc5
Here are two type-detection regression fixes for pl2303 and a patch to
increase the receive buffer size for for ch341 to avoid lost characters
at high line speeds.
Included are also some new device ids.
All but the last three commits have been in linux-next and with no
reported issues.
* tag 'usb-serial-5.14-rc5' of https://git.kernel.org/pub/scm/linux/kernel/git/johan/usb-serial:
USB: serial: ftdi_sio: add device ID for Auto-M3 OP-COM v2
USB: serial: pl2303: fix GT type detection
USB: serial: option: add Telit FD980 composition 0x1056
USB: serial: pl2303: fix HX type detection
USB: serial: ch341: fix character loss at high transfer rates
KVM: x86/mmu: Fix per-cpu counter corruption on 32-bit builds
Take a signed 'long' instead of an 'unsigned long' for the number of
pages to add/subtract to the total number of pages used by the MMU. This
fixes a zero-extension bug on 32-bit kernels that effectively corrupts
the per-cpu counter used by the shrinker.
Per-cpu counters take a signed 64-bit value on both 32-bit and 64-bit
kernels, whereas kvm_mod_used_mmu_pages() takes an unsigned long and thus
an unsigned 32-bit value on 32-bit kernels. As a result, the value used
to adjust the per-cpu counter is zero-extended (unsigned -> signed), not
sign-extended (signed -> signed), and so KVM's intended -1 gets morphed to 4294967295 and effectively corrupts the counter.
This was found by a staggering amount of sheer dumb luck when running
kvm-unit-tests on a 32-bit KVM build. The shrinker just happened to kick
in while running tests and do_shrink_slab() logged an error about trying
to free a negative number of objects. The truly lucky part is that the
kernel just happened to be a slightly stale build, as the shrinker no
longer yells about negative objects as of commit 18bb473e5031 ("mm:
vmscan: shrink deferred objects proportional to priority").
vmscan: shrink_slab: mmu_shrink_scan+0x0/0x210 [kvm] negative objects to delete nr=-858993460
Hui Su [Fri, 11 Jun 2021 02:21:07 +0000 (10:21 +0800)]
scripts/tracing: fix the bug that can't parse raw_trace_func
Since commit 77271ce4b2c0 ("tracing: Add irq, preempt-count and need resched info
to default trace output"), the default trace output format has been changed to:
<idle>-0 [009] d.h. 22420.068695: _raw_spin_lock_irqsave <-hrtimer_interrupt
<idle>-0 [000] ..s. 22420.068695: _nohz_idle_balance <-run_rebalance_domains
<idle>-0 [011] d.h. 22420.068695: account_process_tick <-update_process_times
The draw_functrace.py(introduced in v2.6.28) can't parse the new version format trace_func,
So we need modify draw_functrace.py to adapt the new version trace output format.
scripts/recordmcount.pl: Remove check_objcopy() and $can_use_local
When building ARCH=riscv allmodconfig with llvm-objcopy, the objcopy
version warning from this script appears:
WARNING: could not find objcopy version or version is less than 2.17.
Local function references are disabled.
The check_objcopy() function in scripts/recordmcount.pl is set up to
parse GNU objcopy's version string, not llvm-objcopy's, which triggers
the warning.
Commit 799c43415442 ("kbuild: thin archives make default for all archs")
made binutils 2.20 mandatory and commit ba64beb17493 ("kbuild: check the
minimum assembler version in Kconfig") enforces this at configuration
time so just remove check_objcopy() and $can_use_local instead, assuming
--globalize-symbol is always available.
llvm-objcopy has supported --globalize-symbol since LLVM 7.0.0 in 2018
and the minimum version for building the kernel with LLVM is 10.0.1 so
there is no issue introduced:
tracing: Reject string operand in the histogram expression
Since the string type can not be the target of the addition / subtraction
operation, it must be rejected. Without this fix, the string type silently
converted to digits.
tracing / histogram: Give calculation hist_fields a size
When working on my user space applications, I found a bug in the synthetic
event code where the automated synthetic event field was not matching the
event field calculation it was attached to. Looking deeper into it, it was
because the calculation hist_field was not given a size.
The synthetic event fields are matched to their hist_fields either by
having the field have an identical string type, or if that does not match,
then the size and signed values are used to match the fields.
The problem arose when I tried to match a calculation where the fields
were "unsigned int". My tool created a synthetic event of type "u32". But
it failed to match. The string was:
Adding debugging into the kernel, I found that the size of "diff" was 0.
And since it was given "unsigned int" as a type, the histogram fallback
code used size and signed. The signed matched, but the size of u32 (4) did
not match zero, and the event failed to be created.
This can be worse if the field you want to match is not one of the
acceptable fields for a synthetic event. As event fields can have any type
that is supported in Linux, this can cause an issue. For example, if a
type is an enum. Then there's no way to use that with any calculations.
Have the calculation field simply take on the size of what it is
calculating.
Fix modpost Section mismatch error in i915_globals_exit().
Since both an __init function and an __exit function can call
i915_globals_exit(), any function that i915_globals_exit() calls
should not be marked as __init or __exit. I.e., it needs to be
available for either of them.
WARNING: modpost: vmlinux.o(.text+0x8b796a): Section mismatch in reference from the function i915_globals_exit() to the function .exit.text:__i915_globals_flush()
The function i915_globals_exit() references a function in an exit section.
Often the function __i915_globals_flush() has valid usage outside the exit section
and the fix is to remove the __exit annotation of __i915_globals_flush.
ERROR: modpost: Section mismatches detected.
Set CONFIG_SECTION_MISMATCH_WARN_ONLY=y to allow them.
Jens Axboe [Tue, 3 Aug 2021 15:14:35 +0000 (09:14 -0600)]
io-wq: fix race between worker exiting and activating free worker
Nadav correctly reports that we have a race between a worker exiting,
and new work being queued. This can lead to work being queued behind
an existing worker that could be sleeping on an event before it can
run to completion, and hence introducing potential big latency gaps
if we hit this race condition:
Fix this by having the exiting worker go through the normal decrement
of a running worker, which will spawn a new one if needed.
The free worker activation is modified to only return success if we
were able to find a sleeping worker - if not, we keep looking through
the list. If we fail, we create a new worker as per usual.
Linus Torvalds [Wed, 4 Aug 2021 19:41:30 +0000 (12:41 -0700)]
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Seven fixes, five in drivers.
The two core changes are a trivial warning removal in scsi_scan.c and
a change to rescan for capacity when a device makes a user induced
(via a write to the state variable) offline->running transition to fix
issues with device mapper"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: core: Fix capacity set to zero after offlinining device
scsi: sr: Return correct event when media event code is 3
scsi: ibmvfc: Fix command state accounting and stale response detection
scsi: core: Avoid printing an error if target_alloc() returns -ENXIO
scsi: scsi_dh_rdac: Avoid crash during rdac_bus_attach()
scsi: megaraid_mm: Fix end of loop tests for list_for_each_entry()
scsi: pm80xx: Fix TMF task completion race condition
Linus Torvalds [Wed, 4 Aug 2021 19:31:53 +0000 (12:31 -0700)]
Merge tag 'gpio-updates-for-v5.14-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux
Pull gpio fixes from Bartosz Golaszewski:
- revert a patch intruducing breakage in interrupt handling in
gpio-mpc8xxx
- correctly handle missing IRQs in gpio-tqmx86 by really making them
optional
* tag 'gpio-updates-for-v5.14-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
gpio: tqmx86: really make IRQ optional
Revert "gpio: mpc8xxx: change the gpio interrupt flags."
Jeff Layton [Tue, 3 Aug 2021 16:47:34 +0000 (12:47 -0400)]
ceph: take snap_empty_lock atomically with snaprealm refcount change
There is a race in ceph_put_snap_realm. The change to the nref and the
spinlock acquisition are not done atomically, so you could decrement
nref, and before you take the spinlock, the nref is incremented again.
At that point, you end up putting it on the empty list when it
shouldn't be there. Eventually __cleanup_empty_realms runs and frees
it when it's still in-use.
Fix this by protecting the 1->0 transition with atomic_dec_and_lock,
and just drop the spinlock if we can get the rwsem.
Because these objects can also undergo a 0->1 refcount transition, we
must protect that change as well with the spinlock. Increment locklessly
unless the value is at 0, in which case we take the spinlock, increment
and then take it off the empty list if it did the 0->1 transition.
With these changes, I'm removing the dout() messages from these
functions, as well as in __put_snap_realm. They've always been racy, and
it's better to not print values that may be misleading.
Luis Henriques [Tue, 6 Jul 2021 13:52:41 +0000 (14:52 +0100)]
ceph: reduce contention in ceph_check_delayed_caps()
Function ceph_check_delayed_caps() is called from the mdsc->delayed_work
workqueue and it can be kept looping for quite some time if caps keep
being added back to the mdsc->cap_delay_list. This may result in the
watchdog tainting the kernel with the softlockup flag.
This patch breaks this loop if the caps have been recently (i.e. during
the loop execution). Any new caps added to the list will be handled in
the next run.
Also, allow schedule_delayed() callers to explicitly set the delay value
instead of defaulting to 5s, so we can ensure that it runs soon
afterward if it looks like there is more work.
Andy Shevchenko [Wed, 4 Aug 2021 11:21:41 +0000 (14:21 +0300)]
pinctrl: tigerlake: Fix GPIO mapping for newer version of software
The software mapping for GPIO, which initially comes from Microsoft,
is subject to change by respective Windows and firmware developers.
Due to the above the driver had been written and published way ahead
of the schedule, and thus the numbering schema used in it is outdated.
Fix the numbering schema in accordance with the real products on market.
s390/dasd: fix use after free in dasd path handling
When new configuration data is obtained after a path event it is stored
in the per path array. The old data needs to be freed.
The first valid configuration data is also referenced in the device
private structure to identify the device.
When the old per path configuration data was freed the device still
pointed to the already freed data leading to a use after free.
Fix by replacing also the device configuration data with the newly
obtained one before the old data gets freed.
Maxim Levitsky [Wed, 4 Aug 2021 11:20:57 +0000 (14:20 +0300)]
KVM: selftests: fix hyperv_clock test
The test was mistakenly using addr_gpa2hva on a gva and that happened
to work accidentally. Commit 106a2e766eae ("KVM: selftests: Lower the
min virtual address for misc page allocations") revealed this bug.
Mingwei Zhang [Mon, 2 Aug 2021 18:09:03 +0000 (11:09 -0700)]
KVM: SVM: improve the code readability for ASID management
KVM SEV code uses bitmaps to manage ASID states. ASID 0 was always skipped
because it is never used by VM. Thus, in existing code, ASID value and its
bitmap postion always has an 'offset-by-1' relationship.
Both SEV and SEV-ES shares the ASID space, thus KVM uses a dynamic range
[min_asid, max_asid] to handle SEV and SEV-ES ASIDs separately.
Existing code mixes the usage of ASID value and its bitmap position by
using the same variable called 'min_asid'.
Fix the min_asid usage: ensure that its usage is consistent with its name;
allocate extra size for ASID 0 to ensure that each ASID has the same value
with its bitmap position. Add comments on ASID bitmap allocation to clarify
the size change.
Peter Zijlstra [Thu, 29 Jul 2021 09:14:57 +0000 (11:14 +0200)]
perf/x86: Fix out of bound MSR access
On Wed, Jul 28, 2021 at 12:49:43PM -0400, Vince Weaver wrote:
> [32694.087403] unchecked MSR access error: WRMSR to 0x318 (tried to write 0x0000000000000000) at rIP: 0xffffffff8106f854 (native_write_msr+0x4/0x20)
> [32694.101374] Call Trace:
> [32694.103974] perf_clear_dirty_counters+0x86/0x100
The problem being that it doesn't filter out all fake counters, in
specific the above (erroneously) tries to use FIXED_BTS. Limit the
fixed counters indexes to the hardware supplied number.
Peter Zijlstra [Tue, 3 Aug 2021 10:45:01 +0000 (12:45 +0200)]
sched/rt: Fix double enqueue caused by rt_effective_prio
Double enqueues in rt runqueues (list) have been reported while running
a simple test that spawns a number of threads doing a short sleep/run
pattern while being concurrently setscheduled between rt and fair class.
__sched_setscheduler() uses rt_effective_prio() to handle proper queuing
of priority boosted tasks that are setscheduled while being boosted.
rt_effective_prio() is however called twice per each
__sched_setscheduler() call: first directly by __sched_setscheduler()
before dequeuing the task and then by __setscheduler() to actually do
the priority change. If the priority of the pi_top_task is concurrently
being changed however, it might happen that the two calls return
different results. If, for example, the first call returned the same rt
priority the task was running at and the second one a fair priority, the
task won't be removed by the rt list (on_list still set) and then
enqueued in the fair runqueue. When eventually setscheduled back to rt
it will be seen as enqueued already and the WARNING/BUG be issued.
Fix this by calling rt_effective_prio() only once and then reusing the
return value. While at it refactor code as well for clarity. Concurrent
priority inheritance handling is still safe and will eventually converge
to a new state by following the inheritance chain(s).
Ivan T. Ivanov [Wed, 4 Aug 2021 08:13:39 +0000 (11:13 +0300)]
net: usb: lan78xx: don't modify phy_device state concurrently
Currently phy_device state could be left in inconsistent state shown
by following alert message[1]. This is because phy_read_status could
be called concurrently from lan78xx_delayedwork, phy_state_machine and
__ethtool_get_link. Fix this by making sure that phy_device state is
updated atomically.
[1] lan78xx 1-1.1.1:1.0 eth0: No phy led trigger registered for speed(-1)
KVM: SVM: Fix off-by-one indexing when nullifying last used SEV VMCB
Use the raw ASID, not ASID-1, when nullifying the last used VMCB when
freeing an SEV ASID. The consumer, pre_sev_run(), indexes the array by
the raw ASID, thus KVM could get a false negative when checking for a
different VMCB if KVM manages to reallocate the same ASID+VMCB combo for
a new VM.
Note, this cannot cause a functional issue _in the current code_, as
pre_sev_run() also checks which pCPU last did VMRUN for the vCPU, and
last_vmentry_cpu is initialized to -1 during vCPU creation, i.e. is
guaranteed to mismatch on the first VMRUN. However, prior to commit 8a14fe4f0c54 ("kvm: x86: Move last_cpu into kvm_vcpu_arch as
last_vmentry_cpu"), SVM tracked pCPU on its own and zero-initialized the
last_cpu variable. Thus it's theoretically possible that older versions
of KVM could miss a TLB flush if the first VMRUN is on pCPU0 and the ASID
and VMCB exactly match those of a prior VM.
Paolo Bonzini [Wed, 4 Aug 2021 09:28:52 +0000 (05:28 -0400)]
KVM: Do not leak memory for duplicate debugfs directories
KVM creates a debugfs directory for each VM in order to store statistics
about the virtual machine. The directory name is built from the process
pid and a VM fd. While generally unique, it is possible to keep a
file descriptor alive in a way that causes duplicate directories, which
manifests as these messages:
[ 471.846235] debugfs: Directory '20245-4' with parent 'kvm' already present!
Even though this should not happen in practice, it is more or less
expected in the case of KVM for testcases that call KVM_CREATE_VM and
close the resulting file descriptor repeatedly and in parallel.
When this happens, debugfs_create_dir() returns an error but
kvm_create_vm_debugfs() goes on to allocate stat data structs which are
later leaked. The slow memory leak was spotted by syzkaller, where it
caused OOM reports.
Since the issue only affects debugfs, do a lookup before calling
debugfs_create_dir, so that the message is downgraded and rate-limited.
While at it, ensure kvm->debugfs_dentry is NULL rather than an error
if it is not created. This fixes kvm_destroy_vm_debugfs, which was not
checking IS_ERR_OR_NULL correctly.
1) Fix a sysbot reported memory leak in xfrm_user_rcv_msg.
From Pavel Skripkin.
2) Revert "xfrm: policy: Read seqcount outside of rcu-read side
in xfrm_policy_lookup_bytype". This commit tried to fix a
lockin bug, but only cured some of the symptoms. A proper
fix is applied on top of this revert.
3) Fix a locking bug on xfrm state hash resize. A recent change
on sequence counters accidentally repaced a spinlock by a mutex.
Fix from Frederic Weisbecker.
4) Fix possible user-memory-access in xfrm_user_rcv_msg_compat().
From Dmitry Safonov.
5) Add initialiation sefltest fot xfrm_spdattr_type_t.
From Dmitry Safonov.
====================
David S. Miller [Wed, 4 Aug 2021 09:38:42 +0000 (10:38 +0100)]
Merge branch 'pegasus-errors'
Petko Manolov says:
====================
net: usb: pegasus: better error checking and DRIVER_VERSION removal
v3:
Pavel Skripkin again: make sure -ETIMEDOUT is returned by __mii_op() on timeout
condition;
v2:
Special thanks to Pavel Skripkin for the review and who caught a few bugs.
setup_pegasus_II() would not print an erroneous message on the success path.
v1:
Add error checking for get_registers() and derivatives. If the usb transfer
fail then just don't use the buffer where the legal data should have been
returned.
Remove DRIVER_VERSION per Greg KH request.
====================
Petko Manolov [Tue, 3 Aug 2021 17:25:23 +0000 (20:25 +0300)]
net: usb: pegasus: Check the return value of get_geristers() and friends;
Certain call sites of get_geristers() did not do proper error handling. This
could be a problem as get_geristers() typically return the data via pointer to a
buffer. If an error occurred the code is carelessly manipulating the wrong data.
Johan Hovold [Wed, 4 Aug 2021 09:31:00 +0000 (11:31 +0200)]
USB: serial: pl2303: fix GT type detection
At least some PL2303GT have a bcdDevice of 0x305 instead of 0x100 as the
datasheet claims. Add it to the list of known release numbers for the
HXN (G) type.
Leon Romanovsky [Tue, 3 Aug 2021 12:00:43 +0000 (15:00 +0300)]
net/prestera: Fix devlink groups leakage in error flow
Devlink trap group is registered but not released in error flow,
add the missing devlink_trap_groups_unregister() call.
Fixes: 0a9003f45e91 ("net: marvell: prestera: devlink: add traps/groups implementation") Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Yunsheng Lin [Tue, 3 Aug 2021 10:58:21 +0000 (18:58 +0800)]
net: sched: fix lockdep_set_class() typo error for sch->seqlock
According to comment in qdisc_alloc(), sch->seqlock's lockdep
class key should be set to qdisc_tx_busylock, due to possible
type error, sch->busylock's lockdep class key is set to
qdisc_tx_busylock, which is duplicated because sch->busylock's
lockdep class key is already set in qdisc_alloc().
So fix it by replacing sch->busylock with sch->seqlock.
Fixes: 96009c7d500e ("sched: replace __QDISC_STATE_RUNNING bit with a spin lock") Signed-off-by: Yunsheng Lin <[email protected]> Signed-off-by: David S. Miller <[email protected]>