Eric Dumazet [Wed, 29 Sep 2021 22:57:50 +0000 (15:57 -0700)]
af_unix: fix races in sk_peer_pid and sk_peer_cred accesses
Jann Horn reported that SO_PEERCRED and SO_PEERGROUPS implementations
are racy, as af_unix can concurrently change sk_peer_pid and sk_peer_cred.
In order to fix this issue, this patch adds a new spinlock that needs
to be used whenever these fields are read or written.
Jann also pointed out that l2cap_sock_get_peer_pid_cb() is currently
reading sk->sk_peer_pid which makes no sense, as this field
is only possibly set by AF_UNIX sockets.
We will have to clean this in a separate patch.
This could be done by reverting b48596d1dc25 "Bluetooth: L2CAP: Add get_peer_pid callback"
or implementing what was truly expected.
Wong Vee Khee [Thu, 30 Sep 2021 06:44:36 +0000 (14:44 +0800)]
net: stmmac: fix EEE init issue when paired with EEE capable PHYs
When STMMAC is paired with Energy-Efficient Ethernet(EEE) capable PHY,
and the PHY is advertising EEE by default, we need to enable EEE on the
xPCS side too, instead of having user to manually trigger the enabling
config via ethtool.
Fixed this by adding xpcs_config_eee() call in stmmac_eee_init().
Fixes: 7617af3d1a5e ("net: pcs: Introducing support for DWC xpcs Energy Efficient Ethernet") Cc: Michael Sit Wei Hong <[email protected]> Signed-off-by: Wong Vee Khee <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Jakub Kicinski [Wed, 29 Sep 2021 15:32:24 +0000 (08:32 -0700)]
net: dev_addr_list: handle first address in __hw_addr_add_ex
struct dev_addr_list is used for device addresses, unicast addresses
and multicast addresses. The first of those needs special handling
of the main address - netdev->dev_addr points directly the data
of the entry and drivers write to it freely, so we can't maintain
it in the rbtree (for now, at least, to be fixed in net-next).
Current work around sprinkles special handling of the first
address on the list throughout the code but it missed the case
where address is being added. First address will not be visible
during subsequent adds.
Syzbot found a warning where unicast addresses are modified
without holding the rtnl lock, tl;dr is that team generates
the same modification multiple times, not necessarily when
right locks are held.
In the repro we have:
macvlan -> team -> veth
macvlan adds a unicast address to the team. Team then pushes
that address down to its memebers (veths). Next something unrelated
makes team sync member addrs again, and because of the bug
the addr entries get duplicated in the veths. macvlan gets
removed, removes its addr from team which removes only one
of the duplicated addresses from veths. This removal is done
under rtnl. Next syzbot uses iptables to add a multicast addr
to team (which does not hold rtnl lock). Team syncs veth addrs,
but because veths' unicast list still has the duplicate it will
also get sync, even though this update is intended for mc addresses.
Again, uc address updates need rtnl lock, boom.
Reported-by: [email protected] Fixes: 406f42fa0d3c ("net-next: When a bond have a massive amount of VLANs with IPv6 addresses, performance of changing link state, attaching a VRF, changing an IPv6 address, etc. go down dramtically.") Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Patch that refactored fl_walk() to use idr_for_each_entry_continue_ul()
also removed rcu protection of individual filters which causes following
use-after-free when filter is deleted concurrently. Fix fl_walk() to obtain
rcu read lock while iterating and taking the filter reference and temporary
release the lock while calling arg->fn() callback that can sleep.
KASAN trace:
[ 352.773640] ==================================================================
[ 352.775041] BUG: KASAN: use-after-free in fl_walk+0x159/0x240 [cls_flower]
[ 352.776304] Read of size 4 at addr ffff8881c8251480 by task tc/2987
the problem originates from uncorrect lock annotation in the mptcp
code and is only visible since commit 2dcb96bacce3 ("net: core: Correct
the sock::sk_lock.owned lockdep annotations"), but is present since
the port-based endpoint support initial implementation.
This patch addresses the issue introducing a nested variant of
lock_sock_fast() and using it in the relevant code path.
Fixes: 1729cf186d8a ("mptcp: create the listening socket for new port") Fixes: 2dcb96bacce3 ("net: core: Correct the sock::sk_lock.owned lockdep annotations") Suggested-by: Thomas Gleixner <[email protected]> Reported-and-tested-by: [email protected] Signed-off-by: Paolo Abeni <[email protected]> Reviewed-by: Thomas Gleixner <[email protected]> Signed-off-by: David S. Miller <[email protected]>
KVM: selftests: Ensure all migrations are performed when test is affined
Rework the CPU selection in the migration worker to ensure the specified
number of migrations are performed when the test iteslf is affined to a
subset of CPUs. The existing logic skips iterations if the target CPU is
not in the original set of possible CPUs, which causes the test to fail
if too many iterations are skipped.
==== Test Assertion Failure ====
rseq_test.c:228: i > (NR_TASK_MIGRATIONS / 2)
pid=10127 tid=10127 errno=4 - Interrupted system call
1 0x00000000004018e5: main at rseq_test.c:227
2 0x00007fcc8fc66bf6: ?? ??:0
3 0x0000000000401959: _start at ??:?
Only performed 4 KVM_RUNs, task stalled too much?
Calculate the min/max possible CPUs as a cheap "best effort" to avoid
high runtimes when the test is affined to a small percentage of CPUs.
Alternatively, a list or xarray of the possible CPUs could be used, but
even in a horrendously inefficient setup, such optimizations are not
needed because the runtime is completely dominated by the cost of
migrating the task, and the absolute runtime is well under a minute in
even truly absurd setups, e.g. running on a subset of vCPUs in a VM that
is heavily overcommited (16 vCPUs per pCPU).
KVM: x86: Swap order of CPUID entry "index" vs. "significant flag" checks
Check whether a CPUID entry's index is significant before checking for a
matching index to hack-a-fix an undefined behavior bug due to consuming
uninitialized data. RESET/INIT emulation uses kvm_cpuid() to retrieve
CPUID.0x1, which does _not_ have a significant index, and fails to
initialize the dummy variable that doubles as EBX/ECX/EDX output _and_
ECX, a.k.a. index, input.
Practically speaking, it's _extremely_ unlikely any compiler will yield
code that causes problems, as the compiler would need to inline the
kvm_cpuid() call to detect the uninitialized data, and intentionally hose
the kernel, e.g. insert ud2, instead of simply ignoring the result of
the index comparison.
Although the sketchy "dummy" pattern was introduced in SVM by commit 66f7b72e1171 ("KVM: x86: Make register state after reset conform to
specification"), it wasn't actually broken until commit 7ff6c0350315
("KVM: x86: Remove stateful CPUID handling") arbitrarily swapped the
order of operations such that "index" was checked before the significant
flag.
Avoid consuming uninitialized data by reverting to checking the flag
before the index purely so that the fix can be easily backported; the
offending RESET/INIT code has been refactored, moved, and consolidated
from vendor code to common x86 since the bug was introduced. A future
patch will directly address the bad RESET/INIT behavior.
The undefined behavior was detected by syzbot + KernelMemorySanitizer.
Local variable ----dummy@kvm_vcpu_reset created at:
kvm_vcpu_reset+0x1fb/0x1c20 arch/x86/kvm/x86.c:10812
kvm_apic_accept_events+0x58f/0x8c0 arch/x86/kvm/lapic.c:2923
Zelin Deng [Wed, 29 Sep 2021 05:13:49 +0000 (13:13 +0800)]
ptp: Fix ptp_kvm_getcrosststamp issue for x86 ptp_kvm
hv_clock is preallocated to have only HVC_BOOT_ARRAY_SIZE (64) elements;
if the PTP_SYS_OFFSET_PRECISE ioctl is executed on vCPUs whose index is
64 of higher, retrieving the struct pvclock_vcpu_time_info pointer with
"src = &hv_clock[cpu].pvti" will result in an out-of-bounds access and
a wild pointer. Change it to "this_cpu_pvti()" which is guaranteed to
be valid.
Zelin Deng [Wed, 29 Sep 2021 05:13:48 +0000 (13:13 +0800)]
x86/kvmclock: Move this_cpu_pvti into kvmclock.h
There're other modules might use hv_clock_per_cpu variable like ptp_kvm,
so move it into kvmclock.h and export the symbol to make it visiable to
other modules.
Nick Desaulniers [Mon, 20 Sep 2021 23:25:33 +0000 (16:25 -0700)]
nbd: use shifts rather than multiplies
commit fad7cd3310db ("nbd: add the check to prevent overflow in
__nbd_ioctl()") raised an issue from the fallback helpers added in
commit f0907827a8a9 ("compiler.h: enable builtin overflow checkers and
add fallback code")
As Stephen Rothwell notes:
The added check_mul_overflow() call is being passed 64 bit values.
COMPILER_HAS_GENERIC_BUILTIN_OVERFLOW is not set for this build (see
include/linux/overflow.h).
Specifically, the helpers for checking whether the results of a
multiplication overflowed (__unsigned_mul_overflow,
__signed_add_overflow) use the division operator when
!COMPILER_HAS_GENERIC_BUILTIN_OVERFLOW. This is problematic for 64b
operands on 32b hosts.
This was fixed upstream by
commit 76ae847497bc ("Documentation: raise minimum supported version of
GCC to 5.1")
which is not suitable to be backported to stable.
Further, __builtin_mul_overflow() would emit a libcall to a
compiler-rt-only symbol when compiling with clang < 14 for 32b targets.
ld.lld: error: undefined symbol: __mulodi4
In order to keep stable buildable with GCC 4.9 and clang < 14, modify
struct nbd_config to instead track the number of bits of the block size;
reconstructing the block size using runtime checked shifts that are not
problematic for those compilers and in a ways that can be backported to
stable.
In nbd_set_size, we do validate that the value of blksize must be a
power of two (POT) and is in the range of [512, PAGE_SIZE] (both
inclusive).
Per gpio_chip interface, error shall be proparated to the caller.
Attempt to silent diagnostics by returning zero (as written in the
comment) is plain wrong, because the zero return can be interpreted by
the caller as the gpio value.
Merge tag 'sound-5.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"This became a slightly large collection of changes, partly because
I've been off in the last weeks. Most of changes are small and
scattered while a bit big change is found in HD-audio Realtek codec
driver; it's a very device-specific fix that has been long wanted, so
I decided to pick up although it's in the middle RC.
Some highlights:
- A new guard ioctl for ALSA rawmidi API to avoid the misuse of the
new timestamp framing mode; it's for a regression fix
- HD-audio: a revert of the 5.15 change that might work badly, new
quirks for Lenovo Legion & co, a follow-up fix for CS8409
- ASoC: lots of SOF-related fixes, fsl component fixes, corrections
of mediatek drivers
- USB-audio: fix for the PM resume
- FireWire: oxfw and motu fixes"
* tag 'sound-5.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (25 commits)
ALSA: pcsp: Make hrtimer forwarding more robust
ALSA: rawmidi: introduce SNDRV_RAWMIDI_IOCTL_USER_PVERSION
ALSA: firewire-motu: fix truncated bytes in message tracepoints
ASoC: SOF: trace: Omit error print when waking up trace sleepers
ASoC: mediatek: mt8195: remove wrong fixup assignment on HDMITX
ASoC: SOF: loader: Re-phrase the missing firmware error to avoid duplication
ASoC: SOF: loader: release_firmware() on load failure to avoid batching
ALSA: hda/cs8409: Setup Dolphin Headset Mic as Phantom Jack
ALSA: pcxhr: "fix" PCXHR_REG_TO_PORT definition
ASoC: SOF: imx: imx8m: Bar index is only valid for IRAM and SRAM types
ASoC: SOF: imx: imx8: Bar index is only valid for IRAM and SRAM types
ASoC: SOF: Fix DSP oops stack dump output contents
ALSA: hda/realtek: Quirks to enable speaker output for Lenovo Legion 7i 15IMHG05, Yoga 7i 14ITL5/15ITL5, and 13s Gen2 laptops.
ALSA: usb-audio: Unify mixer resume and reset_resume procedure
Revert "ALSA: hda: Drop workaround for a hang at shutdown again"
ALSA: oxfw: fix transmission method for Loud models based on OXFW971
ASoC: mediatek: common: handle NULL case in suspend/resume function
ASoC: fsl_xcvr: register platform component before registering cpu dai
ASoC: fsl_spdif: register platform component before registering cpu dai
ASoC: fsl_micfil: register platform component before registering cpu dai
...
When EEE support was added to the 28nm EPHY it was assumed that it would
be able to support the standard clause 45 over clause 22 register access
method. It turns out that the PHY does not support that, which is the
very reason for using the indirect shadow mode 2 bank 3 access method.
Implement {read,write}_mmd to allow the standard PHY library routines
pertaining to EEE querying and configuration to work correctly on these
PHYs. This forces us to implement a __phy_set_clr_bits() function that
does not grab the MDIO bus lock since the PHY driver's {read,write}_mmd
functions are always called with that lock held.
Fixes: 83ee102a6998 ("net: phy: bcm7xxx: add support for 28nm EPHY") Signed-off-by: Florian Fainelli <[email protected]> Signed-off-by: David S. Miller <[email protected]>
net: hns3: disable firmware compatible features when uninstall PF
Currently, the firmware compatible features are enabled in PF driver
initialization process, but they are not disabled in PF driver
deinitialization process and firmware keeps these features in enabled
status.
In this case, if load an old PF driver (for example, in VM) which not
support the firmware compatible features, firmware will still send mailbox
message to PF when link status changed and PF will print
"un-supported mailbox message, code = 201".
To fix this problem, disable these firmware compatible features in PF
driver deinitialization process.
Fixes: ed8fb4b262ae ("net: hns3: add link change event report") Signed-off-by: Guangbin Huang <[email protected]> Signed-off-by: David S. Miller <[email protected]>
net: hns3: fix always enable rx vlan filter problem after selftest
Currently, the rx vlan filter will always be disabled before selftest and
be enabled after selftest as the rx vlan filter feature is fixed on in
old device earlier than V3.
However, this feature is not fixed in some new devices and it can be
disabled by user. In this case, it is wrong if rx vlan filter is enabled
after selftest. So fix it.
Fixes: bcc26e8dc432 ("net: hns3: remove unused code in hns3_self_test()") Signed-off-by: Guangbin Huang <[email protected]> Signed-off-by: David S. Miller <[email protected]>
net: hns3: PF enable promisc for VF when mac table is overflow
If unicast mac address table is full, and user add a new mac address, the
unicast promisc needs to be enabled for the new unicast mac address can be
used. So does the multicast promisc.
Now this feature has been implemented for PF, and VF should be implemented
too. When the mac table of VF is overflow, PF will enable promisc for this
VF.
Fixes: 1e6e76101fd9 ("net: hns3: configure promisc mode for VF asynchronously") Signed-off-by: Guangbin Huang <[email protected]> Signed-off-by: David S. Miller <[email protected]>
net: hns3: fix show wrong state when add existing uc mac address
Currently, if function adds an existing unicast mac address, eventhough
driver will not add this address into hardware, but it will return 0 in
function hclge_add_uc_addr_common(). It will cause the state of this
unicast mac address is ACTIVE in driver, but it should be in TO-ADD state.
To fix this problem, function hclge_add_uc_addr_common() returns -EEXIST
if mac address is existing, and delete two error log to avoid printing
them all the time after this modification.
Fixes: 72110b567479 ("net: hns3: return 0 and print warning when hit duplicate MAC") Signed-off-by: Jian Shen <[email protected]> Signed-off-by: Guangbin Huang <[email protected]> Signed-off-by: David S. Miller <[email protected]>
net: hns3: fix mixed flag HCLGE_FLAG_MQPRIO_ENABLE and HCLGE_FLAG_DCB_ENABLE
HCLGE_FLAG_MQPRIO_ENABLE is supposed to set when enable
multiple TCs with tc mqprio, and HCLGE_FLAG_DCB_ENABLE is
supposed to set when enable multiple TCs with ets. But
the driver mixed the flags when updating the tm configuration.
Furtherly, PFC should be available when HCLGE_FLAG_MQPRIO_ENABLE
too, so remove the unnecessary limitation.
Fixes: 5a5c90917467 ("net: hns3: add support for tc mqprio offload") Signed-off-by: Jian Shen <[email protected]> Signed-off-by: Guangbin Huang <[email protected]> Signed-off-by: David S. Miller <[email protected]>
net: hns3: don't rollback when destroy mqprio fail
For destroy mqprio is irreversible in stack, so it's unnecessary
to rollback the tc configuration when destroy mqprio failed.
Otherwise, it may cause the configuration being inconsistent
between driver and netstack.
As the failure is usually caused by reset, and the driver will
restore the configuration after reset, so it can keep the
configuration being consistent between driver and hardware.
Fixes: 5a5c90917467 ("net: hns3: add support for tc mqprio offload") Signed-off-by: Jian Shen <[email protected]> Signed-off-by: Guangbin Huang <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Currently, in function hns3_nic_set_real_num_queue(), the
driver doesn't report the queue count and offset for disabled
tc. If user enables multiple TCs, but only maps user
priorities to partial of them, it may cause the queue range
of the unmapped TC being displayed abnormally.
Fix it by removing the tc enable checking, ensure the queue
count is not zero.
With this change, the tc_en is useless now, so remove it.
Fixes: a75a8efa00c5 ("net: hns3: Fix tc setup when netdev is first up") Signed-off-by: Jian Shen <[email protected]> Signed-off-by: Guangbin Huang <[email protected]> Signed-off-by: David S. Miller <[email protected]>
net: hns3: do not allow call hns3_nic_net_open repeatedly
hns3_nic_net_open() is not allowed to called repeatly, but there
is no checking for this. When doing device reset and setup tc
concurrently, there is a small oppotunity to call hns3_nic_net_open
repeatedly, and cause kernel bug by calling napi_enable twice.
The calltrace information is like below:
[ 3078.222780] ------------[ cut here ]------------
[ 3078.230255] kernel BUG at net/core/dev.c:6991!
[ 3078.236224] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
[ 3078.243431] Modules linked in: hns3 hclgevf hclge hnae3 vfio_iommu_type1 vfio_pci vfio_virqfd vfio pv680_mii(O)
[ 3078.258880] CPU: 0 PID: 295 Comm: kworker/u8:5 Tainted: G O 5.14.0-rc4+ #1
[ 3078.269102] Hardware name: , BIOS KpxxxFPGA 1P B600 V181 08/12/2021
[ 3078.276801] Workqueue: hclge hclge_service_task [hclge]
[ 3078.288774] pstate: 60400009 (nZCv daif +PAN -UAO -TCO BTYPE=--)
[ 3078.296168] pc : napi_enable+0x80/0x84
tc qdisc sho[w 3d0e7v8 .e3t0h218 79] lr : hns3_nic_net_open+0x138/0x510 [hns3]
Once hns3_nic_net_open() is excute success, the flag
HNS3_NIC_STATE_DOWN will be cleared. So add checking for this
flag, directly return when HNS3_NIC_STATE_DOWN is no set.
Fixes: e888402789b9 ("net: hns3: call hns3_nic_net_open() while doing HNAE3_UP_CLIENT") Signed-off-by: Jian Shen <[email protected]> Signed-off-by: Guangbin Huang <[email protected]> Signed-off-by: David S. Miller <[email protected]>
ixgbe: Fix NULL pointer dereference in ixgbe_xdp_setup
The ixgbe driver currently generates a NULL pointer dereference with
some machine (online cpus < 63). This is due to the fact that the
maximum value of num_xdp_queues is nr_cpu_ids. Code is in
"ixgbe_set_rss_queues"".
Here's how the problem repeats itself:
Some machine (online cpus < 63), And user set num_queues to 63 through
ethtool. Code is in the "ixgbe_set_channels",
adapter->ring_feature[RING_F_FDIR].limit = count;
It becomes 63.
When user use xdp, "ixgbe_set_rss_queues" will set queues num.
adapter->num_rx_queues = rss_i;
adapter->num_tx_queues = rss_i;
adapter->num_xdp_queues = ixgbe_xdp_queues(adapter);
And rss_i's value is from
f = &adapter->ring_feature[RING_F_FDIR];
rss_i = f->indices = f->limit;
So "num_rx_queues" > "num_xdp_queues", when run to "ixgbe_xdp_setup",
for (i = 0; i < adapter->num_rx_queues; i++)
if (adapter->xdp_ring[i]->xsk_umem)
So I fix ixgbe_max_channels so that it will not allow a setting of queues
to be higher than the num_online_cpus(). And when run to ixgbe_xdp_setup,
take the smaller value of num_rx_queues and num_xdp_queues.
Both cxgb4 and csiostor drivers run on their own independent Physical
Function. But when cxgb4 and csiostor are both being loaded in parallel via
modprobe, there is a race when firmware upgrade is attempted by both the
drivers.
When the cxgb4 driver initiates the firmware upgrade, it halts the firmware
and the chip until upgrade is complete. When the csiostor driver is coming
up in parallel, the firmware mailbox communication fails with timeouts and
the csiostor driver probe fails.
Add a module soft dependency on cxgb4 driver to ensure loading csiostor
triggers cxgb4 to load first when available to avoid the firmware upgrade
race.
Thomas Gleixner [Tue, 28 Sep 2021 14:10:49 +0000 (16:10 +0200)]
net: bridge: mcast: Associate the seqcount with its protecting lock.
The sequence count bridge_mcast_querier::seq is protected by
net_bridge::multicast_lock but seqcount_init() does not associate the
seqcount with the lock. This leads to a warning on PREEMPT_RT because
preemption is still enabled.
Let seqcount_init() associate the seqcount with lock that protects the
write section. Remove lockdep_assert_held_once() because lockdep already checks
whether the associated lock is held.
Cai Huoqing [Tue, 28 Sep 2021 13:48:49 +0000 (21:48 +0800)]
net: mdio-ipq4019: Fix the error for an optional regs resource
The second resource is optional which is only provided on the chipset
IPQ5018. But the blamed commit ignores that and if the resource is
not there it just fails.
the resource is used like this,
if (priv->eth_ldo_rdy) {
val = readl(priv->eth_ldo_rdy);
val |= BIT(0);
writel(val, priv->eth_ldo_rdy);
fsleep(IPQ_PHY_SET_DELAY_US);
}
This patch reverts that to still allow the second resource to be optional
because other SoC have the some MDIO controller and doesn't need to
second resource.
Merge tag 'vfio-v5.15-rc4' of git://github.com/awilliam/linux-vfio
Pull VFIO fixes from Alex Williamson:
- Fix vfio-ap leak on uninit (Jason Gunthorpe)
- Add missing prototype arg name (Colin Ian King)
* tag 'vfio-v5.15-rc4' of git://github.com/awilliam/linux-vfio:
vfio/ap_ops: Add missed vfio_uninit_group_dev()
vfio/pci: add missing identifier name in argument of function prototype
Merge tag 'm68k-for-v5.15-tag3' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k
Pull more m68k updates from Geert Uytterhoeven:
- signal handling fixes
- removal of set_fs()
[ The set_fs removal isn't strictly a fix, but it's been pending for a
while and is very welcome. The signal handling fixes resolved an issue
that was incorrectly attributed to the set_fs changes - Linus ]
* tag 'm68k-for-v5.15-tag3' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k:
m68k: Remove set_fs()
m68k: Provide __{get,put}_kernel_nofault
m68k: Factor the 8-byte lowlevel {get,put}_user code into helpers
m68k: Use BUILD_BUG for passing invalid sizes to get_user/put_user
m68k: Remove the 030 case in virt_to_phys_slow
m68k: Document that access_ok is broken for !CONFIG_CPU_HAS_ADDRESS_SPACES
m68k: Leave stack mangling to asm wrapper of sigreturn()
m68k: Update ->thread.esp0 before calling syscall_trace() in ret_from_signal
m68k: Handle arrivals of multiple signals correctly
Merge tag 'nios2_fixes_for_v5.15_part1' of git://git.kernel.org/pub/scm/linux/kernel/git/dinguyen/linux
Pull nios2 fixes from Dinh Nguyen:
- Fix build warning for unmet dependency for EARLY_PRINTK
- Remove unused dram_start() function
* tag 'nios2_fixes_for_v5.15_part1' of git://git.kernel.org/pub/scm/linux/kernel/git/dinguyen/linux:
NIOS2: setup.c: drop unused variable 'dram_start'
NIOS2: fix kconfig unmet dependency warning for SERIAL_CORE_CONSOLE
Prike Liang [Wed, 25 Aug 2021 05:36:38 +0000 (13:36 +0800)]
drm/amdgpu: force exit gfxoff on sdma resume for rmb s0ix
In the s2idle stress test sdma resume fail occasionally,in the
failed case GPU is in the gfxoff state.This issue may introduce
by firmware miss handle doorbell S/R and now temporary fix the issue
by forcing exit gfxoff for sdma resume.
Simon Ser [Mon, 27 Sep 2021 15:08:44 +0000 (15:08 +0000)]
drm/amdgpu: check tiling flags when creating FB on GFX8-
On GFX9+, format modifiers are always enabled and ensure the
frame-buffers can be scanned out at ADDFB2 time.
On GFX8-, format modifiers are not supported and no other check
is performed. This means ADDFB2 IOCTLs will succeed even if the
tiling isn't supported for scan-out, and will result in garbage
displayed on screen [1].
Fix this by adding a check for tiling flags for GFX8 and older.
The check is taken from radeonsi in Mesa (see how is_displayable
is populated in gfx6_compute_surface).
Changes in v2: use drm_WARN_ONCE instead of drm_WARN (Michel)
drm/amd/display: Fix Display Flicker on embedded panels
[Why]
ASSR is dependent on Signed PSP Verstage to enable Content
Protection for eDP panels. Unsigned PSP verstage is used
during development phase causing ASSR to FAIL.
As a result, link training is performed with
DP_PANEL_MODE_DEFAULT instead of DP_PANEL_MODE_EDP for
eDP panels that causes display flicker on some panels.
[How]
- Do not change panel mode, if ASSR is disabled
- Just report and continue to perform eDP link training
with right settings further.
Leslie Shi [Thu, 23 Sep 2021 08:05:31 +0000 (16:05 +0800)]
drm/amdgpu: fix gart.bo pin_count leak
gmc_v{9,10}_0_gart_disable() isn't called matched with
correspoding gart_enbale function in SRIOV case. This will
lead to gart.bo pin_count leak on driver unload.
It seems a positive dentry in kernfs becomes a negative dentry directly
through d_delete() in vfs_rmdir(). dentry->d_time is uninitialized
when accessing it in kernfs_dop_revalidate(), because it is only
initialized when created as negative dentry in kernfs_iop_lookup().
The problem can be reproduced by the following command:
cd /sys/fs/cgroup/pids && mkdir hi && stat hi && rmdir hi && stat hi
A simple fixes seems to be initializing d->d_time for positive dentry
in kernfs_iop_lookup() as well. The downside is the negative dentry
will be revalidated again after it becomes negative in d_delete(),
because the revison of its parent must have been increased due to
its removal.
Alternative solution is implement .d_iput for kernfs, and assign d_time
for the newly-generated negative dentry in it. But we may need to
take kernfs_rwsem to protect again the concurrent kernfs_link_sibling()
on the parent directory, it is a little over-killing. Now the simple
fix is chosen.
Merge tag 'fsverity-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt
Pull fsverity fix from Eric Biggers:
"Fix an integer overflow when computing the Merkle tree layout of
extremely large files, exposed by btrfs adding support for fs-verity"
* tag 'fsverity-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt:
fs-verity: fix signed integer overflow with i_size near S64_MAX
Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost
Pull virtio/vdpa fixes from Michael Tsirkin:
"Fixes up some issues in rc1"
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
vdpa: potential uninitialized return in vhost_vdpa_va_map()
vdpa/mlx5: Avoid executing set_vq_ready() if device is reset
vdpa/mlx5: Clear ready indication for control VQ
vduse: Cleanup the old kernel states after reset failure
vduse: missing error code in vduse_init()
virtio: don't fail on !of_device_is_compatible
Merge tag 'mmc-v5.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc
Pull MMC fixes from Ulf Hansson:
- renesas_sdhi: Fix regression with hard reset on old SDHIs
- dw_mmc: Only inject fault before done/error
* tag 'mmc-v5.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
mmc: renesas_sdhi: fix regression with hard reset on old SDHIs
mmc: dw_mmc: Only inject fault before done/error
This function copies strings around between multiple buffers
including a large on-stack array that causes a build warning
on 32-bit systems:
drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_debugfs.c: In function 'hclge_dbg_dump_tm_pg':
drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_debugfs.c:782:1: error: the frame size of 1424 bytes is larger than 1400 bytes [-Werror=frame-larger-than=]
The function can probably be cleaned up a lot, to go back to
printing directly into the output buffer, but dynamically allocating
the structure is a simpler workaround for now.
Fixes: 04d96139ddb3 ("net: hns3: refine function hclge_dbg_dump_tm_pri()") Signed-off-by: Arnd Bergmann <[email protected]> Signed-off-by: David S. Miller <[email protected]>
According to the documentation the second resource is optional. But the
blamed commit ignores that and if the resource is not there it just
fails.
This patch reverts that to still allow the second resource to be
optional because other SoC have the some MDIO controller and doesn't
need to second resource.
Fixes: 672a1c394950 ("net: mdio: mscc-miim: Make use of the helper function devm_platform_ioremap_resource()") Signed-off-by: Horatiu Vultur <[email protected]> Reviewed-by: Cai Huoqing <[email protected]> Signed-off-by: David S. Miller <[email protected]>
af_unix: Return errno instead of NULL in unix_create1().
unix_create1() returns NULL on error, and the callers assume that it never
fails for reasons other than out of memory. So, the callers always return
-ENOMEM when unix_create1() fails.
However, it also returns NULL when the number of af_unix sockets exceeds
twice the limit controlled by sysctl: fs.file-max. In this case, the
callers should return -ENFILE like alloc_empty_file().
This patch changes unix_create1() to return the correct error value instead
of NULL on error.
Out of curiosity, the assumption has been wrong since 1999 due to this
change introduced in 2.2.4 [0].
diff -u --recursive --new-file v2.2.3/linux/net/unix/af_unix.c linux/net/unix/af_unix.c
--- v2.2.3/linux/net/unix/af_unix.c Tue Jan 19 11:32:53 1999
+++ linux/net/unix/af_unix.c Sun Mar 21 07:22:00 1999
@@ -388,6 +413,9 @@
{
struct sock *sk;
+ if (atomic_read(&unix_nr_socks) >= 2*max_files)
+ return NULL;
+
MOD_INC_USE_COUNT;
sk = sk_alloc(PF_UNIX, GFP_KERNEL, 1);
if (!sk) {
Randy Dunlap [Mon, 27 Sep 2021 21:48:23 +0000 (14:48 -0700)]
net: sun: SUNVNET_COMMON should depend on INET
When CONFIG_INET is not set, there are failing references to IPv4
functions, so make this driver depend on INET.
Fixes these build errors:
sparc64-linux-ld: drivers/net/ethernet/sun/sunvnet_common.o: in function `sunvnet_start_xmit_common':
sunvnet_common.c:(.text+0x1a68): undefined reference to `__icmp_send'
sparc64-linux-ld: drivers/net/ethernet/sun/sunvnet_common.o: in function `sunvnet_poll_common':
sunvnet_common.c:(.text+0x358c): undefined reference to `ip_send_check'
Shannon Nelson [Mon, 27 Sep 2021 21:07:18 +0000 (14:07 -0700)]
ionic: fix gathering of debug stats
Don't print stats for which we haven't reserved space as it can
cause nasty memory bashing and related bad behaviors.
Fixes: aa620993b1e5 ("ionic: pull per-q stats work out of queue loops") Signed-off-by: Shannon Nelson <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Many architectures don't define virt_to_bus() any more, as drivers
should be using the dma-mapping interfaces where possible:
In file included from drivers/net/hamradio/dmascc.c:27:
drivers/net/hamradio/dmascc.c: In function 'tx_on':
drivers/net/hamradio/dmascc.c:976:30: error: implicit declaration of function 'virt_to_bus'; did you mean 'virt_to_fix'? [-Werror=implicit-function-declaration]
976 | virt_to_bus(priv->tx_buf[priv->tx_tail]) + n);
| ^~~~~~~~~~~
arch/arm/include/asm/dma.h:109:52: note: in definition of macro 'set_dma_addr'
109 | __set_dma_addr(chan, (void *)__bus_to_virt(addr))
| ^~~~
Add the Kconfig dependency to prevent this from being built on
architectures without virt_to_bus().
Fixes: bc1abb9e55ce ("dmascc: use proper 'virt_to_bus()' rather than casting to 'int'") Signed-off-by: Arnd Bergmann <[email protected]> Signed-off-by: David S. Miller <[email protected]>
An object file cannot be built for both loadable module and built-in
use at the same time:
arm-linux-gnueabi-ld: drivers/net/ethernet/micrel/ks8851_common.o: in function `ks8851_probe_common':
ks8851_common.c:(.text+0xf80): undefined reference to `__this_module'
Change the ks8851_common code to be a standalone module instead,
and use Makefile logic to ensure this is built-in if at least one
of its two users is.
Johan Almbladh [Mon, 27 Sep 2021 13:11:57 +0000 (13:11 +0000)]
bpf, x86: Fix bpf mapping of atomic fetch implementation
Fix the case where the dst register maps to %rax as otherwise this produces
an incorrect mapping with the implementation in 981f94c3e921 ("bpf: Add
bitwise atomic instructions") as %rax is clobbered given it's part of the
cmpxchg as operand.
The issue is similar to b29dd96b905f ("bpf, x86: Fix BPF_FETCH atomic and/or/
xor with r0 as src") just that the case of dst register was missed.
Sven Peter [Fri, 24 Sep 2021 13:45:02 +0000 (15:45 +0200)]
iommu/dart: Clear sid2group entry when a group is freed
sid2groups keeps track of which stream id combinations belong to a
iommu_group to assign those correctly to devices.
When a iommu_group is freed a stale pointer will however remain in
sid2groups. This prevents devices with the same stream id combination
to ever be attached again (see below).
Fix that by creating a shadow copy of the stream id configuration
when a group is allocated for the first time and clear the sid2group
entry when that group is freed.
# echo 1 >/sys/bus/pci/devices/0000\:03\:00.0/remove
pci 0000:03:00.0: Removing from iommu group 1
# echo 1 >/sys/bus/pci/rescan
[...]
pci 0000:03:00.0: BAR 0: assigned [mem 0x6a0000000-0x6a000ffff 64bit pref]
pci 0000:03:00.0: BAR 2: assigned [mem 0x6a0010000-0x6a001ffff 64bit pref]
pci 0000:03:00.0: BAR 6: assigned [mem 0x6c0100000-0x6c01007ff pref]
tg3 0000:03:00.0: Failed to add to iommu group 1: -2
[...]
iommu/vt-d: Drop "0x" prefix from PCI bus & device addresses
719a19335692 ("iommu/vt-d: Tweak the description of a DMA fault") changed
the DMA fault reason from hex to decimal. It also added "0x" prefixes to
the PCI bus/device, e.g.,
Sven Peter [Tue, 21 Sep 2021 15:39:34 +0000 (17:39 +0200)]
iommu/dart: Remove iommu_flush_ops
apple_dart_tlb_flush_{all,walk} expect to get a struct apple_dart_domain
but instead get a struct iommu_domain right now. This breaks those two
functions and can lead to kernel panics like the one below.
DART can only invalidate the entire TLB and apple_dart_iotlb_sync will
already flush everything. There's no need to do that again inside those
two functions. Let's just drop them.
pci 0000:03:00.0: Removing from iommu group 1
Unable to handle kernel paging request at virtual address 0000000100000023
[...]
Call trace:
_raw_spin_lock_irqsave+0x54/0xbc
apple_dart_hw_stream_command.constprop.0+0x2c/0x130
apple_dart_tlb_flush_all+0x48/0x90
free_io_pgtable_ops+0x40/0x70
apple_dart_domain_free+0x2c/0x44
iommu_group_release+0x68/0xac
kobject_cleanup+0x4c/0x1fc
kobject_cleanup+0x14c/0x1fc
kobject_put+0x64/0x84
iommu_group_remove_device+0x110/0x180
iommu_release_device+0x50/0xa0
[...]
Thomas Gleixner [Thu, 23 Sep 2021 16:04:25 +0000 (18:04 +0200)]
ALSA: pcsp: Make hrtimer forwarding more robust
The hrtimer callback pcsp_do_timer() prepares rearming of the timer with
hrtimer_forward(). hrtimer_forward() is intended to provide a mechanism to
forward the expiry time of the hrtimer by a multiple of the period argument
so that the expiry time greater than the time provided in the 'now'
argument.
pcsp_do_timer() invokes hrtimer_forward() with the current timer expiry
time as 'now' argument. That's providing a periodic timer expiry, but is
not really robust when the timer callback is delayed so that the resulting
new expiry time is already in the past which causes the callback to be
invoked immediately again. If the timer is delayed then the back to back
invocation is not really making it better than skipping the missed
periods. Sound is distorted in any case.
Use hrtimer_forward_now() which ensures that the next expiry is in the
future. This prevents hogging the CPU in the timer expiry code and allows
later on to remove hrtimer_forward() from the public interfaces.
driver core: Set deferred probe reason when deferred by driver core
When the driver core defers the probe of a device, set the deferred
probe reason so that it's easier to debug. The deferred probe reason is
available in debugfs under devices_deferred.
It's not enough to set net.ipv4.conf.all.rp_filter=0, that does not override
a greater rp_filter value on the individual interfaces. We also need to set
net.ipv4.conf.default.rp_filter=0 before creating the interfaces. That way,
they'll also get their own rp_filter value of zero.
selftests, bpf: Fix makefile dependencies on libbpf
When building bpf selftest with make -j, I'm randomly getting build failures
such as this one:
In file included from progs/bpf_flow.c:19:
[...]/tools/testing/selftests/bpf/tools/include/bpf/bpf_helpers.h:11:10: fatal error: 'bpf_helper_defs.h' file not found
#include "bpf_helper_defs.h"
^~~~~~~~~~~~~~~~~~~
The file that fails the build varies between runs but it's always in the
progs/ subdir.
The reason is a missing make dependency on libbpf for the .o files in
progs/. There was a dependency before commit 3ac2e20fba07e but that commit
removed it to prevent unneeded rebuilds. However, that only works if libbpf
has been built already; the 'wildcard' prerequisite does not trigger when
there's no bpf_helper_defs.h generated yet.
Keep the libbpf as an order-only prerequisite to satisfy both goals. It is
always built before the progs/ objects but it does not trigger unnecessary
rebuilds by itself.
Daniel Borkmann [Mon, 27 Sep 2021 12:39:21 +0000 (14:39 +0200)]
bpf, test, cgroup: Use sk_{alloc,free} for test cases
BPF test infra has some hacks in place which kzalloc() a socket and perform
minimum init via sock_net_set() and sock_init_data(). As a result, the sk's
skcd->cgroup is NULL since it didn't go through proper initialization as it
would have been the case from sk_alloc(). Rather than re-adding a NULL test
in sock_cgroup_ptr() just for this, use sk_{alloc,free}() pair for the test
socket. The latter also allows to get rid of the bpf_sk_storage_free() special
case.
Daniel Borkmann [Mon, 27 Sep 2021 12:39:20 +0000 (14:39 +0200)]
bpf, cgroup: Assign cgroup in cgroup_sk_alloc when called from interrupt
If cgroup_sk_alloc() is called from interrupt context, then just assign the
root cgroup to skcd->cgroup. Prior to commit 8520e224f547 ("bpf, cgroups:
Fix cgroup v2 fallback on v1/v2 mixed mode") we would just return, and later
on in sock_cgroup_ptr(), we were NULL-testing the cgroup in fast-path, and
iff indeed NULL returning the root cgroup (v ?: &cgrp_dfl_root.cgrp). Rather
than re-adding the NULL-test to the fast-path we can just assign it once from
cgroup_sk_alloc() given v1/v2 handling has been simplified. The migration from
NULL test with returning &cgrp_dfl_root.cgrp to assigning &cgrp_dfl_root.cgrp
directly does /not/ change behavior for callers of sock_cgroup_ptr().
syzkaller was able to trigger a splat in the legacy netrom code base, where
the RX handler in nr_rx_frame() calls nr_make_new() which calls sk_alloc()
and therefore cgroup_sk_alloc() with in_interrupt() condition. Thus the NULL
skcd->cgroup, where it trips over on cgroup_sk_free() side given it expects
a non-NULL object. There are a few other candidates aside from netrom which
have similar pattern where in their accept-like implementation, they just call
to sk_alloc() and thus cgroup_sk_alloc() instead of sk_clone_lock() with the
corresponding cgroup_sk_clone() which then inherits the cgroup from the parent
socket. None of them are related to core protocols where BPF cgroup programs
are running from. However, in future, they should follow to implement a similar
inheritance mechanism.
Additionally, with a !CONFIG_CGROUP_NET_PRIO and !CONFIG_CGROUP_NET_CLASSID
configuration, the same issue was exposed also prior to 8520e224f547 due to
commit e876ecc67db8 ("cgroup: memcg: net: do not associate sock with unrelated
cgroup") which added the early in_interrupt() return back then.
libbpf: Fix segfault in static linker for objects without BTF
When a BPF object is compiled without BTF info (without -g),
trying to link such objects using bpftool causes a SIGSEGV due to
btf__get_nr_types accessing obj->btf which is NULL. Fix this by
checking for the NULL pointer, and return error.
Reproducer:
$ cat a.bpf.c
extern int foo(void);
int bar(void) { return foo(); }
$ cat b.bpf.c
int foo(void) { return 0; }
$ clang -O2 -target bpf -c a.bpf.c
$ clang -O2 -target bpf -c b.bpf.c
$ bpftool gen obj out a.bpf.o b.bpf.o
Segmentation fault (core dumped)
After fix:
$ bpftool gen obj out a.bpf.o b.bpf.o
libbpf: failed to find BTF info for object 'a.bpf.o'
Error: failed to link 'a.bpf.o': Unknown error -22 (-22)
bpf: Exempt CAP_BPF from checks against bpf_jit_limit
When introducing CAP_BPF, bpf_jit_charge_modmem() was not changed to treat
programs with CAP_BPF as privileged for the purpose of JIT memory allocation.
This means that a program without CAP_BPF can block a program with CAP_BPF
from loading a program.
Fix this by checking bpf_capable() in bpf_jit_charge_modmem().
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm fixes from Paolo Bonzini:
"A bit late... I got sidetracked by back-from-vacation routines and
conferences. But most of these patches are already a few weeks old and
things look more calm on the mailing list than what this pull request
would suggest.
x86:
- missing TLB flush
- nested virtualization fixes for SMM (secure boot on nested
hypervisor) and other nested SVM fixes
- syscall fuzzing fixes
- live migration fix for AMD SEV
- mirror VMs now work for SEV-ES too
- fixes for reset
- possible out-of-bounds access in IOAPIC emulation
- fix enlightened VMCS on Windows 2022
ARM:
- Add missing FORCE target when building the EL2 object
- Fix a PMU probe regression on some platforms
Generic:
- KCSAN fixes
selftests:
- random fixes, mostly for clang compilation"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (43 commits)
selftests: KVM: Explicitly use movq to read xmm registers
selftests: KVM: Call ucall_init when setting up in rseq_test
KVM: Remove tlbs_dirty
KVM: X86: Synchronize the shadow pagetable before link it
KVM: X86: Fix missed remote tlb flush in rmap_write_protect()
KVM: x86: nSVM: don't copy virt_ext from vmcb12
KVM: x86: nSVM: test eax for 4K alignment for GP errata workaround
KVM: x86: selftests: test simultaneous uses of V_IRQ from L1 and L0
KVM: x86: nSVM: restore int_vector in svm_clear_vintr
kvm: x86: Add AMD PMU MSRs to msrs_to_save_all[]
KVM: x86: nVMX: re-evaluate emulation_required on nested VM exit
KVM: x86: nVMX: don't fail nested VM entry on invalid guest state if !from_vmentry
KVM: x86: VMX: synthesize invalid VM exit when emulating invalid guest state
KVM: x86: nSVM: refactor svm_leave_smm and smm_enter_smm
KVM: x86: SVM: call KVM_REQ_GET_NESTED_STATE_PAGES on exit from SMM mode
KVM: x86: reset pdptrs_from_userspace when exiting smm
KVM: x86: nSVM: restore the L1 host state prior to resuming nested guest on SMM exit
KVM: nVMX: Filter out all unsupported controls when eVMCS was activated
KVM: KVM: Use cpumask_available() to check for NULL cpumask when kicking vCPUs
KVM: Clean up benign vcpu->cpu data races when kicking vCPUs
...
The recent block layer refactoring broke the way how the pmem driver
abused device_add_disk. Fix this by properly passing the attribute groups
to device_add_disk.
Jia He [Wed, 22 Sep 2021 15:29:19 +0000 (23:29 +0800)]
ACPI: NFIT: Use fallback node id when numa info in NFIT table is incorrect
When ACPI NFIT table is failing to populate correct numa information
on arm64, dax_kmem will get NUMA_NO_NODE from the NFIT driver.
Without this patch, pmem can't be probed as RAM devices on arm64 guest:
$ndctl create-namespace -fe namespace0.0 --mode=devdax --map=dev -s 1g -a 128M
kmem dax0.0: rejecting DAX region [mem 0x240400000-0x2bfffffff] with invalid node: -1
kmem: probe of dax0.0 failed with error -22
vboxfs: fix broken legacy mount signature checking
Commit 9d682ea6bcc7 ("vboxsf: Fix the check for the old binary
mount-arguments struct") was meant to fix a build error due to sign
mismatch in 'char' and the use of character constants, but it just moved
the error elsewhere, in that on some architectures characters and signed
and on others they are unsigned, and that's just how the C standard
works.
The proper fix is a simple "don't do that then". The code was just
being silly and odd, and it should never have cared about signed vs
unsigned characters in the first place, since what it is testing is not
four "characters", but four bytes.
And the way to compare four bytes is by using "memcmp()".
Which compilers will know to just turn into a single 32-bit compare with
a constant, as long as you don't have crazy debug options enabled.
Pointers should be printed with %p or %px rather than cast to 'unsigned
long long' and printed with %llx. Change %llx to %p to print the secured
pointer.
io-wq: exclusively gate signal based exit on get_signal() return
io-wq threads block all signals, except SIGKILL and SIGSTOP. We should not
need any extra checking of signal_pending or fatal_signal_pending, rely
exclusively on whether or not get_signal() tells us to exit.
The original debugging of this issue led to the false positive that we
were exiting on non-fatal signals, but that is not the case. The issue
was around races with nr_workers accounting.
Fixes: 87c169665578 ("io-wq: ensure we exit if thread group is exiting") Fixes: 15e20db2e0ce ("io-wq: only exit on fatal signals") Reported-by: Eric W. Biederman <[email protected]> Reported-by: Linus Torvalds <[email protected]> Acked-by: "Eric W. Biederman" <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
Keith Busch [Mon, 27 Sep 2021 15:43:06 +0000 (08:43 -0700)]
nvme: add command id quirk for apple controllers
Some apple controllers use the command id as an index to implementation
specific data structures and will fail if the value is out of bounds.
The nvme driver's recently introduced command sequence number breaks
this controller.
Provide a quirk so these spec incompliant controllers can function as
before. The driver will not have the ability to detect bad completions
when this quirk is used, but we weren't previously checking this anyway.
The quirk bit was selected so that it can readily apply to stable.
Jacob Keller [Wed, 8 Sep 2021 17:52:37 +0000 (10:52 -0700)]
e100: fix buffer overrun in e100_get_regs
The e100_get_regs function is used to implement a simple register dump
for the e100 device. The data is broken into a couple of MAC control
registers, and then a series of PHY registers, followed by a memory dump
buffer.
The total length of the register dump is defined as (1 + E100_PHY_REGS)
* sizeof(u32) + sizeof(nic->mem->dump_buf).
The logic for filling in the PHY registers uses a convoluted inverted
count for loop which counts from E100_PHY_REGS (0x1C) down to 0, and
assigns the slots 1 + E100_PHY_REGS - i. The first loop iteration will
fill in [1] and the final loop iteration will fill in [1 + 0x1C]. This
is actually one more than the supposed number of PHY registers.
The memory dump buffer is then filled into the space at
[2 + E100_PHY_REGS] which will cause that memcpy to assign 4 bytes past
the total size.
The end result is that we overrun the total buffer size allocated by the
kernel, which could lead to a panic or other issues due to memory
corruption.
It is difficult to determine the actual total number of registers
here. The only 8255x datasheet I could find indicates there are 28 total
MDI registers. However, we're reading 29 here, and reading them in
reverse!
In addition, the ethtool e100 register dump interface appears to read
the first PHY register to determine if the device is in MDI or MDIx
mode. This doesn't appear to be documented anywhere within the 8255x
datasheet. I can only assume it must be in register 28 (the extra
register we're reading here).
Lets not change any of the intended meaning of what we copy here. Just
extend the space by 4 bytes to account for the extra register and
continue copying the data out in the same order.
Change the E100_PHY_REGS value to be the correct total (29) so that the
total register dump size is calculated properly. Fix the offset for
where we copy the dump buffer so that it doesn't overrun the total size.
Re-write the for loop to use counting up instead of the convoluted
down-counting. Correct the mdio_read offset to use the 0-based register
offsets, but maintain the bizarre reverse ordering so that we have the
ABI expected by applications like ethtool. This requires and additional
subtraction of 1. It seems a bit odd but it makes the flow of assignment
into the register buffer easier to follow.
Jacob Keller [Wed, 8 Sep 2021 17:52:36 +0000 (10:52 -0700)]
e100: fix length calculation in e100_get_regs_len
commit abf9b902059f ("e100: cleanup unneeded math") tried to simplify
e100_get_regs_len and remove a double 'divide and then multiply'
calculation that the e100_reg_regs_len function did.
This change broke the size calculation entirely as it failed to account
for the fact that the numbered registers are actually 4 bytes wide and
not 1 byte. This resulted in a significant under allocation of the
register buffer used by e100_get_regs.
Fix this by properly multiplying the register count by u32 first before
adding the size of the dump buffer.
Registration of the ipoctal tty devices is unlikely to fail, but if it
ever does, make sure not to deregister a never registered tty device
(and dereference a NULL pointer) when the driver is later unbound.
Johan Hovold [Fri, 17 Sep 2021 11:46:17 +0000 (13:46 +0200)]
ipack: ipoctal: fix stack information leak
The tty driver name is used also after registering the driver and must
specifically not be allocated on the stack to avoid leaking information
to user space (or triggering an oops).
Drivers should not try to encode topology information in the tty device
name but this one snuck in through staging without anyone noticing and
another driver has since copied this malpractice.
Fixing the ABI is a separate issue, but this at least plugs the security
hole.