Kan Liang [Tue, 2 Feb 2021 20:09:09 +0000 (12:09 -0800)]
perf tools: Support PERF_SAMPLE_WEIGHT_STRUCT
The new sample type, PERF_SAMPLE_WEIGHT_STRUCT, is an alternative of the
PERF_SAMPLE_WEIGHT sample type. Users can apply either the
PERF_SAMPLE_WEIGHT sample type or the PERF_SAMPLE_WEIGHT_STRUCT sample
type to retrieve the sample weight, but they cannot apply both sample
types simultaneously.
The new sample type shares the same space as the PERF_SAMPLE_WEIGHT
sample type. The lower 32 bits are exactly the same for both sample
type. The higher 32 bits may be different for different architecture.
Add arch specific arch_evsel__set_sample_weight() to set the new sample
type for X86. Only store the lower 32 bits for the sample->weight if the
new sample type is applied. In practice, no memory access could last
than 4G cycles. No data will be lost.
If the kernel doesn't support the new sample type. Fall back to the
PERF_SAMPLE_WEIGHT sample type.
There is no impact for other architectures.
Committer notes:
Fixup related to PERF_SAMPLE_CODE_PAGE_SIZE, present in acme/perf/core
but not upstream yet.
Kan Liang [Tue, 2 Feb 2021 20:09:07 +0000 (12:09 -0800)]
perf tools: Support data block and addr block
Two new data source fields, to indicate the block reasons of a load
instruction, are introduced on the Intel Sapphire Rapids server. The
fields can be used by the memory profiling.
Add a new sort function, SORT_MEM_BLOCKED, for the two fields.
For the previous platforms or the block reason is unknown, print "N/A"
for the block reason.
Add blocked as a default mem sort key for perf report and perf mem
report.
Committer testing:
So in machines without this capability we get a "N/A" filling the new "Blocked"
column:
$ perf mem record ls
arch certs CREDITS Documentation include ipc Kconfig lib MAINTAINERS mm samples security usr block
COPYING crypto drivers fs init Kbuild kernel LICENSES Makefile net README scripts sound tools
virt
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.008 MB perf.data (17 samples) ]
$
$ perf mem report --stdio
# To display the perf.data header info, please use --header/--header-only options.
#
# Total Lost Samples: 0
#
# Samples: 6 of event 'cpu/mem-loads,ldlat=30/Pu'
# Total weight : 1381
# Sort order : local_weight,mem,sym,dso,symbol_daddr,dso_daddr,snoop,tlb,locked,blocked
#
# Overhead Samples Local Weight Memory access Symbol Shared Object Data Symbol Data Object Snoop TLB access Locked Blocked
# ........ ....... ............ .................... ....................... ............. ...................... ............ ..... ............ ...... .......
#
32.87% 1 454 Local RAM or RAM hit [.] _dl_relocate_object ld-2.31.so [.] 0x00007fe91cef3078 libc-2.31.so Hit L1 or L2 hit No N/A
25.56% 1 353 LFB or LFB hit [.] strcmp ld-2.31.so [.] 0x00005586973855ca ls None L1 or L2 hit No N/A
22.59% 1 312 LFB or LFB hit [.] _dl_cache_libcmp ld-2.31.so [.] 0x00007fe91d0e3b18 ld.so.cache None L1 or L2 hit No N/A
8.47% 1 117 LFB or LFB hit [.] _dl_relocate_object ld-2.31.so [.] 0x00007fe91ceee570 libc-2.31.so None L1 or L2 hit No N/A
6.88% 1 95 LFB or LFB hit [.] _dl_relocate_object ld-2.31.so [.] 0x00007fe91ceed490 libc-2.31.so None L1 or L2 hit No N/A
3.62% 1 50 LFB or LFB hit [.] _dl_cache_libcmp ld-2.31.so [.] 0x00007fe91d0ebe60 ld.so.cache None L1 or L2 hit No N/A
Kan Liang [Tue, 2 Feb 2021 20:09:06 +0000 (12:09 -0800)]
perf tools: Support the auxiliary event
On the Intel Sapphire Rapids server, an auxiliary event has to be
enabled simultaneously with the load latency event to retrieve complete
Memory Info.
Add X86 specific perf_mem_events__name() to handle the auxiliary event.
- Users are only interested in the samples of the mem-loads event.
Sample read the auxiliary event.
- The auxiliary event must be in front of the load latency event in a
group. Assume the second event to sample if the auxiliary event is the
leader.
- Add a weak is_mem_loads_aux_event() to check the auxiliary event for
X86. For other ARCHs, it always return false.
Parse the unique event name, mem-loads-aux, for the auxiliary event.
Committer notes:
According to 61b985e3e775a3a7 ("perf/x86/intel: Add perf core PMU
support for Sapphire Rapids"), ENODATA is only returned by
sys_perf_event_open() when used with these auxiliary events, with this
in evsel__open_strerror():
case ENODATA:
return scnprintf(msg, size, "Cannot collect data source with the load latency event alone. "
"Please add an auxiliary event in front of the load latency event.");
This is Ok at this point in time, but fragile long term, I pointed this
out in the e-mail thread, requesting a follow up patch to check if
ENODATA is really for this specific case.
Fixed up sizeof(MEM_LOADS_AUX_NAME) bug pointed out by Namhyung.
Kan Liang [Tue, 2 Feb 2021 20:09:05 +0000 (12:09 -0800)]
tools headers uapi: Update tools's copy of linux/perf_event.h
To get the changes in these csets:
2a6c6b7d7ad346f0 ("perf/core: Add PERF_SAMPLE_WEIGHT_STRUCT") 61b985e3e775a3a7 ("perf/x86/intel: Add perf core PMU support for Sapphire Rapids")
This cures the following warning during perf's build:
Warning: Kernel ABI header at 'tools/include/uapi/linux/perf_event.h' differs from latest version at 'include/uapi/linux/perf_event.h'
diff -u tools/include/uapi/linux/perf_event.h include/uapi/linux/perf_event.h
Committer notes:
Picked by hand as I had already merged the MMAP buildid patch that also touches
perf_event.h and is also only in {acme,tip}/perf/core, not yet upstream.
Athira Rajeev [Wed, 3 Feb 2021 06:55:37 +0000 (01:55 -0500)]
perf powerpc: Support exposing Performance Monitor Counter SPRs as part of extended regs
To enable presenting of Performance Monitor Counter Registers (PMC1 to
PMC6) as part of extended regsiters, this patch adds these to
sample_reg_mask in the tool side (to use with -I? option).
Simplified the PERF_REG_PMU_MASK_300/31 definition. Excluded the
unsupported SPRs (MMCR3, SIER2, SIER3) from extended mask value for
CPU_FTR_ARCH_300.
Jianlin Lv [Wed, 3 Feb 2021 14:57:02 +0000 (22:57 +0800)]
perf probe: Add protection to avoid endless loop
if dwarf_offdie() returns NULL, the continue statement forces the next
iteration of the loop without updating the 'off' variable. It will cause
an endless loop in the process of traversing the compile unit. So add
exception protection for looping CUs.
symbol__inc_addr_samples() calls annotated_source__alloc_histograms ()
to allocate memory for sample histograms using calloc(). Here calloc()
fails since the size of symbol is huge. The size of a symbol is
calculated as difference between its start and end address.
Example histogram allocation that fails is:
sym->name is _end
sym->start is 0xc0000000027a0000
sym->end is 0xc008000003890000
symbol__size(sym) is 0x80000010f0000
In the above case, the difference between sym->start
(0xc0000000027a0000) and sym->end (0xc008000003890000) is huge.
This is same problem as in s390 and arm64 which are fixed in commits:
b9c0a64901d5 ("perf annotate: Fix s390 gap between kernel end and module start") 78886f3ed37e ("perf symbols: Fix arm64 gap between kernel start and module end")
When this symbol was read first, its start and end address was set to
address which matches with data from /proc/kallsyms.
Perf calls function symbols__fixup_end() which sets the end of symbol to
0xc008000003890000, which is the next address and this is the start
address of first module (icmp_checkentry in above) which will make the
huge symbol size of 0x80000010f0000.
On powerpc, kernel text segment is located at 0xc000000000000000 whereas
the modules are located at very high memory addresses,
0xc00800000xxxxxxx. Since the gap between end of kernel text segment and
beginning of first module's address is high, histogram allocation using
calloc fails.
Fix this by detecting the kernel's last symbol and limiting the range of
last kernel symbol to pagesize.
This patch fixes "perf inject --jit" to properly operate on
namespaced/containerized processes:
* jitdump files are generated by the process, thus they should be
looked up in its mount NS.
* DSOs of injected MMAP events will later be looked up in the process
mount NS, so write them into its NS.
* PIDs & TIDs from jitdump events need to be translated to the PID as
seen by "perf record" before written into MMAP events.
For a process in a different PID NS, the TID & PID given in the jitdump
event are actually ignored; I use the TID & PID of the thread which
mmap()ed the jitdump file. This is simplified and won't do for forks of
the initial process, if they continue using the same jitdump file.
Future patches might improve it.
This was tested by recording a NodeJS process running with
"--perf-prof", inside a Docker container, and by recording another
NodeJS process running in the same namespaces as perf itself, to make
sure it's not broken for non-containerized processes.
Namhyung Kim [Tue, 2 Feb 2021 09:01:18 +0000 (18:01 +0900)]
perf tools: Use scandir() to iterate threads when synthesizing PERF_RECORD_ events
Like in __event__synthesize_thread(), I think it's better to use
scandir() instead of the readdir() loop. In case some malicious task
continues to create new threads, the readdir() loop will run over and
over to collect tids. The scandir() also has the problem but the window
is much smaller since it doesn't do much work during the iteration.
Also add filter_task() function as we only care the tasks.
Namhyung Kim [Tue, 2 Feb 2021 09:01:17 +0000 (18:01 +0900)]
perf tools: Skip PERF_RECORD_MMAP event synthesis for kernel threads
To synthesize information to resolve sample IPs, it needs to scan task
and mmap info from the /proc filesystem. For each process, it opens
(and reads) status and maps file respectively. But as kernel threads
don't have memory maps so we can skip the maps file.
To find kernel threads, check "VmPeak:" line in /proc/<PID>/status file.
It's about the peak virtual memory usage so only user-level tasks have
that. Note that it's possible to miss the line due to partial reads.
So we should double-check if it's a really kernel thread when there's no
VmPeak line.
Thus check "Threads:" line (which follows the VmPeak line whether or not
it exists) to be sure it's read enough data - just in case of deeply
nested pid namespaces or large number of supplementary groups are
involved.
Namhyung Kim [Tue, 2 Feb 2021 09:01:16 +0000 (18:01 +0900)]
perf tools: Use /proc/<PID>/task/<TID>/status for PERF_RECORD_ event synthesis
To save memory usage, it needs to reduce the number of entries in the
proc filesystem. It's using /proc/<PID>/task directory to traverse
threads in the process and then kernel creates /proc/<PID>/task/<TID>
entries.
After that it checks the thread info using the /proc/<TID>/status file
rather than /proc/<PID>/task/<TID>/status. As far as I can see, they
are the same and contain all the info we need.
Using the latter eliminates the unnecessary /proc/<TID> entry. This can
be useful especially a large number of threads are used in the system.
In my experiment around 1KB of memory on average was saved for each
thread (which is not a thread group leader).
To do this, pass both pid and tid to perf_event_prepare_comm() if it
knows them. In case it doesn't know, passing 0 as pid will do the old
way.
Kan Liang [Thu, 21 Jan 2021 13:37:52 +0000 (05:37 -0800)]
perf stat: Add Topdown metrics events as default events
The Topdown Microarchitecture Analysis (TMA) Method is a structured
analysis methodology to identify critical performance bottlenecks in
out-of-order processors. From the Ice Lake and later platforms, the
Topdown information can be retrieved from the dedicated "metrics"
register, which isn't impacted by other events. Also, the Topdown
metrics support both per thread/process and per core measuring. Adding
Topdown metrics events as default events can enrich the default
measuring information, and would not cost any extra multiplexing.
Introduce arch_evlist__add_default_attrs() to allow architecture
specific default events. Add the Topdown metrics events in the X86
specific arch_evlist__add_default_attrs(). Other architectures can add
their own default events later separately.
Event duration_time in a metric expression requires special handling.
Improve test coverage by including a metric whose expression includes
duration_time. The actual metric is a copied from the L1D_Cache_Fill_BW
metric on my broadwell machine.
Linus Torvalds [Tue, 2 Feb 2021 18:26:09 +0000 (10:26 -0800)]
Merge tag 'net-5.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Networking fixes for 5.11-rc7, including fixes from bpf and mac80211
trees.
Current release - regressions:
- ip_tunnel: fix mtu calculation
- mlx5: fix function calculation for page trees
Previous releases - regressions:
- vsock: fix the race conditions in multi-transport support
- neighbour: prevent a dead entry from updating gc_list
- dsa: mv88e6xxx: override existent unicast portvec in port_fdb_add
Previous releases - always broken:
- bpf, cgroup: two copy_{from,to}_user() warn_on_once splats for BPF
cgroup getsockopt infra when user space is trying to race against
optlen, from Loris Reiff.
- mac80211: fix station rate table updates on assoc
- r8169: work around RTL8125 UDP HW bug
- igc: report speed and duplex as unknown when device is runtime
suspended
- rxrpc: fix deadlock around release of dst cached on udp tunnel"
* tag 'net-5.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (36 commits)
net: hsr: align sup_multicast_addr in struct hsr_priv to u16 boundary
net: ipa: fix two format specifier errors
net: ipa: use the right accessor in ipa_endpoint_status_skip()
net: ipa: be explicit about endianness
net: ipa: add a missing __iomem attribute
net: ipa: pass correct dma_handle to dma_free_coherent()
r8169: fix WoL on shutdown if CONFIG_DEBUG_SHIRQ is set
net/rds: restrict iovecs length for RDS_CMSG_RDMA_ARGS
net: mvpp2: TCAM entry enable should be written after SRAM data
net: lapb: Copy the skb before sending a packet
net/mlx5e: Release skb in case of failure in tc update skb
net/mlx5e: Update max_opened_tc also when channels are closed
net/mlx5: Fix leak upon failure of rule creation
net/mlx5: Fix function calculation for page trees
docs: networking: swap words in icmp_errors_use_inbound_ifaddr doc
udp: ipv4: manipulate network header of NATed UDP GRO fraglist
net: ip_tunnel: fix mtu calculation
vsock: fix the race conditions in multi-transport support
net: sched: replaced invalid qdisc tree flush helper in qdisc_replace
ibmvnic: device remove has higher precedence over reset
...
Andreas Oetken [Tue, 2 Feb 2021 09:03:04 +0000 (10:03 +0100)]
net: hsr: align sup_multicast_addr in struct hsr_priv to u16 boundary
sup_multicast_addr is passed to ether_addr_equal for address comparison
which casts the address inputs to u16 leading to an unaligned access.
Aligning the sup_multicast_addr to u16 boundary fixes the issue.
Jakub Kicinski [Tue, 2 Feb 2021 16:51:25 +0000 (08:51 -0800)]
Merge tag 'mlx5-fixes-2021-02-01' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5 fixes 2021-02-01
Please note the first patch in this series
("Fix function calculation for page trees") is fixing a regression
due to previous fix in net which you didn't include in your previous
rc pr. So I hope this series will make it into your next rc pr,
so mlx5 won't be broken in the next rc.
* tag 'mlx5-fixes-2021-02-01' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
net/mlx5e: Release skb in case of failure in tc update skb
net/mlx5e: Update max_opened_tc also when channels are closed
net/mlx5: Fix leak upon failure of rule creation
net/mlx5: Fix function calculation for page trees
====================
Jakub Kicinski [Tue, 2 Feb 2021 16:48:17 +0000 (08:48 -0800)]
Merge branch 'net-ipa-a-few-bug-fixes'
Alex Elder says:
====================
net: ipa: a few bug fixes
This series fixes four minor bugs. The first two are things that
sparse points out. All four are very simple and each patch should
explain itself pretty well.
====================
Alex Elder [Mon, 1 Feb 2021 23:26:08 +0000 (17:26 -0600)]
net: ipa: use the right accessor in ipa_endpoint_status_skip()
When extracting the destination endpoint ID from the status in
ipa_endpoint_status_skip(), u32_get_bits() is used. This happens to
work, but it's wrong: the structure field is only 8 bits wide
instead of 32.
Fix this by using u8_get_bits() to get the destination endpoint ID.
Alex Elder [Mon, 1 Feb 2021 23:26:07 +0000 (17:26 -0600)]
net: ipa: be explicit about endianness
Sparse warns that the assignment of the metadata mask for a QMAP
endpoint in ipa_endpoint_init_hdr_metadata_mask() is a bad
assignment. We know we want the mask value to be big endian, even
though the value we write is in host byte order. Use a __force
tag to indicate we really mean it.
Heiner Kallweit [Mon, 1 Feb 2021 20:50:56 +0000 (21:50 +0100)]
r8169: fix WoL on shutdown if CONFIG_DEBUG_SHIRQ is set
So far phy_disconnect() is called before free_irq(). If CONFIG_DEBUG_SHIRQ
is set and interrupt is shared, then free_irq() creates an "artificial"
interrupt by calling the interrupt handler. The "link change" flag is set
in the interrupt status register, causing phylib to eventually call
phy_suspend(). Because the net_device is detached from the PHY already,
the PHY driver can't recognize that WoL is configured and powers down the
PHY.
net/rds: restrict iovecs length for RDS_CMSG_RDMA_ARGS
syzbot found WARNING in rds_rdma_extra_size [1] when RDS_CMSG_RDMA_ARGS
control message is passed with user-controlled
0x40001 bytes of args->nr_local, causing order >= MAX_ORDER condition.
The exact value 0x40001 can be checked with UIO_MAXIOV which is 0x400.
So for kcalloc() 0x400 iovecs with sizeof(struct rds_iovec) = 0x10
is the closest limit, with 0x10 leftover.
Same condition is currently done in rds_cmsg_rdma_args().
Xie He [Mon, 1 Feb 2021 05:57:06 +0000 (21:57 -0800)]
net: lapb: Copy the skb before sending a packet
When sending a packet, we will prepend it with an LAPB header.
This modifies the shared parts of a cloned skb, so we should copy the
skb rather than just clone it, before we prepend the header.
In "Documentation/networking/driver.rst" (the 2nd point), it states
that drivers shouldn't modify the shared parts of a cloned skb when
transmitting.
The "dev_queue_xmit_nit" function in "net/core/dev.c", which is called
when an skb is being sent, clones the skb and sents the clone to
AF_PACKET sockets. Because the LAPB drivers first remove a 1-byte
pseudo-header before handing over the skb to us, if we don't copy the
skb before prepending the LAPB header, the first byte of the packets
received on AF_PACKET sockets can be corrupted.
Jakub Kicinski [Tue, 2 Feb 2021 16:37:00 +0000 (08:37 -0800)]
Merge tag 'mac80211-for-net-2021-02-02' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211
Johannes Berg says:
====================
Two fixes:
- station rate tables were not updated correctly
after association, leading to bad configuration
- rtl8723bs (staging) was initializing data incorrectly
after the previous fix and needed to move the init
later
* tag 'mac80211-for-net-2021-02-02' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211:
staging: rtl8723bs: Move wiphy setup to after reading the regulatory settings from the chip
mac80211: fix station rate table updates on assoc
====================
net/mlx5e: Update max_opened_tc also when channels are closed
max_opened_tc is used for stats, so that potentially non-zero stats
won't disappear when num_tc decreases. However, mlx5e_setup_tc_mqprio
fails to update it in the flow where channels are closed.
This commit fixes it. The new value of priv->channels.params.num_tc is
always checked on exit. In case of errors it will just be the old value,
and in case of success it will be the updated value.
Daniel Jurgens [Mon, 1 Feb 2021 16:11:10 +0000 (18:11 +0200)]
net/mlx5: Fix function calculation for page trees
The function calculation always results in a value of 0. This works
generally, but when the release all pages feature is enabled it will
result in crashes.
Fixes: 0aa128475d33 ("net/mlx5: Maintain separate page trees for ECPF and PF functions") Signed-off-by: Daniel Jurgens <[email protected]> Reported-by: Colin Ian King <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
Jakub Kicinski [Tue, 2 Feb 2021 04:23:44 +0000 (20:23 -0800)]
Merge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2021-02-01
This series contains updates to igc and i40e drivers.
Kai-Heng Feng fixes igc to report unknown speed and duplex during suspend
as an attempted read will cause errors.
Kevin Lo sets the default value to -IGC_ERR_NVM instead of success for
writing shadow RAM as this could miss a timeout. Also propagates the return
value for Flow Control configuration to properly pass on errors for igc.
Aleksandr reverts commit 2ad1274fa35a ("i40e: don't report link up for a VF
who hasn't enabled queues") as this can cause link flapping.
* '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
i40e: Revert "i40e: don't report link up for a VF who hasn't enabled queues"
igc: check return value of ret_val in igc_config_fc_after_link_up
igc: set the default return value to -IGC_ERR_NVM in igc_write_nvm_srwr
igc: Report speed and duplex as unknown when device is runtime suspended
====================
Dongseok Yi [Fri, 29 Jan 2021 23:13:27 +0000 (08:13 +0900)]
udp: ipv4: manipulate network header of NATed UDP GRO fraglist
UDP/IP header of UDP GROed frag_skbs are not updated even after NAT
forwarding. Only the header of head_skb from ip_finish_output_gso ->
skb_gso_segment is updated but following frag_skbs are not updated.
A call path skb_mac_gso_segment -> inet_gso_segment ->
udp4_ufo_fragment -> __udp_gso_segment -> __udp_gso_segment_list
does not try to update UDP/IP header of the segment list but copy
only the MAC header.
Update port, addr and check of each skb of the segment list in
__udp_gso_segment_list. It covers both SNAT and DNAT.
Vadim Fedorenko [Fri, 29 Jan 2021 22:27:47 +0000 (01:27 +0300)]
net: ip_tunnel: fix mtu calculation
dev->hard_header_len for tunnel interface is set only when header_ops
are set too and already contains full overhead of any tunnel encapsulation.
That's why there is not need to use this overhead twice in mtu calc.
Alexander Popov [Mon, 1 Feb 2021 08:47:19 +0000 (11:47 +0300)]
vsock: fix the race conditions in multi-transport support
There are multiple similar bugs implicitly introduced by the
commit c0cfa2d8a788fcf4 ("vsock: add multi-transports support") and
commit 6a2c0962105ae8ce ("vsock: prevent transport modules unloading").
The bug pattern:
[1] vsock_sock.transport pointer is copied to a local variable,
[2] lock_sock() is called,
[3] the local variable is used.
VSOCK multi-transport support introduced the race condition:
vsock_sock.transport value may change between [1] and [2].
Let's copy vsock_sock.transport pointer to local variables after
the lock_sock() call.
net: sched: replaced invalid qdisc tree flush helper in qdisc_replace
Commit e5f0e8f8e456 ("net: sched: introduce and use qdisc tree flush/purge helpers")
introduced qdisc tree flush/purge helpers, but erroneously used flush helper
instead of purge helper in qdisc_replace function.
This issue was found in our CI, that tests various qdisc setups by configuring
qdisc and sending data through it. Call of invalid helper sporadically leads
to corruption of vt_tree/cf_tree of hfsc_class that causes kernel oops:
Lijun Pan [Fri, 29 Jan 2021 04:34:01 +0000 (22:34 -0600)]
ibmvnic: device remove has higher precedence over reset
Returning -EBUSY in ibmvnic_remove() does not actually hold the
removal procedure since driver core doesn't care for the return
value (see __device_release_driver() in drivers/base/dd.c
calling dev->bus->remove()) though vio_bus_remove
(in arch/powerpc/platforms/pseries/vio.c) records the
return value and passes it on. [1]
During the device removal precedure, checking for resetting
bit is dropped so that we can continue executing all the
cleanup calls in the rest of the remove function. Otherwise,
it can cause latent memory leaks and kernel crashes.
DENG Qingfang [Sat, 30 Jan 2021 13:43:34 +0000 (21:43 +0800)]
net: dsa: mv88e6xxx: override existent unicast portvec in port_fdb_add
Having multiple destination ports for a unicast address does not make
sense.
Make port_db_load_purge override existent unicast portvec instead of
adding a new port bit.
VF queues were not brought up when PF was brought up after being
downed if the VF driver disabled VFs queues during PF down.
This could happen in some older or external VF driver implementations.
The problem was that PF driver used vf->queues_enabled as a condition
to decide what link-state it would send out which caused the issue.
Remove the check for vf->queues_enabled in the VF link notify.
Now VF will always be notified of the current link status.
Also remove the queues_enabled member from i40e_vf structure as it is
not used anymore. Otherwise VNF implementation was broken and caused
a link flap.
The original commit was a workaround to avoid breaking existing VFs though
it's really a fault of the VF code not the PF. The commit should be safe to
revert as all of the VFs we know of have been fixed. Also, since we now
know there is a related bug in the workaround, removing it is preferred.
Fixes: 2ad1274fa35a ("i40e: don't report link up for a VF who hasn't enabled") Signed-off-by: Aleksandr Loktionov <[email protected]> Signed-off-by: Arkadiusz Kubalewski <[email protected]> Tested-by: Konrad Jankowski <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
Linus Torvalds [Mon, 1 Feb 2021 19:15:57 +0000 (11:15 -0800)]
Merge tag 'media/v5.11-3' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media
Pull media fixes from Mauro Carvalho Chehab:
"The rockship rkisp1 driver will be promoted from staging in 5.11.
While not too late, do a few uAPI changes which are needed to better
support its functionalities"
* tag 'media/v5.11-3' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
media: rockchip: rkisp1: extend uapi array sizes
media: rockchip: rkisp1: carry ip version information
media: rockchip: rkisp1: reduce number of histogram grid elements in uapi
media: rkisp1: stats: mask the hist_bins values
media: rkisp1: stats: remove a wrong cast to u8
media: rkisp1: uapi: change hist_bins array type from __u16 to __u32
Hans de Goede [Mon, 1 Feb 2021 15:29:56 +0000 (16:29 +0100)]
staging: rtl8723bs: Move wiphy setup to after reading the regulatory settings from the chip
Commit 81f153faacd0 ("staging: rtl8723bs: fix wireless regulatory API
misuse") moved the wiphy_apply_custom_regulatory() call to earlier in the
driver's init-sequence, so that it gets called before wiphy_register().
But at this point in time the eFuses which code the regulatory-settings
for the chip have not been read by the driver yet, causing
_rtw_reg_apply_flags() to set the IEEE80211_CHAN_DISABLED flag on *all*
channels.
On the device where I initially tested the fix, a Jumper EZpad 7 tablet,
this does not cause any problems because shortly after init the
rtw_reg_notifier() gets called fixing things up. I guess this happens
into response to receiving a (broadcast) packet with regulatory info
from the access-point ?
But on another device with a RTL8723BS wifi chip, an Acer Switch 10E
(SW3-016), the rtw_reg_notifier() never gets called. I assume that some
fuse has been set on this device to ignore regulatory info received from
access-points.
This means that on the Acer the driver is stuck in a state with all
channels disabled, leading to non working Wifi.
We cannot move the wiphy_apply_custom_regulatory() call back, because
that call must be made before the wiphy_register() call.
Instead move the entire rtw_wdev_alloc() call to after the Efuses have
been read, fixing all channels being disabled in the initial channel-map.
Kevin Lo [Sun, 20 Dec 2020 14:18:19 +0000 (22:18 +0800)]
igc: set the default return value to -IGC_ERR_NVM in igc_write_nvm_srwr
This patch sets the default return value to -IGC_ERR_NVM in
igc_write_nvm_srwr. Without this change it wouldn't lead to a shadow RAM
write EEWR timeout.
Kai-Heng Feng [Wed, 2 Dec 2020 07:50:17 +0000 (15:50 +0800)]
igc: Report speed and duplex as unknown when device is runtime suspended
Similar to commit 165ae7a8feb5 ("igb: Report speed and duplex as unknown
when device is runtime suspended"), if we try to read speed and duplex
sysfs while the device is runtime suspended, igc will complain and
stops working:
The more generic approach will be wrap get_link_ksettings() with begin()
and complete() callbacks, and calls runtime resume and runtime suspend
routine respectively. However, igc is like igb, runtime resume routine
uses rtnl_lock() which upper ethtool layer also uses.
So to prevent a deadlock on rtnl, take a different approach, use
pm_runtime_suspended() to avoid reading register while device is runtime
suspended.
Felix Fietkau [Mon, 1 Feb 2021 08:33:24 +0000 (09:33 +0100)]
mac80211: fix station rate table updates on assoc
If the driver uses .sta_add, station entries are only uploaded after the sta
is in assoc state. Fix early station rate table updates by deferring them
until the sta has been uploaded.
Linus Torvalds [Sun, 31 Jan 2021 19:48:12 +0000 (11:48 -0800)]
Merge tag 'x86_entry_for_v5.11_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fix from Borislav Petkov:
"A single fix for objtool to generate proper unwind info for newer
toolchains which do not generate section symbols anymore. And a
cleanup ontop.
This was originally going to go during the next merge window but
people can already trigger a build error with binutils-2.36 which
doesn't emit section symbols - something which objtool relies on - so
let's expedite it"
* tag 'x86_entry_for_v5.11_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/entry: Remove put_ret_addr_in_rdi THUNK macro argument
x86/entry: Emit a symbol for register restoring thunk
Linus Torvalds [Sun, 31 Jan 2021 19:40:57 +0000 (11:40 -0800)]
Merge tag 'timers-urgent-2021-01-31' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fix from Thomas Gleixner:
"A fix for handling advertised, but non-existent 146818 RTCs correctly.
With the recent UIP handling changes the time readout of non-existent
RTCs hangs forever as the read returns always 0xFF which means the UIP
bit is set.
Sanity check the RTC before registering by checking the RTC_VALID
register for correctness"
* tag 'timers-urgent-2021-01-31' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
rtc: mc146818: Detect and handle broken RTCs
Linus Torvalds [Sun, 31 Jan 2021 19:39:32 +0000 (11:39 -0800)]
Merge tag 'core-urgent-2021-01-31' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull single stepping fix from Thomas Gleixner:
"A single fix for the single step reporting regression caused by
getting the condition wrong when moving SYSCALL_EMU away from TIF
flags"
[ There's apparently another problem too, fix pending ]
* tag 'core-urgent-2021-01-31' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
entry: Unbreak single step reporting behaviour
Linus Torvalds [Sun, 31 Jan 2021 19:37:43 +0000 (11:37 -0800)]
Merge tag 'powerpc-5.11-6' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fix from Michael Ellerman:
"One fix for a bug in our soft interrupt masking, which could lead to
interrupt replaying recursing, causing spurious interrupts.
Thanks to Nicholas Piggin"
* tag 'powerpc-5.11-6' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/64s: prevent recursive replay_soft_interrupts causing superfluous interrupt
Linus Torvalds [Sun, 31 Jan 2021 19:19:12 +0000 (11:19 -0800)]
Merge tag 'nfs-for-5.11-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Pull NFS client fixes from Trond Myklebust:
- SUNRPC: Handle 0 length opaque XDR object data properly
- Fix a layout segment leak in pnfs_layout_process()
- pNFS/NFSv4: Update the layout barrier when we schedule a layoutreturn
- pNFS/NFSv4: Improve rejection of out-of-order layouts
- pNFS/NFSv4: Try to return invalid layout in pnfs_layout_process()
* tag 'nfs-for-5.11-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
SUNRPC: Handle 0 length opaque XDR object data properly
SUNRPC: Move simple_get_bytes and simple_get_netobj into private header
pNFS/NFSv4: Improve rejection of out-of-order layouts
pNFS/NFSv4: Update the layout barrier when we schedule a layoutreturn
pNFS/NFSv4: Try to return invalid layout in pnfs_layout_process()
pNFS/NFSv4: Fix a layout segment leak in pnfs_layout_process()
Linus Walleij [Sat, 2 Jan 2021 23:15:10 +0000 (00:15 +0100)]
leds: rt8515: Add Richtek RT8515 LED driver
This adds a driver for the Richtek RT8515 dual channel
torch/flash white LED driver.
This LED driver is found in some mobile phones from
Samsung such as the GT-S7710 and GT-I8190.
A V4L interface is added.
We do not have a proper datasheet for the RT8515 but
it turns out that RT9387A has a public datasheet and
is essentially the same chip. We designed the driver
in accordance with this datasheet. The day someone
needs to drive a RT9387A this driver can probably
easily be augmented to handle that chip too.
Sakari Ailus, Pavel Machek and Andy Shevchenko helped
significantly in getting this driver right.
Andrea Righi [Wed, 25 Nov 2020 15:18:22 +0000 (16:18 +0100)]
leds: trigger: fix potential deadlock with libata
We have the following potential deadlock condition:
========================================================
WARNING: possible irq lock inversion dependency detected
5.10.0-rc2+ #25 Not tainted
--------------------------------------------------------
swapper/3/0 just changed the state of lock: ffff8880063bd618 (&host->lock){-...}-{2:2}, at: ata_bmdma_interrupt+0x27/0x200
but this lock took another, HARDIRQ-READ-unsafe lock in the past:
(&trig->leddev_list_lock){.+.?}-{2:2}
and interrupts could create inverse lock ordering between them.
other info that might help us debug this:
Possible interrupt unsafe locking scenario:
This lockdep splat is reported after:
commit e918188611f0 ("locking: More accurate annotations for read_lock()")
To clarify:
- read-locks are recursive only in interrupt context (when
in_interrupt() returns true)
- after acquiring host->lock in CPU1, another cpu (i.e. CPU2) may call
write_lock(&trig->leddev_list_lock) that would be blocked by CPU0
that holds trig->leddev_list_lock in read-mode
- when CPU1 (ata_ac_complete()) tries to read-lock
trig->leddev_list_lock, it would be blocked by the write-lock waiter
on CPU2 (because we are not in interrupt context, so the read-lock is
not recursive)
- at this point if an interrupt happens on CPU0 and
ata_bmdma_interrupt() is executed it will try to acquire host->lock,
that is held by CPU1, that is currently blocked by CPU2, so:
* CPU0 blocked by CPU1
* CPU1 blocked by CPU2
* CPU2 blocked by CPU0
*** DEADLOCK ***
The deadlock scenario is better represented by the following schema
(thanks to Boqun Feng <[email protected]> for the schema and the
detailed explanation of the deadlock condition):
CPU 0: CPU 1: CPU 2:
----- ----- -----
led_trigger_event():
read_lock(&trig->leddev_list_lock);
<workqueue>
ata_hsm_qc_complete():
spin_lock_irqsave(&host->lock);
write_lock(&trig->leddev_list_lock);
ata_port_freeze():
ata_do_link_abort():
ata_qc_complete():
ledtrig_disk_activity():
led_trigger_blink_oneshot():
read_lock(&trig->leddev_list_lock);
// ^ not in in_interrupt() context, so could get blocked by CPU 2
<interrupt>
ata_bmdma_interrupt():
spin_lock_irqsave(&host->lock);
Fix by using read_lock_irqsave/irqrestore() in led_trigger_event(), so
that no interrupt can happen in between, preventing the deadlock
condition.
Apply the same change to led_trigger_blink_setup() as well, since the
same deadlock scenario can also happen in power_supply_update_bat_leds()
-> led_trigger_blink() -> led_trigger_blink_setup() (workqueue context),
and potentially prevent other similar usages.
Linus Torvalds [Sun, 31 Jan 2021 01:51:06 +0000 (17:51 -0800)]
Merge tag '5.11-rc5-smb3' of git://git.samba.org/sfrench/cifs-2.6
Pull cifs fixes from Steve French:
"Four cifs patches found in additional testing of the conversion to the
new mount API: three small option processing ones, and one fixing domain
based DFS referrals"
* tag '5.11-rc5-smb3' of git://git.samba.org/sfrench/cifs-2.6:
cifs: fix dfs domain referrals
cifs: returning mount parm processing errors correctly
cifs: fix mounts to subdirectories of target
cifs: ignore auto and noauto options if given
Linus Torvalds [Sun, 31 Jan 2021 01:42:42 +0000 (17:42 -0800)]
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Two minor fixes in drivers. Both changing strings (one in a comment,
one in a module help text) with no code impact"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: qla2xxx: Fix description for parameter ql2xenforce_iocb_limit
scsi: target: iscsi: Fix typo in comment
Linus Torvalds [Sat, 30 Jan 2021 19:48:57 +0000 (11:48 -0800)]
Merge tag 's390-5.11-4' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 fixes from Vasily Gorbik:
- Fix max number of VCPUs reported via ultravisor information sysfs
interface.
- Fix memory leaks during vfio-ap resources clean up on KVM pointer
invalidation notification.
- Fix potential specification exception by avoiding unnecessary
interrupts disable after queue reset in vfio-ap.
* tag 's390-5.11-4' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390: uv: Fix sysfs max number of VCPUs reporting
s390/vfio-ap: No need to disable IRQ after queue reset
s390/vfio-ap: clean up vfio_ap resources when KVM pointer invalidated
Chinmay Agarwal [Wed, 27 Jan 2021 16:54:54 +0000 (22:24 +0530)]
neighbour: Prevent a dead entry from updating gc_list
Following race condition was detected:
<CPU A, t0> - neigh_flush_dev() is under execution and calls
neigh_mark_dead(n) marking the neighbour entry 'n' as dead.
<CPU B, t1> - Executing: __netif_receive_skb() ->
__netif_receive_skb_core() -> arp_rcv() -> arp_process().arp_process()
calls __neigh_lookup() which takes a reference on neighbour entry 'n'.
<CPU A, t2> - Moves further along neigh_flush_dev() and calls
neigh_cleanup_and_release(n), but since reference count increased in t2,
'n' couldn't be destroyed.
<CPU B, t3> - Moves further along, arp_process() and calls
neigh_update()-> __neigh_update() -> neigh_update_gc_list(), which adds
the neighbour entry back in gc_list(neigh_mark_dead(), removed it
earlier in t0 from gc_list)
This leads to 'n' still being part of gc_list, but the actual
neighbour structure has been freed.
The situation can be prevented from happening if we disallow a dead
entry to have any possibility of updating gc_list. This is what the
patch intends to achieve.
David Howells [Fri, 29 Jan 2021 23:53:50 +0000 (23:53 +0000)]
rxrpc: Fix deadlock around release of dst cached on udp tunnel
AF_RXRPC sockets use UDP ports in encap mode. This causes socket and dst
from an incoming packet to get stolen and attached to the UDP socket from
whence it is leaked when that socket is closed.
When a network namespace is removed, the wait for dst records to be cleaned
up happens before the cleanup of the rxrpc and UDP socket, meaning that the
wait never finishes.
Fix this by moving the rxrpc (and, by dependence, the afs) private
per-network namespace registrations to the device group rather than subsys
group. This allows cached rxrpc local endpoints to be cleared and their
UDP sockets closed before we try waiting for the dst records.
The symptom is that lines looking like the following:
unregister_netdevice: waiting for lo to become free
get emitted at regular intervals after running something like the
referenced syzbot test.
Thanks to Vadim for tracking this down and work out the fix.
Heiner Kallweit [Thu, 28 Jan 2021 22:01:54 +0000 (23:01 +0100)]
r8169: work around RTL8125 UDP hw bug
It was reported that on RTL8125 network breaks under heavy UDP load,
e.g. torrent traffic ([0], from comment 27). Realtek confirmed a hw bug
and provided me with a test version of the r8125 driver including a
workaround. Tests confirmed that the workaround fixes the issue.
I modified the original version of the workaround to meet mainline
code style.
v2:
- rebased to net
v3:
- make rtl_skb_is_udp() more robust and use skb_header_pointer()
to access the ip(v6) header
v4:
- remove dependency on ptp_classify.h
- replace magic number with offsetof(struct udphdr, len)
Ahmed S. Darwish [Thu, 28 Jan 2021 19:48:02 +0000 (20:48 +0100)]
net: arcnet: Fix RESET flag handling
The main arcnet interrupt handler calls arcnet_close() then
arcnet_open(), if the RESET status flag is encountered.
This is invalid:
1) In general, interrupt handlers should never call ->ndo_stop() and
->ndo_open() functions. They are usually full of blocking calls and
other methods that are expected to be called only from drivers
init and exit code paths.
2) arcnet_close() contains a del_timer_sync(). If the irq handler
interrupts the to-be-deleted timer, del_timer_sync() will just loop
forever.
3) arcnet_close() also calls tasklet_kill(), which has a warning if
called from irq context.
4) For device reset, the sequence "arcnet_close(); arcnet_open();" is
not complete. Some children arcnet drivers have special init/exit
code sequences, which then embed a call to arcnet_open() and
arcnet_close() accordingly. Check drivers/net/arcnet/com20020.c.
Run the device RESET sequence from a scheduled workqueue instead.
Rob Herring [Thu, 28 Jan 2021 19:45:15 +0000 (13:45 -0600)]
dt-bindings: Cleanup standard unit properties
Properties with standard unit suffixes already have a type and don't need
type definitions. They also default to a single entry, so 'maxItems: 1'
can be dropped.
adi,ad5758 is an oddball which defined an enum of arrays. While a valid
schema, it is simpler as a whole to only define scalar constraints.
Linus Torvalds [Fri, 29 Jan 2021 21:59:24 +0000 (13:59 -0800)]
Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Catalin Marinas:
- Fix the virt_addr_valid() returning true for < PAGE_OFFSET addresses.
- Do not blindly trust the DMA masks from ACPI/IORT.
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
ACPI/IORT: Do not blindly trust DMA masks from firmware
arm64: Fix kernel address detection of __is_lm_address()
Linus Torvalds [Fri, 29 Jan 2021 21:54:40 +0000 (13:54 -0800)]
Merge tag 'for-5.11-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
"A few more fixes for a late rc:
- fix lockdep complaint on 32bit arches and also remove an unsafe
memory use due to device vs filesystem lifetime
- two fixes for free space tree:
* race during log replay and cache rebuild, now more likely to
happen due to changes in this dev cycle
* possible free space tree corruption with online conversion
during initial tree population"
* tag 'for-5.11-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: fix log replay failure due to race with space cache rebuild
btrfs: fix lockdep warning due to seqcount_mutex on 32bit arch
btrfs: fix possible free space tree corruption with online conversion
- NVMe pull request from Christoph:
- add another Write Zeroes quirk (Chaitanya Kulkarni)
- handle a no path available corner case (Daniel Wagner)
- use the proper RCU aware list_add helper (Chao Leng)
- bcache regression fix (Coly)
- bdev->bd_size_lock IRQ fix. This will be fixed in drivers for 5.12,
but for now, we'll make it IRQ safe (Damien)
- null_blk zoned init fix (Damien)
- add_partition() error handling fix (Dinghao)
- s390 dasd kobject fix (Jan)
- nbd fix for freezing queue while adding connections (Josef)
- tag queueing regression fix (Ming)
- revert of a patch that inadvertently meant that we regressed write
performance on raid (Maxim)"
* tag 'block-5.11-2021-01-29' of git://git.kernel.dk/linux-block:
null_blk: cleanup zoned mode initialization
nvme-core: use list_add_tail_rcu instead of list_add_tail for nvme_init_ns_head
nvme-multipath: Early exit if no path is available
nvme-pci: add the DISABLE_WRITE_ZEROES quirk for a SPCC device
bcache: only check feature sets when sb->version >= BCACHE_SB_VERSION_CDEV_WITH_FEATURES
block: fix bd_size_lock use
blk-cgroup: Use cond_resched() when destroy blkgs
Revert "block: simplify set_init_blocksize" to regain lost performance
nbd: freeze the queue while we're adding connections
s390/dasd: Fix inconsistent kobject removal
block: Fix an error handling in add_partition
blk-mq: test QUEUE_FLAG_HCTX_ACTIVE for sbitmap_shared in hctx_may_queue
Linus Torvalds [Fri, 29 Jan 2021 21:47:47 +0000 (13:47 -0800)]
Merge tag 'io_uring-5.11-2021-01-29' of git://git.kernel.dk/linux-block
Pull io_uring fixes from Jens Axboe:
"We got the cancelation story sorted now, so for all intents and
purposes, this should be it for 5.11 outside of any potential little
fixes that may come in. This contains:
- task_work task state fixes (Hao, Pavel)
- Cancelation fixes (me, Pavel)
- Fix for an inflight req patch in this release (Pavel)
- Fix for a lock deadlock issue (Pavel)"
* tag 'io_uring-5.11-2021-01-29' of git://git.kernel.dk/linux-block:
io_uring: reinforce cancel on flush during exit
io_uring: fix sqo ownership false positive warning
io_uring: fix list corruption for splice file_get
io_uring: fix flush cqring overflow list while TASK_INTERRUPTIBLE
io_uring: fix wqe->lock/completion_lock deadlock
io_uring: fix cancellation taking mutex while TASK_UNINTERRUPTIBLE
io_uring: fix __io_uring_files_cancel() with TASK_UNINTERRUPTIBLE
io_uring: only call io_cqring_ev_posted() if events were posted
io_uring: if we see flush on exit, cancel related tasks
David Gow [Wed, 27 Jan 2021 03:36:04 +0000 (19:36 -0800)]
soc: litex: Properly depend on HAS_IOMEM
The LiteX SOC controller driver makes use of IOMEM functions like
devm_platform_ioremap_resource(), which are only available if
CONFIG_HAS_IOMEM is defined.
This causes the driver to be enable under make ARCH=um allyesconfig,
even though it won't build.
By adding a dependency on HAS_IOMEM, the driver will not be enabled on
architectures which don't support it.
Fixes: 22447a99c97e ("drivers/soc/litex: add LiteX SoC Controller driver") Signed-off-by: David Gow <[email protected]>
[[email protected]: Fix typo in commit message pointed out in review] Signed-off-by: Stafford Horne <[email protected]>
Linus Torvalds [Fri, 29 Jan 2021 21:32:05 +0000 (13:32 -0800)]
Merge tag 'iommu-fixes-v5.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
Pull iommu fixes from Joerg Roedel:
- AMD IOMMU fix to make sure features are detected before they are
queried.
- Intel IOMMU address alignment check fix for an IOLTB flushing
command.
- Performance fix for Intel IOMMU to make sure the code does not do
full IOTLB flushes all the time. Those flushes are very expensive
on emulated IOMMUs.
* tag 'iommu-fixes-v5.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
iommu/vt-d: Do not use flush-queue when caching-mode is on
iommu/vt-d: Correctly check addr alignment in qi_flush_dev_iotlb_pasid()
iommu/amd: Use IVHD EFR for early initialization of IOMMU features
Linus Torvalds [Fri, 29 Jan 2021 21:30:09 +0000 (13:30 -0800)]
Merge tag 'pm-5.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
"These fix a deadlock in the 'kexec jump' code and address a possible
hibernation image creation issue.
Specifics:
- Fix a deadlock caused by attempting to acquire the same mutex twice
in a row in the "kexec jump" code (Baoquan He)
- Modify the hibernation image saving code to flush the unwritten
data to the swap storage later so as to avoid failing to write the
image signature which is possible in some cases (Laurent Badel)"
* tag 'pm-5.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
PM: hibernate: flush swap writer after marking
kernel: kexec: remove the lock operation of system_transition_mutex
Linus Torvalds [Fri, 29 Jan 2021 21:23:21 +0000 (13:23 -0800)]
Merge tag 'acpi-5.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI fixes from Rafael Wysocki:
"These fix the handling of notifications in the ACPI thermal driver and
address a device enumeration issue leading to the presence of multiple
'MODALIAS=' entries in one uevent file in sysfs in some cases.
Specifics:
- Modify the ACPI thermal driver to avoid evaluating _TMP directly in
its Notify () handler callback and running too many thermal checks
for one thermal zone at the same time so as to address a work item
accumulation issue observed on some systems that fail to shut down
as a result of it (Rafael Wysocki)
- Modify the ACPI uevent file creation code to avoid putting multiple
'MODALIAS=' entries in one uevent file in sysfs which breaks
systemd-udevd (Kai-Heng Feng)"
* tag 'acpi-5.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI: thermal: Do not call acpi_thermal_check() directly
ACPI: sysfs: Prefer "compatible" modalias
Linus Torvalds [Fri, 29 Jan 2021 21:18:23 +0000 (13:18 -0800)]
Merge tag 'drm-fixes-2021-01-29' of git://anongit.freedesktop.org/drm/drm
Pull drm fixes from Dave Airlie:
"Weekly fixes for graphics, nothing too major, nouveau has a few
regression fixes for various fallout from header changes previously,
vc4 has two fixes, two amdgpu, and a smattering of i915 fixes.
All seems on course for a quieter rc7, fingers crossed.
vc4:
- Fix LBM size calculation
- Fix high resolutions for hvs5
i915:
- Fix ICL MG PHY vswing
- Fix subplatform handling
- Fix selftest memleak
- Clear CACHE_MODE prior to clearing residuals
- Always flush the active worker before returning from the wait
- Always try to reserve GGTT address 0x0
amdgpu:
- Fix a fan control regression on some boards
- Fix clang warning"
* tag 'drm-fixes-2021-01-29' of git://anongit.freedesktop.org/drm/drm:
drm/nouveau/kms/gk104-gp1xx: Fix > 64x64 cursors
drm/nouveau/kms/nv50-: Report max cursor size to userspace
drivers/nouveau/kms/nv50-: Reject format modifiers for cursor planes
drm/nouveau/svm: fail NOUVEAU_SVM_INIT ioctl on unsupported devices
drm/nouveau/dispnv50: Restore pushing of all data.
amdgpu: fix clang build warning
Revert "drm/amdgpu/swsmu: drop set_fan_speed_percent (v2)"
drm/i915/gt: Always try to reserve GGTT address 0x0
drm/i915: Always flush the active worker before returning from the wait
drm/i915/selftest: Fix potential memory leak
drm/i915: Check for all subplatform bits
drm/i915: Fix ICL MG PHY vswing handling
drm/i915/gt: Clear CACHE_MODE prior to clearing residuals
drm/vc4: Correct POS1_SCL for hvs5
drm/vc4: Correct lbm size and calculation
drm/nouveau/nvif: fix method count when pushing an array
Linus Torvalds [Fri, 29 Jan 2021 20:28:20 +0000 (12:28 -0800)]
tty: avoid using vfs_iocb_iter_write() for redirected console writes
It turns out that the vfs_iocb_iter_{read,write}() functions are
entirely broken, and don't actually use the passed-in file pointer for
IO - only for the preparatory work (permission checking and for the
write_iter function lookup).
That worked fine for overlayfs, which always builds the new iocb with
the same file pointer that it passes in, but in the general case it ends
up doing nonsensical things (and could cause an iterator call that
doesn't even match the passed-in file pointer).
This subtly broke the tty conversion to write_iter in commit 9bb48c82aced ("tty: implement write_iter"), because the console
redirection didn't actually end up redirecting anything, since the
passed-in file pointer was basically ignored, and the actual write was
done with the original non-redirected console tty after all.
The main visible effect of this is that the console messages were no
longer logged to /var/log/boot.log during graphical boot.
Fix the issue by simply not using the vfs write "helper" function at
all, and just redirecting the write entirely internally to the tty
layer. Do the target writability permission checks when actually
registering the target tty with TIOCCONS instead of at write time.
Damien Le Moal [Fri, 29 Jan 2021 14:47:25 +0000 (23:47 +0900)]
null_blk: cleanup zoned mode initialization
To avoid potential compilation problems, replaced the badly written
MB_TO_SECTS() macro (missing parenthesis around the argument use) with
the inline function mb_to_sects(). And while at it, simplify the
calculation of the total number of zones of the device using the
round_up() macro.
1) Fix two copy_{from,to}_user() warn_on_once splats for BPF cgroup getsockopt
infra when user space is trying to race against optlen, from Loris Reiff.
2) Fix a missing fput() in BPF inode storage map update helper, from Pan Bian.
3) Fix a build error on unresolved symbols on disabled networking / keys LSM
hooks, from Mikko Ylinen.
4) Fix preload BPF prog build when the output directory from make points to a
relative path, from Quentin Monnet.
* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
bpf, preload: Fix build when $(O) points to a relative path
bpf: Drop disabled LSM hooks from the sleepable set
bpf, inode_storage: Put file handler if no storage was found
bpf, cgroup: Fix problematic bounds check
bpf, cgroup: Fix optlen WARN_ON_ONCE toctou
====================
Ronnie Sahlberg [Fri, 29 Jan 2021 03:35:10 +0000 (21:35 -0600)]
cifs: fix dfs domain referrals
The new mount API requires additional changes to how DFS
is handled. Additional testing of DFS uncovered problems
with domain based DFS referrals (a follow on patch addresses
DFS links) which this patch addresses.
Linus Torvalds [Fri, 29 Jan 2021 03:40:26 +0000 (19:40 -0800)]
Merge tag 'ecryptfs-5.11-rc6-setxattr-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs
Pull ecryptfs fix from Tyler Hicks:
"Fix a regression that resulted in two rounds of UID translations when
setting v3 namespaced file capabilities in some configurations"
* tag 'ecryptfs-5.11-rc6-setxattr-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs:
ecryptfs: fix uid translation for setxattr on security.capability