Linus Torvalds [Thu, 12 Dec 2019 02:10:40 +0000 (18:10 -0800)]
Merge tag 'afs-fixes-20191211' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs
Pull AFS fixes from David Howells:
"Fixes for AFS plus one patch to make debugging easier:
- Fix how addresses are matched to server records. This is currently
incorrect which means cache invalidation callbacks from the server
don't necessarily get delivered correctly. This causes stale data
and metadata to be seen under some circumstances.
- Make the dynamic root superblock R/W so that rpm/dnf can reapply
the SELinux label to it when upgrading the Fedora filesystem-afs
package. If the filesystem is R/O, this fails and the upgrade
fails.
It might be better in future to allow setxattr from an LSM to
bypass the R/O protections, if only for pseudo-filesystems.
- Fix the parsing of mountpoint strings. The mountpoint object has to
have a terminal dot, whereas the source/device string passed to
mount should not. This confuses type-forcing suffix detection
leading to the wrong volume variant being mounted.
- Make lookups in the dynamic root superblock for creation events
(such as mkdir) fail with EOPNOTSUPP rather than something like
EEXIST. The dynamic root only allows implicit creation by the
->lookup() method - and only if the target cell exists.
- Fix the looking up of an AFS superblock to include the cell in the
matching key - otherwise all volumes with the same ID number are
treated as the same thing, irrespective of which cell they're in.
- Show the volume name of each volume in the volume records displayed
in /proc/net/afs/<cell>/volumes. This proved useful in debugging as
it provides a way to map the volume IDs to names, where the names
are what appear in /proc/mounts"
* tag 'afs-fixes-20191211' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
afs: Show volume name in /proc/net/afs/<cell>/volumes
afs: Fix missing cell comparison in afs_test_super()
afs: Fix creation calls in the dynamic root to fail with EOPNOTSUPP
afs: Fix mountpoint parsing
afs: Fix SELinux setting security label on /afs
afs: Fix afs_find_server lookups for ipv4 peers
Jens Axboe [Wed, 11 Dec 2019 22:55:43 +0000 (15:55 -0700)]
io_uring: ensure we return -EINVAL on unknown opcode
If we submit an unknown opcode and have fd == -1, io_op_needs_file()
will return true as we default to needing a file. Then when we go and
assign the file, we find the 'fd' invalid and return -EBADF. We really
should be returning -EINVAL for that case, as we normally do for
unsupported opcodes.
Change io_op_needs_file() to have the following return values:
0 - does not need a file
1 - does need a file
< 0 - error value
and use this to pass back the right value for this invalid case.
Linus Torvalds [Wed, 11 Dec 2019 20:25:32 +0000 (12:25 -0800)]
Merge tag 'erofs-for-5.5-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs
Pull erofs fixes from Gao Xiang:
"Mainly address a regression reported by David recently observed
together with overlayfs due to the improper return value of
listxattr() without xattr. Update outdated expressions in document as
well.
Summary:
- Fix improper return value of listxattr() with no xattr
- Keep up documentation with latest code"
* tag 'erofs-for-5.5-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
erofs: update documentation
erofs: zero out when listxattr is called with no xattr
Linus Torvalds [Wed, 11 Dec 2019 20:22:38 +0000 (12:22 -0800)]
Merge tag 'trace-v5.5-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing fixes from Steven Rostedt:
- Remove code I accidentally applied when doing a minor fix up to a
patch, and then using "git commit -a --amend", which pulled in some
other changes I was playing with.
- Remove an used variable in trace_events_inject code
- Fix function graph tracer when it traces a ftrace direct function.
It will now ignore tracing a function that has a ftrace direct
tramploine attached. This is needed for eBPF to use the ftrace direct
code.
* tag 'trace-v5.5-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
ftrace: Fix function_graph tracer interaction with BPF trampoline
tracing: remove set but not used variable 'buffer'
module: Remove accidental change of module_enable_x()
Linus Torvalds [Wed, 11 Dec 2019 19:46:19 +0000 (11:46 -0800)]
pipe: simplify signal handling in pipe_read() and add comments
There's no need to separately check for signals while inside the locked
region, since we're going to do "wait_event_interruptible()" right
afterwards anyway, and the error handling is much simpler there.
The check for whether we had already read anything was also redundant,
since we no longer do the odd merging of reads when there are pending
writers.
But perhaps more importantly, this adds commentary about why we still
need to wake up possible writers even though we didn't read any data,
and why we can skip all the finishing touches now if we get a signal (or
had a signal pending) while waiting for more data.
[ This is a split-out cleanup from my "make pipe IO use exclusive wait
queues" thing, which I can't apply because it triggers a nasty bug in
the GNU make jobserver - Linus ]
Vasily Gorbik [Fri, 2 Aug 2019 10:42:59 +0000 (12:42 +0200)]
s390/kasan: add KASAN_VMALLOC support
Add KASAN_VMALLOC support which now enables vmalloc memory area access
checks as well as enables usage of VMAP_STACK under kasan.
KASAN_VMALLOC changes the way vmalloc and modules areas shadow memory
is handled. With this new approach only top level page tables are
pre-populated and lower levels are filled dynamically upon memory
allocation.
Heiko Carstens [Fri, 29 Nov 2019 11:59:59 +0000 (12:59 +0100)]
s390: remove last diag 0x44 caller
diag 0x44 is a voluntary undirected yield of a virtual CPU. This has
caused a lot of performance issues in the past.
There is only one caller left, and that one is only executed if diag
0x9c (directed yield) is not present. Given that all hypervisors
implement diag 0x9c anyway, remove the last diag 0x44 to avoid that
more callers will be added.
Worst case that could happen now, if diag 0x9c is not present, is that
a virtual CPU would loop a bit instead of giving its time slice up.
diag 0x44 statistics in debugfs are kept and will always be zero, so
that user space can tell that there are no calls.
ENOTSUP is just an internal kernel error and should never reach
userspace. The return value of the share function is not exported to
userspace, but to avoid giving bad examples let us use EOPNOTSUPP:
Thomas Richter [Fri, 29 Nov 2019 14:24:25 +0000 (15:24 +0100)]
s390/cpum_sf: Avoid SBD overflow condition in irq handler
The s390 CPU Measurement sampling facility has an overflow condition
which fires when all entries in a SBD are used.
The measurement alert interrupt is triggered and reads out all samples
in this SDB. It then tests the successor SDB, if this SBD is not full,
the interrupt handler does not read any samples at all from this SDB
The design waits for the hardware to fill this SBD and then trigger
another meassurement alert interrupt.
This scheme works nicely until
an perf_event_overflow() function call discards the sample due to
a too high sampling rate.
The interrupt handler has logic to read out a partially filled SDB
when the perf event overflow condition in linux common code is met.
This causes the CPUM sampling measurement hardware and the PMU
device driver to operate on the same SBD's trailer entry.
This should not happen.
This can be seen here using this trace:
cpumsf_pmu_add: tear:0xb5286000
hw_perf_event_update: sdbt 0xb5286000 full 1 over 0 flush_all:0
hw_perf_event_update: sdbt 0xb5286008 full 0 over 0 flush_all:0
above shows 1. interrupt
hw_perf_event_update: sdbt 0xb5286008 full 1 over 0 flush_all:0
hw_perf_event_update: sdbt 0xb5286008 full 0 over 0 flush_all:0
above shows 2. interrupt
... this goes on fine until...
hw_perf_event_update: sdbt 0xb5286068 full 1 over 0 flush_all:0
perf_push_sample1: overflow
one or more samples read from the IRQ handler are rejected by
perf_event_overflow() and the IRQ handler advances to the next SDB
and modifies the trailer entry of a partially filled SDB.
hw_perf_event_update: sdbt 0xb5286070 full 0 over 0 flush_all:1
timestamp: 14:32:52.519953
Next time the IRQ handler is called for this SDB the trailer entry shows
an overflow count of 19 missed entries.
hw_perf_event_update: sdbt 0xb5286070 full 1 over 19 flush_all:1
timestamp: 14:32:52.970058
Remove access to a follow on SDB when event overflow happened.
Thomas Richter [Thu, 28 Nov 2019 09:26:41 +0000 (10:26 +0100)]
s390/cpum_sf: Adjust sampling interval to avoid hitting sample limits
Function perf_event_ever_overflow() and perf_event_account_interrupt()
are called every time samples are processed by the interrupt handler.
However function perf_event_account_interrupt() has checks to avoid being
flooded with interrupts (more then 1000 samples are received per
task_tick). Samples are then dropped and a PERF_RECORD_THROTTLED is
added to the perf data. The perf subsystem limit calculation is:
maximum sample frequency := 100000 --> 1 samples per 10 us
task_tick = 10ms = 10000us --> 1000 samples per task_tick
The work flow is
measurement_alert() uses SDBT head and each SBDT points to 511
SDB pages, each with 126 sample entries. After processing 8 SBDs
and for each valid sample calling:
there is a considerable amount of samples being dropped, especially when
the sample frequency is very high and near the 100000 limit.
To avoid the high amount of samples being dropped near the end of a
task_tick time frame, increment the sampling interval in case of
dropped events. The CPU Measurement sampling facility on the s390
supports only intervals, specifiing how many CPU cycles have to be
executed before a sample is generated. Increase the interval when the
samples being generated hit the task_tick limit.
Guoqing Jiang [Wed, 27 Nov 2019 16:57:50 +0000 (17:57 +0100)]
raid5: need to set STRIPE_HANDLE for batch head
With commit 6ce220dd2f8ea71d6afc29b9a7524c12e39f374a ("raid5: don't set
STRIPE_HANDLE to stripe which is in batch list"), we don't want to set
STRIPE_HANDLE flag for sh which is already in batch list.
However, the stripe which is the head of batch list should set this flag,
otherwise panic could happen inside init_stripe at BUG_ON(sh->batch_head),
it is reproducible with raid5 on top of nvdimm devices per Xiao oberserved.
David Howells [Wed, 11 Dec 2019 08:58:59 +0000 (08:58 +0000)]
afs: Show volume name in /proc/net/afs/<cell>/volumes
Show the name of each volume in /proc/net/afs/<cell>/volumes to make it
easier to work out the name corresponding to a volume ID. This makes it
easier to work out which mounts in /proc/mounts correspond to which volume
ID.
David Howells [Wed, 11 Dec 2019 08:06:08 +0000 (08:06 +0000)]
afs: Fix missing cell comparison in afs_test_super()
Fix missing cell comparison in afs_test_super(). Without this, any pair
volumes that have the same volume ID will share a superblock, no matter the
cell, unless they're in different network namespaces.
Normally, most users will only deal with a single cell and so they won't
see this. Even if they do look into a second cell, they won't see a
problem unless they happen to hit a volume with the same ID as one they've
already got mounted.
David Howells [Wed, 11 Dec 2019 08:56:04 +0000 (08:56 +0000)]
afs: Fix creation calls in the dynamic root to fail with EOPNOTSUPP
Fix the lookup method on the dynamic root directory such that creation
calls, such as mkdir, open(O_CREAT), symlink, etc. fail with EOPNOTSUPP
rather than failing with some odd error (such as EEXIST).
lookup() itself tries to create automount directories when it is invoked.
These are cached locally in RAM and not committed to storage.
David Howells [Mon, 9 Dec 2019 15:04:45 +0000 (15:04 +0000)]
afs: Fix mountpoint parsing
Each AFS mountpoint has strings that define the target to be mounted. This
is required to end in a dot that is supposed to be stripped off. The
string can include suffixes of ".readonly" or ".backup" - which are
supposed to come before the terminal dot. To add to the confusion, the "fs
lsmount" afs utility does not show the terminal dot when displaying the
string.
The kernel mount source string parser, however, assumes that the terminal
dot marks the suffix and that the suffix is always "" and is thus ignored.
In most cases, there is no suffix and this is not a problem - but if there
is a suffix, it is lost and this affects the ability to mount the correct
volume.
The command line mount command, on the other hand, is expected not to
include a terminal dot - so the problem doesn't arise there.
Fix this by making sure that the dot exists and then stripping it when
passing the string to the mount configuration.
dt-bindings: net: ti: cpsw-switch: update to fix comments
After original patch was merged there were additional comments/requests
provided by Rob Herring [1]. Mostly they are related to json-schema usage,
and this patch fixes them. Also SPDX-License-Identifier has been changed to
(GPL-2.0-only OR BSD-2-Clause) as requested.
[1] https://lkml.org/lkml/2019/11/21/875 Fixes: ef63fe72f698 ("dt-bindings: net: ti: add new cpsw switch driver bindings") Signed-off-by: Grygorii Strashko <[email protected]>
[robh: Remove 2 more maxItems that aren't necessary] Signed-off-by: Rob Herring <[email protected]>
Mathias Nyman [Wed, 11 Dec 2019 14:20:07 +0000 (16:20 +0200)]
xhci: make sure interrupts are restored to correct state
spin_unlock_irqrestore() might be called with stale flags after
reading port status, possibly restoring interrupts to a incorrect
state.
If a usb2 port just finished resuming while the port status is read
the spin lock will be temporary released and re-acquired in a separate
function. The flags parameter is passed as value instead of a pointer,
not updating flags properly before the final spin_unlock_irqrestore()
is called.
Mathias Nyman [Wed, 11 Dec 2019 14:20:06 +0000 (16:20 +0200)]
xhci: handle some XHCI_TRUST_TX_LENGTH quirks cases as default behaviour.
xhci driver claims it needs XHCI_TRUST_TX_LENGTH quirk for both
Broadcom/Cavium and a Renesas xHC controllers.
The quirk was inteded for handling false "success" complete event for
transfers that had data left untransferred.
These transfers should complete with "short packet" events instead.
In these two new cases the false "success" completion is reported
after a "short packet" if the TD consists of several TRBs.
xHCI specs 4.10.1.1.2 say remaining TRBs should report "short packet"
as well after the first short packet in a TD, but this issue seems so
common it doesn't make sense to add the quirk for all vendors.
Turn these events into short packets automatically instead.
This gets rid of the "The WARN Successful completion on short TX for
slot 1 ep 1: needs XHCI_TRUST_TX_LENGTH quirk" warning in many cases.
Similar to commit ac343366846a ("xhci: Increase STS_SAVE timeout in
xhci_suspend()") we also need to increase the HALT timeout to make it be
able to suspend again.
Henry Lin [Wed, 11 Dec 2019 14:20:04 +0000 (16:20 +0200)]
usb: xhci: only set D3hot for pci device
Xhci driver cannot call pci_set_power_state() on non-pci xhci host
controllers. For example, NVIDIA Tegra XHCI host controller which acts
as platform device with XHCI_SPURIOUS_WAKEUP quirk set in some platform
hits this issue during shutdown.
Mathias Nyman [Wed, 11 Dec 2019 14:20:03 +0000 (16:20 +0200)]
xhci: fix USB3 device initiated resume race with roothub autosuspend
A race in xhci USB3 remote wake handling may force device back to suspend
after it initiated resume siganaling, causing a missed resume event or warm
reset of device.
When a USB3 link completes resume signaling and goes to enabled (UO)
state a interrupt is issued and the interrupt handler will clear the
bus_state->port_remote_wakeup resume flag, allowing bus suspend.
If the USB3 roothub thread just finished reading port status before
the interrupt, finding ports still in suspended (U3) state, but hasn't
yet started suspending the hub, then the xhci interrupt handler will clear
the flag that prevented roothub suspend and allow bus to suspend, forcing
all port links back to suspended (U3) state.
Example case:
usb_runtime_suspend() # because all ports still show suspended U3
usb_suspend_both()
hub_suspend(); # successful as hub->wakeup_bits not set yet
==> INTERRUPT
xhci_irq()
handle_port_status()
clear bus_state->port_remote_wakeup
usb_wakeup_notification()
sets hub->wakeup_bits;
kick_hub_wq()
<== END INTERRUPT
hcd_bus_suspend()
xhci_bus_suspend() # success as port_remote_wakeup bits cleared
Fix this by increasing roothub usage count during port resume to prevent
roothub autosuspend, and by making sure bus_state->port_remote_wakeup
flag is only cleared after resume completion is visible, i.e.
after xhci roothub returned U0 or other non-U3 link state link on a
get port status request.
Mika Westerberg [Wed, 11 Dec 2019 14:20:02 +0000 (16:20 +0200)]
xhci: Fix memory leak in xhci_add_in_port()
When xHCI is part of Alpine or Titan Ridge Thunderbolt controller and
the xHCI device is hot-removed as a result of unplugging a dock for
example, the driver leaks memory it allocates for xhci->usb3_rhub.psi
and xhci->usb2_rhub.psi in xhci_add_in_port() as reported by kmemleak:
Merge tag 'fixes-for-v5.5-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/balbi/usb into usb-linus
Felipe writes:
USB: fixes for v5.5-rc2
Only four patches here this time around. Three of them are on dwc3
fixing some small bugs related to our 'started' flag.
None are major fixes but they're important nevertheless.
* tag 'fixes-for-v5.5-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/balbi/usb:
usb: gadget: fix wrong endpoint desc
usb: dwc3: ep0: Clear started flag on completion
usb: dwc3: gadget: Clear started flag for non-IOC
usb: dwc3: gadget: Fix logical condition
Chris Wilson [Thu, 21 Nov 2019 07:10:41 +0000 (07:10 +0000)]
drm/i915: Serialise with remote retirement
Since retirement may be running in a worker on another CPU, it may be
skipped in the local intel_gt_wait_for_idle(). To ensure the state is
consistent for our sanity checks upon load, serialise with the remote
retirer by waiting on the timeline->mutex.
Outside of this use case, e.g. on suspend or module unload, we expect the
slack to be picked up by intel_gt_pm_wait_for_idle() and so prefer to
put the special case serialisation with retirement in its single user,
for now at least.
We managed to get confused about the shift direction at least once.
Let's switch to division/multiplcation instead. Add a number of pages
macro for this purpose. We still keep the order macro around too since
this is what alloc/free pages want.
free_page_order is a confusing name. It's not a page order
actually, it's the order of the block of memory we are hinting.
Rename to hint_block_order. Also, rename SIZE to BYTES
to make it clear it's the block size in bytes.
virtio-balloon: fix managed page counts when migrating pages between zones
In case we have to migrate a ballon page to a newpage of another zone, the
managed page count of both zones is wrong. Paired with memory offlining
(which will adjust the managed page count), we can trigger kernel crashes
and all kinds of different symptoms.
In another instance (older kernel), I was able to observe this
(negative page count :/):
[ 180.896971] Offlined Pages 32768
[ 182.667462] Offlined Pages 32768
[ 184.408117] Offlined Pages 32768
[ 186.026321] Offlined Pages 32768
[ 187.684861] Offlined Pages 32768
[ 189.227013] Offlined Pages 32768
[ 190.830303] Offlined Pages 32768
[ 190.833071] Built 1 zonelists, mobility grouping on. Total pages: -36920272750453009
In another instance (older kernel), I was no longer able to start any
process:
[root@vm ~]# [ 214.348068] Offlined Pages 32768
[ 215.973009] Offlined Pages 32768
cat /proc/meminfo
-bash: fork: Cannot allocate memory
[root@vm ~]# cat /proc/meminfo
-bash: fork: Cannot allocate memory
Fix it by properly adjusting the managed page count when migrating if
the zone changed. The managed page count of the zones now looks after
unplug of the DIMM (and after deflating the balloon) just like before
inflating the balloon (and plugging+onlining the DIMM).
We'll temporarily modify the totalram page count. If this ever becomes a
problem, we can fine tune by providing helpers that don't touch
the totalram pages (e.g., adjust_zone_managed_page_count()).
Please note that fixing up the managed page count is only necessary when
we adjusted the managed page count when inflating - only if we
don't have VIRTIO_BALLOON_F_DEFLATE_ON_OOM. With that feature, the
managed page count is not touched when inflating/deflating.
Fredrik Noring [Tue, 10 Dec 2019 17:29:05 +0000 (18:29 +0100)]
USB: Fix incorrect DMA allocations for local memory pool drivers
Fix commit 7b81cb6bddd2 ("usb: add a HCD_DMA flag instead of
guestimating DMA capabilities") where local memory USB drivers
erroneously allocate DMA memory instead of pool memory, causing
OHCI Unrecoverable Error, disabled
HC died; cleaning up
The order between hcd_uses_dma() and hcd->localmem_pool is now
arranged as in hcd_buffer_alloc() and hcd_buffer_free(), with the
test for hcd->localmem_pool placed first.
As an alternative, one might consider adjusting hcd_uses_dma() with
Wolfram Sang [Wed, 6 Nov 2019 21:21:01 +0000 (22:21 +0100)]
i2c: add helper to check if a client has a driver attached
As a preparation for an API conversion, factor out something frequently
used in the media subsystem. As an improvement, it bails out on both,
NULL and ERRPTR to handle the old and new API.
Hui Wang [Wed, 11 Dec 2019 05:13:21 +0000 (13:13 +0800)]
ALSA: hda/realtek - Line-out jack doesn't work on a Dell AIO
After applying the fixup ALC274_FIXUP_DELL_AIO_LINEOUT_VERB, the
Line-out jack works well. And instead of adding a new set of pin
definition in the pin_fixup_tbl, we put a more generic matching entry
in the fallback_pin_fixup_tbl.
Jens Axboe [Tue, 10 Dec 2019 03:16:22 +0000 (20:16 -0700)]
io_uring: add sockets to list of files that support non-blocking issue
In chasing a performance issue between using IORING_OP_RECVMSG and
IORING_OP_READV on sockets, tracing showed that we always punt the
socket reads to async offload. This is due to io_file_supports_async()
not checking for S_ISSOCK on the inode. Since sockets supports the
O_NONBLOCK (or MSG_DONTWAIT) flag just fine, add sockets to the list
of file types that we can do a non-blocking issue to.
Jens Axboe [Tue, 10 Dec 2019 03:58:56 +0000 (20:58 -0700)]
net: make socket read/write_iter() honor IOCB_NOWAIT
The socket read/write helpers only look at the file O_NONBLOCK. not
the iocb IOCB_NOWAIT flag. This breaks users like preadv2/pwritev2
and io_uring that rely on not having the file itself marked nonblocking,
but rather the iocb itself.
Jens Axboe [Tue, 10 Dec 2019 03:01:01 +0000 (20:01 -0700)]
io_uring: run next sqe inline if possible
One major use case of linked commands is the ability to run the next
link inline, if at all possible. This is done correctly for async
offload, but somewhere along the line we lost the ability to do so when
we were able to complete a request without having to punt it. Ensure
that we do so correctly.
Jens Axboe [Tue, 10 Dec 2019 00:52:20 +0000 (17:52 -0700)]
io_uring: don't dynamically allocate poll data
This essentially reverts commit e944475e6984. For high poll ops
workloads, like TAO, the dynamic allocation of the wait_queue
entry for IORING_OP_POLL_ADD adds considerable extra overhead.
Go back to embedding the wait_queue_entry, but keep the usage of
wait->private for the pointer stashing.
Jens Axboe [Sun, 8 Dec 2019 04:03:59 +0000 (21:03 -0700)]
io-wq: remove worker->wait waitqueue
We only have one cases of using the waitqueue to wake the worker, the
rest are using wake_up_process(). Since we can save some cycles not
fiddling with the waitqueue io_wqe_worker(), switch the work activation
to task wakeup and get rid of the now unused wait_queue_head_t in
struct io_worker.
Jens Axboe [Sun, 8 Dec 2019 03:59:47 +0000 (20:59 -0700)]
io_uring: allow unbreakable links
Some commands will invariably end in a failure in the sense that the
completion result will be less than zero. One such example is timeouts
that don't have a completion count set, they will always complete with
-ETIME unless cancelled.
For linked commands, we sever links and fail the rest of the chain if
the result is less than zero. Since we have commands where we know that
will happen, add IOSQE_IO_HARDLINK as a stronger link that doesn't sever
regardless of the completion result. Note that the link will still sever
if we fail submitting the parent request, hard links are only resilient
in the presence of completion results for requests that did submit
correctly.
It turns out that cpuidle_driver_state_disabled() can be called
before registering the cpufreq driver on some platforms, which
was not expected when it was introduced and which leads to a NULL
pointer dereference when trying to walk the CPUs associated with
the given cpuidle driver.
Fix the problem by making cpuidle_driver_state_disabled() check if
the driver's mask of CPUs associated with it is present and to set
CPUIDLE_FLAG_UNUSABLE for the given idle state in the driver's states
list if that is not the case to cause __cpuidle_register_device() to
set CPUIDLE_STATE_DISABLED_BY_DRIVER for that state for all cpuidle
devices registered by it later.
Fixes: cbda56d5fefc ("cpuidle: Introduce cpuidle_driver_state_disabled() for driver quirks") Reported-by: Daniel Lezcano <[email protected]> Tested-by: Daniel Lezcano <[email protected]> Signed-off-by: Rafael J. Wysocki <[email protected]>
Arnd Bergmann [Tue, 10 Dec 2019 19:59:24 +0000 (20:59 +0100)]
drm/amd/display: include linux/slab.h where needed
Calling kzalloc() and related functions requires the
linux/slab.h header to be included:
drivers/gpu/drm/amd/amdgpu/../display/dc/dcn21/dcn21_resource.c: In function 'dcn21_ipp_create':
drivers/gpu/drm/amd/amdgpu/../display/dc/dcn21/dcn21_resource.c:679:3: error: implicit declaration of function 'kzalloc'; did you mean 'd_alloc'? [-Werror=implicit-function-declaration]
kzalloc(sizeof(struct dcn10_ipp), GFP_KERNEL);
A lot of other headers also miss a direct include in this file,
but this is the only one that causes a problem for now.
Arnd Bergmann [Tue, 10 Dec 2019 20:30:46 +0000 (21:30 +0100)]
drm/amd/display: fix undefined struct member reference
An initialization was added for two optional struct members. One of
these is always present in the dcn20_resource file, but the other one
depends on CONFIG_DRM_AMD_DC_DSC_SUPPORT and causes a build failure if
that is missing:
drivers/gpu/drm/amd/amdgpu/../display/dc/dcn20/dcn20_resource.c:926:14: error: excess elements in struct initializer [-Werror]
.num_dsc = 5,
Add another #ifdef around the assignment.
Fixes: c3d03c5a196f ("drm/amd/display: Include num_vmid and num_dsc within NV14's resource caps") Reviewed-by: Zhan Liu <[email protected]> Signed-off-by: Arnd Bergmann <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
Mark switch cases where we are expecting to fall through.
This patch fixes the following error:
LINUX/arch/sh/kernel/kgdb.c: In function 'kgdb_arch_handle_exception':
LINUX/arch/sh/kernel/kgdb.c:267:6: error: this statement may fall through [-Werror=implicit-fallthrough=]
if (kgdb_hex2long(&ptr, &addr))
^
LINUX/arch/sh/kernel/kgdb.c:269:2: note: here
case 'D':
^~~~
ftrace: Fix function_graph tracer interaction with BPF trampoline
Depending on type of BPF programs served by BPF trampoline it can call original
function. In such case the trampoline will skip one stack frame while
returning. That will confuse function_graph tracer and will cause crashes with
bad RIP. Teach graph tracer to skip functions that have BPF trampoline attached.
YueHaibing [Sat, 7 Dec 2019 03:44:09 +0000 (11:44 +0800)]
tracing: remove set but not used variable 'buffer'
kernel/trace/trace_events_inject.c: In function trace_inject_entry:
kernel/trace/trace_events_inject.c:20:22: warning: variable buffer set but not used [-Wunused-but-set-variable]
module: Remove accidental change of module_enable_x()
When pulling in Divya Indi's patch, I made a minor fix to remove unneeded
braces. I commited my fix up via "git commit -a --amend". Unfortunately, I
didn't realize I had some changes I was testing in the module code, and
those changes were applied to Divya's patch as well.
This reverts the accidental updates to the module code.
Lukas Wunner [Tue, 10 Dec 2019 13:39:50 +0000 (14:39 +0100)]
ALSA: hda/hdmi - Fix duplicate unref of pci_dev
Nicholas Johnson reports a null pointer deref as well as a refcount
underflow upon hot-removal of a Thunderbolt-attached AMD eGPU.
He's bisected the issue down to commit 586bc4aab878 ("ALSA: hda/hdmi -
fix vgaswitcheroo detection for AMD").
The commit iterates over PCI devices using pci_get_class() and
unreferences each device found, even though pci_get_class()
subsequently unreferences the device as well. Fix it.
Amir Goldstein [Fri, 6 Dec 2019 06:33:36 +0000 (08:33 +0200)]
ovl: relax WARN_ON() on rename to self
In ovl_rename(), if new upper is hardlinked to old upper underneath
overlayfs before upper dirs are locked, user will get an ESTALE error
and a WARN_ON will be printed.
Changes to underlying layers while overlayfs is mounted may result in
unexpected behavior, but it shouldn't crash the kernel and it shouldn't
trigger WARN_ON() either, so relax this WARN_ON().
Amir Goldstein [Sun, 17 Nov 2019 15:43:44 +0000 (17:43 +0200)]
ovl: fix corner case of non-unique st_dev;st_ino
On non-samefs overlay without xino, non pure upper inodes should use a
pseudo_dev assigned to each unique lower fs and pure upper inodes use the
real upper st_dev.
It is fine for an overlay pure upper inode to use the same st_dev;st_ino
values as the real upper inode, because the content of those two different
filesystem objects is always the same.
In this case, however:
- two filesystems, A and B
- upper layer is on A
- lower layer 1 is also on A
- lower layer 2 is on B
Non pure upper overlay inode, whose origin is in layer 1 will have the same
st_dev;st_ino values as the real lower inode. This may result with a false
positive results of 'diff' between the real lower and copied up overlay
inode.
Fix this by using the upper st_dev;st_ino values in this case. This breaks
the property of constant st_dev;st_ino across copy up of this case. This
breakage will be fixed by a later patch.
Amir Goldstein [Fri, 15 Nov 2019 11:33:03 +0000 (13:33 +0200)]
ovl: make sure that real fid is 32bit aligned in memory
Seprate on-disk encoding from in-memory and on-wire resresentation
of overlay file handle.
In-memory and on-wire we only ever pass around pointers to struct
ovl_fh, which encapsulates at offset 3 the on-disk format struct
ovl_fb. struct ovl_fb encapsulates at offset 21 the real file handle.
That makes sure that the real file handle is always 32bit aligned
in-memory when passed down to the underlying filesystem.
On-disk format remains the same and store/load are done into
correctly aligned buffer.
New nfs exported file handles are exported with aligned real fid.
Old nfs file handles are copied to an aligned buffer before being
decoded.
Amir Goldstein [Thu, 14 Nov 2019 20:28:41 +0000 (22:28 +0200)]
ovl: fix lookup failure on multi lower squashfs
In the past, overlayfs required that lower fs have non null uuid in
order to support nfs export and decode copy up origin file handles.
Commit 9df085f3c9a2 ("ovl: relax requirement for non null uuid of
lower fs") relaxed this requirement for nfs export support, as long
as uuid (even if null) is unique among all lower fs.
However, said commit unintentionally also relaxed the non null uuid
requirement for decoding copy up origin file handles, regardless of
the unique uuid requirement.
Amend this mistake by disabling decoding of copy up origin file handle
from lower fs with a conflicting uuid.
We still encode copy up origin file handles from those fs, because
file handles like those already exist in the wild and because they
might provide useful information in the future.
There is an unhandled corner case described by Miklos this way:
- two filesystems, A and B, both have null uuid
- upper layer is on A
- lower layer 1 is also on A
- lower layer 2 is on B
In this case bad_uuid won't be set for B, because the check only
involves the list of lower fs. Hence we'll try to decode a layer 2
origin on layer 1 and fail.
Andy Shevchenko [Thu, 21 Nov 2019 14:02:07 +0000 (16:02 +0200)]
fbtft: Fix the initialization from property algorithm
When converting to device property API the commit 8b2d3aeeb7ec ("fbtft: Make use of device property API")
mistakenly placed the reading of the first value inside the loop,
that jumps over value after initialization sequence or sleep commands.
Move the above mentioned reading outside of the loop to restore
correct behaviour.
Besides that, we are using pre-increment operation which may lead to
out of the boundary access at the end of sequence. Thus, allocate buffer
with an additional element at the end to prevent out of the boundary
access.
Guenter Roeck [Tue, 3 Dec 2019 20:58:52 +0000 (12:58 -0800)]
drivers: Fix boot problem on SuperH
SuperH images crash too eearly to display any console output. Bisect
points to commit 507fd01d5333 ("drivers: move the early platform device
support to arch/sh"). An analysis of that patch suggests that
early_platform_cleanup() is now called at the wrong time. Restoring its
call point fixes the problem.
EJ Hsu [Wed, 4 Dec 2019 07:34:56 +0000 (23:34 -0800)]
usb: gadget: fix wrong endpoint desc
Gadget driver should always use config_ep_by_speed() to initialize
usb_ep struct according to usb device's operating speed. Otherwise,
usb_ep struct may be wrong if usb devcie's operating speed is changed.
The key point in this patch is that we want to make sure the desc pointer
in usb_ep struct will be set to NULL when gadget is disconnected.
This will force it to call config_ep_by_speed() to correctly initialize
usb_ep struct based on the new operating speed when gadget is
re-connected later.
Saravana Kannan [Mon, 9 Dec 2019 19:31:19 +0000 (11:31 -0800)]
of/platform: Unconditionally pause/resume sync state during kernel init
Commit 5e6669387e22 ("of/platform: Pause/resume sync state during init
and of_platform_populate()") paused/resumed sync state during init only
if Linux had parsed and populated a devicetree.
However, the check for that (of_have_populated_dt()) can change after
of_platform_default_populate_init() executes. One example of this is
when devicetree unittests are enabled. This causes an unmatched
pause/resume of sync state. To avoid this, just unconditionally
pause/resume sync state during init.
Thinh Nguyen [Wed, 27 Nov 2019 21:10:54 +0000 (13:10 -0800)]
usb: dwc3: ep0: Clear started flag on completion
Clear ep0's DWC3_EP_TRANSFER_STARTED flag if the END_TRANSFER command is
completed. Otherwise, we can't start control transfer again after
END_TRANSFER.
Thinh Nguyen [Wed, 27 Nov 2019 21:10:47 +0000 (13:10 -0800)]
usb: dwc3: gadget: Clear started flag for non-IOC
Normally the END_TRANSFER command completion handler will clear the
DWC3_EP_TRANSFER_STARTED flag. However, if the command was sent without
interrupt on completion, then the flag will not be cleared. Make sure to
clear the flag in this case.
Tejas Joglekar [Wed, 13 Nov 2019 06:15:16 +0000 (11:45 +0530)]
usb: dwc3: gadget: Fix logical condition
This patch corrects the condition to kick the transfer without
giving back the requests when either request has remaining data
or when there are pending SGs. The && check was introduced during
spliting up the dwc3_gadget_ep_cleanup_completed_requests() function.
Johan Hovold [Tue, 10 Dec 2019 11:25:58 +0000 (12:25 +0100)]
USB: atm: ueagle-atm: add missing endpoint check
Make sure that the interrupt interface has an endpoint before trying to
access its endpoint descriptors to avoid dereferencing a NULL pointer.
The driver binds to the interrupt interface with interface number 0, but
must not assume that this interface or its current alternate setting are
the first entries in the corresponding configuration arrays.
Ben Skeggs [Tue, 10 Dec 2019 02:15:44 +0000 (12:15 +1000)]
drm/nouveau/kms/nv50-: fix panel scaling
Under certain circumstances, encoder atomic_check() can be entered
without adjusted_mode having been reset to the same as mode, which
confuses the scaling logic and can lead to a misprogrammed display.
Fix this by checking against the user-provided mode directly.
Lyude Paul [Fri, 15 Nov 2019 21:07:20 +0000 (16:07 -0500)]
drm/nouveau/kms/nv50-: Limit MST BPC to 8
Noticed this while working on some unrelated CRC stuff. Currently,
userspace has very little support for BPCs higher than 8. While this
doesn't matter for most things, on MST topologies we need to be careful
about ensuring that we do our best to make any given display
configuration fit within the bandwidth restraints of the topology, since
otherwise less people's monitor configurations will work.
Allowing for BPC settings higher than 8 dramatically increases the
required bandwidth for displays in most configurations, and consequently
makes it a lot less likely that said display configurations will pass
the atomic check.
In the future we want to fix this correctly by making it so that we
adjust the bpp for each display in a topology to be as high as possible,
while making sure to lower the bpp of each display in the event that we
run out of bandwidth and need to rerun our atomic check. But for now,
follow the behavior that both i915 and amdgpu are sticking to.
Lyude Paul [Fri, 15 Nov 2019 21:07:19 +0000 (16:07 -0500)]
drm/nouveau/kms/nv50-: Store the bpc we're using in nv50_head_atom
In order to be able to use bpc values that are different from what the
connector reports, we want to be able to store the bpc value we decide
on using for an atomic state in nv50_head_atom and refer to that instead
of simply using the value that the connector reports throughout the
whole atomic check phase and commit phase. This will let us (eventually)
implement the max bpc connector property, and will also be needed for
limiting the bpc we use on MST displays to 8 in the next commit.
Lyude Paul [Fri, 15 Nov 2019 21:07:18 +0000 (16:07 -0500)]
drm/nouveau/kms/nv50-: Call outp_atomic_check_view() before handling PBN
Since nv50_outp_atomic_check_view() can set crtc_state->mode_changed, we
probably should be calling it before handling any PBN changes. Just a
precaution.
Hans de Goede [Thu, 24 Oct 2019 08:52:53 +0000 (10:52 +0200)]
drm/nouveau: Fix drm-core using atomic code-paths on pre-nv50 hardware
We do not support atomic modesetting on pre-nv50 hardware, but until now
our connector code was setting drm_connector->state on pre-nv50 hardware.
This causes the core to enter atomic modesetting paths in at least:
1. drm_connector_get_encoder(), returning connector->state->best_encoder
which is always 0, causing us to always report 0 as encoder_id in
the drmModeConnector struct returned by drmModeGetConnector().
2. drm_encoder_get_crtc(), returning NULL because uses_atomic get set,
causing us to always report 0 as crtc_id in the drmModeEncoder struct
returned by drmModeGetEncoder()
Which in turn confuses userspace, at least plymouth thinks that the pipe
has changed because of this and tries to reconfigure it unnecessarily.
More in general we should not set drm_connector->state in the non-atomic
code as this violates the drm-core's expectations.
This commit fixes this by using a nouveau_conn_atom struct embedded in the
nouveau_connector struct for property handling in the non-atomic case.
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1706557 Signed-off-by: Hans de Goede <[email protected]> Signed-off-by: Ben Skeggs <[email protected]>
Hans de Goede [Thu, 24 Oct 2019 08:52:52 +0000 (10:52 +0200)]
drm/nouveau: Move the declaration of struct nouveau_conn_atom up a bit
Place the declaration of struct nouveau_conn_atom above that of
struct nouveau_connector. This commit makes no changes to the moved
block what so ever, it just moves it up a bit.
This is a preparation patch to fix some issues with connector handling
on pre nv50 displays (which do not use atomic modesetting).
Pete Zaitcev [Thu, 5 Dec 2019 02:39:41 +0000 (20:39 -0600)]
usb: mon: Fix a deadlock in usbmon between mmap and read
The problem arises because our read() function grabs a lock of the
circular buffer, finds something of interest, then invokes copy_to_user()
straight from the buffer, which in turn takes mm->mmap_sem. In the same
time, the callback mon_bin_vma_fault() is invoked under mm->mmap_sem.
It attempts to take the fetch lock and deadlocks.
This patch does away with protecting of our page list with any
semaphores, and instead relies on the kernel not close the device
while mmap is active in a process.
In addition, we prohibit re-sizing of a buffer while mmap is active.
This way, when (now unlocked) fault is processed, it works with the
page that is intended to be mapped-in, and not some other random page.
Note that this may have an ABI impact, but hopefully no legitimate
program is this wrong.
Bryan O'Donoghue [Thu, 28 Nov 2019 13:43:57 +0000 (13:43 +0000)]
usb: common: usb-conn-gpio: Don't log an error on probe deferral
This patch makes the printout of the error message for failing to get a
VBUS regulator handle conditional on the error code being something other
than -EPROBE_DEFER.
Deferral is a normal thing, we don't need an error message for this.
usb: core: urb: fix URB structure initialization function
Explicitly initialize URB structure urb_list field in usb_init_urb().
This field can be potentially accessed uninitialized and its
initialization is coherent with the usage of list_del_init() in
usb_hcd_unlink_urb_from_ep() and usb_giveback_urb_bh() and its
explicit initialization in usb_hcd_submit_urb() error path.
Chris Wilson [Mon, 9 Dec 2019 02:32:15 +0000 (02:32 +0000)]
drm/i915/gt: Detect if we miss WaIdleLiteRestore
In order to avoid confusing the HW, we must never submit an empty ring
during lite-restore, that is we should always advance the RING_TAIL
before submitting to stay ahead of the RING_HEAD.
Normally this is prevented by keeping a couple of spare NOPs in the
request->wa_tail so that on resubmission we can advance the tail. This
relies on the request only being resubmitted once, which is the normal
condition as it is seen once for ELSP[1] and then later in ELSP[0]. On
preemption, the requests are unwound and the tail reset back to the
normal end point (as we know the request is incomplete and therefore its
RING_HEAD is even earlier).
However, if this w/a should fail we would try and resubmit the request
with the RING_TAIL already set to the location of this request's wa_tail
potentially causing a GPU hang. We can spot when we do try and
incorrectly resubmit without advancing the RING_TAIL and spare any
embarrassment by forcing the context restore.
In the case of preempt-to-busy, we leave the requests running on the HW
while we unwind. As the ring is still live, we cannot rewind our
rq->tail without forcing a reload so leave it set to rq->wa_tail and
only force a reload if we resubmit after a lite-restore. (Normally, the
forced reload will be a part of the preemption event.)
intel_hdcp_transcoder_config() is clobbering some globally visible
state in .compute_config(). That is a big no no as .compute_config()
is supposed to have no visible side effects when either the commit
fails or it's just a TEST_ONLY commit.
Inline this stuff into intel_hdcp_enable() so that the state only
gets modified when we actually commit the state to the hardware.
Ville Syrjälä [Wed, 27 Nov 2019 20:12:09 +0000 (22:12 +0200)]
drm/i915/fbc: Disable fbc by default on all glk+
We're missing a workaround in the fbc code for all glk+ platforms
which can cause corruption around the top of the screen. So
enabling fbc by default is a bad idea. I'm not keen to backport
the w/a so let's start by disabling fbc by default on all glk+.
We'll lift the restriction once the w/a is in place.
drm/i915/perf: Allow non-privileged access when OA buffer is not sampled
SAMPLE_OA_REPORT enables sampling of OA reports from the OA buffer.
Since reports from OA buffer had system wide visibility, collecting
samples from the OA buffer was a privileged operation on previous
platforms. Prior to TGL, it was also necessary to sample the OA buffer
to normalize reports from MI REPORT PERF COUNT.
TGL has a dedicated OAR unit to sample perf reports for a specific
render context. This removes the necessity to sample OA buffer.
- If not sampling the OA buffer, allow non-privileged access. An earlier
patch allows the non-privilege access:
https://patchwork.freedesktop.org/patch/337716/?series=68582&rev=1
- Clear up the path for non-privileged access in this patch
Daniel Vetter [Wed, 4 Dec 2019 21:51:05 +0000 (22:51 +0100)]
MAINTAINERS: Match on dma_buf|fence|resv anywhere
I've spent a bit too much time reviewing all kinds of users all over
the kernel for this buffer sharing infrastructure. And some of it is
at least questionable.
Make sure we at least see when this stuff flies by.