Helge Deller [Mon, 12 Feb 2018 20:43:55 +0000 (21:43 +0100)]
parisc: Reduce irq overhead when run in qemu
When run under QEMU, calling mfctl(16) creates some overhead because the
qemu timer has to be scaled and moved into the register. This patch
reduces the number of calls to mfctl(16) by moving the calls out of the
loops.
Additionally, increase the minimal time interval to 8000 cycles instead
of 500 to compensate for possible QEMU delays when delivering interrupts.
Helge Deller [Fri, 12 Jan 2018 21:51:22 +0000 (22:51 +0100)]
parisc: Check if secondary CPUs want own PDC calls
The architecture specification says (for 64-bit systems): PDC is a per
processor resource, and operating system software must be prepared to
manage separate pointers to PDCE_PROC for each processor. The address
of PDCE_PROC for the monarch processor is stored in the Page Zero
location MEM_PDC. The address of PDCE_PROC for each non-monarch
processor is passed in gr26 when PDCE_RESET invokes OS_RENDEZ.
Currently we still use one PDC for all CPUs, but if we encounter a
machine that follows the specification, warn about it.
The change to flush_kernel_vmap_range() wasn't sufficient to avoid the
SMP stalls. The problem is some drivers call these routines with
interrupts disabled. Interrupts need to be enabled for flush_tlb_all()
and flush_cache_all() to work. This version adds checks to ensure
interrupts are not disabled before calling routines that need IPI
interrupts. When interrupts are disabled, we now drop into slower code.
The attached change fixes the ordering of cache and TLB flushes in
several cases. When we flush the cache using the existing PTE/TLB
entries, we need to flush the TLB after doing the cache flush. We don't
need to do this when we flush the entire instruction and data caches as
these flushes don't use the existing TLB entries. The same is true for
tmpalias region flushes.
The flush_kernel_vmap_range() and invalidate_kernel_vmap_range()
routines have been updated.
Secondly, we added a new purge_kernel_dcache_range_asm() routine to
pacache.S and use it in invalidate_kernel_vmap_range(). Nominally,
purges are faster than flushes as the cache lines don't have to be
written back to memory.
Hopefully, this is sufficient to resolve the remaining problems due to
cache speculation. So far, testing indicates that this is the case. I
did work up a patch using tmpalias flushes, but there is a performance
hit because we need the physical address for each page, and we also need
to sequence access to the tmpalias flush code. This increases the
probability of stalls.
Hui Wang [Fri, 2 Mar 2018 05:05:36 +0000 (13:05 +0800)]
ALSA: hda - Fix a wrong FIXUP for alc289 on Dell machines
With the alc289, pin 0x1b is Headphone-Mic, so we should assign
ALC269_FIXUP_DELL4_MIC_NO_PRESENCE rather than
ALC225_FIXUP_DELL1_MIC_NO_PRESENCE to it. This change was suggested
by Kailang of Realtek and verified on the machine.
Paul Mackerras [Fri, 2 Mar 2018 04:38:04 +0000 (15:38 +1100)]
KVM: PPC: Book3S HV: Fix VRMA initialization with 2MB or 1GB memory backing
The current code for initializing the VRMA (virtual real memory area)
for HPT guests requires the page size of the backing memory to be one
of 4kB, 64kB or 16MB. With a radix host we have the possibility that
the backing memory page size can be 2MB or 1GB. In these cases, if the
guest switches to HPT mode, KVM will not initialize the VRMA and the
guest will fail to run.
In fact it is not necessary that the VRMA page size is the same as the
backing memory page size; any VRMA page size less than or equal to the
backing memory page size is acceptable. Therefore we now choose the
largest page size out of the set {4k, 64k, 16M} which is not larger
than the backing memory page size.
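The selection rule above can be sketched as a small model. This is an illustration only, not the kernel code: the function name choose_vrma_page_size and the constant VRMA_PAGE_SIZES are hypothetical, and only the candidate set {4k, 64k, 16M} comes from the text.

```python
# Page sizes (in bytes) usable for the VRMA, per the commit message.
VRMA_PAGE_SIZES = (4 * 1024, 64 * 1024, 16 * 1024 * 1024)


def choose_vrma_page_size(backing_page_size):
    """Return the largest VRMA page size not exceeding the backing page size.

    Returns None if the backing page is smaller than every candidate.
    (Hypothetical helper; it models the rule, not the kernel implementation.)
    """
    candidates = [size for size in VRMA_PAGE_SIZES if size <= backing_page_size]
    return max(candidates) if candidates else None
```

Under this rule a 2MB backing page yields a 64k VRMA page size, and a 1GB backing page yields 16M.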
Paul Mackerras [Fri, 23 Feb 2018 10:21:12 +0000 (21:21 +1100)]
KVM: PPC: Book3S HV: Fix handling of large pages in radix page fault handler
This fixes several bugs in the radix page fault handler relating to
the way large pages in the memory backing the guest were handled.
First, the check for large pages only checked for explicit huge pages
and missed transparent huge pages. Then the check that the addresses
(host virtual vs. guest physical) had appropriate alignment was
wrong, meaning that the code never put a large page in the partition
scoped radix tree; it was always demoted to a small page.
Fixing this exposed bugs in kvmppc_create_pte(). We were never
invalidating a 2MB PTE, which meant that if a page was initially
faulted in without write permission and the guest then attempted
to store to it, we would never update the PTE to have write permission.
If we find a valid 2MB PTE in the PMD, we need to clear it and
do a TLB invalidation before installing either the new 2MB PTE or
a pointer to a page table page.
This also corrects an assumption that get_user_pages_fast would set
the _PAGE_DIRTY bit if we are writing, which is not true. Instead we
mark the page dirty explicitly with set_page_dirty_lock(). This
also means we don't need the dirty bit set on the host PTE when
providing write access on a read fault.
The following pull-request contains BPF updates for your *net* tree.
The main changes are:
1) Add schedule points and reduce the number of loop iterations
the test_bpf kernel module is performing in order to not hog
the CPU for too long, from Eric.
2) Fix an out of bounds access in tail calls in the ppc64 BPF
JIT compiler, from Daniel.
3) Fix a crash on arm64 on unaligned BPF xadd operations that
could be triggered via interpreter and JIT, from Daniel.
Please note that once you merge net into net-next at some point, there
is a minor merge conflict in test_verifier.c since test cases had
been added at the end in both trees. Resolution is trivial: keep all
the test cases from both trees.
====================
Edward Cree [Wed, 28 Feb 2018 19:15:58 +0000 (19:15 +0000)]
net: ethtool: don't ignore return from driver get_fecparam method
If ethtool_ops->get_fecparam returns an error, pass that error on to the
user, rather than ignoring it.
Fixes: 1a5f3da20bd9 ("net: ethtool: add support for forward error correction modes")
Signed-off-by: Edward Cree <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Mike Manning [Mon, 26 Feb 2018 23:49:30 +0000 (23:49 +0000)]
net: allow interface to be set into VRF if VLAN interface in same VRF
Setting an interface into a VRF fails with 'RTNETLINK answers: File
exists' if one of its VLAN interfaces is already in the same VRF.
As the VRF is an upper device of the VLAN interface, it is also showing
up as an upper device of the interface itself. The solution is to
restrict this check to devices other than master. As only one master
device can be linked to a device, the check in this case is that the
upper device (VRF) being linked to is not the same as the master device
instead of it not being any one of the upper devices.
The following example shows an interface ens12 (with a VLAN interface
ens12.10) being set into VRF green, which behaves as expected:
# ip link add link ens12 ens12.10 type vlan id 10
# ip link set dev ens12 master vrfgreen
# ip link show dev ens12
3: ens12: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel
master vrfgreen state UP mode DEFAULT group default qlen 1000
link/ether 52:54:00:4c:a0:45 brd ff:ff:ff:ff:ff:ff
But if the VLAN interface has previously been set into the same VRF,
then setting the interface into the VRF fails:
# ip link set dev ens12 nomaster
# ip link set dev ens12.10 master vrfgreen
# ip link show dev ens12.10
39: ens12.10@ens12: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500
qdisc noqueue master vrfgreen state UP mode DEFAULT group default
qlen 1000 link/ether 52:54:00:4c:a0:45 brd ff:ff:ff:ff:ff:ff
# ip link set dev ens12 master vrfgreen
RTNETLINK answers: File exists
The workaround is to move the VLAN interface back into the default VRF
beforehand, but it has to be shut first so as to avoid the risk of
traffic leaking from the VRF. This fix avoids needing this workaround.
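The difference between the two checks can be illustrated with a toy model of the upper-device graph. Everything here is a hypothetical stand-in (the NetDevice class, build_example, and both check functions); it only mirrors the reasoning in the text and is not the kernel's data structure or API.

```python
class NetDevice:
    """Hypothetical stand-in for a net device and its upper devices."""

    def __init__(self, name):
        self.name = name
        self.uppers = []    # directly linked upper devices
        self.master = None  # at most one master device

    def all_uppers(self):
        """Every upper device reachable through the upper-device graph."""
        seen = []
        stack = list(self.uppers)
        while stack:
            dev = stack.pop()
            if dev not in seen:
                seen.append(dev)
                stack.extend(dev.uppers)
        return seen


def old_check_rejects(dev, new_master):
    """Old behaviour: reject if new_master is any upper device of dev."""
    return new_master in dev.all_uppers()


def new_check_rejects(dev, new_master):
    """New behaviour for masters: reject only if it already is dev's master."""
    return dev.master is new_master


def build_example():
    """ens12 with VLAN ens12.10, where the VLAN is already in vrfgreen."""
    ens12 = NetDevice("ens12")
    vlan = NetDevice("ens12.10")
    vrf = NetDevice("vrfgreen")
    ens12.uppers.append(vlan)  # the VLAN is an upper device of ens12
    vlan.uppers.append(vrf)    # the VRF is an upper device of the VLAN
    vlan.master = vrf
    return ens12, vlan, vrf
```

In this model the VRF is reachable from ens12 through its VLAN, so the old check rejects the link, while the new check only compares against the master of ens12, which is unset.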
Alastair D'Silva [Thu, 22 Feb 2018 04:17:38 +0000 (15:17 +1100)]
ocxl: Add get_metadata IOCTL to share OCXL information to userspace
Some required information is not currently exposed to userspace (e.g. the
PASID). Pass this information back, along with other information which
is currently communicated via sysfs, which saves some parsing effort in
userspace.
Darren Trapp [Wed, 28 Feb 2018 00:31:12 +0000 (16:31 -0800)]
scsi: qla2xxx: Fix FC-NVMe LUN discovery
Commit a4239945b8ad ("scsi: qla2xxx: Add switch command to simplify
fabric discovery") introduced a regression: it did not consider the
FC-NVMe code path, which broke NVMe LUN discovery.
Fixes: a4239945b8ad ("scsi: qla2xxx: Add switch command to simplify fabric discovery")
Signed-off-by: Darren Trapp <[email protected]>
Signed-off-by: Himanshu Madhani <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>
Hannes Reinecke [Thu, 22 Feb 2018 08:49:37 +0000 (09:49 +0100)]
scsi: qla2xxx: ensure async flags are reset correctly
The fcport flags FCF_ASYNC_ACTIVE and FCF_ASYNC_SENT are used to
throttle the state machine, so we need to ensure that they are always set
and cleared correctly. Otherwise the state machine gets confused and no
login to remote ports is attempted.
Hannes Reinecke [Thu, 22 Feb 2018 08:49:35 +0000 (09:49 +0100)]
scsi: qla2xxx: Fixup locking for session deletion
Commit d8630bb95f46 ('Serialize session deletion by using work_lock')
tries to fixup a deadlock when deleting sessions, but fails to take into
account the locking rules. This patch resolves the situation by
introducing a separate lock for processing the GNLIST response, and
ensures that sess_lock is released before calling
qlt_schedule_sess_delete().
Looking at the assembly of get_next_timer_interrupt(), address came
from %r8 (ffff95e1f6451188) which is pointing to list_head with single
entry at ffff95e5ff621178.
Michael Ellerman [Mon, 26 Feb 2018 04:22:22 +0000 (15:22 +1100)]
selftests/powerpc: Skip the subpage_prot tests if the syscall is unavailable
The subpage_prot syscall is only functional when the system is using
the Hash MMU. Since commit 5b2b80714796 ("powerpc/mm: Invalidate
subpage_prot() system call on radix platforms") it returns ENOENT when
the Radix MMU is active. Currently this just makes the test fail.
Additionally the syscall is not available if the kernel is built with
4K pages, or if CONFIG_PPC_SUBPAGE_PROT=n, in which case it returns
ENOSYS because the syscall is missing entirely.
So check explicitly for ENOENT and ENOSYS and skip if we see either of
those.
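The skip decision described above can be sketched as follows. The function name and the "run"/"skip"/"fail" labels are illustrative assumptions; the actual selftest is written in C and is not reproduced here.

```python
import errno


def classify_subpage_prot_result(err):
    """Map the errno seen when probing subpage_prot() to a test outcome.

    0      -> the syscall works, run the test
    ENOENT -> Radix MMU active, skip
    ENOSYS -> syscall not built in (4K pages or CONFIG_PPC_SUBPAGE_PROT=n), skip
    other  -> a genuine failure
    """
    if err == 0:
        return "run"
    if err in (errno.ENOENT, errno.ENOSYS):
        return "skip"
    return "fail"
```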
Masahiro Yamada [Fri, 16 Feb 2018 18:38:32 +0000 (03:38 +0900)]
kconfig: set SYMBOL_AUTO to the symbol marked with defconfig_list
The 'defconfig_list' is a weird attribute. If the '.config' is
missing, conf_read_simple() iterates over all visible defaults,
then it uses the first one for which fopen() succeeds.
However, like other symbols, the first visible default is always
written out to the .config file. This might be different from what
has been actually used.
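The mismatch can be modelled in a few lines. This is a sketch, not the kconfig source: pick_defconfig and first_visible_default are hypothetical names, and the model assumes every candidate default is visible.

```python
def pick_defconfig(candidates):
    """Return the first candidate path that can be opened for reading.

    Models the "first one for which fopen() succeeds" rule; returns None
    if no candidate can be opened.
    """
    for path in candidates:
        try:
            with open(path, "r"):
                return path
        except OSError:
            continue
    return None


def first_visible_default(candidates):
    """Value that would be written out: simply the first visible default.

    Assumption: all candidates are visible, so this is candidates[0].
    """
    return candidates[0] if candidates else None
```

When the first candidate does not exist, the two functions disagree, which is the discrepancy the commit message describes.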
For example, on my machine, the third one "/boot/config-$UNAME_RELEASE"
is opened, as follows:
$ rm .config
$ make oldconfig 2>/dev/null
scripts/kconfig/conf --oldconfig Kconfig
#
# using defaults found in /boot/config-4.4.0-112-generic
#
*
* Restart config...
*
*
* IRQ subsystem
*
Expose irq internals in debugfs (GENERIC_IRQ_DEBUGFS) [N/y/?] (NEW)
However, the resulting .config file contains the first one since it is
visible:
Linus Torvalds [Thu, 1 Mar 2018 23:56:15 +0000 (15:56 -0800)]
Merge tag 'drm-fixes-for-v4.16-rc4' of git://people.freedesktop.org/~airlied/linux
Pull drm fixes from Dave Airlie:
"Pretty much run of the mill drm fixes.
amdgpu:
- power management fixes
- some display fixes
- one ppc 32-bit dma fix
i915:
- two display fixes
- three gem fixes
sun4i:
- display regression fixes
nouveau:
- display regression fix
virtio-gpu:
- dumb airlied ioctl fix"
* tag 'drm-fixes-for-v4.16-rc4' of git://people.freedesktop.org/~airlied/linux: (25 commits)
drm/amdgpu: skip ECC for SRIOV in gmc late_init
drm/amd/amdgpu: Correct VRAM width for APUs with GMC9
drm/amdgpu: fix&cleanups for wb_clear
drm/amdgpu: Correct sdma_v4 get_wptr(v2)
drm/amd/powerplay: fix power over limit on Fiji
drm/amdgpu:Fixed wrong emit frame size for enc
drm/amdgpu: move WB_FREE to correct place
drm/amdgpu: only flush hotplug work without DC
drm/amd/display: check for ipp before calling cursor operations
drm/i915: Make global seqno known in i915_gem_request_execute tracepoint
drm/i915: Clear the in-use marker on execbuf failure
drm/i915/cnl: Fix PORT_TX_DW5/7 register address
drm/i915/audio: fix check for av_enc_map overflow
drm/i915: Fix rsvd2 mask when out-fence is returned
virtio-gpu: fix ioctl and expose the fixed status to userspace.
drm/sun4i: Protect the TCON pixel clocks
drm/sun4i: Enable the output on the pins (tcon0)
drm/nouveau: prefer XBGR2101010 for addfb ioctl
drm/radeon: insist on 32-bit DMA for Cedar on PPC64/PPC64LE
drm/amd/display: VGA black screen from s3 when attached to hook
...
Linus Torvalds [Thu, 1 Mar 2018 22:32:23 +0000 (14:32 -0800)]
Merge tag 'arc-4.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc
Pull ARC fixes from Vineet Gupta:
- MCIP aka ARconnect fixes for SMP builds [Euginey]
- preventive fix for SLC (L2 cache) flushing [Euginey]
- Kconfig default fix [Ulf Magnusson]
- trailing semicolon fixes [Luis de Bethencourt]
- other assorted minor fixes
* tag 'arc-4.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc:
ARC: setup cpu possible mask according to possible-cpus dts property
ARC: mcip: update MCIP debug mask when the new cpu came online
ARC: mcip: halt GFRC counter when ARC cores halt
ARCv2: boot log: fix HS48 release number
arc: dts: use 'atmel' as manufacturer for at24 in axs10x_mb
ARC: Fix malformed ARC_EMUL_UNALIGNED default
ARC: boot log: Fix trailing semicolon
ARC: dw2 unwind: Fix trailing semicolon
ARC: Enable fatal signals on boot for dev platforms
ARCv2: Don't pretend we may set L-bit in STATUS32 with kflag instruction
ARCv2: cache: fix slc_entire_op: flush only instead of flush-n-inv
xfs: don't start out with the exclusive ilock for direct I/O
There is no reason to take the ilock exclusively at the start of
xfs_file_iomap_begin for direct I/O, given that it will be demoted
just before calling xfs_iomap_write_direct anyway.
Jason Yan [Wed, 28 Feb 2018 01:11:10 +0000 (09:11 +0800)]
ata: do not schedule hot plug if it is a sas host
We've got a kernel panic when using sata disk with sas controller:
[115946.152283] Unable to handle kernel NULL pointer dereference at virtual address 000007d8
[115946.223963] CPU: 0 PID: 22175 Comm: kworker/0:1 Tainted: G W OEL 4.14.0 #1
[115946.232925] Workqueue: events ata_scsi_hotplug
[115946.237938] task: ffff8021ee50b180 task.stack: ffff00000d5d0000
[115946.244717] PC is at sas_find_dev_by_rphy+0x44/0x114
[115946.250224] LR is at sas_find_dev_by_rphy+0x3c/0x114
......
[115946.355701] Process kworker/0:1 (pid: 22175, stack limit = 0xffff00000d5d0000)
[115946.363369] Call trace:
[115946.456356] [<ffff000008878a9c>] sas_find_dev_by_rphy+0x44/0x114
[115946.462908] [<ffff000008878b8c>] sas_target_alloc+0x20/0x5c
[115946.469408] [<ffff00000885a31c>] scsi_alloc_target+0x250/0x308
[115946.475781] [<ffff00000885ba30>] __scsi_add_device+0xb0/0x154
[115946.481991] [<ffff0000088b520c>] ata_scsi_scan_host+0x180/0x218
[115946.488367] [<ffff0000088b53d8>] ata_scsi_hotplug+0xb0/0xcc
[115946.494801] [<ffff0000080ebd70>] process_one_work+0x144/0x390
[115946.501115] [<ffff0000080ec100>] worker_thread+0x144/0x418
[115946.507093] [<ffff0000080f2c98>] kthread+0x10c/0x138
[115946.512792] [<ffff0000080855dc>] ret_from_fork+0x10/0x18
We found that Ding Xiang has reported a similar bug before:
https://patchwork.kernel.org/patch/9179817/
And this bug still exists in mainline. Since libsas handles hotplug and
device adding/removing itself, there is no need to schedule the ata hot
plug task here if it is a sas host.
Wanpeng Li [Wed, 28 Feb 2018 06:03:31 +0000 (14:03 +0800)]
KVM: X86: Allow userspace to define the microcode version
Linux (among others) has checks to make sure that certain features
aren't enabled on a certain family/model/stepping if the microcode version
isn't greater than or equal to a known good version.
By exposing the real microcode version, we're preventing buggy guests that
don't check that they are running virtualized (i.e., they should trust the
hypervisor) from disabling features that are effectively not buggy.
Linus Torvalds [Thu, 1 Mar 2018 18:50:01 +0000 (10:50 -0800)]
Merge tag 'platform-drivers-x86-v4.16-5' of git://git.infradead.org/linux-platform-drivers-x86
Pull x86 platform drivers fixes from Andy Shevchenko:
- fix a regression on laptops like Dell XPS 9360 where keyboard stopped
working.
- correct sysfs wakeup attribute after removal of some drivers to
reflect that they are not able to wake system up anymore.
* tag 'platform-drivers-x86-v4.16-5' of git://git.infradead.org/linux-platform-drivers-x86:
platform/x86: wmi: Fix misuse of vsprintf extension %pULL
platform/x86: intel-hid: Reset wakeup capable flag on removal
platform/x86: intel-vbtn: Reset wakeup capable flag on removal
platform/x86: intel-vbtn: Only activate tablet mode switch on 2-in-1's
Linus Torvalds [Thu, 1 Mar 2018 18:06:39 +0000 (10:06 -0800)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk
Pull printk fix from Petr Mladek:
"Make sure that we wake up userspace loggers. This fixes a race
introduced by the console waiter logic during this merge window"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk:
printk: Wake klogd when passing console_lock owner
Tom Lendacky [Fri, 23 Feb 2018 23:18:20 +0000 (00:18 +0100)]
KVM: SVM: Add MSR-based feature support for serializing LFENCE
In order to determine if LFENCE is a serializing instruction on AMD
processors, MSR 0xc0011029 (MSR_F10H_DECFG) must be read and the state
of bit 1 checked. This patch will add support to allow a guest to
properly make this determination.
Add the MSR feature callback operation to svm.c and add MSR 0xc0011029
to the list of MSR-based features. If LFENCE is serializing, then the
feature is supported, allowing the hypervisor to set the value of the
MSR that guest will see. Support is also added to write (hypervisor only)
and read the MSR value for the guest. A write by the guest will result in
a #GP. A read by the guest will return the value as set by the host. In
this way, the support to expose the feature to the guest is controlled by
the hypervisor.
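The bit test described above amounts to the following. The MSR number and bit position are taken from the text; the constant name LFENCE_SERIALIZE_BIT and the function name are assumptions made for this sketch, and no real MSR is read.

```python
MSR_F10H_DECFG = 0xC0011029  # MSR number given in the commit message
LFENCE_SERIALIZE_BIT = 1     # bit position given in the commit message


def lfence_is_serializing(decfg_value):
    """Return True if bit 1 of the given MSR_F10H_DECFG value is set."""
    return bool((decfg_value >> LFENCE_SERIALIZE_BIT) & 1)
```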
Tom Lendacky [Wed, 21 Feb 2018 19:39:51 +0000 (13:39 -0600)]
KVM: x86: Add a framework for supporting MSR-based features
Provide a new KVM capability that allows bits within MSRs to be recognized
as features. Two new ioctls are added to the /dev/kvm ioctl routine to
retrieve the list of these MSRs and then retrieve their values. A kvm_x86_ops
callback is used to determine support for the listed MSR-based features.
Ming Lei [Tue, 6 Feb 2018 12:17:42 +0000 (20:17 +0800)]
nvme: pci: pass max vectors as num_possible_cpus() to pci_alloc_irq_vectors
84676c1f21 ("genirq/affinity: assign vectors to all possible CPUs")
switched to spreading irq vectors among all possible CPUs, so
pass num_possible_cpus() as the max vectors to be assigned.
For example, in an 8-core system with CPUs 0~3 online and 4~7 offline/not
present, see 'lscpu':
[ming@box]$lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 4
On-line CPU(s) list: 0-3
Thread(s) per core: 1
Core(s) per socket: 2
Socket(s): 2
NUMA node(s): 2
...
NUMA node0 CPU(s): 0-3
NUMA node1 CPU(s):
...
1) before this patch, the allocated vectors and their affinity are as follows:
irq 47, cpu list 0,4
irq 48, cpu list 1,6
irq 49, cpu list 2,5
irq 50, cpu list 3,7
2) after this patch, the allocated vectors and their affinity are as follows:
irq 43, cpu list 0
irq 44, cpu list 1
irq 45, cpu list 2
irq 46, cpu list 3
irq 47, cpu list 4
irq 48, cpu list 6
irq 49, cpu list 5
irq 50, cpu list 7
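The effect of the vector count on sharing can be sketched with a deliberately simplified spread. This is not the kernel's algorithm: the real spreading is NUMA-aware, which is why the pairs listed above (0,4 / 1,6 / 2,5 / 3,7) differ from what this round-robin model produces. The function name spread_cpus is hypothetical.

```python
def spread_cpus(possible_cpus, nvecs):
    """Assign every possible CPU to one of nvecs vectors, round-robin.

    Simplified model only; it shows how many CPUs share a vector, not
    which CPUs the kernel would actually group together.
    """
    groups = [[] for _ in range(nvecs)]
    for i, cpu in enumerate(possible_cpus):
        groups[i % nvecs].append(cpu)
    return groups
```

With 8 possible CPUs, 4 vectors force two CPUs onto each vector, while 8 vectors give every possible CPU its own.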
There is a lock ordering created between mmap_sem and inode->i_rwsem
causing a lockdep splat [2] during a syzkaller test. This patch fixes
the issue by unlocking the mutex earlier. Functionally that's OK since
we don't need to protect vfs_llseek.
Wen Xiong [Thu, 15 Feb 2018 20:05:10 +0000 (14:05 -0600)]
nvme-pci: Fix EEH failure on ppc
Triggering PPC EEH detection and handling requires a memory mapped read
failure. The NVMe driver removed the periodic health check MMIO, so
there's no early detection mechanism to trigger the recovery. Instead,
the detection now happens when the nvme driver handles an IO timeout
event. This takes the pci channel offline, so we do not want the driver
to proceed with escalating its own recovery efforts that may conflict
with the EEH handler.
This patch ensures the driver will observe the channel was set to offline
after a failed MMIO read and resets the IO timer so the EEH handler has
a chance to recover the device.
Linus Torvalds [Thu, 1 Mar 2018 16:17:01 +0000 (08:17 -0800)]
Merge tag 'gpio-v4.16-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio
Pull GPIO fixes from Linus Walleij:
"Fix up device tree properties readout caused by my own refactorings"
* tag 'gpio-v4.16-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio:
gpio: Handle deferred probing in of_find_gpio() properly
gpiolib: Keep returning EPROBE_DEFER when we should
Jiufei Xue [Tue, 27 Feb 2018 12:10:18 +0000 (20:10 +0800)]
block: display the correct diskname for bio
bio_devname uses __bdevname to display the device name, and can
only show the major and minor of part0.
Fix this by using disk_name to display the correct name.
Fixes: 74d46992e0d9 ("block: replace bi_bdev with a gendisk pointer and partitions index")
Reviewed-by: Omar Sandoval <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Jiufei Xue <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
Damien Le Moal [Wed, 28 Feb 2018 17:35:29 +0000 (09:35 -0800)]
mq-deadline: Make sure to always unlock zones
In case of a failed write request (all retries failed) and when using
libata, the SCSI error handler calls scsi_finish_command(). In the
case of blk-mq this means that scsi_mq_done() does not get called,
that blk_mq_complete_request() does not get called and also that the
mq-deadline .completed_request() method is not called. This results in
the target zone of the failed write request being left in a locked
state, preventing any new write requests from being issued to the same
zone.
Fix this by replacing the .completed_request() method with the
.finish_request() method as this method is always called whether or
not a request completes successfully. Since the .finish_request()
method is only called by the blk-mq core if a .prepare_request()
method exists, add a dummy .prepare_request() method.
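Why the choice of hook matters can be shown with a toy model. The class ZoneLockingScheduler and the helper end_request are hypothetical and only mimic the two paths described above: a normal completion runs both hooks, while the error-handler path skips the completion hook.

```python
class ZoneLockingScheduler:
    """Toy scheduler that write-locks a zone per in-flight write."""

    def __init__(self, unlock_hook):
        # unlock_hook is "completed" or "finish": the hook that unlocks.
        self.unlock_hook = unlock_hook
        self.locked_zones = set()

    def dispatch_write(self, zone):
        """Lock the zone and dispatch; refuse if the zone is locked."""
        if zone in self.locked_zones:
            return False
        self.locked_zones.add(zone)
        return True

    def completed_request(self, zone):
        if self.unlock_hook == "completed":
            self.locked_zones.discard(zone)

    def finish_request(self, zone):
        if self.unlock_hook == "finish":
            self.locked_zones.discard(zone)


def end_request(sched, zone, via_error_handler):
    """Model of the two ways a request can end."""
    if not via_error_handler:
        sched.completed_request(zone)  # skipped on the error-handler path
    sched.finish_request(zone)         # runs whenever the request is freed
```

In this model, unlocking from the completion hook leaves the zone locked after a failed write, whereas unlocking from the finish hook releases it on both paths.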
kbuild: disable sparse warnings about unknown attributes
Currently, sparse issues warnings on code using an attribute
it doesn't know about.
One problem with this is that these warnings have no
value for the developer; they are just noise. At best these
warnings say something about deficiencies of sparse itself
but not about a potential problem with the code analyzed.
A second problem is that sparse releases are, alas,
less frequent than additions of new attributes to GCC.
So, avoid the noise by asking sparse to not warn about
attributes it doesn't know about.
Ulf Magnusson [Tue, 13 Feb 2018 07:58:20 +0000 (08:58 +0100)]
Makefile: Fix lying comment re. silentoldconfig
The comment above the silentoldconfig invocation is outdated.
'make oldconfig' updates just .config and doesn't touch the
include/config/ tree.
This came up in https://lkml.org/lkml/2018/2/12/415.
While fixing the comment, make it more informative by explaining the
purpose of the unfortunately named silentoldconfig.
I can't make sense of the comment re. auto.conf.cmd and a cleaned tree.
include/config/auto.conf and include/config/auto.conf.cmd are both
created simultaneously by silentoldconfig (in
scripts/kconfig/confdata.c, by conf_write_autoconf()), and nothing seems
to remove auto.conf.cmd that wouldn't remove auto.conf. Remove that part
of the comment rather than blindly copying it. It might be a leftover
from an older way of doing things.
The include/config/auto.conf.cmd prerequisite might be there to ensure
that silentoldconfig gets rerun if conf_write_autoconf() fails between
writing out auto.conf.cmd and auto.conf (a comment in the function
indicates that auto.conf is deliberately written out last to mark
completion of the operation). It seems the Makefile dependency between
include/config/auto.conf and .config would already take care of that
though, since include/config/auto.conf would still be out of date re.
.config if the operation fails.
Filipe Manana [Wed, 28 Feb 2018 15:56:10 +0000 (15:56 +0000)]
Btrfs: fix log replay failure after unlink and link combination
If we have a file with 2 (or more) hard links in the same directory,
remove one of the hard links, create a new file (or link an existing file)
in the same directory with the name of the removed hard link, and then
finally fsync the new file, we end up with a log that fails to replay,
causing a mount failure.
This happens because the log has inode reference items for both inode 258
(the first file we created) and inode 259 (the second file created), and
when processing the reference item for inode 258, we replace the
corresponding item in the subvolume tree (which has two names, "foo" and
"bar") with the one in the log (which only has one name, "foo") without
removing the corresponding dir index keys from the parent directory.
Later, when processing the inode reference item for inode 259, which has
a name of "bar" associated to it, we notice that dir index entries exist
for that name and for a different inode, so we attempt to unlink that
name, which fails because the inode reference item for inode 258 no longer
has the name "bar" associated to it, making a call to btrfs_unlink_inode()
fail with a -ENOENT error.
Fix this by unlinking all the names in an inode reference item from a
subvolume tree that are not present in the inode reference item found in
the log tree, before overwriting it with the item from the log tree.
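The set of names the fix unlinks can be written as a set difference. The helper name names_to_unlink is hypothetical; the real code walks inode reference items in the two trees rather than Python lists.

```python
def names_to_unlink(subvol_names, log_names):
    """Names in the subvolume tree's inode ref item that the log's item lacks.

    These are the names to unlink before the subvolume item is
    overwritten with the one from the log tree.
    """
    return sorted(set(subvol_names) - set(log_names))
```

For the scenario above, the subvolume item holds "foo" and "bar" and the log item only "foo", so "bar" is the name to unlink first.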
Filipe Manana [Wed, 28 Feb 2018 15:55:40 +0000 (15:55 +0000)]
Btrfs: fix log replay failure after linking special file and fsync
If in the same transaction we rename a special file (fifo, character/block
device or symbolic link), create a hard link for it having its old name
then sync the log, we will end up with a log that can not be replayed:
when attempting to replay it, an EEXIST error is returned and mounting
the filesystem fails. Example scenario:
$ mkfs.btrfs -f /dev/sdc
$ mount /dev/sdc /mnt
$ mkdir /mnt/testdir
$ mkfifo /mnt/testdir/foo
# Make sure everything done so far is durably persisted.
$ sync
# Create some unrelated file and fsync it, this is just to create a log
# tree. The file must be in the same directory as our special file.
$ touch /mnt/testdir/f1
$ xfs_io -c "fsync" /mnt/testdir/f1
# Rename our special file and then create a hard link with its old name.
$ mv /mnt/testdir/foo /mnt/testdir/bar
$ ln /mnt/testdir/bar /mnt/testdir/foo
# Create some other unrelated file and fsync it, this is just to persist
# the log tree which was modified by the previous rename and link
# operations. Alternatively we could have modified file f1 and fsync it.
$ touch /mnt/f2
$ xfs_io -c "fsync" /mnt/f2
<power failure>
$ mount /dev/sdc /mnt
mount: mount /dev/sdc on /mnt failed: File exists
This happens because both the log tree and the subvolume's tree have
an entry in the directory "testdir" with the same name, that is, there
is one key (258 INODE_REF 257) in the subvolume tree and another one in
the log tree (where 258 is the inode number of our special file and 257
is the inode for directory "testdir"). Only the data of those two keys
differs: in the subvolume tree the index field for the inode reference has
a value of 3, while in the log tree it has a value of 5. Because the same
key exists in both trees, but with a different index, the log replay fails
with an -EEXIST error when attempting to replay the inode reference from
the log tree.
Fix this by setting the last_unlink_trans field of the inode (our special
file) to the current transaction id when a hard link is created, as this
forces logging the parent directory inode, solving the conflict at log
replay time.
A new generic test case for fstests was also submitted.
Filipe Manana [Tue, 6 Feb 2018 20:39:20 +0000 (20:39 +0000)]
Btrfs: send, fix issuing write op when processing hole in no data mode
When doing an incremental send of a filesystem with the no-holes feature
enabled, we end up issuing a write operation when using the no data mode
send flag, instead of issuing an update extent operation. Fix this by
issuing the update extent operation instead.
Trivial reproducer:
$ mkfs.btrfs -f -O no-holes /dev/sdc
$ mkfs.btrfs -f /dev/sdd
$ mount /dev/sdc /mnt/sdc
$ mount /dev/sdd /mnt/sdd
Anand Jain [Thu, 22 Feb 2018 13:58:42 +0000 (21:58 +0800)]
btrfs: use proper endianness accessors for super_copy
The fs_info::super_copy is a byte copy of the on-disk structure and all
members must use the accessor macros/functions to obtain the right
value. This was missing in update_super_roots and in sysfs readers.
Moving a filesystem between hosts of opposite endianness will report bogus
numbers in sysfs, and mount may fail as the root will not be restored
correctly. If the filesystem is always used on hosts of the same
endianness, this will not be a problem.
Fix this by using the btrfs_set_super...() functions to set
fs_info::super_copy values, and for the sysfs, use the cached
fs_info::nodesize/sectorsize values.
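The failure mode can be demonstrated on a raw buffer. The btrfs on-disk format is little-endian; the helper names below are hypothetical and stand in for the accessor functions, which are not reproduced here.

```python
import struct


def read_le64(buf, offset=0):
    """Decode a little-endian u64, independent of host byte order.

    This is what an accessor does for an on-disk field.
    """
    return struct.unpack_from("<Q", buf, offset)[0]


def read_as_big_endian(buf, offset=0):
    """What a big-endian host sees if it reads the same bytes natively."""
    return struct.unpack_from(">Q", buf, offset)[0]
```

A field holding 16384 reads back correctly through the little-endian decode on any host, but a native read on a big-endian host returns an unrelated, very large number.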
In case of using DUP, we search for enough unallocated disk space on a
device to hold two stripes.
The devices_info[ndevs-1].max_avail that holds the amount of unallocated
space found is directly assigned to stripe_size, while it's actually
twice the stripe size.
Later on in the code, an unconditional division of stripe_size by
dev_stripes corrects the value, but in the meantime there's a check to
see if the stripe_size does not exceed max_chunk_size. Since during this
check stripe_size is twice the intended amount, the check will reduce
the stripe_size to max_chunk_size if the correct stripe_size is more
than half of max_chunk_size.
The unconditional division later tries to correct stripe_size, but will
actually make sure we can't allocate more than half the max_chunk_size.
Fix this by moving the division by dev_stripes before the max chunk size
check, so it always contains the right value, instead of putting a duct
tape division in further on to get it fixed again.
Since in all other cases than DUP, dev_stripes is 1, this change only
affects DUP.
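The arithmetic can be checked with a simplified model of the two orderings. Both functions are hypothetical reductions of the allocator logic: they keep only the max_chunk_size check and the division by dev_stripes, and omit every other limit the real code applies.

```python
GIB = 1 << 30


def stripe_size_old(max_avail, dev_stripes, data_stripes, max_chunk_size):
    """Old ordering: check first, divide by dev_stripes afterwards."""
    stripe_size = max_avail  # for DUP this still covers two stripes
    if stripe_size * data_stripes > max_chunk_size:
        stripe_size = max_chunk_size // data_stripes
    return stripe_size // dev_stripes  # the late corrective division


def stripe_size_new(max_avail, dev_stripes, data_stripes, max_chunk_size):
    """New ordering: divide by dev_stripes first, then check."""
    stripe_size = max_avail // dev_stripes  # per-stripe size from the start
    if stripe_size * data_stripes > max_chunk_size:
        stripe_size = max_chunk_size // data_stripes
    return stripe_size
```

For DUP (dev_stripes = 2, one data stripe) with 4 GiB unallocated and a 1 GiB max_chunk_size, the old ordering ends at 512 MiB while the new ordering reaches the full 1 GiB. With dev_stripes = 1 the two orderings agree.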
Other attempts in the past were made to fix this:
* 37db63a400 "Btrfs: fix max chunk size check in chunk allocator" tried
to fix the same problem, but still resulted in part of the code acting
on a wrongly doubled stripe_size value.
* 86db25785a "Btrfs: fix max chunk size on raid5/6" unintentionally
broke this fix again.
The real problem was already introduced with the rest of the code in 73c5de0051.
The user visible result however will be that the max chunk size for DUP
will suddenly double, while it's actually acting according to the limits
in the code again like it was 5 years ago.
Nikolay Borisov [Mon, 8 Jan 2018 08:59:43 +0000 (10:59 +0200)]
btrfs: handle failure of add_pending_csums
add_pending_csums was added as part of the new data=ordered
implementation in e6dcd2dc9c48 ("Btrfs: New data=ordered
implementation"). Even back then it called btrfs_csum_file_blocks,
which can fail, but it never handled the failure. In an ENOMEM
situation this could lead to the filesystem failing to write the
checksums for a particular extent without detecting it. On read this
could lead to the filesystem erroring out due to a crc mismatch. Fix it by
propagating failures from add_pending_csums and handling them.
Jeff Mahoney [Fri, 16 Feb 2018 03:59:47 +0000 (22:59 -0500)]
btrfs: use kvzalloc to allocate btrfs_fs_info
The srcu_struct in btrfs_fs_info scales in size with NR_CPUS. On
kernels built with NR_CPUS=8192, this can result in kmalloc failures
that prevent mounting.
There is work in progress to try to resolve this for every user of
srcu_struct but using kvzalloc will work around the failures until
that is complete.
As an example with NR_CPUS=512 on x86_64: the overall size of
subvol_srcu is 3460 bytes, fs_info is 6496.
platform/x86: intel-hid: Reset wakeup capable flag on removal
The intel-hid device will not be able to wake up the system any more
after removing the notify handler provided by its driver, so make
its sysfs attributes reflect that.
Fixes: ef884112e55c (platform: x86: intel-hid: Wake up the system from suspend-to-idle)
Signed-off-by: Rafael J. Wysocki
Signed-off-by: Andy Shevchenko
platform/x86: intel-vbtn: Reset wakeup capable flag on removal
The intel-vbtn device will not be able to wake up the system any more
after removing the notify handler provided by its driver, so make
its sysfs attributes reflect that.
Fixes: 91f9e850d465 (platform: x86: intel-vbtn: Wake up the system from suspend-to-idle)
Signed-off-by: Rafael J. Wysocki
Signed-off-by: Andy Shevchenko
Thomas Gleixner [Wed, 28 Feb 2018 20:14:26 +0000 (21:14 +0100)]
x86/cpu_entry_area: Sync cpu_entry_area to initial_page_table
The separation of the cpu_entry_area from the fixmap missed the fact that
on 32bit non-PAE kernels the cpu_entry_area mapping might not be covered in
initial_page_table by the previous synchronizations.
This results in suspend/resume failures because 32bit utilizes the initial
page table for resume. The absence of the cpu_entry_area mapping results in
a triple fault, aka insta reboot.
With PAE enabled this works by chance because the PGD entry which covers
the fixmap and other parts incidentally provides the cpu_entry_area
mapping as well.
Synchronize the initial page table after setting up the cpu entry
area. Instead of adding yet another copy of the same code, move it to a
function and invoke it from the various places.
It needs to be investigated if the existing calls in setup_arch() and
setup_per_cpu_areas() can be replaced by the later invocation from
setup_cpu_entry_areas(), but that's beyond the scope of this fix.
We are using test_and_* operations on the status and flag fields of
struct sock_mapping. However, these functions require the operand to be
64-bit aligned on arm64. Currently, only status is 64-bit aligned.
Dave Airlie [Thu, 1 Mar 2018 04:03:14 +0000 (14:03 +1000)]
Merge branch 'drm-fixes-4.16' of git://people.freedesktop.org/~agd5f/linux into drm-fixes
A few misc fixes for 4.16.
* 'drm-fixes-4.16' of git://people.freedesktop.org/~agd5f/linux:
drm/amdgpu: skip ECC for SRIOV in gmc late_init
drm/amd/amdgpu: Correct VRAM width for APUs with GMC9
drm/amdgpu: fix&cleanups for wb_clear
drm/amdgpu: Correct sdma_v4 get_wptr(v2)
drm/amd/powerplay: fix power over limit on Fiji
drm/amdgpu:Fixed wrong emit frame size for enc
drm/amdgpu: move WB_FREE to correct place
drm/amdgpu: only flush hotplug work without DC
drm/amd/display: check for ipp before calling cursor operations
Dave Airlie [Thu, 1 Mar 2018 04:02:32 +0000 (14:02 +1000)]
Merge tag 'drm-misc-fixes-2018-02-28' of git://people.freedesktop.org/drm-misc into drm-fixes
Two regression fixes here: a fb format regression on nouveau and a 4.16-rc1
regression on LVDS with one sun4i device, plus one more sun4i fix and a
virtio-gpu fix.
* tag 'drm-misc-fixes-2018-02-28' of git://people.freedesktop.org/drm-misc:
virtio-gpu: fix ioctl and expose the fixed status to userspace.
drm/sun4i: Protect the TCON pixel clocks
drm/sun4i: Enable the output on the pins (tcon0)
drm/nouveau: prefer XBGR2101010 for addfb ioctl
Dave Airlie [Thu, 1 Mar 2018 03:59:21 +0000 (13:59 +1000)]
Merge tag 'drm-intel-fixes-2018-02-28' of git://anongit.freedesktop.org/drm/drm-intel into drm-fixes
- 2 display fixes: audio av_enc_map overflow check, and Cannonlake PLL related register offset.
- 3 gem fixes: Clear for in-fence out-fence, fix for clearing exec_flags on execbuf failure, and add back the global seqno to tracepoints that had been removed recently by another fence-related patch.
* tag 'drm-intel-fixes-2018-02-28' of git://anongit.freedesktop.org/drm/drm-intel:
drm/i915: Make global seqno known in i915_gem_request_execute tracepoint
drm/i915: Clear the in-use marker on execbuf failure
drm/i915/cnl: Fix PORT_TX_DW5/7 register address
drm/i915/audio: fix check for av_enc_map overflow
drm/i915: Fix rsvd2 mask when out-fence is returned
Linus Torvalds [Thu, 1 Mar 2018 00:11:04 +0000 (16:11 -0800)]
Merge tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
Pull ARM SoC fixes from Arnd Bergmann:
"This is the first set of bugfixes for ARM SoCs, fixing a couple of
stability problems, mostly on TI OMAP and Rockchip platforms:
- OMAP2 hwmod clocks must be enabled in the correct order
- OMAP3 Wakeup from resume through PRM IRQ was unreliable
- one regression on OMAP5 caused by a kexec fix
- Rockchip ethernet needs some settings for stable operation on
Rock64
- Rockchip based Chromebook Plus needs another clock setting for
stable display suspend/resume
- Rockchip based phyCORE-RK3288 was able to run at an invalid CPU
clock frequency
- Rockchip MMC link was sometimes unreliable
- multiple fixes to avoid crashes in the Broadcom STB DPFE driver
- fixes for LTO-compilation (orion, davinci, clps711x)
- one fix for an incorrect Kconfig errata selection
- a memory leak in the OMAP timer driver
- a kernel data leak in OMAP1 debugfs files"
* tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (38 commits)
MAINTAINERS: update entries for ARM/STM32
ARM: dts: bcm283x: Move arm-pmu out of soc node
ARM: dts: bcm283x: Fix unit address of local_intc
ARM: dts: NSP: Fix amount of RAM on BCM958625HR
ARM: dts: Set D-Link DNS-313 SATA to muxmode 0
ARM: omap2: set CONFIG_LIRC=y in defconfig
ARM: dts: imx6dl: Include correct dtsi file for Engicam i.CoreM6 DualLite/Solo RQS
memory: brcmstb: dpfe: support new way of passing data from the DCPU
memory: brcmstb: dpfe: fix type declaration of variable "ret"
memory: brcmstb: dpfe: properly mask vendor error bits
ARM: BCM: dts: Remove leading 0x and 0s from bindings notation
ARM: orion: fix orion_ge00_switch_board_info initialization
ARM: davinci: mark spi_board_info arrays as const
ARM: clps711x: mark clps711x_compat as const
arm: zx: dts: Remove leading 0x and 0s from bindings notation
arm64: dts: Remove leading 0x and 0s from bindings notation
arm64: dts: cavium: fix PCI bus dtc warnings
MAINTAINERS: ARM: at91: update my email address
soc: imx: gpc: de-register power domains only if initialized
ARM: dts: rockchip: Fix DWMMC clocks
...
timers: Forward timer base before migrating timers
On CPU hotunplug the enqueued timers of the unplugged CPU are migrated to a
live CPU. This happens from the control thread which initiated the unplug.
If the CPU on which the control thread runs came out from a longer idle
period then the base clock of that CPU might be stale because the control
thread runs prior to any event which forwards the clock.
In such a case the timers from the unplugged CPU are queued on the live CPU
based on the stale clock which can cause large delays due to increased
granularity of the outer timer wheels which are far away from base->clk.
But there is a worse problem than that. The following sequence of events
illustrates it:
- CPU0: timer1 is queued with expires = 59969 and base->clk = 59131.
The timer is queued at wheel level 2, with resulting expiry time = 60032
(due to level granularity).
- CPU1 enters idle @60007, with next timer expiry @60020.
- CPU0 is hot-unplugged @60009.
- CPU1 exits idle and runs the control thread which migrates the
timers from CPU0.
timer1 is now queued in level 0 for immediate handling in the next
softirq because the requested expiry time 59969 is before CPU1 base->clk
60007.
- CPU1 runs code which forwards the base clock, which succeeds because the
next expiring timer, which was collected at idle entry time, is still set
to 60020.
So it forwards beyond 60007 and therefore fails to expire the migrated
timer1. That timer gets expired when the wheel wraps around again, which
takes between 63 and 630ms depending on the HZ setting.
Address both problems by invoking forward_timer_base() for the control CPU's
timer base. All other places, which might run into a similar problem
(mod_timer()/add_timer_on()) already invoke forward_timer_base() to avoid
that.
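The numbers in the sequence above can be reproduced with a small model of the wheel index calculation. This is a sketch under assumptions: the constants (3 clock-shift bits and 64 buckets per level) reflect a reading of kernel/time/timer.c and are not taken from this commit, queue_timer() is a hypothetical helper, and the capping of very large deltas is not modeled.

```python
# Simplified model of timer wheel queueing. The constants are assumptions
# based on kernel/time/timer.c; queue_timer() is a hypothetical helper.
LVL_CLK_SHIFT = 3
LVL_BITS = 6
LVL_SIZE = 1 << LVL_BITS
LVL_DEPTH = 9


def lvl_start(level):
    # Smallest delta (in jiffies) that no longer fits into level - 1.
    return (LVL_SIZE - 1) << ((level - 1) * LVL_CLK_SHIFT)


def queue_timer(expires, clk):
    """Return (level, effective_expiry) for a timer queued against clk."""
    delta = expires - clk
    if delta < 0:
        # Already expired relative to the base clock: level 0, current bucket.
        return 0, clk
    for level in range(LVL_DEPTH):
        if delta < lvl_start(level + 1):
            shift = level * LVL_CLK_SHIFT
            granularity = 1 << shift
            # Round up to the next bucket boundary of this level.
            bucket = (expires + granularity) >> shift
            return level, bucket << shift
    raise ValueError("delta beyond wheel capacity (not modeled)")


# timer1 as originally queued on CPU0.
print(queue_timer(59969, 59131))  # prints "(2, 60032)"
# timer1 after migration to CPU1, whose stale base->clk is 60007.
print(queue_timer(59969, 60007))  # prints "(0, 60007)"
```

In the second case the timer lands in the level 0 bucket for 60007. Once the base clock is forwarded past that value, the bucket is not visited again until level 0 wraps around.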
Arnd Bergmann [Wed, 28 Feb 2018 22:27:21 +0000 (23:27 +0100)]
Merge tag 'arm-soc/for-4.16/drivers-fixes' of https://github.com/Broadcom/stblinux into fixes
Pull "Broadcom drivers fixes for 4.16" from Florian Fainelli:
This pull request contains Broadcom SoCs drivers fixes for 4.16, please
pull the following:
- Markus provides two minor fixes to the Broadcom STB DPFE driver, one
to properly mask bits, and a second one to use the correct type. The
third commit is a consequence of a newer DPFE firmware which would
unfortunately crash without appropriate kernel changes.
* tag 'arm-soc/for-4.16/drivers-fixes' of https://github.com/Broadcom/stblinux:
memory: brcmstb: dpfe: support new way of passing data from the DCPU
memory: brcmstb: dpfe: fix type declaration of variable "ret"
memory: brcmstb: dpfe: properly mask vendor error bits
Arnd Bergmann [Wed, 28 Feb 2018 22:26:21 +0000 (23:26 +0100)]
Merge tag 'arm-soc/for-4.16/devicetree-fixes' of https://github.com/Broadcom/stblinux into fixes
Pull "Broadcom devicetree fixes for 4.16" from Florian Fainelli:
This pull request contains Broadcom ARM-based SoCs Device Tree fixes for
4.16, please pull the following:
- Mathieu removes leading 0x and 0s from bindings and Device Tree source
files; he has done this treewide and most of his changes are already in
4.16
- Stefan provides two changes to the BCM283x DTS files in order to fix
DTC warnings
- Florian fixes the amount of RAM on the BCM958625HR reference board to
properly limit to what is initialized by the bootloader
* tag 'arm-soc/for-4.16/devicetree-fixes' of https://github.com/Broadcom/stblinux:
ARM: dts: bcm283x: Move arm-pmu out of soc node
ARM: dts: bcm283x: Fix unit address of local_intc
ARM: dts: NSP: Fix amount of RAM on BCM958625HR
ARM: BCM: dts: Remove leading 0x and 0s from bindings notation
Arnd Bergmann [Wed, 28 Feb 2018 22:24:01 +0000 (23:24 +0100)]
Merge tag 'imx-fixes-4.16' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux into fixes
Pull "i.MX fixes for 4.16" from Shawn Guo:
- Fix i.MX GPC driver to remove power domains only when they are
initialized in imx_gpc_probe().
- Fix the broken Engicam i.CoreM6 DualLite/Solo RQS board DT to include
imx6dl.dtsi instead of imx6q.dtsi.
* tag 'imx-fixes-4.16' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux:
ARM: dts: imx6dl: Include correct dtsi file for Engicam i.CoreM6 DualLite/Solo RQS
soc: imx: gpc: de-register power domains only if initialized
Linus Torvalds [Wed, 28 Feb 2018 21:38:52 +0000 (13:38 -0800)]
Merge tag 'linux-kselftest-4.16-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull kselftest fixes from Shuah Khan:
"Fixes for various problems in test output, compile errors, and missing
configs"
* tag 'linux-kselftest-4.16-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
selftests: vm: update .gitignore with new test
selftests: memory-hotplug: silence test command echo
selftests/futex: Fix line continuation in Makefile
selftests: memfd: add config fragment for fuse
selftests: pstore: Adding config fragment CONFIG_PSTORE_RAM=m
selftests/android: Fix line continuation in Makefile
selftest/vDSO: fix O=
selftests: sync: missing CFLAGS while compiling
Eric Huang [Mon, 26 Feb 2018 22:36:19 +0000 (17:36 -0500)]
drm/amd/powerplay: fix power over limit on Fiji
Power containment is disabled only on Fiji with the compute
power profile. This violates the PCIe spec and may cause the power
supply to fail. Enabling it fixes the issue, even though the
fix will reduce performance in some compute tests.
Monk Liu [Wed, 24 Jan 2018 04:20:32 +0000 (12:20 +0800)]
drm/amdgpu: move WB_FREE to correct place
WB_FREE should be done after hw_fini has completed for all
engines, otherwise the invalid wptr/rptr_addr would still
be used by the engines, which triggers abnormal bugs.
This fixes a couple of DMAR read errors on the host side for SRIOV
after the guest kmd is unloaded.
Shirish S [Wed, 21 Feb 2018 10:40:33 +0000 (16:10 +0530)]
drm/amd/display: check for ipp before calling cursor operations
Currently all cursor related functions are applied to all
pipes that are attached to a particular stream.
This is not applicable to pipes that do not have a cursor plane
initialised, such as underlay.
Hence this patch allows cursor related operations on a pipe
only if ipp is available on that particular pipe.
The check is added to set_cursor_position & set_cursor_attribute.
Linus Torvalds [Wed, 28 Feb 2018 19:40:51 +0000 (11:40 -0800)]
Merge tag 'xfs-4.16-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull xfs fixes from Darrick Wong:
- fix some compiler warnings
- fix block reservations for transactions created during log recovery
- fix resource leaks when respecifying mount options
* tag 'xfs-4.16-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: fix potential memory leak in mount option parsing
xfs: reserve blocks for refcount / rmap log item recovery
xfs: use memset to initialize xfs_scrub_agfl_info