Commit 92e5aae45778 "kernel/watchdog: split up config options"
introduced SOFTLOCKUP_DETECTOR which selects LOCKUP_DETECTOR
instead of the latter to be selected itself.
ARC: [plat-axs10x] sdio: Temporary fix of sdio ciu frequency
DW sdio controller has external ciu clock divider controlled
via register in SDIO IP. It divides sdio_ref_clk
(which comes from CGU) by 16 for default. So default mmcclk
clock (which comes to sdk_in) is 25000000 Hz.
Note this is a preventive fix, in line with similar change for HSDK
where this was actually needed. see:
http://lists.infradead.org/pipermail/linux-snps-arc/2017-September/002924.html
ARC: [plat-hsdk] sdio: Temporary fix of sdio ciu frequency
DW sdio controller has external ciu clock divider controlled via
register in SDIO IP. Due to its unexpected default value
(it should divide by 1 but it divides by 8)
SDIO IP uses wrong ciu clock and works unstable
So add temporary fix and change clock frequency from 100000000
to 12500000 Hz until we fix dw sdio driver itself.
ARC: [plat-axs103] Add temporary quirk to reset ethernet IP
DW ethernet controller on AXS10x hangs sometimes after SW reset, so
add temporary quirk to reset DW ethernet controller IP core.
This quirk can be removed after axs10x reset driver
(see http://patchwork.ozlabs.org/patch/800273/)
or simple reset driver
(see https://patchwork.kernel.org/patch/9903375/)
will be available in upstream.
Olof Johansson [Wed, 4 Oct 2017 01:15:58 +0000 (18:15 -0700)]
Merge tag 'omap-for-v4.14/fixes-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap into fixes
Fixes for omaps for v4.14-rc cycle
Few minor fixes for omaps, mostly just boot time warning fixes:
- Drop undocumented camera binding that got merged during the merge window by
accident as I applied before Sakari's comments
- Fix soft reset warning for dra7 kexec boot for gpio1 as the optional clocks
need to be enabled for reset
- Fix dra7 kexec boot clock rate for McASP as the rate is no longer the default
rate after kexec
- Fix omap3 pandora MMC warning during boot
- Add am33xx SPI alias like we have on other SoCs
- Remove node for non-existing CPSW EMAC Ethernet on am43xx-epos-evm
* tag 'omap-for-v4.14/fixes-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap:
ARM: dts: am43xx-epos-evm: Remove extra CPSW EMAC entry
ARM: dts: am33xx: Add spi alias to match SOC schematics
ARM: OMAP2+: hsmmc: fix logic to call either omap_hsmmc_init or omap_hsmmc_late_init but not both
ARM: dts: dra7: Set a default parent to mcasp3_ahclkx_mux
ARM: OMAP2+: dra7xx: Set OPT_CLKS_IN_RESET flag for gpio1
ARM: dts: nokia n900: drop unneeded/undocumented parts of the dts
Olof Johansson [Wed, 4 Oct 2017 01:13:35 +0000 (18:13 -0700)]
Merge tag 'v4.14-rockchip-dts64fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip into fixes
Adding the operating points on rk3368 like they were did not end up well
for the boards as all of them are missing their cpu supplies, the OPPs
actually need to follow the <target min max> format as the regulator is
shared between both clusters and the one rk3368 board I have, somehow also
doesn't like the higher opps at all - all of which I only realized after
I brought my rk3368 board online again, after its bootloader broke.
So we revert that OPP addition for now.
And also two fixes for the mipi dsi controller on rk3399, which was
referencing a clock to high up in the clock-tree so that an intermediate
gate could be disabled inadvertently and also needs a clock for its area
in the general register files of the rk3399 soc.
* tag 'v4.14-rockchip-dts64fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip:
arm64: dts: rockchip: add the grf clk for dw-mipi-dsi on rk3399
arm64: dts: rockchip: Correct MIPI DPHY PLL clock on rk3399
Revert "arm64: dts: rockchip: Add basic cpu frequencies for RK3368"
Olof Johansson [Wed, 4 Oct 2017 01:10:12 +0000 (18:10 -0700)]
Merge tag 'reset-fixes-for-4.14' of git://git.pengutronix.de/git/pza/linux into fixes
Reset controller fixes for v4.14
- Remove misleading HSDK v1 suffix, as there is no v2 planned
- Add missing DT binding documentation for HSDK reset driver
- Fix HSDK reset driver dependencies
* tag 'reset-fixes-for-4.14' of git://git.pengutronix.de/git/pza/linux:
reset: Restrict RESET_HSDK to ARC_SOC_HSDK or COMPILE_TEST
ARC: reset: remove the misleading v1 suffix all over
ARC: reset: add missing DT binding documentation for HSDKv1 reset driver
ARC: reset: Only build on archs that have IOMEM
Olof Johansson [Wed, 4 Oct 2017 01:09:08 +0000 (18:09 -0700)]
Merge tag 'davinci-fixes-for-v4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/nsekhar/linux-davinci into fixes
A fix for random ethernet mac address problem
on DA850 EVM.
* tag 'davinci-fixes-for-v4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/nsekhar/linux-davinci:
ARM: dts: da850-evm: add serial and ethernet aliases
Olof Johansson [Wed, 4 Oct 2017 01:08:27 +0000 (18:08 -0700)]
Merge tag 'at91-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nferre/linux-at91 into fixes
Fixes for 4.14:
- three DT fixes for the newly introduced sama5d27_som1_ek board
- one treewide modification that didn't touch this new PM code: we
synchronize now to be coherent with the other ARM platforms
* tag 'at91-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nferre/linux-at91:
ARM: at91: Replace uses of virt_to_phys with __pa_symbol
ARM: dts: at91: sama5d27_som1_ek: fix USB host vbus
ARM: dts: at91: sama5d27_som1_ek: fix typos
ARM: dts: at91: sama5d27_som1_ek: update pinmux/pinconf for LEDs and USB
This updates the Gemini defconfig with drivers merged
for v4.13 or v4.14:
- ATA driver is merged
- DMA driver is merged
- RTC driver gets selected from default Kconfig
ARM: defconfig: FRAMEBUFFER_CONSOLE can no longer be =m
It is no longer possible to load this at runtime, so let's
change the few remaining users to have it built-in all
the time.
arch/arm/configs/zeus_defconfig:115:warning: symbol value 'm' invalid for FRAMEBUFFER_CONSOLE
arch/arm/configs/viper_defconfig:116:warning: symbol value 'm' invalid for FRAMEBUFFER_CONSOLE
arch/arm/configs/pxa_defconfig:474:warning: symbol value 'm' invalid for FRAMEBUFFER_CONSOLE
Reported-by: kernelci.org bot <[email protected]> Fixes: 6104c37094e7 ("fbcon: Make fbcon a built-time depency for fbdev") Signed-off-by: Arnd Bergmann <[email protected]> Signed-off-by: Olof Johansson <[email protected]>
Mike Rapoport [Tue, 3 Oct 2017 23:16:54 +0000 (16:16 -0700)]
include/linux/fs.h: fix comment about struct address_space
Before commit 9c5d760b8d22 ("mm: split gfp_mask and mapping flags into
separate fields") the private_* fields of struct adrress_space were
grouped together and using "ditto" in comments describing the last
fields was correct.
With introduction of gpf_mask between private_lock and private_list
"ditto" references the wrong description.
ERROR: Does not appear to be a unified-diff format patch
The logic to suppress the unified-diff check for cover letters is there
but is checking $file instead of $filename. Fix the variable to use the
correct one.
Sudip Mukherjee [Tue, 3 Oct 2017 23:16:49 +0000 (16:16 -0700)]
m32r: fix build failure
The allmodconfig build of m32r is failing with the error:
lib/mpi/mpih-div.o: In function 'mpihelp_divrem':
mpih-div.c:(.text+0x40): undefined reference to 'abort'
mpih-div.c:(.text+0x40): relocation truncated to fit:
R_M32R_26_PCREL_RELA against undefined symbol 'abort'
The function 'abort' was never defined for the m32r architecture.
Create 'abort' as is done in other arch like 'arm' and 'unicore32'.
printk_ratelimit() invokes ___ratelimit() which may invoke a normal
printk() (pr_warn() in this particular case) to warn about suppressed
output. Given that printk_ratelimit() may be called from anywhere, that
pr_warn() is dangerous - it may end up deadlocking the system. Fix
___ratelimit() by using deferred printk().
Jean Delvare [Tue, 3 Oct 2017 23:16:38 +0000 (16:16 -0700)]
kernel/params.c: fix an overflow in param_attr_show
Function param_attr_show could overflow the buffer it is operating on.
The buffer size is PAGE_SIZE, and the string returned by
attribute->param->ops->get is generated by scnprintf(buffer, PAGE_SIZE,
...) so it could be PAGE_SIZE - 1 long, with the terminating '\0' at the
very end of the buffer. Calling strcat(..., "\n") on this isn't safe, as
the '\0' will be replaced by '\n' (OK) and then another '\0' will be added
past the end of the buffer (not OK.)
Simply add the trailing '\n' when writing the attribute contents to the
buffer originally. This is safe, and also faster.
Jean Delvare [Tue, 3 Oct 2017 23:16:35 +0000 (16:16 -0700)]
kernel/params.c: fix the maximum length in param_get_string
The length parameter of strlcpy() is supposed to reflect the size of the
target buffer, not of the source string. Harmless in this case as the
buffer is PAGE_SIZE long and the source string is always much shorter than
this, but conceptually wrong, so let's fix it.
mm/memory_hotplug: define find_{smallest|biggest}_section_pfn as unsigned long
find_{smallest|biggest}_section_pfn()s find the smallest/biggest section
and return the pfn of the section. But the functions are defined as int.
So the functions always return 0x00000000 - 0xffffffff. It means if
memory address is over 16TB, the functions does not work correctly.
To handle 64 bit value, the patch defines
find_{smallest|biggest}_section_pfn() as unsigned long.
mm/memory_hotplug: change pfn_to_section_nr/section_nr_to_pfn macro to inline function
pfn_to_section_nr() and section_nr_to_pfn() are defined as macro.
pfn_to_section_nr() has no issue even if it is defined as macro. But
section_nr_to_pfn() has overflow issue if sec is defined as int.
section_nr_to_pfn() just shifts sec by PFN_SECTION_SHIFT. If sec is
defined as unsigned long, section_nr_to_pfn() returns pfn as 64 bit value.
But if sec is defined as int, section_nr_to_pfn() returns pfn as 32 bit
value.
__remove_section() calculates start_pfn using section_nr_to_pfn() and
scn_nr defined as int. So if hot-removed memory address is over 16TB,
overflow issue occurs and section_nr_to_pfn() does not calculate correct
pfn.
To make callers use proper arg, the patch changes the macros to inline
functions.
Michal Hocko [Tue, 3 Oct 2017 23:16:23 +0000 (16:16 -0700)]
memremap: add scheduling point to devm_memremap_pages
devm_memremap_pages is initializing struct pages in for_each_device_pfn
and that can take quite some time. We have even seen a soft lockup
triggering on a non preemptive kernel
Michal Hocko [Tue, 3 Oct 2017 23:16:16 +0000 (16:16 -0700)]
mm, memory_hotplug: add scheduling point to __add_pages
Patch series "mm, memory_hotplug: fix few soft lockups in memory
hotadd".
Johannes has noticed few soft lockups when adding a large nvdimm device.
All of them were caused by a long loop without any explicit cond_resched
which is a problem for !PREEMPT kernels.
The fix is quite straightforward. Just make sure that cond_resched gets
called from time to time.
This patch (of 3):
__add_pages gets a pfn range to add and there is no upper bound for a
single call. This is usually a memory block aligned size for the
regular memory hotplug - smaller sizes are usual for memory balloning
drivers, or the whole NUMA node for physical memory online. There is no
explicit scheduling point in that code path though.
This can lead to long latencies while __add_pages is executed and we
have even seen a soft lockup report during nvdimm initialization with
!PREEMPT kernel
Fix this by adding cond_resched once per each memory section in the
given pfn range. Each section is constant amount of work which itself
is not too expensive but many of them will just add up.
Johannes Weiner [Tue, 3 Oct 2017 23:16:10 +0000 (16:16 -0700)]
mm: memcontrol: use vmalloc fallback for large kmem memcg arrays
For quick per-memcg indexing, slab caches and list_lru structures
maintain linear arrays of descriptors. As the number of concurrent
memory cgroups in the system goes up, this requires large contiguous
allocations (8k cgroups = order-5, 16k cgroups = order-6 etc.) for every
existing slab cache and list_lru, which can easily fail on loaded
systems. E.g.:
This output is from an artificial reproducer, but we have repeatedly
observed order-7 failures in production in the Facebook fleet. These
systems become useless as they cannot run more jobs, even though there
is plenty of memory to allocate 128 individual pages.
Use kvmalloc and kvzalloc to fall back to vmalloc space if these arrays
prove too large for allocating them physically contiguous.
Colin Ian King [Tue, 3 Oct 2017 23:16:01 +0000 (16:16 -0700)]
lib/lz4: make arrays static const, reduces object code size
Don't populate the read-only arrays dec32table and dec64table on the
stack, instead make them both static const. Makes the object code
smaller by over 10K bytes:
Before:
text data bss dec hex filename
31500 0 0 31500 7b0c lib/lz4/lz4_decompress.o
After:
text data bss dec hex filename
20237 176 0 20413 4fbd lib/lz4/lz4_decompress.o
Oleg Nesterov [Tue, 3 Oct 2017 23:15:55 +0000 (16:15 -0700)]
exec: binfmt_misc: fix race between load_misc_binary() and kill_node()
load_misc_binary() makes a local copy of fmt->interpreter under
entries_lock to avoid the race with kill_node() but this is not enough;
the whole Node can be freed after we drop entries_lock, not only the
->interpreter string.
Add dget/dput(fmt->dentry) to ensure bm_evict_inode() can't destroy/free
this Node.
Oleg Nesterov [Tue, 3 Oct 2017 23:15:48 +0000 (16:15 -0700)]
exec: binfmt_misc: shift filp_close(interp_file) from kill_node() to bm_evict_inode()
To ensure that load_misc_binary() can't use the partially destroyed
Node, see also the next patch.
The current logic looks wrong in any case, once we close interp_file it
doesn't make any sense to delay kfree(inode->i_private), this Node is no
longer valid. Even if the MISC_FMT_OPEN_FILE/interp_file checks were
not racy (they are), load_misc_binary() should not try to reopen
->interpreter if MISC_FMT_OPEN_FILE is set but ->interp_file is NULL.
And I can't understand why do we use filp_close(), not fput().
Oleg Nesterov [Tue, 3 Oct 2017 23:15:45 +0000 (16:15 -0700)]
exec: binfmt_misc: don't nullify Node->dentry in kill_node()
kill_node() nullifies/checks Node->dentry to avoid double free. This
complicates the next changes and this is very confusing:
- we do not need to check dentry != NULL under entries_lock,
kill_node() is always called under inode_lock(d_inode(root)) and we
rely on this inode_lock() anyway, without this lock the
MISC_FMT_OPEN_FILE cleanup could race with itself.
- if kill_inode() was already called and ->dentry == NULL we should not
even try to close e->interp_file.
We can change bm_entry_write() to simply check !list_empty(list) before
kill_node. Again, we rely on inode_lock(), in particular it saves us
from the race with bm_status_write(), another caller of kill_node().
Oleg Nesterov [Tue, 3 Oct 2017 23:15:42 +0000 (16:15 -0700)]
exec: load_script: kill the onstack interp[BINPRM_BUF_SIZE] array
Patch series "exec: binfmt_misc: fix use-after-free, kill
iname[BINPRM_BUF_SIZE]".
It looks like this code was always wrong, then commit 948b701a607f
("binfmt_misc: add persistent opened binary handler for containers")
added more problems.
This patch (of 6):
load_script() can simply use i_name instead, it points into bprm->buf[]
and nobody can change this memory until we call prepare_binprm().
The only complication is that we need to also change the signature of
bprm_change_interp() but this change looks good too.
While at it, do whitespace/style cleanups.
NOTE: the real motivation for this change is that people want to
increase BINPRM_BUF_SIZE, we need to change load_misc_binary() too but
this looks more complicated because afaics it is very buggy.
userfaultfd: non-cooperative: fix fork use after free
When reading the event from the uffd, we put it on a temporary
fork_event list to detect if we can still access it after releasing and
retaking the event_wqh.lock.
If fork aborts and removes the event from the fork_event all is fine as
long as we're still in the userfault read context and fork_event head is
still alive.
We've to put the event allocated in the fork kernel stack, back from
fork_event list-head to the event_wqh head, before returning from
userfaultfd_ctx_read, because the fork_event head lifetime is limited to
the userfaultfd_ctx_read stack lifetime.
Forgetting to move the event back to its event_wqh place then results in
__remove_wait_queue(&ctx->event_wqh, &ewq->wq); in
userfaultfd_event_wait_completion to remove it from a head that has been
already freed from the reader stack.
This could only happen if resolve_userfault_fork failed (for example if
there are no file descriptors available to allocate the fork uffd). If
it succeeded it was put back correctly.
Furthermore, after find_userfault_evt receives a fork event, the forked
userfault context in fork_nctx and uwq->msg.arg.reserved.reserved1 can
be released by the fork thread as soon as the event_wqh.lock is
released. Taking a reference on the fork_nctx before dropping the lock
prevents an use after free in resolve_userfault_fork().
If the fork side aborted and it already released everything, we still
try to succeed resolve_userfault_fork(), if possible.
Shaohua Li [Tue, 3 Oct 2017 23:15:32 +0000 (16:15 -0700)]
mm: fix data corruption caused by lazyfree page
MADV_FREE clears pte dirty bit and then marks the page lazyfree (clear
SwapBacked). There is no lock to prevent the page is added to swap
cache between these two steps by page reclaim. If page reclaim finds
such page, it will simply add the page to swap cache without pageout the
page to swap because the page is marked as clean. Next time, page fault
will read data from the swap slot which doesn't have the original data,
so we have a data corruption. To fix issue, we mark the page dirty and
pageout the page.
However, we shouldn't dirty all pages which is clean and in swap cache.
swapin page is swap cache and clean too. So we only dirty page which is
added into swap cache in page reclaim, which shouldn't be swapin page.
As Minchan suggested, simply dirty the page in add_to_swap can do the
job.
Shaohua Li [Tue, 3 Oct 2017 23:15:29 +0000 (16:15 -0700)]
mm: avoid marking swap cached page as lazyfree
MADV_FREE clears pte dirty bit and then marks the page lazyfree (clear
SwapBacked). There is no lock to prevent the page is added to swap
cache between these two steps by page reclaim. Page reclaim could add
the page to swap cache and unmap the page. After page reclaim, the page
is added back to lru. At that time, we probably start draining per-cpu
pagevec and mark the page lazyfree. So the page could be in a state
with SwapBacked cleared and PG_swapcache set. Next time there is a
refault in the virtual address, do_swap_page can find the page from swap
cache but the page has PageSwapCache false because SwapBacked isn't set,
so do_swap_page will bail out and do nothing. The task will keep
running into fault handler.
Jeff Layton [Tue, 3 Oct 2017 23:15:25 +0000 (16:15 -0700)]
mm: have filemap_check_and_advance_wb_err clear AS_EIO/AS_ENOSPC
Eryu noticed that he could sometimes get a leftover error reported when
it shouldn't be on fsync with ext2 and non-journalled ext4.
The problem is that writeback_single_inode still uses filemap_fdatawait.
That picks up a previously set AS_EIO flag, which would ordinarily have
been cleared before.
Since we're mostly using this function as a replacement for
filemap_check_errors, have filemap_check_and_advance_wb_err clear AS_EIO
and AS_ENOSPC when reporting an error. That should allow the new
function to better emulate the behavior of the old with respect to these
flags.
Since commit 056b9d8a7692 ("mm: remove rodata_test_data export, add
pr_fmt"), rodata_test_data is used only inside rodata_test.c By
declaring it static, it gets properly allocated into .rodata section
instead of .data:
Ioan Nicu [Tue, 3 Oct 2017 23:15:13 +0000 (16:15 -0700)]
rapidio: remove global irq spinlocks from the subsystem
Locking of config and doorbell operations should be done only if the
underlying hardware requires it.
This patch removes the global spinlocks from the rapidio subsystem and
moves them to the mport drivers (fsl_rio and tsi721), only to the
necessary places. For example, local config space read and write
operations (lcread/lcwrite) are atomic in all existing drivers, so there
should be no need for locking, while the cread/cwrite operations which
generate maintenance transactions need to be synchronized with a lock.
Later, each driver could chose to use a per-port lock instead of a
global one, or even more granular locking.
Arnd Bergmann [Tue, 3 Oct 2017 23:15:10 +0000 (16:15 -0700)]
mm: meminit: mark init_reserved_page as __meminit
The function is called from __meminit context and calls other __meminit
functions but isn't it self mark as such today:
WARNING: vmlinux.o(.text.unlikely+0x4516): Section mismatch in reference from the function init_reserved_page() to the function .meminit.text:early_pfn_to_nid()
The function init_reserved_page() references the function __meminit early_pfn_to_nid().
This is often because init_reserved_page lacks a __meminit annotation or the annotation of early_pfn_to_nid is wrong.
On most compilers, we don't notice this because the function gets
inlined all the time. Adding __meminit here fixes the harmless warning
for the old versions and is generally the correct annotation.
Vitaly Wool [Tue, 3 Oct 2017 23:15:06 +0000 (16:15 -0700)]
z3fold: fix stale list handling
Fix the situation when clear_bit() is called for page->private before
the page pointer is actually assigned. While at it, remove work_busy()
check because it is costly and does not give 100% guarantee anyway.
Andrea brought to my attention that the L->{L,S} guarantees are
completely bogus for this case. I was looking at the diagram, from the
offending commit, when that _is_ the race, we had the load reordered
already.
What we need is at least S->L semantics, thus simply use
wq_has_sleeper() to serialize the call for good.
Sherry Yang [Tue, 3 Oct 2017 23:15:00 +0000 (16:15 -0700)]
android: binder: drop lru lock in isolate callback
Drop the global lru lock in isolate callback before calling
zap_page_range which calls cond_resched, and re-acquire the global lru
lock before returning. Also change return code to LRU_REMOVED_RETRY.
Use mmput_async when fail to acquire mmap sem in an atomic context.
Fix "BUG: sleeping function called from invalid context"
errors when CONFIG_DEBUG_ATOMIC_SLEEP is enabled.
Also restore mmput_async, which was initially introduced in commit ec8d7c14ea14 ("mm, oom_reaper: do not mmput synchronously from the oom
reaper context"), and was removed in commit 212925802454 ("mm: oom: let
oom_reap_task and exit_mmap run concurrently").
2 locks held by a.out/4771:
#0: (&mm->mmap_sem){++++++}, at: [<ffffffff8106eb35>] __do_page_fault+0x175/0x530
#1: (percpu_charge_mutex){+.+...}, at: [<ffffffff812b4c97>] try_charge+0x397/0x6e0
The problem is very similar to the one fixed by commit a459eeb7b852
("mm, page_alloc: do not depend on cpu hotplug locks inside the
allocator"). We are taking hotplug locks while we can be sitting on top
of basically arbitrary locks. This just calls for problems.
We can get rid of {get,put}_online_cpus, fortunately. We do not have to
be worried about races with memory hotplug because drain_local_stock,
which is called from both the WQ draining and the memory hotplug
contexts, is always operating on the local cpu stock with IRQs disabled.
The only thing to be careful about is that the target memcg doesn't
vanish while we are still in drain_all_stock so take a reference on it.
Michal Hocko [Tue, 3 Oct 2017 23:14:50 +0000 (16:14 -0700)]
mm, oom_reaper: skip mm structs with mmu notifiers
Andrea has noticed that the oom_reaper doesn't invalidate the range via
mmu notifiers (mmu_notifier_invalidate_range_start/end) and that can
corrupt the memory of the kvm guest for example.
tlb_flush_mmu_tlbonly already invokes mmu notifiers but that is not
sufficient as per Andrea:
"mmu_notifier_invalidate_range cannot be used in replacement of
mmu_notifier_invalidate_range_start/end. For KVM
mmu_notifier_invalidate_range is a noop and rightfully so. A MMU
notifier implementation has to implement either ->invalidate_range
method or the invalidate_range_start/end methods, not both. And if you
implement invalidate_range_start/end like KVM is forced to do, calling
mmu_notifier_invalidate_range in common code is a noop for KVM.
For those MMU notifiers that can get away only implementing
->invalidate_range, the ->invalidate_range is implicitly called by
mmu_notifier_invalidate_range_end(). And only those secondary MMUs
that share the same pagetable with the primary MMU (like AMD iommuv2)
can get away only implementing ->invalidate_range"
As the callback is allowed to sleep and the implementation is out of
hand of the MM it is safer to simply bail out if there is an mmu
notifier registered. In order to not fail too early make the
mm_has_notifiers check under the oom_lock and have a little nap before
failing to give the current oom victim some more time to exit.
Vitaly Wool [Tue, 3 Oct 2017 23:14:47 +0000 (16:14 -0700)]
z3fold: fix potential race in z3fold_reclaim_page
It is possible that on a (partially) unsuccessful page reclaim,
kref_put() called in z3fold_reclaim_page() does not yield page release,
but the page is released shortly afterwards by another thread. Then
z3fold_reclaim_page() would try to list_add() that (released) page again
which is obviously a bug.
To avoid that, spin_lock() has to be taken earlier, before the
kref_put() call mentioned earlier.
sh: sh7269: remove nonexistent GPIO_PH[0-7] to fix pinctrl registration
Pinmux_pins[] is initialized through PINMUX_GPIO(), using designated
array initializers, where the GPIO_* enums serve as indices. If enum
values are defined, but never used, pinmux_pins[] contains (zero-filled)
holes. Such entries are treated as pin zero, which was registered
before, thus leading to pinctrl registration failures, as seen on
sh7722:
sh-pfc pfc-sh7722: pin 0 already registered
sh-pfc pfc-sh7722: error during pin registration
sh-pfc pfc-sh7722: could not register: -22
sh-pfc: probe of pfc-sh7722 failed with error -22
sh: sh7264: remove nonexistent GPIO_PH[0-7] to fix pinctrl registration
Pinmux_pins[] is initialized through PINMUX_GPIO(), using designated
array initializers, where the GPIO_* enums serve as indices. If enum
values are defined, but never used, pinmux_pins[] contains (zero-filled)
holes. Such entries are treated as pin zero, which was registered
before, thus leading to pinctrl registration failures, as seen on
sh7722:
sh-pfc pfc-sh7722: pin 0 already registered
sh-pfc pfc-sh7722: error during pin registration
sh-pfc pfc-sh7722: could not register: -22
sh-pfc: probe of pfc-sh7722 failed with error -22
sh: sh7757: remove nonexistent GPIO_PT[JLNQ]7_RESV to fix pinctrl registration
Commit 3810e96056ff ("sh: modify pinmux for SH7757 2nd cut") renamed
GPIO_PT[JLNQ]7 to GPIO_PT[JLNQ]7_RESV, and removed the existing users
from the pinmux_pins[] array.
However, pinmux_pins[] is initialized through PINMUX_GPIO(), using
designated array initializers, where the GPIO_* enums serve as indices.
Hence entries were not really removed, but replaced by (zero-filled)
holes. Such entries are treated as pin zero, which was registered
before, thus leading to pinctrl registration failures, as seen on
sh7722:
sh-pfc pfc-sh7722: pin 0 already registered
sh-pfc pfc-sh7722: error during pin registration
sh-pfc pfc-sh7722: could not register: -22
sh-pfc: probe of pfc-sh7722 failed with error -22
Remove GPIO_PT[JLNQ]7_RESV from the enum to fix this.
sh: sh7722: remove nonexistent GPIO_PTQ7 to fix pinctrl registration
Patch series "sh: sh7722/sh7757i/sh7264/sh7269: Fix pinctrl registration",
v2.
Magnus Damm reported that on sh7722/Migo-R, pinctrl registration fails
with:
sh-pfc pfc-sh7722: pin 0 already registered
sh-pfc pfc-sh7722: error during pin registration
sh-pfc pfc-sh7722: could not register: -22
sh-pfc: probe of pfc-sh7722 failed with error -22
pinmux_pins[] is initialized through PINMUX_GPIO(), using designated
array initializers, where the GPIO_* enums serve as indices. Apparently
GPIO_PTQ7 was defined in the enum, but never used. If enum values are
defined, but never used, pinmux_pins[] contains (zero-filled) holes.
Hence such entries are treated as pin zero, which was registered before,
and pinctrl registration fails.
I can't see how this ever worked, as at the time of commit f5e25ae52fef
("sh-pfc: Add sh7722 pinmux support"), pinmux_gpios[] in
drivers/pinctrl/sh-pfc/pfc-sh7722.c already had the hole, and
drivers/pinctrl/core.c already had the check.
Some scripting revealed a few more broken drivers:
- sh7757 has four holes, due to nonexistent GPIO_PT[JLNQ]7_RESV.
- sh7264 and sh7269 define GPIO_PH[0-7], but don't use it with
PINMUX_GPIO().
Patch 1 fixes the issue on sh7722, and was tested. Patches 3-4 should
fix the issue on the other 3 SoCs, but was untested due to lack of
hardware.
This patch (of 4):
On sh7722/Migo-R, pinctrl registration fails with:
sh-pfc pfc-sh7722: pin 0 already registered
sh-pfc pfc-sh7722: error during pin registration
sh-pfc pfc-sh7722: could not register: -22
sh-pfc: probe of pfc-sh7722 failed with error -22
pinmux_pins[] is initialized through PINMUX_GPIO(), using designated array
initializers, where the GPIO_* enums serve as indices. As GPIO_PTQ7 is
defined in the enum, but never used, pinmux_pins[] contains a
(zero-filled) hole. Hence this entry is treated as pin zero, which was
registered before, and pinctrl registration fails.
According to the datasheet, port PTQ7 does not exist. Hence remove
GPIO_PTQ7 from the enum to fix this.
Alexandru Moise [Tue, 3 Oct 2017 23:14:31 +0000 (16:14 -0700)]
mm, hugetlb, soft_offline: save compound page order before page migration
This fixes a bug in madvise() where if you'd try to soft offline a
hugepage via madvise(), while walking the address range you'd end up,
using the wrong page offset due to attempting to get the compound order
of a former but presently not compound page, due to dissolving the huge
page (since commit c3114a84f7f9: "mm: hugetlb: soft-offline: dissolve
source hugepage after successful migration").
As a result I ended up with all my free pages except one being offlined.
There's a typo in recent change of VM_MPX definition. We want it to be
VM_HIGH_ARCH_4, not VM_HIGH_ARCH_BIT_4.
This bug does cause visible regressions. In arch_vma_name the vmflags
are tested against VM_MPX. With the incorrect value of VM_MPX, a number
of vmas (such as the stack) test positive and end up being marked as
"[mpx]" in /proc/N/maps instead of their correct names.
This confuses tools like rr which expect to be able to find familiar
vmas.
Colin Ian King [Tue, 3 Oct 2017 23:14:21 +0000 (16:14 -0700)]
scripts/spelling.txt: add more spelling mistakes to spelling.txt
Here are some of the more spelling mistakes and typos that I've found
while fixing up spelling mistakes in kernel error message text over the
past eight weeks.
Sudip Mukherjee [Tue, 3 Oct 2017 23:14:15 +0000 (16:14 -0700)]
alpha: fix build failures
The build of alpha allmodconfig is giving error:
arch/alpha/include/asm/mmu_context.h: In function 'ev5_switch_mm':
arch/alpha/include/asm/mmu_context.h:160:2: error:
implicit declaration of function 'task_thread_info';
did you mean 'init_thread_info'? [-Werror=implicit-function-declaration]
The file 'mmu_context.h' needed an extra header file.
- bpf prog_array just like all other types of bpf array accepts 32-bit index.
Clarify that in the comment.
- fix x64 JIT of bpf_tail_call which was incorrectly loading 8 instead of 4 bytes
- tighten corresponding check in the interpreter to stay consistent
The JIT bug can be triggered after introduction of BPF_F_NUMA_NODE flag
in commit 96eabe7a40aa in 4.14. Before that the map_flags would stay zero and
though JIT code is wrong it will check bounds correctly.
Hence two fixes tags. All other JITs don't have this problem.
Signed-off-by: Alexei Starovoitov <[email protected]> Fixes: 96eabe7a40aa ("bpf: Allow selecting numa node during map creation") Fixes: b52f00e6a715 ("x86: bpf_jit: implement bpf_tail_call() helper") Acked-by: Daniel Borkmann <[email protected]> Acked-by: Martin KaFai Lau <[email protected]> Reviewed-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
David Wu [Sat, 30 Sep 2017 09:47:23 +0000 (17:47 +0800)]
net: stmmac: dwmac-rk: Add RK3128 GMAC support
Add constants and callback functions for the dwmac on rk3128 soc.
As can be seen, the base structure is the same, only registers
and the bits in them moved slightly.
Omar Sandoval [Tue, 3 Oct 2017 21:57:16 +0000 (14:57 -0700)]
blk-mq-debugfs: fix device sched directory for default scheduler
In blk_mq_debugfs_register(), I remembered to set up the per-hctx sched
directories if a default scheduler was already configured by
blk_mq_sched_init() from blk_mq_init_allocated_queue(), but I didn't do
the same for the device-wide sched directory. Fix it.
Jens Axboe [Tue, 3 Oct 2017 21:58:15 +0000 (15:58 -0600)]
null_blk: change configfs dependency to select
A recent commit made null_blk depend on configfs, which is kind of
annoying since you now have to find this dependency and enable that
as well. Discovered this since I no longer had null_blk available
on a box I needed to debug, since it got killed when the config
updated after the configfs change was merged.
Joseph Qi [Sat, 30 Sep 2017 06:38:49 +0000 (14:38 +0800)]
blk-throttle: fix possible io stall when upgrade to max
There is a case which will lead to io stall. The case is described as
follows.
/test1
|-subtest1
/test2
|-subtest2
And subtest1 and subtest2 each has 32 queued bios already.
Now upgrade to max. In throtl_upgrade_state, it will try to dispatch
bios as follows:
1) tg=subtest1, do nothing;
2) tg=test1, transfer 32 queued bios from subtest1 to test1; no pending
left, no need to schedule next dispatch;
3) tg=subtest2, do nothing;
4) tg=test2, transfer 32 queued bios from subtest2 to test2; no pending
left, no need to schedule next dispatch;
5) tg=/, transfer 8 queued bios from test1 to /, 8 queued bios from
test2 to /, 8 queued bios from test1 to /, and 8 queued bios from test2
to /; note that test1 and test2 each still has 16 queued bios left;
6) tg=/, try to schedule next dispatch, but since disptime is now
(update in tg_update_disptime, wait=0), pending timer is not scheduled
in fact;
7) In throtl_upgrade_state it totally dispatches 32 queued bios and with
32 left. test1 and test2 each has 16 queued bios;
8) throtl_pending_timer_fn sees the left over bios, but could do
nothing, because throtl_select_dispatch returns 0, and test1/test2 has
no pending tg.
The blktrace shows the following:
8,32 0 0 2.539007641 0 m N throtl upgrade to max
8,32 0 0 2.539072267 0 m N throtl /test2 dispatch nr_queued=16 read=0 write=16
8,32 7 0 2.539077142 0 m N throtl /test1 dispatch nr_queued=16 read=0 write=16
So force schedule dispatch if there are pending children.
Once the network interface is brought up, the user just needs to run a
DHCP client to get IP address and routing setup.
As a side note, other Novatel Verizon USB730L models with the same
vid:pid end up exposing a standard ECM interface which doesn't require
any other kernel update to make it work.
Imre Deak [Mon, 2 Oct 2017 13:53:07 +0000 (16:53 +0300)]
drm/i915: Fix DDI PHY init if it was already on
The common lane power down flag of a DPIO PHY has a funky semantic:
after the initial enabling of the PHY (so from a disabled state) this
flag will be clear. It will be set only after the PHY will be used for
the first time (for instance due to enabling the corresponding pipe) and
then become unused (due to disabling the pipe). During the initial PHY
enablement we don't know which of the above phases we are in, so move
the check for the flag where this is known, the HW readout code. This is
where the rest of lane power down status checks are done anyway.
This fixes at least a problem on GLK where after module reloading, the
common lane power down flag of PHY1 is set, but the PHY is actually
powered-on and properly set up. The GRC readout code for other PHYs will
hence think that PHY1 is not powered initially and disable it after the
GRC readout. This will cause the AUX power well related to PHY1 to get
disabled in a stuck state, timing out when we try to enable it later.
We used to assign IRQs for all devices at boot-time, before any drivers
claimed devices. The following commits:
30fdfb929e82 ("PCI: Add a call to pci_assign_irq() in pci_device_probe()") 0e4c2eeb758a ("alpha/PCI: Replace pci_fixup_irqs() call with host bridge IRQ mapping hooks")
changed this so we now call pci_assign_irq() from pci_device_probe() when
we call a driver's probe method.
The ide_scan_pcibus() path (enabled by CONFIG_IDEPCI_PCIBUS_ORDER) bypasses
pci_device_probe() so it can guarantee devices are claimed in order of PCI
bus address. It calls the driver's probe method directly, so it misses the
pci_assign_irq() call (and other PCI initialization functions), which
causes failures like this:
ide0: disabled, no IRQ
ide0: failed to initialize IDE interface
ide0: disabling port
cmd64x 0000:00:02.0: IDE controller (0x1095:0x0646 rev 0x07)
CMD64x_IDE 0000:00:02.0: BAR 0: can't reserve [io 0x8050-0x8057]
cmd64x 0000:00:02.0: can't reserve resources
CMD64x_IDE: probe of 0000:00:02.0 failed with error -16
ide_generic: please use "probe_mask=0x3f" module parameter for probing
all legacy ISA IDE ports
------------[ cut here ]------------
WARNING: CPU: 0 PID: 1 at fs/sysfs/dir.c:31 sysfs_warn_dup+0x94/0xd0
sysfs: cannot create duplicate filename '/class/ide_port/ide0'
...
Fix the IRQ allocation issue by calling pci_assign_irq() from
ide_scan_pcidev() before probing the IDE PCI drivers, so that IRQs for a
given PCI device are allocated for the IDE PCI drivers to use them for
device configuration.
Recent pci_assign_irq() changes uncovered a problem with missing freeing of
ide_port class instance on hwif_init() failure in ide_host_register():
ide0: disabled, no IRQ
ide0: failed to initialize IDE interface
ide0: disabling port
cmd64x 0000:00:02.0: IDE controller (0x1095:0x0646 rev 0x07)
CMD64x_IDE 0000:00:02.0: BAR 0: can't reserve [io 0x8050-0x8057]
cmd64x 0000:00:02.0: can't reserve resources
CMD64x_IDE: probe of 0000:00:02.0 failed with error -16
ide_generic: please use "probe_mask=0x3f" module parameter for probing all legacy ISA IDE ports
------------[ cut here ]------------
WARNING: CPU: 0 PID: 1 at fs/sysfs/dir.c:31 sysfs_warn_dup+0x94/0xd0
sysfs: cannot create duplicate filename '/class/ide_port/ide0'
...
Linus Torvalds [Tue, 3 Oct 2017 17:40:36 +0000 (10:40 -0700)]
Merge branch 'for-4.14-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup fix from Tejun Heo:
"The recent migration code updates assumed that migrations always
execute from the top to the bottom once and didn't clean up internal
states after each migration round; however, cgroup_transfer_tasks()
repeats the inner steps multiple times and the garbage internal states
from the previous iteration led to OOPS.
Waiman fixed the bug by reinitializing the relevant states at the end
of each migration round"
* 'for-4.14-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
cgroup: Reinit cgroup_taskset structure before cgroup_migrate_execute() returns
net: rtnetlink: fix info leak in RTM_GETSTATS call
When RTM_GETSTATS was added the fields of its header struct were not all
initialized when returning the result thus leaking 4 bytes of information
to user-space per rtnl_fill_statsinfo call, so initialize them now. Thanks
to Alexander Potapenko for the detailed report and bisection.
Reported-by: Alexander Potapenko <[email protected]> Fixes: 10c9ead9f3c6 ("rtnetlink: add new RTM_GETSTATS message to dump link stats") Signed-off-by: Nikolay Aleksandrov <[email protected]> Acked-by: Roopa Prabhu <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Linus Torvalds [Tue, 3 Oct 2017 17:05:12 +0000 (10:05 -0700)]
Merge branch 'for-4.14-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu
Pull percpu fixes from Tejun Heo:
"Rather important fixes this time.
- The new percpu area allocator had a subtle bug in how it iterates
the memory regions and could skip viable areas, which led to
allocation failures for module static percpu variables. Dennis
fixed the bug and another non-critical one in stat calculation.
- Mark noticed that the generic implementations of percpu local
atomic reads aren't properly protected against irqs and there's a
(slim) chance for split reads on some 32bit systems. Generic
implementations are updated to disable irq when read size is larger
than ulong size. This may have made some 32bit archs which can do
atomic local 64bit accesses generate sub-optimal code. We need to
find them out and implement arch-specific overrides"
* 'for-4.14-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu:
percpu: fix iteration to prevent skipping over block
percpu: fix starting offset for chunk statistics traversal
percpu: make this_cpu_generic_read() atomic w.r.t. interrupts
Linus Torvalds [Tue, 3 Oct 2017 16:30:00 +0000 (09:30 -0700)]
Merge branch 'for-4.14-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata
Pull libata fixes from Tejun Heo:
"Nothing too interesting.
Arnd's gcc-7 warning fixes that slipped through the cracks for two
release cycles (my bad), and two minor low level driver updates"
* 'for-4.14-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata:
ahci: don't ignore result code of ahci_reset_controller()
ata_piix: Add Fujitsu-Siemens Lifebook S6120 to short cable IDs
ata: avoid gcc-7 warning in ata_timing_quantize
Linus Torvalds [Tue, 3 Oct 2017 16:25:40 +0000 (09:25 -0700)]
Merge tag 'usb-4.14-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB fixes from Greg KH:
"Here are a number of USB fixes for 4.14-rc4 to resolved reported
issues.
There's a bunch of stuff in here based on the great work Andrey
Konovalov is doing in fuzzing the USB stack. Lots of bug fixes when
dealing with corrupted USB descriptors that we've never seen in
"normal" operation, but is now ensuring the stack is much more
hardened overall.
There's also the usual XHCI and gadget driver fixes as well, and a
build error fix, and a few other minor things, full details in the
shortlog.
All of these have been in linux-next with no reported issues"
* tag 'usb-4.14-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (38 commits)
usb: dwc3: of-simple: Add compatible for Spreadtrum SC9860 platform
usb: gadget: udc: atmel: set vbus irqflags explicitly
usb: gadget: ffs: handle I/O completion in-order
usb: renesas_usbhs: fix usbhsf_fifo_clear() for RX direction
usb: renesas_usbhs: fix the BCLR setting condition for non-DCP pipe
usb: gadget: udc: renesas_usb3: Fix return value of usb3_write_pipe()
usb: gadget: udc: renesas_usb3: fix Pn_RAMMAP.Pn_MPKT value
usb: gadget: udc: renesas_usb3: fix for no-data control transfer
USB: dummy-hcd: Fix erroneous synchronization change
USB: dummy-hcd: fix infinite-loop resubmission bug
USB: dummy-hcd: fix connection failures (wrong speed)
USB: cdc-wdm: ignore -EPIPE from GetEncapsulatedResponse
USB: devio: Don't corrupt user memory
USB: devio: Prevent integer overflow in proc_do_submiturb()
USB: g_mass_storage: Fix deadlock when driver is unbound
USB: gadgetfs: Fix crash caused by inadequate synchronization
USB: gadgetfs: fix copy_to_user while holding spinlock
USB: uas: fix bug in handling of alternate settings
usb-storage: unusual_devs entry to fix write-access regression for Seagate external drives
usb-storage: fix bogus hardware error messages for ATA pass-thru devices
...
Linus Torvalds [Tue, 3 Oct 2017 16:23:49 +0000 (09:23 -0700)]
Merge tag 'tty-4.14-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pull tty/serial fixes from Greg KH:
"Here are a small number (5) of patches for some reported TTY and
serial issues. Nothing major, a documentation update, timing fix,
error handling fix, name reporting fix, and a timeout issue resolved.
All of these have been in linux-next for a while with no reported
issues"
* tag 'tty-4.14-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
serial: sccnxp: Fix error handling in sccnxp_probe()
tty: serial: lpuart: avoid report NULL interrupt
serial: bcm63xx: fix timing issue.
mxser: fix timeout calculation for low rates
serial: sh-sci: document R8A77970 bindings
Linus Torvalds [Tue, 3 Oct 2017 16:22:11 +0000 (09:22 -0700)]
Merge tag 'staging-4.14-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
Pull staging/IIO fixes from Greg KH:
"Here are some small staging/IIO driver fixes for 4.14-rc4
Most of these have been in my tree for a while due to travels, sorry
for the delay. They resolve a number of small issues reported by
people, mostly for the iio drivers. Nothing major in here, full
details are in the shortlog.
All have been linux-next for a few weeks with no reported issues"
* tag 'staging-4.14-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: (23 commits)
staging: iio: ad7192: Fix - use the dedicated reset function avoiding dma from stack.
iio: core: Return error for failed read_reg
iio: ad7793: Fix the serial interface reset
iio: ad_sigma_delta: Implement a dedicated reset function
IIO: BME280: Updates to Humidity readings need ctrl_reg write!
iio: adc: mcp320x: Fix readout of negative voltages
iio: adc: mcp320x: Fix oops on module unload
iio: adc: stm32: fix bad error check on max_channels
iio: trigger: stm32-timer: fix a corner case to write preset
iio: trigger: stm32-timer: preset shouldn't be buffered
iio: adc: twl4030: Return an error if we can not enable the vusb3v1 regulator in 'twl4030_madc_probe()'
iio: adc: twl4030: Disable the vusb3v1 rugulator in the error handling path of 'twl4030_madc_probe()'
iio: adc: twl4030: Fix an error handling path in 'twl4030_madc_probe()'
staging: rtl8723bs: avoid null pointer dereference on pmlmepriv
staging: rtl8723bs: add missing range check on id
staging: vchiq_2835_arm: Fix NULL ptr dereference in free_pagelist
staging: speakup: fix speakup-r empty line lockup
staging: pi433: Move limit check to switch default to kill warning
staging: r8822be: fix null pointer dereferences with a null driver_adapter
staging: mt29f_spinand: Enable the read ECC before program the page
...
Sam Bobroff [Tue, 26 Sep 2017 06:47:04 +0000 (16:47 +1000)]
KVM: PPC: Book3S: Fix server always zero from kvmppc_xive_get_xive()
In KVM's XICS-on-XIVE emulation, kvmppc_xive_get_xive() returns the
value of state->guest_server as "server". However, this value is not
set by it's counterpart kvmppc_xive_set_xive(). When the guest uses
this interface to migrate interrupts away from a CPU that is going
offline, it sees all interrupts as belonging to CPU 0, so they are
left assigned to (now) offline CPUs.
This patch removes the guest_server field from the state, and returns
act_server in it's place (that is, the CPU actually handling the
interrupt, which may differ from the one requested).
Linus Torvalds [Tue, 3 Oct 2017 15:57:07 +0000 (08:57 -0700)]
Merge tag 'driver-core-4.14-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core fixes from Greg KH:
"Here are a few small fixes for 4.14-rc4.
The removal of DRIVER_ATTR() was almost completed by 4.14-rc1, but one
straggler made it in through some other tree (odds are, one of
mine...) So there's a simple removal of the last user, and then
finally the macro is removed from the tree.
There's a fix for old crazy udev instances that insist on reloading a
module when it is removed from the kernel due to the new uevents for
bind/unbind. This fixes the reported regression, hopefully some year
in the future we can drop the workaround, once users update to the
latest version, but I'm not holding my breath.
And then there's a build fix for a linker warning, and a buffer
overflow fix to match the PCI fixes you took through the PCI tree in
the same area.
All of these have been in linux-next for a few weeks while I've been
traveling, sorry for the delay"
* tag 'driver-core-4.14-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
driver core: remove DRIVER_ATTR
fpga: altera-cvp: remove DRIVER_ATTR() usage
driver core: platform: Don't read past the end of "driver_override" buffer
base: arch_topology: fix section mismatch build warnings
driver core: suppress sending MODALIAS in UNBIND uevents
Linus Torvalds [Tue, 3 Oct 2017 15:27:50 +0000 (08:27 -0700)]
Merge tag 'char-misc-4.14-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char/misc fixes from Greg KH:
"Here are a handful of char/misc driver fixes for 4.14-rc4.
Nothing major, some binder fixups, hyperv fixes, and other tiny
things.
All of these have been sitting in my tree for way too long, sorry for
the delay in getting them to you. All have been in linux-next for a
few weeks, and despite some people's feeling about if linux-next
actually tests things, I think it's a good "soak test" for patches"
* tag 'char-misc-4.14-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
Drivers: hv: fcopy: restore correct transfer length
vmbus: don't acquire the mutex in vmbus_hvsock_device_unregister()
intel_th: pci: Add Lewisburg PCH support
intel_th: pci: Add Cedar Fork PCH support
stm class: Fix a use-after-free
nvmem: add missing of_node_put() in of_nvmem_cell_get()
nvmem: core: return EFBIG on out-of-range write
auxdisplay: charlcd: properly restore atomic counter on error path
binder: fix memory corruption in binder_transaction binder
binder: fix an ret value override
android: binder: fix type mismatch warning
Paul E. McKenney [Tue, 26 Sep 2017 03:19:02 +0000 (20:19 -0700)]
rcu: Remove extraneous READ_ONCE()s from rcu_irq_{enter,exit}()
The read of ->dynticks_nmi_nesting in rcu_irq_enter() and rcu_irq_exit()
is currently protected with READ_ONCE(). However, this protection is
unnecessary because (1) ->dynticks_nmi_nesting is updated only by the
current CPU, (2) Although NMI handlers can update this field, they reset
it back to its old value before return, and (3) Interrupts are disabled,
so nothing else can modify it. The value of ->dynticks_nmi_nesting is
thus effectively constant, and so no protection is required.
This commit therefore removes the READ_ONCE() protection from these
two accesses.
Shu Wang [Tue, 12 Sep 2017 02:14:54 +0000 (10:14 +0800)]
ftrace: Fix kmemleak in unregister_ftrace_graph
The trampoline allocated by function tracer was overwriten by function_graph
tracer, and caused a memory leak. The save_global_trampoline should have
saved the previous trampoline in register_ftrace_graph() and restored it in
unregister_ftrace_graph(). But as it is implemented, save_global_trampoline was
only used in unregister_ftrace_graph as default value 0, and it overwrote the
previous trampoline's value. Causing the previous allocated trampoline to be
lost.
[
Looking further into this, I found that this was left over from when the
function and function graph tracers shared the same ftrace_ops. But in
commit 5f151b2401 ("ftrace: Fix function_profiler and function tracer
together"), the two were separated, and the save_global_trampoline no
longer was necessary (and it may have been broken back then too).
-- Steven Rostedt
]
powerpc/4xx: Fix compile error with 64K pages on 40x, 44x
The mmu context on the 40x, 44x does not define pte_frag entry. This
causes gcc abort the compilation due to:
setup-common.c: In function ‘setup_arch’:
setup-common.c:908: error: ‘mm_context_t’ has no ‘pte_frag’
This patch fixes the issue by removing the pte_frag initialization in
setup-common.c.
This is possible, because the compiler will do the initialization,
since the mm_context is a sub struct of init_mm. init_mm is declared
in mm_types.h as external linkage.
According to C99 6.2.4.3:
An object whose identifier is declared with external linkage
[...] has static storage duration.
C99 defines in 6.7.8.10 that:
If an object that has static storage duration is not
initialized explicitly, then:
- if it has pointer type, it is initialized to a null pointer
Fixes: b1923caa6e64 ("powerpc: Merge 32-bit and 64-bit setup_arch()") Signed-off-by: Christian Lamparter <[email protected]> Reviewed-by: Christophe Leroy <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
Jeremy Kerr [Wed, 27 Sep 2017 04:55:51 +0000 (12:55 +0800)]
powerpc: Fix action argument for cpufeatures-based TLB flush
Commit 41d0c2ecde19 ("powerpc/powernv: Fix local TLB flush for boot
and MCE on POWER9") introduced calls to __flush_tlb_power[89] from the
cpufeatures code, specifying the number of sets to flush.
However, these functions take an action argument, not a number of
sets. This means we hit the BUG() in __flush_tlb_{206,300} when using
cpufeatures-style configuration.
This change passes TLB_INVAL_SCOPE_GLOBAL instead.
Bryant G. Ly [Mon, 2 Oct 2017 17:59:38 +0000 (12:59 -0500)]
scsi: ibmvscsis: Fix write_pending failure path
For write_pending if the queue is down or client failed then return -EIO
so that LIO can properly process the completed command. Prior we
returned 0 since LIO could not handle it properly. Now with commit fa7e25cf13a6 ("target: Fix unknown fabric callback queue-full errors")
that patch addresses LIO's ability to handle things right.
scsi: libiscsi: Fix use-after-free race during iscsi_session_teardown
Session attributes exposed through sysfs were freed before the device
was destroyed, resulting in a potential use-after-free. Free these
attributes after removing the device.
scsi: sd: Do not override max_sectors_kb sysfs setting
A user may lower the max_sectors_kb setting in sysfs to accommodate
certain workloads. Previously we would always set the max I/O size to
either the block layer default or the optional preferred I/O size
reported by the device.
Keep the current heuristics for the initial setting of max_sectors_kb.
For subsequent invocations, only update the current queue limit if it
exceeds the capabilities of the hardware.
scsi: sd: Implement blacklist option for WRITE SAME w/ UNMAP
SBC-4 states:
"A MAXIMUM UNMAP LBA COUNT field set to a non-zero value indicates the
maximum number of LBAs that may be unmapped by an UNMAP command"
"A MAXIMUM WRITE SAME LENGTH field set to a non-zero value indicates
the maximum number of contiguous logical blocks that the device server
allows to be unmapped or written in a single WRITE SAME command."
Despite the spec being clear on the topic, some devices incorrectly
expect WRITE SAME commands with the UNMAP bit set to be limited to the
value reported in MAXIMUM UNMAP LBA COUNT in the Block Limits VPD.
Implement a blacklist option that can be used to accommodate devices
with this behavior.
Dave Airlie [Mon, 2 Oct 2017 23:34:06 +0000 (09:34 +1000)]
Merge tag 'drm-intel-fixes-2017-09-27' of git://anongit.freedesktop.org/git/drm-intel into drm-fixes
drm/i915 fixes for 4.14-rc3
Couple fixes for stable:
- Fix ELD connector types and consequently audio on DP (Jani).
- Ignore HDMI on Port A and consequently fix an ops on i915 probe
when VBT advertises HDMI on Port A (Jani).
And a small fix:
- That removes a reduntant hw_check on modeset. (Colin)
* tag 'drm-intel-fixes-2017-09-27' of git://anongit.freedesktop.org/git/drm-intel:
drm/i915/bios: ignore HDMI on port A
drm/i915: remove redundant variable hw_check
drm/i915: always update ELD connector type after get modes
Eric Dumazet [Mon, 2 Oct 2017 19:20:51 +0000 (12:20 -0700)]
socket, bpf: fix possible use after free
Starting from linux-4.4, 3WHS no longer takes the listener lock.
Since this time, we might hit a use-after-free in sk_filter_charge(),
if the filter we got in the memcpy() of the listener content
just happened to be replaced by a thread changing listener BPF filter.
To fix this, we need to make sure the filter refcount is not already
zero before incrementing it again.
Josef Bacik [Mon, 2 Oct 2017 20:22:08 +0000 (16:22 -0400)]
nbd: fix -ERESTARTSYS handling
Christoph made it so that if we return'ed BLK_STS_RESOURCE whenever we
got ERESTARTSYS from sending our packets we'd return BLK_STS_OK, which
means we'd never requeue and just hang. We really need to return the
right value from the upper layer.
Fixes: fc17b6534eb8 ("blk-mq: switch ->queue_rq return value to blk_status_t") Signed-off-by: Josef Bacik <[email protected]> Signed-off-by: Jens Axboe <[email protected]>